Copyright © IFAC Robot Control, Vienna, Austria, 1991

PASSIVITY OF ROBOT DYNAMICS IN LEARNING CONTROL

S. Arimoto and T. Naniwa
Faculty of Engineering, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan
Abstract. Learning control is a new approach to the problem of skill refinement for robotic manipulators. It is considered to be a mathematical model of motor program learning for skilled motions in the central nervous system. This paper proposes a class of learning control algorithms for bettering operation of the robot arm under a geometrical end-point constraint at the next trial on the basis of the previous operation data. The command input torque is updated by a linear modification of present joint velocity errors deviated from the desired velocity trajectory in addition to the previous input. It is shown that motion trajectories approach an \epsilon-neighborhood of the desired one in the sense of squared integral norm provided the local feedback loop consists of both position and velocity feedback plus a feedback term of the error force vector between the reaction force and the desired force on the end-point constrained surface. It is explored that various passivity properties of residual error dynamics of the manipulator play a crucial role in the proof of uniform boundedness and convergence of position and velocity trajectories.
Keywords. Learning control; passivity of robot dynamics; robot manipulator; trajectory and force control.

1. INTRODUCTION

Learning control is a new approach for the control problem of skill refinement. We humans are able to acquire skills via a long series of repeated exercises. Motivated by this observation, much literature concerned with learning control techniques for robotic systems has accumulated very recently (Arimoto et al., 1984a, 1984b, 1985; Craig, 1984; Bondi et al., 1988; Arimoto, 1985; Kawamura et al., 1985, 1988; Arimoto, 1990), mainly during the past several years. However, most of the papers were so far concerned with only a class of tasks described in terms of joint trajectory tracking. In other words, for a desired motion given to a robot arm whose end-point is free to move in three-dimensional space, the arm repeats exercises to reduce the trajectory tracking errors gradually and eventually can learn to trace the given desired motion exactly. However, there is a variety of tasks for robotic systems that must be described in terms of end-point constraints. Writing with a pen is such an example, since the tip of the pen must move in touch with a given sheet of paper fixed on a table. In such a case, not only a desired end-point trajectory but also a desired time-evolution of the contact force acting on the surface along the end-point trajectory must be specified. This is traditionally called a hybrid (position and force) control problem in the field of robotics. This paper proposes a simple learning control algorithm, which is effective for a class of tasks with geometrical end-point constraints, under the assumptions that the contact force can be measured via a force sensor and the local servo loop consists of a feedback of the error force vector evaluated at the joints in addition to the ordinary feedback of both position and velocity signals.

It is shown that the learning control algorithm assures the uniform boundedness of repeated motions during repetitive learning and in the sequel leads to the convergence within an \epsilon-neighborhood of the desired motion in the sense of squared integral function norm. In the proof of both uniform boundedness and convergence of position and velocity trajectories, various passivity properties of residual error dynamics of the robot arm are vital, the same as in the case of ordinary joint-trajectory tracking with free end-point (Arimoto, 1985; Kawamura et al., 1985, 1988; Arimoto, 1990).

2. PASSIVITY PROPERTIES OF ROBOT DYNAMICS
A class of serial-link manipulators with all revolute-type joints is considered in the paper. First we discuss the passivity of the dynamics of such a manipulator with free end-point, which can be described in terms of the joint coordinate vector q = (q_1, \ldots, q_n)^T in the following way:
(H_0 + H(q))\ddot{q} + (B_0 + \dot{H}(q))\dot{q} - \partial K/\partial q + g(q) = u    (1)
where H(q) denotes an inertia matrix, H_0 a positive diagonal matrix representing inertial terms of the internal load distribution of actuators, K = \dot{q}^T(H_0 + H(q))\dot{q}/2 the kinetic energy, g(q) a vector of gravity terms, u a vector of input torques generated at servo actuators, and B_0 a positive definite matrix representing damping factors and coefficients of electro-motive forces. It is well known that the inertia matrix H(q) is symmetric and positive definite and, moreover, each entry of H is constant or a trigonometric function of components of the joint vector q. Hence, H(q) and any of the partial derivatives of H(q) with respect to q_i are uniformly Lipschitz continuous in q. Next it is important to observe that eq.(1) can be written in the form
(H_0 + H(q))\ddot{q} + (B_0 + \frac{1}{2}\dot{H}(q))\dot{q} + S(q, \dot{q})\dot{q} + g(q) = u    (2)

where

S(q, \dot{q})\dot{q} = \frac{1}{2}\dot{H}(q)\dot{q} - \partial K/\partial q.    (3)

As was pointed out first by Arimoto and Miyazaki (1984 & 1985) and later but independently by Koditschek (1987), S(q, \dot{q}) is skew symmetric; in other words,

\dot{q}^T S(q, \dot{q})\dot{q} = 0

and more generally

r^T S(q, \dot{q}) r = 0 for any vector r.    (4)
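The skew-symmetry of S can be checked numerically. The following sketch (not from the paper) uses a hypothetical two-link planar arm with invented inertia parameters; S is assembled as S = C - (1/2)\dot{H}, where C is the Coriolis matrix built from Christoffel symbols, which reproduces the velocity terms of eq.(2):

```python
import numpy as np

# Hypothetical two-link planar arm; the parameters a1, a2, a3 are made
# up for illustration and are not taken from the paper.
a1, a2, a3 = 3.0, 1.0, 0.5

def H(q):
    """Inertia matrix H(q): entries are constant or trigonometric in q."""
    c2 = np.cos(q[1])
    return np.array([[a1 + 2.0 * a3 * c2, a2 + a3 * c2],
                     [a2 + a3 * c2,       a2]])

def dH_dq(q, eps=1e-6):
    """Partial derivatives dH/dq_k by central differences."""
    grads = []
    for k in range(2):
        e = np.zeros(2)
        e[k] = eps
        grads.append((H(q + e) - H(q - e)) / (2.0 * eps))
    return grads

def S_matrix(q, qdot):
    """S = C - (1/2)Hdot, with C built from Christoffel symbols, so that
    (1/2)Hdot*qdot + S*qdot equals the Coriolis/centrifugal term C*qdot."""
    dH = dH_dq(q)
    Hdot = sum(dH[k] * qdot[k] for k in range(2))
    C = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                C[i, j] += 0.5 * (dH[k][i, j] + dH[j][i, k]
                                  - dH[i][k, j]) * qdot[k]
    return C - 0.5 * Hdot

q = np.array([0.3, 0.7])
qdot = np.array([1.0, -0.5])
S = S_matrix(q, qdot)

# Skew-symmetry: S + S^T = 0, hence r^T S r = 0 for every vector r.
assert np.allclose(S + S.T, 0.0, atol=1e-6)
r = np.random.default_rng(0).standard_normal(2)
assert abs(float(r @ S @ r)) < 1e-6
```

The identity C + C^T = \dot{H} for the Christoffel construction is what makes S skew symmetric here.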
Now we show the passivity of robot dynamics. More precisely, the following property follows directly from the skew-symmetry of the matrix S in eq.(2).

Property 1. The passivity of the velocity vector \dot{q} with respect to the torque input u is valid, i.e., it follows that

\int_0^t \dot{q}^T(\tau) u(\tau) d\tau \ge -\gamma    (5)
for any t \ge 0 and a fixed constant \gamma depending only on the initial state x(0) (= (q(0), \dot{q}(0))). In fact, we see

\int_0^t \dot{q}^T(\tau) u(\tau) d\tau = \int_0^t \dot{q}^T [(H_0 + H(q))\ddot{q} + (B_0 + \frac{1}{2}\dot{H}(q))\dot{q} + S(q, \dot{q})\dot{q} + g(q)] d\tau
 = [\frac{1}{2}\dot{q}^T (H_0 + H(q))\dot{q} + G(q)]_0^t + \int_0^t \dot{q}^T B_0 \dot{q}\, d\tau
 = V(t) - V(0) + \int_0^t \dot{q}^T B_0 \dot{q}\, d\tau    (6)

where V(t) is defined by

V(t) = \frac{1}{2}\dot{q}^T(t)\{H_0 + H(q(t))\}\dot{q}(t) + G(q(t))    (7)

and G(q) denotes the potential function induced by the gravity force, i.e.,

g(q) = (\partial G/\partial q_1, \ldots, \partial G/\partial q_n)^T.    (8)

Since the constant term of the potential is arbitrary, it is reasonable to assume that

\min_q G(q) = 0.    (9)

Then, V(t) \ge 0 and therefore eq.(6) implies

\int_0^t \dot{q}^T(\tau) u(\tau) d\tau \ge -V(0) = -\gamma    (10)

which proves Property 1. The passivity of robot dynamics is quite natural, as is the passivity of electrical lumped-parameter circuits. In fact, the left-hand side of eq.(5) is the total work done by the torques generated at the actuators during the time interval [0, t]. This quantity is equivalent to the increase or decrease of total energy, V(t) - V(0), plus the energy consumption during the time interval [0, t], as shown in eq.(6).

Now we consider the robot dynamics in the case that the end-effector is in touch with a surface as shown in Fig.1. Suppose that the surface is described by a scalar equation

\phi(x) = 0    (11)

where x = (x^1, x^2, x^3)^T denotes the cartesian coordinates (task coordinates) fixed at the inertial reference frame and the contact is frictionless. Then, the dynamics is expressed by the form

(H_0 + H(q))\ddot{q} + (B_0 + \frac{1}{2}\dot{H}(q))\dot{q} + S(q, \dot{q})\dot{q} + g(q) = J_\phi^T(q) f + u    (12)

where f is the magnitude of the contact force as shown in Fig.1 and J_\phi(q) is the 1 \times n Jacobian matrix of \phi with respect to the joint vector q, that is,

J_\phi(q) = (\partial\phi/\partial q_1, \ldots, \partial\phi/\partial q_n) = \frac{\partial\phi}{\partial x}\frac{\partial x}{\partial q}.    (13)

Fig.1 The tip of the end-effector is in touch with a surface \phi(x) = 0

Similarly to Property 1, it is possible to state the following:

Property 2. As long as the end-point of the manipulator is constrained on the surface \phi(x) = 0, the passivity condition of the robot dynamics described by eq.(12) is satisfied, i.e., it holds that

\int_0^t \dot{q}^T(\tau) u(\tau) d\tau \ge -\gamma'    (14)

where \gamma' is a constant depending only on the initial state. To see this, first note that the geometrical constraint described by eq.(11) implies

J_\phi(q)\dot{q} = 0    (15)

which in fact follows directly from d\phi/dt = 0. Then, by taking the inner product of \dot{q}(\tau) with eq.(12), it is possible to obtain the same conclusion as in Property 1. This verifies Property 2.

3. P-TYPE LEARNING CONTROL

For a desired motion trajectory which is given in terms of the joint velocity \dot{q}_d(t) over t \in [0, T], the learning control law is described in the following recursive form (see Fig.2):

u_{k+1}(t) = u_k(t) - \Phi \dot{r}_k(t)    (16)

where \dot{r}_k denotes the residual error defined by

r_k(t) = q_k(t) - q_d(t)    (17)

and \Phi denotes a positive definite constant gain matrix. The recursive form of eq.(16) is called the P-type learning algorithm since a proportional term of the velocity error is used in the modification of the input torque. Differently from a D-type learning algorithm (Arimoto et al., 1984a, 1984b, 1985) in which the derivative of the velocity error signal is used, a certain extended concept of passivity concerning the residual error dynamics played a crucial role in the proof (Arimoto, 1985, 1990; Kawamura et al., 1985, 1988) of uniform boundedness and convergence of the motion trajectories during repetitive learning. To gain insight into this, it is convenient to note that subtraction of the ideal input u_d(t) realizing the desired trajectory q_d(t) from both sides of eq.(16) yields

\Delta u_{k+1}(t) = \Delta u_k(t) - \Phi \dot{r}_k(t)    (18)

where

\Delta u_k(t) = u_k(t) - u_d(t).    (19)

Then, it follows from eq.(18) that

\Delta u_{k+1}^T \Phi^{-1} \Delta u_{k+1} = \Delta u_k^T \Phi^{-1} \Delta u_k - 2\dot{r}_k^T \Delta u_k + \dot{r}_k^T \Phi \dot{r}_k.    (20)

Hence, if there are positive constants \lambda > 0 (not so large) and \beta > 1 such that

\int_0^t e^{-\lambda\tau} \dot{r}_k^T(\tau) \Delta u_k(\tau) d\tau \ge \frac{\beta}{2}\int_0^t e^{-\lambda\tau} \dot{r}_k^T(\tau) \Phi \dot{r}_k(\tau) d\tau - \gamma,    (21)

then it follows from eq.(20) that

\int_0^T e^{-\lambda t} \Delta u_{k+1}^T(t) \Phi^{-1} \Delta u_{k+1}(t) dt \le \int_0^T e^{-\lambda t} \Delta u_k^T(t) \Phi^{-1} \Delta u_k(t) dt - (\beta - 1)\int_0^T e^{-\lambda t} \dot{r}_k^T(t) \Phi \dot{r}_k(t) dt + 2\gamma.    (22)
This means that the squared integral norm of the input error signal decreases with repetition of exercises as long as the squared integral of the velocity error does not vanish. As in the previous literature (Arimoto, 1990), the inequality

\int_0^t e^{-\lambda\tau} \dot{r}^T(\tau) \Delta u(\tau) d\tau \ge -\gamma    (23)

is called the condition of exponential passivity concerning the residual error dynamics. Clearly the exponential passivity is a weaker and more relaxed condition than the ordinary passivity discussed in the previous section. However, the inequality of eq.(21) is stronger and hence more restrictive in comparison with the exponential passivity. Therefore, it is reasonable to call the inequality (21) with \beta > 1 and \lambda > 0 the exponential passivity with a specified quadratic margin. In the next section, we will show that such a stronger condition of exponential passivity is valid for the residual error dynamics of the manipulator under the end-point constraint provided that the inner servo loop is properly composed of a force feedback in addition to an ordinary position and velocity feedback.

4. PASSIVITY OF RESIDUAL DYNAMICS

Now suppose that a desired end-point trajectory x_d(t) and a desired time-evolution f_d(t) of the magnitude of the contact force are defined on t \in [0, T] and given to the manipulator. We reasonably assume that x_d(t) satisfies \phi(x_d(t)) = 0 and the contact force directs inside the contact surface, that is, f_d(t) > 0, for all t \in [0, T]. If the number of degrees of freedom of the manipulator is greater than three (n > 3), then there is a possibility of existence of many q_d(t) that may satisfy \phi(x_d(t)) = 0 for all t \in [0, T], where x(q) = (x^1(q), x^2(q), x^3(q))^T denotes the coordinate transformation from joint coordinates q to task coordinates x. In the present paper we assume that one of such q_d(t) is chosen or determined by the inverse kinematics and fixed throughout learning. Moreover, we assume for simplicity of mathematical arguments that \dot{q}_d(t) is differentiable and both \ddot{q}_d(t) and f_d(t) are piecewise continuous.

Next, note that, since on-line measurement data on the reaction force f(t) caused by the contact between the end-point and the surface are available, it is reasonable to design a servo-loop for the manipulator in the following way:

\tau = -A(q - q_d) - B_1\dot{q} - J_\phi^T(q)(f - f_d) + u    (24)

where \tau stands for the torque input of eq.(12). The third term of the right-hand side refers to the force feedback through the transpose of the Jacobian matrix and the fourth term u refers to the feedforward input that must be determined through learning. Substitution of eq.(24) into eq.(12) yields

(H_0 + H(q))\ddot{q} + (B + \frac{1}{2}\dot{H}(q))\dot{q} + S(q, \dot{q})\dot{q} + g(q) + A(q - q_d) = J_\phi^T(q) f_d + u    (25)

where B = B_0 + B_1. Since eq.(25) is invertible from output q to input u, it is possible to assume the existence of an ideal input u_d that realizes the desired output q_d, i.e.,

u_d = (H_0 + H(q_d))\ddot{q}_d + (B + \frac{1}{2}\dot{H}(q_d))\dot{q}_d + S(q_d, \dot{q}_d)\dot{q}_d + g(q_d) - J_\phi^T(q_d) f_d.    (26)

Subtracting this equation from eq.(25) yields

(H_0 + H(q_d + r))\ddot{r} + (B + \frac{1}{2}\dot{H}(q_d + r))\dot{r} + S(q_d + r, \dot{q}_d + \dot{r})\dot{r} + Ar + h = \Delta u    (27)

where r = q - q_d, \Delta u = u - u_d, and

h(r, \dot{r}) = \{H(q_d + r) - H(q_d)\}\ddot{q}_d + \frac{1}{2}\{\dot{H}(q_d + r) - \dot{H}(q_d)\}\dot{q}_d + \{S(q_d + r, \dot{q}_d + \dot{r}) - S(q_d, \dot{q}_d)\}\dot{q}_d + g(q_d + r) - g(q_d) - \{J_\phi(q_d + r) - J_\phi(q_d)\}^T f_d.    (28)

Note that every entry of H(q) is constant or a sinusoidal function of components of q and every entry of S(q, \dot{q}) is linear in \dot{q}. Therefore, h is linear in \dot{r} and hence can be rewritten into the following form:

h(r, \dot{r}) = E(t; q_d, \dot{q}_d, \ddot{q}_d) r + F(q_d, \dot{q}_d, r)\dot{r} + F_1(t; q_d, \dot{q}_d, \ddot{q}_d, r) r    (29)

where all terms of h in eq.(28) that are linear in \dot{r} are firstly recast into the second term of the right-hand side and hence the remaining terms of the right-hand side become irrelevant to \dot{r}; in detail, E and F_1 collect the terms coming from the differences \{H(q_d + r) - H(q_d)\}\ddot{q}_d, \frac{1}{2}\{\dot{H}(q_d + r) - \dot{H}(q_d)\}\dot{q}_d, g(q_d + r) - g(q_d), and \{J_\phi(q_d + r) - J_\phi(q_d)\}^T f_d in eq.(28) (eqs.(31), (32)). Note again that the entries of F and F_1 are sinusoidal in components of r and thereby all entries of E, F, and F_1 are bounded provided all components of \ddot{q}_d and f_d are piecewise continuous and hence bounded. According to these observations, we see that there exist constants \rho_0 > 0 and \rho_1 > 0 such that

|\dot{r}^T h| \le |\dot{r}^T E r| + |\dot{r}^T F \dot{r}| + |\dot{r}^T F_1 r| \le \rho_0 r^T r + \rho_1 \dot{r}^T \dot{r}    (30)

for any r and \dot{r}. We are now in a position to show the exponential passivity with a quadratic margin for the residual dynamics described by eq.(27).

Property 3. As long as the end-point of the manipulator is constrained on the surface \phi(x) = 0, the exponential passivity of the residual robot dynamics of eq.(27) is satisfied with a quadratic margin, i.e., it holds that

\int_0^t e^{-\lambda\tau}\dot{r}^T(\tau)\Delta u(\tau) d\tau \ge \frac{\beta}{2}\int_0^t e^{-\lambda\tau}\dot{r}^T(\tau)\Phi\dot{r}(\tau) d\tau - \gamma    (33)

with \lambda > 0 (not so large) and \beta > 1, where \gamma depends only on the initial state (r(0), \dot{r}(0)). To prove this, we observe that

\int_0^t e^{-\lambda\tau}\dot{r}^T(\tau)\Delta u(\tau) d\tau = \int_0^t e^{-\lambda\tau}\dot{r}^T(\tau)[(H_0 + H(q_d + r))\ddot{r} + (B + \frac{1}{2}\dot{H}(q_d + r))\dot{r} + S(q_d + r, \dot{q}_d + \dot{r})\dot{r} + Ar + h] d\tau
 = \frac{1}{2}\int_0^t \frac{d}{d\tau}[e^{-\lambda\tau}\{\dot{r}^T(H_0 + H(q_d + r))\dot{r} + r^T A r\}] d\tau + \frac{\lambda}{2}\int_0^t e^{-\lambda\tau}\{\dot{r}^T(H_0 + H(q_d + r))\dot{r} + r^T A r\} d\tau + \int_0^t e^{-\lambda\tau}\{\dot{r}^T B\dot{r} + \dot{r}^T h(r, \dot{r})\} d\tau
 \ge e^{-\lambda t}U(r(t), \dot{r}(t)) - U(r(0), \dot{r}(0)) + \int_0^t e^{-\lambda\tau}[\lambda U(r(\tau), \dot{r}(\tau)) + \dot{r}^T(\tau)B\dot{r}(\tau)] d\tau - \int_0^t e^{-\lambda\tau}\{\rho_0 r^T(\tau)r(\tau) + \rho_1 \dot{r}^T(\tau)\dot{r}(\tau)\} d\tau    (34)

where

U(r, \dot{r}) = \frac{1}{2}\dot{r}^T\{H_0 + H(q_d + r)\}\dot{r} + \frac{1}{2}r^T A r.    (35)

Next it is convenient to define a scalar function

W(\lambda; r, \dot{r}) = \lambda U(r, \dot{r}) + \dot{r}^T B\dot{r} - \rho_0 r^T r - \rho_1 \dot{r}^T\dot{r} - \frac{\beta}{2}\dot{r}^T\Phi\dot{r}
 = \frac{1}{2}r^T(\lambda A - 2\rho_0 I)r + \frac{1}{2}\dot{r}^T[\lambda\{H_0 + H(q_d + r)\} + 2B - 2\rho_1 I - \beta\Phi]\dot{r}    (36)

which becomes positive definite in r and \dot{r} with an appropriate choice of positive \lambda. From this it follows that

\int_0^t e^{-\lambda\tau}\dot{r}^T\Delta u\, d\tau \ge e^{-\lambda t}U(r(t), \dot{r}(t)) - U(r(0), \dot{r}(0)) + \int_0^t e^{-\lambda\tau}W(\lambda; r(\tau), \dot{r}(\tau)) d\tau + \frac{\beta}{2}\int_0^t e^{-\lambda\tau}\dot{r}^T(\tau)\Phi\dot{r}(\tau) d\tau    (37)

which proves Property 3 with \gamma = U(r(0), \dot{r}(0)). It is important to remark that if \beta > 1 and \Phi satisfies

2B - 2\rho_1 I - \beta\Phi > 0,    (38)

then the choice of \lambda depends on neither \Phi nor \beta. In other words, \lambda can be chosen in such a way that

\lambda \ge 2\rho_0 \|A^{-1}\|    (39)

as long as \Phi satisfies eq.(38). Here \|X\| denotes the spectral radius of matrix X. Another important remark is that the above exponential passivity is valid even when at some instant the end-point may leave, or repeatedly touch and leave, the surface \phi(x) = 0 during [0, T], provided that the tip of the end-effector does not slip on the surface and its contact is frictionless as long as the tip is in touch with the surface.

5. UNIFORM BOUNDEDNESS OF MOTION TRAJECTORIES

Consider again the P-type learning law described by eq.(16), which can be easily implemented as shown in Fig.2. The purpose of this section is to show the uniform boundedness of all trajectories q_k(t), k = 0, 1, 2, \ldots, during the progress of learning. In principle we do not need detailed knowledge of the robot dynamics for designing a learning law, since eq.(16) can be settled once an appropriate positive definite constant gain matrix \Phi is chosen, arbitrarily to some extent. However, in order to assure the uniform boundedness and the convergence of trajectories, we need to assume that the motion of the manipulator is subject to eq.(25); hence we now consider the following set of recursive equations:

(H_0 + H(q_k))\ddot{q}_k + (B + \frac{1}{2}\dot{H}(q_k))\dot{q}_k + S(q_k, \dot{q}_k)\dot{q}_k + g(q_k) + A(q_k - q_d) = J_\phi^T(q_k) f_d + u_k    (40)

u_{k+1} = u_k + \Phi(\dot{q}_d - \dot{q}_k).    (41)

Similarly to the derivation of eq.(27), these equations are reduced to the following set of equations:

(H_0 + H(q_d + r_k))\ddot{r}_k + (B + \frac{1}{2}\dot{H}(q_d + r_k))\dot{r}_k + S(q_d + r_k, \dot{q}_d + \dot{r}_k)\dot{r}_k + A r_k + h_k = \Delta u_k    (42)

\Delta u_{k+1} = \Delta u_k - \Phi\dot{r}_k    (43)

where

h_k = E(t; q_d, \dot{q}_d, \ddot{q}_d) r_k + F(q_d, \dot{q}_d, r_k)\dot{r}_k + F_1(t; q_d, \dot{q}_d, \ddot{q}_d, r_k) r_k    (44)

and E, F, and F_1 are the same as those in eq.(29), respectively. The same as in eq.(20), it holds that

\int_0^t e^{-\lambda\tau}\Delta u_{k+1}^T(\tau)\Phi^{-1}\Delta u_{k+1}(\tau) d\tau = \int_0^t e^{-\lambda\tau}\Delta u_k^T(\tau)\Phi^{-1}\Delta u_k(\tau) d\tau - 2\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Delta u_k(\tau) d\tau + \int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau) d\tau.    (45)

Then, from the exponential passivity of the residual dynamics with a quadratic margin as in eq.(37), it follows that

\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Delta u_k(\tau) d\tau \ge e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) - U(r_k(0), \dot{r}_k(0)) + \int_0^t e^{-\lambda\tau}W(\lambda; r_k(\tau), \dot{r}_k(\tau)) d\tau + \frac{\beta}{2}\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau) d\tau.    (46)

Substitution of this into eq.(45) yields

Q(t, \Delta u_{k+1}; \Phi^{-1}) \le Q(t, \Delta u_k; \Phi^{-1}) - 2e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) - 2\int_0^t e^{-\lambda\tau}W(\lambda; r_k(\tau), \dot{r}_k(\tau)) d\tau - (\beta - 1)Q(t, \dot{r}_k; \Phi) + 2U(r_k(0), \dot{r}_k(0))    (47)

where symbol Q means

Q(t, \Delta u_k; \Phi^{-1}) = \int_0^t e^{-\lambda\tau}\Delta u_k^T(\tau)\Phi^{-1}\Delta u_k(\tau) d\tau    (48)

or

Q(t, \dot{r}_k; \Phi) = \int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau) d\tau.    (49)

Now we assume that the manipulator is reinitialized perfectly the same at the beginning of the exercise in every attempt, that is,

q_k(0) = q_d(0), \quad \dot{q}_k(0) = \dot{q}_d(0) \quad for k = 0, 1, 2, \ldots    (50)

Then the last term of eq.(47) vanishes. Since \beta > 1, U(r, \dot{r}) is positive definite in r and \dot{r}, and \lambda is chosen so that W(\lambda; r, \dot{r}) is positive definite in r and \dot{r} too, eq.(47) implies

Q(t, \Delta u_{k+1}; \Phi^{-1}) \le Q(t, \Delta u_k; \Phi^{-1})    (51)

which gives rise to

Q(t, \Delta u_k; \Phi^{-1}) \le Q(t, \Delta u_0; \Phi^{-1})    (52)

for all k \ge 0. Moreover, since Q(t, \Delta u_k; \Phi^{-1}) is nonnegative and non-decreasing in t, eq.(47) in conjunction with eq.(52) means

2e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) \le Q(t, \Delta u_0; \Phi^{-1}) \le Q(T, \Delta u_0; \Phi^{-1})    (53)

which shows the uniform boundedness of r_k(t) and \dot{r}_k(t) in t and k. This in turn shows the uniform boundedness of the original trajectories q_k(t) and \dot{q}_k(t) in t and k.

Fig.2 P-type learning control
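The behavior established in Sections 3 and 5 can be reproduced in a small simulation. The following sketch (not from the paper) applies the P-type update of eqs.(16)/(41) to a hypothetical one-joint linear plant standing in for eq.(25), with gravity and contact terms dropped and all constants invented; the velocity-error norm shrinks over repeated trials under perfect reinitialization:

```python
import numpy as np

# Hypothetical one-joint plant  m*qdd + b*qd + a*(q - q_d(t)) = u,
# a stand-in for eq.(25); m, b, a, phi are invented for illustration.
m, b, a = 1.0, 2.0, 4.0
phi = 0.5                        # learning gain, kept below damping level
T, dt = 1.0, 1e-3
t = np.arange(0.0, T, dt)
qd = np.sin(2.0 * np.pi * t)                  # desired position
qd_dot = 2.0 * np.pi * np.cos(2.0 * np.pi * t)

def trial(u):
    """One exercise by explicit Euler; perfect reinitialization
    q(0) = qd(0), qdot(0) = qd_dot(0) as in eq.(50)."""
    q, v = qd[0], qd_dot[0]
    vel = np.empty_like(t)
    for i in range(len(t)):
        vel[i] = v
        acc = (u[i] - b * v - a * (q - qd[i])) / m
        q += dt * v
        v += dt * acc
    return vel

u = np.zeros_like(t)
err = []
for k in range(60):
    v = trial(u)
    err.append(np.sqrt(dt * np.sum((v - qd_dot) ** 2)))
    u = u + phi * (qd_dot - v)                # P-type update, eq.(41)

# The squared-integral velocity error decreases with repeated trials.
assert err[-1] < 0.5 * err[0]
```

The contraction per trial is mild here because a pure P-type velocity-error update converges slowly at frequencies where the plant's phase is near ±90°, which is consistent with the exponentially weighted (rather than plain) norm used in the proofs.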
6. LEARNING CONTROL WITH A FORGETTING FACTOR AND TRAJECTORY CONVERGENCE

Although most present industrial robots are superior in repeatability precision, small but non-negligible errors happen to arise from reinitialization at every attempt. Therefore, instead of eq.(50) it is necessary to consider a relaxed condition of reinitialization such that

\|q_k(0) - q_d(0)\| \le \epsilon, \quad \|\dot{q}_k(0) - \dot{q}_d(0)\| \le \epsilon \quad for k = 0, 1, 2, \ldots    (54)

where \|q\| denotes the euclidean norm of an n-dimensional vector q. At the same time, there always arise some missing terms in the robot dynamics. We collect all these missing terms into a fluctuation term denoted by \eta_k(t) at the k-th trial. In other words, we suppose that the manipulator is subject to

(H_0 + H(q_k))\ddot{q}_k + (B + \frac{1}{2}\dot{H}(q_k))\dot{q}_k + S(q_k, \dot{q}_k)\dot{q}_k + g(q_k) + A(q_k - q_d) = J_\phi^T(q_k) f_d + u_k + \eta_k    (55)

and assume that

\|\eta_k(t)\| \le \epsilon \quad for all t \in [0, T] and k = 0, 1, 2, \ldots    (56)

In such an actual situation the learning law described by eq.(41) may give rise to a large amount of tracking errors which accumulate through repeated trials of exercise. In fact, in eq.(47) the last term of the right-hand side can not be ignored and thereby it is impossible to conclude eqs.(51) to (53) from eq.(47). This observation leads to the introduction of a forgetting factor into the recursive learning law; that is, eq.(41) is replaced by the following recursive law (see Fig.3):

u_{k+1} = (1 - \alpha)u_k + \alpha u_0 + \Phi(\dot{q}_d - \dot{q}_k)    (57)

where \alpha expresses a forgetting factor which must be a small positive value less than unity.

Fig.3 Learning control with a forgetting factor \alpha

The convergence under such a realistic situation was first discussed by one of the authors (Arimoto et al., 1986) in the case of free end-point for a class of PID-type learning laws. However, the discussion was based upon a linearized model of robot dynamics in a neighborhood of the given desired trajectory. Very recently, Heinzinger et al. (1989) attacked a similar problem for a class of D-type learning controls and proved, without use of any linearization, that the learned inputs and the corresponding outputs converge to neighborhoods of their desired ones respectively. It was then shown by the authors (1990, 1991a, b) that the introduction of a forgetting factor into the P-type recursive learning law also plays a crucial role in the uniform boundedness and the convergence of motion trajectories to a neighborhood of the desired one. The novelty of this paper lies in the use of a forgetting factor in the P-type recursive learning law for a class of tasks with geometric end-point constraints and in presenting a proof of the convergence of motion trajectories to a neighborhood of the desired one under the full nonlinear robot dynamics with fluctuations and the existence of reinitialization errors.

Now, to show the convergence of trajectories, we subtract eq.(26) from eq.(55) and u_d from eq.(57), which gives rise to

(H_0 + H(q_d + r_k))\ddot{r}_k + (B + \frac{1}{2}\dot{H}(q_d + r_k))\dot{r}_k + S(q_d + r_k, \dot{q}_d + \dot{r}_k)\dot{r}_k + A r_k + h_k = \Delta u_k + \eta_k    (58)

\Delta u_{k+1} = (1 - \alpha)\Delta u_k + \alpha\Delta u_0 - \Phi\dot{r}_k.    (59)

For a given piecewise continuous command input u_0(t), it follows from eq.(59) that

Q(t, \Delta u_{k+1} - \alpha\Delta u_0; \Phi^{-1}) = (1 - \alpha)^2 Q(t, \Delta u_k; \Phi^{-1}) - 2(1 - \alpha)\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Delta u_k(\tau) d\tau + Q(t, \dot{r}_k; \Phi).    (60)

Next observe that

(\Delta u_{k+1} - \alpha\Delta u_0)^T\Phi^{-1}(\Delta u_{k+1} - \alpha\Delta u_0) = (1 - \alpha)\Delta u_{k+1}^T\Phi^{-1}\Delta u_{k+1} - \alpha(1 - \alpha)\Delta u_0^T\Phi^{-1}\Delta u_0 + \alpha(\Delta u_{k+1} - \Delta u_0)^T\Phi^{-1}(\Delta u_{k+1} - \Delta u_0).    (61)

Substituting this into eq.(60) yields

(1 - \alpha)Q_{k+1}(t) = (1 - \alpha)^2 Q_k(t) + \alpha(1 - \alpha)Q_0(t) - \alpha Q(t, \Delta u_{k+1} - \Delta u_0; \Phi^{-1}) - 2(1 - \alpha)\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Delta u_k(\tau) d\tau + Q(t, \dot{r}_k; \Phi)    (62)

where

Q_k(t) = Q(t, \Delta u_k; \Phi^{-1}).    (63)

Bearing in mind that the only difference of eq.(58) from eq.(42) is the existence of the fluctuation term \eta_k(t), and then applying the property of exponential passivity to eq.(58), we obtain (as in the derivation of eq.(46))

\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Delta u_k(\tau) d\tau \ge e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) - U(r_k(0), \dot{r}_k(0)) + \int_0^t e^{-\lambda\tau}W(\lambda; r_k(\tau), \dot{r}_k(\tau)) d\tau + \frac{\beta}{2}\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau) d\tau - \int_0^t e^{-\lambda\tau}|\dot{r}_k^T(\tau)\eta_k(\tau)| d\tau
 \ge e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) + \int_0^t e^{-\lambda\tau}W(\lambda; r_k(\tau), \dot{r}_k(\tau)) d\tau + \frac{\beta - 1}{2}\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau) d\tau - \bar{\epsilon}    (64)

using |\dot{r}^T\eta| \le \frac{1}{2}\dot{r}^T\Phi\dot{r} + \frac{1}{2}\eta^T\Phi^{-1}\eta, where

\bar{\epsilon} = \frac{1}{2}\{\|A\| + \max_q \|H_0 + H(q)\|\}\epsilon^2 + \frac{T}{2}\|\Phi^{-1}\|\epsilon^2 \ge U(r_k(0), \dot{r}_k(0)) + \frac{1}{2}\int_0^T e^{-\lambda\tau}\eta_k^T(\tau)\Phi^{-1}\eta_k(\tau) d\tau    (65)

in which the last inequality follows from eqs.(54) and (56).
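The effect of the forgetting factor can also be seen numerically. The following sketch (not from the paper) applies the law of eq.(57) to the same kind of hypothetical one-joint plant as before, now with small reinitialization errors and a bounded fluctuation term \eta_k; all constants are invented, and the point is only that the errors settle into a small neighborhood instead of accumulating:

```python
import numpy as np

# Hypothetical one-joint plant with fluctuation, standing in for eq.(55);
# m, b, a, phi, alpha, eps are invented for illustration.
rng = np.random.default_rng(1)
m, b, a = 1.0, 2.0, 4.0
phi, alpha = 0.5, 0.02           # learning gain and forgetting factor
T, dt = 1.0, 1e-3
t = np.arange(0.0, T, dt)
qd = np.sin(2.0 * np.pi * t)
qd_dot = 2.0 * np.pi * np.cos(2.0 * np.pi * t)
eps = 1e-2                       # bound playing the role of eqs.(54), (56)

def trial(u):
    q = qd[0] + eps * rng.uniform(-1, 1)      # imperfect reinitialization
    v = qd_dot[0] + eps * rng.uniform(-1, 1)
    eta = eps * np.sin(7.0 * t + rng.uniform(0, 2 * np.pi))  # fluctuation
    vel = np.empty_like(t)
    for i in range(len(t)):
        vel[i] = v
        acc = (u[i] + eta[i] - b * v - a * (q - qd[i])) / m
        q += dt * v
        v += dt * acc
    return vel

u0 = np.zeros_like(t)
u = u0.copy()
err = []
for k in range(80):
    v = trial(u)
    err.append(np.sqrt(dt * np.sum((v - qd_dot) ** 2)))
    u = (1 - alpha) * u + alpha * u0 + phi * (qd_dot - v)    # eq.(57)

# Errors do not accumulate: late trials stay in a small neighborhood.
assert max(err[-10:]) < 0.5 * err[0]
```

The residual error does not vanish: the forgetting factor trades exact convergence for robustness, leaving an offset of order \epsilon plus a fraction of the initial input error controlled by \alpha, in line with the \bar{\epsilon}/\alpha neighborhood of the analysis.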
Substituting eq.(64) into eq.(62) yields

(1 - \alpha)Q_{k+1}(t) \le (1 - \alpha)^2 Q_k(t) + \alpha(1 - \alpha)Q_0(t) + 2(1 - \alpha)\bar{\epsilon} - 2(1 - \alpha)e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) - 2(1 - \alpha)\int_0^t e^{-\lambda\tau}W(\lambda; r_k(\tau), \dot{r}_k(\tau)) d\tau - \{(1 - \alpha)(\beta - 1) - 1\}\int_0^t e^{-\lambda\tau}\dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau) d\tau - \alpha Q(t, \Delta u_{k+1} - \Delta u_0; \Phi^{-1}).    (66)

Now let

\beta = 1 + \frac{2}{1 - \alpha}

and note that it is possible to choose \lambda > 0 so that W(\lambda; r, \dot{r}) is positive definite in r and \dot{r} for the value of \beta defined above. Then, dividing by (1 - \alpha) and discarding the last (nonpositive) term, eq.(66) is reduced to

Q_{k+1}(t) \le (1 - \alpha)Q_k(t) + \alpha Q_0(t) + 2\bar{\epsilon} - E_k(t)    (67)

where

E_k(t) = 2(1 - \alpha)e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) + \int_0^t e^{-\lambda\tau}\{2(1 - \alpha)W(\lambda; r_k(\tau), \dot{r}_k(\tau)) + \dot{r}_k^T(\tau)\Phi\dot{r}_k(\tau)\} d\tau.    (68)

Since E_k(t) \ge 0, it follows from eq.(67) that

Q_k(t) \le Q_0(t) + \frac{2\bar{\epsilon}}{\alpha}    (69)

which implies

2(1 - \alpha)e^{-\lambda t}U(r_k(t), \dot{r}_k(t)) \le E_k(t) \le Q_0(t) + \frac{2\bar{\epsilon}}{\alpha}.    (70)

This shows the uniform boundedness of q_k(t) and \dot{q}_k(t) in t \in [0, T] and k.

To prove the convergence it is important to subject the recursive form of eq.(67) to close inspection. At an early stage of learning, E_k(T) must be large in comparison with \bar{\epsilon} and hence Q_k(t) must be decreasing with increasing k. Suppose that E_k(T) becomes less than 2\alpha Q_0(T) for the first time when k = \bar{k}. From eq.(67) it follows that

Q_k(T) \le (1 - \alpha)^k Q_0(T) + \sum_{i=0}^{k-1}(1 - \alpha)^{k-i-1}[\alpha Q_0(T) + 2\bar{\epsilon} - E_i(T)]
 \le (1 - \alpha)^k Q_0(T) - \sum_{i=0}^{k-1}(1 - \alpha)^{k-i-1}[\alpha Q_0(T) - 2\bar{\epsilon}]
 \le 2(1 - \alpha)^k Q_0(T) - Q_0(T) + \frac{2\bar{\epsilon}}{\alpha}    (71)

for any k (\le \bar{k}). Since \log(1 - \alpha) \le -\alpha, we have

Q_k(T) \le (2e^{-\alpha k} - 1)Q_0(T) + \frac{2\bar{\epsilon}}{\alpha}.    (72)

The right-hand side of this inequality is decreasing with increasing k and becomes negative when k is nearly equal to 2/\alpha, since (2/e^2 - 1) < -0.5, provided that Q_0(T) is larger than 4\bar{\epsilon}/\alpha. Hence it is possible to conclude that if Q_0(T) is not less than 4\bar{\epsilon}/\alpha then the value of E_k(T) becomes of order 2\bar{\epsilon}/\alpha by at least the \bar{k}-th trial, where \bar{k} \le 2/\alpha. Next we need to assure that E_k(T) will remain of order \bar{\epsilon}/\alpha after k \ge \bar{k}. In the present paper we do not discuss further the details of the proof, since it is sophisticated and too mathematical. Instead we remark that the exponential passivity of the difference dynamics in terms of the difference vector d_k (= q_{k+1} - \{(1 - \alpha)q_k + \alpha q_0\}) plays a key role in the remaining part of the proof.

7. CONCLUSIONS

A recursive algorithm of P-type learning control with a forgetting factor is proposed for bettering robot tasks with geometric end-point constraints. A new concept of exponential passivity with a quadratic margin is introduced for a class of residual error dynamics of robots with or without end-point constraints. A rigorous proof of the uniform boundedness of motion trajectories and a rough sketch of the convergence to a neighborhood of the desired motion are presented on the basis of various passivity properties concerning residual error dynamics and difference dynamics of robots. We finally point out that updating the content of the long-term memory (see Fig.3) is important in accelerating the speed of trajectory convergence. This problem has been discussed in our recent papers (1990, 1991a, b, c) restrictedly to the case of tasks without end-point constraints.

References

S. Arimoto, S. Kawamura, and F. Miyazaki (1984a). Bettering operation of robots by learning. J. of Robotic Systems 1, 1, 123-140.
S. Arimoto, S. Kawamura, and F. Miyazaki (1984b). Bettering operation of dynamic systems by learning: A new control theory for servomechanism and mechatronics systems. Proc. 23rd IEEE Conf. Decision and Control, Las Vegas, NV.
S. Arimoto, S. Kawamura, and F. Miyazaki (1985). Can mechanical robots learn by themselves? In "Robotics Research" The Second International Symposium, H. Hanafusa & H. Inoue, Eds., MIT Press, Cambridge, Massachusetts, pp.127-134.
J.J. Craig (1984). Adaptive control of manipulators through repeated trials. Proc. of 1984 American Control Conference, San Diego, California.
P. Bondi, G. Casalino, and L. Gambardella (1988). On the iterative learning control theory for robotic manipulators. IEEE J. of Robotics and Automation 4, 14-22.
S. Arimoto (1985). Mathematical theory of learning with applications to robot control. Proc. of 4th Yale Workshop on Applications of Adaptive Systems Theory, Yale University, New Haven, Connecticut.
S. Kawamura, F. Miyazaki, and S. Arimoto (1985). Hybrid position/force control of manipulators based on learning method. Proc. '85 Inter. Conf. on Advanced Robotics, Tokyo.
S. Kawamura, F. Miyazaki, and S. Arimoto (1988). Realization of robot motion based on a learning method. IEEE Trans. on Systems, Man, and Cybernetics SMC-18, 126-134.
S. Arimoto (1990). Learning control theory for robotic motion. Int. J. of Adaptive Control and Signal Processing 4, 543-564.
S. Arimoto and F. Miyazaki (1984). Stability and robustness of PID feedback control for robot manipulators of sensory capability. In "Robotics Research" The 1st Int. Symp., M. Brady & R.P. Paul, Eds., MIT Press, Cambridge, Massachusetts, pp.783-799.
S. Arimoto and F. Miyazaki (1985). Asymptotic stability of feedback control laws for robot manipulators. Proc. IFAC Symp. on Robot Control '85, Barcelona, Spain.
D.E. Koditschek (1987). Natural motion for robot arms. Proc. of 23rd IEEE Conf. Decision and Control, Las Vegas, NV.
S. Arimoto, S. Kawamura, and F. Miyazaki (1986). Convergence, stability, and robustness of learning control schemes for robot manipulators. Recent Trends in Robotics: Modeling, Control, and Education, M.J. Jamshidi, J.Y.S. Luh, and M. Shahinpoor, Eds., pp.307-316, Elsevier Science Publishing Co., Inc., New York.
G. Heinzinger, D. Fenwick, B. Paden, and F. Miyazaki (1989). Robust learning control. Proc. 28th IEEE CDC, Tampa, Florida.
S. Arimoto, T. Naniwa, and H. Suzuki (1990). Robustness of P-type learning control with a forgetting factor for robotic motions. Proc. of 29th IEEE CDC, Honolulu, Hawaii.
S. Arimoto (1991a). Learning for skill refinement in robotic systems. IEICE (Institute of Electronics, Information and Communication Engineers) Transactions, Vol. E74, No. 2, 235-243.
S. Arimoto (1991b). Passivity of robot dynamics implies capability of motor program learning. To be published in "Advanced Robot Control", Proc. of the Workshop on Nonlinear and Adaptive Control: Applications to Robotics, Springer-Verlag.
Y. Nanjo and S. Arimoto (1991c). Experimental studies on robustness of a learning method with a forgetting factor for robotic motion control. To be presented at '91 ICAR, Pisa, Italy.