Demonstration learning of robotic skills using repeated suggestions learning algorithm


Biologically Inspired Cognitive Architectures xxx (2017) xxx–xxx

Contents lists available at ScienceDirect

Biologically Inspired Cognitive Architectures journal homepage: www.elsevier.com/locate/bica

Research article

Demonstration learning of robotic skills using repeated suggestions learning algorithm

Hamed Shahbazi a,⇑, Maliheh Alsadat Arshi b, Kamal Jamshidi b, Behnam Khodabandeh a

a Department of Mechanical Engineering, University of Isfahan, Isfahan, Iran
b Department of Computer Engineering, University of Isfahan, Isfahan, Iran

Article info

Article history: Received 26 September 2016; Revised 26 December 2016; Accepted 28 February 2017; Available online xxxx

Keywords: Imitation learning; Hopf oscillator; Adaptive frequency oscillator; Central pattern generator; Humanoid robots

Abstract
In this paper, a new model of a nonlinear dynamical system based on adaptive frequency oscillators is implemented for learning rhythmic signals by demonstration. The model uses coupled Hopf oscillators to encode and learn any periodic input signal, and the learning process is implemented entirely in the dynamics of the adaptive oscillators. One of the issues in learning with such systems is the constant number of oscillators in the feedback loop; in other words, the number of adaptive frequency oscillators is one of the design factors. In this contribution, it is shown that using a sufficient number of oscillators helps the learning process. We address this challenge in order to learn rhythmic movements with greater accuracy and lower error, and to avoid missing the fundamental frequency. To this end, a method for generating drumming patterns is proposed that generates rhythmic, periodic trajectories for a NAO humanoid robot. A programmable central pattern generator inspired by animal neural systems is used, and it is extended to learn patterns with greater accuracy for NAO humanoid robots. Demonstration learning experiments were carried out successfully both in simulation and on a real NAO robot. © 2017 Elsevier B.V. All rights reserved.

Introduction
Nature is often a good source of inspiration for science and technology, and such inspiration can be used to build artificial counterparts. Humanoid robots are a good example of this kind of inspiration: they are mechanical structures that resemble humans and are built to mimic human abilities and perform human tasks. The main motivation for using humanoid robots is to achieve human skill and performance (Argall, Chernova, Veloso, & Browning, 2009). Today, most of these robots are programmed by experts who have sufficient knowledge of the desired tasks. Programming a robot in this way is time-consuming, costly, and limited to specific situations, and it is also an obstacle to the use of robots in daily work by unskilled people. One of the most successful approaches to overcoming these problems is imitation, or robot programming by demonstration (PbD), in which the robot learns new skills. The robot can learn how the demonstrator acts in many situations, so programming by demonstration greatly reduces the cost of programming.

PbD is perhaps one of the areas where neuroscience and robotics converge. A common area of research centers on pattern generators in the spinal cord of vertebrate animals, called Central Pattern Generators (CPGs) (Guertin, 2009). Central pattern generators are neural circuits located in the lower part of the brain and the upper part of the spinal cord of many animals, and they are responsible for generating rhythmic, periodic patterns in different parts of the body. Although these pattern generators receive only very simple inputs from the sensory systems, they can produce high-dimensional, complex patterns for drumming, swimming, jumping, turning, and other types of locomotion. Central pattern generators, the origin of many movements in animals, were discovered by Brown in the early decades of the 20th century (Zielińska, 2009). He found that movement in many animals is the outcome of central neuronal activity in parts of the nervous system, and that simple sensory inputs modulate this activity and make it able to respond to external perturbations. The idea that CPGs are neural networks that generate complex locomotion patterns from only simple inputs is a provocative one (Righetti & Ijspeert, 2006). In this paper, a model of programmable central pattern generators capable of generating rhythmic patterns with greater accuracy is developed.

In Section “Related works”, related work in the field of programming by demonstration is reviewed and the advantages and disadvantages of each method are discussed. Section “Learning input signal with Hopf oscillators” introduces the method for generating rhythmic patterns on the Nao robot and explains how programmable central pattern generators are used to generate the desired patterns; this section also briefly describes some features of the Nao robot and the arm movements used in the model. Section “Repeated suggestion learning algorithm” introduces our method for learning rhythmic movements. Implementations and experimental results in the Webots simulator and Matlab Simulink are shown in Section “Experimental result”. In Section “Discussion”, the conclusions and future work are stated.

⇑ Corresponding author.

http://dx.doi.org/10.1016/j.bica.2017.02.004 2212-683X/© 2017 Elsevier B.V. All rights reserved.

Please cite this article in press as: Shahbazi, H., et al. Demonstration learning of robotic skills using repeated suggestions learning algorithm. Biologically Inspired Cognitive Architectures (2017), http://dx.doi.org/10.1016/j.bica.2017.02.004

H. Shahbazi et al. / Biologically Inspired Cognitive Architectures xxx (2017) xxx–xxx

Related works
This contribution deals with programming by demonstration, a promising approach to automating the manual programming of robots (Kober & Peters, 2010). PbD consists of two primary components, task demonstrations by a teacher and task reproductions by a robot student, as shown in Fig. 1. Trajectory learning is an important aspect of robot programming by demonstration: a teacher, who can even be an unskilled person, must show the robot how to do a task. There are several ways to obtain the trajectories demonstrated by a teacher, such as motion capture techniques (Ruchanurucks, Nakaoka, Kudoh, & Ikeuchi, 2006), computer vision techniques (Moeslund, Hilton, & Krüger, 2006), and physical guiding techniques (Hersch, Guenter, Calinon, & Billard, 2008), in which the robot is guided through the desired trajectory by moving its joints. In physical guiding, movements are recorded directly on the learning robot, which avoids transferring motion from a system with different kinematics and dynamics, a transfer that causes errors (Ude, Gams, Asfour, & Morimoto, 2010). A human teacher can train a robot through kinesthetic demonstrations; kinesthetic guiding records a set of desired movements, which can be used to build a library of movement examples. PbD through physical manipulation (LfD-PM) requires the instructor to grasp and guide each part of the robot during a demonstration (Hersch et al., 2008). In this paper, we instead use a Kinect sensor, which offers a user-friendly interface, to obtain the desired input signal and feed it to the system for learning. Encodings of desired trajectories for learning by demonstration have been discussed in numerous papers; examples include spline-based representations (Miyamoto et al., 1996) and hidden Markov models (HMMs) (Asfour, Azad, Gyarfas, & Dillmann, 2008; Schaal, Peters, Nakanishi, & Ijspeert, 2004).

A completely different approach, based on nonlinear dynamical systems, is presented in Ijspeert, Nakanishi, and Schaal (2002); it represents both rhythmic and discrete tasks, such as drumming and tennis strokes. Nonlinear oscillators are important modeling tools in the biological and physical sciences, and they have been widely used to control rhythmic movements such as locomotion, dancing, and drumming. Nonlinear oscillators have interesting properties for rhythmic motor control, including limit cycle behavior (i.e., the ability to reject perturbations and compensate for their effects), smooth online modulation of trajectories through changes in the parameters of the dynamical system, and synchronization with other rhythmic systems. The system proposed in Ijspeert et al. (2002) is based on central oscillators, which causes a major drawback: the frequency of the demonstration signal must be specified explicitly. The approach therefore requires signal preprocessing methods that extract the frequency of the recorded signals (Gams, Ijspeert, Schaal, & Lenarčič, 2009). Implementations of CPGs based on coupled oscillators are essentially designs of stable limit cycles in interconnected pattern-generating oscillators. Righetti and Ijspeert presented a generic model of a CPG (Righetti & Ijspeert, 2006). This method is a programmable central pattern generator that uses dynamical systems described by differential equations to build a training algorithm. The learning model is based on the work of Righetti, Buchli, and Ijspeert, a Hebbian learning method in dynamical Hopf oscillators. The programmable central pattern generator has been used to generate walking patterns for a HOAP-2 robot.
Using this type of generic CPG, they trained the CPGs with sample trajectories of walking patterns of the HOAP-2 robot provided by Fujitsu. Each trajectory is a teacher signal for the CPG controlling the associated joint (Righetti, Buchli, & Ijspeert, 2006). In this approach, frequency extraction and adaptation are fully embedded in the dynamics of the adaptive frequency oscillators, so no preprocessing is needed to extract the frequency. They designed a learning mechanism that adapts the oscillator frequency to the frequency of any periodic input signal. Specifically, they proposed a dynamical system composed of a pool of adaptive frequency oscillators with negative mean-field coupling (Section “Learning input signal with Hopf oscillators”). Reproducing and modulating trajectories is also possible with this approach. Gams et al. (2009) presented a system for learning and encoding a periodic signal without prior knowledge of its frequency and waveform, which was able to modulate an input periodic trajectory in response to external events. Their system was used to learn periodic drumming tasks performed by the arms of a HOAP-2 humanoid robot. This model uses two layers of trajectory generation. The first layer, the Canonical Dynamical System (CDS), is a polar implementation of the generic CPGs of Righetti and Ijspeert (2006). The second layer, the Output Dynamical System (ODS), is responsible for learning and regenerating the waveform of the input signals. The system introduced by Righetti and Ijspeert is defined in Cartesian coordinates and has several drawbacks, such as its learning speed. In Hackenberger (2007), the system is described in polar coordinates, which solves some of these problems.

The foundation of this paper is the idea of using a CPG-based controller, as published in Righetti and Ijspeert (2006) and Righetti et al. (2006), to generate drumming trajectories for a humanoid robot. For a precise learning process, enough oscillators must be in the feedback loop to ensure that all signal components are learned. If there are fewer oscillators than frequency components in the signal, some frequency components may be lost; if there are more, computation time is wasted. In this paper, we extend this approach with a method that determines a suitable number of oscillators for each input signal, instead of using a constant number, in order to learn rhythmic movements with greater accuracy and lower error, and to avoid missing the fundamental frequency. To this end, a repeated suggestions learning algorithm is proposed. The new system improves the results for applications in humanoid robotics and can be used on Nao robots.

Fig. 1. Learning from demonstration components.

Learning input signal with Hopf oscillators
Nonlinear dynamical systems are an appropriate tool for describing adaptive mechanisms when developing robot controllers. As mentioned before, nonlinear dynamical systems show interesting properties, such as attractor behavior, which can be very useful for control (Buchli, Righetti, & Ijspeert, 2005).
In this paper, we follow the idea of using the adaptive frequency oscillator as an adaptive controller. In this section, we first briefly describe the Nao robot. Next, we discuss the global architecture of the controller and its components. We then show how the adaptive frequency oscillator can learn the frequencies of arbitrary rhythmic input signals. Finally, we describe the complete system, built as a pool of adaptive frequency oscillators, and discuss nonlinear dynamical oscillators and their fundamental properties.

Nao robot
Nao is one of the most advanced humanoid robots. It was built by the French company Aldebaran and first presented publicly in 2006. In 2007, Nao replaced Sony's robot dog Aibo as the robot used in the RoboCup competitions. The robot has 26 degrees of freedom in different parts of its body, including the head, shoulders, elbows, hips, knees, and ankles. The Nao academic version is available to universities and laboratories for research and education. Fig. 2 shows this robot. The large number of servo motors in its joints makes Nao one of the most flexible humanoid robots ever built, which has also made it the platform of choice for the Standard Platform League of the international RoboCup soccer competitions. Details about this robot can be found in Tay (2009).

Fig. 2. Nao robot (Argall et al., 2009).

Kinect
A Microsoft Kinect sensor is used in this research to recognize body gestures and to provide a visual human-robot interaction interface. Kinect not only offers a natural interaction interface and can generate 3D models of objects, but can also track the skeleton in 3D space while separating a specific person from a complex background. The x-y-z coordinates of the points of interest obtained through Kinect are used to define gesture modules. Skeleton tracking is the core technology of Kinect, and it is what makes the human-robot interactive demonstration system possible. Our Kinect tracks 20 skeleton points of the human body. Body gestures detected by Kinect are shown in Fig. 3. In this study, a geometric approach is used to retrieve each joint angle as input data. The geometric vector $\vec{d}_v$ between two tracked points is calculated according to Eq. (1):

$$\vec{d}_v = \overrightarrow{AB} = B - A = \begin{bmatrix} B_x - A_x \\ B_y - A_y \\ B_z - A_z \end{bmatrix} \tag{1}$$

where A and B are two points in Cartesian space. The angle θ between two such vectors is then obtained from their inner product, as expressed in Eq. (2):

$$\cos(\theta) = \frac{\vec{d}_{v1} \cdot \vec{d}_{v2}}{\lVert \vec{d}_{v1} \rVert \, \lVert \vec{d}_{v2} \rVert} \tag{2}$$

This approach reduces computation and avoids the complexity of obtaining the joint control trajectories through inverse kinematics. Algorithm 1 is used to calculate the elbow joint angle.

Algorithm 1. Computing elbow joint angle
Input: three sequences of spatial points (joints n, m, and 10) over all frames
Output: angle at joint 10 between the segments to joints n and m, for each frame
for each j ∈ [1, frame number] do
  x = joint coordinate(n, j) − joint coordinate(10, j)
  y = joint coordinate(m, j) − joint coordinate(10, j)
  Absolute_x = (x(1)^2 + x(2)^2 + x(3)^2)^0.5
  Absolute_y = (y(1)^2 + y(2)^2 + y(3)^2)^0.5
  theta(j) = (x · y') / ((Absolute_x · Absolute_y) + eps)
  theta(j) = acos(theta(j))
  theta(j) = theta(j) · 180 / pi
end for
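Eqs. (1) and (2) and Algorithm 1 can be sketched in vectorized Python/NumPy as below. This is a minimal sketch, not the authors' Matlab code: the function name `joint_angles_deg` and the argument layout (one array of shape (frames, 3) per joint, with the vertex joint, index 10 in Algorithm 1, passed explicitly) are assumptions. The `np.clip` guard is an addition that keeps `arccos` in its domain when rounding pushes the cosine slightly past ±1.

```python
import numpy as np


def joint_angles_deg(p_n, p_vertex, p_m, eps=np.finfo(float).eps):
    """Per-frame angle (degrees) at p_vertex between the segments to p_n and p_m.

    Each argument is an array of shape (frames, 3) with x-y-z coordinates from
    the skeleton tracker; a single frame of shape (3,) also works.
    """
    x = np.asarray(p_n, dtype=float) - np.asarray(p_vertex, dtype=float)  # Eq. (1)
    y = np.asarray(p_m, dtype=float) - np.asarray(p_vertex, dtype=float)
    dot = np.sum(x * y, axis=-1)                               # inner product x . y'
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    cos_theta = dot / (norms + eps)                            # Eq. (2), eps as in Algorithm 1
    cos_theta = np.clip(cos_theta, -1.0, 1.0)                  # keep arccos in its domain
    return np.degrees(np.arccos(cos_theta))


# Usage with hypothetical shoulder, elbow and wrist tracks for two frames.
shoulder = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
elbow = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
wrist = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(joint_angles_deg(shoulder, elbow, wrist))  # approximately [90, 180]
```

Adding `eps` to the denominator, as Algorithm 1 does, avoids a division by zero when two tracked points coincide, at the cost of a negligible bias (about 1e-6 degrees for a fully stretched arm).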


Fig. 3. Skeleton tracking using the Kinect camera.

Generic CPG architecture in NAO
The architecture used in the present research to integrate a generic CPG into the control structure of our Nao robot is shown in Fig. 4. As the figure shows, the robot must first be taught a basic drumming method. The Nao robot has no information about how to move its joints to drum, so it must be trained with drumming trajectories. The training module consists of three sub-modules. The first sub-module is a database of pre-recorded trajectories that contains drumming trajectories of the Nao robot in different forms. The second sub-module is a timer, which maps continuous time to discrete index values into the pre-recorded matrices of drumming trajectories. The third sub-module is the generic CPG, which is the core of the learning section.
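The timer sub-module can be illustrated with a one-line mapping from continuous time to a row index. This is a hypothetical sketch: the text gives neither the sampling period of the recordings nor whether they are replayed in a loop, so `timer_index`, `sample_period`, `n_samples`, and the wrap-around are assumptions.

```python
def timer_index(t, sample_period, n_samples):
    """Map continuous time t (seconds) to a row of a pre-recorded trajectory matrix.

    Assumes (hypothetically) uniformly sampled recordings replayed in a loop,
    so the index wraps around after n_samples rows.
    """
    if sample_period <= 0 or n_samples <= 0:
        raise ValueError("sample_period and n_samples must be positive")
    return int(t // sample_period) % n_samples


# Usage: a 10-row recording sampled every 0.5 s.
print(timer_index(1.2, 0.5, 10))  # 2
print(timer_index(5.6, 0.5, 10))  # 1 (row 11 wraps to row 1)
```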

The fundamental building block of the generic CPG is the adaptive frequency Hopf oscillator proposed in Righetti and Ijspeert (2006). These oscillators can learn the frequency of a periodic input signal without any external optimization. Usually, the frequency of an oscillator is set by a fixed parameter; in this model, that parameter is turned into a state variable that evolves according to a learning rule. It can be proved that, when the oscillator is perturbed by a periodic input signal, this state variable converges to one of the frequency components of the signal. Adaptation is therefore an intrinsic property of these oscillators, and no supervision or external processing is needed. After convergence, if the input signal disappears, the learned frequency remains encoded in the system. The equations governing this oscillator are given in the next subsection.

Fig. 4. Generic CPG architecture.


The adaptive frequency Hopf oscillator

The general form of the equations for a generic CPG is (Righetti et al., 2005):

$$\frac{\partial r}{\partial t} = \gamma(\mu - r^2)\, r + \epsilon F(t) \cos(\phi) \tag{3}$$

$$\frac{\partial \phi}{\partial t} = \omega - \frac{\epsilon}{r} F(t) \sin(\phi) \tag{4}$$

$$\frac{\partial \omega}{\partial t} = -\epsilon F(t) \sin(\phi) \tag{5}$$

where r is the radius of the oscillation and φ is the phase of the two-dimensional output of the system at time t (in seconds). γ defines the strength of the attracting limit cycle, i.e., how fast the oscillator returns to the limit cycle after a perturbation. The oscillator has a stable limit cycle whose radius is set by the constant μ (r = √μ). ω defines the frequency of the oscillation, F(t) is the input signal, and ε is the coupling coefficient of the input signal. With this system, the oscillator learns the frequency of the periodic input F(t). Fig. 5 shows how the adaptive Hopf oscillator learns the simple harmonic signal F(t) = 30 sin(5t); the black dashed line shows P_teach and the silver line shows Q_learn. The oscillator learns the input signal correctly. In Righetti et al. (2009), this adaptive mechanism is called dynamic Hebbian learning.

Complete system
Constructing a programmable central pattern generator requires connecting and coupling these Hopf oscillators. The connection is shown in Fig. 6: P_teach(t) is the desired trajectory to be learned by the CPG, and Q_learned(t) is what the system has learned up to time t. The difference between these signals is used as the perturbation signal for the Hopf oscillators. The output of each oscillator is multiplied by an amplitude coefficient α_i, and these outputs are summed. Coupling between the oscillators keeps the output pattern stable, so adding a coupling equation helps the system maintain the correct relation between the oscillator phases (Righetti, Buchli, & Ijspeert, 2005).

For a given signal to be learned, F(t) must be fed into the system. The equations for the adaptive Hopf oscillators of the pool are then (Righetti, Buchli, & Ijspeert, 2009):

$$\frac{\partial r_i}{\partial t} = \gamma(\mu - r_i^2)\, r_i + \epsilon F(t) \cos(\phi_i) \tag{6}$$

$$\frac{\partial \phi_0}{\partial t} = \omega_0 - \frac{\epsilon}{r_0} F(t) \sin(\phi_0) \tag{7}$$

$$\frac{\partial \phi_i}{\partial t} = \omega_i - \frac{\epsilon}{r_i} F(t) \sin(\phi_i) + \tau \sin(R_i - \phi_{i,D} - \phi_i), \quad i \neq 0 \tag{8}$$

$$\frac{\partial \omega_i}{\partial t} = -\epsilon F(t) \sin(\phi_i) \tag{9}$$

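$$\frac{\partial \alpha_i}{\partial t} = \eta F(t) \cos(\phi_i)\, r_i \tag{10}$$

$$\frac{\partial \phi_{i,D}}{\partial t} = \kappa \sin(R_i - \phi_{i,D} - \phi_i) \tag{11}$$

$$R_i = \frac{\omega_i}{\omega_0}\, \phi_0 \tag{12}$$

$$F(t) = P_{\mathrm{teach}}(t) - Q_{\mathrm{learned}}(t) \tag{13}$$

$$Q_{\mathrm{learned}}(t) = \sum_{i=0}^{N} \alpha_i\, r_i \cos(\phi_i) \tag{14}$$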
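As a sketch of these dynamics, the code below integrates the single adaptive oscillator of Eqs. (3)-(5) with a forward-Euler step and evaluates the right-hand sides of Eqs. (6)-(14) for a pool of oscillators, where τ and κ are the phase-coupling constants of Eqs. (8) and (11) and η is the learning constant of Eq. (10). It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameter values (γ = 8, μ = 1, ε = 1, τ = 2, κ = 1, η = 0.5), the step size, and the unit-amplitude input (Fig. 5 uses F(t) = 30 sin(5t)) are all chosen for illustration.

```python
import math

import numpy as np


def adaptive_hopf_run(forcing, omega0, t_end, dt=1e-3, gamma=8.0, mu=1.0,
                      eps=1.0, r0=1.0, phi0=0.0):
    """Forward-Euler integration of the single adaptive Hopf oscillator, Eqs. (3)-(5)."""
    r, phi, omega, t = r0, phi0, omega0, 0.0
    for _ in range(int(round(t_end / dt))):
        f = forcing(t)
        dr = gamma * (mu - r * r) * r + eps * f * math.cos(phi)   # Eq. (3)
        dphi = omega - (eps / r) * f * math.sin(phi)              # Eq. (4)
        domega = -eps * f * math.sin(phi)                         # Eq. (5)
        r, phi, omega = r + dt * dr, phi + dt * dphi, omega + dt * domega
        t += dt
    return r, phi, omega


def cpg_pool_derivatives(r, phi, omega, alpha, phi_d, p_teach, gamma=8.0,
                         mu=1.0, eps=1.0, tau=2.0, kappa=1.0, eta=0.5):
    """Right-hand sides of Eqs. (6)-(14) for a pool of oscillators.

    State arguments are float arrays of equal length; index 0 is the reference
    oscillator, which has no phase coupling (Eq. (7)).
    """
    q_learned = float(np.sum(alpha * r * np.cos(phi)))            # Eq. (14)
    f = p_teach - q_learned                                        # Eq. (13)
    big_r = omega / omega[0] * phi[0]                              # Eq. (12)
    coupling = np.sin(big_r - phi_d - phi)                         # shared by Eqs. (8), (11)
    coupling[0] = 0.0                                              # oscillator 0 is uncoupled
    dr = gamma * (mu - r ** 2) * r + eps * f * np.cos(phi)         # Eq. (6)
    dphi = omega - (eps / r) * f * np.sin(phi) + tau * coupling    # Eqs. (7)-(8)
    domega = -eps * f * np.sin(phi)                                # Eq. (9)
    dalpha = eta * f * np.cos(phi) * r                             # Eq. (10)
    dphi_d = kappa * coupling                                      # Eq. (11)
    return dr, dphi, domega, dalpha, dphi_d, q_learned


# Usage: a 5 rad/s input pulls the frequency state variable away from 4 rad/s.
_, _, omega_end = adaptive_hopf_run(lambda t: math.sin(5.0 * t), omega0=4.0, t_end=100.0)
print(omega_end)
```

Under these settings ω should end close to the 5 rad/s input frequency, which is the dynamic Hebbian adaptation described above. Simulating the pool amounts to adding `dt` times each returned derivative to its state array at every step; when P_teach equals Q_learned, F(t) vanishes and the frequencies and amplitudes stop changing.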

Here τ and κ are constants for coupling the oscillators, and η is a learning constant. Q_learned, the output of this programmable CPG, is computed as a weighted sum of the oscillator outputs. F(t) is the learning feedback: it shows how much has been learned and how much of the teaching signal P_teach(t) still has to be taught to the CPG. α_i is the amplitude associated with the frequency ω_i of the i-th oscillator.

Fig. 5. Learning a simple harmonic signal.

Fig. 6. The network of adaptive oscillators (Righetti & Ijspeert, 2006).

Repeated suggestion learning algorithm
In this section, we explain how generic CPGs are extended to train rhythmic patterns for the controller of our Nao robot. We give the implementation details to make this paper a useful reference for other researchers in this field. In the previous approach, each CPG network is made of a fixed number of oscillators, as shown in Fig. 6. In this paper, we extend this approach with a method that determines a suitable number of oscillators for each input signal, instead of using a constant number, in order to learn rhythmic movements with greater accuracy. The algorithm presented in this section therefore builds on the previous method and optimizes the number of oscillators to solve the fixed-number problem. An important feature of this method is that it requires no initial choice of the number of oscillators, and it can learn many complex input patterns that previous methods cannot. The method can reduce the training and test error as far as desired. The number of oscillators determined by the suggestion algorithm is sufficient for learning the drumming trajectories. As noted, the number of oscillators in each joint is neither fixed nor predetermined. In the initial stage of the algorithm, one oscillator is placed in the network and initialized with random values. The algorithm consists of several rounds: one oscillator is trained in each round and, as needed, a new oscillator is coupled to the previous oscillators to be trained in the next round. First, the zero-frequency (DC) component is removed from the main signal so that the input is as symmetric as possible and training works better; once the signal is centered around zero, the oscillators can detect its important harmonics more easily. The pseudo code of this algorithm is shown in Algorithm 2.

Algorithm 2. Pseudo code of repeated suggestions learning algorithm
Extract the bias: b1 ← mean(y_teach); y_teach ← y_teach − b1
repeat
  Insert a new oscillator into the network
  Initialize the new oscillator weights [v1, v2, v3] randomly: W0 ← [v1, v2, v3]; W(N) ← W0
  Execute Algorithm 3 to learn W(N) and y_out
  Insert the new weights into W: W ← [W, W(N)]
  y_teach ← y_teach − y_out
  Compute the error: error ← RMS(y_teach)
  Increment the number of oscillators: N ← N + 1
  if error < Tr1 then stop ← True end if
until N > maxN or stop
end
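The main loop of Algorithm 2 can be sketched as a greedy residual fit in which each round adds one component that models what the previous components left over. To keep the sketch short and deterministic, the single-oscillator search of Algorithm 3 is swapped for a plain DFT peak pick (`fit_one_sinusoid`, a hypothetical helper) that returns the amplitude, frequency, and phase of one sinusoid; this is a substitute technique, not the particle swarm search the paper uses. The threshold `tol` plays the role of Tr1 and `max_n` that of maxN.

```python
import numpy as np


def fit_one_sinusoid(t, y):
    """Stand-in for Algorithm 3: take the strongest non-DC DFT bin of y.

    Returns (amplitude, omega, phase) of amplitude * cos(omega * t + phase).
    Assumes uniform samples with t[0] == 0.
    """
    n = len(y)
    spectrum = np.fft.rfft(y)
    k = int(np.argmax(np.abs(spectrum[1:]))) + 1        # skip the DC bin
    dt = t[1] - t[0]
    omega = 2.0 * np.pi * k / (n * dt)                   # bin index -> rad/s
    amplitude = 2.0 * np.abs(spectrum[k]) / n
    phase = float(np.angle(spectrum[k]))
    return float(amplitude), float(omega), phase


def repeated_suggestions(t, y_teach, tol=1e-6, max_n=10):
    """Greedy loop of Algorithm 2: one new component per round, fitted to the residual."""
    t = np.asarray(t, dtype=float)
    bias = float(np.mean(y_teach))                       # extract the bias b1
    residual = np.asarray(y_teach, dtype=float) - bias
    components = []
    error = float(np.sqrt(np.mean(residual ** 2)))
    while len(components) < max_n and error >= tol:
        amplitude, omega, phase = fit_one_sinusoid(t, residual)
        y_out = amplitude * np.cos(omega * t + phase)
        components.append((amplitude, omega, phase))
        residual = residual - y_out                      # next round sees the remaining error
        error = float(np.sqrt(np.mean(residual ** 2)))   # RMS error
    return bias, components, error


# Usage: the example signal from the text, sampled over one period of its 0.2 rad/s term.
t = (10.0 * np.pi / 1000) * np.arange(1000)
y = 2 * np.sin(0.2 * t + 2) + np.sin(0.4 * t + 3) + 4 * np.sin(0.6 * t - 0.8) + 1
bias, components, error = repeated_suggestions(t, y)
print(bias, [round(c[1], 3) for c in components], error)
```

Sampling over a whole number of periods matters here: each component then falls exactly on one DFT bin, so the loop stops after three components (0.6, 0.2 and 0.4 rad/s, in order of decreasing amplitude) with a bias close to 1. With a non-integer number of periods, spectral leakage spreads each component over several bins and more rounds are needed.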

In each round of the main loop, a new oscillator is added to the network and randomly initialized. Algorithm 3 then computes and optimizes the parameters of this oscillator so that its output matches the input pattern as closely as possible. Each oscillator tries to model the overall behavior of its input and shapes its output accordingly. When a new oscillator is added, the algorithm computes the error of the network of oscillators; if this value is greater than a threshold, the main loop continues to add oscillators. In effect, each oscillator tries to minimize the estimation error: the first oscillator receives the main signal as input, and each subsequent oscillator receives the error generated in the previous round. When the error is small enough, the main loop terminates and the algorithm reaches its final stage, in which the oscillators are coupled together.

Algorithm 3. Learning of a single oscillator
Initiate vectors Vec randomly
while (it < MaxIt) do
  it ← it + 1
  for (each vector Vec(i), i ← 1 to N) do
    Test the oscillator with Vec(i) to find y_fit
    Compute the error of Vec(i): dy_teach ← y_teach − y_fit
    Error(Vec(i)) ← (dy_teach .* dy_teach) · Pᵀ
    if Error(Vec(i)) < Error(BestVec(it)) then
      BestVec(it) ← Vec(i)
      BestYfit ← y_fit
    end if
  end for
  e_best ← BestVec(it)
end while
y_out ← BestYfit
Output W_n, y_out
end
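Algorithm 3 relies on particle swarm optimization (Bai, 2010). A minimal global-best PSO is sketched below. The inertia and acceleration values are the commonly used constriction settings of Clerc and Kennedy and are assumptions, as are the velocity update and the bound clipping, which the pseudo code above does not spell out. In Algorithm 3 the `cost` would simulate one oscillator with a candidate parameter vector and return the weighted squared error against y_teach; that cost function is not shown here.

```python
import numpy as np


def pso_minimize(cost, lower, upper, n_particles=30, max_it=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimization over the box [lower, upper].

    Returns (best_position, best_cost, history), where history[k] is the best
    cost after iteration k (history[0] comes from the random initialization).
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))   # random initial vectors
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    history = [gbest_cost]
    for _ in range(max_it):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Each particle is pulled toward its own best and the swarm's best position.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
        history.append(gbest_cost)
    return gbest, gbest_cost, history


# Usage: minimize a quadratic bowl whose minimum is at (1, -2).
best, best_cost, history = pso_minimize(
    lambda p: float((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2),
    lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(best, best_cost)
```

Because the swarm keeps the best position found so far, the recorded best cost never increases from one iteration to the next, which mirrors the BestVec bookkeeping in Algorithm 3.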

The most important step of the repeated suggestions algorithm is finding the parameters of a single oscillator. Algorithm 3 uses swarm intelligence, namely particle swarm optimization, to compute the optimal parameters: at each step, the particles move toward better positions. Details of this algorithm can be found in Bai (2010). As an example, the following arbitrary signal is selected as a training sequence: Y_teach = 2 sin(0.2t + 2) + sin(0.4t + 3) + 4 sin(0.6t − 0.8) + 1. This signal has three non-zero frequency components. Fig. 7 shows how the presented network learns this signal. Part (A) shows the input pattern and the final learned pattern; the network output is fully consistent with the expected pattern. Part (B) shows the pattern learned by each oscillator in the network. In each round of the algorithm, one oscillator is coupled to the network of previous oscillators and trained to match the remaining signal as closely as possible. Four oscillators are placed in the network, and each of the first three learns one of the main harmonics. The fourth oscillator does not oscillate: after it is added, the training error falls below the threshold, and no more oscillators are added to the network.


Fig. 7. (A) Learning the input signal with the repeated suggestions algorithm. (B1-B3) Signal learned by each oscillator in the network.

Experimental result
In this section, the implementation and the experimental results of the presented CPG model are described. In the first step, the CPGs were trained in Matlab. In the second step, the Nao robot was simulated in Webots. In this simulator, the robot model is as close to the real robot as the simulation allows: it has the exact number of DOFs and the same mass distribution and inertia matrix for each limb. Fig. 8 shows an example of the training mode: the black dashed trajectory is the joint values (P_teach), and the silver one (Q_learn) is what the corresponding CPG generates. Training is fast and reasonably accurate.

Combining the repeated suggestions learning algorithm with the network of adaptive frequency oscillators helps to overcome the challenge of choosing enough oscillators. System complexity increases, but the learned signal is more accurate. Fig. 9 shows four input signals learned with a constant number of oscillators and with our approach: the original signal (black dashed), the signal learned with a constant number of oscillators, N = 4 (gray), and the signal learned with a variable number of oscillators (silver). The vertical axis represents the signal amplitude and the horizontal axis represents time. The silver lines show the learned signals when the repeated suggestions algorithm sets a suitable number of oscillators; as the figure shows, learning improves and the silver trajectories follow their inputs more closely. Fig. 10 shows snapshots of the robot drumming in the simulation environment.

To evaluate the performance of the presented method, the proposed algorithm was transferred to the real humanoid robot. The code was written in Python in the Choregraphe environment. Despite many differences between the dynamics of the real and the simulated robot, the real NAO is able to reproduce what it learned through imitation. Fig. 11 presents a view of this implementation. A short video of these tests can be downloaded from: http://eng.ui.ac.ir/shahbazi/download/newdrum.mp4.

Fig. 8. Learning drumming trajectories for six joints.

Fig. 9. Comparison of learning methods: constant number of oscillators (gray) and the repeated suggestions learning algorithm (silver). Our method learns the input signal more closely.

Modulation property and stability against perturbation
Because of the coupled oscillators, the learned trajectories are stable against perturbations, and they can also be modulated. Modulating the generated trajectories lets the robot change its drumming speed and style: it can increase or decrease its speed and amplitude. Fig. 12 presents a simple experiment that shows the stability and modulation properties. The upper graph shows stability against a perturbation, and the lower graph shows amplitude modulation of the input signal. The system rejects the perturbation and quickly recovers its original behavior.
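Both properties can be illustrated with a free-running Hopf oscillator, i.e. Eqs. (3) and (4) with F(t) = 0. The text does not say how modulation is applied on the robot, so this sketch assumes that amplitude is modulated through μ (the limit-cycle radius is √μ) and speed through ω. Starting away from the limit cycle stands in for a perturbation. The function name, step size, and parameter values are illustrative assumptions.

```python
import math


def hopf_free_run(r0, omega, mu_of_t, t_end, dt=1e-3, gamma=8.0):
    """Euler-integrate Eqs. (3)-(4) with F(t) = 0; returns radius, phase and output."""
    r, phi, t = r0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        r += dt * gamma * (mu_of_t(t) - r * r) * r   # radius relaxes to sqrt(mu)
        phi += dt * omega                            # speed is set by omega
        t += dt
    return r, phi, r * math.cos(phi)


# Perturbation: start far off the limit cycle (r = 2.5) and let it recover.
r_rec, _, _ = hopf_free_run(2.5, 5.0, lambda t: 1.0, 5.0)
# Amplitude modulation: switch mu from 1 to 4 halfway through the run.
r_mod, phi_mod, _ = hopf_free_run(1.0, 5.0, lambda t: 1.0 if t < 5.0 else 4.0, 10.0)
print(r_rec, r_mod)
```

The radius returns to √μ = 1 after the perturbation and settles at √4 = 2 after μ is switched, while the phase keeps advancing at the rate ω; doubling ω would double the drumming speed without changing the amplitude.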

Fig. 10. Snapshots of drumming in the Webots simulation environment.

Fig. 11. Snapshots of NAO drumming in the real environment.

Fig. 12. Stability and modulation properties of the system.

Discussion

In this paper, a network of coupled oscillators that can learn input signals was designed. First, the principle of adaptive frequency oscillators was explained; then a network able to learn the frequency components of a signal was investigated. After that, a new method was designed to generate drumming trajectories using a programmable central pattern generator. As previously mentioned, the number of adaptive frequency oscillators is one of the design factors. Therefore, a method was presented to determine an efficient number of oscillators, so that trajectories can be learned with greater accuracy. In summary, a repetitive learning algorithm was proposed and implemented in a programming-by-demonstration system based on a network of adaptive frequency oscillators. The presented model can learn and encode movement trajectories even in the presence of perturbations. Our results show that the suggested method can learn the desired behavior from kinesthetic demonstrations with greater accuracy and lower error than previous methods.

References

Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57, 469–483.
Asfour, T., Azad, P., Gyarfas, F., & Dillmann, R. (2008). Imitation learning of dual-arm manipulation tasks in humanoid robots. International Journal of Humanoid Robotics, 5, 183–202.
Bai, Q. (2010). Analysis of particle swarm optimization algorithm. Computer and Information Science, 3, 180.
Buchli, J., Righetti, L., & Ijspeert, A. J. (2005). A dynamical systems approach to learning: A frequency-adaptive hopper robot. In European Conference on Artificial Life (pp. 210–220). Springer.
Gams, A., Ijspeert, A. J., Schaal, S., & Lenarčič, J. (2009). On-line learning and modulation of periodic movements with nonlinear dynamical systems. Autonomous Robots, 27, 3–23.
Guertin, P. A. (2009). The mammalian central pattern generator for locomotion. Brain Research Reviews, 62, 45–56.
Hackenberger, F. (2007). Balancing central pattern generator based humanoid robot gait using reinforcement learning. na.
Hersch, M., Guenter, F., Calinon, S., & Billard, A. (2008). Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics, 24, 1463–1467.

Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Learning rhythmic movements by demonstration using nonlinear oscillators. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002) BIOROB-CONF-2002-003 (pp. 958–963).
Kober, J., & Peters, J. (2010). Imitation and reinforcement learning. IEEE Robotics & Automation Magazine, 17, 55–62.
Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y., & Kawato, M. (1996). A kendama learning robot based on bi-directional theory. Neural Networks, 9, 1281–1302.
Moeslund, T. B., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104, 90–126.
Righetti, L., Buchli, J., & Ijspeert, A. J. (2005). From dynamic Hebbian learning for oscillators to adaptive central pattern generators. In Proceedings of the 3rd International Symposium on Adaptive Motion in Animals and Machines (AMAM 2005) BIOROB-CONF-2005-011. Ilmenau: Verlag ISLE.
Righetti, L., Buchli, J., & Ijspeert, A. J. (2006). Dynamic Hebbian learning in adaptive frequency oscillators. Physica D: Nonlinear Phenomena, 216, 269–281.
Righetti, L., Buchli, J., & Ijspeert, A. J. (2009). Adaptive frequency oscillators and applications. The Open Cybernetics and Systemics Journal, 3, 64–69.
Righetti, L., & Ijspeert, A. J. (2006). Programmable central pattern generators: An application to biped locomotion control. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA 2006) (pp. 1585–1590). IEEE.
Ruchanurucks, M., Nakaoka, S., Kudoh, S., & Ikeuchi, K. (2006). Humanoid robot motion generation with sequential physical constraints. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA 2006) (pp. 2649–2654). IEEE.
Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning movement primitives. In International Symposium on Robotics Research.
Tay, A. J. S. B. (2009). Walking Nao: Omnidirectional bipedal locomotion. UNSW SPL team report.
Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26, 800–815.
Zielińska, T. (2009). Biological inspiration used for robots motion synthesis. Journal of Physiology-Paris, 103, 133–140.

Please cite this article in press as: Shahbazi, H., et al. Demonstration learning of robotic skills using repeated suggestions learning algorithm. Biologically Inspired Cognitive Architectures (2017), http://dx.doi.org/10.1016/j.bica.2017.02.004