A L-MCRS dynamics approximation by ELM for Reinforcement Learning


Neurocomputing 150 (2015) 116–123


Jose Manuel Lopez-Guede a,d,⁎, Borja Fernandez-Gauna b,d, Jose Antonio Ramos-Hernanz c

a Department of Systems Engineering and Automatic Control, University College of Engineering of Vitoria, Basque Country University (UPV/EHU), Nieves Cano 12, 01006 Vitoria, Spain
b Department of Software and Computing Systems, University College of Engineering of Vitoria, Basque Country University (UPV/EHU), Nieves Cano 12, 01006 Vitoria, Spain
c Department of Electrical Engineering, University College of Engineering of Vitoria, Basque Country University (UPV/EHU), Nieves Cano 12, 01006 Vitoria, Spain
d Computational Intelligence Group, Basque Country University (UPV/EHU), Spain

⁎ Corresponding author. E-mail address: [email protected] (J.M. Lopez-Guede).
http://dx.doi.org/10.1016/j.neucom.2014.01.076

Article history: Received 29 October 2013; Received in revised form 6 January 2014; Accepted 29 January 2014; Available online 2 October 2014

Abstract

Autonomous task learning for Linked Multi-Component Robotic Systems (L-MCRS) is an open research issue. Pilot studies applying Reinforcement Learning (RL) to the Single Robot Hose Transport (SRHT) task need extensive simulations of the L-MCRS involved in the task. The Geometrically Exact Dynamic Spline (GEDS) simulator used for the accurate simulation of the dynamics of the overall system is computationally expensive, so it is infeasible to carry out extensive learning experiments based on it. In this paper we address the problem of learning the dynamics of the L-MCRS encapsulated in the GEDS simulator using an Extreme Learning Machine (ELM) approach. Profiting from the adaptability and flexibility of ELMs, we formalize the problem of learning the hose geometry as a multi-variate regression problem. Empirical evaluation of this strategy achieves remarkably accurate approximation results.

Keywords: Extreme learning machines; Linked multi-component robotic systems; Hose control; Reinforcement learning

1. Introduction

A relevant taxonomy of the different types of Multi-Component Robotic Systems was introduced in [1]. That taxonomy is structured according to the degree of coupling among the robots composing the system, characterized by the strength of their coupling. The first type comprises systems with no physical coupling, namely Uncoupled systems; examples are robot soccer teams, teams of unmanned aerial vehicles (UAV) and uncoupled swarms. A second type arises when a rigid physical coupling is established between the robotic components, generating a new unit with new physical and functional properties. This type is called Modular systems; examples are PolyBot, M-TRAN, Proteo, some cases of S-bots and suitably coupled industrial robots. The last type is composed of elements coupled through a passive non-rigid element, and is called Linked systems.

Being more specific about the third type, Linked Multi-Component Robotic Systems (L-MCRS) [1] are groups of autonomous robots attached to a non-rigid unidimensional object linking them. The link imposes constraints on the robot dynamics that interfere with their coordination, introducing strong non-linearities in the dynamics of the system and consequent uncertainty in the control of the robots. It is also a non-linear transmission medium for the dynamical influences among the robots.

With regard to the applications in which this kind of system can play an important role, most of them are related to some function of the non-rigid link itself. The paradigm is illustrated by the transportation of a hose-like object [2,3], more specifically the transportation of the hose tip to a given location in the working space while the other end is attached to a fixed point. The paradigmatic task is the Single Robot Hose Transport (SRHT). Achieving SRHT control can be generalized to more complex tasks, including the transport of liquids such as water (for supply, for fire fighting or for smart orchards), paint or fuel, and the transport of electrical energy or even air (compressed or not), among others. Such systems can also perform tasks of collecting liquids, such as water in floods or fuel, oil and toxic substances in accidents, or of collecting semisolids such as garbage. To simulate the dynamics of this kind of system and the effect of a robot action on its components (mainly on the hose), we use an accurate hose dynamics model based on Geometrically Exact Dynamic Splines (GEDS) [2], which is computationally expensive.

Autonomous learning of control tasks in L-MCRS is still an open research issue. We have used Reinforcement Learning (RL) techniques to deal with it, modeling the problem as a Markov Decision Process (MDP). There are several RL strategies, but when a model-free strategy is chosen, learning requires repeating the experiments a large number of times. In many domains this is not a problem by itself; however, in the domain of L-MCRS, where the experimentation is carried out by means of a computationally expensive simulator, obtaining the reported experimental results [3–7] takes a long time. To solve this issue, learning approximations to the GEDS model by Artificial Neural Networks (ANN) have been proposed [8]. A trained ANN model provides very fast responses, allowing us to perform exhaustive simulations for RL. In this paper we focus on the development of an Extreme Learning Machine (ELM) approximation to the GEDS model. ELMs are very adaptable and flexible neural networks with several interesting properties: their training is very fast, they are easy to implement, and they need minimal human intervention, allowing the formulation of GEDS model learning as a multi-variate regression task. We analyze the accuracy of the approximation to the GEDS model, concluding that the tradeoff between the quality of the approximation and its response speed allows further use of the approximate model embedded in RL experiments.

The paper is structured as follows. Section 2 recalls the definitions of some computational methods involved in the developed work. Section 3 introduces the problem that we are addressing in the paper, while Section 4 presents the ELM-based approach to solve it. Section 5 details the experimental setup that has been carried out, and Section 6 discusses the obtained experimental results. Finally, Section 7 presents our conclusions and addresses future work.

2. Computational methods

In this section we review the computational methods used along the paper, which are essential to understand the approximation problem that we are facing, its magnitude, and the solutions that we are reporting. Section 2.1 reviews Geometrically Exact Dynamic Splines (GEDS) to understand how the unidimensional element of the L-MCRS is modeled. Section 2.2 reviews some basic concepts of Reinforcement Learning (RL), specifically the Q-Learning (Q-L) and TRQ-Learning (TRQ-L) algorithms. Finally, Section 2.3 gives a short review of Extreme Learning Machines (ELMs).

2.1. Geometrically exact dynamic splines

We assume a simplified L-MCRS model where the hose is a one-dimensional object deployed in the two-dimensional space of the ground. The basic geometrical model of the hose is a spline [9], that is, a linear combination of control points p_i where the linear coefficients are the polynomials N_i(u), which depend on the curve length parameter u defined in [0, 1). Formally, q(u) = Σ_{i=0..n} N_i(u) · p_i, where N_i(u) is the polynomial associated with the control point p_i, and q(u) is the point of the curve at parameter value u. It is possible to travel along the curve by varying the value of the parameter u, starting at one end for u = 0 and finishing at the other end for u = 1. We need to specify some positions on the curve corresponding to the robot positions; therefore, a more appropriate model is a B-spline, defined as follows. Given n+1 control points {p_0, …, p_n} and a knot vector U = {u_0, …, u_m}, the B-spline curve of degree p is q(u) = Σ_{i=0..n} N_{i,p}(u) · p_i, where the N_{i,p}(u) are B-spline basis functions of degree p (p = 3 in this work), built using the Cox–de Boor algorithm [10]. Furthermore, we expect our system to change in time, so the appropriate model is a dynamic B-spline whose control points depend on the time parameter t, that is, q(u, t) = Σ_{i=0..n} N_{i,p}(u) · p_i(t).

In hose transport systems, either by a single robot or a team of robots, the dynamics of the B-spline control points are determined by the forces exerted by the robots and the intrinsic forces acting on the hose: stretching, bending, inertia, friction, and twisting moment. The dynamical model for the hose simulation, based on the GEDS approach [11] and the Cosserat rod approach [12], is detailed in [2,3,13]. The simulation of the effect of the robot control commands on the hose shape is computed using the following linear local approximation model [2]: F_p = J_pr · F_r, where the relation between the forces applied at the robot attachment points F_r and the resulting forces on the spline control points F_p is given by the Jacobian matrix J_pr relating robot positions and control points:

  J_pr = ( ∂q(u_r1)/∂p_0  ⋯  ∂q(u_rl)/∂p_0 )   ( N_0(u_r1)  ⋯  N_0(u_rl) )
         (       ⋮        ⋱        ⋮       ) = (     ⋮      ⋱      ⋮     )   (1)
         ( ∂q(u_r1)/∂p_n  ⋯  ∂q(u_rl)/∂p_n )   ( N_n(u_r1)  ⋯  N_n(u_rl) )

where the robot positions u_ri correspond to the B-spline knots.
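To make the notation concrete, the following minimal Python sketch (not the authors' code; the control points, knot vector and parameter value are illustrative assumptions) evaluates a cubic B-spline point q(u) via the Cox–de Boor recursion. Note that the basis values N_{i,p}(u_rj) computed this way are exactly the entries of the Jacobian J_pr in Eq. (1).

    import numpy as np

    def basis(i, p, u, U):
        """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
        if p == 0:
            return 1.0 if U[i] <= u < U[i + 1] else 0.0
        left = 0.0 if U[i + p] == U[i] else \
            (u - U[i]) / (U[i + p] - U[i]) * basis(i, p - 1, u, U)
        right = 0.0 if U[i + p + 1] == U[i + 1] else \
            (U[i + p + 1] - u) / (U[i + p + 1] - U[i + 1]) * basis(i + 1, p - 1, u, U)
        return left + right

    def spline_point(u, ctrl, U, p=3):
        """q(u) = sum_i N_{i,p}(u) * p_i, the hose point at curve parameter u."""
        return sum(basis(i, p, u, U) * ctrl[i] for i in range(len(ctrl)))

    # Illustrative cubic B-spline: 5 control points, clamped knot vector.
    ctrl = np.array([[0.0, 0.0], [0.2, 0.1], [0.5, 0.0], [0.8, -0.1], [1.0, 0.0]])
    U = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
    print(spline_point(0.25, ctrl, U))  # a point on the modeled hose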

2.2. Reinforcement learning

Reinforcement Learning (RL) [14] is a class of learning algorithms which assumes that the environment–agent system can be modeled as a discrete-time stochastic process formalized as a Markov Decision Process (MDP) [15,16], defined by the tuple ⟨S, A, T, R⟩, where S is the state space, A is the action repertoire (specifically, A_s are the actions allowed in state s ∈ S), T : S × A_s × S → ℝ is the probabilistic state transition function, and R : S × A_s → ℝ is the immediate reward function. A policy π : S → A_s is the probabilistic decision of the action a ∈ A_s to be taken in state s ∈ S. RL procedures look for optimal action selection policies maximizing the total reward received by the agent.

2.2.1. Q-Learning

Algorithm 1. Q-Learning algorithm.

  Initialize Q(s, a) arbitrarily
  Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
      Choose a from s using policy derived from Q
      Take action a, observe reward r and new state s'
      Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') − Q(s, a)]
      s ← s'
    until s is terminal

Q-Learning [17] is an unsupervised, model-free RL algorithm that learns the optimal policy in environments specified by a finite MDP (S and A are finite sets). The learning process is specified in Algorithm 1. The main idea of the algorithm is to fill a lookup table Q(s, a) of dimensions |S| × |A|, which is initialized arbitrarily and updated following the rule specified by the following equation:

  Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t)]   (2)

It has been proved [18] that, for a discrete finite-MDP environment, the Q-Learning algorithm converges with probability one to the optimal policy if the decrease of α complies with the stochastic gradient convergence conditions and if all actions are sampled infinitely often in all states.

2.2.2. TRQ-Learning

Algorithm 2. TRQ-Learning algorithm.

  Initialize Q(s, a) with arbitrary random values
  Initialize T(s, a) = ∅, R(s, a) = 0 for all states s ∈ S and actions a ∈ A
  Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
      Choose a from s using policy derived from Q
      if T(s, a) ≠ ∅
        s' ← T(s, a)
        r ← R(s, a)
      else
        Take action a, observe reward r and new state s'
        T(s, a) ← s'
        R(s, a) ← r
      Q(s, a) ← Q(s, a) + α[r + γ max_a Q(s', a) − Q(s, a)]
      s ← s'
    until s is terminal

TRQ-Learning [7] is an algorithm derived from Q-Learning which performs ad hoc model learning: besides learning the optimal policy to solve the proposed task, it also learns the response of the environment to the agent's actions. Its update rule is similar to that of Q-Learning, but it involves additional data structures T(s, a) and R(s, a) storing the learned transition and reward functions, respectively, as specified in Algorithm 2. TRQ-Learning is faster than Q-Learning, and it is also more successful [7], because the agent learns the environment's response to its actions by means of the transition and reward functions.
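A minimal Python sketch of Algorithm 2 follows. The environment interface (reset/step returning the next state, reward and a termination flag) and the ε-greedy action choice are our assumptions for illustration, not part of the original formulation.

    import random
    from collections import defaultdict

    def trq_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
        Q = defaultdict(float)   # Q(s, a), arbitrarily (zero-)initialized
        T, R = {}, {}            # learned transition and reward models
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # policy derived from Q (epsilon-greedy, an assumed choice)
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda x: Q[(s, x)])
                if (s, a) in T:                 # reuse the learned model
                    s2, done = T[(s, a)]
                    r = R[(s, a)]
                else:                           # otherwise query the environment
                    s2, r, done = env.step(a)
                    T[(s, a)] = (s2, done)
                    R[(s, a)] = r
                # same update rule as Q-Learning, Eq. (2)
                Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions)
                                      - Q[(s, a)])
                s = s2
        return Q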

2.3. Extreme Learning Machines

Extreme Learning Machines (ELMs) [19–26] are Single-Hidden-Layer Feedforward Networks (SLFNs) trained without iterative tuning of the hidden layer, which confers on them several interesting characteristics compared to the classical back-propagation algorithm: they are easy to use, they require minimal human intervention, they have a faster learning speed and higher generalization performance, and they admit several nonlinear activation functions. The ELM approach to SLFN training consists of the following two steps:

1. Randomly generate the hidden layer weights W and compute the hidden layer output H = g(WX) for the given data inputs X, where g(x) denotes the hidden-unit activation function, which can be sigmoidal, Gaussian or even the identity.
2. Solve the linear problem Hβ = Y, where Y are the outputs of the data sample and β are the weights from the hidden layer to the output layer, by the least-squares approach. Therefore β̂ = H†Y, where H† is the Moore–Penrose pseudo-inverse of H.
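The two steps above translate almost directly into code. The sketch below is a simplified illustration (not the toolbox used in the experiments), using a sigmoidal activation and NumPy's pseudo-inverse:

    import numpy as np

    class ELM:
        """Single-hidden-layer net: random hidden layer + least-squares output."""
        def __init__(self, n_in, n_hidden, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.normal(size=(n_in, n_hidden))  # random, never retrained
            self.b = rng.normal(size=n_hidden)
            self.beta = None

        def _hidden(self, X):
            return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # g(WX), sigmoid

        def fit(self, X, Y):
            H = self._hidden(X)
            self.beta = np.linalg.pinv(H) @ Y           # beta = H^dagger Y
            return self

        def predict(self, X):
            return self._hidden(X) @ self.beta

    # Toy usage: regress y = sin(x); prints the mean absolute training error.
    X = np.linspace(0.0, np.pi, 200).reshape(-1, 1)
    Y = np.sin(X)
    print(np.abs(ELM(1, 50).fit(X, Y).predict(X) - Y).mean())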

There are a number of recent developments in which ELMs have been used with success, such as the fast training of a positive and negative fuzzy rule system for remote sensing and natural image classification [27], the training of a neural network implementing a set of multiple upper-integral classifiers [28], and the extremely fast learning of classifiers for the recognition of handwritten Malayalam characters [29].

3. The approximation problem

In this section we state the approximation problem. Section 3.1 describes the basic structure of L-MCRS and a paradigmatic task carried out by them, justifying the need for extensive simulations. Section 3.2 quantifies the time requirements of those simulations, making clear that the computational burden of L-MCRS simulation is a great practical problem.

3.1. Single robot hose transport

In SRHT, the L-MCRS consists of a unidimensional object (e.g., a hose) that has one end attached to a fixed point (set as the middle point of the working space), while the other end (the tip) is transported by a mobile robot. The task for the robot is to bring the tip of the unidimensional object to a designated destination point by applying a sequence of discrete actions of predefined duration. The working space is a square of 2 × 2 m². We have applied a spatial discretization of 0.5 m for better visualization, so each discrete action of the robot is a translation to a neighboring cell, as sketched below. Fig. 1 illustrates an initial configuration of the entire system, where P_rInitial is the initial position of the robot and P_rFinal is the desired final position of the robot. The robot must learn how to reach any arbitrary point of the workspace carrying the tip of the hose. In order to achieve this goal autonomously, an RL algorithm has to perform multiple simulations of the system based on a GEDS model. Fig. 1 shows one of these simulations, in which the robot carrying the tip of the unidimensional object goes from an initial position P_rInitial to a desired position P_rFinal through five intermediate points.
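The following small sketch encodes the discrete action repertoire; the coordinate convention (workspace [−1, 1] × [−1, 1] m with the fixed hose end at (0, 0)) is our assumption based on the description above.

    STEP = 0.5  # m, the spatial discretization of the workspace

    ACTIONS = {"North": (0.0, STEP), "South": (0.0, -STEP),
               "East": (STEP, 0.0), "West": (-STEP, 0.0)}

    def move(pos, action):
        """Translate the tip-carrying robot to the neighboring cell, clipped
        to the 2 m x 2 m workspace centered on the fixed hose end."""
        dx, dy = ACTIONS[action]
        clip = lambda v: max(-1.0, min(1.0, v))
        return (clip(pos[0] + dx), clip(pos[1] + dy))

    print(move((0.0, 0.0), "North"))  # -> (0.0, 0.5)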

3.2. Computational cost of L-MCRS simulation based on GEDS

Due to the complexity of the GEDS hose model, which is implemented in Matlab, simulation times are acceptable only when a few repetitions are needed, but not when many more repetitions (in the order of millions) must be run. Specifically, on a Dell Precision T7500 workstation equipped with an Intel(R) Xeon(R) E5620 CPU @ 2.40 GHz, 12.00 GB of RAM and a Microsoft Windows 7 Professional 64-bit SP1 operating system, the simulation of a movement of 0.2 m requires 1.6 s. This response time is unacceptable under the convergence conditions of the RL algorithms, which involve many repetitions of all possible movements in each reachable situation of the system. A simulation of one million movements may last more than 18 days running 24 h a day. To make things worse, the GEDS model and the RL algorithms are not suitable for parallelization.
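The 18-day figure follows directly from the per-movement cost:

  10⁶ movements × 1.6 s/movement = 1.6 × 10⁶ s ≈ 18.5 days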


Fig. 1. Evolution of the hose–robot system through the GEDS simulator: the robot carrying the tip of the unidimensional object reaches the goal.


4. ELM approach to GEDS approximation

The approximation of the GEDS hose model by means of ELMs consists in using the initial hose state and the action to be carried out as inputs, and predicting the hose state after executing the action. The hose state is given by 11 points in the working space. Learning an approximate geometrical model of the hose is formulated as a set of 11 independent multi-variate regression tasks as follows: given an initial state (shape) of the hose and an action to be carried out by the robot attached to its tip, a collection of 11 independent ELMs must produce the resulting state (shape) of the hose after performing that action during a predefined time, according to the behavior of the hose previously obtained through extensive simulations with the GEDS simulator.

For all 11 independent multi-variate regression tasks, the initial and final states of the hose are described by means of a hose discretization: taking into account that its length is 1 m and that segments of 10 cm are acceptable, 11 equally spaced two-dimensional points {P_j}, j = 1, …, 11, of the hose are sampled. The action executed by the robot belongs to the set A = {North, South, East, West}. The final state of the hose is modeled by means of 11 equally spaced two-dimensional points, each of which is learned by one of the 11 independent ELMs; thus each ELM has to learn a function with two real-valued outputs. The training and testing datasets have the following content:

• Inputs: the initial state of the hose and the action of the robot. All considerations made for the previous multi-variate regression task are valid for this case.
• Outputs: for each independent ELM, the target is the two real-valued coordinates of a specific point P_j, j = 1, …, 11, of the hose after executing the robot action at the hose initial state (both given at the input layer). The final state of the hose is modeled through the two-dimensional outputs of each of the 11 independent ELMs.

A sketch of this pattern layout, with an assumed attribute encoding, follows.
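The following sketch shows one plausible pattern layout for the 11 regressors; the exact attribute encoding (flattened coordinates plus a one-hot action, 26 inputs in total) is our assumption, since the paper does not fix it explicitly.

    import numpy as np

    A = ["North", "South", "East", "West"]  # robot action set

    def make_input(initial_points, action):
        """initial_points: (11, 2) array of hose samples -> 22 + 4 attributes."""
        onehot = np.eye(len(A))[A.index(action)]
        return np.concatenate([np.asarray(initial_points).ravel(), onehot])

    def make_targets(final_points):
        """One 2-D target per ELM: the j-th pair trains the j-th regressor."""
        return [np.asarray(final_points)[j] for j in range(11)]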

5. Experimental design

In this section we explain the experimental design followed to learn accurate geometrical models of the hose based on the ELM approximation described in Section 4. Section 5.1 explains the general procedure in an algorithmic way, covering issues regarding the datasets and the ELMs. Section 5.2 describes the dataset extraction and modification possibilities. With regard to ELMs, Section 5.3 enumerates several structural parameters, and Section 5.4 describes the general learning and validation procedure.

5.1. General procedure

Algorithm 3. General hose model learning procedure through ELMs. A path-finding algorithm is given that finds all feasible paths between any two arbitrarily designated points, without crossings, whose inputs are the exact path length, a set of actions, and the distance that the robot covers with each movement.

  1. Use the path-finding algorithm to produce all the paths from the fixed point of the hose at (0, 0) to any point reachable by the tip of a flexible hose of 1 m.
  2. For each path,
     (a) for each available robot action,
        (i) Simulate the effect of the action on the robot–hose system with the GEDS simulator initialized with the selected path.
        (ii) Save the initial position of the hose and the movement as inputs, and the final position of the hose as output.
  3. Given the ELM learning approximation,
     (a) Adapt the input/output patterns obtained in the previous step to the description used in the approximation.
     (b) Modify the input patterns according to the approximation (normalization and noise addition).
     (c) Partition them into two datasets (75% train and 25% test).
     (d) Train ELMs with the adapted train input/output patterns following the approximation description.
     (e) Test the learned models on the adapted test dataset, to validate that the model can be used instead of the analytical model.

Algorithm 3 provides a detailed specification of the process followed for learning an accurate approximation of the GEDS model by ELMs. The first main step indicates that all feasible paths from the fixed point (0, 0) of the working area have to be found by means of a path-finding algorithm; this is a basic combinatorial problem that is beyond the scope of this paper. The second main step involves, for each of the paths found in the previous step, the execution of all available actions of the robot. Those executions are carried out through the GEDS simulator, and for each of them the initial state of the robot–hose system (before executing the action) and its final state (after the execution) are saved to build the datasets. In this way, all feasible paths between any two points are used to obtain the set of input/output training and test samples, where the inputs are the initial robot–hose system configuration and the action performed, while the output is the final robot–hose system configuration after the robot executes that action. Finally, the third main step is devoted to adapting the datasets to the learning procedure as indicated in Section 5.2, and to training the ELM networks as described in Sections 5.3 and 5.4.

5.2. Extraction and modification of the train/test datasets from the GEDS simulator

Following the general procedure described in Algorithm 3, after carrying out extensive simulations with the GEDS simulator, an original dataset is obtained containing the geometrical dynamics of the hose. Up to this point the state of the hose is described through 11 two-dimensional sampled points, both in its initial and final states. Then, following the specification of the learning approximation, two transformations can be carried out on the input patterns: normalization and noise addition. There are four possibilities: both transformations can be executed, only one of them, or none. A transformation is executed on all input attributes of the train and test datasets, which always contain 75% and 25%, respectively, of the total data instances.

The normalization process sets the input attributes of the train and test sets into the range [−1, 1], so potentially there are two versions of each training and testing dataset: raw unnormalized and normalized. The noise addition process depends on the approximation specification. If it is applied, we can obtain several dataset versions by adding noise to each input attribute based on uniformly distributed pseudo-random numbers r ∈ [−1, 1], weighted by a parameter noise_w ∈ [0, 100], as indicated in Eq. (3):

  attribute ← attribute · {1 + [noise_w · (2 · r − 1)]}   (3)

This modification rule is intended to generate additive or subtractive noise of magnitude noise_w percent of the value of each input attribute. We have used the values noise_w ∈ {0, 1}, i.e., when noise_w = 0 the original value of the attributes remains unchanged.
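A sketch of the two transformations follows; treating noise_w as a percentage (the division by 100) is our reading of Eq. (3) and of the "noise_w percent" wording above.

    import numpy as np

    def normalize(X, lo=None, hi=None):
        """Min-max scale each attribute into [-1, 1]; fit the bounds on the
        training set and reuse them on the test set."""
        lo = X.min(axis=0) if lo is None else lo
        hi = X.max(axis=0) if hi is None else hi
        return 2.0 * (X - lo) / (hi - lo) - 1.0, lo, hi

    def add_noise(X, noise_w, seed=0):
        """Multiplicative noise of Eq. (3): r uniform in [-1, 1], noise_w in %."""
        r = np.random.default_rng(seed).uniform(-1.0, 1.0, size=X.shape)
        return X * (1.0 + (noise_w / 100.0) * (2.0 * r - 1.0))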



5.3. ELM structure parameters

We have used typical ELM networks with the following configurations:

• Activation function: we have tested ELMs with different activation functions in the hidden nodes: sigmoidal, sine, hardlim, triangular basis function and radial basis function.
• Number of neurons: we have tested ELMs with different numbers of hidden nodes h_s in the hidden layer, using near-logarithmically spaced natural values between 10 and 200,000, giving the 41 different values derived from Eq. (4):

  h_s ∈ { ⋃_{p=1..3} ⋃_{q=1..9} 10^p · q } ∪ { ⋃_{q=1..20} 10^4 · q }   (4)

• Trials: for the resulting combinations of activation function and number of neurons, we have performed 5 trials and measured individual, mean and standard deviation values of both test accuracy and training time.

A snippet generating this grid of hidden-layer sizes follows.
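The grid of Eq. (4) can be generated directly. This is a sketch of the union exactly as written in Eq. (4); the precise membership of the 41 values reported above may differ slightly.

    # Near-logarithmically spaced hidden-layer sizes of Eq. (4):
    # q * 10^p for p = 1..3, q = 1..9, together with q * 10^4 for q = 1..20.
    H = sorted({q * 10 ** p for p in range(1, 4) for q in range(1, 10)}
               | {q * 10 ** 4 for q in range(1, 21)})
    print(H[0], H[-1])  # 10 200000, spanning the range used in the experiments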

5.4. General training and validation procedure

In this subsection we summarize, in an algorithmic fashion, the experimental design carried out to validate the multi-variate regression ELM approach. We define the following complete sets to formalize the algorithmic description:

• R = {true, false}: the set of possibilities regarding the normalization of the attributes.
• N = {0, 1}: the noise gain added to each input attribute, expressed as a percentage.
• H ⊂ ℕ⁺: the set of numbers of hidden nodes, near-logarithmically spaced integer values between 10 and 200,000, such that |H| = 41 different values.
• F = {sigmoidal, sine, hardlim, triangular, radial}: the set of activation functions.

The global procedure for the multi-variate regression task is specified in Algorithm 4.

Algorithm 4. Validation procedure for the ELM multi-variate regression approximation of the geometrical hose model.

  For each possibility regarding the normalization r_q ∈ R
    For each noise percentage n_r ∈ N
      Load training and testing datasets with:
        - inputs according to the task
        - outputs according to the task
        - normalization according to r_q
        - noise n_r percent
      For each number of hidden nodes h_s ∈ H
        For each activation function f_t ∈ F
          For each initialization i ∈ [1, 5]
            Create an ELM suited to the task with:
              - inputs according to the task
              - outputs according to the task
              - h_s hidden nodes
              - f_t as activation function
            Train the i-th ELM using the training dataset;
              obtain individual accuracy and time values
            Test the i-th ELM using the testing dataset;
              obtain individual accuracy and time values
          Obtain train/test accuracy mean/std. dev. values over all initializations
          Obtain train/test time mean/std. dev. values over all initializations
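The nested sweep of Algorithm 4 maps to a straightforward driver. In the skeleton below, load_datasets, build_elm and rmse are hypothetical stand-ins for the actual tooling, and the hidden-node grid is reduced for illustration.

    import time
    import numpy as np

    R = [False, True]                  # normalization off/on
    N = [0, 1]                         # noise percentage
    H = [10, 100, 1000]                # reduced hidden-node grid for illustration
    F = ["sig", "sin", "hardlim", "tribas", "radbas"]

    for r_q in R:
        for n_r in N:
            # load_datasets is a hypothetical helper returning adapted patterns
            Xtr, Ytr, Xte, Yte = load_datasets(normalize=r_q, noise=n_r)
            for h_s in H:
                for f_t in F:
                    accs, times = [], []
                    for i in range(5):                # 5 random initializations
                        elm = build_elm(h_s, f_t, seed=i)  # hypothetical factory
                        t0 = time.time()
                        elm.fit(Xtr, Ytr)
                        times.append(time.time() - t0)
                        accs.append(rmse(elm.predict(Xte), Yte))  # hypothetical
                    print(r_q, n_r, h_s, f_t,
                          np.mean(accs), np.std(accs), np.mean(times))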

6. Experimental results

Table 1
Best test accuracy (RMSE) results for the 11 independent multi-variate regressions based learning of the model, merging the results obtained with all combinations of hidden nodes and activation functions for each one of the 11 approximated points.

Norm  ELM   Best individual ELM              Best mean ELMs
            Acc·10⁻³  Nodes    Funct         Acc·10⁻³  σ·10⁻⁴  Nodes    Funct
No     1      8.4     170,000  hardlim         8.9       5     200,000  hardlim
No     2      7.4         400  sin             8.1       5         400  sin
No     3      9.2         300  sin            10         7         300  sin
No     4     15           300  sin            16.5      12         300  sin
No     5     16           300  sin            17.7      12         300  sin
No     6      0.3      80,000  sig             0.3       0     140,000  sin
No     7     10.4         300  sin            11.4       2         200  sin
No     8      6.2         400  sin             6.7       6         400  sin
No     9     15.4         200  sin            16.3      12         300  sin
No    10     11.1         400  sin            11.9       7         400  sin
No    11     13.5         300  sin            14         6         300  sin
Yes    1      4.2          50  sig             4.5       1     130,000  hardlim
Yes    2      4.1     200,000  hardlim         4.2       0     110,000  hardlim
Yes    3      6.5     140,000  hardlim         6.6       1     140,000  hardlim
Yes    4     13.4          90  sig            14.1       4          90  sig
Yes    5     23.9     120,000  hardlim        24.1       2     180,000  hardlim
Yes    6      4.5      50,000  sin             4.7       0      70,000  sin
Yes    7      5.3          60  sig             6         1          70  sig
Yes    8      4.5         200  sig             4.6       1         200  sig
Yes    9      9.5         100  sig            10.2       2          90  sig
Yes   10     17           200  sig            17         1         200  sig
Yes   11     21.8         500  sin            23.1       9         500  sin

In this section we discuss the results achieved with the learning approximation described in Section 4 under the experimental design proposed in Section 5. The experiments have been carried out using the public software available at the ELM web page¹, customized with regard to the manipulation of datasets and the storage of results, among others. Regarding the training and test time discussions, all experiments have been run on a Dell Precision T7500 workstation equipped with an Intel(R) Xeon(R) E5620 CPU @ 2.40 GHz, 12.00 GB of RAM and a Microsoft Windows 7 Professional 64-bit SP1 operating system.

The best test accuracy results of the hose model learning through the 11 multi-variate regressions approximation are summarized in Table 1, merging, for each of the 11 points to be learned, the results obtained by the ELMs with all combinations of hidden nodes and activation functions. In this table, the focus is on the effect of the training dataset on the learning carried out by the ELMs. The table also merges the effect of normalization and noise addition: the normalized datasets also have a noise addition of 1%, while the unnormalized datasets have no noise addition. We recall that, following the experimental setup, there are 11 sampled points in the hose, and an ELM has to be trained to learn the geometrical


dynamics of each of those points, i.e., 11 independent ELMs have to be trained to learn the two coordinates of the corresponding point. Table 1 shows the characteristics of the best ELM for each point, according to the test accuracy based on the Root Mean Square Error (RMSE) of the result, which is composed of two coordinates. In this way, the table provides, for each combination of hose discretized point and training dataset, the best initialization of an individual ELM (with its test accuracy, number of hidden nodes and activation function). It also contains the ELM configuration which has obtained the best mean values over all initializations (with its mean and standard deviation of test accuracy, number of hidden nodes and activation function) for the same combination of discretized point and training dataset.

¹ http://www.ntu.edu.sg/home/egbhuang/ELM_Codes.htm

Fig. 2. Average test accuracy (RMSE) of the ELMs of three sampled points of the hose, for the 11 independent multi-variate regressions based learning of the model: (a) 6th ELM, unnormalized, 0% noise; (b) 6th ELM, normalized, 1% noise; (c) 8th ELM, unnormalized, 0% noise; (d) 8th ELM, normalized, 1% noise; (e) 11th ELM, unnormalized, 0% noise; (f) 11th ELM, normalized, 1% noise.

Analyzing Table 1 we can draw several conclusions, taking into account that they are obtained under the double effect of normalization and noise addition to the datasets:

• In general, the test accuracy values (both for individual ELMs and for the mean values of an ELM structure) are successful: working inside a square area of 4 m² and with a hose of length 1 m, the test accuracy of most ELMs is near or under a centimeter, i.e., less than the hose thickness. For the coordinates of the 6th point (6th ELM), a mean precision of 0.3 mm is even reached.
• For most ELMs the sin activation function is the best in this approximation, because it obtains the best mean test accuracy values; besides, the associated number of hidden nodes is the smallest.
• In general, there is no absolute influence of the combination of normalizing the datasets and adding noise on the number of hidden nodes. Seven of the eleven points have been approximated more successfully using normalized datasets with noise, and the remaining four using unnormalized datasets without noise. Training with normalized noisy datasets generates much larger ELMs in some cases, but slightly smaller ones in others. In general (though not in all cases), using normalized datasets with noise obtains the best accuracy values, but usually at the expense of producing larger ELMs.

Fig. 2 shows the average test accuracies (RMSE) over all initializations, for each tested activation function, for three sampled points of the hose (the 6th, 8th and 11th). The horizontal axis shows the number of hidden neurons of the ELM. Fig. 2(a), (c) and (e) shows the results reached with unnormalized datasets without noise, while Fig. 2(b), (d) and (f) shows the results using normalized datasets with noise. We can clearly see that, for almost the whole range of tested hidden-node numbers, the sin activation function is the best in the case of unnormalized datasets without noise addition. With normalized datasets with noise addition, the best activation function is not so clear: all activation functions exhibit a similar and characteristic behavior sharing the same shape, although the sin activation function again exhibits one of the best performances.

At this point we have found that the model learning approximation based on 11 independent multi-variate regressions obtains successful accuracy values; however, the main issue addressed in this paper is the computational cost of L-MCRS simulation based on GEDS, and up to now it has not been solved. Table 1 shows the characteristics of the best ELM for each point with two different training datasets, but for some points very large ELMs were found to be optimal from the accuracy point of view. This means that for these points the ELMs need an amount of time to obtain a response similar to that of the analytical and precise GEDS simulator. To overcome this problem, the solution is to keep the optimal ELMs for those points whose ELM is relatively small and has an acceptable response time, while for those points whose ELMs have an unacceptably large response time, an ELM that is suboptimal from the accuracy point of view but has a fast response time is chosen.

Table 2 shows the points for which suboptimal ELMs have been chosen due to the slowness of their optimal ones. The first three columns show the accuracy in millimeters (Acc), the number of nodes and the response time in milliseconds (RT) of the optimal ELMs of Table 1, while the next three columns show the same data for the suboptimal ELMs chosen for the corresponding points. The last two columns show, respectively, the accuracy decrease of the suboptimal ELM compared with the optimal one in millimeters (∇Acc), and the percentage of response time saving when using the suboptimal ELM (% TS).

Table 2
Suboptimal ELMs for those points whose optimal ELMs are large, with the accuracy decrease (∇Acc) and the response time saving (% TS).

ELM   Optimal ELM                       Suboptimal ELM                 ∇Acc·10⁻³  % TS
      Acc·10⁻³  Nodes    RT·10⁻³ s      Acc·10⁻³  Nodes  RT·10⁻³ s
 6      0.3      80,000    569.6          1.2      1000     7.4          0.9      98.70
 2      4.1     200,000   1020.8          4.5       200     1.5          0.4      99.85
 3      6.5     140,000    693.8          6.8        90     0.8          0.3      99.88
 5     23.9     120,000    710           24.2       300     2.3          0.3      99.68

Table 3
Test accuracy (RMSE) results and response time (Test) for the 11 independent multi-variate regressions based learning of the model, with a tradeoff between accuracy and response speed.

ELM  Norm  Acc·10⁻³  Nodes  Funct  Time·10⁻³ s
                                   Learning   Test
 1   Yes     4.2       50   sig       20.9     0.6
 2   Yes     4.5      200   sig       86.1     1.5
 3   Yes     6.8       90   sig       28.1     0.8
 4   Yes    13.4       90   sig       36.8     0.7
 5   No     16        300   sin      217.1     2.7
 6   No      1.2     1000   sin     5508.9     7.4
 7   Yes     5.3       60   sig       20.5     0.9
 8   Yes     4.5      200   sig       79.2     2.1
 9   Yes     9.5      100   sig       32.2     0.9
10   No     11.1      400   sin      416.3     4
11   No     13.5      300   sin      193.7     1.9

There we can see that a tradeoff between accuracy and speed has been reached: with a mean accuracy decrease of 0.45 mm, always below 1 mm, a mean response time saving of 99.53% is obtained.

Table 3 shows the complete configuration of the learned model based on ELMs with this tradeoff between accuracy and response speed. Suboptimal ELMs have been chosen for the 2nd, 3rd, 5th and 6th points, always preserving the type of training dataset with which the optimal ELM for that point was obtained; for the remaining points, the optimal ELM has been kept. On the one hand, focusing on the accuracy of the learned model, over all points Acc = 8.18 mm, which is comparable to the hose thickness, with a standard deviation s = 4.78 mm. On the other hand, focusing on the response time (Test in Table 3), adding the response times of the 11 ELMs, only 23.5 ms are needed to obtain the complete response of the model. Recalling that, as stated in Section 3.2, the simulation of a movement of 0.2 m requires 1.6 s in the GEDS simulator, a complete model response of 23.5 ms using ELMs means a time saving of 98.53%. This implies that a Reinforcement Learning experiment of one million movements will need only 6.52 h, versus the 18 days needed by the GEDS simulator.
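Both reported figures can be checked directly from the measured times:

  saving = (1600 ms − 23.5 ms) / 1600 ms ≈ 98.53%
  10⁶ × 23.5 ms = 2.35 × 10⁴ s ≈ 6.5 h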

7. Conclusions and future work

In this paper we have discussed one of the main practical issues when dealing with autonomous task learning on mobile robots in Linked Multi-Component Robotic Systems (L-MCRS) [1] through the Reinforcement Learning paradigm: the computational burden derived from the extensive simulations needed for learning. We have recalled several computational methods associated with the autonomous learning of behaviors to develop intelligent tasks: the modeling of unidimensional elements through GEDS, two RL algorithms (Q-L and TRQ-L), and ELMs. Next we have explained in detail the problem we are facing, quantifying it with time measures. Our approach is to learn the GEDS model through ELMs. Profiting from the adaptability and flexibility of ELMs, we have formalized the hose geometry learning task as a set of multi-variate regression tasks, where 11 independent optimal and suboptimal ELMs predict the final positions of the 11 points which represent the hose state. On independent datasets, the mean accuracy (RMSE) over all points is Acc = 8.18 × 10⁻³ m (comparable to the hose thickness), with a standard deviation s = 4.78 × 10⁻³ m. Regarding the main issue addressed in the paper, i.e., the computational burden associated with simulation using the GEDS simulator, we have shown that a time saving of 98.53% is reached using the model learned through ELMs. These are successful and promising results that will allow much more extensive simulations to obtain control processes in a reasonable time, where the small overhead of the initial ELM training is compensated by the time reduction during the RL execution. Future work will consist in embedding the obtained ELMs into the robot–hose system automatic learning algorithms to predict the hose dynamics.


Acknowledgments

The research was supported by Grant UFI11-07 of the Research Vicerectorship, Basque Country University (UPV/EHU), and by group funding from the Basque Government with code IT874-13.

References

[1] R. Duro, M. Graña, J. de Lope, On the potential contributions of hybrid intelligent approaches to multicomponent robotic system development, Inf. Sci. 180 (14) (2010) 2635–2648.
[2] Z. Echegoyen, I. Villaverde, R. Moreno, M. Graña, A. d'Anjou, Linked multi-component mobile robots: modeling, simulation, and control, Robot. Autonom. Syst. 58 (12, SI) (2010) 1292–1305.
[3] B. Fernandez-Gauna, J. Lopez-Guede, E. Zulueta, M. Graña, Learning hose transport control with Q-learning, Neural Netw. World 20 (7) (2010) 913–923.
[4] B. Fernandez-Gauna, J. Lopez-Guede, M. Graña, Towards concurrent Q-learning on linked multi-component robotic systems, in: E. Corchado, M. Kurzynski, M. Wozniak (Eds.), Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, vol. 6679, Springer, Berlin/Heidelberg, 2011, pp. 463–470.
[5] J.M. Lopez-Guede, B. Fernandez-Gauna, M. Graña, E. Zulueta, Empirical study of Q-learning based elemental hose transport control, in: E. Corchado, M. Kurzynski, M. Wozniak (Eds.), Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, vol. 6679, Springer, Berlin/Heidelberg, 2011, pp. 455–462.
[6] J. Lopez-Guede, B. Fernandez-Gauna, M. Graña, E. Zulueta, Further results learning hose transport control with Q-learning, J. Phys. Agents, 2014, in press.
[7] J.M. Lopez-Guede, B. Fernandez-Gauna, M. Graña, E. Zulueta, Improving the control of single robot hose transport, Cybern. Syst. 43 (4) (2012) 261–275.
[8] J.M. López-Guede, B. Fernández-Gauna, E. Zulueta, Towards a real time simulation of linked multi-component robotic systems, in: KES, 2012, pp. 2019–2027.
[9] R.H. Bartels, J.C. Beatty, B.A. Barsky, An Introduction to Splines for Use in Computer Graphics & Geometric Modeling, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987.
[10] C. de Boor, A Practical Guide to Splines, Springer, Berlin, 1994.
[11] A. Theetten, L. Grisoni, C. Andriot, B. Barsky, Geometrically exact dynamic splines, Comput.-Aided Des. 40 (1) (2008) 35–48.
[12] M. Rubin, Cosserat Theories: Shells, Rods and Points, Kluwer, The Netherlands, 2000.
[13] Z. Echegoyen, Contributions to visual servoing for legged and linked multi-component robots (Ph.D. dissertation), UPV/EHU, 2009.
[14] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.
[15] R. Bellman, A Markovian decision process, Indiana Univ. Math. J. 6 (1957) 679–684.
[16] H. Tijms, Discrete-Time Markov Decision Processes, John Wiley & Sons, Ltd, Chichester, 2004, pp. 233–277.
[17] C. Watkins, Learning from delayed rewards (Ph.D. dissertation), University of Cambridge, England, 1989.
[18] C. Watkins, P. Dayan, Q-Learning, Mach. Learn. 8 (3–4) (1992) 279–292.
[19] G.-B. Huang, D.H. Wang, Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. 2 (2) (2011) 107–122.
[20] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1–3) (2006) 489–501.
[21] G.-B. Huang, L. Chen, C.-K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. 17 (4) (2006) 879–892.
[22] G.-B. Huang, L. Chen, Convex incremental extreme learning machine, Neurocomputing 70 (16–18) (2007) 3056–3062.
[23] G.-B. Huang, M.-B. Li, L. Chen, C.-K. Siew, Incremental extreme learning machine with fully complex hidden nodes, Neurocomputing 71 (4–6) (2008) 576–583.
[24] G.-B. Huang, L. Chen, Enhanced random search based incremental extreme learning machine, Neurocomputing 71 (16–18) (2008) 3460–3468.
[25] L. Chen, G.-B. Huang, H.K. Pung, Systemical convergence rate analysis of convex incremental feedforward neural networks, Neurocomputing 72 (10–12) (2009) 2627–2635.
[26] G.-B. Huang, D. Wang, Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. 2 (2) (2011) 107–122.
[27] W. Jun, W. Shitong, F.-l. Chung, Positive and negative fuzzy rule system, extreme learning machine and image classification, Int. J. Mach. Learn. Cybern. 2 (4) (2011) 261–271.
[28] X. Wang, A. Chen, H. Feng, Upper integral network with extreme learning mechanism, Neurocomputing 74 (16) (2011) 2520–2525.
[29] B. Chacko, V. Vimal Krishnan, G. Raju, P. Babu Anto, Handwritten character recognition using wavelet energy and extreme learning machine, Int. J. Mach. Learn. Cybern. 3 (2012) 149–161. Available: http://dx.doi.org/10.1007/s13042-011-0049-5.

Jose Manuel Lopez-Guede, Ph.D., is an assistant professor at the Universidad del Pais Vasco. His research interests are robotics and reinforcement learning.

Borja Fernandez-Gauna, Ph.D., is an assistant professor at the Universidad del Pais Vasco. His research interests are robotics and reinforcement learning.

Jose Antonio Ramos-Hernanz is an assistant professor at the Universidad del Pais Vasco. His research interests are renewable energies.