CBR based reactive behavior learning for the memory-prediction framework




ARTICLE IN PRESS

JID: NEUCOM

[m5G;February 13, 2017;21:12]

Neurocomputing 0 0 0 (2017) 1–10

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

I. Herrero-Reder∗, C. Urdiales, J.M. Peula, F. Sandoval

Grupo ISIS, Dpto. Tecnología Electrónica, E.T.S.I. Telecomunicación, Universidad de Málaga, Campus de Teatinos s/n, 29071 Málaga, Spain

Article info

Article history: Received 22 February 2016; Revised 7 October 2016; Accepted 8 October 2016; Available online xxx

Keywords: Case based reasoning; Reactive behaviors; Behavior learning; Robotics; Control architecture

Abstract

Some approaches to intelligence state that the brain works as a memory system which stores experiences to reflect the structure of the world in a hierarchical, organized way. Case Based Reasoning (CBR) is well suited to test this view. In this work we propose a CBR based learning methodology to build a set of nested behaviors in a bottom-up architecture. To cope with complexity-related CBR scalability problems, we propose a new 2-stage retrieval process. We have tested our framework by training a set of cooperative/competitive reactive behaviors for Aibo robots in a RoboCup environment.

1. Introduction

Biologically inspired systems offer an alternative to traditional robot control [1]. These systems: (i) emphasize the role of learning in intelligence development; and (ii) try to find more general knowledge models that can be adjusted later to any specific application. Learning based models are more dynamic and better fitted for unknown, changing or uncontrolled scenarios when compared to traditional analytical approaches and linear programming. Besides, learning allows natural acquisition of knowledge and skills, so no exhaustive knowledge of the problem application field is required a priori [2,3].

There are many biomimetic approaches to robotics. Many rely on mimicking biological motion patterns to create robot animals [4] or active prostheses [5]. Others focus on functionality, like biomimetic task modeling [6] and perception [7]. However, most works focus on specific aspects of intelligence rather than searching for an integrated approach. There are some ad hoc approaches to biomimetic control architectures in very specific frameworks [8], but no attempt at generalization has been made. Still, some works have established that both biological and biomimetic systems follow the Hybrid Control paradigm. Franz and Mallot [9] state that complex navigation behaviors are obtained by nesting more and more competence levels in a bottom-up way. They analyze a number of approaches to



Corresponding author. E-mail address: [email protected] (I. Herrero-Reder).

© 2017 Elsevier B.V. All rights reserved.

biomimetic navigation and propose a hierarchy valid both for animals and robots. They also state that animals tend to operate in a bottom-up way. Brooks proposed the behavior based system (BBS) control paradigm in [10], which relies on a collection of concurrently executing behaviors connecting sensors and effectors. Originally, BBS lacked the symbolic representation that would allow them to be used at higher levels. Hierarchical architectures for BBS were proposed later [11,12]. However, behaviors in these architectures are often designed rather than learnt.

The Memory-Prediction Framework (M-PF) [13] explains the brain as a memory system that stores experiences to reflect the true structure of the world. In this framework, intelligence relies on continuous prediction of states based on received stimuli. Prediction depends on knowledge stored in memory via experience. Learnt responses to acquired patterns can be used to react to any given situation. Besides, reactions to unknown situations can be derived from responses to similar ones (reasoning by analogy). The M-PF deals with complexity by establishing a nested bottom-up hierarchy of memories presenting the same structure and operation principles. It also seeks to separate structural and operational aspects of intelligence from its biological implementation in human beings. This feature makes this theory attractive as a starting point for the design of robotic intelligent behavior. The M-PF has already been implemented in some Artificial Intelligence (AI) systems. Most of them are expert systems in fields such as vision pattern recognition [14] or Big Data analysis [15]. There are only a few implementations in the field of robotics [16–18] and they mostly focus on environment perception.

http://dx.doi.org/10.1016/j.neucom.2016.10.075 0925-2312/© 2017 Elsevier B.V. All rights reserved.

Please cite this article as: I. Herrero-Reder et al., CBR based reactive behavior learning for the memory-prediction framework, Neurocomputing (2017), http://dx.doi.org/10.1016/j.neucom.2016.10.075


We propose a behavior learning methodology for intelligent robot control. In previous works, we presented a learning methodology for isolated behaviors [19–21]. The generality of our approach allowed us to use the same software to work with a 4-wheeled Pioneer AT, a 4-legged Aibo robot and an assistive wheelchair, respectively. Although we had to change the case instance to reflect the robot sensor and motor configuration, we only had to retrain each robot using a joystick to learn the proper behavior: no further problem analysis was needed. In this work, behaviors are designed to be layered into a bottom-up control architecture that follows the guidelines of the Memory-Prediction Framework. This approach: (i) provides better scalability; (ii) allows integration of higher level deliberative layers; and (iii) follows a biomimetic model. This work specifically targets reactive behaviors. Although learning is used less frequently at the reactive level, this choice has several advantages, like improved adaptability to hardware specifics and systematic errors. Besides, the proposed methodology allows every behavior to follow the design rules proposed in [13].

Behavior learning could rely on Artificial Neural Networks (ANN) or Fuzzy Learning (e.g. [22–25]). Fuzzy-logic based methods define a set of rules to map sensor readings and motor actions and adjust rule parameters during behavior execution. However, it may be hard to define analytical rules, even for experts on both fuzzy logic and the problem domain, because some low level behaviors are intuitive and ill defined. ANNs solve this problem: once the neuron and layer structure has been chosen, behaviors are acquired via training. However, ANNs do not explicitly represent what has been learnt and it may be difficult to force specific cases into these structures [13].
We have chosen Case Based Reasoning (CBR) to acquire knowledge in the environment because its underlying principles are very similar to those of the Memory-Prediction Framework, as explained in the next section. Besides, CBR allows us to evaluate acquired knowledge, which could be very useful to debug errors and malfunctions of the system and to better understand the learning process. CBR has been used in mobile robot control before, but it has focused mostly on deliberative control. In these cases, learning may be achieved by observing humans or other robots. For example, in [26] a robot learnt to solve a task by following human orders which were stored in a CBR database; the performance of the robot was subjectively scored by the same human trainers. In [27], a team of Robosoccer players learnt to play by observing an expert team; knowledge was stored in a CBR database.

It is harder to find CBR approaches to acquire reactive behaviors. In these cases, CBR modules are applied to specific situations and/or as a complementary part of a deliberative-oriented architecture. In [28] a reactive CBR module is introduced in a multirobot environment for fast obstacle avoidance. In [29] and [30] CBR low level modules are used to solve path planning decisions both in a robotic and a virtual game environment. In all these cases reactive and deliberative CBR modules appear as isolated elements of heterogeneous architectures with different structures and features. The RUPART robot control architecture [31] includes both deliberative and reactive CBR modules, but they are not integrated into a common framework. Instead, they simply coexist and a different mechanism is used to determine which of them are in use. In our work, higher level CBR modules feed from lower level ones and determine how to combine them. This is consistent with the M-PF. In order to test our theoretical model, we have implemented a demonstrator using Aibo ERS-7 robots in a RoboCup environment.
We have chosen legged robots equipped with vision because learning is often not necessary for simpler robots and sensors. Section 3 shows how the model proposed in Section 2 is implemented in this particular scenario. However, our model could

be extended to other robots and environments by changing case structures and re-training the robots, following the steps presented in that section. Section 4 presents an example of behavior interaction in the test scenario. As commented, this work focuses on reactive modules. Hence, robot perceptions are limited to their local environment at specific time instants (e.g. goals, other players, ball, beacons). The complexity of emergent behaviors is consequently limited. To achieve smarter behaviors, we would need to design and build hybrid modules on top of reactive ones and give them feedback following the guidelines presented in Section 2, as proposed in the Memory-Prediction Framework. Future work and conclusions are provided in Section 5.

2. A learning based reactive control architecture

2.1. A Memory-prediction framework inspired structure

The memory-prediction theory focuses on the functioning of the human neocortex. The model includes a hierarchical structure where all levels perform the same basic operations. Levels in the hierarchy are connected by multiple feedforward and feedback connections. The lowest levels of the hierarchy are fed with spatial and temporal patterns from the sensors. The structure of stored invariant patterns is used to make predictions and choose proper actions.

Hierarchical hybrid architectures are not new in robotics. However, they rarely preserve a similar structure for the nodes at every level and, in most cases, learning is only employed at higher levels. Fig. 1 presents our implementation of a hybrid robotic control architecture that follows the main guidelines of the Memory-Prediction Framework. Every layer and all modules inside each layer are homogeneous in terms of structure and functionality. They differ only in the information they manage. Low level modules are fed with local, instantaneous information. They provide simple, instant responses to events. Activation sequences of sets of reactive modules provide higher level concepts.
Higher modules have a wider spatiotemporal horizon. They receive conceptual information from the low level modules and, in return, they modulate the responses and the activation and deactivation cycles of those modules.

2.2. CBR based behavior design

In previous works [20,32] we proposed a methodology for CBR based behavior learning. That methodology allowed us to design single behaviors like obstacle avoidance or wall following for a robot. In this work we have adapted these CBR based behaviors to interact with each other within the hierarchical framework presented in Section 2.1. Fig. 2 shows the proposed structure for a framework module. Every module stores a knowledge base to match a problem instance with a concept or belief. A module may: (i) execute some action; (ii) modulate the response of lower level modules; and (iii) contribute to generate higher-level longer-term concepts for higher level modules. Temporal sequences of triggered modules may be used to identify a concept and to predict responses and detect anomalies. This prediction is propagated downwards to modify the combination of lower level behaviors.

We can illustrate this approach with a simple example. Let’s assume that a soccer playing robot has three purely reactive behaviors: avoid obstacles, keep ball and move towards enemy goal. If (i) the enemy goal is in sight; (ii) the robot has the ball; and (iii) there are other robots around, all three behaviors would return a response according to their own knowledge base. Then, those responses would be combined into an emergent action. However,


Fig. 1. General experience-based learning framework.

Fig. 2. Structure and operation of a Module.

simultaneous activation of the three modules in the proposed system leads to a higher level concept: dribbling. If predicted success probability is low, the corresponding higher level dribbling module could propose deactivation of the keep ball module to activate a pass ball to mate behavior instead. This requires longer term information, e.g. the position of other robots in the field: opponents, to evaluate its chances of success, and partners, to determine where to pass the ball. The main advantages of implementing all these modules with CBR are that (i) behaviors can be learnt on the go rather than planned beforehand; (ii) they can adapt to ongoing circumstances; and (iii) new modules can be added as needed.

Learning has often been used to develop high level modules in robotics [33,34], whereas reactive behaviors are usually programmed at earlier stages of the architecture development in a fixed, analytical way. This approach has traditionally been adopted because reactive behaviors are simple and do not require models of the environment. However, the resulting behaviors adapt poorly to unexpected situations. Learning improves adaptation at the reactive level. Even a simple, purely reactive obstacle avoidance behavior can benefit from learning, e.g. to adapt without supervision to the robot kinematic and dynamic properties and to absorb systematic sensor errors. For example, a limping robot could

learn how much it has to steer to correct its trajectory and follow a straight line. Hence, we need less a priori knowledge on the robot and environment, plus the same modules can be adapted to different robots and scenarios, as proposed in [21].

As commented, learning may rely on Artificial Neural Networks (ANNs), Genetic Algorithms (GAs), Reinforcement Learning (RL), Case Based Reasoning (CBR), etc. However, CBR is most likely the closest fit to the Memory-Prediction Framework. CBR is a reasoning, learning and adaptation technique to solve current problems by retrieving and adapting past experiences [35]. Problems and their solutions are stored as cases in a casebase for later use. Better solutions can be derived by adapting old ones. Similarly, new situations can be solved by recalling the most similar stored one and adapting it, if necessary. Given a problem instance, a CBR cycle consists of four steps: (i) retrieve the most similar stored case; (ii) adapt the solution if necessary; (iii) evaluate the results of the applied solution; (iv) learn from the new experience. Design decisions include: (i) how to describe the problem; (ii) which is the best casebase structure; (iii) how retrieval works and which similarity metrics will be used; (iv) how to adapt solutions; (v) how to evaluate success; and (vi) what, when and how to learn from solved problems.

CBR has additional advantages in our framework: (i) no a priori set of rules is required; (ii) no major off-line training is needed; (iii) specific cases can be added as needed; and, most importantly, (iv) the casebase can be analyzed to evaluate what the system is learning and why. Analysis of the casebase may lead to understanding what kind of concept each module is acquiring through experience.

3. Implementation

CBR has been widely used in many experience learning frameworks and also in robotics [36,37]. However, researchers have focused on deliberative layers.
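The four-step CBR cycle listed in Section 2.2 (retrieve, adapt, evaluate, learn) can be sketched in a few lines of Python. This is only an illustrative skeleton, not the authors' implementation: the `Case` class, the `cbr_step` function, the `max_dist` and `min_score` parameters and the plain normalized Manhattan distance are all assumptions introduced here, and the `adapt` and `evaluate` callables stand in for whatever adaptation and efficiency functions a given behavior defines.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class Case:
    problem: Vector   # input instance (normalized features); hypothetical layout
    solution: Vector  # stored response, e.g. a motion vector


def manhattan(a: Vector, b: Vector) -> float:
    """Normalized city-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cbr_step(casebase: List[Case], problem: Vector,
             adapt: Callable[[Case, Vector, float], Vector],
             evaluate: Callable[[Vector, Vector], float],
             max_dist: float = 0.1, min_score: float = 0.5) -> Vector:
    # (i) Retrieve the most similar stored case.
    best = min(casebase, key=lambda c: manhattan(c.problem, problem))
    dist = manhattan(best.problem, problem)
    # (ii) Adapt only when the retrieved case is not similar enough.
    if dist <= max_dist:
        return best.solution
    solution = adapt(best, problem, dist)
    # (iii) Evaluate the adapted solution.
    score = evaluate(problem, solution)
    # (iv) Learn: store the new experience if it is good enough.
    if score > min_score:
        casebase.append(Case(tuple(problem), tuple(solution)))
    return solution
```

Both thresholds are placeholders; in practice they would be tuned per behavior.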
In [20], we proposed a CBR based reactive navigation algorithm for legged robots. We guided the robot using a joystick and coupled visual data (perception) and motion commands (action). Our casebase absorbed all sensor and motor errors, because human operators tend to compensate for these errors intuitively. The main drawback of the approach in [20] was its limited scalability. Consequently, learnt behaviors needed to be very simple and case instances were coarsely discretized. In order to cope with scalability issues, we have divided the CBR retrieval stage into two sub-stages: a fast scene recognition


Fig. 3. CBR cycle.

stage based on object indexing and the traditional CBR similarity based search stage.

3.1. Case definition

CBR case definition requires: (i) an input instance, including the knowledge necessary for behavior operation; (ii) a solution that defines the response to the input situation (either a concept or a motor action); and (iii) some measure of efficiency, which reflects how well the behavior fulfills its goal. Robotic behavior definition is a grounded problem, so it is hard to provide general solutions. In this sense, in order to define a set of behaviors, we need to know which problem we are trying to solve. Furthermore, each behavior may require a different input instance, provide a different output and be measured by different efficiency metrics.

Behavior definition is usually made by an expert in the domain. The main advantage of our approach with respect to other expert-based systems is that we translate qualitative, intuitive knowledge into behaviors via learning. The expert only needs to decide which behavior to implement (e.g. move towards the goal, avoid a robot, etc.), which features and outputs are relevant to achieve that goal and how to measure efficiency. Then, they just need to guide the robot using their preferred input interface to accomplish the target behavior. Neither explicit analytical rules nor constraint lists are required to create new modules as needed. Supervisors do not need to understand the reasons behind their decisions: the robot learns what humans would do in its place.

In order to show how cases would be defined, we present a demonstrator using Aibo ERS-7 legged robots in a RoboCup environment (Fig. 3). In this environment, the robot's main source of information is vision and the number of potential objects within the field of view is restricted to ball, other robots, beacons, goals and field lines.
The field of view of these Aibos is reduced, so only a limited number of these elements can be simultaneously present in the scene. The output of any behavior is limited to the allowed set of movements for an Aibo: forwards/backwards, right/left steering and strafe, plus head motion. Additionally, predesigned motion sequences, e.g. kick the ball, can be added to the outputs for simplicity. Although the domain is limited, the combination of legged agents, vision and multirobot systems provides an adequate level of complexity to demonstrate the potential of our approach.

In the chosen scenario, the input information is collected from the camera in the robot head. Given the camera field of view, a scene may contain a pink ball, a goal, and no more than 3 opponent/friend robots. All these elements might be observed only partially. The precise number of elements in a given case instance depends on the behavior definition and goals, e.g. an “intercept opponent with ball” behavior would not need the “goal” information. In our system, we cannot use a whole image frame as input instance, because the casebase dimension would be proportional

Fig. 4. Case definition: Input instance elements and output response.

to pixel resolution. Visual raw data volume could be reduced using Principal Component Analysis (PCA), histograms, or Color Moments. Still, behavior performance could be negatively affected by non-relevant data, because variations in that data would increase the number of cases. Also, that approach would not be coherent with the Memory-Prediction Framework, where input tends to be related to functionality. Qualitative techniques are a better option for active vision applications [38]. In these approaches, the problem instance only includes relevant features for the specific problem domain.

In our environment, we work with descriptors of objects in the scene. In a Robosoccer environment, these objects present different, flat colors, so they can be easily segmented. Their relative position with respect to the robot is represented by their centroid and area, plus a boolean clipping flag (i.e. whether the object is completely within the field of view or not). All features are normalized with respect to their maximum expected values. Our case instance also includes the normalized PAN and NOD angles of the Aibo head, because the relative positions of objects in the frame depend on them (see Fig. 4). This approach also allows us to work with qualitative features, meaning that any behavior could be reused in another robot as long as the same set of features is extracted from its sensors. Adaptation to the new robot structure would eventually be achieved via learning.

Input features in the case instance can be discretized and indexed. This process decreases the number of stored cases and also turns numeric data into conceptual information. Discretization intervals need to be set by an expert in the problem domain, according to their conceptual interpretation. However, no precision is needed, e.g. a person may decide whether a ball is far or close to an Aibo depending on the dimensions of the field.
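As a rough illustration of how an object descriptor might be normalized and discretized into interval classes, consider the sketch below. Everything in it is an assumption made for the example: the frame size, the function names, the eight equally spaced centroid classes (the initial choice reported in Section 4.1) and the quadratically growing area edges (one hypothetical reading of the area discretization described in Section 4.1). The intervals actually used by the authors are the ones shown in their figures, not these.

```python
from bisect import bisect_right

# Hypothetical frame size in pixels, used only to normalize the example values.
FRAME_W, FRAME_H = 208, 160

# Interior boundaries of 8 equally spaced classes on [0, 1] (assumption).
CENTROID_EDGES = [k / 8 for k in range(1, 8)]
# Interior boundaries of 8 classes whose width grows quadratically (assumption).
AREA_EDGES = [(k / 8) ** 2 for k in range(1, 8)]


def normalize(value, max_value):
    """Scale a raw feature to [0, 1] using its maximum expected value."""
    return min(max(value / max_value, 0.0), 1.0)


def discretize(value, edges):
    """Return the index of the interval that contains a normalized value."""
    return bisect_right(edges, value)


def encode_object(cx, cy, area, clipped):
    """Turn one segmented object into a tuple of class indices plus clipping flag."""
    return (
        discretize(normalize(cx, FRAME_W), CENTROID_EDGES),
        discretize(normalize(cy, FRAME_H), CENTROID_EDGES),
        discretize(normalize(area, FRAME_W * FRAME_H), AREA_EDGES),
        bool(clipped),
    )
```

For instance, `encode_object(104, 40, 3328, False)` maps a fully visible object centered horizontally in the upper part of the frame to the class tuple `(4, 2, 2, False)` under these assumed intervals.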
These descriptions allow us to understand the reasoning process of the robot by analyzing the acquired casebase. They also allow development of more complex concepts to describe the input instance with respect to the behavior goal, e.g., “minimum area” and “high Y-centroid” represent a “far object”. Discretization depends on goals and behavior design. In future implementations it would be desirable to arrange intervals without supervision, via statistical analysis of input data obtained from learning.

The simplest case output for an Aibo robot behavior is a single motion vector, expressed by its rotation, translation and strafe


components, which define the robot movement until a new CBR query arrives. Additionally, a behavior identifier is propagated upwards and weighting factors may be propagated downwards to activate/deactivate or modulate lower level behaviors if necessary. Finally, the efficiency function of each behavior needs to be defined depending on its goal. Again, this function is usually defined by human experts. In our test scenario, we have added navigation efficiency factors to every behavior, mostly related to properties of analytical navigation functions, like smoothness or safety. This decision allows us to favor cases that produce better motion commands for the robot when possible.

3.2. Similarity and retrieval

In our previous works on CBR learning for robots [20] we found that higher complexity led to scalability problems. Hence, our modules were forced to deal with simplified, coarsely discretized input instances in order to keep response times bounded. Retrieval time in a flat casebase is O(cn), where c is the number of components of a case, and n the number of cases in the casebase. In this work we propose a new 2-stage retrieval procedure to cope with this issue. Our method simply uses a first stage to reduce the set of candidates for the second stage, thus improving the final retrieval time. To achieve this, cases are indexed with a binary vector indicating whether each element relevant for the behavior is present or not in each case. When a new situation arrives, binary indexes are checked first to compare it only to the best candidates in the casebase, i.e. the ones presenting the most input elements in common. We do not define a search tree, so no cases are discarded a priori. We simply search among the best potential matches. Improvement is proportional to the number of cases in the subset after stage one. This process is usually advisable when the casebase size is large or when fine instance feature discretization is necessary.
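The two-stage idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: cases are plain dictionaries with hypothetical `presence` and `features` keys, stage one ranks cases by a Jaccard ratio over their binary presence vectors (the measure given in Eq. (1)), and stage two applies a normalized Manhattan distance to a shortlist whose size (`shortlist`) is an arbitrary parameter introduced here.

```python
def jaccard(a, b):
    """Share of objects present in both binary vectors among those present in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # two empty scenes count as identical


def manhattan(a, b):
    """Normalized city-block distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def retrieve(casebase, presence, features, shortlist=10):
    """Two-stage retrieval: object-presence prefilter, then feature distance."""
    # Stage 1: rank every case by object-presence overlap (cheap, binary data only).
    ranked = sorted(casebase,
                    key=lambda c: jaccard(c["presence"], presence),
                    reverse=True)
    candidates = ranked[:shortlist]
    # Stage 2: full similarity search restricted to the shortlisted candidates.
    return min(candidates, key=lambda c: manhattan(c["features"], features))
```

Note that stage one here still scans all binary indexes; the saving comes from running the more expensive feature comparison only on the shortlist. Grouping cases by index string in a hash table would be one way to avoid even that scan.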
Each retrieval stage may use a different similarity measure. Traditional similarity functions quantify the similarity of two cases by computing the distance between each of its components in a multidimensional space. Different component distances can be weighted depending on their influence on case representation. Weighted distances are then aggregated using different functions to obtain a final measure. Selection of a distance function is problem dependent. Euclidean distance is usually adequate when stimulus dimensions are perceptually integral (such as brightness and saturation of a color), whereas city-block distances are appropriate when stimulus dimensions are perceptually separable (such as color and shape of an object) [39]. In our proposal, dimensions are perceptually separable, so we use a Manhattan normalized distance. However, a previous prefiltering stage (retrieval by objects in scene) is used to reduce potential candidates for recovery. In this stage we use a feature based distance by building a binary index to indicate whether potential objects in the scene are present (1) or not (0) (Eq. (1)). Every case in our casebase has an associated index string to state which objects exist in its input instance. When a situation is presented, its corresponding index string is generated and compared to the cases’ strings using a Jaccard distance.

$$J(p, c_i) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}} \qquad (1)$$

where p is the situation (input instance) to solve; ci is a case of the CBR base; M11 is the number of objects present both in p and ci; M10, the number of objects present in p, but not in ci; and M01, the number of objects present in ci, but not in p. A list with the most similar strings is generated. Then, the second, Manhattan based retrieval stage operates on the resulting subset of cases. This process improves the speed of the overall retrieval

Fig. 5. Two stage retrieval operation at the CBR casebase.

operation further, making it fit even for the lowest reactive behaviors (Fig. 5).

3.3. Learning and case efficiency

Our system uses both supervised and unsupervised learning (Fig. 6). First, a learning by observation stage is used to seed the casebase with an initial set of cases, as we proposed in [20]. In this stage, a person guides the robot using a conventional joystick to achieve the desired behavior from their own point of view. Humans intuitively cope with kinematics and dynamics related to the robot, so the casebase implicitly absorbs this knowledge. The intrinsics of the behavior are also automatically incorporated into the robot knowledge without the need to fully understand their rules and parameters. For example, a supervisor guiding a limping legged robot would naturally compensate for any associated route deviation to achieve a straight trajectory, so the robot would learn that it has to steer slightly in a given direction to reach a goal in front of it. In this sense, it is not necessary to analytically describe the behavior, just to determine which parameters define it.

After training, the robot keeps learning from its own experience, in autonomous mode without external supervision. Given an input instance, the most similar stored case is retrieved from the casebase. If the retrieved case is not similar enough to the instance, i.e. the robot is facing a new situation, its output needs to be adapted. Adaptation consists of combining a Potential Field-generated vector, V_PF [40], and the retrieved case output, V_CBRCase (Eq. (2)). Both vectors are weighted in terms of the similarity (see Section 3.2) between the retrieved case and the input situation (w_PF and 1 − w_PF). Hence, the degree of adaptation of the retrieved solution depends on how different the problem at hand is from a previously learnt one.

$$\vec{V}_{adapt} = \vec{V}_{PF} \cdot w_{PF} + \vec{V}_{CBRCase} \cdot (1 - w_{PF}) \qquad (2)$$
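The weighted combination in Eq. (2) amounts to a linear blend of two motion vectors, as in the sketch below. The function name `blend` and the tuple representation of the vectors are hypothetical, and the source does not spell out how w_PF is computed from the similarity; the example simply takes it as a given number in [0, 1], with larger values meaning the retrieved case is less similar to the current situation.

```python
def blend(v_pf, v_case, w_pf):
    """Blend a potential-field vector with a retrieved case output, as in Eq. (2).

    v_pf, v_case: motion vectors of equal length (e.g. rotation, translation, strafe).
    w_pf: weight of the potential-field vector, assumed to lie in [0, 1].
    """
    if not 0.0 <= w_pf <= 1.0:
        raise ValueError("w_pf must lie in [0, 1]")
    return tuple(w_pf * p + (1.0 - w_pf) * c for p, c in zip(v_pf, v_case))
```

With `w_pf = 0` the retrieved case output is used unchanged, and with `w_pf = 1` the potential-field vector takes over completely; for example, `blend((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.25)` gives `(0.25, 0.75, 0.0)`.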

Adapted cases in our system are evaluated in terms of efficiency to decide whether they should be added to the casebase or not. We use a utility function, η, similar to the one we proposed in [32] for shared control navigation (Eq. (3)).

$$\eta = \frac{F_{sm} + F_{sf} + F_{dir}}{3} \qquad (3)$$
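Eq. (3) is a plain average of three factors, so the storage decision for an adapted case reduces to a comparison against a threshold. The sketch below assumes each factor is already computed and lies in [0, 1]; the function names and the threshold value 0.6 are hypothetical, since the source only says that the threshold is fixed heuristically.

```python
def efficiency(f_sm, f_sf, f_dir):
    """Average of smoothness, safety and directness, as in Eq. (3)."""
    return (f_sm + f_sf + f_dir) / 3.0


def should_store(eta, threshold=0.6):
    """Keep an adapted case only if its efficiency exceeds the threshold.

    The default threshold is a placeholder, not a value from the paper.
    """
    return eta > threshold
```

Because the three factors are weighted equally, a case with one poor factor can still be stored if the other two are high; a behavior that must never trade safety for smoothness would need a different aggregation.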

η relies on classic properties of Navigation Functions, namely smoothness (F_sm), directness (F_dir) and safety (F_sf). Smoothness penalizes sharp direction changes. Directness favors a direct approach to targets. Safety penalizes trajectories that bring the robot closer to obstacles. The original η has been adapted to an Aibo robot, which relies on visual information rather than on the range sensors that provided information for the robot used in [32]


Fig. 6. Proposed learning stages in our framework.

(Eq. (4)):

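$$\eta = \frac{1}{3} \cdot e^{-2.5 \cdot |dir|} + \frac{1}{3} \cdot \left(1 - e^{-0.69 \cdot (1 - size_{obst}) \cdot |\theta_{rob-obst}|}\right) + \frac{1}{3} \cdot e^{-0.05 \cdot \frac{1 - size_{target}}{size_{target} \cdot (1 - |\theta_{rob-target}|)}} \qquad (4)$$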

Here, dir is the angle between the output motion vector and the previous heading; size_obst and θ_rob−obst are the area and angle, projected on the field of view, of the nearest obstacle in the robot trajectory; and size_target and θ_rob−target are the area and angle of the target with respect to the robot. If η is larger than a heuristically fixed threshold, the adapted case is added to the casebase for future use. In situations where adapted cases do not reach enough efficiency to be added to the casebase, i.e. the observed emergent behavior is not adequate, further supervised learning can be applied at any time by using the joystick to guide the robot through that specific situation.

Finally, we can also manually add cases to or remove cases from the casebase to cope with specific situations, like local traps and instance errors. For example, any case with a null motion output should be removed from a fully reactive behavior because the robot would be stuck until the input instance changes. These situations are externally detected because the robot starts to oscillate or stops in specific positions during operation. Since our cases include mostly qualitative information, it is usually simple to detect the problem and fix it manually.

4. Experiments and results

In order to test our method, we will design a ball seeking behavior for RoboCup. Unlike in our previous work [20], in this case there are other players in the field. Also, other players can be mates or rivals. Hence, the proposed behavior has different outputs depending on the relative positions of the other robots, e.g. to circle an obstacle, to block a rival, to correct our heading towards open field space when the ball is finally captured, etc.
Consequently: (i) the number of maneuvers grows; (ii) scalability becomes a concern; (iii) learning by observation is not enough, as it is hard to predict all potential situations; and (iv) analytical implementation of the behavior becomes harder because there are several sources of uncertainty. Nevertheless, this behavior can still be considered a reactive one. Hence, this example illustrates the importance of CBR at low architecture levels.

4.1. Case structure

Our input instance includes the ball and player elements. The Aibo field of view allows us a maximum of 3 robots for acceptable occlusions. Goal elements are not included, because they are not relevant for a ball seeking reactive behavior. The instance includes the PAN and NOD angles of the robot head, as explained in Section 3.1. Rather than including head motions in the case output, we have simply developed a tracking function to automatically focus the on-board camera on objects which are significant to the behavior (active vision). Hence, our case output simply includes the robot motion vector.

Intervals of discretization were chosen after analyzing data captured during training. Initially we had considered eight equally spaced intervals for each case input component. However, case distribution during testing proved that most cases included intervals in front of the robot, so we heuristically changed to the discretization intervals in Fig. 7 to achieve a better case distribution. The object area was also heuristically discretized into eight classes, where the interval size grew quadratically, because area is defined over two dimensions (Fig. 8). After discretization, we reduced our casebase from 25,000 to barely 3000 cases. Categorized cases with the same input and different outputs were grouped by a majority vote to share the most frequent output.

Fig. 7. Discretization classes for the PAN component.

4.2. Learning

Our learning process started with a training stage (learning by observation), where people manually guided the robot using a joystick through several trajectories to cope with the different elements in the field. Trainers did not use feedback from the robot camera: they relied on their own vision and intuition to choose the trajectories they preferred. During this stage, 4 cases per second were acquired, which is fast enough considering the speed of the robot. Fig. 9 shows a set of different situations that we used to train this behavior. Trained trajectories followed no analytical rule. They were simply paths that a person used to reach the ball from the different locations in Fig.
9 while avoiding obstacles in the way. After training, the robot kept learning from its own experience to cope with situations not included in its casebase. The robot performance in autonomous mode in our tests was excellent for all trained situations, and also for untrained situations that presented some similarity with trained ones.

Fig. 10(a) shows the response to a trained situation, where the emergent path is almost equal to the trained one. In this test, the distance between input situations and the corresponding retrieved cases was lower than 0.01 on a [0, 1] scale for 82% of the retrieved cases. It should be noted that we did not train full trajectories, but isolated responses to specific local situations, so two paths are similar whenever the sequences of events presented to the robot are similar as well. We can also observe that, rather than moving straight ahead towards the ball, the robot was trained to approach the ball from the right to protect it from its rival (black robot) and head into free space.

The situation in Fig. 10(b) was not trained, but, at local level, it could be solved as a combination of the set of trained scenarios (Fig. 9). In this case, the distance between input situations and the corresponding retrieved cases was lower than 0.01 for 60% of retrieved cases and lower than 0.03 for 87% of retrieved cases. Hence, the robot adopts a similar solution (to progressively adapt its heading so that it faces open space when it reaches the ball).

When we placed the robot in situations very different from trained ones, sometimes its initial performance was poor. Results improved via learn-by-own-experience, as expected. Fig. 11 shows one of these cases. In this situation, the ball is behind a rival. We had purposefully avoided training situations like this one to test adaptation later, so the closest case in the casebase was to try to reach the ball in a straight line. This initial trajectory clearly led to collisions. However, after acquiring new knowledge using the proposed adaptation function, the robot learnt to avoid this rival on its own. From that point on, it could reuse the acquired solutions to solve the situation in Fig. 10(a).

Fig. 8. Discretization classes for the Area component.

Fig. 9. Set of trained situations.

4.3. Scalability

In previous works focused on designing isolated, purely reactive behaviors [20,41], scalability was not a pressing issue, since the number of case components was limited. In these works, casebases typically had a few hundred cases. The hierarchical architecture needs to include hybrid and deliberative levels as well, and multirobot behaviors are more complex, so scalability becomes a major concern. In some tests in this work our casebases grew up to thousands of cases.

Discretization is a typical approach to cope with scalability issues in CBR. It is reported to improve the retrieval process. Besides, it provides qualitative meaning to raw data and it allows different distance criteria to be applied for continuous attributes [42]. However, if discretization is too coarse, meaningful information could be missed during the learning process. On the other hand, if discretization is too fine, there may still be too many cases in the casebase. To cope with this issue, we have implemented a 2-stage retrieval method.

In order to test our 2-stage retrieval method, we have progressively increased the number of cases in a CBR module by adding more meaningful objects to the scenario. Even after discretization, the number of cases grows in a non-linear way with respect to the number of objects and their relative positions in the robot field of view. For example, if we only train situations with the ball and one rival, a coarse discretization returns approximately 500 cases. A second rival in the field would raise the number of cases over 1600. However, if we include one rival and one mate in the scenario, the number of cases grows over 3500, even though there are still just three elements, because there are more potential actions than when both robots were rivals. Fig.
12 shows the number of cases acquired when the number of elements in the scene is increased, both for coarse and fine discretization, plus the evolution of retrieval times with respect to the volume of the casebase, both for conventional (1-stage) and proposed (2-stage) retrieval. It can be observed that retrieval time grows with the number of cases for both retrieval methods. However, the slope is much gentler for the proposed 2-stage retrieval method.

Discretization has an obvious impact on the casebase size and, consequently, on retrieval time. Fig. 12(a) and (b) show the same test, but in the second case we doubled the number of discretization levels. As expected, fine discretization makes the number of cases in the casebase grow even further. In the most complex trained situation the response time goes up to 0.07 seconds. Still, we can observe that the evolution of response times when we use 2-stage retrieval is similar to that for coarse discretization: response times in this case are under 0.03 seconds, lower than if we used 1-stage retrieval in the coarse discretization test.

5. Conclusions and future work

In this work we have presented an approach to design modular behaviors for a robotic architecture inspired by the Memory Prediction Framework. In our approach, all modules are homogeneously designed and we use learning at every level of the architecture. Robots can learn by demonstration (supervised mode) and also from their own experience (on the fly). CBR is conceptually similar to the Memory Prediction Framework, so all our behaviors are acquired via CBR based learning. When scenarios become more complex and dynamic, the number of relevant elements for a behavior grows and casebases may present scalability problems. The key idea of our proposal is to cope with more complex concepts at higher levels by combining and modulating the responses of lower modules. Low level modules are simpler and cope with limited information. Hence, the number of cases in these modules remains constrained. Higher level modules are fed with the outputs of lower level modules, which are more conceptual, and also limited to the number of modules in the immediately lower level. Still, even in lower level modules, information in the casebase may lose integrity due to duplicated instances and retrieval oscillations. These cases can be detected and manually removed if necessary. Similarly, situations where the robot performs poorly can be trained under supervision at any time.

Fig. 10. Performance example of designed behavior for both trained and untrained situations.

Fig. 11. Behavior improvement by learn-by-own-experience.
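The learn-by-own-experience stage stores an adapted case only when its efficiency η (Eq. (4)) exceeds a heuristically fixed threshold. The short Python sketch below illustrates that decision. It is only an illustration under stated assumptions: sizes are taken to lie in (0, 1] and absolute angles to be normalized to [0, 1], the third exponent is assumed to use the ratio (1 − size_target)/size_target, and the threshold value 0.6 is arbitrary. The names `efficiency` and `maybe_store` are hypothetical and do not come from the original implementation.

```python
import math


def efficiency(dir_angle, size_obst, theta_obst, size_target, theta_target):
    """Efficiency of an adapted case, following one reading of Eq. (4).

    Assumed (not stated in the text): sizes lie in (0, 1] and angles are
    normalized so that their absolute value lies in [0, 1].
    """
    # Smoothness term: decays as the motion departs from the previous heading.
    smooth = math.exp(-2.5 * abs(dir_angle))
    # Obstacle term: grows when the nearest obstacle is small or off to the side.
    safety = 1.0 - math.exp(-0.69 * (1.0 - size_obst) * abs(theta_obst))
    # Target term: assumed ratio (1 - size_target) / size_target in the exponent.
    ratio = (1.0 - size_target) / size_target
    goal = math.exp(-0.05 * ratio * (1.0 - abs(theta_target)))
    # The three terms are weighted equally (1/3 each).
    return (smooth + safety + goal) / 3.0


def maybe_store(casebase, case, eta, threshold=0.6):
    """Append the adapted case only if eta exceeds the illustrative threshold."""
    if eta > threshold:
        casebase.append(case)
        return True
    return False
```

A call such as `maybe_store(casebase, adapted_case, efficiency(0.1, 0.2, 0.5, 0.4, 0.1))` would then keep or discard an adapted case. With equal weights, each term contributes at most 1/3, so η stays within [0, 1] under the normalization assumed here.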

Fig. 12. Comparison of retrieval times using a two stage retrieval method.
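To illustrate why a two-stage retrieval can scale better than a flat search, the following Python sketch first selects a bucket of cases through a coarse discretized key (stage 1) and then runs a nearest-neighbour search only inside that bucket (stage 2), falling back to the whole casebase when the bucket is empty. This is a generic indexing scheme written under stated assumptions, not the exact retrieval algorithm evaluated in Fig. 12: the class name `TwoStageCasebase`, the uniform binning, the Euclidean distance and the number of bins are all hypothetical choices.

```python
from collections import defaultdict
import math


class TwoStageCasebase:
    """Toy casebase: inputs are tuples of floats in [0, 1], outputs are arbitrary."""

    def __init__(self, bins=4):
        self.bins = bins                   # coarse classes per component (assumed)
        self.buckets = defaultdict(list)   # coarse key -> list of (inputs, output)
        self.cases = []                    # flat list, used as a fallback

    def _key(self, inputs):
        # Stage-1 index: uniform discretization of every input component.
        return tuple(min(int(x * self.bins), self.bins - 1) for x in inputs)

    def add(self, inputs, output):
        case = (tuple(inputs), output)
        self.cases.append(case)
        self.buckets[self._key(inputs)].append(case)

    def retrieve(self, query):
        # Stage 1: restrict the search to the bucket sharing the coarse key;
        # if that bucket is empty, search the whole casebase instead.
        candidates = self.buckets.get(self._key(query)) or self.cases
        if not candidates:
            raise LookupError("empty casebase")
        # Stage 2: nearest neighbour (Euclidean distance) among the candidates.
        return min(candidates, key=lambda c: math.dist(c[0], query))
```

In this sketch the cost of stage 2 depends on the bucket size rather than on the total number of cases, which is the kind of effect one would expect behind a gentler growth of retrieval time. The trade-off is that the case returned from a bucket is not guaranteed to be the globally nearest one when the query lies close to a bin boundary.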


Retrieval time increases significantly with the number of cases stored in the casebase. This problem is particularly important at the reactive level. To reduce the number of potential cases in the casebases, we have proposed a qualitative representation of case components based on an indexation/discretization process. It allows us to describe a behavior using abstract concepts instead of raw data. Qualitative information is also useful for later interpretation of acquired knowledge. Besides, it also allows us to progressively build more complex concepts by aggregating simpler ones, as proposed by modern theories on human intelligence structure and organization. Functionally, we have created a new mixed hierarchical-flat casebase structure and a 2-stage case retrieval method to disambiguate knowledge and to reduce response times.

The proposed approach has been tested using a vision based legged robot (Sony Aibo ERS-7) operating in a RoboCup environment. We have trained a reactive ball seeking behavior in situations where other robots, both mates and rivals, may appear within the field of view. The number of robots involved in the experiment allows us to check how the system fares in terms of scalability. Our experiments proved that: (i) the robot can acquire complex behaviors under supervised training and later improve its knowledge without supervision; (ii) learning at reactive level allows us to acquire knowledge that might be hard to model analytically, but easy to demonstrate in a practical situation; (iii) supervised training at low level implicitly acquires knowledge on robot kinematics and dynamics, as well as on systematic actuator and sensor errors; (iv) the proposed casebase retrieval process is suitable to cope with scalability problems.

Future work will focus on nesting different behaviors to obtain a higher level emergent complex one. We are interested in studying not only the full functionality of the system, but also the emergence of progressively more complex concepts and how information flows upwards and downwards in the resulting architecture. We believe that analysis of casebases at higher levels after training will allow us to understand what kind of concepts humans intuitively produce to cope with complex situations. At functional level, we plan to improve the case adaptation algorithms for fast stabilization of a behavior set. We also want to adjust system parameters automatically depending on acquired knowledge, instead of using a human expert to do so. Finally, we plan to add a casebase maintenance algorithm to automatically remove unnecessary knowledge and keep an adequate number of cases.

Acknowledgments

This work has been partially supported by the Spanish Ministerio de Educación y Ciencia (MEC), Project n. TEC2014-56256 and by the Junta de Andalucía project No. TIC-7839.

References

[1] H. de Garis, C. Shuo, B. Goertzel, L. Ruiting, A world survey of artificial brain projects, Part I: Large-scale brain simulations, Neurocomputing 74 (1–3) (2010) 3–29.
[2] B.M. Lake, R. Salakhutdinov, J.B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science 350 (6266) (2015) 1332–1338.
[3] I. Arel, D.C. Rose, T.P. Karnowski, Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier], IEEE Comput. Intel. Mag. 5 (4) (2010) 13–18, doi:10.1109/MCI.2010.938364.
[4] R. Ding, J. Yu, Q. Yang, M. Tan, J. Zhang, CPG-based behavior design and implementation for a biomimetic amphibious robot, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2011, pp. 209–214.
[5] H. Herr, A. Wilkenfeld, User–adaptive control of a magnetorheological prosthetic knee, Ind. Robot: Int. J. 30 (1) (2003) 42–55.
[6] C.G. Atkeson, S. Schaal, Memory-based neural networks for robot learning, Neurocomputing 9 (3) (1995) 243–269.

[7] N.F. Lepora, J.C. Sullivan, B. Mitchinson, M. Pearson, K. Gurney, T.J. Prescott, Brain-inspired Bayesian perception for biomimetic robot touch, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2012, pp. 5111–5116.
[8] N.G. Bizdoaca, H. Hamdan, D. Coman, M. Hamdan, K. Al Mutib, Biomimetic control architecture for robotic cooperative tasks, Artif. Life Robot. 15 (4) (2010) 403–407.
[9] M.O. Franz, H.A. Mallot, Biomimetic robot navigation, Robotics Auton. Syst. 30 (1) (2000) 133–153.
[10] R. Brooks, Intelligence without representation, Artif. Intel. 47 (1991) 139–159.
[11] M. Mataric, Integration of representation into goal-driven behavior-based robots, IEEE Trans. Robot. Autom. 8 (3) (1992) 304–312, doi:10.1109/70.143349.
[12] M.N. Nicolescu, M.J. Matarić, A hierarchical architecture for behavior-based robots, in: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, in: AAMAS ’02, ACM, New York, NY, USA, 2002, pp. 227–233, doi:10.1145/544741.544798.
[13] J. Hawkins, S. Blakeslee, On Intelligence, McMillan, 2007.
[14] C. Farr, Palm founders are back with grok, a neuroscience-inspired big data engine, (http://venturebeat.com/2012/12/18/numenta-grok). Accessed: 2016-09-30.
[15] F.L. Greitzer, R.E. Hohimer, Modeling human behavior to anticipate insider attacks, J. Strategic Secur. 4 (2) (2011) 25–48.
[16] M. Bundzel, Object identification in dynamic images based on the memory-prediction theory of brain function, J. Intell. Learn. Syst. Appl. 2 (4) (2010) 212–220.
[17] X. Mai, X. Zhang, Y. Jin, Y. Yang, J. Zhang, Simple perception-action strategy based on hierarchical temporal memory, in: Proceedings of the 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2013, pp. 1759–1764.
[18] M. Otahal, Architecture of autonomous agent based on cortical learning algorithms, Master’s thesis, Department of Cybernetics, Czech Technical University in Prague, Czech Republic, 2014.
[19] A. Poncela, C. Urdiales, F. Sandoval, A cbr approach to behaviour-based navigation for an autonomous mobile robot, in: Proceedings of the 2007 IEEE International Conference on Robotics and Automation, 2007, pp. 3682–3686.
[20] J.M. Peula, C. Urdiales, I. Herrero, I. Sánchez-Tato, F. Sandoval, Pure reactive behavior learning using Case Based Reasoning for a vision based 4-legged robot, Robot. Auton. Syst. 57 (6–7) (2009) 688–699.
[21] J.M. Peula, C. Urdiales, I. Herrero, M. Fernandez-Carmona, F. Sandoval, Case-based reasoning emulation of persons for wheelchair navigation, Artif. Intell. Med. 56 (2) (2012) 109–121, doi:10.1016/j.artmed.2012.08.007.
[22] E. Aguirre, A. González, Fuzzy behaviors for mobile robot navigation: design, coordination and fusion, Int. J. Approx. Reason. 25 (3) (2000) 255–289. http://dx.doi.org/10.1016/S0888-613X(00)00056-6.
[23] K. Low, W. Leow, M. Ang Jr, A hybrid mobile robot architecture with integrated planning and control, in: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, in: AAMAS ’02, ACM, New York, NY, USA, 2002, pp. 219–226, doi:10.1145/544741.544797.
[24] K. Noda, H. Arie, Y. Suga, T. Ogata, Multimodal integration learning of robot behavior using deep neural networks, Robotics Auton. Syst. 62 (6) (2014) 721–736. http://dx.doi.org/10.1016/j.robot.2014.03.003.
[25] D. Gu, H. Hu, Integration of coordination architecture and behavior fuzzy learning in quadruped walking robots, IEEE Trans. Syst. Man Cybern., Part C (Appl. Rev.) 37 (4) (2007) 670–681, doi:10.1109/TSMCC.2007.897491.
[26] S. Chernova, N. DePalma, E. Morant, C. Breazeal, Crowdsourcing human-robot interaction: application from virtual to physical worlds, in: Proceedings of the 2011 RO-MAN, IEEE, 2011, pp. 21–26.
[27] B. Rekabdar, B. Shadgar, A. Osareh, Learning teamwork behaviors approach: Learning by observation meets case-based planning, in: A. Ramsay, G. Agre (Eds.), Artificial Intelligence: Methodology, Systems, and Applications: 15th International Conference, AIMSA 2012, Varna, Bulgaria, September 12–15, 2012. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 195–201.
[28] Y. Liu, C. Yang, Y. Yang, F. Lin, X. Du, T. Ito, Case learning for cbr-based collision avoidance systems, Appl. Intel. 36 (2) (2012) 308–319.
[29] S. Wender, I. Watson, Combining case-based reasoning and reinforcement learning for unit navigation in real-time strategy game ai, in: Proceedings of the International Conference on Case-Based Reasoning, Springer, 2014, pp. 511–525.
[30] H. Min, Y. Lin, S. Wang, F. Wu, X. Shen, Path planning of mobile robot by mixing experience with modified artificial potential field method, Adv. Mech. Eng. 7 (12) (2015) 1–17.
[31] S.E. Fox, P. Anderson-Sprecher, Robot navigation using integrated retrieval of behaviors and routes, in: Proceedings of the FLAIRS Conference, 2006, pp. 346–351.
[32] C. Urdiales, E.J. Perez, J. Vázquez-Salceda, M. Sánchez-Marrè, F. Sandoval, A purely reactive navigation scheme for dynamic environments using Case-Based Reasoning, Auton. Robots 21 (1) (2006) 65–78.
[33] M. Wang, J.N.K. Liu, Fuzzy logic-based real-time robot navigation in unknown environment with dead ends, Robot. Auton. Syst. 56 (7) (2008) 625–643.
[34] J.C. Murray, H.R. Erwin, S. Wermter, Robotic sound-source localisation architecture using cross-correlation and recurrent neural networks, Neural Netw. 22 (2) (2009) 173–189.
[35] A. Aamodt, E. Plaza, Case-based reasoning: foundational issues, methodological variations, and system approaches, AI Commun. 7 (1) (1994) 39–59.
[36] M. Kruusmaa, Global navigation in dynamic environments using case-based reasoning, Auton. Robots 14 (1) (2003) 71–91.


[37] R. Ros, J.L. Arcos, R. Lopez de Mantaras, M. Veloso, A case-based approach for coordinated action selection in robot soccer, Artif. Intell. 173 (9–10) (2009) 1014–1039.
[38] J. Aloimonos, Purposive and qualitative active vision, in: Proceedings of the 10th International Conference on Pattern Recognition, 1990, vol. 1, 1990, pp. 346–360, doi:10.1109/ICPR.1990.118128.
[39] W.R. Garner, The processing of information and structure, vol. 20, Psychology Press, 1974, doi:10.2307/469111.
[40] O. Khatib, Real-time Obstacle Avoidance for Manipulators and Mobile Robots, in: Proceedings of the 1985 IEEE International Conference on Robotics and Automation, 2, 1986, pp. 90–98.
[41] I. Herrero, C. Urdiales, J. Peula, I. Sánchez-Tato, F. Sandoval, A guided learning strategy for vision based navigation of 4-legged robots, AI Commun. 19 (2) (2006) 127–136.
[42] H. Nuñez, M. Sánchez-Marrè, U. Cortés, J. Comas, M. Martínez, I. Rodríguez-Roda, M. Poch, A comparative study on the use of similarity measures in case-based reasoning to improve the classification of environmental system situations, Environ. Model. Softw. 19 (9) (2004) 809–819, doi:10.1016/j.envsoft.2003.03.003.

Ignacio Herrero Reder received the M.Sc. degree in Telecommunication Engineering and the Ph.D. degree from the University of Málaga, Spain. In 1996, he earned a ’Training of University Professors’ grant (FPU) from the Spanish Ministerio de Educación y Ciencia. In 1999, he joined the Department of Tecnología Electrónica as an Assistant Professor and, from 2001, as a Lecturer. His research interests include artificial intelligence in autonomous systems, primarily behavior models and learning from experience.

Jose Manuel Peula Palacios was born in Spain and is currently a researcher and developer in Electronic Technology at the Technical University of Malaga (UMA), Spain. He received the Telecommunication Engineering degree and the Ph.D. degree in Telecommunication from the UMA. His research focuses mainly on robotics, augmented reality, computer vision and artificial intelligence.

Francisco Sandoval received the Telecommunication Engineering degree and the Ph.D. degree from the Technical University of Madrid, Spain, in 1972 and 1980, respectively. From 1972 to 1989 he was engaged in teaching and research in the fields of opto-electronics and integrated circuits at the Universidad Politécnica de Madrid (UPM), as an Assistant Professor and a Lecturer successively. In 1990 he joined the University of Málaga as Full Professor in the Department of Tecnología Electrónica. He is currently involved in autonomous systems and foveal vision, application of Artificial Neural Networks to Energy Management Systems, and Broadband and Multimedia Communication.

Cristina Urdiales received the M.Sc. degree in Telecommunication Engineering from the Universidad Politécnica de Madrid (UPM), Spain, and two Ph.D. degrees, from the University of Málaga (1998) and the Universidad Politécnica de Catalunya (2009). She is a Lecturer at the Department of Tecnología Electrónica (DTE) of the UMA. Her research is focused on robotics and computer vision.
