18th IFAC Conference on Technology, Culture and International Stability
Baku, Azerbaidschan, Sept 13-15, 2018

Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 51-30 (2018) 354–359

Interactive mobile robot in a dynamic environment 1

Bakhtawar Rehman ∗ Rauf Yagfarov ∗ Alexandr Klimchik ∗
∗ Center for Technologies in Robotics and Mechatronics Components, Innopolis University, Russia, (e-mail: {b.rehman, r.yagfarov, a.klimchik}@innopolis.ru)

1 This research has been supported by the Russian Ministry of Education and Science, agreement: No14.606.21.0007, ID: RFMEFI60617X0007, "Android Technics" company and Innopolis University.

Abstract: In this work we introduce a robotic system that works in a sociable environment alongside humans. The robot is able to recognize information through sound and is capable of speaking in order to respond or provide relevant information. We implemented and compared different simultaneous localization and mapping (SLAM) algorithms available in the Robot Operating System (ROS). We compared their results and showed their performance in real time in an indoor environment using lidars as the core sensing module. Thus we came up with a mobile robot that autonomously navigates in a dynamic environment and is capable of intelligent interaction and manipulation.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: SLAM, navigation, interaction, interactive control, humanoid

1. INTRODUCTION

In the world of modern robotics it is important for robots to interact, navigate and execute desired tasks intelligently. The capability of a robot to interact with a human must be user centered. Normally people interact with computers in the same way they interact with other fellow beings. Thus interactions must be designed in a natural way, sharing the same underlying models behind communication among humans. This must avoid any confusion or difficulty for a human in understanding or interpreting the system (Breazeal, 2003).

We live in a dynamic world, i.e. our environment changes in time. For example, people move around, a new piece of furniture is added to the room or an old one is removed. To operate and interact with humans and other robots in this environment, a robot should be intelligent enough to keep track of these changes. In robotics this problem is overcome by simultaneous localization and mapping (SLAM) (Leonard and Durrant-Whyte, 1991; Aulinas et al., 2008; Durrant-Whyte and Bailey, 2006). SLAM is known as a chicken-and-egg problem: the mobile robot needs to create the map and simultaneously localize itself, where a map is essential for localization and a pose estimate of the robot is essential to create the map. SLAM gives robots the capability of navigating intelligently in a dynamic environment.

Mishra and Javed (Mishra and Javed, 2018) stressed the emergence of human-robot interaction and implemented a SLAM algorithm using ROS, claiming that SLAM and navigation are the essential processes for a service robot. Rojas-Fernandez et al. (Rojas-Fernández et al., 2018) implemented and assessed three different SLAM algorithms available in ROS using a differential drive robot. Mazumdar et al. (Mazumdar et al., 2017) came up with an approach to rescue a person by autonomously navigating towards him/her using SLAM and computer vision in the framework of ROS.

The goal of the paper is to present a system which enables a robot to autonomously navigate in a dynamic environment, interact with social beings and execute desired tasks efficiently. We have assessed SLAM algorithms using lidars in an indoor environment. Assessment of these algorithms is done by visual representation of the end maps, i.e. by comparing the end maps with the real representation of the world (factors include noise, end-point readings, clarity of the map, etc.). It is worth noting that we did not use processing time or computational resources to analyze these algorithms; the assessment is solely based on the end maps acquired. Apart from SLAM, we have implemented speech recognition and sound synthesis as a part of the interaction module. We also enabled our robot to execute certain tasks in the physical world in real time. Thus our work focuses on the integration of three modules in a complex robotic system: Human-Robot Interaction, SLAM and Navigation, and Manipulation.

2. BACKGROUND INFORMATION

2.1 Interaction

Interaction is an essential part for robots operating in a sociable environment. Thus the interaction between humans and robots should be designed similar to human-human interaction. Normally people interact with each other using sound. When it comes to enabling a robot to process sound, we can break it down into two major parts: speech recognition and sound synthesis.
Speech Recognition

Speech recognition is the science of converting audio signals into text or a machine-readable format. This is done using acoustic and language models.

Sound Synthesis

A robot should be able to voice its response or give instructions too. This is done using sound synthesis, which converts an electrical signal into sound and makes it possible for machines to speak.

2.2 SLAM and Navigation

GMapping

GMapping is based on the paper by Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard (Grisettiyz et al., 2005). It uses a Rao-Blackwellized particle filter. GMapping overcomes two serious problems of the particle filter: computational complexity (a huge number of particles is required for map correction) and particle depletion (particle removal in the resampling phase). GMapping claims to solve the particle depletion problem by limiting the resampling step; this is referred to as adaptive resampling. Adaptive resampling solves the problem of requiring a high number of particles by incorporating a probability distribution, which then helps to include the present data (sensor and/or odometry) readings in the map, resulting in a more precise map and reduced error. This approach reduces the number of particles required for the posterior representation. In GMapping this step is known as selective resampling. Localization in GMapping is based on Monte Carlo localization (Dellaert et al., 1999). In GMapping each particle represents the pose of the robot at some particular time. The difference in GMapping, as proposed by Giorgio Grisetti, is a more efficient resampling technique: the particles are resampled only when required. Since the data acquired from a lidar is more precise and reliable than odometry, GMapping makes use of scan matching, which takes the odometry as an initial guess. The result acquired by scan matching is used to make a Gaussian approximation of a proposal distribution to estimate the new pose of the particle. GMapping uses a grid-based mapping approach. It is more expensive in terms of computation and memory; however, it is more detailed and it can be used without any prior information about the world / landmarks, which is not possible in the case of topological maps.
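As a small illustration of selective resampling, the sketch below (our own, in Python) resamples a particle set only when the effective sample size N_eff = 1/Σ w̃_i² falls below a threshold; the threshold ratio and the low-variance resampler are illustrative choices, not GMapping's exact implementation.

```python
import numpy as np

def selective_resample(particles, weights, threshold_ratio=0.5):
    """Resample only when the effective sample size drops below
    threshold_ratio * N, in the spirit of selective resampling."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                      # normalize weights
    n_eff = 1.0 / np.sum(w ** 2)      # effective sample size
    if n_eff >= threshold_ratio * len(w):
        return particles, w           # particle set still healthy: skip
    # low-variance (systematic) resampling
    n = len(w)
    positions = (np.arange(n) + np.random.uniform()) / n
    indexes = np.searchsorted(np.cumsum(w), positions)
    return [particles[i] for i in indexes], np.full(n, 1.0 / n)
```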
Hector SLAM

Hector SLAM was proposed by Kohlbrecher, von Stryk, Meyer and Klingauf at Technische Universität Darmstadt (Kohlbrecher et al., 2013). The motivation behind Hector SLAM is 6-DOF motion on rough terrain with limited computational resources, hence enabling motion in the roll and pitch dimensions. It does not require odometry information. Hector SLAM requires few computational resources and hence is also suitable for autonomous robots operating with limited computational power. The authors recommend using it in small-scale environments which do not require closing large loops. Like GMapping, Hector SLAM also makes good use of grid maps. Hector SLAM improves the performance of grid maps by embedding a probability distribution, hence each cell's occupancy is measured by some probability. It incorporates scan matching, which was motivated by an approach in computer vision. Scan matching is the process where each laser scan is aligned with the others and/or with an existing map. Laser scans (end points) are used to improve the map by comparing them with an existing map. This method uses Gauss-Newton optimization of rigid transformations. However, Gauss-Newton provides no guarantee against converging to a local minimum. To further improve this, Hector SLAM uses the image pyramid concept from computer vision and stores multiple maps of different resolutions. But unlike the image pyramid concept, it keeps all map resolutions in memory and updates them frequently.
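The sketch below illustrates the idea of keeping several map resolutions in memory and updating them together; the grid sizes and update rule are our own simplifications, not Hector SLAM's actual data structure.

```python
import numpy as np

class MultiResGrid:
    """Toy multi-resolution occupancy stack: every level halves the
    resolution, and all levels are kept in memory and updated together
    (names and sizes here are illustrative)."""
    def __init__(self, size=1024, levels=4):
        self.grids = [np.zeros((size >> i, size >> i)) for i in range(levels)]

    def mark_occupied(self, x, y, delta=1.0):
        # update the same world cell at every resolution level
        for level, grid in enumerate(self.grids):
            grid[y >> level, x >> level] += delta

grid_stack = MultiResGrid()
grid_stack.mark_occupied(512, 300)   # one lidar endpoint updates all levels
```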
Cartographer

Cartographer is a SLAM algorithm introduced by Hess and fellows (Hess et al., 2016). It is open source with ROS integration. While it claims to overcome the problems of both 2D and 3D SLAM in real time, Cartographer is mostly recommended for indoor environments. Cartographer uses an IMU for 3D mapping. The reason for using an IMU is to measure gravity, as the gravity direction is taken as the z-axis in 3D SLAM. Google Cartographer does not use a particle filter, as that requires more resources in terms of grid mapping. It tackles the problem of error accumulation by pose estimation. It performs a loop closure check when the scanning of each submap is finished; if a match is found against the submap, it is added as a loop closure constraint. In Cartographer, scans are iteratively matched with submaps, referred to as frames. The conversion of scan points from the scan frame into the submap frame is given by (Hess et al., 2016):

$$T_\xi p = \underbrace{\begin{pmatrix} \cos\xi_\theta & -\sin\xi_\theta \\ \sin\xi_\theta & \cos\xi_\theta \end{pmatrix}}_{R_\xi} p + \underbrace{\begin{pmatrix} \xi_x \\ \xi_y \end{pmatrix}}_{t_\xi} \qquad (1)$$
In (1) (Hess et al., 2016), ξ defines the pose and T_ξ is the transformation of scan points from the scan frame into the submap frame.
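To make (1) concrete, here is a minimal numpy sketch of the scan-point transform (the function and variable names are ours):

```python
import numpy as np

def transform_scan(points, xi):
    """Apply T_xi from (1): rotate scan points by xi_theta and
    translate by (xi_x, xi_y) into the submap frame.
    `points` is an (N, 2) array; `xi` is (xi_x, xi_y, xi_theta)."""
    x, y, theta = xi
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return points @ R.T + np.array([x, y])

scan = np.array([[1.0, 0.0], [0.0, 2.0]])
print(transform_scan(scan, (0.5, -0.2, np.pi / 2)))
```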
Cartographer represents the submaps in the form of probability grid points. Each grid point represents a pixel consisting of all the points near that grid. To insert a scan, hits and misses are computed. All nearest grid points are inserted into the hit set, whereas grid points between the scan origin and the scan point intersecting the rays are inserted into the miss set. Based on this, all grid points are updated with the probabilities p_hit and p_miss given by (Hess et al., 2016):

$$\mathrm{odds}(p) = \frac{p}{1-p} \qquad (2)$$

$$M_{new}(x) = \mathrm{clamp}\big(\mathrm{odds}^{-1}\big(\mathrm{odds}(M_{old}(x)) \cdot \mathrm{odds}(p_{hit})\big)\big) \qquad (3)$$

In (2) and (3), odds represents the probability of a grid point falling into the hit or miss set. Cartographer's scan matching is based on the Ceres scan matching algorithm. It maximizes the probabilities to find a more accurate scan pose in the submap.
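Equations (2) and (3) translate directly into code; the sketch below is our own, and the clamping bounds are placeholders (in Cartographer they are configurable parameters):

```python
def odds(p):
    # Eq. (2): odds(p) = p / (1 - p)
    return p / (1.0 - p)

def odds_inv(o):
    # inverse of (2): recover a probability from odds
    return o / (1.0 + o)

def clamp(p, p_min=0.12, p_max=0.97):
    # Cartographer bounds cell probabilities; these values are placeholders
    return max(p_min, min(p, p_max))

def update_cell(m_old, p_hit):
    # Eq. (3): M_new(x) = clamp(odds^-1(odds(M_old(x)) * odds(p_hit)))
    return clamp(odds_inv(odds(m_old) * odds(p_hit)))

print(update_cell(0.5, 0.55))  # a hit nudges the cell above 0.5
```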
Octomap

OctoMap is a 3D SLAM algorithm proposed by Armin Hornung and fellows in 2013, released under the New BSD license (Hornung et al., 2013). OctoMap possesses the following properties which make it efficient:

• It is capable of differentiating between occupied, free and unknown space.
• It is computationally efficient enough to work in real time.
• It uses a probabilistic approach to cope with different sensor readings (sensor fusion).

OctoMap is based on voxels, where a voxel is a cubic volume occupied in a node. As the name implies, each node of an octree points to its eight children. Usually, the occupancy of each node is represented by a boolean. OctoMap stores the endpoint of a data reading (e.g. a laser beam reflected by hitting some obstacle), which is marked as occupied, and the free space is defined as the distance between the sensor position and the occupied point. Thus, it stores both free space and occupied space, hence differentiating free from unknown. OctoMap is also flexible in using sensor fusion. As different measurements of the world carry certain uncertainties, it is vital to use all readings from different sensors for a better result. For memory efficiency, OctoMap uses pruning and clamping. If all the children of a node have the same state (either occupied or free), the children can be pruned and the result saved in the parent node, as required for further traversal of the tree. To traverse the tree, OctoMap uses a concept known as clamping: the children of nodes that were pruned previously for memory efficiency can be restored in order to traverse the tree. Each voxel in OctoMap is considered to be occupied if its probability is above a given threshold and free if it is below the desired threshold. The probability threshold is an adjustable parameter in OctoMap. On shallow surfaces one laser beam can cancel the effect of another, i.e. a surface measured as occupied by one laser ray can be marked as free by another one (ray casting). OctoMap copes with this problem with a rule that once a volume is marked as occupied in a scan, the other rays of the laser cannot mark it as free during the same scan. This is given as:
$$L(n \mid z_t) = \begin{cases} l_{occ} & \text{if the beam is reflected within the volume} \\ l_{free} & \text{if the beam traversed the volume} \end{cases} \qquad (4)$$

where $l_{occ}$ and $l_{free}$ indicate occupancy and free space.
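The per-scan update of (4) can be sketched as follows; the log-odds increments and the two-pass "occupied wins" rule are our own illustrative rendering of the behaviour described above, not OctoMap's actual code:

```python
L_OCC, L_FREE = 0.85, -0.4   # log-odds increments; actual values are tunable

def integrate_scan(log_odds, beams):
    """Toy per-scan update in the spirit of Eq. (4): every cell a beam
    traverses gets l_free, the endpoint cell gets l_occ, and a cell
    already marked occupied in this scan is never downgraded to free
    (the rule for shallow surfaces). `beams` is a list of
    (traversed_cells, endpoint_cell) tuples; cells are hashable ids."""
    hit_this_scan = set()
    for _, endpoint in beams:
        log_odds[endpoint] = log_odds.get(endpoint, 0.0) + L_OCC
        hit_this_scan.add(endpoint)
    for traversed, _ in beams:
        for cell in traversed:
            if cell not in hit_this_scan:   # occupied wins within one scan
                log_odds[cell] = log_odds.get(cell, 0.0) + L_FREE
    return log_odds

grid = integrate_scan({}, [(((0, 0), (1, 0)), (2, 0)), (((0, 0),), (1, 0))])
print(grid)   # (1,0) stays occupied even though beam 1 traversed it
```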
2.3 Manipulation

To complete a task successfully, a robot should be capable of having an idea of its surroundings and should somehow be able to make changes accordingly (if required). We have decided to refer to this module as manipulation. More explanation of this module can be found in section 3.

Object Recognition - SURF

Object detection in computer vision is a technique of finding and searching for instances of a particular object in an image. SIFT stands for Scale-Invariant Feature Transform. It uses the Difference of Gaussians (DoG, an approximation of the LoG for computational efficiency), where it subsequently blurs the image for different octaves of the image in a Gaussian pyramid. Then the images are searched for local extrema, and a local extremum is considered to be an important keypoint. Once important keypoints are fetched, another check is imposed: keypoints above a certain threshold are allowed to live and the others are discarded. To make images invariant to rotations, SIFT assigns an orientation to each descriptor based on local image gradient directions, and operations are performed accordingly. The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination. SURF (Speeded Up Robust Features) is a fast approximation of the SIFT idea.

Face Recognition - Viola Jones LBPH

Face recognition is an approach of recognizing someone by processing face features in an image. To recognize a face, the first step is to detect a face in an image. The idea behind detecting faces is based on Haar-like features proposed by Viola and Jones. Once the face is detected, we can extract features of faces to differentiate one from the others.
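For illustration, here is a minimal OpenCV sketch of SIFT keypoint matching with Lowe's ratio test (file names are placeholders, and SIFT requires an OpenCV build that includes it):

```python
import cv2

# Load a reference image of the object and a camera frame (paths are ours)
obj = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                       # DoG keypoints + descriptors
kp1, des1 = sift.detectAndCompute(obj, None)
kp2, des2 = sift.detectAndCompute(scene, None)

# Match descriptors and keep only distinctive ones (Lowe's ratio test)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches")
```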
3. METHODOLOGY AND SYSTEM INFORMATION

Fig. 1. Methodology of the work done.

A user orders the robot to bring him a desired product (in our case: a chocolate). The robot is supposed to interpret this (audio) command precisely and ask for more information (only) if the given command is not clear or incomplete. Once the command is processed, the robot navigates to the goal position in order to execute the task. After reaching the goal position, it asks for the product, takes it and navigates back to the position the order was assigned from. (Preferably) it recognizes the user and delivers the product on its return. This process can be seen in figure 1.
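The pipeline of figure 1 can be summarized as a simple state machine; the sketch below is purely schematic, and the states and transition conditions are our own labels for the modules described in sections 3.1-3.4:

```python
from enum import Enum, auto

class State(Enum):
    LISTEN = auto(); NAVIGATE_TO_GOAL = auto(); FETCH = auto()
    RETURN = auto(); DELIVER = auto(); DONE = auto()

# Transition table mirroring figure 1; real transition conditions would
# come from the speech, navigation and vision modules described below.
NEXT = {
    State.LISTEN: State.NAVIGATE_TO_GOAL,   # command understood
    State.NAVIGATE_TO_GOAL: State.FETCH,    # goal pose reached
    State.FETCH: State.RETURN,              # product detected and taken
    State.RETURN: State.DELIVER,            # back at the user's position
    State.DELIVER: State.DONE,              # face recognized, product handed
}

state = State.LISTEN
while state is not State.DONE:
    print("executing:", state.name)
    state = NEXT[state]
```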
Fig. 2. Robot Platform.
Fig. 3. Object detection in real time using find_object_2d.
3.1 Pocketsphinx

Pocketsphinx is a speech recognition library for handheld / mobile devices. It was created at Carnegie Mellon University and claims to build on over twenty years of CMU research data. It is free and open source, and it copes with speech recognition while requiring low computing power, hence real-time processing (Huggins-Daines et al. (2006)). It is used to convert sound into meaningful text. Pocketsphinx comes with keyword spotting, so the algorithm will look for a particular word. Information about the meaningful word is published on the /kwsdata topic. A dictionary needs to be created for those meaningful words (describing how they will sound). A number of languages with different accents are available in Pocketsphinx to train the acoustic model. Pocketsphinx gave different results for different persons. We found out that the acoustic model and the accent used to train it have a major impact on the results achieved. The accuracy was good for one accent but not up to the mark for people having a different accent.

3.2 SoundPlay

Sound Play is a sound synthesizer package in ROS and it is an essential part when it comes to enabling the robot to give a response or an instruction. Sound Play offers text-to-audio processing. It can be used from C++ or Python. One can subscribe to the topic /robotsound used for playing sound.
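Here is a minimal rospy sketch of this listen-and-answer loop, assuming the keyword topic carries a std_msgs/String message (the keyword and the spoken reply are placeholders):

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import String
from sound_play.libsoundplay import SoundClient

def on_keyword(msg):
    # msg.data holds the spotted keyword published by pocketsphinx;
    # we assume a std_msgs/String payload here
    rospy.loginfo("heard: %s", msg.data)
    if "chocolate" in msg.data:
        voice.say("Okay, I will bring the chocolate.")   # sound_play TTS

rospy.init_node("interaction_demo")
voice = SoundClient()
rospy.sleep(1.0)                        # give sound_play time to connect
rospy.Subscriber("/kwsdata", String, on_keyword)
rospy.spin()
```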
3.3 SLAM

To assess different SLAM algorithms, we have used the robot shown in figure 2. We have used a Hokuyo lidar for laser points and a Velodyne for pointcloud accumulation. Apart from that, odometry and/or an IMU is also used to improve the results of our work. The results of these algorithms are further explained in section 4.

3.4 Manipulation

Once the robot reaches the goal position to take the product, it should ask for the desired product, recognize it and take it back to the initial position. Reaching the initial position, the robot should (preferably) recognize the order giver and deliver him his desired product. We refer to these tasks as manipulation, as we are enabling the robot to manipulate (taking and delivering the product) its surroundings in order to execute its task.

Object Detection

Object detection in computer vision is a technique of finding and searching for instances of a particular object in an image. The find_object_2d library is used for object detection. It uses SIFT and SURF features for object detection. SIFT stands for Scale-Invariant Feature Transform. It uses the Difference of Gaussians (DoG, an approximation of the LoG for computational efficiency), where it subsequently blurs the image for different octaves of the image in a Gaussian pyramid. Then the images are searched for local extrema, and a local extremum is considered to be an important keypoint. Once important keypoints are fetched, another check is imposed: keypoints above a certain threshold are allowed to live and the others are discarded. To make images invariant to rotations, SIFT assigns an orientation to each descriptor based on local image gradient directions, and operations are performed accordingly. The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination. SURF (Bay et al., 2006) is a fast approximation of the SIFT idea. find_object_2d offers a Qt interface to add and train objects. It subscribes to the /object topic by default, and /object is remapped to the camera topic accordingly. Further decisions can be made by creating a new node which subscribes to the /object topic and makes decisions accordingly. Figure 3 shows a detected object using feature extraction. Once the object is detected, we find the desired product using a homography matrix relative to the camera in the physical world. For this task, we have used two classes (datasets) in total. There were no drastic lighting changes; thus the system was able to recognize the desired objects correctly every time.

Face Recognition

Once the robot returns to the initial position (back to the place the order was given from), it should recognize the desired person to avoid giving the product into the wrong hands. Face recognition can be done by having some notion of memorizing one's face. Face recognition is an approach of recognizing someone by processing face features in an image. To recognize a face, the first step is to detect a face in an image. The idea behind detecting faces is based on Haar-like features proposed by Viola and Jones (Viola and Jones, 2004). Once the face is detected, we can extract features of faces to differentiate one from the others. We used LBPH (Local Binary Pattern Histogram) for face recognition (Ahonen et al., 2006). opencv_apps is a ROS package which provides OpenCV functionalities in ROS. Once the face is recognized, the robot greets the person by name and tells him to take the product. Figure 4 shows a detected face using LBPH. Overall the results achieved are satisfactory. However, this approach has several limitations. Face recognition is based on face detection (Viola-Jones). This requires a person to look straight into the camera (frontal face detection); thus any deviation of the eyes from the camera angle resulted in no detection of the face. The resulting accuracy decreases with the addition of more facial datasets.
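A minimal OpenCV sketch of this detection-then-recognition chain (LBPH requires the opencv-contrib build; image paths and labels are hypothetical):

```python
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(gray):
    # Viola-Jones frontal detection; returns None when the user is not
    # looking straight at the camera (the failure mode noted above)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return cv2.resize(gray[y:y + h, x:x + w], (128, 128))

recognizer = cv2.face.LBPHFaceRecognizer_create()
samples = [crop_face(cv2.imread(p, cv2.IMREAD_GRAYSCALE))
           for p in ["alice_1.png", "alice_2.png"]]   # hypothetical dataset
recognizer.train(samples, np.array([0, 0]))           # label 0 -> "alice"

probe = crop_face(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
if probe is not None:
    label, confidence = recognizer.predict(probe)
    print(label, confidence)   # lower confidence value = better match
```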
Fig. 4. Face recognition using Local Binary Pattern Histogram in OpenCV.

Fig. 6. Simulation setup for octomap in gazebo.
Fig. 7. Map of environment in RViz.
Fig. 8. OCTOMAP: Map @ Innopolis University
Fig. 5. GMAPPING: Map @ Innopolis University

4. EXPERIMENTAL RESULTS

The experiments for our paper were done at Innopolis University. The environment has a varied structure including corridors and rooms (with solid walls and glass boundaries), etc. Experiments for all SLAM algorithms were carried out in the same environment. Robot Operating System (ROS) (Quigley et al., 2009) is the main underlying platform for the implementation of our work. ROS is free, open source and widely used within the robotics community. Initially the system was developed in simulation using Gazebo. Later we implemented it on a real robot. Our results are explained below, with findings in table 1.

4.1 GMapping

The theory behind GMapping is explained in 2.2. We initially implemented it in simulation using ROS, Gazebo and RViz. Later we implemented it using our robot shown in figure 2. For a successful implementation of the algorithm, a Hokuyo lidar (2D) and odometry were used. The practical implementation of the algorithm is shown in figure 5.
4.2 Octomap

As mentioned in section 2.2, OctoMap is a 3D mapping technique, and a sensor with 3D mapping capabilities is essential for this part. For simulation, a 3D lidar has been mounted on the chassis of the robot. The 3D model and working principle of the lidar are based on the Velodyne. The Velodyne publishes a pointcloud topic at /velodyne_points. The octomap_server incorporates this pointcloud message for scan integration, and this is achieved simply by remapping the cloud_in (octomap_server) topic to /velodyne_points. Apart from sensor readings, odometry has also been incorporated using the parameter frame_id. Map efficiency also depends upon inserting pointcloud data only up to a certain distance; for simulation purposes 5 meters is the value chosen. Figure 6 shows the simulation process in Gazebo. Figure 7 represents the map acquired from simulation in RViz. Figure 8 shows the physical representation of the world using OctoMap in RViz.

Fig. 9. Cartographer: Map @ Innopolis University

4.3 Cartographer

Figure 9 represents the map created using Cartographer. It is a difficult task to tune Cartographer, i.e. to find the right values to produce an adequate map. We have used a Hokuyo lidar and odometry for the algorithm to produce a map.

4.4 Hector SLAM

A theoretical overview of Hector SLAM is given in section 2.2. Figure 10 represents the map acquired from Hector SLAM. Hector SLAM is easy to tune. However, it may cause some problems with sharp corners.
Fig. 10. Hector SLAM: Map @ Innopolis University
Fig. 11. AR-601: Anthropomorphic robot

5. CONCLUSION

In this work we have designed a system for a robot to become functional in a sociable environment. It should be able to communicate and navigate intelligently. For this we have implemented and demonstrated the sound recognition and sound synthesis module. We also implemented different SLAM algorithms, including GMapping, Hector SLAM, Cartographer and OctoMap. We compared their results and showed their performance in real time in an indoor environment using lidars as the core sensing module. Besides that, we also introduced a way of enabling our robot to achieve different tasks (manipulation). In the future, the developed system will be implemented on the humanoid robot AR-601, shown in figure 11. It has a built-in multi-sensor system: a stereo camera, a rear-view camera, a lidar, IR sensors located on the wrists and feet, feet pressure sensors, and gyroscopes.

Table 1. Comparison of SLAM algorithms

               GMAPPING   HECTOR            CARTOGRAPHER
Map            Best       3rd Best          2nd Best
Tuning         Medium     Easy              Very Difficult
Sharp Turns    Good       Trajectory Lost   Good

REFERENCES

Ahonen, T., Hadid, A., and Pietikainen, M. (2006). Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 2037–2041.
Aulinas, J., Petillot, Y.R., Salvi, J., and Lladó, X. (2008). The SLAM problem: a survey. CCIA, 184(1), 363–371.
Bay, H., Tuytelaars, T., and Van Gool, L. (2006). SURF: Speeded up robust features. In European Conference on Computer Vision, 404–417. Springer.
Breazeal, C. (2003). Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1-2), 119–155.
Dellaert, F., Fox, D., Burgard, W., and Thrun, S. (1999). Monte Carlo localization for mobile robots. In Robotics and Automation, 1999, volume 2, 1322–1328. IEEE.
Durrant-Whyte, H. and Bailey, T. (2006). Simultaneous localization and mapping: part I. IEEE Robotics & Automation Magazine, 13(2), 99–110.
Grisettiyz, G., Stachniss, C., and Burgard, W. (2005). Improving grid-based SLAM with Rao-Blackwellized particle filters by adaptive proposals and selective resampling. In Robotics and Automation, 2005, ICRA 2005, 2432–2437. IEEE.
Hess, W., Kohler, D., Rapp, H., and Andor, D. (2016). Real-time loop closure in 2D lidar SLAM. In Robotics and Automation (ICRA), 2016 IEEE International Conference on, 1271–1278. IEEE.
Hornung, A., Wurm, K.M., Bennewitz, M., Stachniss, C., and Burgard, W. (2013). OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots, 34(3), 189–206.
Huggins-Daines, D., Kumar, M., Chan, A., Black, A.W., Ravishankar, M., and Rudnicky, A.I. (2006). Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices. In Acoustics, Speech and Signal Processing, 2006, volume 1, I–I. IEEE.
Kohlbrecher, S., Meyer, J., Graber, T., Petersen, K., Klingauf, U., and von Stryk, O. (2013). Hector open source modules for autonomous mapping and navigation with rescue robots. In Robot Soccer World Cup, 624–631. Springer.
Leonard, J.J. and Durrant-Whyte, H.F. (1991). Simultaneous map building and localization for an autonomous mobile robot. In Intelligent Robots and Systems '91, 1442–1447. IEEE.
Mazumdar, A., Lee, K., Evran, E., and Algorri, M.E. (2017). Development of a service robot for indoor operation using distributed computing. Zeszyty Naukowe Instytutu Pojazdów / Politechnika Warszawska.
Mishra, R. and Javed, A. (2018). ROS based service robot platform. In 2018 4th International Conference on Control, Automation and Robotics (ICCAR), 55–59. IEEE.
Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., and Ng, A.Y. (2009). ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, volume 3, 5. Kobe, Japan.
Rojas-Fernández, M., Mújica-Vargas, D., Matuz-Cruz, M., and López-Borreguero, D. (2018). Performance comparison of 2D SLAM techniques available in ROS using a differential drive robot. In Electronics, Communications and Computers (CONIELECOMP), 2018 International Conference on, 50–58. IEEE.
Viola, P. and Jones, M.J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.