Multi-agent Gaussian Process Motion Planning via Probabilistic Inference ⁎

Multi-agent Gaussian Process Motion Planning via Probabilistic Inference ⁎

12th 12th IFAC IFAC Symposium Symposium on on Robot Robot Control Control 12th IFAC on Control Budapest, Hungary, August August 27-30, 2018 12th IFAC ...

492KB Sizes 0 Downloads 53 Views

12th 12th IFAC IFAC Symposium Symposium on on Robot Robot Control Control 12th IFAC on Control Budapest, Hungary, August August 27-30, 2018 12th IFAC Symposium Symposium on Robot Robot Control Budapest, Hungary, 27-30, 2018 12th IFAC Symposium on Robot Control Budapest, Hungary, August 27-30, 2018 Available online at www.sciencedirect.com Budapest, Hungary, August 27-30, 2018 12th IFAC Symposium on Robot Control Budapest, Hungary, August 27-30, 2018 Budapest, Hungary, August 27-30, 2018

ScienceDirect

IFAC PapersOnLine 51-22 (2018) 160–165

Multi-agent Gaussian Process Motion Multi-agent Gaussian Process Motion Multi-agent Gaussian Process Motion  Planning via Probabilistic Inference Multi-agent Gaussian Process Motion  Planning via Probabilistic Inference Planning via Probabilistic Inference Planning via Probabilistic Inference ∗ Luka c c Luka Petrovi´ Petrovi´ c,, Ivan Ivan Markovi´ Markovi´ c,, Marija Marija Seder Seder ∗∗

∗ Luka Petrovi´ c , Ivan Markovi´ c , Marija Seder Luka c c Luka Petrovi´ Petrovi´ c,, Ivan Ivan Markovi´ Markovi´ c,, Marija Marija Seder Seder ∗∗ ∗ Luka Petrovi´ c , Ivan Markovi´ c , Marija Seder ∗ University of Zagreb Faculty of Electrical Engineering and ∗ University of Zagreb Faculty of Electrical Engineering and ∗ University of Zagreb Zagreb Faculty of of Electrical Engineering and of Faculty and ∗ University Computing, Laboratory for Autonomous Autonomous SystemsEngineering and Mobile Mobile Robotics, Robotics, Computing, Laboratory for Systems and of Zagreb Faculty of Electrical Electrical Engineering and ∗ University Computing, Laboratory Computing, Laboratory for for Autonomous Autonomous Systems Systems and and Mobile Mobile Robotics, Robotics,

University of Zagreb Faculty of Electrical Engineering and Croatia (e-mail: {luka.petrovic, ivan.markovic, Croatia (e-mail: {luka.petrovic, ivan.markovic, marija.seder}@fer.hr) Computing, Laboratory for Autonomous Systemsmarija.seder}@fer.hr) and Mobile Robotics, Croatia (e-mail: {luka.petrovic, ivan.markovic, marija.seder}@fer.hr) Croatia (e-mail: {luka.petrovic, ivan.markovic, marija.seder}@fer.hr) Computing, Laboratory for Autonomous Systems and Mobile Robotics, Croatia (e-mail: {luka.petrovic, ivan.markovic, marija.seder}@fer.hr) Croatia (e-mail: {luka.petrovic, ivan.markovic, marija.seder}@fer.hr) Abstract: This paper deals with motion planning for multiple agents by representing the Abstract: This paper deals with motion planning for multiple agents by representing the Abstract: This paper deals with motion planning for multiple agents by representing the Abstract: This paper deals with motion planning for multiple agents by representing the problem as a simultaneous optimization of every agent’s trajectory. Each trajectory is considered problem as a simultaneous optimization of every agent’s trajectory. Each trajectory is considered Abstract: This paper deals with motion planning for multiple agents by representing the problem as a simultaneous optimization of every agent’s trajectory. Each trajectory is considered problem as a simultaneous optimization of every agent’s trajectory. Each trajectory is considered Abstract: This paper deals with motion planning for multiple agents by representing the as a sample from a one-dimensional continuous-time Gaussian process (GP) generated by a linear as a sample from a one-dimensional continuous-time Gaussian process (GP) generated by a linear problem as a simultaneous optimization of every agent’s trajectory. Each trajectory is considered as aa sample aa one-dimensional continuous-time Gaussian process (GP) generated by aa linear as sample from one-dimensional continuous-time Gaussian process (GP) generated by linear problem as afrom simultaneous optimization of every agent’s trajectory. Each trajectory is considered time-varying stochastic differential equation driven by white noise. By formulating the planning time-varying stochastic differential equation driven by white noise. By formulating the planning as a sample from a one-dimensional continuous-time Gaussian process (GP) generated by a linear time-varying stochastic differential equation driven by white noise. By formulating the time-varying stochastic differential equation driven by white noise. By formulating the planning as a sample from a one-dimensional continuous-time Gaussian process (GP) generated byplanning acan linear problem as probabilistic inference on aa factor graph, the structure of the pertaining GP be problem as probabilistic inference on factor graph, the structure of the pertaining GP can be time-varying stochastic differential equation driven by white noise. By formulating the planning problem as probabilistic inference on a factor graph, the structure of the pertaining GP can be problem as probabilistic inference on a factor graph, the structure of the pertaining GP can be time-varying stochastic differential equation driven by white noise. By formulating the planning exploited to find the solution efficiently using numerical optimization. In contrast to exploited to find the solution efficiently using numerical optimization. In contrast to planning problem as probabilistic inference on a factor graph, the structure of the pertaining GP can be exploited to find the solution efficiently using numerical optimization. In contrast to planning exploited to find the solution efficiently using numerical optimization. In contrast to planning problem as probabilistic inference on a factor graph, the structure of the pertaining GP can be each agent’s trajectory individually, where only the current poses of other agents are taken into each agent’s trajectory individually, where only the current poses of other agents are taken into exploited to trajectory find the solution efficiently using numerical optimization. In agents contrast totaken planning each agent’s individually, where only the current poses of other are into each agent’s trajectory individually, where only the current poses of other agents are into exploited to propose find the simultaneous solution efficiently using numerical optimization. In works contrast planning account, we planning of multiple trajectories that in aatotaken predictive account, we propose simultaneous planning of multiple trajectories that works in predictive each agent’s trajectory individually, where only the current poses of other agents are taken into account, we propose simultaneous planning of multiple trajectories that works in aa taken predictive account, we propose simultaneous planning of multiple trajectories that in each agent’s trajectory individually, where only the current poses ofwhereabouts other agentsat into manner. It into account the information each agent’s every future manner. It takes into account the information about each agent’s whereabouts at every future account, wetakes propose simultaneous planning of about multiple trajectories that works works inare a predictive predictive manner. It takes into account the information about each agent’s whereabouts at every future manner. It takes into account the information about each agent’s whereabouts at every future account, we propose simultaneous planning of multiple trajectories that works in a predictive time instant, since full trajectories of each agent are found jointly during a single optimization time instant, since full trajectories of each agent are found jointly during aa single manner. It takes into account the information about each agent’s whereabouts at optimization every future time instant, since full trajectories of each agent jointly during single time since full trajectories of agent are found jointly during single optimization manner. It We takes into the information about each agent’s whereabouts at optimization every future procedure. We compare the proposed proposed method toare an found individual trajectory planning approach, procedure. compare the method to an individual trajectory approach, time instant, instant, since fullaccount trajectories of each each agent are found jointly during aa planning single optimization procedure. We compare the proposed method to an individual trajectory planning approach, procedure. We compare the proposed method to an individual trajectory planning approach, time instant, since full trajectories of each agent are found jointly during a single optimization demonstrating significant improvement in both success rate and computational efficiency. demonstrating improvement in both success rate and computational efficiency. procedure. We significant compare the proposed method to an individual trajectory planning approach, demonstrating improvement in both success rate and computational efficiency. demonstrating significant improvement in both success rate and computational efficiency. procedure. We significant compare the proposed method to an individual trajectory planning approach, demonstrating significant improvement in both success rate and computational efficiency. © 2018, IFAC (International Federation of Automatic Control) rate Hosting bycomputational Elsevier Ltd. Allefficiency. rights reserved. demonstrating significant improvement in both success and Keywords: motion motion planning, planning, trajectory trajectory optimization, optimization, multi-agent multi-agent system, system, probabilistic probabilistic Keywords: Keywords: motion planning, trajectory optimization, multi-agent system, probabilistic Keywords: motion planning, trajectory optimization, multi-agent system, probabilistic inference, factor graphs inference, factor graphs Keywords: motion planning, trajectory optimization, multi-agent system, probabilistic inference, factor graphs inference, factor graphs Keywords: motion planning, trajectory optimization, multi-agent system, probabilistic inference, factor graphs inference, factor graphs 1. INTRODUCTION INTRODUCTION constraints. Schulman Schulman et et al. al. (2013), (2013), Schulman Schulman et et al. al. (2014), (2014), 1. constraints. 1. INTRODUCTION INTRODUCTION constraints. Schulman et al. al. (2013),formulate Schulmanmotion et al. al. (2014), (2014), 1. constraints. Schulman et (2013), Schulman et in their their framwork framwork called TrajOpt, plan1. INTRODUCTION in called TrajOpt, formulate motion planconstraints. Schulman et al. (2013), Schulman et al. (2014), in their framwork called TrajOpt, formulate motion plan1. INTRODUCTION in their framwork called TrajOpt, formulate motion planconstraints. Schulman et al. (2013), Schulman et al. (2014), ning as sequential quadratic programming. The key feaMotion planning planning is is an an indispensable indispensable skill skill for for robots robots that that ning as sequential quadratic programming. The key feain their framwork called TrajOpt, formulate motion planMotion ning as sequential quadratic programming. The key feaMotion planning is an indispensable skill for robots that ning as sequential quadratic programming. The key feain their framwork called TrajOpt, formulate motion planMotion planning is an indispensable skill for robots that ture of TrajOpt is the ability to solve complex motion aspire to navigate through an environment without colture of TrajOpt is the ability to solve complex motion ning as sequential quadratic programming. The key feaaspire to navigate through an environment without colMotion planning is an indispensable skill for robots that ture of TrajOpt is the ability to solve complex motion aspire to navigate through an environment without colture of TrajOpt is the ability to solve complex motion ning as sequential quadratic programming. The key feaaspire to navigate through an environment without colplanning problems with few states since swept volumes Motion planning is an indispensable skill for robots that lisions. Motion planning algorithms attempt to generate planning problems with few states since swept volumes ture of TrajOpt is the ability to solve complex motion lisions. Motion planning algorithms attempt to generate aspire to navigate through an environment without colplanning problems with few states states sincecomplex sweptHowever, volumes lisions. Motion planning algorithms attempt without tospace generate planning problems with few since swept volumes ture of TrajOpt is the ability to solve motion lisions. Motion planning algorithms attempt to generate are considered to ensure continuous-time safety. aspire to navigate through an environment coltrajectories through the robot’s configuration that are considered to ensure continuous-time safety. However, planning problems with few states since swept volumes trajectories through the robot’s configuration space that lisions. Motion planning algorithms attempt tospace generate considered to ensure continuous-time safety. However, trajectories through the robot’s configuration that are are considered to continuous-time safety. However, problems with few states since swept volumes trajectories through the robot’s configuration that if the smoothness is required in the output trajectory, lisions. Motion planning algorithms attempt tospace generate are both feasible and optimal based on some performance if the smoothness is required in the output trajectory, are considered to ensure ensure continuous-time safety. However, are both feasible and optimal based on some performance trajectories through the robot’s configuration space that planning if the smoothness is required in the output trajectory, are both feasible and optimal based on some performance if the smoothness is required in the output trajectory, are considered to ensure continuous-time safety. However, are both feasible and optimal based on some performance either a densely parametrized trajectory or post-processing trajectories through the robot’s configuration space that criterion that may vary depending on the task, robot, or either densely trajectory or post-processing the aasmoothness is required in the output trajectory, criterion that may vary depending task, robot, or are both feasible and optimal basedon on the some performance either densely parametrized parametrized trajectory or thus post-processing criterion that Motion may vary depending on the task, robot, or if either densely parametrized or post-processing if is required the output trajectory, criterion that may vary depending on task, robot, or of the theaasmoothness trajectory might stilltrajectory bein needed needed increasing are both feasible and optimal based on the some performance environment. planning algorithms that can be exof the trajectory might still be thus increasing either densely parametrized trajectory or post-processing environment. Motion planning algorithms that can be excriterion that may vary depending on the task, robot, or of the trajectory might still be needed thus increasing environment. Motion planning algorithms that can be exexthe might still be needed increasing either a trajectory densely parametrized trajectory or thus post-processing environment. planning algorithms be computation time. In order to overcome the computational criterion may are vary depending on the that task,can robot, or of ecuted in that real Motion time are highly encouraged, mostly because computation time. In order to overcome the computational of the trajectory might still be needed thus increasing ecuted in real time highly encouraged, mostly because environment. Motion planning algorithms that can be excomputation time. In order to overcome the computational ecuted in real time are highly encouraged, mostly because computation time. In order to overcome the computational of the trajectory might still be needed thus increasing ecuted in real time are highly encouraged, mostly because cost incurred by using large number of states, the Gaussian environment. Motion planning algorithms that can be exthey allow for fast replanning in response to environment cost incurred by using large number of states, the Gaussian computation time. In order to overcome the computational they allow for fast replanning in response to environment ecuted in real time are highly encouraged, mostly because cost incurred by using large number of states, the Gaussian they allow for time fast replanning replanning in response response to to environment cost incurred by using large number of states, the Gaussian computation time. In order to overcome the computational they allow for fast in environment process motion planning family of algorithms (Mukadam ecuted in real are highly encouraged, mostly because changes. process motion planning family of algorithms (Mukadam cost incurred by using large number of states, the Gaussian changes. they allow for fast replanning in response to environment process motion planning family of algorithms (Mukadam changes. process motion planning family of algorithms (Mukadam cost incurred by using large number of states, the Gaussian changes. et al. (2016), Dong et al. (2016), Huang et al. (2017)) they allow for fast replanning in response to environment et al. (2016), Dong et al. (2016), Huang et al. (2017)) process motion planning family of algorithms (Mukadam changes. A significant amount of recent work has focused on trajecet al. (2016), Dong et al. (2016), Huang et al. (2017)) A significant amount of recent work has focused on trajecet al. (2016), Dong et al. (2016), Huang et al. (2017)) process motion planning family of algorithms (Mukadam use continous-time trajectory representation. Mukadam changes. use continous-time trajectory representation. Mukadam A significant amount of recent work has focused on trajecet al. (2016), Dong et al. (2016), Huang et al. (2017)) A significant amount of recent work has focused on trajectory optimization and related problems. Trajectory optiuse continous-time trajectory representation. Mukadam tory optimization and related problems. Trajectory optiuse continous-time trajectory representation. Mukadam A significant amount of recent work has focused on trajecet al. (2016), Dong et al. (2016), Huang et al. (2017)) al. (2016) parametrize the trajectory with a few support et al. (2016) parametrize the trajectory with a few support tory optimization and related problems. Trajectory optiuse continous-time trajectory representation. Mukadam tory optimization and problems. Trajectory optiA significant amount ofrelated recentan work hastrajectory focused on trajecmization methods start with initial and then et al. (2016) parametrize the trajectory with aainterpolation few support mization methods start with an initial trajectory and then et al. (2016) parametrize the trajectory with few support tory optimization and related problems. Trajectory optiuse continous-time trajectory representation. Mukadam states and then use Gaussian process (GP) and then use process (GP) mization methods start with an an initial initial trajectory and optithen et al. (2016) parametrize the trajectory with ainterpolation few support mization methods start with trajectory and tory optimization and related Trajectory minimize an objective objective function in order order to optimize optimize the states states andthe then use Gaussian Gaussian process (GP) interpolation minimize an function in to the states and then use process mization methods start with anproblems. initial trajectory and then then et (2016) parametrize the with ainterpolation few support to al. query trajectory at trajectory any time (GP) of interest. interest. Dong to query trajectory at any time of Dong minimize anRatliff objective function in order order to et optimize the states andthe then use Gaussian Gaussian process (GP) interpolation minimize an objective function in to optimize the mization methods start with an initial trajectory and then trajectory. et al. (2009) and Zucker al. (2013), to query the trajectory at any time of interest. Dong trajectory. Ratliff et al. (2009) and Zucker et al. (2013), to query the trajectory at any time of interest. Dong minimize an objective function in order to optimize the states and then use Gaussian process (GP) interpolation et al. (2016), in their framework called GPMP2, repreet al. (2016), in their framework called GPMP2, repretrajectory. Ratliff et al. (2009) and Zucker et al. (2013), to query the trajectory at any time of interest. Dong trajectory. Ratliff et al. (2009) and Zucker et al. (2013), minimize an objective function in order to optimize the in their work abbreviated as CHOMP, proposed utilizet al. (2016), in their framework called GPMP2, reprein their work abbreviated as CHOMP, proposed utilizet al. (2016), in their framework called GPMP2, repretrajectory. Ratliff et al. (2009) and Zucker et al. (2013), to query the trajectory at any time of interest. Dong sent continuous-time trajectories as samples from a GP sent continuous-time trajectories as samples from a GP in their work abbreviated as CHOMP, proposed utilizet al. (2016), in their framework called GPMP2, reprein their work abbreviated as CHOMP, proposed utiliztrajectory. Ratliff etsigned al. (2009) and field Zucker et al.collision (2013), ing a precomputed distance for fast sent continuous-time trajectories as samples from a GP ing a precomputed signed distance field for fast collision sent continuous-time trajectories as samples from a GP in their work abbreviated as CHOMP, proposed utilizet al. (2016), in their framework called GPMP2, repreand then formulate the planning problem as probabilistic and then formulate the planning problem as probabilistic ing a precomputed signed distance field for fast collision sent continuous-time trajectories as samples from a GP ing a precomputed signed distance field for fast collision in their work abbreviated as CHOMP, proposed utilizchecking and using using covariant covariant gradient descent to minimize and then formulate the planning problem as probabilistic checking and gradient descent to minimize and then formulate the planning problem as probabilistic ing a precomputed signed distance field for fast collision sent continuous-time trajectories as samples from a GP inference, generatingthe fastplanning solutionsproblem by exploiting exploiting the sparsparinference, generating fast solutions by the checking and using covariant gradient descent to minimize and then formulate as probabilistic checking and using covariant gradient descent to minimize ing a precomputed signed distance field for fast collision obstacle and smoothness costs. Kalakrishnan et al. (2011) inference, generating fast solutions by exploiting the sparobstacle and smoothness costs. Kalakrishnan et al. (2011) inference, generating fast solutions by exploiting the sparchecking and using covariant gradient descent to minimize and then formulate the planning problem as probabilistic sity of the underlying linear system. A useful property of the underlying system. A useful property obstacle and smoothness costs. Kalakrishnan et al. (2011) sity inference, generating fastlinear solutions by exploiting the sparobstacle smoothness costs. Kalakrishnan (2011) checking and using covariant gradient descent et to al. minimize developed a stochastic trajectory optimization method of the underlying system. A useful property developed a stochastic trajectory optimization method sity of the underlying linear system. A useful property obstacle and and smoothness costs. Kalakrishnan et al. (2011) sity inference, generating fastlinear solutions by exploiting the sparof GPMP2 is its extensibility and applicability for wide of GPMP2 is its extensibility and applicability for wide developed a stochastic trajectory optimization method sity of the underlying linear system. A useful property developed aa smoothness stochastic trajectory optimization obstacle and costs. Kalakrishnan et al.method (2011) (STOMP) that samples a series of noisy trajectories to exof GPMP2 is its extensibility and applicability for wide (STOMP) that samples a series of noisy trajectories to exof GPMP2 is its extensibility and applicability for wide developed stochastic trajectory optimization method sity of the underlying linear system. A useful property range of problems (Rana et al. (2017), Mukadam et al. range of problems (Rana et al. (2017), Mukadam et al. (STOMP) that samples a series of noisy trajectories to exof GPMP2 is its extensibility and applicability for wide (STOMP) that samples a series of noisy trajectories to exdeveloped a stochastic trajectory optimization method plore the space space around an initialoftrajectory trajectory which are aretothen then range of problems (Rana et al. (2017), Mukadam et al. plore the around an initial which range of problems (Rana et al. (2017), Mukadam et al. (STOMP) that samples a series noisy trajectories exof GPMP2 is its extensibility and applicability for (2017a), Mari´ c et et al. al.(Rana (2018)) and in this paper paper we also alsoetwide rely (2017a), Mari´ c (2018)) and in this we rely plore the space around an initial trajectory which are then range of problems et al. (2017), Mukadam al. plore the space around initial which are then (STOMP) that samplesan a series oftrajectory noisy trajectories to ex- (2017a), combined to produce an updated trajectory with lower Mari´ cc et al. (2018)) and in this paper we also rely combined to produce updated trajectory with lower (2017a), Mari´ et al. (2018)) and in this paper we also rely plore the space around initial trajectory which are then range of problems (Rana et al. (2017), Mukadam et al. on the aforementioned framework. All mentioned methods on the aforementioned framework. All mentioned methods combined to produce an updated trajectory with lower (2017a), Mari´ c et al. (2018)) and in this paper we also rely combined to produce an updated trajectory with plore the space around initial trajectory which arelower then on cost. The key trait of STOMP is its ability to optimize the aforementioned framework. All mentioned methods cost. The key trait of STOMP is its ability to optimize on the aforementioned framework. All mentioned methods combined to produce an updated trajectory with lower (2017a), Mari´ c et al. (2018)) and in this paper we also rely are primarily designed for planning motion of one robot in primarily designed for planning motion of robot in cost. The key key traitconstraints. of an STOMP is important its ability shortcoming to optimize on the aforementioned All mentioned cost. The trait of STOMP its ability to optimize combined to produce updated trajectory with lower are non-differentiable Anis are primarily designedInframework. for planning motion of one one methods robot in non-differentiable constraints. An important shortcoming are primarily designed for planning motion of one robot in cost. The key trait of STOMP is its ability to optimize on the aforementioned framework. All mentioned methods static environments. many challenging problems, such static environments. In many challenging problems, such non-differentiable constraints. An important shortcoming are primarily designed for planning motion of one robot in non-differentiable constraints. An important shortcoming cost. The key trait of STOMP is its ability to optimize of CHOMP and STOMP is the need for many trajectory static environments. Infor many challenging problems, such of CHOMP and STOMP is the need for many trajectory static environments. In many challenging problems, such non-differentiable constraints. An important shortcoming are primarily designed planning motion of one robot in as warehouse management, delivery and construction, a as warehouse management, and construction, a of CHOMP and STOMP is the thefine need for many many trajectory static environments. In manydelivery challenging problems, such of CHOMP and STOMP is need for trajectory non-differentiable constraints. An important shortcoming states in order to reason about resolution obstacle repas warehouse management, delivery and construction, a states in order to reason about fine resolution obstacle repas warehouse management, delivery and construction, a of CHOMP and STOMP is the need for many trajectory static environments. In many challenging problems, such single robot may not be able to achieve desired tasks single robot may not be able to achieve desired tasks states in order to reason about fine resolution obstacle repas warehouse management, delivery and construction, a states in order to reason about fine resolution obstacle repof CHOMP and STOMP is the need for many trajectory resentations or find feasible solutions when there are many single robot may not be able to achieve desired tasks resentations or find feasible solutions when there are many single robot may not be able to achieve desired tasks states in order to reason about fine resolution obstacle repas warehouse management, delivery and construction, a and therefore multi-robot teams are required. Multi-robot and therefore multi-robot teams are required. Multi-robot resentations or find feasible solutions when there are many single robot may not be able to achieve desired tasks resentations or find feasible solutions when there are many states in order reason about fine resolution obstacle rep- and and therefore multi-robot teams are required.desired Multi-robot therefore multi-robot teams are required. Multi-robot resentations orto find feasible solutions when there are many single robot may not be able to achieve tasks teams are also more robust to malfunctions, since one  teams are also more robust to malfunctions, since one and therefore multi-robot teams are required. Multi-robot This been supported by European Regional De resentations find feasible solutions there are many teams are also more robust to malfunctions, since one This research researchorhas has been supported by the the when European Regional De teams are more to malfunctions, since and therefore multi-robot teams required. Multi-robot This been by Regional  teams are also also more robust robust to are malfunctions, since one one velopment Fund has under thesupported grant KK.01.1.1.01.0009 KK.01.1.1.01.0009 (DATACROSS). This research research has been supported by the the European European Regional DeDe velopment Fund under the grant (DATACROSS). This research has been supported by the European Regional Deteams are also more robust to malfunctions, since one velopment Fund  velopment Fund under under the the grant grant KK.01.1.1.01.0009 KK.01.1.1.01.0009 (DATACROSS). (DATACROSS).

This research been by the European Regional Development Fund has under thesupported grant KK.01.1.1.01.0009 (DATACROSS). velopment Fund under the grant KK.01.1.1.01.0009 (DATACROSS). Copyright © 2018, 2018 IFAC IFAC 160Hosting by Elsevier Ltd. All rights reserved. 2405-8963 © IFAC (International Federation of Automatic Control) Copyright © 2018 160 Copyright ©under 2018 responsibility IFAC 160Control. Peer review© of International Federation of Automatic Copyright 2018 IFAC 160 Copyright © 2018 IFAC 160 10.1016/j.ifacol.2018.11.535 Copyright © 2018 IFAC 160

IFAC SYROCO 2018 Budapest, Hungary, August 27-30, 2018

Luka Petrović et al. / IFAC PapersOnLine 51-22 (2018) 160–165

robot can take over the tasks of another robot in case of failure. In this paper, we propose using a continous time Gaussian process trajectory representations in order to plan motion for every robot in a multi-agent system. We augment the method proposed in Dong et al. (2016) and consider multi-agent trajectory optimization from a probabilistic inference perspective, optimizing all of the trajectories concurrently. By optimizing the trajectories at the same time, our method takes into account the information where each agent will be at every future time instant, thus working in a predictive manner. We evaluated our approach in simulation and compared it to planning each agent’s trajectory indvidually. The rest of the paper is organized as follows. Section 2 presents Gaussian processes as trajectory representations. In Section 3, the method for simultaneous multi-agent trajectory optimization as probabilistic inference is proposed. Section 4 describes implementation aspects of the proposed method. Section 5 demonstrates the main results in simulation and Section 6 concludes the paper. 2. GAUSSIAN PROCESSES AS TRAJECTORY REPRESENTATIONS A continuous-time trajectory is considered as a sample from a vector-valued Gaussian process (GP), x(t) ∼ GP(µ(t), K(t, t )), with mean µ(t) and covariance K(t, t ), generated by a linear time-varying stochastic differential equation (LTV-SDE) x(t) ˙ = A(t)x(t) + u(t) + F (t)w(t) (1) where A, F are system matrices, u is a known control input and w(t) is generated by a white noise process. The white noise process is itself a GP with zero mean value w(t) ∼ GP(0, Qc δ(t − t )), (2) where Qc is a power spectral density matrix . The solution of the LTV-SDE in (1) is generated by the mean and covariance of the GP:  t Φ(t, s)u(s)ds (3) µ(t) = Φ(t, t0 )µ0 + t0

K(t, t ) = Φ(t, t0 )K0 Φ(t , t0 )T +  min(t,t ) Φ(t, s)F (s)Qc F (s)T Φ(t , s)ds,

(4)

t0

where µ0 , K0 are initial mean and covariance of the first state, and Φ(t, s) is the state transition matrix (Barfoot et al. (2014)). The GP prior distribution is then given in terms of its mean µ and covariance K: 1 2 (5) p(x) ∝ exp{− x − µK }. 2 One major benefit of using Gaussian processes to model continuous-time trajectory in motion planning is the possibility to query the planned state x(τ ) at any time of interest τ , and not only at discrete time instants. If the prior proposed in (5) is used, GP interpolation can be performed efficiently due to the Markovian property of 161

161

the LTV-SDE given in (1). State x(τ ) at τ ∈ [ti , ti+1 ] is a function only of its neighboring states (Dong et al. (2016)) x(τ ) = µ(τ ) + Λ(τ )(xi − µi ) + Ψ(τ )(xi+1 − µi+1 ), (6) Λ(τ ) = Φ(τ, ti ) − Ψ(τ )Φ(ti+1 , ti ), (7) Ψ(τ ) = Qi,τ Φ(ti+1 , τ )T Q−1 , (8) i,i+1 where  tb Qa,b = Φ(b, s)F (s)Qc F (s)T Φ(b, s)T ds. (9) ta

The fact that any state x(τ ) can be computed in O(1) complexity can be exploited for efficient computation of obstacle avoidance costs, as explained in Section 3.

Another major benefit arising from the aforementioned Markovian property of the LTV-SDE in (1) is the fact that the inverse kernel matrix K −1 of this prior is exactly sparse block tridiagonal (Barfoot et al. (2014)) K −1 = A−T Q−1 A−1 (10) where   1 0 ... 0 0 −Φ(t1 , t0 ) 1 ... 0 0    . ..  . . . −1  . . . 0 −Φ(t2 , t1 ) A =    .. .. . ..  . . 1 0 0 0 ... −Φ(tN , tN −1 ) 1 (11) and −1 Q−1 = diag(K0−1 , Q−1 (12) 0,1 , ..., QN −1,N ) with Qa,b given in (9), the trajectory going from t0 to tN , and K0 being the initial covariance. As it will be shown in Section 3, this kernel allows for fast, structure-exploiting inference. 3. MULTI-AGENT TRAJECTORY OPTIMIZATION AS PROBABILISTIC INFERENCE In this section we formulate the multi-agent trajectory optimization problem as probabilistic inference. The presented formulation is predicated on the work of Dong et al. (2016) and it represents the extension of their trajectory optimization method to multi-robot systems. To formulate the trajectory optimization problem as probabilistic inference, we seek to find a trajectory parametrized by x given desired events e. The posterior density of x given events e can be computed via Bayes’ rule from a prior and likelihood p(x|e) = p(x)p(e|x)/p(e) ∝ p(x)p(e|x) (13) where p(x) represents the prior on x which encourages smoothness of the trajectory, while p(e|x) represents the probability of the desired events occuring given x. The optimal trajectory x is found by maximizing the posterior p(x|e), using the maximum a posteriori (MAP) estimator x∗ = arg max p(x)p(e|x), (14) x

where p(e|x) =



p(e|xi )

(15)

i

with xi being the configuration at discrete time instant ti . The conditional distributions p(e|xi ) specify the likelihood of desired events occuring at the configuration xi L(xi |e) ∝ p(e|xi ). (16)

IFAC SYROCO 2018 162 Budapest, Hungary, August 27-30, 2018

Luka Petrović et al. / IFAC PapersOnLine 51-22 (2018) 160–165

In motion planning context, desired event is avoidance of collisions and therefore conditional distribution p(e|xi ) specifies the likelihood that the configuration xi is collision free Lobs (xi |ci = 0) ∝ p(ci = 0|xi ). (17) In our case, for each agent there are two possible types of collision; collision with a static obstacle in the environment and collision with another agent. Since the aforementioned types of collision are independent events, the likelihood of trajectory x being free of collisions can therefore be considered as the product of two likelihoods: Lobs (x|c = 0) = Lstat (x|cobs = 0)Lmul (x|cmul = 0), (18) where the first the term specifies the probability of being clear of collisions with static obstacles and the second term specifies the probability of being clear of collisions with other agents. For each agent j, the likelihood of being free of collision with static obstacles is defined as a distribution of the exponential family (Dong et al., 2016): 2 1 Lstat (xj |cobs = 0) ∝ exp{− h(xj )Σ }, (19) obs 2 where h(x) is a vector-valued obstacle cost function and Σobs a diagonal matrix and the hyperparameter of the distribution. In the same manner, we define the likelihood of being collision-free with other agents as a distribution of the exponential family that is a product of probabilities of being free of collision with every other agent nag  2 1 exp{− g(xj , xj  )Σ }, Lmul (xj |cmul = 0) ∝ mul 2  j =1 j  =j

(20) where g(xj , xj  ) is a vector-valued function that defines the cost of two agents j and j  being close to each other, nag is number of agents and Σmul is a hyperparameter of the distribution.

Combining Eq. (19) and Eq. (20), the total likelihood of agent j being free of collision is obtained Lobs (xj |c = 0) ∝

need a linearized approximation of (22). Converting the nonlinear least squares problem to a linear problem around operating point x ¯j , the following expression for the optimal perturbation δx∗j is obtained 1 2 δx∗j = arg min x ¯j + δxj − µj K + j 2 δxj   1 2 h(¯ xj ) + Hj δxj Σ + obs 2 nag    2 1 g(¯ , (23) xj , xj  ) + Gj δxj Σ mul 2  j =1 j  =j

where Hj is the Jacobian matrix of h(xj )  ∂h  Hj = , ∂xj x¯j

and Gj is the partial derivative of g(xj , xj  )  ∂g  . Gj = ∂xj 

The optimal perturbation δx∗j is obtained by solving the following linear system   nag  −1 T −1 T −1 Kj + Hj Σobs Hj + Gj Σmul Gj δx∗j = j  =1 j  =j

¯j ) − HjT Σ−1 xj ) − Kj−1 (µj − x obs h(¯

j =1 j  =j

Deriving the MAP trajectory from Eq. (14), Eq. (5) and Eq. (21) in a similar manner to Dong et al. (2016), our maximum a posteriori trajectory for each agent is 2 1 x∗j = arg min { xj − µj K + j 2 xj nag     1 h(xj )2 + 1 g(xj , xj  )2 }. Σ Σmul obs 2 2 

(22)

j =1 j  =j

This is a nonlinear least squares problem that can be solved with iterative approaches, such as Gauss-Newton or Levenberg-Marquardt. To obtain the MAP trajectory by solving the aforementioned optimization problem with iterative approaches, we 162

GTj Σ−1 xj , xj  ). mul g(¯



j =1 j  =j

cj



j j mul fi,j,j  (xi , xi )

cj 

(21)

nag 

(26) This MAP trajectory optimization can be represented as inference on a factor graph (Kschischang et al. (2001)). The fact that the system in (26) is linear and sparse can be exploited for finding the solution efficiently. In our case, the posterior distribution given in Eq. (13) can be factorized similarly to Mukadam et al. (2017b):   gp j j obs j P (x|e) ∝ fj (xi , xi+1 )fi,j (xi ) 

 2 2 1 1 exp{− g(xj , xj  )Σ }. exp{− h(xj )Σ } obs mul 2 2 

(25)

(¯ xj ,xj  )

ti

nag

(24)

np 





intp j j j j fi,j,j  ,τ (xi , xi+1 , xi , xi+1 ),

(27)

τ =1

where f gp represents factor corresponding to a GP prior, f obs represents the cost of collision with static obstacles, f mul represents the cost of collision with other agents and f intp represents collision cost calculated for np interpolated states that can be obtained by using (6). The factor graph defined in (27) is depicted in Fig. 3 for a simple case of trajectory optimization problem with two agents and three states. 4. IMPLEMENTATION DETAILS 4.1 GP prior In our implementation, for dynamics of our robot we use the double integrator linear system with white noise injected in acceleration, meaning that the trajectory is generated by the LTV-SDE in (1) with

IFAC SYROCO 2018 Budapest, Hungary, August 27-30, 2018

Luka Petrović et al. / IFAC PapersOnLine 51-22 (2018) 160–165

Legend Robot state Start and goal factor GP prior factor Multi-agent factor Obstacle factor

t1

t2

t3

Fig. 1. A simple illustration of the factor graph describing an example multi-robot trajectory optimization problem. GP interpolation factors present between states are omitted for clarity.     0 I 0 A= , u(t) = 0, F (t) = . (28) 00 I

This represents a constant velocity GP prior which is centered around a zero acceleration trajectory (Barfoot et al. (2014)). Applying such a prior will minimize actuator velocity in the configuration space, thus minimizing the energy consumption and giving the physical meaning of smoothness. Choosing the noise covariance Qc has an effect on smoothness, with smaller values penalizing deviation from the prior more. 4.2 Avoidance of collision with static obstacles

In (19) we defined the likelihood of one agent j being free of collision with static obstacles which relies upon a vectorvalued obstacle cost function h(xj ). In our case, for the obstacle cost h(xj ) we use the hinge loss function with a precomputed signed distance field similarly to Dong et al. (2016) and Mukadam et al. (2017b). The pertaining hinge loss function is defined  as: h(xj ) =

εobs − ds,j 0

if ds,j ≤ εobs , if ds,j > εobs

(29)

where ds,j is the signed distance, and εobs is the safety distance indicating the boundary of the danger area near obstacle surfaces (Mukadam et al. (2017b)). 4.3 Avoidance of collision with other agents In (20) we defined the likelihood of one agent j being free of collision with other agent j  . This likelihood relies on a vector-valued function g(xj , xj  ), which we defined as the hinge loss function:  εmul − dj,j  if dj,j  ≤ εmul g(xj , xj  ) = , (30) 0 if dj,j  > εmul where dj,j  denotes d(xj , xj  ), the Euclidean distance between two agents j, and j  and εmul is a safety distance indicating the boundary of the area inside which we anticipate a collision. If the parameter εmul is set to zero, multiagent obstacle cost would always be zero and our planner would work exactly the same as GPMP2 (Dong et al. (2016)) for each agent. Higher values of εmul enable robots to anticipate each other’s future trajectories and to adapt them accordingly in the next iteration of optimization, giving our algorithm a predictive property. 163

163

To obtain the MAP trajectory in (23), the proposed method requires Gj , the partial derivative of the hinge loss function defined in (31). This partial derivative used in our implementation can be analytically obtained:  xj  − x j    . if dj,j  < εmul  d(xj , xj  ) ∂g(xj , xj  ) = (31) 0.5 if dj,j  = εmul .  ∂xj    0 if dj,j  > εmul Parameters Σobs and Σmul , needed to fully implement static and multi-agent obstacle likelihoods in (19) and (20) are defined by isotropic diagonal matrices Σobs = 2 2 σobs I and Σmul = σmul I, where σobs and σmul represent obstacle cost weights. Reducing the value of σobs causes the optimization to place more weight on avoiding collision with static obstacles, while reducing the value of σmul causes the optimization to place more weight on avoiding collision with other agents. 4.4 Software implementation In our experiments we use the GPMP2 C++ library (Dong et al. (2016), Dong et al. (2017)), and its respective MATLAB toolbox, which is based on the GTSAM C++ library (Dellaert (2012), Dellaert and Kaess (2006)). Experiments are performed on a system with a 3.8-GHz Intel Core i77700HQ processor and 16 GB of RAM. 5. EXPERIMENTAL RESULTS 5.1 Formation control In this section we present the results of our method in a quantitative manner. We used the proposed approach to switch the positions of 2D holonomic circular robots with radius r = 1 m inside a formation in an obstacle free simulation environment. We did this for every possible permutation of positions inside a formation for cases of three, four and five agents and compared the success rate and computation time of our approach to the GPMP2 framework (Dong et al. (2016)) where each agent’s trajectory is planned individually at every time step. In three and five robot simulation the formation is a triangle, while in four robot simulation it’s a square. That means that we tested and compared algorithms on the total of 150 unique planning problems. When using the GPMP2 framework, for every agent the others were incorporated in its signed distance field (SDF), meaning that at every time step the SDF had to be changed and the trajectory had to be replanned. All trajectories were initialized as a constant-velocity straight line trajectory in configuration space. Total time of execution for every case is set as ttotal = 10 s and all trajectories were parametrized with 10 equidistant support states such that 9 points are interpolated between any two states (91 states effectively). The parameters used for GPMP2 are εobs = 2, Σobs = 0.3, while the parameters used for our approach are εmul = 15, Σmul = 0.7. It makes sense to use relatively small εobs in comparison to εmul since in GPMP2 the agents only react to each other locally, and in our approach it is desireable that every agent knows each other’s position at all times in order to calculate

IFAC SYROCO 2018 164 Budapest, Hungary, August 27-30, 2018

Luka Petrović et al. / IFAC PapersOnLine 51-22 (2018) 160–165

successful simultaneous motion of two agents in complex static environments. Our goal was to switch positions of two planar holonomic robots that start in two rooms which are separated by a narrow hallway. The radius of each robot is r = 1 m, and the width of the hallway is w = 3.6 m, which is smaller than two robots diameters combined, meaning that robots cannot pass each other in the hallway. Since the environment is complex, if the initial trajectory of each robot is set as a straight line, the optimization gets stuck in the local minimum and the collisions are not avoided. To solve that problem, we set the initial trajectories as the paths obtained by A∗ algorithm with constant velocities. When computing the path with the A∗ algorithm, we downsampled the grid representing the environment so that the initial path was obtained in acceptable time. Since the paths obtained by the A∗ algorithm are only used as priors in our method, possible loss of information about small sized obstacles when downsampling is not concerning.

Table 1. Comparison of success rates (percentages) for our method and replanning with GPMP2 for every possible permutation of positions inside formations of 3, 4 and 5 agents Number of robots 3 4 5

GPMP2 100 87.5 70.8

MUL-GPMP 100 100 100

Table 2. Comparison of average execution times (ms) for our method and replanning with GPMP2 for every possible permutation of positions inside formations of 3, 4 and 5 agents Number of robots 3 4 5

GPMP2 196 264 457

Replanning with GPMP2

15

15

10

10

y [m]

y [m]

Simultaneous optimization

5

0

5

0

−5 −15

MUL-GPMP 22 37 59

−5 −10

−5

0

5

10

15

x [m]

−15

−10

−5

0

5

10

15

x [m]

Fig. 2. Example of switching formation with the proposed simultaenous optimization and with replanning with GPMP2. The proposed algorithm produces visibly smoother trajectories due to its predictive nature. the trajectories that avoid collisions. The success rates comparison is showed in Table 1 and the computational times comparison is showed in Table 2. Since computing the SDF and replanning the trajectory is relatively computationally demanding in comparison to computing the multi-agent factor in (20), it is not surprising that our approach achieves significantly faster computational times. Incorporating current positions of other agents inside an agent’s SDF means that the GPMP2 framework, unlike the proposed method, can only react to the changes in the environment and not anticipate them. Thus the better success rates of our approach were also expected. Our method was able to successfully solve all of the given 150 motion planning problems, while the GPMP2 failed in total 38 cases. Due to the predictive nature of the proposed approach, the generated trajectories are also visibly smoother which is demonstrated in Fig. 2 for one example of five robot formation change. Note that for the case of replanning with GPMP2 better results could be achieved by parameter tuning, possibly using a grid search, but that is relatively computationally demanding and possibly unattainable in the real world circumstances. 5.2 Complex static environment This experiment shows qualitatively the capability of our approach to generate trajectories that can ensure 164

The described setup represents a challenging task for existing motion planning algorithms, and most of them would result in redundant motion; robots would move towards each other and meet in the middle of the hallway, where one of them would have to turn back to make way for the other. In our case, however, due to the fact that the proposed approach works in a predictive manner, after exiting the room one robot turns away from its goal and waits for the other one to enter its destination room before proceeding to travel to its goal. The downside of the proposed method is the number of the optimization hyperparameters that need to be set, for example static and multi-agent obstacle cost factor covariances Σobs and Σmul . In this specific example, the optimization hyperparameters Σobs , Σmul , εobs , εmul were tuned via exhaustive grid search. The described environment and the result of the conducted experiment are shown in Fig. 3. 6. CONCLUSION AND FUTURE WORK In this paper we have presented a fast trajectory optimization method for multi-agent motion planning. We considered each trajectory as a sample from a continuous time Gaussian process generated by linear time-varying stochastic differential equation driven by white noise. We formulated the multi-agent planning problem as probabilistic inference on a factor graph, and thus were able to exploit the structure of the mentioned GP to find the solution efficiently using numerical optimization. The proposed approach works in a predictive manner since each agent’s trajectory is optimized simultaneously. We tested our approach in simulation and compared it to planning each trajectory indvidually, demonstrating significant improvement in both success rate and computational efficiency. In future work, it would be interesting to investigate how different priors affect the result of optimization. Another potentially interesting direction would be to explore the possibility of planning robot’s trajectory in a dynamic environment. If the trajectory of a dynamic obstacle could be predicted, the avoidance of such obstacle would be achieved using the same concepts introduced in the proposed approach.

IFAC SYROCO 2018 Budapest, Hungary, August 27-30, 2018

Luka Petrović et al. / IFAC PapersOnLine 51-22 (2018) 160–165

t =6 [s] 15

10

10

5

5 0

−5 −10 −20

y [m]

15

10

0

−5 −15

−10

−5

0

5

10

−10 −20

15

x [m]

5 0 −5

−15

−10

t =18 [s]

−5

0

5

10

−10 −20

15

x [m]

15

10

10

10

5

y [m]

15

5

0

0

0

−5

−5

−10

−5

0

5

10

15

x [m]

−10 −20

−15

−10

−5

0

x [m]

5

−5

0

5

10

15

10

15

x [m]

5

−5 −15

−10

t =30 [s]

15

−10 −20

−15

t =24 [s]

y [m]

y [m]

t =12 [s]

15

y [m]

, y [m]

t =0 [s]

165

10

15

−10 −20

−15

−10

−5

0

5

x [m]

Fig. 3. The example of trajectories planned with our approach for two agents in a complex static environment consisting of rooms with narrow passage between them. Robots anticipate each other’s trajectories and one robot makes way for the other one. REFERENCES Barfoot, T.D., Tong, C.H., and S¨ arkk¨ a, S. (2014). Batch continuous-time trajectory estimation as exactly sparse gaussian process regression. In Robotics: Science and Systems. Dellaert, F. (2012). Factor graphs and gtsam: A handson introduction. Technical report, Georgia Institute of Technology. Dellaert, F. and Kaess, M. (2006). Square root sam: Simultaneous localization and mapping via square root information smoothing. The International Journal of Robotics Research, 25(12), 1181–1203. Dong, J., Boots, B., and Dellaert, F. (2017). Sparse gaussian processes for continuous-time trajectory estimation on matrix lie groups. arXiv preprint arXiv:1705.06020. Dong, J., Mukadam, M., Dellaert, F., and Boots, B. (2016). Motion planning as probabilistic inference using gaussian processes and factor graphs. In Proceedings of Robotics: Science and Systems (RSS-2016). Huang, E., Mukadam, M., Liu, Z., and Boots, B. (2017). Motion planning with graph-based trajectories and gaussian process inference. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, 5591– 5598. Kalakrishnan, M., Chitta, S., Theodorou, E., Pastor, P., and Schaal, S. (2011). Stomp: Stochastic trajectory optimization for motion planning. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, 4569–4574. Kschischang, F.R., Frey, B.J., and Loeliger, H.A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on information theory, 47(2), 498–519. Mari´c, F., Limoyo, O., Petrovi´c, L., Petrovi´c, I., and Kelly, J. (2018). Singularity avoidance as manipulability maximization using continuous time gaussian processes. arXiv preprint arXiv:1803.09493. 165

Mukadam, M., Dong, J., Dellaert, F., and Boots, B. (2017a). Simultaneous trajectory estimation and planning via probabilistic inference. In Proceedings of Robotics: Science and Systems (RSS). Mukadam, M., Dong, J., Yan, X., Dellaert, F., and Boots, B. (2017b). Continuous-time gaussian process motion planning via probabilistic inference. arXiv preprint arXiv:1707.07383. Mukadam, M., Yan, X., and Boots, B. (2016). Gaussian process motion planning. In Robotics and Automation (ICRA), 2016 IEEE International Conference on, 9–15. Rana, M.A., Mukadam, M., Ahmadzadeh, S.R., Chernova, S., and Boots, B. (2017). Towards robust skill generalization: Unifying learning from demonstration and motion planning. In Conference on Robot Learning, 109– 118. Ratliff, N., Zucker, M., Bagnell, J.A., and Srinivasa, S. (2009). Chomp: Gradient optimization techniques for efficient motion planning. In Robotics and Automation, 2009. ICRA’09. IEEE International Conference on, 489–494. Schulman, J., Duan, Y., Ho, J., Lee, A., Awwal, I., Bradlow, H., Pan, J., Patil, S., Goldberg, K., and Abbeel, P. (2014). Motion planning with sequential convex optimization and convex collision checking. The International Journal of Robotics Research, 33(9), 1251– 1270. Schulman, J., Ho, J., Lee, A.X., Awwal, I., Bradlow, H., and Abbeel, P. (2013). Finding locally optimal, collisionfree trajectories with sequential convex optimization. In Robotics: science and systems, volume 9, 1–10. Zucker, M., Ratliff, N., Dragan, A.D., Pivtoraiko, M., Klingensmith, M., Dellin, C.M., Bagnell, J.A., and Srinivasa, S.S. (2013). Chomp: Covariant hamiltonian optimization for motion planning. The International Journal of Robotics Research, 32(9-10), 1164–1193.