Zero-Sum Nonlinear Polynomial Game for Planar Robots Coordination


Preprints, 1st IFAC Conference on Modelling, Identification and Control of Nonlinear Systems, June 24-26, 2015. Saint Petersburg, Russia




IFAC-PapersOnLine 48-11 (2015) 463–468


Manuel Jiménez-Lizárraga ∗, Ricardo Chapa ∗, Celeste Rodriguez ∗, Pedro Castillo-Garcia ∗∗

∗ Department of Physical and Mathematical Sciences, Autonomous University of Nuevo Leon, San Nicolas, Nuevo Leon, Mexico (e-mail: manalejimenez@yahoo.com).
∗∗ UMI LAFMIA CNRS 3175, CINVESTAV-IPN, México, D.F.
(e-mail: [email protected]) (e-mail: [email protected]) (e-mail: [email protected]) (e-mail: [email protected]) Abstract: This paper proposes a for the coordination of planar mobile robots based Abstract: This paper proposes aa scheme scheme for the coordination of planar mobile robots based Abstract: This paper proposes scheme for the coordination of planar mobile robots based Abstract: This paper proposes a scheme for the coordination of planar mobile robots based on Zero-Sum differential game model. The robots’ dynamic are given in the polynomial Abstract: This paper proposes amodel. schemeThe for robots’ the coordination of planar mobile robots based on Zero-Sum differential game dynamic are given in the polynomial on Zero-Sum differential game model. The robots’ dynamic are given in the polynomial on Zero-Sum differential game model. The robots’ dynamic are given in the polynomial approximation of the nonlinear model, we explore the so-called state-dependent Riccati on Zero-Sum differential game model. The robots’ dynamic are given in the polynomial approximation of the nonlinear model, we explore the so-called state-dependent Riccati approximation ofa set theofnonlinear nonlinear model, we explore explore the so-called so-called state-dependent Riccati approximation of the model, we the state-dependent Riccati equations to find strategies that guarantee the effectively drive the robots to aa particular approximation ofa set theofnonlinear model, we explore the so-called state-dependent Riccati equations to find strategies that guarantee the effectively drive the robots to particular equations to find a set of strategies that guarantee the effectively drive the robots to aameanwhile particular equations to find a set of strategies that guarantee the effectively drive the robots to particular formation. 
One of the robots acts as the leader following a free designed trayectory equations toOne findofa the set of strategies that guarantee the effectively drive the trayectory robots to ameanwhile particular formation. robots acts as the leader following a free designed formation. One of thejust robots acts as the leader following aa free designed trayectory meanwhile formation. One of robots acts as the following designed trayectory meanwhile the rest of the follow him. proposed solution is given as aa polynomial formation. Onerobots of the thejust robots acts as The the leader leader following a free free designed trayectoryRiccati-like meanwhile the rest of the robots follow him. The proposed solution is given as polynomial Riccati-like the rest of the robots just follow him. The proposed solution is given as a polynomial Riccati-like the rest of the robots just follow him. The proposed solution is given as a polynomial Riccati-like state-dependent differential equation which utilizes a p-linear form tensor representation for its the rest of the robots just follow him. The proposed solution is given as a polynomial Riccati-like state-dependent differential equation which utilizes a p-linear form tensor representation for state-dependent differential equation which utilizes aatop-linear p-linear form tensor representation representation for its its state-dependent differential equation which utilizes form tensor for its polynomial part. A numerical example is presented illustrate effectiveness of the approach. state-dependent differential equation which utilizes a p-linear form tensor representation for its polynomial part. A numerical example is presented to illustrate effectiveness of the approach. polynomial part. A numerical example is presented to illustrate effectiveness of the approach. polynomial part. A numerical example is presented to illustrate effectiveness of the approach. polynomial part. 
A numerical example is presented to illustrate effectiveness of the approach. © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Keywords: Zero-Sum Differential Games, Mobile Robots, Nonlinear Control. Keywords: Keywords: Zero-Sum Zero-Sum Differential Differential Games, Games, Mobile Mobile Robots, Robots, Nonlinear Nonlinear Control. Control. Keywords: Keywords: Zero-Sum Zero-Sum Differential Differential Games, Games, Mobile Mobile Robots, Robots, Nonlinear Nonlinear Control. Control. 1. INTRODUCTION that can be developed for this kind of problems. In the 1. INTRODUCTION that can be developed for this kind of problems. In the 1. INTRODUCTION INTRODUCTION that can can be developed developed for (2012) this kind kind of problems. problems. In the the 1. that be for this of In work of Goode and Roan a game for two agents 1. INTRODUCTION that can be developed for this kind of problems. In the work of Goode and Roan (2012) a game for two agents work collision of Goode Goodeavoidance and Roan Roanis (2012) (2012) a game game for to twofind agents work of and a for two agents with outlined the tool the work of Goode and Roan (2012) a game for two agents with collision avoidance is outlined the tool to find the In the last years a significant interest in the application with collision avoidance is outlined the tool to find the In the last years a significant interest in the application with collision avoidance is outlined the tool to find the collision avoidance control is the Hamilton-Jacobi-Isaacs In the the last last years years significantgames interest in the application application with collision avoidance is is outlined the tool to find the In aa interest in the collision avoidance control the Hamilton-Jacobi-Isaacs of of differential to a variety of engiIn the theory last years a significant significantgames interest in the application collision avoidance control is the Hamilton-Jacobi-Isaacs of theory of 
differential to a variety of engicollision avoidance control is the (HJI) equation and an on line to rapidly find such of the the theory of differential differential games to aa autonomous variety of of engicollision avoidance control is method the Hamilton-Jacobi-Isaacs Hamilton-Jacobi-Isaacs of of games to variety (HJI) equation and an on line method to rapidly find such neering fields such as electricity market, veof the theory theory of differential games to a autonomous variety of engiengi(HJI) equation equation and an an on on linepartitions method to toinrapidly rapidly findspace such neering fields such as electricity market, ve(HJI) and line method find such solution that operates over the state neering fields such as electricity market, autonomous ve(HJI) equation and an on line method to rapidly find such neering fields such as electricity market, autonomous vesolution that operates over partitions in the state space hicles, satellite clustering, enviromental problems etc., has neeringsatellite fields such as electricity market,problems autonomous ve- solution solution that operates over partitions in the state space hicles, clustering, enviromental etc., has that operates over partitions in the state space is presented. The zero-sum games, particulary, has been hicles, satellite clustering, enviromental problems etc., has solution that operates over partitions in the state space hicles, satellite clustering, enviromental problems etc., has is presented. The zero-sum games, particulary, has been been see Pant et al. (2002), Gu (2008), Goode hicles,increased, satellite clustering, enviromental problems etc., has is is presented. presented. The zero-sum games, particulary,situations, has been been been increased, see Pant et al. (2002), Gu (2008), Goode The zero-sum games, particulary, has an important tool to modeling pursit-evasion beenRoan increased, seeCruz Pantand et al. al. (2002), (2009), Gu (2008), (2008), Goode is presented. 
The zero-sum games, particulary,situations, has been been increased, see Pant et (2002), Gu Goode an important tool to modeling pursit-evasion and (2012), Xiaohuan Maler and beenRoan increased, seeCruz Pantand et al. (2002), (2009), Gu (2008), Goode an important tool to modeling pursit-evasion situations, and (2012), Xiaohuan Maler and an tool to pursit-evasion situations, in the case of Unmanned Vehicles (UAVs) the andZeeuw Roan (1998). (2012), Such Cruzan and Xiaohuan (2009), Maler and an important important tool to modeling modelingAerial pursit-evasion situations, and Roan (2012), Cruz and Xiaohuan (2009), Maler and and in the case of Unmanned Aerial Vehicles (UAVs) the de interest because they present aa and andZeeuw Roan (1998). (2012), Such Cruzan and Xiaohuan (2009), Maler and and in in the the control case of of problem Unmanned Aerial Vehicles (UAVs) thea de interest because they present and case Unmanned Aerial Vehicles (UAVs) the formation of a set of UAVs is viewed as de Zeeuw (1998). Such an interest because they present a and in the case of Unmanned Aerial Vehicles (UAVs) thea de Zeeuw (1998). Such an interest because they present aa formation control problem of a set of UAVs is viewed as suitable framework to model strategic dynamic interaction de Zeeuw (1998). Such an interest because they present formation control problem of a set of UAVs is viewed as suitable framework to model strategic dynamic interaction formation control problem of a set of UAVs is viewed as aa Pursuit Game of n pursuers and n evaders. Stability of the suitable framework to model strategic dynamic interaction formation control problem of a set of UAVs is viewed as a suitable framework to model strategic dynamic interaction Pursuit Game of n pursuers and n evaders. Stability of the between different agents (or prayers). theory was inisuitable framework to model strategic This dynamic interaction Pursuit Game of n pursuers and n evaders. 
Stability of the between different agents (or prayers). This theory was iniPursuit Game of n pursuers and n evaders. Stability of the coordination of the UAVs is guaranteed if each one of them between different agents (or prayers). prayers). This theory was iniiniPursuit Game of n pursuers and n evaders. Stability of the between different agents (or This theory was coordination of the UAVs is guaranteed if each one of them tiated in the works of Isaacs (1965): he focused mainly on between different agents (or prayers). This theory was inicoordination of the the UAVs is is guaranteed if each each onetime of them them tiated in works of Isaacs (1965): he focused mainly on of UAVs if of can reach their destinations within aa specified and tiated in in the the works of Isaacs Isaacs (1965): henonzero-sum focused mainly mainly on coordination coordination of the UAVs is guaranteed guaranteed if each one onetime of them tiated works of (1965): he focused on can reach their destinations within specified and zero-sum games (ZSG). Later on, the differtiated in the the works of Isaacs (1965): henonzero-sum focused mainly on can can reach their destinations within a specified time and zero-sum games (ZSG). Later on, the differreach their destinations within a specified time and assuming that the destination points are fixed the vehicles zero-sum games (ZSG). Later on, the nonzero-sum differcan reach their destinations within a specified time and zero-sum games (ZSG). Later on, the nonzero-sum differassuming that the destination points are fixed the vehicles ential games were introduced in Starr and Ho (1969a), and zero-sum games (ZSG). Later on, the nonzero-sum differassuming that the destination points are fixed the vehicles ential games were introduced in Starr and Ho (1969a), and assuming that the destination points are fixed the vehicles achieved them in an optimally see Camponogara et al. 
ential games were introduced in Starr and Ho (1969a), and assuming that the destination points are fixed the vehicles ential games were introduced in Starr and Ho (1969a), and achieved them in an optimally see Camponogara et al. (1969b). In such each player looks ential games weregames, introduced in Starr andfor Hominimization (1969a), and achieved achieved them in an optimally see Camponogara et al. (1969b). In such games, each player looks for minimization them in an optimally see Camponogara et al. (2002), Vachtsevanos et al. (2004), Tomlin et al. (1998). (1969b). In such such games,criterion. each player player looks for minimization minimization achieved them in an optimally see Camponogara et al. (1969b). In games, each looks for (2002), Vachtsevanos et al. (2004), Tomlin et al. (1998). of his own individual The paper of Starr and (1969b). In such games,criterion. each player looks for minimization (2002), Vachtsevanos et al. (2004), Tomlin et al. (1998). of his own individual The paper of Starr and (2002), Vachtsevanos et al. (2004), Tomlin et al. (1998). of his his own individual individual criterion. The paper paper of Starr Starr of and (2002), Vachtsevanos et al. (2004), Tomlin et al. (1998). of own criterion. The of and Ho (1969b) derived sufficient conditions of existence a In this paper we take the zero-sum game approach for the of his own individual criterion. 
The paper of Starr of and In this paper we take the zero-sum game approach for the Ho (1969b) derived sufficient conditions of existence Ho (1969b) (1969b) derived sufficientforconditions conditions of existence existence of aaa In In this this paper of weatake take theplanar zero-sum game approach for is thea Ho derived sufficient of of linear feedback equilibrium a finite planning horizon, we the zero-sum game approach for the set of robots in which there Ho (1969b) derived sufficientforconditions of existence of a coordination In this paper paper of weatake theplanar zero-sum game approach for is thea linear feedback equilibrium a finite planning horizon, coordination set of robots in which there linear feedback equilibrium for a finite planning horizon, coordination of a set of planar robots in which there is a linear feedback equilibrium for a finite planning horizon, but only in the case of linear games governed of of planar in there is participant “free such participant linear feedback equilibrium for quadratic a finite planning horizon, coordination coordinationwith of aaaa set set ofmovement”, planar robots robots in aawhich which there will is aa but only in the case of linear quadratic games governed participant with “free movement”, such participant will but only in the case of linear quadratic games governed participant with a “free movement”, such a participant will but only in the case of linear quadratic games governed by linear dynamics and quadratic criterion, see Engwerda participant with a “free movement”, such a participant will consider the leader the group of robots, and has the butlinear only dynamics in the case of quadratic linear quadratic games governed be participant a “freeof such a participant by and criterion, see Engwerda be consider with the leader leader ofmovement”, the group group of of robots, and has has will the by linear linear dynamics and quadratic criterion, see see Engwerda be consider the of the robots, and the by 
dynamics and quadratic criterion, Engwerda (2005) for a detailed survey and Engwerda (1998), Basar be consider the leader of the group of robots, and has the characteristic that we can design arbitrary it’s trayectory, by linear dynamics and quadratic criterion, see Engwerda be consider the leader of the group of robots, and has the (2005) for a detailed survey and Engwerda (1998), Basar characteristic that we can design arbitrary it’s trayectory, (2005) for a detailed survey and Engwerda (1998), Basar characteristic that we can design arbitrary it’s trayectory, (2005) for a detailed survey and Engwerda (1998), Basar and Olsder et al. (1999). In the field of characteristic that we design arbitrary it’s trayectory, the rest of the player will belong to group pursuers (2005) for a(1999), detailedEngwerda survey and Engwerda (1998), Basar characteristic we can can it’s of trayectory, and Olsder (1999), Engwerda et al. (1999). In the field of the rest rest of of the thethat player will design belongarbitrary to aaa group group of pursuers and Olsder Olsder (1999), Engwerdahas et been al. (1999). (1999). In the the field of the the player will belong to of pursuers and (1999), Engwerda et al. In field of robotics several approaches proposed to achive rest of the player will belong to a group of pursuers of evaders, all of them finally following the free designed and Olsder (1999), Engwerda et al. (1999). In the field of the rest of the player will belong to a group of pursuers robotics several approaches has been proposed to achive of evaders, all of them finally following the free designed robotics several approaches has been proposed to achive of evaders, of allthe of them them finally following thegame. 
free designed designed robotics several approaches has been proposed to a colective goal by a of robots (see Camponogevaders, all of finally following the free trayectroy leader robot during the Because several approaches has been proposed to achive achive of of evaders, of allthe of them thegame. free designed arobotics colective goal by aa group group of robots (see Camponogtrayectroy leaderfinally robot following during the the Because a colective colective goal by by group ofMurray robots (2006) (see CamponogCamponogtrayectroy of the leader robot during game. Because aara goal a group of robots (see et al. (2002), Dubnar and and Desai trayectroy of the leader robot during the game. Because we are dealing a with zero-sum game a single cost function a colective goal by a group ofMurray robots (2006) (see Camponogtrayectroy of the leader robot game duringa single the game. Because ara et al. (2002), Dubnar and and Desai we are are dealing dealing a with with zero-sum cost function function ara et al. (2002), Dubnar and Murray (2006) and Desai we a zero-sum game a single cost ara et al. (2002), Dubnar and Murray (2006) and Desai et and based on differential games we find we are aofwith zero-sum game athe single cost incorporate all the information of cohesion interest araal. et (1998)), al. (2002), Dubnar and Murray (2006) and Desai we are dealing dealing game single cost function function et al. (1998)), and based on differential games we find incorporate allaof ofwith the zero-sum information of athe the cohesion interest et al. al. (1998)), (1998)), and based some on differential differential gamesthe we work find incorporate incorporate all the information of cohesion interest et and based on games we find several works, to mention recent papers: all of the information of the cohesion interest of each participant, all of the desired distantances between et al. 
(1998)), and based on differential games we find incorporate all of the information of the cohesion interest several works, to mention some recent papers: the work of each participant, all of the desired distantances between several works, to mention some recent papers: the work of each participant, all of of the the game desired distantances between several works, to some recent papers: work of Gu (2008) a coordination based on of each participant, all desired distantances between the robots are included. is posed to an standard several works,proposes to mention mention somefor recent papers: the the work of participant, all ofThe the game desired between of Gu (2008) proposes aa scheme scheme for coordination based on theeach robots are included. included. The is distantances posed to to an an standard standard of Gu Gu loop (2008) proposes scheme for coordination based on the the robots are The game is posed of (2008) proposes a scheme for coordination based on open Nash strategies, each robot is given a desired robots are included. The game is posed to an standard from with a particular design of weighted matrices. A of Gu loop (2008) proposes a scheme for coordination based on the robots are included. The game is posed to an standard open Nash strategies, each robot is given a desired from with a particular design of weighted matrices. A open loop loop to Nash strategies, each robot is is given given desired from with a particular particular design ofequation weightedis matrices. matrices. A open Nash strategies, each robot aa desired trayectory follow to achieve a partitular formation, set from with a design of weighted A single reduced dimension Ricatti required to open loop Nash strategies, each robot is given a desired from with a particular design of weighted matrices. 
A trayectory to follow to achieve a partitular formation, set single reduced dimension Ricatti equation is required to trayectory to follow follow to to achieve achieve a partitular partitularare formation, set single reduced dimension Ricatti equation is required to trayectory to formation, set of N Riccati-type to be solve reduced dimension Ricatti equation is required to be solve to find the solution controls, we consider that trayectory to followdifferential to achieve aaequations partitularare formation, set single single reduced Ricatti equation required to of N Riccati-type differential equations to be solve be solve solve to find finddimension the solution solution controls, we is consider that of N Nthe Riccati-type differential equations are to toof be be solve be be to the controls, we consider that of Riccati-type differential equations are solve find robot controls. In the extensive work LaValle solve to find the solution controls, we consider that this reduce the complexity of solving a set of N coupled of Nthe Riccati-type differential equations are toof be solve be solve to find the solution controls, we consider that find robot controls. In the extensive work LaValle this reduce the complexity of solving a set of N coupled find the the robot robot controls. foundation In the the extensive work of LaValle LaValle this reduce the the complexity of solving solving set of of N N coupled coupled find controls. In work of (2000) theoretical for the robot motion this reduce complexity of aa set Ricatti-type equations in Nash game. find thethe robot controls. foundation In the extensive extensive work of LaValle this reduce the complexity of solving (2000) the theoretical for the robot motion Ricatti-type equations in aaa Nash Nash game.a set of N coupled (2000) the theoretical foundation for the robot motion Ricatti-type equations in game. 
(2000) the theoretical foundation for the robot motion planing based on game theory is outlined, the authors (2000) the theoretical foundation for the robot motion Ricatti-type Ricatti-type equations equations in in aa Nash Nash game. game. planing based on game theory is outlined, the authors planing abased based onofgame game theory is is outlined, outlined, the authors planing on theory the authors present survey the methods analysis and sintesis planing abased onofgame theory isof outlined, the authors present survey the methods of analysis and sintesis present a survey of the methods of analysis and sintesis present present aa survey survey of of the the methods methods of of analysis analysis and and sintesis sintesis

2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.09.229


The second contribution of the paper is to include the nonlinear dynamics of the robots. This is important because, in many situations, a linear model cannot capture all of the required movements: the linear approximation is valid only within a region of the state space. We therefore need to extend the approach to a certain class of nonlinear systems, namely polynomial systems. Polynomial dynamics represent an important class of nonlinear dynamical systems, since they can approximate a large variety of intrinsically nonlinear functions while keeping the complexity at a manageable and pre-specified level. Compared to the linear-quadratic case, there are not many works on zero-sum nonlinear differential games applied to robot coordination and, in particular, to the best of the authors' knowledge, no results have been obtained for zero-sum polynomial differential games. An example of a recent paper on nonlinear games is the work presented in Zhang et al. (2011), which proposes an iterative adaptive dynamic programming method to solve a particular type of game, the two-player zero-sum game; a near Nash equilibrium solution for polynomial nonlinear games is presented in the work of Jimenez-Lizarraga et al. (2015). In this paper, we take advantage of the so-called State-Dependent Riccati Equations (SDRE) approach, Mracek and Cloutier (1998), Çimen (2008), Basin and Calderon-Alvarez (2009), in the zero-sum nonlinear polynomial game and derive a set of controls for each participant robot that leads to an open-loop approximated solution. For the one-player optimization problem (optimal control), the SDRE method has been proven to work well in many particular situations, providing a simple procedure for designing feedback controls; see Mracek and Cloutier (1998), Çimen (2008), Basin and Calderon-Alvarez (2009). However, the general-case solution is near-optimal, that is, the SDRE approach leads only to an approximate result.
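The SDRE recipe mentioned above can be sketched in a few lines: write the polynomial dynamics in state-dependent linear form ẋ = A(x)x + Bu and, at each instant, solve the algebraic Riccati equation frozen at the current state. The system, weights, and step size below are hypothetical, a minimal one-player illustration rather than the paper's game setting:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def A_of_x(x):
    # Cubic drift written in state-dependent linear form:
    # x1' = x2,  x2' = -x1 - x1**3 - 0.1*x2  =>  A(x) carries the x1**2 term.
    return np.array([[0.0, 1.0],
                     [-1.0 - x[0] ** 2, -0.1]])

B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # state weight (chosen arbitrarily)
R = np.array([[1.0]])    # control weight (chosen arbitrarily)

def sdre_control(x):
    # Solve the algebraic Riccati equation frozen at the current state.
    P = solve_continuous_are(A_of_x(x), B, Q, R)
    return -np.linalg.solve(R, B.T @ P @ x)

# Crude Euler simulation from an arbitrary initial state.
x, dt = np.array([1.0, 0.0]), 1e-3
for _ in range(20000):
    x = x + dt * (A_of_x(x) @ x + B @ sdre_control(x))
print(np.linalg.norm(x))   # state norm after 20 s; should be near zero
```

Re-solving the Riccati equation at every step is wasteful; practical SDRE implementations update the gain at a slower control rate, but the structure of the computation is the same.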
Nevertheless, fast convergence of the obtained solution to the optimal one, a feedback form for the equilibrium controls, and numerical feasibility make the SDRE approach a valuable method.

The rest of the article is organized as follows: Section 2 presents the polynomial game description, the problem statement, and the basic assumptions, as well as the conversion of the problem to the standard form of zero-sum games. Section 3 describes the zero-sum game with the free-movement participant, in which the near-optimal solution is presented in the form of feedback controls; the general cost function for the desired cohesion and separation of the robots is also given here. A numerical example with mobile robots is presented in Section 4. Section 5 concludes this study.

2. PROBLEM STATEMENT

For simplicity, consider a group of three robots to coordinate (later on we will show that this approach can handle any number of robots), and let us give them the following roles: the leader robot, follower 1, and follower 2, each one with an n-dimensional coordinate vector, (z_l, z_{f_1}, z_{f_2}) \in R^n, and an independent dynamic governed by the differential equations:

\dot{z}_{f_1}(t) = F_1(t, z_{f_1}) + B_{f_1}(z_{f_1}) u_{f_1}(t),
\dot{z}_{f_2}(t) = F_2(t, z_{f_2}) + B_{f_2}(z_{f_2}) u_{f_2}(t),        (1)
\dot{z}_l(t) = F_3(t, z_l) + B_l(z_l) u_l(t),

with z_{f_1}(t_0) = z_{f_1 0}, z_{f_2}(t_0) = z_{f_2 0}, z_l(t_0) = z_{l0},

where (u_{f_1}, u_{f_2}, u_l) are the controls of each robot, which vary within a given region U \subset R^m, and (B_{f_1}(z_{f_1}), B_{f_2}(z_{f_2}), B_l(z_l)) \in R^{n \times m} are the control matrices, which are allowed to be state dependent. We consider the nonlinear functions F_k(t, z_t) as polynomials of n variables, the components of the state vectors z_t(t) \in R^n (t = f_1, f_2, l). This requires a special definition of the polynomial for degrees greater than 1, which can approximate the nonlinear dynamics of each robot better than a linear model. Following the ideas in Basin et al. (2006), a p-degree polynomial of a vector z_t(t) \in R^n is regarded as a p-linear form of the n components of z_t(t), such that:

F_k(t, z) = A_{k0} + A_{k1}(t) z + A_{k2}(t) z * z + \cdots + A_{ks}(t) z * \cdots (s times) \cdots * z,  k = 1, 2, 3,        (2)

where A_{k0} is a vector of dimension n, A_{k1} is a matrix of dimension n \times n, A_{k2} is a 3D tensor of dimension n \times n \times n, and A_{ks} is an (s+1)D tensor of dimension n \times \cdots (s+1 times) \cdots \times n; z * \cdots (s times) \cdots * z is an sD tensor of dimension n \times \cdots (s times) \cdots \times n, obtained by s-fold spatial multiplication of the vector z by itself. Now consider the extended dynamics:

\dot{z}(t) = f(t, z) + B_1(z) u_1(t) + B_2(z) u_2(t) + B_3(z) u_3(t),  z(t_0) = z_0,        (3)

where z(t) := (z_{f_1}^\top(t), z_{f_2}^\top(t), z_l^\top(t))^\top, B_1 := (B_{f_1}^\top, 0, 0)^\top, B_2 := (0, B_{f_2}^\top, 0)^\top, B_3 := (0, 0, B_l^\top)^\top, and the extended polynomial term is:

f(t, z) := A_0 + A_1(t) z + A_2(t) z * z + \cdots + A_s(t) z * \cdots (s times) \cdots * z,        (4)

where

A_i := \begin{pmatrix} A_{1i} & 0 & 0 \\ 0 & A_{2i} & 0 \\ 0 & 0 & A_{3i} \end{pmatrix};  i = 0, 1, \ldots, s.        (5)

We can consider that the state vector, containing the position and angular information, is divided into position and velocity subvectors, that is:

z_t = (x_t^\top, \varphi_t^\top, \dot{x}_t^\top, \dot{\varphi}_t^\top)^\top,  (t = f_1, f_2, l).        (6)

In this work we deal with planar robots; therefore we fix the dimension of the position vector to 2, x_t \in R^2.
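As an illustration of the p-linear form (2), a degree-2 polynomial vector field can be stored as a vector A0, a matrix A1, and a 3D tensor A2, and evaluated by tensor contraction. The dimensions and coefficients below are made-up examples, not taken from the paper:

```python
import numpy as np

n = 2
A0 = np.zeros(n)                          # constant term
A1 = np.array([[0.0, 1.0],
               [-1.0, -0.1]])             # linear term
A2 = np.zeros((n, n, n))                  # quadratic (bilinear-form) term
A2[1, 0, 0] = -1.0                        # contributes -z1**2 to component 2

def F(z):
    # Contracting A2 twice with z implements the spatial product "z * z".
    return A0 + A1 @ z + np.einsum('ijk,j,k->i', A2, z, z)

print(F(np.array([2.0, 3.0])))   # -> [ 3.  -6.3]
```

The block-diagonal extended coefficients in (5) stack three such sets of tensors along the diagonal; for the matrix terms A_{k1} this is exactly what scipy.linalg.block_diag produces.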
The proposed coordination scheme, with specified cohesion and separation for the robots, consists of a leader robot that is programmed with a given known trajectory x_l(t) (to that end, only the position components of the vector z(t) are taken here, and with this we can set B_3 = 0); robot one (f_1) then approaches the leader, and robot two (f_2) approaches follower one, up to some distance. The minimization error of follower one is then defined as the norm \|x_{f_1}(t) - x_l(t) - \theta_1\|^2, where \theta_1 is the desired distance between the robots. Notice that the previous expression can be rewritten as:

‖xf1(t) − xl(t) − θ1‖² = x'Q1x   (7)

and ‖xf2(t) − xf1(t) − θ2‖² = x'Q2x for matrices Q1 and Q2 (the offsets θj additionally produce linear and constant terms, collected below in (9)). The whole situation can be represented as a zero-sum game with a free moving participant or target, see Dongxu and Cruz (2011), Levinson et al. (2002), where f1 wants to minimize and f2 wants to maximize the following cost function:
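With the offset θ1 dropped, the identity in (7) is exact and easy to check numerically for the stacked position vector x = (xf1, xf2, xl); the block pattern below is an illustration built from that expansion (the concrete numbers are assumptions):

```python
import numpy as np

I2 = np.eye(2)
Z2 = np.zeros((2, 2))

# stacked position vector x = (x_f1, x_f2, x_l), each position in R^2
x_f1 = np.array([1.0, 2.0])
x_f2 = np.array([0.0, -1.0])
x_l  = np.array([3.0, 1.0])
x = np.concatenate([x_f1, x_f2, x_l])

# block matrix encoding ||x_f1 - x_l||^2 as x' Q1 x
Q1 = np.block([[ I2, Z2, -I2],
               [ Z2, Z2,  Z2],
               [-I2, Z2,  I2]])
quad = x @ Q1 @ x
```

Including θ1 expands the square into x'Q1x − 2θ1'(xf1 − xl) + ‖θ1‖², which is why the full cost (9) carries the extra linear term S'x and constant term T.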

MICNON 2015 June 24-26, 2015. Saint Petersburg, Russia Manuel Jiménez-Lizárraga et al. / IFAC-PapersOnLine 48-11 (2015) 463–468

J(u1, u2, x0) = ∫_0^T ( u1'(τ)u1(τ) − u2'(τ)u2(τ) + κ1‖xf1(τ) − xl(τ) − θ1‖² − κ2‖xf2(τ) − xf1(τ) − θ2‖² ) dτ
 + κ1‖xf1(T) − xl(T) − θ1‖² − κ2‖xf2(T) − xf1(T) − θ2‖²   (8)

with some weighting parameters κ1, κ2. Define the constant distance vectors θ1, θ2 as:

θj = (εj/2  εj/2  ···  εj/2)' ∈ Rn;  j = 1, 2

The cost function (8), using the representation (7), becomes:

J(u1, u2, x0) = ∫_0^T ( u1'(τ)u1(τ) − u2'(τ)u2(τ) + x'Q̄x + S'x + T ) dτ + x'(T) Qf x(T) + S'x(T) + T   (9)

The separation distance vectors impose a cost function that includes linear and constant terms, which generalizes the standard zero-sum cost, see Basar and Olsder (1999). The corresponding matrices are:

Q̄(κ1, κ2) =
[ (κ1−κ2)   0        κ2       0      −κ1      0
   0       (κ1−κ2)   0        κ2      0      −κ1
   κ2       0       −κ2       0       0       0
   0        κ2       0       −κ2      0       0
  −κ1       0        0        0       κ1      0
   0       −κ1       0        0       0       κ1 ]   (10)

in the same way, the weighting matrix for the terminal terms is:

Qf(κ1, κ2) =
[ (κ1−κ2)   0        κ2       0      −κ1      0
   0       (κ1−κ2)   0        κ2      0      −κ1
   κ2       0       −κ2       0       0       0
   0        κ2       0       −κ2      0       0
  −κ1       0        0        0       κ1      0
   0       −κ1       0        0       0       κ1 ]   (11)

and the matrices associated with the linear and constant terms are:

S(κ1, κ2) = (√2/2) (−κ1ε1−κ2ε2  −κ1ε1−κ2ε2  κ2ε2  κ2ε2  κ1ε1  κ1ε1)'
T(κ1, κ2) = κ1ε1 + κ2ε2   (12)

Theorem 1. For the two-player zero-sum nonlinear polynomial game given in (3) (B3 ≡ 0), with cost function (9) and matrices (10) and (11), the game admits a near saddle point solution given by:

u1*(t, z) = −B1'(z)(P(z)z + p(z))
u2*(t, z) = −B2'(z)(P(z)z + p(z))   (13)

where the matrix P(z) is the solution of the following state-dependent Riccati-like matrix differential equation:

−Ṗ(z) = P(z)[A1(t) + A2(t)z + ··· + As(t) z ∗ ··· (s−1) times ··· ∗ z]
 + [A1(t) + A2(t)z + ··· + As(t) z ∗ ··· (s−1) times ··· ∗ z]' P(z)
 + Q̄ + P(z)(B1(z)B1'(z) − B2(z)B2'(z))P(z);  P(z, T) = Qf   (14)

and the linear state-dependent vector equation for p(z) satisfies:

−ṗ(z) = P[B1(z)B1'(z) + B2(z)B2'(z)] p(z) + S
 + [A1(t) + A2(t)z + ··· + As(t) z ∗ ··· (s−1) times ··· ∗ z]' p(z)   (15)

Remark 2. Eq. (14) is a so-called State-Dependent Riccati Equation (SDRE), which has been extensively used in one-player optimization problems, see Mracek and Cloutier (1998), Çimen (2008), Basin and Calderon-Alvarez (2009). To the best of the authors' knowledge, this approach has not been applied to zero-sum differential games. The main point in finding the solution to (14) and (15) is that those equations must be solved along the state trajectories; a numerical approach, such as shooting, Bryson and Ho (1975), is convenient for this purpose.
Remark 3. The condition to obtain the set of controls (13) is that the co-state vector satisfies the equation ψ(t) = P(x)x. However, in the matrix case, as pointed out for the one-player optimization problem, this approach provides only a near-optimal solution, because the second condition of the maximum principle can be satisfied by a linear-in-state form of ψ(t) only asymptotically. This is the reason why the controls in feedback form (13) can yield only near saddle point solutions. Nevertheless, the trajectories generated by (13) are close enough to those generated by the conditions of optimality; this fact is also observed in the numerical simulations in Section 4, see also Mracek and Cloutier (1998).
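As a rough illustration of Remark 2 (not the authors' implementation), one can freeze the state-dependent coefficients along a nominal trajectory, integrate a Riccati equation backward with a simple Euler scheme, and replay the feedback controls forward. The toy system, the drift factorization `A_of`, and all numbers below are assumptions; the quadratic term is taken with the standard zero-sum sign convention −P(B1B1' − B2B2')P, and the maximizer's control sign follows that same convention:

```python
import numpy as np

def riccati_rhs(P, A, Q, S):
    # frozen-coefficient game Riccati, standard zero-sum convention:
    # -Pdot = P A + A' P + Q - P S P,  with S = B1 B1' - B2 B2'
    return -(P @ A + A.T @ P + Q - P @ S @ P)

def solve_sdre_game(A_of, B1, B2, Q, Qf, z0, T, N=2000):
    """Naive sweep: backward Riccati integration along a nominal
    trajectory, then forward replay of the feedback controls."""
    dt = T / N
    S = B1 @ B1.T - B2 @ B2.T
    # nominal trajectory from the uncontrolled dynamics z' = A(z) z
    zs = [z0]
    for _ in range(N):
        zs.append(zs[-1] + dt * (A_of(zs[-1]) @ zs[-1]))
    # backward sweep for P(t), terminal condition P(T) = Qf
    Ps = [None] * (N + 1)
    Ps[N] = Qf
    for k in range(N, 0, -1):
        Ps[k - 1] = Ps[k] - dt * riccati_rhs(Ps[k], A_of(zs[k]), Q, S)
    # forward pass with the near saddle point feedback controls
    z = z0.copy()
    for k in range(N):
        u1 = -B1.T @ Ps[k] @ z   # minimizer
        u2 =  B2.T @ Ps[k] @ z   # maximizer (sign per the standard convention)
        z = z + dt * (A_of(z) @ z + B1 @ u1 + B2 @ u2)
    return z, Ps[0]

# toy 2-state polynomial system: f(t, z) = A(z) z (illustrative numbers)
A_of = lambda z: np.array([[0.0, 1.0],
                           [-1.0 - 0.1 * z[0] ** 2, -0.5]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.0], [0.3]])
zT, P0 = solve_sdre_game(A_of, B1, B2, np.eye(2), np.eye(2),
                         np.array([1.0, 0.0]), T=2.0)
```

A shooting method, as the remark suggests, would iterate this backward-forward sweep until the nominal and closed-loop trajectories agree; the single sweep above is only the first such iteration.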

3. ZERO-SUM GAME WITH A FREE MOVEMENT

In this section we consider a ZSG involving a participant that describes a known trajectory. In our robot coordination setting such a participant plays the leader role. Assume, then, that the leader moves in R2 according to a predesigned trajectory xl. This motion is known to the rest of the participants. Because the leader robot's movement is known, its control is also known (this can be achieved with another, independently designed control); this makes u3 = 0, and the game is played by robots one and two. Notice that the cost function (8) can be modified in order to include desired distances between robot follower 2 and the leader; that is, (8) can be rewritten as:

J(u1, u2, x0) = ∫_0^T ( u1'(τ)u1(τ) − u2'(τ)u2(τ) + κ1‖xf1(τ)−xl(τ)−θ1‖² − κ3‖xf2(τ)−xl(τ)−θ2‖² − κ2‖xf2(τ)−xf1(τ)−θ3‖² ) dτ
 + κ1‖xf1(T)−xl(T)−θ1‖² − κ3‖xf2(T)−xl(T)−θ2‖² − κ2‖xf2(T)−xf1(T)−θ3‖²   (16)

which can also be transformed to the standard cost function form (9), with the corresponding matrix Q̄́ given as:


Q̄́(κ1, κ2, κ3) =
[ (κ1−κ2)   0        κ2        0       −κ1       0
   0       (κ1−κ2)   0         κ2       0       −κ1
   κ2       0      −(κ2+κ3)    0        κ3       0
   0        κ2       0       −(κ2+κ3)   0        κ3
  −κ1       0        κ3        0      (κ1−κ3)    0
   0       −κ1       0         κ3       0      (κ1−κ3) ]   (17)

with the corresponding control matrices and polynomial dynamics equal to (3). The cost function of the game is given by (16), with the weighting matrix (17). The near saddle point solution for this problem is given by Theorem 1; that is, the controls become (the argument z is omitted for brevity):

u1*(t, z) = −B1'(Pz + p);  u2*(t, z) = B2'(z)(P(z)z + p(z))

with P and p being the solutions of the state-dependent equations:

−Ṗ = P[A1(t) + A2(t)z + ··· + As(t) z ∗ ··· (s−1) times ··· ∗ z]
 + [A1(t) + A2(t)z + ··· + As(t) z ∗ ··· (s−1) times ··· ∗ z]' P
 + Q̄́ + P(B1B1' − B2B2')P;  P(z, T) = Qf

and

−ṗ = P[B1B1' + B2B2'] p + S
 + [A1(t) + A2(t)z + ··· + As(t) z ∗ ··· (s−1) times ··· ∗ z]' p   (24)

The next theorem gives the near solution controls for this case in a reduced-dimension form that exploits the known leader trajectory.

Theorem 4. Assume the trajectory of the leader robot xl is known. The zero-sum nonlinear polynomial game with dynamics (3), a free movement participant and the cost function (16) admits a near saddle point solution given by:

u1*(t, z) = −B̄1'(z)(P11(z)z + m(z) + p1(z))
u2*(t, z) = −B̄2'(z)(P11(z)z + m(z) + p1(z))   (23)

where the matrices B̄1 := (B1  0), B̄2 := (0  B2), and the matrix P11(z) is the solution of the following reduced-dimension state-dependent Riccati-like matrix differential equation:

−Ṗ11(z) = P11(z)[Ā1(t) + Ā2(t)z + ··· + Ās(t) z ∗ ··· (s−1) times ··· ∗ z]
 + [Ā1(t) + Ā2(t)z + ··· + Ās(t) z ∗ ··· (s−1) times ··· ∗ z]' P11(z)
 + Q11 + P11(z)(B̄1(z)B̄1'(z) − B̄2(z)B̄2'(z))P11(z);  P11(z, T) = Qf11   (18)

and the reduced-dimension linear state-dependent vector equation for p1(z) satisfies:

−ṗ1(z) = P11[B̄1(z)B̄1'(z) + B̄2(z)B̄2'(z)] p1(z) + S̄
 + [Ā1(t) + Ā2(t)z + ··· + Ās(t) z ∗ ··· (s−1) times ··· ∗ z]' p1(z)   (19)

and the weighting matrices Q11, Qf11 come from the partition of the matrices:

Q̄ = [Q11 Q12; Q21 Q22];  Qf = [Qf11 Qf12; Qf21 Qf22]

and m(z) is the solution of the vector state-dependent equation:

ṁ = [−Ā + P11(B̄1(z)B̄1'(z) − B̄2(z)B̄2'(z))]m − Q12 xl   (20)

with the reduced polynomial dynamics:

Ā := Ā0 + Ā1(t)z + Ā2(t) z ∗ z' + ··· + Ās(t) z ∗ ··· s times ··· ∗ z,   (21)

where:

Āi := [A1i 0; 0 A2i];  i = 0, 1, ..., s   (22)

Proof: given that the trajectory, and therefore the control, of the leader is known, the dynamics (3) reduce to:

ż(t) = f(t, z) + B1(z)u1(t) + B2(z)u2(t),  z(t0) = z0

Perform the following partition of the last equations:

P = [P11 P12; P21 P22];  p = [p1; p2]

where the dimensions of the matrices are as follows: P11 ∈ R4×4, considering the stacked position subvectors of robots one and two, and P22 ∈ R2×2, P12 ∈ R4×2, P21 ∈ R2×4. The weighting matrices Q̄ and Q̄f are partitioned accordingly. With this partition we can take the reduced-dimension equations (18) and (19) to form the controls as:

u1*(t, z) = −B̄1'(z)(P11 z̄ + P12 z̄ + p1);
u2*(t, z) = −B̄2'(z)(P11 z̄ + P12 z̄ + p1)

If we define m := P12 z̄ and take the time derivative, we get exactly (20).
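The partition used in the proof can be written directly with array slicing; the sketch below uses the stated dimensions (P11 ∈ R4×4, P12 ∈ R4×2, etc.) with placeholder numbers, and takes z̄ in m := P12 z̄ to be the known leader position, which is an assumption of this illustration:

```python
import numpy as np

# illustrative 6x6 P, partitioned as in the proof: the positions of
# robots one and two (4 states) vs. the leader position (2 states)
P = np.arange(36, dtype=float).reshape(6, 6)
P11, P12 = P[:4, :4], P[:4, 4:]
P21, P22 = P[4:, :4], P[4:, 4:]
p = np.arange(6, dtype=float)
p1, p2 = p[:4], p[4:]

# the affine term m := P12 z_bar absorbs the (known) leader trajectory
z_bar = np.array([1.0, -1.0])
m = P12 @ z_bar
```

Because xl(t) is known in advance, m(t) can be precomputed offline along the leader trajectory, which is what makes the reduced-dimension equations (18)-(20) attractive.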

4. NUMERICAL EXAMPLE

The numerical example given in this section corresponds to mobile robots with four states, each robot with dynamics:

ż1 = z4 cos(z3)
ż2 = z4 sin(z3)
ż3 = z4 u1
ż4 = u2

with the polynomial approximation:

ż1 = z4 (1 − z3²/2!)
ż2 = z4 (z3 − z3³/3!)
ż3 = z4 u1
ż4 = u2   (25)

here B1(z) = ( 0  0  z4  1 )'. Each robot has the same dynamic model. The given trajectory for the leader is:


x˙ l1 = −xl2 x˙ l2 = xl1 with distances ε1 = 1, ε2 = 2.
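The accuracy of the Taylor-type approximation (25) relative to the exact model can be checked by integrating both systems under the same inputs; the leader follows the circular trajectory above. The control inputs, the initial states, and the step size below are illustrative assumptions, not the paper's simulation settings:

```python
import numpy as np

def unicycle(z, u1, u2):
    # exact four-state model of Section 4
    return np.array([z[3] * np.cos(z[2]),
                     z[3] * np.sin(z[2]),
                     z[3] * u1,
                     u2])

def unicycle_poly(z, u1, u2):
    # polynomial approximation (25): cos t ~ 1 - t^2/2!, sin t ~ t - t^3/3!
    return np.array([z[3] * (1 - z[2] ** 2 / 2),
                     z[3] * (z[2] - z[2] ** 3 / 6),
                     z[3] * u1,
                     u2])

dt = 0.01
z_ex = np.array([0.0, 0.0, 0.2, 1.0])   # (x, y, heading, speed)
z_poly = z_ex.copy()
xl = np.array([1.0, 0.0])               # leader on x_l1' = -x_l2, x_l2' = x_l1
for _ in range(200):
    u1, u2 = 0.1, 0.0                   # illustrative constant inputs
    z_ex = z_ex + dt * unicycle(z_ex, u1, u2)
    z_poly = z_poly + dt * unicycle_poly(z_poly, u1, u2)
    xl = xl + dt * np.array([-xl[1], xl[0]])
err = np.max(np.abs(z_ex - z_poly))
```

For the small heading angles reached here, the truncation error of (25) stays far below the robot separation distances ε1, ε2, which is what justifies using the polynomial model in the game formulation.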

Figure 1: States of the game

Figure 2: States of the game

5. CONCLUSIONS

This paper presented a scheme for the coordination of planar mobile robots based on a zero-sum differential game. The dynamics of the robots were given by a polynomial approximation of the nonlinear model, and the control solutions used state-dependent Riccati equations to effectively drive the robots to a particular formation. One of the robots took the leader role, following a prescribed trajectory, while the rest of the participants simply follow it. The proposed solution was given as a polynomial Riccati-like state-dependent differential equation, which utilizes a p-linear form tensor representation for its polynomial part. A numerical example was presented to show the effectiveness of the approach.

REFERENCES

Basar, T. and Olsder, G. (1999). Dynamic Noncooperative Game Theory. SIAM, Philadelphia.
Basin, M. and Calderon-Alvarez, D. (2009). Optimal controller for uncertain stochastic polynomial systems with deterministic disturbances. International Journal of Control, 82, 1435–1447.


Basin, M., Perez, J., and Skliar, M. (2006). Optimal filtering for polynomial system states with polynomial multiplicative noise. International Journal of Robust and Nonlinear Control, 16, 287–298.
Bryson, A. and Ho, Y. (1975). Applied Optimal Control. Hemisphere, Washington, DC.
Camponogara, E., Jia, D., Krogh, B., and Talukdar, S. (2002). Distributed model predictive control. IEEE Control Systems Magazine, 22(1), 44–52.
Çimen, T. (2008). State-dependent Riccati equation (SDRE) control: A survey. Proceedings of the 17th IFAC World Congress, Seoul, Korea, 3761–3775.
Cruz, J.B. and Xiaohuan, T. (2009). Dynamic Noncooperative Game Models for Deregulated Electricity Markets. Nova Science Publishers, Inc.
Desai, J., Ostrowski, J., and Kumar, V. (1998). Controlling formations of multiple mobile robots. In Proceedings of the 1998 IEEE International Conference on Robotics and Automation, Leuven, Belgium, 2864–2869.
Dongxu, L. and Cruz, J.B. (2011). Defending an asset: a linear quadratic game approach. IEEE Trans. on Aerospace and Electronic Systems, 47(2), 1026–1042.
Dunbar, W. and Murray, R. (2006). Model predictive control of coordinated multi-vehicle formations. Automatica, 42(4), 549–558.
Engwerda, J. (1998). On the open-loop Nash equilibrium in LQ-games. Journal of Economic Dynamics and Control, 22, 729–762.
Engwerda, J. (2005). LQ Dynamic Optimization and Differential Games. Wiley.
Engwerda, J., van Aarle, B., and Plasmans, J. (1999). The (in)finite horizon open-loop Nash LQ game: An application to EMU. Annals of Operations Research, 88, 251–273.
Goode, B. and Roan, M. (2012). A differential game theoretic approach for two-agent collision avoidance with travel limitations. Journal of Intelligent and Robotic Systems, 67.
Gu, D. (2008). A differential game approach to formation control. IEEE Transactions on Control Systems Technology, 16, 85–93.
Isaacs, R. (1965). Differential Games. John Wiley and Sons, Inc.
Jimenez-Lizarraga, M., Basin, M., Rodriguez, V., and Rodrigues, P. (2015). Open-loop Nash equilibrium in polynomial differential games via state-dependent Riccati equation. Automatica, 53, 155–163.
LaValle, S. (2000). Robot motion planning: A game-theoretic foundation. Algorithmica, 26, 430–465.
Levinson, S., Weiss, H., and Ben-Asher, J. (2002). Trajectory shaping and terminal guidance using linear quadratic differential game. In AIAA Guidance, Navigation and Control Conference and Exhibit, Monterey, California.
Maler, K. and de Zeeuw, A. (1998). The acid rain differential game. Environmental and Resource Economics, 12, 167–184.
Mracek, C.P. and Cloutier, J.R. (1998). Control designs for the nonlinear benchmark problem via the state-dependent Riccati equation method. International Journal of Robust and Nonlinear Control, 8, 401–433.


Pant, A., Seiler, P., and Hedrick, K. (2002). Mesh stability of look-ahead interconnected systems. IEEE Trans. Automat. Contr., 47(2), 403–407.
Starr, A. and Ho, Y. (1969a). Further properties of nonzero-sum differential games. Journal of Optimization Theory and Applications, 3(4), 207–219.
Starr, A. and Ho, Y. (1969b). Nonzero-sum differential games. Journal of Optimization Theory and Applications, 3(3), 184–206.
Tomlin, C., Pappas, G., and Sastry, S. (1998). Conflict resolution for air traffic management: a study in multiagent hybrid systems. IEEE Trans. Automat. Contr., 43(4), 509–521.
Vachtsevanos, G., Tang, L., and Reimann, J. (2004). An intelligent approach to coordinated control of multiple unmanned aerial vehicles. In Amer. Helicopter Soc. 60th Annu. Forum, Baltimore.
Zhang, H., Wei, Q., and Liu, D. (2011). An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica, 47, 207–214.
