A Design of Entrapment Strategies for the Distributed Pursuit-Evasion Game⋆


Proceedings of the 18th World Congress The International Federation of Automatic Control Milano (Italy) August 28 - September 2, 2011

Wei Lin ∗ Zhihua Qu ∗∗ Marwan A. Simaan ∗∗∗

∗ Department of EECS, University of Central Florida, Orlando, FL 32816 USA (e-mail: [email protected]).
∗∗ Department of EECS, University of Central Florida, Orlando, FL 32816 USA (Tel: 407-823-5976; e-mail: [email protected]).
∗∗∗ Department of EECS, University of Central Florida, Orlando, FL 32816 USA (Tel: 407-882-2220; e-mail: [email protected]).

Abstract: In this paper, entrapment strategies are proposed for a multiple-pursuer single-evader game. The game considered is nonzero-sum and under distributed information, where global information is not available to all the players. Since the information is distributed, cooperative control is integrated with differential game theory to deal with the lack of information. Entrapment strategies are designed so that the pursuers chase the evader in a more intelligent manner. Simulation results are presented.

Keywords: Cooperative control; differential games; distributed control; optimal control.

1. INTRODUCTION

Pursuit-evasion games, as a commonly used differential game model, have been studied by many researchers over the past decades. Since the pioneering work of Rufus Isaacs on pursuit-evasion games (Isaacs [1965]), numerous mathematicians and engineers have devoted themselves to this field. Ho and Bryson applied the calculus-of-variations approach to pursuit-evasion games in 1965 (Ho et al. [1965]). Starr and Ho dealt with nonzero-sum games and presented an example of a pursuit-evasion game in 1969 (Starr et al. [1969]). Simaan and Cruz used a simple pursuit-evasion game to illustrate Stackelberg strategies in 1973 (Simaan [1973]), where the pursuer and evader make decisions sequentially. Foley and Schmitendorf dealt with a game involving two pursuers and one evader in 1974 (Foley [1974]), which introduced cooperation among the pursuers. In the early development of pursuit-evasion games, based on optimal control theory, the focus was on Nash and Stackelberg strategies. Nowadays, as computer technology advances rapidly, there is growing interest in pursuit-evasion games involving team collaboration, distributed control, artificial intelligence, etc. Some recent works are introduced here: Nitschke applied artificial evolution approaches to pursuit-evasion games in 2003 (Nitschke [2003]); Bopardikar et al. dealt with discrete-time pursuit-evasion games with sensing limitations in 2008 (Bopardikar [2008]), where a group of pursuers communicates and a sweep-pursuit-capture approach is implemented; Wu et al. designed strategies for the pursuers to eventually encircle the evader based on a limit cycle in 2009 (Wu [2009]); Gu and Hu applied formation control to a group of pursuers in 2009 (Gu [2009]). So far, the pursuit-evasion game still draws much attention and remains a fairly promising area.

In the development of the pursuit-evasion game, many papers assume that both pursuers and evaders have complete information about the others, or that pursuers can exchange information with other pursuers and evaders with other evaders. However, in real-life applications, for instance when several mobile robots chase a target in an environment, sensing or communication limitations make it quite possible that not every robot has complete information about the others. In this paper, a game of this type is called a distributed game. Recently, Qu [2009] and Lin [2010] studied zero-sum and nonzero-sum pursuit-evasion games under distributed information and proposed distributed saddle-point strategies and Nash strategies, respectively.

⋆ This work is supported in part by a grant (NSF CCF-0956501) from the National Science Foundation.

978-3-902661-93-7/11/$20.00 © 2011 IFAC

In this paper, the idea of the previous work (Qu [2009], Lin [2010]) is carried on, and strategies for the pursuers to achieve entrapment of the evader under distributed information are considered. The nonzero-sum pursuit-evasion game is the basis of our discussion, and cooperative control is integrated with the game theory. The rest of the paper is structured as follows: the problem description is in Section 2; the classical result is shown in Section 3; distributed strategies for the pursuit-evasion game are derived in Section 4; distributed entrapment strategies for the pursuers are derived in Section 5; simulation results are presented in Section 6.

2. PROBLEM DESCRIPTION

In our pursuit-evasion game, there are n + 1 players in total: n pursuers and one evader. The n pursuers try to chase the evader while the evader tries to escape from them. Every player in the game is capable of detecting other players within a sensing range. Under such a setting, the pursuit-evasion game is a distributed game,


10.3182/20110828-6-IT-1002.00964


i.e., the game is under distributed information. For convenience of discussion, we assume that the observation between any pursuer and the evader is mutual, while observations between any two pursuers are not necessarily mutual. To make the problem meaningful, we also assume that there always exists at least one pursuer-evader pair that can observe each other and that every pursuer can observe at least one other pursuer in its sensing range. Without these assumptions, undesired cases arise: 1) the evader does not observe any pursuer while the pursuers cannot observe the evader either; 2) some pursuers do not observe any object within their sensing range. Such cases are beyond the scope of this paper and will not be investigated.

Suppose that the pursuit-evasion game takes place in an m-dimensional Euclidean space, where the position of the evader is denoted by the vector y = [y_1, y_2, . . . , y_m]^T and those of the n pursuers are denoted by x_j = [x_{j1}, x_{j2}, . . . , x_{jm}]^T for j = 1, 2, . . . , n, respectively. Let z_j = x_j − y be the vector pointing from the evader to pursuer j. For more compact notation, if x = [x_1^T, x_2^T, . . . , x_n^T]^T and z = [z_1^T, z_2^T, . . . , z_n^T]^T, then z(t) = x(t) − 1_n ⊗ y(t), where 1_n is the n-by-1 column vector with all entries equal to 1 and ⊗ denotes the Kronecker product. Therefore, the system dynamics can be expressed as

\dot{z}(t) = u_p(t) - 1_n \otimes u_e(t),    (1)

where u_p = \dot{x} is the velocity vector of the pursuers and u_e = \dot{y} is the velocity vector of the evader. If the game is zero-sum, there is a common performance index, which the group of n pursuers minimizes (maximizes) and the evader maximizes (minimizes). If the game is nonzero-sum, there are two different performance indices: the pursuers minimize one and the evader minimizes the other.
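The stacked notation in (1) can be exercised numerically. The sketch below (all variable names are illustrative, not the paper's) builds z and its derivative with NumPy's Kronecker product:

```python
import numpy as np

# Sketch of the stacked dynamics (1): z = x - 1_n ⊗ y, ż = u_p - 1_n ⊗ u_e.
# n pursuers in R^m; positions and controls are random placeholders.
n, m = 3, 2
rng = np.random.default_rng(0)

x = rng.standard_normal(n * m)   # stacked pursuer positions [x_1; ...; x_n]
y = rng.standard_normal(m)       # evader position

# z = x - 1_n ⊗ y, built with the Kronecker product
z = x - np.kron(np.ones(n), y)

# Velocities (controls) give the relative dynamics ż = u_p - 1_n ⊗ u_e
u_p = rng.standard_normal(n * m)
u_e = rng.standard_normal(m)
z_dot = u_p - np.kron(np.ones(n), u_e)

# Each block z_j is the vector from the evader to pursuer j
for j in range(n):
    assert np.allclose(z[j*m:(j+1)*m], x[j*m:(j+1)*m] - y)
```

The Kronecker form is convenient because the same line of code works for any number of pursuers n and any space dimension m.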
In this paper, we deal with the nonzero-sum pursuit-evasion game whose quadratic performance indices are set to be

J_p = \frac{1}{2} k_{pf} z^T(t_f) z(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix}^T \begin{bmatrix} q_p I & 0 & 0 \\ 0 & r_p I & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix} dt    (2)

for the group of n pursuers and

J_e = -\frac{1}{2} k_{ef} z^T(t_f) z(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix}^T \begin{bmatrix} -q_e I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & r_e I \end{bmatrix} \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix} dt    (3)

for the evader, where k_{pf}, k_{ef}, q_p, q_e are nonnegative scalars and r_p, r_e are positive scalars. These performance indices mean that the pursuers try to minimize the weighted distances between themselves and the evader while keeping their energy use as small as possible, whereas the evader tries to maximize the weighted distances between itself and the pursuers while keeping its energy use as small as possible. If the above game is under global information, then solving for the Nash equilibrium is not difficult. But if the game is under distributed information, a proper approach is needed. First of all, to describe the pursuit-evasion game under distributed information, according

to Qu [2009] and Lin [2010], the following binary sensing/communication matrix can be used:

S(t) = \begin{bmatrix} 1 & s_{01}(t) & s_{02}(t) & \cdots & s_{0n}(t) \\ s_{10}(t) & 1 & s_{12}(t) & \cdots & s_{1n}(t) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ s_{n0}(t) & s_{n1}(t) & s_{n2}(t) & \cdots & 1 \end{bmatrix},    (4)

where index 0 stands for the evader, indices 1 ∼ n stand for the n pursuers, and s_{ij}(t) denotes the observation from player i of player j at time t. Since the matrix is binary, s_{ij}(t) = 1 means that player j can be observed by player i, and s_{ij}(t) = 0 means that it cannot. Since every player can always "observe" itself, every diagonal entry of (4) is constant and equal to 1. If the game is under global information, then every entry of matrix (4) is constant and equal to 1.

3. CLASSICAL RESULT

For the game under global information, the classical result of this problem is based on optimal control theory. The linear feedback Nash strategies of this n-pursuer single-evader game are given by the following theorem.

Theorem 1. Given an n-pursuer single-evader game whose system dynamics is given by (1) and whose performance indices for the pursuers and the evader are given by (2) and (3), respectively, if the game admits a linear feedback Nash equilibrium, then the linear feedback Nash strategies are given by

u_p^*(t) = -\frac{1}{r_p} K_p z(t)    (5)

for the pursuers and

u_e^*(t) = -\frac{1}{n r_e} (1_n^T \otimes I_{m\times m}) K_e z(t)    (6)

for the evader, where

\dot{K}_p = -q_p I - \frac{1}{n r_e} K_e^T (1_n 1_n^T \otimes I_{m\times m}) K_p - \frac{1}{n r_e} K_p (1_n 1_n^T \otimes I_{m\times m}) K_e + \frac{1}{r_p} K_p K_p,    (7)

\dot{K}_e = -q_e I + \frac{1}{r_p} K_p^T K_e + \frac{1}{r_p} K_e K_p - \frac{1}{n r_e} K_e (1_n 1_n^T \otimes I_{m\times m}) K_e,    (8)

with the terminal conditions K_p(t_f) = k_{pf} I, K_e(t_f) = k_{ef} I. For the detailed proof, please refer to Appendix A.
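A minimal numerical sketch of Theorem 1 is to integrate (7) and (8) backwards from the terminal conditions with an explicit Euler scheme. The function name and step count below are illustrative assumptions; the parameter values follow the simulation section of the paper:

```python
import numpy as np

# Backward Euler integration of the coupled equations (7)-(8) from the
# terminal conditions Kp(tf) = kpf*I, Ke(tf) = kef*I. A rough sketch,
# not the authors' code; step size and function name are my own.
def solve_riccati(n=3, m=2, qp=1.0, qe=2.0, rp=1.0, re=2.0,
                  kpf=10.0, kef=5.0, t0=0.0, tf=4.0, steps=4000):
    N = n * m
    I = np.eye(N)
    # 1_n 1_n^T ⊗ I_{m×m}, the coupling pattern appearing in (7)-(8)
    J = np.kron(np.ones((n, n)), np.eye(m))
    Kp, Ke = kpf * I, kef * I
    dt = (tf - t0) / steps
    for _ in range(steps):
        Kp_dot = (-qp * I - (Ke.T @ J @ Kp + Kp @ J @ Ke) / (n * re)
                  + Kp @ Kp / rp)
        Ke_dot = (-qe * I + (Kp.T @ Ke + Ke @ Kp) / rp
                  - Ke @ J @ Ke / (n * re))
        # step backward in time: K(t - dt) ≈ K(t) - dt * K̇(t)
        Kp, Ke = Kp - dt * Kp_dot, Ke - dt * Ke_dot
    return Kp, Ke

Kp0, Ke0 = solve_riccati()   # gains at t0, to be used in (5) and (6)
```

Since the gains depend only on time, this backward pass can be precomputed and stored before the game is played forward.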
If there is only one pursuer, i.e., n = 1, then strategies (5) and (6) reduce to

u_p^*(t) = -\frac{k_p}{r_p} z(t),    (9)

u_e^*(t) = -\frac{k_e}{r_e} z(t),    (10)

where

\dot{k}_p = -q_p + \frac{1}{r_p} k_p^2 - \frac{2}{r_e} k_p k_e,    (11)

\dot{k}_e = -q_e + \frac{2}{r_p} k_p k_e - \frac{1}{r_e} k_e^2    (12)

with the terminal conditions k_p(t_f) = k_{pf}, k_e(t_f) = k_{ef}.
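For the single-pursuer case, the scalar equations (11)-(12) can be integrated backwards and the resulting gains played forward through (9)-(10). The sketch below uses the parameter values of the simulation section; the initial offset and step count are illustrative assumptions:

```python
import numpy as np

# Single-pursuer case: integrate the scalar equations (11)-(12) backward
# from kp(tf)=kpf, ke(tf)=kef, then play strategies (9)-(10) forward.
qp, qe, rp, re = 1.0, 2.0, 1.0, 2.0
kpf, kef = 10.0, 5.0
t0, tf, steps = 0.0, 4.0, 4000
dt = (tf - t0) / steps

# backward pass: store kp(t), ke(t) on the time grid
kp = np.empty(steps + 1); ke = np.empty(steps + 1)
kp[-1], ke[-1] = kpf, kef
for i in range(steps, 0, -1):
    kp_dot = -qp + kp[i]**2 / rp - 2.0 * kp[i] * ke[i] / re
    ke_dot = -qe + 2.0 * kp[i] * ke[i] / rp - ke[i]**2 / re
    kp[i-1] = kp[i] - dt * kp_dot
    ke[i-1] = ke[i] - dt * ke_dot

# forward pass: the relative state evolves as ż = up - ue under (9), (10)
z = np.array([3.0, -2.0])            # illustrative initial offset in R^2
dist0 = np.linalg.norm(z)
for i in range(steps):
    up = -(kp[i] / rp) * z           # pursuit control (9)
    ue = -(ke[i] / re) * z           # evasion control (10)
    z = z + dt * (up - ue)
# whenever kp/rp > ke/re, the pursuer-evader gap shrinks
```

Note that ż = −(kp/rp − ke/re) z here, so the sign of kp/rp − ke/re decides whether the gap closes.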



Remark 2. If we integrate (7) and (8) backwards in time, then K_p(t) and K_e(t) are generally not diagonal for t ∈ [t_0, t_f). Therefore, according to (5) and (6), the strategy (velocity control) of each pursuer depends upon the states (position information) of the other pursuers. Consequently, if the game is under distributed information, strategies (5) and (6) can no longer be used. Alternatively, in order to conform to the information constraint, every pursuer can adopt strategy (9) and the evader strategy (10), at the cost of a loss of optimality with respect to the original performance indices (2) and (3). This is discussed in the following section.

4. DISTRIBUTED PURSUIT-EVASION GAME STRATEGIES

In Qu [2009] and Lin [2010], distributed Nash strategies were obtained for the zero-sum and nonzero-sum pursuit-evasion games, respectively. In this section, the results of those papers are briefly presented. The basic idea of the design is to let each player make its decision based only on the information available to it. Toward that end, the strategy for pursuer j is designed as

u_{pj}^* = -\frac{k_p}{r_p} \Big[ x_j(t) - \sum_{i=1}^{n} d_{ji}(t) x_i(t) - f_j(t) y(t) \Big]    (13)

for j = 1, 2, . . . , n, and the one for the evader is

u_e^* = -\frac{k_e}{r_e} \Big[ \sum_{i=1}^{n} e_i x_i(t) - y(t) \Big],    (14)

where k_p, k_e are defined in (11) and (12), and

e_j = \frac{s_{0j}(t)}{\sum_{i=1}^{n} s_{0i}(t)}, \quad f_j = s_{0j}, \quad d_{ji}(t) = (1 - f_j) \frac{s_{ji}(t)}{\sum_{l=1}^{n} s_{jl}(t)},    (15)

where s_{ij} is the entry defined in matrix (4) for i, j = 0, 1, . . . , n. The designed strategies can be interpreted as follows. For pursuer j, if it does not observe the evader within its sensing range, i.e., s_{0j} = 0, then f_j = 0, and (13) reduces to

u_{pj}^* = -\frac{k_p}{r_p} \Big[ x_j(t) - \sum_{i=1}^{n} d_{ji}(t) x_i(t) \Big],

which means that it uses cooperative control to follow the observed nearby pursuers within its sensing range. If pursuer j observes the evader, i.e., s_{0j} = 1, then f_j = 1, d_{ji} = 0, and (13) reduces to

u_{pj}^* = -\frac{k_p}{r_p} [ x_j(t) - y(t) ],

which means that it chases the evader directly by itself. For the evader, strategy (14) means that it escapes from the center of mass of all the observed pursuers.

The above design integrates cooperative control into the game theory and provides distributed strategies for the pursuers and the evader. However, as a game problem, the designed strategies must form a Nash equilibrium with respect to some meaningful performance indices, so that neither the pursuers nor the evader has an incentive to deviate from the equilibrium. Indeed, strategies (13) and (14) form such a Nash equilibrium. Specifically, rewrite the strategies in the more compact form, using the Kronecker product,

u_p^* = -\frac{k_p}{r_p} \{ x - F(t) \otimes y - [D(t) \otimes I_{m\times m}] x \}    (16)

for the group of n pursuers and

u_e^* = -\frac{k_e}{r_e} \{ [E^T(t) \otimes I_{m\times m}] x - y \}    (17)

for the evader, where

E(t) = [e_1(t) \cdots e_n(t)]^T, \quad F(t) = [f_1(t) \cdots f_n(t)]^T, \quad D(t) = [d_{ij}(t)] \in R^{n\times n}.    (18)

Then the following theorem is obtained.

Theorem 3. If there is an n-pursuer single-evader game under distributed information whose system dynamics is given by (1) and whose performance indices are given by

J_p^d = \frac{1}{2} k_{pf} z(t_f)^T z(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix}^T M_p^d \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix} dt    (19)

and

J_e^d = -\frac{1}{2} k_{ef} z(t_f)^T z(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix}^T M_e^d \begin{bmatrix} z \\ u_p \\ 1_n \otimes u_e \end{bmatrix} dt,    (20)

where

M_p^d = \begin{bmatrix} Q_p^d & k_p (D^T \otimes I_{m\times m} + I) & 0 \\ k_p (D \otimes I_{m\times m} - I) & r_p I & 0 \\ 0 & 0 & 0 \end{bmatrix},

M_e^d = \begin{bmatrix} Q_e^d & 0 & k_e (2I - 1_n^T \otimes E \otimes I_{m\times m}) \\ 0 & 0 & 0 \\ -k_e (1_n^T \otimes E \otimes I_{m\times m}) & 0 & r_e I \end{bmatrix},

Q_p^d = q_p I + \frac{k_p^2}{r_p} \big[ -I + (I - D \otimes I_{m\times m})(I - D^T \otimes I_{m\times m}) \big] + \frac{k_p k_e}{r_e} \big[ 2I - 1_n \otimes E^T \otimes I_{m\times m} - 1_n^T \otimes E \otimes I_{m\times m} \big],

Q_e^d = -q_e I + \frac{k_p k_e}{r_p} (D \otimes I_{m\times m} + D^T \otimes I_{m\times m}) + \frac{k_e^2}{r_e} \big[ -I + (1_n^T \otimes E \otimes I_{m\times m})(1_n \otimes E^T \otimes I_{m\times m}) \big],

then the game admits (16) and (17) as its Nash strategies. For the detailed proof, please refer to Lin [2010].

If there is only a single pair of pursuer and evader under global information, i.e., n = 1, D(t) = 0, E(t) = 1, then the performance indices (19) and (20) become the same as (2) and (3) for n = 1.

5. ENTRAPMENT STRATEGIES FOR THE PURSUERS UNDER DISTRIBUTED INFORMATION

Under distributed information, if a pursuer observes the evader, then the designed strategy (13) makes it chase the evader along its line of sight, as shown in the simulation result of fig. 5 in subsection 6.2. Since it is not very intelligent for a group of pursuers to chase a single evader in such a way, there is a need to apply some form of formation control to the pursuers so that they can cooperate with each other to achieve a better capture. Therefore, in this section, entrapment strategies for the pursuers are proposed. Since entrapment is related to the formation, without



loss of much generality, the environment of the game is restricted to a planar Euclidean space, i.e., m = 2. The basic idea is to integrate the distributed strategies with formation control so that the pursuers can chase the evader from different directions (angles). Visually, the pursuers will spread out around the evader while approaching it. Since the game is under distributed information, from the perspective of an individual pursuer there are basically two cases: 1) the pursuer observes the evader within its sensing range; 2) the pursuer does not observe the evader but observes one or more other pursuers within its sensing range. Each of these cases is discussed in the following paragraphs.

Case 1: If pursuer j observes the evader, then to achieve the entrapment, it will not directly chase the target at [y_1, y_2]^T but a virtual target at [y_{j1}, y_{j2}]^T, which can be expressed as

\begin{bmatrix} y_{j1} \\ y_{j2} \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} + r_j,

where r_j is the vector pointing from [y_1, y_2]^T to [y_{j1}, y_{j2}]^T, which can be written as

r_j = \rho_j \begin{bmatrix} \cos\theta_j \\ \sin\theta_j \end{bmatrix},

where \rho_j is the norm of r_j and \theta_j \in [0, 2\pi) is the argument of r_j. To designate the virtual evader for every pursuer, one intuitive way is to space the virtual evaders equally around the evader, in other words, with an angular spacing of 2\pi/n for j = 1, 2, . . . , n. Such an approach prescribes the position of the virtual evader for each pursuer. However, it is not flexible enough for complicated situations. Alternatively, a more dynamic approach is to let \rho_j and \theta_j be subject to the differential equations

\dot{\rho}_j = -\rho_j + \frac{\| x_j - y \|}{\beta}    (21)

and

\dot{\theta}_j = \theta_{j2} - \theta_{j1},    (22)

where \| x_j - y \| is the Euclidean norm (distance) between pursuer j and the evader, \beta is a real number greater than 1, \theta_{j1} is the smallest relative angle, in the clockwise direction, between r_j and the vector pointing from the evader to a pursuer observed by the jth pursuer, and \theta_{j2} is the corresponding angle in the counterclockwise direction. Equation (21) means that \rho_j decreases as pursuer j approaches the evader, so that capture is possible. Equation (22) means that pursuer j chases its virtual target, which settles in the middle of the two observed neighboring pursuers. By doing so, the pursuers can besiege the evader. Figure 1 briefly shows how the approach works in case 1. If there is only one pursuer within the sensing range of pursuer j, say pursuer i, then from (22), \theta_j reaches its equilibrium when \theta_{j1} = \theta_{j2} = \pi, which means that pursuer j chases the virtual evader whose position is opposite to that of pursuer i with respect to the evader. Therefore, the following strategy for pursuer j is obtained for this case:

\begin{bmatrix} u_{pj1} \\ u_{pj2} \end{bmatrix} = -\frac{k_p}{r_p} \left\{ \begin{bmatrix} x_{j1} \\ x_{j2} \end{bmatrix} - \left( \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} + \begin{bmatrix} r_{j1} \\ r_{j2} \end{bmatrix} \right) \right\}.    (23)

Fig. 1. Case 1: pursuer j observes the evader.

Case 2: If pursuer j does not observe the evader but observes other pursuer(s) within its sensing range, then, relying on the observed pursuer(s), it determines the two closest observed pursuers and chases the virtual evader in the middle of them. For instance, if pursuer j observes pursuers i and k as its two closest pursuers, as shown in figure 2, then it chases the virtual evader located at (x_i + x_k)/2, the midpoint of the two pursuers. In the extreme case where pursuer j observes only one pursuer in its sensing range, say pursuer i, the position of the virtual evader is (x_i + x_i)/2 = x_i, which means that pursuer j follows pursuer i. Hence, the strategy for pursuer j in this case can be expressed as

\begin{bmatrix} u_{pj1} \\ u_{pj2} \end{bmatrix} = -\frac{k_p}{r_p} \left\{ \begin{bmatrix} x_{j1} \\ x_{j2} \end{bmatrix} - \sum_{i=1}^{n} \frac{s'_{ji}(t)}{\sum_{l=1}^{n} s'_{jl}(t)} \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} \right\},    (24)

where

s'_{ji} = 1 if the jth pursuer observes the ith pursuer (i ≠ j) within its sensing range and the ith pursuer is one of the two nearest pursuers to the jth pursuer; s'_{ji} = 0 otherwise.    (25)

Fig. 2. Case 2: pursuer j does not observe the evader.

Therefore, in general, from (23) and (24), the following distributed entrapment strategy for the group of n pursuers is



\begin{bmatrix} u_{pj1}^* \\ u_{pj2}^* \end{bmatrix} = -\frac{k_p}{r_p} \left\{ \begin{bmatrix} x_{j1} \\ x_{j2} \end{bmatrix} - \sum_{i=1}^{n} d'_{ji} \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} - f_j \begin{bmatrix} y_1 + r_{j1} \\ y_2 + r_{j2} \end{bmatrix} \right\}

for j = 1, 2, . . . , n, and the strategy for the evader is

\begin{bmatrix} u_{e1}^* \\ u_{e2}^* \end{bmatrix} = -\frac{k_e}{r_e} \left\{ \sum_{i=1}^{n} e_i \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right\},

where e_j, f_j are defined in (15), and

d'_{ji} = (1 - f_j) \frac{s'_{ji}(t)}{\sum_{l=1}^{n} s'_{jl}(t)},

where s'_{ji} is defined in (25).

Fig. 3. Nash state trajectories under global information.

Fig. 4. Game under distributed information.
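The two cases above can be sketched as a single control update. The function below is a hypothetical illustration of (23)-(25), not the authors' implementation: the offset r_j and the observation list are supplied externally, and all names are my own.

```python
import numpy as np

# One control update of the entrapment strategy for pursuer j, combining
# (23)-(25): if the evader is visible, chase the virtual target y + r_j;
# otherwise chase the midpoint of the two nearest observed pursuers.
def entrapment_control(j, x, y, r_j, observes_evader, observed,
                       kp=1.0, rp=1.0):
    """x: (n, 2) pursuer positions; y: (2,) evader position; r_j: (2,)
    offset of the virtual target; observed: indices of pursuers that
    pursuer j currently observes."""
    if observes_evader:                      # case 1, strategy (23)
        target = y + r_j                     # virtual target around evader
    else:                                    # case 2, strategies (24)-(25)
        # s'_ji = 1 only for the (at most) two nearest observed pursuers
        nearest = sorted(observed,
                         key=lambda i: np.linalg.norm(x[i] - x[j]))[:2]
        target = np.mean([x[i] for i in nearest], axis=0)
    return -(kp / rp) * (x[j] - target)

# toy configuration: pursuer 0 sees pursuers 1-3 but not the evader
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
y = np.array([1.0, 1.0])
u0 = entrapment_control(0, x, y, r_j=np.zeros(2),
                        observes_evader=False, observed=[1, 2, 3])
```

Here pursuer 0 picks pursuers 1 and 2 as its two nearest neighbors and steers toward their midpoint, matching the case-2 rule; with a single observed neighbor the midpoint degenerates to that neighbor's position, as noted in the text.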


Fig. 5. State trajectories under distributed Nash strategies.

6. SIMULATION RESULTS

6.1 Nash Strategies under Global Information There are three pursuers and one evader, whose initial positions are x1 (t0 ) = [−3, 0]T , x2 (t0 ) = [3, 0]T , x3 (t0 ) = [4, 1]T , y(t0 ) = [0, 3]T , respectively. The coefficients in the performance indices (2) and (3) are given by rp = 1, re = 2, qp = 1, qe = 2, kpf = 10 and kef = 5. The time duration is from t0 = 0 to tf = 4. The simulation result of this pursuit-evasion game under global information is shown in fig. 3. 6.2 Nash Strategies under Distributed Information Suppose that the initial positions for the three pursuers do not change, but there are obstacles among those players, which block the observations between players. This scenario is roughly shown in figure 4. The sensing matrix changes three times during the game process, which can be expressed as       11 1 1 10 1 1 10 0 1 1 1 1 1 0 1 1 1 0 1 1 1 . ,S = ,S = S1 =  0 0 1 1 2 1 0 1 1 3 1 1 1 1 11 1 1 10 0 1 10 0 1 In the first time period, only pursuer 3 (green) can observe the evader; in the second time period, all the pursuers except pursuer 1 (red) can observe the evader; in the last time period, all the pursuers can observe the evader. The simulation result is shown in figure 5. The black asterisk

in pursuer 1's trajectory (red) marks the time when it first observes the evader, and similarly for the black asterisk in pursuer 2's trajectory (blue).

6.3 Entrapment Strategies under Distributed Information

Given the same game under distributed information, the simulation result of the entrapment strategies is shown in figure 6 (with β = 20 in (21)). The black asterisks in the red and blue trajectories have the same meanings as those in the previous simulation. From the plot, it might have been more rational for pursuer 1 to chase the evader around the left side of the big obstacle. In fact, the strategies can be improved by using second-order control, i.e., taking velocity information into account: roughly, if a pursuer cannot observe the evader, it can calculate the instantaneous velocities of the other observed pursuers and use their heading information to estimate the position of the evader. However, such strategies require more information and computation, and we do not expand on them in this paper.

7. CONCLUSION

In this paper, entrapment strategies for the pursuers are proposed for an n-pursuer single-evader game under distributed information. Although complete information is not available to some of the players during the game process, by



integrating cooperative control and formation control, the players can successfully meet the entrapment requirement. In the future, collision avoidance among pursuers, and between pursuers and obstacles, must be taken into account. We will also extend the pursuit-evasion game to the infinite-horizon case as well as to steady-state strategies. The Stackelberg solution of the pursuit-evasion game is also one of our interests.

Fig. 6. State trajectories under entrapment strategies.

REFERENCES

R. Isaacs. Differential Games. John Wiley and Sons Inc., New York, NY, 1965.
Y. C. Ho, A. E. Bryson Jr., and S. Baron. Differential Games and Optimal Pursuit-Evasion Strategies. IEEE Trans. Automatic Control, volume AC-10, pages 385–389, 1965.
A. W. Starr and Y. C. Ho. Nonzero-Sum Differential Games. Journal of Optimization Theory and Applications, volume 3, pages 184–206, 1969.
M. Simaan and J. B. Cruz. A Stackelberg Solution for Games with Many Players. Journal of Optimization Theory and Applications, volume 11, pages 533–555, 1973.
M. Foley and W. Schmitendorf. A Class of Differential Games with Two Pursuers Versus One Evader. IEEE Trans. Automatic Control, volume 19, pages 239–243, 1974.
G. Nitschke. Emergence of Specialized Behavior in a Pursuit-Evasion Game. Proceedings of the 3rd Central and Eastern European Conference on Multi-Agent Systems, pages 324–334, 2003.
S. D. Bopardikar, F. Bullo, and J. P. Hespanha. On Discrete-Time Pursuit-Evasion Games with Sensing Limitations. IEEE Trans. Robotics, volume 24, pages 1429–1439, 2008.
M. Wu, F. Huang, L. Wang, and J. Sun. A Distributed Multi-Robot Cooperative Hunting Algorithm Based on Limit-Cycle. 2009 International Asia Conference on Informatics in Control, Automation and Robotics, pages 156–160, 2009.
D. Gu and H. Hu. Distributed Network-Based Formation Control. International Journal of Systems Science, volume 40, pages 539–552, 2009.
Z. Qu and M. Simaan. A Design of Distributed Game Strategies for Networked Agents. Proceedings of the 1st IFAC Workshop on Estimation and Control of Networked Systems (NecSys'09), pages 270–275, Venice, Italy, 2009.
W. Lin, Z. Qu, and M. Simaan. A Design of Distributed Nonzero-Sum Nash Strategies. Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, USA, 2010.

Appendix A

Proof. Consider the Lyapunov function

V(t) = \frac{1}{2} z^T(t) K_p(t) z(t).    (A.1)

Differentiating (A.1) yields

\dot{V} = \frac{1}{2} ( z^T \dot{K}_p z + 2 z^T K_p \dot{z} ).    (A.2)

Integrating (A.2) from t_0 to t_f yields

V(t_f) = V(t_0) + \frac{1}{2} \int_{t_0}^{t_f} ( z^T \dot{K}_p z + 2 z^T K_p \dot{z} ) dt.

Substituting (1) and (7) into the right-hand side of the above equation, we obtain

V(t_f) = V(t_0) + \frac{1}{2} \int_{t_0}^{t_f} \Big[ -q_p z^T z + \frac{1}{r_p} z^T K_p K_p z - \frac{2}{n r_e} z^T K_e^T (1_n 1_n^T \otimes I_{m\times m}) K_p z + 2 z^T K_p (u_p - 1_n \otimes u_e) \Big] dt.

Adding

\frac{1}{2} \int_{t_0}^{t_f} ( q_p z^T z + r_p u_p^T u_p ) dt

to both sides gives

J_p(u_p, u_e) = \frac{1}{2} z^T(t_0) K_p(t_0) z(t_0) + \frac{r_p}{2} \int_{t_0}^{t_f} \Big\| u_p + \frac{1}{r_p} K_p z \Big\|^2 dt - \int_{t_0}^{t_f} z^T K_p 1_n \otimes \Big\{ u_e + \frac{1}{n r_e} (1_n^T \otimes I_{m\times m}) K_e z \Big\} dt.    (A.3)

Similarly, we can get

J_e(u_p, u_e) = -\frac{1}{2} z^T(t_0) K_e(t_0) z(t_0) + \frac{r_e}{2} \int_{t_0}^{t_f} \Big\| 1_n \otimes \Big\{ u_e + \frac{1}{n r_e} (1_n^T \otimes I_{m\times m}) K_e z \Big\} \Big\|^2 dt - \int_{t_0}^{t_f} z^T K_e \Big( u_p + \frac{1}{r_p} K_p z \Big) dt.    (A.4)

Since J_p(u_p^*, u_e^*) ≤ J_p(u_p, u_e^*) and J_e(u_p^*, u_e^*) ≤ J_e(u_p^*, u_e) under u_p^* = -\frac{1}{r_p} K_p z and u_e^* = -\frac{1}{n r_e} (1_n^T \otimes I_{m\times m}) K_e z, the linear feedback strategies (5) and (6) form a Nash equilibrium.
