17th IFAC Workshop on Control Applications of Optimization
Yekaterinburg, Russia, October 15-19, 2018
Available online at www.sciencedirect.com
ScienceDirect
IFAC PapersOnLine 51-32 (2018) 654–658
Target problem for mean field type differential game

Yurii Averboukh ∗,∗∗

∗ Krasovskii Institute of Mathematics and Mechanics UrB RAS, 16, S. Kovalevskaya str., Yekaterinburg, 620990, Russia (e-mail: [email protected]).
∗∗ Ural Federal University, 19, Mira str., Yekaterinburg, 620990, Russia

Abstract: The paper is concerned with the mean field type differential game that describes the behavior of a large number of similar agents governed by a unique decision maker and influenced by disturbances. It is assumed that the decision maker wishes to bring the distribution of agents onto a target set in the space of probabilities within the state constraints. The solution of this problem is obtained based on the notions of u- and v-stability first introduced for finite dimensional differential games.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Differential games, feedback control, stable sets, distributed feedback, mean field type differential game.

1. INTRODUCTION

In this note we examine the mean field type control system governed by two decision makers with opposite goals. Such systems can be called mean field type differential games. We consider the target problem. That means that one player (for definiteness, the first player) wishes to bring the system to the target set within a state constraint, whereas the second player tries to prevent this. Notice that for mean field type differential games the target set is a subset of the product of the time interval and the space of probabilities.

Previously, the target problem was studied for the case of finite-dimensional differential games. For this type of games, under some conditions, it is proved that either the first player can solve the target problem within the feedback strategies or the second player can prevent the approach using also feedback strategies; see Krasovskii and Subbotin (1988). Notice that the most important condition necessary for this result is the so-called Isaacs' condition, which is satisfied, for example, when the dynamics of the system splits into the sum of two dynamics governed by the first and the second players independently.

The mean field type control system describes the cooperative behavior of a large number of similar agents with mean field interaction. It was first considered in Ahmed and Ding (2001). Nowadays, mean field control systems are well studied. The approach going back to the maximum principle was considered in Andersson and Djehiche (2011), Buckdahn et al. (2011), Carmona and Delarue (2013); the dynamic programming for mean field type control systems was developed in Bayraktar et al. (2018), Bensoussan et al. (2015), Laurière and Pironneau (2014). The case when the dynamics is given by a deterministic evolution was considered in Cavagnari and Marigonda (2015), Cavagnari et al. (2017), Pogodaev (2016).

The mean field type differential game appears naturally when we examine the mean field type control system under the assumption that the agents are influenced by an exogenous disturbance. Such games were studied in Djehiche and Hamadène (2016), Cosso and Pham (2018), Averboukh (2018). In Cosso and Pham (2018) mean field type differential games with the dynamics described by an SDE were considered in the class of nonanticipative strategies. In Averboukh (2018) the feedback approach for the case of deterministic evolution was studied. In that paper it is assumed that the quality of control is described by a payoff functional, whereas in the present paper we study the target problem.

This work was funded by the Russian Science Foundation (project no. 17-11-01093).

2. PROBLEM STATEMENT

We examine the target problem on the finite time interval for the mean field type control system governed by two players. It is assumed that, given the flow of probabilities m(t) describing the distribution of agents, the motion of each agent is given by the controlled differential equation

d/dt x(t) = f(t, x(t), m(t), u, v),    (1)
t ∈ [0, T], x(t) ∈ Rd, u ∈ U, v ∈ V.

Here u (respectively, v) is a control of the first (respectively, second) player. The sets U and V are control spaces for the first and the second players respectively. Integrating formally (1), we get that the motion of the flow of probabilities m(·) obeys the following equation:

d/dt m(t) = ⟨f(t, ·, m(t), u, v), ∇m(t)⟩.

In this equation, for each t ∈ [0, T], m(t) should be considered as a functional from Cb1(Rd) to Cb(Rd).
2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2018.11.499
We assume that the first player wishes to steer the system to the set M subject to the constraints given by the set N. The goal of the second player is opposite. We assume that both M and N lie in the product of [0, T] and the set of probabilities on the phase space. Moreover, it is natural to assume that M ⊂ N.

3. GENERAL NOTATION AND STANDING ASSUMPTIONS

If (Ω′, Σ′) and (Ω′′, Σ′′) are measurable spaces, m is a probability on Σ′, and h : Ω′ → Ω′′ is measurable, then denote by h#m the probability on Σ′′ given by the following rule: for Υ ∈ Σ′′, (h#m)(Υ) := m(h−1(Υ)). Further, we restrict our attention to Borel probabilities defined on a separable metric space (X, ρX) satisfying the Radon property. Denote by P(X) the set of Borel probabilities on (X, ρX). It is natural to endow P(X) with the topology of narrow convergence. It is metrizable. Further, let P2(X) denote the set of probabilities m ∈ P(X) such that, for some (and, thus, any) x∗ ∈ X,

∫_X ρX(x, x∗)² m(dx) < ∞.
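For purely atomic measures the pushforward h#m amounts to re-labelling atoms and merging their weights; a minimal sketch (the dict representation is an illustration, not the paper's notation):

```python
def pushforward(h, m):
    """h#m for a discrete measure m given as a dict {atom: weight}:
    (h#m)(Y) = m(h^{-1}(Y)), so each atom x moves to h(x) and the weights
    of atoms with the same image are added up."""
    out = {}
    for x, w in m.items():
        y = h(x)
        out[y] = out.get(y, 0.0) + w
    return out

m = {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25}
hm = pushforward(abs, m)  # h = |.| merges the atoms -1 and 1
```

Total mass is preserved, which is the measure-theoretic content of (h#m)(Ω′′) = m(Ω′).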
We consider on P2(X) the 2-Wasserstein metric defined by the rule: if m1, m2 ∈ P2(X), then

W2(m1, m2) := [ inf_{π∈Π(m1,m2)} ∫_{X×X} ρX(x1, x2)² π(d(x1, x2)) ]^{1/2}.
Here Π(m1, m2) denotes the set of probabilities on X × X with the marginal distributions equal to m1 and m2. Note that P2(X) is a Polish space when X is Polish. Additionally, if X is compact, then the sets P(X) and P2(X) coincide, the space P2(X) is compact, and W2 metrizes the narrow convergence.
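On the real line, for two uniform empirical measures with the same number of atoms, the infimum over plans Π(m1, m2) in the definition of W2 is attained by the monotone coupling that matches sorted samples; this gives a simple way to compute W2 in this special case (a sketch, not a general optimal-transport solver):

```python
import math

def w2_empirical_1d(xs, ys):
    """2-Wasserstein distance between uniform empirical measures on R
    with equally many atoms: the optimal plan matches the i-th smallest
    atom of xs with the i-th smallest atom of ys."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))

d = w2_empirical_1d([0.0, 1.0], [1.0, 2.0])  # a translation by 1, so W2 = 1
```

For measures on Rd with d > 1 no such sorting shortcut exists and one has to solve a discrete optimal transport problem over Π(m1, m2).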
If (X, ρX), (Y, ρY) are separable metric spaces satisfying the Radon property, then denote by WM(X, Y) the set of weakly measurable functions from X to P(Y). If b ∈ WM(X, Y), then let m ⊗ b be the measure on X × Y defined by the rule: for ϕ ∈ Cb(X × Y),

∫_{X×Y} ϕ(x, y) (m ⊗ b)(d(x, y)) := ∫_X ∫_Y ϕ(x, y) b(x, dy) m(dx).

Hereinafter, we write b(x, dy) instead of b(x)(dy). If (Z, ρZ) is also a separable metric space satisfying the Radon property, ξ ∈ P(Y), ζ ∈ P(Z), then denote by ξ ⊗ ζ the product of probabilities, i.e., ξ ⊗ ζ is the probability on Y × Z defined by the rule: for φ ∈ Cb(Y × Z),

∫_{Y×Z} φ(y, z) (ξ ⊗ ζ)(d(y, z)) := ∫_Y ∫_Z φ(y, z) ξ(dy) ζ(dz).
Further, if b ∈ WM(X, Y), c ∈ WM(X, Z), then denote by b ⊗ c the element of WM(X, Y × Z) defined by (b ⊗ c)(x, d(y, z)) := b(x, dy) c(x, dz).
Further, if m is a finite measure on X, then Λ(X, m, Y) is the quotient space of WM(X, Y) under the relation given by m-a.e. coincidence. We consider the narrow convergence on Λ(X, m, Y), i.e., bn → b iff m ⊗ bn → m ⊗ b in the narrow sense. If X and Y are compact, then Λ(X, m, Y) is also compact. If π ∈ P(X × Y), then denote by π(·|x) (respectively, π(·|y)) its disintegration with respect to the marginal distribution on X (respectively, Y). To simplify notation, put C := C([0, T], Rd). Let et stand for the evaluation operator defined by the rule: for x(·) ∈ C, et(x(·)) := x(t).
Obviously, if χ1, χ2 ∈ P2(C), then

W2(et# χ1, et# χ2) ≤ W2(χ1, χ2).    (2)
Further, let M denote the set of all continuous functions defined on [0, T ] with values in P 2 (Rd ). If W is a subset of [0, T ] × P 2 (Rd ), t ∈ [0, T ], then denote
W[t] := {m ∈ P2(Rd) : (t, m) ∈ W}. Below, let dist stand for the distance between a point and a set. We assume that

• the sets U and V are compact subsets of a metric space;
• the function f is continuous;
• the function f is bounded, i.e., there exists R > 0 such that, for any t ∈ [0, T], x ∈ Rd, m ∈ P2(Rd), u ∈ U, v ∈ V, ‖f(t, x, m, u, v)‖ ≤ R;
• the function f is Lipschitz continuous w.r.t. x and m, i.e., there exists a constant L such that, for any t ∈ [0, T], x′, x′′ ∈ Rd, m′, m′′ ∈ P2(Rd), u ∈ U, v ∈ V,
‖f(t, x′, m′, u, v) − f(t, x′′, m′′, u, v)‖ ≤ L‖x′ − x′′‖ + L·W2(m′, m′′);
• there exists an even function ω : R → [0, +∞), continuous and vanishing at 0, such that
‖f(t′, x, m, u, v) − f(t′′, x, m, u, v)‖ ≤ ω(t′ − t′′);
• (Isaacs' condition) for any t ∈ [0, T], x, w ∈ Rd, m ∈ P2(Rd),
min_{u∈U} max_{v∈V} ⟨w, f(t, x, m, u, v)⟩ = max_{v∈V} min_{u∈U} ⟨w, f(t, x, m, u, v)⟩;
• the sets M and N are compact;
• there exists a compact set E ⊂ Rd such that M ⊂ N ⊂ [0, T] × P2(E).

4. STRATEGIES AND MOTIONS

Let U0 (respectively, V0) denote the set of measurable functions from [0, T] to U (respectively, V). The elements of the sets U0 and V0 are (usual) controls of the first and the second players respectively. It is convenient to introduce the relaxed controls. Set U := Λ([0, T], λ, U), V := Λ([0, T], λ, V). Hereinafter λ denotes the Lebesgue measure. An element of U (respectively, V) is a function from [0, T] to P(U) (respectively, P(V)).
Without loss of generality, one can assume that U0 ⊂ U and V0 ⊂ V. If s ∈ [0, T], y ∈ Rd, m(·) ∈ M, ξ ∈ U, ζ ∈ V, then denote by x(·, s, y, m(·), ξ, ζ) the solution of the initial value problem

d/dt x(t) = ∫_U ∫_V f(t, x, m(t), u, v) ξ(t, du) ζ(t, dv),   x(s) = y.

It is not difficult to prove the existence and uniqueness theorem for this problem. Further, let traj^s_{m(·)} stand for the operator which assigns to y, ξ and ζ the motion x(·, s, y, m(·), ξ, ζ). Notice that traj^s_{m(·)} maps Rd × U × V to C.
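When the relaxed controls ξ and ζ charge only finitely many control values, the right-hand side of the initial value problem above becomes a weighted sum and x(·, s, y, m(·), ξ, ζ) can be approximated by an Euler scheme; a sketch with illustrative dynamics and names (not from the paper):

```python
def relaxed_rhs(f, t, x, m, xi, zeta):
    """Right-hand side averaged over the relaxed controls: the double
    integral of f over U x V against xi(t, du) zeta(t, dv), here a
    finite weighted sum (controls given as dicts value -> weight)."""
    return sum(f(t, x, m, u, v) * pu * pv
               for u, pu in xi(t).items()
               for v, pv in zeta(t).items())

def trajectory(f, s, y, m, xi, zeta, T=1.0, steps=1000):
    """Euler approximation of the motion x(., s, y, m(.), xi, zeta)."""
    x, dt = y, (T - s) / steps
    for k in range(steps):
        t = s + k * dt
        x += dt * relaxed_rhs(f, t, x, m(t), xi, zeta)
    return x

# Illustrative dynamics f = u + v with a relaxed first-player control
# mixing u = -1 and u = 1 with equal weights (so it averages to zero).
f = lambda t, x, m, u, v: u + v
xi = lambda t: {-1.0: 0.5, 1.0: 0.5}
zeta = lambda t: {1.0: 1.0}  # the disturbance is v = 1 a.s.
x_T = trajectory(f, 0.0, 0.0, m=lambda t: None, xi=xi, zeta=zeta)
```

Here the averaged drift is constant and equal to 1, so x_T ≈ 1; replacing ξ by the Dirac mass at a usual control recovers the non-relaxed dynamics.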
We assume that the player can influence each agent and his/her strategy is a function of time and the current distribution of agents. Since the agents are similar, it is reasonable to assume that the strategy is a distribution of controls on Rd. We will consider the case when the player chooses a constant control on a short time interval. Denote the set of distributions of the first player's constant controls by Ac and the set of distributions of the second player's constant controls by Bc, i.e.,

Ac := WM(Rd, U), Bc := WM(Rd, V).
Further, we will consider the case when the players choose distributions of relaxed controls. Put A := WM(Rd, U), B := WM(Rd, V). Notice that A (respectively, B) is the set of distributions of relaxed controls of the first (respectively, second) player. In the following, the distributions of pairs of controls will play the crucial role. Set D := WM(Rd, U × V). Further, we will consider the distributions of players' controls consistent with a distribution of controls of the first (respectively, second) player. Namely, if α ∈ A, then denote by D1[α] the set of distributions of pairs of controls κ ∈ D such that, for each x ∈ Rd, the marginal distribution of κ(x) on U is α(x). Further, denote by D10[α] the set of distributions κ ∈ D1[α] such that κ(x) is concentrated on U × V0. Analogously, given β ∈ B, let D2[β] (respectively, D20[β]) stand for the set of distributions of players' controls κ ∈ D = WM(Rd, U × V) (respectively, κ ∈ WM(Rd, U0 × V)) such that, for any x ∈ Rd, the projection of κ(x) on V is β(x). Let t0 ∈ [0, T], m0 ∈ P2(Rd). We say that the flow of probabilities m(·) ∈ M is generated by t0, m0, and κ ∈ D if there exists a probability χ ∈ P2(C) such that

• m(t) = et# χ;
• χ = traj^{t0}_{m(·)}# (m0 ⊗ κ).
Below we denote the flow of probabilities generated by t0, m0, and κ by m(·, t0, m0, κ). One can prove the existence and uniqueness theorem for the flow of probabilities m(·, t0, m0, κ).

Now let us introduce the feedback formalization. As it was mentioned above, we assume that the feedback strategy of the first player is a distribution of constant controls, i.e., we say that a function u from [0, T] × P2(Rd) to Ac is a feedback strategy of the first player. It is reasonable to assume that if the first player uses a feedback strategy, then the second player can use a distribution of measurable controls. We also assume that the first player corrects his/her control at a finite number of time instants. Thus, we say that the motion m(·) is generated by the initial time t0, the initial distribution of agents m0, the strategy of the first player u and a partition of the time interval ∆ = {ti}ni=0 if there exist κi, i = 0, …, n − 1, such that

• m(t0) = m0;
• κi ∈ D10[u[ti, m(ti)]];
• for t ∈ [ti, ti+1], i = 0, …, n − 1, m(t) = m(t, ti, m(ti), κi).

Let X01(t0, m0, u, ∆) stand for the set of step-by-step flows of probabilities generated by t0, m0, u and ∆. Below we denote by X1(t0, m0, u) the set of limit motions m(·) such that there exist a sequence of partitions of the time interval {∆r} and a sequence of flows of probabilities {mr(·)} such that

• d(∆r) → 0 as r → ∞;
• mr(·) ∈ X01(t0, m0, u, ∆r);
• lim_{r→∞} sup_{t∈[t0,T]} W2(m(t), mr(t)) = 0.

The first player wins at (t0, m0) if there exists a strategy u such that for any m(·) ∈ X1(t0, m0, u) one can find a time instant τ satisfying

• m(τ) ∈ M[τ];
• for any t ∈ [t0, τ], m(t) ∈ N[t].

Below we denote the set of (t0, m0) where the first player wins by W∗1.

If we consider the problem of the second player, then his/her feedback strategy is a function v from [0, T] × P2(Rd) to Bc. Given an initial time t0, an initial distribution of agents m0, the second player's strategy v, and a partition ∆, one can define the set of corresponding flows of probabilities X02(t0, m0, v, ∆). Further, let X2(t0, m0, v) denote the set of limit motions. The second player wins at (t0, m0) if there exists a strategy v such that for any m(·) ∈ X2(t0, m0, v) and some τ ∈ [t0, T] the following conditions hold:

• either τ = T or m(τ) ∉ N[τ];
• for any t ∈ [t0, τ), m(t) ∉ M[t].

Let W∗2 denote the set of (t0, m0) at which the second player wins.

5. STABLE SETS
The following definitions extend to the case of mean field type differential games the notions of u- and v-stability first introduced in Krasovskii and Subbotin (1988) for the finite dimensional differential games. We say that the set W ⊂ N ⊂ [0, T ] × P 2 (Rd ) is u-stable, if for any (t∗ , m∗ ) ∈ W, t+ ∈ [t∗ , T ], β ∈ B c there exists a distribution of pairs of controls κ ∈ D2 [β] and a time τ such that, for m(·) = m(·, t∗ , m∗ , κ), • either m(τ ) ∈ M [τ ] or τ = t+ ;
• m(t) ∈ W[t] when t ∈ [t∗ , τ ].
The notion of v-stability is defined in a slightly non-symmetric way. First, recall that, if F ⊂ [0, T] × P2(Rd), then an open set O(F) containing F is called a neighborhood of F. Below we assume that F ⊂ [0, T] × P2(E). Thus,

sup_{(t′,m′)∈∂F, (t′′,m′′)∈∂O(F)} [|t′ − t′′| + W2(m′, m′′)] < ∞.
We say that the set W ⊂ [0, T] × P2(Rd) is v-stable if there exist neighborhoods O(M) and O(N) of M and N satisfying the following: W ∩ O(M) = ∅ and for any (t∗, m∗) ∈ W, t+ ∈ [t∗, T], α ∈ Ac one can find a distribution of pairs of controls κ ∈ D1[α] and a time τ such that, for m(·) = m(·, t∗, m∗, κ),

• either m(τ) ∉ O(N)[τ] or τ = t+;
• m(t) ∈ W[t] when t ∈ [t∗, τ].
Without loss of generality, one can assume that u- and v-stable sets are closed.

Theorem 1. Let W be a u-stable set. Furthermore, assume that W[T] ⊂ M[T]. Then there exists a strategy u such that, for any (t0, m0) ∈ W and any m(·) ∈ X1(t0, m0, u), one can find a time instant τ satisfying

• m(τ) ∈ M[τ];
• for any t ∈ [t0, τ], m(t) ∈ N[t].

In particular, Theorem 1 states that if W is u-stable, then W ⊂ W∗1. The key idea of the proof of Theorem 1 is an extension of the Krasovskii–Subbotin extremal shift rule (Krasovskii and Subbotin (1988)) to the case of mean field type differential games. We construct the strategy u in the following way. First, given s ∈ [0, T], m∗ ∈ P2(Rd), x, y ∈ Rd, put

û(s, x, y, m∗) := argmin_{u∈U} max_{v∈V} ⟨x − y, f(s, x, m∗, u, v)⟩.
Analogously, set

v̂(s, x, y, m∗) := argmax_{v∈V} min_{u∈U} ⟨x − y, f(s, x, m∗, u, v)⟩.
Without loss of generality, one can assume that the functions (x, y) → û(s, x, y, m∗) and (x, y) → v̂(s, x, y, m∗) are measurable. Now, let s ∈ [0, T], m∗ ∈ P2(Rd). If (s, m∗) ∈ W, then we put u[s, m∗] to be equal to an arbitrary distribution of the first player's controls α ∈ Ac. If (s, m∗) ∉ W, then let ν∗ ∈ P2(Rd) be such that

• ν∗ ∈ W[s];
• W2(m∗, ν∗) = min{W2(m∗, m) : m ∈ W[s]}.
Further, pick π ∈ Π(m∗, ν∗) to be an optimal plan between m∗ and ν∗. Let π(·|x) be a disintegration of π along m∗, i.e., for any ϕ ∈ Cb(Rd × Rd),

∫_{Rd×Rd} ϕ(x, y) π(d(x, y)) = ∫_{Rd} ∫_{Rd} ϕ(x, y) π(dy|x) m∗(dx).

Now set, for (s, m∗) ∉ W,

u[s, m∗](x) := û(s, x, ·, m∗)# π(·|x).
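For finite control sets the extremal shift pair (û, v̂) defined above is a direct argmin–max / argmax–min computation; a minimal sketch in one dimension with an illustrative separated dynamics f = u + v (so Isaacs' condition holds):

```python
def u_hat(f, s, x, y, m, U, V):
    """Extremal shift control of the first player:
    argmin over u in U of max over v in V of <x - y, f(s, x, m, u, v)>."""
    w = x - y  # in R^d this product would be a dot product
    return min(U, key=lambda u: max(w * f(s, x, m, u, v) for v in V))

def v_hat(f, s, x, y, m, U, V):
    """Extremal shift control of the second player:
    argmax over v in V of min over u in U of <x - y, f(s, x, m, u, v)>."""
    w = x - y
    return max(V, key=lambda v: min(w * f(s, x, m, u, v) for u in U))

f = lambda s, x, m, u, v: u + v  # illustrative separated dynamics
U, V = [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]
u0 = u_hat(f, 0.0, 1.0, 0.0, None, U, V)  # x is ahead of y: pull back
v0 = v_hat(f, 0.0, 1.0, 0.0, None, U, V)  # the disturbance pushes away
```

In the strategy construction this computation is performed for almost every pair (x, y) drawn from the optimal plan π, which yields the distribution of extremal controls over the agents.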
The proof of optimality of the strategy u is based on the following lemmas.

Lemma 2. Let s ∈ [0, T], x∗, y∗ ∈ Rd, m(·), ν(·) ∈ M, ξ ∈ U, ζ ∈ V, u∗ = û(s, x∗, y∗, m(s)), v∗ = v̂(s, x∗, y∗, m(s)). Denote x(·) = x(·, s, x∗, m(·), u∗, ζ), y(·) = x(·, s, y∗, ν(·), ξ, v∗). Then, for any r ∈ [s, T],

‖x(r) − y(r)‖² ≤ ‖x∗ − y∗‖²(1 + 3L(r − s)) + L·W2(m(s), ν(s)) + ω1(r − s)·(r − s).

Here ω1 : R → [0, +∞) is continuous at 0, vanishes at 0, and is determined only by the function f. The proof of this lemma is analogous to the proof of Lemma 1 in Averboukh (2018). Integrating the result of the lemma w.r.t. the optimal plan π we get the following.

Lemma 3. Let s, r ∈ [0, T], s ≤ r, m∗, ν∗ ∈ P2(Rd), π be an optimal plan between m∗ and ν∗, π(·|x), π(·|y) be its disintegrations with respect to m∗ and ν∗ respectively, α∗(x) := û(s, x, ·, m∗)# π(·|x), β∗(y) := v̂(s, ·, y, m∗)# π(·|y), κ ∈ D1[α∗], ϑ ∈ D2[β∗], m(·) = m(·, s, m∗, κ), ν(·) = m(·, s, ν∗, ϑ). Then

W2(m(r), ν(r)) ≤ W2(m∗, ν∗)(1 + 4L(r − s)) + ω1(r − s)·(r − s).

The proof of this lemma is the same as the proof of Lemma 2 in Averboukh (2018).

Proof of Theorem 1. Let t0 ∈ [0, T], m0 ∈ P2(Rd) satisfy (t0, m0) ∈ W. We shall prove that for any ε > 0 one can find δ > 0 such that, whatever partition ∆ = {ti}ni=0 with d(∆) ≤ δ and m(·) ∈ X01(t0, m0, u, ∆) are chosen, there exists τ ∈ [t0, T] such that the following properties hold true:

• dist(m(τ), M[τ]) ≤ ε;
• dist(m(t), W[t]) ≤ ε when t ∈ [t0, τ].
Notice that there exist {κi}, i = 0, …, n − 1, such that κi ∈ D¹[u[ti, m(ti)]] and, for t ∈ [ti, ti+1],

m(t) = m(t, ti, m(ti), κi).
Let j be the least number i ∈ {1, …, n} such that m(ti) ∉ W[ti]. Recall that W ⊂ [0, T] × P²(E), where E is a compact subset of Rd. Since m(tj−1) ∈ W[tj−1], using the u-stability of W we get that

dist(m(tj), W[tj]) ≤ 2R(tj − tj−1) ≤ 2R d(∆).   (3)

Now, using the u-stability of W, the inclusion W[T] ⊂ M[T], and Lemma 3, we construct a number l ∈ {j, …, n − 1}, a time θ ∈ [tl, tl+1], and measures {νi(·)}, i = j, …, l, such that, for τi ≜ ti ∧ θ,
• νi(t) ∈ W[t] when t ∈ [τi, τi+1];
• νl(θ) ∈ M[θ];
• the following inequality holds true: for t ∈ [τi, τi+1],

W₂(νi(t), m(t)) ≤ W₂(νi(τi), m(τi))(1 + 4L(t − τi)) + ϖ₁(t − τi) · (t − τi).   (4)

Since dist(m(t), W[t]) ≤ W₂(νi(t), m(t)), applying (4) sequentially and taking into account (3), we get

dist(m(t), W[t]) ≤ 2R d(∆) exp(4L(t − tj)) + ϖ₁(d(∆)) exp(4L(t − tj)) · (t − tj).
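The passage from the one-step estimate (4) to this exponential bound is a discrete Gronwall argument. The following sketch, with illustrative constants not taken from the paper, checks numerically that iterating r ↦ r(1 + 4LΔt) + wΔt over the partition stays below the closed-form bound r₀ exp(4Lt) + w exp(4Lt) t.

```python
# Numerical sanity check (illustrative constants) of the discrete
# Gronwall step: iterating r_{i+1} = r_i*(1 + 4*L*dt) + w*dt over n
# steps stays below r0*exp(4*L*t) + w*exp(4*L*t)*t, the form of the
# bound used in the proof.
import math

def iterate_bound(r0, L, w, dt, n):
    """Apply the one-step recursion n times starting from r0."""
    r = r0
    for _ in range(n):
        r = r * (1 + 4 * L * dt) + w * dt
    return r

L, w, dt, n, r0 = 0.7, 0.3, 0.01, 500, 0.05
t = n * dt
lhs = iterate_bound(r0, L, w, dt, n)
rhs = r0 * math.exp(4 * L * t) + w * math.exp(4 * L * t) * t
print(lhs <= rhs)  # True
```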
The inclusion W[t] ⊂ N[t] implies the following inequality, for any t ∈ [t0, θ]:

dist(m(t), N[t]) ≤ 2R d(∆) exp(4L(t − tj)) + ϖ₁(d(∆)) exp(4L(t − tj)) · (t − tj).   (5)

Furthermore, using the inclusion νl(θ) ∈ M[θ], we conclude that

dist(m(θ), M[θ]) ≤ 2R d(∆) exp(4L(θ − tj)) + ϖ₁(d(∆)) exp(4L(θ − tj)) · (θ − tj).   (6)

Choosing δ such that 2Rδ exp(4LT) + ϖ₁(δ) exp(4LT) · T ≤ ε, by (5) and (6) we obtain the conclusion of the theorem.

Theorem 4. Let W be a v-stable set. Then there exist neighborhoods O(M) and O(N) of M and N and a strategy v of the second player such that, for any (t0, m0) ∈ W and any m(·) ∈ X²(t0, m0, v), one can find a time instant τ satisfying
• either m(τ) ∉ O(N)[τ] or τ = T;
• for any t ∈ [t0, τ], m(t) ∉ O(M)[t].

The proof of Theorem 4 is analogous to the proof of Theorem 1. Theorems 1 and 4, together with the definitions of u- and v-stability and of the sets W∗1 and W∗2, imply the following.

Theorem 5. [0, T] × P²(Rd) = W∗1 ∪ W∗2. The sets W∗1, W∗2 are u- and v-stable, respectively.

6. CONCLUSION

The paper considered the mean field type differential game. This problem arises quite naturally when we examine a system of many interacting agents pursuing a common goal under disturbances. In the paper we studied the target problem for the mean field type differential game. Our results are based on the Krasovskii-Subbotin approach, in whose framework feedback strategies are used. Nonanticipative strategies for mean field type differential games were developed in Cosso and Pham (2018); in that paper it is assumed that the players wish to minimize/maximize a functional. The proof of equivalence between the feedback formalization and the approach based on nonanticipative strategies is the subject of future work.
REFERENCES

Ahmed, N. and Ding, X. (2001). Controlled McKean-Vlasov equation. Commun Appl Anal, 5, 183–206.
Andersson, D. and Djehiche, B. (2011). A maximum principle for SDEs of mean-field type. Appl Math Optim, 63(3), 341–356.
Averboukh, Y. (2018). Krasovskii-Subbotin approach to mean field type differential games. Preprint at arXiv:1802.00487.
Bayraktar, E., Cosso, A., and Pham, H. (2018). Randomized dynamic programming principle and Feynman-Kac representation for optimal control of McKean-Vlasov dynamics. Trans Amer Math Soc, 370, 2115–2160.
Bensoussan, A., Frehse, J., and Yam, S. (2015). The master equation in mean field theory. Journal de Mathématiques Pures et Appliquées, 103, 1441–1474.
Buckdahn, R., Djehiche, B., and Li, J. (2011). A general stochastic maximum principle for SDEs of mean-field type. Appl Math Optim, 64(2), 197–216.
Carmona, R. and Delarue, F. (2013). Forward-backward stochastic differential equations and controlled McKean-Vlasov dynamics. Preprint at arXiv:1303.5835.
Cavagnari, G., Marigonda, A., Nguyen, K., and Priuli, F. (2017). Generalized control systems in the space of probability measures. Set-Valued Var Anal. doi:10.1007/s11228-017-0414-y. Published online.
Cavagnari, G. and Marigonda, A. (2015). Time-optimal control problem in the space of probability measures. In Large-Scale Scientific Computing, volume 9374 of Lecture Notes in Computer Science, 109–116.
Cosso, A. and Pham, H. (2018). Zero-sum stochastic differential games of generalized McKean-Vlasov type. Preprint at arXiv:1803.07329.
Djehiche, B. and Hamadène, S. (2016). Optimal control and zero-sum stochastic differential game problems of mean-field type. Preprint at arXiv:1603.06071.
Krasovskii, N.N. and Subbotin, A.I. (1988). Game-theoretical control problems. Springer, New York.
Laurière, M. and Pironneau, O. (2014). Dynamic programming for mean-field type control. C R Math Acad Sci Paris, 352(9), 707–713.
Pogodaev, N. (2016). Optimal control of continuity equations. NoDEA Nonlinear Differential Equations Appl, 23, Art. 21, 24 pp.