Copyright © IFAC System, Structure and Control Oaxaca, Mexico, USA, 8-10 December 2004
ELSEVIER
IFAC PUBLICATIONS www.elsevier.com/locate/ifac
NASH EQUILIBRIUM STRATEGIES FOR A CLASS OF NON-LINEAR STOCHASTIC CONSTRAINED DIFFERENTIAL GAMES
Daishi A. Murano* and Alex S. Poznyak**
* ITESM-CEM, Edo. de Mexico, Carretera Lago de Guadalupe, km. 3.5. Email: [email protected]
** CINVESTAV, Mexico D.F., AP 14-740. Email: [email protected]
Abstract: This article deals with the problem of finding a set of Nash equilibrium strategies for the class of non-linear stochastic constrained affine time-invariant differential games with average cost functionals. The constraints are applied to the strategy space. Dynamic Programming (DP) and the Lagrange approach are used to obtain a set of sufficient conditions for a Nash equilibrium solution. A solution structure is proposed for solving the set of Hamilton-Jacobi-Bellman (HJB for short) equations derived from DP, and an explicit set of Nash equilibrium strategies is obtained. Copyright © 2004 IFAC
Keywords: Stochastic differential games, Dynamic programming, HJB-equations.
1. INTRODUCTION
Nowadays there is great interest in extending classical control theory to game theory, which deals with control design for multi-participant conflict situations in which each participant has his own cost functional. Game theory involves multi-person decision making where each person pursues his or her own interests, which are partly (or totally) in conflict with those of the others. More and more scientific areas pay attention to the analysis of conflict situations: mathematics (Tolwinski, 1978b), (Uchida, 1978), economics (Engelbert, 1989), engineering, aeronautics (Vladimir and Josef, 2003), sociology and politics (Aart de, 1992). Here we briefly review some results that are important for this work. Tolwinski (Tolwinski, 1978a) studied the existence of Nash equilibrium points in differential games with either linear or nonlinear dynamics. Uchida (Uchida, 1978) studied a part complementary to that given by Tolwinski, that is, he focused on the existence of Nash equilibrium points in stochastic differential games. In (Basar and Olsder, 1995) sufficient conditions for the existence of Nash equilibrium strategies are given for the general case, both deterministic and stochastic, and, specifically, for linear quadratic time-invariant differential games the set of Nash equilibrium strategies is derived. Lukes (Lukes, 1971) considered equilibrium feedback control in linear games with quadratic cost.

The aim of this paper is to extend classical results concerning Nash equilibrium strategies design to a class of non-linear stochastic differential games
with time-averaged cost functionals and with constraints on the control actions. The problem to be solved concerns the class of stochastic differential games with affine time-invariant non-linear dynamics, whose functions are defined in a strip-cone type space, and whose strategy sets belong to a compact (spherical) set. Two main results are obtained in this paper: the first is a set of sufficient conditions, obtained by applying the Dynamic Programming approach, that provide a Nash equilibrium solution; the second is an explicit analytical expression for the set of Nash equilibrium strategies in the case of affine models supplied with an average-time quadratic criterion.

2. CLASS OF DIFFERENTIAL GAMES AND NASH EQUILIBRIUM DEFINITION

Consider the class of non-linear affine stochastic time-invariant differential games modeled by

$$dx = \Big[f_0(x) + \sum_{i=1}^{N} f_i(x)\,u^i\Big]dt + \sigma\,dW_t, \qquad x(0) = x_0 \tag{1}$$

where $x \in \mathbb{R}^n$ is the state vector of the game; $u^i \in U^i \subseteq \mathbb{R}^{m_i}$ ($i \in N$) is the strategy (control) vector of each player, where $U^i$ is a set of admissible strategies defined below; $f_0 : \mathbb{R}^n \to \mathbb{R}^n$ and $f_i : \mathbb{R}^n \to \mathbb{R}^{n \times m_i}$ are the mappings of the dynamics of the game; $\sigma \in \mathbb{R}^{n \times m}$ is a constant matrix; $W_t$ is an $m$-dimensional standard Wiener process. Each player wishes to minimize his average cost functional given by

$$J^i(u^1,\dots,u^N) := \limsup_{T\to\infty}\frac{1}{T}\int_{t=0}^{T} E\{C^i(x(t),u(t))\}\,dt, \qquad i \in N \tag{2}$$

Definition 1. The function $f : [0,\infty) \times \mathbb{R}^n \times U \to \mathbb{R}^{n\times m}$ is said to be a $C^2$-mapping (or of "strip-cone" type) if it is Borel measurable and is $C^2$ in $x$ (almost everywhere) for any $t \in [0,\infty)$ and any $u^i \in U^i$ ($U := U^1 \times \dots \times U^N$), and if there exist a matrix $A \in \mathbb{R}^{n\times n}$, a constant $L$ and a modulus of continuity $\bar\phi : [0,\infty) \to [0,\infty)$ such that for any $t \in [0,\infty)$ and for any $(x,u,\hat x,\hat u) \in \mathbb{R}^n \times U \times \mathbb{R}^n \times U$

$$\|f(t,x,u) - f(t,\hat x,\hat u) - A(x-\hat x)\| \le L\,\|x-\hat x\| + \bar\phi(d(u,\hat u))$$
$$\|f_x(t,x,u) - f_x(t,\hat x,\hat u) - A\| \le L\,\|x-\hat x\| + \bar\phi(d(u,\hat u))$$
$$\|f_{xx}(t,x,u) - f_{xx}(t,\hat x,\hat u)\| \le \bar\phi(\|x-\hat x\| + d(u,\hat u))$$

(here $f_x(\cdot,x,\cdot)$ and $f_{xx}(\cdot,x,\cdot)$ are the partial derivatives of the first and the second order).

Assumption 1. $(U^i, d)$, $i \in N$, is a separable metric space with metric $d$, where $d(u^i, u^{i\prime})$ is the distance between $u^i, u^{i\prime} \in U^i$.

Assumption 2. The maps $f_0 : \mathbb{R}^n \to \mathbb{R}^n$, $f_i : \mathbb{R}^n \to \mathbb{R}^{n\times m_i}$, $i \in N$, and $C^i : \mathbb{R}^n \times \mathbb{R}^{m_1} \times \dots \times \mathbb{R}^{m_N} \to \mathbb{R}$ satisfy Definition 1.

The restrictions applied to the strategies ($i \in N$) are as follows:

$$U^i := \{u^i \in \mathbb{R}^{m_i} \mid u^{iT}u^i \le (u^{+i})^2,\; u^{+i} > 0\} \tag{3}$$

In what follows the bound is written $u^+$ for brevity.

Assumption 3. We will consider the class of so-named BIBO (bounded input - bounded output) strategies $u(\cdot)$ which, under the constraint (3), keep the state dynamics bounded.

The set of strategies that satisfy Assumptions 1-3 and fulfill (3) will be called the set of admissible strategies for the game (1). It will be denoted by $U^i_{adm}$, $i \in N$.

Definition 2 (Nash equilibrium). An $N$-tuple of strategies $\{u^{1*},\dots,u^{N*}\}$ with $u^{i*} \in U^i$, $i \in N$, is said to constitute a noncooperative (Nash) equilibrium solution for an $N$-person nonzero-sum differential game if the following $N$ inequalities are satisfied for all $u^i \in U^i$, $i \in N$:

$$\begin{aligned}
J^1(u^{1*},u^{2*},\dots,u^{N*}) &\le J^1(u^{1},u^{2*},\dots,u^{N*})\\
J^2(u^{1*},u^{2*},u^{3*},\dots,u^{N*}) &\le J^2(u^{1*},u^{2},u^{3*},\dots,u^{N*})\\
&\;\;\vdots\\
J^N(u^{1*},\dots,u^{N-1*},u^{N*}) &\le J^N(u^{1*},\dots,u^{N-1*},u^{N})
\end{aligned} \tag{4}$$

The formulation of the problem: for the class of games given by (1) and supplied with the cost functional (2), under Assumptions 1-3, we want to find strategies $u^{i*} \in U^i_{adm}$ ($i \in N$) providing an equilibrium solution in the sense of (4).

3. SUFFICIENT CONDITIONS FOR NASH EQUILIBRIUM STRATEGIES

Using the Dynamic Programming technique for the problem formulated above, the following HJB equations (see (Onesimo, 1994)) may be obtained

$$\min_{u^i \in U^i_{adm}}\Big[V_x^{iT}(x)\Big(f_0(x) + \sum_{j=1}^{N} f_j(x)\,u^j\Big) + C^i(x,u) + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big]\Big] = c^i, \qquad i \in N \tag{5}$$

where $V^i_x(x)$, $i \in N$, is the gradient of the function $V^i : \mathbb{R}^n \to \mathbb{R}$ ($i \in N$), and $V^i_{xx}(x)$ is its Hessian matrix, which should be found during the design process of equilibrium strategies; $c^i$ are some constants to be defined below.
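For player $i$, the minimization in the HJB equations (5) only involves the terms that depend on $u^i$. The following is a minimal sketch of that pointwise problem, assuming a quadratic control penalty with a scalar weight ($R^{ii} = rI$, $r > 0$) and the ball constraint (3). The names `constrained_minimizer`, `objective`, `random_feasible_point`, `v` (standing for $f_i^T(x)V_x^i(x)$), `r` and `u_plus` are illustrative and do not come from the paper.

```python
import math
import random


def constrained_minimizer(v, r, u_plus):
    """Minimize v.u + r*||u||^2 over the ball ||u|| <= u_plus (r > 0, u_plus > 0)."""
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_v <= 2.0 * u_plus * r:
        # Interior case: the unconstrained stationary point -v/(2r) is feasible.
        return [-c / (2.0 * r) for c in v]
    # Boundary case: the minimizer has norm u_plus and points opposite to v.
    return [-u_plus * c / norm_v for c in v]


def objective(v, r, u):
    """Value of v.u + r*||u||^2."""
    return sum(a * b for a, b in zip(v, u)) + r * sum(b * b for b in u)


def random_feasible_point(dim, u_plus, rng):
    """Sample a point in the ball of radius u_plus (used only for the sanity check)."""
    while True:
        u = [rng.uniform(-u_plus, u_plus) for _ in range(dim)]
        if sum(c * c for c in u) <= u_plus ** 2:
            return u


if __name__ == "__main__":
    rng = random.Random(0)
    v, r, u_plus = [3.0, 4.0], 1.0, 1.0
    u_star = constrained_minimizer(v, r, u_plus)
    best = objective(v, r, u_star)
    # No sampled feasible point should beat the closed-form candidate.
    assert all(
        best <= objective(v, r, random_feasible_point(2, u_plus, rng)) + 1e-12
        for _ in range(10000)
    )
    print(u_star)
```

The two branches reflect whether the unconstrained minimizer already lies inside the ball; the random comparison is only a sanity check of the closed form, not a proof.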
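The main result concerning Nash equilibrium strategies for the problem under consideration consists in the following verification rule.

Theorem 1 (Nash equilibrium characterization). The existence of constants $c^i$ and of a solution $V^i(x)$ ($i \in N$) to the coupled HJ-equations given by

$$V_x^{iT}(x)\Big(f_0(x) + \sum_{j=1}^{N} f_j(x)\,u^{j*}(V_x^j(x),x)\Big) + g^i(x) + \sum_{j=1}^{N}\big\|u^{j*}(V_x^j(x),x)\big\|^2_{R^{ij}} + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big] = c^i, \qquad i \in N \tag{6}$$

such that $V^i(x) \le c_0 + c_1\|x\|^2$, guarantees that the controls $u^{i*}(V_x^i(x),x)$ are the Nash equilibrium strategies.

Proof. By (5) it follows that

$$V_x^{iT}(x)\Big(f_0(x) + \sum_{j=1}^{N} f_j(x)\,u^{j*}(V_x^j(x),x)\Big) + g^i(x) + \sum_{j=1}^{N}\big\|u^{j*}(V_x^j(x),x)\big\|^2_{R^{ij}} + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big] = c^i \tag{7}$$

and, for any admissible $u$,

$$g^i(x) + V_x^{iT}(x)\Big(f_0(x) + \sum_{j=1}^{N} f_j(x)\,u^j\Big) + \sum_{j=1}^{N}\|u^j\|^2_{R^{ij}} + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big] \ge c^i, \qquad i \in N \tag{8}$$

Select in (7) $x = x^*(t)$, which corresponds to the closed-loop dynamics

$$dx^* = \Big[f_0(x^*) + \sum_{j=1}^{N} f_j(x^*)\,u^{j*}(V_x^j(x^*),x^*)\Big]dt + \sigma\,dW_t, \qquad x^*(0) = x_0 \tag{9}$$

and in (8) $x = x(t)$, which corresponds to (1) for some admissible strategy $u$ ($u^i \in U^i_{adm}$, $i \in N$). Then, writing $u^{j*} := u^{j*}(V_x^j(x^*),x^*)$ for brevity, the integration of (7) and (8) over $t$ from $0$ up to $T$ implies

$$\begin{aligned}
c^i T &= \int_{t=0}^{T}\Big[V_x^{iT}(x^*)\Big(f_0(x^*) + \sum_{j=1}^{N} f_j(x^*)\,u^{j*}\Big) + C^i(x^*,u^*) + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x^*)\big]\Big]dt\\
&= \int_{t=0}^{T}\Big(\big(V_x^i(x^*),dx^*\big) + C^i(x^*,u^*)\,dt\Big) - \int_{t=0}^{T} V_x^{iT}(x^*)\,\sigma\,dW_t + \frac{1}{2}\int_{t=0}^{T}\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x^*)\big]dt\\
&= V^i(x^*(T)) - V^i(x_0) + \int_{t=0}^{T} C^i(x^*,u^*)\,dt - \int_{t=0}^{T} V_x^{iT}(x^*)\,\sigma\,dW_t
\end{aligned} \tag{10}$$

and

$$c^i T \le V^i(x(T)) - V^i(x_0) + \int_{t=0}^{T} C^i(x,u)\,dt - \int_{t=0}^{T} V_x^{iT}(x)\,\sigma\,dW_t \tag{11}$$

where $C^i(x,u) := g^i(x) + \sum_{j=1}^{N}\|u^j\|^2_{R^{ij}}$. Substituting (10) into (11) and dividing by $T$ gives

$$\frac{1}{T}\int_{t=0}^{T} C^i(x^*,u^*)\,dt \le \frac{1}{T}\big(V^i(x(T)) - V^i(x^*(T))\big) + \frac{1}{T}\int_{t=0}^{T} C^i(x(t),u(t))\,dt + \frac{1}{T}\int_{t=0}^{T}\big[V_x^{iT}(x^*) - V_x^{iT}(x)\big]\sigma\,dW_t$$

Applying the mathematical expectation operator to both sides of the last inequality (the stochastic integrals have zero mean) gives

$$\frac{1}{T}\int_{t=0}^{T} E\{C^i(x^*,u^*)\}\,dt \le \frac{1}{T}\big(E\{V^i(x(T))\} - E\{V^i(x^*(T))\}\big) + \frac{1}{T}\int_{t=0}^{T} E\{C^i(x(t),u(t))\}\,dt$$

and, tending $T \to \infty$, in view of (??) and Assumption 3, we have

$$\limsup_{T\to\infty}\frac{1}{T}\int_{t=0}^{T} E\{C^i(x^*,u^{*})\}\,dt \le \limsup_{T\to\infty}\frac{1}{T}\int_{t=0}^{T} E\{C^i(x(t),u(t))\}\,dt$$

It means that the strategy $u^{i*}(V_x^i(x^*),x^*)$ is optimal. ∎

So, to design $u^{i*}(V_x^i(x),x)$ we need to solve the following two sub-problems:

(1) To solve the following constrained optimization problems ($i \in N$):

$$\min_{u^i \in U^i_{adm}}\Big[V_x^{iT}(x)\,f_i(x)\,u^i + \|u^i\|^2_{R^{ii}}\Big] \tag{12}$$

(2) To find the solutions $V^i(x)$ to the coupled equations (6) for some constants $c^i$ ($i \in N$).

4. SOLUTION OF THE 1-ST SUB PROBLEM

In view of the constraints (3) and using the Lagrange multipliers approach, we should solve the following min-max problems ($i \in N$):

$$\min_{u^i \in U^i}\;\max_{\lambda^i \ge 0}\; \mathcal{L}^i(u^i,\lambda^i)$$

$$\mathcal{L}^i(u^i,\lambda^i) := V_x^{iT}\Big(f_0(x) + \sum_{j=1}^{N} f_j(x)\,u^j\Big) + g^i(x) + \sum_{j=1}^{N}\|u^j\|^2_{R^{ij}} + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big] + \lambda^i\big(u^{iT}u^i - (u^+)^2\big) \tag{13}$$

supplied by the complementary slackness conditions $\lambda^i\big(u^{i*T}u^{i*} - (u^+)^2\big) = 0$. The corresponding extremality conditions

$$\frac{\partial \mathcal{L}^i(u^i,\lambda^i)}{\partial u^i} = f_i^T(x)V_x^i + 2r^i u^i + 2\lambda^i u^i = 0$$
$$\frac{\partial \mathcal{L}^i(u^i,\lambda^i)}{\partial \lambda^i} = u^{iT}u^i - (u^+)^2 = 0$$

for $R^{ij} = r^{ij}I$, $r^{ij} > 0$ (with $r^i := r^{ii}$), lead to

$$\lambda^i = \begin{cases}\dfrac{\|f_i^T(x)V_x^i\|}{2u^+} - r^i, & \text{if } \|f_i^T(x)V_x^i\| > 2u^+ r^i\\[2mm] 0, & \text{if } \|f_i^T(x)V_x^i\| \le 2u^+ r^i\end{cases}$$

that gives

$$u^{i*} = \begin{cases}-u^+\dfrac{f_i^T(x)V_x^i(x)}{\|f_i^T(x)V_x^i(x)\|}, & \text{if } \|f_i^T(x)V_x^i(x)\| > 2u^+ r^i\\[2mm] -\dfrac{1}{2r^i}\,f_i^T(x)V_x^i(x), & \text{if } \|f_i^T(x)V_x^i(x)\| \le 2u^+ r^i\end{cases} \tag{14}$$

where $V_x^i := V_x^i(x)$, $i \in N$, is the solution to the next set of equations: if $\|f_i^T(x)V_x^i(x)\| > 2u^+ r^i$, $i \in N$,

$$V_x^{iT}\Big[f_0(x) - u^+\sum_{j=1}^{N}\frac{f_j(x)\,f_j^T(x)V_x^j}{\|f_j^T(x)V_x^j\|}\Big] + g^i(x) + (u^+)^2\sum_{j=1}^{N} r^{ij} + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big] = c^i \tag{15}$$

and if $\|f_i^T(x)V_x^i(x)\| \le 2u^+ r^i$, $i \in N$,

$$V_x^{iT}\Big[f_0(x) - \sum_{j=1}^{N}\frac{1}{2r^j}\,f_j(x)\,f_j^T(x)V_x^j\Big] + g^i(x) + \sum_{j=1}^{N}\frac{r^{ij}}{4(r^j)^2}\,\|f_j^T(x)V_x^j\|^2 + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T V^i_{xx}(x)\big] = c^i \tag{16}$$

Remark 1. We can notice that, for the considered class of differential games with bounded control actions, the strategies that guarantee a Nash equilibrium solution belong to the class of variable structure strategies (in this case with two structures), since for each subspace of current states the corresponding HJ-equations (15) or (16) should be solved.

5. SOLUTION OF THE 2-ND SUB PROBLEM

Let us try to find the solution in the following form

$$V_x^i := -\psi^i(x)\,f_0(x), \qquad V^i_{xx} = -\psi^i(x)\,\nabla f_0(x) - \nabla\psi^i(x)\,f_0^T(x), \qquad i \in N \tag{17}$$

where $0 \le \psi^i(x)$ is a scalar function to be defined.

1) First, consider the subspace where $\|f_i^T(x)V_x^i\| > 2u^+ r^i$, $i \in N$, that corresponds to the equation (15). The direct substitution of (17) into (15) implies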
$$c^i = -\psi^i(x)\,\|f_0(x)\|^2 - u^+\psi^i(x)\sum_{j=1}^{N}\|f_j^T(x)f_0(x)\| + g^i(x) + (u^+)^2\sum_{j=1}^{N} r^{ij} - \frac{1}{2}\,\psi^i(x)\,\mathrm{tr}\big[\sigma\sigma^T\nabla f_0(x)\big] - \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T\nabla\psi^i(x)\,f_0^T(x)\big], \qquad i \in N \tag{18}$$

that gives

$$\psi^i(x)\,\alpha(x) + \beta^T(x)\,\nabla\psi^i(x) + \theta^i(x) = 0 \tag{19}$$

where

$$\alpha(x) := \|f_0(x)\|^2 + u^+\sum_{j=1}^{N}\|f_j^T(x)f_0(x)\| + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T\nabla f_0(x)\big]$$
$$\beta^T(x) := \frac{1}{2}\,f_0^T(x)\,\sigma\sigma^T, \qquad \theta^i(x) := c^i - g^i(x) - (u^+)^2\sum_{j=1}^{N} r^{ij}$$

We can notice that the last equation is a first order partial differential equation which can be solved using the method of characteristics. To illustrate this method let us consider the simplest case, that is, a game with two players and one state. Rewriting (19) as

$$\beta(x)\frac{d\psi^1(x)}{dx} + \alpha(x)\,\psi^1(x) + \theta^1(x) = 0, \qquad \beta(x)\frac{d\psi^2(x)}{dx} + \alpha(x)\,\psi^2(x) + \theta^2(x) = 0$$

and defining

$$z^1(x,\psi^1) := -\alpha(x)\,\psi^1(x) - \theta^1(x), \qquad z^2(x,\psi^2) := -\alpha(x)\,\psi^2(x) - \theta^2(x)$$

the last equations may be represented as

$$\beta(x)\frac{\partial\psi^1(x)}{\partial x} = z^1(x,\psi^1), \qquad \beta(x)\frac{\partial\psi^2(x)}{\partial x} = z^2(x,\psi^2) \tag{20}$$

One can see that to find the solution of (20) it is enough to find $\psi^i$ ($i = 1,2$) in implicit form

$$\varphi^i(x,\psi^i) = 0 \tag{21}$$

Considering that the function $\psi^i = \psi^i(x)$ is determined by the equation (21), $\varphi^i(x,\psi^i(x)) = 0$, and calculating its derivative with respect to $x$, we obtain

$$\frac{\partial\varphi^i}{\partial x} + \frac{\partial\varphi^i}{\partial\psi^i}\frac{\partial\psi^i}{\partial x} = 0, \qquad \frac{\partial\psi^i}{\partial x} = -\frac{\partial\varphi^i}{\partial x}\Big/\frac{\partial\varphi^i}{\partial\psi^i}, \qquad i = 1,2$$

Substituting this relation into (??) we obtain the next system of linear homogeneous equations

$$\beta(x)\frac{\partial\varphi^1}{\partial x} + z^1(x,\psi^1)\frac{\partial\varphi^1}{\partial\psi^1} = 0, \qquad \beta(x)\frac{\partial\varphi^2}{\partial x} + z^2(x,\psi^2)\frac{\partial\varphi^2}{\partial\psi^2} = 0 \tag{22}$$

where (22) must hold under the condition that $\psi^i$ is the function of $x$ given by the equation (21). Rewriting the system (22) in the form of the vector lines for each player

$$\frac{dx}{\beta(x)} = \frac{d\psi^1}{z^1(x,\psi^1)}, \qquad \frac{dx}{\beta(x)} = \frac{d\psi^2}{z^2(x,\psi^2)} \tag{23}$$

we may find the first integrals of each equation in (23), $\gamma^1(x,\psi^1) = d_1$, $\gamma^2(x,\psi^2) = d_2$, which are called the characteristics of (19) (or (22)). Then the general solution of (22) has the form $\varphi^i = \Phi(\gamma^i)$ ($i = 1,2$), $\Phi$ being an arbitrary function. The solution $\psi^i$, $i = 1,2$, of the system (19) that depends on the arbitrary function is determined by the equation

$$\varphi^i(x,\psi^i) = 0 \quad\text{or}\quad \Phi(\gamma^i) = 0$$

Taking the value for $\psi^i$ ($i \in N$), we obtain the following control law

$$u^{i*} = u^{i*}(x) = u^+\frac{f_i^T(x)\,f_0(x)}{\|f_i^T(x)\,f_0(x)\|} \tag{24}$$

2) Within the subspace $\|f_i^T(x)V_x^i\| \le 2u^+ r^i$ the direct substitution of (17) into (16) gives ($i \in N$)

$$c^i = -\psi^i(x)\,\|f_0(x)\|^2 + g^i(x) - \psi^i(x)\sum_{j=1}^{N}\frac{1}{2r^j}\,f_0^T(x)f_j(x)f_j^T(x)f_0(x)\,\psi^j(x) + \sum_{j=1}^{N}\frac{r^{ij}}{4(r^j)^2}\,\psi^j(x)\,f_0^T(x)f_j(x)f_j^T(x)f_0(x)\,\psi^j(x) + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T\big(-\psi^i(x)\nabla f_0(x) - \nabla\psi^i(x)f_0^T(x)\big)\big]$$

Defining

$$\phi^j(x) := \frac{1}{r^j}\,f_0^T(x)f_j(x)f_j^T(x)f_0(x), \qquad \tilde c^i(x) := c^i - g^i(x), \qquad i \in N$$

the last equations may be represented as

$$\tilde c^i(x) = -\psi^i(x)\,\alpha(x) - \frac{1}{2}\,f_0^T(x)\,\sigma\sigma^T\nabla\psi^i(x) + \frac{1}{4}\sum_{j=1}^{N}\frac{r^{ij}}{r^j}\,\phi^j(x)\big(\psi^j(x)\big)^2, \qquad i \in N \tag{25}$$

where now

$$\alpha(x) := \|f_0(x)\|^2 + \frac{1}{2}\sum_{j=1}^{N}\phi^j(x)\,\psi^j(x) + \frac{1}{2}\,\mathrm{tr}\big[\sigma\sigma^T\nabla f_0(x)\big]$$
Then for the structure $\|f_i^T(x)V_x^i\| \le 2u^+ r^i$ the strategies have the form

$$u^{i*}(x) = \frac{1}{2r^i}\,f_i^T(x)\,f_0(x)\,\psi^i(x), \qquad i \in N \tag{26}$$

where $\psi^i(x)$, $i \in N$, is the solution of (25), which can be solved using the method of characteristics exactly as before. Finally, the control strategies look as follows:

• if $\|\bar\psi^i(x)\,f_i^T(x)f_0(x)\| > 2u^+ r^i$, where $\bar\psi^i(x) := \psi^i(x)$ is given by (23), the equilibrium control is a unitary control and is as follows:

$$u^{i*} = u^{i*}(x) = u^+\frac{f_i^T(x)\,f_0(x)}{\|f_i^T(x)\,f_0(x)\|} \tag{27}$$

• if $\|\bar\psi^i(x)\,f_i^T(x)f_0(x)\| \le 2u^+ r^i$, then the equilibrium control is

$$u^{i*} = u^{i*}(x) = \frac{1}{2r^i}\,f_i^T(x)\,f_0(x)\,\psi^i(x) \tag{28}$$

$\psi^i(x)$ being the solution of (25).

Now we are ready to formulate the main result:

Theorem 2. The $N$-person differential game given by (1) with cost functionals (2), under Assumptions 1-3 and the constraints (3), has a Nash equilibrium solution (possibly not unique) given by (27) and (28).

6. CONCLUSIONS

• In this paper, the problem of finding a Nash equilibrium solution for a class of non-linear time-invariant stochastic differential games with average cost functionals is tackled.
• The Dynamic Programming, Lagrange multipliers and Itô differential rule approaches are applied to obtain the corresponding HJB-equations. The theorem on the characterization of Nash equilibrium is presented.
• It is shown that for the considered class of differential games equilibrium strategies can be designed analytically as a nonlinear feedback control: in one state subspace it is a unitary control (27) and, in the other, it is essentially nonlinear (28).
• The obtained controllers, which provide a Nash equilibrium solution, have a variable structure. In fact, the obtained equilibrium strategies represent a viscosity solution [4] to the corresponding HJB-problem, since the solutions of the obtained HJB-equations are not differentiable everywhere.

REFERENCES

Aart de, Zeeuw (1992). Note on Nash and Stackelberg solutions in a differential game model of capitalism. Journal of Economic Dynamics and Control 16, 139-145.
Basar, T. and Olsder Geert (1995). Dynamic Noncooperative Game Theory. Academic Press.
Engelbert, Dockner et al. (1989). Noncooperative solutions for a differential game model of fishery. Journal of Economic Dynamics and Control 13, 1-20.
Leon, Petrosjan and Zaccour Georges (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control 27, 381-398.
Lukes, D. L. (1971). Equilibrium feedback control in linear games with quadratic cost. SIAM Journal on Control and Optimization 9(2), 234-252.
Onesimo, Hernandez-Lerma (1994). Lectures on continuous-time Markov control processes. Technical report. Sociedad Matematica Mexicana.
Tolwinski, B. (1978a). Numerical solution of n-person nonzero-sum differential games. Control and Cybernetics 7(1), 37-50.
Tolwinski, B. (1978b). On the existence of Nash equilibrium points for differential games with linear and nonlinear dynamics. Control and Cybernetics 7(3), 57-69.
Uchida, K. (1978). On the existence of Nash equilibrium point in n-person nonzero-sum stochastic differential games. SIAM Journal on Control and Optimization 16(1), 142-149.
Vladimir, Turetsky and Shinar Josef (2003). Missile guidance laws based on pursuit-evasion game formulations. Automatica 39, 607-618.