Copyright © IFAC Modeling and Control of Economic Systems, Klagenfurt, Austria, 2001
ELSEVIER
IFAC PUBLICATIONS www.elsevier.com/locate/ifac
COORDINATION OF LONG-TERM AND SHORT-TERM PAYOFFS IN DYNAMIC BIMATRIX GAMES
A.M. Tarasyev *,**,1
* Institute of Mathematics and Mechanics UrB of RAS, S. Kovalevskaya str. 16, Ekaterinburg 620219, Russia, e-mail: [email protected]
** International Institute for Applied Systems Analysis (IIASA), A-2361 Laxenburg, Austria
Abstract: Dynamic optimality principles are applied to construct equilibrium solutions in evolutionary bimatrix games. The dynamics of the model can be interpreted as Kolmogorov's differential equations in which the coefficients describing flows are not fixed a priori and can be chosen on the feedback principle. Long-term payoffs of coalitions are defined on the infinite horizon by integral functionals. A peculiarity of the model is the coordination of velocities of solutions constructed for long-term payoff integrals with directions of gradients of short-term payoffs. The notion of a dynamic Nash equilibrium is introduced in the class of control feedbacks. A solution based on the notion of guaranteeing feedbacks is proposed. Discontinuous guaranteeing feedbacks are constructed using generalized methods of characteristics, and switching regimes are indicated. Copyright © 2001 IFAC
Keywords: Dynamic bimatrix games, guaranteeing feedbacks, equilibrium trajectories
1. INTRODUCTION
The dynamics of game interaction is related to differential games (see (Kleimenov, 1993), (Krasovskii and Subbotin, 1988)) and evolutionary game-theoretical models (see (Basar and Olsder, 1982), (Hofbauer and Sigmund, 1988), (Kaniovski et al., 2000), (Nelson and Winter, 1982), (Sonnevend, 1981), (Vorobyev, 1985)), and can be interpreted as a generalization of the well-known Kolmogorov equations which arise in stochastic models of mathematical economics and queueing theory. The generalization consists in introducing control parameters instead of the fixed coefficients which describe incoming and outgoing flows within coalitions. The process evolves over an infinite time interval. Payoffs of participants are specified by payoff matrices. Long-term payoffs of coalitions are defined as average integral payoffs of participants. The global long-term functionals are connected with the foreseeing concept, which takes into account not only local short-term interests of coalitions but is also oriented toward global future change. Coordination of long-term and short-term payoffs means that components of the system velocity should have the same sign as components of gradients of payoff functions.

We consider a model of an evolutionary nonzero sum game between two coalitions of participants in the framework of the theory of differential games (Krasovskii and Subbotin, 1988), the approach for nonantagonistic problems (Kleimenov, 1993), and the statements and methods of analysis of evolutionary games proposed in (Kryazhimskii and Osipov, 1995). We concentrate our attention on constructing dynamic Nash equilibria (Kleimenov, 1993) and guaranteeing feedbacks which maximize the corresponding payoffs (Krasovskii and Subbotin, 1988). We obtain resolving trajectories which give better results than the solutions of classical models.

We introduce the notion of a dynamic Nash equilibrium in the class of control feedbacks and use an approach for its construction which is based on the "guaranteeing" concept. This new solution is generated by constructions of the theory of positional differential games and involves guaranteeing feedbacks of auxiliary zero sum games (see (Kleimenov, 1993), (Krasovskii and Subbotin, 1988), (Tarasyev, 1999)). The synthesis of guaranteeing feedbacks is determined by switching curves for control signals. The switching curves are constructed by means of the Pontryagin maximum principle (Pontryagin et al., 1962) with the proper pasting of characteristics of Hamilton-Jacobi-Bellman-Isaacs equations. We generate equilibrium trajectories using these switching curves. The behavior of equilibrium trajectories generated by guaranteeing control synthesis differs qualitatively from the evolution of trajectories of classical models with replicator dynamics. Recall that trajectories of the classical models converge to a static Nash equilibrium or circulate in its neighborhood (see (Hofbauer and Sigmund, 1988), (Kaniovski et al., 2000), (Nelson and Winter, 1982)). The new equilibrium solutions lie in the intersection of domains in which the payoff values are better than the corresponding values calculated at a static Nash equilibrium. Examples of "almost antagonistic" games show that these trajectories converge to the points of intersection of switching curves, which can be interpreted as "new" points of equilibrium with better index values.

1 The author was supported by the Russian Fund for Fundamental Research under grants 99-01-00146, 00-15-96057, and Fujitsu Research Institute (FRI) under IIASA-FRI contract 00-117.

2. THE MODEL DYNAMICS
Let us consider the system of differential equations which describes the dynamics for two coalitions

$$\dot{x} = -x + u, \qquad \dot{y} = -y + v. \tag{1}$$

Assume that parameter $x$, $0 \le x \le 1$, is the part of the first coalition playing the first strategy, and $(1 - x)$ is the part playing the second strategy. Parameter $y$, $0 \le y \le 1$, is the part of the second coalition playing the first strategy, and $(1 - y)$ is the part playing the second strategy. Control parameters $u$ and $v$ satisfy the restrictions $0 \le u \le 1$, $0 \le v \le 1$ and can be interpreted as signals for coalitions to change their strategies. As an example of dynamics (1) one can consider the game interaction of two large groups of firms (or their capital investments) on two markets.

We assume that the payoff of the first coalition is described by the payoff matrix $A = \{a_{ij}\}$, and the payoff of the second coalition by the payoff matrix $B = \{b_{ij}\}$.

The terminal payoff functional of the first coalition is defined as the mathematical expectation corresponding to the payoff matrix $A$ and can be interpreted as its "local" interest

$$g_A(x, y) = C_A x y - \alpha_1 x - \alpha_2 y + a_{22}. \tag{2}$$

Here parameters $C_A$, $\alpha_1$, $\alpha_2$ are determined according to the classical theory of bimatrix games

$$C_A = a_{11} - a_{12} - a_{21} + a_{22}, \qquad \alpha_1 = a_{22} - a_{12}, \qquad \alpha_2 = a_{22} - a_{21}. \tag{3}$$

The "local" payoff $g_B$ of the second coalition and parameters $C_B$, $\beta_1$, $\beta_2$ are constructed similarly to (2), (3) using coefficients of matrix $B$.

We assume that there are restrictions $P(x, y)$ on the control parameter $u$ which are coordinated with the short-term interests $g_A$

$$P(x, y) = \{\, u : \ x \le u \le 1 \ \text{if} \ \partial g_A / \partial x \ge 0; \quad 0 \le u \le x \ \text{if} \ \partial g_A / \partial x < 0 \,\}. \tag{4}$$

One can introduce similar restrictions $Q(x, y)$ on the control parameter $v$ coordinated with the short-term interests $g_B$. We define the "global" interests $J_A^\infty$ of the first coalition as multifunctions

$$J_A^\infty = [J_A^-, J_A^+] \tag{5}$$

generated by the lower $J_A^-$ and upper $J_A^+$ limits, as $T \to +\infty$, of the integral mean values

$$\frac{1}{T - t_*} \int_{t_*}^{T} g_A(x(t), y(t))\, dt$$

calculated on trajectories $(x(\cdot), y(\cdot))$ of the system (1). The "global" payoff $J_B^\infty = [J_B^-, J_B^+]$ of the second coalition is constructed similarly to (5) as limits of integral mean values of the "local" payoff $g_B$.

We obtain a solution for the game with dynamics (1) and payoffs $J_A^\infty$, $J_B^\infty$ using guaranteeing feedbacks for zero sum games. Below we construct guaranteeing feedbacks for zero sum games on the basis of the Pontryagin maximum principle (Pontryagin et al., 1962), properly pasting characteristics of the Hamilton-Jacobi-Bellman-Isaacs equation.

3. DYNAMIC NASH EQUILIBRIUM
Let us introduce the notion of a dynamic Nash equilibrium for the evolutionary game (1) with the integral functionals $J_A^\infty$, $J_B^\infty$.
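As a rough numerical companion to dynamics (1), the local payoff (2) and the integral mean entering (5), the following Python sketch integrates the system under user-supplied signals and averages $g_A$ along the trajectory. The matrix `A_EXAMPLE`, the explicit Euler step and all helper names are assumptions made for illustration only; none of them come from the paper.

```python
# Hypothetical 2x2 payoff matrix of the first coalition (illustration only).
A_EXAMPLE = ((10.0, 0.0), (1.0, 3.0))


def payoff_params(M):
    """Return (C, alpha1, alpha2, a22) for a 2x2 payoff matrix, as in (3)."""
    (a11, a12), (a21, a22) = M
    return a11 - a12 - a21 + a22, a22 - a12, a22 - a21, a22


def local_payoff(M, x, y):
    """Local payoff g(x, y) = C*x*y - alpha1*x - alpha2*y + a22, as in (2)."""
    C, alpha1, alpha2, a22 = payoff_params(M)
    return C * x * y - alpha1 * x - alpha2 * y + a22


def mean_payoff(M, x0, y0, u, v, T=20.0, dt=1e-3):
    """Integral mean of the local payoff over [0, T] along dynamics (1).

    u and v are callables (x, y) -> signal in [0, 1].
    Explicit Euler is an assumption of this sketch, not the paper's method.
    """
    steps = max(1, round(T / dt))
    x, y, acc = x0, y0, 0.0
    for _ in range(steps):
        acc += local_payoff(M, x, y) * dt
        x, y = x + (-x + u(x, y)) * dt, y + (-y + v(x, y)) * dt
    return acc / (steps * dt)


if __name__ == "__main__":
    C, alpha1, alpha2, _ = payoff_params(A_EXAMPLE)
    x_a = alpha2 / C  # classical static solution of the first coalition
    print(mean_payoff(A_EXAMPLE, x_a, 0.9, lambda x, y: x_a, lambda x, y: 0.0))
```

For this hypothetical matrix, holding the signal at $u = x_A = \alpha_2 / C_A$ from a start on the line $x = x_A$ gives an average of 2.5, which is $(a_{11}a_{22} - a_{12}a_{21})/C_A$, whatever the second coalition does, because $g_A$ does not depend on $y$ on that line.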
Definition 1. Let $\varepsilon > 0$ and $(x_0, y_0) \in [0,1] \times [0,1]$. A pair of feedbacks $U^0 = u^0(t, x, y, \varepsilon)$, $V^0 = v^0(t, x, y, \varepsilon)$ is called a dynamic Nash equilibrium for an initial position $(x_0, y_0)$ if for any other feedbacks $U = u(t, x, y, \varepsilon)$, $V = v(t, x, y, \varepsilon)$ the following conditions hold

$$J_A^-(x^0(\cdot), y^0(\cdot)) \ge J_A^+(x^1(\cdot), y^1(\cdot)) - \varepsilon, \tag{6}$$

$$J_B^-(x^0(\cdot), y^0(\cdot)) \ge J_B^+(x^2(\cdot), y^2(\cdot)) - \varepsilon, \tag{7}$$

$$(x^0(\cdot), y^0(\cdot)) \in X(U^0, V^0), \quad (x^1(\cdot), y^1(\cdot)) \in X(U, V^0), \quad (x^2(\cdot), y^2(\cdot)) \in X(U^0, V).$$

Here trajectories $(x^i(\cdot), y^i(\cdot))$, $i = 0, 1, 2$, are generated from the initial position $(x_0, y_0)$ by combinations of strategies $U^0$, $V^0$, $U$, $V$ (see (Kleimenov, 1993), (Krasovskii and Subbotin, 1988)).

We compose an equilibrium with the help of optimal feedbacks constructed for the zero-sum differential games $\Gamma_A = \Gamma_A^- \cup \Gamma_A^+$ and $\Gamma_B = \Gamma_B^- \cup \Gamma_B^+$ with the payoffs $J_A^\infty$ (5) and $J_B^\infty$. In the game $\Gamma_A$ the first coalition maximizes with guarantee the functional $J_A^-(x(\cdot), y(\cdot))$ using a feedback $U = u(t, x, y, \varepsilon)$, and the second coalition attempts, on the contrary, to minimize the functional $J_A^+(x(\cdot), y(\cdot))$ using a feedback $V = v(t, x, y, \varepsilon)$. The game $\Gamma_B$ is constructed in the symmetric way.

Assume for definiteness that the following relations are valid

$$x_A = \frac{\alpha_2}{C_A}, \quad x_B = \frac{\beta_2}{C_B}, \quad y_A = \frac{\alpha_1}{C_A}, \quad y_B = \frac{\beta_1}{C_B}, \quad C_A > 0, \quad C_B < 0, \quad x_A, x_B, y_A, y_B \in (0, 1). \tag{8}$$

These relations hold, for example, for matrices $A = (\ldots)$, $B = (\ldots)$.

Proposition 1. Differential games $\Gamma_A^-$, $\Gamma_A^+$ and differential games $\Gamma_B^-$, $\Gamma_B^+$ have the following equal values

$$v_A = \frac{a_{11} a_{22} - a_{12} a_{21}}{C_A}, \qquad v_B = \frac{b_{11} b_{22} - b_{12} b_{21}}{C_B} \tag{9}$$

for any initial position $(x_0, y_0) \in [0,1] \times [0,1]$. These values can be guaranteed, for example, by "positive" feedbacks $u_A^{pos}$, $v_B^{pos}$ corresponding to the classical static solutions $x_A$, $y_B$. The structure of the feedback $u_A^{pos}$ is given by the formulas

$$u_A^{pos} = u_A^{pos}(x, y) = \begin{cases} 0 & \text{if } x_A < x \le 1, \\ 1 & \text{if } 0 \le x < x_A, \\ [0, 1] & \text{if } x = x_A. \end{cases} \tag{10}$$

The "positive" feedback $v_B^{pos}$ is constructed analogously to (10).

The "punishment" feedbacks $u_B^{pun}$, $v_A^{pun}$ are generated similarly to (10) by the classical static solutions $x_B$, $y_A$, which correspond to a static Nash equilibrium $NE = (x_B, y_A)$.

Remark 1. Note that the "positive" feedbacks $u_A^{pos}$, $v_B^{pos}$ are inflexible because they do not take into account information about dynamics (1). Our main goal is to construct flexible "positive" feedbacks which essentially use information about the dynamics and coordinate short-term and long-term payoffs.

Remark 2. Values of the payoff functions $g_A(x, y)$, $g_B(x, y)$ coincide at the points $(x_A, y_B)$, $(x_B, y_A)$.

Let us introduce the following notations. By $u_A^0(t, x, y, \varepsilon)$ and $v_B^0(t, x, y, \varepsilon)$ we denote feedbacks solving, respectively, the problem of guaranteeing maximization of the payoff functionals $J_A^-$, $J_B^-$. We call them the "positive" feedbacks. By $u_B^0(t, x, y, \varepsilon)$ and $v_A^0(t, x, y, \varepsilon)$ we denote feedbacks minimizing the payoff functionals $J_B^+$, $J_A^+$ and call them the "punishment" feedbacks.

Let us now construct a Nash equilibrium pair of feedbacks by pasting together the "positive" feedbacks $u_A^0$, $v_B^0$ and the "punishment" feedbacks $u_B^0$, $v_A^0$. Let us choose an initial position $(x_0, y_0) \in [0,1] \times [0,1]$ and an accuracy parameter $\varepsilon > 0$. Choose a trajectory $(x^0(\cdot), y^0(\cdot)) \in X(x_0, y_0, u_A^0(\cdot), v_B^0(\cdot))$ generated by the "positive" feedbacks $u_A^0 = u_A^0(t, x, y, \varepsilon)$ and $v_B^0 = v_B^0(t, x, y, \varepsilon)$. Let $T_\varepsilon > 0$ be such that for $t \in [T_\varepsilon, +\infty)$ we have

$$g_K(x^0(t), y^0(t)) > J_K^-(x^0(\cdot), y^0(\cdot)) - \varepsilon, \qquad K = A, B.$$

Denote by $u_A^\varepsilon(t) : [0, T_\varepsilon) \to [0, 1]$, $v_B^\varepsilon(t) : [0, T_\varepsilon) \to [0, 1]$ step-by-step realizations of the strategies $u_A^0$, $v_B^0$ such that the corresponding step-by-step motion $(x_\varepsilon(\cdot), y_\varepsilon(\cdot))$ satisfies the condition [...]

Proposition 2. The pair of feedbacks $U^0 = u^0(t, x, y, \varepsilon)$, $V^0 = v^0(t, x, y, \varepsilon)$

$$U^0 = \begin{cases} u_A^\varepsilon(t) & \text{if } \|(x, y) - (x_\varepsilon(t), y_\varepsilon(t))\| < \varepsilon, \\ u_B^0(x, y) & \text{otherwise}, \end{cases} \qquad V^0 = \begin{cases} v_B^\varepsilon(t) & \text{if } \|(x, y) - (x_\varepsilon(t), y_\varepsilon(t))\| < \varepsilon, \\ v_A^0(x, y) & \text{otherwise} \end{cases} \tag{12}$$

is a dynamic Nash $\varepsilon$-equilibrium.

4. OPTIMAL CONTROL PROBLEM
Let us examine the three-step optimal control problem for the first coalition when the control parameter $u$ is coordinated with short-term interests of individuals. The coordination means that the components $\dot{x}$, $\dot{y}$ of the velocity vector of the system (1) should have the same sign as the components of gradients $\partial g_A / \partial x$ (4) and $\partial g_B / \partial y$.
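The coordination requirement can be sketched in a few lines of Python: the admissible interval for $u$ follows restriction (4), and a helper checks that the velocity $\dot{x} = -x + u$ does not oppose the sign of $\partial g_A / \partial x = C_A y - \alpha_1$. The matrix and the function names below are hypothetical and serve only as an illustration of the rule.

```python
# Hypothetical 2x2 payoff matrix of the first coalition (illustration only).
A_EXAMPLE = ((10.0, 0.0), (1.0, 3.0))


def payoff_params(M):
    """Return (C, alpha1, alpha2, a22) for a 2x2 payoff matrix, as in (3)."""
    (a11, a12), (a21, a22) = M
    return a11 - a12 - a21 + a22, a22 - a12, a22 - a21, a22


def admissible_u(M, x, y):
    """Interval P(x, y) of signals u allowed by restriction (4)."""
    C, alpha1, _, _ = payoff_params(M)
    dg_dx = C * y - alpha1  # partial derivative of g_A with respect to x
    return (x, 1.0) if dg_dx >= 0 else (0.0, x)


def is_coordinated(M, x, y, u):
    """True if x' = -x + u has the same sign as dg_A/dx (or one of them is zero)."""
    C, alpha1, _, _ = payoff_params(M)
    return (-x + u) * (C * y - alpha1) >= 0


if __name__ == "__main__":
    for point in [(0.6, 0.75), (0.6, 0.1)]:
        lo, hi = admissible_u(A_EXAMPLE, *point)
        print(point, (lo, hi), is_coordinated(A_EXAMPLE, *point, lo))
```

Any signal taken from the returned interval passes `is_coordinated`; a signal outside it, such as $u = 0$ at a point where $\partial g_A / \partial x > 0$ and $x > 0$, does not.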
Thus, we analyze the optimal control problem with the payoff functional

$$J_A^f = \int_0^{T_f} g_A(x(t), y(t))\, dt \tag{13}$$

on trajectories $(x(\cdot), y(\cdot))$ of the dynamic system (1) starting from the initial position $x(0) = x_0$, $y(0) = y_0$ and generated by controls $u(\cdot)$, $v(\cdot)$. The terminal instant $T_f = T_f(x_0, y_0)$ in formula (13) will be specified below.

We consider the case when initial positions $(x_0, y_0)$ satisfy the relations $x_0 = x_A$, $y_0 > y_A$.

Let us suppose that the actions of the second coalition are the most unfavorable. For trajectories of the system (1) starting from initial positions $(x_0, y_0)$ these actions $v_A^0 = v_A^{pun}(x, y)$ are determined by the formula $v_A^{pun}(x, y) \equiv 0$. The optimal actions $u_A^{fl}(x, y)$ of the first coalition which maximize the payoff functional $J_A^f$ are constructed as the three-step control

$$u_A^{fl}(x(t), y(t)) = \begin{cases} 1 & \text{if } t_0 \le t < s, \\ x(s) & \text{if } s \le t < T_{y_A}, \\ 0 & \text{if } T_{y_A} \le t < T_f. \end{cases} \tag{14}$$

The parameter $s$ here is the parameter of optimization. The switching instant $T_{y_A}$ is the moment when the trajectory $(x(\cdot), y(\cdot))$ crosses the line $y = y_A$, i.e. $y(T_{y_A}) = y_A$, and the final instant $T_f$ is determined by the condition that the trajectory $(x(\cdot), y(\cdot))$ comes back to the line $x = x_A$, i.e. $x(T_f) = x_A$.

So we consider three collections of characteristics. Characteristics of the first collection satisfy the system of differential equations

$$\dot{x} = -x + 1, \qquad \dot{y} = -y \tag{15}$$

and are determined by the Cauchy formula

$$x(t) = (x_0 - 1)e^{-t} + 1, \qquad y(t) = y_0 e^{-t}, \qquad 0 \le t \le s. \tag{16}$$

Here the first switching instant $s$ is a parameter of optimization.

The second aggregate of characteristics is determined by the system of differential equations

$$\dot{x} = 0, \qquad \dot{y} = -y, \tag{17}$$

solutions of which are determined by the Cauchy formula

$$x(t) = x, \qquad y(t) = y e^{-t}, \qquad 0 \le t \le p(s). \tag{18}$$

Initial positions $(x, y) = (x(s), y(s))$ are determined by the relations

$$x(s) = (x_0 - 1)e^{-s} + 1, \qquad y(s) = y_0 e^{-s}. \tag{19}$$

The second switching instant $p = p(s)$ and the position $(x(p), y(p))$ of the characteristic (18) are given by the formulas

$$y e^{-p} = y_A = \frac{\alpha_1}{C_A}, \qquad x(p) = x, \qquad y(p) = y_A, \qquad p = p(s) = \ln \frac{C_A y(s)}{\alpha_1}. \tag{20}$$

The third collection of characteristics is given by the system of differential equations

$$\dot{x} = -x, \qquad \dot{y} = -y \tag{21}$$

and can be represented by the Cauchy formula

$$x(t) = x e^{-t}, \qquad y(t) = y_A e^{-t}, \qquad 0 \le t \le r(s). \tag{22}$$

Here the final instant $r = r(s)$ and the final position $(x(r), y(r))$ of the characteristic trajectory (22) are given by the formulas

$$x e^{-r} = x_A = \frac{\alpha_2}{C_A}, \qquad x(r) = x_A, \qquad y(r) = y_A e^{-r}, \qquad r = r(s) = \ln \frac{C_A x(s)}{\alpha_2}. \tag{23}$$

The optimal control problem consists of finding a time $s$ and the corresponding switching point $(x, y) = (x(s), y(s))$ on the trajectory $(x(\cdot), y(\cdot))$ such that the integral $I = I(s)$ attains its maximum value

$$I(s) = I_1(s) + I_2(s) + I_3(s), \tag{24}$$

$$I_1(s) = \int_0^{s} \left( C_A \big((x_0 - 1)e^{-t} + 1\big) y_0 e^{-t} - \alpha_1 \big((x_0 - 1)e^{-t} + 1\big) - \alpha_2 y_0 e^{-t} + a_{22} \right) dt, \tag{25}$$

$$I_2(s) = \int_0^{p(s)} \left( C_A x(s) y(s) e^{-t} - \alpha_1 x(s) - \alpha_2 y(s) e^{-t} + a_{22} \right) dt, \tag{26}$$

$$I_3(s) = \int_0^{r(s)} \left( C_A x(s) y_A e^{-2t} - \alpha_1 x(s) e^{-t} - \alpha_2 y_A e^{-t} + a_{22} \right) dt. \tag{27}$$

Fig. 1. Families of characteristics TR1, TR2, TR3.

In Fig. 1 the initial position IP = (0.6, .09), characteristics of three different types TR1 (16), TR2 (18), TR3 (22), switching points SP1 = $(x(s), y(s))$, SP2 = $(x(p), y(p))$ and the final point FP = $(x(r), y(r))$ are shown.

5. OPTIMAL SYNTHESIS
Let us indicate the scheme for the solution of the three-step optimal control problem (15)-(27). We express the integrals $I_k$, $k = 1, 2, 3$, as functions $I_k(x, x_0, y_0)$ depending on the first coordinate $x$ of the optimal switching points $(x, y) = (x(s), y(s))$ and the initial positions $(x_0, y_0)$, calculate the derivative of the integral $I(x, x_0, y_0)$ (24) with respect to the variable $x$, equate this derivative to zero, $dI/dx = 0$, and find the equation $F(x, x_0, y_0) = 0$ for optimal switching coordinates $x$. Then we connect the initial positions $(x_0, y_0)$ with the switching positions $(x, y)$ by the corresponding relations and obtain the equation $F(x, x_0(x, y), y_0(x, y)) = 0$ for the optimal switching curve.
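The scheme can be tried out numerically. The Python sketch below evaluates $I(s)$ from (24)-(27) by Simpson quadrature, maximizes it over the first switching instant $s$ with a ternary search, and then evaluates the first-order condition $dI/dx = 0$ at the maximizer. The payoff matrix is hypothetical and is chosen so that $a_{11}a_{22} = a_{12}a_{21}$; under that assumption the condition, worked out by hand for this sketch, reads $z - \ln(1 + z) - (C_A x - \alpha_2)^2 / (2 C_A^2 x^2) = 0$ with $z = (C_A y - \alpha_1)/\alpha_1$. The quadrature and search routines are illustrative choices, not the paper's method.

```python
import math

# Hypothetical payoff matrix with a11*a22 == a12*a21 (illustration only).
A = ((7.5, -2.5), (-1.5, 0.5))
C = A[0][0] - A[0][1] - A[1][0] + A[1][1]
AL1 = A[1][1] - A[0][1]
AL2 = A[1][1] - A[1][0]
A22 = A[1][1]
X_A, Y_A = AL2 / C, AL1 / C


def g(x, y):
    """Local payoff (2)."""
    return C * x * y - AL1 * x - AL2 * y + A22


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def total_payoff(s, x0, y0):
    """I(s) from (24)-(27); also returns the switching point (x(s), y(s))."""
    xs = (x0 - 1) * math.exp(-s) + 1          # (19)
    ys = y0 * math.exp(-s)
    p = math.log(ys / Y_A)                    # (20)
    r = math.log(xs / X_A)                    # (23)
    i1 = simpson(lambda t: g((x0 - 1) * math.exp(-t) + 1, y0 * math.exp(-t)), 0, s)
    i2 = simpson(lambda t: g(xs, ys * math.exp(-t)), 0, p)
    i3 = simpson(lambda t: g(xs * math.exp(-t), Y_A * math.exp(-t)), 0, r)
    return i1 + i2 + i3, xs, ys


def best_switch(x0, y0, iters=100):
    """Ternary search for the maximizer of I(s) on [0, ln(y0 / y_A)]."""
    lo, hi = 0.0, math.log(y0 / Y_A)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if total_payoff(m1, x0, y0)[0] < total_payoff(m2, x0, y0)[0]:
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def first_order_residual(x, y):
    """Left-hand side of the hand-derived condition dI/dx = 0 (divided by alpha1)."""
    z = (C * y - AL1) / AL1
    return z - math.log(1 + z) - (C * x - AL2) ** 2 / (2 * C ** 2 * x ** 2)


if __name__ == "__main__":
    s_opt = best_switch(X_A, 0.75)
    value, xs, ys = total_payoff(s_opt, X_A, 0.75)
    print(s_opt, (xs, ys), value, first_order_residual(xs, ys))
```

The ternary search relies on $I(s)$ being unimodal on the search interval, which holds for this example; for other data a grid scan before the search would be a safer choice.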
We calculate the derivatives $dI_1/dx$, $dI_2/dx$, $dI_3/dx$. Summing them and equating the sum to zero we obtain the following equation

$$\frac{C_A y_0 (1 - x)}{1 - x_0} - \alpha_1 - \alpha_1 \ln \frac{C_A y_0 (1 - x)}{\alpha_1 (1 - x_0)} - \frac{\alpha_1 C_A^2 x^2 - 2 C_A^2 a_{22} x + \alpha_1 \alpha_2^2}{2 C_A^2 x^2} = 0.$$

We assume for definiteness that $v_A = 0$. Taking into account that the parameters $x$, $y$, $x_0$, $y_0$ are connected by the formula $y = y_0 (1 - x)/(1 - x_0)$, we obtain the expression for the switching curve $N_A^1$

$$\frac{C_A y - \alpha_1}{\alpha_1} - \ln\left(1 + \frac{C_A y - \alpha_1}{\alpha_1}\right) - \frac{(C_A x - \alpha_2)^2}{2 C_A^2 x^2} = 0. \tag{28}$$

The curve $N_A^1$, with accuracy of the second order with respect to the variable $y$, is a hyperbola which passes through the point $(x_A, y_A)$

$$\frac{C_A y - \alpha_1}{\alpha_1} = \frac{C_A x - \alpha_2}{C_A x}. \tag{29}$$

It has the horizontal asymptote $y = 2\alpha_1 / C_A$ and the vertical asymptote $x = 0$.

To construct the complete switching curve $N_A$ for the optimal strategy of the first coalition we need to supplement the curve $N_A^1$ by the analogous curve $N_A^2$ in the domain where $y \le y_A$ [...] (30).

In the case when $C_A < 0$ the curves $N_A^1$ and $N_A^2$ are described symmetrically to formulas (30). Recall that the line [...] (31) is the switching line $L_A$ for the first coalition. The curves $N_A$, $L_A$ divide the unit square $[0,1] \times [0,1]$ into three parts: the upper part $S_A^u \supset \{(x, y) : x = x_A,\ y > y_A\}$, the lower part $S_A^l \supset \{(x, y) : x = x_A,\ y < y_A\}$, and the middle part $S_A^m = S_A^{m1} \cup S_A^{m2}$, $S_A^{m1} \supset \{(x, y) : x < x_A,\ y = y_A\}$, $S_A^{m2} \supset \{(x, y) : x > x_A,\ y = y_A\}$. The "positive" feedback $u_A^{fl}$ has the following structure

$$u_A^{fl} = \begin{cases} \max\{0, -\operatorname{sgn}(C_A)\} & \text{if } (x, y) \in S_A^u, \\ \max\{0, \operatorname{sgn}(C_A)\} & \text{if } (x, y) \in S_A^l, \\ x & \text{if } (x, y) \in S_A^m, \\ [0, x] \ \text{or} \ [x, 1] & \text{if } (x, y) \in N_A \cup L_A. \end{cases} \tag{32}$$

For the second coalition one can obtain similar switching curves $N_B = N_B^1 \cup N_B^2$, $L_B$, and construct the "positive" feedback $v_B^{fl}$.

In Fig. 2 the switching curves $N_A^1$, $N_A^2$, $L_A$ and $N_B^1$, $N_B^2$, $L_B$ are shown. The directions of velocities $\dot{x}$ are depicted by horizontal arrows: left arrows in the upper domain $S_A^u$, right arrows in the lower domain $S_A^l$, left-right arrows in the middle domain $S_A^m$. The directions of velocities $\dot{y}$ are indicated by vertical arrows: up arrows in the left domain $S_B^l$, down arrows in the right domain $S_B^r$, up-down arrows in the middle domain $S_B^m$.

6. EQUILIBRIUM VALUE
Generalizing the considered three-step optimal control problem we can formulate the result which arises from the optimization nature of this problem and provides better index values than values of trajectories tending to static Nash equilibria.
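A small Python check of this claim on the three-step problem: with a hypothetical matrix normalized so that the static value $(a_{11}a_{22} - a_{12}a_{21})/C_A$ is zero, staying on the line $x = x_A$ (that is, $s = 0$) yields a zero integral, while the optimally switched three-step control yields a strictly positive one. The closed-form expressions for $I_1$, $I_2$, $I_3$ and the first-order condition were obtained by hand from (25)-(27) for this sketch and should be read as assumptions, as should the matrix and all names.

```python
import math

# Hypothetical payoff matrix with static value 0 (a11*a22 == a12*a21).
A = ((7.5, -2.5), (-1.5, 0.5))
C = A[0][0] - A[0][1] - A[1][0] + A[1][1]
AL1 = A[1][1] - A[0][1]
AL2 = A[1][1] - A[1][0]
A22 = A[1][1]
X_A, Y_A = AL2 / C, AL1 / C


def state_at(s, x0, y0):
    """Switching point (x(s), y(s)) reached with u = 1, v = 0, as in (19)."""
    e = math.exp(-s)
    return 1 - (1 - x0) * e, y0 * e


def three_step_integral(s, x0, y0):
    """Hand-integrated I(s) = I1 + I2 + I3 and the total duration s + p + r."""
    e = math.exp(-s)
    x, y = state_at(s, x0, y0)
    i1 = (C * y0 * (x0 - 1) * (1 - e * e) / 2
          + (C * y0 - AL1 * (x0 - 1) - AL2 * y0) * (1 - e)
          + (A22 - AL1) * s)
    i2 = (C * x - AL2) * (y - Y_A) + (A22 - AL1 * x) * math.log(y / Y_A)
    i3 = -AL1 * x / 2 + AL1 * X_A ** 2 / (2 * x) + A22 * math.log(x / X_A)
    return i1 + i2 + i3, s + math.log(y / Y_A) + math.log(x / X_A)


def residual(s, x0, y0):
    """First-order condition dI/dx = 0 (divided by alpha1), valid for static value 0."""
    x, y = state_at(s, x0, y0)
    z = (C * y - AL1) / AL1
    return z - math.log(1 + z) - (C * x - AL2) ** 2 / (2 * C ** 2 * x ** 2)


def optimal_switch(x0, y0, iters=80):
    """Bisection on the residual, which decreases in s on [0, ln(y0 / y_A)]."""
    lo, hi = 0.0, math.log(y0 / Y_A)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid, x0, y0) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    s_opt = optimal_switch(X_A, 0.75)
    total, duration = three_step_integral(s_opt, X_A, 0.75)
    print("no excursion:", three_step_integral(0.0, X_A, 0.75)[0])
    print("optimal s:", s_opt, "integral:", total, "mean:", total / duration)
```

In this example the mean payoff over the excursion is positive, that is, above the static value, which is the kind of improvement the following proposition states for the long-run averages.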
Proposition 3. Consider the three-step optimal control $u_A^{fl}(x, y)$ (32) with the switching curves $N_A$ (30), $L_A$ (31). Then for any initial position $(x_0, y_0) \in [0,1] \times [0,1]$ and for any trajectory

$$(x^{fl}(\cdot), y^{fl}(\cdot)) \in X(x_0, y_0, u_A^{fl})$$

generated by the optimal feedback control $u_A^{fl}(x, y)$ there exists a finite moment $t_* \in [0, T_A]$ such that at this moment the trajectory $(x^{fl}(\cdot), y^{fl}(\cdot))$ comes to the line $x = x_A$. Then, according to the construction of the three-step optimal feedback control $u_A^{fl}$ (it maximizes the integral (24) and conforms to short-term interests of individuals), the following estimate holds

$$\liminf_{T \to +\infty} \frac{1}{T - t_*} \int_{t_*}^{T} g_A(x^{fl}(t), y^{fl}(t))\, dt \ge v_A. \tag{33}$$

The analogous inequality is valid for trajectories generated by the three-step optimal control $v_B^{fl}$ corresponding to the switching curves $N_B$, $L_B$.

Thus, the acceptable trajectory $(x^{fl}(\cdot), y^{fl}(\cdot))$ provides better results for both coalitions than trajectories leading to points of a static Nash equilibrium, at which the corresponding payoffs are equal to the values $v_A$ and $v_B$.

Fig. 2. The equilibrium trajectory TR.

In Fig. 2 the acceptable trajectory TR = $(x^{fl}(\cdot), y^{fl}(\cdot))$ is shown. It starts from the initial position IP = (0.6, 0.75) and moves along a straight line toward the corner (1, 0) of the unit square $[0,1] \times [0,1]$ with control signals $u = 1$, $v = 0$. Then it crosses the switching line $N_A^1$ and the first coalition switches its control $u$ from 1 to $x$. Further, the trajectory TR moves down along the straight line parallel to the $y$ axis until it meets the switching line $L_A$. Here the first coalition changes its signal $u$ from $x$ to 0. After that the trajectory is directed toward the corner (0, 0). Next it meets the switching line $N_B^1$ and the second coalition changes its control signal from 0 to $y$. Then the trajectory TR moves to the left along the straight line parallel to the $x$ axis and intersects the switching curve $N_A^2$. On this curve the first coalition changes its control signal $u$ from 0 to $x$. The trajectory TR stops at the point FP. Let us note that in this example the trajectory $(x^{fl}(\cdot), y^{fl}(\cdot))$ provides strictly better results for both coalitions than trajectories leading to points of a static Nash equilibrium, since the payoff values at the point FP are larger than at the point NE.

REFERENCES
Basar, T. and G.J. Olsder (1982). Dynamic Noncooperative Game Theory. Academic Press. London, NY.
Hofbauer, J. and K. Sigmund (1988). The Theory of Evolution and Dynamical Systems. Cambridge University Press. Cambridge.
Kaniovski, Y.M., A.V. Kryazhimskii and H.P. Young (2000). Adaptive dynamics in games played by heterogeneous populations. Games and Economic Behavior.
Kleimenov, A.F. (1993). Nonantagonistic Positional Differential Games. Nauka. Ekaterinburg. (in Russian).
Krasovskii, N.N. and A.I. Subbotin (1988). Game-Theoretical Control Problems. Springer. NY, Berlin.
Kryazhimskii, A.V. and Yu.S. Osipov (1995). On evolutionary-differential games. Proceedings of Steklov Mathematical Institute 211, 257-287.
Nelson, R. and S. Winter (1982). An Evolutionary Theory of Economic Change. The Belknap Press of Harvard University Press. Cambridge, Massachusetts and London.
Pontryagin, L.S., V.G. Boltyanskii, R.V. Gamkrelidze and E.F. Mishchenko (1962). The Mathematical Theory of Optimal Processes. Interscience. NY.
Sonnevend, G. (1981). Existence and numerical computation of extremal invariant sets in linear differential games. Lecture Notes in Control and Information Sciences 22, 251-260.
Tarasyev, A.M. (1999). Control synthesis in grid schemes for Hamilton-Jacobi equations. Annals of Operations Research 88, 337-359.
Vorobyev, N.N. (1985). Game Theory. Nauka. Moscow. (in Russian).