U.S.S.R. Comput. Maths. Math. Phys. Vol. 21, No. 4, pp. 29-37, 1981. Printed in Great Britain
0041-5553/81/040029-09$07.50/0 © 1982. Pergamon Press Ltd.
ANALYSIS OF THE ROLE OF INFORMATION PATTERN IN DYNAMIC MODELS OF CONFLICT SITUATIONS*
A. E. BUNAKOV
Moscow
(Received 8 October 1979; revised 27 February 1980)
THE THEORY of games with a fixed sequence of moves is used to study the control of a discrete dynamic system by two players pursuing different aims, formally described by certain general efficiency criteria. The pay-offs of the player with the right to the first move are compared when he has different information patterns about the system states, and his optimal strategies are found.

It is generally accepted that the conflicts inherent in hierarchical control systems should be modelled [1-8] by using the theory of games with unopposed interests [2]. A solution is constructed in [3, 4] in the class of positional strategies, for dynamic decision-making models described by non-antagonistic differential games. This construction has also proved to be suitable for the discrete form of the models [5], and for analyzing economic planning and control models [6-8]. It was shown in [3-5] that the optimal strategy consists in choosing a joint program of action, in monitoring the performance of the program, and in threatening a penalty for deviations from it.

The discrete dynamic model of decision-making examined below differs from that of [5] in that the players' criteria contain integral as well as terminal terms. It should be noted that the introduction of extra phase variables, taking account of the running values of the integral terms, still does not reduce the model of conflict to the form of [5] if the players use positional strategies, since the player's information about the new components of the state vector then usually increases his maximum guaranteed result. This fact was first noted by A. F. Kononenko, to whom the example quoted in our Section 3 is due. It is from this standpoint that we study the role of information on the different variables characterizing the system state. We find that, apart from information about the running values of the criteria, a memory of the history of the past states can give a significant advantage.
1
We describe the multi-step game by the expressions

$$x_{k+1} = f_k(x_k, u_k, v_k), \quad k = 0, 1, \ldots, N, \quad x_k \in E^r, \tag{1.1}$$

$$u_k \in U_k \subset E^p, \quad v_k \in V_k \subset E^q, \tag{1.2}$$

*Zh. vychisl. Mat. mat. Fiz., 21, 4, 844-852, 1981.
$$J_1 = \sum_{k=0}^{N} g_k(x_k, u_k, v_k) + g_{N+1}(x_{N+1}), \tag{1.3}$$

$$J_2 = \sum_{k=0}^{N} h_k(x_k, u_k, v_k) + h_{N+1}(x_{N+1}); \tag{1.4}$$
here, $x_k$ is the system state at the $k$-th step, $u_k$, $v_k$ are the control actions of players $P_1$, $P_2$ respectively, and $J_1$, $J_2$ are their pay-off functionals. All the functions in (1.1) - (1.4) are continuous, the sets $U_k$, $V_k$ are closed and bounded in the appropriate Euclidean spaces, and the initial state $x_0$ is given. At each step the players get to know the running state $x_k$, and they make their choice of the next $u_k$, $v_k$ in the light of this information, using as their strategies the synthesis

$$u_x = \{u_0(x_0), \ldots, u_N(x_N)\}, \quad v_x = \{v_0(x_0), \ldots, v_N(x_N)\},$$

where $u_k(x_k)$ and $v_k(x_k)$ are mappings of $E^r$ into $U_k$ and of $E^r$ into $V_k$ respectively. On substituting into (1.1), the pair $u_x$, $v_x$ uniquely determines the phase trajectory $x = \{x_0, \ldots, x_{N+1}\}$, the programs $u = \{u_0, \ldots, u_N\}$, $v = \{v_0, \ldots, v_N\}$, and hence the values of $J_1$, $J_2$.

Consider the statement of the game $\Gamma_{1x}$ (see [3, 5]), in which the order of moves is fixed: player $P_1$ chooses first, and tells $P_2$ his strategy $u_x$; then $P_2$ chooses any $v_x$ from the set

$$R(u_x) = \{ v_x \mid J_2(u_x, v_x) \ge \sup_{v_x} J_2(u_x, v_x) - \delta(u_x) \}.$$

(The functional $\delta(\cdot)$ is known to $P_1$; $\delta(u_x) = 0$ if $\sup J_2(u_x, v_x)$ is reached.) Knowing about this rule of behaviour of $P_2$, player $P_1$ chooses $u_x^{\varepsilon}$, which guarantees that he obtains, maybe with $\varepsilon$-accuracy, the quantity

$$W_x = \sup_{u_x} \inf_{v_x \in R(u_x)} J_1(u_x, v_x),$$

which is the upper bound of his guaranteed pay-offs.
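The definition of $W_x$ can be made concrete on a very small finite instance. The following is a minimal sketch, not the paper's method: it assumes finite state and control sets (so every supremum is attained and $\delta(u_x) = 0$), enumerates all positional strategies, and evaluates the sup-inf by exhaustive search. The dictionary layout, the function names and the toy model are all hypothetical and chosen only for illustration.

```python
from itertools import product

def rollout(model, u_strat, v_strat):
    """Apply (1.1) step by step and accumulate the pay-offs (1.3), (1.4)."""
    x, J1, J2 = model["x0"], 0.0, 0.0
    for k in range(model["N"] + 1):
        # Positional strategies: the controls depend on the running state x_k only.
        u, v = u_strat[k][x], v_strat[k][x]
        J1 += model["g"](k, x, u, v)
        J2 += model["h"](k, x, u, v)
        x = model["f"](k, x, u, v)
    return J1 + model["g_end"](x), J2 + model["h_end"](x)

def positional_strategies(states, actions, N):
    """All tuples (map_0, ..., map_N), each map_k being a dict state -> action."""
    maps = [dict(zip(states, choice))
            for choice in product(actions, repeat=len(states))]
    return list(product(maps, repeat=N + 1))

def guaranteed_payoff(model, u_strat, v_strats):
    """Minimum of J1 over P2's best responses R(u_x), with delta(u_x) = 0."""
    outcomes = [rollout(model, u_strat, v) for v in v_strats]
    best_J2 = max(J2 for _, J2 in outcomes)
    # Exact comparison is safe here because the toy pay-offs are exact in floats.
    return min(J1 for J1, J2 in outcomes if J2 == best_J2)

def leader_value(model):
    """W_x: maximum over u_x of P1's guaranteed pay-off, by exhaustive search."""
    u_strats = positional_strategies(model["states"], model["U"], model["N"])
    v_strats = positional_strategies(model["states"], model["V"], model["N"])
    return max(guaranteed_payoff(model, u, v_strats) for u in u_strats)

# Hypothetical two-step toy model (N = 1) with two states and binary controls.
toy = {
    "N": 1, "x0": 0, "states": [0, 1], "U": [0, 1], "V": [0, 1],
    "f": lambda k, x, u, v: (x + u + v) % 2,
    "g": lambda k, x, u, v: float(u == v),        # P1 is paid when actions match
    "h": lambda k, x, u, v: float(v) - 0.5 * x,   # P2 likes v = 1, dislikes state 1
    "g_end": lambda x: 0.0,
    "h_end": lambda x: float(x == 0),
}
print("W_x for the toy model:", leader_value(toy))
```

Exhaustive enumeration grows exponentially with the number of states and steps, so this only serves to make the order of moves and the role of $R(u_x)$ explicit; it is not a practical solution method.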
Our problem in the present section is to find $W_x$ and $u_x^{\varepsilon}$ in the class of strategies indicated. We introduce new variables $x_k'$, $u_k'$, $v_k'$, connected, like $x_k$, $u_k$, $v_k$, by relations (1.1), (1.2). We define the constructions
L, (Xk,Xk’, Uk’, . . . , XN’, UN’) = LA+(Xk’,Uk’, . . . , XN’,UN’)p (Xk_Xk’)
inf Lk+h
+
4,
&, & . . . ,XN’,2~~‘)[l-p txk-xk’) I,
uk="h
+
Lk+i[fktxh, Uk, Uh>,XL+*, &+i,.-*rXN’7UN’113
L,-
(Xh’,z&h’,. . . XN’,UN’)= sup {h (Xk’,Uk’r4 )
Oh
+L+*cfk(x’ k t”k, ’ u& > 5’k+k,“h+i,-*-,xN, ’
’ UN’]),
I
Pk(xk,
VA+
uk,
(XI,
xkfi)
=
uh, %+i)
=
{Uk=vkj Arg
fk(xk,
UA,
Uid
mix
b
min
gk bkr
ukt
1,.
. . , N-1,
bk,
uk7
k=O,
=Tk+i},
uk),
Ul,‘,&l),
~kE~k\Pkbk',
UkEPk
I,
(xk,
Uh,
*,
* *
xk+i)
A’,
,
‘k
vk-
(&,
ukr
irk+%)
=
Arg
Uk) ,
94 Uh=v,,+
vN+
(xk,
uk,
k=O,
xk+i),
(XN, UN) = Arg max
{h(Xnr,
UN1
U,)
+hN+,
UN)
+gN+i
[f~
(XN,
UN,
UN)
UN,
Uh.)
I},
VKdN
VN-(XN,
UN)
Arg min {gN (zxT UN,
=
[f~
(EN,
I},
DN UN=
vN+
(XN,
UN),
Q= {u, VI Uk=Uk, k=O,
UkEvk-
1,. . . ( N-l,
(Sk,
u.YEUN,
uk?
xkd)
t
xk+i=fkbk,
UNE~N-(ZN,
ukv
Uk)
t
UN)},
N
L(n, u)=
-L,-
min CleSn E l=h
(xk, uk,
. . . , XX,
UN)hk+i=f&R,
L+(u’,v’)=limsupL(u,u), a*0 (U.0)
Ukr
u,),
(u, v) =Q,
k=Ot I,. . * N 3
II(u, u)-
1
(u’, u’) II+
where $\|\cdot\|$ is the norm in the space $E^{(p+q)(N+1)}$.

Theorem 1

Assume that the function $L^+(u, v)$ has no local maxima with value zero in the set $\Omega$. Then, the lowest upper bound of $P_1$'s guaranteed pay-offs in the game $\Gamma_{1x}$ is

$$W_x = \sup_{\substack{(u, v) \in \Omega \\ L^+(u, v) \ge 0}} J_1(u, v).$$
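Proof. 1. We first show that

$$W_x \le \sup_{\substack{(u, v) \in \Omega \\ L^+(u, v) \ge 0}} J_1(u, v). \tag{1.5}$$

Given any $u_x'$, we construct a sequence $\{v_x^{(n)}\}$ such that

$$J_2(u_x', v_x^{(n)}) \ge \sup_{v_x} J_2(u_x', v_x) - \varepsilon^{(n)}, \quad \varepsilon^{(n)} \to 0.$$

The trajectories and programs $x^{(n)}$, $u^{(n)}$, $v^{(n)}$, generated by pairs $u_x'$, $v_x^{(n)}$, satisfy inequalities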
(Lzy.up,.....&“‘, Lp)
L,-
G
z:
h~(x;*),zp, u:“‘)
l=k
k=O, 1,. . . , N,
and hence $k-,-
L(a’“‘, (&
guaranteeing
v(~‘) >-aCn),
(n) (n) , & , z$;),
and moreover, nothing prevents P2 from choosing each time
k=O.
1, . . . , ,V- 1, u.?) ~17~- (5:))
u.?’ ), thereby
(u(“), c’“)) E-Q.
the inclusion
It can be assumed without We have
loss of generality
-ECn) G sup L(U, V),
that { ( utn), u(“))}
(u, u)
(“ZO’)
=Q,
converges to (u’, v’) EB.
II (u. v) - (u’, u’) II
G /I (ZP), uCn))- (zz’, u’) II. In the limit as n + 00, we obtain 0
small neighbourhood
inequality of functions
(1 S) is proved. J, (u, u) , L+ (u, u) , given any positive E, a (u+, u’) E Q
exists such that ~t+~=f~(~~+, &.+, uk+), k-0, L(a+,u+)>O,
1,. . . , N:
li(U+,u+)LSUpJi(u,u)--E, (U,Ul
L+ (22,u) 30.
(4 u) 4 We construct
of the set {(u, 7.7)EB 1L+ (12,L’)>O},
u,-:
Lk’ bk, uk- (xk), < inf Lk’ uk EUk
Xk++ir uk;,, . . . xN+,
UN+)
L(u+, u’) (zk,
uk,
xk:1,
uk:i?
. . . , dN+T
&v+)
+y,
r<
NSI
’
Let us show that the strategies $u_x^+$:

$$u_k^+(x_k) = u_k^+ \rho(x_k - x_k^+) + u_k^-(x_k)\,[1 - \rho(x_k - x_k^+)]$$

guarantee $P_1$ the pay-off $J_1(u^+, v^+)$. In fact, on choosing $v_x$ from among those which, jointly with $u_x^+$, realize the trajectory $x^+$, $P_2$ can gain

$$\sum_{k=0}^{N} h_k(x_k^+, u_k^+, v_k^+) + h_{N+1}(x_{N+1}^+),$$

while if $x_k = x_k^+$, $k \le m$, $x_{m+1} \ne x_{m+1}^+$, then
N
k=m+1 N-l
G h, (xv,+,urn+, urn)+
E hk(xk,
uk, v,) +LN+ (xN+,
k=m+
+
LN+(x~, UN) [I-p(x~-XN+)
inf
UN+)p (XN--5rc+)
1
l+y
UNCUN
+ + urn+, urn’) +Lwi++,(xm+ir &n+i, %x+2, Um+2,.
. . . G hm(x,+, +r
(N-m)
(x,+,
Urn+,
. . . , xN+,
UN+)
+‘r
*.
, xN
+,
UN +>
(N+l-m)
N
<
c
hk(xk+, uR+, uk+) +hN+i (x:+d.
k-m
Hence it is disadvantageous for $P_2$ to deviate from the trajectory $x^+$ even temporarily. Hence

$$\min_{v_x \in R(u_x^+)} J_1(u_x^+, v_x) = \sum_{k=0}^{N} g_k(x_k^+, u_k^+, v_k^+) + g_{N+1}(x_{N+1}^+) \ge \sup_{\substack{(u, v) \in \Omega \\ L^+(u, v) \ge 0}} J_1(u, v) - \varepsilon \ge W_x - \varepsilon.$$

Since $\varepsilon$ is arbitrary, the theorem is proved.

Notice that the idea underlying the solution of the problem (construction of a set of trajectories which are in a sense advantageous to both players) remains unchanged in the absence of the condition imposed on the function $L^+(u, v)$ in the theorem, though the structure of $P_1$'s $\varepsilon$-optimal strategies then becomes much more complex.
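The strategy used in the proof has the form $u_k^+(x_k) = u_k^+ \rho(x_k - x_k^+) + u_k^-(x_k)[1 - \rho(x_k - x_k^+)]$: follow the agreed program while the state stays on the agreed trajectory $x^+$, and switch to a penalty control otherwise. The sketch below illustrates only this switching structure. It assumes that $\rho$ is the indicator of zero ($\rho(0) = 1$, $\rho(z) = 0$ for $z \ne 0$), and the dynamics, the program and the penalty rule `punish` are hypothetical placeholders rather than the $u_k^-$ constructed in the proof.

```python
def make_trigger_strategy(x_plan, u_plan, punish):
    """Build u_k^+(x_k): planned control on the agreed trajectory, penalty off it."""
    def strategy(k, x):
        # Assumed form of rho: 1 when x_k coincides with x_k^+, 0 otherwise.
        rho = 1 if x == x_plan[k] else 0
        return u_plan[k] if rho == 1 else punish(k, x)
    return strategy

# Hypothetical scalar dynamics x_{k+1} = x_k + u_k + v_k.
f = lambda k, x, u, v: x + u + v

# Agreed programs u^+ = (1, 1, 1), v^+ = (0, 0, 0) give the trajectory 0, 1, 2.
x_plan, u_plan = [0, 1, 2], [1, 1, 1]
punish = lambda k, x: -5          # placeholder penalty control, not derived here
u_plus = make_trigger_strategy(x_plan, u_plan, punish)

def run(v_program):
    """Return the controls P1 actually applies against a given program of P2."""
    x, controls = 0, []
    for k, v in enumerate(v_program):
        u = u_plus(k, x)
        controls.append(u)
        x = f(k, x, u, v)
    return controls

print(run([0, 0, 0]))   # P2 keeps to the program: [1, 1, 1]
print(run([0, 1, 0]))   # P2 deviates at k = 1:    [1, 1, -5]
```

The deviation at step $k = 1$ is detected only at step $k = 2$, because a positional strategy sees the state and not the partner's control.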
2
We shall now assume that the players in game (1.1) - (1.4) remember all the system states realized before the given step, and that their strategies are the vector-functions

$$u_{\bar x} = \{u_0(\bar x_0), \ldots, u_N(\bar x_N)\}, \quad v_{\bar x} = \{v_0(\bar x_0), \ldots, v_N(\bar x_N)\};$$

here, $\bar x_k = \{x_0, \ldots, x_k\}$ (below, we shall also use the notation $\bar u_k = \{u_0, \ldots, u_k\}$, $\bar v_k = \{v_0, \ldots, v_k\}$), while $u_k(\bar x_k)$, $v_k(\bar x_k)$ are any mappings of $E^{(k+1)r}$ into $U_k$ and $E^{(k+1)r}$ into $V_k$.

The pair $u_{\bar x}$, $v_{\bar x}$ uniquely determines $u$, $v$ on substitution into (1.1); and the mapping $\pi(u_{\bar x}, v_{\bar x}) = (u, v)$ is thereby specified. The general scheme of the game (i.e. the order of moves and the rules of the players' behaviour) remains as before. In view of the change in the players' information patterns, we call the game $\Gamma_{1\bar x}$. We evaluate

$$W_{\bar x} = \sup_{u_{\bar x}} \inf_{v_{\bar x} \in R(u_{\bar x})} J_1(u_{\bar x}, v_{\bar x})$$

and construct the corresponding optimal strategy of $P_1$. We introduce the notation
SW hv+*>=hv+i h+A, ~&(5~) = min Sh+(a, 7.~~)
U,- (zJ = Arg min Sk+(9, ub),
Yl& =Uk
Sk+
(%
uk)
%k e”k
=
min
{hk
bk,
uk,
{hk
(xk:k, uk.
[ fkbk9
+sk+i
uk)
ukt
uk)
1))
‘REVA
vk+
(a?
uk)
=
Ax
max
vk)
+sk+i
[fk
(zk,
[fk
(XkLk, Ukr
uk?
vk)
I},
‘kEvk
Mk-bk,
inf
uk)=
{gk
bkr
uk,
uk)
+fifk+f
uk)
I},
‘k
Vkf
l’k+ (xk, &,) ,
%-={%-@O),
Mk-
uz+
k&O,
1.
. . . , h’-(+)}.
[sky
uk-
(Uf)
=
Uk+[Sk,
(Sk)
{uo+
uk=Dk+
Uk
1 ZMk
[zs,
Uk (5,)
Mk- [xk,
220 (To)
] =‘C7R+
(5-k) ] 2
[xk,
=fk
m=I,2,...
(Urn,
bk,
Uk-(5k)EUk-(5k),
hk)
(&
--E/N,
] 1 . . . ( vx+
r5k,
uk
(9,
uk,
zrn-,
Uk+
uk)
uk)
+Mk+I
(ah,
h-,
[fk
v~+( - )
VIE
uk)
us
(Fx)
I } *
I*‘,-
(%
uk,
‘k)
]
1 ‘k-“k
@k)
)
1,. . . , N.
k=O,
1 Uk=Uk,
: Sk
155,
@k)] ,
uk(~k>l}--r,
Obviously, strategies of the type D,=
. . . . N.
can be constructed for any positive e, y: Uk,
(xk,
(xnI,
xkif
k=Gm-1, u,sU,},
u,),
, N, & = &; the sets v,- (xk, .%A,+,)
xk+L),
are defined in Section 1;
m-i gk bk,
ifD,,, = c$,we put W, = - =,
uk,
ok)
(2.1)
Theorem 2

In the game $\Gamma_{1\bar x}$ the lowest upper bound of $P_1$'s guaranteed pay-offs is

$$W_{\bar x} = \max_{0 \le m \le N} W_m.$$
He is guaranteed this pay-off up to 2~ by the strategy UT+: r&+(Ek) =i&+p (fR--Ek+) +u,- (a$) [I-p uk+ (Z,) =uj+- (Xk),
(Zk--Jk+) J (
k
OS,
where $s$ is the number of the step at which $W_s = \max_{0 \le m \le N} W_m$, $\{x_0, x_1^+, \ldots, x_s^+\}$ is the segment of trajectory generated by the "cut-off" programs $\bar u_s^+$, $\bar v_{s-1}^+$, realizing the right-hand side of (2.1) up to $\varepsilon$ for $m = s$.

Proof. For any strategy $u_{\bar x}^0$ we find
m=
min max tlxCR(u,‘) kA(v;)
{Z},
where N
A (Vi4=
1
llsl+ (Xi, u,) 3
I: hkhk,
&l,
Uk)+hv+, (xN+i)
(
k=,
(u, u> =n (UrO.Vr) }. Let m be reached for some UE’, n (us’, 0;;‘) = (u’, v’) , s’is the corresponding trajectory. We
replace ~~‘(5~) byu,+ [q,, uko(Zk)] , k>m.
Then,
(2.2) where
G:“), V~"={Yo'(~o),
(u”, v”)=x(u,“,
. . . , V,:z-, (IL’m-t), V,+[&, Um”(fm) I,. . . , UN+
[XN,UN”(5f.V) I} d-2 (Go>. But, with I Cm, m-i
s,+(x1”, q”)
=S,f
(Xl’, 22,‘)c
E hk(Xk’,Uk’,Uk’)
kc, .V
+
c
VI--i
hk(Xk’, Uk’, Uk’)+
c
J-h+,(&+d=s
A=,>,
hk(%‘, uk’,h’)
k-l
m-i +&+
n kc,
(&,‘,
u,‘)
=
c
hk (Xk”,
U.k”,uk”) + &,+ txm”,um”)
i.e. the equality sign holds in (2.2), since otherwise we should have

$$\min_{l \in A(v_{\bar x}'')} \{l\} > m,$$

which is impossible.
Since
Q”EV~+
(xl”, u,,“),
vx+ (UT)), it is easily seen from (2.2) that
kanz
k>m+l.
c
gk (ZkN,Uk”,h”)
+g,+i
N+1) Gc (2”
&?A (ZA”,Uk”,VA”)
k-m
k=m
(XN”, z&y”)+y<.
. .a!,-
(x,“, U,“) +y (N-n+l).
In accordance with his rule of behaviour, P2 can choose uA”~VIL-(xR”, Then,
(IL”,
Ek_,)
of the strategy
Hence
K--l
h-
Sfl!,-
(see the definition
uk”~Uk-(rk”),
uk”, z ;+i) , k
ED,, inf Ii (uzo, z+J
Since this is valid for any
UFO, y>O,
(N+l-m).
we obtain as y + 0: (2.3)
At the same time, it is quite obvious that the $u_{\bar x}^+$ mentioned in the theorem guarantees $P_1$ the pay-off $W_{\bar x} - 2\varepsilon$ for any positive $\varepsilon$. Hence we can only have the equality sign in (2.3), which it was required to prove.
3

By introducing auxiliary phase variables $y_k^1$, $y_k^2$, varying as

$$y_{k+1}^1 = y_k^1 + g_k(x_k, u_k, v_k), \quad k = 1, 2, \ldots, N, \quad y_0^1 = 0,$$

$$y_{k+1}^2 = y_k^2 + h_k(x_k, u_k, v_k), \quad k = 1, 2, \ldots, N, \quad y_0^2 = 0,$$

the model (1.1) - (1.4) can be formally reduced to the form

$$z_{k+1} = F_k(z_k, u_k, v_k), \quad k = 1, 2, \ldots, N,$$

where $z_k = (x_k, y_k^1, y_k^2)$. If $P_1$ hopes to obtain information, not only about $x_k$, but also about $y_k^2$, $y_k^1$, then the method used in [5] can be employed to solve the game with the above sequence of moves. Notice that here a memory of the past realized states $z_k$ does not give $P_1$ an extra pay-off as compared with the case when he only remembers the running $z_k$. After $P_1$ fixes a strategy of any type, $u_x$, $u_{\bar x}$, $u_z$ or $u_{\bar z}$, and tells his partner, the actions of the latter amount in essence to solving an optimal control problem. Hence $P_1$'s result is independent of the strategy used by $P_2$, and we have the inequalities
.
While the equation W, = IV, is familiar for antagonistic games (see [9]), the inequalities (3.1) are usually strict for non antagonistic games.
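The reduction of Section 3 to the extended state $z_k = (x_k, y_k^1, y_k^2)$ can be sketched in a few lines. This is an illustrative sketch only: the functions `f`, `g`, `h` below are hypothetical stand-ins for $f_k$, $g_k$, $h_k$, and the running sums are accumulated from $k = 0$, which is an assumption made here so that $y^1$, $y^2$ match the sums in (1.3), (1.4).

```python
def augment(f, g, h):
    """Return F with z = (x, y1, y2), where y1, y2 carry the running sums of g, h."""
    def F(k, z, u, v):
        x, y1, y2 = z
        return (f(k, x, u, v), y1 + g(k, x, u, v), y2 + h(k, x, u, v))
    return F

# Hypothetical scalar model, used only to exercise the construction.
f = lambda k, x, u, v: x + u - v
g = lambda k, x, u, v: u * x
h = lambda k, x, u, v: v - x
F = augment(f, g, h)

def run_augmented(F, x0, u_prog, v_prog):
    """Iterate z_{k+1} = F_k(z_k, u_k, v_k) from z_0 = (x_0, 0, 0)."""
    z = (x0, 0, 0)
    for k, (u, v) in enumerate(zip(u_prog, v_prog)):
        z = F(k, z, u, v)
    return z

x_end, y1_end, y2_end = run_augmented(F, 1, [1, 2, 0], [0, 1, 1])
print(x_end, y1_end, y2_end)   # final state and the two integral terms
```

After the last step the integral parts of $J_1$ and $J_2$ are simply components of the state, so a strategy that observes $z_k$ sees the running values of both criteria, which is exactly the extra information discussed in this section.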
The author sincerely thanks A. F. Kononenko for suggesting the problem and for his assistance.

Translated by D. E. Brown
REFERENCES

1. MOISEEV, N. N., Elements of the theory of optimal systems (Elementy teorii optimal'nykh sistem), Nauka, Moscow, 1975.
2. GERMEIER, YU. B., Games with unopposed interests (Igry s neprotivopolozhnymi interesami), Nauka, Moscow, 1976.
3. KONONENKO, A. F., On equilibrium positional strategies in non-antagonistic differential games, Dokl. Akad. Nauk SSSR, 231, No. 2, 285-288, 1976.
4. KONONENKO, A. F., On multi-step conflicts with information exchange, Zh. vychisl. Mat. mat. Fiz., 17, No. 4, 922-931, 1977.
5. DANIL'CHENKO, T. N., and KONONENKO, A. F., Dynamic models of decision-making in hierarchical systems, in: Modern state of operations research theory (Sovremennoe sostoyanie teorii issl. operatsii), Nauka, Moscow, 1979.
6. KONONENKO, A. F., and MAMEDOV, M. B., A game model of economics with financial interactions, Proc. International Conf. on Modelling of Economic Processes, VTs Akad. Nauk SSSR, Moscow, 1975.
7. MAMEDOV, M. B., Solution of a differential game corresponding to a model of rational bank credit to an undertaking, in: Proceedings of the Scientific Conference of Post-Graduate Students of Akad. Nauk AzerbSSR, Baku: Elm, 1975.
8. KONONENKO, A. F., Game-theoretic analysis of a dynamic model of interaction of control system elements, Fifth All-Union Conference-Seminar on the Control of Large Systems, Alma-Ata, KazPTI im. V. I. Lenina, 1978.
9. KRASOVSKII, N. N., and SUBBOTIN, A. I., Positional differential games (Pozitsionnye differentsial'nye igry), Nauka, Moscow, 1974.