Analysis of the role of information pattern in dynamic models of conflict situations


U.S.S.R. Comput. Maths. Math. Phys. Vol. 21, No. 4, pp. 29-37, 1981. Printed in Great Britain

0041-5553/81/040029-09$07.50/0 © 1982. Pergamon Press Ltd.

ANALYSIS OF THE ROLE OF INFORMATION PATTERN IN DYNAMIC MODELS OF CONFLICT SITUATIONS*

A. E. BUNAKOV

Moscow

(Received 8 October 1979; revised 27 February 1980)

THE THEORY of games with a fixed sequence of moves is used to study the control of a discrete dynamic system by two players pursuing different aims, formally described by certain general efficiency criteria. The pay-offs of the player with the right to the first move are compared when he has different information patterns about the system states, and his optimal strategies are found.

It is generally accepted that the conflicts inherent in hierarchical control systems should be modelled [1-8] by using the theory of games with unopposed interests [2]. A solution is constructed in [3, 4] in the class of positional strategies, for dynamic decision-making models described by non-antagonistic differential games. This construction has also proved to be suitable for the discrete form of the models [5], and for analyzing economic planning and control models [6-8]. It was shown in [3-5] that the optimal strategy consists in choosing a joint program of action, in monitoring the performance of the program, and in threatening a penalty for deviations from it.

The discrete dynamic model of decision-making examined below differs from that of [5] in that the players' criteria contain integral as well as terminal terms. It should be noted that the introduction of extra phase variables, taking account of the running values of the integral terms, still does not reduce the model of conflict to the form of [5] if the players use positional strategies, since the player's information about the new components of the state vector then usually increases his maximum guaranteed result. This fact was first noted by A. F. Kononenko, to whom is due the example quoted in our Section 3. It is from this standpoint that we study the role of information on the different variables characterizing the system state. We find that, apart from information about the running values of the criteria, a memory of the history of the past states can give a significant advantage.

1

*Zh. vychisl. Mat. mat. Fiz., 21, 4, 844-852, 1981.

We describe the multi-step game by the expressions

$$x_{k+1} = f_k(x_k, u_k, v_k), \quad k = 0, 1, \ldots, N, \quad x_k \in E^r, \tag{1.1}$$

$$u_k \in U_k \subset E^p, \quad v_k \in V_k \subset E^q, \tag{1.2}$$

$$J_1 = \sum_{k=0}^{N} g_k(x_k, u_k, v_k) + g_{N+1}(x_{N+1}), \tag{1.3}$$

$$J_2 = \sum_{k=0}^{N} h_k(x_k, u_k, v_k) + h_{N+1}(x_{N+1}); \tag{1.4}$$

here, $x_k$ is the system state at the $k$-th step, $u_k$, $v_k$ are the control actions of players $P_1$, $P_2$ respectively, and $J_1$, $J_2$ are their pay-off functionals. All the functions in (1.1) - (1.4) are continuous, the sets $U_k$, $V_k$ are closed and bounded in the appropriate Euclidean spaces, and the initial state $x_0$ is given. At each step the players get to know the running state $x_k$, and they make their choice of the next $u_k$, $v_k$ in the light of this information, using as their strategies the synthesis

$$u_x = \{u_0(x_0), \ldots, u_N(x_N)\}, \quad v_x = \{v_0(x_0), \ldots, v_N(x_N)\},$$

where $u_k(x_k)$ and $v_k(x_k)$ are mappings of $E^r$ into $U_k$ and of $E^r$ into $V_k$ respectively. On substituting into (1.1), the pair $u_x$, $v_x$ uniquely determines the phase trajectory $x = \{x_0, \ldots, x_{N+1}\}$, the programs $u = \{u_0, \ldots, u_N\}$, $v = \{v_0, \ldots, v_N\}$, and hence the values of $J_1$, $J_2$.

Consider the statement of the game $\Gamma_{1x}$ (see [3, 5]), in which the order of moves is fixed: player $P_1$ chooses first, and tells $P_2$ his strategy $u_x$; then $P_2$ chooses any $v_x$ from the set

$$R(u_x) = \{ v_x \mid J_2(u_x, v_x) \ge \sup_{v_x'} J_2(u_x, v_x') - \delta(u_x) \}.$$

(The functional $\delta(\cdot)$ is known to $P_1$; $\delta(u_x) = 0$ if $\sup J_2(u_x, v_x)$ is reached.) Knowing about this rule of behaviour of $P_2$, player $P_1$ chooses $u_x^{\varepsilon}$, which guarantees that he obtains, maybe with $\varepsilon$-accuracy, the quantity

$$W_x = \sup_{u_x} \ \inf_{v_x \in R(u_x)} J_1(u_x, v_x),$$

which is the upper bound of his guaranteed pay-offs.
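The statement of the game $\Gamma_{1x}$ can be made concrete on a very small finite example. The sketch below is only an illustration under stated assumptions: the dynamics `f`, the running pay-offs `g`, `h`, the two-element control sets and the two-element state set are all hypothetical toy data, not taken from the paper, and because every set is finite the supremum of $J_2$ is attained, so $\delta(u_x) = 0$ is assumed. It enumerates all positional strategies, builds the set of best responses of $P_2$, and takes the worst case for $P_1$ over that set.

```python
from itertools import product

# Hypothetical toy data (not from the paper): steps k = 0, ..., N with N = 1.
N = 1
U = (0, 1)          # admissible controls of P1 at every step
V = (0, 1)          # admissible controls of P2 at every step
STATES = (0, 1)     # all states the toy dynamics can reach
X0 = 0              # given initial state x_0

def f(k, x, u, v):
    """Toy dynamics (1.1): the next state records P2's last move."""
    return v

def g(k, x, u, v):
    """Running term of J_1: P1 likes v = 1 and pays for his own u = 1."""
    return 2 * v - u

def h(k, x, u, v):
    """Running term of J_2: P2 likes u = 1 and pays for his own v = 1."""
    return 3 * u - v

def g_term(x):
    return 0        # terminal term g_{N+1}, zero in this toy model

def h_term(x):
    return 0        # terminal term h_{N+1}, zero in this toy model

def positional_strategies(controls):
    """All tuples (s_0, ..., s_N); each s_k is a dict mapping state -> control."""
    per_step = [dict(zip(STATES, c)) for c in product(controls, repeat=len(STATES))]
    return list(product(per_step, repeat=N + 1))

def payoffs(ux, vx):
    """Run the pair of positional strategies through (1.1); return (J_1, J_2)."""
    x, j1, j2 = X0, 0, 0
    for k in range(N + 1):
        u, v = ux[k][x], vx[k][x]
        j1 += g(k, x, u, v)
        j2 += h(k, x, u, v)
        x = f(k, x, u, v)
    return j1 + g_term(x), j2 + h_term(x)

def guaranteed(ux, v_strats):
    """Worst J_1 over the best responses R(u_x) of P2 (delta = 0 assumed)."""
    results = [payoffs(ux, vx) for vx in v_strats]
    best_j2 = max(j2 for _, j2 in results)
    return min(j1 for j1, j2 in results if j2 == best_j2)

def solve():
    """Return (W_x, one maximizing positional strategy of P1) by enumeration."""
    u_strats = positional_strategies(U)
    v_strats = positional_strategies(V)
    return max(((guaranteed(ux, v_strats), ux) for ux in u_strats),
               key=lambda pair: pair[0])

if __name__ == "__main__":
    w, ux = solve()
    print("W_x =", w, "attained by", ux)
```

In this toy model the maximizing strategy rewards $P_2$ at the last step only if the observed state shows that $P_2$ cooperated at the first step, which is a small instance of the "program plus threat" structure described in the introduction. Enumeration is exponential in the number of states and steps, so it is meant for checking intuition, not for solving real models.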

Our problem in the present section is to find $W_x$ and $u_x^{\varepsilon}$ in the class of strategies indicated. We introduce new variables $x_k'$, $u_k'$, $v_k'$, connected, like $x_k$, $u_k$, $v_k$, by relations (1.1), (1.2). We define the constructions

$$L_k^-(x_k, x_k', u_k', \ldots, x_N', u_N') = L_k^+(x_k', u_k', \ldots, x_N', u_N')\,\rho(x_k - x_k') + \inf_{u_k \in U_k} L_k^+(x_k, u_k, x_{k+1}', u_{k+1}', \ldots, x_N', u_N')\,[1 - \rho(x_k - x_k')],$$

$$L_k^+(x_k', u_k', \ldots, x_N', u_N') = \sup_{v_k \in V_k} \{ h_k(x_k', u_k', v_k) + L_{k+1}^-[f_k(x_k', u_k', v_k), x_{k+1}', u_{k+1}', \ldots, x_N', u_N'] \},$$

$$P_k(x_k, u_k, x_{k+1}) = \{ v_k \in V_k \mid f_k(x_k, u_k, v_k) = x_{k+1} \},$$

$$V_k^+(x_k, u_k, x_{k+1}) = \operatorname{Arg}\max_{v_k \in P_k(x_k, u_k, x_{k+1})} h_k(x_k, u_k, v_k),$$

$$V_k^-(x_k, u_k, x_{k+1}) = \operatorname{Arg}\min_{v_k \in V_k^+(x_k, u_k, x_{k+1})} g_k(x_k, u_k, v_k), \quad k = 0, 1, \ldots, N-1,$$

$$V_N^+(x_N, u_N) = \operatorname{Arg}\max_{v_N \in V_N} \{ h_N(x_N, u_N, v_N) + h_{N+1}[f_N(x_N, u_N, v_N)] \},$$

$$V_N^-(x_N, u_N) = \operatorname{Arg}\min_{v_N \in V_N^+(x_N, u_N)} \{ g_N(x_N, u_N, v_N) + g_{N+1}[f_N(x_N, u_N, v_N)] \},$$

$$Q = \{ (u, v) \mid u_k \in U_k, \ v_k \in V_k^-(x_k, u_k, x_{k+1}), \ k = 0, 1, \ldots, N-1, \ x_{k+1} = f_k(x_k, u_k, v_k), \ u_N \in U_N, \ v_N \in V_N^-(x_N, u_N) \},$$

$$L(u, v) = \min_{0 \le k \le N} \Big[ \sum_{l=k}^{N} h_l(x_l, u_l, v_l) + h_{N+1}(x_{N+1}) - L_k^-(x_k, u_k, \ldots, x_N, u_N) \Big], \quad x_{k+1} = f_k(x_k, u_k, v_k), \quad k = 0, 1, \ldots, N,$$

$$L^+(u', v') = \lim_{\alpha \to 0} \ \sup_{(u, v) \in Q, \ \|(u, v) - (u', v')\| \le \alpha} L(u, v),$$

where $\|\cdot\|$ is the norm in the space $E^{(p+q)(N+1)}$.

Theorem 1

Assume that the function $L^+(u, v)$ has no local maxima with value zero in the set $Q$. Then the least upper bound of $P_1$'s guaranteed pay-offs in the game $\Gamma_{1x}$ is

$$W_x = \sup_{(u, v)} J_1(u, v), \quad (u, v) \in Q, \quad L^+(u, v) \ge 0.$$
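The sets $V_N^+(x_N, u_N)$ and $V_N^-(x_N, u_N)$ that enter the definition of $Q$ have a simple reading at the last step: $P_2$ first keeps only his own best moves, and among those the move worst for $P_1$ is singled out. A minimal sketch for a finite control set follows. The helper names `arg_max`, `arg_min`, `terminal_response_sets` and the toy functions in the usage part are hypothetical, introduced only for illustration, and ties are detected by exact equality, which is adequate for integer data but would need a tolerance for floating-point pay-offs.

```python
def arg_max(items, key):
    """Return every element of items on which key attains its maximum."""
    items = list(items)
    best = max(key(i) for i in items)
    return [i for i in items if key(i) == best]

def arg_min(items, key):
    """Return every element of items on which key attains its minimum."""
    items = list(items)
    best = min(key(i) for i in items)
    return [i for i in items if key(i) == best]

def terminal_response_sets(x_N, u_N, V_N, f_N, g_N, h_N, g_end, h_end):
    """Finite-set versions of V_N^+ (P2's best moves) and V_N^- (worst of them for P1)."""
    v_plus = arg_max(
        V_N, lambda v: h_N(x_N, u_N, v) + h_end(f_N(x_N, u_N, v)))
    v_minus = arg_min(
        v_plus, lambda v: g_N(x_N, u_N, v) + g_end(f_N(x_N, u_N, v)))
    return v_plus, v_minus

if __name__ == "__main__":
    # Hypothetical toy data: P2 is indifferent between v = 1 and v = 2.
    v_plus, v_minus = terminal_response_sets(
        x_N=0, u_N=0, V_N=[0, 1, 2],
        f_N=lambda x, u, v: x + v,
        g_N=lambda x, u, v: v,
        h_N=lambda x, u, v: min(v, 1),
        g_end=lambda x: 0,
        h_end=lambda x: 0,
    )
    print(v_plus, v_minus)
```

The two-stage selection is what makes the value a guaranteed one for $P_1$: whenever $P_2$ is indifferent, $P_1$ counts on the least favourable of the indifferent moves.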

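Proof. 1. We first show that

$$W_x \le \sup_{(u, v)} J_1(u, v), \quad (u, v) \in Q, \quad L^+(u, v) \ge 0. \tag{1.5}$$

Given any $u_x'$, we construct a sequence $\{v_x^{(n)}\}$ such that

$$J_2(u_x', v_x^{(n)}) \ge \sup_{v_x} J_2(u_x', v_x) - \varepsilon^{(n)}, \quad \varepsilon^{(n)} \to 0.$$

The trajectories and programs $x^{(n)}$, $u^{(n)}$, $v^{(n)}$, generated by pairs $u_x'$, $v_x^{(n)}$, satisfy inequalities

$$L_k^-(x_k^{(n)}, u_k^{(n)}, \ldots, x_N^{(n)}, u_N^{(n)}) \le \sum_{l=k}^{N} h_l(x_l^{(n)}, u_l^{(n)}, v_l^{(n)}) + h_{N+1}(x_{N+1}^{(n)}) + \varepsilon^{(n)}, \quad k = 0, 1, \ldots, N,$$

and hence

$$L(u^{(n)}, v^{(n)}) \ge -\varepsilon^{(n)},$$

and moreover, nothing prevents $P_2$ from choosing each time $v_k^{(n)} \in V_k^-(x_k^{(n)}, u_k^{(n)}, x_{k+1}^{(n)})$, $k = 0, 1, \ldots, N-1$, $v_N^{(n)} \in V_N^-(x_N^{(n)}, u_N^{(n)})$, thereby guaranteeing the inclusion $(u^{(n)}, v^{(n)}) \in Q$.

It can be assumed without loss of generality that $\{(u^{(n)}, v^{(n)})\}$ converges to $(u', v') \in Q$. We have

$$-\varepsilon^{(n)} \le \sup_{(u, v)} L(u, v), \quad (u, v) \in Q, \quad \|(u, v) - (u', v')\| \le \|(u^{(n)}, v^{(n)}) - (u', v')\|.$$

In the limit as $n \to \infty$, we obtain $0 \le L^+(u', v')$ [...] inequality (1.5) is proved.

2. [...] small neighbourhood of the set $\{(u, v) \in Q \mid L^+(u, v) \ge 0\}$ [...] of functions $J_1(u, v)$, $L^+(u, v)$, given any positive $\varepsilon$, a $(u^+, v^+) \in Q$ exists such that

$$x_{k+1}^+ = f_k(x_k^+, u_k^+, v_k^+), \quad k = 0, 1, \ldots, N; \quad L(u^+, v^+) > 0,$$

$$J_1(u^+, v^+) \ge \sup_{(u, v)} J_1(u, v) - \varepsilon, \quad (u, v) \in Q, \quad L^+(u, v) \ge 0.$$

We construct $u_x^-$:

$$L_k^+(x_k, u_k^-(x_k), x_{k+1}^+, u_{k+1}^+, \ldots, x_N^+, u_N^+) < \inf_{u_k \in U_k} L_k^+(x_k, u_k, x_{k+1}^+, u_{k+1}^+, \ldots, x_N^+, u_N^+) + \gamma, \quad \gamma < \frac{L(u^+, v^+)}{N+1}.$$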



Let us show that the strategies $u_x^+$:

$$u_k^+(x_k) = u_k^+\,\rho(x_k - x_k^+) + u_k^-(x_k)\,[1 - \rho(x_k - x_k^+)]$$

guarantee $P_1$ the pay-off $J_1(u^+, v^+)$. In fact, on choosing $v_x$ from among those which, jointly with $u_x^+$, realize the trajectory $x^+$, $P_2$ can gain

$$\sum_{k=0}^{N} h_k(x_k^+, u_k^+, v_k^+) + h_{N+1}(x_{N+1}^+),$$

while if $x_k = x_k^+$ for $k \le m$ and $x_{m+1} \ne x_{m+1}^+$, then

N

k=m+1 N-l

G h, (xv,+,urn+, urn)+

E hk(xk,

uk, v,) +LN+ (xN+,

k=m+

+

LN+(x~, UN) [I-p(x~-XN+)

inf

UN+)p (XN--5rc+)

1

l+y

UNCUN

+ + urn+, urn’) +Lwi++,(xm+ir &n+i, %x+2, Um+2,.

. . . G hm(x,+, +r

(N-m)


(x,+,

Urn+,

. . . , xN+,

UN+)

+‘r

*.

, xN

+,

UN +>

(N+l-m)

N

<

c

hk(xk+, uR+, uk+) +hN+i (x:+d.

k-m

Hence it is disadvantageous for $P_2$ to deviate from the trajectory $x^+$ even temporarily. Hence

$$\min_{v_x \in R(u_x^+)} J_1(u_x^+, v_x) = \sum_{k=0}^{N} g_k(x_k^+, u_k^+, v_k^+) + g_{N+1}(x_{N+1}^+) \ge \sup_{(u, v)} J_1(u, v) - \varepsilon \ge W_x - \varepsilon, \quad (u, v) \in Q, \quad L^+(u, v) \ge 0.$$

Since $\varepsilon$ is arbitrary, the theorem is proved.

Notice that the idea underlying the solution of the problem (construction of a set of trajectories which are in a sense advantageous to both players) remains unchanged in the absence of the condition imposed on the function $L^+(u, v)$ in the theorem, though the structure of $P_1$'s $\varepsilon$-optimal strategies then becomes much more complex.
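The strategy used in the proof of Theorem 1 follows an agreed program while the observed state lies on the agreed trajectory and switches to a penalty control otherwise. The sketch below shows that switching rule in isolation. It assumes that $\rho$ is the indicator of the zero vector ($\rho(0) = 1$, $\rho(z) = 0$ for $z \ne 0$), which is how the formula $u_k^+(x_k) = u_k^+\rho(x_k - x_k^+) + u_k^-(x_k)[1 - \rho(x_k - x_k^+)]$ is read here; the definition of $\rho$ does not survive in this copy of the text. The names `rho`, `make_threat_strategy`, `punish` and the numerical data are hypothetical.

```python
def rho(z):
    """Assumed indicator: 1 if every component of z is zero, otherwise 0."""
    return 1 if all(component == 0 for component in z) else 0

def make_threat_strategy(x_plus, u_plus, punish):
    """Build the positional rule u_k^+(x_k) from an agreed trajectory and program.

    x_plus[k] -- agreed state x_k^+ (a tuple of numbers)
    u_plus[k] -- agreed control u_k^+
    punish    -- callable (k, x) -> penalty control u_k^-(x)
    """
    def strategy(k, x):
        deviation = tuple(a - b for a, b in zip(x, x_plus[k]))
        # rho = 1: stay on the program; rho = 0: apply the penalty control.
        return u_plus[k] if rho(deviation) == 1 else punish(k, x)
    return strategy

if __name__ == "__main__":
    # Hypothetical data: scalar states stored as 1-tuples.
    x_plus = [(0,), (1,), (2,)]
    u_plus = [5, 6, 7]
    strategy = make_threat_strategy(x_plus, u_plus, punish=lambda k, x: -1)
    print(strategy(1, (1,)), strategy(1, (3,)))
```

Because the rule looks only at the current state, a deviation that later brings the state back onto $x^+$ is no longer visible to it. This is one way to see why the proof needs every temporary deviation to be unprofitable for $P_2$.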

2

We shall now assume that the players in game (1.1) - (1.4) remember all the system states realized before the given step, and that their strategies are the vector-functions $u_{\bar x} = \{u_0(\bar x_0), \ldots, u_N(\bar x_N)\}$, $v_{\bar x} = \{v_0(\bar x_0), \ldots, v_N(\bar x_N)\}$; here, $\bar x_k = \{x_0, \ldots, x_k\}$ (below, we shall also use the notation $\bar u_k = \{u_0, \ldots, u_k\}$, $\bar v_k = \{v_0, \ldots, v_k\}$), while $u_k(\bar x_k)$, $v_k(\bar x_k)$ are any mappings of $E^{(k+1)r}$ into $U_k$ and of $E^{(k+1)r}$ into $V_k$.
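A strategy with memory takes the whole history $\bar x_k = \{x_0, \ldots, x_k\}$ as its argument, so it can react to a past deviation even when the current state coincides with the agreed one. The sketch below illustrates this difference from a positional rule. The function name `make_history_strategy`, the `punish` callable and the numerical data are hypothetical and serve only as an illustration.

```python
def make_history_strategy(x_plus, u_plus, punish):
    """Build u_k(x_bar_k): follow the program only if the whole history is on the agreed path.

    x_plus[k] -- agreed state x_k^+ (a tuple of numbers)
    u_plus[k] -- agreed control u_k^+
    punish    -- callable (k, x) -> penalty control for the current state x
    """
    def strategy(history):
        k = len(history) - 1                     # index of the current step
        on_path = all(tuple(seen) == tuple(agreed)
                      for seen, agreed in zip(history, x_plus))
        return u_plus[k] if on_path else punish(k, history[-1])
    return strategy

if __name__ == "__main__":
    # Hypothetical data: scalar states stored as 1-tuples.
    x_plus = [(0,), (1,), (2,)]
    u_plus = [5, 6, 7]
    strategy = make_history_strategy(x_plus, u_plus, punish=lambda k, x: -1)
    # The current state (2,) is the agreed one, but step 1 deviated.
    print(strategy([(0,), (5,), (2,)]), strategy([(0,), (1,), (2,)]))
```

A positional rule would treat both histories in the usage example identically, since they end in the same state; the rule with memory distinguishes them.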

The pair $u_{\bar x}$, $v_{\bar x}$ uniquely determines $u$, $v$ on substitution into (1.1); and the mapping $\pi(u_{\bar x}, v_{\bar x}) = (u, v)$ is thereby specified. The general scheme of the game (i.e. the order of moves and the rules of the players' behaviour) remains as before. In view of the change in the players' information patterns, we call the game $\Gamma_{1\bar x}$. We evaluate $W_{\bar x}$, the least upper bound of $P_1$'s guaranteed pay-offs in this game, and construct the corresponding optimal strategy of $P_1$. We introduce the notation

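$$S_{N+1}^-(x_{N+1}) = h_{N+1}(x_{N+1}), \quad S_k^-(x_k) = \min_{u_k \in U_k} S_k^+(x_k, u_k), \quad U_k^-(x_k) = \operatorname{Arg}\min_{u_k \in U_k} S_k^+(x_k, u_k),$$

$$S_k^+(x_k, u_k) = \max_{v_k \in V_k} \{ h_k(x_k, u_k, v_k) + S_{k+1}^-[f_k(x_k, u_k, v_k)] \},$$

$$V_k^+(x_k, u_k) = \operatorname{Arg}\max_{v_k \in V_k} \{ h_k(x_k, u_k, v_k) + S_{k+1}^-[f_k(x_k, u_k, v_k)] \},$$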

Mk-bk,

inf

uk)=

{gk

bkr

uk,

uk)

+fifk+f

uk)

I},

‘k

Vkf

l’k+ (xk, &,) ,

%-={%-@O),

Mk-

uz+

k&O,

1.

. . . , h’-(+)}.

[sky

uk-

(Uf)

=

Uk+[Sk,

(Sk)

{uo+

uk=Dk+

Uk

1 ZMk

[zs,

Uk (5,)

Mk- [xk,

220 (To)

] =‘C7R+

(5-k) ] 2

[xk,

=fk

m=I,2,...

(Urn,

bk,

Uk-(5k)EUk-(5k),

hk)

(&

--E/N,

] 1 . . . ( vx+

r5k,

uk

(9,

uk,

zrn-,

Uk+

uk)

uk)

+Mk+I

(ah,

h-,

[fk

v~+( - )

VIE

uk)

us

(Fx)

I } *

I*‘,-


(%

uk,

‘k)

]

1 ‘k-“k

@k)

)

1,. . . , N.

k=O,

1 Uk=Uk,

: Sk

155,

@k)] ,

uk(~k>l}--r,

Obviously, strategies of the type D,=

. . . . N.

can be constructed for any positive e, y: Uk,

(xk,

(xnI,

xkif

k=Gm-1, u,sU,},

u,),

, N, & = &; the sets v,- (xk, .%A,+,)

xk+L),

are defined in Section 1;

m-i gk bk,

ifD,,, = c$,we put W, = - =,

uk,

ok)

(2.1)

Theorem 2

The least upper bound of $P_1$'s guaranteed pay-offs in the game $\Gamma_{1\bar x}$ is

$$W_{\bar x} = \max_{0 \le m \le N} W_m.$$

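He is guaranteed this pay-off up to $2\varepsilon$ by the strategy $u_{\bar x}^+$:

$$u_k^+(\bar x_k) = u_k^+\,\rho(\bar x_k - \bar x_k^+) + u_k^-(x_k)\,[1 - \rho(\bar x_k - \bar x_k^+)], \quad k < s,$$

$$u_k^+(\bar x_k) = u_k^{+-}(x_k), \quad k \ge s,$$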

where $s$ is the number of the step at which $W_s = \max_{0 \le m \le N} W_m$, and $\{x_0, x_1^+, \ldots, x_s^+\}$ is the segment of trajectory generated by the "cut-off" programs $\bar u_{s-1}^+$, $\bar v_{s-1}^+$, realizing the right-hand side of (2.1) up to $\varepsilon$ for $m = s$.

Proof. For any strategy $u_{\bar x}^0$ we find

m=

min max tlxCR(u,‘) kA(v;)

{Z},

where N

A (Vi4=

1

llsl+ (Xi, u,) 3

I: hkhk,

&l,

Uk)+hv+, (xN+i)

(

k=,

(u, u> =n (UrO.Vr) }. Let m be reached for some UE’, n (us’, 0;;‘) = (u’, v’) , s’is the corresponding trajectory. We

replace ~~‘(5~) byu,+ [q,, uko(Zk)] , k>m.

Then,

(2.2) where

G:“), V~"={Yo'(~o),

(u”, v”)=x(u,“,

. . . , V,:z-, (IL’m-t), V,+[&, Um”(fm) I,. . . , UN+

[XN,UN”(5f.V) I} d-2 (Go>. But, with I Cm, m-i

s,+(x1”, q”)

=S,f

(Xl’, 22,‘)c

E hk(Xk’,Uk’,Uk’)

kc, .V

+

c

VI--i

hk(Xk’, Uk’, Uk’)+

c

J-h+,(&+d=s

A=,>,

hk(%‘, uk’,h’)

k-l

m-i +&+

n kc,

(&,‘,

u,‘)

=

c

hk (Xk”,

U.k”,uk”) + &,+ txm”,um”)


i.e. the equality sign holds in (2.2), since otherwise we should have

min

{Z}>m,

IrA(v;")

which is impossible.

Since

Q”EV~+

(xl”, u,,“),

vx+ (UT)), it is easily seen from (2.2) that

kanz

k>m+l.

c

gk (ZkN,Uk”,h”)

+g,+i

N+1) Gc (2”

&?A (ZA”,Uk”,VA”)

k-m

k=m

(XN”, z&y”)+y<.

. .a!,-

(x,“, U,“) +y (N-n+l).

In accordance with his rule of behaviour, P2 can choose uA”~VIL-(xR”, Then,

(IL”,

Ek_,)

of the strategy

Hence

K--l

h-

Sfl!,-

(see the definition

uk”~Uk-(rk”),

uk”, z ;+i) , k
ED,, inf Ii (uzo, z+J
Since this is valid for any

UFO, y>O,

(N+l-m).

we obtain as y + 0: (2.3)

At the same time, it is quite obvious that the strategy $u_{\bar x}^+$ mentioned in the theorem guarantees $P_1$ the pay-off $W_{\bar x} - 2\varepsilon$ for any positive $\varepsilon$. Hence we can only have the equality sign in (2.3), which it was required to prove.

3

By introducing auxiliary phase variables $y_k^1$, $y_k^2$, varying as

$$y_{k+1}^1 = y_k^1 + g_k(x_k, u_k, v_k), \quad k = 1, 2, \ldots, N, \quad y_0^1 = 0,$$

$$y_{k+1}^2 = y_k^2 + h_k(x_k, u_k, v_k), \quad k = 1, 2, \ldots, N, \quad y_0^2 = 0,$$

the model (1.1) - (1.4) can be formally reduced to the form

$$z_{k+1} = F_k(z_k, u_k, v_k), \quad k = 1, 2, \ldots, N,$$

where $z_k = (x_k, y_k^1, y_k^2)$. If $P_1$ hopes to obtain information, not only about $x_k$, but also about $y_k^1$, $y_k^2$, then the method used in [5] can be employed to solve the game with the above sequence of moves. Notice that here a memory of the past realized states $\bar z_k$ does not give $P_1$ an extra pay-off as compared with the case when he only remembers the running $z_k$. After $P_1$ fixes a strategy of any type, $u_x$, $u_{\bar x}$, $u_z$ or $u_{\bar z}$, and tells his partner, the actions of the latter amount in essence to solving an optimal control problem. Hence $P_1$'s result is independent of the strategy used by $P_2$, and we have the inequalities

.


While the equation W, = IV, is familiar for antagonistic games (see [9]), the inequalities (3.1) are usually strict for non antagonistic games.
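The reduction by auxiliary phase variables described in this section can be sketched as a wrapper around the original model: the augmented state $z_k = (x_k, y_k^1, y_k^2)$ carries the running sums of the integral terms of $J_1$ and $J_2$ next to $x_k$. The function name `augment` and the toy `f`, `g`, `h` below are hypothetical, and the steps are indexed from 0 in the code purely for convenience.

```python
def augment(f, g, h):
    """Return F with F(k, z, u, v) = (x_next, y1 + g_k, y2 + h_k) for z = (x, y1, y2)."""
    def F(k, z, u, v):
        x, y1, y2 = z
        return (f(k, x, u, v),
                y1 + g(k, x, u, v),   # running value of the integral term of J_1
                y2 + h(k, x, u, v))   # running value of the integral term of J_2
    return F

def run(F, z0, controls):
    """Apply F along a list of (u_k, v_k) pairs, starting from z0."""
    z = z0
    for k, (u, v) in enumerate(controls):
        z = F(k, z, u, v)
    return z

if __name__ == "__main__":
    # Hypothetical toy model with scalar state.
    F = augment(f=lambda k, x, u, v: x + u - v,
                g=lambda k, x, u, v: u * v,
                h=lambda k, x, u, v: x)
    print(run(F, (0, 0, 0), [(1, 0), (2, 1)]))
```

A positional strategy in the augmented model is a function of $z_k$, so it sees the accumulated pay-offs as well as $x_k$; this is the extra information whose value the section discusses.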

The author sincerely thanks A. F. Kononenko for suggesting the problem and for his assistance.

Translated by D. E. Brown

REFERENCES

1. MOISEEV, N. N., Elements of the theory of optimal systems (Elementy teorii optimal'nykh sistem), Nauka, Moscow, 1975.

2. GERMEIER, YU. B., Games with unopposed interests (Igry s neprotivopolozhnymi interesami), Nauka, Moscow, 1976.

3. KONONENKO, A. F., On equilibrium positional strategies in non-antagonistic differential games, Dokl. Akad. Nauk SSSR, 231, No. 2, 285-288, 1976.

4. KONONENKO, A. F., On multi-step conflicts with information exchange, Zh. vychisl. Mat. mat. Fiz., 17, No. 4, 922-931, 1977.

5. DANIL'CHENKO, T. N., and KONONENKO, A. F., Dynamic models of decision-making in hierarchical systems, in: Modern state of operations research theory (Sovremennoe sostoyanie teorii issl. operatsii), Nauka, Moscow, 1979.

6. KONONENKO, A. F., and MAMEDOV, M. B., A game model of economics with financial interactions, Proc. International Conf. on Modelling of Economic Processes, VTs Akad. Nauk SSSR, Moscow, 1975.

7. MAMEDOV, M. B., Solution of a differential game corresponding to a model of rational bank credit to an undertaking, in: Proceedings of the Scientific Conference of Post-Graduate Students of Akad. Nauk AzerbSSR, Baku: Elm, 1975.

8. KONONENKO, A. F., Game-theoretic analysis of a dynamic model of interaction of control system elements, Fifth All-Union Conference-Seminar on the Control of Large Systems, Alma-Ata, KazPTI im. V. I. Lenina, 1978.

9. KRASOVSKII, N. N., and SUBBOTIN, A. I., Positional differential games (Pozitsionnye differentsial'nye igry), Nauka, Moscow, 1974.