U.S.S.R. Comput.Maths.Math.Phys., Vol.28, No.1, pp.1-7, 1988. Printed in Great Britain
0041-5553/88 $10.00+0.00 ©1989 Pergamon Press plc
ON COMPROMISE SCHEMES IN DYNAMIC MODELS OF CONFLICTING SITUATIONS* A.B. BELYAYEV and I.S. MEN'SHIKOV
Some schemes of talks between players in a dynamic conflicting situation are considered. In the context of the formal treatment, recurrence equations can be obtained for the players' payoffs, and the efficiency of the scheme of talks can be proved.

We consider below dynamic models of two-player conflicting situations. Our study is based on the leader-follower model, first introduced in /1-3/, then later used in /4-6/ (see also the references there). It is assumed in this model that the leader makes a decision and tells the follower, who is guided in his choice both by his own interests and by the leader's choice. When the leader knows the follower's interests he can predict the latter's response and use his prediction when choosing his own strategy. Most authors assume that the leader-follower hierarchical structure is specified from outside and cannot be stipulated by the players. But in many practical problems this is too serious a restriction on the model.

A loose distribution of the players' roles was first considered in /7/, where it was shown that a classification of two-person games can be based on this distribution. Each player assesses his payoff both in the role of leader and in the role of follower. All games then fall into three classes. The first includes games in which there is no outcome which gives both players a payoff at least as great as that which each would obtain in the role of leader. Inherent in this class is a struggle for leadership. The second class includes games in which there is no outcome which gives both players a payoff greater than their payoffs in the role of follower. Inherent here is the struggle for the right to be the follower. In the third class a compromise is possible, in which the two players distribute the roles of leader and follower according to a prior agreement.

On passing from static to dynamic models, there is obviously greater scope for a compromise. Assume, e.g., that initially we have a struggle for leadership. From the static point of view, we can merely state this fact. In the dynamic model, however, one player is obliged to make a move, so that both players are then in a new situation, in which they will doubtless be ready to compromise.

Let us formalize the schemes of talks. We introduce the following definitions. A finite tree is a pair $\{M, V\}$, where $M$ is a finite set of positions, and $V$ is a mapping of $M$ into itself, defining for each position $m$ of $M$ the previous position $V(m)$. There is a unique initial position $m_0$ in $M$ for which $V(m_0) = m_0$. For any other position $m$ there must be an $l \ge 1$ such that $V^l(m) = m_0$, i.e., $m_0$ is a previous position of rank $l$ for any position $m$ of $M$. Let $A$ be the inverse mapping to $V$, i.e., $A(m) = \{m' \in M \mid V(m') = m\}$. The set $A(m)$ consists of the positions following the $m$-th. The vertices for which $A(m) = \emptyset$ are called final; we denote by $T$ the (necessarily non-empty) set of final vertices.

Now, to obtain the games in explicit form of the tree $\{M, V\}$, we have to specify the payoffs at the final vertices of the set $T$, and specify the order of moves at the non-final positions of the set $M \setminus T$. The order of moves is given by means of a division of the set $M \setminus T$: $M \setminus T = M_1 \cup M_2$, where $M_1 \cap M_2 = \emptyset$. In positions of the set $M_1$ the move is made by player 1, and in positions of $M_2$, by player 2. A move in position $m$ consists in choosing a position $m'$ from the set $A(m)$ of positions after the $m$-th. When the game hits a final position $m$, it is ended and the payoffs $u_1(m)$ and $u_2(m)$ are calculated. The functions $u_1(m)$ and $u_2(m)$ are thus given on the set $T$, and their values are real numbers. In short, the two-person game in explicit form is the set of objects $G = \{M, V, M_1, M_2, u_1, u_2\}$.
The $i$-th player's strategy, $i = 1, 2$, in this game is regarded as a mapping $x_i$ of the set $M_i$ into the set $M$ for which we have the condition $x_i(m) \in A(m)$. Thus the strategy is the rule for choosing the next position for each vertex of $M_i$. If the players fix their strategies $x_1$ and $x_2$, we call this pair the outcome of the game, $x = (x_1, x_2)$. To every outcome there corresponds a unique play of the game, i.e., a sequence $\bar m = (m_0, \ldots, m_k)$ such that

$$m_{t+1} = x_i(m_t), \quad m_t \in M_i, \quad t = 0, 1, \ldots, k-1, \quad m_k \in T,$$

which is obtained if the players successively use their chosen rules for choosing their next positions. Thus, to each outcome $x$ there corresponds a unique final position $m_k$ and the payoffs $(u_1(m_k), u_2(m_k))$. This means that we can regard $u_1$ and $u_2$ as defined on the set of outcomes $X = X_1 \times X_2$ by means of the equation $u_i(x) = u_i(m_k)$.

It will be assumed throughout that the payoffs satisfy the one-to-one correspondence condition

$$[u_1(m) = u_1(m')] \Leftrightarrow [u_2(m) = u_2(m')].$$

This condition simplifies the discussion considerably and at the same time is not too restrictive.

*Zh. vychisl. Mat. mat. Fiz., 28, 1, 3-13, 1988.
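The definitions above (tree, order of moves, payoffs, strategies, play) can be made concrete with a small data structure. The following is a minimal sketch, not the authors' formalism: the class name `Game`, the function `play`, and the toy tree are hypothetical and chosen only for illustration. The toy payoffs are picked so that the one-to-one correspondence condition holds.

```python
from dataclasses import dataclass


@dataclass
class Game:
    parent: dict   # V: position -> previous position (the root maps to itself)
    mover: dict    # non-final position -> 1 or 2 (the sets M_1 and M_2)
    payoff: dict   # final position -> (u_1, u_2)

    def root(self):
        """The unique initial position m_0 with V(m_0) = m_0."""
        return next(m for m, p in self.parent.items() if p == m)

    def successors(self, m):
        """A(m): positions whose previous position is m (root self-loop excluded)."""
        return [n for n, p in self.parent.items() if p == m and n != m]


def play(game, x1, x2):
    """Follow strategies x1, x2 (dicts: position -> chosen successor) from the root."""
    m, path = game.root(), []
    while True:
        path.append(m)
        if m in game.payoff:                      # final position reached
            return path, game.payoff[m]
        strategy = x1 if game.mover[m] == 1 else x2
        m = strategy[m]


# Hypothetical toy tree, not one of the paper's figures.
toy = Game(
    parent={"m0": "m0", "a": "m0", "b": "m0", "a1": "a", "a2": "a"},
    mover={"m0": 1, "a": 2},
    payoff={"b": (1, 1), "a1": (2, 2), "a2": (0, 3)},
)
path, u = play(toy, {"m0": "a"}, {"a": "a2"})
print(path, u)
```

Here the outcome in which player 1 moves to `a` and player 2 answers with `a2` produces the play `m0, a, a2` and the payoff pair `(0, 3)`.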
Under this condition, the following procedure, called the search for composite equilibrium /7/, can be performed. We define the set $L(G)$ of prefinal positions of the game $G$: $L(G) = \{m \in M \setminus T \mid A(m) \subset T\}$, i.e., the set of vertices all of whose successors are final. If both players know both their own and their opponent's interests, and do not enter into talks, then each can predict the other's move in a position $m$ of $L(G)$. For, if $m \in L(G) \cap M_1$, then player 2 can predict player 1's move $x_1(m)$ in position $m$:

$$u_1(x_1(m)) = \max_{m' \in A(m)} u_1(m').$$

Similarly, player 1 can predict the move $x_2(m)$ with $m \in L(G) \cap M_2$. But the game can then be curtailed by declaring the positions of $L(G)$ final and transferring the respective payoffs to them. We can continue this reduction until the game is curtailed to the single point corresponding to the initial position $m_0$. Call the payoffs at this point $c_1(m_0)$ and $c_2(m_0)$, and call those obtained at vertices $m$ during the reduction $c_1(m)$ and $c_2(m)$.

Noting that $(c_1(m_0), c_2(m_0))$ are the payoffs in the case of composite equilibrium, it would seem that we have in this case an "ideal solution" of the game in explicit form without information exchange. For, each player is in a position to carry out the above reduction of predictions independently on the basis of the total information about the game available to him. However, Example 1 below shows that composite equilibrium does not always lead to an efficient point.

Example 1. In the game of Fig.1, the payoffs corresponding to composite equilibrium are $(1, 1) < (2, 2)$.

In every compromise scheme below, the rules of talks with a certain order of information exchange are formalized. If we take account of all these rules and write a new order of moves that includes both the moves in the original game and those connected with having talks, we obtain an extended game in explicit form, in which all further information exchange is blocked, so that it makes sense to take composite equilibrium as the basic concept. In our first scheme, the players talk only of the distribution of the roles of leader and follower.

Scheme 1. In his move a player can propose to his opponent a distribution of the roles of leader and follower, or else make a move in the initial game $G$. Having received the proposal, the opponent can either accept, in which case the game becomes one with fixed roles, or he can refuse, in which case the proposer is obliged to make a move in the initial game. A game with a fixed distribution of roles is played in the way described above.
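The search for composite equilibrium described earlier in this section, in which prefinal positions are repeatedly replaced by the payoffs the mover would choose, is ordinary backward induction. The sketch below is an illustration under stated assumptions: the function name, the dictionary encoding of the tree, and the toy game are all hypothetical, and ties are not treated specially because the one-to-one correspondence condition is assumed to hold.

```python
def composite_equilibrium(children, mover, payoff, m):
    """Return the pair (c_1(m), c_2(m)) obtained by reducing the tree from the leaves up."""
    if m in payoff:                      # final position: payoffs are given
        return payoff[m]
    values = [composite_equilibrium(children, mover, payoff, n) for n in children[m]]
    i = mover[m] - 1                     # index of the player who moves at m
    return max(values, key=lambda c: c[i])


# Hypothetical toy tree, not the game of Fig.1.
children = {"m0": ["a", "b"], "a": ["a1", "a2"]}
mover = {"m0": 1, "a": 2}
payoff = {"b": (1, 1), "a1": (2, 2), "a2": (0, 3)}

c = composite_equilibrium(children, mover, payoff, "m0")
print(c)
```

In this toy game player 2 would answer at `a` with `a2`, so player 1 prefers `b` and the reduction yields `(1, 1)`, although the final position `a1` would give both players `(2, 2)`. This is the same kind of inefficiency that Example 1 points to.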
If the one-to-one correspondence condition holds, the study of the leader-follower model amounts to finding the composite equilibrium in the following game in explicit form. The leader (player $i$) first makes his move by choosing one of his strategies $x_i$; then the follower (player $j$) makes his move by responding with a strategy $x_j$. Let $G_i$ denote this game, constructed from game $G$. It can be described schematically by Fig.2, where $u(x) = (u_1(x_1, x_2), u_2(x_1, x_2))$. Let $G_i(m)$ be the same game, but starting from the position $m \in M \setminus T$. It is constructed from the subgame $G(m)$ of $G$.

Let $\tilde G$ denote the game considered in the light of the first scheme of talks. It is best described recurrently, in terms of all possible subgames $\tilde G(m)$. For this, we need the further auxiliary game $\tilde G'(m)$. The only difference is that, here, player $i$ in position $m$ with $m \in M_i$ does not have the right to make a proposal to player $j$, but must make his move by choosing $m' \in A(m)$. The game then reduces to subgame $\tilde G(m')$. Using our notation, the subgame $\tilde G(m)$, $m \in M_i$, for the first scheme of talks is as shown in Fig.3.
[Figs.1-4: game trees; images not reproduced]
In short, to find the composite equilibrium in the extended game $\tilde G$, we have to find the composite equilibrium in the subgames $G_i(m)$. The leader-follower model was first studied in the dynamic case in /8/. If the one-to-one correspondence condition holds, the expressions for the payoffs of leader and follower are greatly simplified. Let $\alpha_i(m)$ be player $i$'s guaranteed payoff in position $m$. These quantities are given recurrently by

$$\alpha_i(m) = u_i(m), \quad m \in T,$$

$$\alpha_i(m) = \begin{cases} \max\limits_{m' \in A(m)} \alpha_i(m'), & m \in M_i, \\ \min\limits_{m' \in A(m)} \alpha_i(m'), & m \in M_j, \quad j \ne i. \end{cases}$$
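The recurrence for the guaranteed payoff is a max-min evaluation of the tree: player $i$ maximizes at his own positions and assumes the worst at his opponent's. A minimal sketch follows; the function name `guaranteed`, the dictionary encoding, and the toy tree are hypothetical and serve only as an illustration.

```python
def guaranteed(children, mover, payoff, i, m):
    """Guaranteed payoff of player i (1 or 2) at position m: max at own moves, min otherwise."""
    if m in payoff:
        return payoff[m][i - 1]
    values = [guaranteed(children, mover, payoff, i, n) for n in children[m]]
    return max(values) if mover[m] == i else min(values)


# Hypothetical toy tree, not one of the paper's figures.
children = {"m0": ["a", "b"], "a": ["a1", "a2"]}
mover = {"m0": 1, "a": 2}
payoff = {"b": (1, 1), "a1": (2, 2), "a2": (0, 3)}

alpha_1 = guaranteed(children, mover, payoff, 1, "m0")
alpha_2 = guaranteed(children, mover, payoff, 2, "m0")
print(alpha_1, alpha_2)
```

For this tree both guaranteed payoffs at the root equal 1: player 1 can secure 1 by moving to `b`, while player 2 must allow for player 1 doing exactly that.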
The quantity $\alpha_i(m)$ corresponds to a cautious estimate of player $i$'s payoff which takes no account of player $j$'s interests. We introduce the set $D_i(m)$ of semi-equilibrium trajectories from position $m$. This set contains those, and only those, trajectories $\{m, m_1, \ldots, m_k\}$ for which we have the condition $u_j(m_k)$ [...]. The leader's payoff is

$$\gamma_i(m) = \max_{\{m, m_1, \ldots, m_k\} \in D_i(m)} u_i(m_k),$$

and the follower's payoff $\beta_j(m)$ is $u_j(\bar m_k)$, where $\bar m_k$ is the final position of a maximizing trajectory. The payoffs $(\gamma_i(m), \beta_j(m))$ correspond to composite equilibrium in the game $G_i(m)$.

If $g_i(m)$ is $i$'s payoff at composite equilibrium in game $\tilde G(m)$, then to find these quantities recurrently we must also find the payoffs at composite equilibrium in subgame $\tilde G'(m)$. Obviously, $i$'s payoff at composite equilibrium of this subgame is

$$\max_{n \in A(m)} g_i(n) = g_i(m'), \quad m' \in A(m),$$

while $j$'s payoff is $g_j(m')$. We can thus perform a reduction and obtain a game with a tree of length 2 (see Fig.4). We must now first compare $j$'s payoffs in the two prefinal positions of this reduced game, then compare the three possible payoffs from the first mover's position. We have thus proved a theorem for the players' payoffs in the first scheme of talks.

Theorem 1. Let $g_i(m)$, $i = 1, 2$, be the payoffs with the first scheme of talks for subgame $\tilde G(m)$, $m \in M_i$, of the initial game $G$. Then the $g_i(m)$ are connected by the following recurrence equations:

$$g_i(m) = u_i(m), \quad m \in T,$$

$$g_i(m') = \max_{n \in A(m)} g_i(n), \quad m \in M_i, \quad m' \in A(m),$$

$$g_j(m'') = \max\{\beta_j(m), g_j(m')\},$$

$$g_j(m''') = \max\{\gamma_j(m), g_j(m')\},$$

$$g_i(m) = \max\{g_i(m'), g_i(m''), g_i(m''')\} = g_i(m^*),$$

$$g_j(m) = g_j(m^*).$$

Here $m''$ and $m'''$ denote the two prefinal positions of the reduced game, and $m^*$ is the position at which the maximum is attained.
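The reduction to a tree of length 2 can be sketched as a single function: the mover $i$ has three options (simply move, propose himself as leader, propose his opponent as leader), and $j$ accepts a proposal only if it pays him more than a refusal, after which $i$ would have to move. This is an illustrative sketch only. The function name, the argument layout, and the numbers in the usage lines are hypothetical, the fixed-role payoffs are taken as given inputs rather than computed, and ties are ignored on the assumption that the one-to-one correspondence condition holds.

```python
def scheme1_step(move, i_leads, j_leads):
    """One step of the first scheme at a position of player i.

    Every argument is a payoff pair (payoff to i, payoff to j):
      move     - what both get if i simply makes his best move in the game,
      i_leads  - fixed-role payoffs when i is leader and j is follower,
      j_leads  - fixed-role payoffs when j is leader and i is follower.
    Returns the payoff pair at this position.
    """
    def after_proposal(offer):
        # j accepts only if the offer pays him more than the refusal outcome.
        return offer if offer[1] > move[1] else move

    options = [move, after_proposal(i_leads), after_proposal(j_leads)]
    return max(options, key=lambda g: g[0])      # i picks what is best for himself


# Hypothetical numbers, not taken from the paper's figures.
agreed = scheme1_step(move=(1, 1), i_leads=(3, 2), j_leads=(2, 4))
no_deal = scheme1_step(move=(3, 3), i_leads=(4, 1), j_leads=(1, 4))
print(agreed, no_deal)
```

In the first call both role distributions are acceptable to $j$, and $i$ chooses to lead, giving `(3, 2)`. In the second call $j$ refuses to follow and $i$ gains nothing by following, so the players continue without an agreement and obtain `(3, 3)`.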
For composite equilibrium in game $\tilde G$, extended on the basis of the first scheme of talks, there is a corresponding trajectory in the initial game $G$. This trajectory can be provisionally divided into two parts: before and after the agreement. Let $\{m_0, \ldots, m_k\}$ be the play of game $G$ corresponding to composite equilibrium of game $\tilde G$. Let $m_s$ be the instant of concluding the agreement. By Theorem 1 it immediately follows that $g_i(m_t) = g_i(m_s)$, $i = 1, 2$, $1 \le t \le s$. Since obviously $g_i(m) \ge \alpha_i(m)$, $i = 1, 2$, we then have the condition $\alpha_i(m_t)$ [...]

$$(g_i(m'), g_j(m')) < (\beta_i(m), \gamma_j(m)).$$

We isolate the minimal positions $m$ (nearest to $m_0$) on the tree at which an agreement is inevitable. We shall regard them as final positions of the reduced game, by putting the payoffs at these new final vertices equal to $g_i(m)$. Talks according to scheme 1 in fact amount to the players being in combat according to the rules of composite equilibrium until the trajectory hits a position at which an agreement becomes inevitable. In this position the players reach a voluntary agreement about the distribution of the roles of leader and follower, and the game from then on follows the scheme of a hierarchical game.

Example 2. In view of the positions at which an agreement is inevitable, the game of Fig.5,a reduces to that of Fig.5,b. In the former game, an agreement according to scheme 1 is impossible at the initial position.
[Figs.5-8: game trees; images not reproduced]
Scheme 2. When player $i$ moves, $j$ can either announce his strategy $x_j$ or propose nothing. Player $i$ either accepts the follower's role with leader's strategy $x_j$, or he can refuse this role. If no agreement is reached, $i$ can himself propose a leader's strategy $x_i$. Player $j$ agrees or else refuses the follower's role with leader's strategy $x_i$. If no agreement is reached, player $i$ makes a move in game $G$.

When constructing the extended game $\tilde G$ in accordance with scheme 2 of talks, we require, apart from the auxiliary game $\tilde G'(m)$ (which, as before, differs from subgame $\tilde G(m)$ only in the fact that, in the position $m$ in it, player $i$ no longer has the right to make his opponent a proposal), the following modification of the leader-follower model. In this modification the follower, knowing the leader's strategy, can refuse to be the follower, i.e., refuse to take account of the message received when forming his strategy. As before, let $G_i(m)$ denote this subgame if player $i$ gives the message. It can then be represented as shown in Fig.6. Player $i$ first announces his strategy $x_i$; then player $j$ can either refuse to pay attention to it (in which case the game reduces to the subgame $\tilde G'(m)$) or else respond to strategy $x_i$ with strategy $x_j$, thereby determining the outcome $x = (x_i, x_j)$ and the payoff $u(x)$. The game $\tilde G(m)$ is then given recurrently in Fig.7. Note that, with schemes 1 and 2, the games $G_i(m)$ are differently defined.

To obtain recurrence relations for scheme 2, we have to find the payoffs in subgames $\tilde G'(m)$ and $G_i(m)$. Let $w(m)$ be the payoffs at composite equilibrium in the extended game $\tilde G$ with scheme 2; then, as before, $i$'s payoff in game $\tilde G'(m)$, $m \in M_i$, is

$$w_i(m') = \max_{n \in A(m)} w_i(n), \quad m' \in A(m),$$

and his opponent's is $w_j(m')$. The study of $G_i(m)$ is very similar to that when the talks are by scheme 1. The leader's only extra condition is that the follower can refuse his role, in which case he can count on the payoff $w_j(m')$. We therefore modify the set of semi-equilibrium trajectories $D_i(m)$ and introduce the set
$$D_i(m, a) = \{\{m, m_1, \ldots, m_k\} \in D_i(m) \mid u_j(m_k) > a\}.$$

We use the notation

$$v_i(m, a) = \max_{\{m, m_1, \ldots, m_k\} \in D_i(m, a)} u_i(m_k).$$

As usual, we put the maximum over the empty set equal to $-\infty$, i.e., $v_i(m, a) = -\infty$ if $D_i(m, a) = \emptyset$. We can now find the leader's payoff in subgame $G_i(m)$:

$$\gamma_i(m, w_j(m')) = \max\{w_i(m'), v_i(m, w_j(m'))\}.$$

For, the leader can always obtain $w_i(m')$ by proposing in $G_i(m)$ the threatening strategy $\hat x_i$, defined in positions $n \in M_i$ by the condition

$$\alpha_j(\hat x_i(n)) = \min_{n' \in A(n)} \alpha_j(n').$$

Since $w_j(m') \ge \alpha_j(m') \ge \alpha_j(m)$, player $j$ can either refuse an agreement if $w_j(m') > \alpha_j(m)$, in which case the game reduces to subgame $\tilde G'(m)$, or else he accepts the proposal if $w_j(m') = \alpha_j(m)$, in which case $j$ obtains only the payoff $w_j(m')$, and $i$, in view of the one-to-one correspondence, obtains $w_i(m')$. If $v_i(m, w_j(m')) > -\infty$, player $i$, by proposing a suitable strategy, can force acceptance of it and obtain payoff $v_i(m, w_j(m'))$. It is easily seen that $i$ cannot obtain more than $\max\{w_i(m'), v_i(m, w_j(m'))\}$ in subgame $G_i(m)$. Let the follower's payoff in $G_i(m)$ be $\beta_j(m, w_j(m'))$. It follows from the above arguments that

$$\gamma_i(m, w_j(m')) \ge w_i(m'), \qquad \beta_j(m, w_j(m')) \ge w_j(m').$$
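The leader's payoff under the second scheme combines two ingredients: the payoff he gets anyway if the follower refuses, and the best payoff over those semi-equilibrium trajectories that give the follower strictly more than a refusal would. The following sketch illustrates only this arithmetic, including the convention that a maximum over the empty set is $-\infty$. It assumes the final payoffs of the semi-equilibrium trajectories are already available as a list; the function names and numbers are hypothetical.

```python
import math


def best_over_threshold(finals, a):
    """Max of the leader's payoff over trajectory ends (u_leader, u_follower)
    whose follower payoff exceeds a; -inf if there are none."""
    return max((u_l for u_l, u_f in finals if u_f > a), default=-math.inf)


def leader_payoff(w_move, finals):
    """Leader's payoff when the follower may refuse.

    w_move = (leader's payoff, follower's payoff) if the leader simply moves,
    finals = end payoffs of the semi-equilibrium trajectories (taken as given).
    """
    return max(w_move[0], best_over_threshold(finals, w_move[1]))


# Hypothetical end payoffs of semi-equilibrium trajectories.
finals = [(4, 2), (5, 1)]
with_offer = leader_payoff((3, 1.5), finals)    # (4, 2) beats the refusal payoff 1.5
without_offer = leader_payoff((3, 10), finals)  # no trajectory satisfies the follower
print(with_offer, without_offer)
```

By construction the result is never below the leader's payoff from simply moving, which is the first of the two inequalities stated above.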
In view of these inequalities, $\tilde G(m)$ reduces to the subgame of Fig.8,a. The study of this game is almost exactly the same as the study of the game of Fig.8,b. The only difference is that, if $i$ refuses the proposal to become the follower with $j$'s strategy $x_j$, he can count on payoff $\gamma_i(m, w_j(m'))$, and not on $w_i(m')$. Hence $j$'s payoff in $\tilde G(m)$ is

$$\gamma_j(m, \gamma_i(m, w_j(m'))) = \max\{\beta_j(m, w_j(m')), v_j(m, \gamma_i(m, w_j(m')))\},$$

while $i$'s corresponding payoff is $\beta_i(m, \gamma_i(m, w_j(m')))$. We have thus proved:

Theorem 2. The payoffs with the second scheme of talks are given by the recurrence relations

$$w_i(m) = u_i(m), \quad m \in T,$$

$$w_i(m') = \max_{n \in A(m)} w_i(n), \quad m \in M_i, \quad m' \in A(m),$$

$$w_j(m) = \gamma_j(m, \gamma_i(m, w_j(m'))),$$

$$w_i(m) = \beta_i(m, \gamma_i(m, w_j(m'))).$$
The trajectory in game $G$ corresponding to composite equilibrium in game $\tilde G$ in general consists of two parts, as it does in the case of the first scheme of talks: before and after the agreement is made. Since, in the first part $\{m_0, \ldots, m_s\}$, we have $w_i(m_{t+1}) = w_i(m_t)$ for all $t$, while in the second part one player becomes leader, then, since $w_i(m) \ge \alpha_i(m)$, $i = 1, 2$, as for the first scheme of talks, the agreed trajectory $\{m_0, \ldots, m_k\} \in D_1 \cup D_2$.

The condition for inevitably concluding an agreement can now be rewritten as follows. Since, in view of the one-to-one correspondence, $\gamma_i(m, w_j(m')) > w_i(m')$, $m \in M_i$, implies $\beta_j(m, w_j(m')) > w_j(m')$, while $\gamma_j(m, \gamma_i(m, w_j(m'))) > w_j(m')$ implies $\beta_i(m, \gamma_i(m, w_j(m'))) > w_i(m')$, inevitability of agreement can be defined as follows ($m \in M_i$):

$$\gamma_i(m, w_j(m')) > w_i(m'),$$

$$\gamma_j(m, \gamma_i(m, w_j(m'))) > w_j(m').$$
As for talks by scheme 1, the agreed trajectory with talks by scheme 2 is semi-equilibrium for at least one player, i.e., is contained in the set $D_1 \cup D_2$. As distinct from scheme 1, the trajectory with scheme 2 is always efficient in the set $D_1 \cup D_2$.

Theorem 3. Let $\{m_0, \ldots, m_k\}$ be the agreed trajectory in scheme 2. Then, given any trajectory $\{m_0, m_1', \ldots, m_l'\} \in D_1 \cup D_2$, we have

$$[u_i(m_l') \ge u_i(m_k)] \Rightarrow [u_j(m_l') \le u_j(m_k)], \quad i, j = 1, 2, \quad i \ne j.$$

Assume that $\{m_0, m_1', \ldots, m_l'\}$ [...] make an agreement on the basis of 1 being leader, using the strategy $x_1$:

$$x_1(m_t') = m_{t+1}' \quad \text{for } m_t' \in M_1, \ m_t' \in \{m_0, m_1', \ldots, m_l'\},$$

$$x_1(m) = \hat x_1(m) \quad \text{for } m \in M_1, \ m \notin \{m_0, m_1', \ldots, m_l'\},$$

where $\hat x_1$ is a threatening strategy defined by analysing game $G_1(m)$. By the condition $\{m_0, m_1', \ldots, m_l'\} \in D_1$, there is a best response $x_2$ of player 2 such that the pair $(x_1, x_2)$ generates the trajectory $\{m_0, m_1', \ldots, m_l'\}$. By the one-to-one correspondence, this is sufficient for the players to obtain the payoff $u(m_l')$ when strategy $x_1$ is used in the leader-follower model. To avoid an agreement whereby 1 is leader with strategy $x_1$ and 2 is follower, this agreement must be undesirable for at least one player. We will take the possible cases in turn. Let [...]. Since player 1 does not communicate his strategy $x_1$, then either $w_1(m_0) > u_1(m_l')$, or [...], or else [...], or else $u_1(m_k) > u_1(m_l')$. Again we have obtained a contradiction. Finally, let $m_0 \in M_2$. Then, since the player moving at $m_0$ becomes leader, the first player cannot propose anything better, i.e., $w(m_0) > u(m_l')$, which is a contradiction. The theorem is proved.

From the recurrence relations for scheme 2 we can see an essential difference between schemes 1 and 2. With scheme 2 an agreement can always be reached in the initial position $m_0$, though it is not always inevitable in position $m_0$. For, if trajectory $\{m_0, \ldots, m_k\}$ corresponds in the game to scheme 2 of talks, then it belongs either to $D_1$ or $D_2$.

Example 3. In the game of Fig.9, composite equilibrium and scheme 1 lead to payoffs (3,3), and scheme 2, to payoffs (4,4).

[Fig.9: game tree; image not reproduced]

Scheme 3. When he moves, a player can propose an agreed trajectory to his opponent, or he can move in the initial game $G$. On receiving the proposal, the opponent can either agree to the trajectory, or he can refuse it, in which case the proposer moves in the initial game $G$.

The game $\tilde G$ extended by scheme 3 is easily defined recurrently. We can specify $\tilde G(m)$ by Fig.10, where the positions $m'$ are final in subgame $G(m)$. An analysis of this game from the point of view of composite equilibrium is largely a repetition of the analysis of scheme 2, so that we need not dwell on the details. When proposing a trajectory of subgame $G(m)$ (or, what is the same thing, a final position $m'$ of this subgame), player $i$ is no longer tied by the restrictions of the set of semi-equilibrium trajectories $D_i(m)$. There are thus the following changes relative to scheme 2. Let $\rho_i(m)$ and $\rho_j(m)$ be the payoffs with talks by scheme 3. We put

$$\rho_i(m') = \max_{n \in A(m)} \rho_i(n) \quad \text{for } m \in M_i, \ m' \in A(m).$$

We also introduce the quantity

$$\mu_i(m, a) = \max\{u_i(n) \mid n \in T \cap G(m), \ u_j(n) > a\}.$$
[Figs.10-12: game trees (Fig.12 has panels a and b); images not reproduced]
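As usual, the maximum over the empty set is put equal to $-\infty$. It is then easily shown that, on making a proposal in the best way, player $i$ can obtain the payoff $\max\{\rho_i(m'), \mu_i(m, \rho_j(m'))\} = \rho_i(m'')$, and player $j$ the payoff $\rho_j(m'')$. As in scheme 2, we can show that $\rho_i(m'') \ge \rho_i(m')$ and $\rho_j(m'') \ge \rho_j(m')$. We have thus proved

Theorem 4. With talks by scheme 3, the payoffs satisfy the recurrence relations

$$\rho_i(m) = u_i(m), \quad m \in T,$$

$$\rho_i(m') = \max_{n \in A(m)} \rho_i(n), \quad m \in M_i, \quad m' \in A(m),$$

$$\rho_i(m) = \max\{\rho_i(m'), \mu_i(m, \rho_j(m'))\} = \rho_i(m''), \qquad \rho_j(m) = \rho_j(m'').$$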
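The third scheme is fully determined by the tree and the final payoffs, so its recurrence can be run directly: at each position the mover compares his best ordinary move with the best final position of the subgame that pays his opponent strictly more than that move would. The sketch below is one possible reading of the recurrence, not the authors' implementation. Function names, the dictionary encoding, and the toy tree are hypothetical, and the one-to-one correspondence condition is assumed so that ties need no special handling.

```python
def final_positions(children, payoff, m):
    """All final positions of the subgame that starts at m."""
    if m in payoff:
        return [m]
    return [t for n in children[m] for t in final_positions(children, payoff, n)]


def scheme3(children, mover, payoff, m):
    """Payoff pair at position m when talks follow the third scheme."""
    if m in payoff:
        return payoff[m]
    i = mover[m] - 1          # index of the mover
    j = 1 - i                 # index of his opponent
    # Best ordinary move for the mover, evaluated recursively.
    move = max((scheme3(children, mover, payoff, n) for n in children[m]),
               key=lambda p: p[i])
    # Final positions the opponent would accept: they pay him more than a refusal.
    acceptable = [payoff[t] for t in final_positions(children, payoff, m)
                  if payoff[t][j] > move[j]]
    offer = max(acceptable, key=lambda p: p[i], default=None)
    # Propose only if the proposal is better for the mover than just moving.
    if offer is not None and offer[i] > move[i]:
        return offer
    return move


# Hypothetical toy tree, not one of the paper's figures.
children = {"m0": ["a", "b"], "a": ["a1", "a2"]}
mover = {"m0": 1, "a": 2}
payoff = {"b": (1, 1), "a1": (2, 2), "a2": (0, 3)}

result = scheme3(children, mover, payoff, "m0")
print(result)
```

On this toy tree plain backward induction would end at `(1, 1)`, whereas under the third scheme player 1 can propose the trajectory ending in `a1`, which player 2 accepts, giving `(2, 2)`.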
An agreement in scheme 3 in position $m \in M_i$ is inevitable if $\mu_i(m, \rho_j(m')) > \rho_i(m')$. For, player $i$ can then propose a trajectory in the final position of which both payoffs are greater than in position $m'$. With talks by scheme 3, the players will always realize an efficient (Pareto-optimal) trajectory in the set of all trajectories.

Theorem 5. The vector $(\rho_1(m_0), \rho_2(m_0))$ of payoffs with talks by scheme 3 is efficient, i.e., for any position $m \in T$,

$$[u_i(m) \ge \rho_i(m_0)] \Rightarrow [u_j(m) \le \rho_j(m_0)], \quad i, j = 1, 2, \quad i \ne j.$$

The proof is similar to the proof of Theorem 3.
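Efficiency in the Pareto sense can be checked mechanically on any finite set of final payoffs: a payoff vector is efficient if no final position is at least as good for both players and differs from it. The sketch below is a generic check of this standard definition, offered as an illustration rather than as part of the paper's proof; the function name and the toy payoffs are hypothetical.

```python
def is_efficient(vector, final_payoffs):
    """True if no pair in final_payoffs is at least as good for both players
    as vector while being different from it."""
    r1, r2 = vector
    return not any(
        u1 >= r1 and u2 >= r2 and (u1, u2) != (r1, r2)
        for u1, u2 in final_payoffs
    )


# Hypothetical final payoffs of a toy game.
final_payoffs = [(1, 1), (2, 2), (0, 3)]

print(is_efficient((2, 2), final_payoffs))
print(is_efficient((1, 1), final_payoffs))
```

With these toy payoffs `(2, 2)` and `(0, 3)` are efficient, while `(1, 1)` is not, because `(2, 2)` is better for both players.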
Note that, when talks are carried out by scheme 3, an agreement can always be reached at the initial position $m_0$, though it is not always inevitable in this position. Scheme 1 cannot be better for both players than scheme 2, while the latter cannot be better for both players than scheme 3. This is because the trajectory agreed with scheme 1 always belongs to the set $D_1 \cup D_2$, while that agreed with scheme 2 is efficient in this set, and that agreed with scheme 3 is efficient in the set of all trajectories.

Example 4. In the game of Fig.11, the composite equilibrium, and schemes 1 and 2, lead to payoffs (1,1), and scheme 3, to payoffs (2,2).

For an individual player, however, a scheme of higher rank may be worse than one of lower rank.

Example 5. In the game of Fig.12,a, scheme 1 leads to payoffs (3,5), and schemes 2 and 3, to payoffs (4,4). In the game of Fig.12,b, schemes 1 and 2 lead to payoffs (2,4), and scheme 3, to payoffs (3,3).

Note finally that the search for sensible schemes of compromise in the dynamic case has only just started. The above examples and arguments suggest a wealth of ways in which the players' interests and scope may be interwoven in a conflicting situation. Note also, however, that the study of schemes of compromise enables us to widen the range of situations in which the players can arrive at a mutually satisfactory agreement.
The authors thank N.N. Moiseyev for arousing their interest in seeking sensible schemes of compromise in the dynamic case, and for his interest.

REFERENCES
1. GERMEIYER YU.B. and MOISEYEV N.N., On some problems in the theory of hierarchical control systems, Probl. Prikl. Matem. i Mekh., Nauka, Moscow, 1971.
2. GERMEIYER YU.B., On two-person games with a fixed sequence of moves, Dokl. Akad. Nauk SSSR, 198, 5, 1001-1004, 1971.
3. SIMAAN M. and CRUZ J.B., On the Stackelberg strategy in non-zero-sum games, J. Optimizat. Theory and Appl., 11, 533-555, 1973.
4. GERMEIYER YU.B., Games with Unopposed Interests, Nauka, Moscow, 1976.
5. KUKUSHKIN N.S. and MOROZOV V.V., Theory of Non-antagonistic Games, Izd-vo MGU, Moscow, 1984.
6. The Present State of Operations Research, Nauka, Moscow, 1979.
7. MOULEN E., Theory of games with examples from mathematical economics, Mir, Moscow, 1985.
8. DANIL'CHENKO T.N. and MOSEVICH K.K., Multistep two-person games with a fixed sequence of moves, Zh. vych. Mat. i mat. Fiz., 14, 4, 1047-1052, 1974.
Translated by D.E.B.