Copyright © IFAC Control Science and Technology (8th Triennial World Congress), Kyoto, Japan, 1981
ASPECTS OF THE STACKELBERG GAME PROBLEM - INCENTIVE, BLUFF, AND HIERARCHY¹

Y. C. Ho* and G. J. Olsder**
*Division of Applied Sciences, Pierce Hall, Harvard University, Cambridge, USA
**Department of Applied Mathematics, Twente University of Technology, Enschede, The Netherlands

Abstract. This paper considers three-level optimization problems and touches upon many issues such as information structures, incentives, threats, bluffing and organization theory. The "reverse information structure" is emphasized; in this structure the planner ("leader") first announces his strategy, but he acts after the lower-level optimizers ("followers") have acted. Thus he is sometimes able to force the followers to play in such a way as to help him obtain his desired solution. The issue of whether the planner should commit to his announced strategy or not (in the latter case he is bluffing) is also considered.

Keywords. Game theory; Stackelberg game; hierarchical system; incentive; multi-person optimization; information structure; threat; decentralized control; team theory.

INTRODUCTION

A class of multi-person optimization problems commonly known as Stackelberg games has received much attention in the literature lately. Conceptually this class of problems touches upon many issues, such as information structure in decentralized control, differential game and team theory, incentives and threats, and organization theory. Briefly, the problems involve two or more decision makers (DMs). The main distinguishing feature of these problems is the fact that one decision maker, often referred to as the planner or the leader, can declare and commit his strategy before the others. The leader's declaration must of course take into account the possible reactions of the other decision makers, and hence the multi-person nature of the optimization problem. Although this basic idea has been well known for a long time and has been applied to many areas, systematic study of the nature of the resultant optimization problem has only begun recently. The purpose of this paper is to add to this study. It is hoped that by this approach we lay bare some of the underlying issues of this problem common to many application areas.

THE MODEL

For our purpose it is only necessary to consider three decision makers, labelled A, B, and C. The extension to the general case of n DMs will be obvious. Each DM$_i$ controls the decision variable $u_i$ and has available information $z_i$, $i = A, B, C$, where the joint distribution of $z_A$, $z_B$, $z_C$ is known. We assign a payoff function to each DM defined by
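$$J_i = E[\,\cdots\,], \qquad i = A, B, C. \tag{1}$$

In evaluating (1) we require that the decision variables $u_i$ are determined according to the strategy $\gamma_i$, which is a mapping from $Z_i$ to $U_i$. Thus the expectation in (1) is well defined and we can write the payoff as

$$J_i(\gamma_A, \gamma_B, \gamma_C), \qquad i = A, B, C, \tag{1'}$$

where $\gamma_i \in \Gamma_i$, the admissible class of strategies for DM$_i$.² Given Eq. (1)' we define a 3-level Stackelberg problem as

Find $\gamma_A^*$, $\gamma_B^* = S_B(\gamma_A^*)$, $\gamma_C^* = S_C(\gamma_A^*, \gamma_B^*)$ such that

$$\gamma_A^* = \arg\max_{\gamma_A} J_A(\gamma_A, S_B(\gamma_A), S_C(\gamma_A, S_B(\gamma_A))),$$
$$\gamma_B^* = \arg\max_{\gamma_B} J_B(\gamma_A^*, \gamma_B, S_C(\gamma_A^*, \gamma_B)) := S_B(\gamma_A^*),$$
$$\gamma_C^* = \arg\max_{\gamma_C} J_C(\gamma_A^*, \gamma_B^*, \gamma_C) := S_C(\gamma_A^*, \gamma_B^*). \tag{SP}$$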
1 The research reported in this paper was made possible through support extended by the U.S. Office of Naval Research under the Joint Services Electronics Program by Contract N00014-75-C-0648 and by the National Science Foundation under Grant ENG 78-15231.
2 Note that we have been deliberately vague about the domain and ranges of $u_i$ and $z_i$. These will be made specific when the occasion demands. At present we need only consider the so-called strategic form of the problem as in Eq. (1)'.
In words, A is the leader who declares his strategy $\gamma_A$ first. This is followed by B, who chooses $\gamma_B$ based on $\gamma_A$. C moves last, choosing $\gamma_C$ depending on the choices $\gamma_A$ and $\gamma_B$. Note that the choices of B and C are governed by the rules $S_B$ and $S_C$, which are often referred to as the reaction sets of B and C respectively. If the maximizations of $J_B$ and $J_C$ are not unique, then we can define $S_B$ and $S_C$ as the minimizers of $J_A$ and $J_B$ over these sets respectively. Problem (SP) is basically a noncooperative optimization problem. It is to be contrasted with the solution of the noncooperative Nash equilibrium problem of
Find $\gamma_A^N$, $\gamma_B^N$, $\gamma_C^N$ such that

$$\gamma_A^N = \arg\max_{\gamma_A} J_A(\gamma_A, \gamma_B^N, \gamma_C^N),$$
$$\gamma_B^N = \arg\max_{\gamma_B} J_B(\gamma_A^N, \gamma_B, \gamma_C^N),$$
$$\gamma_C^N = \arg\max_{\gamma_C} J_C(\gamma_A^N, \gamma_B^N, \gamma_C). \tag{NP}$$

The difference between (SP) and (NP) can be made particularly clear for the control engineer if we consider the special case where $J_A = J_B = J_C$. In this case (NP) is the statement of an open loop control problem and (SP) the closed loop version of the same problem. The choices $\gamma_A$, $\gamma_B$, and $\gamma_C$ simply represent the choice of control variables in three successive time intervals. In fact, the natural solution sequence suggested by (SP) is simply that of dynamic programming applied to different payoff functions at each stage. This solution approach, while conceptually straightforward, is most difficult to carry out in practice since we are doing backward inductions on the strategy spaces $\Gamma_i$. Furthermore, there is no convenient summary variable such as "state" at this level of abstraction. Thus, actual solution of (SP) requires additional structure and usually different techniques.

The statement of (SP) can be specialized to the usual 2-level Stackelberg problem involving B and C by simply regarding $\gamma_A$ as a parameter in $J_B$ and $J_C$. It is well known for 2-level (SP) that

$$J_B^* \ge J_B^N. \tag{2}$$

However, it is less well known that this general result breaks down for higher level (SP). Consider the simple Example 1.

Example 1.

$$J_A = -\gamma_A^2 - \gamma_B^2 - \gamma_C^2,$$
$$J_B = -\gamma_A^2 - (\gamma_B - 1)^2 - \gamma_C^2,$$
$$J_C = -\gamma_A^2 - \gamma_B^2 - (\gamma_C - 1)^2 + f(\gamma_A, \gamma_B, \gamma_C),$$

where $\gamma_i \in \mathbb{R}^1$, $i = A, B, C$.

Case (i): $f = 2\gamma_C\gamma_B$. In this case it is directly verified that $\gamma^N = (0, 1, 2)$, with $J_A^N < J_A^*$.

Case (ii): $f = -2\gamma_B\gamma_C$. Now $\gamma^N = (0, 1, 0)$ and $\gamma^* = (0, 1, 1 - \gamma_B)$.
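As a rough numerical check of Example 1, the sketch below computes the Nash point of (NP) and the 3-level Stackelberg point of (SP) by brute-force search on a grid. It assumes the Case (i) payoffs $J_A = -\gamma_A^2 - \gamma_B^2 - \gamma_C^2$, $J_B = -\gamma_A^2 - (\gamma_B - 1)^2 - \gamma_C^2$ and $J_C = -\gamma_A^2 - \gamma_B^2 - (\gamma_C - 1)^2 + 2\gamma_B\gamma_C$. The grid, the function names and the best-response iteration used for (NP) are illustrative choices and are not taken from the paper.

```python
# Payoffs of Example 1, Case (i). This reading of the example is an assumption.
def J_A(a, b, c):
    return -a**2 - b**2 - c**2

def J_B(a, b, c):
    return -a**2 - (b - 1)**2 - c**2

def J_C(a, b, c):
    return -a**2 - b**2 - (c - 1)**2 + 2 * b * c

# Hypothetical search grid: -3.0, -2.9, ..., 3.0
GRID = [i / 10 for i in range(-30, 31)]

def argmax(f):
    """Grid point that maximizes f."""
    return max(GRID, key=f)

def S_C(a, b):
    """Reaction of C to the declared choices of A and B."""
    return argmax(lambda c: J_C(a, b, c))

def S_B(a):
    """Reaction of B to A, anticipating the reaction of C."""
    return argmax(lambda b: J_B(a, b, S_C(a, b)))

def stackelberg():
    """Backward induction for (SP): A anticipates both reactions."""
    a = argmax(lambda x: J_A(x, S_B(x), S_C(x, S_B(x))))
    b = S_B(a)
    return a, b, S_C(a, b)

def nash(iterations=20):
    """Best-response iteration for (NP); adequate for this simple example."""
    a = b = c = 0.0
    for _ in range(iterations):
        a = argmax(lambda x: J_A(x, b, c))
        b = argmax(lambda x: J_B(a, x, c))
        c = argmax(lambda x: J_C(a, b, x))
    return a, b, c

if __name__ == "__main__":
    g_nash, g_stack = nash(), stackelberg()
    print("Nash:", g_nash, "J_A =", J_A(*g_nash))
    print("Stackelberg:", g_stack, "J_A =", J_A(*g_stack))
```

Under these assumptions the search returns the Nash point (0, 1, 2) with $J_A = -5$ and the Stackelberg point (0, 0, 1) with $J_A = -1$, which agrees with the ordering $J_A^N < J_A^*$ stated for Case (i).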
Thus, in general, we see that for higher level (n > 2) (SP)s, unless further assumptions are made about $J_A$, $J_B$, and $J_C$, there need not be any ordering relationship between $J_A^*$ and $J_A^N$. Hence, in order for the n-level (SP) to be a model of hierarchical systems, much more structure is needed.

INCENTIVE AND BLUFF IN STACKELBERG PROBLEMS WITH REVERSE INFORMATION STRUCTURE

A particular class of 2-level (SP) has the following information structure. We have $z_C = \emptyset$ and $z_B = u_C$, where $u_C \in \mathbb{R}^n$, $u_B \in \mathbb{R}^n$. In this case, while B declares his strategy $\gamma_B : U_C \to U_B$ first, causality requires that C actually acts first; hence the name reverse information structure. Such problems are interesting from two viewpoints. First, it is a problem of dynamic information structure. Second and more important, it provides the possibility of imposing incentive and threat by the leader B on the follower C, since C now faces the problem of finding $\gamma_C$ (or $u_C$, since $z_C = \emptyset$) in order to maximize $J_C(\gamma_B(u_C), u_C)$. In fact, by appropriate choice of $\gamma_B$, B can often induce C to behave in such a way that C acts as if he is maximizing $J_B$ in cooperation with B, i.e. B and C behave as a team. Briefly, B wants to choose $\gamma_B$ such that
$$u_C^* = \arg\max_{u_C} J_C(\gamma_B(u_C), u_C),$$
$$(\gamma_B(u_C^*), u_C^*) = \arg\max_{u_B, u_C} J_B(u_B, u_C). \tag{3}$$
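To make condition (3) concrete, the following sketch checks one incentive strategy on a toy problem. Everything in it is a hypothetical illustration rather than material from the paper: the payoffs $J_B = -(u_B - 1)^2 - (u_C - 1)^2$ and $J_C = -u_B^2 - u_C^2$, the affine strategy $\gamma_B(u_C) = 2 - u_C$ (obtained by requiring that C's first-order condition hold at the team point $u_C = 1$), and the grid search.

```python
# Hypothetical payoffs, chosen only to illustrate condition (3).
def J_B(u_B, u_C):
    return -(u_B - 1) ** 2 - (u_C - 1) ** 2

def J_C(u_B, u_C):
    return -u_B ** 2 - u_C ** 2

# Hypothetical search grid: -3.0, -2.9, ..., 3.0
GRID = [i / 10 for i in range(-30, 31)]

def gamma_B(u_C):
    """Announced incentive strategy of B (an assumed affine form)."""
    return 2.0 - u_C

def follower_choice(strategy):
    """C acts first, maximizing J_C given that B will apply `strategy`."""
    return max(GRID, key=lambda u_C: J_C(strategy(u_C), u_C))

def team_optimum():
    """Joint maximizer of J_B over (u_B, u_C), i.e. B's desired outcome."""
    pairs = ((u_B, u_C) for u_B in GRID for u_C in GRID)
    return max(pairs, key=lambda pair: J_B(*pair))

if __name__ == "__main__":
    u_C_star = follower_choice(gamma_B)
    print("C's choice under gamma_B:", u_C_star)
    print("Induced pair:", (gamma_B(u_C_star), u_C_star))
    print("Team optimum of J_B:", team_optimum())
    # For comparison: a constant announcement u_B = 1 carries no incentive.
    print("C's choice under constant u_B = 1:", follower_choice(lambda u_C: 1.0))
```

With these assumed payoffs the announced strategy induces C to pick $u_C = 1$, so the realized pair coincides with the team optimum (1, 1) of $J_B$, whereas a constant announcement $u_B = 1$ leaves C at $u_C = 0$.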
Both static and dynamic, deterministic and stochastic versions of this class of problems have been explored in recent years (Basar, 1979; Ho, 1980; Papavassilopoulos, 1979; Tolwinski). However, it has always been tacitly assumed that the declared strategy of B, $\gamma_B$, represents an irrevocable and believable commitment. In many real world situations, this need not be so. Suppose the follower C chooses not to believe that B has irrevocably committed to $\gamma_B$ (i.e. B is bluffing); then C can move first by choosing $u_C$, thinking that B must now follow with
$$u_B = \arg\max_{u_B} J_B(u_B, u_C) \triangleq a_B(u_C). \tag{4a}$$

The optimal choice of $u_C$ hence will be given by

$$u_C^a = \arg\max_{u_C} J_C(a_B(u_C), u_C). \tag{4b}$$
The situation is as follows: B must decide whether he is committed to $\gamma_B$ or merely bluffing, without knowing whether or not C will believe this declaration. C must act (i.e. choose $u_C$) without knowing whether or not B is merely bluffing. Thus B and C face a super bi-matrix game as illustrated in Fig. 1, where
$J_B^{\gamma} = J_B(\gamma_B(u_C^*), u_C^*)$, $J_B^{\gamma a} = J_B(\gamma_B(u_C^a), u_C^a)$, $J_B^{a} = J_B(a_B(u_C^a), u_C^a)$, and similarly for $J_C^{\gamma}$, $J_C^{\gamma a}$, and $J_C^{a}$.

$$\begin{array}{l|cc}
 & \text{C: Follow} & \text{C: Call} \\ \hline
\text{B: Commit} & (J_B^{\gamma},\, J_C^{\gamma}) & (J_B^{\gamma a},\, J_C^{\gamma a}) \\
\text{B: Bluff} & (J_B^{\gamma},\, J_C^{\gamma}) & (J_B^{a},\, J_C^{a})
\end{array}$$

Fig. 1.

Note that since C must actually act first by carrying out $u_C$, B can always realize $J_B^{\gamma}$ independent of commitment so long as C chooses to believe. Furthermore, by definition and Eqs. (3) and (4),

$$J_B^{\gamma} \ge J_B^{a},$$
$$J_C^{\gamma} \ge J_C^{\gamma a},$$
$$J_B^{a} \ge J_B^{\gamma a}. \tag{8}$$

If we have in addition that

$$J_C^{\gamma} > J_C^{a} \tag{9}$$

then "follow" is a dominant strategy for C and the (SP) with reverse information structure will always result in $(J_B^{\gamma}, J_C^{\gamma})$. On the other hand, if (9) is not satisfied, which is more often the case, then it is clear by inspection that "don't commit or bluff" and "call" is a Nash equilibrium strategy pair, with "bluffing" in fact being a dominant strategy for B.³ This seems to imply that the leader's declaration will never be believed in such cases. However, mixed strategy equilibrium solutions also exist. Let $p$, $q$ be the probabilities that B and C choose "commit" and "follow" respectively, and let $J_B(p, q)$, $J_C(p, q)$ be the expected payoffs. Consider Example 2.

Example 2. Suppose Fig. 1 takes on the specific values of Fig. 2.

$$\begin{array}{l|cc}
 & \text{C: Follow} & \text{C: Call} \\ \hline
\text{B: Commit} & (10,\, 5) & (-2,\, 2) \\
\text{B: Bluff} & (10,\, 5) & (1,\, 6)
\end{array}$$

Fig. 2. Entries are $(J_B, J_C)$.

Then direct calculation shows that

$$(p^*, q^*) = (1/4 + \varepsilon,\, 1)$$

is a mixed strategy equilibrium in the sense that for any $\varepsilon > 0$, $q^* = 1$ is optimal with respect to all other $q$, and $J_B(1/4 + \varepsilon, 1) = 10$. Comparing this with the pure strategy equilibrium $(p^*, q^*) = (0, 0)$, we have

$$J_B(0, 0) = 1 < J_B(1/4 + \varepsilon, 1) = 10.$$

In this case, it is important that B does not bluff more than 3/4 of the time in order to realize the higher expected payoff of 10 vs. 1. This is so despite the fact that "bluffing" is actually a dominant pure strategy. Of course, one can question why the point $(1/4 + \varepsilon, 1)$ should be any more believable than (1, 1) or (0, 0). We submit that the mixed strategy solution is far more plausible. However, to prove this assertion explicitly would require behavioural and psychological assumptions outside the scope of this paper.

If there are more than 2 players (e.g. in a 3-level (SP)), then the inequality (8) can be made very dramatic, hence making the equilibrium point of (commit and follow) incredible. This is precisely the objection raised by Tolwinski to the 3-level solution that was proposed by Basar (1973). However, as the above analysis shows, the issues are even more involved. When there are more than 2 players, the possibilities of mutual credibilities are enormous. The problem of bluffing is generic to (SP) with reverse information structure. The players are essentially in a game to determine who will be the "leader"; much remains to be explored.

3 "Commit" and "follow" are also an equilibrium pair, but it is not credible.
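The threshold of 1/4 in Example 2 can be checked with a few lines of exact arithmetic. The sketch below assumes the Fig. 2 entries are $(J_B, J_C)$ = (10, 5) for either row when C follows, (-2, 2) for commit and call, and (1, 6) for bluff and call; this reading of the figure, together with the function and variable names, is an assumption made for illustration.

```python
from fractions import Fraction

# (J_B, J_C) for each (B action, C action); assumed reading of Fig. 2.
PAYOFF = {
    ("commit", "follow"): (10, 5),
    ("commit", "call"): (-2, 2),
    ("bluff", "follow"): (10, 5),
    ("bluff", "call"): (1, 6),
}

def expected(p, q):
    """Expected (J_B, J_C) when B commits w.p. p and C follows w.p. q."""
    total_B = total_C = Fraction(0)
    for b_action, prob_b in (("commit", p), ("bluff", 1 - p)):
        for c_action, prob_c in (("follow", q), ("call", 1 - q)):
            j_b, j_c = PAYOFF[(b_action, c_action)]
            total_B += prob_b * prob_c * j_b
            total_C += prob_b * prob_c * j_c
    return total_B, total_C

def follow_is_best_for_C(p):
    """True if pure 'follow' is at least as good for C as pure 'call'."""
    one, zero = Fraction(1), Fraction(0)
    return expected(p, one)[1] >= expected(p, zero)[1]

if __name__ == "__main__":
    for p in (Fraction(1, 5), Fraction(1, 4), Fraction(3, 10)):
        print("p =", p, "follow is a best reply:", follow_is_best_for_C(p))
    print("J_B at (p, q) = (3/10, 1):", expected(Fraction(3, 10), Fraction(1))[0])
    print("J_B at (p, q) = (0, 0):", expected(Fraction(0), Fraction(0))[0])
```

Since C's expected payoff from calling is $6 - 4p$ against 5 from following, "follow" is a best reply exactly when $p \ge 1/4$, which is why B must commit at least a quarter of the time to collect 10 rather than 1.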
UPPER AND LOWER BOUNDS OF REVERSED STACKELBERG GAMES

Section 3 gives the impression that the upper and lower bounds of the leader's payoff in a 2-level (SP) of reverse information structure are $J_B^{\gamma}$ and $J_B^{a}$ respectively. However, the leader is not always so powerful that he can induce C to behave as a team member. As pointed out by Tolwinski (1980), whatever strategy B may announce, C can always unilaterally guarantee himself

$$\underline{J}_C = \max_{\gamma_C \in \Gamma_C} \min_{\gamma_B \in \Gamma_B} J_C(\gamma_B, \gamma_C); \tag{10}$$

for the case in Section 3,

$$\underline{J}_C = \max_{u_C \in U_C} \min_{u_B \in U_B} J_C(u_B, u_C). \tag{10'}$$

Then the best that can be accomplished by B is

$$\ldots \tag{11}$$

where

$$\ldots \tag{12}$$

In other words, we must settle for a constrained team optimum for $J_B$. This can be formalized as the following Theorem, which is a minor refinement of Tolwinski's (1980) result.

Theorem 1. Suppose $J_B$ and $J_C$ are continuous on $U_B \times U_C$. Define $\underline{J}_C$ according to (10) and suppose that the sets

$$\ldots$$

where $\ldots$ and $(u_B^t, u_C^t)$ is the team solution of $J_B$, are nonempty. If, in addition, the closure of $F_C = F_C'$, then B can get his costs arbitrarily close to $\ldots$

Proof: Follows the proof of Theorem 1 in Tolwinski (1980) with some rather obvious modifications. □

Theorem 2. Under the same conditions as in Theorem 1 we have: if the roles of the players are not predetermined, then both players prefer to be the leader in a RSP. □

Proof: Straightforward, since the leader can coerce the follower to cooperate in a RSP. □

It is interesting to note that for a normal Stackelberg problem⁴ this theorem does not hold. Examples exist in which both players prefer to be the follower, for instance. See Basar (1973).

CONCLUSION

The asymmetrical roles of the leader and the follower in Stackelberg problems give rise to the possibility of using (SP) as a model for hierarchical control and organization. Although not made explicit or studied as such, the role of (SP) with reverse information structure is implicit in the work on organizations by Burkov and others (1977). This is apparently also the motivation behind the work of Basar (1980). However, to fully develop the notions of (SP) into a comprehensive theory of management organization, much more is needed. It is hoped that the above discussion points out the need for additional analysis.

REFERENCES
Basar, T. (1973). On the Relative Leadership Property of Stackelberg Strategies. Journal of Optimization Theory and Applications, Vol. 11, No. 6, pp. 655-661.
Basar, T. (1980). Equilibrium Strategies in Dynamic Games with Multi Levels of Hierarchy. Proceedings of the 2nd IFAC Symposium on Large Scale Systems, Toulouse, June 24-26.
Basar, T. and H. Selbuz (1979). Closed-Loop Stackelberg Strategies with Applications in the Optimal Control of Multilevel Systems. IEEE Transactions on Automatic Control, Vol. AC-24, No. 2, pp. 166-179.
Burkov, V.N., V.V. Kondrat'ev, U.A. Molchanova, and A.V. Schepkin (1977). Models and Mechanisms of Operation of Hierarchical Systems. Automation and Remote Control, No. 11, pp. 106-131.
Ho, Y.C., P.B. Luh and G.J. Olsder (1980). A Control-Theoretic View on Incentives. Proceedings of the 19th IEEE Conference on Decision and Control, Albuquerque, New Mexico, December 10-12; and the Proceedings of the 4th International Conference on Analysis and Optimization of Systems, December, INRIA, Versailles, France.
Papavassilopoulos, G.P. and J.B. Cruz, Jr. (1979). Nonclassical Control Problems and Stackelberg Games. IEEE Transactions on Automatic Control, Vol. AC-24, No. 2, pp. 155-166.
Tolwinski, B. (1980). Equilibrium Solutions for a Class of Hierarchical Games. International Report, Polish Academy of Sciences.
Tolwinski, B. Closed-Loop Stackelberg Solution to Multi-Stage Linear-Quadratic Game. To appear in the Journal of Optimization Theory and Applications.

4 That is, without reverse information structure; the DM who declares first also acts first.
Discussion to Paper 45.2

Y. C. Ho (USA): In DG, sometimes the "Barrier" suddenly terminates and is replaced by another singular surface such as the equivocal surface. Does your qualitative minimax principle automatically take care of this transition, or is a separate analysis (as in Isaacs) necessary?

S.Y. Zhang (China): I will consider this case in the future.
Discussion to Paper 45.3

P. Bernhard (France): Is there a way to avoid using "higher level strategies", i.e. the 6 's which are functions of the other players' strategies (not actions)? Because then, for N players, you will get (N-1) order super strategies.

G.J. Olsder (Netherlands): The definition of the Stackelberg solution for a three-level game, as given by formula (SP) in the paper, is valid for both the "classical" and the "reversed" Stackelberg equilibrium concept. Depending on the information structure, which has to be added in order to define a well stated problem, one can deal with either solution concept as well as others. We assume that the strategies are defined on the information sets. The second remark by Professor Basar answers the questions raised alone.

T. Basar (USA): Since strategies are defined on the information sets, and not on the strategy sets of the other players, there seems to be no difficulty in conceptual generalization of the Stackelberg solution to a larger number of players with more than two levels of hierarchy. I believe that, by the convexity of $J_l$, it is implied that $J_l$ is convex on the product action spaces and not on the product strategy spaces.