Dynamic feedback Stackelberg games with alternating leaders


Nonlinear Analysis: Real World Applications 9 (2008) 536 – 546 www.elsevier.com/locate/na

Pu-yan Nie a,b,∗,1, Ming-yong Lai a, Shu-jin Zhu a

a College of Economics and Trade, Hunan University, Changsha 410079, PR China
b Department of Mathematics and College of Industrial Economics, Jinan University, Guangzhou 510632, PR China

Received 7 October 2006; accepted 28 November 2006

Abstract

In all past research on dynamic Stackelberg games, the leader(s) and the followers are assumed to be fixed. In practice, the roles of the players in a game may change from time to time. A player in contract bridge, for example, acts as a leader at one stage but as a follower at the subsequent stage, which motivates Stackelberg games with unfixed leaders. We aim to analyze dynamic Stackelberg games with two players under such circumstances and call them dynamic Stackelberg games with alternating leaders. There are two goals in this paper. One is to establish models for a new type of game, dynamic Stackelberg games with alternating leaders and two players. The other is to extend dynamic programming algorithms to discrete time dynamic Stackelberg games with alternating leaders under the feedback information structure.
© 2007 Elsevier Ltd. All rights reserved.

MSC: 90C39; 91A06; 91A20; 91A25; 91A50

Keywords: Game theory; Discrete time dynamic models; Stackelberg games; Dynamic programming; Alternating leaders

1. Introduction

Stackelberg (leader–follower) games have a wide variety of applications. In a Stackelberg game, one player acts as the leader and the rest as followers. The problem is to find an optimal strategy for the leader, assuming that the followers react rationally by optimizing their objective functions given the leader's actions. This is the static bilevel optimization model introduced by von Stackelberg [14]. There is extensive research on bilevel optimization [1,4,12]. However, studies on dynamic bilevel optimization are relatively scarce. When players interact by playing a similar stage game numerous times, the game is called a dynamic, or repeated, game. Unlike static games, the players have at least some information about the strategies chosen by the others and may thus condition their play on past moves. Dynamic bilevel optimization was first considered by Chen and Cruz [3] and Simaan and Cruz [13], and subsequently studied by a number of authors [2,8,10,9,6,5].

∗ Corresponding author.

E-mail address: [email protected] (P.-y. Nie).
1 This work was partially completed while the author visited Kyoto University. Thanks to Prof. Masao Fukushima of Kyoto University, Japan, for his sincere help. The project was partially supported by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, by the China Postdoctoral Science Foundation (No. 20060400875), and partially by the National Natural Science Foundation of China (No. 10501019). It was also partly supported by the 2nd Phase Construction Item of Hunan University and the Base for Economic Globalization and Development of Trade Philosophical Social Sciences.

1468-1218/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.nonrwa.2006.11.019


Discrete time dynamic optimization problems have many applications in economics and the management sciences; see the excellent monograph [2] on dynamic games. In traditional dynamic Stackelberg games, the positions of the leader(s) and followers are assumed fixed throughout the game. In many practical games, however, the leader may change at each stage. In this paper, a new type of dynamic Stackelberg game is put forward, in which the leaders are not fixed. Moreover, the dynamic programming approach is extended to the new problems with feedback information. The following examples motivate the new model.

Example 1 (Tolls on a transportation network). Consider the revenues raised from tolls set on a transportation network. Assume there are two routes, A and B, between two places, and A is a highway. When traffic on route B is seriously jammed, the great majority of drivers are willing to use the corresponding arcs of A to save time, provided the tolls are not too high. At this stage, if the tolls are set too high, traffic is affected negatively; on the other hand, low tolls yield low revenues [7]. Thus, in this situation, the toll authority strikes the right balance by maximizing total revenue, subject to the reactions of the network users. This induces a two-level problem in which the toll station plays the leading role at this stage. When route B is not crowded, the drivers can spend a little more time and avoid the highway A. The drivers, in this situation, strive to balance the time saved against the tolls of the corresponding arcs. This also yields a two-level problem, but now the drivers play the leading role in decision making at this stage.

Example 2 (Stock corporation). A stock corporation is also referred to as a "general corporation" or "open corporation".
A general corporation is allowed a broad spectrum of flexibility, thanks to the general corporation laws of Delaware and the legal cases that have set a consistent, 200-year pattern of respecting good-faith management decisions. The stockholders are the owners of the company. Typically, holders of common stock have the right to one vote for each share they own, to elect the members of the Board of Directors and to vote on certain other matters of major significance to the company. (According to the Delaware General Corporation Law, every corporation must have one class of common stock. The "rules" about common stock are prescribed by law: each share of stock carries one vote, and common shareholders are entitled to their pro rata share of dividends.) Any stockholder who holds a majority of the issued shares can control the company and is sometimes referred to as a "majority shareholder". Majority shareholders take on a heightened responsibility to minority shareholders. In effect, the shareholder with a majority of the shares plays the leading role in a stock corporation, so the interaction is a Stackelberg game at each stage. In the long run, however, the identity of the majority shareholder changes with the distribution of shares. The situation is therefore inconsistent with traditional dynamic Stackelberg games: in this dynamic game, the leader is determined by the current state and is not fixed.

Example 3 (Contract bridge). Bridge is played with a standard 52-card deck, or pack. Two decks are customarily used for convenience, although one suffices. The deck is divided into four suits which, like military personnel, have specific rank and insignia: spades ♠ (highest), hearts ♥ (second-highest), diamonds ♦ (third-highest), and clubs ♣ (lowest). Each suit contains 13 cards: ace (highest), king, queen, jack, ten, nine, eight, seven, six, five, four, three, and two or "deuce" (lowest).
These cards are often abbreviated (in order): A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2. The five most powerful cards in each suit (ace through ten) are accorded the privileged title of honor cards; the lower cards (nine through deuce) are referred to as spot cards. The rank of the cards within a suit applies to the phase of bridge called the play. To cram a great deal of information into a small amount of space, all bridge writers use diagrams and refer within them to cards by symbols. Thus, in a bridge diagram the ace of spades is denoted ♠ A, the seven of diamonds ♦ 7, the jack of clubs ♣ J, and so on. If one player held the ace, king, ten, and seven of spades, this would be expressed concisely as ♠ A K 10 7. Bridge is a game for four players. Unlike some activities in which everyone is out for himself or herself, bridge is a partnership game. Two of the contestants sit opposite each other and are partners; the other two participants, who also sit facing each other, are likewise partners. Thus, each player has an opponent on either side and a partner across the table. In bridge literature, the players are often referred to by compass directions, so North and South are partners and play against East and West, who are also partners. In every game, each player holds 13 cards (called a hand). Each deal is divided into two major phases: bidding and play. During the bidding, which takes place first, the number of tricks that each side must win in order to capture the laurels of victory is determined. Then the play ensues, and each side tries to fulfill its commitment. However, even


though the bidding takes place first, it is essential to learn how tricks are won in order to understand what the bidding means. Whoever plays the first card to a trick is called the leader (and the first card played is called the lead). The leader may play any card. The other players, however, are much more restricted, for they must play a card of the same suit as the one led if they have one; in other words, they must follow suit if they can. For example, if the leader chooses to play a diamond, each of the other players must play a diamond if possible. If, however, a player cannot follow suit, that player may play whatever he or she likes. (A played card of a different suit than the one led is called a discard.) The trick is won by the highest card of the suit led. The player who has won the previous trick leads to the next one. When a trick has been won, one player from the victorious side gathers the four cards into a packet and keeps them, face down, nearby but out of the area of play. Only one player from each partnership should collect the tricks on any one deal. On many deals, one suit is designated as the trump suit. This is extremely important in determining who wins each trick, for a trump outranks any card of a different suit.

The above explanation of bridge is extracted from the article [11] on the world wide web (http://www.bridgeworld.com). In this paper we are not concerned with bidding, but only with play. Since it is extremely hard to treat the problem with four players, we consider a simplified game with two players under the above playing rules. In the playing process, the leader alternates constantly; a trick is regarded as a stage. The above game is a new type of dynamic Stackelberg game, which is defined and described subsequently.

Definition 1. A dynamic Stackelberg game is called a dynamic Stackelberg game with alternating leaders if, at each stage, the leader is determined by the information or the states of previous stages of the game.
In this game, at each stage one player acts as the leader and the other is the follower, the roles being determined by previous information. It is therefore a Stackelberg game at each stage, but not a traditional dynamic Stackelberg game on the whole.

2. The model

The discrete time dynamic Stackelberg game over a finite time horizon for two players, with feedback information and alternating leaders, is formally stated as follows. The discrete time periods are denoted t = 0, 1, ..., T. The variables involved in the problem are listed below:

Vectors x_t ∈ X ⊂ R^{m_1} denote the state of the first player at time t = 0, 1, ..., T.
Vectors y_t ∈ Y ⊂ R^{m_2} denote the state of the second player at time t = 0, 1, ..., T.
Vectors u_t ∈ Ω^1_t(x_t) ⊂ U ⊂ R^{n_1} denote the decision variables of the first player at time t = 0, 1, ..., T−1, where Ω^1_t(x_t) := {u_t | h^1_t(x_t, u_t) ≤ 0} and h^1_t : R^{m_1} × R^{n_1} → R^{n_1} is a given function.
Vectors v_t ∈ Ω^2_t(y_t) ⊂ V ⊂ R^{n_2} denote the decision variables of the second player at time t = 0, 1, ..., T−1, where Ω^2_t(y_t) := {v_t | h^2_t(y_t, v_t) ≤ 0} and h^2_t : R^{m_2} × R^{n_2} → R^{n_2} is a given function.

Moreover, we denote u := (u_0, u_1, ..., u_{T−1}), x := (x_0, x_1, ..., x_T), v := (v_0, v_1, ..., v_{T−1}), y := (y_0, y_1, ..., y_T), together with the truncations x_{t,T−1} := (x_t, ..., x_{T−1}), y_{t,T−1} := (y_t, ..., y_{T−1}), u_{t,T−1} := (u_t, ..., u_{T−1}), v_{t,T−1} := (v_t, ..., v_{T−1}), x_{0,t} := (x_0, ..., x_t), y_{0,t} := (y_0, ..., y_t), u_{0,t} := (u_0, ..., u_t), v_{0,t} := (v_0, ..., v_t).

The state variables {x_t}_{t=0}^T and {y_t}_{t=0}^T are governed by the systems of state transition equations

    x_{t+1} = F_t(x_t, u_t),  t = 0, 1, ..., T−1,   (1)
    y_{t+1} = f_t(y_t, v_t),  t = 0, 1, ..., T−1,   (2)

with x_0 and y_0 given as the initial states of the two players, respectively.

Remark. We always assume that each player's constraints and transition equations depend only on that player's own state, as in [10].
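To fix ideas, the ingredients just listed (the state transitions F_t, f_t and the constraint functions h^1_t, h^2_t defining the admissible sets) can be bundled per player. The following is a minimal sketch with hypothetical names and scalar states; none of it is prescribed by the paper:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class PlayerModel:
    """One player's data: transitions x_{t+1} = F_t(x_t, u_t) and
    feasibility tests h_t(x_t, u_t) <= 0, one per stage t = 0, ..., T-1."""
    transitions: Sequence[Callable[[float, float], float]]  # F_t (or f_t)
    constraints: Sequence[Callable[[float, float], float]]  # h_t

    def feasible(self, t: int, state: float, decision: float) -> bool:
        # decision lies in the stage-t admissible set iff h_t(state, decision) <= 0
        return self.constraints[t](state, decision) <= 0.0

    def step(self, t: int, state: float, decision: float) -> float:
        return self.transitions[t](state, decision)

# Two stages (T = 2), scalar state, transition x_{t+1} = x_t + u_t,
# and the box constraint |u_t| <= 5 encoded as h(x, u) = |u| - 5.
first = PlayerModel(
    transitions=[lambda x, u: x + u] * 2,
    constraints=[lambda x, u: abs(u) - 5.0] * 2,
)
assert first.step(0, 1.0, -2.0) == -1.0      # next state
assert first.feasible(0, 1.0, -2.0)          # |u| <= 5 holds
```

The second player is modeled identically, with f_t and h^2_t in place of F_t and h^1_t.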


Let the set of admissible decisions of the first player be given by

    Ω^1(x_0) := {u | h^1_t(x_t, u_t) ≤ 0, t = 0, 1, ..., T−1},

and the set of admissible decisions of the second player be given by

    Ω^2(y_0) := {v | h^2_t(y_t, v_t) ≤ 0, t = 0, 1, ..., T−1}.

We also denote

    Ω^1_{t,T−1}(x_t) := {u_{t,T−1} | h^1_τ(x_τ, u_τ) ≤ 0, τ = t, t+1, ..., T−1},
    Ω^2_{t,T−1}(y_t) := {v_{t,T−1} | h^2_τ(y_τ, v_τ) ≤ 0, τ = t, t+1, ..., T−1}.

Let G_t and g_t be the cost functions of the first and second players, respectively, at stage t. For t = 0, 1, ..., T−1, the total cost functions after stage t, J^1_t and J^2_t, are

    J^1_t(x_t, y_t, u_{t,T−1}, v_{t,T−1}) := Σ_{τ=t}^{T−1} G_τ(x_τ, y_τ, u_τ, v_τ) + G_T(x_T, y_T),
    J^2_t(x_t, y_t, u_{t,T−1}, v_{t,T−1}) := Σ_{τ=t}^{T−1} g_τ(x_τ, y_τ, u_τ, v_τ) + g_T(x_T, y_T).

We introduce a logical function C_t at stage t, for t = 0, 1, ..., T−1, which determines the leader at the next stage according to the values of G_t and g_t. Specifically, if C_t(G_t, g_t) = 0, the first player will be the leader at the next stage; if C_t(G_t, g_t) = 1, the second player will be the leader at the next stage. We always assume C_{−1} = 0.

The problem is formally stated as follows. For t = −1, 0, 1, ..., T−3, the subproblem, referred to as P̄_{t+1}(C_t, x_{t+1}, y_{t+1}), is defined recursively:

• If C_t(G_t, g_t) = 0, then

    min_{u_{t+1}} J^1_{t+1}(x_{t+1}, y_{t+1}, u_{t+1,T−1}, v_{t+1,T−1})
    s.t. v_{t+1} ∈ arg min {J^2_{t+1}(x_{t+1}, y_{t+1}, u_{t+1,T−1}, v_{t+1,T−1}) | v_{t+1} ∈ Ω^2_{t+1}(y_{t+1})},
         (u_{t+2,T−1}, v_{t+2,T−1}) solves P̄_{t+2}(C_{t+1}, x_{t+2}, y_{t+2}).   (3)

• If C_t(G_t, g_t) = 1, then

    min_{v_{t+1}} J^2_{t+1}(x_{t+1}, y_{t+1}, u_{t+1,T−1}, v_{t+1,T−1})
    s.t. u_{t+1} ∈ arg min {J^1_{t+1}(x_{t+1}, y_{t+1}, u_{t+1,T−1}, v_{t+1,T−1}) | u_{t+1} ∈ Ω^1_{t+1}(x_{t+1})},
         (u_{t+2,T−1}, v_{t+2,T−1}) solves P̄_{t+2}(C_{t+1}, x_{t+2}, y_{t+2}),   (4)

where "s.t." means "subject to". When t = T−2, we define:

• If C_t(G_t, g_t) = 0, then

    min_{u_{t+1}} J^1_{t+1}(x_{t+1}, y_{t+1}, u_{t+1}, v_{t+1})
    s.t. v_{t+1} ∈ arg min {J^2_{t+1}(x_{t+1}, y_{t+1}, u_{t+1}, v_{t+1}) | v_{t+1} ∈ Ω^2_{t+1}(y_{t+1})}.   (5)

• If C_t(G_t, g_t) = 1, then

    min_{v_{t+1}} J^2_{t+1}(x_{t+1}, y_{t+1}, u_{t+1}, v_{t+1})
    s.t. u_{t+1} ∈ arg min {J^1_{t+1}(x_{t+1}, y_{t+1}, u_{t+1}, v_{t+1}) | u_{t+1} ∈ Ω^1_{t+1}(x_{t+1})}.   (6)
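To make the recursive definition concrete, here is a brute-force sketch of the subproblems: the leader commits to a decision, the follower best-responds against its own total cost, the continuation is solved recursively, and C_t selects the next-stage leader. Everything here is an illustrative assumption (scalar states, simple quadratic costs, and small finite candidate sets standing in for Ω^1_t and Ω^2_t):

```python
T = 2
DU = DV = (-1.0, 0.0, 1.0)                  # finite stand-ins for the admissible sets

F = lambda x, u: x + u                      # x_{t+1} = F_t(x_t, u_t)
f = lambda y, v: y + v                      # y_{t+1} = f_t(y_t, v_t)
G = lambda x, y, u, v: (x + u) ** 2         # first player's stage cost G_t
g = lambda x, y, u, v: (y + v) ** 2         # second player's stage cost g_t
G_T = lambda x, y: x * x                    # terminal costs
g_T = lambda x, y: y * y
C = lambda Gt, gt: 0 if Gt <= gt else 1     # leader rule C_t(G_t, g_t)

def solve(t, x, y, leader):
    """Return (J1, J2), both players' optimal remaining costs, when
    `leader` (0 = first player, 1 = second) leads at stage t."""
    if t == T:
        return G_T(x, y), g_T(x, y)
    best = None
    for a in (DU if leader == 0 else DV):         # leader's commitment
        bestf = None
        for b in (DV if leader == 0 else DU):     # follower's candidate response
            u, v = (a, b) if leader == 0 else (b, a)
            Gt, gt = G(x, y, u, v), g(x, y, u, v)
            J1, J2 = solve(t + 1, F(x, u), f(y, v), C(Gt, gt))
            tot1, tot2 = Gt + J1, gt + J2
            own = tot2 if leader == 0 else tot1   # follower minimizes its own cost
            if bestf is None or own < bestf[0]:
                bestf = (own, tot1, tot2)
        lead = bestf[1] if leader == 0 else bestf[2]
        if best is None or lead < best[0]:
            best = (lead, bestf[1], bestf[2])
    return best[1], best[2]

print(solve(0, 1.0, 1.0, 0))   # with these decoupled costs both players reach 0: (0.0, 0.0)
```

The recursion mirrors the nesting of (3)–(6): each call is one Stackelberg stage whose continuation is itself a subproblem with a leader chosen by C_t.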


In fact, P̄_{t+2}(C_{t+1}, x_{t+2}, y_{t+2}) can be rewritten as P̄_{t+2}(C_{t+1}, F_{t+1}(x_{t+1}, u_{t+1}), f_{t+1}(y_{t+1}, v_{t+1})), so the subproblem P̄_{t+2}(C_{t+1}, x_{t+2}, y_{t+2}) is determined by C_{t+1}, x_{t+1}, u_{t+1}, y_{t+1} and v_{t+1}. The relation between G_t and g_t determines which player is the leader at the next stage. Since the current state and decision determine the subsequent state, the above problem is said to have a feedback information structure. If instead C_t or the states x_t, y_t for t = 1, 2, ..., T depend on the information (or states) of earlier stages (respectively, only on the initial stage), the game is a closed-loop (respectively, open-loop) dynamic Stackelberg game with alternating leaders. We refer to the problem as a dynamic bilevel optimization problem with alternating leaders, or DBOPAL for short. We point out that the definition of a DBOPAL consists of a sequence of minimization problems.

This paper is organized as follows: a dynamic programming algorithm is proposed for DBOPAL in the next section, and some remarks are presented in the final section.

3. Dynamic programming algorithm for the feedback DBOPAL problem

The aim of this section is to develop a dynamic programming algorithm for DBOPAL under the feedback information structure, based on the principle of optimality stated below in Proposition 1. For each t = 0, 1, ..., T−1, consider the subproblems (3) and (4), where the initial state (x_t, y_t) is given and {x_τ}_{τ=t}^T and {y_τ}_{τ=t}^T are determined by (1) and (2). In the initial stage t = 0, the first player is assumed to be the leader, and the corresponding problem is referred to as P^1_{T−0}(x_0, y_0). We assume that each subproblem always has a solution. At each stage, the game is a Stackelberg game between the two players. We also point out that it is difficult to characterize the overall optimal solution of a DBOPAL by general optimality conditions, as in nonlinear programming or multilevel problems.
Now the following result, which establishes the validity of the dynamic programming algorithm for DBOPAL under the feedback information structure, is presented and proved.

Proposition 1. Let (u_0^*, u_1^*, ..., u_{T−1}^*) and (v_0^*, v_1^*, ..., v_{T−1}^*) constitute an optimal policy for DBOPAL under the feedback information structure, with the corresponding optimal trajectories (x_0^*, x_1^*, ..., x_T^*), (y_0^*, y_1^*, ..., y_T^*) and C_{−1}^*, C_0^*, C_1^*, ..., C_{T−1}^*. Consider the subproblem P̄_t(C_{t−1}, x_t^*, y_t^*) for every t = 0, 1, ..., T−1 with the initial state (x_t^*, y_t^*) and C_{t−1} = C_{t−1}^*. Then the truncated policy

    {(u_t^*, u_{t+1}^*, ..., u_{T−1}^*), (v_t^*, v_{t+1}^*, ..., v_{T−1}^*)}

is optimal for the subproblem P̄_t(C_{t−1}^*, x_t^*, y_t^*).

Proof. The result follows directly from the definition of the game. □

The principle of optimality in Proposition 1 suggests that an optimal policy of DBOPAL under the feedback information structure can be constructed in a piecemeal manner. First, an optimal policy is found for the subproblems involving only the last stage. Then, utilizing these results, we obtain optimal policies for the last two stages. This procedure is repeated step by step backward until an optimal policy for the entire problem is constructed.

We denote by V^1_{T−t}, V̄^1_{T−t}, V̂^1_{T−t} and Ṽ^1_{T−t} optimal function values of the first player for t = 0, 1, ..., T; V^2_{T−t}, V̄^2_{T−t}, V̂^2_{T−t} and Ṽ^2_{T−t} are defined similarly. V^1_{T−t} and V^2_{T−t} are the optimal value functions of the two players when the first player acts as the leader at stage t, with V̂^1_{T−t} and V̂^2_{T−t} auxiliary forms of these functions; V̄^1_{T−t} and V̄^2_{T−t} are the optimal value functions when the second player acts as the leader at stage t, with Ṽ^1_{T−t} and Ṽ^2_{T−t} the corresponding auxiliary forms. When C_t = 0 and the first player acts as the leader at stage t+1, or C_t = 1 and the second player acts as the leader at stage t+1, the event is called rational; otherwise it is irrational. The dynamic programming algorithm is presented as follows.

Algorithm 1 (Dynamic programming algorithm for DBOPAL).
Step 1: Set t := T and, for each (x_T, y_T) ∈ X × Y, let

    V^1_0(x_T, y_T) := G_T(x_T, y_T),  V^2_0(x_T, y_T) := g_T(x_T, y_T),
    V̄^1_0(x_T, y_T) := G_T(x_T, y_T),  V̄^2_0(x_T, y_T) := g_T(x_T, y_T).

Moreover, V^1_0(x_T, y_T), V^2_0(x_T, y_T), V̄^1_0(x_T, y_T) and V̄^2_0(x_T, y_T) are all rational.
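For intuition, the algorithm can be visualized as a bottom-up tabulation over a discretized state space. The sketch below is illustrative only, not the paper's algorithm verbatim: scalar states snapped to a finite grid, finite decision sets, simple quadratic costs, and a single value table per leader (omitting the paper's cross-case comparison of irrational branches):

```python
import itertools

T = 2
GRID = [i * 0.5 for i in range(-8, 9)]       # shared finite grid for x and y
DU = DV = [i * 0.5 for i in range(-4, 5)]    # finite decision candidates

F = lambda x, u: x + u
f = lambda y, v: y + v
G = lambda x, y, u, v: (x + u) ** 2          # first player's stage cost
g = lambda x, y, u, v: (y + v) ** 2          # second player's stage cost
G_T = lambda x, y: x * x
g_T = lambda x, y: y * y
C = lambda Gt, gt: 0 if Gt <= gt else 1      # next-stage leader rule

def snap(s):                                 # crude projection onto GRID
    return min(GRID, key=lambda p: abs(p - s))

def step(x, y, leader, Vnext):
    """One Stackelberg stage: the leader commits, the follower best-responds,
    and the table Vnext[C_t] supplies both players' continuation values."""
    best = None
    for a in (DU if leader == 0 else DV):
        bestf = None
        for b in (DV if leader == 0 else DU):
            u, v = (a, b) if leader == 0 else (b, a)
            Gt, gt = G(x, y, u, v), g(x, y, u, v)
            c1, c2 = Vnext[C(Gt, gt)][(snap(F(x, u)), snap(f(y, v)))]
            tot1, tot2 = Gt + c1, gt + c2
            own = tot2 if leader == 0 else tot1
            if bestf is None or own < bestf[0]:
                bestf = (own, tot1, tot2)
        lead = bestf[1] if leader == 0 else bestf[2]
        if best is None or lead < best[0]:
            best = (lead, bestf[1], bestf[2])
    return best[1], best[2]

# Step 1: terminal tables, one per possible stage leader.
V = [{s: (G_T(*s), g_T(*s)) for s in itertools.product(GRID, GRID)}
     for _ in range(2)]

# Steps 2-3: backward pass from t = T-1 down to t = 0.
for t in reversed(range(T)):
    V = [{s: step(s[0], s[1], leader, V) for s in itertools.product(GRID, GRID)}
         for leader in (0, 1)]

print(V[0][(1.0, 1.0)])   # with these decoupled costs, (0.0, 0.0)
```

Each backward pass rebuilds both tables from the previous ones, exactly in the spirit of the piecemeal construction described above.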


Step 2: Set t := t − 1 and solve the following problems for each (x_t, y_t) ∈ X × Y:

1. With x_{t+1} = F_t(x_t, u_t) and y_{t+1} = f_t(y_t, v_t), consider

    min_{u_t ∈ Ω^1_t(x_t)} G_t(x_t, y_t, u_t, v_t(x_t, y_t, u_t)) + V^1_{T−t−1}(F_t(x_t, u_t), f_t(y_t, v_t(x_t, y_t, u_t))),   (7)

where v_t(x_t, y_t, u_t) is a solution of the lower level problem

    min_{v_t ∈ Ω^2_t(y_t)} g_t(x_t, y_t, u_t, v_t) + V^2_{T−t−1}(F_t(x_t, u_t), f_t(y_t, v_t)).   (8)

If C_t(G_t, g_t) = 1, then V^1_{T−t}(x_t, y_t) and V^2_{T−t}(x_t, y_t) are deemed irrational; go to 2. Otherwise, let V^1_{T−t}(x_t, y_t) and V̂^2_{T−t}(x_t, y_t, u_t) denote the optimal values of problems (7) and (8), respectively. Let u_t(x_t, y_t) be an optimal solution of (7) and set v_t(x_t, y_t) := v_t(x_t, y_t, u_t(x_t, y_t)). Define V^2_{T−t}(x_t, y_t) := V̂^2_{T−t}(x_t, y_t, u_t(x_t, y_t)).

2. With x_{t+1} = F_t(x_t, u_t) and y_{t+1} = f_t(y_t, v_t), consider

    min_{v_t ∈ Ω^2_t(y_t)} g_t(x_t, y_t, u_t(x_t, y_t, v_t), v_t) + V̄^2_{T−t−1}(F_t(x_t, u_t(x_t, y_t, v_t)), f_t(y_t, v_t)),   (9)

where u_t(x_t, y_t, v_t) is a solution of the lower level problem

    min_{u_t ∈ Ω^1_t(x_t)} G_t(x_t, y_t, u_t, v_t) + V̄^1_{T−t−1}(F_t(x_t, u_t), f_t(y_t, v_t)).   (10)

If C_t(G_t, g_t) = 0, then V̄^1_{T−t}(x_t, y_t) and V̄^2_{T−t}(x_t, y_t) are deemed irrational; go to 3. Otherwise, let V̄^2_{T−t}(x_t, y_t) and Ṽ^1_{T−t}(x_t, y_t, v_t) denote the optimal values of problems (9) and (10), respectively. Let v_t(x_t, y_t) be an optimal solution of (9) and u_t(x_t, y_t, v_t(x_t, y_t)) the corresponding solution of (10). Define V̄^1_{T−t}(x_t, y_t) := Ṽ^1_{T−t}(x_t, y_t, v_t(x_t, y_t)).

3. With x_{t+1} = F_t(x_t, u_t) and y_{t+1} = f_t(y_t, v_t), consider

    min_{v_t ∈ Ω^2_t(y_t)} g_t(x_t, y_t, u_t(x_t, y_t, v_t), v_t) + V^2_{T−t−1}(F_t(x_t, u_t(x_t, y_t, v_t)), f_t(y_t, v_t)),   (11)

where u_t(x_t, y_t, v_t) is a solution of the lower level problem

    min_{u_t ∈ Ω^1_t(x_t)} G_t(x_t, y_t, u_t, v_t) + V^1_{T−t−1}(F_t(x_t, u_t), f_t(y_t, v_t)).   (12)

If C_t(G_t, g_t) = 1, then the solution is irrational; keep V̄^1_{T−t}(x_t, y_t) and V̄^2_{T−t}(x_t, y_t) as in 2 and go to 4. Otherwise, denote the resulting values for the two players by V̄^1_{T−t}(x_t, y_t) and V̄^2_{T−t}(x_t, y_t), respectively. If V̄^1_{T−t}(x_t, y_t) and V̄^2_{T−t}(x_t, y_t) have already been defined in 2, compare the two candidates and redefine: when the value V̄^2_{T−t}(x_t, y_t) obtained here differs from that in 2, keep the pair whose component V̄^2_{T−t}(x_t, y_t) is smaller; otherwise both are retained.

4. With x_{t+1} = F_t(x_t, u_t) and y_{t+1} = f_t(y_t, v_t), consider

    min_{u_t ∈ Ω^1_t(x_t)} G_t(x_t, y_t, u_t, v_t(x_t, y_t, u_t)) + V̄^1_{T−t−1}(F_t(x_t, u_t), f_t(y_t, v_t(x_t, y_t, u_t))),   (13)

where v_t(x_t, y_t, u_t) is a solution of the lower level problem

    min_{v_t ∈ Ω^2_t(y_t)} g_t(x_t, y_t, u_t, v_t) + V̄^2_{T−t−1}(F_t(x_t, u_t), f_t(y_t, v_t)).   (14)

If C_t(G_t, g_t) = 0, then the solution is irrational; keep V^1_{T−t}(x_t, y_t) and V^2_{T−t}(x_t, y_t) as in 1 and go to Step 3. Otherwise, denote the resulting values by V^1_{T−t}(x_t, y_t) and V^2_{T−t}(x_t, y_t), respectively. If V^1_{T−t}(x_t, y_t) and V^2_{T−t}(x_t, y_t) have already been defined in 1, compare the two candidates and redefine: when the value V^1_{T−t}(x_t, y_t) obtained here differs from that in 1, keep the pair whose component V^1_{T−t}(x_t, y_t) is smaller; otherwise both are retained and rational.

Step 3: If t = 0, a solution to DBOPAL has been obtained; stop. Otherwise, go to Step 2.

We point out that the two subproblems in each case of Step 2 are solved simultaneously. In the long run, the leaders change. Moreover, the optimal values V^1_{T−t}(x_t, y_t) and V^2_{T−t}(x_t, y_t) depend on V^1_{T−t−1} and V^2_{T−t−1}. The next theorem shows that Algorithm 1 yields an optimal solution to DBOPAL.

Theorem 1. For each t = 0, 1, ..., T−1 and (x_t, y_t) ∈ X × Y, let (u_t(x_t, y_t), v_t(x_t, y_t)) be an optimal solution to the subproblem (7)–(8), (9)–(10), (11)–(12) or (13)–(14). Then, for any t ≥ 1 and (C_{t−1}, x_t, y_t), an optimal solution to P̄_t(C_{t−1}, x_t, y_t) is given by

    {(u_t(x_t, y_t), u_{t+1}(x_{t+1}, y_{t+1}), ..., u_{T−1}(x_{T−1}, y_{T−1})),
     (v_t(x_t, y_t), v_{t+1}(x_{t+1}, y_{t+1}), ..., v_{T−1}(x_{T−1}, y_{T−1}))}

with

    x_{τ+1} = F_τ(x_τ, u_τ),  y_{τ+1} = f_τ(y_τ, v_τ),   (15)

where τ = t, t+1, ..., T−1. In particular,

    {(u_0(x_0, y_0), u_1(x_1, y_1), ..., u_{T−1}(x_{T−1}, y_{T−1})), (v_0(x_0, y_0), v_1(x_1, y_1), ..., v_{T−1}(x_{T−1}, y_{T−1}))}

is an optimal decision for DBOPAL.

Proof. We prove the result by induction on t. The result clearly holds for t = T−1. Assuming the conclusion holds for all t = T−1, ..., t̄+1, we show that it also holds for t = t̄. Four cases must be considered:

1. (x_t̄, y_t̄) is treated by (7)–(8);
2. (x_t̄, y_t̄) is treated by (9)–(10);
3. (x_t̄, y_t̄) is treated by (11)–(12);
4. (x_t̄, y_t̄) is treated by (13)–(14).

We consider the first case. Since the result holds for t = t̄+1, for any (x_{t̄+1}, y_{t̄+1}) ∈ X × Y,

    {(u_{t̄+1}(x_{t̄+1}, y_{t̄+1}), u_{t̄+2}(x_{t̄+2}, y_{t̄+2}), ..., u_{T−1}(x_{T−1}, y_{T−1})),
     (v_{t̄+1}(x_{t̄+1}, y_{t̄+1}), v_{t̄+2}(x_{t̄+2}, y_{t̄+2}), ..., v_{T−1}(x_{T−1}, y_{T−1}))}

is an optimal solution to the subproblem P^1_{T−t̄−1}(x_{t̄+1}, y_{t̄+1}), where {x_τ}_{τ=t̄+2}^T and {y_τ}_{τ=t̄+2}^T are determined by (15). On one hand, consider the second player when the decision of the first player at stage t̄+1 is u_{t̄+1}. For any decision v̄_{t̄+1} of the second player with corresponding policy v̄_{t̄+1,T−1} ∈ Ω^2_{t̄+1,T−1}(y_{t̄+1}) and ū_{t̄+1,T−1} ∈ Ω^1_{t̄+1,T−1}(x_{t̄+1}), where (ū_{t̄+2,T−1}, v̄_{t̄+2,T−1}) is an optimal solution to P̄_{T−t̄−2}(C_{t̄+1}, x̄_{t̄+2}, ȳ_{t̄+2}), we have

    g_T(x̄_T, ȳ_T) + Σ_{τ=t̄+1}^{T−1} g_τ(x̄_τ, ȳ_τ, ū_τ, v̄_τ) ≥ g_T(x_T, y_T) + Σ_{τ=t̄+1}^{T−1} g_τ(x_τ, y_τ, u_τ, v_τ)
        = V̂^2_{T−t̄−1}(x_{t̄+1}, y_{t̄+1}, u_{t̄+1}) = V^2_{T−t̄−1}(x_{t̄+1}, y_{t̄+1})   (16)


with (x̄_{t̄+1}, ȳ_{t̄+1}, ū_{t̄+1}) = (x_{t̄+1}, y_{t̄+1}, u_{t̄+1}) and

    x̄_{τ+1} = F_τ(x̄_τ, ū_τ),  ȳ_{τ+1} = f_τ(ȳ_τ, v̄_τ),  τ = t̄+1, ..., T−1.   (17)

On the other hand,

    {(u_{t̄+1}(x_{t̄+1}, y_{t̄+1}), ..., u_{T−1}(x_{T−1}, y_{T−1})), (v_{t̄+1}(x_{t̄+1}, y_{t̄+1}), ..., v_{T−1}(x_{T−1}, y_{T−1}))}

is an optimal decision of the two players, where {x_τ}_{τ=t̄+2}^T and {y_τ}_{τ=t̄+2}^T are determined by (15). Namely, consider any decision û_{t̄+1} of the first player and the response v̂_{t̄+1} of the second player at stage t̄+1, with the corresponding (û_{t̄+2,T−1}, v̂_{t̄+2,T−1}), where û_{t̄+1,T−1} ∈ Ω^1_{t̄+1,T−1}(x̂_{t̄+1}), v̂_{t̄+1,T−1} ∈ Ω^2_{t̄+1,T−1}(y_{t̄+1}), and (û_{t̄+2,T−1}, v̂_{t̄+2,T−1}) is an optimal solution to P̄_{T−t̄−2}(C_{t̄+1}, x̂_{t̄+2}, ŷ_{t̄+2}). We then have

    G_T(x̂_T, ŷ_T) + Σ_{τ=t̄+1}^{T−1} G_τ(x̂_τ, ŷ_τ, û_τ, v̂_τ) ≥ G_T(x_T, y_T) + Σ_{τ=t̄+1}^{T−1} G_τ(x_τ, y_τ, u_τ, v_τ)
        = V^1_{T−t̄−1}(x_{t̄+1}, y_{t̄+1}),   (18)

where (x_{t̄+1}, y_{t̄+1}) = (x̂_{t̄+1}, ŷ_{t̄+1}) and

    x̂_{τ+1} = F_τ(x̂_τ, û_τ),  ŷ_{τ+1} = f_τ(ŷ_τ, v̂_τ),  τ = t̄+1, ..., T−1.   (19)

Consider t = t̄ and let (u_t̄(x_t̄, y_t̄), v_t̄(x_t̄, y_t̄)) be an optimal solution to the subproblem P^1_{T−t̄}(x_t̄, y_t̄). First, we show that v_t̄(x_t̄, y_t̄) is an optimal decision of the second player corresponding to the first player's decision u_t̄(x_t̄, y_t̄). For the purpose of contradiction, suppose that there exist (v̄_t̄(x_t̄, y_t̄), v̄_{t̄+1}(x̄_{t̄+1}, ȳ_{t̄+1}), ..., v̄_{T−1}(x̄_{T−1}, ȳ_{T−1})) and (ū_t̄(x_t̄, y_t̄), ū_{t̄+1}(x̄_{t̄+1}, ȳ_{t̄+1}), ..., ū_{T−1}(x̄_{T−1}, ȳ_{T−1})), along with the corresponding sequences {x̄_τ}_{τ=t̄+1}^T and {ȳ_τ}_{τ=t̄+1}^T, such that (ū_{t̄+1,T−1}, v̄_{t̄+1,T−1}) is an optimal solution to P̄_{T−t̄−1}(C_t̄, x̄_{t̄+1}, ȳ_{t̄+1}) and

    g_T(x̄_T, ȳ_T) + Σ_{τ=t̄}^{T−1} g_τ(x̄_τ, ȳ_τ, ū_τ, v̄_τ) < g_T(x_T, y_T) + Σ_{τ=t̄}^{T−1} g_τ(x_τ, y_τ, u_τ, v_τ)   (20)

for (x̄_t̄, ȳ_t̄, ū_t̄) = (x_t̄, y_t̄, u_t̄), where {x̄_τ}_{τ=t̄+1}^T and {ȳ_τ}_{τ=t̄+1}^T are determined by (17). Moreover, from Algorithm 1 and (16) we obtain

    g_T(x̄_T, ȳ_T) + Σ_{τ=t̄}^{T−1} g_τ(x̄_τ, ȳ_τ, ū_τ, v̄_τ)
        = V̂^2_{T−t̄−1}(x̄_{t̄+1}, ȳ_{t̄+1}, ū_{t̄+1}) + g_t̄(x_t̄, y_t̄, u_t̄, v̄_t̄)
        = V^2_{T−t̄−1}(x̄_{t̄+1}, ȳ_{t̄+1}) + g_t̄(x_t̄, y_t̄, u_t̄, v̄_t̄)
        ≥ V̂^2_{T−t̄}(x_t̄, y_t̄, u_t̄) = V^2_{T−t̄}(x_t̄, y_t̄)
        = g_T(x_T, y_T) + Σ_{τ=t̄}^{T−1} g_τ(x_τ, y_τ, u_τ, v_τ)

with (x̄_t̄, ȳ_t̄, ū_t̄) = (x_t̄, y_t̄, u_t̄), where the first two equalities use the fact that (ū_{t̄+1,T−1}, v̄_{t̄+1,T−1}) is an optimal solution to P̄_{T−t̄−1}(C_t̄, x̄_{t̄+1}, ȳ_{t̄+1}), and the inequality comes from Step 2 of Algorithm 1. This contradicts (20). Therefore, v_t̄(x_t̄, y_t̄) is the optimal response of the second player to the first player's decision u_t̄(x_t̄, y_t̄) at stage t̄.

We now show that

    {(u_t̄(x_t̄, y_t̄), u_{t̄+1}(x_{t̄+1}, y_{t̄+1}), ..., u_{T−1}(x_{T−1}, y_{T−1})), (v_t̄(x_t̄, y_t̄), v_{t̄+1}(x_{t̄+1}, y_{t̄+1}), ..., v_{T−1}(x_{T−1}, y_{T−1}))}

is an optimal solution to the subproblem P^1_{T−t̄}(x_t̄, y_t̄). If this were false, there would exist decisions {(ū_t̄, ū_{t̄+1}, ..., ū_{T−1}), (v̄_t̄, v̄_{t̄+1}, ..., v̄_{T−1})} of the two players such that (ū_{t̄+1,T−1}, v̄_{t̄+1,T−1}) is an optimal solution


to P̄_{T−t̄−1}(C_t̄, x̄_{t̄+1}, ȳ_{t̄+1}) and

    G_T(x̄_T, ȳ_T) + Σ_{τ=t̄}^{T−1} G_τ(x̄_τ, ȳ_τ, ū_τ, v̄_τ) < G_T(x_T, y_T) + Σ_{τ=t̄}^{T−1} G_τ(x_τ, y_τ, u_τ, v_τ),   (21)

along with the corresponding sequences {x̄_τ}_{τ=t̄+1}^T and {ȳ_τ}_{τ=t̄+1}^T, where x_t̄ = x̄_t̄ and y_t̄ = ȳ_t̄. However, from Algorithm 1 and (18) we obtain

    G_T(x̄_T, ȳ_T) + Σ_{τ=t̄}^{T−1} G_τ(x̄_τ, ȳ_τ, ū_τ, v̄_τ) = V^1_{T−t̄−1}(x̄_{t̄+1}, ȳ_{t̄+1}) + G_t̄(x_t̄, y_t̄, ū_t̄, v̄_t̄)
        ≥ V^1_{T−t̄}(x_t̄, y_t̄) = G_T(x_T, y_T) + Σ_{τ=t̄}^{T−1} G_τ(x_τ, y_τ, u_τ, v_τ)

with x_t̄ = x̄_t̄ and y_t̄ = ȳ_t̄, where the equality uses the optimality of (ū_{t̄+1,T−1}, v̄_{t̄+1,T−1}) for P̄_{T−t̄−1}(C_t̄, x̄_{t̄+1}, ȳ_{t̄+1}) and the inequality follows from Step 2 of Algorithm 1. This contradicts (21). Therefore,

    {(u_t̄(x_t̄, y_t̄), u_{t̄+1}(x_{t̄+1}, y_{t̄+1}), ..., u_{T−1}(x_{T−1}, y_{T−1})), (v_t̄(x_t̄, y_t̄), v_{t̄+1}(x_{t̄+1}, y_{t̄+1}), ..., v_{T−1}(x_{T−1}, y_{T−1}))}

is optimal and the result holds for t = t̄. Consequently, the result holds for all t = 0, 1, ..., T−1 in the first case. The other three cases follow similarly, and the detailed proofs are omitted. The proof is complete. □

The following example illustrates the algorithm.

Example 4. Consider the following two-player problem with two stages, where at each stage one player is the leader and the other the follower. The decision variables are u = (u_1, u_2) for the first player and v = (v_1, v_2) for the second player. The state transition equations are

    x_{t+1} = x_t + u_t,  t = 1, 2,
    y_{t+1} = y_t + 2 v_t,  t = 1, 2.

The cost functions are g_3(x_3, y_3) = 4 x_3, G_3(x_3, y_3) = y_3, and

    g_t(x_t, y_t, u_t, v_t) = (3 − t) u_t^2 − 2 u_t x_t,
    G_t(x_t, y_t, u_t, v_t) = (9 − 4t) v_t^2 − 2 v_t y_t,

for t = 1, 2. The initial states are x_1 = 1, y_1 = 1, and the admissible decisions are unrestricted, i.e., Ω^1_t(x_t) = R and Ω^2_t(y_t) = R for t = 1, 2. At the first stage, the first player acts as the leader, i.e., C_0 = 0; moreover, C_t = 1 if x_t ≤ y_t for t = 1.

Let us apply Algorithm 1 to this example. Let

    V^1_0(x_3, y_3) = 4 x_3,  V^2_0(x_3, y_3) = y_3.
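The backward recursion for this example can also be checked numerically. In the sketch below, `argmin_1d` is a hypothetical brute-force grid search standing in for the analytic minimizations (the admissible sets are all of R in the example; the search is truncated to [−10, 10]), and the one-stage-to-go value functions are used in the closed forms the example derives:

```python
# Numerical sanity check of the two-stage example by backward induction.
def argmin_1d(f, lo=-10.0, hi=10.0, n=200001):
    """Brute-force grid search: return (approximate minimizer, minimum)."""
    step = (hi - lo) / (n - 1)
    best_x, best_f = lo, f(lo)
    for i in range(1, n):
        x = lo + i * step
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

x1 = y1 = 1.0

# One-stage-to-go value functions, in the closed forms from the example:
V1x = lambda x2: -(x2 - 2) ** 2 + 4 * x2   # x-based value
V1y = lambda y2: -(y2 - 1) ** 2 + y2       # y-based value

# Stage 1 (first player leads, C_0 = 0); g_1 = 2u^2 - 2ux, G_1 = 5v^2 - 2vy:
u1, J1 = argmin_1d(lambda u: V1x(x1 + u) + 2 * u * u - 2 * u * x1)
v1, J2 = argmin_1d(lambda v: V1y(y1 + 2 * v) + 5 * v * v - 2 * v * y1)
x2, y2 = x1 + u1, y1 + 2 * v1

# Stage 2 (second player leads, C_1 = 1); g_2 = u^2 - 2ux, G_2 = v^2 - 2vy:
u2, _ = argmin_1d(lambda u: 4 * (x2 + u) + u * u - 2 * u * x2)
v2, _ = argmin_1d(lambda v: (y2 + 2 * v) + v * v - 2 * v * y2)
x3, y3 = x2 + u2, y2 + 2 * v2

print(round(u1, 3), round(v1, 3), round(u2, 3), round(v2, 3))  # -2.0 0.0 -3.0 0.0
print(round(J1, 3), round(J2, 3))                              # -1.0 1.0
```

The grid search reproduces the decisions u_1 = −2, v_1 = 0, u_2 = −3, v_2 = 0 and the two reported optimal values −1 and 1.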


At the first step, the following problem, which corresponds to (7) with $t = 2$, is considered, where the second player acts as the leader at this stage:
$$\min_{v_2}\ \{y_2 + 2v_2 + G_2\} \quad \text{s.t. } u_2 \in \arg\min_{u_2}\{4(x_2 + u_2) + g_2\}.$$
The solution to this problem is computed as $u_2 = x_2 - 2$ and $v_2 = y_2 - 1$. Moreover, we have
$$V_1^2(x_2, y_2) = -(y_2 - 1)^2 + y_2, \qquad V_1^1(x_2, y_2) = -(x_2 - 2)^2 + 4x_2.$$
If instead the first player acts as the leader at $t = 2$, the same decisions $u_2 = x_2 - 2$ and $v_2 = y_2 - 1$ are obtained, since each player's objective here involves only his own decision variable. The final step is to consider the following problem, which corresponds to (7) with $t = 1$, where the first player acts as the leader at this stage:
$$\min_{u_1}\ \{-(x_1 + u_1 - 2)^2 + 4x_1 + 4u_1 + g_1\}$$

s.t. $v_1 \in \arg\min_{v_1}\{-(y_1 + 2v_1 - 1)^2 + y_1 + 2v_1 + G_1\}$.
The solution to this problem is computed as $u_1 = 2x_1 - 4$ and $v_1 = 3y_1 - 3$. Furthermore, we have
$$V_2^1(x_1, y_1) = -x_1^2 - (2x_1 - 4)^2 + 8x_1 - 4, \qquad V_2^2(x_1, y_1) = 3y_1 - y_1^2 - (3y_1 - 3)^2 - 1.$$
If instead the second player acts as the leader at $t = 1$, we again obtain $u_1 = 2x_1 - 4$ and $v_1 = 3y_1 - 3$. Therefore, the optimal decisions and states are $u_1 = -2$, $v_1 = 0$, $x_2 = -1$, $y_2 = 1$, $u_2 = -3$, $v_2 = 0$, and $x_3 = -4$, $y_3 = 1$. It is thus irrational for the first player to act as the leader at the second stage. The optimal values for the two players are $-1$ and $1$, respectively. In the dynamic programming algorithm, the problem is decomposed into a sequence of minimization problems, each involving only the decision variables of a single stage, which are easier to solve than the original problem.

4. Concluding remarks

A discrete time dynamic Stackelberg game with alternating leaders is proposed in this paper, and a dynamic programming algorithm is applied to it. DBOPAL has broad applications in society. In the extreme cases where $C_t(G_t, g_t) = 0$ is always true (or always false) for all $t$, the model reduces to a discrete time dynamic Stackelberg game with a fixed leader and follower, and the problem takes a simpler form than the general one studied in this paper. Stackelberg games with alternating leaders have further applications in economics: in some agencies, the agents sometimes occupy the leading position, while at other times the corporation acts as the leader. Similar situations appear to be common in the behavioral sciences. In the long run, election policy, which is very popular in our society, may also be modeled as a DBOPAL. As this is only a beginning for dynamic Stackelberg games with unfixed leaders, further investigation is of considerable interest.

References

[1] J.F. Bard, Optimality conditions for the bilevel programming problem, Nav. Res. Logistics Q. 31 (1984) 13–26.
[2] T. Başar, G.J. Olsder, Dynamic Noncooperative Game Theory, second ed., Academic Press, New York, NY, 1995.
[3] C.I. Chen, J.B. Cruz Jr., Stackelberg solution for two person games with biased information patterns, IEEE Trans. Autom. Control 6 (1972) 791–798.
[4] S. Dempe, Foundations of Bilevel Programming, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[5] G. Feichtinger, W. Grienauer, G. Tragler, Optimal dynamic law enforcement, Eur. J. Oper. Res. 141 (2002) 58–69.
[6] T. Fent, G. Feichtinger, G. Tragler, A dynamic game of offending and law enforcement, Int. Game Theory Rev. 4 (2002) 71–89.


[7] M. Labbé, P. Marcotte, G. Savard, A bilevel model of taxation and its application to optimal highway pricing, Manage. Sci. 44 (1999) 1608–1622.
[8] M. Li, J.B. Cruz Jr., M.A. Simaan, An approach to discrete-time incentive feedback Stackelberg games, IEEE Trans. Syst. Man Cybern. A 32 (2002) 472–481.
[9] P.Y. Nie, Dynamic Stackelberg games under open-loop complete information, J. Franklin Inst. Eng. Appl. Math. 342 (2005) 737–748.
[10] P.Y. Nie, L.H. Chen, M. Fukushima, Dynamic programming approach to discrete time dynamic feedback Stackelberg games with independent and dependent followers, Eur. J. Oper. Res. 169 (2006) 310–328.
[11] A. Roth, J. Rubens, Introduction to bridge: the mechanics of bridge, The Bridge World, http://www.bridgeworld.com.
[12] K. Shimizu, Y. Ishizuka, J.F. Bard, Nondifferentiable and Two Level Mathematical Programming, Kluwer Academic Publishers, Boston, MA, 1997.
[13] M.A. Simaan, J.B. Cruz Jr., A Stackelberg solution for games with many players, IEEE Trans. Autom. Control 18 (1973) 322–324.
[14] H. von Stackelberg, The Theory of the Market Economy, Oxford University Press, Oxford, UK, 1952.