Information and Dominant Player Solutions in Linear-Quadratic Dynamic Games


B. Tolwinski

Polish Academy of Sciences, Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland

Abstract. A dynamic Stackelberg game with linear dynamics, quadratic cost functions, and a dominant player having access to imperfect closed-loop information is considered. The closed-loop Stackelberg equilibrium solution is derived, and the relation between the amount of information available to the dominant player and his cost under the equilibrium solution is discussed. As a consequence, a basis for calculating the value of information in hierarchical dynamic games is provided.

Keywords. Differential games; game theory; decision theory; system theory; linear systems; Stackelberg solution.

INTRODUCTION

The concept of domination in a nonzero-sum dynamic game can be defined as the ability of one of the players to announce his decision rules ahead of time, or in other words, as the ability of a player to threaten his opponents. The equilibrium solution to a game of this type is called the closed-loop Stackelberg /CLS/ solution. For a considerable time there have been few results concerning CLS equilibria. Quite recently, however, the CLS solution has been obtained for a class of games with linear dynamics, quadratic cost functions, and perfect state information available to the dominant player, by Başar and Selbuz /1979/, Papavassilopoulos and Cruz /1979/, and Tolwinski /1979/. In the last of these references it has been pointed out that the performance of the dominant player, i.e. the value of his cost function under the CLS solution, depends on his ability to identify decisions made by other players. If, on the basis of the available information, the dominant player can exactly identify all such decisions, then he is in a position to attain the minimum of his cost function with respect to all system inputs, including those controlled by his opponents. Otherwise, namely when the dominant player does not have access to sufficiently detailed information, the other players have greater freedom of action, which usually results in a greater cost incurred by the dominant player at equilibrium.

In the present paper we consider the CLS solution for discrete-time deterministic two-person games with the dominant player /the leader/ having access to imperfect, or aggregated, information about the state vector. Thus, further insight into the relation between information structures and dominant player solutions of nonzero-sum dynamic games is provided. These results also have direct consequences for the theory of multilevel dynamic systems.

PROBLEM FORMULATION

Consider a deterministic linear dynamic system with two controllers /players/

$$x_{t+1} = A_t x_t + B_t u_t + C_t v_t, \qquad (1)$$

where $x_t$ /n-vector/ is the state variable, and $u_t$ /m-vector/ and $v_t$ /r-vector/ are the control variables handled by players 1 and 2, respectively. Each player is assumed to have a quadratic cost function, namely $J_i(0, x_0; \{u_t\}_0, \{v_t\}_0)$, where

$$J_i(s, x_s; \{u_t\}_s, \{v_t\}_s) = \tfrac12 x_T' S_{iT} x_T + \tfrac12 \sum_{t=s}^{T-1} \left( x_t' S_{it} x_t + u_t' Q_{it} u_t + v_t' R_{it} v_t \right), \quad 0 \le s \le T-1, \quad i = 1,2, \qquad (2)$$

with $T$ a given time horizon, $\{u_t\}_s = (u_s, u_{s+1}, \dots, u_{T-1})$, $\{v_t\}_s = (v_s, v_{s+1}, \dots, v_{T-1})$, and $S_{it}$, $Q_{it}$, $R_{it}$ being symmetric and satisfying $S_{it} \ge 0$, $Q_{it} \ge 0$, $i = 1,2$, $R_{1t} \ge 0$, and $R_{2t} > 0$.

Let $y_t$ /k-vector/ denote the information acquired by player 1 at stage $t$, where $y_t$ is a deterministic linear function of $x_t$ and/or $v_t$, to be specified later on. Therefore the information set of player 1 at this stage consists of vectors $y_0, y_1, \dots, y_t$, and a decision rule of player 1 at stage $t$ can be assumed to be of the form

$$u_t = \gamma_{1t}(y_0, y_1, \dots, y_t). \qquad (3)$$

We assume that player 1 is the dominant player /the leader/, which implies that he is in a position to announce his decision rules ahead of time. Given a sequence of decision rules, or strategy, $\{\gamma_{1t}\}$, player 2 /the follower/ faces an optimal control problem. Restricting the class of admissible strategies of the leader to include only those $\{\gamma_{1t}\}$ for which such problems admit unique solutions, one can define the mapping $M$ assigning the solution of the follower's problem to any admissible $\{\gamma_{1t}\}$. Now, $\{\gamma^*_{1t}\}$ is said to be a CLS strategy of the leader if

$$\{\gamma^*_{1t}\} = \arg\min_{\{\gamma_{1t}\}} J_1(0, x_0; \{\gamma_{1t}\}, M\{\gamma_{1t}\}). \qquad (4)$$

Furthermore, $\{\gamma^*_{2t}\} = M\{\gamma^*_{1t}\}$ is called the CLS strategy of the follower, and the pair $(\{\gamma^*_{1t}\}, \{\gamma^*_{2t}\})$ is called the CLS solution to the game (1), (2) with player 1 as the leader, and with information structure $\{y_t\}$.

In the sequel, a condition eliminating from our considerations problems which are not actual games is assumed to hold, namely

(C) For any $s = 0, 1, 2, \dots, T-1$ there exists $\delta u_s \ne 0$ such that

$$\Delta J_2(s, x_s; \{u_t\}_s, \{v_t\}_s; \delta u_s) \ne 0, \qquad (5)$$

where

$$\Delta J_2(s, x_s; \{u_t\}_s, \{v_t\}_s; \delta u_s) = J_2(s, x_s; u_s + \delta u_s, \{u_t\}_{s+1}, \{v_t\}_s) - J_2(s, x_s; \{u_t\}_s, \{v_t\}_s). \qquad (6)$$

This condition implies that the leader is able to influence the follower's cost function at any stage of the game. An essential element of the definition of the CLS solution is the information structure of the game. So far only games with perfect state information available to the leader have been considered /see the references mentioned in the previous section/. This paper extends the approach presented in /Tolwinski, 1979/ to problems with a more general information structure, determining at the same time the relation linking the information available to the dominant player and his ability to enforce his policies upon the other player.

THE CASE OF PERFECT DOMINATION

The best result the dominant player can dream of is the minimum of $J_1$ with respect to both system inputs $\{u_t\}$ and $\{v_t\}$. In the situation when he is actually in a position to realize such a result, and excluding the case of $J_1$ and $J_2$ being identical /the team problem/, one can say that the dominant player perfectly dominates his opponent. Now, we are going to show that the leader perfectly dominates the follower in the game defined in the previous section if, given information vectors $y_0, y_1, \dots, y_{T-1}$, he is able to exactly identify all decisions made by the latter in the course of the game. Clearly enough, this is possible under information structure

$$y_t = v_t, \quad t = 0, 1, \dots, T-1, \qquad (7)$$

or information structure

$$y_t = x_t, \quad t = 0, 1, \dots, T-1, \qquad (8)$$

with the additional assumptions that

$$\mathrm{rank}(C_t) = r, \quad t = 0, 1, \dots, T-2, \qquad C_{T-1} = 0. \qquad (9)$$

Condition (9) implies that $v_t$, $t = 0, 1, \dots, T-2$, can be uniquely determined from the state equation (1) when $u_t$, $x_t$ and $x_{t+1}$ are known, and that the follower does not act at the last stage of the game /otherwise, the leader could not identify $v_{T-1}$/. Let us introduce the following notation: $L_{1t}$ and $L_{2t}$ are matrices determining a feedback solution

$$u_t = -L_{1t} x_t, \quad v_t = -L_{2t} x_t, \quad t = 0, 1, \dots, T-1 \qquad (10)$$

to the linear-quadratic optimal control problem

$$\min_{\{u_t\}, \{v_t\}} J_1(0, x_0; \{u_t\}, \{v_t\})$$

subject to

$$x_{t+1} = A_t x_t + B_t u_t + C_t v_t.$$

Furthermore, [...] $t = 0, 1, \dots, T-1$. (14)

Furthermore, define a strategy for the leader in the form

$$\gamma_{1t} = -L_{1t} x_t + c_t h_t, \quad t = 0, 1, \dots, T-1, \qquad (15)$$

where $c_t$ is a scalar, and $h_t$ an m-vector. Now, we can state the following results.

Theorem 1

Let

$$g_s = (\partial / \partial u_s) J_2(s, x_s; \{u_t\}_s, \{v_t\}_s), \quad s = 0, 1, \dots, T-1, \qquad (16)$$

where $u_t = -L_{1t} x_t$, $t = s, s+1, \dots, T-1$, $v_s$ is arbitrary, and $v_t = -L_{2t} x_t$, $t = s+1, s+2, \dots, T-1$. Select $h_t$, $t = 0, 1, \dots, T-1$, satisfying

$$g_t' h_t > 0 \ \text{ if } g_t \ne 0, \quad \text{or} \quad h_t' G_t h_t \ne 0 \ \text{ otherwise}. \qquad (17)$$

Furthermore, select $c_t$, $t = 0, 1, \dots, T-1$, according to the formula

$$c_t = \begin{cases} \lambda_t'(v_t + L_{2t} x_t) / g_t' h_t, & \text{if } g_t \ne 0, \\ \sqrt{2 \lambda_t'(v_t + L_{2t} x_t)} \,/\, \sqrt{h_t' G_t h_t}, & \text{otherwise,} \end{cases} \qquad (18)$$

with

$$\lambda_t = \left[ (R_{2t} + C_t' P_{t+1} C_t) L_{2t} - C_t' P_{t+1} (A_t - B_t L_{1t}) \right] x_t, \quad t = 0, 1, \dots, T-1. \qquad (19)$$

Then, strategies $\{\gamma^*_{1t}\}$ and $\{\gamma^*_{2t}\}$, where

$$\gamma^*_{1t} = -L_{1t} x_t + c_t h_t, \quad t = 0, 1, \dots, T-1; \qquad \gamma^*_{2t} = -L_{2t} x_t, \quad t = 0, 1, \dots, T-1, \qquad (20)$$

constitute the CLS solution of the game (1) - (2), with player 1 as the leader, and with information structure (7).

Theorem 2

Assume that $\mathrm{rank}(C_t) = r$, $t = 0, 1, \dots, T-2$, and $C_{T-1} = 0$. Let

$$g_s = (\partial / \partial u_s) J_2(s, x_s; \{u_t\}_s, \{v_t\}_s), \quad s = 1, 2, \dots, T-1, \qquad (21)$$

where $u_t = -L_{1t} x_t$, $v_t = -L_{2t} x_t$, $t = s, s+1, \dots, T-1$. Select $h_t$, $t = 1, 2, \dots, T-1$, satisfying (17). Furthermore, select

$c_0 = 0$, and

$$c_t = \begin{cases} \xi_t'(x_t - \hat x_t) / g_t' h_t, & \text{if } g_t \ne 0, \\ \sqrt{2 \xi_t'(x_t - \hat x_t)} \,/\, \sqrt{h_t' G_t h_t}, & \text{otherwise,} \end{cases} \quad t = 1, 2, \dots, T-1, \qquad (22)$$

where $\xi_t$ is an n-vector satisfying the system of $r \le n$ linearly independent equations, namely

$$C_{t-1}' \xi_t = (R_{2t-1} + C_{t-1}' P_t C_{t-1}) L_{2t-1} x_{t-1} - C_{t-1}' P_t (A_{t-1} x_{t-1} + B_{t-1} u_{t-1}), \qquad (23)$$

[...] Then strategies $\{\gamma^*_{1t}\}$ and $\{\gamma^*_{2t}\}$, where [...], constitute the CLS solution of the game (1) - (2) with player 1 as the leader, and with information structure (8).

The proof of theorem 2 has been given in /Tolwinski, 1979/, and the proof of theorem 1 is almost identical, so it is omitted. Note, however, that because of condition (C), at the optimal trajectory one has

$$\Delta J_2(s, x_s; \{u_t\}_s, \{v_t\}_s; c_s h_s) = c_s (\partial J_2 / \partial u_s)' h_s + \tfrac12 c_s^2 h_s' (\partial^2 J_2 / \partial u_s^2) h_s = \tfrac12 c_s^2 h_s' G_s h_s \ne 0, \qquad (25)$$

so there always exists $h_t$ satisfying (17).

In the next section we shall need a slightly more general version of theorem 2, dealing with a game determined by state equation (1) and cost functions of the form

[...] (26)

where

$$\hat R_{it} = \begin{pmatrix} R^{11}_{it} & R^{21\prime}_{it} & R^{31\prime}_{it} \\ R^{21}_{it} & R^{22}_{it} & R^{32\prime}_{it} \\ R^{31}_{it} & R^{32}_{it} & R^{33}_{it} \end{pmatrix}. \qquad (27)$$

It is easy to verify that, assuming the existence of the minimum of $J_1$ with respect to $\{u_t\}$ and $\{v_t\}$, and the strict convexity of $J_2$ with respect to $\{v_t\}$, the following modification of theorem 2 holds.

Theorem 2/a/

Assume that $\mathrm{rank}(C_t) = r$, $t = 0, 1, \dots, T-2$, $C_{T-1} = 0$, and define strategies $\{\gamma^*_{1t}\}$ and $\{\gamma^*_{2t}\}$ as in theorem 2, except for relation (23), for which the equality

$$C_{t-1}' \xi_t = \left[ (R^{33}_{2t-1} + C_{t-1}' P_t C_{t-1}) L_{2t-1} - C_{t-1}' P_t A_{t-1} - R^{31}_{2t-1} \right] x_{t-1} - (C_{t-1}' P_t B_{t-1} + R^{32}_{2t-1}) u_{t-1}$$

should be substituted. Then strategies $\{\gamma^*_{1t}\}$ and $\{\gamma^*_{2t}\}$ constitute the CLS solution of the game (1), (26), with player 1 as the leader, and with information structure (8). 1/

The strategies defined for the leader in theorems 1 and 2 enable him to obtain the best of all possible results. Announcing these strategies can be interpreted as threatening the follower with penalties for any deviations from the policies chosen by the leader. If the follower believes the leader's threats, then his best choice is to follow the leader's wishes. So, in this context domination means the ability to formulate credible threats. The difference between information structures (7) and (8) is that under (7) the leader knows $v_t$ at stage $t$, while under (8) only at stage $t+1$. Other structures of the latter type will be considered in the next section.

1/ The theorem holds also in the case of $x_t$, $u_t$ and $v_t$ having different dimensions at different stages, provided the column vectors of $C_t$ are linearly independent for $t = 0, 1, \dots, T-2$.
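The threat interpretation can be illustrated on a deliberately small case. The sketch below assumes a one-stage scalar game ($T = 1$, $n = m = r = 1$) under information structure (7), with illustrative costs $\tfrac12(s_i x_1^2 + q_i u^2 + r_i v^2)$; all function names and numerical weights are hypothetical and chosen only for the example. The leader announces $u = u^* + k(v - v^*)$, where $(u^*, v^*)$ minimizes $J_1$ and the slope $k = \lambda / g$ is the scalar counterpart of $c_t h_t$ in the case $g_t \ne 0$.

```python
def next_state(a, b, c, x0, u, v):
    """Scalar one-stage version of the state equation (1)."""
    return a * x0 + b * u + c * v


def cost(s, q, r, x1, u, v):
    """Illustrative quadratic cost 0.5*(s*x1^2 + q*u^2 + r*v^2)."""
    return 0.5 * (s * x1 * x1 + q * u * u + r * v * v)


def team_optimum(a, b, c, x0, s1, q1, r1):
    """Minimise the leader's cost J1 over both inputs (u, v).

    The stationarity conditions form a symmetric 2x2 linear system,
    solved by Cramer's rule; q1 > 0 and r1 > 0 keep det positive.
    """
    m11, m12, m22 = s1 * b * b + q1, s1 * b * c, s1 * c * c + r1
    rhs_u, rhs_v = -s1 * b * a * x0, -s1 * c * a * x0
    det = m11 * m22 - m12 * m12
    u = (rhs_u * m22 - m12 * rhs_v) / det
    v = (m11 * rhs_v - m12 * rhs_u) / det
    return u, v


def threat_slope(a, b, c, x0, s2, q2, r2, u_star, v_star):
    """Slope k of the announced rule u = u* + k*(v - v*).

    g is dJ2/du and lam is -dJ2/dv, both at the team optimum; k = lam/g
    makes the follower's total derivative vanish at v*.
    """
    x1 = next_state(a, b, c, x0, u_star, v_star)
    g = s2 * b * x1 + q2 * u_star
    lam = -(s2 * c * x1 + r2 * v_star)
    if g == 0.0:
        raise ValueError("g = 0: the second-order case of the rule is needed")
    return lam / g


def follower_best_response(a, b, c, x0, s2, q2, r2, u_star, v_star, k):
    """Minimise J2 over v when the leader plays u = u* + k*(v - v*).

    Along the announced rule J2 is an exact quadratic in v, so a single
    Newton step with finite differences lands on its minimiser.
    """
    def phi(v):
        u = u_star + k * (v - v_star)
        return cost(s2, q2, r2, next_state(a, b, c, x0, u, v), u, v)

    v0, h = 0.0, 1.0
    d1 = (phi(v0 + h) - phi(v0 - h)) / (2.0 * h)
    d2 = (phi(v0 + h) - 2.0 * phi(v0) + phi(v0 - h)) / (h * h)
    return v0 - d1 / d2


if __name__ == "__main__":
    dyn = (1.0, 1.0, 1.0, 1.0)          # a, b, c, x0 (hypothetical values)
    u_star, v_star = team_optimum(*dyn, 1.0, 1.0, 1.0)
    k = threat_slope(*dyn, 2.0, 1.0, 3.0, u_star, v_star)
    with_threat = follower_best_response(*dyn, 2.0, 1.0, 3.0, u_star, v_star, k)
    no_threat = follower_best_response(*dyn, 2.0, 1.0, 3.0, u_star, v_star, 0.0)
    print("team optimum v*:", v_star)
    print("best response under the threat:", with_threat)
    print("best response without it:", no_threat)
```

With the threat in place the follower's best response coincides with $v^*$, so the leader attains the joint minimum of $J_1$; with $k = 0$ the follower deviates. The linear rule is one convenient choice here, not the only strategy with this property.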


THE CASE OF IMPERFECT DOMINATION

Let $\tau = \{t_0, t_1, \dots, t_K\}$ be a subset of $\{0, 1, \dots, T-1\}$ with $t_0 = 0$, $t_K = T-1$, and $1 \le K < T$, and consider the information structure

$$y_t = G_t x_t, \quad t = 0, 1, \dots, T-1, \qquad (29)$$

where $G_t$ is a $k_t \times n$ matrix of rank $k_t$, $k_t < n$ for $t \notin \tau$, and $G_t = I$ /the $n \times n$ unit matrix/ for $t \in \tau$. In other words, at some stages, including the initial and the final ones /and possibly some others/, the leader has access to perfect state information, while at other stages only information in an aggregated form is available. We assume also that $C_{T-1} = 0$, so there is no problem with identification of $v_{T-1}$.

Under such an information structure the leader may be able to enforce the realization of a desired sequence $y_1, y_2, \dots, y_{T-1}$, but he is in no position to influence the follower's choice of a specific $\{v_t\}$ leading to this sequence. Having this much freedom, the follower can be expected to choose, for any given $\{u_t\}$ and feasible $\{y_t\}$, the control sequence $\{v_t\}$ constituting the solution to the problem

(PF)
$$\min_{\{v_t\}} J_2(0, x_0; \{u_t\}, \{v_t\}) \qquad (30)$$

subject to

$$x_{t+1} = A_t x_t + B_t u_t + C_t v_t, \quad t = 0, 1, \dots, T-1, \qquad (31)$$

$$y_t = G_t x_t, \quad t = 1, 2, \dots, T-1, \qquad (32)$$

where trajectory $\{y_t\}$ is said to be feasible under $\{u_t\}$ if there exists $\{v_t\}$ satisfying (31) and (32). Let $\{v_t(\{u_t\}, \{y_t\})\}$ denote the solution to (PF).

Clearly, the best result the leader can now expect is

(PL)
$$\min_{\{u_t\}, \{y_t\}} J_1(0, x_0; \{u_t\}, \{v_t\})$$

subject to (31), (32) and

$$\{v_t\} = \{v_t(\{u_t\}, \{y_t\})\}, \qquad (33)$$

with $\{y_t\}$ restricted to be feasible. To make the solution of (PL) possible by standard methods we are going to reformulate it as a standard linear-quadratic optimal control problem. For this purpose the following notation will be needed. Let $a_j = t_{j-1} + 1$ and $b_j = t_{j+1} - 1$, then

$$\bar u_j = (u_{t_j}, \dots, u_{b_j}), \quad \bar v_j = (v_{t_j}, \dots, v_{b_j}), \quad j = 0, 1, \dots, K; \qquad (36)$$

$$\bar x_0 = x_0, \quad \bar x_j = (x_{a_j}, \dots, x_{t_j}), \quad \bar y_j = (y_{a_j}, \dots, y_{t_j}), \quad j = 1, 2, \dots, K, \quad \bar x_{K+1} = x_T;$$

$$\bar Q_{ij} = \mathrm{diag}(Q_{it_j}, \dots, Q_{ib_j}), \quad \bar R_{ij} = \mathrm{diag}(R_{it_j}, \dots, R_{ib_j}), \quad j = 0, 1, \dots, K, \quad i = 1, 2;$$

$$\bar S_{i0} = S_{i0}, \quad \bar S_{ij} = \mathrm{diag}(S_{ia_j}, \dots, S_{it_j}), \quad j = 1, \dots, K, \quad \bar S_{iK+1} = S_{iT}, \quad i = 1, 2;$$

$$\bar G_j = \mathrm{diag}(G_{a_j}, \dots, G_{t_j}), \quad j = 1, 2, \dots, K. \qquad (42)$$
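The index bookkeeping behind this block notation can be sketched in a few lines of Python. The function name `block_indices` and the convention $t_{K+1} = T$ (needed so that $b_K = T - 1$) are assumptions introduced here for illustration; they do not appear in the text.

```python
def block_indices(T, tau):
    """Stage indices grouped into the blocks of the stacked vectors.

    tau is the increasing list [t_0, ..., t_K] of stages with perfect
    state information, with t_0 = 0 and t_K = T - 1.  We additionally
    set t_{K+1} = T (an assumption) so that b_K = T - 1 is defined.

    Returns (controls, states):
      controls[j] lists the stages t_j..b_j of u_bar_j and v_bar_j,
      states[j]   lists the stages of x_bar_j (a_j..t_j for j = 1..K,
                  [0] for j = 0 and [T] for j = K + 1).
    """
    tau = list(tau)
    if tau != sorted(set(tau)) or tau[0] != 0 or tau[-1] != T - 1:
        raise ValueError("tau must increase from 0 to T - 1")
    K = len(tau) - 1
    t = tau + [T]                      # t[K + 1] = T by assumption

    controls = {}
    for j in range(K + 1):
        b_j = t[j + 1] - 1             # b_j = t_{j+1} - 1
        controls[j] = list(range(t[j], b_j + 1))

    states = {0: [0]}                  # x_bar_0 = x_0
    for j in range(1, K + 1):
        a_j = t[j - 1] + 1             # a_j = t_{j-1} + 1
        states[j] = list(range(a_j, t[j] + 1))
    states[K + 1] = [T]                # x_bar_{K+1} = x_T
    return controls, states


if __name__ == "__main__":
    controls, states = block_indices(6, [0, 2, 5])
    print(controls)
    print(states)
```

Each control block `controls[j]` has the same number of stages as the state block `states[j + 1]` that it drives, which is what allows the dynamics to be rewritten block by block.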


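Let $\Phi_{t,s}$ denote the state transition matrix of the system (1), i.e.

$$\Phi_{t,s} = A_{t-1} A_{t-2} \cdots A_s \quad \text{for } t > s. \qquad (44)$$

Then one has

$$x_t = \Phi_{t,0} x_0 + \sum_{s=0}^{t-1} \Phi_{t,s+1} (B_s u_s + C_s v_s), \quad t = 1, 2, \dots, T,$$

or

$$\bar x_j = E_{1j-1} x_{t_{j-1}} + E_{2j-1} \bar u_{j-1} + E_{3j-1} \bar v_{j-1}, \quad j = 1, 2, \dots, K+1,$$

where

$$E_{1j} = \begin{pmatrix} \Phi_{t_j+1, t_j} \\ \vdots \\ \Phi_{t_{j+1}, t_j} \end{pmatrix}, \quad E_{2j} = D_j \times \mathrm{diag}(B_{t_j}, \dots, B_{b_j}), \quad E_{3j} = D_j \times \mathrm{diag}(C_{t_j}, \dots, C_{b_j}),$$

$D_j = (d^j_{kl})$, $k, l = t_j + 1, t_j + 2, \dots, t_{j+1}$, where $d^j_{kl} = 0$ for $k < l$ and $d^j_{kl} = \Phi_{k,l}$ for $k \ge l$.

In the new notation relations (1), (2) and (29) become respectively

$$\bar x_{j+1} = E_{1j} x_{t_j} + E_{2j} \bar u_j + E_{3j} \bar v_j, \quad j = 0, 1, \dots, K, \qquad (48)$$

$$J_i(t_k, x_{t_k}; \{\bar u_j\}, \{\bar v_j\}) = \tfrac12 \sum_{j=k}^{K} \left( \bar x_j' \bar S_{ij} \bar x_j + \bar u_j' \bar Q_{ij} \bar u_j + \bar v_j' \bar R_{ij} \bar v_j \right) + [\dots], \qquad (49)$$

$$\bar y_j = \bar G_j \bar x_j. \qquad (50)$$

Consider the solution of (PF), which at present has the form

(PF)
$$\min_{\{\bar v_j\}} J_2(0, x_0; \{\bar u_j\}, \{\bar v_j\}) \qquad (51)$$

subject to (48) and (50), where $\{\bar u_j\}$ is assumed to be given, and $\{\bar y_j\}$ to be given and feasible. Because of $C_{T-1} = 0$ the optimal $\bar v_K = 0$. Furthermore, condition $G_t = I$ for $t \in \tau$ implies that (PF) decomposes into $K$ independent problems

[...] (52)

subject to

$$\bar y_{j+1} = \bar G_{j+1} (E_{1j} x_{t_j} + E_{2j} \bar u_j + E_{3j} \bar v_j). \qquad (53)$$

In the sequel we assume that the matrix $\bar G_{j+1} E_{3j}$ has linearly independent row vectors, and thus the constraints (53) are linearly independent. If this is not the case, then one should first eliminate the constraints which are linearly dependent, and instead of $\bar G_{j+1} E_{3j}$ consider a matrix obtained from it by deleting all but linearly independent row vectors. Now, let $w_j$ denote the Lagrange multiplier associated with problem (52)-(53). By a straightforward calculation one obtains the solution to (PF) as

$$\bar v_j = F_j^{-1} E_{3j}' \left[ \bar G_{j+1}' w_j - \bar S_{2j+1} (E_{1j} x_{t_j} + E_{2j} \bar u_j) \right], \quad j = 0, 1, \dots, K, \qquad (54)$$

$$w_j = (\bar G_{j+1} E_{3j} F_j^{-1} E_{3j}' \bar G_{j+1}')^{-1} \left[ \bar y_{j+1} + (\bar G_{j+1} E_{3j} F_j^{-1} E_{3j}' \bar S_{2j+1} - \bar G_{j+1}) (E_{1j} x_{t_j} + E_{2j} \bar u_j) \right], \qquad (55)$$

where

$$F_j = \bar R_{2j} + E_{3j}' \bar S_{2j+1} E_{3j}. \qquad (56)$$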
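The calculation leading to the multiplier formula (55) can be sketched as follows. This is a reconstruction under stated assumptions rather than the author's own derivation: the $j$-th subproblem is taken to minimize only the terms of $J_2$ that depend on $\bar v_j$, with $\bar x_{j+1} = E_{1j} x_{t_j} + E_{2j} \bar u_j + E_{3j} \bar v_j$, the symbol $F_j$ is taken to be the Hessian of that objective, and the sign of $w_j$ is chosen so that the result agrees with (55).

```latex
\begin{align*}
\mathcal{L}_j &= \tfrac12\,\bar x_{j+1}'\bar S_{2j+1}\bar x_{j+1}
               + \tfrac12\,\bar v_j'\bar R_{2j}\bar v_j
               - w_j'\bigl(\bar G_{j+1}\bar x_{j+1} - \bar y_{j+1}\bigr),\\[4pt]
0 = \frac{\partial\mathcal{L}_j}{\partial\bar v_j}
  &= E_{3j}'\bar S_{2j+1}\bigl(E_{1j}x_{t_j} + E_{2j}\bar u_j\bigr)
     + F_j\bar v_j - E_{3j}'\bar G_{j+1}'w_j,
  \qquad F_j := \bar R_{2j} + E_{3j}'\bar S_{2j+1}E_{3j},\\[4pt]
\bar v_j &= F_j^{-1}E_{3j}'\Bigl[\bar G_{j+1}'w_j
            - \bar S_{2j+1}\bigl(E_{1j}x_{t_j} + E_{2j}\bar u_j\bigr)\Bigr],\\[4pt]
\bar y_{j+1} &= \bar G_{j+1}\bigl(E_{1j}x_{t_j} + E_{2j}\bar u_j\bigr)
               + \bar G_{j+1}E_{3j}\bar v_j\\
\Longrightarrow\quad
\bar G_{j+1}E_{3j}F_j^{-1}E_{3j}'\bar G_{j+1}'\,w_j
  &= \bar y_{j+1}
     + \bigl(\bar G_{j+1}E_{3j}F_j^{-1}E_{3j}'\bar S_{2j+1} - \bar G_{j+1}\bigr)
       \bigl(E_{1j}x_{t_j} + E_{2j}\bar u_j\bigr).
\end{align*}
```

Since $R_{2t} > 0$ makes $F_j$ positive definite, and $\bar G_{j+1} E_{3j}$ is assumed to have linearly independent rows, the matrix multiplying $w_j$ in the last line is invertible, which gives (55).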

Relation (55) determines a one-to-one mapping between the space of all $\{w_j\}$ and the set of all feasible trajectories $\{\bar y_j\}$. Therefore (PL) can at present be reformulated as

(Si 0 , 0) j '

N ~R .. N.

VI ..

1J

J 1J J

+

I

.

0

\0

, Q1' J'

I

, 0

0

,

0

(65) Nj

= F-j 1 E' 3j (0 , • .... , 0 ,- S2j +1 E 1j'

So (PL) becomes the standard optimal control problem

=

j

(5S)

0,1, ••• ,1<

Substituting (54) in gives

l4S) and

(49) subj ect to (60). Observe, that

(6S)

or

~60) and Jiltk,x t

Ji

lt

k

k

,x

; tUj-~ tk

; {

,\"vjD =

Uj} ,{wjJ

So far, we have shown that (67) is the lower bound for the leader's cost under information structure (29). Now, we are going to show that the leader can actually attain this lower bound by applying an appropriate closed-loop strategy. For this purpose we define a new game satisfying the assumptions of theorem 2/a/, and with cost functions differing from $J_i$, $i = 1,2$, only by constant terms.

y.

Let 1<

,. 2 )~l-' ~ ,1/ ~ x., -, u . , -') w. VI. j =k . J

J

J

.

1J

(61)

J

be the new state variable. As $x_{t_j} = y_{t_j}$, and $y_{t_j}$ is the last component of $\bar y_j$, multiplying (59) by $\bar G_{j+1}$ gives

$$\bar y_{j+1} = \bar A_j \bar y_j + \bar B_j \bar u_j + \bar C_j w_j, \qquad (69)$$

where

?loJ dim x.-J

n ."",

A.

J

l62) B.

J

C

j =

E

F-1, -G'

3j j E3j j+1'

l64)

Substituting (46) and (54) in for k = 0, results in

J.(o,x o ; iu.j~,{V.3) 1.

J

J

(49),


and $\{w_j\}$. In other words, we have demonstrated the following theorem.

Theorem 3

Assume that $C_{T-1} = 0$, and suppose that game (1) - (2) has information structure (29). Then, the transformed game defined by state equation (69) and cost functions (73) has the CLS solution $(\{\tilde\gamma^*_{1j}\}, \{\tilde\gamma^*_{2j}\})$, which can be determined by means of theorem 2/a/. Moreover, the CLS solution of the original game can be defined as

with V. .

H ~y~ . H.

Y ..

I:~~)

J 1J J

~J

~J

-.."

Sij +1 (E 1j , E 2j , E 3j '

\ E 3j

+

J

h h

= 0, 1 , • • • , I< ,

l77)

0, 0 , 0 ) 0, Qij' ~ i

·

(78 )

0, I

, 0

, 0

0, 0

I

0

where

~it .

j2

_F j

j3

_F j

i

1,2,

J

j

, -

E3jS2j+1E2j'

-1 , E3j Gj +1 •

J

~ib.

-1 , _F j E3jS2j+1E1j' -1

.

~ij

h j l ' h j2 ' h.J

_-:b,.j 1

j

and

( O,O,R

H.

~lj'

+

(75)

/0 and I denote here zero and unit matrices of appropriate dimensions./ Clearly

$$J_i(0, x_0; \{\bar u_j\}, \{\bar v_j\}) = \tilde J_i(0, x_0; \{\bar u_j\}, \{w_j\}) + x_0' S_{i0} x_0. \qquad (76)$$

Observe that cost function $\tilde J_1$ has a minimum, cost function $\tilde J_2$ is strictly convex with respect to $w_j$, $\bar C_j$ is of rank equal to the dimension of $w_j$, $j = 0, 1, \dots, K$, and $\bar C_{K+1} = 0$. Therefore the assumptions of theorem 2/a/ are satisfied, implying the existence of a CLS solution of the game defined by state equation (69) and cost functions (73), under which the leader attains the minimum of $\tilde J_1$ with respect to $\{\bar u_j\}$

=

1,2,.,.#,1<~

The strategy defined for the leader in theorem 3 is, at stage $t$ with $t_j \le t < t_{j+1}$, a function of all observations made by the leader at stages $t_{j-1} + 1, t_{j-1} + 2, \dots, t_j$. It is easy to verify that this function has the form

[...] (80)

with $c_t$ depending on $\sum_{s=a_j}^{t_j} \lambda_s' (y_s - \hat y_s)$, where $\{\hat y_s\}$ is the trajectory enforced by the leader.

CONCLUSION

The main result of this paper is the derivation of the CLS solution for a dynamic game with the dominant player having access to closed-loop information which, at some stages, excluding the initial and the final ones, may be imperfect /aggregated/. This makes possible the evaluation of the dominant player's cost under the equilibrium solution as a function of the number of independent state measurements made by the dominant player in the course of the game, and therefore the determination of the value of each such measurement. In practice this means that a dominant player, or a higher level controller in a multilevel system, is provided with a basis for evaluating the value versus the cost of information acquired at specific moments of time. It is worth noting that the assumption about the follower not acting at the last stage of the game is not essential for the determination of CLS solutions because, as pointed out in /Başar and Selbuz, 1979/ or /Tolwinski, 1979/, an arbitrary game can be transformed to a problem with $C_{T-1} = 0$. Moreover, condition (C) can be formulated in a weaker form, implying the satisfaction of (5) only at some stages of the game /Tolwinski, 1979/. Thus, it is clear that the results of this paper are applicable to a fairly general class of hierarchical games. Further extensions are also possible, notably to the case of games with many followers.

REFERENCES

Başar, T., and H. Selbuz /1979/. Closed-loop Stackelberg strategies with applications in the optimal control of multilevel systems. IEEE Trans. Automat. Control, AC-24, No 2.

Papavassilopoulos, G.P., and J.B. Cruz, Jr. /1979/. Sufficient conditions for Stackelberg and Nash strategies with memory. J. Optimiz. Theory a. Appl., to appear.

Tolwinski, B. /1979/. Closed-loop Stackelberg solution to multistage linear-quadratic game. Tech. Report of Systems Research Institute, Poland, ZTSW-64/79. Also to appear in J. Optimiz. Theory a. Appl.