On the regularization of a cooperative solution in a multistage game with random time horizon




Contents lists available at ScienceDirect

Discrete Applied Mathematics journal homepage: www.elsevier.com/locate/dam

E.V. Gromova a,b,*, T.M. Plekhanova a

a Faculty of Applied Mathematics and Control Processes, St. Petersburg State University, St. Petersburg, Russia
b Krasovskii Institute of Mathematics and Mechanics (IMM UB RAS), Yekaterinburg, Russia

Article info

Article history: Received 10 November 2017. Received in revised form 3 June 2018. Accepted 27 August 2018. Available online xxxx.

Keywords: Game theory; Multistage games; Dynamic games; Decision making under uncertainty; Random duration; Cooperation

Abstract

In this paper, we consider a general class of cooperative multistage games with random time horizon and discuss the problem of implementing a cooperative solution. It is known that in many cases a cooperative solution can be time-inconsistent and hence not realizable. To solve this problem, the imputation distribution procedure was proposed. However, the computed payment distribution scheme may result in negative payments, which are not feasible. In this case, one has to carry out a regularization procedure as described in the paper. We describe a general regularization scheme and apply it both to the core and to the Shapley value. It is shown that for these two cases the regularization can be carried out in two alternative ways, thus providing a basis for developing efficient numerical schemes. For the Shapley value the regularization procedure is elaborated and described in the form of an algorithm. The obtained results are illustrated with two numerical examples.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Dynamic (discrete-time) games, in particular in the cooperative formulation, have multiple applications in ecological management, resource extraction and exploitation, and various economic problems, see, e.g., [1,5,11,20,21,36,41,43]. The central problem of any cooperative game consists in determining a cooperative solution. There are many different notions of cooperative solutions, the most important of which are the core [2] and the Shapley value [38]. The former is a set-valued solution while the latter is uniquely defined. The works [8,13] are devoted to studying the core and the Shapley value for discrete-time games.

When studying cooperative behavior in dynamic games, the problem of implementing a cooperative agreement turns out to be of vital importance. The reason for this is that the players may break the cooperative agreement at an intermediate stage of the game, thus preventing the cooperative solution from being realized. This situation is referred to as dynamic instability (more recently called time-inconsistency). The problem of the sustainability of a cooperative solution has attracted considerable attention and was studied in a number of works, see, e.g., [4,6,7,15–17,24,34,35,43].

A standard approach to overcoming the problem of time-inconsistency boils down to redistributing the stage payments using an imputation distribution procedure (IDP) [26]. The IDP is a rule for allocating the components of an imputation from the cooperative solution as the game evolves in such a way that in any subgame a deviation from the cooperative agreement would not be optimal for a player. For differential (continuous-time) games the problem of time-inconsistency was addressed in [10,26,27,29,33]; for the discrete-time case it was solved in general form in [15,16,32].

* Corresponding author at: Faculty of Applied Mathematics and Control Processes, St. Petersburg State University, St. Petersburg, Russia. E-mail address: [email protected] (E.V. Gromova).

https://doi.org/10.1016/j.dam.2018.08.008 0166-218X/© 2018 Elsevier B.V. All rights reserved.

Please cite this article in press as: E.V. Gromova, T.M. Plekhanova, On the regularization of a cooperative solution in a multistage game with random time horizon, Discrete Applied Mathematics (2018), https://doi.org/10.1016/j.dam.2018.08.008.




A particularly important class of dynamic games comprises games under uncertainty [1,37,40]. The problem of time-inconsistency for dynamic games with different stochastic components was studied in a number of works, e.g., [4,24,25,28,42,43]. We shall follow the approach first developed for differential cooperative games with random duration, which was introduced in [30,31] and studied in detail in a series of works, see [9,12,19,39].

In this paper a general discrete multistage game with random termination time [28] is considered. We present a general approach to computing the imputation distribution and show that the IDP computed according to this scheme can involve negative payments and hence be infeasible. To overcome this difficulty we propose a regularization method, i.e., a method of constructing a feasible and time-consistent solution on the basis of any classical cooperative solution (see [31] for a continuous-time version). The developed regularization procedure is applied to two particular cooperative solutions: the core and the Shapley value. It is shown that these two solutions can be equivalently regularized in two different ways, which allows for designing efficient numerical procedures. The developed regularization scheme is illustrated with a detailed worked numerical example.

The paper is organized as follows. In Section 2, the general definition of dynamic games with random time horizon is given (the non-cooperative case in Section 2.1 and the cooperative case in Section 2.2). In Section 3, the problem of sustainability of the cooperative solution for the described class of games is investigated. In particular, the problem of time-consistency is formulated in Section 3.1 and the general form of the regularization procedure is discussed in Section 3.2. In Section 4 we treat two particular cases, the core and the Shapley value. The regularization algorithm for the Shapley value is given in Section 4.3. Finally, Section 5 contains numerical examples illustrating the results presented in the preceding sections.

2. Problem statement

2.1. The model

Consider a finite, connected acyclic graph (tree) $G = (Z, M)$, where $Z$ is a finite set of graph vertices and $M$ is the mapping $M : Z \to 2^Z$, $M(z) = M_z \subset Z$, for any $z \in Z$. Let furthermore there be a single vertex $z_0$ such that $\nexists z' \in Z : z_0 \in M_{z'}$. The vertex $z_0$ is said to be the root, and $G$ is hence a rooted tree. In a rooted tree every vertex can be characterized by its height, which is the length of the longest downward path from that vertex to a leaf, and by its depth, which is the length of the path to the root (the root path). Finally, we denote by $l$ the maximum path length of the graph $G$ (the height of the tree).

Assume that at each vertex $z \in Z$ of the graph $G$ an $n$-player simultaneous game is defined in the normal form:

\[
\Gamma(z) = \left\langle N, U_1^z, \dots, U_n^z, h_1^z, \dots, h_n^z \right\rangle,
\]

where $N = \{1, 2, \dots, n\}$ is the set of players, the same for every $z \in Z$; $U_i^z$ is the action set of the $i$th player at the vertex $z$, $i \in N$; $h_i^z = h_i^z(u_1^z, \dots, u_n^z)$ is the payoff function of player $i$, $u_i^z \in U_i^z$. We consider only the class of pure strategies. No randomizing over pure strategies or over actions is allowed. The set of strategies $u^z = (u_1^z, \dots, u_n^z)$ is referred to as the state in the game $\Gamma(z)$. The payoff functions of the players are assumed to be non-negative, $h_i^z(u_1^z, \dots, u_n^z) \ge 0$. The dynamics are described by the difference equation

\[
z_{k+1} = f(z_k, u^{z_k}),
\tag{1}
\]

where $u^{z_k}$ is the state realized at the step $k$, $k = 0, 1, \dots, l-1$. We assume that the structure of the game tree is such that any admissible choice of the players' strategies results in a sequence of vertices of length $l + 1$: $(z_0, z_1, \dots, z_l)$, where the last vertex $z_l \in Z$ is such that $M_{z_l} = \emptyset$.

In the following, we shall consider a multistage game $\Gamma^p(z_0)$ starting from $z_0$, which differs from the one described above in that at each stage $k$ the game can stop with probability $p_k$. Thus the actual number of steps in the game, denoted by $m$, is an integer random variable with the probability distribution $\{p_k\}_{k=0}^{l}$ such that the regularity conditions hold: $0 \le p_k \le 1$, $\sum_{k=0}^{l} p_k = 1$. The expected value of the $i$th player's payoff in the game $\Gamma^p(z_0)$ is given by

\[
K_i(u^{z_0}, u^{z_1}, \dots, u^{z_l}) = \sum_{k=0}^{l} \left( \sum_{s=0}^{k} h_i^{z_s}(u^{z_s}) \right) p_k .
\tag{2}
\]

Expanding the above sum and noting that the probabilities $p_k$ satisfy $\sum_{j=k}^{l} p_j = 1 - \sum_{m=0}^{k-1} p_m$, we can rewrite (2) as

\[
K_i(u^{z_0}, \dots, u^{z_l}) = h_i^{z_0}(u^{z_0}) + h_i^{z_1}(u^{z_1})\,(1 - p_0) + \dots + h_i^{z_k}(u^{z_k}) \left( 1 - \sum_{m=0}^{k-1} p_m \right) + \dots + h_i^{z_l}(u^{z_l}) \left( 1 - \sum_{m=0}^{l-1} p_m \right).
\tag{3}
\]
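As a quick numerical check of the equivalence between (2) and (3), the following minimal Python sketch evaluates both forms for a single player along a fixed trajectory. The function names and the sample stage payoffs and stopping probabilities are illustrative assumptions and do not come from the paper.

```python
from itertools import accumulate


def expected_payoff_direct(h, p):
    """Form (2): for each stopping step k, the payoff accumulated
    through step k is weighted by the stopping probability p[k]."""
    cumulative = list(accumulate(h))
    return sum(pk * ck for pk, ck in zip(p, cumulative))


def expected_payoff_survival(h, p):
    """Form (3): each stage payoff h[k] is weighted by the probability
    that the game has not stopped before step k."""
    total, survival = 0.0, 1.0
    for hk, pk in zip(h, p):
        total += hk * survival
        survival -= pk
    return total


# Hypothetical data: stage payoffs h_i^{z_k} and stopping probabilities p_k.
h = [4.0, 2.0, 6.0]
p = [0.25, 0.25, 0.5]

direct = expected_payoff_direct(h, p)
survival_form = expected_payoff_survival(h, p)
```

For this data both forms give the same value, 8.5, as the rearrangement of the double sum predicts.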

Assume that the game $\Gamma^p(z_0)$ evolves along the trajectory $z$ to the vertex $z_k \in Z$. At $z_k$, $k = 1, \dots, l$, the players enter the sub-game $\Gamma^p(z_k)$, which evolves on the subgraph $G_k$ of the graph $G$ with the initial vertex $z_k$. The expected value of the payoff for player $i$ in the sub-game $\Gamma^p(z_k)$ is defined as follows:

\[
K_i(u^{z_k}, \dots, u^{z_l}) = \frac{1}{1 - \sum_{m=0}^{k-1} p_m} \sum_{m=k}^{l} \left( \sum_{s=k}^{m} h_i^{z_s}(u^{z_s}) \right) p_m .
\tag{4}
\]

Thus from (3) and (4) we get

\[
K_i(u^{z_0}, u^{z_1}, \dots, u^{z_l}) = h_i^{z_0}(u^{z_0}) + h_i^{z_1}(u^{z_1})\,(1 - p_0) + \dots + h_i^{z_{k-1}}(u^{z_{k-1}}) \left( 1 - \sum_{m=0}^{k-2} p_m \right) + \left( 1 - \sum_{j=0}^{k-1} p_j \right) K_i(u^{z_k}, \dots, u^{z_l}).
\tag{5}
\]
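The decomposition (5) can be illustrated numerically. The sketch below, written under the assumption that the game reaches stage $k$ with positive probability, computes the conditional sub-game payoff (4) and then rebuilds the full payoff from the right-hand side of (5). All names and numbers are hypothetical.

```python
def subgame_payoff(h, p, k):
    """Form (4): expected payoff in the sub-game starting at stage k,
    conditional on the game having reached stage k."""
    reach = 1.0 - sum(p[:k])  # probability of reaching stage k (assumed > 0)
    total = sum(sum(h[k:m + 1]) * p[m] for m in range(k, len(p)))
    return total / reach


def payoff_via_decomposition(h, p, k):
    """Right-hand side of (5): payoffs collected before stage k plus the
    sub-game payoff weighted by the probability of reaching stage k."""
    head = sum(h[s] * (1.0 - sum(p[:s])) for s in range(k))
    return head + (1.0 - sum(p[:k])) * subgame_payoff(h, p, k)


# Hypothetical stage payoffs and stopping probabilities.
h = [4.0, 2.0, 6.0]
p = [0.25, 0.25, 0.5]

full = subgame_payoff(h, p, 0)  # for k = 0, (4) reduces to (2)
rebuilt = [payoff_via_decomposition(h, p, k) for k in range(1, len(p))]
```

Every entry of `rebuilt` agrees with `full` up to rounding, which is the recursive structure that (5) expresses.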

Eq. (5) is particularly important as it allows for determining the optimal strategy recursively. Note that the optimal strategies considered in the paper are assumed to be pure stationary strategies. Expression (5) can be used when applying numerical methods for computing the optimal strategy for differential games both in the cooperative and non-cooperative settings (Nash equilibrium) [3,14]. Another possibility consists in using this result for computing the $\epsilon$-Nash equilibria proposed in [18].

2.2. Cooperative game

The cooperative form of the game $\Gamma^p(z_0)$ implies that prior to the start of the game the players agree to use such strategies $\bar u_i^z \in U_i^z$ that the resulting trajectory $\bar z = (\bar z_0, \dots, \bar z_k, \dots, \bar z_l)$ maximizes the expected value of the total payoff of all players:

\[
\max_{u^{z_0}, \dots, u^{z_k}, \dots, u^{z_l}} \sum_{i=1}^{n} K_i(u^{z_0}, u^{z_1}, \dots, u^{z_l}) = \sum_{i=1}^{n} \sum_{k=0}^{l} \left( \sum_{s=0}^{k} h_i^{z_s}(\bar u^{z_s}) \right) p_k .
\tag{6}
\]

In the following, $\bar z$ will be called the optimal trajectory. Note that the first vertex $\bar z_0$ always coincides with $z_0$. We assume that the maximum in (6) is achieved and that the optimal trajectory $\bar z$ is unique.

To construct the characteristic function $V(S, z)$, $S \subseteq N$, for every vertex $\bar z_k \in Z$ along the optimal trajectory we define an auxiliary zero-sum game $\Gamma^p_{S, N \setminus S}(z)$ between the coalition $S \subset N$, acting as the first (maximizing) player, and the coalition $N \setminus S$, acting as the second (minimizing) player, while regarding $z$ as the initial vertex. We start with $\bar z_0 = z_0$ and introduce the function $V(S, z_0)$, $S \subset N$, as the maximin solution (the lower value of the game) [22]:

\[
V(\emptyset, z_0) = 0,
\tag{7}
\]

\[
V(S, z_0) = \max_{u_S^{z_0}, \dots, u_S^{z_l}} \; \min_{u_{N \setminus S}^{z_0}, \dots, u_{N \setminus S}^{z_l}} \; \sum_{i \in S} \sum_{m=0}^{l} \left( \sum_{j=0}^{m} h_i^{z_j}(u^{z_j}) \right) p_m .
\tag{8}
\]

It is known [29] that the characteristic function (8) satisfies the property of super-additivity, i.e.,

\[
V(S_1 \cup S_2, z_0) \ge V(S_1, z_0) + V(S_2, z_0), \qquad \forall S_1, S_2 \subseteq N, \; S_1 \cap S_2 = \emptyset .
\tag{9}
\]

We note that some authors consider conditions (7) and (9) as the defining properties of a characteristic function (see, e.g., [23, Chap. X, Sec. 2]). We shall stick to this convention and require a characteristic function to satisfy (7) and (9). Let $N$ be the coalition of all players. The value of the characteristic function $V(N, z_0)$ in the game $\Gamma^p(z_0)$ is given by (6), which can be rewritten using (3) as

\[
V(N, z_0) = \sum_{i=1}^{n} h_i^{z_0}(\bar u^{z_0}) + \sum_{i=1}^{n} h_i^{z_1}(\bar u^{z_1})\,(1 - p_0) + \dots + \sum_{i=1}^{n} h_i^{z_l}(\bar u^{z_l}) \left( 1 - \sum_{m=0}^{l-1} p_m \right).
\tag{10}
\]

Thus, (10) and (8) define the cooperative game of $n$ players $\Gamma_V^p(z_0)$ in the form of the characteristic function $V$. Let $L(z_0)$ be the set of all imputations in the game $\Gamma_V^p(z_0)$, such that

\[
L(z_0) = \left\{ \xi = \{\xi_1, \dots, \xi_n\} : \sum_{i=1}^{n} \xi_i = V(N, z_0), \; \xi_i \ge V(\{i\}, z_0), \; i = 1, \dots, n \right\},
\]

where $V(\{i\}, z_0)$ is defined as the value of the characteristic function $V(S, z_0)$ for the coalition $S$ consisting of a single player $i$ (that is, $V(\{i\}, z_0)$ is the expected payoff which the $i$th player would obtain when acting individually while the others form an anti-coalition). A subset $C(z_0) \subseteq L(z_0)$ of the set of all imputations is referred to as the cooperative solution in the game $\Gamma_V^p(z_0)$.

Assume that the game $\Gamma_V^p(z_0)$ evolves along the cooperative trajectory $\bar z$ until it reaches the vertex $\bar z_k \in Z$. At $\bar z_k$, $k = 1, \dots, l$, the players enter the sub-game $\Gamma^p(\bar z_k)$, which in turn evolves on the subgraph $G_k$ of the graph $G$ with the initial vertex $\bar z_k$. The expected value of the payoff for player $i$ in the sub-game $\Gamma^p(\bar z_k)$ is defined as in (4), and the expected total joint payoff in the sub-game $\Gamma^p(\bar z_k)$ is given by $\sum_{i=1}^{n} K_i(u^{z_k}, \dots, u^{z_l})$, which can be equivalently written as

\[
\sum_{i=1}^{n} K_i(u^{z_k}, \dots, u^{z_l}) = \sum_{i=1}^{n} h_i^{z_k}(u^{z_k}) + \sum_{i=1}^{n} h_i^{z_{k+1}}(u^{z_{k+1}}) \, \frac{1 - \sum_{j=0}^{k} p_j}{1 - \sum_{j=0}^{k-1} p_j} + \dots + \sum_{i=1}^{n} h_i^{z_l}(u^{z_l}) \, \frac{1 - \sum_{j=0}^{l-1} p_j}{1 - \sum_{j=0}^{k-1} p_j} .
\tag{11}
\]

Thus the value of the cooperative sub-game $\Gamma_V^p(\bar z_k)$ (provided all players act optimally) is given by (11): $V(N, \bar z_k) = \sum_{i=1}^{n} K_i(\bar u^{z_k}, \dots, \bar u^{z_l})$. Note that $V(N, z_0)$ satisfies Bellman's Principle of Optimality for any $k = 1, \dots, l$:

\[
V(N, z_0) = \sum_{i=1}^{n} h_i^{z_0}(\bar u^{z_0}) + \sum_{i=1}^{n} h_i^{z_1}(\bar u^{z_1})\,(1 - p_0) + \dots + \sum_{i=1}^{n} h_i^{z_{k-1}}(\bar u^{z_{k-1}}) \left( 1 - \sum_{m=0}^{k-2} p_m \right) + \left( 1 - \sum_{j=0}^{k-1} p_j \right) V(N, \bar z_k).
\tag{12}
\]

That is to say, if the game evolves along the optimal trajectory $\bar z$, the expression (11) attains its maximum on the truncated trajectory $\bar z_k, \bar z_{k+1}, \dots, \bar z_l$, i.e., the truncation of the optimal trajectory $\bar z$ of the game $\Gamma_V^p(z_0)$ is the optimal trajectory in the sub-game $\Gamma_V^p(\bar z_k)$.

The characteristic function $V(S, \bar z_k)$, $S \subseteq N$, in the game $\Gamma^p(\bar z_k)$ is introduced in the same way as for the game $\Gamma^p(z_0)$. Let $V(\emptyset, \bar z_k) = 0$. To compute $V(S, \bar z_k)$, $S \subset N$, we consider an auxiliary game $\Gamma^p_{S, N \setminus S}(\bar z_k)$ between the coalition $S \subset N$, acting as the first (maximizing) player, and the coalition $N \setminus S$, acting as the second (minimizing) player. Hence the functional equation for computing the characteristic function $V(S, \bar z_k)$ is

\[
V(S, \bar z_k) = \frac{1}{1 - \sum_{j=0}^{k-1} p_j} \; \max_{u_S^{\bar z_k, \dots, \bar z_l}} \; \min_{u_{N \setminus S}^{\bar z_k, \dots, \bar z_l}} \; \sum_{i \in S} \sum_{m=k}^{l} \left( \sum_{r=k}^{m} h_i^{z_r}(u^{z_r}) \right) p_m ,
\tag{13}
\]

where the shorthand notations $u_{N \setminus S}^{\bar z_k, \dots, \bar z_l} = (u_{N \setminus S}^{\bar z_k}, \dots, u_{N \setminus S}^{\bar z_l})$ and $u_S^{\bar z_k, \dots, \bar z_l} = (u_S^{\bar z_k}, \dots, u_S^{\bar z_l})$ are used. Finally, $V(N, \bar z_k)$ is computed by (11).

The respective imputation set $L(\bar z_k)$ in the cooperative sub-game $\Gamma_V^p(\bar z_k)$ is defined as follows:

\[
L(\bar z_k) = \left\{ \xi = \{\xi_i\} : \sum_{i=1}^{n} \xi_i = V(N, \bar z_k), \; \xi_i \ge V(\{i\}, \bar z_k), \; i = 1, \dots, n \right\},
\]

where $\bar z_k$ is the initial vertex of the sub-game. A subset $C(\bar z_k) \subset L(\bar z_k)$ is called the cooperative solution in the sub-game $\Gamma_V^p(\bar z_k)$.

3. Sustainability of the cooperation

3.1. Imputation distribution procedure

It is well known that even though the Bellman Principle of Optimality (12) holds for the multistage game $\Gamma_V^p(z_0)$, this does not guarantee the sustainability of the cooperation as the game evolves. Assume that prior to the start of the game the players agreed to use a certain cooperative solution $C(z_0)$. The subsequent evolution of the game corresponds to a movement along the optimal trajectory $\bar z$. The players assume that at the end of the game the amount $V(N, z_0)$ earned by all players will be divided between them such that the imputation $\xi = \{\xi_i\}$ will belong to the same cooperative solution $C(z_0)$ upon which the players had previously agreed. However, while moving along the trajectory $\bar z$, a situation may occur in which one of the players has an incentive to break the cooperative agreement.

Definition 3.1. Let $\xi = \{\xi_i\} \in C(z_0)$. If the components of the imputation can be represented as

\[
\xi_i = \sum_{m=0}^{l} \left( \sum_{k=0}^{m} \beta_i^k \right) p_m
\tag{14}
\]

such that $\beta_i^k \ge 0$ for all $i = 1, \dots, n$, $k = 0, \dots, l$, then the vector-valued function $\beta = \{\beta_i^k\}_{i=1,\dots,n}^{k=0,\dots,l}$ is said to be the imputation distribution procedure (IDP) in the game $\Gamma_V^p(z_0)$.

The IDP is an $n \times (l+1)$ matrix for the graph $G$, with one column for each step $k = 0, \dots, l$. The formula (14) can be equivalently written as

\[
\xi_i = \beta_i^0 + \beta_i^1 (1 - p_0) + \dots + \beta_i^k \left( 1 - \sum_{m=0}^{k-1} p_m \right) + \dots + \beta_i^l \left( 1 - \sum_{m=0}^{l-1} p_m \right),
\tag{15}
\]

where we use the shorthand notation $\beta_i^k$ instead of $\beta_i(\bar z_k)$. The IDP defines a rule according to which the components of the expected imputation $\xi = \{\xi_i\}_{i=1,\dots,n}$ are distributed along the optimal trajectory $\bar z$. The expected value of the amount obtained by the $i$th player up to and including step $s - 1$ according to the payments $\beta_i^k$ is denoted by $\alpha_i(s-1)$:

\[
\alpha_i(s-1) = \sum_{m=0}^{s-1} \beta_i^m \left( 1 - \sum_{k=0}^{m-1} p_k \right), \qquad \alpha_i(0) = \beta_i^0 .
\tag{16}
\]

In this way, the $i$th player earns the amount $\alpha_i(s-1)$ before entering the sub-game $\Gamma_V^p(\bar z_s)$. The player also obtains some imputation $\xi^s$ after completing the sub-game $\Gamma_V^p(\bar z_s)$ (with probability $1 - \sum_{m=0}^{s-1} p_m$). A crucial question is whether the new vector $\xi^s = \{\xi_i^s\}$ in the game $\Gamma_V^p(\bar z_s)$ belongs to the same cooperative solution as the vector $\xi$ in the game $\Gamma_V^p(z_0)$. If this is not true, the players in the sub-game $\Gamma_V^p(\bar z_s)$ do not have an incentive to stay with the previously chosen cooperative solution $C(z_0)$. This threatens the sustainability of the cooperation and implies that the imputation $\xi$ will not be realizable, i.e., the cooperative solution $C(z_0)$ will be time-inconsistent. Below, we give a formal definition of the notion of time-consistency.

We first assume that $\bar C(\bar z_s) \ne \emptyset$ in every sub-game $\Gamma_V^p(\bar z_s)$ (otherwise the players cannot follow this principle at every step of the game). Note that the question of the non-emptiness of the cooperative solution in every sub-game along the optimal trajectory is not trivial, but we shall not dwell on this.

Definition 3.2. The cooperative solution $\bar C(\bar z_0)$ is said to be time-consistent if for any imputation $\bar\xi \in \bar C(\bar z_0)$ there exists an IDP $\beta_i^s \ge 0$, $s = 0, \dots, l$, $i = 1, \dots, n$, such that the vector $\bar\xi^s = \{\bar\xi_i^s\}$ computed for the sub-game $\Gamma_V^p(\bar z_s)$ by

\[
\bar\xi_i^s = \frac{1}{1 - \sum_{k=0}^{s-1} p_k} \sum_{k=s}^{l} \left( \sum_{m=s}^{k} \beta_i^m \right) p_k , \qquad i = 1, \dots, n,
\tag{17}
\]

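belongs to the same cooperative solution $\bar C(\bar z_s)$ in the sub-game $\Gamma_V^p(\bar z_s)$.

The definition of a time-consistent cooperative solution implies that for each imputation $\bar\xi$ from $\bar C(\bar z_0)$ one can define step-by-step payments $\{\beta_i^k\}$ such that in any game $\Gamma_V^p(\bar z_s)$ the truncation of the payment scheme (i.e., the payments $\{\beta_i^m\}$, $m = s, \dots, l$) will guarantee that the imputation $\bar\xi^s$ belongs to the same cooperative solution $\bar C(\bar z_s)$ as the imputation $\bar\xi$ in the game $\Gamma_V^p(z_0)$. Let $\bar\xi \in \bar C(\bar z_0)$, then

\[
\begin{aligned}
\bar\xi_i &= \sum_{k=0}^{s-1} \left[ \left( \sum_{m=0}^{k} \beta_i^m \right) p_k \right] + \sum_{k=s}^{l} \left[ \left( \sum_{m=0}^{k} \beta_i^m \right) p_k \right] \\
&= \sum_{k=0}^{s-1} \left[ \left( \sum_{m=0}^{k} \beta_i^m \right) p_k \right] + \sum_{k=s}^{l} \left[ \left( \sum_{m=0}^{s-1} \beta_i^m \right) p_k + \left( \sum_{m=s}^{k} \beta_i^m \right) p_k \right] \\
&= \sum_{k=0}^{s-1} \left[ \left( \sum_{m=0}^{k} \beta_i^m \right) p_k \right] + \left( 1 - \sum_{k=0}^{s-1} p_k \right) \left( \sum_{m=0}^{s-1} \beta_i^m \right) + \sum_{k=s}^{l} \left[ \left( \sum_{m=s}^{k} \beta_i^m \right) p_k \right].
\end{aligned}
\tag{18}
\]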

Taking into account the definition of time-consistency (17) and the notation (16) we obtain

\[
\bar\xi_i = \alpha_i(s-1) + \left( 1 - \sum_{j=0}^{s-1} p_j \right) \bar\xi_i^s .
\tag{19}
\]
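The identity (19) can be checked numerically for an arbitrary payment stream. The following Python sketch is only an illustration: it takes a hypothetical IDP row for one player, builds the imputation by (14), the amount already paid by (16) and the sub-game imputation by (17), and then compares both sides of (19). It assumes that stage $s$ is reached with positive probability.

```python
def survival(p, s):
    """Probability that the game has not stopped before step s."""
    return 1.0 - sum(p[:s])


def imputation_from_idp(beta, p):
    """Form (14): expected total of the payments beta[0..m] over the
    random stopping step m."""
    return sum(sum(beta[:m + 1]) * p[m] for m in range(len(p)))


def paid_before(beta, p, s):
    """Form (16): alpha_i(s - 1), the expected amount paid on steps 0..s-1."""
    return sum(beta[m] * survival(p, m) for m in range(s))


def subgame_imputation(beta, p, s):
    """Form (17): expected payments from step s onward, conditional on
    the game reaching step s (survival(p, s) is assumed positive)."""
    tail = sum(sum(beta[s:k + 1]) * p[k] for k in range(s, len(p)))
    return tail / survival(p, s)


# Hypothetical payments beta_i^k for one player and stopping probabilities p_k.
beta = [1.0, 2.0, 3.0]
p = [0.25, 0.25, 0.5]

xi = imputation_from_idp(beta, p)
rhs = [paid_before(beta, p, s) + survival(p, s) * subgame_imputation(beta, p, s)
       for s in range(1, len(p))]
```

Each entry of `rhs` reproduces `xi` up to rounding, whatever the sign of the individual payments; non-negativity of the payments is a separate requirement.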

The second term of (19) is the expected payoff of the $i$th player in the game $\Gamma_V^p(\bar z_s)$, weighted by the probability that the game does not finish before the step $s$. If the cooperative solution $\bar C(\bar z_0)$ is time-consistent, an IDP $\{\beta_i^k\} \ge 0$ exists for any imputation $\bar\xi$ from $\bar C(\bar z_0)$. It is clear that in the general case any imputation can be represented in the form (19) if we sacrifice the non-negativity requirement imposed on $\{\beta_i^k\}$. However, if the $i$th player were to receive a negative payment $\beta_i^k < 0$ at some step $k$, i.e., the player had to give away the amount $-\beta_i^k > 0$, it would be difficult to expect that this player would stick to such a cooperative agreement. An analytical expression for the IDP is needed to check whether the cooperative solution is time-consistent.

Theorem 3.1. Let for any imputation $\bar\xi$ from $\bar C(\bar z_0)$ and for the respective imputations $\bar\xi^k$ from the cooperative solutions $\bar C(\bar z_k)$ corresponding to the sub-games $\Gamma_V^p(\bar z_k)$ the following inequalities be satisfied:

\[
\beta_i^0 = \bar\xi_i - (1 - p_0)\, \bar\xi_i^1 \ge 0,
\tag{20}
\]

\[
\beta_i^k = \bar\xi_i^k - \frac{1 - \sum_{m=0}^{k} p_m}{1 - \sum_{m=0}^{k-1} p_m} \, \bar\xi_i^{k+1} \ge 0, \qquad k = 1, \dots, l-1,
\tag{21}
\]

\[
\beta_i^l = \bar\xi_i^l \ge 0 .
\tag{22}
\]

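Then $\bar C(\bar z_0)$ is a time-consistent cooperative solution and the corresponding IDP for any $\bar\xi \in \bar C(\bar z_0)$ is expressed by (20)–(22).

Proof. From Definition 3.1 of the IDP we have the representation of the imputation $\bar\xi$ in the form (15). Consider (15) for $s$ varying from $1$ to $l$. We have

\[
\begin{aligned}
\bar\xi_i &= \beta_i^0 p_0 + (1 - p_0) \left( \beta_i^0 + \bar\xi_i^1 \right) = \beta_i^0 + (1 - p_0)\, \bar\xi_i^1 , \\
&\;\;\vdots \\
\bar\xi_i &= \beta_i^0 + (1 - p_0)\, \beta_i^1 + \dots + \left( 1 - \sum_{m=0}^{k-2} p_m \right) \beta_i^{k-1} + \left( 1 - \sum_{m=0}^{k-1} p_m \right) \bar\xi_i^k , \\
&\;\;\vdots \\
\bar\xi_i &= \beta_i^0 + (1 - p_0)\, \beta_i^1 + \dots + \left( 1 - \sum_{m=0}^{l-1} p_m \right) \bar\xi_i^l .
\end{aligned}
\tag{23}
\]

Hence, according to (23), the components of the IDP can be computed recursively:

\[
\begin{aligned}
\beta_i^0 &= \bar\xi_i - (1 - p_0)\, \bar\xi_i^1 , \\
&\;\;\vdots \\
\beta_i^k &= \frac{1}{1 - \sum_{m=0}^{k-1} p_m} \, \bar\xi_i - \frac{1 - \sum_{m=0}^{k} p_m}{1 - \sum_{m=0}^{k-1} p_m} \, \bar\xi_i^{k+1} - \frac{1}{1 - \sum_{m=0}^{k-1} p_m} \, \beta_i^0 - \dots - \frac{1 - \sum_{m=0}^{k-2} p_m}{1 - \sum_{m=0}^{k-1} p_m} \, \beta_i^{k-1} , \\
&\;\;\vdots
\end{aligned}
\tag{24}
\]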

If the payments $\beta_i^k$ are non-negative, the cooperative solution $\bar C(\bar z_0)$ is time-consistent. Using (24), an analytical expression for calculating the IDP is obtained in the form (20)–(22). □

Thus at the last step $l$ each player gets the imputation computed for the game $\Gamma(z_l)$ (which is an analogue of the terminal payoff).

Example 3.1. The probability distribution $p_0 = 0, \dots, p_{l-1} = 0$, $p_l = 1$ corresponds to the multistage game with a fixed number of steps $l$ [15], and the expressions for $\beta_i^k$ turn into

\[
\begin{aligned}
\beta_i^0 &= \bar\xi_i - \bar\xi_i^1 , \\
&\;\;\vdots \\
\beta_i^{l-1} &= \bar\xi_i^{l-1} - \bar\xi_i^l , \\
\beta_i^l &= \bar\xi_i^l .
\end{aligned}
\tag{25}
\]

Obviously, in this case the non-negativity of $\beta_i^k$, $k = 1, \dots, l-1$, cannot be guaranteed. The above example shows that the non-negativity of the IDP cannot be guaranteed in general. The following subsection proposes a solution to this issue.

3.2. Regularization procedure

To check whether the cooperative solution is time-consistent one has to compute the step-by-step payoffs according to (24) and then check the non-negativity of the obtained values of $\beta_i^k$. If the components of the IDP are non-negative, then the chosen cooperative solution is time-consistent. Otherwise, a regularization should be performed according to the algorithm described below. The ultimate goal of the regularization procedure is to ensure that at each step of the game $\Gamma_V^p(z_0)$ the player obtains payoffs $\bar\beta_i^k$ that would not offer an incentive to leave the optimal trajectory $\bar z$.
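The time-consistency check described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it evaluates the closed-form payments (20)–(22) for one player from a given sequence of sub-game imputations and then tests their sign. The function names, the tolerance and the sample numbers are assumptions, and every stage is assumed to be reached with positive probability.

```python
def idp_from_subgame_imputations(xi, p):
    """Payments per (20)-(22) for one player.

    xi[k] is the player's imputation in the sub-game starting at step k
    (xi[0] is the imputation in the whole game); p[k] is the probability
    that the game stops at step k.
    """
    l = len(p) - 1
    beta = []
    for k in range(l):
        reach_k = 1.0 - sum(p[:k])         # assumed to be positive
        reach_next = 1.0 - sum(p[:k + 1])
        beta.append(xi[k] - (reach_next / reach_k) * xi[k + 1])
    beta.append(xi[l])                      # terminal payment (22)
    return beta


def is_feasible(beta, tol=1e-12):
    """True if no payment is negative (up to a small tolerance)."""
    return all(b >= -tol for b in beta)


# Hypothetical random-horizon case: the payments come out non-negative.
beta_random = idp_from_subgame_imputations([4.0, 4.0, 3.0], [0.25, 0.25, 0.5])

# Hypothetical fixed-horizon case as in Example 3.1 (p_l = 1): the first
# payment is xi[0] - xi[1], which is negative here, so regularization
# would be required.
beta_fixed = idp_from_subgame_imputations([5.0, 6.0, 2.0], [0.0, 0.0, 1.0])
```

In the second case `is_feasible(beta_fixed)` is false, which is exactly the situation the regularization procedure is meant to handle.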

Extension of the game $\Gamma_V^p(\bar z_0)$. Consider a new function that will be used as a characteristic function of the regularized game. Let the function $\bar V(S, \bar z_0)$ be defined as follows:

\[
\bar V(S, \bar z_0) = \sum_{m=0}^{l} \sum_{k=0}^{m} V(S, \bar z_k) \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m .
\tag{26}
\]
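To make (26) concrete, the sketch below computes the regularized value of one coalition from its values along the optimal trajectory. It is an illustration under stated assumptions: the names are hypothetical, `H[k]` stands for the total stage payoff $\sum_j h_j^{z_k}(\bar u^{z_k})$, the values $V(N, \bar z_k)$ are obtained by summing (4) over the players, and all of them are assumed to be positive.

```python
def grand_coalition_values(H, p):
    """V(N, z_k) for k = 0..l along the optimal trajectory, obtained by
    summing (4) over the players; H[k] is the total stage payoff at step k."""
    values = []
    for k in range(len(p)):
        reach = 1.0 - sum(p[:k])  # assumed to be positive
        tail = sum(sum(H[k:m + 1]) * p[m] for m in range(k, len(p)))
        values.append(tail / reach)
    return values


def regularized_value(V_S, V_N, H, p):
    """Form (26): V_S[k] = V(S, z_k) and V_N[k] = V(N, z_k) > 0."""
    total = 0.0
    for m, pm in enumerate(p):
        for k in range(m + 1):
            total += V_S[k] * H[k] / V_N[k] * pm
    return total


# Hypothetical data.
H = [4.0, 2.0, 6.0]
p = [0.25, 0.25, 0.5]
V_N = grand_coalition_values(H, p)

V_bar_N = regularized_value(V_N, V_N, H, p)                   # S = N
V_bar_empty = regularized_value([0.0, 0.0, 0.0], V_N, H, p)   # S = empty set
V_bar_S = regularized_value([3.0, 2.0, 1.5], V_N, H, p)       # some coalition S
```

With this data `V_bar_N` coincides with `V_N[0]` and `V_bar_empty` is zero, in line with the properties of $\bar V$ stated in the text.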

We can check directly that $\bar V(\emptyset, \bar z_0) = 0$ and $\bar V(N, \bar z_0) = V(N, \bar z_0)$. Furthermore, the super-additivity (9) of $V(\cdot, \bar z_k)$, $k = 0, \dots, l$, together with the non-negativity of the payoffs, implies that $\bar V(S_1 \cup S_2, \bar z_0) \ge \bar V(S_1, \bar z_0) + \bar V(S_2, \bar z_0)$. Hence we can formulate the following result.

Proposition 3.1. The function $\bar V(S, \bar z_0)$ defined by (26) satisfies the properties (7) and (9). Furthermore, it holds that $\bar V(N, \bar z_0) = V(N, \bar z_0)$.

This Proposition justifies our choice of $\bar V(\cdot, \bar z_0)$ as a characteristic function of the cooperative game $\Gamma^p(\bar z_0)$. Similarly, we can show that the function

\[
\bar V(S, \bar z_s) = \frac{1}{1 - \sum_{r=0}^{s-1} p_r} \sum_{m=s}^{l} \sum_{k=s}^{m} V(S, \bar z_k) \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m
\]

is a characteristic function in the sub-game $\Gamma^p(\bar z_s)$.

We denote by $\Gamma^p_{\bar V}(\bar z_0)$ the extension of the game $\Gamma_V^p(\bar z_0)$, i.e., the game constructed on the basis of $\Gamma_V^p(\bar z_0)$ with the new characteristic function (26). The vector $\bar\xi$ is an imputation in the game $\Gamma^p_{\bar V}(\bar z_0)$ constructed from the respective vector $\xi \in C(\bar z_0)$ using (27).

Regularization of a cooperative solution. Given a cooperative solution $C(\bar z_0)$, we shall construct a regularized cooperative solution $\bar C(\bar z_0)$ such that $\bar C(\bar z_0)$ is time-consistent. Assume that $C(\bar z_k) \ne \emptyset$ for every sub-game $\Gamma_V^p(\bar z_k)$ and define the components $\bar\xi_i$ for $\xi^k \in C(\bar z_k)$, $i = 1, \dots, n$, $k = 0, \dots, l$, according to the formula

\[
\bar\xi_i = \sum_{m=0}^{l} \sum_{k=0}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m .
\tag{27}
\]

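One can readily see that the components $\bar\xi_i$ sum to $V(N, \bar z_0)$ in the game $\Gamma_V^p(z_0)$. Indeed, since $\sum_{i=1}^{n} \xi_i^k = V(N, \bar z_k)$ for every $k$,

\[
\sum_{i=1}^{n} \bar\xi_i = \sum_{i=1}^{n} \sum_{m=0}^{l} \sum_{k=0}^{m} h_i^{z_k}(\bar u^{z_k})\, p_m = V(N, \bar z_0).
\]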

Proposition 3.2. The vector $\bar\xi = (\bar\xi_1, \dots, \bar\xi_n)$ constructed according to (27) satisfies the property of individual rationality w.r.t. the characteristic function $\bar V(\cdot, \bar z_0)$, i.e., $\bar\xi_i \ge \bar V(\{i\}, \bar z_0)$, $\forall i \in N$.

Proof. Since $\xi^k$ is an imputation in the game $\Gamma_V^p(\bar z_k)$, it holds that $\xi_i^k \ge V(\{i\}, \bar z_k)$. Therefore,

\[
\bar\xi_i = \sum_{m=0}^{l} \sum_{k=0}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m \ge \sum_{m=0}^{l} \sum_{k=0}^{m} V(\{i\}, \bar z_k) \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m = \bar V(\{i\}, \bar z_0). \qquad \square
\]

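The regularized cooperative solution is thus defined as

\[
\bar C(\bar z_0) = \left\{ \bar\xi : \bar\xi_i = \sum_{m=0}^{l} \sum_{k=0}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m , \; i = 1, \dots, n, \; \forall \xi^k \in C(\bar z_k) \right\}.
\tag{28}
\]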

Finally, the new IDP $\bar\beta = \{\bar\beta_i^k\}$, $i = 1, \dots, n$, $k = 0, \dots, l$, $\bar\beta_i^k \ge 0$, is defined as

\[
\bar\beta_i^k = \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} .
\tag{29}
\]

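Note that at the last step $l$ the regularized payments coincide with the original ones, $\bar\beta_i^l = \beta_i^l$, $\forall i = 1, \dots, n$, since by (11) $V(N, \bar z_l) = \sum_{j=1}^{n} h_j^{z_l}(\bar u^{z_l})$. To check the validity of (29) we consider the sub-game $\Gamma_V^p(\bar z_s)$ and define

\[
\bar\xi_i^s = \frac{1}{1 - \sum_{r=0}^{s-1} p_r} \sum_{m=s}^{l} \sum_{k=s}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m .
\tag{30}
\]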

Obviously, $\sum_{i=1}^{n} \bar\xi_i^s = V(N, \bar z_s)$. Furthermore, we have

\[
\bar\xi_i = \sum_{m=0}^{s-1} \sum_{k=0}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m + \left( 1 - \sum_{r=0}^{s-1} p_r \right) \sum_{k=0}^{s-1} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} + \sum_{m=s}^{l} \sum_{k=s}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m .
\tag{31}
\]

Taking into account (16), (29) and (30), we obtain

\[
\bar\xi_i = \alpha_i(s-1) + \left( 1 - \sum_{j=0}^{s-1} p_j \right) \bar\xi_i^s .
\]

Thus we have proved the following theorem.

Theorem 3.2. The set $\bar C(\bar z_0)$ is a time-consistent cooperative solution in the regularized game $\Gamma^p_{\bar V}(\bar z_0)$.

Note that the IDP defined by (29) satisfies the balance condition defined in [16]. This means that the above defined payments are feasible, i.e., the total payment to all players at every step $k$ is equal to the amount earned by the players:

\[
\sum_{i=1}^{n} \bar\beta_i^k = \sum_{i=1}^{n} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} = \sum_{i=1}^{n} h_i^{z_k}(\bar u^{z_k}), \qquad \forall k = 0, \dots, l.
\tag{32}
\]

Thus, by using the payments (29), time-consistency of the cooperative solution $\bar C(\bar z_0)$ is guaranteed and no additional investment is required.
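The regularized payments (29) and the balance condition (32) lend themselves to a short numerical illustration. The Python sketch below uses hypothetical two-player data; the function names are assumptions, `H[k]` denotes the total stage payoff at step `k`, and the values `V_N[k]` are chosen to be consistent with `H` and `p` through (11).

```python
def regularized_idp(xi, H, V_N):
    """Form (29): beta_bar[k][i] = xi[k][i] * H[k] / V_N[k], where xi[k]
    is an imputation of the sub-game starting at step k."""
    return [[x * H[k] / V_N[k] for x in xi[k]] for k in range(len(H))]


def imputation_from_payments(beta_i, p):
    """Form (14) applied to one player's payment stream."""
    return sum(sum(beta_i[:m + 1]) * p[m] for m in range(len(p)))


# Hypothetical data for two players and three steps.
H = [4.0, 2.0, 6.0]            # total stage payoffs along the optimal trajectory
p = [0.25, 0.25, 0.5]          # stopping probabilities
V_N = [8.5, 6.0, 6.0]          # V(N, z_k), consistent with H and p via (11)
xi = [[5.0, 3.5], [4.0, 2.0], [3.0, 3.0]]   # sub-game imputations, rows sum to V_N[k]

beta_bar = regularized_idp(xi, H, V_N)

# Balance condition (32): payments at each step add up to the stage total.
balanced = all(abs(sum(row) - H[k]) < 1e-9 for k, row in enumerate(beta_bar))

# Regularized imputation (27) recovered from the payments via (14).
xi_bar = [imputation_from_payments([row[i] for row in beta_bar], p)
          for i in range(2)]
```

Because every factor in (29) is non-negative, the payments in `beta_bar` are non-negative by construction, and the components of `xi_bar` add up to `V_N[0]`.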

4. Regularization of the core and the Shapley value in the game $\Gamma_V^p(z_0)$

In this section we apply the described regularization procedure to two particular classes of cooperative solutions: the core and the Shapley value. It will be shown that while for a general cooperative solution the regularization has to be performed according to the scheme (28), the regularized core and the regularized Shapley vector can be obtained directly from the new characteristic function $\bar V(\cdot, \bar z_0)$. We first present the regularization procedure for the core and then specialize the obtained result to the Shapley value. This specialization is important in its own right, as the Shapley value need not belong to the core (it does if the characteristic function is convex). Moreover, the Shapley value is a uniquely defined imputation while the core is a set-valued cooperative solution. This allows us to consider the regularization procedure for the Shapley value in more detail. In Section 4.3 we describe a constructive algorithm, which is applied to a specific discrete game in Section 5.

4.1. Core regularization

Let the initially chosen cooperative solution $C(z_0)$ be the core (meaning that $\sum_{i \in S} \xi_i^k \ge V(S, \bar z_k)$, $\forall S$). We denote by $\bar C(z_0)$ the new cooperative solution (28) (the regularized core), constructed on the basis of the imputations $\xi$ from $C(z_0)$. Let, furthermore, $\hat C(z_0)$ be the core of the game $\Gamma^p_{\bar V}(z_0)$, constructed for the characteristic function $\bar V(S, z_0)$. We have the following result.



Lemma 4.1. $\bar C(z_0) \subseteq \hat C(z_0)$.

Proof. The imputation $\bar\xi$ belongs to the core $\hat C(z_0)$ if and only if the following condition is satisfied:

\[
\sum_{i \in S} \bar\xi_i \ge \bar V(S, z_0), \qquad \forall S \subset N .
\]

For any imputation $\bar\xi \in \bar C(z_0)$ we have

\[
\sum_{i \in S} \bar\xi_i = \sum_{i \in S} \sum_{m=0}^{l} \sum_{k=0}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m .
\]

Since $C(\bar z_k)$ is the core, it follows that for any $\xi^k \in C(\bar z_k)$

\[
\sum_{i \in S} \xi_i^k \ge V(S, \bar z_k), \qquad \forall S \subset N .
\]

Then

\[
\sum_{i \in S} \sum_{m=0}^{l} \sum_{k=0}^{m} \xi_i^k \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m \ge \sum_{m=0}^{l} \sum_{k=0}^{m} V(S, \bar z_k) \, \frac{\sum_{j=1}^{n} h_j^{z_k}(\bar u^{z_k})}{V(N, \bar z_k)} \, p_m .
\]

Since the characteristic function $\bar V(S, z_0)$ is given by (26), we have

\[
\sum_{i \in S} \bar\xi_i \ge \bar V(S, z_0), \qquad \forall S \subset N .
\]

Thus, $\bar C(z_0) \subseteq \hat C(z_0)$. □


Moreover, we also have the converse result.

Lemma 4.2. $\hat C(z_0) \subseteq \bar C(z_0)$.

Proof. Let us show that for each imputation $\hat\xi \in \hat C(z_0)$ there exists a sequence of imputations $\xi^k$ from the subgame cores $C(\bar z_k)$, $k = 0, \dots, l$, such that

\[
\hat\xi_i = \sum_{m=0}^{l}\sum_{k=0}^{m} \xi_i^k\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\cdot)}{V(N,\bar z_k)}\,p_m .
\]

Since $\hat\xi$ is an imputation, from the property of individual rationality and the definition of $\bar V$ (26) we get

\[
\hat\xi_i \ge \bar V(\{i\},\bar z_0) = \sum_{m=0}^{l}\sum_{k=0}^{m} V(\{i\},\bar z_k)\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\bar u^{z_k})\,p_m}{V(N,\bar z_k)} .
\]

Moreover, by summing the respective components we get

\[
\bar V(N;\bar z_0) = \sum_{i=1}^{n}\hat\xi_i . \tag{33}
\]

The non-negativeness of the utility functions $h_i^{z_k}(\bar u^{z_k})$ implies that there exist $\alpha_i^k \ge 0$, $i = 1,\dots,n$, $k = 0,\dots,m$, such that

\[
\hat\xi_i = \sum_{m=0}^{l}\sum_{k=0}^{m} \bigl(\alpha_i^k + V(\{i\},\bar z_k)\bigr)\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\bar u^{z_k})\,p_m}{V(N,\bar z_k)} .
\]

It is easy to see that

\[
\xi_i^k = \alpha_i^k + V(\{i\},\bar z_k), \quad i = 1,\dots,n,
\]

is an imputation in the game with the characteristic function $V(S;\bar z_k)$, because from (33) we get

\[
\sum_{i=1}^{n}\xi_i^k = V(N;\bar z_k); \qquad \xi_i^k \ge V(\{i\};\bar z_k).
\]

It remains to show that the imputation $\xi^k$ with the components $\xi_i^k = \alpha_i^k + V(\{i\},\bar z_k)$ belongs to the core $C(\bar z_k)$. For $\{\hat\xi_i\} \in \hat C(z_0)$, by the definition of the core we get

\[
\sum_{i\in S}\hat\xi_i = \sum_{m=0}^{l}\sum_{k=0}^{m}\sum_{i\in S} \bigl(\alpha_i^k + V(\{i\},\bar z_k)\bigr)\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\bar u^{z_k})\,p_m}{V(N,\bar z_k)}
\;\ge\; \bar V(S,\bar z_0) = \sum_{m=0}^{l}\sum_{k=0}^{m} V(S,\bar z_k)\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\bar u^{z_k})\,p_m}{V(N,\bar z_k)},
\]

and hence we have

\[
\sum_{i\in S}\bigl(\alpha_i^k + V(\{i\},\bar z_k)\bigr) \ge V(S,\bar z_k).
\]

We thus find $\{\xi_i^k\} = \{\alpha_i^k + V(\{i\},\bar z_k)\}$ which belongs to the core $C(\bar z_k)$. The lemma is proved. □

The preceding two lemmas lead to the following theorem.

Theorem 4.1. $\bar C(z_0) \equiv \hat C(z_0)$.

This way, the core $\hat C(z_0)$ constructed using the characteristic function $\bar V$ coincides with the set of imputations $\bar C(z_0)$ constructed by formula (27) on the basis of the imputations $\xi^k$ from the initial core $C(\bar z_0)$ and the subsequent cooperative solutions (subcores) $C(\bar z_k)$ defined along the cooperative trajectory $\bar z = (\bar z_0, \dots, \bar z_l)$. For the subgames we can show, using the same arguments, that $\bar C(z_k) \equiv \hat C(z_k)$, $\forall k = 0,\dots,l$.
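The inclusion of Lemma 4.1 can be checked numerically on a small instance. The following is a minimal sketch, not the authors' implementation: the three-player, two-stage data (`V`, `XI`, `H_SUM`, `P`) are hypothetical and chosen only so that $V(N,\bar z_0) = h^{(0)} + (1-p_0)h^{(1)}$ holds, and the helper names are ours. The stage weights are accumulated with the same double sum over $m$ and $k$ that appears in the proofs above.

```python
from itertools import combinations

PLAYERS = (1, 2, 3)

def cf(single, pair, grand):
    """Hypothetical symmetric c.f.: the worth depends only on the coalition size."""
    worth = {1: single, 2: pair, 3: grand}
    return {frozenset(S): worth[r] for r in (1, 2, 3) for S in combinations(PLAYERS, r)}

def stage_weights(h_sum, v_grand, p):
    """Coefficient of stage k in the double sum: sum over m >= k of p_m * h_sum[k] / V(N, z_k)."""
    w = [0.0] * len(p)
    for m in range(len(p)):
        for k in range(m + 1):
            w[k] += h_sum[k] / v_grand[k] * p[m]
    return w

def in_core(x, v, tol=1e-9):
    """True if x is efficient and no coalition S can improve upon it."""
    grand = frozenset(PLAYERS)
    efficient = abs(sum(x.values()) - v[grand]) <= tol
    return efficient and all(sum(x[i] for i in S) >= v[S] - tol for S in v)

# Hypothetical data: two stages (l = 1), termination probabilities p_0, p_1.
P = (0.5, 0.5)
H_SUM = (4.5, 9.0)                           # total stage payoffs on the cooperative path
V = [cf(0, 3, 9), cf(1, 4, 9)]               # c.f. of the subgames at z_0 and z_1
XI = [{1: 3, 2: 3, 3: 3}, {1: 2, 2: 3, 3: 4}]  # imputations from the subgame cores

w = stage_weights(H_SUM, [v[frozenset(PLAYERS)] for v in V], P)
v_bar = {S: sum(wk * v[S] for wk, v in zip(w, V)) for S in V[0]}
xi_bar = {i: sum(wk * x[i] for wk, x in zip(w, XI)) for i in PLAYERS}

print(xi_bar, in_core(xi_bar, v_bar))
```

In this instance the regularized imputation `xi_bar` satisfies all coalition constraints of the regularized c.f. `v_bar`, as the lemma predicts; the sketch does not test the converse inclusion.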


4.2. The Shapley value regularization

Let the players choose the Shapley value [38] as the cooperative solution. In the case when the components of the Shapley value cannot be represented in the form (19) with non-negative payments at every step $k$, another cooperative solution (the regularized Shapley value) has to be constructed, which would ensure the sustainability of the chosen cooperative agreement.

Theorem 4.2. The Shapley value calculated for $\bar V(S,\bar z_0)$, $S \subseteq N$, given by (26), is time-consistent in the game $\Gamma^p_{\bar V}(\bar z_0)$.

Proof. The components of the Shapley value in the game $\Gamma^p_{\bar V}(\bar z_0)$ with the characteristic function $\bar V(S,\bar z_0)$ are defined as

\[
\overline{Sh}_i(\bar z_0) = \sum_{\substack{S\subset N\\ i\in S}} \frac{(n-s)!\,(s-1)!}{n!}\,\bigl[\bar V(S,\bar z_0) - \bar V(S\setminus\{i\},\bar z_0)\bigr]. \tag{34}
\]

Taking into account (26) and (34), the expression for the components of the regularized Shapley value (RSV) $\overline{Sh}_i(\bar z_0)$ is given by

\[
\overline{Sh}_i(z_0) = \sum_{m=0}^{l}\sum_{k=0}^{m}\sum_{\substack{S\subset N\\ i\in S}} \frac{(n-s)!\,(s-1)!}{n!}\,\bigl[V(S,\bar z_k) - V(S\setminus\{i\},\bar z_k)\bigr]\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\bar u^{k})}{V(N,\bar z_k)}\,p_m
= \sum_{m=0}^{l}\sum_{k=0}^{m} Sh_i(\bar z_k)\,\frac{\sum_{i=1}^{n} h_i^{z_k}(\bar u^{k})}{V(N,\bar z_k)}\,p_m .
\]

The defined RSV is represented in the form (27). This implies that the Shapley value $\overline{Sh}_i(\bar z_0)$, calculated for the characteristic function $\bar V(S,\bar z_0)$, is time-consistent. □

According to the proved theorem, the RSV can be constructed in two ways: by computing the new IDP (29), or by computing the Shapley value for the new characteristic function (c.f.) $\bar V(S,\bar z_0)$ (26).
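For reference, formula (34) can be evaluated directly for any characteristic function stored as a table. The sketch below is an illustration under our own conventions (coalitions as `frozenset` keys, the empty coalition mapped to zero; the function name `shapley` is ours). The usage example takes the values $V(\{1\},z_0)=5.5$, $V(\{2\},z_0)=5.25$ and $V(\{1,2\},z_0)=20.5$ reported for the game of Section 5.1.

```python
from itertools import combinations
from math import factorial

def shapley(v, n):
    """Shapley value of an n-player game by the weighted marginal-contribution sum (34).

    v maps frozenset coalitions of players 1..n to their worth and must
    contain the empty coalition with worth 0.
    """
    players = range(1, n + 1)
    result = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(n):                      # size of S \ {i}
            for rest in combinations(others, size):
                S = frozenset(rest) | {i}
                s = len(S)
                weight = factorial(n - s) * factorial(s - 1) / factorial(n)
                total += weight * (v[S] - v[S - {i}])
        result[i] = total
    return result

# Two-player c.f. at z_0 from Section 5.1.
v0 = {
    frozenset(): 0.0,
    frozenset({1}): 5.5,
    frozenset({2}): 5.25,
    frozenset({1, 2}): 20.5,
}
sh = shapley(v0, 2)
print(sh)
```

Since (34) is linear in the characteristic function, applying the same function to $\bar V$ or to each $V(\cdot,\bar z_k)$ separately and then taking the weighted sum gives the same vector, which is the observation behind the proof above.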

4.3. An algorithm for the regularization of the Shapley value in the game $\Gamma_V(z_0)$

Below, we present an algorithm to regularize the Shapley value for the described discrete game. Note that this algorithm can be used for the regularization of any cooperative solution, with the obvious changes made when computing the imputations.

Stage 0. Computation of the Shapley value $\{Sh_i\}$ in the game $\Gamma^p_V(\bar z_0)$.

(1) Find the optimal trajectory $\bar z$ along with the corresponding optimal strategies $\bar u = (\{\bar u_i^{z_0}\}, \dots, \{\bar u_i^{l}\})$ which maximize the total expected payoff (6). We use the shorthand notation $u_i^k$ to denote the chosen strategy at the step $k$ (at the vertex $z_k$).

(2) Compute the value $V(N,\bar z_0)$ by (10).

(3) To compute $V(S,\bar z_0)$, a number of auxiliary games based on $\Gamma^p(z_0)$ are considered, in which $S$ is the maximizing player and $N\setminus S$ is the minimizing player. The values $V(S,\bar z_0)$ for all $S\subset N$ are calculated by (8).

(4) The components of the Shapley value are calculated by (34).

Stage 1. Computation of the components of the Shapley value $\{Sh_i^k\}$, $i = 1,\dots,n$, in the sub-games $\Gamma^p_V(\bar z_k)$, $k = 1,\dots,l$.

(1) Consider the sub-game $\Gamma^p_V(\bar z_1)$ with initial state $\bar z_1$. According to Bellman's Principle of Optimality (12), the truncated optimal trajectory $\bar z_1,\dots,\bar z_l$ (or, to be more precise, the respective strategies) will maximize the expected payoff in the sub-game $\Gamma^p_V(\bar z_1)$:

\[
V(N,\bar z_1) = \frac{1}{1-p_0}\sum_{i=1}^{n}\sum_{m=1}^{l}\Bigl(\sum_{k=1}^{m} h_i^{z_k}(\bar u^{z_k})\Bigr)p_m . \tag{35}
\]

To compute $V(S,\bar z_1)$, $S\subset N$, we use

\[
V(S,\bar z_1) = \frac{1}{1-p_0}\,\max_{u_S^{\bar z_1,\dots,z_l}}\;\min_{u_{N\setminus S}^{\bar z_1,\dots,z_l}}\;\sum_{i\in S}\sum_{m=1}^{l}\Bigl(\sum_{k=1}^{m} h_i^{z_k}(u^{z_k})\Bigr)p_m . \tag{36}
\]

Next, the Shapley value $Sh^1(z_1)$ in the sub-game $\Gamma^p_V(\bar z_1)$ is computed according to [38].

(k) At the step $k$, the values $V(N,\bar z_k)$, $V(S,\bar z_k)$, $S\subset N$, are obtained using (11), (13). Compute the Shapley value.


Fig. 1. A tree graph of the game.

(l) At the last step $l$, the values $V(N,\bar z_l)$, $V(S,\bar z_l)$, $S\subset N$, are obtained as the values of the game $\Gamma(\bar z_l)$ and the corresponding zero-sum game. Compute the Shapley value.

Stage 2. Time-consistency check. Compute the components of the IDP by (21). If all $\beta_i^k$ are non-negative, then the constructed Shapley value $Sh_i(\bar z_0)$ is dynamically stable. Otherwise proceed to the next stage.

Stage 3. Construction of the RSV.

(1) Compute the new payments to the players at every step, $\bar\beta_i^k$, using (29), where $\xi_i^k$ corresponds to component $i$ of the Shapley value $Sh_i^k$ in the sub-game $\Gamma^p_V(\bar z_k)$.

(2) Using the new IDP, compute the RSV $\overline{Sh}_i$ by (14) or (15). Thus

\[
\overline{Sh} = \{\overline{Sh}_i\} = \Bigl\{\bar\beta_i^0 + \bar\beta_i^1(1-p_0) + \dots + \bar\beta_i^l\Bigl(1-\sum_{m=0}^{l-1}p_m\Bigr)\Bigr\}_{i=1,\dots,n} \tag{37}
\]

is a dynamically stable cooperative solution.

When performing the regularization of the core, one has to apply the same approach to every imputation from the core.

5. Examples

5.1. A time-consistent solution in the 2-players multistage cooperative game

Consider a cooperative game $\bar\Gamma^p(z_0)$ of two players defined on the graph $G$ shown in Fig. 1. The set of vertices $Z$ of the graph $G$ consists of the root vertex $z_0$ and the successor sets

\[
\begin{aligned}
M(z_0) &= \{z_{1,1}, z_{1,2}, z_{1,3}, z_{1,4}\};\\
M(z_{1,1}) &= \{z_{2,1}, z_{2,2}, z_{2,3}, z_{2,4}\};\\
M(z_{1,2}) &= \{z_{2,5}, z_{2,6}, z_{2,7}, z_{2,8}\};\\
M(z_{1,3}) &= \{z_{2,9}, z_{2,10}, z_{2,11}, z_{2,12}\};\\
M(z_{1,4}) &= \{z_{2,13}, z_{2,14}, z_{2,15}, z_{2,16}\},
\end{aligned}
\]

where $M(z)$ is the set of successor vertices of $z$. The game is formulated as follows: at the initial step the players enter the game $\Gamma(z_0)$ determined by the bi-matrix

\[
H_0 = \begin{pmatrix} (5;5) & (0;8)\\ (8;0) & (1;1) \end{pmatrix}.
\]

The game $\bar\Gamma^p(z_0)$ either ends with probability $p_0 = 0.25$ or goes over to the next vertex $z_1 \in M(z_0) = \{z_{1,1}, z_{1,2}, z_{1,3}, z_{1,4}\}$. This transition depends on the controls chosen by the players at the vertex $z_0$. If the situation $(1,1)$ is realized, the game moves from $z_0$ to $z_1 = z_{1,1}$. Respectively, the situations $(1,2)$, $(2,1)$, and $(2,2)$ result in the transition to the vertices $z_{1,2}$, $z_{1,3}$, and $z_{1,4}$.


Table 1. Values of the total expected payoff $\sum_{i=1,2} K_i(z_0)$.

Trajectory                          Total expected payoff
(z_0, z_{1,1}, z_{2,1})             15
(z_0, z_{1,1}, z_{2,2})             20.5
(z_0, z_{1,1}, z_{2,3})             19
(z_0, z_{1,1}, z_{2,4})             16
(z_0, z_{1,2}, z_{2,5})             19.75
(z_0, z_{1,2}, z_{2,6})             15.5
(z_0, z_{1,2}, z_{2,7})             11.75
(z_0, z_{1,2}, z_{2,8})             12.5
(z_0, z_{1,3}, z_{2,9})             12.25
(z_0, z_{1,3}, z_{2,10})            12.5
(z_0, z_{1,3}, z_{2,11})            10.25
(z_0, z_{1,3}, z_{2,12})            13.25
(z_0, z_{1,4}, z_{2,13})            12.25
(z_0, z_{1,4}, z_{2,14})            10.25
(z_0, z_{1,4}, z_{2,15})            8
(z_0, z_{1,4}, z_{2,16})            14

At $z_1$, the players play the game $\Gamma(z_1)$, which is defined by one of the following bi-matrices depending on the outcome of the game at the previous stage:

\[
H_1 = \begin{bmatrix} (3;0) & (6;4)\\ (5;6) & (2;2) \end{bmatrix};\quad
H_2 = \begin{bmatrix} (1;11) & (4;2)\\ (1;3) & (1;1) \end{bmatrix};\quad
H_3 = \begin{bmatrix} (1;1) & (0;2)\\ (2;0) & (1;2) \end{bmatrix};\quad
H_4 = \begin{bmatrix} (5;5) & (6;1)\\ (1;6) & (6;6) \end{bmatrix}.
\]

At this step the game can either end with probability $p_1 = 0.5$ or go over to the next vertex $z_2$. The transition to the vertex $z_2$ is governed by the same rules as for the vertex $z_1$, and the games $\Gamma(z_2)$ are defined by the same bi-matrices as above. The probability that the game ends at the second step is equal to $p_2 = 0.25$.

First, we solve the simultaneous games $\Gamma_1,\dots,\Gamma_4$. The sub-games played at the last step have the following values of the characteristic function:

\[
\begin{aligned}
V(\{1,2\}, z_{2,1}) &= V(\{1,2\}, z_{2,5}) = V(\{1,2\}, z_{2,9}) = V(\{1,2\}, z_{2,13}) = 5 + 6 = 11;\\
V(\{1\}, z_{2,1}) &= V(\{1\}, z_{2,5}) = V(\{1\}, z_{2,9}) = V(\{1\}, z_{2,13}) = 3;\\
V(\{2\}, z_{2,1}) &= V(\{2\}, z_{2,5}) = V(\{2\}, z_{2,9}) = V(\{2\}, z_{2,13}) = 2;\\
V(\{1,2\}, z_{2,2}) &= V(\{1,2\}, z_{2,6}) = V(\{1,2\}, z_{2,10}) = V(\{1,2\}, z_{2,14}) = 1 + 11 = 12;\\
V(\{1\}, z_{2,2}) &= V(\{1\}, z_{2,6}) = V(\{1\}, z_{2,10}) = V(\{1\}, z_{2,14}) = 1;\\
V(\{2\}, z_{2,2}) &= V(\{2\}, z_{2,6}) = V(\{2\}, z_{2,10}) = V(\{2\}, z_{2,14}) = 3;\\
V(\{1,2\}, z_{2,3}) &= V(\{1,2\}, z_{2,7}) = V(\{1,2\}, z_{2,11}) = V(\{1,2\}, z_{2,15}) = 1 + 2 = 3;\\
V(\{1\}, z_{2,3}) &= V(\{1\}, z_{2,7}) = V(\{1\}, z_{2,11}) = V(\{1\}, z_{2,15}) = 1;\\
V(\{2\}, z_{2,3}) &= V(\{2\}, z_{2,7}) = V(\{2\}, z_{2,11}) = V(\{2\}, z_{2,15}) = 2;\\
V(\{1,2\}, z_{2,4}) &= V(\{1,2\}, z_{2,8}) = V(\{1,2\}, z_{2,12}) = V(\{1,2\}, z_{2,16}) = 6 + 6 = 12;\\
V(\{1\}, z_{2,4}) &= V(\{1\}, z_{2,8}) = V(\{1\}, z_{2,12}) = V(\{1\}, z_{2,16}) = 5;\\
V(\{2\}, z_{2,4}) &= V(\{2\}, z_{2,8}) = V(\{2\}, z_{2,12}) = V(\{2\}, z_{2,16}) = 5.
\end{aligned}
\tag{38}
\]
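The last-step values in (38) can be recomputed from the bi-matrices. The sketch below rests on two assumptions that are ours: the bi-matrices are read row by row with entries (payoff of player 1; payoff of player 2), and the single-player values are taken as maximin values in pure strategies, with player 1 choosing the row and player 2 the column. Under these assumptions the computation reproduces (38); the function name `last_step_values` is hypothetical.

```python
# Bi-matrices H1..H4 of Section 5.1, read row by row:
# H[g][r][c] = (payoff of player 1, payoff of player 2) in situation (r+1, c+1).
H = {
    1: [[(3, 0), (6, 4)], [(5, 6), (2, 2)]],
    2: [[(1, 11), (4, 2)], [(1, 3), (1, 1)]],
    3: [[(1, 1), (0, 2)], [(2, 0), (1, 2)]],
    4: [[(5, 5), (6, 1)], [(1, 6), (6, 6)]],
}

def last_step_values(game):
    """Return (V({1,2}), V({1}), V({2})) for a one-shot 2x2 bi-matrix game."""
    # The grand coalition takes the cell with the largest joint payoff.
    grand = max(a + b for row in game for a, b in row)
    # Player 1 picks a row and is guaranteed the row minimum of the first entries.
    v1 = max(min(a for a, _ in row) for row in game)
    # Player 2 picks a column and is guaranteed the column minimum of the second entries.
    v2 = max(min(game[r][c][1] for r in range(2)) for c in range(2))
    return grand, v1, v2

values = {g: last_step_values(game) for g, game in H.items()}
for g, (v12, v1, v2) in values.items():
    print(f"game H{g}: V(12) = {v12}, V(1) = {v1}, V(2) = {v2}")
```

The game $H_g$ is played at the vertices $z_{2,g}$, $z_{2,g+4}$, $z_{2,g+8}$ and $z_{2,g+12}$, which is why each row of (38) lists four equal values.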

Assume that the players $\{1,2\}$ have agreed to divide the total amount using the Shapley value. For a 2-player game the components of the Shapley value are calculated by

\[
\begin{aligned}
Sh_1(z) &= \frac{1}{2}V(1,z) + \frac{V(\{1,2\},z) - V(2,z)}{2},\\
Sh_2(z) &= \frac{1}{2}V(2,z) + \frac{V(\{1,2\},z) - V(1,z)}{2}.
\end{aligned}
\tag{39}
\]

We shall now apply the algorithm described in Section 4.3.

Stage 0. For the game $\Gamma^p(z_0)$, the maximum value of the total expected payoff (10) over the 16 possible paths is computed. For the probability distribution $p_0 = 0.25$, $p_1 = 0.5$, $p_2 = 0.25$ we have

\[
\sum_{i=1,2} K_i(z_0) = \bigl(h_1^{(0)} + h_2^{(0)}\bigr) + 0.75\,\bigl(h_1^{(1)} + h_2^{(1)}\bigr) + 0.25\,\bigl(h_1^{(2)} + h_2^{(2)}\bigr), \tag{40}
\]

i=1,2 (k)

(k)

(2)

(2)

where h1 , h2 are the payoffs of the respective players in the game at the step k. At the last step, h1 + h2 = V ({1, 2}, z2 ), i.e. it is computed by (38), where z2 — one of the 16 ∑ possible vertices z2,1 , . . . , z2,16 . The obtained values of the total expected payoff Ki are shown in Table 1 for every possible trajectory. Thus, the optimal trajectory is z¯ = (z¯0 , z¯1 , z¯2 ) = (z0 , z1,1 , z2,2 ) along which the players obtain the maximal expected value of the total payoff, i.e. V ({1, 2}, z¯0 ) = 20.5. To find the values V ({1}, z0 ), V ({2}, z0 ) we consider auxiliary zero-sum games in which the payoff matrices are derived from the bi-matrices H1 , H2 , H3 , H4 according to the following rule: for zero-sum game when the first player {1} is maximizing, Please cite this article in press as: E.V. Gromova, T.M. Plekhanova, On the regularization of a cooperative solution in a multistage game with random time horizon, Discrete Applied Mathematics (2018), https://doi.org/10.1016/j.dam.2018.08.008.

E.V. Gromova, T.M. Plekhanova / Discrete Applied Mathematics (

)



13

the first values from bi-matrices are used; for zero-sum game when the second player {2} is maximizing, the second values from bi-matrices are used. Taking one branch of the game, we calculate the values of the expected payoff for the first player: (z0 , z1,1 , z2,1 )

(z0 , z1,1 , z2,2 )

(z0 , z1,1 , z2,3 )

(z0 , z1,1 , z2,4 )

8

9.75

9

7.75
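The entries of Table 1 follow from (40) by direct enumeration of the 16 paths. The following sketch assumes the row-by-row reading of the bi-matrices $H_0,\dots,H_4$ and the numbering of successors used in the text (situation $(r,c)$ leads to successor number $2(r-1)+c$); the names `index`, `table` and `best` are ours.

```python
from itertools import product

H0 = [[(5, 5), (0, 8)], [(8, 0), (1, 1)]]
H = {
    1: [[(3, 0), (6, 4)], [(5, 6), (2, 2)]],
    2: [[(1, 11), (4, 2)], [(1, 3), (1, 1)]],
    3: [[(1, 1), (0, 2)], [(2, 0), (1, 2)]],
    4: [[(5, 5), (6, 1)], [(1, 6), (6, 6)]],
}
V_LAST = {1: 11, 2: 12, 3: 3, 4: 12}  # V({1,2}, z_2) from (38), keyed by the game played at z_2
P = (0.25, 0.5, 0.25)                  # p_0, p_1, p_2

def index(r, c):
    """Zero-based situation (r, c) -> number 1..4 of the successor vertex and of its game."""
    return 2 * r + c + 1

table = {}
for (r0, c0), (r1, c1) in product(product(range(2), repeat=2), repeat=2):
    j1 = index(r0, c0)                  # vertex z_{1,j1}, where the game H[j1] is played
    j2 = 4 * (j1 - 1) + index(r1, c1)   # vertex z_{2,j2}
    h0 = sum(H0[r0][c0])                # joint payoff at step 0
    h1 = sum(H[j1][r1][c1])             # joint payoff at step 1
    h2 = V_LAST[index(r1, c1)]          # joint payoff at the last step
    table[j1, j2] = h0 + (1 - P[0]) * h1 + (1 - P[0] - P[1]) * h2

best = max(table, key=table.get)
print(best, table[best])
```

The maximum is attained on the path through $z_{1,1}$ and $z_{2,2}$, in agreement with the optimal trajectory stated above.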

Obviously, these values are also available to the second player. In the following the game will be considered "bottom-up", i.e., from the last step upwards. At the first step the first and the second players play the zero-sum game defined by the following matrix:

\[
\begin{pmatrix} 8 & 9.75\\ 9 & 7.75 \end{pmatrix}.
\]

Then the maximizing player can secure the payoff equal to 8 by selecting the strategy 1 (row 1) at the vertex $z_{1,1}$. For the remaining branches we can perform similar calculations to obtain the expected payoff values of the first player: 1.5 at the vertex $z_{1,2}$, 9.75 at the vertex $z_{1,3}$, and 5.5 at the vertex $z_{1,4}$. Now we can go one step upwards and consider the game $\Gamma(z_0)$ with the new matrix

\[
\begin{pmatrix} 8 & 1.5\\ 9.75 & 5.5 \end{pmatrix}.
\]

By choosing row 2, the player $\{1\}$ secures the payoff equal to 5.5, which is attained in the situation $(2,2)$. Thus, in the zero-sum game starting at $z_0$ the first (maximizing) player chooses the trajectory $(z_0, z_{1,4}, z_{2,13})$, for which the payoff can be guaranteed to be 5.5. Thus, $V(\{1\}, z_0) = 5.5$. In a similar way, we get $V(\{2\}, z_0) = 5.25$. Now we can calculate the Shapley value in the game $\Gamma^p(z_0)$ by (39): $Sh_1 = 10.375$, $Sh_2 = 10.125$. This completes Stage 0.

Stage 1. Compute the Shapley value for all sub-games evolving along the optimal trajectory $\bar z = (\bar z_0, \bar z_1, \bar z_2) = (z_0, z_{1,1}, z_{2,2})$. Note that for the game $\Gamma(\bar z_2)$ starting at the last step we have all the information required to calculate the Shapley value. Using the values $V(\{1,2\}, z_{2,2})$, $V(\{1\}, z_{2,2})$, $V(\{2\}, z_{2,2})$ obtained in (38), we get $Sh_1^2 = 5$, $Sh_2^2 = 7$.

The sub-game $\Gamma(\bar z_1)$ includes four possible variants of its evolution from the position $\bar z_1 = z_{1,1}$. According to Bellman's Principle of Optimality, the value of $V(\{1,2\},\bar z_1)$ can be computed as a sum of expected payoffs along the part of the optimal trajectory $(z_{1,1}, z_{2,2})$ by

\[
\sum_{i=1,2} K_i(\bar z_1) = \bigl(h_1^{(1)} + h_2^{(1)}\bigr) + \frac{1}{3}\bigl(h_1^{(2)} + h_2^{(2)}\bigr). \tag{41}
\]

In (41), the coefficient $\frac{p_2}{1-p_0} = \frac{1}{3}$ is the probability that the game ends at the second step provided it has not finished at the initial step. Therefore $\frac{1}{3}$ corresponds to $p_2$. The players will obtain the amount $h_1^{(1)} + h_2^{(1)}$ with the probability $1-p_0$ under the assumption that the game will evolve to the sub-game $\Gamma(\bar z_1)$, i.e., the respective probability is equal to $\frac{1-p_0}{1-p_0} = 1$. Thus, $V(\{1,2\},\bar z_1) = 14$.

To calculate $V(\{1\},\bar z_1)$, $V(\{2\},\bar z_1)$ we consider zero-sum games starting at $\bar z_1 = z_{1,1}$ with the payoff function

\[
K_i(\bar z_1) = h_i^{(1)} + \frac{1}{3}h_i^{(2)}. \tag{42}
\]

Following a procedure similar to that applied above, we get $V(\{1\},\bar z_1) = 4$ and $V(\{2\},\bar z_1) = 3\tfrac{2}{3}$. Finally, we compute the Shapley value for the sub-game $\bar\Gamma(\bar z_1)$ using the obtained results: $Sh_1^1 = 7\tfrac{1}{6}$, $Sh_2^1 = 6\tfrac{5}{6}$. Stage 1 is completed.

Stage 2. Checking for time-consistency. Compute the IDP $\beta_i^k$ by (21) using the previously computed components of the Shapley value in the games $\bar\Gamma^p(z_0)$, $\bar\Gamma(\bar z_1)$, $\bar\Gamma(\bar z_2)$. We have:

\[
\begin{aligned}
\beta_1^0 &= 5; & \beta_1^1 &= 5.5; & \beta_1^2 &= 5;\\
\beta_2^0 &= 5; & \beta_2^1 &= 4.5; & \beta_2^2 &= 7.
\end{aligned}
\tag{43}
\]
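Stages 0 to 2 of this example can be reproduced from the bi-matrices alone by backward induction. The sketch below makes three assumptions that should be read as ours: the auxiliary zero-sum games are solved in pure strategies (player 1 picks rows, player 2 picks columns); the stage payoff at step $k$ of a sub-game starting at step $s$ is weighted by the conditional probability of reaching step $k$; and the IDP is obtained from $\beta_i^k = Sh_i^k - \frac{1-\sum_{m\le k}p_m}{1-\sum_{m<k}p_m}\,Sh_i^{k+1}$ with $\beta_i^l = Sh_i^l$, which we take to be the content of (21) since it is the recursion implied by the representation of $Sh_i$ through $\beta_i^k$. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

H0 = [[(5, 5), (0, 8)], [(8, 0), (1, 1)]]
H = {
    1: [[(3, 0), (6, 4)], [(5, 6), (2, 2)]],
    2: [[(1, 11), (4, 2)], [(1, 3), (1, 1)]],
    3: [[(1, 1), (0, 2)], [(2, 0), (1, 2)]],
    4: [[(5, 5), (6, 1)], [(1, 6), (6, 6)]],
}
P = [F(1, 4), F(1, 2), F(1, 4)]
TAIL = [F(1) - sum(P[:k]) for k in range(len(P))]   # probability of reaching step k

def value(coalition, game, w):
    """Value of a coalition in the sub-game that starts with the bi-matrix `game`.

    coalition: 0 or 1 for a single player (pure-strategy maximin), "N" for both players.
    w: weights of the stage payoffs from the current step to the last one.
    """
    cell = {}
    for r in range(2):
        for c in range(2):
            a, b = game[r][c]
            stage = a + b if coalition == "N" else (a, b)[coalition]
            cont = value(coalition, H[2 * r + c + 1], w[1:]) if len(w) > 1 else 0
            cell[r, c] = w[0] * stage + cont
    if coalition == "N":
        return max(cell.values())
    if coalition == 0:   # player 1 picks the row, the opponent the column
        return max(min(cell[r, 0], cell[r, 1]) for r in range(2))
    return max(min(cell[0, c], cell[1, c]) for c in range(2))

def shapley2(v1, v2, v12):
    """Two-player Shapley value, formula (39)."""
    return (v1 + v12 - v2) / 2, (v2 + v12 - v1) / 2

# Games played along the optimal trajectory (z_0, z_{1,1}, z_{2,2}).
path_games = [H0, H[1], H[2]]
sh = []
for k, game in enumerate(path_games):
    w = [TAIL[j] / TAIL[k] for j in range(k, len(P))]
    sh.append(shapley2(value(0, game, w), value(1, game, w), value("N", game, w)))

last = len(sh) - 1
beta = [tuple(sh[k][i] - TAIL[k + 1] / TAIL[k] * sh[k + 1][i] for i in range(2))
        for k in range(last)]
beta.append(sh[last])
print([tuple(float(x) for x in b) for b in beta])
```

Under these assumptions the computed Shapley values and payments coincide with the values reported in Stages 0 to 2, including (43).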

Thus, at every step all payments are non-negative and realizable. Therefore the constructed Shapley value $Sh_1 = 10.375$, $Sh_2 = 10.125$ is time-consistent in the game $\bar\Gamma(\bar z_0)$. We can check that if player $i$ gets the value $\beta_i^k$ according to (43) at every step, then in total the players get $Sh_1 = \beta_1^0 + (1-p_0)\beta_1^1 + (1-p_0-p_1)\beta_1^2 = 10.375$ and $Sh_2 = \beta_2^0 + (1-p_0)\beta_2^1 + (1-p_0-p_1)\beta_2^2 = 10.125$.

5.2. Regularization of the Shapley value in the 2-players multistage game

In the previous example the Shapley value turned out to be time-consistent. Now we consider an example in which the Shapley value is not time-consistent and hence should be regularized according to Stage 3 of the algorithm presented in Section 4.3.


Fig. 2. Game tree G2 .

To do so, we add one more step to the game. The Shapley value will turn out to be time-inconsistent, which agrees with the intuition: as the number of steps (the duration of the game) increases, so do the odds that time inconsistency occurs.

Consider the multistage cooperative game $\bar\Gamma^p(z_0)$ with the game graph $G_2$ shown in Fig. 2. Note that the height of $G_2$ is equal to 3. The probabilities $p_k$, $k = 0, 1, 2, 3$, are $p_0 = 0.25$, $p_1 = 0$, $p_2 = 0.25$, and $p_3 = 0.5$. At the vertex $z_0$ the players participate in the game $\Gamma(z_0)$ determined by the bi-matrix

\[
H_0 = \begin{pmatrix} (0;0) & (0;1)\\ (1;0) & (0;0) \end{pmatrix}.
\]

The matrices $H_i$, $i = 1,\dots,4$, are the same as in Section 5.1. This problem is solved in a way similar to that described in Section 5.1. The maximum total expected payoff in $\bar\Gamma^p(z_0)$ is computed by exhaustive search over all trajectories of the game. It is found that this maximum is achieved on the trajectory $\bar z = (z_0, z_{1,4}, z_{2,16}, z_{3,64})$ and is equal to $V(\{1,2\}, z_0) = 24$. The values of the respective auxiliary zero-sum games are $V(\{1\}, z_0) = 4.75$ and $V(\{2\}, z_0) = 6.25$. The Shapley value for the game $\bar\Gamma(\bar z_0)$ is hence $Sh_1 = 11.25$, $Sh_2 = 12.75$.

In the next step we consider the sub-games starting from the vertices along the optimal trajectory. First, consider the sub-game starting at $\bar z_1 = z_{1,4}$. We have $V(\{1,2\},\bar z_1) = 32$, $V(\{1\},\bar z_1) = 9$, $V(\{2\},\bar z_1) = 10$, and the respective Shapley values are $Sh_1^1 = 15.5$, $Sh_2^1 = 16.5$. Going one step ahead, we get the following results for the sub-game starting at $\bar z_2 = z_{2,16}$: $V(\{1,2\},\bar z_2) = 20$, $V(\{1\},\bar z_2) = 6\tfrac{2}{3}$, $V(\{2\},\bar z_2) = 6\tfrac{1}{3}$, and $Sh_1^2 = 10\tfrac{1}{6}$, $Sh_2^2 = 9\tfrac{5}{6}$. Finally, we consider the last sub-game starting at $\bar z_3 = z_{3,64}$ to get $V(\{1,2\},\bar z_3) = 12$, $V(\{1\},\bar z_3) = 5$, $V(\{2\},\bar z_3) = 5$, and $Sh_1^3 = 6$, $Sh_2^3 = 6$.

The values of the IDP $\beta_i^k$, $i = 1, 2$, $k = 0, 1, 2, 3$, are computed by (21) to yield

\[
\begin{aligned}
\beta_1^0 &= -0.375; & \beta_1^1 &= 5\tfrac{1}{3}; & \beta_1^2 &= 6\tfrac{1}{6}; & \beta_1^3 &= 6;\\
\beta_2^0 &= 0.375; & \beta_2^1 &= 6\tfrac{2}{3}; & \beta_2^2 &= 5\tfrac{5}{6}; & \beta_2^3 &= 6.
\end{aligned}
\tag{44}
\]

The constructed Shapley value $\{Sh_i\}$ is time-inconsistent because $\beta_1^0 < 0$. We therefore regularize the obtained Shapley value according to Stage 3 of the algorithm. Introducing the new payoffs $\bar\beta_i^k$ by (29), we obtain

\[
\begin{aligned}
\bar\beta_1^0 &= 0; & \bar\beta_1^1 &= 5\tfrac{13}{16}; & \bar\beta_1^2 &= 6.1; & \bar\beta_1^3 &= 6;\\
\bar\beta_2^0 &= 0; & \bar\beta_2^1 &= 6\tfrac{3}{16}; & \bar\beta_2^2 &= 5.9; & \bar\beta_2^3 &= 6.
\end{aligned}
\tag{45}
\]

Observe that the sum of the new payoffs at every step $k$ is equal to the earned sum $h_1^{(k)} + h_2^{(k)}$. Compute the regularized Shapley value based on the IDP (45) by (15). The final result is $\overline{Sh}_1 = 11\tfrac{299}{320}$, $\overline{Sh}_2 = 12\tfrac{21}{320}$. One can see that the constructed RSV is an imputation, i.e., $\overline{Sh}_1 + \overline{Sh}_2 = V(\{1,2\}, z_0) = 24$.
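Stages 2 and 3 for this example can be traced with exact arithmetic. The sketch below starts from the Shapley values and grand-coalition values reported above and relies on assumptions of ours: the IDP recursion $\beta_i^k = Sh_i^k - \frac{1-\sum_{m\le k}p_m}{1-\sum_{m<k}p_m}\,Sh_i^{k+1}$ stands in for (21); the regularized payment is taken as $\bar\beta_i^k = Sh_i^k\,\bigl(h_1^{(k)}+h_2^{(k)}\bigr)/V(\{1,2\},\bar z_k)$, which is how we read (29) from the form of (27); and the stage totals `H_SUM` are read off the bi-matrices along the optimal trajectory. The function names are hypothetical.

```python
from fractions import Fraction as F

P = [F(1, 4), F(0), F(1, 4), F(1, 2)]                      # p_0..p_3
SH = [(F(45, 4), F(51, 4)), (F(31, 2), F(33, 2)),          # Sh^k along the optimal trajectory
      (F(61, 6), F(59, 6)), (F(6), F(6))]
V_GRAND = [F(24), F(32), F(20), F(12)]                     # V({1,2}, z_k)
H_SUM = [F(0), F(12), F(12), F(12)]                        # assumed h_1^(k) + h_2^(k)
TAIL = [F(1) - sum(P[:k]) for k in range(len(P))]          # probability of reaching step k

def idp(sh, tail):
    """Payments beta^k from the assumed recursion for (21)."""
    last = len(sh) - 1
    beta = [tuple(sh[k][i] - tail[k + 1] / tail[k] * sh[k + 1][i] for i in range(2))
            for k in range(last)]
    beta.append(sh[last])
    return beta

def regularized_idp(sh, h_sum, v_grand):
    """Non-negative payments: each stage total is split in proportion to Sh^k."""
    return [tuple(x * h / v for x in s) for s, h, v in zip(sh, h_sum, v_grand)]

def expected_total(beta, tail):
    """Expected sum of payments, as in (37)."""
    return tuple(sum(b[i] * t for b, t in zip(beta, tail)) for i in range(2))

beta = idp(SH, TAIL)
needs_regularization = any(x < 0 for pair in beta for x in pair)
beta_bar = regularized_idp(SH, H_SUM, V_GRAND)
rsv = expected_total(beta_bar, TAIL)

print(needs_regularization, [float(x) for x in rsv])
```

Because each $\bar\beta^k$ splits the stage total $h_1^{(k)}+h_2^{(k)}$ in proportion to the sub-game Shapley value, the payments are non-negative whenever the stage payoffs and the Shapley components are, and their expected sum over both players equals $V(\{1,2\}, z_0)$.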


6. Conclusion

In this paper, a regularization procedure was described and a number of results on regularizing two particular cooperative solutions were presented. The main strength of the proposed approach is that it always provides a feasible IDP under sufficiently mild conditions. We plan to develop it further, towards efficient numerical algorithms applicable to a wide class of discrete games.

Apart from that, the further development of the proposed method can go in a number of directions. First, one could consider the case where the payments are allowed to be negative. In contrast to the proposed method, it might turn out that it will not always be possible to regularize a given IDP. Second, it would be very interesting to study how the choice of the characteristic function influences the respective cooperative solution.

Acknowledgments

The authors are grateful to the anonymous reviewer for the valuable comments. This research has been supported by the grant 17-11-01093 from Russian Scientific Foundation, Russia.

References

[1] R. Amir, Stochastic games in economics and related fields: An overview, in: A. Neyman, S. Sorin (Eds.), Stochastic Games and Applications, NATO Advanced Study Institute Series D, Kluwer, Dordrecht, 2003, pp. 455–470.
[2] R. Aumann, The core of a cooperative game without side payments, Trans. Amer. Math. Soc. 98 (1961) 539–552.
[3] Y. Averboukh, Universal Nash equilibrium strategies for differential games, J. Dyn. Control Syst. 21 (3) (2015) 329–350.
[4] K. Avrachenkov, L. Cottatellucci, L. Maggi, Cooperative Markov decision processes: Time-consistency, greedy players satisfaction, and cooperation maintenance, Internat. J. Game Theory 42 (1) (2013) 239–262.
[5] M. Bailey, U.R. Sumaila, M. Lindroos, Application of game theory to fisheries over three decades, Fish Res 102 (1) (2010).
[6] T. BenDor, J. Scheffran, B. Hannon, Ecological and economic sustainability in fishery management: A multiple agent model for understanding competition and cooperation, Ecol Econom 68 (2009) 1061–1073.
[7] I. Curiel, Multi-stage sequencing situations, Internat. J. Game Theory 39 (2010) 151–162.
[8] M. Germain, H. Tulkens, A. Magnus, Dynamic core-theoretic cooperation in a two-dimensional international environmental model, Math. Soc. Sci. 59 (2010) 208–226.
[9] D. Gromov, E. Gromova, Differential games with random duration: A hybrid systems formulation, in: Contributions to Game Theory and Management, vol. 7, 2014, pp. 104–119.
[10] E. Gromova, The Shapley value as a sustainable cooperative solution in differential games of 3 players, in: Recent Advances in Game Theory and Applications, 2016, pp. 67–89.
[11] R. Joosten, Strong and weak rarity value: Resource games with complex price-scarcity relationships, Dyn. Games Appl. 6 (1) (2015).
[12] S. Kostyunin, A. Palestini, E.V. Shevkoplyas, On a nonrenewable resource extraction game played by asymmetric firms, J. Optim. Theory Appl. 163 (2) (2014) 660–673.
[13] L. Kranich, A. Perea, H. Peters, Core concepts for dynamic TU games, Int. Game Theory Rev. 7 (2005) 43–61.
[14] N.N. Krasovskii, A.N. Kotelnikova, Unification of differential games, generalized solutions of the Hamilton–Jacobi equations, and a stochastic guide, Differ. Equ. 45 (2009) 1653–1668.
[15] D.V. Kuzyutin, On the problem of the stability of solutions in extensive games, Vestnik Sankt-Peterburgskogo Universiteta. Ser 1 4 (22) (1995) 18–23 (in Russian).
[16] D. Kuzyutin, M. Nikitina, Time consistent cooperative solutions for multistage games with vector payoffs, Oper. Res. Lett. 45 (3) (2017) 269–274.
[17] F.E. Kydland, E.C. Prescott, Rules rather than discretion: The inconsistency of optimal plans, J. Political Econ. 85 (3) (1977) 473–491.
[18] O. Malafeev, Stationary strategies in differential games, USSR Comput. Math. Math. Phys. 17 (1) (1977) 37–46.
[19] J. Marin-Solano, E.V. Shevkoplyas, Non-constant discounting and differential games with random time horizon, Automatica 47 (12) (2011) 2626–2638.
[20] V.V. Mazalov, A.N. Rettieva, Fish wars and cooperation maintenance, Ecol Model 221 (12) (2010) 1545–1553.
[21] M. Nagarajan, G. Sošić, Game-theoretic analysis of cooperation among supply chain agents: Review and extensions, European J. Oper. Res. 187 (2008) 719–745.
[22] J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, 1953.
[23] G. Owen, Game Theory, third ed., Academic Press, San Diego, 1995.
[24] E. Parilina, Stable cooperation in stochastic games, Autom. Remote Control (2015) 1111–1122.
[25] E.M. Parilina, G. Zaccour, Node-consistent Shapley value for games played over event trees with random terminal time, J. Optim. Theory Appl. 175 (1) (2017) 236–254.
[26] L. Petrosjan, Stability of solutions in differential many-player games, Vestn. Leningr. Univ. 4 (1977) 46–52 (in Russian).
[27] L.A. Petrosjan, The Shapley value for differential games, New Trends Dyn. Games Appl. 3 (1995) 409–417.
[28] L. Petrosjan, E. Baranova, E. Shevkoplyas, Cooperative multistage game with random duration, Proc. Steklov Inst. Math. (Suppl.) (suppl. 2) (2004) S126–S141.
[29] L. Petrosjan, N. Danilov, Cooperative Differential Games and Applications, Tomsk University Publishing House, Tomsk, 1985.
[30] L. Petrosjan, E. Shevkoplyas, Cooperative differential game with random duration, Vestn. Leningr. Univ. 4 (2000) 18–23 (in Russian).
[31] L.A. Petrosjan, E.V. Shevkoplyas, Cooperative solutions for games with random duration, Game Theory Appl. IX (2003) 125–139.
[32] L. Petrosyan, D. Kuzyutin, Consistent Solutions of Positional Games, Saint Petersburg University Press, 2008 (in Russian).
[33] L.A. Petrosyan, G. Zaccour, Cooperative differential games with transferable payoffs, in: Handbook of Dynamic Game Theory, 2016, pp. 1–38.
[34] G. Pieri, L. Pusillo, Multicriteria partial cooperative games, Appl. Math. 6 (2015) 2125–2131.
[35] P.V. Reddy, E.V. Shevkoplyas, G. Zaccour, Time-consistent Shapley value for games played over event trees, Automatica 49 (6) (2013) 1521–1527.
[36] A. Sedakov, The strong time-consistent core, MGT & A 7 (1) (2015) 69–84.
[37] L.S. Shapley, Stochastic games, Proc. Natl. Acad. Sci. 39 (10) (1953) 1095–1100.
[38] L.S. Shapley, On balanced sets and cores, Nav. Res. Logist. Q. 14 (4) (1967) 453–460.
[39] E.V. Shevkoplyas, Stable cooperation in differential games with random duration, Mat. Teor. Igr Pril. 2 (3) (2010) 79–105.
[40] A. Toriello, N.A. Uhan, Dynamic linear production games under uncertainty, 2013. www.optimization-online.org/DB_HTML/2013/10/4064.html.


[41] H.M. Wagner, T.M. Whitin, Dynamic version of the economic lot size model, Manage. Sci. 5 (1) (1958) 89–96.
[42] D.W. Yeung, L.A. Petrosyan, Cooperative Stochastic Differential Games, Springer, 2006.
[43] G. Zaccour, Sustainability of cooperation in dynamic games played over event trees, in: Recent Progress and Modern Challenges in Applied Mathematics, Modeling and Computational Science, Springer, New York, NY, 2017, pp. 419–437.
