JID:YJMAA
AID:18563 /FLA
Doctopic: Optimization and Control
[m3L; v 1.134; Prn:29/05/2014; 15:37] P.1 (1-11)
J. Math. Anal. Appl. ••• (••••) •••–•••
Contents lists available at ScienceDirect
Journal of Mathematical Analysis and Applications www.elsevier.com/locate/jmaa
Robust Markov perfect equilibria
Anna Jaśkiewicz a, Andrzej S. Nowak b,∗
a Institute of Mathematics and Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
b Faculty of Mathematics, Computer Science and Econometrics, University of Zielona Góra, Podgórna 50, 65-246 Zielona Góra, Poland
Article info
Article history: Received 10 December 2013. Available online xxxx. Submitted by J.A. Filar.
Keywords: Markov decision model; Quasi-hyperbolic discounting; Non-randomised Markov perfect equilibrium; Robust equilibrium

Abstract
In this paper we study a Markov decision model with quasi-hyperbolic discounting and a transition probability function depending on an unknown parameter. Assuming that the set of parameters is finite, the sets of states and actions are Borel and the transition probabilities satisfy some additivity conditions and are atomless, we prove the existence of a non-randomised robust Markov perfect equilibrium. © 2014 Elsevier Inc. All rights reserved.
1. Introduction and the model

The quasi-hyperbolic discounting concept in a dynamic choice model reveals time-consistency in making decisions by a person whose utility changes over time, see the seminal papers by Strotz [26] and Phelps and Pollak [25]. As suggested in [25], finding a time-consistent rational (optimal in some sense) policy can be done by considering a non-cooperative intergenerational game played by short-lived generations. When each generation has countably many descendants, one can think of a game between countably many selves representing the same person whose utility changes over discrete time, see [4,15] and Subsection 4.3 in [20] for general accounts and discussion. The Markov perfect equilibrium concept, supported by a discussion in [21], seems to be an appropriate solution concept for the study of dynamic games under quasi-hyperbolic discounting. More precisely, a Markov perfect equilibrium is a fixed point of some operator in a suitably chosen space of strategy functions. This fact emphasises the difference between the approaches used in game theory and in dynamic programming. The first paper on quasi-hyperbolic discounting is the work of Alj and Haurie [1], where the state and action spaces are finite. Further literature on this type of problem (within deterministic and stochastic frameworks) can be found in [2,4,15,20] and the references cited therein.

* Corresponding author. E-mail addresses:
[email protected] (A. Jaśkiewicz),
[email protected] (A.S. Nowak). http://dx.doi.org/10.1016/j.jmaa.2014.05.061 0022-247X/© 2014 Elsevier Inc. All rights reserved.
All the aforementioned papers except [20] deal with a very specific transition law and one-dimensional state and action spaces. The proofs are based on certain lattice programming techniques exploiting monotonicity and concavity assumptions. Models with Borel state and action spaces are treated only in [20], but the main focus there is on risk-sensitive generations that employ the exponential utility aggregator function. The assumptions made on the transition probability function in [20] are relatively weak when compared to previous papers and, moreover, they allow one to apply the Dvoretzky–Wald–Wolfowitz theorem [10] on elimination of randomisation to obtain non-randomised equilibria in the atomless case. Such equilibria are extremely desirable in economic applications, see for instance [16,17].

All the papers discussed above are concerned with transition probability functions that are perfectly known to all players. In [3] this assumption is weakened by letting the transitions depend on some parameter controlled by Nature. Then, the concept of a Markov perfect equilibrium is slightly modified by the idea of maxmin optimisation. This case is also considered in this paper. The equilibrium we are interested in is now called robust and is shown to exist for a large class of Markov decision models with Borel state and action spaces. Firstly, we prove the existence of an equilibrium in the class of randomised strategies (Theorem 1) and then in the class of non-randomised strategies in the atomless case. The latter result (Theorem 2) is more important when one thinks of applications in economics. Our proof makes use of the Dvoretzky–Wald–Wolfowitz theorem and is new compared with the approach taken in [3]. Specifically, the authors in [3] deal with special consumption/investment models, the assumptions are stronger and the state and action spaces are subsets of the real line. The only exception is that the set of parameters in [3] may be infinite. 
In this paper, this set is finite but Nature can apply infinitely many probability distributions for selecting a parameter. For a further discussion of our results the reader is referred to Remarks 1 and 2.

Let R be the set of all real numbers and R+ = [0, ∞), T = N = {1, 2, ...}. By a Borel space Y we mean a non-empty Borel subset of a complete separable metric space endowed with the Borel σ-algebra B(Y). Let P(Y) denote the space of all probability measures on Y endowed with the weak topology and the Borel σ-algebra, see Chapter 7 in [7]. Let S and A be Borel spaces. Assume that A(s) is a non-empty compact subset of A for each s ∈ S and the set C := {(s, a) : s ∈ S, a ∈ A(s)} ⊂ S × A is Borel. Let F be the set of all Borel measurable selectors of the set-valued mapping s ↦ A(s). By Corollary 1 in [8], F ≠ ∅. By Φ we denote the set of all transition probabilities φ from S to A such that φ(A(s)|s) = 1 for each s ∈ S. Clearly, F can be seen as a subset of Φ, so Φ ≠ ∅.

We consider a Markov decision model with quasi-hyperbolic preferences and an unknown transition probability function in which the decision maker is viewed as a sequence of autonomous temporal selves. The selves are indexed by the respective periods t ∈ T in which they choose their actions. Let S and A, considered above, be the state space and the action space, respectively. Then A(s) is the set of all actions available in state s ∈ S. Further, assume that Θ is a finite set of parameters. At the beginning of the t-th period, self t observes the state $s_t \in S$ and chooses (possibly at random) $a_t \in A(s_t)$. Then, $q(\cdot|s_t, a_t, \theta)$ is the probability distribution of the next state. Note that q is a transition probability from S × A × Θ to S. Self t's satisfaction is reflected by an instantaneous utility function u : C → R+ that remains unchanged over all periods. 
It is assumed that θ is chosen according to a certain probability measure $\gamma_t \in P$, where P denotes the action set of Nature and is assumed to be a closed subset of P(Θ). Thus, P can be viewed as a compact subset of a Euclidean space.

We impose the following assumptions on the model.

(A1) There exist probability measures $\mu_1^\theta, \ldots, \mu_l^\theta$ on S and Borel measurable functions $g_1, \ldots, g_l : C \to [0, 1]$ such that
$$q(B|s,a,\theta) = \sum_{k=1}^{l} g_k(s,a)\,\mu_k^\theta(B), \qquad B \in \mathcal{B}(S),\ \theta \in \Theta.$$
Moreover, for every k = 1, ..., l the function $g_k(s, \cdot)$ is continuous on A(s), s ∈ S, and

$$\sum_{k=1}^{l} g_k(s,a) = 1 \qquad \text{for all } (s,a) \in C.$$
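To make the additive structure in (A1) concrete, the following minimal sketch builds such a kernel on a finite toy model. Everything in it (two states, two actions, l = 2, the parameter labels, the weights `g` and the measures `MU`) is a hypothetical choice made for illustration and is not taken from the paper.

```python
from fractions import Fraction as Fr

# Hypothetical finite toy model (illustration only): 2 states, 2 actions,
# l = 2 mixture components and 2 parameters.
STATES = [0, 1]
ACTIONS = [0, 1]
THETAS = ["lo", "hi"]

# MU[theta][k] plays the role of mu_k^theta, a probability vector over states.
MU = {
    "lo": [[Fr(3, 4), Fr(1, 4)], [Fr(1, 2), Fr(1, 2)]],
    "hi": [[Fr(1, 4), Fr(3, 4)], [Fr(1, 10), Fr(9, 10)]],
}

def g(s, a):
    """Weights (g_1(s,a), g_2(s,a)): values in [0, 1] that sum to one."""
    w = Fr(1 + s + 2 * a, 5)
    return [w, 1 - w]

def q(s, a, theta):
    """q(.|s,a,theta) = sum_k g_k(s,a) * mu_k^theta(.), as a vector over states."""
    weights = g(s, a)
    return [sum(weights[k] * MU[theta][k][y] for k in range(2)) for y in STATES]

# Every mixture is again a probability distribution over the states.
for s in STATES:
    for a in ACTIONS:
        for theta in THETAS:
            assert sum(q(s, a, theta)) == 1
```

In this sketch the action influences the next state only through the mixing weights, which is the feature of (A1) that the later arguments rely on.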
Put $\mu := \frac{1}{l|\Theta|}\sum_{\theta\in\Theta}\sum_{k=1}^{l}\mu_k^\theta$, where |Θ| is the cardinality of the set Θ, and observe that $q(\cdot|s,a,\theta) \ll \mu$ for each (s, a) ∈ C, θ ∈ Θ.

(A2) The function u is Borel measurable, u(s, ·) is continuous on A(s) for each s ∈ S, and

$$c := \max_{\theta\in\Theta}\,\max_{1\le k\le l}\int_S \max_{a\in A(s)} u(s,a)\,\mu_k^\theta(ds)$$

for some positive constant c.

A strategy for self t is a function $\varphi_t \in \Phi$. If $\varphi_t = \varphi$ for all t ∈ T and some φ ∈ Φ, then we say that the selves employ a stationary strategy. Let G be the set of all Borel measurable mappings $\gamma_t : C \to P$. By Γ we denote the set of all sequences $(\gamma_t)_{t\in T}$ where $\gamma_t \in G$ for every t ∈ T. For any t ∈ T and $\gamma = (\gamma_t)_{t\in T} \in \Gamma$, we put $\gamma^t := (\gamma_\tau)_{\tau\ge t}$. Obviously $\gamma^t \in \Gamma$. A Markov strategy for Nature is a sequence $\gamma = (\gamma_t)_{t\in T} \in \Gamma$. Note that $\gamma^t$ can be called a Markov strategy used by Nature from period t onwards. We note that the consideration of history dependent strategies for Nature contributes nothing and does not change our results. This observation follows from standard results on Markov decision processes, see [7,13] or Remark 3 in [3].

For any t ∈ T, define $H^t$ as the set of all sequences

$$h^t = (a_t, \theta_t, s_{t+1}, a_{t+1}, \theta_{t+1}, \ldots),$$

where $(s_k, a_k) \in C$, $\theta_k \in \Theta$ and k ≥ t.
Here, $H^t$ is the set of feasible future histories of the process from period t onwards. Endow $H^t$ with the product σ-algebra. Assume that the selves employ a stationary strategy φ ∈ Φ and Nature chooses some γ ∈ Γ. By the Ionescu-Tulcea theorem (see Proposition V.1.1 in [22] or Chapter 7 in [7]) there exists a unique probability measure $P_{s_t}^{\varphi,\gamma^t}$ on $H^t$ induced by a stationary strategy φ ∈ Φ used by each self τ (τ ≥ t), a Markov strategy of Nature $\gamma^t \in \Gamma$ and the transition probability q. By $E_{s_t}^{\varphi,\gamma^t}$ we denote the expectation operator corresponding to the measure $P_{s_t}^{\varphi,\gamma^t}$. Let us define

$$q(B|s,a,\xi) = \sum_{\theta\in\Theta} q(B|s,a,\theta)\,\xi(\theta) \qquad\text{and}\qquad u(s,\varphi) = \int_{A(s)} u(s,a)\,\varphi(da|s),$$

where B ∈ B(S), ξ ∈ P and φ ∈ Φ. If self t knew γ, his expected utility would be

$$\hat{W}(\varphi,\gamma^t)(s_t) := E_{s_t}^{\varphi,\gamma^t}\Bigl(u(s_t,a_t) + \alpha\beta\sum_{\tau=t+1}^{\infty}\beta^{\tau-t-1}u(s_\tau,a_\tau)\Bigr),$$
where β ∈ (0, 1) is a long-run discount factor and α (α > 0) is a short-run discount factor, see [15]. Assuming that in each period k (k ≥ t) Nature chooses a probability γk from the set P with the objective of minimising self t’s utility and that the choice of Nature may depend on the current state and action performed by self k,
we may accept an idea coming from robust dynamic programming (see [19] and its references) and say that the preferences of self t are represented by the following utility function

$$W(\varphi)(s_t) := \inf_{\gamma^t\in\Gamma}\hat{W}(\varphi,\gamma^t)(s_t).$$
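As a small numerical illustration of the quasi-hyperbolic weighting, the sketch below evaluates the discounted sum for a finite, deterministic utility stream and then takes the minimum over a finite list of candidate streams. The truncation to a finite horizon, the function names, and the finite list (standing in for the expected streams generated by different strategies of Nature) are all simplifying assumptions made here, not part of the model.

```python
from fractions import Fraction as Fr

def quasi_hyperbolic_value(utilities, alpha, beta):
    """u_t + alpha*beta * sum_{tau > t} beta**(tau-t-1) * u_tau for a finite stream.

    utilities[0] is the current-period utility, utilities[1:] the future ones.
    """
    current, future = utilities[0], utilities[1:]
    tail = sum(beta**i * u for i, u in enumerate(future))
    return current + alpha * beta * tail

def robust_value(candidate_streams, alpha, beta):
    """Worst case over a finite list of streams (a stand-in for the infimum over Nature)."""
    return min(quasi_hyperbolic_value(s, alpha, beta) for s in candidate_streams)

alpha, beta = Fr(1, 2), Fr(1, 2)
print(quasi_hyperbolic_value([1, 1, 1], alpha, beta))    # 11/8
print(robust_value([[1, 1, 1], [1, 0, 2]], alpha, beta))  # 5/4
```

The current period carries weight 1, while every later period τ carries weight αβ·β^(τ−t−1), so the extra factor α applies to the whole future and not only to the next period.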
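Furthermore, for any φ ∈ Φ, γ ∈ Γ, j ≥ 2 and $s_j \in S$ define

$$\hat{J}(\varphi,\gamma^j)(s_j) = E_{s_j}^{\varphi,\gamma^j}\Bigl(\sum_{\tau=j}^{\infty}\beta^{\tau-j}u(s_\tau,a_\tau)\Bigr) \qquad (1)$$

and note that

$$\hat{W}(\varphi,\gamma^t)(s_t) = u(s_t,\varphi) + \alpha\beta\int_{A(s_t)}\sum_{\theta\in\Theta}\int_S \hat{J}(\varphi,\gamma^{t+1})(s_{t+1})\,q(ds_{t+1}|s_t,a_t,\theta)\,\gamma_t(\theta|s_t,a_t)\,\varphi(da_t|s_t).$$

For φ ∈ Φ, j ≥ 2 and $s_j \in S$ set

$$J(\varphi)(s_j) = \inf_{\gamma^j\in\Gamma}\hat{J}(\varphi,\gamma^j)(s_j).$$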
The function $s_j \mapsto J(\varphi)(s_j)$ is Borel measurable on S for any φ ∈ Φ (see Chapter 4.2 in [18], Chapter 9 in [7] or [19]). Moreover, making use of dynamic programming arguments [7,13], one can show that

$$W(\varphi)(s_t) = u(s_t,\varphi) + \int_{A(s_t)}\inf_{\xi\in P}\alpha\beta\int_S J(\varphi)(s_{t+1})\,q(ds_{t+1}|s_t,a_t,\xi)\,\varphi(da_t|s_t). \qquad (2)$$
For any s ∈ S, ν ∈ P(A(s)) and φ ∈ Φ, let us consider

$$U(\nu,\varphi)(s) = \int_{A(s)}\Bigl(u(s,a) + \inf_{\xi\in P}\alpha\beta\int_S J(\varphi)(s')\,q(ds'|s,a,\xi)\Bigr)\nu(da).$$
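The definition of U(ν, φ)(s) can be evaluated by hand on a finite toy example. The sketch below assumes, purely for illustration, two next states, two actions, two parameters, and a set P whose extreme points are the two listed distributions; since the inner expression is linear in ξ, its infimum over such a set is attained at one of those points. All numbers, including the continuation values `J`, are hypothetical.

```python
from fractions import Fraction as Fr

ALPHA, BETA = Fr(1, 2), Fr(1, 2)
J = [Fr(2), Fr(4)]            # hypothetical continuation values J(phi)(y), y = 0, 1
U_S = [Fr(1), Fr(2)]          # u(s, a) for a = 0, 1 at one fixed state s
Q_S = [                       # Q_S[a][theta] = q(.|s, a, theta) over next states
    [[Fr(1, 2), Fr(1, 2)], [Fr(1, 4), Fr(3, 4)]],
    [[Fr(1), Fr(0)], [Fr(0), Fr(1)]],
]
# Assumed extreme points of Nature's set P (distributions over the two parameters).
VERTICES = [[Fr(1), Fr(0)], [Fr(1, 2), Fr(1, 2)]]

def worst_case_continuation(a):
    """inf over xi in P of the integral of J against q(.|s,a,xi)."""
    per_theta = [sum(p * J[y] for y, p in enumerate(dist)) for dist in Q_S[a]]
    return min(sum(x * e for x, e in zip(xi, per_theta)) for xi in VERTICES)

def action_value(a):
    """u(s,a) + inf_xi alpha*beta * integral of J(phi) dq(.|s,a,xi)."""
    return U_S[a] + ALPHA * BETA * worst_case_continuation(a)

def U(nu):
    """U(nu, phi)(s): average of the action values under the mixed action nu."""
    return sum(nu[a] * action_value(a) for a in range(2))

print(action_value(0), action_value(1))   # 7/4 5/2
print(U([Fr(1, 2), Fr(1, 2)]))            # 17/8
```

Because U is linear in ν, the supremum over mixed actions in this toy example equals the larger of the two action values, here 5/2.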
If $s = s_t$, then U(ν, φ)(s) is the utility for self t choosing $a \in A(s_t)$ according to a probability measure $\nu \in P(A(s_t))$, when all future selves employ a stationary strategy φ ∈ Φ.

A Robust Markov Perfect Equilibrium (RMPE) is a strategy φ∗ ∈ Φ such that for every s ∈ S we have

$$\sup_{\nu\in P(A(s))} U(\nu,\varphi^*)(s) = U(\varphi^*(s),\varphi^*)(s) = W(\varphi^*)(s), \qquad (3)$$

where φ∗(s) := φ∗(·|s) ∈ P(A(s)). Note that (3) says that if the followers of any self t are going to use φ∗, then the best choice for self t in state $s = s_t \in S$ is to apply φ∗(s).

Remark 1. The concept of RMPE was introduced in [3], where a consumption/investment choice model is studied. In this note we make much weaker assumptions on the primitive data, except that the set of parameters Θ is finite. Using the Dvoretzky–Wald–Wolfowitz theorem [10] we prove the existence of a non-randomised equilibrium under fairly weak conditions on the state and action spaces. The additivity condition (A1) is also made in [3]. However, an infinite set Θ is allowed in [3] only in a special case, i.e., when S is an interval in the real line and A(s) ⊂ S for each s ∈ S. Strategies considered in [3] form a subclass of left-continuous functions on S.
Remark 2. Additivity assumptions on q akin to (A1) were already formulated for proving the existence of Nash equilibria in stochastic games having applications to economics, see, e.g., [9,23]. Similar conditions were also considered in intergenerational stochastic games with infinitely many descendants [2,4,20]. In all the aforementioned papers, the transition probability functions are known by players or generations. The main objective in [2,4,20] was to examine non-randomised equilibria in the context of some economic growth models. The only paper that makes use of the Dvoretzky–Wald–Wolfowitz theorem is [20]. However, this work is mainly devoted to the study of games with the exponential utility function. The results contained in [24] are based on a weaker assumption on q, but they are concerned with randomised equilibria and, moreover, q is perfectly known by the players. A related consumption/investment model that deals with the quasi-hyperbolic discounting concept is studied by Harris and Laibson [15]. The transition function is again perfectly known by the decision maker and is expressed by a linear difference equation with some additive i.i.d. shocks (the state space is S = R+). The proof of the existence of a Markov perfect equilibrium is based on a fixed point argument in a class of functions of locally bounded variation.

2. Main results

In this section we assume that conditions (A1)–(A2) are satisfied. Our first main result is as follows.

Theorem 1. There exists an RMPE φ∗ ∈ Φ.

A simple modification of Example 3.3 in [24] (with |Θ| = 1) shows that a non-randomised RMPE may not exist even if S and A are finite sets. Our second main result concerns non-randomised equilibria, which are of great importance with respect to their applications in economics.

Theorem 2. Let the measures $\mu_k^\theta$ (k = 1, ..., l, θ ∈ Θ) be atomless. Then, there exists a non-randomised RMPE f∗ ∈ F.

Recall that $\mu := \frac{1}{l|\Theta|}\sum_{\theta\in\Theta}\sum_{k=1}^{l}\mu_k^\theta$. 
Let $\Phi^\mu$ denote the quotient space of all equivalence classes of functions φ ∈ Φ which are equal μ-a.e. Since for each s ∈ S the set A(s) is compact, $\Phi^\mu$ is compact and metrisable when endowed with the weak-star topology, see [5] or Chapter IV in [27]. Here, we only mention that a sequence $(\varphi_m)$ converges to φ in $\Phi^\mu$ if and only if for every w : C → R such that w(s, ·) is continuous on A(s) for each s ∈ S, w(·, a) is measurable for each a ∈ A(s), and $s \mapsto \max_{a\in A(s)}|w(s,a)|$ is μ-integrable over S (i.e., w is a Carathéodory function), we have that

$$\int_S\int_{A(s)} w(s,a)\,\varphi_m(da|s)\,\mu(ds) \to \int_S\int_{A(s)} w(s,a)\,\varphi(da|s)\,\mu(ds) \qquad\text{as } m\to\infty.$$
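A standard illustration of this mode of convergence, not taken from the paper, is a sequence of rapidly oscillating non-randomised strategies whose limit is randomised. The sketch below assumes S = [0, 1] with Lebesgue measure in place of μ, A(s) = {−1, +1}, the pure strategy f_m that switches sign on consecutive intervals of length 1/(2m), and the single test function w(s, a) = s·a. Against the limit strategy that mixes −1 and +1 with equal probabilities the integral of w is 0, and the computed integrals for f_m equal −1/(4m).

```python
def f_m(m, s):
    """Hypothetical oscillating pure strategy: +1 on even half-periods, -1 on odd ones."""
    return 1.0 if int(2 * m * s) % 2 == 0 else -1.0

def integral_w_against_f_m(m, n_cells=20_000):
    """Midpoint-rule value of the integral over [0, 1] of w(s, f_m(s)) with w(s, a) = s * a.

    n_cells is chosen divisible by 2*m for the values of m used below, so every
    cell lies inside one half-period and the midpoint rule is exact up to rounding.
    """
    total = 0.0
    for i in range(n_cells):
        s = (i + 0.5) / n_cells
        total += s * f_m(m, s)
    return total / n_cells

for m in (1, 10, 100):
    print(m, integral_w_against_f_m(m))   # close to -1/(4*m), tending to 0
```

The check covers one test function only; it is meant to show why limits of pure strategies in this topology are in general randomised, which is the reason a separate purification step is needed for non-randomised equilibria.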
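Clearly, the function $s \mapsto \max_{a\in A(s)}|w(s,a)|$ is also $\mu_k^\theta$-integrable over S for every k = 1, ..., l and θ ∈ Θ. Therefore, the function $(s,a) \mapsto w(s,a)\rho_k^\theta(s)$ is also Carathéodory, where $\rho_k^\theta$ is a density of $\mu_k^\theta$ with respect to μ. Thus,

$$\int_S\int_{A(s)} w(s,a)\,\varphi_m(da|s)\,\mu_k^\theta(ds) = \int_S\int_{A(s)} w(s,a)\,\varphi_m(da|s)\,\rho_k^\theta(s)\,\mu(ds)$$
$$\to \int_S\int_{A(s)} w(s,a)\,\varphi(da|s)\,\rho_k^\theta(s)\,\mu(ds) = \int_S\int_{A(s)} w(s,a)\,\varphi(da|s)\,\mu_k^\theta(ds) \qquad (4)$$

as m → ∞.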
For any Borel measurable function v : S → R+ integrable with respect to every measure $\mu_k^\theta$, k = 1, ..., l and θ ∈ Θ, and any φ ∈ Φ define the operator $T_\varphi$ as follows

$$T_\varphi v(s) = u(s,\varphi) + \int_{A(s)}\inf_{\xi\in P}\beta\int_S v(y)\,q(dy|s,a,\xi)\,\varphi(da|s), \qquad s\in S.$$
Clearly, $T_\varphi v$ is Borel measurable and by dynamic programming arguments (Chapter 3.4 in [18]), it follows that

$$T_\varphi^n v_0(s_j) = \inf_{\gamma^j\in\Gamma} E_{s_j}^{\varphi,\gamma^j}\Bigl(\sum_{\tau=j}^{j+n-1}\beta^{\tau-j}u(s_\tau,a_\tau)\Bigr),$$
where $v_0 \equiv 0$ and $T_\varphi^n$ denotes the n-th composition of the operator $T_\varphi$ with itself. Since the action space for Nature is compact, under our continuity assumptions, one can easily use standard dynamic programming methods (see, e.g., Proposition 9.17 in [7]) to prove that

$$\lim_{n\to\infty} T_\varphi^n v_0(s_j) = J(\varphi)(s_j), \qquad s_j\in S, \qquad (5)$$

and in consequence,

$$T_\varphi J(\varphi)(s_j) = J(\varphi)(s_j), \qquad s_j\in S. \qquad (6)$$
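The limit (5) and the fixed point property (6) can be observed numerically on a finite toy model. The sketch below is an illustration under assumptions chosen here: two states, two actions, two parameters, a bounded utility, a fixed randomised stationary strategy `PHI`, and P taken to be the whole simplex over the two parameters, so that the infimum over ξ reduces to a minimum over the parameters. None of the numbers comes from the paper.

```python
BETA = 0.9
U = [[1.0, 0.0], [2.0, 0.5]]              # U[s][a] >= 0, hypothetical utilities
Q = {                                      # Q[(s, a)][theta] = q(.|s, a, theta)
    (0, 0): [[0.9, 0.1], [0.6, 0.4]],
    (0, 1): [[0.2, 0.8], [0.5, 0.5]],
    (1, 0): [[0.7, 0.3], [0.3, 0.7]],
    (1, 1): [[0.1, 0.9], [0.4, 0.6]],
}
PHI = [[0.5, 0.5], [0.25, 0.75]]          # PHI[s][a], a fixed stationary strategy

def T(v):
    """One application of T_phi; Nature picks the worst parameter for each (s, a)."""
    out = []
    for s in (0, 1):
        value = 0.0
        for a in (0, 1):
            worst = min(sum(p * v[y] for y, p in enumerate(dist)) for dist in Q[(s, a)])
            value += PHI[s][a] * (U[s][a] + BETA * worst)
        out.append(value)
    return out

def iterate(n):
    """T_phi^n v_0 with v_0 = 0."""
    v = [0.0, 0.0]
    for _ in range(n):
        v = T(v)
    return v

J = iterate(400)                           # numerical stand-in for J(phi)
residual = max(abs(x - y) for x, y in zip(T(J), J))
print(J, residual)                         # residual is numerically zero
```

Since u is non-negative, the iterates increase with n, and the residual of the fixed point equation after 400 steps is far below rounding-relevant tolerances because the operator contracts at rate β in the supremum norm in this bounded setting.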
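Lemma 1. Assume that $(\varphi_m)$ converges to φ in $\Phi^\mu$. Then,

$$\int_S\bigl(T_{\varphi_m}^n v_0(s) - T_\varphi^n v_0(s)\bigr)\mu_k^\theta(ds) \to 0$$

for every k = 1, ..., l and θ ∈ Θ.

Proof. By assumption (A2), the function u is Carathéodory. Hence, for n = 1, where $T_\varphi v_0 = u(\cdot,\varphi)$, the result easily follows from (4) with w := u. Assume now that the result holds for some positive integer n. Then, by the definition of the operators $T_\varphi$ and $T_{\varphi_m}$ we obtain that

$$\int_S\bigl(T_{\varphi_m}^{n+1}v_0(s) - T_\varphi^{n+1}v_0(s)\bigr)\mu_k^\theta(ds) = \int_S\bigl(u(s,\varphi_m) - u(s,\varphi)\bigr)\mu_k^\theta(ds) + \int_S\bigl(Z_m(s) + Y_m(s)\bigr)\mu_k^\theta(ds), \qquad (7)$$

where

$$Z_m(s) = \int_{A(s)}\Bigl(\inf_{\xi\in P}\beta\int_S T_{\varphi_m}^n v_0(y)\,q(dy|s,a,\xi) - \inf_{\xi\in P}\beta\int_S T_\varphi^n v_0(y)\,q(dy|s,a,\xi)\Bigr)\varphi_m(da|s)$$

and

$$Y_m(s) = \int_{A(s)}\inf_{\xi\in P}\beta\int_S T_\varphi^n v_0(y)\,q(dy|s,a,\xi)\,\bigl(\varphi_m(da|s) - \varphi(da|s)\bigr).$$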
Clearly, the first term in (7) goes to 0. Moreover, we note that by (A1)

$$\int_S T_\varphi^n v_0(y)\,q(dy|s,a,\xi) = \sum_{\theta\in\Theta}\sum_{k=1}^{l} g_k(s,a)\int_S T_\varphi^n v_0(y)\,\mu_k^\theta(dy)\,\xi(\theta),$$

which implies that the function $(s,a) \mapsto \inf_{\xi\in P}\int_S T_\varphi^n v_0(y)\,q(dy|s,a,\xi)$ is Carathéodory. Thus, (4) with $w(s,a) := \inf_{\xi\in P}\int_S T_\varphi^n v_0(y)\,q(dy|s,a,\xi)$ yields that $\int_S Y_m(s)\,\mu_k^\theta(ds) \to 0$ as m → ∞. Next observe that again by (A1)

$$|Z_m(s)| \le \beta\max_{\theta\in\Theta}\sum_{k=1}^{l}\Bigl|\int_S\bigl(T_{\varphi_m}^n v_0(y) - T_\varphi^n v_0(y)\bigr)\mu_k^\theta(dy)\Bigr|.$$

Making use of the convergence $\varphi_m \to \varphi$ in $\Phi^\mu$ and the induction hypothesis, we infer that $Z_m \to 0$ uniformly on S as m → ∞. Hence, $\int_S Z_m(s)\,\mu_k^\theta(ds) \to 0$. □

Lemma 2. Assume that $(\varphi_m)$ converges to φ in $\Phi^\mu$. Then,

$$\int_S\bigl(J(\varphi_m)(s) - J(\varphi)(s)\bigr)\mu_k^\theta(ds) \to 0$$
for every k = 1, ..., l and θ ∈ Θ.

Proof. In view of Lemma 1, it suffices to show that the convergence in (5) is uniform with respect to φ. Indeed, by (A2) for $s_j \in S$ we have that

$$\sup_{\varphi\in\Phi}\bigl|J(\varphi)(s_j) - T_\varphi^n v_0(s_j)\bigr| \le \sup_{\varphi\in\Phi}\sup_{\gamma^j\in\Gamma} E_{s_j}^{\varphi,\gamma^j}\Bigl(\sum_{\tau=0}^{\infty}\beta^{n+\tau}u(s_{\tau+j+n},a_{\tau+j+n})\Bigr) \le \beta^n\frac{c}{1-\beta} \to 0$$

as n → ∞. □
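The bound β^n c/(1 − β) in the proof of Lemma 2 does not depend on φ, which is what makes the convergence uniform. As a rough numerical companion, the sketch below finds the smallest n for which this bound drops below a tolerance; the function names and the sample values of β, c and the tolerance are hypothetical.

```python
def tail_bound(beta, c, n):
    """The uniform bound beta**n * c / (1 - beta) on the truncation error."""
    return beta**n * c / (1.0 - beta)

def iterations_needed(beta, c, eps):
    """Smallest n with beta**n * c / (1 - beta) <= eps, found by direct search."""
    n = 0
    while tail_bound(beta, c, n) > eps:
        n += 1
    return n

print(iterations_needed(0.5, 1.0, 1e-3))   # 11
```

With β = 0.5 and c = 1 the bound equals 2·0.5^n, which first falls below 0.001 at n = 11.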
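Lemma 3. Assume that $(\varphi_m)$ and $(\psi_m)$ converge to φ and ψ, respectively, in $\Phi^\mu$. Then,

(a) for every s ∈ S

$$\sup_{\psi\in\Phi} U(\psi(s),\varphi_m)(s) \to \sup_{\psi\in\Phi} U(\psi(s),\varphi)(s),$$

(b) for every k = 1, ..., l and θ ∈ Θ

$$\int_S\bigl(U(\psi_m(s),\varphi_m)(s) - U(\psi(s),\varphi)(s)\bigr)\mu_k^\theta(ds) \to 0.$$

Proof. (a) Observe that

$$\Bigl|\sup_{\psi\in\Phi} U(\psi(s),\varphi_m)(s) - \sup_{\psi\in\Phi} U(\psi(s),\varphi)(s)\Bigr| \le \alpha\beta\max_{\theta\in\Theta}\sum_{k=1}^{l}\Bigl|\int_S\bigl(J(\varphi_m)(y) - J(\varphi)(y)\bigr)\mu_k^\theta(dy)\Bigr|.$$

Hence, the result follows from Lemma 2.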
For part (b) we have that

$$U(\psi_m(s),\varphi_m)(s) - U(\psi(s),\varphi)(s) = Z_m^*(s) + Y_m^*(s),$$

where

$$Z_m^*(s) = U(\psi_m(s),\varphi_m)(s) - U(\psi_m(s),\varphi)(s), \qquad Y_m^*(s) = U(\psi_m(s),\varphi)(s) - U(\psi(s),\varphi)(s).$$

By part (a), $Z_m^* \to 0$ uniformly on S, hence $\int_S Z_m^*(s)\,\mu_k^\theta(ds) \to 0$. Furthermore,

$$Y_m^*(s) = u(s,\psi_m) - u(s,\psi) + \int_{A(s)}\inf_{\xi\in P}\alpha\beta\int_S J(\varphi)(y)\,q(dy|s,a,\xi)\,\bigl(\psi_m(da|s) - \psi(da|s)\bigr).$$

Now the result follows from (4) by taking φ := ψ, $\varphi_m := \psi_m$ and w := u for convergence of the first term and $w(s,a) := \inf_{\xi\in P}\int_S J(\varphi)(y)\,q(dy|s,a,\xi)$ for convergence of the second term. □

Proof of Theorem 1. Let $\varphi \in \Phi^\mu$. Under our assumptions the set $\widetilde{C} := \{(s,\nu) : s \in S,\ \nu \in P(A(s))\}$ is Borel and P(A(s)) is compact. Define the set

$$F(\varphi)(s) := \Bigl\{\nu^* \in P(A(s)) : U(\nu^*,\varphi)(s) = \sup_{\nu\in P(A(s))} U(\nu,\varphi)(s)\Bigr\}$$
for each s ∈ S. By [24], the set-valued mapping $s \mapsto F(\varphi)(s)$ admits a Borel measurable selector. Let $G(\varphi) \subset \Phi^\mu$ denote the set of all μ-equivalence classes of Borel measurable selectors of $s \mapsto F(\varphi)(s)$. We shall prove that $\varphi \mapsto G(\varphi)$ has a closed graph. Since $\Phi^\mu$ is metrisable we can work with countable sequences. Let $\varphi_m \to \varphi \in \Phi^\mu$ and let $\psi_m \in G(\varphi_m)$ for each m. Since $\Phi^\mu$ is compact, $(\psi_m)$ has a convergent subsequence. Without loss of generality assume that $\psi_m \to \psi \in \Phi^\mu$ as m → ∞. We have to show that ψ ∈ G(φ). Indeed, observe firstly that

$$U(\psi_m(s),\varphi_m)(s) = \max_{\phi\in\Phi} U(\phi(s),\varphi_m)(s) = \max_{\nu\in P(A(s))} U(\nu,\varphi_m)(s), \qquad \mu\text{-a.e.}$$

Secondly, making use of Lemma 3 we conclude that

$$U(\psi(s),\varphi)(s) = \max_{\phi\in\Phi} U(\phi(s),\varphi)(s), \qquad \mu\text{-a.e.}$$
In other words, ψ ∈ G(φ). By the fixed point theorem of Glicksberg [14], there exists some $\psi^* \in \Phi^\mu$ such that $\psi^* \in G(\psi^*)$. Hence, there is a measurable set $S_1 \subset S$ with $\mu(S_1) = 1$ and

$$U(\psi^*(s),\psi^*)(s) = \max_{\nu\in P(A(s))} U(\nu,\psi^*)(s), \qquad s \in S_1. \qquad (8)$$

By Corollary 1 in [8] there exists a Borel measurable selector $\phi^* : S\setminus S_1 \to A$ such that $\phi^*(s) \in A(s)$ and

$$U(\phi^*(s),\psi^*)(s) = \max_{\nu\in P(A(s))} U(\nu,\psi^*)(s) \qquad (9)$$

for each $s \in S\setminus S_1$. Define $\varphi^*(s) = \psi^*(s)$ for $s \in S_1$ and $\varphi^*(s) = \phi^*(s)$ for $s \in S\setminus S_1$. Since $q(\cdot|s,a,\theta) \ll \mu$ for all (s, a) ∈ C and θ ∈ Θ, we have $U(\varphi^*(s),\psi^*)(s) = U(\varphi^*(s),\varphi^*)(s)$ for all s ∈ S. This fact, (8) and (9) imply that φ∗ is an RMPE. □
To prove Theorem 2 we need a classical result on elimination of randomisation due to Dvoretzky, Wald and Wolfowitz [10] that was used in statistical decision theory [6]. The version for our application is given in Theorem 1 in [12] and Theorem 2.1 in [11].

Lemma 4. Let $\nu_1, \ldots, \nu_r$ be atomless probability measures on S and let w : C → R be a Borel measurable function. Assume that there is a Borel measurable function $\overline{w} : S \to R_+$ such that $|w(s,a)| < \overline{w}(s)$ and $\int_S \overline{w}(s)\,\nu_k(ds) < +\infty$ for all k = 1, ..., r. Suppose that $A^o(s)$ is a non-empty compact subset of A(s) for each s ∈ S and the set $\{(s,a) : s \in S,\ a \in A^o(s)\}$ is Borel. Then, for each φ ∈ Φ such that $\varphi(A^o(s)|s) = 1$ for all s ∈ S, there exists some f ∈ F such that $f(s) \in A^o(s)$ for each s ∈ S and

$$\int_S\int_{A(s)} w(s,a)\,\varphi(da|s)\,\nu_k(ds) = \int_S w(s,f(s))\,\nu_k(ds)$$
for all k = 1, ..., r.

Proof of Theorem 2. By Theorem 1, there exists a randomised RMPE φ∗ ∈ Φ. Using (2) it follows that

$$U(\varphi^*(s),\varphi^*)(s) = W(\varphi^*)(s) = \max_{\nu\in P(A(s))} U(\nu,\varphi^*)(s)$$

for all s ∈ S. Put

$$A^*(s) = \arg\max_{a\in A(s)}\Bigl(u(s,a) + \inf_{\xi\in P}\alpha\beta\int_S J(\varphi^*)(y)\,q(dy|s,a,\xi)\Bigr).$$

Under our continuity and compactness assumptions $A^*(s)$ is non-empty and compact. Moreover, $\varphi^*(A^*(s)|s) = 1$ for each s ∈ S. We have that

$$W(\varphi^*)(s) = \max_{a\in A^*(s)}\Bigl(u(s,a) + \inf_{\xi\in P}\alpha\beta\int_S J(\varphi^*)(y)\,q(dy|s,a,\xi)\Bigr), \qquad s \in S.$$
By Lemma 4 with $A^o(s) = A^*(s)$, there exists a Borel measurable mapping f∗ ∈ F such that $f^*(s) \in A^*(s)$ for each s ∈ S, and

$$\int_S T_{\varphi^*}J(\varphi^*)(y)\,\mu_k^\theta(dy) = \int_S T_{f^*}J(\varphi^*)(y)\,\mu_k^\theta(dy)$$

for all k = 1, ..., l and θ ∈ Θ. By assumption (A2) the last display gives that

$$\inf_{\xi\in P}\int_S T_{\varphi^*}J(\varphi^*)(y)\,q(dy|s,a,\xi) = \inf_{\xi\in P}\int_S T_{f^*}J(\varphi^*)(y)\,q(dy|s,a,\xi)$$

for all (s, a) ∈ C. Hence,

$$u(s,f^*(s)) + \inf_{\xi\in P}\beta\int_S T_{\varphi^*}J(\varphi^*)(y)\,q(dy|s,f^*(s),\xi) = u(s,f^*(s)) + \inf_{\xi\in P}\beta\int_S T_{f^*}J(\varphi^*)(y)\,q(dy|s,f^*(s),\xi) \qquad (10)$$
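and consequently, $T_{f^*}T_{\varphi^*}J(\varphi^*) = T_{f^*}T_{f^*}J(\varphi^*)$. This fact and (6) imply that

$$T_{f^*}J(\varphi^*) = T_{f^*}T_{\varphi^*}J(\varphi^*) = T_{f^*}^2 J(\varphi^*) = \ldots = T_{f^*}^n J(\varphi^*) \qquad (11)$$

for all n ≥ 2. It is easy to check that

$$T_{f^*}^n J(\varphi^*)(s_j) = \inf_{\gamma^j\in\Gamma} E_{s_j}^{f^*,\gamma^j}\Bigl(\sum_{\tau=j}^{n+j-1}\beta^{\tau-j}u(s_\tau,a_\tau) + \beta^n J(\varphi^*)(s_{n+j})\Bigr), \qquad s_j \in S.$$

Under assumption (A2), we get the following

$$0 \le \beta^n E_{s_j}^{f^*,\gamma^j}J(\varphi^*)(s_{n+j}) \le \beta^n\frac{c}{1-\beta}, \qquad\text{for every } \gamma^j \in \Gamma,\ s_j \in S.$$

Letting n → ∞ in (11) and making use of (6), we infer that

$$T_{f^*}J(\varphi^*)(s_j) = J(f^*)(s_j) = T_{f^*}J(f^*)(s_j), \qquad\text{for every } s_j \in S.$$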
Hence, it follows that

$$\inf_{\xi\in P}\beta\int_S J(f^*)(y)\,q(dy|s,f^*(s),\xi) = \inf_{\xi\in P}\beta\int_S J(\varphi^*)(y)\,q(dy|s,f^*(s),\xi), \qquad s \in S. \qquad (12)$$

Multiplying both sides of this equality by α, adding $u(s,f^*(s))$ and using (2) we get that

$$W(f^*)(s) = u(s,f^*(s)) + \inf_{\xi\in P}\alpha\beta\int_S J(\varphi^*)(y)\,q(dy|s,f^*(s),\xi), \qquad s \in S. \qquad (13)$$

Since $f^*(s) \in A^*(s)$ for each s ∈ S, it follows from (13) that

$$W(f^*)(s) = \max_{a\in A(s)}\Bigl(u(s,a) + \inf_{\xi\in P}\alpha\beta\int_S J(\varphi^*)(y)\,q(dy|s,a,\xi)\Bigr).$$

We claim that f∗ is an RMPE. Suppose, on the contrary, that it is not. Then, there are s ∈ S and $a_s \in A(s)$ such that

$$W(f^*)(s) < u(s,a_s) + \inf_{\xi\in P}\alpha\beta\int_S J(f^*)(y)\,q(dy|s,a_s,\xi).$$

Thus, using (12) and (10) we get that

$$W(f^*)(s) < u(s,a_s) + \inf_{\xi\in P}\alpha\beta\int_S T_{f^*}J(\varphi^*)(y)\,q(dy|s,a_s,\xi)$$
$$= u(s,a_s) + \inf_{\xi\in P}\alpha\beta\int_S T_{\varphi^*}J(\varphi^*)(y)\,q(dy|s,a_s,\xi)$$
$$= u(s,a_s) + \inf_{\xi\in P}\alpha\beta\int_S J(\varphi^*)(y)\,q(dy|s,a_s,\xi) \le W(f^*)(s).$$

This contradiction completes the proof. □
Acknowledgment

The authors acknowledge financial support of the National Science Centre: Grant DEC-2011/03/B/ST1/00325.

References

[1] A. Alj, A. Haurie, Dynamic equilibria in multigenerational stochastic games, IEEE Trans. Automat. Control 28 (1983) 193–203.
[2] Ł. Balbus, A.S. Nowak, Existence of perfect equilibria in a class of multigenerational stochastic games of capital accumulation, Automatica 44 (2008) 1471–1479.
[3] Ł. Balbus, A. Jaśkiewicz, A.S. Nowak, Robust Markov perfect equilibria in a dynamic choice model with quasi-hyperbolic discounting, in: Dynamic Games in Economics, in: Dynamic Modelling and Econometrics in Economics and Finance, Springer, 2014, pp. 1–22.
[4] Ł. Balbus, K. Reffett, Ł. Woźny, Time consistent Markov policies in dynamic economies with quasi-hyperbolic consumers, Internat. J. Game Theory (2014), http://dx.doi.org/10.1007/s00182-014-0420-3, in press.
[5] E.J. Balder, An extension of the usual model in statistical decision theory with applications to stochastic optimization problems, J. Multivariate Anal. 10 (1980) 385–397.
[6] E.J. Balder, Elimination of randomization in statistical decision theory reconsidered, J. Multivariate Anal. 16 (1985) 260–264.
[7] D.P. Bertsekas, S.E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Academic Press, New York, 1978.
[8] L.D. Brown, R. Purves, Measurable selections of extrema, Ann. Statist. 1 (1973) 902–912.
[9] L.O. Curtat, Markov equilibria of stochastic games with complementarities, Games Econom. Behav. 17 (1996) 177–199.
[10] A. Dvoretzky, A. Wald, J. Wolfowitz, Elimination of randomization in certain problems of statistics and the theory of games, Ann. Math. Stat. 22 (1951) 1–21.
[11] E.A. Feinberg, A.B. Piunovskiy, Nonatomic total rewards Markov decision processes with multiple criteria, J. Math. Anal. Appl. 273 (2002) 93–111.
[12] E.A. Feinberg, A.B. Piunovskiy, On the Dvoretzky–Wald–Wolfowitz theorem on nonrandomized statistical decisions, Theory Probab. Appl. 50 (2006) 463–466.
[13] E.A. Feinberg, A. Shwartz (Eds.), Handbook of Markov Decision Processes: Theory and Methods, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[14] I.L. Glicksberg, A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points, Proc. Amer. Math. Soc. 3 (1952) 170–174.
[15] C. Harris, D. Laibson, Dynamic choices of hyperbolic consumers, Econometrica 69 (2001) 935–957.
[16] A. Haurie, A multigenerational game model to analyze sustainable development, Ann. Oper. Res. 137 (2005) 369–386.
[17] A. Haurie, A stochastic multigeneration game for global climate change impact assessment, Ann. Internat. Soc. Dynam. Games 8 (2006) 309–332.
[18] O. Hernández-Lerma, J.B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, 1993.
[19] A. Jaśkiewicz, A.S. Nowak, Stochastic games with unbounded payoffs: applications to robust control in economics, Dyn. Games Appl. 1 (2011) 253–279.
[20] A. Jaśkiewicz, A.S. Nowak, Stationary Markov perfect equilibria in risk sensitive stochastic overlapping generations models, J. Econom. Theory 151 (2014) 411–447.
[21] E. Maskin, J. Tirole, Markov perfect equilibrium: I. Observable actions, J. Econom. Theory 100 (2001) 191–219.
[22] J. Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, San Francisco, 1965.
[23] A.S. Nowak, On a new class of nonzero-sum discounted stochastic games having stationary Nash equilibrium points, Internat. J. Game Theory 32 (2003) 121–132.
[24] A.S. Nowak, On a noncooperative stochastic game played by internally cooperating generations, J. Optim. Theory Appl. 144 (2010) 88–106.
[25] E. Phelps, R. Pollak, On second best national savings and game equilibrium growth, Rev. Econom. Stud. 35 (1968) 195–199.
[26] R.H. Strotz, Myopia and inconsistency in dynamic utility maximization, Rev. Econom. Stud. 23 (1956) 165–180.
[27] J. Warga, Optimal Control of Differential and Functional Equations, Academic Press, New York, 1972.