An LP approach to dynamic programming principles for stochastic control problems with state constraints




Nonlinear Analysis 77 (2013) 59–73


Dan Goreac^a,∗, Carina Ivaşcu^b, Oana-Silvia Serea^c

^a Université Paris-Est, LAMA (UMR 8050), UPEMLV, UPEC, CNRS, F-77454, Marne-la-Vallée, France
^b Universitatea Transilvania, Facultatea de Matematica-Informatica, Str. Iuliu Maniu nr. 50, Brasov, Romania
^c Université de Perpignan, 52 av. Paul Alduy, 66000 Perpignan, France

Article history: Received 25 April 2012; Accepted 1 September 2012. Communicated by S. Carl.

MSC: 93E20; 49L25

Keywords: Linear programming; Stochastic control under state constraints; Dynamic programming principles

Abstract. We study a class of nonlinear stochastic control problems with semicontinuous cost and state constraints using a linear programming (LP) approach. First, we provide a primal linearized problem stated on an appropriate space of probability measures with support contained in the set of constraints. This space is completely characterized by the coefficients of the control system. Second, we prove a semigroup property for this set of probability measures appearing in the definition of the primal value function. This leads to dynamic programming principles for control problems under state constraints with general (bounded) costs. A further linearized DPP is obtained for lower semicontinuous costs.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

To the best of our knowledge, the constrained optimal control problem with continuous cost was studied for the first time in [1]. The value function of an infinite horizon control problem with state constraints was characterized as a continuous solution to a corresponding Hamilton–Jacobi–Bellman equation. For discontinuous cost functionals, the deterministic control problem with state constraints was studied in [2–4] using viability theory tools. The DPP is rather easily proven in the deterministic framework or in the discrete-time setting, whenever one deals with finite probability spaces. For regular cost functionals and general probability spaces, the stochastic dynamic programming principle (without constraints) has been extensively studied (e.g. [5,6]). In general, dynamic programming principles are rather difficult to prove if the regularity of the value function is not known a priori. One has to guarantee a priori measurability properties and employ technical arguments on measurable selections. In discontinuous settings, an alternative to the classical method is to use a weak formulation (cf. [7]). This method has recently been employed in [8] to provide a dynamic programming principle for stochastic optimal control problems with expectation constraints. The idea in [7] is to replace the value function by a test function. This makes it possible to avoid measurability issues and is rather natural in the context of viscosity theory. Both in classical and weak DPPs, a key ingredient is the so-called "stability under concatenation" property (cf. A3 in [7]). Although trivially satisfied in several applications, this is not true for all classes of admissible controls. For example, in the case of piecewise deterministic Markov processes (cf. [9]), one uses piecewise open loop controls which do not enjoy the



∗ Corresponding author. E-mail address: [email protected] (D. Goreac).

0362-546X/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.na.2012.09.002


concatenation property. Linearization techniques provide a way of avoiding the assumption of stability under concatenation (cf. [10]). Very similar techniques apply in the diffusion setting on which we focus in the present paper. The aim of the present paper is to provide linearized formulations for the general control problem with state constraints and to deduce linearized formulations of the dynamic programming principles in the discontinuous framework. Linear programming tools have been used efficiently to deal with stochastic control problems (see [11–16] and references therein). An approach relying mainly on Hamilton–Jacobi(–Bellman) equations has been developed in [17] for deterministic control systems. This approach has been generalized to controlled Brownian diffusions (cf. [18] for infinite horizon, discounted control problems and [19] for Mayer and optimal stopping problems). The control processes and the associated solutions (and, possibly, stopping times) can be embedded into a space of probability measures satisfying a convenient condition. This condition is given in terms of the coefficient functions. Using Hamilton–Jacobi–Bellman techniques, it is proven in [18,19] that minimizing continuous cost functionals with respect to the new set of constraints leads to the same value. This approach has the advantage that it makes it possible to extend the control problems to a discontinuous setting. Moreover, these formulations turn out to provide the generalized solution of the (discontinuous) Hamilton–Jacobi–Bellman equation. For further details, the reader is referred to [19,20] and references therein. We begin by recalling the main results in the unconstrained stochastic framework in Section 2.2. The results for continuous cost functions (see Theorem 1) allow us to characterize the set of constraints defining the primal linearized formulation as the closed convex hull of occupational measures associated with controls.
We briefly recall the linear formulations of the value in the case of lower and upper semicontinuous cost functionals and specify the connections between the classical value function and the primal and dual values in Theorem 4. These results are taken from [19]. In Section 2.3, we consider the case when the solution is constrained to some closed set K. It appears natural to modify the linearized value function by minimizing only with respect to probability measures whose support is included in K. Whenever the cost functions are bounded and lower semicontinuous, the linearized value function can also be obtained as the limit of (classical) penalized problems (cf. Theorem 11). We provide a dual formulation similar to the unconstrained framework (Theorem 11). The dual formulation links the value function to the associated HJB equation on the set of constraints K. Furthermore, if standard convexity conditions hold true, the linearized value function coincides with the standard weak formulation under state constraints. This is consistent with the results in the unconstrained framework (cf. [19]). Using the characterization of the set of constraints in the linear formulation, we prove a semigroup property (Section 3.1). This property follows naturally from the structure of the sets of constraints. We derive dynamic programming principles under state constraints for general bounded cost functionals in Section 3.2. In the bounded case, an abstract DPP is given (cf. Theorem 18, assertion 1). In the lower semicontinuous setting, we provide a further linearized programming principle (Theorem 18, assertion 2). This is only a first step in studying the equations that would characterize the general value function. It suggests that the test functions in the definition of the viscosity solution might be defined on the space of probability measures.

2. Linear formulations for stochastic control problems

2.1. Preliminaries

We let (Ω, F, P) be a complete probability space endowed with a filtration F = (F_t)_{t≥0} satisfying the usual assumptions and W be a standard, d-dimensional Brownian motion with respect to this filtration. We denote by T > 0 a finite time horizon and we let U be a compact metric space. We consider the stochastic control system
\[
\begin{cases}
dX_s^{t,x,u} = b\big(s, X_s^{t,x,u}, u_s\big)\,ds + \sigma\big(s, X_s^{t,x,u}, u_s\big)\,dW_s, & \text{for all } s \in [t,T],\\
X_t^{t,x,u} = x \in \mathbb{R}^N,
\end{cases}
\tag{1}
\]
where t ∈ [0, T]. Throughout the paper, we use the following standard assumption on the coefficient functions b : ℝ^{N+1} × U → ℝ^N and σ : ℝ^{N+1} × U → ℝ^{N×d}:
\[
\begin{cases}
\text{(i) the functions } b \text{ and } \sigma \text{ are bounded and uniformly continuous on } \mathbb{R}^{N+1}\times U,\\
\text{(ii) there exists a real constant } c > 0 \text{ such that}\\
\qquad \big|b(t,x,u) - b(s,y,u)\big| + \big|\sigma(t,x,u) - \sigma(s,y,u)\big| \le c\,\big(|x-y| + |t-s|\big),
\end{cases}
\tag{2}
\]
for all (s,t,x,y,u) ∈ ℝ² × ℝ^{2N} × U. We recall that an admissible control process is any F-progressively measurable process with values in the compact metric space U. We let 𝒰 denote the class of all admissible control processes on [0,T] × Ω. Under the assumption (2), for every (t,x) ∈ [0,T] × ℝ^N and every admissible control process u ∈ 𝒰, there exists a unique solution to (1) starting from (t,x), denoted by X_·^{t,x,u}.

2.2. Lipschitz continuous cost functionals

In this subsection, we recall the basic tools that allow us to identify the primal and dual linear formulations associated with (finite horizon) stochastic control problems. The results can be found in [19] (see also [18] for the infinite time horizon).
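For intuition, sample paths of a control system of the form (1) can be generated with an Euler–Maruyama scheme. The coefficients b, σ and the feedback control below are illustrative stand-ins chosen to satisfy assumption (2) (bounded, Lipschitz); they are not taken from the paper.

```python
import numpy as np

def euler_maruyama(b, sigma, control, t, T, x0, n_steps, rng):
    """One Euler-Maruyama path of dX = b(s, X, u) ds + sigma(s, X, u) dW on [t, T].

    b, sigma, control are hypothetical stand-ins for the paper's coefficient
    functions and an admissible (here: feedback) control process.
    """
    dt = (T - t) / n_steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    s = t
    for _ in range(n_steps):
        u = control(s, x)                               # U-valued feedback control
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Brownian increment
        x = x + b(s, x, u) * dt + sigma(s, x, u) * dW
        path.append(x.copy())
        s += dt
    return np.array(path)

# Toy coefficients: scalar state, U = [-1, 1], bounded Lipschitz dynamics.
b = lambda s, x, u: np.tanh(u - x)              # bounded drift
sigma = lambda s, x, u: 0.1 * np.ones_like(x)   # constant diffusion
control = lambda s, x: -np.sign(x)              # push the state toward 0

rng = np.random.default_rng(0)
path = euler_maruyama(b, sigma, control, 0.0, 1.0, [2.0], 200, rng)
```

Expectations such as the cost functionals appearing below would then be estimated by averaging over many such paths.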


To every 0 < r ≤ T, (t,x) ∈ [0,r) × ℝ^N and every u ∈ 𝒰, we associate the (expectation of the) occupational measures
\[
\gamma_1^{t,r,x,u}(A \times B \times C) = \frac{1}{r-t}\,\mathbb{E}\left[\int_t^r \mathbf{1}_{A\times B\times C}\big(s, X_s^{t,x,u}, u_s\big)\,ds\right],
\qquad
\gamma_2^{t,r,x,u}(D) = \mathbb{E}\left[\mathbf{1}_D\big(X_r^{t,x,u}\big)\right],
\]
for all Borel subsets A × B × C × D ⊂ [t,r] × ℝ^N × U × ℝ^N. Also, we can define γ_1^{r,r,x,u}(A × B × C) = δ_{(r,x)}(A × B) P(u_r ∈ C) and γ_2^{r,r,x,u} = δ_x, for all Borel subsets A × B × C ⊂ [t,r] × ℝ^N × U. We denote by
\[
\Gamma(t,r,x) = \Big\{\gamma^{t,r,x,u} = \big(\gamma_1^{t,r,x,u}, \gamma_2^{t,r,x,u}\big) \in \mathcal{P}\big([t,r]\times\mathbb{R}^N\times U\big)\times\mathcal{P}\big(\mathbb{R}^N\big) : u \in \mathcal{U}\Big\}.
\]
Here, 𝒫(X) stands for the set of probability measures on the metric space X. Due to the assumption (2), there exists a positive constant C_0 such that, for every 0 < r ≤ T, every (t,x) ∈ [0,r) × ℝ^N and every u ∈ 𝒰, one has
\[
\sup_{s\in[t,r]} \mathbb{E}\left[\big|X_s^{t,x,u}\big|^2\right] \le e^{C_0(r-t)}\big(|x|^2 + C_0(r-t)\big).
\tag{3}
\]
Therefore,
\[
\begin{cases}
\displaystyle\int_{\mathbb{R}^N} |y|^2\, \gamma_1^{t,r,x,u}\big([t,r], dy, U\big) \le e^{C_0(r-t)}\big(|x|^2 + C_0(r-t)\big),\\[2mm]
\displaystyle\int_{\mathbb{R}^N} |y|^2\, \gamma_2^{t,r,x,u}(dy) \le e^{C_0(r-t)}\big(|x|^2 + C_0(r-t)\big).
\end{cases}
\tag{4}
\]
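The occupational measure γ₁ has a simple empirical counterpart: along a (discretized) trajectory, the γ₁-mass of a set is the fraction of time the pair (s, X_s) spends in it. A minimal sketch on a deterministic toy path (the path X_s = s is an assumption made purely so the answer can be checked by hand):

```python
import numpy as np

def occupational_mass(xs, B):
    """Empirical mass gamma_1([t, r] x B x U) along a discretized path:
    the fraction of (uniformly sampled) time the state spends in B."""
    lo, hi = B
    return float(np.mean((xs >= lo) & (xs <= hi)))

# Deterministic toy path X_s = s on [0, 1] (unit drift, no noise), so the
# time marginal of gamma_1 is uniform on [0, 1].
ts = np.linspace(0.0, 1.0, 100001)
xs = ts.copy()
mass = occupational_mass(xs[:-1], B=(0.0, 0.5))   # ≈ 0.5
```

For a stochastic path, averaging such empirical measures over many simulations approximates the expectation appearing in the definition of γ₁.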

We define
\[
\Theta(t,r,x) = \bigg\{ \gamma \in \mathcal{P}\big([t,r]\times\mathbb{R}^N\times U\big)\times\mathcal{P}\big(\mathbb{R}^N\big) : \forall \varphi \in C_b^{1,2}\big(\mathbb{R}_+\times\mathbb{R}^N\big),
\]
\[
\int_{[t,r]\times\mathbb{R}^N\times U\times\mathbb{R}^N} \Big[ (r-t)\,L^v \varphi(s,y) + \varphi(t,x) - \varphi(r,z) \Big]\, \gamma_1(ds,dy,dv)\,\gamma_2(dz) = 0 \bigg\},
\tag{5}
\]
where
\[
L^v \varphi(s,y) = \frac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*(s,y,v)\,D^2\varphi(s,y)\big] + \big\langle b(s,y,v), D\varphi(s,y)\big\rangle + \partial_t\varphi(s,y),
\]
for all (s,y) ∈ ℝ₊ × ℝ^N, v ∈ U and all φ ∈ C^{1,2}(ℝ₊ × ℝ^N). Itô's formula applied to test functions φ ∈ C_b^{1,2}(ℝ₊ × ℝ^N) yields
\[
\Gamma(t,r,x) \subset \Theta(t,r,x).
\]
Moreover, the set Θ(t,r,x) is convex and a closed subset of 𝒫([t,r] × ℝ^N × U) × 𝒫(ℝ^N). For further details, the reader is referred to [19]. Let us suppose that g : ℝ × ℝ^N × U → ℝ and h : ℝ^N → ℝ are bounded and uniformly continuous and such that









\[
|g(t,x,u) - g(s,y,u)| + |h(x) - h(y)| \le c\,\big(|x-y| + |t-s|\big),
\tag{6}
\]
for all (s,t,x,y,u) ∈ ℝ² × ℝ^{2N} × U and for some positive constant c. We introduce the usual value function
\[
V^r(t,x) = \inf_{u\in\mathcal{U}} \mathbb{E}\left[ \int_t^r g\big(s, X_s^{t,x,u}, u_s\big)\,ds + h\big(X_r^{t,x,u}\big) \right]
= \inf_{\gamma\in\Gamma(t,r,x)} \left[ (r-t)\int_{[t,r]\times\mathbb{R}^N\times U} g(s,y,u)\,\gamma_1(ds,dy,du) + \int_{\mathbb{R}^N} h(y)\,\gamma_2(dy) \right],
\tag{7}
\]
and the primal linearized value function
\[
\Lambda^r(t,x) = \inf_{\gamma\in\Theta(t,r,x)} \left[ (r-t)\int_{[t,r]\times\mathbb{R}^N\times U} g(s,y,u)\,\gamma_1(ds,dy,du) + \int_{\mathbb{R}^N} h(y)\,\gamma_2(dy) \right],
\tag{8}
\]
for all (t,x) ∈ [0,r] × ℝ^N. We also consider the dual value function
\[
\mu^r(t,x) = \sup\Big\{ \mu \in \mathbb{R} : \exists \varphi \in C_b^{1,2}\big(\mathbb{R}_+\times\mathbb{R}^N\big) \text{ s.t. } \forall (s,y,v,z)\in[t,r]\times\mathbb{R}^N\times U\times\mathbb{R}^N,\
\mu \le (r-t)\big(L^v\varphi(s,y) + g(s,y,v)\big) + h(z) - \varphi(r,z) + \varphi(t,x) \Big\},
\tag{9}
\]
for all (t,x) ∈ [0,r] × ℝ^N. The following result is a slight generalization of Theorem 4 in [19]. The proof is very similar and will be omitted.

Theorem 1 (See [19], Theorem 4). Under the assumptions (2) and (6), V^r = Λ^r = µ^r.


Since this result holds true for arbitrary (regular) functions g and h, a standard separation argument yields:

Corollary 2. The set of constraints Θ(t,r,x) is the closed, convex hull of Γ(t,r,x):
\[
\Theta(t,r,x) = \overline{\mathrm{co}}\,\Gamma(t,r,x).
\tag{10}
\]

The closure is taken with respect to the usual (narrow) convergence of probability measures.

Remark 3. Due to the inequality (4), Prohorov's Theorem yields that \(\overline{\mathrm{co}}\,\Gamma(t,r,x)\) is relatively compact and, thus, Θ(t,r,x) is compact. Moreover,
\[
\begin{cases}
\displaystyle\int_{\mathbb{R}^N} |y|^2\,\gamma_1\big([t,r],dy,U\big) \le e^{C_0(r-t)}\big(|x|^2 + C_0(r-t)\big),\\[2mm]
\displaystyle\int_{\mathbb{R}^N} |y|^2\,\gamma_2(dy) \le e^{C_0(r-t)}\big(|x|^2 + C_0(r-t)\big),
\end{cases}
\tag{11}
\]
for all γ = (γ_1, γ_2) ∈ Θ(t,r,x). In the case when g and h are only semicontinuous, one can still define V^r, Λ^r and µ^r. One can also (formally) consider the associated Hamilton–Jacobi–Bellman equation

\[
-\partial_t V(t,x) + H\big(t, x, DV(t,x), D^2 V(t,x)\big) = 0,
\tag{12}
\]
for all (t,x) ∈ (0,T) × ℝ^N, and V(T,·) = h(·) on ℝ^N, where the Hamiltonian is given by
\[
H(t,x,p,A) = \sup_{u\in U}\left\{-\frac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*(t,x,u)A\big] - \big\langle b(t,x,u), p\big\rangle - g(t,x,u)\right\},
\tag{13}
\]
for all (t,x,p,A) ∈ ℝ × ℝ^N × ℝ^N × 𝒮^N. We recall that 𝒮^N stands for the family of symmetric N × N matrices. The following theorem states the connection between these various functions and Eq. (12). Proofs of these results can be found in Section 4 of [19]. They are based on inf/sup-convolutions and the compactness of Θ(t,T,x).

Theorem 4. If the functions g and h are bounded and lower semicontinuous, then the primal and dual value functions coincide: Λ^r = µ^r. The common value is the smallest l.s.c. viscosity supersolution of (12). If the functions g and h are bounded and upper semicontinuous, then V^r coincides with Λ^r. The common value is the largest u.s.c. viscosity subsolution of (12).

In the l.s.c. case, the assertions are given by Theorem 7 in [19]. The u.s.c. case is covered by Theorem 8 in [19].

Remark 5. If the functions g and h are bounded and lower semicontinuous, V^r ≥ Λ^r. Under the usual convexity assumption (cf. [21])
\[
F(t,x) = \big\{\big(b(t,x,u),\, \sigma\sigma^*(t,x,u),\, g(t,x,u)\big) : u \in U\big\} \text{ is convex for all } (t,x) \in \mathbb{R}^{N+1},
\]
if g is continuous in u, then Λ^r coincides with the weak formulation of V^r, i.e.
\[
\Lambda^r(t,x) = \inf_{\pi = (\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P}, W, u)} \mathbb{E}^\pi\left[\int_t^r g\big(s, X_s^{t,x,u}, u_s\big)\,ds + h\big(X_r^{t,x,u}\big)\right].
\]

For further details, the reader is referred to Proposition 12 and Example 13 in [19]. If the functions g and h are bounded and upper semicontinuous, then V^r and µ^r may not coincide (cf. Example 9 in [19]).

Remark 6. In the case of control problems with discontinuous cost governed by a deterministic equation x′(t) = f(t, x(t), u(t)), the standard assumption is convexity of the dynamics. In this case, the value function is also discontinuous (see, for instance, [22,23]). Moreover, if the cost is continuous, the value function associated with these dynamics coincides with the value function associated with the convexified differential inclusion
\[
x'(t) \in F(t, x(t)), \qquad F(t,x) = \mathrm{co}\big(f(t,x,u) : u \in U\big),
\]
because of Filippov's Theorem (see, for instance, [24]). However, when the dynamics are not convex and the cost is l.s.c., the value functions associated with the two dynamics do not coincide. Consequently, in order to characterize the value function through the associated Hamilton–Jacobi system, one has to modify the definition by considering the convexified differential inclusion. This definition is similar to (8). In the stochastic framework, one can reason in a similar way, by taking the weak limit of solutions (i.e. the closed convex hull of occupational measures). This set turns out to be the set Θ introduced in our paper. We emphasize that the actual procedure is somewhat reversed: we consider the explicit set and deduce that it is the closed convex hull of occupational measures. In our opinion, this is the natural method for obtaining good properties of the value function whenever the cost is semicontinuous (or just bounded).


2.3. Control problems with state constraints

We now consider a closed set K ⊂ ℝ^N. We recall the definition of the second order normal cone to K.

Definition 7. Given a point x ∈ K, the first order normal cone to the set K is given by
\[
N_K^1(x) = \big\{p \in \mathbb{R}^N : \langle p, y - x\rangle \le o(|y-x|) \text{ as } K \ni y \to x\big\}.
\]
The second order normal cone is given by
\[
N_K^2(x) = \left\{(p,A) \in \mathbb{R}^N\times\mathcal{S}^N : \langle p, y-x\rangle + \frac{1}{2}\,\langle A(y-x), y-x\rangle \le o\big(|y-x|^2\big) \text{ as } K \ni y \to x\right\}.
\]
Here, 𝒮^N stands for the set of symmetric N × N matrices.

Throughout the rest of the paper, we assume that, for every (t_0, x_0) ∈ [0,T] × ∂K and every (p,A) ∈ N_K^2(x_0), one has
\[
\min_{u\in U}\left(\big\langle p, b(t_0,x_0,u)\big\rangle + \frac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*(t_0,x_0,u)A\big]\right) \le 0.
\tag{14}
\]

The following result is (by now) standard in viability theory. It is similar to the main results (in infinite horizon) of [25,26]. For further results on the subject, the reader is referred to [27].

Proposition 8. Whenever (14) holds true, the set K is near viable, i.e. for every ε > 0, every r > 0, t_0 ∈ [0,r] and every initial datum x_0 ∈ K, one can find an admissible control u^ε such that
\[
\mathbb{E}\left[\int_{t_0}^r d_K\big(X_s^{t_0,x_0,u^\varepsilon}\big)\wedge 1\, ds + d_K\big(X_r^{t_0,x_0,u^\varepsilon}\big)\wedge 1\right] \le \varepsilon.
\]
We recall that the function d_K(·) ∧ 1 is bounded and Lipschitz continuous (with Lipschitz constant 1). We will skip the proof of the previous proposition, the arguments being essentially those of [26], Theorem 1; they consist in using the assumption (14) to prove that the function (t,x) ↦ (r − t + 1) 1_{K^c}(x) is a (lower semicontinuous) viscosity supersolution of the Hamilton–Jacobi–Bellman equation

\[
-\partial_t V(t,x) + H\big(t,x,DV(t,x),D^2V(t,x)\big) = 0,
\tag{15}
\]
for all (t,x) ∈ (0,T) × ℝ^N, and V(r,·) = d_K(·) ∧ 1 on ℝ^N, where the Hamiltonian is given by
\[
H(t,x,p,A) = \sup_{u\in U}\left\{-\frac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*(t,x,u)A\big] - \big\langle b(t,x,u), p\big\rangle - d_K(x)\wedge 1\right\},
\tag{16}
\]
for all (t,x,p,A) ∈ ℝ × ℝ^N × ℝ^N × 𝒮^N. Then, by classical comparison results for viscosity solutions,
\[
\inf_{u\in\mathcal{U}} \mathbb{E}\left[\int_{t_0}^r d_K\big(X_s^{t_0,x_0,u}\big)\wedge 1\, ds + d_K\big(X_r^{t_0,x_0,u}\big)\wedge 1\right] = 0,
\]
if x_0 ∈ K.

Remark 9. The reader is invited to notice that, in general, near viability does not imply viability. If the set {(b(t,x,u), ½σσ*(t,x,u)) : u ∈ U} is convex for all (t,x) ∈ [0,T] × ℝ^N, standard compactification methods yield that viability of K and near viability are equivalent.
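The truncated distance d_K(·) ∧ 1 appearing above (and in the penalized problems below) is bounded by 1 and 1-Lipschitz. A minimal numerical check, with the interval K = [−1, 1] ⊂ ℝ as an illustrative constraint set:

```python
import numpy as np

def dK_capped(x, a=-1.0, b=1.0):
    """Distance to the interval K = [a, b], capped at 1: d_K(x) ∧ 1.
    K = [-1, 1] is an illustrative choice of closed constraint set."""
    d = np.maximum.reduce([a - x, x - b, np.zeros_like(x)])
    return np.minimum(d, 1.0)

xs = np.linspace(-3.0, 3.0, 2001)
vals = dK_capped(xs)
# Grid check of the 1-Lipschitz property: |f(x) - f(y)| <= |x - y|
incr = np.abs(np.diff(vals)) / np.diff(xs)
```

The penalty vanishes exactly on K and saturates at 1 once the state is at distance 1 or more from K, which is what makes it a bounded, Lipschitz penalization of the constraint.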

We consider the classical value function for control problems under constraints
\[
\inf_{\substack{u\in\mathcal{U}\\ X_t^{t_0,x_0,u}\in K}} \mathbb{E}\left[\int_{t_0}^T g\big(t, X_t^{t_0,x_0,u}\big)\,dt + h\big(X_T^{t_0,x_0,u}\big)\right],
\]
where g : ℝ × ℝ^N → ℝ and h : ℝ^N → ℝ are bounded and measurable functions. The dependence on the control variable in g has been dropped to simplify the presentation. Under suitable assumptions (e.g. uniform continuity in u, uniformly w.r.t. both time and state variables), one can adapt the results to the more general framework. For every n ≥ 1, it is classical to consider the penalized value function
\[
V_K^{n,g,h}(t_0,x_0) = \inf_{u\in\mathcal{U}} J_K^{n,g,h}(t_0,x_0,u),
\tag{17}
\]


for all (t_0,x_0) ∈ [0,T] × ℝ^N, where
\[
J_K^{n,g,h}(t_0,x_0,u) = \mathbb{E}\left[\int_{t_0}^T \Big[g_n\big(t, X_t^{t_0,x_0,u}\big) + n\big(d_K\big(X_t^{t_0,x_0,u}\big)\wedge 1\big)\Big]\,dt + h_n\big(X_T^{t_0,x_0,u}\big) + n\big(d_K\big(X_T^{t_0,x_0,u}\big)\wedge 1\big)\right],
\]
for all u ∈ 𝒰. Here, g_n and h_n are the inf-convolution functions given by
\[
g_n(t,x) = \inf_{(s,y)\in\mathbb{R}^{N+1}}\big(g(s,y) + n\,|(s-t, y-x)|\big)
\quad\text{and}\quad
h_n(x) = \inf_{y\in\mathbb{R}^N}\big(h(y) + n\,|y-x|\big),
\]
for all (t,x) ∈ [0,T] × ℝ^N. In particular, if g and h are Lipschitz continuous functions, then g_n = g and h_n = h as soon as n dominates their Lipschitz constant. We now introduce the linearized value function under state constraints. To this purpose, for every r ∈ [0,T], t_0 ∈ [0,r] and every x_0 ∈ K, we consider



\[
\Theta_K(t_0,r,x_0) = \big\{(\gamma_1,\gamma_2) \in \Theta(t_0,r,x_0) : \mathrm{Supp}(\gamma_1) \subset [t_0,r]\times K\times U,\ \mathrm{Supp}(\gamma_2) \subset K\big\}.
\]

Remark 10. This set is nonempty due to the assumption (14). Indeed, Proposition 8 yields that, whenever t_0 ∈ [0,r] and x_0 ∈ K,
\[
\inf_{u\in\mathcal{U}} \mathbb{E}\left[\int_{t_0}^r d_K\big(X_t^{t_0,x_0,u}\big)\wedge 1\, dt + d_K\big(X_r^{t_0,x_0,u}\big)\wedge 1\right] = 0.
\]
Theorem 1 and the compactness of Θ(t_0,r,x_0) imply the existence of some (γ_1,γ_2) ∈ Θ(t_0,r,x_0) such that
\[
(r-t_0)\int_{[t_0,r]\times\mathbb{R}^N\times U}\big(d_K(y)\wedge 1\big)\,\gamma_1(ds\,dy\,du) + \int_{\mathbb{R}^N}\big(d_K(z)\wedge 1\big)\,\gamma_2(dz) = 0,
\]
and, thus, (γ_1,γ_2) ∈ Θ_K(t_0,r,x_0). It is clear that Θ_K(t_0,r,x_0) is a convex and compact set (see Remark 3). The linearized value functions are

\[
\Lambda_{K,g,h}(t_0,x_0) = \inf_{(\gamma_1,\gamma_2)\in\Theta_K(t_0,T,x_0)}\left[(T-t_0)\int_{[t_0,T]\times\mathbb{R}^N\times U} g(s,y)\,\gamma_1(ds\,dy\,du) + \int_{\mathbb{R}^N} h(z)\,\gamma_2(dz)\right]
\tag{18}
\]
and its dual
\[
\mu_{K,g,h}(t_0,x_0) = \sup\Big\{\mu \in \mathbb{R} : \exists \varphi \in C_b^{1,2}\big(\mathbb{R}_+\times\mathbb{R}^N\big) \text{ s.t. } \forall (s,y,v,z) \in [t_0,T]\times K\times U\times K,\
\mu \le (T-t_0)\big[L^v\varphi(s,y) + g(s,y)\big] + h(z) - \varphi(T,z) + \varphi(t_0,x_0)\Big\},
\tag{19}
\]
for all (t_0,x_0) ∈ [0,T] × K. The connection between these functions is given by the following result.

Theorem 11. 1. We assume (2) and (14) to hold true. If g and h are bounded, lower semicontinuous functions, then
\[
\Lambda_{K,g,h}(t_0,x_0) = \lim_{n\to\infty} V_K^{n,g,h}(t_0,x_0) = \mu_{K,g,h}(t_0,x_0),
\tag{20}
\]
for all (t_0,x_0) ∈ [0,T] × K.

2. Moreover, if g and h are bounded, Lipschitz continuous and the set
\[
F(t,x) = \big\{\big(b(t,x,u), \sigma\sigma^*(t,x,u)\big) : u \in U\big\}
\]
is convex for all (t,x) ∈ [0,T] × ℝ^N, then Λ_{K,g,h}(t_0,x_0) = V_{K,g,h}(t_0,x_0) for all (t_0,x_0) ∈ [0,T] × K, where
\[
V_{K,g,h}(t_0,x_0) = \inf_{\substack{\pi=(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge0},\mathbb{P},W,u)\\ X_t^{t_0,x_0,u}\in K,\ \mathbb{P}^\pi\text{-a.s.},\ \forall t\in[t_0,T]}} \mathbb{E}^\pi\left[\int_{t_0}^T g\big(t,X_t^{t_0,x_0,u}\big)\,dt + h\big(X_T^{t_0,x_0,u}\big)\right].
\]
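The inf-convolutions g_n, h_n entering the penalized problems of Theorem 11 can be computed numerically on a grid. A minimal sketch for the one-dimensional l.s.c. cost h(y) = 1 for y ≠ 0, h(0) = 0 (an illustrative choice, for which h_n(x) = min(1, n|x|) in closed form):

```python
import numpy as np

def inf_convolution(h_vals, ys, x, n):
    """Grid approximation of h_n(x) = inf_y ( h(y) + n * |y - x| )."""
    return float(np.min(h_vals + n * np.abs(ys - x)))

# l.s.c. cost: h(y) = 1 for y != 0 and h(0) = 0; the grid contains 0 exactly.
ys = np.arange(-20000, 20001) * 1e-4        # grid on [-2, 2]
h_vals = np.where(ys == 0.0, 0.0, 1.0)

# h_n increases to h as n grows: h_1(0.5) = 0.5, while h_4(0.5) = 1 = h(0.5).
h1 = inf_convolution(h_vals, ys, 0.5, 1)    # ≈ 0.5
h4 = inf_convolution(h_vals, ys, 0.5, 4)    # ≈ 1.0
```

Each h_n is n-Lipschitz and h_n ≤ h, which is exactly what makes the family (h_n)_n a non-decreasing regularizing approximation of the l.s.c. cost.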

Remark 12. 1. As a supremum of continuous functions, Λ_{K,g,h} = sup_{n≥1} V_K^{n,g,h} is lower semicontinuous. The dual formulation µ_{K,g,h} essentially means that the value function Λ_{K,g,h} can be seen as the pointwise supremum over regular subsolutions of the associated HJB equation (12) on the set of constraints K. 2. The linearized value function Λ_{K,g,h} is always the supremum of the penalized problems V_K^{n,g,h}, independently of convexity assumptions. It might, however, not coincide with V_{K,g,h}. Under the convexity assumption, equality can also be obtained for l.s.c. costs. 3. Considering the set Θ_K(t_0,r,x_0) (or Θ(t_0,r,x_0)) appears to be the stochastic analogue of considering differential inclusions with convex, compact valued terms.


Proof of Theorem 11. 1. Let us fix (t_0,x_0) ∈ [0,T] × K. We recall that g_n ≤ g and h_n ≤ h, for all n ≥ 1. Due to Theorem 1, one has
\[
V_K^{n,g,h}(t_0,x_0) = \inf_{(\gamma_1,\gamma_2)\in\Theta(t_0,T,x_0)}\left[(T-t_0)\int_{[t_0,T]\times\mathbb{R}^N\times U}\big[g_n(s,y) + n\big(d_K(y)\wedge 1\big)\big]\,\gamma_1(ds\,dy\,du) + \int_{\mathbb{R}^N}\big[h_n(z) + n\big(d_K(z)\wedge 1\big)\big]\,\gamma_2(dz)\right]
\le \Lambda_{K,g,h}(t_0,x_0) < \infty.
\]
It follows that
\[
\Lambda_{K,g,h}(t_0,x_0) \ge \limsup_{n\to\infty} V_K^{n,g,h}(t_0,x_0).
\tag{21}
\]
On the other hand, the compactness of Θ(t_0,T,x_0) yields the existence of some γ^n ∈ Θ(t_0,T,x_0) such that
\[
V_K^{n,g,h}(t_0,x_0) = (T-t_0)\int_{[t_0,T]\times\mathbb{R}^N\times U}\big[g_n(s,y) + n\big(d_K(y)\wedge 1\big)\big]\,\gamma_1^n(ds\,dy\,du) + \int_{\mathbb{R}^N}\big[h_n(z) + n\big(d_K(z)\wedge 1\big)\big]\,\gamma_2^n(dz).
\]
We suppose that
\[
|g(s,y)| + |h(y)| \le M, \quad \text{for all } (s,y) \in \mathbb{R}\times\mathbb{R}^N.
\]
There exists a subsequence, still denoted (γ^n)_n, converging to some γ ∈ Θ(t_0,T,x_0). Moreover, one has
\[
(T-t_0)\int_{[t_0,T]\times\mathbb{R}^N\times U}\big(d_K(y)\wedge 1\big)\,\gamma_1^n(ds\,dy\,du) + \int_{\mathbb{R}^N}\big(d_K(z)\wedge 1\big)\,\gamma_2^n(dz) \le \frac{1}{n}\big(\Lambda_{K,g,h}(t_0,x_0) + M(T+1)\big),
\]
which implies that γ ∈ Θ_K(t_0,T,x_0). Using the fact that (g_n)_n and (h_n)_n are non-decreasing, one has
\[
\liminf_{n\to\infty} V_K^{n,g,h}(t_0,x_0) \ge (T-t_0)\int_{[t_0,T]\times\mathbb{R}^N\times U} g_{n_0}(s,y)\,\gamma_1(ds\,dy\,du) + \int_{\mathbb{R}^N} h_{n_0}(z)\,\gamma_2(dz),
\]
for every n_0 ≥ 1. Passing to the limit as n_0 → ∞, we get
\[
\liminf_{n\to\infty} V_K^{n,g,h}(t_0,x_0) \ge \Lambda_{K,g,h}(t_0,x_0).
\tag{22}
\]
The first equality in our assertion follows by combining (21) and (22). To prove the second equality, we denote by
\[
\mu_K^{n,g,h}(t_0,x_0) = \sup\Big\{\mu \in \mathbb{R} : \exists \varphi \in C_b^{1,2}\big(\mathbb{R}_+\times\mathbb{R}^N\big) \text{ s.t. } \forall (s,y,v,z) \in [t_0,T]\times\mathbb{R}^N\times U\times\mathbb{R}^N,\
\mu \le (T-t_0)\big[L^v\varphi(s,y) + g_n(s,y) + n\big(d_K(y)\wedge 1\big)\big] + h_n(z) + n\big(d_K(z)\wedge 1\big) - \varphi(T,z) + \varphi(t_0,x_0)\Big\}.
\]
One easily notices that
\[
\sup_n \mu_K^{n,g,h}(t_0,x_0) \le \mu_{K,g,h}(t_0,x_0).
\]
On the other hand, due to Theorem 1, µ_K^{n,g,h}(t_0,x_0) = V_K^{n,g,h}(t_0,x_0) and, therefore, using (22), we have
\[
\Lambda_{K,g,h}(t_0,x_0) \le \mu_{K,g,h}(t_0,x_0).
\tag{23}
\]
For the converse, let us consider µ ∈ ℝ and φ ∈ C_b^{1,2}(ℝ₊ × ℝ^N) such that, for all (s,y,v,z) ∈ [t_0,T] × K × U × K,
\[
\mu \le (T-t_0)\big[L^v\varphi(s,y) + g(s,y)\big] + h(z) - \varphi(T,z) + \varphi(t_0,x_0).
\]
Whenever γ ∈ Θ_K(t_0,T,x_0), by integrating the previous inequality with respect to γ_1(ds\,dy\,dv)\,γ_2(dz), one gets
\[
\mu \le (T-t_0)\int_{[t_0,T]\times\mathbb{R}^N\times U} g(s,y)\,\gamma_1(ds\,dy\,du) + \int_{\mathbb{R}^N} h(z)\,\gamma_2(dz).
\]


Therefore, µ ≤ Λ_{K,g,h}(t_0,x_0) and, passing to the supremum over such µ,
\[
\Lambda_{K,g,h}(t_0,x_0) \ge \mu_{K,g,h}(t_0,x_0).
\tag{24}
\]
The proof of the first assertion is now complete.

2. Let us consider the augmented system
\[
\begin{cases}
dX_s^{t,x,u} = b\big(s,X_s^{t,x,u},u_s\big)\,ds + \sigma\big(s,X_s^{t,x,u},u_s\big)\,dW_s,\\
dY_s^{t,x,y,u} = \big(d_K \wedge 1\big)\big(X_s^{t,x,u}\big)\,ds, \quad \text{for all } s \in [t,T],\\
X_t^{t,x,u} = x \in \mathbb{R}^N, \quad Y_t^{t,x,y,u} = y \in \mathbb{R}.
\end{cases}
\tag{25}
\]

The associated (weak) value function is given by
\[
V_K^+(t_0,x_0,y_0) = \inf_{\pi=(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge0},\mathbb{P},W,u)} \mathbb{E}^\pi\left[\int_{t_0}^T g\big(t,X_t^{t_0,x_0,u}\big)\,dt + h\big(X_T^{t_0,x_0,u}\big) + \big(2(T+1)M+1\big)\Big[\mathbf{1}_{(0,\infty)}\big(d_K\big(X_T^{t_0,x_0,u}\big)\wedge 1\big) + \mathbf{1}_{(0,\infty)}\big(Y_T^{t_0,x_0,y_0,u}\big)\Big]\right].
\]
As for strong control processes, one can define an occupational measure associated with the weak control π. As before, it belongs to the set of constraints (5). Moreover, if X_t^{t_0,x_0,u} ∈ K, ℙ^π-a.s., for all t ∈ [t_0,T], the associated occupational measure belongs to Θ_K(t_0,r,x_0). One easily proves that Λ_{K,g,h}(t_0,x_0) ≤ V_{K,g,h}(t_0,x_0) = V_K^+(t_0,x_0,0). For the converse inequality, we use Theorem 4 to get
\[
V_K^+(t_0,x_0,0) = \sup\Big\{\mu \in \mathbb{R} : \exists \varphi \in C_b^{1,2}\big(\mathbb{R}_+\times\mathbb{R}^{N+1}\big) \text{ s.t. } \forall (s,(y_1,y_2),v,(z_1,z_2)) \in [t_0,T]\times\mathbb{R}^{N+1}\times U\times\mathbb{R}^{N+1},
\]
\[
\mu \le (T-t_0)\big[\widetilde{L}^v\varphi(s,y_1,y_2) + g(s,y_1)\big] + h(z_1) + \big(2(T+1)M+1\big)\big[\mathbf{1}_{(0,\infty)}(z_2) + \mathbf{1}_{(0,\infty)}\big(d_K(z_1)\wedge 1\big)\big] - \varphi(T,z_1,z_2) + \varphi(t_0,x_0,0)\Big\}.
\]
Here,
\[
\widetilde{L}^v\varphi(s,y_1,y_2) = \frac{1}{2}\,\mathrm{Tr}\big[\sigma\sigma^*(s,y_1,v)\,D^2_{y_1}\varphi(s,y_1,y_2)\big] + \big\langle b(s,y_1,v), D_{y_1}\varphi(s,y_1,y_2)\big\rangle + \big(d_K(y_1)\wedge 1\big)\,D_{y_2}\varphi(s,y_1,y_2) + \partial_t\varphi(s,y_1,y_2),
\]
for all φ ∈ C_b^{1,2}(ℝ₊ × ℝ^{N+1}). If (µ,φ) are such that
\[
\mu \le (T-t_0)\big[\widetilde{L}^v\varphi(s,y_1,y_2) + g(s,y_1)\big] + h(z_1) + \big(2(T+1)M+1\big)\big[\mathbf{1}_{(0,\infty)}(z_2) + \mathbf{1}_{(0,\infty)}\big(d_K(z_1)\wedge 1\big)\big] - \varphi(T,z_1,z_2) + \varphi(t_0,x_0,0),
\]
for all (s,(y_1,y_2),v,(z_1,z_2)) ∈ [t_0,T] × ℝ^{N+1} × U × ℝ^{N+1}, then (taking y_2 = z_2 = 0 and z_1 ∈ K, so that the penalty terms vanish)
\[
\mu \le (T-t_0)\big[L^v\varphi_0(s,y_1) + g(s,y_1)\big] + h(z_1) - \varphi_0(T,z_1) + \varphi_0(t_0,x_0),
\]
for all (s,y_1,v,z_1) ∈ [t_0,T] × K × U × K. Here, φ_0(s,y_1) = φ(s,y_1,0), for all (s,y_1) ∈ ℝ^{N+1}. Thus, µ ≤ µ_{K,g,h}(t_0,x_0) = Λ_{K,g,h}(t_0,x_0). The proof of our theorem is now complete. □

Let us end the section with a simple example.

Example 13. We introduce the control set U = {−1, 1}. We consider the (deterministic) control system



\[
\begin{cases}
dx_t^{t_0,x_0,y_0,u(\cdot)} = u_t\,dt,\\
dy_t^{t_0,x_0,y_0,u(\cdot)} = \Big(x_t^{t_0,x_0,y_0,u(\cdot)}\Big)^2 \wedge 1\, dt,
\end{cases}
\]
for every t_0 ∈ [0,1], all (x_0,y_0) ∈ ℝ² and all U-valued measurable functions u(·). The dynamics are obviously not convex. We fix t_0 ∈ [0,1). One notices that, for K = {(0,0)}, Γ_K(t_0,(0,0)) is empty. For every ε > 0 small enough, using the control
\[
u^\varepsilon = -\mathbf{1}_{[0,\varepsilon)\cup(3\varepsilon,5\varepsilon]\cup(7\varepsilon,9\varepsilon]\cup\cdots} + \mathbf{1}_{(\varepsilon,3\varepsilon]\cup(5\varepsilon,7\varepsilon]\cup(9\varepsilon,11\varepsilon]\cup\cdots}
\]
(which alternates −1 and 1 on small intervals), one gets
\[
\Big|x_t^{t_0,0,0,u^\varepsilon(\cdot)}\Big| + \Big|y_t^{t_0,0,0,u^\varepsilon(\cdot)}\Big| \le \varepsilon + \varepsilon^2, \quad \text{for all } t \in [t_0,1].
\]
Thus, Θ_K(t_0,(0,0)) ≠ ∅. We consider the terminal lower semicontinuous cost function h : ℝ² → ℝ defined by
\[
h(x,y) = \begin{cases} 1, & \text{if } (x,y) \ne (0,0),\\ 0, & \text{if } (x,y) = (0,0). \end{cases}
\]
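The chattering bound |x_t| + |y_t| ≤ ε + ε² can be checked numerically. A minimal Euler sketch of the system above with t_0 = 0 (the step size and ε = 0.05 are illustrative choices):

```python
def chattering_control(t, eps):
    """u^eps from Example 13: -1 on [0, eps), then the sign alternates on
    consecutive blocks of length 2*eps (+1 on (eps, 3*eps], -1 on (3*eps, 5*eps], ...)."""
    if t < eps:
        return -1.0
    k = int((t - eps) // (2.0 * eps))   # index of the current 2*eps block
    return 1.0 if k % 2 == 0 else -1.0

def simulate(eps, T=1.0, dt=1e-4):
    """Euler scheme for x' = u, y' = x^2 ∧ 1, started at (0, 0);
    returns the largest value of |x_t| + |y_t| observed on [0, T]."""
    x, y, t = 0.0, 0.0, 0.0
    max_bound = 0.0
    for _ in range(int(round(T / dt))):
        u = chattering_control(t, eps)
        x += u * dt
        y += min(x * x, 1.0) * dt
        t += dt
        max_bound = max(max_bound, abs(x) + abs(y))
    return max_bound

bound = simulate(0.05)   # stays below eps + eps^2 (up to discretization error)
```

The state oscillates between −ε and ε while y accumulates at most ε² per unit of time, which is why the occupational measures of these trajectories concentrate near (0,0) as ε → 0.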


Then, V_{K,0,h}(t_0,(0,0)) = +∞ > Λ_{K,0,h}(t_0,(0,0)) = 0. It seems natural to consider Λ_{K,0,h} if one intends to connect it to the associated HJB equation (see the first assertion in Remark 12). Moreover, even without state constraints, the natural choice for the smallest l.s.c. supersolution should be Λ. Indeed, one easily proves that Λ_{ℝ²,0,h} = h, while V_{ℝ²,0,h}(t_0,(0,0)) = 1 > h(0,0).

3. Dynamic programming under state constraints

One might aim at characterizing the value function V_{K,g,h} in connection with the associated Hamilton–Jacobi–Bellman equation. Whenever the cost is Lipschitz continuous and there are no state constraints (K = ℝ^N), the value function is known to be the unique (continuous) viscosity solution (in some class of test functions) of the associated HJB equation. If the costs are lower semicontinuous, the generalized viscosity solution (i.e. the smallest bounded l.s.c. viscosity supersolution of (12)) turns out to be the linear value function Λ_{ℝ^N,g,h} (see Theorem 4 and Remark 5). Thus, if one deals with discontinuous costs and still intends to connect the value function to the HJB system, it is judicious to turn one's attention to the "good" candidate, i.e. Λ_{K,g,h}. Bearing this in mind, we will investigate, as a first step, the existence of dynamic programming principles for Λ_{K,g,h}. Moreover, the DPP is a very useful tool for developing numerical algorithms approximating the value function and the optimal control.

3.1. A semigroup property

We fix x_0 ∈ ℝ^N. Let us consider t_1, t_2 ≥ 0 such that 0 < t_1 + t_2 ≤ T, where T > 0 is a fixed terminal time. By analogy with Θ(t,T,x) for x ∈ ℝ^N, we define sets of constraints starting from measures. If γ ∈ 𝒫([0,t_1] × ℝ^N × U) × 𝒫(ℝ^N), we define the sets

\[
\Theta(t_1, t_1+t_2, \gamma) = \bigg\{\eta = (\eta_1,\eta_2) \in \mathcal{P}\big([t_1,t_1+t_2]\times\mathbb{R}^N\times U\big)\times\mathcal{P}\big(\mathbb{R}^N\big) : \forall \varphi \in C_b^{1,2}\big(\mathbb{R}_+\times\mathbb{R}^N\big),
\]
\[
\int_{\mathbb{R}^N}\varphi(t_1,x)\,\gamma_2(dx) + \int_{[t_1,t_1+t_2]\times\mathbb{R}^N\times U} t_2\,L^u\varphi(s,y)\,\eta_1(ds\,dy\,du) - \int_{\mathbb{R}^N}\varphi(t_1+t_2,z)\,\eta_2(dz) = 0,
\]
\[
\int_{\mathbb{R}^N} |y|^2\,\eta_1\big([t_1,t_1+t_2],dy,U\big) \le e^{C_0(t_1+t_2)}\big(|x_0|^2 + C_0(t_1+t_2)\big),
\quad
\int_{\mathbb{R}^N} |z|^2\,\eta_2(dz) \le e^{C_0(t_1+t_2)}\big(|x_0|^2 + C_0(t_1+t_2)\big)\bigg\},
\]
and
\[
\Theta_K(t_1,t_1+t_2,\gamma) = \big\{\eta \in \Theta(t_1,t_1+t_2,\gamma) : \mathrm{Supp}(\eta_1) \subset [t_1,t_1+t_2]\times K\times U,\ \mathrm{Supp}(\eta_2) \subset K\big\}.
\]
The reader is invited to notice that x can be identified with γ^{t,t,x,u}, for all u ∈ 𝒰.

Proposition 14. We assume (2) and (14) to hold true. For every t_1, t_2 ≥ 0 such that t_1 + t_2 ≤ T and every γ = (γ_1,γ_2) ∈ Θ_K(0,t_1,x_0), the sets Θ(t_1,t_1+t_2,γ) and Θ_K(t_1,t_1+t_2,γ) are nonempty, convex and compact with respect to the usual (narrow) convergence of probability measures.

Proof. We only need to prove that Θ_K(t_1,t_1+t_2,γ) is nonempty. Convexity and closedness are obvious from the definition, while the second-order moment inequalities guarantee relative compactness (due to Prohorov's Theorem). The same assertions hold true for Θ(t_1,t_1+t_2,γ).

Step 1. We fix ε > 0. Then there exists a compact set K̃ ⊂ K such that γ_2(K̃^c) ≤ ε. Since K̃ is compact, there exists a (finite) family (x_n^ε)_{n=1,...,m_ε} ⊂ K̃ such that
\[
\widetilde{K} \subset \bigcup_{n=1}^{m_\varepsilon} B\big(x_n^\varepsilon, \varepsilon\big).
\]
We construct B_1 = B(x_1^ε, ε) ∩ K, B_{n+1} = B(x_{n+1}^ε, ε) ∩ K ∖ (∪_{1≤k≤n} B_k), for m_ε − 1 ≥ n ≥ 1, and B_{m_ε+1} = K ∖ (∪_{1≤k≤m_ε} B_k). It is clear that
\[
\gamma_2\big(B_{m_\varepsilon+1}\big) \le \gamma_2\big(\widetilde{K}^c\big) \le \varepsilon.
\tag{26}
\]

Due to (14) (see Proposition 8), for every k = 1, 2, . . . , mε , there exists a control uk such that



t1 + t2



dK

E t1

t1 ,x ε ,u k Xt k







∧ 1 dt + dK

t1 ,x ε ,u k Xt1 +tk2







∧ 1 ≤ ε.

68

D. Goreac et al. / Nonlinear Analysis 77 (2013) 59–73

We also fix a control $u^{m_\varepsilon+1}\in\mathcal U$. We construct the probability measures
$$\eta_1^\varepsilon(ds\,dy\,du)=\int_{\mathbb R^N}\sum_{k=1}^{m_\varepsilon+1}\mathbf 1_{B_k}(z)\,\gamma_1^{t_1,t_1+t_2,z,u^k}(ds\,dy\,du)\,\gamma_2(dz),\qquad
\eta_2^\varepsilon(dy)=\int_{\mathbb R^N}\sum_{k=1}^{m_\varepsilon+1}\mathbf 1_{B_k}(z)\,\gamma_2^{t_1,t_1+t_2,z,u^k}(dy)\,\gamma_2(dz).$$
We claim that $\big(\eta_1^\varepsilon,\eta_2^\varepsilon\big)\in\Theta(t_1,t_1+t_2,\gamma)$. Indeed, for every $k$, if $\phi\in C_b^{1,2}(\mathbb R_+\times\mathbb R^N)$, then, for every $z\in\mathbb R^N$,
$$\phi(t_1,z)+\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}t_2\,L^u\phi(s,y)\,\gamma_1^{t_1,t_1+t_2,z,u^k}(ds\,dy\,du)-\int_{\mathbb R^N}\phi(t_1+t_2,y)\,\gamma_2^{t_1,t_1+t_2,z,u^k}(dy)=0.$$
Then, recalling that $\gamma_2(K)=1$, one gets
$$\int_{\mathbb R^N}\phi(t_1,z)\,\gamma_2(dz)+\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}t_2\,L^u\phi(s,y)\,\eta_1^\varepsilon(ds\,dy\,du)-\int_{\mathbb R^N}\phi(t_1+t_2,y)\,\eta_2^\varepsilon(dy)=0.$$
Also, due to (11), we have
$$\int_{\mathbb R^N}|y|^2\,\eta_2^\varepsilon(dy)=\int_{\mathbb R^N}\sum_{k=1}^{m_\varepsilon+1}\mathbf 1_{B_k}(z)\int_{\mathbb R^N}|y|^2\,\gamma_2^{t_1,t_1+t_2,z,u^k}(dy)\,\gamma_2(dz)\le\int_Ke^{C_0t_2}\big(|z|^2+C_0t_2\big)\,\gamma_2(dz)$$
$$\le e^{C_0t_2}\Big(C_0t_2+e^{C_0t_1}\big(|x_0|^2+C_0t_1\big)\Big)\le e^{C_0(t_1+t_2)}\big(|x_0|^2+C_0(t_1+t_2)\big).$$
Similar estimates hold true for $\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}|y|^2\,\eta_1^\varepsilon(ds\,dy\,du)$ and the assertion follows.

Step 2. Standard estimates and assumption (2) yield the existence of some generic constant $C>0$ (that may change from one line to another) such that, for every $x,y\in\mathbb R^N$ and every $u\in\mathcal U$,
$$\sup_{s\in[t_1,T]}\mathbb E\big|X_s^{t_1,x,u}-X_s^{t_1,y,u}\big|\le C|x-y|.$$
For every $1\le k\le m_\varepsilon$ and every $z\in B_k$, the choice of $u^k$ implies
$$\begin{aligned}\int(d_K(y)\wedge1)\,\gamma_1^{t_1,t_1+t_2,z,u^k}(ds\,dy\,du)
&\le\left|\int(d_K(y)\wedge1)\,\gamma_1^{t_1,t_1+t_2,z,u^k}(ds\,dy\,du)-\int(d_K(y)\wedge1)\,\gamma_1^{t_1,t_1+t_2,x_k^\varepsilon,u^k}(ds\,dy\,du)\right|\\
&\quad+\int(d_K(y)\wedge1)\,\gamma_1^{t_1,t_1+t_2,x_k^\varepsilon,u^k}(ds\,dy\,du)
\le C\big|z-x_k^\varepsilon\big|+\varepsilon\le(C+1)\varepsilon,\end{aligned}$$
all integrals being taken over $[t_1,t_1+t_2]\times\mathbb R^N\times U$. Recalling that (26) holds true, it follows that
$$\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}(d_K(y)\wedge1)\,\eta_1^\varepsilon(ds\,dy\,du)\le(C+1)\varepsilon+\gamma_2\big(B_{m_\varepsilon+1}\big)\le(C+2)\varepsilon.$$
Similar estimates hold true for $\int_{\mathbb R^N}(d_K(y)\wedge1)\,\eta_2^\varepsilon(dy)$. Since $\big(\eta_1^\varepsilon,\eta_2^\varepsilon\big)_\varepsilon\subset\Theta(t_1,t_1+t_2,\gamma)$, one finds a subsequence converging to some $\eta=(\eta_1,\eta_2)\in\Theta(t_1,t_1+t_2,\gamma)$. Moreover,
$$\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}(d_K(y)\wedge1)\,\eta_1(ds\,dy\,du)+\int_{\mathbb R^N}(d_K(y)\wedge1)\,\eta_2(dy)\le\liminf_{\varepsilon\to0}2(C+2)\varepsilon=0.$$
This implies that $\eta\in\Theta_K(t_1,t_1+t_2,\gamma)$ and the proof of our Proposition is complete. $\square$

Definition 15. Whenever $\gamma\in\mathcal P([0,t_1]\times\mathbb R^N\times U)\times\mathcal P(\mathbb R^N)$ and $\eta\in\mathcal P([t_1,t_1+t_2]\times\mathbb R^N\times U)\times\mathcal P(\mathbb R^N)$, we define $\eta\circ\gamma\in\mathcal P([0,t_1+t_2]\times\mathbb R^N\times U)\times\mathcal P(\mathbb R^N)$ by setting
$$(\eta\circ\gamma)_1(A,dy\,du)=\frac{t_1}{t_1+t_2}\,\gamma_1\big(A\cap[0,t_1],dy\,du\big)+\frac{t_2}{t_1+t_2}\,\eta_1\big(A\cap[t_1,t_1+t_2],dy\,du\big),\qquad(\eta\circ\gamma)_2=\eta_2,$$
for all Borel sets $A\subset[0,t_1+t_2]$.
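For atomic (finitely supported) occupational measures, the time-weighted mixture of Definition 15 can be sketched directly. The dictionaries below are illustrative data invented for the example, not objects from the paper:

```python
# Sketch of the concatenation (eta ∘ gamma)_1 from Definition 15 for atomic
# measures, represented as dicts mapping atoms (s, y, u) -> weight.  The time
# components are mixed with weights t1/(t1+t2) and t2/(t1+t2); the second
# component (eta ∘ gamma)_2 is simply eta_2 and needs no computation.
def concatenate(gamma1, eta1, t1, t2):
    out = {}
    for atom, w in gamma1.items():      # atoms with time component in [0, t1]
        out[atom] = out.get(atom, 0.0) + w * t1 / (t1 + t2)
    for atom, w in eta1.items():        # atoms with time component in [t1, t1+t2]
        out[atom] = out.get(atom, 0.0) + w * t2 / (t1 + t2)
    return out

# Illustrative atomic measures, each of total mass 1 (invented data).
gamma1 = {(0.25, 1.0, "a"): 0.5, (0.75, 2.0, "b"): 0.5}   # supported in [0, 1]
eta1 = {(1.5, 2.5, "a"): 1.0}                             # supported in [1, 2]
combined = concatenate(gamma1, eta1, t1=1.0, t2=1.0)

total_mass = sum(combined.values())
mass_on_first_leg = sum(w for (s, _, _), w in combined.items() if s <= 1.0)
```

The weights $t_1/(t_1+t_2)$ and $t_2/(t_1+t_2)$ are exactly those making $(\eta\circ\gamma)_1$ a probability measure whose time marginal charges the two legs proportionally to their lengths.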






We introduce the following notation:

$$\Theta_K(t_1,t_1+t_2,\cdot)\circ\Theta_K(0,t_1,x_0)=\big\{\eta\circ\gamma:\gamma\in\Theta_K(0,t_1,x_0),\ \eta\in\Theta_K(t_1,t_1+t_2,\gamma)\big\}.$$

Remark 16. Whenever $\gamma\in\Theta_K(0,t_1,x_0)$ and $\eta\in\Theta_K(t_1,t_1+t_2,\gamma)$, it is clear that $\eta\circ\gamma\in\Theta_K(0,t_1+t_2,x_0)$. Indeed, if $\phi\in C_b^{1,2}(\mathbb R_+\times\mathbb R^N)$, one gets
$$\phi(0,x_0)+\int_{[0,t_1]\times\mathbb R^N\times U}t_1\,L^u\phi(s,y)\,\gamma_1(ds\,dy\,du)-\int_{\mathbb R^N}\phi(t_1,z)\,\gamma_2(dz)=0,$$
and
$$\int_{\mathbb R^N}\phi(t_1,z)\,\gamma_2(dz)+\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}t_2\,L^u\phi(s,y)\,\eta_1(ds\,dy\,du)-\int_{\mathbb R^N}\phi(t_1+t_2,z)\,\eta_2(dz)=0.$$
The conclusion follows by summing the two equalities. Moreover, the definition of $\eta\circ\gamma$ implies the support conditions. In fact, we can prove that the converse also holds true.

Proposition 17. We have the following semigroup property:
$$\Theta_K(t_1,t_1+t_2,\cdot)\circ\Theta_K(0,t_1,x_0)=\Theta_K(0,t_1+t_2,x_0),$$
for all $x_0\in K$ and all $t_1\ge0$, $t_2>0$ such that $t_1+t_2\le T$.

Proof. We only need to prove that
$$\Theta_K(t_1,t_1+t_2,\cdot)\circ\Theta_K(0,t_1,x_0)\supset\Theta_K(0,t_1+t_2,x_0).$$
Let us suppose that $\gamma\in\Theta_K(0,t_1+t_2,x_0)$. Due to Corollary 2, there exists a family of convex combinations
$$\left(\sum_{i=1}^{k_n}\alpha_n^i\,\gamma^{0,t_1+t_2,x_0,u_n^i}\right)_{n\ge1}$$
converging to $\gamma$. For every $n\ge1$, we introduce
$$\gamma_1^n(A,dy\,du)=\sum_{i=1}^{k_n}\alpha_n^i\,\gamma_1^{0,t_1,x_0,u_n^i}(A,dy\,du),\qquad\gamma_2^n=\sum_{i=1}^{k_n}\alpha_n^i\,\gamma_2^{0,t_1,x_0,u_n^i},$$
$$\eta_1^n(A',dy\,du)=\frac{t_1+t_2}{t_2}\sum_{i=1}^{k_n}\alpha_n^i\,\gamma_1^{0,t_1+t_2,x_0,u_n^i}(A',dy\,du),\qquad\eta_2^n=\sum_{i=1}^{k_n}\alpha_n^i\,\gamma_2^{0,t_1+t_2,x_0,u_n^i},$$
for all Borel sets $A\subset[0,t_1]$ and $A'\subset[t_1,t_1+t_2]$. It is clear that
$$\gamma^n=(\gamma_1^n,\gamma_2^n)\in\Theta(0,t_1,x_0),\qquad\eta^n=(\eta_1^n,\eta_2^n)\in\Theta(t_1,t_1+t_2,\gamma^n),\qquad\eta^n\circ\gamma^n=\sum_{i=1}^{k_n}\alpha_n^i\,\gamma^{0,t_1+t_2,x_0,u_n^i},$$
for all $n\ge1$. Then, there exists a subsequence (still denoted $(\gamma^n,\eta^n)$) converging to some $(\overline\gamma,\eta)\in\Theta(0,t_1,x_0)\times\big(\mathcal P([t_1,t_1+t_2]\times\mathbb R^N\times U)\times\mathcal P(\mathbb R^N)\big)$. One easily proves that $\eta\in\Theta(t_1,t_1+t_2,\overline\gamma)$. Indeed, whenever $\phi\in C_b^{1,2}(\mathbb R_+\times\mathbb R^N)$,
$$\begin{aligned}&\int_{\mathbb R^N}\phi(t_1,z)\,\gamma_2^n(dz)+\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}t_2\,L^u\phi(s,y)\,\eta_1^n(ds\,dy\,du)-\int_{\mathbb R^N}\phi(t_1+t_2,z)\,\eta_2^n(dz)\\
&\qquad=\sum_{i=1}^{k_n}\alpha_n^i\,\mathbb E\left[\phi\big(t_1,X_{t_1}^{0,x_0,u_n^i}\big)+\int_{t_1}^{t_1+t_2}L^{u_n^i(s)}\phi\big(s,X_s^{0,x_0,u_n^i}\big)\,ds-\phi\big(t_1+t_2,X_{t_1+t_2}^{0,x_0,u_n^i}\big)\right]=0,\end{aligned}$$
and the conclusion follows by letting $n\to\infty$. Since $\eta\circ\overline\gamma=\gamma\in\Theta_K(0,t_1+t_2,x_0)$, one gets that $\operatorname{Supp}(\overline\gamma_1)\subset[0,t_1]\times K\times U$ and $\eta\in\Theta_K(t_1,t_1+t_2,\overline\gamma)$. To complete the proof, we still have to prove that
$$\int_{\mathbb R^N}(d_K(z)\wedge1)\,\overline\gamma_2(dz)=0.$$
Standard estimates and assumption (2) yield the existence of some generic constant $C>0$ (that may change from one line to another) such that, for every $\xi\in L^2(\Omega,\mathcal F_{t_1},\mathbb P;\mathbb R^N)$ and every $u\in\mathcal U$,
$$\mathbb E\big|X_s^{t_1,\xi,u}-\xi\big|^2\le C^2(s-t_1),\quad\text{for all }s\in[t_1,T].\tag{27}$$
Let us assume that, on the contrary,
$$\int_{\mathbb R^N}(d_K(z)\wedge1)\,\overline\gamma_2(dz)>2\delta,$$
for some $\delta\in\big(0,C\sqrt{t_2}\,\big)$. Using (27), for every $n\ge1$, one has
$$\begin{aligned}t_2\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}(d_K(y)\wedge1)\,\eta_1^n(ds\,dy\,du)&\ge t_2\int_{[t_1,t_1+\delta^2/C^2]\times\mathbb R^N\times U}(d_K(y)\wedge1)\,\eta_1^n(ds\,dy\,du)\\
&=\sum_{i=1}^{k_n}\alpha_n^i\,\mathbb E\left[\int_{t_1}^{t_1+\delta^2/C^2}\Big(d_K\big(X_s^{0,x_0,u_n^i}\big)\wedge1\Big)\,ds\right]\\
&\ge\sum_{i=1}^{k_n}\alpha_n^i\int_{t_1}^{t_1+\delta^2/C^2}\Big(\mathbb E\big[d_K\big(X_{t_1}^{0,x_0,u_n^i}\big)\wedge1\big]-\mathbb E\big|X_s^{0,x_0,u_n^i}-X_{t_1}^{0,x_0,u_n^i}\big|\Big)\,ds\\
&\ge\frac{\delta^2}{C^2}\int_{\mathbb R^N}(d_K(z)\wedge1)\,\gamma_2^n(dz)-\frac{\delta^3}{C^2}.\end{aligned}$$
Passing to the limit as $n\to\infty$, one gets
$$t_2\int_{[t_1,t_1+t_2]\times\mathbb R^N\times U}(d_K(y)\wedge1)\,\eta_1(ds\,dy\,du)\ge\frac{\delta^3}{C^2}>0,$$
which contradicts $\eta\in\Theta_K(t_1,t_1+t_2,\overline\gamma)$. The proof of our Proposition is now complete. $\square$
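The constraint defining $\Theta$, used repeatedly in the proofs above, is a Dynkin-type formula tested against occupational measures. In the deterministic, diffusion-free case it reduces to the fundamental theorem of calculus along the trajectory and can be checked by quadrature. A minimal sketch, for the hypothetical dynamics $b(x,u)=u$ and test function $\phi(t,x)=x^2$ (these choices are illustrative, not the paper's general data):

```python
# Deterministic special case: dX = u dt (drift b(x, u) = u, no diffusion) and
# phi(t, x) = x**2, so the generator acts as L^u phi(s, x) = 2 * x * u.
t1, t2, x0, u = 0.3, 0.7, 1.5, -0.8

def X(s):                 # closed-form trajectory started from x0 at time t1
    return x0 + u * (s - t1)

def phi(t, x):
    return x * x

def Lu_phi(s):            # generator applied to phi along the trajectory
    return 2.0 * X(s) * u

def simpson(f, a, b, n=200):   # composite Simpson rule; exact for quadratics
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# For the occupational measure gamma_1 of this trajectory (time-averaged Dirac
# masses), t2 * ∫ L^u phi d(gamma_1) equals the plain time integral below, so
# the defining constraint of Theta becomes:
residual = phi(t1, x0) + simpson(Lu_phi, t1, t1 + t2) - phi(t1 + t2, X(t1 + t2))
```

The residual vanishes up to floating-point error, which is the deterministic, one-trajectory instance of the equality constraint in the definition of $\Theta(t_1,t_1+t_2,\gamma)$.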



3.2. Dynamic programming under state constraints

Let us now return to the primal value function previously introduced (see equality (18)):
$$\Lambda_{K,g,h}(t_0,x_0)=\inf_{(\gamma_1,\gamma_2)\in\Theta_K(t_0,T,x_0)}\left[(T-t_0)\int_{[t_0,T]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}h(z)\,\gamma_2(dz)\right],$$
where $g$ and $h$ are bounded measurable functions. We extend this definition to initial measures $\gamma\in\Theta_K(t_0,t,x_0)$ by setting
$$\Lambda_{K,g,h}(t,\gamma)=\inf_{(\eta_1,\eta_2)\in\Theta_K(t,T,\gamma)}\left[(T-t)\int_{[t,T]\times\mathbb R^N\times U}g(s,y)\,\eta_1(ds\,dy\,du)+\int_{\mathbb R^N}h(z)\,\eta_2(dz)\right],$$
whenever $x_0\in K$, $0\le t_0<t<T$ and $\gamma\in\Theta_K(t_0,t,x_0)$. We can now state and prove the following dynamic programming principles.

Theorem 18. 1. Let us assume that $g$ and $h$ are bounded measurable functions. Then, for every $x_0\in K$ and every $0\le t_0<t<T$,
$$\Lambda_{K,g,h}(t_0,x_0)=\inf_{\gamma=(\gamma_1,\gamma_2)\in\Theta_K(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\Lambda_{K,g,h}(t,\gamma)\right].$$
2. If the functions $g$ and $h$ are bounded and l.s.c., one also has
$$\Lambda_{K,g,h}(t_0,x_0)=\inf_{\gamma\in\Theta_K(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}\Lambda_{K,g,h}(t,x)\,\gamma_2(dx)\right],$$
for all $x_0\in K$ and all $t\in(t_0,T)$.

Proof. 1. Using Proposition 17 and Definition 15, one has
$$\begin{aligned}\Lambda_{K,g,h}(t_0,x_0)&=\inf_{\substack{\gamma\in\Theta_K(t_0,t,x_0)\\\eta\in\Theta_K(t,T,\gamma)}}\left[(T-t_0)\int_{[t_0,T]\times\mathbb R^N\times U}g(s,y)\,(\eta\circ\gamma)_1(ds\,dy\,du)+\int_{\mathbb R^N}h(z)\,\eta_2(dz)\right]\\
&=\inf_{\gamma\in\Theta_K(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\Lambda_{K,g,h}(t,\gamma)\right].\end{aligned}$$


2. Let us assume that $g$ and $h$ are bounded and l.s.c. Then, there exists some generic constant $C>0$ (that may change from one line to another) such that, for every $(t,z)\in[0,T]\times K$, every $u\in\mathcal U$ and every $t\le s\le T$,
$$\mathbb E\big|X_s^{t,z,u}-z\big|^2\le C^2|s-t|.$$
Then, whenever $d_K(z)>0$,
$$\mathbb E\big[d_K\big(X_s^{t,z,u}\big)\wedge1\big]\ge\frac{d_K(z)\wedge1}{2},\quad\text{for all }t\le s\le T\wedge\left(t+\left(\frac{d_K(z)\wedge1}{2C}\right)^2\right)\text{ and all }u\in\mathcal U.$$
Therefore,
$$\begin{aligned}V_{K,g,h}^n(t,z)&=\inf_{u\in\mathcal U}\mathbb E\left[\int_t^T\Big(g_n\big(s,X_s^{t,z,u}\big)+n\big(d_K\big(X_s^{t,z,u}\big)\wedge1\big)\Big)\,ds+h_n\big(X_T^{t,z,u}\big)+n\big(d_K\big(X_T^{t,z,u}\big)\wedge1\big)\right]\\
&\ge n\,\frac{d_K(z)\wedge1}{2}\left((T-t)\wedge\left(\frac{d_K(z)\wedge1}{2C}\right)^2\right)-C(T+1),\end{aligned}\tag{28}$$
for all $n\ge1$.

Step 1. We begin by proving that
$$\Lambda_{K,g,h}(t_0,x_0)\ge\inf_{\gamma\in\Theta_K(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}\Lambda_{K,g,h}(t,x)\,\gamma_2(dx)\right].\tag{29}$$
Theorem 11 yields that $\Lambda_{K,g,h}=\sup_{n_0}V_{K,g,h}^{n_0}$. For every $n\ge1$,
$$\begin{aligned}(T+1)C\ge\Lambda_{K,g,h}(t_0,x_0)&\ge V_{K,g,h}^n(t_0,x_0)\\
&\ge\inf_{u\in\mathcal U}\mathbb E\left[\int_{t_0}^t\Big(g_n\big(s,X_s^{t_0,x_0,u}\big)+n\big(d_K\big(X_s^{t_0,x_0,u}\big)\wedge1\big)\Big)\,ds+V_{K,g,h}^n\big(t,X_t^{t_0,x_0,u}\big)\right]\\
&=\inf_{u\in\mathcal U}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}\big(g_n(s,y)+n(d_K(y)\wedge1)\big)\,\gamma_1^{t_0,t,x_0,u}(ds\,dy\,du)+\int_{\mathbb R^N}V_{K,g,h}^n(t,z)\,\gamma_2^{t_0,t,x_0,u}(dz)\right]\\
&\ge\inf_{\gamma\in\Theta(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}\big(g_n(s,y)+n(d_K(y)\wedge1)\big)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}V_{K,g,h}^n(t,z)\,\gamma_2(dz)\right]\\
&=(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}\big(g_n(s,y)+n(d_K(y)\wedge1)\big)\,\gamma_1^n(ds\,dy\,du)+\int_{\mathbb R^N}V_{K,g,h}^n(t,z)\,\gamma_2^n(dz).\end{aligned}\tag{30}$$
The existence of such $\gamma^n\in\Theta(t_0,t,x_0)$ (attaining the last infimum) is guaranteed by the compactness of $\Theta(t_0,t,x_0)$. Again by compactness arguments, there exists a subsequence (still denoted by $(\gamma^n)_n$) converging to some $\gamma\in\Theta(t_0,t,x_0)$. Using (28) in (30), one gets
$$(T+1)C\ge(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}n(d_K(y)\wedge1)\,\gamma_1^n(ds\,dy\,du)+n\int_{\mathbb R^N}\frac{d_K(z)\wedge1}{2}\left((T-t)\wedge\left(\frac{d_K(z)\wedge1}{2C}\right)^2\right)\gamma_2^n(dz)-C(2T+1).$$
Letting $n\to\infty$, both integrals must vanish in the limit and, by lower semicontinuity of $d_K\wedge1$, the limit measure is supported by $K$; therefore, $\gamma\in\Theta_K(t_0,t,x_0)$. The inequality (30) and the fact that $(g_n)_n$ and $\big(V_{K,g,h}^n\big)_n$ are nondecreasing sequences imply that
$$\Lambda_{K,g,h}(t_0,x_0)\ge(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g_{n_0}(s,y)\,\gamma_1^n(ds\,dy\,du)+\int_{\mathbb R^N}V_{K,g,h}^{n_0}(t,z)\,\gamma_2^n(dz),$$
for every $n\ge n_0$. We pass to the limit as $n\to\infty$ (the functions $g_{n_0}$ and $V_{K,g,h}^{n_0}$ being continuous and bounded), then as $n_0\to\infty$. We recall that $\Lambda_{K,g,h}=\sup_nV_{K,g,h}^n$ and $\gamma\in\Theta_K(t_0,t,x_0)$ to get (29).

Step 2. To prove the converse inequality, we recall that $V_{K,g,h}^n$ is uniformly continuous and that $g=\sup_{n\ge1}g_n$, $\Lambda_{K,g,h}=\sup_{n\ge1}V_{K,g,h}^n$. For $\gamma\in\Theta_K(t_0,t,x_0)$ the penalization term $n(d_K\wedge1)$ vanishes on the support of $\gamma$, so Corollary 2 and the dynamic programming principle for the unconstrained value function $V_{K,g,h}^n$ yield
$$\begin{aligned}&\inf_{\gamma\in\Theta_K(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}\Lambda_{K,g,h}(t,z)\,\gamma_2(dz)\right]\\
&\qquad\ge\inf_{\gamma\in\Theta(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}\big(g_n(s,y)+n(d_K(y)\wedge1)\big)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}V_{K,g,h}^n(t,z)\,\gamma_2(dz)\right]\\
&\qquad=\inf_{u\in\mathcal U}\mathbb E\left[\int_{t_0}^t\Big(g_n\big(s,X_s^{t_0,x_0,u}\big)+n\big(d_K\big(X_s^{t_0,x_0,u}\big)\wedge1\big)\Big)\,ds+V_{K,g,h}^n\big(t,X_t^{t_0,x_0,u}\big)\right]=V_{K,g,h}^n(t_0,x_0).\end{aligned}$$
Taking the supremum over $n\ge1$, one has
$$\inf_{\gamma\in\Theta_K(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}\Lambda_{K,g,h}(t,z)\,\gamma_2(dz)\right]\ge\Lambda_{K,g,h}(t_0,x_0).$$
The proof of our Theorem is now complete. $\square$
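The principle of Theorem 18, namely that the value over the whole horizon equals one leg of running cost plus the value of the tail problem, can be checked by brute force in a finite toy model (finite states, finite actions, discrete time, state constraint enforced by an infinite penalty). All states, costs and dynamics below are invented for illustration and are not the paper's setting:

```python
# Toy controlled deterministic chain on states {0, 1, 2} with actions
# {-1, 0, +1}; the state constraint K = {0, 1} is enforced by an infinite
# penalty outside K (a discrete analogue of the n * (d_K ∧ 1) penalization).
STATES, ACTIONS, T = (0, 1, 2), (-1, 0, 1), 4
K = {0, 1}
INF = float("inf")

def step(x, a):                        # clipped deterministic dynamics
    return min(max(x + a, 0), 2)

def g(x):                              # running cost (illustrative)
    return x * x if x in K else INF

def h(x):                              # terminal cost (illustrative)
    return abs(x - 1) if x in K else INF

def value(t0, x0):                     # backward dynamic programming on [t0, T]
    V = {x: h(x) for x in STATES}
    for _ in range(T - t0):
        V = {x: (INF if x not in K else
                 min(g(x) + V[step(x, a)] for a in ACTIONS)) for x in STATES}
    return V[x0]

# One-step DPP, the discrete counterpart of the first assertion of Theorem 18:
# value(t0, x0) == min_a [ g(x0) + value(t0 + 1, step(x0, a)) ].
lhs = value(0, 1)
rhs = min(g(1) + value(1, step(1, a)) for a in ACTIONS)
```

Here `lhs` and `rhs` coincide, with the tail value `value(1, .)` playing the role of $\Lambda_{K,g,h}(t,\cdot)$ in the second assertion of the theorem.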



Proposition 19. Let us suppose that $K=\mathbb R^N$. If the functions $g$ and $h$ are bounded and u.s.c., one also has
$$\Lambda_{\mathbb R^N,g,h}(t_0,x_0)\ge\inf_{\gamma\in\Theta(t_0,t,x_0)}\left[(t-t_0)\int_{[t_0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}\Lambda_{\mathbb R^N,g,h}(t,x)\,\gamma_2(dx)\right],$$
for all $x_0\in\mathbb R^N$ and all $t\in(t_0,T)$.

Proof. We take $t_0=0$ for simplicity. Due to [19, Proposition 11],
$$\Lambda_{\mathbb R^N,g,h}(s,x)=V_{\mathbb R^N,g,h}^w(s,x),\tag{31}$$
for all $(s,x)\in[0,T]\times\mathbb R^N$. Here, $V_{\mathbb R^N,g,h}^w$ is the weak value function, defined over the family of weak control processes $\mathcal U^w=\big\{\pi:\pi=\big(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P,W,u\big)\big\}$. It is known from the classical optimality principles (see, for example, [6, Chapter 4, Lemma 3.2 and Theorem 3.3]) that
$$V_{\mathbb R^N,g,h}^w(0,x_0)\ge\inf_{\pi\in\mathcal U^w}\mathbb E^\pi\left[\int_0^tg\big(s,X_s^{0,x_0,u}\big)\,ds+V_{\mathbb R^N,g,h}^w\big(t,X_t^{0,x_0,u}\big)\right].\tag{32}$$
To every weak control $\pi$ we can associate, as before, an occupational couple
$$\gamma_1^{0,t,x_0,\pi}(A\times B\times C)=\frac1t\,\mathbb E^\pi\left[\int_0^t\mathbf 1_{A\times B\times C}\big(s,X_s^{0,x_0,u},u_s\big)\,ds\right],\qquad\gamma_2^{0,t,x_0,\pi}(D)=\mathbb P^\pi\big(X_t^{0,x_0,u}\in D\big),$$
for all Borel subsets $A\times B\times C\times D\subset[0,t]\times\mathbb R^N\times U\times\mathbb R^N$. As in the strong case, $\gamma^{0,t,x_0,\pi}\in\Theta(0,t,x_0)$. With this notation, the inequality (32) gives
$$V_{\mathbb R^N,g,h}^w(0,x_0)\ge\inf_{\gamma\in\Theta(0,t,x_0)}\left[t\int_{[0,t]\times\mathbb R^N\times U}g(s,y)\,\gamma_1(ds\,dy\,du)+\int_{\mathbb R^N}V_{\mathbb R^N,g,h}^w(t,y)\,\gamma_2(dy)\right],$$
and we conclude using (31). $\square$
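The occupational couple associated to a control in the proof above can be approximated empirically: simulate paths of the controlled diffusion, time-average Dirac masses along each path for $\gamma_1$, and take the empirical law at time $t$ for $\gamma_2$. A Monte Carlo sketch for the hypothetical dynamics $dX = u\,dt + dW$ with a constant control $u$ (a special case chosen for illustration, not the paper's general coefficients):

```python
import random

# Empirical occupational couple (gamma_1, gamma_2) for dX = u dt + dW with a
# constant control u (illustrative special case).
random.seed(0)
t, n_paths, n_steps, x0, u = 1.0, 400, 50, 0.0, 0.5
dt = t / n_steps

gamma1_atoms, gamma2_atoms = [], []
for _ in range(n_paths):
    x = x0
    for k in range(n_steps):
        gamma1_atoms.append((k * dt, x, u))            # atom (s, X_s, u_s)
        x += u * dt + random.gauss(0.0, dt ** 0.5)     # Euler-Maruyama step
    gamma2_atoms.append(x)                             # sample of the law of X_t

# Each atom of gamma_1 carries mass 1 / (n_paths * n_steps): the 1/t factor in
# the definition is absorbed by averaging over the uniform time grid.
mass_gamma1 = len(gamma1_atoms) / (n_paths * n_steps)
mass_gamma2 = len(gamma2_atoms) / n_paths
mean_at_t = sum(gamma2_atoms) / n_paths    # approximates E X_t = x0 + u * t
```

Both components are probability measures by construction, which is the normalization required for membership in $\Theta(0,t,x_0)$.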



Remark 20. Inthe deterministic framework, if γ is an occupational couple associated to x0 and u ∈ U, then Θ (t , T , γ ) =  0,x ,u

Θ t , T , Xt 0 and, by definition,    0,x ,u ΛRN ,g ,h (t , x) γ2 (dx) = ΛRN ,g ,h t , Xt 0 = ΛRN ,g ,h (t , γ ) . RN

Thus, in general,

ΛRN ,g ,h (0, x0 ) ≤

  inf

γ ∈Γ (0,t ,x0 )

t [0,t]×RN ×U

g (s, y) γ1 (dsdydu) +

 RN

 ΛRN ,g ,h (t , x) γ2 (dx) .

In this inequality, Γ (0, t , x0 ) can be replaced by Θ (0, t , x0 ) = coΓ (0, t , x0 ) using the u.s.c. of both g and Λg ,h . It follows from the previous proposition that we have equality for u.s.c. costs and deterministic dynamics.


References

[1] H.M. Soner, Optimal control with state-space constraint. I, SIAM J. Control Optim. 24 (6) (1986) 552–561.
[2] H. Frankowska, S. Plaskacz, Semicontinuous solutions of Hamilton–Jacobi–Bellman equations with state constraints, in: Differential Inclusions and Optimal Control, in: Lecture Notes in Nonlinear Anal., vol. 2, 1998, pp. 145–161.
[3] H. Frankowska, R. Vinter, Existence of neighbouring trajectories: applications to dynamic programming for state constraints optimal control problems, J. Optim. Theory Appl. 104 (1) (2000) 20–40.
[4] S. Plaskacz, M. Quincampoix, Discontinuous Mayer control problem under state-constraints, Topol. Methods Nonlinear Anal. 15 (2000) 91–100.
[5] W.H. Fleming, H.M. Soner, Controlled Markov Processes and Viscosity Solutions, in: Applications of Mathematics (New York), vol. 25, Springer-Verlag, New York, 2006.
[6] J. Yong, X.Y. Zhou, Stochastic Controls. Hamiltonian Systems and HJB Equations, Springer-Verlag, New York, 1999.
[7] B. Bouchard, N. Touzi, Weak dynamic programming principle for viscosity solutions, SIAM J. Control Optim. 49 (3) (2011) 948–962.
[8] B. Bouchard, M. Nutz, Weak dynamic programming for generalized state constraints, preprint, 2011.
[9] M.H.A. Davis, Markov Models and Optimization, in: Monographs on Statistics and Applied Probability, vol. 49, Chapman & Hall, London, 1993.
[10] D. Goreac, O.-S. Serea, Linearization techniques for controlled piecewise deterministic Markov processes; application to Zubov's method, Appl. Math. Optim. 66 (2) (2012) 209–238.
[11] A.G. Bhatt, V.S. Borkar, Occupation measures for controlled Markov processes: characterization and optimality, Ann. Probab. 24 (1996) 1531–1562.
[12] V. Borkar, V. Gaitsgory, Averaging of singularly perturbed controlled stochastic differential equations, Appl. Math. Optim. 56 (2) (2007) 169–209.
[13] T.G. Kurtz, R.H. Stockbridge, Existence of Markov controls and characterization of optimal Markov controls, SIAM J. Control Optim. 36 (2) (1998) 609–653.
[14] J.B. Lasserre, D. Henrion, C. Prieur, E. Trélat, Nonlinear optimal control via occupation measures and LMI-relaxations, SIAM J. Control Optim. 47 (4) (2008) 1643–1666.
[15] R.H. Stockbridge, Time-average control of a martingale problem. Existence of a stationary solution, Ann. Probab. 18 (1990) 190–205.
[16] G.G. Yin, Q. Zhang, Continuous-Time Markov Chains and Applications. A Singular Perturbation Approach, Springer-Verlag, New York, 1997.
[17] V. Gaitsgory, M. Quincampoix, Linear programming approach to deterministic infinite horizon optimal control problems with discounting, SIAM J. Control Optim. 48 (4) (2009) 2480–2512.
[18] R. Buckdahn, D. Goreac, M. Quincampoix, Stochastic optimal control and linear programming approach, Appl. Math. Optim. 63 (2) (2011) 257–276.
[19] D. Goreac, O.-S. Serea, Mayer and optimal stopping stochastic control problems with discontinuous cost, J. Math. Anal. Appl. 380 (1) (2011) 327–342.
[20] D. Goreac, O.-S. Serea, A note on linearization methods and dynamic programming principles for stochastic discontinuous control problems, Electron. Commun. Probab. 17 (12) (2012) 1–12.
[21] N. El Karoui, H. Nguyen, M. Jeanblanc-Picqué, Compactification methods in the control of degenerate diffusions: existence of an optimal control, Stochastics 20 (3) (1987) 169–219.
[22] E.N. Barron, R. Jensen, Semicontinuous viscosity solutions for Hamilton–Jacobi equations with convex Hamiltonians, Commun. Partial Differ. Equ. 15 (12) (1990) 1713–1742.
[23] H. Frankowska, Lower semicontinuous solutions of Hamilton–Jacobi–Bellman equations, SIAM J. Control Optim. 31 (1) (1993) 257–272.
[24] J.P. Aubin, A. Cellina, Differential Inclusions. Set-Valued Maps and Viability Theory, Springer-Verlag, Berlin, 1984.
[25] R. Buckdahn, S. Peng, M. Quincampoix, C. Rainer, Existence of stochastic control under state constraints, C. R. Acad. Sci. Paris, Sér. I Math. 327 (1998) 17–22.
[26] M. Bardi, R. Jensen, A geometric characterization of viable sets for controlled degenerate diffusions, Set-Valued Anal. 10 (2002) 129–141.
[27] J.P. Aubin, H. Frankowska, Set-Valued Analysis, Birkhäuser, Boston, 1990.