An optimal stopping time problem with time average cost in a bounded interval

Systems & Control Letters 8 (1986) 173-180
North-Holland

Min SUN
Department of Mathematics, Wayne State University, Detroit, MI 48202, USA

Received 5 December 1985
Revised 8 May 1986

Abstract: An optimal stopping time problem for a diffusion process stopped at the boundary of a bounded interval is studied. Our cost functional is of time average type, which is different from the standard long term average cost functional but can be considered as a version of the Gittins index. Both a characterization of the value function and the construction of an optimal stopping policy will be presented.

Keywords: Optimal stopping, Time average cost, Bounded interval, Quasi-Dirichlet problem.

1. Introduction

Optimal stochastic control problems with long term average costs have been fairly extensively studied. Let us mention some recent work given in Bensoussan and Lions [1], Gimber [4], Lasry [9] and [10], Menaldi and Robin [12], Robin [14] and [15], Stettner [16], Tarres [20]. Long term average cost problems seem to be suitable for adaptive situations (see e.g. Kumar and Lin [7], Kumar [8], and Stettner [17]). In papers [7], [8], [10], [12], [17], [20], controlled processes evolve in a whole Euclidean space $R^n$, while in [1, 4, 9], Lions and Perthame [11], Menaldi and Robin [13], processes evolving in a bounded region of $R^d$ and reflected at its boundary are considered. The former reduces to dynamic variational inequalities without any boundary conditions. The latter corresponds to variational inequalities with Neumann boundary conditions.

In this paper, we shall consider a diffusion process stopped at the boundary of a bounded interval in $R^1$. We are interested in an optimal stopping time problem w.r.t. the diffusion. The cost we want to consider will be of a time average type. We shall give some results on the characterization of the value function and the construction of an optimal stopping policy. Our new feature here is the introduction of a quasi-Dirichlet problem for the characterization of a new version of the Gittins index.

Before getting into the formulation of our problem, let us briefly recall the main results in [1]. Let $G$ be a bounded domain in $R^d$. Consider the reflected diffusion

$$dy = g(y)\,dt + \sigma(y)\,dw_t, \quad y(0) = x,$$

with $y$ reflected at the boundary $\partial G$ of $G$, and the optimal stopping time problem with the long term average cost

[display equation missing in the source]

This problem was studied with the help of the discounted problems

$$u_\alpha(x) = \inf_\tau E \int_0^\tau f(y_x(t))\, e^{-\alpha t}\,dt.$$

0167-6911/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)

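It was proven that, under some conditions, $u_\alpha(x)$ is the unique solution of

$$A u_\alpha + \alpha u_\alpha \le f, \quad u_\alpha \le 0, \quad (A u_\alpha + \alpha u_\alpha - f)\,u_\alpha = 0, \quad \left.\frac{\partial u_\alpha}{\partial \nu_A}\right|_{\partial G} = 0, \quad u_\alpha \in H^2(G),$$

where

$$\frac{\partial}{\partial \nu_A} = \sum_{i,j} a_{ij}\, n_j\, \frac{\partial}{\partial x_i}, \quad n_i = (n, e_i),$$

$e_i$ is the $i$-th unit vector in $R^d$, $n$ is the outward unit normal to $G$, and $A$ is given by the infinitesimal generator of the diffusion. Moreover, $\lambda = \lambda_x$ is independent of $x$ and, as $\alpha \to 0$,

$$\alpha u_\alpha \to \lambda, \quad u_\alpha - M(m u_\alpha) \to w,$$

where

$$M(h) = \frac{1}{\operatorname{meas}(G)} \int_G h(x)\,dx,$$

$$A^* m = 0, \quad \left.\frac{\partial m}{\partial \nu_{A^*}}\right|_{\partial G} = 0, \quad m \in C^2(\bar G), \quad m > 0, \quad M(m) = 1,$$

$$A w + \lambda = f, \quad \left.\frac{\partial w}{\partial \nu_A}\right|_{\partial G} = 0, \quad M(m w) = 0.$$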

For the diffusion process stopped at the boundary of $G$, one probably wants to consider the corresponding stopping time problem with the long term average cost

[display equation missing in the source]

where $\tau_x$ is the exit time of $y_x(\cdot)$ from $G$. However, one can easily see that $\lambda_x$ is, in general, identically zero. So this formulation is not interesting. Instead, we shall consider

$$\lambda_{x'} = \inf_\tau \liminf_{T \to \infty} \frac{E \int_0^{\tau \wedge \tau_{x'} \wedge T} f(y_{x'}(t))\,dt}{E(\tau \wedge \tau_{x'} \wedge T)}. \tag{*}$$

Our main results are that, under some reasonable assumptions, there exists one and only one solution $(\lambda, w) = (\lambda_{x'}, w(\cdot\,; x'))$ of

$$-\tfrac12 \sigma^2 D^2 w - g Dw + \lambda \le f, \quad w \le 0, \quad (-\tfrac12 \sigma^2 D^2 w - g Dw + \lambda - f)\,w = 0, \quad w(x') = 0, \quad w|_{\partial G} = 0, \quad w \in C^{1,1}(\bar G), \tag{**}$$

and that this $\lambda$ is exactly the same as that in (*).

To conclude this section, let us point out that (**) is not quite a standard variational inequality because of the unusual condition that $w(x') = 0$. Our result may be regarded as a generalization of [1] to the Dirichlet problem. But ours is more restrictive in the sense that we only treat one-dimensional diffusions. However, some of our results can be generalized to higher dimensional diffusions, which will appear in Sun [19]. We shall use different techniques to treat the multi-dimensional problem, since in that case the second order dynamic programming equation can no longer be reduced to a first order ordinary differential equation. The reader is also referred to Gittins [5], Karatzas [6] and Sun [18] for related time average cost problems.

2. Formulation of problem and basic assumptions

Given a bounded interval $[a, b]$ in $R^1$ and a probability space $(\Omega, F, P)$ with a standard filtration $\{F_t\}$, we consider the one-dimensional diffusion process defined by

$$dy_{x'}(t) = g(y_{x'}(t))\,dt + \sigma(y_{x'}(t))\,dw_t, \quad y_{x'}(0) = x' \in (a, b). \tag{1}$$

We will assume that

$g$ and $\sigma$ are Lipschitz continuous from $[a, b]$ into $R^1$,   (2)

and that

$\sigma \ge \sigma_0 > 0$ for some constant $\sigma_0$.   (3)

Without loss of generality, we may take $[a, b] = [0, 1]$. We also assume that

$f > 0$ is Hölder continuous on $[0, 1]$,   (4)

$\psi \ge 0$ is a $C^2$ function on $[0, 1]$.   (5)
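As an illustration of the stopped diffusion (1), the following is a minimal Euler-Maruyama sketch in Python. It is not part of the paper: the function name `simulate_stopped_path`, the step size `dt`, the horizon `t_max`, and the clamping of the last step onto the boundary are all assumptions made for illustration, and the discretization only approximates the true exit time.

```python
import math
import random


def simulate_stopped_path(x0, g, sigma, dt=1e-3, t_max=50.0, rng=None):
    """Simulate dy = g(y) dt + sigma(y) dw on [0, 1], stopped at the boundary.

    Hypothetical helper: returns the (approximate) exit time and the path.
    The simulation also stops at t_max if the boundary has not been reached.
    """
    rng = rng or random.Random(0)
    t, y = 0.0, x0
    path = [y]
    sqrt_dt = math.sqrt(dt)
    while 0.0 < y < 1.0 and t < t_max:
        # One Euler-Maruyama step with a Gaussian increment of variance dt.
        y += g(y) * dt + sigma(y) * sqrt_dt * rng.gauss(0.0, 1.0)
        # Clamp an overshoot onto the boundary point that was crossed.
        y = min(1.0, max(0.0, y))
        t += dt
        path.append(y)
    return t, path


# Usage: driftless diffusion with sigma = 1 started at x' = 0.5.
exit_time, path = simulate_stopped_path(0.5, lambda y: 0.0, lambda y: 1.0)
```

The path stays in $[0, 1]$ by construction, and the loop ends as soon as the state reaches 0 or 1, which mirrors the stopping at the boundary of the interval.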

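For any $F_t$-stopping time $\tau$, we define the cost

$$J_{x'}(\tau) = \liminf_{T \to \infty} \frac{E \int_0^{\tau \wedge \tau_{x'} \wedge T} f(y_{x'}(t))\,dt + E\,\psi(y_{x'}(\tau \wedge T))\,1_{\{\tau < \tau_{x'}\}}}{E(\tau \wedge \tau_{x'} \wedge T)}, \tag{6}$$

where

$$\tau_{x'} = \inf\{t \ge 0 : y_{x'}(t) \in \{0, 1\}\}. \tag{7}$$

Let us denote the value function

$$\lambda_{x'} = \inf_\tau J_{x'}(\tau). \tag{8}$$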
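To make the cost (6) concrete, here is a small Python sketch for one special case that is an assumption of this example, not of the paper: $g \equiv 0$, $\sigma \equiv 1$, $f$ constant, and $\tau$ the first exit time from a subinterval $(c, d)$ containing $x'$. In that case the standard Brownian motion formulas $E\tau = (x' - c)(d - x')$ and $P(\text{exit at } c) = (d - x')/(d - c)$ give the ratio in closed form. The function name `average_cost_interval_rule` is hypothetical.

```python
def average_cost_interval_rule(x0, c, d, f_const, psi):
    """Cost (6) for tau = first exit of standard Brownian motion from (c, d).

    Assumes 0 <= c < x0 < d <= 1, zero drift, unit diffusion coefficient and
    a constant running cost f_const. The terminal cost psi is only paid when
    the process is stopped strictly inside (0, 1).
    """
    expected_time = (x0 - c) * (d - x0)   # E[tau] for generator (1/2) D^2
    prob_exit_c = (d - x0) / (d - c)      # probability of leaving through c
    prob_exit_d = (x0 - c) / (d - c)      # probability of leaving through d
    running = f_const * expected_time
    stopping = 0.0
    if c > 0.0:                           # stopped before reaching 0
        stopping += prob_exit_c * psi(c)
    if d < 1.0:                           # stopped before reaching 1
        stopping += prob_exit_d * psi(d)
    return (running + stopping) / expected_time


# Never stopping before the boundary gives the pure time average of f.
print(average_cost_interval_rule(0.5, 0.0, 1.0, 1.0, lambda y: 0.5))    # 1.0
# Stopping early pays psi over a shorter expected duration.
print(average_cost_interval_rule(0.5, 0.25, 0.75, 1.0, lambda y: 0.5))  # 9.0
```

In this toy case, stopping early is penalized because the terminal cost is divided by a small expected duration, which is the trade-off the value function (8) optimizes over all stopping times.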

Our cost is of a time average type, but not quite the same as the standard long term average cost. In [5] and the references therein, Gittins and his co-workers showed the interesting usage of the so-called Gittins index in the study of the multi-armed bandit problem. They proved the optimality of Gittins' index policy. A continuous version of the Gittins index

$$M(x) = \sup_{\tau > 0} E \int_0^\tau e^{-\alpha t} h(y_x(t))\,dt\, \left[1 - E e^{-\alpha \tau}\right]^{-1}$$

has been examined in [6]. Our cost criterion (8) can be considered as another continuous version of the Gittins index. Our techniques and assumptions will be quite different from those in [6], and the Gittins index is studied from a different point of view. In contrast to [6], we shall not use any result for the dynamic allocation problem. Furthermore, a different type of characterization result of the Gittins index (Theorem 1) will be obtained by means of the dynamic programming technique. Finally, let us mention Robin's version (see [15]) of a stopping time problem with the optimal cost

$$\lambda = \inf_{x'} \inf_\tau J_{x'}(\tau). \tag{9}$$

The crucial difference between (8) and (9) is that $\lambda_{x'}$ depends on $x'$, while $\lambda$ is clearly an absolute constant. It is the dependence of $\lambda_{x'}$ on $x'$ that causes some technical problems and shows new features of our formulation. One technical problem is the study of the regularity of $\lambda_{x'}$ as a function of $x'$. Furthermore, in [15] only reflected diffusions were considered for the diffusion model, which amounts to a dynamic variational inequality with Neumann boundary condition. For ours, we shall essentially deal with a variational inequality with Dirichlet boundary condition.

3. Main results

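Our main results are stated in the following theorem.

Theorem 1. Under the assumptions (2)-(5):

(a) There exists a solution $(\lambda, w) = (\lambda_{x'}, w(\cdot\,; x'))$ of

$$-\tfrac12 \sigma^2 D^2 w - g Dw + \lambda \le f, \quad w \le \psi, \quad w(x') = 0, \quad (-\tfrac12 \sigma^2 D^2 w - g Dw + \lambda - f)(w - \psi) = 0, \quad w(0) = w(1) = 0, \quad w \in C^{1,1}[0, 1]. \tag{10}$$

(b) $\lambda = \lambda_{x'}$ is exactly the same as the optimal stopping cost given in (8).

(c) The optimal stopping time exists, and is given explicitly by

$$\tau^* = \inf\{t \ge 0 : w(y_{x'}(t)) = \psi(y_{x'}(t))\}.$$

The proof of Theorem 1 will be given in several lemmas which follow. However, only the main ideas will be presented here; see [19] for more details.

Lemma 1. For any fixed pair $(\alpha, \varepsilon)$, the following penalized dynamic programming equation has at least one solution:

$$-\tfrac12 \sigma^2 D^2 w_\alpha^\varepsilon - g Dw_\alpha^\varepsilon + \frac{1}{\varepsilon}\{w_\alpha^\varepsilon - w_\alpha^\varepsilon(x') - \psi\}^+ + \alpha w_\alpha^\varepsilon = f, \quad w_\alpha^\varepsilon(0) = w_\alpha^\varepsilon(1) = w_\alpha^\varepsilon(x') \ge 0, \quad w_\alpha^\varepsilon \in C^2[0, 1]. \tag{11}$$

Moreover, $\alpha w_\alpha^\varepsilon(x')$ is bounded in $(\alpha, \varepsilon)$.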

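Proof. Observe that (11) is not a standard Dirichlet problem because the boundary condition depends on the solution itself. In the case of a variational inequality (see e.g. Bensoussan and Lions [2]), it has been called a quasi-variational inequality (QVI). Let us call (11) a quasi-Dirichlet problem (QDP). We will use the monotonicity technique to prove the existence of a solution to QDP (11). Consider the following iteration scheme:

$$-\tfrac12 \sigma^2 D^2 w_\alpha^{\varepsilon,n} - g Dw_\alpha^{\varepsilon,n} + \frac{1}{\varepsilon}\{w_\alpha^{\varepsilon,n} - w_\alpha^{\varepsilon,n-1}(x') - \psi\}^+ + \alpha w_\alpha^{\varepsilon,n} = f,$$
$$w_\alpha^{\varepsilon,n}(0) = w_\alpha^{\varepsilon,n}(1) = w_\alpha^{\varepsilon,n-1}(x'), \quad w_\alpha^{\varepsilon,n} \in C^2[0, 1], \quad n = 1, 2, \ldots, \quad w_\alpha^{\varepsilon,0} \equiv 0. \tag{12}$$

A standard result in stochastic control theory (e.g. Bensoussan and Lions [3]) says that (12) has a unique solution and the solution is explicitly given by

$$w_\alpha^{\varepsilon,n}(x) = \inf_{0 \le u \le 1} J^{\varepsilon,n}(x, u), \tag{13}$$

where

$$J^{\varepsilon,n}(x, u) = E \int_0^{\tau_x} \left[f + \frac{u}{\varepsilon}\big(\psi + w_\alpha^{\varepsilon,n-1}(x')\big)\right] \exp\left[-\int_0^t \left(\alpha + \frac{u}{\varepsilon}\right) ds\right] dt + E\, w_\alpha^{\varepsilon,n-1}(x') \exp\left[-\int_0^{\tau_x} \left(\alpha + \frac{u}{\varepsilon}\right) ds\right].$$

We then claim:

(a) $0 \le w_\alpha^{\varepsilon,n} \uparrow$ as $n \uparrow \infty$;
(b) $\alpha w_\alpha^{\varepsilon,n}(x) \le C$ ($C$ independent of $\varepsilon$, $\alpha$, $n$, $x$);
(c) $w_\alpha^\varepsilon(x) = \lim_{n \to \infty} w_\alpha^{\varepsilon,n}(x)$ is a solution to (11). □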
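The monotone iteration (12) can be tried numerically. The sketch below is a finite-difference illustration under assumptions that are not in the paper: $\sigma \equiv 1$, $g \equiv 0$, a uniform grid, and an active-set (semismooth Newton type) inner loop to handle the penalty term $\{\cdot\}^+$. The names `thomas`, `penalized_step` and `monotone_iteration` are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system (lower[0], upper[-1] are unused)."""
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x


def penalized_step(c, psi, f, alpha, eps, n):
    """Solve -u''/2 + (1/eps){u - c - psi}^+ + alpha*u = f, u(0) = u(1) = c.

    Here c plays the role of the previous iterate evaluated at x'.
    """
    h = 1.0 / n
    a = 0.5 / h ** 2
    xs = [i * h for i in range(1, n)]
    active = [False] * (n - 1)            # nodes where the penalty is switched on
    w = [c] * (n - 1)
    for _ in range(100):
        lower = [-a] * (n - 1)
        upper = [-a] * (n - 1)
        diag = [2 * a + alpha + (1.0 / eps if on else 0.0) for on in active]
        rhs = [f(x) + ((c + psi(x)) / eps if on else 0.0)
               for x, on in zip(xs, active)]
        rhs[0] += a * c                   # Dirichlet data at x = 0
        rhs[-1] += a * c                  # Dirichlet data at x = 1
        w = thomas(lower, diag, upper, rhs)
        new_active = [wi - c - psi(x) > 0.0 for wi, x in zip(w, xs)]
        if new_active == active:          # penalty set is consistent: done
            break
        active = new_active
    return [c] + w + [c]


def monotone_iteration(psi, f, alpha=1.0, eps=0.01, n=50, x_index=25, steps=30):
    """Return the successive values w^{n}(x') of the iteration, from w^0 = 0."""
    c, values = 0.0, []
    for _ in range(steps):
        w = penalized_step(c, psi, f, alpha, eps, n)
        c = w[x_index]                    # value at x' = x_index / n
        values.append(c)
    return values


values = monotone_iteration(lambda x: 0.1, lambda x: 1.0)
```

With these choices the computed values at $x' = 0.5$ should be nondecreasing in the iteration index and stay below $\max f / \alpha$, in line with claims (a) and (b) for the discretized problem.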

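Remark. (11) is a variant of the penalized problem used in [3] to study the standard optimal stopping time problem.

Lemma 2. Let $w_\alpha^\varepsilon$ be a solution to (11). Then

[display equation missing in the source]

and

$$w_\alpha^\varepsilon(x) = \inf_u \left\{ E \int_0^{\tau_x} \left(f + \frac{u}{\varepsilon}\psi\right) \exp\left[-\int_0^t \left(\alpha + \frac{u}{\varepsilon}\right) ds\right] dt + E \int_0^{\tau_x} w_\alpha^\varepsilon(x')\,\frac{u}{\varepsilon} \exp\left[-\int_0^t \left(\alpha + \frac{u}{\varepsilon}\right) ds\right] dt + w_\alpha^\varepsilon(x')\, E \exp\left[-\int_0^{\tau_x} \left(\alpha + \frac{u}{\varepsilon}\right) ds\right] \right\}.$$

Hence (11) has a unique solution.

Proof. It follows from (11) that

$$w_\alpha^\varepsilon(x) - w_\alpha^\varepsilon(x') = \inf_u E \int_0^{\tau_x} \left[f - \alpha w_\alpha^\varepsilon(x') + \frac{u}{\varepsilon}\psi\right] \exp\left[-\int_0^t \left(\alpha + \frac{u}{\varepsilon}\right) ds\right] dt,$$

from which we get the desired results. □

Lemma 3. Consider the following ODE for $\varphi \in C^2[0, 1]$:

$$-\tfrac12 \sigma^2 D^2 u - g Du = f, \quad u(0) = \varphi(0), \quad u(1) = \varphi(1). \tag{14}$$

Under the assumptions (2)-(4), problem (14) has one and only one solution in $C^2[0, 1]$, and it is given explicitly by

$$u(x) = E \int_0^{\tau_x} f(y_x(t))\,dt + E \varphi(y_x(\tau_x)).$$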
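The Dirichlet problem of Lemma 3 can be checked numerically in a simple case. The following Python sketch uses central finite differences and the Thomas algorithm; the discretization, the function name `solve_dirichlet_ode`, and the grid size are assumptions of this example rather than anything stated in the paper. For $\sigma \equiv 1$, $g \equiv 0$, $f \equiv 1$ and $\varphi \equiv 0$ the exact solution is $u(x) = x(1 - x)$, which is the expected exit time of Brownian motion from $(0, 1)$, consistent with the probabilistic representation.

```python
def solve_dirichlet_ode(sigma, g, f, phi0, phi1, n=200):
    """Solve -0.5*sigma(x)^2 u'' - g(x) u' = f(x) on [0, 1] with
    u(0) = phi0, u(1) = phi1, using central differences on n subintervals."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, n):
        x = xs[i]
        a = 0.5 * sigma(x) ** 2 / h ** 2   # weight of the second difference
        b = g(x) / (2.0 * h)               # weight of the centered first difference
        lower.append(-a + b)               # coefficient of u_{i-1}
        diag.append(2.0 * a)               # coefficient of u_i
        upper.append(-a - b)               # coefficient of u_{i+1}
        rhs.append(f(x))
    # Move the known boundary values to the right-hand side.
    rhs[0] -= lower[0] * phi0
    rhs[-1] -= upper[-1] * phi1
    # Thomas algorithm: forward elimination, then back substitution.
    m = n - 1
    for i in range(1, m):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return xs, [phi0] + u + [phi1]


# Usage: expected exit time of Brownian motion from (0, 1).
xs, u = solve_dirichlet_ode(lambda x: 1.0, lambda x: 0.0, lambda x: 1.0, 0.0, 0.0)
print(abs(u[100] - 0.25) < 1e-8)  # True: u(0.5) = 0.5 * (1 - 0.5)
```

Because the exact solution is quadratic in this test case, the central difference scheme reproduces it up to rounding error, which makes it a convenient sanity check before trying non-constant coefficients.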

Lemma 4. For the solution $w_\alpha^\varepsilon$ of QDP (11),

(a) $\|\alpha w_\alpha^\varepsilon\|_{C[0,1]} \le C$;
(b) $\|\varepsilon^{-1}\{w_\alpha^\varepsilon - w_\alpha^\varepsilon(x') - \psi\}^+\|_{C[0,1]} \le C$;
(c) $\|Dw_\alpha^\varepsilon\|_{C[0,1]} \le C$;
(d) $\|D^2 w_\alpha^\varepsilon\|_{C[0,1]} \le C$;

where $C$ is independent of $(\varepsilon, \alpha)$.

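Proof. (a) It follows from the maximum principle.

(b) Let $u_\alpha^\varepsilon = w_\alpha^\varepsilon - w_\alpha^\varepsilon(x') - \psi$. Then $u_\alpha^\varepsilon$ satisfies

$$-\tfrac12 \sigma^2 D^2 u_\alpha^\varepsilon - g Du_\alpha^\varepsilon + \frac{1}{\varepsilon}\{u_\alpha^\varepsilon\}^+ + \alpha u_\alpha^\varepsilon = f - \alpha\big(w_\alpha^\varepsilon(x') + \psi\big) + \tfrac12 \sigma^2 D^2 \psi + g D\psi.$$

Applying the maximum principle to the above equation, we obtain part (b).

(c) Consider the following problems:

$$-\tfrac12 \sigma^2 D^2 \underline{w} - g D\underline{w} = f - C_1 - C_2, \quad \underline{w}(0) = \underline{w}(1) = 0,$$

and

$$-\tfrac12 \sigma^2 D^2 \overline{w} - g D\overline{w} = f, \quad \overline{w}(0) = \overline{w}(1) = 0,$$

where $C_1$ and $C_2$ are the constants given in parts (a) and (b), respectively. Thus we get

$$|Dw_\alpha^\varepsilon(0)| \le |D\underline{w}(0)| \vee |D\overline{w}(0)|,$$

i.e. $Dw_\alpha^\varepsilon(0)$ is bounded, independent of $(\alpha, \varepsilon)$. We now deduce from (11) that

[display equation missing in the source]

where

$$f_\alpha^\varepsilon(x) = f(x) - \alpha w_\alpha^\varepsilon(x) - \frac{1}{\varepsilon}\{w_\alpha^\varepsilon(x) - w_\alpha^\varepsilon(x') - \psi(x)\}^+.$$

Then part (c) follows.

(d) It follows from parts (a), (b), (c), and (11). □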

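Lemma 5. For the solution $w_\alpha^\varepsilon$ of (11),

(a) $w_\alpha(x) = \lim_{\varepsilon \to 0} w_\alpha^\varepsilon(x)$ exists and is a (unique) solution of

$$-\tfrac12 \sigma^2 D^2 w_\alpha - g Dw_\alpha + \alpha w_\alpha \le f, \quad w_\alpha \le \psi + w_\alpha(x'),$$
$$(-\tfrac12 \sigma^2 D^2 w_\alpha - g Dw_\alpha + \alpha w_\alpha - f)(w_\alpha - w_\alpha(x') - \psi) = 0,$$
$$w_\alpha(0) = w_\alpha(1) = w_\alpha(x'), \quad w_\alpha \in C^{1,1}[0, 1]; \tag{15}$$

(b) results (a), (c), (d) in Lemma 4 remain true for $w_\alpha$;

(c) the solution of (15) is unique and is given by

$$w_\alpha(x') = \inf_\tau \frac{E \int_0^{\tau \wedge \tau_{x'}} e^{-\alpha t} f\,dt + E\, e^{-\alpha \tau} \psi(y_{x'}(\tau))\, 1_{(\tau < \tau_{x'})}}{1 - E\, e^{-\alpha (\tau \wedge \tau_{x'})}}$$

and

[display equation missing in the source]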

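Proof. Parts (a) and (b) follow from Lemma 4. The proof of part (c) is similar to that of Lemma 2. □

Proof of Theorem 1. (a) For the solution $w_\alpha$ of (15),

$$\lambda = \lambda_{x'} = \lim_{\alpha_n \to 0} \alpha_n w_{\alpha_n}(x')$$

exists for some $\{\alpha_n\}$. On the other hand, it is easy to see that

$$\|w_{\alpha_n} - w_{\alpha_n}(x')\|_{C[0,1]} \le C \quad \text{(uniformly in } n\text{)}.$$

Let $u_{\alpha_n} = w_{\alpha_n} - w_{\alpha_n}(x')$. Then we get

$$\|u_{\alpha_n}\|_{C^1[0,1]} + \|D^2 u_{\alpha_n}\|_{L^\infty[0,1]} \le C \quad \text{(uniformly in } n\text{)}. \tag{16}$$

Hence $\alpha_n u_{\alpha_n}(x) \to 0$ as $n \to \infty$, for all $x$. On the other hand, we get from (16) that for all $0 < r < 1$, $u_{\alpha_n} \in C^{1,r}[0, 1]$ with $\|u_{\alpha_n}\|_{C^{1,r}[0,1]}$ bounded uniformly in $n$ and $r$. Thus for a fixed $r_0 \in (0, 1)$, we take a subsequence of $\{\alpha_n\}$, still denoted $\{\alpha_n\}$, such that

$$u_{\alpha_n} \to w \quad \text{in } C^{1,r_0}[0, 1],$$

for some $w \in C^{1,r_0}[0, 1]$. In fact, we can see by (16) that $w \in C^{1,1}[0, 1]$ and that (10) is satisfied for the pair $(\lambda, w)$.

(b) and (c). By the generalized Itô formula, we get for any $\tau$ and $T > 0$,

$$w(x') \le E\left[w(y_{x'}(\tau \wedge \tau_{x'} \wedge T)) + \int_0^{\tau \wedge \tau_{x'} \wedge T} f(y_{x'}(t))\,dt\right] - \lambda E(\tau \wedge \tau_{x'} \wedge T).$$

So

$$\lambda \le \frac{E \int_0^{\tau \wedge \tau_{x'} \wedge T} f(y_{x'}(t))\,dt + E\,\psi(y_{x'}(\tau \wedge T))\,1_{\{\tau < \tau_{x'}\}}}{E(\tau \wedge \tau_{x'} \wedge T)},$$

from which we have

$$\lambda \le \inf_\tau J_{x'}(\tau).$$

On the other hand, for

$$\tau^* = \inf\{t \ge 0 : w(y_{x'}(t)) = \psi(y_{x'}(t))\},$$

we easily have $\lambda = J_{x'}(\tau^*)$. □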

Remark. In the proof of Theorem 1, we actually have $u_{\alpha_n} \to w$ weakly-* in $W^{2,\infty}(0, 1)$ and strongly in $W^{1,\infty}(0, 1)$.

Corollary 1. Under the assumptions made in Theorem 1, (10) has one and only one solution.

Proof. Let $(\mu, u)$ be any solution of (10). Then as in the proof of Theorem 1,

$$\mu = \inf_\tau J_{x'}(\tau) = \lambda_{x'},$$

and consequently

$$u(x) = \inf_\tau \left\{ E \int_0^{\tau \wedge \tau_x} (f - \lambda_{x'})\,dt + E\,\psi(y_x(\tau \wedge \tau_x))\,1_{(\tau < \tau_x)} \right\}.$$

This completes our proof. □

Acknowledgement

The author would like to thank the referee for his comment on the Gittins index.

References

[1] A. Bensoussan and J.L. Lions, On the asymptotic behaviour of the solution of variational inequalities, in: Theory of Nonlinear Operators (Akademie-Verlag, Berlin, 1978) pp. 24-40.
[2] A. Bensoussan and J.L. Lions, Contrôle Impulsionnel et Inéquations Quasi-variationnelles (Dunod, Paris, 1982).
[3] A. Bensoussan and J.L. Lions, Applications of Variational Inequalities in Stochastic Control (North-Holland, Amsterdam-New York, 1982).
[4] F. Gimber, Sur quelques équations non linéaires intervenant en contrôle stochastique, Thèse, Université de Paris IX-Dauphine.
[5] J.C. Gittins, Bandit processes and dynamic allocation indices (with Discussion), J. Roy. Statist. Soc. Ser. B 41 (1979) 148-164.
[6] I. Karatzas, Gittins indices in the dynamic allocation problem for diffusion processes, Ann. Probab. 12 (1984) 173-192.
[7] P.R. Kumar and W. Lin, Optimal adaptive controllers for unknown Markov chains, IEEE Trans. Automat. Control 27 (1982) 765-774.
[8] P.R. Kumar, Optimal adaptive control of LQG systems, SIAM J. Control Optim. 21 (1983) 163-178.
[9] J.M. Lasry, Un problème de contrôle stochastique avec critère asymptotique, Cahier de Math. de la Décision, no. 7715, Université Paris IX-Dauphine (1977).
[10] J.M. Lasry, Contrôle stationnaire asymptotique, Thèse (1974).
[11] P.L. Lions and B. Perthame, Quasi-variational inequalities and ergodic impulse control, Cahier de Math. de la Décision, no. 8412, Université Paris IX-Dauphine (1984).
[12] J.L. Menaldi and M. Robin, Some singular control problem with long term average criterion, Lecture Notes in Control and Information Sciences, No. 59 (Springer, Berlin-New York, 1984) 424-432.
[13] J.L. Menaldi and M. Robin, An ergodic control problem for reflected diffusion with jumps, IMA J. Math. Control and Inform., to appear.
[14] M. Robin, Long term average cost control problems for continuous time Markov processes: a survey, Acta Appl. Math. 1 (1983) 281-299.
[15] M. Robin, On some impulse control problem with long term average cost, SIAM J. Control Optim. 19 (1981) 333-358.
[16] L. Stettner, On impulsive control with long run average cost criterion, Studia Math. 76 (1983) 279-298.
[17] L. Stettner, Discrete time adaptive impulsive control theory (1984).
[18] M. Sun, Singular control problems in bounded intervals, submitted.
[19] M. Sun, PhD Thesis.
[20] R. Tarres, Comportement asymptotique d'un problème de contrôle stochastique, Cahier de Math. de la Décision, no. 8215, Université de Paris IX-Dauphine (1982).