Duality theorem in Markovian decision problems

JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS 50, 579-595 (1975)

KEIGO YAMADA

UNIVAC Information and Control Sciences Center, Tokyo, Japan

Submitted by Harold Kushner

Copyright © 1975 by Academic Press, Inc. All rights of reproduction in any form reserved.

1. INTRODUCTION

In this paper we present a duality theorem which arises in a Markovian decision problem for certain types of dynamic systems. We assume that such a system is observed periodically at times $n = 0, 1, 2, \ldots$. After each observation the system is classified into one of a set of possible states. Let $S$ denote the space of possible states. We assume that $S$ is a compact set in $d$-dimensional Euclidean space $R^d$. After each classification an action must be chosen from the action space $U_x$, which depends on the state $x$ of the system. The action space $U_x$ is a compact subset of $m$-dimensional Euclidean space $R^m$, which may be countable or uncountable. We assume that every $U_x$ is contained in a compact set $U \subset R^m$. Let $\{X_n;\ n = 0, 1, 2, \ldots\}$ and $\{a_n;\ n = 0, 1, 2, \ldots\}$ denote the sequences of states and actions. It is assumed that for every $x \in S$ and $a \in U_x$ there is a known probability measure $P(\cdot \mid x, a)$ on the $\sigma$-field $\mathcal{B}_S$ of Borel subsets of $S$ such that

$$P\{X_{n+1} \in B \mid X_0, a_0, \ldots, X_n = x, a_n = a\} = P(B \mid x, a).$$

Whenever the process is in state $x$ and an action $a$ is chosen, a cost $L(x, a)$ is incurred. The control we consider here is a randomized policy which is specified, for each $x \in S$, by a probability measure $\varphi(x, \cdot)$ on the space $(U, \mathcal{B}_U)$, where $\mathcal{B}_U$ is the $\sigma$-field of Borel subsets of $U$, such that $\varphi(x, U_x) = 1$; i.e., when we use a randomized policy $\varphi(\cdot, \cdot)$ and the system is in state $x$, an action $a \in U_x$ is realized according to the probability measure $\varphi(x, \cdot)$. We assume that for each $A \in \mathcal{B}_U$, $\varphi(\cdot, A)$ is Borel measurable. Let $\Phi$ be the set of all controls with this property. For any control $\varphi \in \Phi$, let

$$Q_\varphi(x) = \limsup_{N \to \infty}\ (N+1)^{-1} \sum_{n=0}^{N} E_x L(X_n, a_n), \tag{1}$$

where $\{X_n\}$ and $\{a_n\}$ are the random sequences of states and actions corresponding to the control $\varphi \in \Phi$ and $E_x$ represents the expectation for the process with initial state $X_0 = x$. Thus $Q_\varphi(x)$ is the expected average cost per unit time when the system starts in state $x$ and control $\varphi$ is used. To find an optimal control in $\Phi$ which minimizes the cost criterion (1), a formal application of the principle of optimality of dynamic programming leads to the following equation:

$$\lambda + v(x) = \min_{a \in U_x} \Big[ L(x, a) + \int_S v(y)\, P(dy \mid x, a) \Big]. \tag{2}$$

As a matter of fact, if there exist a constant $\lambda$ and a bounded Borel measurable function $v(x)$ defined on $S$ which satisfy (2), then under appropriate assumptions it can be shown that there exists an optimal control $\phi(x)$, where $\phi$ is a Borel measurable function $S \ni x \to \phi(x) \in U_x$ which minimizes, for each $x$, the right side of (2). Assuming that the transition probability measure $P(\cdot \mid x, a)$ has a density $p(\cdot \mid x, a)$ with respect to Lebesgue measure, the problem of finding $\lambda$ and $v(x)$ which satisfy (2) is equivalent to the following problem (P):

(P): Maximize $\lambda$ subject to

$$\lambda + v(x) - \int_S v(y)\, p(y \mid x, a)\, dy \le L(x, a), \qquad x \in S,\ a \in U_x. \tag{P1}$$

For this problem (P) we shall call the following problem (D) the dual of the primal (P). The dual (D) is of the form:

(D): Minimize

$$\int_S \int_U L(x, a)\, q(x, da)\, dx \tag{D1}$$

subject to

$$\int_U q(z, da) - \int_S \int_U p(z \mid x, a)\, q(x, da)\, dx = 0, \tag{D2}$$

$$\int_S \int_U q(x, da)\, dx = 1, \tag{D3}$$

where $q(x, \cdot)$ is a measure on $(U, \mathcal{B}_U)$ and the support of $q(x, \cdot)$ is $U_x$. The purpose of this paper is to examine the relation between the primal (P) and the dual (D) and to establish a duality theorem under appropriate assumptions. When $S$ and $U$ are both finite, (P) and (D) are both linear programs in finite-dimensional spaces, and the duality relation between (P) and (D) follows from the general duality theorem of linear programming. When $S$ or $U$ is not finite, (P) and (D) can be viewed as linear programs in function spaces, and although several results exist on duality for such problems, e.g., [1], they are not general enough to cover our problem. Owing to the special structure of our problem, the approach we take is probabilistic, and the Markovian decision problems described above are exploited as the underlying phenomena for the problems (P) and (D).

In Section 2 some preliminary results, such as the existence of an invariant probability measure for the controlled Markov processes and its consequences, are presented. The duality discussion is given in Section 3. As an application of the duality theorem of Section 3, the existence of optimal solutions of (P) is shown in Section 4 under some assumptions when $U_x = U$ for all $x$ and $U$ is finite. In [2], Ross gave some different sufficient conditions for such cases. It is known [2] that the existence of optimal solutions of (P) implies the existence of optimal stationary controls among a quite broad class of admissible policies.

2. ASSUMPTIONS AND PRELIMINARIES

The following assumption (A1) on the probability measure $P(\cdot \mid x, a)$ will be made:

(A1) The probability measure $P(\cdot \mid x, a)$ has a density $p(\cdot \mid x, a)$ with respect to Lebesgue measure, and there exist a Borel set $C \in \mathcal{B}_S$ and a $\delta > 0$ such that $\mathcal{L}(C) > 0$ and

$$p(y \mid x, a) \ge \delta, \qquad x \in S,\ y \in C,\ a \in U_x,$$

where $\mathcal{L}$ denotes Lebesgue measure. $p(y \mid x, a)$ is jointly measurable in $y$, $x$ and $a$.

A further assumption on $p(\cdot \mid x, a)$ is given in the next section. We shall show an example of an inventory control problem which satisfies (A1) at the end of this section. A direct consequence of the assumption (A1) is the following lemma, which shows that any process corresponding to a control in $\Phi$ is geometrically ergodic.

LEMMA 1. For each $\varphi \in \Phi$, the corresponding process $\{X_n,\ n = 0, 1, 2, \ldots\}$ is geometrically ergodic, i.e., the process $X_n$ has an invariant probability measure $\mu_\varphi$ such that, for some $0 < h < 1$ not depending on $\varphi \in \Phi$ and $x$,

$$|P_\varphi(n, x, A) - \mu_\varphi(A)| \le (1 - h)^{n-1}, \qquad \forall A \in \mathcal{B}_S,$$

where $P_\varphi(n, x, A)$ is the $n$-step transition probability function of the process $\{X_n\}$ with $X_0 = x$. The invariant measure $\mu_\varphi$ has a density function $p_\varphi(\cdot)$ with respect to Lebesgue measure, which is not less than $\delta$ on the set $C$ and satisfies the inequality

$$|p_\varphi^{(n)}(y \mid x) - p_\varphi(y)| \le (1 - h)^{n-1}, \qquad \forall y \in S,$$

where $p_\varphi^{(n)}(y \mid x)$ is the density function corresponding to $P_\varphi(n, x, A)$.

Proof. The conclusions of the lemma follow directly from a result of Doob [3, p. 197]. In fact, the process $\{X_n\}$ with $X_0 = x$ corresponding to $\varphi \in \Phi$ is a Markov process whose transition probability has a density given by

$$p_\varphi(y \mid x) = \int_U p(y \mid x, a)\, \varphi(x, da).$$

Hence by (A1), we have

$$p_\varphi(y \mid x) \ge \delta, \qquad x \in S,\ y \in C,$$

and the condition (D') in Doob [3, p. 197] is satisfied.

Q.E.D.
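The geometric bound of Lemma 1 can be illustrated on a finite chain. The following is only a sketch under stated assumptions: the state space is replaced by three points, counting measure stands in for Lebesgue measure, the control is already folded into a hypothetical matrix `P`, and the constant `h` is taken as `delta` times the number of points of `C` (a standard Doeblin-type choice, not a value given in the paper).

```python
# Finite-state sketch of Lemma 1 (assumptions: S = {0, 1, 2}, counting measure
# in place of Lebesgue measure, one fixed control already folded into P).

def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def invariant(P, iters=500):
    """Approximate the invariant distribution by power iteration."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        dist = step(dist, P)
    return dist

def sup_gap(p, q):
    """sup over subsets A of |p(A) - q(A)|, i.e. half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical transition matrix with every entry >= delta, so C = S.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
delta = min(min(row) for row in P)   # minorization constant as in (A1)
h = delta * len(P)                   # delta times the "measure" of C
mu = invariant(P)

# Record (n, sup_A |P(n, x, A) - mu(A)|, (1 - h)^(n - 1)) for each start x.
gaps = []
for x in range(len(P)):
    dist = [1.0 if y == x else 0.0 for y in range(len(P))]
    for n in range(1, 11):
        dist = step(dist, P)
        gaps.append((n, sup_gap(dist, mu), (1.0 - h) ** (n - 1)))
```

Each recorded gap can be compared with the third entry of its tuple; the bound is uniform in the starting state, as the lemma asserts.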

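Remark 1. By Lemma 1, $\mu_\varphi$ is the unique invariant probability measure of the process $X_n$. In fact, let $\tilde\mu_\varphi$ be any invariant probability measure of the process $X_n$. Then for any $A \in \mathcal{B}_S$ and arbitrary $n > 0$,

$$\tilde\mu_\varphi(A) = \int_S P_\varphi(n, x, A)\, \tilde\mu_\varphi(dx).$$

By Lemma 1, $P_\varphi(n, x, A) \to \mu_\varphi(A)$ as $n \to \infty$ uniformly in $x$. Hence by Lebesgue's convergence theorem,

$$\int_S P_\varphi(n, x, A)\, \tilde\mu_\varphi(dx) \to \mu_\varphi(A)$$

as $n \to \infty$, and hence $\tilde\mu_\varphi(A) = \mu_\varphi(A)$.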

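Lemma 2 is a key fact in the next section. Let $L(\cdot, \cdot)$ be a real valued bounded measurable function defined on $R^d \times R^m$.

LEMMA 2. For $\varphi \in \Phi$ let $\mu_\varphi$ be the invariant measure of the process corresponding to $\varphi$. Define

$$L_\varphi(x) = \int_U L(x, a)\, \varphi(x, da), \qquad \lambda_\varphi = \int_S L_\varphi(x)\, \mu_\varphi(dx), \tag{3}$$

$$v_\varphi(x) = \sum_{n=0}^{\infty} E_x \big( L_\varphi(X_n) - \lambda_\varphi \big), \qquad x \in S. \tag{4}$$

Then $v_\varphi(x)$ is uniformly bounded with respect to $\varphi \in \Phi$ and satisfies the equations

$$\lambda_\varphi + v_\varphi(x) = L_\varphi(x) + \int_S v_\varphi(y)\, P_\varphi(1, x, dy) = L_\varphi(x) + \int_S v_\varphi(y)\, p_\varphi(y \mid x)\, dy. \tag{5}$$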
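Equation (5) can be checked numerically on a small finite chain. The sketch below is illustrative only: the three-state matrix `P` and the cost vector `L_phi` are hypothetical, sums replace the integrals, and the series (4) is truncated after a fixed number of terms.

```python
# Finite-state check of equations (3)-(5); all data below are hypothetical.

def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def invariant(P, iters=500):
    """Approximate the invariant distribution by power iteration."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        dist = step(dist, P)
    return dist

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
L_phi = [1.0, 4.0, 2.0]              # one-step cost L_phi(x) under the control

mu = invariant(P)
lam = sum(L_phi[x] * mu[x] for x in range(len(P)))      # equation (3)

def v_series(x, terms=200):
    """Partial sum of (4): sum over n of E_x[L_phi(X_n)] - lam."""
    dist = [1.0 if y == x else 0.0 for y in range(len(P))]
    total = 0.0
    for _ in range(terms):
        total += sum(dist[y] * L_phi[y] for y in range(len(P))) - lam
        dist = step(dist, P)
    return total

v = [v_series(x) for x in range(len(P))]

# Residual of (5): lam + v(x) - L_phi(x) - sum_y P(x, y) v(y).
residual = [lam + v[x] - L_phi[x] - sum(P[x][y] * v[y] for y in range(len(P)))
            for x in range(len(P))]
```

Because the terms of (4) decay geometrically, the truncation error is far below floating-point precision here, and `residual` should vanish up to rounding.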

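Proof. First we show that $v_\varphi(x)$ is finite and uniformly bounded with respect to $x \in S$ and $\varphi \in \Phi$. Let $0 \le |L_\varphi(x)| < K$ and, for an arbitrary integer $N > 0$, define sets $A_i$ by

$$A_i = \{x;\ ((i-1)/N) K < L_\varphi(x) \le (i/N) K\}, \qquad i = -N+1, \ldots, 0, 1, 2, \ldots, N.$$

Then

$$\Big| \int_S L_\varphi(y)\, P_\varphi(n, x, dy) - \int_S L_\varphi(y)\, \mu_\varphi(dy) \Big| \le \sum_{i=-N+1}^{N} \Big| \int_{A_i} L_\varphi(y)\, P_\varphi(n, x, dy) - \frac{i}{N} K P_\varphi(n, x, A_i) \Big| + \sum_{i=-N+1}^{N} \Big| \frac{i}{N} K P_\varphi(n, x, A_i) - \frac{i}{N} K \mu_\varphi(A_i) \Big| + \sum_{i=-N+1}^{N} \Big| \frac{i}{N} K \mu_\varphi(A_i) - \int_{A_i} L_\varphi(y)\, \mu_\varphi(dy) \Big| \le \frac{2K}{N} + 2K (1 - h)^{n-1} \tag{6}$$

by Lemma 1. Now if we take $N = [(1 - h)^{-n/2}] + 1$ for each $n$, where $[a]$ denotes the largest integer not greater than $a$, since $1/N \le (1 - h)^{n/2}$, we have

$$|v_\varphi(x)| \le \sum_{n=0}^{\infty} 2K (1 - h)^{n/2} + 2K \sum_{n=0}^{\infty} (1 - h)^{n-1}.$$

The right side is convergent and independent of $\varphi \in \Phi$ and $x \in S$.

Next we prove the equation (5). By the Markov property of the process $\{X_n\}$,

$$E_x \big( L_\varphi(X_{n+1}) - \lambda_\varphi \big) = E_x\, g_n(X_1),$$

where $g_n(x) = E_x ( L_\varphi(X_n) - \lambda_\varphi )$. Hence

$$v_\varphi(x) = L_\varphi(x) - \lambda_\varphi + \sum_{n=0}^{\infty} E_x\, g_n(X_1) = L_\varphi(x) - \lambda_\varphi + E_x \sum_{n=0}^{\infty} g_n(X_1) = L_\varphi(x) - \lambda_\varphi + E_x\, v_\varphi(X_1). \tag{7}$$

This proves (5). Since $\sum_{n=0}^{\infty} g_n(x)$ converges uniformly in $x$, as we have shown already, the interchange of $E_x$ and $\sum_{n=0}^{\infty}$ in (7) is valid.

Remark 2.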

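We have shown in the proof that $E_x L_\varphi(X_n) \to \lambda_\varphi$ as $n \to \infty$, uniformly in $x$ (see (6)). Hence

$$Q_\varphi(x) = \lambda_\varphi = \int_S L_\varphi(x)\, \mu_\varphi(dx), \tag{8}$$

and this shows that $Q_\varphi(x)$ in (1) does not depend on $x$.

Remark 3. If there exist a $\lambda$ and a bounded measurable function $v(x)$ which satisfy the Equation (5), then it can be shown that

$$\lambda = \lim_{N \to \infty}\ (N+1)^{-1} \sum_{n=0}^{N} E_x L(X_n, a_n), \tag{9}$$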

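where the process $X_n$ corresponds to $\varphi$. In fact, integrating (5) with respect to $\mu_\varphi$, we have

$$\lambda + \int_S v(x)\, \mu_\varphi(dx) = \int_S L_\varphi(x)\, \mu_\varphi(dx) + \int_S \int_S v(y)\, P_\varphi(1, x, dy)\, \mu_\varphi(dx).$$

Since $v(\cdot)$ is bounded and measurable and the measure $\mu_\varphi(\cdot)$ is invariant, the second and fourth terms in the above equation are equal and we have

$$\lambda = \int_S L_\varphi(x)\, \mu_\varphi(dx) = \lambda_\varphi.$$

We can also show that $v(x)$ differs from $v_\varphi(x)$ only by a constant. To show this, let $V(x) = v(x) - v_\varphi(x)$. Then

$$V(x) = \int_S V(y)\, P_\varphi(1, x, dy).$$

Iterating the above relation $k$ times,

$$V(x) = \int_S V(y)\, P_\varphi(k, x, dy).$$

By using Lemma 1 and the boundedness of $V(\cdot)$,

$$V(x) = \lim_{k \to \infty} \int_S V(y)\, P_\varphi(k, x, dy) = \int_S V(y)\, \mu_\varphi(dy).$$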

Hence $V(x)$ is a constant.

Remark 4. In [4], conditions for the existence and uniqueness of solutions of the equation (5) are discussed when $S$ is denumerable. When $S$ is finite, the result of Lemma 2 is a direct consequence of the geometric ergodicity of the controlled Markov processes and is a known fact ([6, p. 151]). Lemma 2 is a generalization of this case, and the method of the proof is essentially the same as in the case of a finite state space except for some technical parts.

In closing this section we shall give an example which satisfies the condition (A1).

EXAMPLE (inventory control problem). At the beginning of each period $n$, we observe the inventory level $X_n$ (the state of the system) of a single item and order some amount which depends on the level $X_n$ and is supplied without delay. Let $\xi_n$ be the demand in period $n$; the $\{\xi_n\}$ are assumed to be independent and identically distributed. Let $\phi(X_n)$ be the amount ordered at the beginning of period $n$; then the inventory level $X_{n+1}$ at the beginning of period $n + 1$ is

$$X_{n+1} = X_n + \phi(X_n) - \xi_n.$$

Suppose that the random variable $\xi_n$ takes values in $[0, D]$ and has a continuous positive density $p_\xi(\cdot)$. The action space $U_x$ is determined by

$$U_x = \begin{cases} [0,\ M - x], & x \ge s, \\ [s - x,\ M - x], & x < s, \end{cases}$$

where $s$ and $M$ are some positive numbers such that $s < M$. In other words, if the inventory level $X_n = x$ is smaller than $s$, then we order at least $s - x$ so that the inventory level becomes at least $s$, but we cannot order more than $M - x$, so that the inventory level never exceeds $M$. In this example the state space is $S = [s - D, M]$ and the probability measure $P(\cdot \mid x, a)$ ($x \in S$, $a \in U_x$) has the density $p(y \mid x, a) = p_\xi(x + a - y)$. Let $C = [0, s]$ and assume that $D > M$. Then the assumption (A1) is satisfied: for any $x \in S$, $a \in U_x$ and $y \in C$, we have $x + a - y \in [0, D]$. Hence

$$p(y \mid x, a) = p_\xi(x + a - y) \ge \min_{0 \le d \le D} p_\xi(d) = \delta > 0$$

for any $x \in S$, $a \in U_x$ and $y \in C$.

Lemma 3 is a version of the implicit function theorems which exist in various forms, and it will be used in the next section. For an arbitrary compact set $U$ in $R^m$, let $\mathcal{K}(U)$ be the set of all nonempty compact subsets of $U$, and let $\mathcal{B}_{\mathcal{K}(U)}$ be the Borel $\sigma$-field of $\mathcal{K}(U)$ generated by the Hausdorff metric on $\mathcal{K}(U)$. We say that a set valued mapping $F: S \to \mathcal{K}(U)$ is measurable if for every $B \in \mathcal{B}_{\mathcal{K}(U)}$, $F^{-1}(B) \in \mathcal{B}_S$.

LEMMA 3. Let a set valued mapping $S \ni x \to U_x \in \mathcal{K}(U)$ be measurable. Let $k(x, u)$ be a function $S \times U \to R^1$ such that $k(\cdot, u)$ is Borel measurable and $k(x, \cdot)$ is continuous. Then there exists a Borel measurable function $\phi(x)$ such that

$$\min_{a \in U_x} k(x, a) = k(x, \phi(x)).$$

For the proof, see [5].

Remark 5. The Hausdorff metric on $\mathcal{K}(U)$ is defined as follows [5]: For an arbitrary nonempty subset $A$ of $U$ and $\varepsilon > 0$, we write $S(A, \varepsilon)$ for $\{x \in U \mid d(x, A) < \varepsilon\}$, where $d(x, A)$ is the Euclidean distance between $x$ and $A$. The Hausdorff metric $\delta$ on $\mathcal{K}(U)$ is defined by

$$\delta(A, B) = \inf\{a > 0 \mid A \subset S(B, a),\ B \subset S(A, a)\}, \qquad A, B \in \mathcal{K}(U).$$

It can be shown that $\delta$ is a metric on $\mathcal{K}(U)$. For set valued measurable mappings, their basic properties and some methods for constructing them from other measurable mappings are given in [5, Section I.7]. For example, any continuous mapping is measurable. Hence the mapping $F: S \ni x \to U_x \in \mathcal{K}(U)$ given in the example of the inventory control problem is measurable.
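The continuity of the inventory mapping $x \to U_x$ in the Hausdorff metric can be probed numerically. The sketch below is an illustration under assumptions: the parameter values `s`, `M`, `D` are hypothetical, and the Hausdorff distance between two compact intervals is computed with the closed form "largest endpoint difference", which is valid for intervals of the real line but is not a formula stated in the paper.

```python
# Inventory example: Hausdorff distance between action sets U_x and U_y.

def action_interval(x, s, M):
    """U_x from the inventory example, returned as (lower, upper)."""
    lower = 0.0 if x >= s else s - x
    return (lower, M - x)

def hausdorff_intervals(I, J):
    """Hausdorff distance between two compact intervals of the real line."""
    return max(abs(I[0] - J[0]), abs(I[1] - J[1]))

# Hypothetical parameters with 0 < s < M < D, as the example requires.
s, M, D = 2.0, 5.0, 6.0
lo, hi = s - D, M                      # state space S = [s - D, M]
grid = [lo + k * (hi - lo) / 200 for k in range(201)]

# Largest ratio delta(U_x, U_y) / |x - y| over neighbouring grid points.
worst_ratio = max(
    hausdorff_intervals(action_interval(x, s, M), action_interval(y, s, M)) / (y - x)
    for x, y in zip(grid, grid[1:])
)

# Values of x + a - y with a at the ends of U_x and y at the ends of C = [0, s];
# the expression is affine in a and y, so these ends give its extreme values.
args = [x + a - y
        for x in grid
        for a in action_interval(x, s, M)
        for y in (0.0, s)]
```

A ratio bounded by one indicates that the mapping is Lipschitz, hence continuous, on the grid, and the range of `args` shows that the argument of the demand density stays inside $[0, D]$, which is what the verification of (A1) uses.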

3. DUALITY THEOREM

In this section we shall establish a duality theorem for the problems (P) and (D). The exact statement of the primal problem (P) is the following: Find a constant $\lambda$ and a Borel measurable function $v(\cdot)$ which satisfy the relation (P1) almost everywhere. Similarly, the dual problem (D) is to find, for each $x \in S$, a measure $q(x, \cdot)$ on $(U, \mathcal{B}_U)$ such that (D2)-(D3) are satisfied almost everywhere and $q(\cdot, A)$, for each Borel set $A \in \mathcal{B}_U$, is Borel measurable. We shall make further assumptions (A2) and (A3) on $p(y \mid x, a)$, $L(x, a)$ and $U_x$.

(A2) $p(y \mid x, a)$ is continuous in $y$ and $a$ for each fixed $x$. $L(x, a)$ is a bounded measurable function and continuous in $a$ for each fixed $x$.

(A3) The mapping $F: S \ni x \to U_x \in \mathcal{K}(U)$ is measurable.

In the course of establishing the duality theorem, it will turn out that the dual (D) is equivalent to the original Markovian decision problem, i.e., the problem of finding a control $\varphi$ in $\Phi$ such that $Q_\varphi(x)$ is minimized.

THEOREM 1 (weak duality). Both (P) and (D) are feasible, and for any feasible solution $(\lambda, v(x))$ of (P) and $q(x, a)$ of (D),

$$\lambda \le \int_S \int_U L(x, a)\, q(x, da)\, dx.$$
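Proof. The feasibility of the primal (P) is obvious. Take, for example, $v(\cdot) = 0$ and $\lambda$ sufficiently small. To prove the feasibility of the dual (D), take an arbitrary $\varphi \in \Phi$. Then by Lemma 1 the corresponding Markov process $\{X_n\}$ with $X_0 = x$ has the invariant probability measure $\mu_\varphi$ with the density $p_\varphi(\cdot)$. Define a measure $q(x, \cdot)$ on $(U, \mathcal{B}_U)$ by $q(x, \cdot) = \varphi(x, \cdot)\, p_\varphi(x)$. We shall prove that $q(x, \cdot)$ thus defined is feasible for (D). Since $p_\varphi(x)$ and $\varphi(x, \cdot)$ are a probability density and a probability measure on $S$ and $U$ respectively,

$$\int_S \int_U q(x, da)\, dx = \int_S \Big( \int_U \varphi(x, da) \Big) p_\varphi(x)\, dx = 1.$$

Since $p_\varphi(\cdot)$ is an invariant probability density, using Fubini's theorem,

$$\int_A p_\varphi(x)\, dx = \int_S P_\varphi(1, x, A)\, p_\varphi(x)\, dx = \int_S \Big( \int_A p_\varphi(y \mid x)\, dy \Big) p_\varphi(x)\, dx = \int_A \Big( \int_S p_\varphi(y \mid x)\, p_\varphi(x)\, dx \Big) dy, \qquad A \in \mathcal{B}_S.$$

Hence we have

$$p_\varphi(y) = \int_S p_\varphi(y \mid x)\, p_\varphi(x)\, dx, \qquad \text{a.e.},$$

and noting that

$$p_\varphi(y \mid x) = \int_U p(y \mid x, a)\, \varphi(x, da),$$

we get

$$p_\varphi(y) = \int_S \int_U p(y \mid x, a)\, q(x, da)\, dx, \qquad \text{a.e.},$$

and this is equivalent to

$$\int_U q(y, da) = \int_S \int_U p(y \mid x, a)\, q(x, da)\, dx, \qquad \text{a.e.}$$

Now we have established the feasibility of $q(x, \cdot)$ for (D). Let $(\lambda, v(x))$ and $q(x, \cdot)$ be feasible solutions of (P) and (D) respectively. Then integrating both sides of

$$\lambda + v(x) - \int_S v(y)\, p(y \mid x, a)\, dy \le L(x, a), \qquad \text{a.e.},$$

with the measure $q(x, da)\, dx$, we have

$$\lambda + \int_S \int_U v(x)\, q(x, da)\, dx - \int_S \int_U \Big[ \int_S v(y)\, p(y \mid x, a)\, dy \Big] q(x, da)\, dx \le \int_S \int_U L(x, a)\, q(x, da)\, dx.$$

Interchanging the order of integration in the third term of the above relation by Fubini's theorem and using (D2), we get

$$\lambda \le \int_S \int_U L(x, a)\, q(x, da)\, dx.$$

Q.E.D.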
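The two steps of the proof (feasibility of $q = \varphi\, p_\varphi$ and the weak duality inequality) can be replayed on a finite model. The sketch below assumes a hypothetical problem with three states and two actions, with sums in place of integrals; the arrays `p`, `L`, `phi` and the vector `v` are made-up data, not taken from the paper.

```python
# Finite sketch of Theorem 1: q = phi * p_phi is dual feasible, and any
# primal feasible (lam, v) satisfies lam <= dual objective.

S, A = range(3), range(2)

# p[a][x][y]: transition probability, L[x][a]: cost, phi[x][a]: randomized policy.
p = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    [[0.2, 0.2, 0.6], [0.5, 0.25, 0.25], [0.1, 0.7, 0.2]],
]
L = [[2.0, 0.5], [1.0, 3.0], [4.0, 1.5]]
phi = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]

# Transition matrix of the controlled chain and its invariant distribution.
P_phi = [[sum(phi[x][a] * p[a][x][y] for a in A) for y in S] for x in S]
p_inv = [1.0 / 3] * 3
for _ in range(500):
    p_inv = [sum(p_inv[x] * P_phi[x][y] for x in S) for y in S]

# Dual candidate q(x, a) = phi(x, a) * p_phi(x).
q = [[phi[x][a] * p_inv[x] for a in A] for x in S]

# Residuals of the finite analogues of (D2) and (D3), and the objective (D1).
d2 = [sum(q[z][a] for a in A)
      - sum(p[a][x][z] * q[x][a] for x in S for a in A) for z in S]
d3 = sum(q[x][a] for x in S for a in A)
dual_value = sum(L[x][a] * q[x][a] for x in S for a in A)

# A primal feasible pair: pick any v, then the largest lam allowed by (P1).
v = [0.3, -1.0, 2.0]
lam = min(L[x][a] - v[x] + sum(p[a][x][y] * v[y] for y in S)
          for x in S for a in A)
```

Any other choice of `v` yields another primal feasible pair in the same way, and the inequality between `lam` and `dual_value` should persist.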

Our main result on the duality of (P) and (D) is contained in the following two theorems. The basic techniques in the proofs of these theorems are essentially the same as those which have been developed and used in the study of the policy iteration technique and the linear programming algorithm for solving finite state, finite action Markovian decision problems with average cost criteria (see, for example, [7]).

THEOREM 2. Under (A1)-(A3), if the primal problem (P) has an optimal solution $(\lambda^*, v^*(\cdot))$ and $v^*(\cdot)$ is bounded, then the dual (D) also has an optimal solution and the optimal values of the two objective functions are equal.

Proof. Since, for each $x$, $p(\cdot \mid x, \cdot)$ is continuous on the compact set $S \times U_x$, $p(\cdot \mid x, \cdot)$ is bounded. Hence, with the boundedness of $v^*(\cdot)$, we have the continuity of $\int_S v^*(y)\, p(y \mid x, a)\, dy$ in $a$ by Lebesgue's bounded convergence theorem. Let us define a function $k(x, a)$ by

$$k(x, a) = L(x, a) + \int_S v^*(y)\, p(y \mid x, a)\, dy, \qquad x \in S,\ a \in U_x.$$

By (A1)-(A2) and the fact just mentioned, $k(x, a)$ is Borel measurable in $x$ and continuous in $a$. Hence by Lemma 3 there exists a Borel measurable function $\phi^*(x)$ such that

$$\min_{a \in U_x} k(x, a) = k(x, \phi^*(x)), \qquad x \in S,$$

and by the optimality of $(\lambda^*, v^*(\cdot))$,

$$\lambda^* + v^*(x) = L(x, \phi^*(x)) + \int_S v^*(y)\, p(y \mid x, \phi^*(x))\, dy, \qquad \text{a.e.}$$

Consider the Markov process whose transition probability $P(x, \cdot)$ is given by

$$P(x, A) = \int_A p(y \mid x, \phi^*(x))\, dy, \qquad A \in \mathcal{B}_S.$$

Define a probability measure $\varphi^*(x, \cdot)$ on $(U, \mathcal{B}_U)$ for each $x \in S$ so that

$$\varphi^*(x, A) = \begin{cases} 1, & \phi^*(x) \in A, \\ 0, & \text{otherwise}. \end{cases}$$

Then $\varphi^* \in \Phi$ and the corresponding Markov process has the unique invariant probability measure $\mu_{\varphi^*}$, which has a density function $p_{\varphi^*}(\cdot)$ by Lemma 1. By the boundedness of $v^*(\cdot)$ and Remark 3,

$$\lambda^* = \int_S L(x, \phi^*(x))\, p_{\varphi^*}(x)\, dx.$$

Now define a measure $q^*(x, \cdot)$ for each $x \in S$ by

$$q^*(x, \cdot) = \varphi^*(x, \cdot)\, p_{\varphi^*}(x), \qquad x \in S;$$

then

$$\lambda^* = \int_S \int_U L(x, a)\, q^*(x, da)\, dx. \tag{10}$$

Obviously

$$\int_S \int_U q^*(x, da)\, dx = \int_S p_{\varphi^*}(x)\, dx = 1.$$

Since $p_{\varphi^*}(\cdot)$ is an invariant probability density function of the Markov process whose transition probability density is given by

$$p_{\varphi^*}(y \mid x) = \int_U p(y \mid x, a)\, \varphi^*(x, da) = p(y \mid x, \phi^*(x)),$$

we have

$$p_{\varphi^*}(y) = \int_S p(y \mid x, \phi^*(x))\, p_{\varphi^*}(x)\, dx, \qquad \text{a.e.},$$

and this is equivalent to

$$\int_U q^*(z, da) = \int_S \int_U p(z \mid x, a)\, q^*(x, da)\, dx, \qquad \text{a.e.}$$

Thus we have shown that $q^*(x, \cdot)$ is a feasible solution of (D), and by Theorem 1 and (10), $q^*(x, \cdot)$ is an optimal solution of (D) with the objective value equal to $\lambda^*$. Q.E.D.
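The construction in the proof (minimizing selector $\phi^*$, invariant density, dual solution $q^*$) can be traced on a finite model. In the sketch below the data `p` and `L` are hypothetical, and the primal solution is obtained by relative value iteration, a numerical method swapped in for illustration; the paper itself assumes that an optimal $(\lambda^*, v^*)$ is given.

```python
# Finite sketch of Theorem 2: from a primal solution (lam_star, v) build the
# dual solution q_star and compare the two objective values.

S, A = range(3), range(2)
p = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    [[0.2, 0.2, 0.6], [0.5, 0.25, 0.25], [0.1, 0.7, 0.2]],
]
L = [[2.0, 0.5], [1.0, 3.0], [4.0, 1.5]]

def k(x, a, v):
    """k(x, a) = L(x, a) + sum_y p(y | x, a) v(y), as in the proof."""
    return L[x][a] + sum(p[a][x][y] * v[y] for y in S)

def bellman(v):
    """Right side of (2): minimum of k(x, a) over the actions a."""
    return [min(k(x, a, v) for a in A) for x in S]

# Relative value iteration (assumed numerical method, not from the paper).
v = [0.0, 0.0, 0.0]
for _ in range(500):
    Tv = bellman(v)
    v = [t - Tv[0] for t in Tv]        # normalize so that v[0] = 0
lam_star = bellman(v)[0] - v[0]

# Minimizing selector phi*(x), the finite counterpart of Lemma 3.
policy = [min(A, key=lambda a, x=x: k(x, a, v)) for x in S]

# Invariant distribution of the chain driven by phi*.
P_star = [p[policy[x]][x] for x in S]
dens = [1.0 / 3] * 3
for _ in range(500):
    dens = [sum(dens[x] * P_star[x][y] for x in S) for y in S]

# q*(x, a) puts the mass p*(x) on the action phi*(x).
q_star = [[dens[x] if a == policy[x] else 0.0 for a in A] for x in S]
dual_value = sum(L[x][a] * q_star[x][a] for x in S for a in A)

# Smallest slack in (P1); it is nonnegative up to rounding when (lam_star, v) is feasible.
primal_slack = min(k(x, a, v) - lam_star - v[x] for x in S for a in A)
```

Agreement of `lam_star` and `dual_value` is the finite counterpart of (10).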

Remark 6. Continuity of $p(y \mid x, a)$ in $y$ was required only for the boundedness of $p(\cdot \mid x, \cdot)$, as is shown in the proof. The continuity of $p(y \mid x, \cdot)$ is, however, indispensable.

The dual version of Theorem 2 holds under more restrictive assumptions. We assume (A4):

(A4) The assumption (A1) holds for $C = S$.

Note that, under this assumption, the Markov process corresponding to any $\varphi \in \Phi$ has an invariant probability density $p_\varphi(\cdot)$ which is everywhere positive. In fact, $p_\varphi(x) \ge \delta$.

THEOREM 3. Assume (A1)-(A4). If the dual problem (D) has an optimal solution $q^*(x, \cdot)$, then the primal (P) also has an optimal solution, and the optimal values of the two objective functions are equal.

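Proof. Let $q^*(x, \cdot)$ be an optimal solution of (D). Define

$$p^*(x) = \int_U q^*(x, da), \qquad x \in S,$$

and

$$\varphi^*(x, \cdot) = \begin{cases} q^*(x, \cdot) / p^*(x), & \text{if } p^*(x) \ne 0, \\ \text{an arbitrary fixed probability measure on } (U, \mathcal{B}_U), & \text{if } p^*(x) = 0. \end{cases}$$

Then $\varphi^* \in \Phi$. Let $\mu^*$ be the invariant probability measure of the process $X_n$ corresponding to $\varphi^*$ and define

$$L_{\varphi^*}(x) = \int_U L(x, a)\, \varphi^*(x, da), \qquad \lambda^* = \int_S L_{\varphi^*}(x)\, \mu^*(dx), \qquad v^*(x) = \sum_{n=0}^{\infty} E_x \big( L_{\varphi^*}(X_n) - \lambda^* \big).$$

Then by Lemma 2,

$$\lambda^* + v^*(x) = L_{\varphi^*}(x) + \int_S v^*(y)\, p_{\varphi^*}(y \mid x)\, dy, \qquad p_{\varphi^*}(y \mid x) = \int_U p(y \mid x, a)\, \varphi^*(x, da).$$

Borrowing a technique of Derman [8], we shall show that $\lambda^*$ and $v^*(x)$ form an optimal solution of (P). Suppose that $\lambda^*$ and $v^*(x)$ are not optimal. By Lemma 3 there exists a measurable function $\hat\phi(x): S \to U$ such that

$$L(x, \hat\phi(x)) + \int_S v^*(y)\, p(y \mid x, \hat\phi(x))\, dy = \min_{a \in U_x} \Big[ L(x, a) + \int_S v^*(y)\, p(y \mid x, a)\, dy \Big].$$

Let

$$d(x) = \lambda^* + v^*(x) - L(x, \hat\phi(x)) - \int_S v^*(y)\, p(y \mid x, \hat\phi(x))\, dy.$$

Then by the assumption that $\lambda^*$ and $v^*(x)$ are not optimal, the Lebesgue measure of the set $\{x;\ d(x) > 0\}$ is positive. Let $\hat p^{(n)}(y \mid x)$ be the $n$-step transition probability density of the process $X_n$ corresponding to the control $\hat\phi(x)$ (nonrandomized control). Then

$$\int_S d(x)\, \hat p^{(n)}(x \mid z)\, dx = \lambda^* + \int_S v^*(x)\, \hat p^{(n)}(x \mid z)\, dx - \int_S L(x, \hat\phi(x))\, \hat p^{(n)}(x \mid z)\, dx - \int_S v^*(y)\, \hat p^{(n+1)}(y \mid z)\, dy.$$

Since $\hat p^{(n)}(x \mid z) \to \hat p(x)$ as $n \to \infty$ uniformly in $x$ by Lemma 1, where $\hat p(x)$ is the invariant probability density of the process $X_n$ corresponding to $\hat\phi(x)$, we have

$$\int_S d(x)\, \hat p(x)\, dx = \lambda^* - \hat\lambda,$$

where

$$\hat\lambda = \int_S L(x, \hat\phi(x))\, \hat p(x)\, dx.$$

Since $\hat p(x) > 0$ everywhere by the assumption (A4) and the Lebesgue measure of the set $\{x;\ d(x) > 0\}$ is positive, $\lambda^* - \hat\lambda > 0$. But, as we have already shown in the proof of Theorem 2, there exists a feasible solution of (D) for which the value of the objective function is equal to $\hat\lambda$. Hence the inequality $\lambda^* > \hat\lambda$ contradicts the optimality of $\lambda^*$ for (D). Q.E.D.

Remark 7. Through the proofs of Theorems 1 and 2, it was shown that the dual (D) is equivalent to the problem of finding $\inf_{\varphi \in \Phi} Q_\varphi$.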
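Remark 7 can be illustrated on a finite model by comparing average costs over policies. The sketch below is only suggestive: the data `p` and `L` are hypothetical, the policy class is sampled rather than exhausted, and the comparison relies on the known fact for finite models of this kind that the infimum over randomized stationary policies is attained by a nonrandomized one.

```python
# Finite sketch of Remark 7: the dual value is the infimum of Q_phi over policies.
import itertools
import random

S, A = range(3), range(2)
p = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    [[0.2, 0.2, 0.6], [0.5, 0.25, 0.25], [0.1, 0.7, 0.2]],
]
L = [[2.0, 0.5], [1.0, 3.0], [4.0, 1.5]]

def average_cost(phi):
    """Q_phi computed as sum over x, a of L(x, a) phi(x, a) p_phi(x)."""
    P_phi = [[sum(phi[x][a] * p[a][x][y] for a in A) for y in S] for x in S]
    dens = [1.0 / 3] * 3
    for _ in range(500):
        dens = [sum(dens[x] * P_phi[x][y] for x in S) for y in S]
    return sum(dens[x] * phi[x][a] * L[x][a] for x in S for a in A)

# All nonrandomized policies, written as degenerate randomized ones.
deterministic = [
    [[1.0 if a == choice[x] else 0.0 for a in A] for x in S]
    for choice in itertools.product(A, repeat=len(S))
]
best = min(average_cost(phi) for phi in deterministic)

# A sample of randomized policies (fixed seed, for reproducibility only).
rng = random.Random(0)
randomized_costs = []
for _ in range(50):
    phi = []
    for x in S:
        t = rng.random()
        phi.append([t, 1.0 - t])
    randomized_costs.append(average_cost(phi))
```

None of the sampled randomized policies should beat `best`, which is the value that a dual optimal solution attains in this finite setting.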

4. EXISTENCE OF OPTIMAL SOLUTIONS FOR THE PRIMAL (P)

As an application of the result obtained in the preceding section, we shall show the existence of optimal solutions for the primal problem (P) under the assumption (A5), which will be stated below. As was stated in the introduction, if there exist a $\lambda$ and a bounded measurable function $v(x)$ which satisfy (2), then we can show that the measurable function $\phi(x)$ which minimizes the right side of (2) is an optimal control in the admissible class of controls $\Phi^*$, where $\Phi^*$ is the set of all randomized policies which are functions of the entire past history of the system, and hence $\Phi^*$ is bigger than $\Phi$. The proof of this fact can be found in [2] for the case where $U = U_x$ for all $x \in S$ and $U$ is finite, and the extension to our case is straightforward. Let us assume the following (A5):

(A5) For all $x \in S$, $U = U_x$ and $U$ is finite. There exist a $\delta > 0$ and an $M > 0$ such that

$$\delta \le p(y \mid x, a) \le M, \qquad x, y \in S,\ a \in U.$$

Under (A5), the dual (D) takes the form:

(D)': Minimize

$$\int_S \sum_{a \in U} L(x, a)\, q(x, a)\, dx$$

subject to

$$\sum_{a \in U} q(z, a) - \int_S \sum_{a \in U} p(z \mid x, a)\, q(x, a)\, dx = 0,$$

$$\int_S \sum_{a \in U} q(x, a)\, dx = 1, \qquad q(x, a) \ge 0.$$

Note that in this case $\Phi$ is the set of all functions $\varphi(x, a): S \times U \to R^1$ such that $\varphi(x, a) \ge 0$, $\sum_{a \in U} \varphi(x, a) = 1$, and $\varphi(x, a)$ is Borel measurable for each $a$. Now we have

THEOREM 4. Under (A5) there exists an optimal solution $(\lambda^*, v^*(\cdot))$ for the primal (P), where $v^*(\cdot)$ is bounded.

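Proof. From Theorem 3, it is sufficient to show that the dual (D)' has an optimal solution. From Remark 7, it is clear that if there exists a feasible solution $q^*(x, a)$ for (D)' such that

$$\int_S \sum_{a \in U} L(x, a)\, q^*(x, a)\, dx = \inf_{\varphi \in \Phi} Q_\varphi, \tag{11}$$

then $q^*(x, a)$ is an optimal solution for (D)'. We shall show the existence of such a feasible solution $q^*(x, a)$. Let $\{\varphi_n\}$ be a minimizing sequence, i.e.,

$$\lim_{n \to \infty} Q_{\varphi_n} = \inf_{\varphi \in \Phi} Q_\varphi.$$

Let $p_{\varphi_n}(\cdot)$ be the invariant probability density of the Markov process corresponding to the control $\varphi_n \in \Phi$. Then it was shown in the proof of Theorem 1 that $q_n(x, a) = \varphi_n(x, a)\, p_{\varphi_n}(x)$ is a feasible solution of (D)', i.e.,

$$\sum_{a \in U} q_n(z, a) - \int_S \sum_{a \in U} p(z \mid x, a)\, q_n(x, a)\, dx = 0, \tag{12}$$

$$\int_S \sum_{a \in U} q_n(x, a)\, dx = 1, \qquad q_n(x, a) \ge 0, \tag{13}$$

and by (8)

$$Q_{\varphi_n} = \int_S \sum_{a \in U} L(x, a)\, q_n(x, a)\, dx.$$

By (A5) and Lemma 1, we have $\delta \le p_{\varphi_n}(x) \le M$, so that $0 \le q_n(x, a) \le M$ and the sequence $\{q_n(\cdot, a)\}$ is bounded in $L_\infty(S)$. Hence there exist a subsequence, denoted again by $\{q_n\}$, and functions $q^*(\cdot, a) \in L_\infty(S)$ such that for every $f \in L_1(S)$,

$$\int_S f(x)\, q_n(x, a)\, dx \to \int_S f(x)\, q^*(x, a)\, dx, \qquad a \in U,$$

as $n \to \infty$. Here $L_\infty(S)$ is the set of all bounded measurable functions on $S$, and $L_1(S)$ is the set of all measurable and integrable functions on $S$. We prove that $q^*(x, a)$ is a feasible solution of (D)' and that (11) is satisfied. From (12), for any $A \in \mathcal{B}_S$,

$$\int_A \sum_{a \in U} q_n(z, a)\, dz = \int_A \Big[ \int_S \sum_{a \in U} p(z \mid x, a)\, q_n(x, a)\, dx \Big] dz.$$

Since $p(z \mid \cdot, a) \in L_1(S)$ for any $z \in S$ and $a \in U$,

$$\int_S p(z \mid x, a)\, q_n(x, a)\, dx \to \int_S p(z \mid x, a)\, q^*(x, a)\, dx,$$

and we have

$$0 \le \int_S p(z \mid x, a)\, q_n(x, a)\, dx \le M.$$

Thus by Lebesgue's bounded convergence theorem,

$$\int_A \sum_{a \in U} q^*(z, a)\, dz = \int_A \Big[ \int_S \sum_{a \in U} p(z \mid x, a)\, q^*(x, a)\, dx \Big] dz.$$

Since this holds for all $A \in \mathcal{B}_S$, we have

$$\sum_{a \in U} q^*(z, a) = \int_S \sum_{a \in U} p(z \mid x, a)\, q^*(x, a)\, dx, \qquad \text{a.e.}$$

Similarly from (13),

$$\int_S \sum_{a \in U} q^*(x, a)\, dx = 1 \qquad \text{and} \qquad q^*(x, a) \ge 0.$$

Thus we have established the feasibility of $q^*(x, a)$. Since

$$\int_S \sum_{a \in U} L(x, a)\, q^*(x, a)\, dx = \lim_{n \to \infty} \int_S \sum_{a \in U} L(x, a)\, q_n(x, a)\, dx = \lim_{n \to \infty} Q_{\varphi_n} = \inf_{\varphi \in \Phi} Q_\varphi,$$

(11) holds.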

Q.E.D.

REFERENCES

1. R. C. GRINOLD, Continuous programming part one: linear objectives, J. Math. Anal. Appl. 28 (1969), 32-51.
2. S. M. ROSS, Arbitrary state Markovian decision processes, Ann. Math. Statist. 39 (1968), 2118-2122.
3. J. L. DOOB, "Stochastic Processes," Wiley, New York, 1953.
4. C. DERMAN AND A. F. VEINOTT, JR., A solution to a countable system of equations arising in Markovian decision processes, Ann. Math. Statist. 38 (1967), 582-584.
5. J. WARGA, "Optimal Control of Differential and Functional Equations," Academic Press, New York, 1972.
6. H. J. KUSHNER, "Introduction to Stochastic Control," Holt, Rinehart and Winston, Inc., New York, 1971.
7. H. MINE AND S. OSAKI, "Markovian Decision Processes," American Elsevier, New York, 1970.
8. C. DERMAN, Denumerable state Markovian decision processes-average cost criterion, Ann. Math. Statist. 37, 1545-1554.