Control of Dynamic Systems under the Influence of Singularly Perturbed Markov Chains


Journal of Mathematical Analysis and Applications 216, 343-367 (1997), Article No. AY975770



G. Yin,* Department of Mathematics, Wayne State University, Detroit, Michigan 48202

and Q. Zhang,† Department of Mathematics, University of Georgia, Athens, Georgia 30602. Submitted by E. Stanley Lee. Received April 22, 1996.

This work is concerned with nearly optimal controls of nonlinear dynamic systems under the influence of singularly perturbed Markov chains. The underlying Markov chains have fast and slow components, and their states can be divided into a number of groups. Within each group of states the chain varies at a fast pace, whereas jumps from one group to another occur relatively infrequently. To obtain the desired optimality, the states of the chain are naturally aggregated in accordance with the transition rates; i.e., the states in a group are replaced by a single state to obtain an averaged system. The averaged system is then used as a reference to develop the nearly optimal control for the actual system via comparison control methods. The technique used is the method of weak convergence together with the relaxed control representation. © 1997 Academic Press

Key Words: stochastic dynamic system; nearly optimal control; singularly perturbed Markov chain; weak and strong interaction.

1. INTRODUCTION

This work is concerned with controls of dynamic systems involving nonstationary Markov chains. Our main goal is to develop a general framework for obtaining nearly optimal controls of stochastic systems

* Supported in part by the National Science Foundation under Grant DMS-9529738.
† Supported in part by the Office of Naval Research Grant N00014-96-1-0263 and in part by the University of Georgia Faculty Research Grant.


driven by singularly perturbed Markov chains. By treating nonlinear systems and using weak convergence methods, we aim to obtain an asymptotically optimal strategy.

The theory of Markov chains is a well studied subject with an enormous literature. The classical book of Chung [2] contains many important properties of continuous time Markov chains. Owing to the pervasive practical use of Markovian formulations in manufacturing systems, queueing networks, and system reliability, there has recently been a resurgence of interest in studying further properties of Markov chains. The classical work (e.g., [2] among others) mainly concentrates on Markov chains with stationary transition properties, whereas recent applications make extensive use of nonstationary Markov chains and of such chains with singular perturbations. Various applications have posed challenging problems in understanding the basic properties of such chains. Along another line, the technique of singular perturbation has an extensive literature. The associated modeling, optimization, and control problems have been studied thoroughly and documented in many excellent books, for instance, the work of Bensoussan [1] and of Kokotovic, Khalil, and O'Reilly [11], among others. Nevertheless, the fact that the generator of any Markov chain has a zero eigenvalue precludes the direct citation of standard references on singular perturbation theory. A few authors have considered singularly perturbed Markov chains and related applications; see [4, 7, 16], etc. In a recent paper [9], Khasminskii, Yin, and Zhang analyzed singularly perturbed Markov chains by a combined approach of matched asymptotic expansion techniques and Markov properties. The combined approach enabled them to overcome the inherent difficulty and obtain the desired asymptotic expansion. They further generalized their results to a much more difficult situation in which the singularly perturbed chains are under weak and strong interactions [10].
This line of investigation has been continued and substantially extended to include asymptotic normality, exponential error bounds, Markov chains with countable state spaces, and the associated semigroups (see Zhang and Yin [22] and Yin and Zhang [19]). Such a study sets up the foundation for an in-depth investigation of various asymptotic properties of stochastic dynamic systems under the influence of singularly perturbed Markov chains. The current paper further investigates the related control problems of dynamic systems and aims at asymptotic optimality. In our formulation, the dynamic system under consideration involves an unknown parameter process, which naturally arises in various applications in adaptive control, estimation, and manufacturing systems. Since the Markov chains are changing at a fast speed, the usual approach of adaptive control will no longer allow one to track the system, so one needs to seek viable alternatives. Previously, nearly optimal control for systems with an unknown parameter process was considered in [18]. In that reference, the system is driven by a stationary wide band noise and the limit is a diffusion. The current work differs from [18] in that we treat jump systems here; furthermore, these random processes are nonstationary. Our approach is mainly probabilistic. It is an application of the weak convergence methods of Ethier and Kurtz [5] and Kushner [12, 13], and of the asymptotic properties of singularly perturbed Markov chains of Yin and Zhang [19].

The rest of the paper is arranged as follows. Section 2 briefly recalls a number of useful results for singularly perturbed chains with nonstationary transition probabilities and finite state spaces. Section 3 formulates a control problem involving an unknown parameter process, in which both the parameter process and the dynamic system are driven by singularly perturbed Markov chains. Since the setting of the problem is sufficiently general, it is more convenient to use a relaxed control representation; we do so and reformulate the problem under such a variational framework in Section 4. Using weak convergence methods, Section 5 establishes asymptotic near optimality: the first subsection deals with the weak convergence of an auxiliary problem, and the second concerns near optimality by virtue of the chattering lemma. Finally, additional remarks are made in Section 6. To make the presentation clear and to avoid complex notation, we use $K$ to denote a generic positive constant throughout; its value may differ at each appearance, but this should be clear from the context.

2. SINGULARLY PERTURBED CHAINS WITH WEAK AND STRONG INTERACTIONS

This section collects a few preliminary results concerning asymptotic properties of the probability distribution of singularly perturbed Markov chains. The main ideas are outlined; the detailed developments can be found in the cited references. Relevant to the control problem to follow, we concentrate on Markov chains having nonstationary transition probabilities with finite state spaces. For results on singularly perturbed chains having countable state spaces, see [19, Chapter 6].

First let us recall a few basic definitions. Suppose that $\alpha(t)$ is a finite state Markov chain. The formulation of piecewise deterministic Markov chains (see Davis [3]) and the use of martingales are very useful for the study of continuous time Markov chains that are not necessarily stationary.

Generator. Suppose that $\alpha(t)$, $t \ge 0$, is a Markov chain with finite state space $\mathcal{M}$. An $m \times m$ matrix $Q(t) = (q_{ij}(t))$, $t \ge 0$, is an infinitesimal generator (or simply a generator) of $\alpha(\cdot)$ if it satisfies: $q_{ij}(t)$ is Borel measurable for all $i, j \in \mathcal{M}$ and $t \ge 0$; $q_{ij}(t)$ is uniformly bounded; $q_{ij}(t) \ge 0$ for $j \ne i$; $q_{ii}(t) = -\sum_{j \ne i} q_{ij}(t)$, $t \ge 0$; and for every bounded function $f$ defined on $\mathcal{M}$,

$$f(\alpha(t)) - \int_0^t Q(s)f(\cdot)(\alpha(s))\,ds \tag{1}$$

is a martingale, where

$$Q(t)f(\cdot)(i) = \sum_{j \in \mathcal{M}} q_{ij}(t)f(j) = \sum_{j \ne i} q_{ij}(t)\bigl(f(j) - f(i)\bigr).$$

Irreducibility and quasi-stationarity. Let $\nu(t) = (\nu_1(t), \dots, \nu_m(t))$ denote a row vector with nonnegative components for all $t \ge 0$. $\nu(t)$ is said to be a quasi-equilibrium (or quasi-stationary) distribution of $\alpha(t)$ with generator $Q(t)$ if

$$\nu(t)Q(t) = 0 \quad \text{and} \quad \sum_{i=1}^{m} \nu_i(t) = 1. \tag{2}$$

For each $t \ge 0$, the matrix $Q(t)$ is weakly irreducible if (2) has a unique nonnegative solution $\nu(t)$.

For a small parameter $\varepsilon > 0$, let $\alpha^\varepsilon(t)$, $t \ge 0$, be a Markov chain taking values in

$$\mathcal{M} = \{z_{11}, \dots, z_{1m_1}, \dots, z_{l1}, \dots, z_{lm_l}\}. \tag{3}$$
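A quasi-stationary distribution as in (2) can be computed numerically. The sketch below, with a hypothetical 3-state generator held constant in $t$, solves $\nu Q = 0$, $\sum_i \nu_i = 1$ by least squares:

```python
import numpy as np

# Hypothetical 3-state generator Q (rows sum to zero), held fixed in t for
# this illustration; in the paper Q(t) may vary with t.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.5, -3.0,  1.5],
              [ 0.5,  0.5, -1.0]])

# Solve nu Q = 0 subject to sum(nu) = 1 by stacking the normalization
# condition onto the transposed system (cf. equation (2)).
m = Q.shape[0]
A = np.vstack([Q.T, np.ones((1, m))])
b = np.zeros(m + 1)
b[-1] = 1.0
nu, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(nu @ Q, 0.0, atol=1e-10) and np.isclose(nu.sum(), 1.0)
```

Weak irreducibility guarantees the solution is unique and nonnegative, so the least-squares system is well posed.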

Let $m = m_1 + \cdots + m_l$ denote the total number of states in $\mathcal{M}$. Suppose that the generator of the Markov chain $\alpha^\varepsilon(\cdot)$ is

$$Q^\varepsilon(t) = \frac{1}{\varepsilon}A(t) + B(t), \tag{4}$$

where $A(t) = (A_{ij}(t))$ and $B(t) = (B_{ij}(t))$ are $m \times m$ measurable matrix-valued functions, for all $t \in [0, T]$ and some $0 < T < \infty$. We assume that $A(t)$ has the block-diagonal form

$$A(t) = \mathrm{diag}\bigl(A^1(t), A^2(t), \dots, A^l(t)\bigr)$$

and


$$B(t) = \begin{pmatrix} B^{11}(t) & B^{12}(t) & \cdots & B^{1l}(t) \\ B^{21}(t) & B^{22}(t) & \cdots & B^{2l}(t) \\ \vdots & \vdots & & \vdots \\ B^{l1}(t) & B^{l2}(t) & \cdots & B^{ll}(t) \end{pmatrix},$$

where $A^i(t) \in \mathbb{R}^{m_i \times m_i}$, $i = 1, 2, \dots, l$, are themselves generators. Similarly, $B^{ik}(t)$, $i, k = 1, \dots, l$, are submatrices of compatible dimensions. Since the $A^i(t)$, $i = 1, \dots, l$, $A(t)$, and $B(t)$ are generators of Markov chains,

$$\sum_{j=1}^{m_i} A^i_{\iota j}(t) = 0, \quad i = 1, \dots, l, \ \iota = 1, \dots, m_i, \quad \text{and} \quad \sum_{j=1}^{m} B_{ij}(t) = 0, \quad i = 1, \dots, m.$$

It is easily seen that the probability distribution $p^\varepsilon(\cdot)$ of the chain satisfies

$$\dot p^\varepsilon(t) = p^\varepsilon(t)Q^\varepsilon(t), \quad p^\varepsilon(0) = p(0) \ \text{with} \ \sum_i p_i(0) = 1, \ p_i(0) \ge 0,$$

where $p^\varepsilon(t) = (p_1^\varepsilon(t), \dots, p_m^\varepsilon(t))$ and $p_i^\varepsilon(t) = P(\alpha^\varepsilon(t) = i)$.

For each $i = 1, \dots, l$, let $\mathcal{M}_i = \{z_{i1}, \dots, z_{im_i}\}$ represent the group of states corresponding to $A^i(t)$. In [10], under weak irreducibility and smoothness of the generator, it is proved that the solution of the above ODE (equivalently, the probability distribution of the singularly perturbed Markov chain) admits an asymptotic expansion. The expansion involves a regular part and boundary layer corrections; the two parts are highly intertwined. For $\varepsilon > 0$ sufficiently small, the Markov chain $\alpha^\varepsilon(\cdot)$ jumps more frequently within the states in $\mathcal{M}_i$ and less frequently from the "grouped" state $\mathcal{M}_i$ to $\mathcal{M}_k$ with $k \ne i$. Consequently, it makes sense to aggregate the states in $\mathcal{M}_i$ into a single state. Define $\bar\alpha^\varepsilon(\cdot)$ by

$$\bar\alpha^\varepsilon(t) = i \quad \text{if} \ \alpha^\varepsilon(t) \in \mathcal{M}_i. \tag{5}$$

Equivalently, $\bar\alpha^\varepsilon(t) = \sum_{i=1}^{l} i\, I_{\{\alpha^\varepsilon(t) \in \mathcal{M}_i\}}$, which indicates that $\bar\alpha^\varepsilon(t)$ is an "average" of $\alpha^\varepsilon(t)$. To proceed, assume the following condition holds:

(Q) For each $t \in [0, T]$ and each $i = 1, \dots, l$, $A^i(t)$ is weakly irreducible. On $[0, T]$, $A(\cdot)$ is continuously differentiable and its derivative is Lipschitz continuous; $B(\cdot)$ is Lipschitz continuous.
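The two-time-scale structure behind the aggregation (5) can be seen in simulation. The following sketch (hypothetical constant-in-$t$ generators $A$, $B$ with two groups of two states) runs a Gillespie-type simulation of a chain with generator $Q^\varepsilon = A/\varepsilon + B$ and counts how rarely the aggregated process $\bar\alpha^\varepsilon$ switches groups compared with the total number of jumps:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01

# A is block-diagonal (fast within-group dynamics); B couples the groups
# (slow jumps). Both are generators; the values are hypothetical.
A = np.array([[-1.,  1.,  0.,  0.],
              [ 2., -2.,  0.,  0.],
              [ 0.,  0., -3.,  3.],
              [ 0.,  0.,  1., -1.]])
B = np.array([[-.5,  0.,  .5,  0.],
              [ 0., -.5,  0.,  .5],
              [ .4,  0., -.4,  0.],
              [ 0.,  .3,  0., -.3]])
Q = A / eps + B
group = np.array([0, 0, 1, 1])   # state -> group index, as in (5)

def simulate(Q, x0, T):
    """Gillespie simulation of a time-homogeneous chain with generator Q."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        rate = -Q[x, x]                       # total jump intensity in state x
        t += rng.exponential(1.0 / rate)
        if t >= T:
            break
        p = Q[x].clip(min=0.0)
        p[x] = 0.0
        x = rng.choice(len(p), p=p / p.sum())  # jump to a new state
        path.append((t, x))
    return path

path = simulate(Q, 0, T=1.0)
jumps = len(path) - 1
switches = sum(group[a[1]] != group[b[1]] for a, b in zip(path, path[1:]))
print(jumps, switches)   # within-group jumps dominate group switches
```

With $\varepsilon = 0.01$ the chain makes hundreds of jumps on $[0, 1]$ while $\bar\alpha^\varepsilon$ changes only a handful of times, which is exactly the regime the aggregation exploits.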


The condition above is referred to as Condition (Q) in what follows. Our subsequent study will make crucial use of the lemma below.

LEMMA 1. Assume Condition (Q). Then the following assertions hold:

(i) For any $i = 1, \dots, l$, $j = 1, \dots, m_i$,

$$\sup_{0 < t \le T} E\left(\int_0^t \bigl(I_{\{\alpha^\varepsilon(s) = z_{ij}\}} - \nu_j^i(s)\, I_{\{\bar\alpha^\varepsilon(s) = i\}}\bigr)\, ds\right)^2 = O(\varepsilon), \tag{6}$$

where $\nu_j^i(t)$ denotes the $j$-th component of the quasi-stationary distribution $\nu^i(t)$ (the quasi-stationary distribution corresponding to $A^i(t)$).

(ii) $\bar\alpha^\varepsilon(\cdot)$ converges weakly to a Markov chain $\bar\alpha(\cdot) \in \{1, \dots, l\}$ on $D[0, T]$, as $\varepsilon \to 0$; the generator of $\bar\alpha(\cdot)$ is given by

$$\bar Q(t) = \mathrm{diag}\bigl(\nu^1(t), \dots, \nu^l(t)\bigr)\, B(t)\, \mathrm{diag}\bigl(\mathbb{1}_{m_1}, \dots, \mathbb{1}_{m_l}\bigr), \tag{7}$$

where $\mathbb{1}_{m_i} = (1, \dots, 1)' \in \mathbb{R}^{m_i}$ ($m_i$ entries of 1), the symbol $'$ denotes the transpose of a matrix or a vector, and $\mathrm{diag}(\cdot)$ denotes a block diagonal matrix of appropriate dimension with the indicated diagonal elements.

(iii) For each $t \in [0, T]$, $i = 1, \dots, l$,

$$P(\bar\alpha^\varepsilon(t) = i) \to P(\bar\alpha(t) = i) \quad \text{as} \ \varepsilon \to 0. \tag{8}$$

Proof. We only outline the main ideas; the detailed proof can be found in Yin and Zhang [19, Theorems 7.2 and 7.3]. To prove (i), define

$$O^\varepsilon(t) = E\left(\int_0^t \bigl(I_{\{\alpha^\varepsilon(s) = z_{ij}\}} - \nu_j^i(s)\, I_{\{\bar\alpha^\varepsilon(s) = i\}}\bigr)\, ds\right)^2.$$

Then

$$\dot O^\varepsilon(t) = 2\int_0^t \bigl[C_1^\varepsilon(t, s) + C_2^\varepsilon(t, s)\bigr]\, ds, \quad O^\varepsilon(0) = 0,$$

where

$$C_1^\varepsilon(t, s) = P(\alpha^\varepsilon(t) = z_{ij}, \alpha^\varepsilon(s) = z_{ij}) - \nu_j^i(t)\, P(\alpha^\varepsilon(t) \in \mathcal{M}_i, \alpha^\varepsilon(s) = z_{ij}),$$

$$C_2^\varepsilon(t, s) = -\nu_j^i(s)\, P(\alpha^\varepsilon(t) = z_{ij}, \alpha^\varepsilon(s) \in \mathcal{M}_i) + \nu_j^i(s)\nu_j^i(t)\, P(\alpha^\varepsilon(t) \in \mathcal{M}_i, \alpha^\varepsilon(s) \in \mathcal{M}_i).$$

Using asymptotic expansions similar to those of [9, 10],

$$\dot O^\varepsilon(t) = 2\int_0^t O\bigl(\varepsilon + \exp(-\gamma(t - s)/\varepsilon)\bigr)\, ds = O(\varepsilon).$$

Hence (i) holds. To prove (ii), notice that $\bar\alpha^\varepsilon(\cdot)$ is bounded. The tightness of $\bar\alpha^\varepsilon(\cdot)$ is then proved via Kurtz's criterion [12, Theorem 3, p. 47]. Detailed calculations reveal that the finite dimensional distributions of $\bar\alpha^\varepsilon(\cdot)$ also converge. Finally, (8) holds by virtue of the convergence of the finite dimensional distributions of $\bar\alpha^\varepsilon(\cdot)$. The lemma is concluded.

Remark. The following points are worth noting. The process $\alpha^\varepsilon(\cdot)$ generally does not converge in distribution because it fluctuates very rapidly within a group of states. Due to the presence of interactions between the groups of states, its limit transition probabilities depend on the initial state and distribution (see [19]). Although the process $\alpha^\varepsilon(\cdot)$ may evolve irregularly, its aggregation $\bar\alpha^\varepsilon(\cdot)$ displays a definite limit property.
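Assertion (8) can also be checked numerically: for small $\varepsilon$, the group probabilities obtained from the forward equation $\dot p^\varepsilon = p^\varepsilon Q^\varepsilon$ stay close to the distribution of the limit chain generated by $\bar Q$. A sketch with the same hypothetical $A$, $B$ as in the earlier sketches (the hard-coded $\bar Q$ is the one produced by (7) for those blocks):

```python
import numpy as np

A = np.array([[-1., 1., 0., 0.], [2., -2., 0., 0.],
              [0., 0., -3., 3.], [0., 0., 1., -1.]])
B = np.array([[-.5, 0., .5, 0.], [0., -.5, 0., .5],
              [.4, 0., -.4, 0.], [0., .3, 0., -.3]])
Qbar = np.array([[-0.5, 0.5], [0.325, -0.325]])  # from (7) for these blocks

def forward(Q, p0, T, h):
    """Explicit Euler integration of pdot = p Q (h chosen small enough for
    the stiff 1/eps rates)."""
    p = p0.copy()
    for _ in range(int(T / h)):
        p = p + h * (p @ Q)
    return p

eps = 1e-3
p_eps = forward(A / eps + B, np.array([1., 0., 0., 0.]), T=1.0, h=1e-5)
p_bar = forward(Qbar, np.array([1., 0.]), T=1.0, h=1e-4)

# Aggregate p^eps over the two groups and compare with the limit chain.
groups = np.array([p_eps[:2].sum(), p_eps[2:].sum()])
print(np.abs(groups - p_bar).max())   # small for small eps
```

The discrepancy is of the order $\varepsilon$ plus a boundary-layer term that has long since decayed at $t = 1$.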

3. CONTROLLED DYNAMIC SYSTEMS

Consider a stochastic dynamical system with state $x^\varepsilon(t) \in \mathbb{R}^r$, parameter process $\theta^\varepsilon(t) \in \mathbb{R}^{r_1}$, and control $u(t) \in U \subset \mathbb{R}^{r_2}$. Suppose that $\alpha^\varepsilon(t)$ and $\beta^\varepsilon(t)$ are independent Markov chains with finite state spaces $\mathcal{M}_\alpha$ and $\mathcal{M}_\beta$, and generators $Q_\alpha^\varepsilon(t)$ and $Q_\beta^\varepsilon(t)$, respectively, and that $\alpha^\varepsilon(0) = \alpha$ and $\beta^\varepsilon(0) = \beta$. Let

$$f(\cdot, \cdot, \cdot, \cdot): \mathbb{R}^r \times \mathbb{R}^{r_1} \times \mathbb{R}^{r_2} \times \mathcal{M}_\alpha \mapsto \mathbb{R}^r, \quad g(\cdot, \cdot): \mathbb{R}^{r_1} \times \mathcal{M}_\beta \mapsto \mathbb{R}^{r_1}, \quad p(\cdot, \cdot, \cdot): \mathbb{R}^r \times \mathbb{R}^{r_1} \times \mathbb{R}^{r_2} \mapsto \mathbb{R}.$$

Assume the state process $x^\varepsilon(\cdot)$ is observable. The problem of interest is:

$$\text{minimize} \quad J^\varepsilon(u(\cdot)) = E\int_0^T p\bigl(x^\varepsilon(t), \theta^\varepsilon(t), u(t)\bigr)\, dt$$

subject to

$$\dot x^\varepsilon(t) = f\bigl(x^\varepsilon(t), \theta^\varepsilon(t), u(t), \alpha^\varepsilon(t)\bigr), \quad x^\varepsilon(0) = x, \tag{9}$$

with the parameter process

$$\dot\theta^\varepsilon(t) = g\bigl(\theta^\varepsilon(t), \beta^\varepsilon(t)\bigr), \quad \theta^\varepsilon(0) = \theta_0^\varepsilon.$$

The functions $f(\cdot)$ and $g(\cdot)$ represent the dynamics of the system and of the parameter process, respectively; $p(x, \theta, u)$ is the cost function. Consider the case that neither $\alpha^\varepsilon(\cdot)$ nor $\beta^\varepsilon(\cdot)$ is observable. Instead, we assume their aggregated processes $\bar\alpha^\varepsilon(\cdot)$ and $\bar\beta^\varepsilon(\cdot)$, defined as in (5), are observable. The process $\theta^\varepsilon(\cdot)$ is an unknown process with unknown initial value $\theta_0^\varepsilon$ such that $\theta_0^\varepsilon$


converges weakly (equivalently, converges in probability) to $\theta$, a known constant vector. Our objective is to find a control process $u(\cdot)$, as a function of $x^\varepsilon(\cdot)$, $\bar\alpha^\varepsilon(\cdot)$, and $\bar\beta^\varepsilon(\cdot)$, that minimizes the cost $J^\varepsilon(u(\cdot))$.

Owing to the presence of the unknown parameter and of the processes $\alpha^\varepsilon(\cdot)$, $\beta^\varepsilon(\cdot)$, and $\theta^\varepsilon(\cdot)$, the problem is very difficult to handle, both theoretically and computationally, especially when the state spaces of the chains are large. Thanks to the expansion of the probability distribution of the Markov chains, aggregation provides a viable alternative for solving the underlying problem. The essence of our approach is to take advantage of the weak and strong interactions of the Markov chains by using the aggregated processes $\bar\alpha^\varepsilon(\cdot)$ and $\bar\beta^\varepsilon(\cdot)$ and their limits as $\varepsilon \to 0$, and to apply the weak convergence methods. In lieu of the actual systems, we consider their limits and apply the optimal or nearly optimal control of the limit problem to the original problem. The goal is to show that such a procedure leads to near optimality for the actual systems by means of comparison control techniques, inspired by the work of Kushner [13].

3.1. Examples

Before proceeding to the subsequent study, we give several examples. The first is an adaptive estimation problem, the second concentrates on quadratic control of linear regulators, and the last arises from certain manufacturing systems.

EXAMPLE 1. Let $\mathcal{M} = \{1, \dots, m\}$. Suppose that $f(\cdot): \mathbb{R}^r \times \mathbb{R}^r \times \mathcal{M} \mapsto \mathbb{R}^r$ is a continuous function, and $\alpha^\varepsilon(\cdot)$ and $\beta^\varepsilon(\cdot)$ are unobservable Markov chains generated by $Q_\alpha^\varepsilon$ and $Q_\beta^\varepsilon$, respectively. A continuous time adaptive estimation algorithm takes the form:

$$\dot x^\varepsilon(t) = f\bigl(x^\varepsilon(t), \theta^\varepsilon(t), \alpha^\varepsilon(t)\bigr), \quad x^\varepsilon(0) = x,$$
$$\dot\theta^\varepsilon(t) = g\bigl(\theta^\varepsilon(t), \beta^\varepsilon(t)\bigr), \quad \theta^\varepsilon(0) = \theta_0^\varepsilon.$$

Suppose that $Q_\alpha^\varepsilon(\cdot)$ and $Q_\beta^\varepsilon(\cdot)$ are given by (4) such that $A_\alpha(\cdot)$ and $A_\beta(\cdot)$ are weakly irreducible, and that for both processes the state space is $\mathcal{M} = \{1, \dots, m\}$. By combining the weak convergence method with results on singularly perturbed chains, it can be shown that $(x^\varepsilon(\cdot), \theta^\varepsilon(\cdot))$ converges weakly to $(x(\cdot), \theta(\cdot))$, a solution of

$$\dot x(t) = \sum_{i=1}^{m} f\bigl(x(t), \theta(t), i\bigr)\, \nu_{\alpha, i}(t), \quad x(0) = x,$$
$$\dot\theta(t) = \sum_{i=1}^{m} g\bigl(\theta(t), i\bigr)\, \nu_{\beta, i}(t), \quad \theta(0) = \theta,$$

where $\nu_\alpha(\cdot)$ and $\nu_\beta(\cdot)$ are the quasi-stationary distributions of $\alpha^\varepsilon(\cdot)$ and $\beta^\varepsilon(\cdot)$, respectively.
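The averaged system of Example 1 is an ordinary ODE and can be integrated directly. A sketch with hypothetical scalar $f$, $g$, $m = 2$ (states indexed 0 and 1 here, rather than 1 to $m$), and time-invariant quasi-stationary distributions:

```python
import numpy as np

# Hypothetical averaged system in the style of Example 1; f and g are chosen
# only to make the averaging concrete, not taken from the paper.
def f(x, theta, i):
    return -x + theta + 0.1 * i

def g(theta, i):
    return -0.5 * theta + 0.2 * i

nu_alpha = np.array([0.6, 0.4])   # quasi-stationary distributions, held
nu_beta  = np.array([0.3, 0.7])   # constant in t for this illustration

def euler(x0, theta0, T=5.0, h=1e-3):
    """Euler integration of the averaged ODEs for (x, theta)."""
    x, theta = x0, theta0
    for _ in range(int(T / h)):
        dx  = sum(f(x, theta, i) * nu_alpha[i] for i in range(2))
        dth = sum(g(theta, i) * nu_beta[i] for i in range(2))
        x, theta = x + h * dx, theta + h * dth
    return x, theta

x_T, theta_T = euler(x0=1.0, theta0=0.0)
```

For these hypothetical coefficients the averaged pair relaxes toward the equilibrium $(\theta^* , x^*) = (0.28, 0.32)$ of the averaged drifts, which the integration approaches by $T = 5$.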


This example contains no controls. Similar algorithms are employed in a wide variety of signal processing problems (see Kushner and Yin [15, Chapters 2 and 3] and the references therein).

EXAMPLE 2. Let $\alpha^\varepsilon(\cdot)$ and $\beta^\varepsilon(\cdot)$ be singularly perturbed Markov chains as in the formulation above, and

$$\dot x^\varepsilon(t) = A\bigl(t, \theta^\varepsilon(t), \alpha^\varepsilon(t)\bigr) x^\varepsilon(t) + B\bigl(t, \theta^\varepsilon(t), \alpha^\varepsilon(t)\bigr) u(t), \quad x^\varepsilon(0) = x,$$

where $x^\varepsilon(t)$, $\theta^\varepsilon(t)$, and $u(t) \in \mathbb{R}^r$, and $A(t, \theta, \alpha), B(t, \theta, \alpha) \in \mathbb{R}^{r \times r}$. The objective is to find the optimal control $u(\cdot)$ that minimizes the expected quadratic cost function

$$J^\varepsilon(u(\cdot)) = E\int_0^T \bigl(x^{\varepsilon\prime}(t) F\bigl(t, \alpha^\varepsilon(t)\bigr) x^\varepsilon(t) + u'(t) C\bigl(t, \alpha^\varepsilon(t)\bigr) u(t)\bigr)\, dt,$$

where $F$ and $C$ are symmetric positive definite matrices. The problem differs from the usual linear quadratic setup in that the system matrices involve jump Markov chains and the control space is bounded. Modeling a quadratic regulator with an unknown parameter process stems from robustness considerations. Using the techniques presented in what follows, a nearly optimal control of the system can be obtained.

EXAMPLE 3. This example originates from control and optimization problems arising in manufacturing systems (see Sethi and Zhang [17] among others). Consider a manufacturing system given by

$$\dot x^\varepsilon(t) = f\bigl(x^\varepsilon(t), \theta^\varepsilon(t), \alpha^\varepsilon(t), u(t)\bigr), \quad x^\varepsilon(0) = x,$$
$$\dot\theta^\varepsilon(t) = g\bigl(\theta^\varepsilon(t), \beta^\varepsilon(t)\bigr), \quad \theta^\varepsilon(0) = \theta_0^\varepsilon,$$
$$0 \le u(t) \le \alpha^\varepsilon(t),$$

where $\theta^\varepsilon(\cdot)$ models a certain parameter process, $u(\cdot)$ represents the production rate, $\alpha^\varepsilon(\cdot)$, a finite state Markov chain, models the underlying machine capacity, and $\beta^\varepsilon(\cdot)$ is an external random noise. The objective is to choose the production rate so as to minimize the cost function

$$J^\varepsilon(u(\cdot)) = E\int_0^T p\bigl(x^\varepsilon(t), \theta^\varepsilon(t), u(t)\bigr)\, dt.$$

In the absence of the parameter process, using a dynamic programming approach one normally needs to assume that the production rate $u(\cdot)$


appears in the dynamic equation linearly [17], whereas using the weak convergence method, in what follows we are able to treat fully nonlinear cases in the current setting.

3.2. Assumptions

(A1) $\alpha^\varepsilon(\cdot)$ and $\beta^\varepsilon(\cdot)$ are independent Markov chains having state spaces $\mathcal{M}_\alpha$ and $\mathcal{M}_\beta$, and generators

$$Q_\alpha^\varepsilon(t) = \frac{1}{\varepsilon} A_\alpha(t) + B_\alpha(t), \qquad Q_\beta^\varepsilon(t) = \frac{1}{\varepsilon} A_\beta(t) + B_\beta(t),$$

respectively, where $A_\alpha(t)$, $B_\alpha(t)$, $A_\beta(t)$, and $B_\beta(t)$ are generators such that $A_\alpha(t)$ consists of block diagonal matrices $A_\alpha^i(t)$, $i = 1, \dots, l_\alpha$, and $A_\beta(t)$ consists of block diagonal matrices $A_\beta^i(t)$, $i = 1, \dots, l_\beta$, and that $A_\alpha^i(t)$, $A_\beta^i(t)$, $B_\alpha(t)$, and $B_\beta(t)$ are themselves generators of Markov chains with state spaces $\mathcal{M}_\alpha^i$ and $\mathcal{M}_\beta^i$, respectively. Condition (Q) holds for both $Q_\alpha^\varepsilon(t)$ and $Q_\beta^\varepsilon(t)$.

(A2) $f(\cdot)$ is continuous, and is bounded on bounded $(x, \theta)$-sets. For each $\check u$ and each $\alpha$, $f(\cdot, \cdot, \check u, \alpha)$ satisfies a linear growth condition, is Lipschitz continuous, and has bounded mixed partial derivatives on bounded $(x, \theta)$-sets. The differential equation for $x(\cdot)$ in $P^0$ has a unique solution for each initial condition (w.p.1). The function $g(\cdot)$ is continuous and is bounded on bounded $\theta$-sets. For each $\beta$, $g(\cdot, \beta)$ is Lipschitz continuous and has bounded partial derivatives on bounded $\theta$-sets. The initial data $\theta_0^\varepsilon$ converges weakly to $\theta$, a known vector.

(A3) $p(\cdot)$ is bounded and continuous.

Remark. The Lipschitz continuity of $g(\cdot)$ implies that the differential equation for $\theta(\cdot)$ in (9) has a unique solution for each initial condition (w.p.1). To have this hold for the differential equation for $x(\cdot)$, a certain convexity condition is needed; we simply assume the existence of the unique solution.

With the preliminary result, Lemma 1, the following is immediate.

LEMMA 2. Under Condition (A1), for both $\alpha^\varepsilon(\cdot)$ and $\beta^\varepsilon(\cdot)$, (6) holds. In addition, $\bar\alpha^\varepsilon(\cdot)$ and $\bar\beta^\varepsilon(\cdot)$ converge weakly to $\bar\alpha(\cdot)$ and $\bar\beta(\cdot)$, respectively. The generators of $\bar\alpha(t)$ and $\bar\beta(t)$ are as in (7) with $B(t)$ replaced by $B_\alpha(t)$ and $B_\beta(t)$, respectively. In addition, (8) holds for both $\bar\alpha^\varepsilon(\cdot)$ and $\bar\beta^\varepsilon(\cdot)$.

Proof. This is a direct consequence of Lemma 1.


Condition (A1) guarantees the existence of the quasi-stationary distributions $\nu_\alpha^i(t)$ and $\nu_\beta^i(t)$, respectively. To study the control problem, we will also assume:

(A4) The quasi-stationary distributions $\nu_\alpha^i(t)$, $i = 1, \dots, l_\alpha$, and $\nu_\beta^i(t)$, $i = 1, \dots, l_\beta$, are known. The matrix-valued functions $B_\alpha(\cdot)$ and $B_\beta(\cdot)$ are known.

This condition leads to a formulation of a limit problem which is free of unknown parameters. A direct consequence of the assumption is that $\bar Q_\alpha(\cdot)$ and $\bar Q_\beta(\cdot)$, defined as in (7), are known, and so are the aggregated processes $\bar\alpha(\cdot)$ and $\bar\beta(\cdot)$. To proceed, we use the notion of relaxed control.

4. RELAXED CONTROL REPRESENTATION

In the early 1960s, Warga [21] set up the framework of relaxed control formulation for deterministic problems. Its stochastic counterpart is in Fleming [6]. Such a formulation has proven to be quite useful for various stochastic control problems [13]. We modify their results for our controlled Markov chain problems. The first subsection recalls results on relaxed controls which are necessary for our study. The second subsection gives the relaxed control representation of (9).

4.1. Relaxed Control

Assume that the control space $U$ is a compact set in $\mathbb{R}^{r_2}$, and $\mathcal{F}_t$ is a filtration. Denote the $\sigma$-algebra of Borel subsets of any set $S$ by $\mathcal{B}(S)$. Let

$$M = \bigl\{m(\cdot);\ m(\cdot) \text{ is a measure on } \mathcal{B}(U \times [0, \infty)) \text{ such that } m(U \times [0, t]) = t \text{ for all } t \ge 0\bigr\}.$$

A random $M$-valued measure $m(\cdot)$ is an admissible relaxed control if for each $\hat B \in \mathcal{B}(U)$, the function defined by $m(\hat B, t) \equiv m(\hat B \times [0, t])$ is $\mathcal{F}_t$-adapted. An equivalent formulation reads that $m(\cdot)$ is a relaxed control if $\int_0^t \int h(s, \check u)\, m(ds \times d\check u)$ is progressively measurable with respect to $\{\mathcal{F}_t\}$ for each bounded and continuous function $h(\cdot)$. If $m(\cdot)$ is an admissible relaxed control, there is a measure-valued function $m_t(\cdot)$ (the "derivative") such that $m_t(d\check u)\, dt = m(dt \times d\check u)$ and


for smooth functions $h(\cdot)$,

$$\int h(s, \check u)\, m(ds \times d\check u) = \int ds \int h(s, \check u)\, m_s(d\check u) \tag{10}$$

(see Kushner [13, Chapter 3, p. 52]). To proceed, topologize $M$ as follows. Let $\{f_n^i(\cdot);\ i < \infty\}$ be a countable dense (under the sup-norm) set of continuous functions on $U \times [0, n]$ for each $n$. Let

$$\langle m, f \rangle = \int f(s, \check u)\, m(ds \times d\check u), \tag{11}$$

and define

$$d(m_1, m_2) = \sum_{n=1}^{\infty} \frac{1}{2^n}\, d_n(m_1, m_2), \quad \text{where} \quad d_n(m_1, m_2) = \sum_{i=1}^{\infty} \frac{1}{2^i}\, \frac{|\langle m_1 - m_2, f_n^i \rangle|}{1 + |\langle m_1 - m_2, f_n^i \rangle|}.$$

For a sequence of measures, $m^n(\cdot) \Rightarrow m(\cdot)$ means convergence in $M$ under this weak topology. In accordance with [13, Chapter 3], an ordinary admissible control $u(\cdot)$ is a feedback control for the system of interest if there is a $U$-valued Borel measurable function $u^0(\cdot)$ such that $u(t) = u^0(x(t))$ for almost all $\omega, t$. For each $x$, let $\hat m(x, \cdot)$ be a probability measure on $(U, \mathcal{B}(U))$, and suppose that for each $\hat B \in \mathcal{B}(U)$, $\hat m(\cdot, \hat B)$ is Borel measurable as a function of $x$. If for almost all $\omega$ and $t$, the derivative $m_t(\cdot)$ of a relaxed control $m(\cdot)$ can be written as $m_t(\cdot) = \hat m(x(t), \cdot)$, then $m(\cdot)$ is said to be a relaxed feedback control.

4.2. Formulation under Relaxed Control Representation

Although relaxed controls cannot be used directly in actual applications, they are very useful in studying asymptotic properties of the systems under consideration. First, under such a formulation the control appears linearly. More importantly, owing to the structure of the space $M$, tightness usually can be obtained easily.
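The defining property $m(U \times [0, t]) = t$ and the iterated-integral identity (10) are easy to illustrate on a discretized relaxed control. A sketch with a hypothetical two-point control space:

```python
import numpy as np

# Discretized relaxed control on U = {-1, +1} over [0, T]: at each time step
# m_t is a probability vector over the two control values (hypothetical).
T, n = 1.0, 1000
dt = T / n
times = np.linspace(0.0, T, n, endpoint=False)
m_t = np.column_stack([0.5 + 0.4 * np.sin(2 * np.pi * times),
                       0.5 - 0.4 * np.sin(2 * np.pi * times)])

# m(U x [0, t]) = t: the mass over U accumulates at unit speed.
assert np.isclose((m_t.sum(axis=1) * dt).sum(), T)

# Identity (10): integrate h(s, u) against m(ds x du) as an iterated
# (Riemann) integral: first over U via m_s, then over s.
controls = np.array([-1.0, 1.0])

def integral(hfun):
    return sum(dt * (hfun(s, controls) * m_t[k]).sum()
               for k, s in enumerate(times))

val = integral(lambda s, u: s * u)
```

For this choice of $m_t$ the exact value of $\int s\,\check u\, m(ds \times d\check u)$ is $0.4/\pi$, and the Riemann sum reproduces it closely.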


To proceed, we set up the problem using the relaxed control representation. Rewrite (9) as

$$P^\varepsilon: \quad \begin{cases} \min J^\varepsilon(m^\varepsilon) = E\displaystyle\int_0^T \int p\bigl(x^\varepsilon(t), \theta^\varepsilon(t), \check u\bigr)\, m_t^\varepsilon(d\check u)\, dt \\[4pt] x^\varepsilon(t) = x + \displaystyle\int_0^t \sum_{z_{ij} \in \mathcal{M}_\alpha} \int f\bigl(x^\varepsilon(s), \theta^\varepsilon(s), \check u, z_{ij}\bigr)\, m_s^\varepsilon(d\check u)\, I_{\{\alpha^\varepsilon(s) = z_{ij}\}}\, ds \\[4pt] \theta^\varepsilon(t) = \theta_0^\varepsilon + \displaystyle\int_0^t \sum_{z_{ij} \in \mathcal{M}_\beta} g\bigl(\theta^\varepsilon(s), z_{ij}\bigr)\, I_{\{\beta^\varepsilon(s) = z_{ij}\}}\, ds. \end{cases} \tag{12}$$

Admissibility for $P^\varepsilon$. A relaxed control $m^\varepsilon(\cdot)$ is admissible if $m^\varepsilon(\cdot) \in M$ and is $\hat{\mathcal{F}}_t^\varepsilon = \sigma\{x^\varepsilon(s), \theta^\varepsilon(s), \bar\alpha^\varepsilon(s), \bar\beta^\varepsilon(s);\ s \le t\}$-adapted, where $\theta^\varepsilon(t) = \theta + \int_0^t g(\theta^\varepsilon(s), \beta^\varepsilon(s))\, ds$. Use $\mathcal{R}^\varepsilon$ to denote the set of all admissible controls. Corresponding to (12), there is an associated limit problem $P^0$:

$$P^0: \quad \begin{cases} J(m) = E\displaystyle\int_0^T \int p\bigl(x(t), \theta(t), \check u\bigr)\, m_t(d\check u)\, dt \\[4pt] x(t) = x + \displaystyle\int_0^t \sum_{i=1}^{l_\alpha} \sum_{j=1}^{m_i} \int f\bigl(x(s), \theta(s), \check u, z_{ij}\bigr)\, m_s(d\check u)\, \nu_{\alpha, j}^i(s)\, I_{\{\bar\alpha(s) = i\}}\, ds \\[4pt] \theta(t) = \theta + \displaystyle\int_0^t \sum_{i=1}^{l_\beta} \sum_{j=1}^{m_i} g\bigl(\theta(s), z_{ij}\bigr)\, \nu_{\beta, j}^i(s)\, I_{\{\bar\beta(s) = i\}}\, ds. \end{cases} \tag{13}$$

Denote by $\mathcal{R}^0$ the set of admissible controls for the limit problem, i.e., $\mathcal{R}^0 = \{m(\cdot) \in M;\ m(\cdot) \text{ is } \mathcal{F}_t\text{-adapted}\}$, where $\mathcal{F}_t = \sigma\{x(s), \theta(s), \bar\alpha(s), \bar\beta(s);\ s \le t\}$. The rationale is that $P^\varepsilon$ is very "close" to $P^0$ in an appropriate sense when $\varepsilon$ is small enough. Typically, the dimensionality of $P^0$ is much smaller than that of $P^\varepsilon$ in (12), so there is a real incentive to consider $P^0$. By working with $P^0$, if we can show that the limit problem approximates $P^\varepsilon$ well, then the optimal or nearly optimal controls of $P^0$ can be used as a guide for obtaining optimality for $P^\varepsilon$. Nevertheless, there is a main difficulty since $\theta^\varepsilon(\cdot)$ is an unknown process. To overcome the obstacle, introduce an auxiliary problem $P^{\varepsilon, \theta}$, which has exactly the same form as (12), but in which $\theta^\varepsilon(\cdot)$ is assumed to be a known process. As for the limit problem $P^0$, with given initial data $\theta$ and known $\nu_\beta^i(t)$, $i = 1, \dots, l_\beta$,


$\theta(t)$ is completely specified. Let $\mathcal{R}^{\varepsilon, \theta}$ denote the set of admissible relaxed controls for the auxiliary problem, $\mathcal{R}^{\varepsilon, \theta} = \{m^\varepsilon(\cdot) \in M;\ m^\varepsilon(\cdot) \text{ is } \mathcal{F}_t^\varepsilon\text{-adapted}\}$, where $\mathcal{F}_t^\varepsilon = \sigma\{x^\varepsilon(s), \theta^\varepsilon(s), \bar\alpha^\varepsilon(s), \bar\beta^\varepsilon(s);\ s \le t\}$. Define the corresponding value functions as

$$v^\varepsilon = \inf_{m^\varepsilon \in \mathcal{R}^\varepsilon} J^\varepsilon(m^\varepsilon), \qquad v^{\varepsilon, \theta} = \inf_{m^\varepsilon \in \mathcal{R}^{\varepsilon, \theta}} J^\varepsilon(m^\varepsilon), \qquad v^0 = \inf_{m \in \mathcal{R}^0} J(m).$$

Since $\mathcal{R}^{\varepsilon, \theta}$ contains more information than $\mathcal{R}^\varepsilon$, it is clear that $v^{\varepsilon, \theta} \le v^\varepsilon$. Our aim is to show that any nearly optimal strategy for $P^0$ is also nearly optimal for $P^\varepsilon$ when $\varepsilon$ is small enough. Using the auxiliary problem as a bridge, we work with Problem $P^{\varepsilon, \theta}$ first. That is, we assume $\theta^\varepsilon(\cdot)$ to be known, and show that the process $(x^\varepsilon(\cdot), \theta^\varepsilon(\cdot))$ converges weakly to $(x(\cdot), \theta(\cdot))$, which satisfies the differential equations of Problem $P^0$. To obtain the desired weak convergence result, both processes $x^\varepsilon(\cdot)$ and $\theta^\varepsilon(\cdot)$ need to be treated; in fact, we consider the "joint" process $z^\varepsilon(\cdot) = (x^\varepsilon(\cdot), \theta^\varepsilon(\cdot))$. After the weak convergence of the auxiliary problem is established, we obtain the asymptotically nearly optimal control of the original problem by finding appropriate bounds on the cost functions for Problem $P^\varepsilon$ via $v^{\varepsilon, \theta} \le v^\varepsilon$, and deduce the desired results.

4.3. Preliminary Results

This subsection collects a number of preliminary results concerning the limit problem $P^0$. The proofs are similar to well-known results in the literature [6, 13]. Lemma 3 concerns the admissibility for $P^0$ and the existence of an optimal control within the class of relaxed controls, whereas Lemma 4 takes care of approximate optimal controls and is similar to the well-known chattering theorem for deterministic [21] and stochastic [6] cases.

LEMMA 3. The following assertions hold:

(i) Let $m(\cdot)$ be an admissible relaxed control for $P^0$; then there is a $\sigma\{\bar\alpha(s), \bar\beta(s), \theta(s);\ s \le t\}$-adapted solution $(x(\cdot), \theta(\cdot))$ of the differential equations in $P^0$ such that the following estimates hold w.p.1:

$$\sup_{0 \le t \le T} |x(t)| \le K(1 + |x|), \qquad \sup_{0 \le t \le T} |\theta(t)| \le K(1 + |\theta|),$$

where $x$ and $\theta$ are the initial conditions of the differential equations in $P^0$.

(ii) Let $m^n(\cdot) \Rightarrow m(\cdot)$, where the $m^n(\cdot)$ are admissible. Suppose $(x^n(\cdot), \theta^n(\cdot))$ is the solution to the differential equation in $P^0$ with $m(\cdot)$ replaced by $m^n(\cdot)$. Then $(x^n(\cdot), m^n(\cdot)) \Rightarrow (x(\cdot), m(\cdot))$ such that $m(\cdot)$ is admissible.


The proof follows a standard argument. For an analog for diffusion systems, see Fleming [6]. To prove the existence of the solution of the differential equation, use the usual argument of successive approximation; the a priori bound is then obtained via the well-known Gronwall's inequality. To verify (ii), similar to the diffusion systems treated in [13, 14], it suffices to consider a discrete parameter system. The admissibility of the $m^n(\cdot)$ further implies that of $m(\cdot)$.

LEMMA 4. The following assertions hold:

(i) There is an optimal control in $\mathcal{R}^0$.

(ii) For each $\eta > 0$, there is an admissible $u^\eta(\cdot)$ for the limit problem which is $\eta$-optimal for $P^0$, i.e., $J(u^\eta) \le \inf_{m \in \mathcal{R}^0} J(m) + \eta$.

(iii) There exists a piecewise constant (in $t$) and locally Lipschitz continuous in $(x, \theta)$ (uniformly in $t$) control $u^\eta(\cdot)$ such that $J(u^\eta) \le \inf_{m \in \mathcal{R}^0} J(m) + \eta$.

The lemma is modelled as in [14] (see also [13]). To prove (i), choose a weakly convergent subsequence $m^\eta(\cdot)$ as $\eta \to 0$ such that $J(m^\eta) \to \inf_{m \in \mathcal{R}^0} J(m)$. Denote the limit of $(x(m^\eta, \cdot), m^\eta(\cdot))$ by $(x(m, \cdot), m(\cdot))$. Lemma 3 leads to the admissibility of $m(\cdot)$. In addition, the weak convergence yields that $m$ is an optimal control in $\mathcal{R}^0$. To verify (ii) and (iii), we follow the approach of [14, Theorem 4] and the chattering theorem of [6].
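The chattering idea behind Lemma 4 can be illustrated numerically: an ordinary control switching fast between two values reproduces, in the limit, the trajectory generated by a relaxed control that mixes them. A hypothetical scalar sketch for $\dot x = -x + u$ with the relaxed control $m_t = \tfrac{1}{2}\delta_{-1} + \tfrac{1}{2}\delta_{+1}$:

```python
# Hypothetical scalar system dx/dt = -x + u; the relaxed control puts mass
# 1/2 on u = -1 and 1/2 on u = +1, so the averaged drift is -x.
def relaxed_traj(T=2.0, dt=1e-3, x0=1.0):
    x = x0
    for _ in range(int(T / dt)):
        x += dt * (-x + 0.5 * (-1.0) + 0.5 * (1.0))   # averaged drift
    return x

def chattering_traj(period, T=2.0, dt=1e-3, x0=1.0):
    """Ordinary control switching between -1 and +1 with the given period."""
    x, t = x0, 0.0
    for _ in range(int(T / dt)):
        u = -1.0 if (t % period) < period / 2 else 1.0
        x += dt * (-x + u)
        t += dt
    return x

x_rel = relaxed_traj()
gaps = [abs(chattering_traj(p) - x_rel) for p in (0.5, 0.1, 0.02)]
print(gaps)   # the gap shrinks as the switching period shrinks
```

The terminal-state gap is of the order of the switching period, which is the quantitative content of the chattering approximation.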

5. NEAR OPTIMALITY

This section consists of two parts. The first part is on the weak convergence of $P^{\varepsilon, \theta}$, and the second part derives the asymptotic optimality.

5.1. Weak Convergence

This subsection is concerned with Problem $P^{\varepsilon, \theta}$. We reiterate that $\theta^\varepsilon(\cdot)$ is considered to be known in Problem $P^{\varepsilon, \theta}$. Let $\delta_\varepsilon$ be a function of $\varepsilon$ such that $\delta_\varepsilon \to 0$. Let $\tilde m^\varepsilon(\cdot)$ be a $\delta_\varepsilon$-optimal admissible relaxed control for the process defined in $P^{\varepsilon, \theta}$. Our first result concerns the weak convergence of the processes associated with the problem $P^{\varepsilon, \theta}$.

THEOREM 1. Assume (A1)-(A4). Then the following assertions hold:

1. $\{x^\varepsilon(\theta^\varepsilon, \tilde m^\varepsilon, \cdot), \theta^\varepsilon(\cdot), \tilde m^\varepsilon(\cdot)\}$ is tight in $D^r[0, T] \times D^{r_1}[0, T] \times M$;

2. If $\tilde m^\varepsilon(\cdot) \Rightarrow \tilde m(\cdot)$ as $\varepsilon \to 0$, then $\tilde m(\cdot) \in \mathcal{R}^0$ and the limit of any weakly convergent subsequence of $\{x^\varepsilon(\theta^\varepsilon, \tilde m^\varepsilon, \cdot), \theta^\varepsilon(\cdot), \tilde m^\varepsilon(\cdot)\}$ satisfies $P^0$ with $m_t$ replaced by $\tilde m_t$;

358

YIN AND ZHANG

3. For the relaxed controls m ˜ « and m ˜ g M gi¨ en abo¨ e, J « Ž m ˜«. ª JŽ m ˜ . as « ª 0. Proof. We divide the proof into several stages. Using weak convergence method and averaging techniques, the proof proceeds by a series of approximations, each one simplifying the process a little more to eventually obtain the desired result. Stage 1: This stage focuses on the a priori bounds to be used in the subsequent development. LEMMA 5. Under the conditions ŽA1. ] ŽA3., the following a priori estimates hold for P« , u w. p.1. sup

x « Ž t . F K Ž 1 q < x <.

0FtFT

and

sup

u « Ž t . F K Ž 1 q < u 0« < . . Ž 14 .

0FtFT

Proof. The proof is exactly the same as that of (i) in Lemma 3.

Remark. Owing to the finite-state Markov chains and the linear growth condition, one can actually obtain an a priori bound. Notice that a similar bound also holds for Problem $P^\varepsilon$.

By the compactness of $U$, the set $U\times[0,T]$ is compact. As a result, $\{\tilde m^\varepsilon(\cdot)\}$ is tight in $M$. By virtue of the a priori bounds on $u^\varepsilon(\cdot)$ and $x^\varepsilon(\cdot)$, $\{x^\varepsilon(\cdot),u^\varepsilon(\cdot)\}$ is tight and all limits have continuous paths w.p.1 by virtue of [12, Lemma 7, p. 51]. This yields the desired tightness of $\{x^\varepsilon(\cdot),u^\varepsilon(\cdot),\tilde m^\varepsilon(\cdot)\}$.

Stage 2: Since $\{x^\varepsilon(\cdot),u^\varepsilon(\cdot),\tilde m^\varepsilon(\cdot)\}$ is tight, by Prohorov's theorem we may extract a convergent subsequence; for ease of presentation, we still index it by $\varepsilon$. Suppose the limit is $(x(\cdot),u(\cdot),\tilde m(\cdot))$. In view of the Skorohod representation, without changing notation, suppose $(x^\varepsilon(\cdot),u^\varepsilon(\cdot))$ converges to $(x(\cdot),u(\cdot))$ w.p.1 and that the convergence is uniform on any bounded time interval.

First, for each Borel set $\hat B$, $\tilde m\{\hat B\times[0,t]\}$ depends on $(\omega,t)$ and is absolutely continuous uniformly in $(\omega,t)$. This, in turn, implies that the ``derivative''
\[
\tilde m_t(\hat B)=\lim_{\Delta\to0}\frac1\Delta\bigl(\tilde m\{\hat B\times[0,t]\}-\tilde m\{\hat B\times[0,t-\Delta]\}\bigr)
\]
exists for almost all $(\omega,t)$. Moreover, $\tilde m_t(\cdot)$ is $(\omega,t)$-measurable, $\tilde m_t(U)=1$, and for each bounded and continuous function $D(\cdot)$,
\[
\int_0^t\!\!\int D(s,\varrho)\,\tilde m_s(d\varrho)\,ds=\int_0^t\!\!\int D(s,\varrho)\,\tilde m(d\varrho\times ds).
\]

Thus $\tilde m(\cdot)$ is admissible. To proceed, write
\[
u^\varepsilon(t)=u^\varepsilon_0
+\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t g(u(s),z_{ij})\,\nu^i_{\beta,j}(s)\,I_{\{\overline\beta^\varepsilon(s)=i\}}\,ds
\]
\[
+\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t\bigl[g(u^\varepsilon(s),z_{ij})-g(u(s),z_{ij})\bigr]\nu^i_{\beta,j}(s)\,I_{\{\overline\beta^\varepsilon(s)=i\}}\,ds
\]
\[
+\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t g(u^\varepsilon(s),z_{ij})\bigl(I_{\{\beta^\varepsilon(s)=z_{ij}\}}-\nu^i_{\beta,j}(s)\,I_{\{\overline\beta^\varepsilon(s)=i\}}\bigr)\,ds.
\]
Recall that $\nu^i_{\beta,j}(s)$ denotes the $j$th component of the quasi-stationary distribution $\nu^i_\beta(s)$, i.e., $\nu^i_\beta(s)=(\nu^i_{\beta,1}(s),\ldots,\nu^i_{\beta,m_i}(s))$. An integration by parts reveals that
\[
\int_0^t g(u^\varepsilon(s),z_{ij})\bigl(I_{\{\beta^\varepsilon(s)=z_{ij}\}}-\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}}\bigr)\,ds
\]
\[
=g(u^\varepsilon(t),z_{ij})\int_0^t\bigl(I_{\{\beta^\varepsilon(s)=z_{ij}\}}-\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}}\bigr)\,ds
\]
\[
-\int_0^t\frac{dg(u^\varepsilon(s),z_{ij})}{ds}\int_0^s\bigl(I_{\{\beta^\varepsilon(r)=z_{ij}\}}-\nu^i_{\beta,j}(r)I_{\{\overline\beta^\varepsilon(r)=i\}}\bigr)\,dr\,ds.
\]

By virtue of Lemma 2, the estimate above, and the boundedness of $g(u^\varepsilon,z_{ij})$ and of its derivative, for some constant $K>0$,
\[
\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t g(u^\varepsilon(s),z_{ij})\bigl(I_{\{\beta^\varepsilon(s)=z_{ij}\}}-\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}}\bigr)\,ds\Bigr|^2
\]
\[
\le K\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\int_0^t\bigl(I_{\{\beta^\varepsilon(s)=z_{ij}\}}-\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}}\bigr)\,ds\Bigr|^2
\]
\[
+K\lim_{\varepsilon\to0}\sup_{0\le t\le T}\int_0^t E\Bigl|\int_0^s\bigl(I_{\{\beta^\varepsilon(r)=z_{ij}\}}-\nu^i_{\beta,j}(r)I_{\{\overline\beta^\varepsilon(r)=i\}}\bigr)\,dr\Bigr|^2 ds=0.
\]
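The driving estimate here, that the mean square of the occupation-time fluctuation $\int_0^t(I_{\{\beta^\varepsilon(s)=z_{ij}\}}-\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}})\,ds$ vanishes as $\varepsilon\to0$, can be checked by simulation. A minimal Monte Carlo sketch (not from the paper; a single fast group with two states, a constant generator, and all rates and sample sizes invented for illustration):

```python
import random

def occupation_fluctuation(q12, q21, eps, T, rng):
    """int_0^T ( I_{state(s)=0} - nu_0 ) ds for a two-state chain with
    generator Q/eps, Q = [[-q12, q12], [q21, -q21]], started in state 0."""
    nu0 = q21 / (q12 + q21)          # quasi-stationary mass of state 0
    t, state, occ0 = 0.0, 0, 0.0
    while t < T:
        rate = (q12, q21)[state] / eps
        hold = min(rng.expovariate(rate), T - t)  # holding time, clipped at T
        if state == 0:
            occ0 += hold
        t += hold
        state = 1 - state
    return occ0 - nu0 * T

rng = random.Random(1)
mean_square = {}
for eps in (0.1, 0.01):
    mean_square[eps] = sum(
        occupation_fluctuation(1.0, 2.0, eps, 1.0, rng) ** 2
        for _ in range(200)) / 200
    print(eps, mean_square[eps])   # shrinks roughly linearly in eps
```

The printed mean squares decrease with $\varepsilon$, which is exactly what makes the third term in the decomposition of $u^\varepsilon$ negligible.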

The weak convergence of $(x^\varepsilon(\cdot),u^\varepsilon(\cdot))$ and the Skorohod representation further imply that
\[
\sup_{s\in[0,T]}\max_{i,j}\bigl|g(u^\varepsilon(s),z_{ij})-g(u(s),z_{ij})\bigr|\to0\quad\text{w.p.1},
\]
and hence
\[
\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t\bigl[g(u^\varepsilon(s),z_{ij})-g(u(s),z_{ij})\bigr]\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}}\,ds\to0
\]
in probability uniformly in $t\in[0,T]$. Therefore, upon using the weak convergence of $u^\varepsilon_0$ to $u$,
\[
u^\varepsilon(t)=u+\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t g(u(s),z_{ij})\,\nu^i_{\beta,j}(s)I_{\{\overline\beta^\varepsilon(s)=i\}}\,ds+o(1), \tag{15}
\]
where $o(1)\to0$ in probability uniformly in $t$. Owing to the weak convergence of $\overline\beta^\varepsilon(\cdot)$ to $\overline\beta(\cdot)$, in particular by (8), for any bounded and continuous function $\tilde h(\cdot)$,
\[
E\tilde h\bigl(I_{\{\overline\beta^\varepsilon(s)=i\}}\bigr)=\tilde h(1)P\bigl(\overline\beta^\varepsilon(s)=i\bigr)+\tilde h(0)\sum_{j\ne i}P\bigl(\overline\beta^\varepsilon(s)=j\bigr)
\]
\[
\to\tilde h(1)P\bigl(\overline\beta(s)=i\bigr)+\tilde h(0)\sum_{j\ne i}P\bigl(\overline\beta(s)=j\bigr)=E\tilde h\bigl(I_{\{\overline\beta(s)=i\}}\bigr).
\]
Thus $I_{\{\overline\beta^\varepsilon(s)=i\}}$ converges weakly to $I_{\{\overline\beta(s)=i\}}$. This and (15) then imply that $u^\varepsilon(\cdot)$ converges to $u(\cdot)$ satisfying
\[
u(t)=u+\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}\int_0^t g(u(s),z_{ij})\,\nu^i_{\beta,j}(s)I_{\{\overline\beta(s)=i\}}\,ds.
\]
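The limit equation says that, within each group, the fast chain is replaced by its quasi-stationary average. A toy numerical sketch of this averaging (not from the paper; a single group with two states, the linear drift $g(u,z_j)=a_j-u$, and all generators and constants are invented for illustration) compares $u^\varepsilon(T)$, driven by a fast two-state chain, with the solution of the averaged ODE $\dot u=\bar a-u$, where $\bar a=\sum_j\nu_j a_j$:

```python
import math
import random

def fast_chain_path(q12, q21, eps, T, rng):
    """(time, state) jump sequence of a two-state chain with generator Q/eps."""
    t, state, path = 0.0, 0, [(0.0, 0)]
    while t < T:
        t += rng.expovariate((q12, q21)[state] / eps)
        state = 1 - state
        path.append((t, state))
    return path

def integrate_u(path, a, u0, T):
    """Exact piecewise solution of du/dt = a[state(t)] - u up to time T."""
    u = u0
    for k in range(len(path) - 1):
        t0, s = path[k]
        t1 = min(path[k + 1][0], T)
        u = a[s] + (u - a[s]) * math.exp(-(t1 - t0))  # exact linear-ODE step
        if t1 >= T:
            break
    return u

rng = random.Random(0)
q12, q21, T, u0, a = 1.0, 2.0, 5.0, 0.0, (1.0, 4.0)
nu = (q21 / (q12 + q21), q12 / (q12 + q21))     # quasi-stationary distribution
abar = nu[0] * a[0] + nu[1] * a[1]              # averaged drift level
u_limit = abar + (u0 - abar) * math.exp(-T)     # averaged ODE solution at T
u_eps = integrate_u(fast_chain_path(q12, q21, 1e-3, T, rng), a, u0, T)
print(abs(u_eps - u_limit))   # small for small eps
```

The gap between the two terminal values shrinks as $\varepsilon\to0$, mirroring the weak convergence $u^\varepsilon(\cdot)\to u(\cdot)$.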

Next consider $x^\varepsilon(\cdot)$. A similar argument leads to
\[
x^\varepsilon(t)=x+\sum_{i=1}^{l_\alpha}\sum_{j=1}^{m_i}\Bigl\{\int_0^t\!\!\int f(x(s),u(s),\varrho,z_{ij})\,\tilde m_s(d\varrho)\,\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\,ds \tag{16}
\]
\[
+\int_0^t\!\!\int\bigl[f(x^\varepsilon(s),u(s),\varrho,z_{ij})-f(x(s),u(s),\varrho,z_{ij})\bigr]\tilde m_s(d\varrho)\,\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\,ds
\]
\[
+\int_0^t\!\!\int f(x^\varepsilon(s),u(s),\varrho,z_{ij})\bigl(I_{\{\alpha^\varepsilon(s)=z_{ij}\}}-\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\bigr)\tilde m_s(d\varrho)\,ds
\]
\[
+\int_0^t\!\!\int\bigl[f(x^\varepsilon(s),u^\varepsilon(s),\varrho,z_{ij})-f(x^\varepsilon(s),u(s),\varrho,z_{ij})\bigr]\tilde m_s(d\varrho)\,I_{\{\alpha^\varepsilon(s)=z_{ij}\}}\,ds
\]
\[
+\int_0^t\!\!\int f(x^\varepsilon(s),u^\varepsilon(s),\varrho,z_{ij})\bigl(\tilde m^\varepsilon_s(d\varrho)-\tilde m_s(d\varrho)\bigr)I_{\{\alpha^\varepsilon(s)=z_{ij}\}}\,ds\Bigr\}.
\]

In view of Lemma 2, the a priori bound on $x^\varepsilon(\cdot)$, and the boundedness of $f(\cdot)$, as in the case of the $u^\varepsilon(\cdot)$ process an integration by parts leads to
\[
\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\int_0^t\!\!\int f(x^\varepsilon(s),u(s),\varrho,z_{ij})\bigl(I_{\{\alpha^\varepsilon(s)=z_{ij}\}}-\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\bigr)\tilde m_s(d\varrho)\,ds\Bigr|^2
\]
\[
\le K\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\int_0^t\bigl(I_{\{\alpha^\varepsilon(s)=z_{ij}\}}-\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\bigr)\,ds\Bigr|^2=0.
\]
The continuity of $f(\cdot)$, the weak convergence of $(x^\varepsilon(\cdot),u^\varepsilon(\cdot))$ to $(x(\cdot),u(\cdot))$, and the Skorohod representation imply that
\[
\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\int_0^t\!\!\int\bigl[f(x^\varepsilon(s),u(s),\varrho,z_{ij})-f(x(s),u(s),\varrho,z_{ij})\bigr]\tilde m_s(d\varrho)\,\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\,ds\Bigr|=0.
\]
An analogous equation holds for the fifth term on the right side of (16), i.e.,
\[
\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\int_0^t\!\!\int\bigl[f(x^\varepsilon(s),u^\varepsilon(s),\varrho,z_{ij})-f(x^\varepsilon(s),u(s),\varrho,z_{ij})\bigr]\tilde m_s(d\varrho)\,I_{\{\alpha^\varepsilon(s)=z_{ij}\}}\,ds\Bigr|=0.
\]
The convergence of $\tilde m^\varepsilon(\cdot)$ to $\tilde m(\cdot)$ and the boundedness of $f(\cdot)$ imply
\[
\lim_{\varepsilon\to0}\sup_{0\le t\le T}E\Bigl|\int_0^t\!\!\int f(x^\varepsilon(s),u^\varepsilon(s),\varrho,z_{ij})\bigl(\tilde m^\varepsilon_s(d\varrho)-\tilde m_s(d\varrho)\bigr)I_{\{\alpha^\varepsilon(s)=z_{ij}\}}\,ds\Bigr|=0.
\]
Consequently,

\[
x^\varepsilon(t)=x+\sum_{i=1}^{l_\alpha}\sum_{j=1}^{m_i}\int_0^t\!\!\int f(x(s),u(s),\varrho,z_{ij})\,\tilde m_s(d\varrho)\,\nu^i_{\alpha,j}(s)I_{\{\overline\alpha^\varepsilon(s)=i\}}\,ds+o(1), \tag{17}
\]
where $o(1)\to0$ in probability uniformly in $t$. As in the treatment of $u^\varepsilon(\cdot)$, $I_{\{\overline\alpha^\varepsilon(s)=i\}}$ converges weakly to $I_{\{\overline\alpha(s)=i\}}$, and as a result the limit $x(\cdot)$ satisfies
\[
x(t)=x+\sum_{i=1}^{l_\alpha}\sum_{j=1}^{m_i}\int_0^t\!\!\int f(x(s),u(s),\varrho,z_{ij})\,\tilde m_s(d\varrho)\,\nu^i_{\alpha,j}(s)I_{\{\overline\alpha(s)=i\}}\,ds.
\]

Owing to the nature of the $D$-space, there can be at most a countable number of points $t$ at which
\[
P\{(x(t),u(t))\ne(x(t^-),u(t^-))\}>0
\]
(see [12, p. 32]). Let $T_p$ denote the complement of this set, i.e., the set of continuity points of $(x(\cdot),u(\cdot))$, and let $t_{k_1}<t<t+s$ with $t,t_{k_1},t+s\in T_p$. Let $h(\cdot)$ be any bounded and continuous function, let $F(\cdot)$ be any twice continuously differentiable function with compact support, and let the $p_{k_2}(\cdot)$ be arbitrary bounded and continuous functions. Notice that
\[
\langle p_{k_2},\tilde m^\varepsilon\rangle_t\overset{\mathrm{def}}{=}\int_0^t\!\!\int p_{k_2}(s,\varrho)\,\tilde m^\varepsilon_s(d\varrho)\,ds
\to\int_0^t\!\!\int p_{k_2}(s,\varrho)\,\tilde m_s(d\varrho)\,ds=\langle p_{k_2},\tilde m\rangle_t.
\]
Let $i_1$ and $j_1$ be arbitrary positive integers. Then by virtue of the weak convergence and the Skorohod representation,
\[
Eh\bigl(x^\varepsilon(t_{k_1}),u^\varepsilon(t_{k_1}),\langle p_{k_2},\tilde m^\varepsilon\rangle_{t_{k_1}},\,k_1\le i_1,\,k_2\le j_1\bigr)\bigl(F(x^\varepsilon(t+s),u^\varepsilon(t+s))-F(x^\varepsilon(t),u^\varepsilon(t))\bigr)
\]
\[
\to Eh\bigl(x(t_{k_1}),u(t_{k_1}),\langle p_{k_2},\tilde m\rangle_{t_{k_1}},\,k_1\le i_1,\,k_2\le j_1\bigr)\bigl(F(x(t+s),u(t+s))-F(x(t),u(t))\bigr).
\]

On the other hand, we have
\[
\lim_{\varepsilon\to0}Eh\bigl(x^\varepsilon(t_{k_1}),u^\varepsilon(t_{k_1}),\langle p_{k_2},\tilde m^\varepsilon\rangle_{t_{k_1}},\,k_1\le i_1,\,k_2\le j_1\bigr)
\]
\[
\times\Bigl(F(x^\varepsilon(t+s),u^\varepsilon(t+s))-F(x^\varepsilon(t),u^\varepsilon(t))-\int_t^{t+s}A^\varepsilon F(x^\varepsilon(\tau),u^\varepsilon(\tau))\,d\tau\Bigr)=0,
\]
where
\[
A^\varepsilon F(x,u)=\frac{\partial F(x,u)}{\partial u}\Bigl(\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}g(u,z_{ij})\,I_{\{\beta^\varepsilon(t)=z_{ij}\}}\Bigr)
+\frac{\partial F(x,u)}{\partial x}\Bigl(\sum_{i=1}^{l_\alpha}\sum_{j=1}^{m_i}\int f(x,u,\varrho,z_{ij})\,\tilde m^\varepsilon_t(d\varrho)\,I_{\{\alpha^\varepsilon(t)=z_{ij}\}}\Bigr).
\]

Consequently, using (16)-(17),
\[
Eh\bigl(x(t_{k_1}),u(t_{k_1}),\langle p_{k_2},\tilde m\rangle_{t_{k_1}},\,k_1\le i_1,\,k_2\le j_1\bigr)
\]
\[
\times\Bigl(F(x(t+s),u(t+s))-F(x(t),u(t))-\int_t^{t+s}AF(x(\tau),u(\tau))\,d\tau\Bigr)=0, \tag{18}
\]
where
\[
AF(x,u)=\frac{\partial F(x,u)}{\partial u}\Bigl(\sum_{i=1}^{l_\beta}\sum_{j=1}^{m_i}g(u,z_{ij})\,\nu^i_{\beta,j}(t)I_{\{\overline\beta(t)=i\}}\Bigr)
+\frac{\partial F(x,u)}{\partial x}\Bigl(\sum_{i=1}^{l_\alpha}\sum_{j=1}^{m_i}\int f(x,u,\varrho,z_{ij})\,\tilde m_t(d\varrho)\,\nu^i_{\alpha,j}(t)I_{\{\overline\alpha(t)=i\}}\Bigr).
\]
The arbitrariness of $i_1$, $j_1$, $F(\cdot)$, $h(\cdot)$, $p_{k_2}(\cdot)$, $t_{k_1}$, $t$, and $s$, together with (18), implies that $(x,u)$ solves the martingale problem with operator $A$; that is,
\[
F(x(t),u(t))-F(x(0),u(0))-\int_0^t AF(x(s),u(s))\,ds
\]

is a martingale for each bounded real-valued function $F(\cdot)$ that is twice continuously differentiable with compact support. Equivalently, $(x(\cdot),u(\cdot))$ satisfies the limit problem and has continuous paths with probability 1. Furthermore, $\tilde m(\cdot)$ is an admissible relaxed control for the limit problem $P^0$.

Stage 3: The limit of $J^\varepsilon(\cdot)$. The weak convergence of $(x^\varepsilon(\cdot),u^\varepsilon(\cdot),\tilde m^\varepsilon(\cdot))$ to $(x(\cdot),u(\cdot),\tilde m(\cdot))$ and the continuity of $p(\cdot)$ then imply $J^\varepsilon(\tilde m^\varepsilon)\to J(\tilde m)$ as $\varepsilon\to0$.

5.2. Nearly Optimal Control

This subsection focuses on Problem $P^\varepsilon$. We aim at deriving a limit result for the approximation of $P^\varepsilon$ via $P^0$. Recall that $v^\varepsilon$ and $v^0$ denote the value functions of $P^\varepsilon$ and $P^0$, respectively. The asymptotic near optimality is stated in the theorem below.

THEOREM 2. Assume (A1)-(A4). Then
\[
\lim_{\varepsilon\to0}v^\varepsilon=v^0.
\]
Moreover, for each $\delta>0$, there exists a Lipschitz continuous feedback control
\[
u^\delta=u^\delta(t)=u^\delta\bigl(x^\varepsilon(t),u^\varepsilon(t),\alpha^\varepsilon(t),\beta^\varepsilon(t),t\bigr),
\]
which is $\delta$-optimal for $P^0$, such that for the cost function $J^\varepsilon(\cdot)$ in $P^\varepsilon$,
\[
\limsup_{\varepsilon\to0}\bigl(J^\varepsilon(u^\delta)-v^\varepsilon\bigr)\le\delta.
\]

Remark. This theorem indicates that a nearly optimal control for the original problem $P^\varepsilon$ can be obtained by simply solving the problem $P^0$. Since $\delta>0$ is arbitrary, $u^\delta$ can be chosen to approximate the optimal solution to any desired accuracy.

Proof of Theorem 2. By virtue of Lemma 4, a smooth $\delta$-optimal control $u^\delta$ for Problem $P^0$ exists. The weak convergence results of Theorem 1 then yield
\[
x^\varepsilon(u^\delta,\cdot)\Rightarrow x(u^\delta,\cdot)
\quad\text{and}\quad
J^\varepsilon(u^\delta)\to J(u^\delta). \tag{19}
\]
Since $u^\delta$ is a $\delta$-optimal control for $P^0$, $J(u^\delta)\le v^0+\delta$. In view of (19),
\[
J^\varepsilon(u^\delta)=J(u^\delta)+\Delta_1(\varepsilon)\le v^0+\delta+\Delta_1(\varepsilon), \tag{20}
\]
where $\Delta_1(\varepsilon)\to0$ as $\varepsilon\to0$.


Now, by virtue of Theorem 1, choose $\tilde m^\varepsilon\in R^{\varepsilon,u}$ such that $v^{\varepsilon,u}\ge J^\varepsilon(\tilde m^\varepsilon)-\varepsilon$. Since $R^{\varepsilon,u}$ is relatively compact, there exists a subsequence of $\{\tilde m^\varepsilon(\cdot)\}$ (still denoted by $\{\tilde m^\varepsilon(\cdot)\}$ for notational simplicity) such that $\tilde m^\varepsilon(\cdot)\Rightarrow\tilde m(\cdot)$. Then it follows from Theorem 1 again that
\[
v^0\le J(\tilde m)=v^{\varepsilon,u}+\Delta_2(\varepsilon), \tag{21}
\]
with $\Delta_2(\varepsilon)\to0$ as $\varepsilon\to0$. Combining (20) and (21) above, and noticing that $v^{\varepsilon,u}\le v^\varepsilon\le J^\varepsilon(u^\delta)$, we arrive at
\[
v^{\varepsilon,u}\le v^\varepsilon\le J^\varepsilon(u^\delta)\le v^0+\delta+\Delta_1(\varepsilon)\le v^{\varepsilon,u}+\delta+\Delta_1(\varepsilon)+\Delta_2(\varepsilon). \tag{22}
\]
Taking lim sup as $\varepsilon\to0$,
\[
\limsup_{\varepsilon\to0}|v^\varepsilon-v^0|\le\delta.
\]
Since $\delta$ is arbitrary, $\lim_{\varepsilon\to0}v^\varepsilon=v^0$. By virtue of (22),
\[
0\le J^\varepsilon(u^\delta)-v^\varepsilon\le\delta+\Delta(\varepsilon),
\]
where $\Delta(\varepsilon)\to0$. This yields
\[
\limsup_{\varepsilon\to0}\bigl(J^\varepsilon(u^\delta)-v^\varepsilon\bigr)\le\delta.
\]
The proof of the theorem is complete.
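The limiting relation (19), that the cost of a fixed Lipschitz feedback under the original dynamics converges to its cost under the averaged dynamics, is what lets a $\delta$-optimal control of $P^0$ be lifted to $P^\varepsilon$. A toy numerical sketch (not from the paper; the closed-loop drift $\dot x=a_{\beta(t)}-x$, obtained by folding a linear feedback into the dynamics, the running cost $x^2$, and all rates are invented for illustration) compares the cost along a fast-chain path with the cost of the averaged dynamics:

```python
import random

def cost_with_chain(a, q12, q21, eps, T, rng, dt=1e-4):
    """J^eps = int_0^T x^2 dt along dx/dt = a[beta(t)] - x (Euler scheme),
    where beta is a two-state chain with generator Q/eps started in state 0."""
    x, state, J, t = 0.0, 0, 0.0, 0.0
    next_jump = rng.expovariate(q12 / eps)
    while t < T:
        while next_jump <= t:            # process all jumps due by time t
            state = 1 - state
            next_jump += rng.expovariate((q12, q21)[state] / eps)
        x += (a[state] - x) * dt
        J += x * x * dt
        t += dt
    return J

def cost_limit(a, q12, q21, T, dt=1e-4):
    """Same cost for the averaged dynamics dx/dt = abar - x."""
    abar = (q21 * a[0] + q12 * a[1]) / (q12 + q21)
    x, J, t = 0.0, 0.0, 0.0
    while t < T:
        x += (abar - x) * dt
        J += x * x * dt
        t += dt
    return J

rng = random.Random(3)
a, q12, q21, T = (1.0, 4.0), 1.0, 2.0, 2.0
J0 = cost_limit(a, q12, q21, T)
Jeps = cost_with_chain(a, q12, q21, 0.01, T, rng)
print(J0, Jeps)   # the two costs agree up to a small fluctuation
```

Designing the feedback for the averaged system and paying only an $o(1)$ cost penalty in the original one is exactly the recipe Theorem 2 justifies.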

6. CONCLUDING REMARKS

Only very recently has the use of asymptotic expansions for singularly perturbed Markov chains, and for chains with weak and strong interactions, come into prominence [9, 10]. The behavior of the underlying Markov chains is captured by the quasi-stationary distribution and the related limit of the aggregated chains [19]. This work demonstrates how the asymptotic properties obtained there can be employed to study nonlinear dynamic systems; it uses a purely probabilistic approach to treat the corresponding control problems. The results should be useful for a wide range of applications in control, optimization, and Markov decision problems; see, for example, [20] and the references therein.


The case in which $u^\varepsilon(\cdot)$ is known is interesting in its own right. In this case it can be incorporated as a state variable, and our work presents a framework for controlled dynamic systems involving singularly perturbed Markov chains. In view of the proofs, for the singularly perturbed chains all we need is the weak convergence of $\overline\alpha^\varepsilon(\cdot)$ to $\overline\alpha(\cdot)$ and of $\overline\beta^\varepsilon(\cdot)$ to $\overline\beta(\cdot)$. For this to hold, smoothness of the generators is not crucial; one may work with measurable generators as in [19, Chapter 7.5]. Thus Condition (Q) can be weakened by requiring only the weak irreducibility of each submatrix.

As far as estimation problems are concerned, multi-scaled stochastic approximation remains a challenging problem [15]. For controlled systems, one may consider a problem in which, in addition to the current formulation, a fast-changing noise process represents an additional source of random disturbances. Another question worth studying is a controlled dynamic system involving both Markov pure jump processes with weak and strong interactions and diffusion processes. Adding another layer of complexity, one may wish to study nearly optimal controls of systems consisting of both singularly perturbed Markov chains and singularly perturbed diffusions [8]. Moreover, a variation motivated by practical concerns calls for replacing the diffusion processes by wide-band noise disturbances. Other extensions include controlled dynamic systems with multiple small parameters, in which the relative rates of change become crucial, and discrete-time systems, for which new formulations are needed since one can no longer use generators of Markov chains. Our hope is that the current paper will serve as a seed to open up new paths for future investigation.

REFERENCES

1. A. Bensoussan, "Perturbation Methods in Optimal Control," Wiley, Chichester, 1988.
2. K. L. Chung, "Markov Chains with Stationary Transition Probabilities," 2nd ed., Springer-Verlag, New York, 1967.
3. M. H. A. Davis, "Markov Models and Optimization," Chapman & Hall, New York, 1993.
4. G. B. Di Masi and Yu. M. Kabanov, A first order approximation for the convergence of distributions of the Cox processes with fast Markov switchings, Stochastics Stochastics Rep. 54 (1995), 211-219.
5. S. N. Ethier and T. G. Kurtz, "Markov Processes: Characterization and Convergence," Wiley, New York, 1986.
6. W. H. Fleming, Generalized solution in optimal stochastic control, in "Proc. URI Conf. on Control," 1982, pp. 147-165.
7. V. G. Gaitsgori and A. A. Pervozvanskii, Aggregation of states in a Markov chain with weak interactions, Kybernetika (1975), 91-98.
8. R. Z. Khasminskii and G. Yin, Asymptotic series for singularly perturbed Kolmogorov-Fokker-Planck equations, SIAM J. Appl. Math. 56 (1996), 1766-1793.


9. R. Z. Khasminskii, G. Yin, and Q. Zhang, Asymptotic expansions of singularly perturbed systems involving rapidly fluctuating Markov chains, SIAM J. Appl. Math. 56 (1996), 277-293.
10. R. Z. Khasminskii, G. Yin, and Q. Zhang, Constructing asymptotic series for probability distribution of Markov chains with weak and strong interactions, Quart. Appl. Math. LV (1997), 177-200.
11. P. V. Kokotovic, H. K. Khalil, and J. O'Reilly, "Singular Perturbation Methods in Control," Academic Press, London, 1986.
12. H. J. Kushner, "Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory," MIT Press, Cambridge, MA, 1984.
13. H. J. Kushner, "Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems," Birkhäuser, Boston, 1990.
14. H. J. Kushner and W. Runggaldier, Nearly optimal state feedback controls for stochastic systems with wideband noise disturbances, SIAM J. Control Optim. 25 (1987), 289-315.
15. H. J. Kushner and G. Yin, "Stochastic Approximation Algorithms and Applications," Springer-Verlag, New York, 1997.
16. R. G. Phillips and P. V. Kokotovic, A singular perturbation approach to modelling and control of Markov chains, IEEE Trans. Automat. Control 26 (1981), 1087-1094.
17. S. P. Sethi and Q. Zhang, "Hierarchical Decision Making in Stochastic Manufacturing Systems," Birkhäuser, Boston, 1994.
18. G. Yin and Q. Zhang, Near optimality of stochastic control in systems with unknown parameter processes, Appl. Math. Optim. 29 (1994), 263-284.
19. G. Yin and Q. Zhang, "Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach," Springer-Verlag, New York, 1997.
20. G. Yin and Q. Zhang (Eds.), "Mathematics of Stochastic Manufacturing Systems" (Proc. 1996 AMS-SIAM Summer Seminar in Applied Mathematics), Lectures in Applied Mathematics, Amer. Math. Soc., Providence, RI, 1997.
21. J. Warga, Relaxed variational problems, J. Math. Anal. Appl. 4 (1962), 111-128.
22. Q. Zhang and G. Yin, A central limit theorem for singularly perturbed nonstationary finite state Markov chains, Ann. Appl. Probab. 6 (1996), 650-670.