Automatica 39 (2003) 1339–1351
Stability of Markov modulated discrete-time dynamic systems

G. Yin (Department of Mathematics, Wayne State University, Detroit, MI 48202, USA)
Q. Zhang (Department of Mathematics, University of Georgia, Athens, GA 30602, USA)
Received 17 March 2002; received in revised form 21 November 2002; accepted 7 March 2003
Abstract

Motivated by various control applications, this paper develops stability analysis of discrete-time systems with regime switching, in which the dynamics are modulated by Markov chains with two time scales. Using the high contrast of the different transition rates among the Markovian states, singularly perturbed Markov chains are used in the formulation. Taking the regime changes and the time-scale separation into consideration makes the stability analysis a difficult task. In this work, asymptotic stability analysis is carried out using perturbed Liapunov function techniques. It is demonstrated that if the limit system is stable, then the original system is also stable. In addition, we examine path excursion, derive bounds on mean recurrence time, and obtain the associated probability bounds. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Difference equation; Markov chain; Singular perturbation; Stability; Recurrence; Path excursion
1. Introduction

This work is devoted to stability of Markov modulated dynamic systems. The models under consideration are discrete-time systems subject to regime switching, across which the behavior of the systems can be markedly different. Our interest lies in
0005-1098/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0005-1098(03)00133-X
Altman and Borovkov (1997) for a queueing system with retrials. For additional recent progress on stability of hybrid systems, we refer to Badowski and Yin (2002), Ji and Chizeck (1990), Mao (1994, 1999), and Tsai (1998), among others. In this paper, we consider the case in which the underlying systems are modeled by difference equations, possibly subject to an additional exogenous random noise input. The dynamics of the discrete-time system are subject to regime changes that are modulated by a Markov chain. Recent study of hybrid systems has indicated that such a formulation is general and appropriate for a wide variety of applications; see, for example, feedback linear systems (Blair & Sworder, 1986), robust linear control (Mariton & Bertrand, 1985), Markov decision problems (Liu, Zhang, & Yin, 2001), and portfolio selection problems and nearly optimal controls (Zhang & Yin, 2001). Due to various considerations, the Markov chain often has a large state space, and finding the optimal control of the underlying systems is frequently deemed infeasible. To overcome the difficulties, by noting the high contrast of the transition rates and introducing a small parameter ε > 0, one can incorporate singularly perturbed Markov chains into the problem under consideration. (For some of the recent developments on singularly perturbed Markovian systems, we refer to Abbad, Filar, and Bielecki (1992) and Pervozvanskii and Gaitsgori (1988), among others.) Note that the introduction of the small parameter ε > 0 is only a convenient way for the
purpose of time-scale separation. In our previous work, we have concentrated on the hierarchical approach (see Simon & Ando, 1961). The task of reduction of complexity is accomplished by showing that the underlying system converges to a limit system in which the coefficients of the dynamics are averaged out with respect to the invariant measures of the Markov chain. Then, using the optimal controls of the limit systems as a reference, we construct controls for the original systems and demonstrate their asymptotic optimality. In Liu et al. (2001) and Liu, Zhang, and Yin (2002), for discrete-time systems with running time k, we focused on asymptotic properties as ε → 0 and k → ∞ with εk remaining bounded. In this paper, we examine the reduction of complexity from a different angle. The main effort is devoted to stability, i.e., the behavior of the systems as ε → 0, k → ∞, and εk → ∞. We show that if the limit system (or reduced system) is stable, then the original system is also stable for sufficiently small ε > 0. Dealing with a discrete-time system (e.g., x_{k+1} = φ(x_k) for an appropriate function φ(·)) directly, even without random disturbances, one needs to calculate V(x_{k+1}) − V(x_k) (see LaSalle, 1979, p. 5), which is more involved than differentiation along the solution, (d/dt)V(x), for a continuous-time system. The approach presented in this paper uses the Liapunov function of the limit system to carry out the needed analysis, so it is simpler than treating the original discrete-time system directly. To achieve the reduction of complexity, the original dynamic systems are compared with the limit systems, and perturbed Liapunov function methods are then used to obtain the desired bounds. Our results indicate that one can concentrate on the properties of much simpler limit systems to draw inferences about the stability of the original, more complex systems.

The rest of the paper is arranged as follows.
Section 2 gives the precise formulation of the problem, in which the systems to be studied are set forth. It also presents several auxiliary results that are needed in the subsequent study.
2. Formulation and preliminary results

As alluded to in the introduction, to highlight the contrasts of different transition rates, we introduce a small parameter ε > 0. Consider a discrete-time Markov chain α_k^ε with state space M = {1, ..., m} and transition probability matrix

P^ε = P + εQ,  (1)

where P is a transition probability matrix of a time-homogeneous Markov chain and Q = (q_{ij}) is a generator of a continuous-time Markov chain, i.e., for each i ≠ j, q_{ij} ≥ 0 and for each i, Σ_j q_{ij} = 0 (equivalently, Q1_m = 0). Here and hereafter, 1_ℓ ∈ R^{ℓ×1} denotes a column vector with all components being 1. The specific systems of interest are

x_{k+1}^ε = x_k^ε + ε f(x_k^ε, α_k^ε), x_0^ε = x_0,  (2)

and its counterpart with an exogenous noise input

x_{k+1}^ε = x_k^ε + ε f(x_k^ε, α_k^ε) + √ε σ(x_k^ε, α_k^ε) w_k,  (3)

where f(·) and σ(·) are appropriate functions and {w_k} is a sequence of external disturbances.
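As an illustration of the two-time-scale structure above, the following Python sketch builds P^ε = P + εQ and simulates (2) with a regime-dependent drift. All numerical values here are hypothetical and chosen only to exhibit the mechanism.

```python
import numpy as np

# Two-time-scale setup of (1)-(2): P governs fast within-class jumps, the
# generator Q contributes rare O(eps) jumps between classes.  The matrices
# below are illustrative only.
eps = 0.01
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.55, 0.45, 0.0, 0.0],
              [0.0, 0.0, 0.4, 0.6],
              [0.0, 0.0, 0.5, 0.5]])        # block-diagonal: two recurrent classes
Q = np.array([[-0.6, 0.3, 0.3, 0.0],
              [0.0, -0.3, 0.3, 0.0],
              [0.2, 0.1, -0.5, 0.2],
              [0.1, 0.3, 0.0, -0.4]])       # generator: each row sums to zero
P_eps = P + eps * Q

assert np.allclose(P_eps.sum(axis=1), 1.0)  # still a stochastic matrix
assert np.all(P_eps >= 0)

def simulate(f, x0, alpha0, n_steps, rng):
    """Iterate x_{k+1} = x_k + eps * f(x_k, alpha_k) driven by alpha_k^eps."""
    x, a = np.asarray(x0, dtype=float), alpha0
    for _ in range(n_steps):
        x = x + eps * f(x, a)
        a = rng.choice(len(P_eps), p=P_eps[a])
    return x

# toy regime-dependent drift rates, stable in every regime
rates = [-1.0, -2.0, -0.5, -1.5]
x_T = simulate(lambda x, a: rates[a] * x, [1.0], 0, 2000, np.random.default_rng(0))
```

Since every regime here is contracting, the state decays regardless of the switching path; the interesting cases treated in the paper are those where only the averaged system is known to be stable.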
As ε → 0, the dynamic system (2) is close to an averaged system or a limit system in an appropriate sense. Given the stability of the limit system, our interest is to establish
the stability of the dynamic system governed by (3), which will be carried out using the stability of its limit system.

Example 1. With x_0^ε = x, α_0^ε = α, and k = 0, 1, 2, ..., consider the following discrete-time control system:

x_{k+1}^ε = x_k^ε + ε(A(α_k^ε) x_k^ε + B(α_k^ε) u_k) + √ε w_k,  (4)
where x_k^ε represents the state, u_k is the control variable, w_k denotes the system disturbance, and A(i) and B(i) are matrices of appropriate dimensions. If we take a linear feedback control u_k = H(α_k^ε)x_k^ε for some matrices H(α), then the resulting system has the form of (3) with f(x, α) = (A(α) + B(α)H(α))x and σ(x, α) = 1. Such discrete-time models arise in many applications directly because observations and state data are often collected in discrete time; see also the related LQ control problem in Liu et al. (2002). The system in (4) may also arise from a continuous-time system via discretization. For example, consider a continuous-time hybrid control problem

dx^ε(t) = (A(α^ε(t)) x^ε(t) + B(α^ε(t)) u(t)) dt + σ dw(t),  (5)
where w(·) is a standard Brownian motion and α^ε(t) is a continuous-time Markov chain generated by Q^ε = Q̃/ε + Q̂, where both Q̃ and Q̂ are generators and Q̃ has a block-diagonal form (see Yin & Zhang (1998, Chapter 9) for more details). Discretizing (5) with a step size ε leads to a control problem of the form given by (4).

Example 2. Emerging applications have also been found in the area of wireless communications. For example, in a code-division multiple-access (CDMA) system, to efficiently serve heterogeneous traffic (e.g., data, video) in a dynamic environment, it is often necessary that the design involve adaptive optimization at the receiver and transmitter. There is a growing literature (see the references cited in Krishnamurthy, Wang, & Yin, 2002) on adaptive multiuser detection for CDMA systems. In Krishnamurthy et al. (2002), adaptive spreading code optimization at the transmitter was considered using discrete stochastic optimization algorithms. Let {α_n} be a discrete-time Markov chain with
slowly varying transitions; the analysis there involves a suitably scaled sequence of estimation errors, which in turn leads to a switching diffusion limit. The result to be presented in this paper provides the stability analysis from a long-time-behavior point of view for such systems.

To visualize the behavior of systems with regime switching, we present a couple of examples. As can be seen from the graphs, although regime switching is involved, the systems preserve stability. Since they are for illustration purposes, only simple examples are used. In the sections to follow, we show that the stability of such systems can be analyzed via their corresponding limit or reduced systems.

Example 3. Consider a linear system with switching regime x_{k+1}^ε = x_k^ε + εA(α_k^ε)x_k^ε + √ε σ(α_k^ε)w_k, where x_k^ε ∈ R², α_k^ε is a Markov chain with state space M = {s_11, s_12, s_21, s_22},

A(s_11) = [−1 0; 0 −2], A(s_12) = [−4 3; 1 −5], A(s_21) = [−3 −1; 0 −2], A(s_22) = [−4 1; −1 −3],

and {w_k} is a sequence of R²-valued normally distributed random variables with mean 0 and covariance the identity matrix I. The transition probability matrix of α_k^ε is given by P^ε = P + εQ, where
P = [0.5 0.5 0 0; 0.55 0.45 0 0; 0 0 0.4 0.6; 0 0 0.5 0.5],

Q = [−0.6 0.3 0.3 0; 0 −0.3 0.3 0; 0.2 0.1 −0.5 0.2; 0.1 0.3 0 −0.4].
We plot the trajectories of the stochastic difference equation with ε = 0.01. The results are displayed in Fig. 1.

Example 4. This example is concerned with a two-dimensional nonlinear system. Again, P^ε = P + εQ with

P = [0.9 0.1; 0.15 0.85], Q = [−0.3 0.3; 0.5 −0.5].
The system is given by (3), with f(x, 1) = −(2x_1, 3x_2)′, f(x, 2) = (−(x_1 + x_2)², −(x_2 + x_1x_2))′, σ(x, 1) = I, and σ(x, 2) = 0.6I, where I is the 2 × 2 identity matrix. With ε = 0.01, we plot the trajectories of the two components in Fig. 2.
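The switching linear system of Example 3 can be simulated directly. The following sketch uses the matrix entries as printed above; since they were reconstructed from a damaged scan, treat them as illustrative values.

```python
import numpy as np

# Example 3 sketch: x_{k+1} = x_k + eps*A(alpha_k)x_k + sqrt(eps)*sigma(alpha_k)*w_k
# with a four-state modulating chain alpha_k^eps, transition matrix P + eps*Q.
eps = 0.01
A = [np.array([[-1., 0.], [0., -2.]]),
     np.array([[-4., 3.], [1., -5.]]),
     np.array([[-3., -1.], [0., -2.]]),
     np.array([[-4., 1.], [-1., -3.]])]
sigma = [0.2, 0.15, 0.15, 0.15]            # sigma(s_ij) = c*I, as in Fig. 1
P = np.array([[0.5, 0.5, 0., 0.],
              [0.55, 0.45, 0., 0.],
              [0., 0., 0.4, 0.6],
              [0., 0., 0.5, 0.5]])
Q = np.array([[-0.6, 0.3, 0.3, 0.],
              [0., -0.3, 0.3, 0.],
              [0.2, 0.1, -0.5, 0.2],
              [0.1, 0.3, 0., -0.4]])
P_eps = P + eps * Q

rng = np.random.default_rng(1)
x, a = np.array([5.0, 3.0]), 0             # initial state and regime
for k in range(500):
    w = rng.standard_normal(2)
    x = x + eps * A[a] @ x + np.sqrt(eps) * sigma[a] * w
    a = rng.choice(4, p=P_eps[a])

xnorm = float(np.linalg.norm(x))           # should be small: every A is Hurwitz
```

Each A(s_ij) here is Hurwitz, so the trajectory contracts toward a noise-dominated neighborhood of the origin, matching the qualitative behavior shown in Fig. 1.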
Fig. 1. Trajectories of x_k^ε for the two-dimensional problem: four-state Markov chain; ε = 0.01; σ(s_11) = 0.2I, σ(s_12) = 0.15I, σ(s_21) = 0.15I, and σ(s_22) = 0.15I: (a) x_{k,1}^ε and (b) x_{k,2}^ε.

Fig. 2. Trajectories of x_k^ε for the two-dimensional problem: two-state Markov chain; ε = 0.01, σ(x, 1) = I, and σ(x, 2) = 0.6I: (a) x_{k,1}^ε and (b) x_{k,2}^ε.
2.2. Structure of the P matrix

Owing to the presence of the small parameter ε > 0, the structure of the matrix P plays a crucial role. It is well known (e.g., Iosifescu, 1980, p. 94) that, by suitable rearrangement, the transition matrix of a Markov chain consisting of l recurrent classes can be put into the block-diagonal form

P = diag(P_1, ..., P_l),  (6)
where for each i ≤ l, P_i is a transition matrix within the ith recurrent class. Here and hereafter, diag(A_1, ..., A_j) denotes a block-diagonal matrix with matrix entries A_1, ..., A_j of appropriate dimensions. A Markov chain with transition matrix given by (6) consists of l recurrent classes. Corresponding to the transition matrix given in (6), the state space M admits the decomposition M = M_1 ∪ M_2 ∪ · · · ∪ M_l, with M_i = {s_{i1}, ..., s_{im_i}} satisfying |M_i| = m_i and Σ_{i=1}^l m_i = m. In what follows, ν_i = (ν_{i1}, ..., ν_{im_i}) denotes the stationary distribution associated with P_i.
2.3. Preliminary results

Suppose that α_k^ε has transition probabilities given by (1) with P having the block-diagonal form (6). For some T > 0 and for each k = 0, 1, ..., T/ε, the probability vector p_k^ε = (P(α_k^ε = 1), ..., P(α_k^ε = m)) ∈ R^{1×m} satisfies

p_{k+1}^ε = p_k^ε P^ε, p_0^ε = p_0,  (7)

where p_{0,i} ≥ 0 and p_0 1_m = Σ_{i=1}^m p_{0,i} = 1, with p_{0,i} denoting the ith component of p_0. Note that p_0 is independent of ε and is the initial probability distribution. In the above, it is understood that T/ε is the integer part of T/ε; for simplicity, we have suppressed the floor-function notation ⌊T/ε⌋.

Definition 5. In what follows, for an appropriate function h(·) (either h(·): R^n × M → R^n or h(·): R^n × M → R^{n×n}), we say h(·) satisfies a polynomial growth condition of order n_0 if there is a constant K > 0 such that |h(x, α)| ≤ K(1 + |x|^{n_0}) for each α ∈ M.
(A3) σ(·) satisfies the polynomial growth condition of Definition 5.

For k ≥ 0, i ≤ l, and j ≤ m_i, define the aggregated process ᾱ_k^ε by ᾱ_k^ε = i when α_k^ε ∈ M_i, together with the scaled occupation measures

o_{k,ij}^ε = ε Σ_{r=0}^{k−1} (I_{{α_r^ε = s_{ij}}} − ν_{ij} I_{{ᾱ_r^ε = i}}),  (8)

and O_k^ε = (o_{k,ij}^ε; i ≤ l, j ≤ m_i) ∈ R^{1×m}. The following proposition summarizes results on asymptotic expansions, mean-square bounds on occupation measures, and the limit behavior of the aggregation of states. A sketch of its proof can be found in Yin, Zhang, and Badowski (2003).

Proposition 7. Assume condition (A1). Then the following assertions hold:

(a) For the probability distribution vector p_k^ε, we have

|p_k^ε − θ(t) diag(ν_1, ..., ν_l)| ≤ K(ε(k + 1) + λ^k)  (9)

for some λ with 0 < λ < 1, where θ(t) = (θ_1(t), ..., θ_l(t)) ∈ R^{1×l} satisfies

dθ(t)/dt = θ(t)Q̄, θ_i(0) = p_{0,i} 1_{m_i},

with t = εk, p_0 = (p_{0,1}, ..., p_{0,l}), p_{0,i} ∈ R^{1×m_i}, and

Q̄ = diag(ν_1, ..., ν_l) Q 1̃, 1̃ = diag(1_{m_1}, ..., 1_{m_l}).  (10)

(b) The k-step transition probability matrix (P^ε)^k satisfies

|(P^ε)^k − Θ(εk)| ≤ K(ε(k + 1) + λ^k),  (11)

where

Θ(t) = 1̃ Ψ(t) diag(ν_1, ..., ν_l), dΨ(t)/dt = Ψ(t)Q̄, Ψ(0) = I.  (12)

(c) sup_{0≤k≤T/ε} E|o_{k,ij}^ε|² = O(ε) for i = 1, ..., l and j = 1, ..., m_i.

(d) Define ᾱ^ε(t) = ᾱ_k^ε for t ∈ [εk, εk + ε). Then ᾱ^ε(·) converges weakly to ᾱ(·), a continuous-time Markov chain generated by Q̄ defined in (10).

To further our study, we present a lemma giving a mean-square error estimate on an infinite horizon.

Lemma 8. Define

χ_{ij}^ε = ε Σ_{k=0}^∞ e^{−εk} [I_{{α_k^ε = s_{ij}}} − ν_{ij} I_{{α_k^ε ∈ M_i}}].  (13)

Then E(χ_{ij}^ε)² = O(ε) for i = 1, ..., l and j = 1, ..., m_i.

Remark 9. Note that the occupation measures defined in (8) are taken over a finite horizon 0 ≤ k ≤ T/ε, whereas (13) is an exponentially discounted analogue over an infinite horizon.
Lemma 10. The following assertions hold:

(a) Under (A1) and (A2), the sequence x^ε(·) given by (2) converges weakly to x(·) that satisfies x(0) = x_0, ᾱ(0) = ᾱ_0, and

dx(t)/dt = f̄(x(t), ᾱ(t)),  (14)

where for i = 1, 2, ..., l, f̄(x, i) = Σ_{j=1}^{m_i} ν_{ij} f(x, s_{ij}).

(b) Assume conditions (A1)–(A4). Then the sequence (x^ε(·), ᾱ^ε(·)) converges weakly to (x(·), ᾱ(·)) such that x(0) = x_0, ᾱ(0) = ᾱ_0, and

dx(t) = f̄(x(t), ᾱ(t)) dt + σ̄(x(t), ᾱ(t)) dw,  (15)

where for each i ∈ M̄ = {1, ..., l},

σ̄(x, i)σ̄′(x, i) = Σ̄(x, i) := Σ_{j=1}^{m_i} ν_{ij} σ(x, s_{ij})σ′(x, s_{ij}).  (16)
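The averaging in Lemma 10 is a finite computation once the ν_{ij} are known. A sketch for class i = 1 of Example 3, where the drift is linear, f(x, s_{1j}) = A(s_{1j})x:

```python
import numpy as np

# Lemma 10 in miniature: fbar(x,1) = sum_j nu_1j f(x, s_1j) and
# Sigma_bar(x,1) = sum_j nu_1j sigma sigma'.  Values taken from Example 3;
# nu_1 = (11/21, 10/21) is the stationary law of the first block P_1.
A11 = np.array([[-1., 0.], [0., -2.]])
A12 = np.array([[-4., 3.], [1., -5.]])
nu1 = np.array([11 / 21, 10 / 21])

Abar1 = nu1[0] * A11 + nu1[1] * A12        # fbar(x,1) = Abar1 @ x for linear f
x = np.array([1.0, -1.0])
fbar = Abar1 @ x
assert np.allclose(fbar, nu1[0] * A11 @ x + nu1[1] * A12 @ x)

sig11, sig12 = 0.2 * np.eye(2), 0.15 * np.eye(2)
Sigma_bar1 = nu1[0] * sig11 @ sig11.T + nu1[1] * sig12 @ sig12.T
assert np.all(np.linalg.eigvalsh(Sigma_bar1) > 0)   # a valid covariance
```

Here both regimes are already stable, so Ā_1 is stable as well; the interesting case for the theory is when the average, not each regime, is stable.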
polynomial growth of order n_0. It follows from this condition that |∂^ℓ V(x, i)|[|f(x, i)|^ℓ + |σ(x, i)|^ℓ] ≤ K(V(x, i) + 1). Concerning the notation ∂^ℓ V(x, i): if ℓ = 1, ∂V(x, i) is the gradient; if ℓ = 2, ∂²V(x, i) is the Hessian; if ℓ > 2, ∂^ℓ V(x, i) is understood in the usual multi-index notation for mixed partial derivatives. References on matrix calculus and Kronecker products can be found, for instance, in Graham (1981), among others.

3. Stability

Use E_k to denote the conditional expectation with respect to F_k, the σ-algebra generated by {x_0, α_j^ε: j < k} for (2) and by {x_0, α_j^ε, w_j: j < k} for (3), respectively.

Theorem 12. Assume that (A1), (A2), and (A5) hold, that for i ∈ M̄ = {1, ..., l},

LV(x, i) ≤ −γV(x, i) for some γ > 0,  (17)

where the operator L is defined by

LV(x, i) = V_x(x, i)f̄(x, i) + Q̄V(x, ·)(i), Q̄V(x, ·)(i) = Σ_{j=1}^l q̄_{ij} V(x, j),  (18)

and that EV(x_0, ᾱ_0) < ∞. Then

EV(x_{k+1}^ε, ᾱ_{k+1}^ε) ≤ exp(−γεk) EV(x_0, ᾱ_0) + O(ε).  (19)

Note that Q̄V(x, ·)(i) is the coupling term associated with the switching Markov chain; it is the ith component of the column vector Q̄(V(x, 1), ..., V(x, l))′. For more detail on generators associated with Markov processes, see Ethier and Kurtz (1986).

Example 13. As an illustration, let α_k^ε be a two-state Markov chain with transition matrix given by (1) such that P is irreducible and aperiodic. For ι ∈ M = {1, 2}, consider the linear system x_{k+1}^ε = x_k^ε + εA(α_k^ε)x_k^ε, where A(ι) = diag(a(ι), b(ι)) and x_k^ε ∈ R². The limit of this system according to (a) in Lemma 10 is given by ẋ = Āx, where Ā = ν_1 A(1) + ν_2 A(2). Suppose that Ā < 0; then the quadratic function V(x) = x′x satisfies (17), and Theorem 12 applies.

Theorem 14. Assume that (A1)–(A5) hold, that for i ∈ M̄,

LV(x, i) ≤ −γV(x, i) for some γ > 0,  (20)

where now LV(x, i) = V_x(x, i)f̄(x, i) + (1/2)tr[V_{xx}(x, i)Σ̄(x, i)] + Q̄V(x, ·)(i), and that EV(x_0, ᾱ_0) < ∞. Then

EV(x_{k+1}^ε, ᾱ_{k+1}^ε) ≤ exp(−γεk) EV(x_0, ᾱ_0) + O(ε).  (21)
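The mechanism behind Example 13, namely an unstable regime dominated on average by a stable one, can be checked numerically. All values below are made up for the sketch.

```python
import numpy as np

# Example 13 in miniature: even with one unstable regime, the averaged matrix
# Abar = nu_1 A(1) + nu_2 A(2) can be stable, and V(x) = x'x then satisfies
# LV(x) = 2 x' Abar x <= -gamma V(x) for the limit system.
P = np.array([[0.3, 0.7], [0.6, 0.4]])     # irreducible, aperiodic (toy values)
w, v = np.linalg.eig(P.T)
nu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
nu = nu / nu.sum()                          # nu = (6/13, 7/13)

A1 = np.diag([-2.0, -3.0])                  # stable regime
A2 = np.diag([0.5, 0.2])                    # unstable regime
Abar = nu[0] * A1 + nu[1] * A2
assert np.all(np.diag(Abar) < 0)            # Abar < 0: limit system is stable

gamma = 2 * min(-np.diag(Abar))             # decay rate in (17) for V(x) = x'x
rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(2)
    assert 2 * x @ Abar @ x <= -gamma * (x @ x) + 1e-9
```

Theorem 12 then transfers this decay to the original switching system for sufficiently small ε, up to an O(ε) residual.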
Example 15. Suppose that x ∈ R^n and α_k^ε is a discrete-time Markov chain with M = {1, ..., m}, whose transition probability matrix is given by (1) such that P is irreducible and aperiodic. The system modulated by the Markov chain is x_{k+1}^ε = x_k^ε + εA(α_k^ε)x_k^ε + √ε B(α_k^ε)x_k^ε w_k, where for each ι ∈ M, A(ι) and B(ι) are n × n matrices such that A(ι) is Hurwitz, x′B(ι)B′(ι)x + x′A(ι)x < 0 for x ≠ 0, and {w_k} satisfies (A4).
4. Recurrence and path excursion

We have established bounds on EV(x_k^ε, ᾱ_k^ε) for large k and small ε. In this section, we examine the recurrence and path-excursion properties of the dynamic systems associated with the singularly perturbed discrete-time Markov chains. We will use the following assumption.

(A5′) For each i = 1, ..., l, there is a twice continuously differentiable Liapunov function V(x, i) such that |V_x(x, i)x| ≤ K(V(x, i) + 1) for each i, that min_x V(x, i) = 0, that V_x(x, i)f(x, i) ≤ −c_0 for some c_0 > 0, and that V(x, i) → ∞ as |x| → ∞.

For some λ_0 > 0 and λ_1 > λ_0, define the sets B_0 = {x: V(x, i) ≤ λ_0, i ∈ M̄} and B_1 = {x: V(x, i) ≤ λ_1, i ∈ M̄}, and let τ_0 and τ_1 denote the associated entrance and exit times of x_k^ε.

Theorem 17. Assume (A1), (A2), and (A5′). Then for some c_1 > 0,

E_0[τ_1 ∧ (τ_0 + k) − τ_0] ≤ [E_0 V(x_0, ᾱ_0)(1 + O(ε)) + O(ε)]/c_1,  (22)

where E_0 denotes the conditional expectation with respect to the σ-algebra F_0.

Remark 18. The above theorem indicates that, for sufficiently small ε, if x_0 ∈ B_1 − B_0, then the conditional mean recurrence time of τ_1 − τ_0 has an upper bound of the form [λ_1(1 + O(ε)) + O(ε)]/c_1. In addition to the conditional moment bound, we may also obtain a probability bound of the form: for δ > 0,

P( sup_{τ_0 ≤ k < τ_1} V(x_k^ε, ᾱ_k^ε) ≥ δ | F_{τ_0} ) ≤ [E_0 V(x_0, ᾱ_0)(1 + O(ε)) + O(ε)]/δ.

Compared with Theorems 12 and 14, the growth conditions here are much relaxed. The main reason is that we do not need the moment estimates, since we can work with a truncated Liapunov function.
5. Further remarks

5.1. Extensions

The results obtained can be extended to the case in which the transition matrix P in (1) contains not only recurrent states but also transient states. The state space is now decomposed as M = M_1 ∪ M_2 ∪ · · · ∪ M_l ∪ M_*, where the M_i are as before and M_* = {s_{*1}, ..., s_{*m_*}} consists of transient states; P takes the form P = [diag(P_1, ..., P_l), 0; (P_{*,1}, ..., P_{*,l}), P_*]. Partitioning Q accordingly as Q = [Q^{11}, Q^{12}; Q^{21}, Q^{22}], define

Q̄_* = diag(ν_1, ..., ν_l)(Q^{11} 1̃ + Q^{12} A_*),  (23)

where A_* = (a_1, ..., a_l) ∈ R^{m_*×l}, with a_i = −(P_* − I)^{−1} P_{*,i} 1_{m_i} for i = 1, ..., l. Now the aggregated process ᾱ_k^ε is modified as

ᾱ_k^ε = i if α_k^ε ∈ M_i, and ᾱ_k^ε = U_j if α_k^ε = s_{*j},

with

U_j = I_{{0 ≤ U ≤ a_{1,j}}} + 2I_{{a_{1,j} < U ≤ a_{1,j}+a_{2,j}}} + · · · + lI_{{a_{1,j}+···+a_{l−1,j} < U ≤ 1}},
and U is a random variable uniformly distributed on [0, 1], independent of α_k^ε. Define the interpolation ᾱ^ε(·) of ᾱ_k^ε as ᾱ^ε(t) = ᾱ_k^ε for t ∈ [εk, εk + ε). Then we can show that ᾱ^ε(·) converges weakly to ᾱ(·), a Markov chain generated by Q̄_*. The results on asymptotic expansions with inclusion of transient states and nonhomogeneous Markov chains can be found in Yin and Zhang (2000), whereas further asymptotic properties are in Yin et al. (2003). With the preparation above, we can carry out the stability analysis; the proofs and techniques are similar. The main idea is that the transient states do not contribute anything to the limit systems: by aggregating only the states in each recurrent class, the limit systems are still averages with respect to the stationary measures of the recurrent states.

5.2. Further investigation

We have studied stability problems arising from discrete-time dynamic systems. The main results indicate that, under appropriate conditions, stability of the limit continuous-time dynamic systems implies that of the original problems. A number of interesting problems remain open. For example, one may wish to consider invariance principles associated with the dynamic systems, suitably modified to account for the regime switching.
Acknowledgements This research is supported in part by the NSF Grant DMS-9877090, the USAF Grant F30602-99-2-0548 and the ONR Grant N00014-96-1-0263.
Appendix A. Proofs of results

Proof of Lemma 8. The proof relies mainly on the Markovian transition properties. The estimates on infinite sums use e^{−εk} as a discount factor; see Solo and Kong (1995) and Kushner and Yin (1997) for its usage in recursive algorithms. Expanding the summation in (13) leads to

E(χ_{ij}^ε)² ≤ 2ε² Σ_{k=0}^∞ Σ_{ℓ=0}^k e^{−εk} e^{−εℓ} E η_{ij}^k η_{ij}^ℓ,

where η_{ij}^k = I_{{α_k^ε = s_{ij}}} − ν_{ij} I_{{α_k^ε ∈ M_i}}. Similar to our previous work (Yin et al., 2003), it can be shown that for k ≥ ℓ,

E η_{ij}^k η_{ij}^ℓ = O(ε + kε² + k²ε³ + λ^ℓ + λ^{k−ℓ}).

It follows that E(χ_{ij}^ε)² ≤ Kε² Σ_{k=0}^∞ e^{−εk}(1 + εk + ε²k² + ε³k³) = O(ε). The desired result thus follows.

Sketch of the proof of Lemma 10(b). Since our main effort in this paper concerns stability analysis, most of the details of the weak convergence argument are omitted due to page limitation. We provide a sketch of the proof for Lemma 10(b); the proof for Lemma 10(a) is even simpler. Working with the pair (x^ε(·), ᾱ^ε(·)), as in the usual weak convergence analysis, one first establishes tightness and then characterizes the limit process.
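Lemma 8 can also be checked by simulation. The following Monte Carlo sketch (using the illustrative four-state chain of Example 3) estimates E(χ_{11}^ε)² for two values of ε and observes the shrinkage predicted by the O(ε) bound.

```python
import numpy as np

# Monte Carlo sanity check of Lemma 8 (an illustration, not part of the proof):
# chi_11 = eps * sum_k e^{-eps k} [I{alpha_k = s_11} - nu_11 I{alpha_k in M_1}]
# should have second moment of order O(eps).
P = np.array([[0.5, 0.5, 0., 0.],
              [0.55, 0.45, 0., 0.],
              [0., 0., 0.4, 0.6],
              [0., 0., 0.5, 0.5]])
Q = np.array([[-0.6, 0.3, 0.3, 0.],
              [0., -0.3, 0.3, 0.],
              [0.2, 0.1, -0.5, 0.2],
              [0.1, 0.3, 0., -0.4]])
nu11 = 11 / 21                       # first component of the stationary law of P_1

def second_moment(eps, n_rep=200, seed=0):
    rng = np.random.default_rng(seed)
    P_eps = P + eps * Q
    K = int(10 / eps)                # e^{-eps k} is negligible beyond k ~ 10/eps
    vals = []
    for _ in range(n_rep):
        a, chi = 0, 0.0
        for k in range(K):
            chi += np.exp(-eps * k) * ((a == 0) - nu11 * (a <= 1))
            a = rng.choice(4, p=P_eps[a])
        vals.append((eps * chi) ** 2)
    return float(np.mean(vals))

m_big, m_small = second_moment(0.2), second_moment(0.02)
```

With the discount horizon scaling like 1/ε, the estimate at ε = 0.02 comes out roughly an order of magnitude below the one at ε = 0.2, consistent with E(χ_{ij}^ε)² = O(ε).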
Proof of Theorem 12. Define

V̂(x, α) = Σ_{i=1}^l V(x, i) I_{{α ∈ M_i}}.  (A.1)

Step 1 (One-step estimates for V̂): By means of a truncated Taylor expansion,

E_k[V̂(x_{k+1}^ε, α_k^ε) − V̂(x_k^ε, α_k^ε)] = ε E_k V̂_x(x_k^ε, α_k^ε) f(x_k^ε, α_k^ε) + D_{k+1},  (A.2)

where

D_{k+1} = O( Σ_{ℓ=2}^{n_0−1} E_k |∂^ℓ V̂(x_k^ε, α_k^ε)| |x_{k+1}^ε − x_k^ε|^ℓ + E_k |x_{k+1}^ε − x_k^ε|^{n_0} ∫_0^1 |∂^{n_0} V̂(x_k^ε + s(x_{k+1}^ε − x_k^ε))| ds ).

In view of the linear growth of f(x, α) implied by (A2) and the bounds on ∂^ℓ V̂(x, α) for 2 ≤ ℓ ≤ n_0,

D_{k+1} = O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).  (A.3)

Next,

E_k[V̂(x_{k+1}^ε, α_{k+1}^ε) − V̂(x_{k+1}^ε, α_k^ε)] = Σ_{i=1}^l Σ_{j=1}^{m_i} Σ_{i_1=1}^l Σ_{j_1=1}^{m_{i_1}} E_k[V̂(x_{k+1}^ε, s_{ij}) − V̂(x_{k+1}^ε, s_{i_1j_1})] P(α_{k+1}^ε = s_{ij} | α_k^ε = s_{i_1j_1}) I_{{α_k^ε = s_{i_1j_1}}}.

Since P − I is orthogonal to F̂(x_{k+1}^ε), the vector with components V̂(x_{k+1}^ε, s_{ij}), using an estimate similar to the above one, a detailed calculation reveals that

E_k[V̂(x_{k+1}^ε, α_{k+1}^ε) − V̂(x_{k+1}^ε, α_k^ε)] = ε E_k QV̂(x_k^ε, ·)(α_k^ε) + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).

Thus we have

E_k V̂(x_{k+1}^ε, α_{k+1}^ε) − V̂(x_k^ε, α_k^ε) = ε E_k V̂_x(x_k^ε, α_k^ε) f(x_k^ε, α_k^ε) + ε E_k QV̂(x_k^ε, ·)(α_k^ε) + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).  (A.4)

Step 2 (Perturbed Liapunov function): Define

V_1^ε(x, k) = ε Σ_{k_1=k}^∞ e^{ε(k−k_1)} E_k V̂_x(x, α_k^ε)[f(x, α_{k_1}^ε) − f̄(x, ᾱ_{k_1}^ε)].  (A.5)

Observe that for each k_1,

f(x, α_{k_1}^ε) − f̄(x, ᾱ_{k_1}^ε) = Σ_{i=1}^l Σ_{j=1}^{m_i} f(x, s_{ij}) η_{ij}^{k_1},

where η_{ij}^{k_1} = I_{{α_{k_1}^ε = s_{ij}}} − ν_{ij} I_{{α_{k_1}^ε ∈ M_i}}; thus

V_1^ε(x, k) = ε Σ_{i=1}^l Σ_{j=1}^{m_i} Σ_{k_1=k}^∞ e^{ε(k−k_1)} E_k V̂_x(x, α_k^ε) f(x, s_{ij}) η_{ij}^{k_1}.

By virtue of Lemma 8, an application of the Cauchy–Schwarz inequality yields

E|V_1^ε(x, k)| ≤ Σ_{i=1}^l Σ_{j=1}^{m_i} E^{1/2}[V̂_x(x, s_{ij}) f(x, s_{ij})]² E^{1/2}| ε Σ_{k_1=k}^∞ e^{ε(k−k_1)} η_{ij}^{k_1} |² < ∞

uniformly in k ≥ 0. As a result, sup_{k≥0} E|V_1^ε(x, k)| < ∞. In view of (A2), for each i = 1, ..., l and j = 1, ..., m_i, |V̂_x(x, α_k^ε) f(x, s_{ij})| ≤ K(V̂(x, α_k^ε) + 1) for some K > 0, and it can be shown that

E_k η_{ij}^{k_1} = Σ_{i_0=1}^l Σ_{j_0=1}^{m_{i_0}} [P(α_{k_1}^ε = s_{ij} | α_k^ε = s_{i_0j_0}) − ν_{ij} Σ_{k_2=1}^{m_i} P(α_{k_1}^ε = s_{ik_2} | α_k^ε = s_{i_0j_0})] I_{{α_k^ε = s_{i_0j_0}}} = O(ε + λ^{k_1−k}), for some 0 < λ < 1.

It can then be seen that |V_1^ε(x, k)| ≤ εK(E_k V̂(x, α_k^ε) + 1). Furthermore,

E_k[V_1^ε(x_{k+1}^ε, k+1) − V_1^ε(x_{k+1}^ε, k)]
= ε Σ_{k_1=k+1}^∞ e^{ε(k+1−k_1)} E_k[V̂_x(x_{k+1}^ε, α_{k+1}^ε) − V̂_x(x_{k+1}^ε, α_k^ε)][f(x_{k+1}^ε, α_{k_1}^ε) − f̄(x_{k+1}^ε, ᾱ_{k_1}^ε)]
+ ε Σ_{k_1=k+1}^∞ e^{ε(k+1−k_1)} E_k V̂_x(x_{k+1}^ε, α_k^ε)[f(x_{k+1}^ε, α_{k_1}^ε) − f̄(x_{k+1}^ε, ᾱ_{k_1}^ε)]
− ε Σ_{k_1=k}^∞ e^{ε(k−k_1)} E_k V̂_x(x_{k+1}^ε, α_k^ε)[f(x_{k+1}^ε, α_{k_1}^ε) − f̄(x_{k+1}^ε, ᾱ_{k_1}^ε)].  (A.6)

Noting that all but one term cancel in the last two sums above and carrying out estimates similar to those of Step 1, we arrive at

E_k[V_1^ε(x_{k+1}^ε, k+1) − V_1^ε(x_{k+1}^ε, k)] = −ε E_k V̂_x(x_k^ε, α_k^ε)[f(x_k^ε, α_k^ε) − f̄(x_k^ε, ᾱ_k^ε)] + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).  (A.7)

Define the perturbed Liapunov function

V^ε(x, k) = V̂(x, α_k^ε) + V_1^ε(x, k).

Then, using (A.4), (A.7), and the estimate E_k[V^ε(x_k^ε, k) − V̂(x_k^ε, α_k^ε)] = O(ε)(E_k V̂(x_k^ε, α_k^ε) + 1), upon cancellation,

E_k[V^ε(x_{k+1}^ε, k+1) − V^ε(x_k^ε, k)] = ε E_k V_x(x_k^ε, ᾱ_k^ε) f̄(x_k^ε, ᾱ_k^ε) + ε E_k QV̂(x_k^ε, ·)(α_k^ε) + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).  (A.8)

Step 3 (Final estimates and iteration): With the γ > 0 given in the theorem,

E[e^{γεk} V^ε(x_k^ε, k) − e^{γε(k−1)} V^ε(x_{k−1}^ε, k−1)] = e^{γε(k−1)}[e^{γε} − 1] E V^ε(x_k^ε, k) + e^{γε(k−1)} E[E_{k−1} V^ε(x_k^ε, k) − V^ε(x_{k−1}^ε, k−1)].  (A.9)

Iterating on this recursion yields

E e^{γεk} V^ε(x_k^ε, k) = E V^ε(x_0, 0) + E Σ_{k_1=0}^{k−1} e^{γεk_1}[e^{γε} − 1] V^ε(x_{k_1}^ε, k_1) + E Σ_{k_1=0}^{k−1} e^{γεk_1} E_{k_1}[V^ε(x_{k_1+1}^ε, k_1+1) − V^ε(x_{k_1}^ε, k_1)].  (A.10)

Note that

E Σ_{k_1=0}^{k−1} e^{γεk_1}[e^{γε} − 1] V^ε(x_{k_1}^ε, k_1) ≤ Kγε Σ_{k_1=0}^{k−1} e^{γεk_1} E V^ε(x_{k_1}^ε, k_1),

and, by (A.8),

E Σ_{k_1=0}^{k−1} e^{γεk_1} E_{k_1}[V^ε(x_{k_1+1}^ε, k_1+1) − V^ε(x_{k_1}^ε, k_1)]
= ε E Σ_{k_1=0}^{k−1} e^{γεk_1} E_{k_1}[V_x(x_{k_1}^ε, ᾱ_{k_1}^ε) f̄(x_{k_1}^ε, ᾱ_{k_1}^ε) + Q̄V(x_{k_1}^ε, ·)(ᾱ_{k_1}^ε)]
+ ε Σ_{k_1=0}^{k−1} e^{γεk_1} O(ε)(E V^ε(x_{k_1}^ε, k_1) + 1)
+ ε E Σ_{k_1=0}^{k−1} e^{γεk_1} E_{k_1}[QV̂(x_{k_1}^ε, ·)(α_{k_1}^ε) − Q̄V(x_{k_1}^ε, ·)(ᾱ_{k_1}^ε)].  (A.11)

In the last term of (A.11), the difference QV̂(x, ·)(α_{k_1}^ε) − Q̄V(x, ·)(ᾱ_{k_1}^ε) can again be written in terms of the deviations η_{ij}^{k_1}; handling it by a perturbation of the same type as (A.5) yields another term of order O(ε)(E V(x_{k_1}^ε, ᾱ_{k_1}^ε) + 1), which is added to the next-to-last term of (A.11). Using (17), V_x(x_{k_1}^ε, ᾱ_{k_1}^ε) f̄(x_{k_1}^ε, ᾱ_{k_1}^ε) + Q̄V(x_{k_1}^ε, ·)(ᾱ_{k_1}^ε) ≤ −γV(x_{k_1}^ε, ᾱ_{k_1}^ε); in addition, for sufficiently small ε > 0, −γV(x_{k_1}^ε, ᾱ_{k_1}^ε) + O(ε)V(x_{k_1}^ε, ᾱ_{k_1}^ε) ≤ −γ_1 V(x_{k_1}^ε, ᾱ_{k_1}^ε) for some 0 < γ_1 < γ. As a result,

E Σ_{k_1=0}^{k−1} e^{γεk_1} E_{k_1}[V_x(x_{k_1}^ε, ᾱ_{k_1}^ε) f̄(x_{k_1}^ε, ᾱ_{k_1}^ε) + Q̄V(x_{k_1}^ε, ·)(ᾱ_{k_1}^ε) + O(ε)V(x_{k_1}^ε, ᾱ_{k_1}^ε)] ≤ 0.

Using (A.10) and dividing both sides by e^{γεk}, we obtain

E V^ε(x_{k+1}^ε, k+1) ≤ e^{−γεk} E V^ε(x_0, 0) + Kγε Σ_{k_1=0}^k e^{γε(k_1−k)} E V^ε(x_{k_1}^ε, k_1) + O(ε).  (A.12)

An application of Gronwall's inequality yields

E V^ε(x_{k+1}^ε, k+1) ≤ K e^{−γεk} E V^ε(x_0, 0) + O(ε).

Using |V_1^ε(x_k^ε, k)| = O(ε)(E_k V̂(x_k^ε, α_k^ε) + 1) once more to replace V^ε(·, k) by V(·, ᾱ_k^ε), we obtain

E V(x_{k+1}^ε, ᾱ_{k+1}^ε) ≤ K e^{−γεk} E V(x_0, ᾱ_0) + O(ε).

The proof of the theorem is concluded.

Proof of Theorem 14. The proof is similar in spirit to that of Theorem 12; however, special care must be exercised in treating the external disturbances. By a truncated Taylor expansion,

E_k[V̂(x_{k+1}^ε, α_k^ε) − V̂(x_k^ε, α_k^ε)] = ε E_k V̂_x(x_k^ε, α_k^ε) f(x_k^ε, α_k^ε) + (1/2) E_k tr[V̂_{xx}(x_k^ε, α_k^ε)(x_{k+1}^ε − x_k^ε)(x_{k+1}^ε − x_k^ε)′] + D̃_{k+1},  (A.13)

where

D̃_{k+1} = O( Σ_{ℓ=3}^{n_0−1} E_k |∂^ℓ V̂(x_k^ε, α_k^ε)| |x_{k+1}^ε − x_k^ε|^ℓ + E_k |x_{k+1}^ε − x_k^ε|^{n_0} ∫_0^1 |∂^{n_0} V̂(x_k^ε + s(x_{k+1}^ε − x_k^ε))| ds ).

By virtue of the independence of w_k and F_k, the measurability of V̂_{xx}(x_k^ε, α_k^ε), f(x_k^ε, α_k^ε), and σ(x_k^ε, α_k^ε) with respect to F_k, and (A4),

E_k tr[V̂_{xx}(x_k^ε, α_k^ε) σ(x_k^ε, α_k^ε) w_k w_k′ σ′(x_k^ε, α_k^ε)] = E_k tr[V̂_{xx}(x_k^ε, α_k^ε) σ(x_k^ε, α_k^ε) E[w_k w_k′] σ′(x_k^ε, α_k^ε)] = E_k tr[V̂_{xx}(x_k^ε, α_k^ε) σ(x_k^ε, α_k^ε) σ′(x_k^ε, α_k^ε)];

similarly, E_k tr[V̂_{xx}(x_k^ε, α_k^ε) σ(x_k^ε, α_k^ε) w_k f′(x_k^ε, α_k^ε)] = 0 and ε² E_k tr[V̂_{xx}(x_k^ε, α_k^ε) f(x_k^ε, α_k^ε) f′(x_k^ε, α_k^ε)] = O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1). Thus

E_k tr[V̂_{xx}(x_k^ε, α_k^ε)(x_{k+1}^ε − x_k^ε)(x_{k+1}^ε − x_k^ε)′] = ε tr[E_k V̂_{xx}(x_k^ε, α_k^ε) Σ(x_k^ε, α_k^ε)] + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1),

where Σ(x, α) = σ(x, α)σ′(x, α). As for D̃_{k+1}, similar to the estimates leading to (A.3), with the use of the independence of {w_k} and {α_k^ε}, we obtain D̃_{k+1} = O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1). Therefore,

E_k[V̂(x_{k+1}^ε, α_{k+1}^ε) − V̂(x_k^ε, α_k^ε)] = ε E_k V̂_x(x_k^ε, α_k^ε) f(x_k^ε, α_k^ε) + ε E_k QV̂(x_k^ε, ·)(α_k^ε) + (ε/2) tr[E_k V̂_{xx}(x_k^ε, α_k^ε) Σ(x_k^ε, α_k^ε)] + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).  (A.14)

Define V_1^ε(x, k) as in (A.5) and, in addition,

V_2^ε(x, k) = (ε/2) Σ_{k_1=k}^∞ e^{ε(k−k_1)} E_k tr[V̂_{xx}(x, α_k^ε)(Σ(x, α_{k_1}^ε) − Σ̄(x, ᾱ_{k_1}^ε))].  (A.15)

Since the arguments are similar to those in the proof of Theorem 12, we only outline the main steps and point out the differences. As before, sup_{k≥0} E|V_ℓ^ε(x_k^ε, k)| < ∞ and |V_ℓ^ε(x_k^ε, k)| = O(ε)(E_k V̂(x_k^ε, α_k^ε) + 1) for ℓ = 1, 2. Note that E_k[V_2^ε(x_{k+1}^ε, k+1) − V_2^ε(x_k^ε, k+1)] = O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1), and that

E_k[V_2^ε(x_k^ε, k+1) − V_2^ε(x_k^ε, k)] = −(ε/2) tr[E_k V̂_{xx}(x_k^ε, α_k^ε) Σ(x_k^ε, α_k^ε)] + (ε/2) tr[E_k V̂_{xx}(x_k^ε, ᾱ_k^ε) Σ̄(x_k^ε, ᾱ_k^ε)] + O(ε²)(E_k V̂(x_k^ε, α_k^ε) + 1).

Next define V^ε(x, k) = V̂(x, α_k^ε) + V_1^ε(x, k) + V_2^ε(x, k). Combining the estimates above and cancelling as before,

E_k[V^ε(x_{k+1}^ε, k+1) − V^ε(x_k^ε, k)] = ε E_k V_x(x_k^ε, ᾱ_k^ε) f̄(x_k^ε, ᾱ_k^ε) + ε E_k Q̄V(x_k^ε, ·)(ᾱ_k^ε) + (ε/2) tr[E_k V_{xx}(x_k^ε, ᾱ_k^ε) Σ̄(x_k^ε, ᾱ_k^ε)] + ε E_k[QV̂(x_k^ε, ·)(α_k^ε) − Q̄V(x_k^ε, ·)(ᾱ_k^ε)] + O(ε²)(E_k V̂(x_k^ε, ᾱ_k^ε) + 1).

Proceeding as in the proof of Theorem 12, the desired stability follows.
Proof of Theorem 17. In the proof, we adopt the approach of Kushner (1984) for the Markov chain setup. In contrast to the proofs of the previous two theorems, we can use a truncated Liapunov function here; as a result, the conditions needed are much relaxed. The proofs for systems (2) and (3) are similar, so we shall concern ourselves only with (2). Define

V^ε(k) = V(x_k^ε, ᾱ_k^ε) + V_1^ε(x_k^ε, k).  (A.16)

For any M > 0, define the truncation V_M(k) of V^ε(k) (agreeing with V^ε(k) whenever |x_k^ε| ≤ M), and let τ̃_N denote the first exit time of x_k^ε from {x: |x| ≤ N}. In view of (A5′) and the one-step estimates of Step 2 in the proof of Theorem 12, summing gives

E_0 Σ_k E_k[V_M(k+1) − V_M(k)] ≤ −c_1 E_0[τ_1 ∧ (τ_0 + k) ∧ τ̃_N − τ_0],  (A.17)

where a ∧ b = min(a, b) and c_1 > 0. Replacing V_1^ε(x, k) in (A.16) by V^ε(k) − V(x_k^ε, ᾱ_k^ε), we obtain V^ε(k) ≥ (1 − O(ε)) E_k V(x_k^ε, ᾱ_k^ε) − O(ε). We claim that lim_N τ̃_N ≥ τ_1. For suppose not; then

E_0 V^ε(τ_1 ∧ (τ_0 + k) ∧ τ̃_N) ≥ E_0 V(x_k^ε, ᾱ_k^ε) − O(ε) → ∞ as N → ∞,

which is a contradiction. Using (A.17), c_1 E_0[τ_1 ∧ (τ_0 + k) ∧ τ̃_N − τ_0] ≤ E_0 V^ε(τ_0). Using (A.16) again, E_0 V^ε(τ_0) ≤ E_0 V(x_{τ_0}^ε, ᾱ_{τ_0}^ε)(1 + O(ε)) + O(ε). Combining these two inequalities and letting N → ∞ and k → ∞, (22) is obtained.
References Abbad, M., Filar, J. A., & Bielecki, T. R. (1992). Algorithms for singularly perturbed limiting average Markov control problems. IEEE Transactions on Automatic Control, AC-37, 1421–1425. Altman, E., & Borovkov, A. A. (1997). On the stability of retrial queues. QUESTA, 26, 343–363.
Altman, E., & Hordijk, A. (1997). Applications of Borovkov’s renovation theory to nonstationary stochastic recursive sequences and their control. Advances in Applied Probability, 29, 388–413. Badowski, G., & Yin, G. (2002). Stability of hybrid dynamic systems containing singularly perturbed random processes. IEEE Transactions on Automatic Control, 47, 2021–2032. Blair, W. P., & Sworder, D. D. (1986). Feedback control of a class of linear discrete systems with jump parameters and quadratic cost criteria. International Journal of Control, 21, 833–841. Branicky, M. S. (1998). Multiple Liapunov functions and other analysis tools for switched and hybrid systems. IEEE Transactions on Automatic Control, 43, 475–482. Ethier, S. N., & Kurtz, T. G. (1986). Markov processes: Characterization and convergence. New York: Wiley. Graham, A. (1981). Kronecker products and matrix calculus with applications. Chinchester: Ellis Horwood Ltd. Iosifescu, M. (1980). Finite Markov processes and their applications. Chichester: Wiley. Ji, Y., & Chizeck, H. J. (1990). Controllability, stabilizability, and continuous-time Markovian jump linear quadratic control. IEEE Transactions on Automatic Control, 35, 777–788. Krishnamurthy, V., Wang, X., & Yin, G. (2002). Spreading code optimization and adaptation in CDMA via discrete stochastic approximation. Preprint. Kushner, H. J. (1984). Approximation and weak convergence methods for random processes, with applications to stochastic systems theory. Cambridge, MA: MIT Press. Kushner, H. J., & Yin, G. (1997). Stochastic approximation algorithms and applications. New York: Springer. LaSalle, J. P. (1979). The stability of dynamical systems. Philadelphia, PA: SIAM. Liu, R. H., Zhang, Q., & Yin, G. (2001). Nearly optimal control of singularly perturbed Markov decision processes in discrete time. Applied Mathematics and Optimization, 44, 105–129. Liu, R. H., Zhang, Q., & Yin, G. (2002). 
Asymptotically optimal controls of hybrid linear quadratic regulators in discrete time. Automatica, 38, 409–419. Mao, X. (1994). Exponential stability of stochastic differential equations. New York: Marcel Dekker. Mao, X. (1999). Stability of stochastic differential equations with Markovian switching. Stochastic Processes and their Applications, 79, 45–67. Mariton, M., & Bertrand, P. (1985). Robust jump linear quadratic control: A mode stabilizing solution. IEEE Transactions on Automatic Control, AC-30, 1145–1147. Michel, A. N., & Hu, B. (1999). Towards a stability theory of general hybrid dynamical systems. Automatica, 35, 371–384. Pervozvanskii, A. A., & Gaitsgori, V. G. (1988). Theory of suboptimal decisions: Decomposition and aggregation. Dordrecht: Kluwer. Simon, H. A., & Ando, A. (1961). Aggregation of variables in dynamic systems. Econometrica, 29, 111–138. Solo, V., & Kong, X. (1995). Adaptive signal processing algorithms. Englewood Cliffs, NJ: Prentice-Hall. Tsai, C.-C. (1998). Composite stabilization of singularly perturbed stochastic hybrid systems. International Journal of Control, 71, 1005–1020. Ye, H., Michel, A. N., & Hou, L. (1998). Stability theory for hybrid dynamical systems. IEEE Transactions on Automatic Control, AC-43, 461–474. Yin, G., & Zhang, Q. (1998). Continuous-time Markov chains and applications: A singular perturbation approach. New York: Springer. Yin, G., & Zhang, Q. (2000). Singularly perturbed discrete-time Markov chains. SIAM Journal on Applied Mathematics, 61, 834–854. Yin, G., Zhang, Q., & Badowski, G. (2000). Asymptotic properties of a singularly perturbed Markov chain with inclusion of transient states. Annals of Applied Probability, 10, 549–572.
Yin, G., Zhang, Q., & Badowski, G. (2003). Discrete-time singularly perturbed Markov chains: Aggregation, occupation measures, and switching diffusion limit. Advances in Applied Probability, 35. Zhang, Q., & Yin, G. (2001). Nearly optimal asset allocation in hybrid stock-investment models. Preprint.

G. George Yin received his B.S. degree in mathematics from the University of Delaware in 1983, and his M.S. in Electrical Engineering and Ph.D. in Applied Mathematics from Brown University in 1987. Subsequently, he joined the Department of Mathematics, Wayne State University, where he became a professor in 1996. He received the Career Development Chair Award in 1993, the Board of Governors Faculty Recognition Award in 1999, and the Charles H. Gershenson Distinguished Faculty Fellowship in 2001 from WSU. He served on the editorial board of Stochastic Optimization and Design, the Mathematical Reviews Database Committee, and various conference program committees; he was the editor of the SIAM Activity Group on Control and Systems Theory Newsletters, an Associate Editor of the IEEE Transactions on Automatic Control, the SIAM Representative to the 34th CDC, and Co-Chair of the 1996 AMS-SIAM Summer Seminar in Applied Mathematics. He is also Co-Chair of the 2003 AMS-IMA-SIAM Summer Research Conference "Mathematics of Finance." He is a Fellow of IEEE.
Qing Zhang is a Professor of Mathematics at the University of Georgia. He received his B.Sc. in Control Theory from Nankai University in 1983 and his M.Sc. and Ph.D. from Brown University in 1985 and 1988, respectively. His research interests include stochastic systems and control, nonlinear