Automatica 41 (2005) 1063 – 1070 www.elsevier.com/locate/automatica
Brief paper
Near-optimal controls of random-switching LQ problems with indefinite control weight costs

Yuanjin Liu^a, G. Yin^a,*, Xun Yu Zhou^b

^a Department of Mathematics, Wayne State University, Detroit, MI 48202, USA
^b Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
Received 1 April 2004; received in revised form 18 November 2004; accepted 18 January 2005 Available online 17 March 2005
Abstract

In this paper, we consider hybrid controls for a class of linear quadratic problems with white noise perturbation and Markov regime switching, where the regime switching is modeled by a continuous-time Markov chain with a large state space and the control weights are indefinite. The use of a large state space enables us to take various factors of an uncertain environment into consideration, yet it creates computational overhead and adds difficulties. Aiming at a reduction of complexity, we demonstrate how to construct near-optimal controls. First, we introduce a small parameter into the model to highlight the contrast between weak and strong interactions and between fast and slow motions; this results in a two-time-scale formulation. In view of the recent developments on LQ problems with indefinite control weights and on two-time-scale Markov chains, we then establish the convergence of the system of Riccati equations associated with the hybrid LQ problem. Based on the optimal feedback control of the limit system obtained using the system of Riccati equations, we construct controls for the original problem and show that such controls are near-optimal. A numerical demonstration of a simple system is also presented.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Hybrid system; LQ problem; Indefinite weight; Near optimality
1. Introduction

How can we deal with optimal controls for hybrid linear quadratic systems with white noise perturbation, in which the control weights are indefinite and the total number of elements in the switching set is large? More specifically, how can we reduce the complexity of handling such systems when the random process governing the regime switching is a Markov chain with a large state space? The objective of this paper is to answer these questions. The models considered
and the methods we use are distinguished by several features, including regime switching, indefinite control weights, multi-scale modeling, and comparison controls.

1.1. Indefinite LQ problems

LQ problems have a long and illustrious history. Nowadays, LQ control methodology is covered in almost every standard textbook on control theory and is a "must" in undergraduate and graduate curricula. Nevertheless, until very recently, the main focus has been on the case where the control weight (the matrix associated with the control action) is positive definite. In fact, for deterministic systems, if the control weight is not positive definite, the problem is not well posed. It has been shown recently that for stochastic systems, LQ problems with indefinite control weights can make sense if a certain balance is reached (Chen, Li, & Zhou, 1998; Li & Zhou, 2002; Yong & Zhou, 1999). The control weights can be indefinite or even negative definite as long as they are not "too negative".
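To see how the stochastic influence can compensate for a negative control weight, consider the following minimal scalar illustration; it is in the spirit of the examples in Chen et al. (1998), but this specific instance is ours, chosen only to make the mechanism visible.

$$dx(t) = u(t)\,dw(t), \qquad J(u) = E\Big[N \int_0^T u^2(t)\,dt + x^2(T)\Big], \qquad N < 0.$$

By the Itô isometry, $E\,x^2(T) = x^2(0) + E\int_0^T u^2(t)\,dt$, so $J(u) = x^2(0) + (N + 1)\,E\int_0^T u^2(t)\,dt$. The problem is therefore well posed, with optimal control $u \equiv 0$, whenever $N > -1$: the diffusion term feeds control energy into the terminal-state penalty, which offsets the negative running weight.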
The stochastic influence, to some extent, compensates for the negative control weights and makes the problem well posed. The reference (Chen et al., 1998, p. 1686) contains a couple of simple but illuminating examples that explain this point. Owing to many applications, especially optimal control in financial engineering, LQ problems with indefinite control weights have drawn increasing attention lately.

1.2. Hybrid systems

Along another line, it has been recognized that many optimal control problems are hybrid in nature: not only do they include the usual analog dynamics, but they also involve discrete events. In response, much effort has been devoted to the study of such systems. Since uncertainty is ubiquitous, it is necessary to take the random environment into consideration. One possible way of handling the random environment is to use a regime-switching model. Such models have found many applications, including tracking time-varying parameters, filtering (Yin & Dey, 2003), and mean-variance portfolio selection (Zhou & Yin, 2003). The mean-variance model was originally proposed in Markowitz (1952) for portfolio selection in a single period. It enables an investor to seek the highest return after specifying an acceptable risk level (given by the variance of the return). Observing that the environment influences the market significantly, resulting in variations of key system parameters such as appreciation and volatility rates, a viable alternative is to consider a regime-switching model. The corresponding control system in the mean-variance portfolio selection problem then becomes an LQ problem modulated by a continuous-time Markov chain. The inclusion of random discrete events opens up possibilities for more realistic modeling and enables the consideration of various random environment factors, but the underlying system may be qualitatively different from one without the switching process. There is a vast literature on LQ controls involving Markov chains. For jump linear systems, coupled Riccati systems, and singular perturbation approaches to such problems, we refer the reader to Ji and Chizeck (1992), Mariton and Bertrand (1985), Pan and Başar (1995), Pervozvanskii and Gaitsgori (1988), Phillips and Kokotovic (1981), Sethi and Zhang (1994), Yin and Dey (2003), Yin and Zhang (1998), and the references therein.

1.3. Large-scale systems

Focusing on Markov-modulated regime-switching systems from a modeling point of view, when we take many factors into consideration, the state space of the Markov chain inevitably becomes large, and the resulting control problem becomes a large-scale one. Although one may proceed with the dynamic programming approach and obtain the associated Hamilton-Jacobi-Bellman (HJB) equations, the total number of equations (equal to the number of states of the Markov chain) is large, so computational complexity is of practical concern. To resolve this issue, we observe that in many large-scale systems, not all components, parts, and subsystems evolve at the same rate. Some of them change at a fast pace, and others vary slowly. To take advantage of the intrinsic time-scale separation, one can introduce a small parameter $\varepsilon > 0$ to bring out the hierarchical structure of the underlying systems (see Abbad, Filar, & Bielecki, 1992; Hoppenstead, Salehi, & Skorohod, 1996; Sethi & Zhang, 1994; Yin & Zhang, 1998; Zhang & Yin, 1999, among others). We remark that in practical systems, $\varepsilon$ is just a fixed parameter that separates different scales in the sense of the order of magnitude of the entries of the generators; it need not go to 0. For further explanation and a specific example, see Yin and Zhang (1998, Section 3.6: Interpretation of time-scale separation). In this paper, we use a two-time-scale formulation for the Markovian hybrid system with indefinite control weights. We obtain a system of Riccati equations, prove the convergence of the system of Riccati equations, construct near-optimal controls based on the limit system, and establish near optimality.

1.4. Organization of the paper

The rest of the paper is arranged as follows. Section 2 begins with the formulation of the problem. Section 3 discusses the optimal control, which is of theoretical, rather than computational, interest. Section 4 studies near-optimal controls. As an illustration, Section 5 presents a numerical example for demonstration. Section 6 concludes the paper with further remarks.
2. Problem formulation

Let $\alpha(t)$ be a homogeneous continuous-time Markov chain with state space $\mathcal{M} = \{1, \ldots, m\}$ and $w(t)$ be a standard one-dimensional Brownian motion defined on a complete probability space $(\Omega, \mathcal{F}, P)$. Suppose that the generator of the Markov chain is given by $Q = (q_{ij}) \in \mathbb{R}^{m \times m}$. That is, $q_{ij} \ge 0$ for $i \ne j$ and $\sum_j q_{ij} = 0$ for each $i \in \mathcal{M}$. Let $\mathcal{F}_t = \sigma\{w(s), \alpha(s): s \le t\}$. We will work with a finite horizon $[0, T]$ for some $T > 0$. Our objective is to find the optimal control $u(\cdot)$ to minimize the expected quadratic cost function

$$J(s, x, \iota, u(\cdot)) = E_s \Big\{ \int_s^T [x'(t) M(\alpha(t)) x(t) + u'(t) N(\alpha(t)) u(t)] \, dt + x'(T) D(\alpha(T)) x(T) \Big\},  \qquad (1)$$
subject to the system constraint $x(s) = x$, $\alpha(s) = \iota$, and

$$dx(t) = [A(\alpha(t)) x(t) + B(\alpha(t)) u(t)] \, dt + C(\alpha(t)) u(t) \, dw(t), \quad s \le t \le T,  \qquad (2)$$

where $x(t) \in \mathbb{R}^{n_1}$ is the state, $u(t) \in \mathbb{R}^{n_2}$ is the control, and $E_s$ is the expectation given $\alpha(s) = \iota$ and $x(s) = x$. A control is admissible if it is an $\mathcal{F}_t$-adapted measurable process. The collection of all admissible controls is denoted by $\mathcal{U}_{ad}$. In (2), $A(i), M(i), D(i) \in \mathbb{R}^{n_1 \times n_1}$, $B(i), C(i) \in \mathbb{R}^{n_1 \times n_2}$, and $N(i) \in \mathbb{R}^{n_2 \times n_2}$ for each $i \in \mathcal{M}$. Define the value function $v(s, x, \iota) = \inf_{u(\cdot) \in \mathcal{U}_{ad}} J(s, x, \iota, u(\cdot))$. We use the following conditions throughout this paper.

(A1) For each $i \in \mathcal{M}$, $M(i)$ is symmetric positive semidefinite, $N(i)$ is symmetric, and $D(i)$ is symmetric positive semidefinite.

(A2) The processes $\alpha(\cdot)$ and $w(\cdot)$ are independent.

Remark 1. We do not require the control weights $N(i)$ to be positive definite. This enables us to treat applications that do not satisfy the usual positive definiteness conditions. The regime switching allows us to take many factors with discrete shifts into consideration. The condition on the independence of $w(\cdot)$ and $\alpha(\cdot)$ is natural.

Notation. For $G \in \mathbb{R}^{n_1 \times n_2}$, $G'$ denotes its transpose, and $|G|$ denotes its norm when $G$ is a square matrix or a vector. We use $S^n$ to denote the space of all $n \times n$ symmetric matrices, $S^n_+$ the subspace of all positive semidefinite matrices of $S^n$, and $\hat{S}^n_+$ the subspace of all positive definite matrices of $S^n$. For a matrix $G \in S^n$, $G \ge 0$ means that $G \in S^n_+$, and $G > 0$ indicates that $G \in \hat{S}^n_+$. The notation $C([0, T]; \mathbb{X})$ stands for the Banach space of $\mathbb{X}$-valued continuous functions on $[0, T]$ endowed with the maximum norm, for a given Hilbert space $\mathbb{X}$. Given a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\{\mathcal{F}_t: a \le t \le b\}$ ($-\infty \le a \le b \le \infty$), a Hilbert space $\mathbb{X}$ with norm $|\cdot|_{\mathbb{X}}$, and $p$ ($1 \le p \le \infty$), define the Banach space $L^p_{\mathcal{F}}([a, b]; \mathbb{X})$ to be the collection of $\mathcal{F}_t$-adapted, $\mathbb{X}$-valued measurable processes $\phi(\cdot)$ on $[a, b]$ such that $E \int_a^b |\phi(t)|^p_{\mathbb{X}} \, dt < \infty$, with the norm $|\phi(\cdot)|_{\mathcal{F},p} = (E \int_a^b |\phi(t)|^p_{\mathbb{X}} \, dt)^{1/p}$. Suppose that $Q$ is the generator of a continuous-time Markov chain. For a suitable function $h(\cdot)$,

$$Qh(\cdot)(i) = \sum_{j=1}^m q_{ij} h(j) = \sum_{j \ne i} q_{ij} (h(j) - h(i)).  \qquad (3)$$
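To build intuition for the dynamics (2), the following sketch simulates one sample path by drawing the Markov chain transitions from the generator $Q$ over small time steps and applying an Euler-Maruyama update. This is an illustration only: the function name simulate_path, the feedback argument u_of, and the numerical data in the usage example are our own hypothetical choices.

import numpy as np

def simulate_path(A, B, C, Q, u_of, x0, i0, T, dt, rng):
    """Euler-Maruyama simulation of dx = (A x + B u) dt + C u dw, where the
    coefficient dictionaries A, B, C are indexed by the regime of the chain."""
    x, i = np.asarray(x0, dtype=float), i0
    path = [(0.0, i, x.copy())]
    for step in range(int(T / dt)):
        t = step * dt
        u = u_of(t, i, x)                      # feedback u(t, alpha(t), x(t))
        dw = rng.normal(scale=np.sqrt(dt))     # Brownian increment
        x = x + (A[i] @ x + B[i] @ u) * dt + (C[i] @ u) * dw
        p = np.clip(Q[i] * dt, 0.0, None)      # jump probabilities q_ij dt
        p[i] = max(0.0, 1.0 + Q[i, i] * dt)    # probability of staying put
        i = rng.choice(len(p), p=p / p.sum())  # sample the next regime
        path.append((t + dt, i, x.copy()))
    return path

# Hypothetical scalar data: m = 2 regimes, n1 = n2 = 1, a crude feedback.
rng = np.random.default_rng(0)
A = {0: np.array([[0.2]]), 1: np.array([[0.3]])}
B = {0: np.array([[1.0]]), 1: np.array([[1.5]])}
C = {0: np.array([[0.5]]), 1: np.array([[0.7]])}
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
path = simulate_path(A, B, C, Q, lambda t, i, x: -0.1 * x, [3.0], 0, 1.0, 1e-3, rng)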
3. Optimal controls

In this section, we highlight the optimal control for the LQ problem (1)-(2). Since most of the results in this section are extensions of those without Markovian switching in Chen et al. (1998), we only provide detailed arguments for the proofs that are distinct from previous work. Introduce a system of Riccati equations as follows:

$$\dot{K}(t, i) = -K(t, i) A(i) - A'(i) K(t, i) - M(i) - \sum_{j=1}^m q_{ij} K(t, j) + K(t, i) B(i) (N(i) + C'(i) K(t, i) C(i))^{-1} B'(i) K(t, i), \quad s \le t \le T,$$
$$K(T, i) = D(i), \qquad N(i) + C'(i) K(t, i) C(i) > 0.  \qquad (4)$$

The following result is proved in Li and Zhou (2002, Corollary 3.1).

Theorem 2. If $\{K(\cdot, i): i \in \mathcal{M}\}$ is a solution of the system of Riccati equations (4), then the optimal feedback control (with $v(s, x, i) = x' K(s, i) x$) for (1)-(2) is

$$u(t) = -\sum_{i=1}^m (N(i) + C'(i) K(t, i) C(i))^{-1} B'(i) K(t, i) I_{\{\alpha(t) = i\}} x(t).  \qquad (5)$$

Next, we derive a necessary and sufficient condition for the unique solvability of the system of Riccati equations (4). First, we have the following lemma.

Lemma 3. Assume $N(i) > 0$. Then (4) has a solution $\{K(\cdot, i): i \in \mathcal{M}\}$ on $[0, T]$ with $K(t, i) \ge 0$ for all $t \in [0, T]$ and $i \in \mathcal{M}$.

Proof. Since $N(i) + C'(i) D(i) C(i) > 0$, it follows from the classical theory of ordinary differential equations (see, e.g., Walter (1998, p. 66)) that the Riccati equations (4) have a local solution. Let $(\bar{t}, T] \subset [0, T]$ be the maximal interval on which a local solution $\{K(\cdot, i): i \in \mathcal{M}\}$ exists with $N(i) + C'(i) K(t, i) C(i) > 0$. To prove that the existence is in fact global on $[0, T]$, it suffices to show that there is no escape time, i.e., that each $K(\cdot, i)$ is uniformly bounded on $(\bar{t}, T]$. To this end, we will show that there exists a scalar $\kappa > 0$ independent of $\bar{t}$ such that $0 \le K(t, i) \le \kappa I$ for all $t \in (\bar{t}, T]$ and $i \in \mathcal{M}$. First, let $x_0 \in \mathbb{R}^{n_1}$ be an arbitrary initial state of system (2) starting at a time $t \in (\bar{t}, T]$ with $\alpha(t) = i$. Then Theorem 2 implies that $x_0' K(t, i) x_0 = \inf_{u(\cdot) \in \mathcal{U}_{ad}} J(t, x_0, i, u(\cdot)) \ge 0$, owing to the positive (semi)definiteness assumption on the cost weighting matrices. Next, let $x(\cdot)$ be a solution to (2) corresponding to the initial condition $(x(t), \alpha(t)) = (x_0, i)$ and the admissible control $u(\cdot) \equiv 0$. Then Theorem 2 implies

$$x_0' K(t, i) x_0 \le E \Big\{ \int_t^T x'(s) M(\alpha(s)) x(s) \, ds + x'(T) D(\alpha(T)) x(T) \Big\} \quad \text{for all } i \in \mathcal{M}.$$

From the above inequality and the fact that $x(\cdot)$ satisfies a homogeneous linear equation, it follows that there exists a scalar $\kappa > 0$ such that $x_0' K(t, i) x_0 \le \kappa x_0' x_0$. The proof is completed.
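Numerically, (4) is a terminal-value ODE system integrated backward from $K(T, i) = D(i)$. The sketch below performs this backward sweep with a fixed-step classical Runge-Kutta (RK4) scheme; the helper names are ours, the positivity condition $N(i) + C'(i)K(t, i)C(i) > 0$ is only monitored by a plain assertion, and an adaptive solver would be preferable when the coupling is strong.

import numpy as np

def riccati_rhs(K, A, B, C, M, N, Q):
    """Right-hand side of (4) for all regimes; K has shape (m, n1, n1)."""
    m = len(K)
    dK = np.empty_like(K)
    for i in range(m):
        P = N[i] + C[i].T @ K[i] @ C[i]        # must remain positive definite
        assert np.all(np.linalg.eigvalsh(P) > 0), "N + C'KC > 0 violated"
        coupling = sum(Q[i, j] * K[j] for j in range(m))
        dK[i] = (-K[i] @ A[i] - A[i].T @ K[i] - M[i] - coupling
                 + K[i] @ B[i] @ np.linalg.solve(P, B[i].T @ K[i]))
    return dK

def solve_riccati(A, B, C, M, N, D, Q, T, n_steps):
    """Integrate (4) backward from K(T, i) = D(i) with fixed-step RK4;
    returns times (ascending) and K on them, from t = 0 to t = T."""
    h = T / n_steps
    K = np.array(D, dtype=float)               # terminal condition, (m, n1, n1)
    Ks = [K.copy()]
    for _ in range(n_steps):                   # backward step: dt = -h
        k1 = riccati_rhs(K, A, B, C, M, N, Q)
        k2 = riccati_rhs(K - 0.5 * h * k1, A, B, C, M, N, Q)
        k3 = riccati_rhs(K - 0.5 * h * k2, A, B, C, M, N, Q)
        k4 = riccati_rhs(K - h * k3, A, B, C, M, N, Q)
        K = K - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        Ks.append(K.copy())
    return np.linspace(T, 0.0, n_steps + 1)[::-1], np.array(Ks[::-1])

The feedback (5) is then evaluated pointwise as $u(t) = -(N(i) + C'(i)K(t, i)C(i))^{-1} B'(i) K(t, i) x(t)$ with $i = \alpha(t)$ the current regime.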
Denote $\mathcal{W} = \{(W(1), \ldots, W(m)) \in [L^\infty(0, T; \hat{S}^{n_2}_+)]^m \,|\, W^{-1}(i) \in L^\infty(0, T; \hat{S}^{n_2}_+), \ \forall i \in \mathcal{M}\}$. It is easy to check that $[C(0, T; \hat{S}^{n_2}_+)]^m \subset \mathcal{W}$. By virtue of Lemma 3, for any given $W = (W(1), \ldots, W(m)) \in \mathcal{W}$, the following equation

$$\dot{K}(t, i) = -K(t, i) A(i) - A'(i) K(t, i) - M(i) - \sum_{j=1}^m q_{ij} K(t, j) + K(t, i) B(i) W^{-1}(i) B'(i) K(t, i), \quad s \le t \le T,$$
$$K(T, i) = D(i), \quad i = 1, 2, \ldots, m,  \qquad (6)$$

admits a unique solution $(K(\cdot, 1), \ldots, K(\cdot, m)) \in [C(0, T; S^{n_1}_+)]^m$. Thus we can define a mapping $\Gamma: \mathcal{W} \to [C(0, T; S^{n_1}_+)]^m$ as $K \equiv (K(\cdot, 1), \ldots, K(\cdot, m)) = \Gamma(W) \equiv (\Gamma_1(W), \ldots, \Gamma_m(W))$. Theorem 4 can be proved in exactly the same way as Chen et al. (1998, Theorem 4.2).

Theorem 4. The system of Riccati equations (4) admits a unique solution if and only if there exists $W = (W(1), \ldots, W(m)) \in [C(0, T; \hat{S}^{n_2}_+)]^m$ such that $N(i) + C'(i) \Gamma_i(W) C(i) \ge W(i)$ for each $i \in \mathcal{M}$.

4. Near-optimal controls

By using the results obtained in the last section, we are able to derive optimal LQ controls. At first glance, it seems that the problem is largely solved. In fact, it has barely begun. Unlike the usual LQ problem, in lieu of one Riccati equation, we have to solve a system of Riccati equations. The number of Riccati equations in the system equals $|\mathcal{M}|$, the cardinality of the set $\mathcal{M}$, i.e., the total number of states of the Markov chain. Note that in the LQ formulation with regime switching, the factor process $\alpha(t)$ often has a large state space. The large dimensionality of the Markovian state presents us with opportunities for modeling various aspects of uncertainty, but it also creates a computational burden. If $|\mathcal{M}|$ is very large, the computational task could be infeasible. Therefore, it is not only desirable but also necessary to look into better alternatives. Our main concern here is to treat such large-scale systems and obtain the desired "good" controls with reduced computational effort.

To reduce the complexity, we introduce a small parameter $\varepsilon > 0$ into the system under consideration. The purpose is to highlight the contrast between the fast and slow transitions among different Markovian states and to bring out the hierarchical structure of the underlying system. This task is accomplished by letting the generator of the Markov chain depend on the scale parameter $\varepsilon$. That is, $Q = Q^\varepsilon$ with

$$Q^\varepsilon = \frac{1}{\varepsilon} \widetilde{Q} + \widehat{Q},  \qquad (7)$$

where both $\widetilde{Q}$ and $\widehat{Q}$ are generators of continuous-time Markov chains. Consisting of a rapidly changing part and a slowly varying part, the Markov chain is now written as $\alpha(t) = \alpha^\varepsilon(t)$. In the rest of the paper, our objective is to find the optimal control $u(\cdot)$ to minimize

$$J^\varepsilon(s, x, \iota, u(\cdot)) = E_s \Big\{ \int_s^T [x'(t) M(\alpha^\varepsilon(t)) x(t) + u'(t) N(\alpha^\varepsilon(t)) u(t)] \, dt + x'(T) D(\alpha^\varepsilon(T)) x(T) \Big\},  \qquad (8)$$

subject to the system constraint

$$dx(t) = [A(\alpha^\varepsilon(t)) x(t) + B(\alpha^\varepsilon(t)) u(t)] \, dt + C(\alpha^\varepsilon(t)) u(t) \, dw(t), \quad s \le t \le T,  \qquad (9)$$

with $x(s) = x$ and $\alpha^\varepsilon(s) = \iota$. Note that in the above, $x(t)$ should really have been written as $x^\varepsilon(t)$; we have suppressed the $\varepsilon$-dependence for notational simplicity. In the rest of the paper, whenever there is a need to emphasize the dependence on $\varepsilon$, we will retain it; this will be clear from the context. It follows that the value function is $v^\varepsilon(s, x, \iota) = \inf_{u(\cdot) \in \mathcal{U}_{ad}} J^\varepsilon(s, x, \iota, u(\cdot))$. We retain assumptions (A1) and (A2). Then, by the generalized Ito lemma for Markov-modulated processes in Björk (1980), the optimal feedback control provided in Theorem 2, the solvability of the Riccati system (Lemma 3), and Theorem 4 all continue to hold. In fact, we have the following system of Riccati equations:

$$\dot{K}^\varepsilon(t, i) = -K^\varepsilon(t, i) A(i) - A'(i) K^\varepsilon(t, i) - M(i) - \sum_{j=1}^m q^\varepsilon_{ij} K^\varepsilon(t, j) + K^\varepsilon(t, i) B(i) (N(i) + C'(i) K^\varepsilon(t, i) C(i))^{-1} B'(i) K^\varepsilon(t, i),$$
$$K^\varepsilon(T, i) = D(i), \qquad N(i) + C'(i) K^\varepsilon(t, i) C(i) > 0, \quad \text{for } i \in \mathcal{M}.  \qquad (10)$$

The corresponding optimal control is

$$u(t) = -\sum_{i=1}^m (N(i) + C'(i) K^\varepsilon(t, i) C(i))^{-1} B'(i) K^\varepsilon(t, i) I_{\{\alpha^\varepsilon(t) = i\}} x^\varepsilon(t).  \qquad (11)$$
The essence of our approach is that in lieu of solving the optimal control problem directly, we construct approximations that are much simpler to solve than the original problem, leading to near optimality. Using the hierarchical structure of the two time scales, we decompose the state space naturally so that within each subspace the transition frequency is of the same order of magnitude, whereas transitions among different subspaces take place at a much slower rate. For a finite-state Markov chain, there is at least one recurrent state: either all states are recurrent, or, in addition to recurrent states, there are some transient states. Taking advantage of the time-scale separation, we lump all the states in each ergodic class (each subspace of recurrent states) into one state and obtain an aggregated process. The entire problem is then recast as one in which the total number of aggregated states is $l$. If $l \ll m$, a substantial reduction of complexity is achieved. We show that as the small parameter $\varepsilon$ goes to zero, the value functions converge to that of the limit problem.
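Concretely, the decomposition and aggregation can be set up as in the following sketch, which assembles $Q^\varepsilon = \widetilde{Q}/\varepsilon + \widehat{Q}$ from irreducible diagonal blocks, computes the stationary distribution $\nu^k$ of each block, and forms the aggregated generator $\bar{Q}$ of (12) below; the helper names are ours.

import numpy as np
from scipy.linalg import block_diag

def stationary(Qk):
    """Stationary distribution nu of an irreducible generator block Qk:
    solve nu' Qk = 0 subject to the components of nu summing to one."""
    mk = Qk.shape[0]
    A = np.vstack([Qk.T, np.ones(mk)])
    b = np.zeros(mk + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def two_time_scale(blocks, Q_hat, eps):
    """Q_eps = Q_tilde / eps + Q_hat, with Q_tilde = diag(blocks)."""
    return block_diag(*blocks) / eps + Q_hat

def aggregated_generator(blocks, Q_hat):
    """Q_bar = diag(nu^1, ..., nu^l) Q_hat diag(1_{m_1}, ..., 1_{m_l})."""
    left = block_diag(*[stationary(Qk).reshape(1, -1) for Qk in blocks])
    right = block_diag(*[np.ones((Qk.shape[0], 1)) for Qk in blocks])
    return left @ Q_hat @ right    # an l x l generator

For the example in Section 5 there is a single ergodic class, so blocks contains one 3 x 3 matrix and $\bar{Q} = \nu \widehat{Q} 1_3 = 0$, since the rows of $\widehat{Q}$ sum to zero; this matches the limit system used there.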
Using the optimal control of the limit problem, we then build controls leading to approximate optimality.

Suppose that the underlying Markov chain is divisible into a number of recurrent groups (or ergodic classes) such that it fluctuates rapidly among different states within a group consisting of recurrent states, but jumps less frequently from one group to another. Let the generator of the Markov chain be given by (7). Assume that $\widetilde{Q}$ has the form $\widetilde{Q} = \mathrm{diag}(\widetilde{Q}^1, \ldots, \widetilde{Q}^l)$, where $\mathrm{diag}(\widetilde{Q}^1, \ldots, \widetilde{Q}^l)$ denotes a block-diagonal matrix with entries $\widetilde{Q}^1, \ldots, \widetilde{Q}^l$, the $\widetilde{Q}^k \in \mathbb{R}^{m_k \times m_k}$ are irreducible for $k = 1, \ldots, l$, and $\sum_{k=1}^l m_k = m$. Let $\mathcal{M}_k = \{k1, \ldots, km_k\}$ denote the state space corresponding to $\widetilde{Q}^k$ for $k = 1, \ldots, l$. Then the state space of the Markov chain admits the decomposition $\mathcal{M} = \mathcal{M}_1 \cup \cdots \cup \mathcal{M}_l = \{11, \ldots, 1m_1\} \cup \cdots \cup \{l1, \ldots, lm_l\}$. Note that in the above, we have relabelled the states as $kj$ with $k = 1, \ldots, l$ and $j = 1, \ldots, m_k$. Because $\widetilde{Q}^k = (\tilde{q}^k_{ij})_{m_k \times m_k}$ and $\widehat{Q} = (\hat{q}_{ij})_{m \times m}$ are generators, $\sum_{j=1}^{m_k} \tilde{q}^k_{ij} = 0$ for $i = 1, \ldots, m_k$ and $k = 1, \ldots, l$, and $\sum_{j=1}^m \hat{q}_{ij} = 0$ for $i = 1, \ldots, m$. In the above, $\widetilde{Q}$ represents the rapidly changing part and $\widehat{Q}$ describes the slowly varying components. The slow and fast components are coupled through weak and strong interactions in the sense that the underlying Markov chain fluctuates rapidly within a single group $\mathcal{M}_k$ and varies less frequently between groups $\mathcal{M}_k$ and $\mathcal{M}_j$ for $k \ne j$. Aggregating the states in $\mathcal{M}_k$ into a single super-state, all such states are coupled through the matrix $\widehat{Q}$. We obtain an aggregated process $\bar{\alpha}^\varepsilon(\cdot)$ defined by $\bar{\alpha}^\varepsilon(t) = k$ if $\alpha^\varepsilon(t) \in \mathcal{M}_k$. The process $\bar{\alpha}^\varepsilon(\cdot)$ is not necessarily Markovian; see Yin and Zhang (1998, Example 7.5). Nevertheless, by a probabilistic argument, it is shown in Yin and Zhang (1998, Theorem 7.4, pp. 172-174) that $\bar{\alpha}^\varepsilon(\cdot)$ converges weakly to a continuous-time Markov chain $\bar{\alpha}(\cdot)$ generated by

$$\bar{Q} = \mathrm{diag}(\nu^1, \ldots, \nu^l) \, \widehat{Q} \, \mathrm{diag}(1_{m_1}, \ldots, 1_{m_l}),  \qquad (12)$$

where $\nu^k$ is the stationary distribution of $\widetilde{Q}^k$ and $1_\ell = (1, \ldots, 1)' \in \mathbb{R}^\ell$. Moreover, for any bounded deterministic $\beta(\cdot)$ (see Yin & Zhang (1998, Theorem 7.2, pp. 170-171)), we have the second moment bounds $E(\int_s^T [I_{\{\alpha^\varepsilon(t) = kj\}} - \nu^k_j I_{\{\bar{\alpha}^\varepsilon(t) = k\}}] \beta(t) \, dt)^2 = O(\varepsilon)$.

In preparation for the subsequent study, we establish two lemmas concerning an a priori estimate of $K^\varepsilon$ and its Lipschitz continuity.

Lemma 5. There exist constants $\kappa_1$ and $\kappa_A$ such that for all $s \in [0, T]$ and $i \in \mathcal{M}$, $|K^\varepsilon(s, i)| \le \kappa_1 (T + 1) e^{\kappa_A T}$.

Proof. We claim that for each $i \in \mathcal{M}$, $0 \le v^\varepsilon(s, i, x) \le \kappa_1 e^{\kappa_A T} |x|^2 (T + 1)$. Clearly $v^\varepsilon(s, i, x) \ge 0$ because $J^\varepsilon(s, i, x, u(\cdot)) \ge 0$ for all admissible $u(\cdot)$. To derive the upper bound, set $u_0(t) = 0$. Then, under such a control, $x(t) = x + \int_s^t A(\alpha^\varepsilon(r)) x(r) \, dr$. Using the Gronwall inequality, we have $|x(t)| \le |x| e^{\kappa_A T}$, where $\kappa_A = \max_i |A(i)|$; this inequality holds for all $t \in [0, T]$. For $0 \le s \le t$, $v^\varepsilon(s, i, x) \le J^\varepsilon(s, i, x, u_0(\cdot)) \le \kappa_1 e^{\kappa_A T} |x|^2 (T + 1)$. Owing to Theorem 4, $K^\varepsilon(s, i)$ is symmetric and positive semidefinite. By the definition of the matrix norm $|K^\varepsilon(s, i)| = \max\{\text{eigenvalues of } K^\varepsilon(s, i)\}$, it suffices to verify that for any unit vector $\eta$, $\eta' K^\varepsilon(s, i) \eta \le \kappa_1 e^{\kappa_A T} (T + 1)$. Taking $x = a\eta$ with $a$ a scalar, $a^2 \eta' K^\varepsilon(s, i) \eta = v^\varepsilon(s, i, a\eta) \le \kappa_1 e^{\kappa_A T} a^2 (T + 1)$. Dividing both sides by $a^2$, the proof is completed.

Lemma 6. For $i \in \mathcal{M}$, the $K^\varepsilon(\cdot, i)$ are uniformly (in $\varepsilon$) Lipschitz continuous on $[0, T]$.

Proof. We first prove that $v^\varepsilon(s, i, x)$ is uniformly Lipschitz; then we prove the Lipschitz property of $K^\varepsilon$.

Step 1: There exists a constant $\kappa_2$, which may depend on $T$, such that for any $\delta > 0$ and $(s, i, x) \in [0, T] \times \mathcal{M} \times \mathbb{R}^{n_1}$, $|v^\varepsilon(s + \delta, i, x) - v^\varepsilon(s, i, x)| \le \kappa_2 (|x|^2 + 1) \delta$. For given $(s, i, x)$, write the value function $v^\varepsilon(s, i, x)$ as

$$v^\varepsilon(s, i, x) = E \Big\{ \int_s^T [x^{o\prime}(t) M(\alpha^\varepsilon(t)) x^o(t) + u^{o\prime}(t) N(\alpha^\varepsilon(t)) u^o(t)] \, dt + x^{o\prime}(T) D(\alpha^\varepsilon(T)) x^o(T) \Big\},  \qquad (13)$$

where $u^o(t) = u^o(t, \alpha^\varepsilon(t), x^o(t))$ is the optimal control defined in (5), $x^o(t)$ is the corresponding trajectory, and $E$ is the conditional expectation given $(\alpha^\varepsilon(s), x^o(s)) = (i, x)$. By the change of variable $t \to t + \delta$ in (13), using a similar argument as in Zhang and Yin (1999), we obtain $v^\varepsilon(s + \delta, i, x) - v^\varepsilon(s, i, x) \le \kappa_2 (|x|^2 + 1) \delta$. Moreover, we can also establish the reverse inequality $v^\varepsilon(s + \delta, i, x) - v^\varepsilon(s, i, x) \ge -\kappa_2 (|x|^2 + 1) \delta$.

Step 2: As in the proof of Lemma 5, taking $x = a\eta$ with $\eta$ a unit vector and sending $a \to \infty$, we obtain $|K^\varepsilon(s + \delta, i) - K^\varepsilon(s, i)| \le \kappa_2 \delta$.

4.1. Limit Riccati equations

To proceed, we obtain a result concerning the limit Riccati equations. For any function $F$ on $\mathcal{M}$, we define $\bar{F}(k) = \sum_{j=1}^{m_k} \nu^k_j F(kj)$ for $k = 1, \ldots, l$; for $F_1$ and $F_2$, define $\overline{F_1 F_2}(k) = \sum_{j=1}^{m_k} \nu^k_j F_1(kj) F_2(kj)$.

Theorem 7. For $k = 1, \ldots, l$ and $j = 1, \ldots, m_k$, $K^\varepsilon(s, kj) \to \bar{K}(s, k)$ uniformly on $[0, T]$ as $\varepsilon \to 0$, where $\bar{K}(s, k)$ is the unique solution to

$$\dot{\bar{K}}(s, k) = -\bar{K}(s, k) \bar{A}(k) - \bar{A}'(k) \bar{K}(s, k) - \bar{M}(k) + \bar{K}(s, k) \bar{B}(k) (\bar{N}(k) + \bar{C}'(k) \bar{K}(s, k) \bar{C}(k))^{-1} \bar{B}'(k) \bar{K}(s, k) - \bar{Q} \bar{K}(s, \cdot)(k),  \qquad (14)$$

where $\bar{K}(T, k) = \bar{D}(k) \overset{\text{def}}{=} \sum_{j=1}^{m_k} \nu^k_j D(kj)$.
Proof. By virtue of Lemmas 5 and 6, $\{K^\varepsilon(s, kj)\}$ is equicontinuous and uniformly bounded. By the Arzelà-Ascoli theorem, from each sequence ($\varepsilon \to 0$) we may extract a further subsequence (still indexed by $\varepsilon$ for simplicity) such that $K^\varepsilon(s, kj)$ converges uniformly on $[0, T]$ to a continuous function $K(s, kj)$. The Riccati equation in its integral form, with time running backward and terminal condition $K^\varepsilon(T, kj) = D(kj)$, is

$$K^\varepsilon(s, kj) = D(kj) + \int_s^T [K^\varepsilon(r, kj) A(kj) + A'(kj) K^\varepsilon(r, kj) + M(kj) - K^\varepsilon(r, kj) B(kj) P^{-1}(kj) B'(kj) K^\varepsilon(r, kj) + Q^\varepsilon K^\varepsilon(r, \cdot)(kj)] \, dr,$$

where $P^{-1}(kj) = (N(kj) + C'(kj) K^\varepsilon(r, kj) C(kj))^{-1}$ and $Q^\varepsilon$ is given by (7). Note that $P^{-1}(kj)$ should have been written as $P^{-1}(r, kj)$; here and henceforth, we suppress the time dependence for notational simplicity. By the uniform boundedness of $K^\varepsilon$, multiplying both sides of the above equation by $\varepsilon$ and sending $\varepsilon \to 0$ yield $\int_s^T \widetilde{Q}^k K(r, \cdot)(kj) \, dr = \lim_{\varepsilon \to 0} \int_s^T \widetilde{Q}^k K^\varepsilon(r, \cdot)(kj) \, dr = 0$ for $s \in [0, T]$. Owing to the continuity of $K(s, kj)$, we obtain $\widetilde{Q}^k K(s, \cdot)(kj) = 0$ for $s \in [0, T]$. The irreducibility of $\widetilde{Q}^k$ implies that the null space of $\widetilde{Q}^k$ is one-dimensional, spanned by $1_{m_k}$. Thus the vector $(K(s, k1), K(s, k2), \ldots, K(s, km_k))$ is in the null space of $\widetilde{Q}^k$, and as a result $K(s, k1) = K(s, k2) = \cdots = K(s, km_k) := K(s, k)$ (see also Yin & Zhang (1998, Lemma A.39, pp. 327-328) for further details).

We proceed to show $K(s, k) = \bar{K}(s, k)$. For each $k = 1, \ldots, l$, multiplying $K^\varepsilon(s, kj)$ by $\nu^k_j$, summing over the index $j$, and sending $\varepsilon \to 0$, we obtain

$$K(s, k) = \bar{D}(k) + \int_s^T [K(r, k) \bar{A}(k) + \bar{A}'(k) K(r, k) + \bar{M}(k) - K(r, k) \overline{BP^{-1}B'}(k) K(r, k) + \bar{Q} K(r, \cdot)(k)] \, dr$$

[recall notation (3)]. Thus, the uniqueness of the solution of the Riccati equation implies $K(s, k) = \bar{K}(s, k)$. As a result, $K^\varepsilon(s, kj) \to \bar{K}(s, k)$.

4.2. Near-optimal controls

The convergence of $K^\varepsilon(s, kj)$ leads to that of $v^\varepsilon(s, kj, x) = x' K^\varepsilon(s, kj) x$. It follows that as $\varepsilon \to 0$, $v^\varepsilon(s, kj, x) \to v(s, k, x)$ for $j = 1, \ldots, m_k$, where $v(s, k, x) = x' \bar{K}(s, k) x$ corresponds to the value function of a limit problem. Let $\mathcal{U}$ denote the control set of the limit problem: $\mathcal{U} = \{U = (U^1, \ldots, U^l): U^k = (u^{k1}, \ldots, u^{km_k}), \ u^{kj} \in \mathbb{R}^{n_2}\}$. Define

$$\bar{f}(s, k, x, U) = \bar{A}(k) x + \sum_{j=1}^{m_k} \nu^k_j B(kj) u^{kj},$$
$$\bar{N}(k, U) = \sum_{j=1}^{m_k} \nu^k_j \, u^{kj\prime} N(kj) u^{kj},$$
$$\bar{C}(k, U) \bar{C}'(k, U) = \sum_{j=1}^{m_k} \nu^k_j C(kj) u^{kj} u^{kj\prime} C'(kj).  \qquad (15)$$
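A direct transcription of the averaged quantities in (15) might read as follows; the function name and the data layout (per-state coefficients within one group, stored as lists over $j$) are our own illustrative choices.

import numpy as np

def averaged_coeffs(x, U_k, nu, A_bar, B, C, N):
    """Averaged drift, running control cost, and squared diffusion in (15)
    for one group k: B, C, N are lists over j of the coefficients at the
    states kj, nu is the stationary distribution nu^k, and U_k is the
    tuple (u^{k1}, ..., u^{k m_k})."""
    f_bar = A_bar @ x + sum(nu[j] * (B[j] @ U_k[j]) for j in range(len(nu)))
    N_bar = sum(nu[j] * (U_k[j] @ N[j] @ U_k[j]) for j in range(len(nu)))
    CC_bar = sum(nu[j] * np.outer(C[j] @ U_k[j], C[j] @ U_k[j])
                 for j in range(len(nu)))
    return f_bar, N_bar, CC_bar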
The control problem for the limit system is

$$\min \ \bar{J}(s, k, x, U(\cdot)) = E \Big\{ \int_s^T [x'(t) \bar{M}(\bar{\alpha}(t)) x(t) + \bar{N}(\bar{\alpha}(t), U(t))] \, dt + x'(T) \bar{D}(k) x(T) \Big\},$$
$$\text{s.t. } dx(t) = \bar{f}(t, \bar{\alpha}(t), x(t), U(t)) \, dt + \bar{C}(k, U) \, dw(t), \quad x(s) = x,$$

where $\bar{\alpha}(\cdot) \in \{1, \ldots, l\}$ is a Markov chain generated by $\bar{Q}$. The optimal control for the limit problem is $U^o(s, k, x) = (U^{o,1}(s, x), \ldots, U^{o,l}(s, x))$ with $U^{o,k}(s, x) = (u^{o,k1}(s, x), \ldots, u^{o,km_k}(s, x))$ and $u^{o,kj}(s, x) = -P^{-1}(kj) B'(kj) \bar{K}(s, k) x$. Using such controls (similar to Sethi & Zhang, 1994; Yin & Zhang, 1998), we construct $u^\varepsilon(s, \alpha, x) = \sum_{k=1}^l \sum_{j=1}^{m_k} I_{\{\alpha = kj\}} u^{o,kj}(s, x)$ for the original problem. This control can be written as $u^\varepsilon(s, \alpha, x) = -P^{-1}(\alpha) B'(\alpha) \bar{K}(s, k) x$ if $\alpha \in \mathcal{M}_k$. We denote $u^\varepsilon(t) = u^\varepsilon(t, \alpha^\varepsilon(t), x(t))$ for the original problem. It will be shown that this is a near-optimal control. If $A(kj) = A(k)$, $B(kj) = B(k)$, $C(kj) = C(k)$, $M(kj) = M(k)$, and $N(kj) = N(k)$ are independent of $j$, then, owing to the second moment bounds preceding Lemma 5, we may replace $I_{\{\alpha^\varepsilon(t) = kj\}}$ by $I_{\{\bar{\alpha}^\varepsilon(t) = k\}} \nu^k_j$, consider $\tilde{u}^\varepsilon(s, \alpha, x) = \sum_{k=1}^l \sum_{j=1}^{m_k} I_{\{\alpha \in \mathcal{M}_k\}} \nu^k_j u^{o,kj}(s, x)$, and write $\tilde{u}^\varepsilon(s, \alpha, x) = \tilde{u}^\varepsilon(s, k, x)$. The control $\tilde{u}^\varepsilon$ only needs the information $\alpha^\varepsilon(t) \in \mathcal{M}_k$. As a result, we use $\tilde{u}^\varepsilon(t) = \tilde{u}^\varepsilon(t, \bar{\alpha}^\varepsilon(t), x(t))$.

Theorem 8. The following assertions hold.

(i) The $u^\varepsilon(t)$ given above is near optimal in that $|J^\varepsilon(s, \iota, x, u^\varepsilon) - v^\varepsilon(s, \iota, x)| \to 0$ as $\varepsilon \to 0$.

(ii) Assume $A(kj) = A(k)$, $B(kj) = B(k)$, $C(kj) = C(k)$, $M(kj) = M(k)$, and $N(kj) = N(k)$, independent of $j$. Then $\tilde{u}^\varepsilon(t)$ defined above is near optimal in that $|J^\varepsilon(s, \iota, x, \tilde{u}^\varepsilon) - v^\varepsilon(s, \iota, x)| \to 0$ as $\varepsilon \to 0$.

Proof. Recall that $\bar{\alpha}^\varepsilon(t) = k$ if $\alpha^\varepsilon(t) \in \mathcal{M}_k$. The constructed control $u^\varepsilon$ can be written as $u^\varepsilon(t) = -P^{-1}(\alpha^\varepsilon(t)) B'(\alpha^\varepsilon(t)) \bar{K}(t, \bar{\alpha}^\varepsilon(t)) x^\varepsilon(t)$, where $x^\varepsilon(t)$ is the corresponding trajectory governed by the differential equation

$$dx^\varepsilon(t) = (A(\alpha^\varepsilon(t)) - B(\alpha^\varepsilon(t)) P^{-1}(\alpha^\varepsilon(t)) B'(\alpha^\varepsilon(t)) \bar{K}(t, \bar{\alpha}^\varepsilon(t))) x^\varepsilon(t) \, dt - C(\alpha^\varepsilon(t)) P^{-1}(\alpha^\varepsilon(t)) B'(\alpha^\varepsilon(t)) \bar{K}(t, \bar{\alpha}^\varepsilon(t)) x^\varepsilon(t) \, dw(t)$$

with $x^\varepsilon(s) = x$. Let $x(t)$ denote the optimal trajectory of the limit problem. Then $dx(t) = \bar{f}(t, \bar{\alpha}(t), x(t), U^o(t)) \, dt + \bar{C}(k, U^o) \, dw(t)$ with $x(s) = x$, and $\bar{f}(\cdot)$ and $\bar{C}(\cdot)$ defined by (15). Using the convergence of $\bar{\alpha}^\varepsilon(\cdot)$ to $\bar{\alpha}(\cdot)$, the second moment bounds preceding Lemma 5, and the Skorohod representation as in Yin and Zhang (1998, Chapter 9), we obtain $E|x^\varepsilon(t) - x(t)| \to 0$, so $|J^\varepsilon(s, \iota, x, u^\varepsilon) - v(s, k, x)| \to 0$. This, together with $v^\varepsilon \to v$, leads to $|J^\varepsilon(s, \iota, x, u^\varepsilon) - v^\varepsilon(s, \iota, x)| \to 0$. For the second part of the theorem, under the conditions $A(kj) = A(k)$, $B(kj) = B(k)$, $C(kj) = C(k)$, $M(kj) = M(k)$, and $N(kj) = N(k)$, we have $\tilde{u}^\varepsilon(t) = -P^{-1}(\bar{\alpha}^\varepsilon(t)) B'(\bar{\alpha}^\varepsilon(t)) \bar{K}(t, \bar{\alpha}^\varepsilon(t)) x^\varepsilon(t)$.
The optimal trajectory $x(t)$ for the limit problem is given by $dx(t) = \bar{f}(t, \bar{\alpha}(t), x(t), U^o(t)) \, dt + \bar{C}(k, U^o) \, dw(t)$ with $x(s) = x$, and $\bar{f}(\cdot)$ and $\bar{C}(\cdot)$ defined in (15). Then the convergence of $x^\varepsilon(t) - x(t)$ and of $J^\varepsilon - v^\varepsilon$ can be verified in a similar way as in the previous case.
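In code, the constructed near-optimal feedback is a table lookup: solve the $l$ limit Riccati systems (14) once offline, then map the observed state $\alpha \in \mathcal{M}_k$ to its group $k$ and apply $u^\varepsilon(s, \alpha, x) = -P^{-1}(\alpha) B'(\alpha) \bar{K}(s, k) x$. A minimal sketch, with names of our own choosing and K_bar assumed to be a callable interpolating the solution of (14):

import numpy as np

def make_near_optimal_control(group_of, B, C, N, K_bar):
    """group_of[alpha] = k maps an original state to its aggregated group;
    B, C, N are dictionaries over the original states; K_bar(s, k) returns
    the limit Riccati solution. Realizes u_eps(s, alpha, x)."""
    def u_eps(s, alpha, x):
        k = group_of[alpha]
        K = K_bar(s, k)
        P = N[alpha] + C[alpha].T @ K @ C[alpha]   # n2 x n2, positive definite
        return -np.linalg.solve(P, B[alpha].T @ K @ x)
    return u_eps

Only $l$ Riccati systems are solved, instead of the $m$ coupled equations in (10); online, each evaluation costs one small linear solve.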
5. A numerical example

Consider a numerical example with strictly negative definite control weights $N(i)$ for all $i \in \mathcal{M}$. Let $\alpha^\varepsilon(t) \in \mathcal{M} = \{1, 2, 3\}$, and use (7) with

$$\widetilde{Q} = \begin{pmatrix} -0.7 & 0.3 & 0.4 \\ 0.4 & -0.7 & 0.3 \\ 0.3 & 0.4 & -0.7 \end{pmatrix}, \qquad \widehat{Q} = \begin{pmatrix} -0.2 & 0.1 & 0.1 \\ 0.2 & -0.4 & 0.2 \\ 0.1 & 0.2 & -0.3 \end{pmatrix}.$$

Consider the control objective (8) subject to the constraint (9), in which $x(0) = 3$, $\alpha^\varepsilon(0) = 2$, $0 \le t \le T = 10$, $A(1) = 0.2$, $A(2) = 0.3$, $A(3) = 0.4$, $B(1) = 18$, $B(2) = 19$, $B(3) = 22$, $C(1) = 10$, $C(2) = 11$, $C(3) = 12$, $M(1) = 1$, $M(2) = 2$, $M(3) = 3$, $N(1) = -1$, $N(2) = -2$, $N(3) = -3$, $D(i) = 4$, and $\nu = (1/3, 1/3, 1/3)$. The limit system of Riccati equations is given by (14) with $\bar{Q} = 0$. Using an explicit Runge-Kutta solver with relative accuracy tolerance $10^{-5}$ and absolute error tolerance $10^{-8}$, we numerically solve the Riccati equations to obtain $K^\varepsilon(\cdot, i)$ and $\bar{K}(\cdot)$. The integration step size is chosen to be $h = 0.01$. Define $|K^\varepsilon - \bar{K}| = (h/T) \sum_{k=1}^3 \sum_{j=1}^{\lfloor T/h \rfloor} |K^\varepsilon(jh, k) - \bar{K}(jh)|$, where $\lfloor z \rfloor$ denotes the integer part of $z \in \mathbb{R}$. The calculated error bounds are provided in Table 1.

Table 1
Error bounds

$\varepsilon$    $|K^\varepsilon - \bar{K}|$    $|v^\varepsilon - \bar{v}|$
0.1              1.7234                         0.3394
0.01             2.1477                         0.3553
0.001            2.2003                         0.3549
0.0001           2.2065                         0.3618
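A reproduction attempt for this example might look as follows. This is our own code, not the authors': it reuses the riccati_rhs helper from the sketch in Section 3 and substitutes SciPy's adaptive explicit Runge-Kutta solver with the stated tolerances. For very small $\varepsilon$ the coupled system becomes stiff, so an implicit method (e.g. method="Radau") may be needed for reasonable run times.

import numpy as np
from scipy.integrate import solve_ivp

# Data of the example (scalar case, n1 = n2 = 1).
Q_tilde = np.array([[-0.7, 0.3, 0.4], [0.4, -0.7, 0.3], [0.3, 0.4, -0.7]])
Q_hat = np.array([[-0.2, 0.1, 0.1], [0.2, -0.4, 0.2], [0.1, 0.2, -0.3]])
A = [np.array([[a]]) for a in (0.2, 0.3, 0.4)]
B = [np.array([[b]]) for b in (18.0, 19.0, 22.0)]
C = [np.array([[c]]) for c in (10.0, 11.0, 12.0)]
M = [np.array([[m]]) for m in (1.0, 2.0, 3.0)]
N = [np.array([[n]]) for n in (-1.0, -2.0, -3.0)]
D = [np.array([[4.0]]) for _ in range(3)]
nu = np.full(3, 1.0 / 3.0)
T, h = 10.0, 0.01
grid = np.arange(0.0, T + h / 2, h)            # grid in tau = T - t

def avg(F):
    """Coefficient averaging F_bar = sum_j nu_j F(kj) for the single group."""
    return [sum(nu[j] * F[j] for j in range(3))]

def backward(coeffs, Q, K_T):
    """Solve (4)/(14) backward in tau = T - t; riccati_rhs is from Section 3."""
    A_, B_, C_, M_, N_ = coeffs
    def rhs(tau, y):
        K = y.reshape(len(A_), 1, 1)
        return -riccati_rhs(K, A_, B_, C_, M_, N_, Q).ravel()
    sol = solve_ivp(rhs, (0.0, T), K_T, t_eval=grid, rtol=1e-5, atol=1e-8)
    return sol.y                               # shape (#regimes, len(grid))

K_bar = backward((avg(A), avg(B), avg(C), avg(M), avg(N)),
                 np.zeros((1, 1)), np.array([4.0]))         # Q_bar = 0
for eps in (0.1, 0.01, 0.001, 0.0001):
    K_eps = backward((A, B, C, M, N), Q_tilde / eps + Q_hat,
                     np.concatenate([d.ravel() for d in D]))
    err = (h / T) * np.abs(K_eps - K_bar).sum()             # metric of Table 1
    print(f"eps = {eps}: |K^eps - K_bar| = {err:.4f}")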
6. Concluding remark

Near-optimal controls of LQ problems with regime switching and indefinite control weights are treated in this paper. Due to page limitations, only Markov chains with recurrent states are dealt with; similar techniques can be used for Markov chains including transient states. Currently, the formulation assumes a scalar Brownian motion. The techniques used and the near-optimal control construction can be extended to multidimensional diffusions with regime switching. Although the asymptotic results are obtained as the small parameter $\varepsilon \to 0$, in practical systems $\varepsilon$ is a fixed parameter that may represent different orders of magnitude of the entries of the generators. The near-optimal control developed here gives an effective approximation using comparison controls. As a result, instead of solving complex systems with Markov chains having large state spaces, we can deal with much simplified systems.

Acknowledgements

The research was supported by the Wayne State University Research Enhancement Program, the National Science Foundation, and the RGC Earmarked Grants CUHK4234/01E and CUHK4175/03E.

References

Abbad, M., Filar, J. A., & Bielecki, T. R. (1992). Algorithms for singularly perturbed limiting average Markov control problems. IEEE Transactions on Automatic Control, 37, 1421-1425.
Björk, T. (1980). Finite dimensional optimal filters for a class of Ito processes with jumping parameters. Stochastics, 4, 167-183.
Chen, S., Li, X., & Zhou, X. Y. (1998). Stochastic linear quadratic regulators with indefinite control weight costs. SIAM Journal on Control and Optimization, 36, 1685-1702.
Hoppenstead, F., Salehi, H., & Skorohod, A. (1996). An averaging principle for dynamic systems in Hilbert space with Markov random perturbations. Stochastic Processes and their Applications, 61, 85-108.
Ji, Y., & Chizeck, H. J. (1992). Jump linear quadratic Gaussian control in continuous-time. IEEE Transactions on Automatic Control, 37, 1884-1892.
Li, X., & Zhou, X. Y. (2002). Indefinite stochastic LQ controls with Markovian jumps in a finite time horizon. Communications on Information and Systems, 2, 265-282.
Mariton, M., & Bertrand, P. (1985). Robust jump linear quadratic control: A mode stabilizing solution. IEEE Transactions on Automatic Control, 30, 1145-1147.
Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7, 77-91.
Pan, Z. G., & Başar, T. (1995). H∞-control of Markovian jump linear systems and solutions to associated piecewise-deterministic differential games. In G. J. Olsder (Ed.), New trends in dynamic games and applications (pp. 61-94).
Pervozvanskii, A. A., & Gaitsgori, V. G. (1988). Theory of suboptimal decisions: Decomposition and aggregation. Dordrecht: Kluwer.
Phillips, R. G., & Kokotovic, P. V. (1981). A singular perturbation approach to modelling and control of Markov chains. IEEE Transactions on Automatic Control, 26, 1087-1094.
Sethi, S. P., & Zhang, Q. (1994). Hierarchical decision making in stochastic manufacturing systems. Boston, MA: Birkhäuser.
Walter, W. (1998). Ordinary differential equations. New York: Springer.
Yin, G., & Dey, S. (2003). Weak convergence of hybrid filtering problems involving nearly completely decomposable hidden Markov chains. SIAM Journal on Control and Optimization, 41, 1820-1842.
Yin, G., & Zhang, Q. (1998). Continuous-time Markov chains and applications: A singular perturbation approach. New York: Springer.
Yong, J., & Zhou, X. Y. (1999). Stochastic controls: Hamiltonian systems and HJB equations. New York: Springer.
Zhang, Q., & Yin, G. (1999). On nearly optimal controls of hybrid LQG problems. IEEE Transactions on Automatic Control, 44, 2271-2282.
Zhou, X. Y., & Yin, G. (2003). Markowitz's mean-variance portfolio selection with regime switching: A continuous-time model. SIAM Journal on Control and Optimization, 42, 1466-1482.
Yuanjin Liu received his B.S. degree in Mathematics in 1996 and his M.A. degree in Mathematical Statistics in 1999, both from Tongji University, Shanghai, China; he was named an outstanding graduate of Shanghai on each occasion. He is currently working toward the Ph.D. degree in Mathematics at Wayne State University, Detroit, Michigan. His research interests include stochastic approximation and optimization, singular perturbation, and mathematics of finance.

G. George Yin received his B.S. degree in Mathematics from the University of Delaware in 1983, and his M.S. in Electrical Engineering and Ph.D. in Applied Mathematics from Brown University in 1987. He then joined the Department of Mathematics, Wayne State University, where he became a professor in 1996. He served on the Mathematical Review Data Base Committee and various conference program committees, and was SIAM Representative to the 34th CDC, Co-Chair of the 1996 AMS-SIAM Summer Seminar in Applied Mathematics, Co-Chair of the 2003 AMS-IMS-SIAM Summer Research Conference, and Co-organizer of the 2005 IMA Workshop on Wireless Communications. He was on the editorial board of Stochastic Optimization & Design, editor of the SIAM Activity Group on Control and Systems Theory Newsletters, and an Associate Editor of the IEEE Transactions on Automatic Control. He is an associate editor of the SIAM Journal on Control and Optimization and of three other journals. He is a Fellow of the IEEE.

Xun Yu Zhou received his B.Sc. in Pure Mathematics in 1984 and his Ph.D. in Operations Research and Control Theory in 1989, both from Fudan University. He did postdoctoral research at Kobe University (Science Faculty) and the University of Toronto (Management School) from 1989 to 1993, and joined the Chinese University of Hong Kong (Engineering School) in 1993, where he is now a Professor. His research interests are in stochastic control, financial engineering, and discrete-event manufacturing systems. He has published more than 70 journal papers, 1 research monograph, and 2 edited books. He is a Fellow of the IEEE and a Croucher Senior Fellow. Selected honors include the SIAM Outstanding Paper Prize, the INFORMS Meritorious Service Award, and an Alexander von Humboldt Research Fellowship. He is or was on the editorial boards of Operations Research (1999-), Mathematical Finance (2001-), and the IEEE Transactions on Automatic Control (1999-2003).