Nonlinear Analysis: Hybrid Systems 17 (2015) 25–43
Optimization of hybrid stochastic differential systems in communications networks

Hana Baili
Supélec, 3 rue Joliot-Curie, 91192 Gif-sur-Yvette, France

Article history: Received 31 October 2012; Accepted 20 January 2015
Keywords: Heavy traffic approximation; Optimal control; Hybrid stochastic differential systems
Abstract

This paper addresses issues from applied stochastic analysis for modeling and solving control problems in communications networks. We consider the problem of optimal scheduling in a wireless system with time varying traffic. The system is handled by a single base station transmitting over time varying channels. This may be the case in practice for a hybrid TDMA-CDMA (Time Division Multiple Access-Code Division Multiple Access) system. Heavy traffic approximation of the physical system yields a problem of optimal control for a constrained hybrid stochastic differential system; here constrained means bounded or reflected in the K-dimensional positive orthant. In this work we establish a closed-form solution for the nascent optimal control problem. The aim is to find a control which minimizes the expected total delay for the users. The control is constrained to satisfy some inequalities; hence it seems natural that the cost function involves a penalty term for breaking these inequalities. We study this control problem by a dynamic programming approach and are led to the resolution of a Hamilton-Jacobi-Bellman (HJB) equation in the finite-dimensional space R_+^K. While optimizing, it turns out that the optimum falls inside the class of feedback controls. Hence our method consists in finding a smooth solution of Bellman's equation and consequently obtaining a unique solution for the closed loop. Here a separation of variables enables us to construct an explicit C^2 solution of the HJB equation, so that the existence of a unique optimal control is proven. Further stochastic analysis of the hybrid stochastic differential system under the optimal control is provided: the Fokker-Planck equation for the distribution density of the state.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

Wireless communications networks have experienced a spectacular expansion; they provide today the predominant means of accessing the Internet. The Internet offers a "best effort" type of service, i.e., the resources in a network are shared among all users, and when the number of users increases, the quality of service per user decreases. The need for communications networks with significantly better performance raises the question of how transmission resources should be allocated. We study this important problem of resource allocation, or scheduling, in the light of the works by Buche and Kushner [1] and Wu et al. [2]. The central part of this introduction describes the physical system, develops its heavy traffic approximation, and states the optimization problem. This part is inspired by Sections 1, 2 and 3 of [1] and Section 2 of [2]. Here we use the notation of Buche and Kushner [1] and we shall give the connection with the notation of Wu et al. [2]. Let a wireless system have a fixed number K of users competing to transmit data from a single base station transmitter to respective mobiles (see Fig. 1). The connecting channels to the base station have randomly time varying states and are
E-mail address: [email protected]
http://dx.doi.org/10.1016/j.nahs.2015.01.003
Index of symbols

n: order of data arrival and transmission rates
nΛ_i^a: mean arrival rate of data to queue i
n C(Ũ_i(j)): mean transmission rate of data from queue i given that the channel is in state j
ū_i(j): basic resource allocated to queue i given that the channel is in state j
(1/√(n^ν)) u_i(j, t): excess resource allocated to queue i given that the channel is in state j
λ_i(j): rate of transmission per basic-resource increment allocated to queue i given that the channel is in state j
n^ν: channel fading rate
L(n^ν t): channel state process
A_i^n(t): scaled number of packets arrived at queue i by time t
y_{ki}^n(t): scaled number of packets sent from queue i by time t using the reassignments from queue k of its basic allocations
x_i^n(t): scaled number of packets in queue i at time t
x(t): limit of the scaled vector queue state as n → ∞ (in the sense of weak convergence of stochastic processes)
W(t): multi-dimensional Wiener process of intensity A
(x_t, y_t) or ξ_t: HSDS
x_t: continuous component of the state
y_t: discrete component of the state
E: state space
T_k: jump time of the state
z_t: rate or intensity of spontaneous jumps
Q: transition or reset kernel for the post-jump state
β: Lagrange multiplier
c(x_t, y_t) or c(ξ_t): cost rate
G: generating operator of the Markov process ξ_t
D(G): domain of G
D: bounded open domain from R_+^K
τ: first exit time of the continuous component from D
L_FP: Fokker-Planck operator of the continuous component under the optimal controls
D(L_FP): domain of the Fokker-Planck operator
possibly correlated (e.g., when one mobile follows another one closely). Let L(t) be a continuous-time Markov chain with values in {1, ..., J}. The process L(t) models the K-dimensional channel; thus any value j ∈ {1, ..., J} of the process L(t) determines the states of all the connecting channels of the wireless system. We assume that L(t) is ergodic with known stationary distribution {π_1, ..., π_J}. Besides the channel process, there are stochastic data processes arriving at the base station from the sources (the users) for the destinations (the mobiles). Data is measured in packets or cells of small size; it is queued until transmitted. A base station queue is assigned to each source-destination pair. There are two time rates (or intensities) in our system. First, let us introduce the data arrival rate: denote by n the order of the mean arrival rate of data. That is, for each i ∈ {1, ..., K}, batches of data arrive at queue i with mean rate nλ̄_i^a per time unit. The mean number of packets in an arriving batch for queue i is denoted v̄_i. Hence the mean rate of arrival of packets to queue i is nλ̄_i^a v̄_i per time unit. We use the definition λ̄_i^a v̄_i = Λ_i^a. In practice the data arrival process is very fast. The second time rate in the physical system is given by the rate of jumps or transitions between the different states of the Markov chain L(t). This is called the fading rate of the channel; it is expressed in jumps or transitions per time unit. Here we assume that the channel state variations are slower than the data arrivals but still fast; namely, the order of the fading rate is n^ν for some ν ∈ ]0, 1[. This assumption of a slowly fading channel makes sense in practice. In an indoor wireless environment or a low-mobility outdoor wireless environment, the channel coherence time, which represents the period of time during which the channel is invariant, is relatively high compared to the time slot duration.
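As a quick illustration of the ergodicity assumption on L(t), the sketch below simulates a small continuous-time Markov chain and compares its empirical occupation frequencies with the stationary distribution; the generator matrix is an invented example, not taken from the paper.

```python
import numpy as np

# Toy illustration (not from the paper): simulate a continuous-time Markov
# chain channel L(t) and compare empirical occupation frequencies with the
# stationary distribution {pi_1, ..., pi_J}. The generator M is invented.
rng = np.random.default_rng(0)

M = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 2.5, -3.0]])      # rows sum to zero
J = M.shape[0]

# Stationary distribution: solve pi M = 0 together with sum(pi) = 1.
lhs = np.vstack([M.T, np.ones(J)])
rhs = np.zeros(J + 1)
rhs[-1] = 1.0
pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)

# Gillespie-style path of the chain, recording time spent in each state.
T, t, j = 10_000.0, 0.0, 0
occupation = np.zeros(J)
while t < T:
    dwell = rng.exponential(1.0 / -M[j, j])    # exponential holding time
    occupation[j] += min(dwell, T - t)
    t += dwell
    p = M[j].clip(min=0.0)                     # off-diagonal jump rates
    j = rng.choice(J, p=p / p.sum())

print(occupation / T)    # close to pi for large T
```

For an ergodic chain the occupation fractions converge to π regardless of the starting state, which is exactly what the heavy traffic averaging below exploits.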
For example, in 3.5G and 4G wireless systems the time slot equals 0.66 ms and 0.5 ms respectively. For a low mobile speed of 10 km/h the channel coherence time is approximately 50 ms. As regards the data service process, it is as fast as the data arrival process, i.e., the order of the mean departure rate of data is n per time unit. Precisely, the conditional expectation of the rate of transmission from the ith queue given the conditioning event "the channel is in the jth state" is given by the capacity formula
R_i(j) := n C(Ũ_i(j)).

Ũ_i(j) denotes some allocated resource: power, time slot, bandwidth, etc., or all simultaneously. We assume that the capacity formula C is known; its determination is out of the scope of this paper and the following holds for any capacity expression. Actually, the rates of transmission (or equivalently, the capacities of the connecting channels) R_i(j) are random functions
Fig. 1. Physical system.
of time that ought to be written R_i(j, t). The dependence on time is implicit in the formula above; it stems from the fact that the resources Ũ_i(j) depend on time either explicitly or through the lengths of the queues. The randomness of R_i(j) is obvious since it is a conditional expectation. The details of coding, modulation, detection and decoding are all embedded into the transmission rate processes R_i(j). So we do not need to specify how the system operates, whether TDMA, CDMA or otherwise. Furthermore, the available resource is divided in two parts

Ũ_i(j, t) = ū_i(j) + (1/√(n^ν)) u_i(j, t)

ū_i(j) is called the equilibrium or basic resource and u_i(j, t)/√(n^ν) the reserve or excess resource. The basic part is sufficient to handle the mean load at the queues in a sense that we shall make precise later (see Eq. (1)). Since the total resource is only slightly greater, by the excess resource, than what is required for the mean demand, the system is operating in heavy traffic. The reserve part is used to control the fluctuations of the queues due to the fluctuations in the channel and arrival processes. A small amount of the total resource is subject to control, but it has a significant effect on the sizes of the queues. The excess resources are controls applied to the queues; they are random and evolving in time, but the sum over the queues

∑_{i=1}^K u_i(j, t)

must remain bounded at all times, for every j ∈ {1, ..., J}. Moreover, u_i(j, t) has possible values in a discrete set {0, u_max}; the control is then called extreme or bang-bang. Here we put constraints: for every i ∈ {1, ..., K}

∑_{j=1}^J π_j u_i(j, t) ≤ u_ave.

Obviously, we must have u_ave ≤ u_max. The excess resources depend on time either explicitly, u_i(j, t), or through the vector queue size x(t), i.e., u_i(j, x(t)); they are allocated in such a way that the expected total delay for the users is minimized. As to the basic part of the resource, it has a known distribution over the queues. That is, ū_i(j) for i ∈ {1, ..., K}, j ∈ {1, ..., J} are positive design parameters which are deterministic and constant in time. The level 1/√(n^ν) of the excess resource is a consequence of the high speed of the system. This re-scaling or re-sizing level for the excess resource is adequate, as will be seen from the fact that the vector queue state process converges to a process that lends itself well to the optimization problem. Now writing the Taylor expansion of C(Ũ_i(j)) around the point ū_i(j), and retaining the terms of up to the first
order yields

R_i(j, t) ≈ n C(ū_i(j)) + (n/√(n^ν)) C′(ū_i(j)) u_i(j, t).

Clearly the numbers C′(ū_i(j)) are positive; for notational simplicity, C′(ū_i(j)) is denoted λ_i(j) in the following. For now let us take the λ_i(j) as known parameters; that is to say: given the channel state, there is a well-known rate of transmission per unit power (if power is the resource to be shared). Now let us state the distribution over the queues of the basic part of the resource. There exist allocations ū_i(j), j ∈ {1, ..., J}, i ∈ {1, ..., K}, such that for each i

nΛ_i^a = ∑_{j=1}^J n C(ū_i(j)) π_j.    (1)
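The balance condition (1) fixes only the π-average of C(ū_i(j)), not the per-state split of the basic resource. The sketch below is an illustration (not the paper's procedure): it assumes a Shannon-like capacity C(u) = log2(1 + u) and a one-parameter family ū_i(j) = α w(j), then finds α by bisection so that the averaged departure rate matches Λ_i^a; all numbers are invented.

```python
import math

# Illustrative solver for condition (1): Lambda_i^a = sum_j C(u_bar_i(j)) pi_j.
# Assumptions (not from the paper): C(u) = log2(1 + u) and a one-parameter
# family u_bar_i(j) = alpha * w[j]; bisection then pins down alpha.
def C(u):
    return math.log2(1.0 + u)

pi = [0.2, 0.5, 0.3]          # channel stationary distribution (assumed)
w = [0.5, 1.0, 2.0]           # per-state resource weights (assumed)
Lambda_a = 1.2                # target mean arrival rate for queue i (assumed)

def avg_rate(alpha):
    """Average equilibrium departure rate sum_j C(alpha * w[j]) pi_j."""
    return sum(C(alpha * wj) * pj for wj, pj in zip(w, pi))

lo, hi = 0.0, 100.0
for _ in range(100):          # bisection: avg_rate is increasing in alpha
    mid = 0.5 * (lo + hi)
    if avg_rate(mid) < Lambda_a:
        lo = mid
    else:
        hi = mid

alpha = 0.5 * (lo + hi)
u_bar = [alpha * wj for wj in w]
print(u_bar, avg_rate(alpha))  # avg_rate(alpha) matches Lambda_a
```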
In words: the mean arrival rate to queue i equals the average equilibrium departure rate (averaged over the channel state stationary distribution). Once the two time rates of our system, n and n^ν, are given, the channel state process, the queue state processes, and the data arrival and service processes that contribute to the contents of the queues are all parameterized by n (which appears as a superscript on each symbol). Again these rates induce re-scaling or re-sizing as follows. First, the channel process is given by L^n(t) = L(n^ν t). Secondly, define γ = 1 − ν/2 and let x_i^n(t) denote 1/n^γ times the number of packets in queue i at time t, and

x^n(t) = (x_i^n(t))  i ∈ {1, ..., K}.

Let A_i^n(t) denote 1/n^γ times the number of packets arrived at queue i by time t. There may be several empty queues at the same time, though this is a rare event in practice. When a queue empties, e.g., queue k at time t, its basic allocations ū_k(j), j ∈ {1, ..., J}, are reassigned to all the other nonempty queues. Thus the number of packets sent from a queue is due either to its own allocations or to reassignments from some empty queues in the network. Let y_{ki}^n(t) denote 1/n^γ times the number of packets sent from queue i by time t using the reassignments to queue i from queue k (as long as the latter is empty and the former is nonempty). Then

x_i^n(t) − x_i^n(0) = A_i^n(t) − (1/n^γ) ∑_{k≠i} y_{ki}^n(t)
    − (1/n^γ) ∫_0^t ∑_j 1_{L^n(s)=j} n C(ū_i(j)) ds
    − (1/n^γ) ∫_0^t ∑_j 1_{L^n(s)=j} (n/√(n^ν)) λ_i(j) u_i(j, x^n(s)) ds
    + (1/n^γ) ∫_0^t ∑_j 1_{L^n(s)=j} 1_{x_i^n(s)=0} n C(ū_i(j)) ds.
Notice the last term; it is the scaled number of packets that could have been sent, using the equilibrium departure rate, had the queue been nonempty. Since

Λ_i^a = ∑_{j=1}^J C(ū_i(j)) π_j

it follows that the scaled queue state process is given by

x_i^n(t) − x_i^n(0) = A_i^n(t) − (n/n^γ) Λ_i^a t + (n/n^γ) ∫_0^t ∑_j (π_j − 1_{L^n(s)=j}) C(ū_i(j)) ds
    − ∫_0^t ∑_j 1_{L^n(s)=j} λ_i(j) u_i(j, x^n(s)) ds
    + (n/n^γ) ∫_0^t ∑_j 1_{L^n(s)=j} 1_{x_i^n(s)=0} C(ū_i(j)) ds − (1/n^γ) ∑_{k≠i} y_{ki}^n(t).

Let us give some remarks about the process formed by the last two terms:

z_i^n(t) := (n/n^γ) ∫_0^t ∑_j 1_{L^n(s)=j} 1_{x_i^n(s)=0} C(ū_i(j)) ds − (1/n^γ) ∑_{k≠i} y_{ki}^n(t).

Obviously,

• it starts at the origin;
• it is the difference between two continuous nondecreasing processes, and thus continuous and of finite variation;
• it has positive increments only as and when queue i empties, i.e., it has a positive increment dz_i^n(t) if x_i^n(t) = 0; otherwise its increments are nonpositive.

This "pushing" at 0 ensures the positivity of the scaled queue state process; it is called instantaneous reflection at the boundary of the state space R_+ for the process x_i^n(t). The vector-valued process

z^n(t) = (z_i^n(t))  i ∈ {1, ..., K}
is called a reflection process.

Remark 1. In [2] the order of the fading rate is n^{1−κ} for some κ ∈ ]0, 1[; the relation ν = 1 − κ makes the connection with the notation (mainly the scaling) of [2].

We are now ready to state the convergence result of the heavy traffic limit, i.e., when the rates n and n^ν tend to infinity. Intuitively, we mean by heavy traffic limit that both the transmitter idle time and the spare resource tend to zero. First we assume, for each i ∈ {1, ..., K}, that the sequence x_i^n(0) of positive random variables converges in law as n → ∞ to some known distribution. The end of Section 3 of [1] is the proof that A_i^n(t) − (n/n^γ) Λ_i^a t converges weakly to zero as n → ∞. In the main Refs. [1,2] the authors prove that the sequence of processes

(n/n^γ) ∫_0^t ∑_j (π_j − 1_{L^n(s)=j}) C(ū_i(j)) ds
converges weakly to a Wiener process, denoted W_i(t) in the following. In [1] there is a first step in computing A, the intensity of the limit non-standard K-dimensional Wiener process; see Eq. (9.5) of the Appendix. It is stated that

A_ik = E[W_i(1) W_k(1)] = 2 E[ ∫_0^∞ ∑_{j,j′=1}^J (1_{L(0)=j} − π_j)(1_{L(s)=j′} − π_{j′}) C(ū_i(j)) C(ū_k(j′)) ds ].
But this result is not usable as it stands; it needs much more development. In [3] we complete the computation of the intensity of the driving Wiener process W(t) using tools from the theory of Markov processes. The large-n limit of the scaled vector queue state process x_i^n(t), i ∈ {1, ..., K}, solves the stochastic differential system (SDS)

x_i(t) = x_i(0) − ∫_0^t ∑_{j=1}^J π_j λ_i(j) u_i(j, x(s)) ds + W_i(t) + z_i(t)    (2)
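The reflection term z_i(t) in (2) can be illustrated in one dimension by the Skorokhod map: given a free path with negative drift, it produces a nonnegative path plus a nondecreasing pushing process that grows only at the boundary. This is only the simplest scalar analogue; the paper's reflection is multi-dimensional and not normal.

```python
import numpy as np

# One-dimensional sketch (not the paper's oblique reflection): given a
# "free" path phi(t) = x(0) + drift*t + noise, the Skorokhod map yields
# x(t) = phi(t) + z(t) >= 0, with z nondecreasing and increasing only
# when x touches 0, mirroring the "pushing at 0" described above.
rng = np.random.default_rng(1)
dt, n = 0.001, 5000
drift = -0.5                              # negative drift pushes toward 0
phi = 1.0 + np.cumsum(drift * dt + np.sqrt(dt) * rng.standard_normal(n))

z = np.maximum.accumulate(np.maximum(-phi, 0.0))   # reflection term
x = phi + z                                        # reflected path

print(x.min())        # nonnegative
```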
x(t) is a continuous process with state space E = R_+^K. The initial condition x(0) for the SDS (2) is a random variable with known distribution. Everywhere x(t) reaches the boundary ∂E, it is reflected. Here the reflection from the boundary is not normal: when a queue empties, let us say queue k at time t, its basic resources are reassigned to all the other nonempty queues, and thus the increment

dz(t) = (dz_1(t), ..., dz_k(t), ..., dz_K(t))^T

has none of its components null (not only dz_k(t), which is of course positive). Under the heavy traffic limit the optimization problem is to minimize, with respect to the controls, the expected total delay for the users. This is equivalent to minimizing

E[ ∫_0^∞ ∑_{i=1}^K x_i(t) dt ].
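A hedged Euler-Maruyama sketch of the controlled system (2) for K = 2 queues and J = 2 channel states follows; all numerical values (π, λ_i(j), the fixed suboptimal control, the Wiener intensity A) are invented, and the non-normal reflection is simplified to a plain projection onto the positive orthant.

```python
import numpy as np

# Euler-Maruyama sketch of the limit system (2), K = 2, J = 2. All numbers
# are illustrative assumptions; reflection is approximated by projection
# onto R_+^K (the paper's reflection is oblique, not normal).
rng = np.random.default_rng(2)
K = 2
pi = np.array([0.4, 0.6])
lam = np.array([[1.0, 2.0],          # lambda_i(j): rate per resource unit
                [1.5, 0.5]])
u = np.array([[1.0, 0.0],            # a fixed (suboptimal) control u_i(j)
              [0.0, 1.0]])
A = np.array([[0.5, 0.1],            # Wiener intensity: covariance of W(1)
              [0.1, 0.3]])
B = np.linalg.cholesky(A)

dt, n = 0.001, 20_000
x = np.array([1.0, 1.0])
drift = -(lam * u) @ pi              # component i: -sum_j pi_j lam_i(j) u_i(j)
traj = np.empty((n, K))
for k in range(n):
    dW = B @ rng.standard_normal(K) * np.sqrt(dt)
    x = np.maximum(x + drift * dt + dW, 0.0)   # projected (reflected) step
    traj[k] = x

print(traj.mean(axis=0))             # queues hover near the boundary
```

The negative drift keeps both components pushed toward the origin, so the boundary behavior dominates the sample paths, as the heavy traffic picture suggests.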
The heavy traffic limit, or approximation, was first introduced in the pioneering paper [4]. Even when the physical system carries only moderate traffic, the insights provided by the heavy traffic limit are very valuable from an operator's point of view.

Outline

This work has [1,2] as its main related references, but there are major differences.
• In [1] the controls lie in a continuous set, and the optimal control problem has not been solved; here the controls are constrained from the start to be extreme.
• We consider a multi-user wireless network whereas [2] concerns only the one-dimensional case; besides, the proposed solution for the optimal control problem is not generalizable to the multi-dimensional case.
• Since the operating time interval is infinite, the proposed solution in [2] involves the ergodic distribution of the queue, which is not known a priori or, worse still, may not exist. Ergodicity analysis is unavoidable, especially after the optimal control is obtained; we think it is still lacking.
• In [2] reasoning in terms of relaxed controls instead of ordinary controls is imposed, but not justified in view of the controlled SDS. Since the required existence and uniqueness of solutions to this SDS are fulfilled, this "relaxation" is actually not needed. We briefly describe the notion of relaxed controls in Remark 4 (see Section 3).

This paper extends the questions addressed in its antecedent: in [3] the operating time interval is deterministic and finite, and the control lies in a continuous set; here the control is extreme, and we shall show later that the operating time interval must be random and almost surely finite. In this paper, as in [3], the proposed solution involves the intensity of a non-standard multi-dimensional Wiener process induced by the heavy traffic approximation. In [3] we compute this covariance matrix given the steady-state distribution of the channel process. For these reasons, this paper and its antecedent form complementary works on the subject.

The organization of this paper is as follows. Since the controls are extreme, we suggest a hybrid stochastic differential system (HSDS) for modeling the optimal control problem; to the best of our knowledge, such a model is quite novel as regards telecommunications. The major part of Section 2 is the description of the proposed model in general. Heavy traffic approximation of the physical system is a particular HSDS; its ingredients are identified at the end of Section 2. Sections 3.1 and 3.2 contain our main contributions; of interest are two topics: existence and uniqueness of optimal controls. Section 4 gives a demonstration on a bivariate test case.
Section 5 concludes the paper; it is about stochastic analysis after the control problem has been completely solved. The Appendix is written as a short presentation of different models of hybrid stochastic differential systems. It may be read as a commentary on Section 2.

2. Further modeling: hybrid stochastic differential systems

Formally, a hybrid stochastic differential system is described as follows. It is a continuous-time stochastic process with rcll (right continuous with left limits) sample paths. The state space E of a HSDS is a union of manifolds with boundary:

E = ⋃_{y∈Y} M_y = { (x, y) : y ∈ Y, x ∈ M_y }.
Y is a finite discrete set; for each y ∈ Y, M_y is a locally compact set from R^{d(y)}, where d(y) is a function on Y into N \ {0}. A HSDS must therefore be denoted (x_t, y_t) or ξ_t. As a stochastic process, ξ_t is based on some filtered probability space (Ω; F; F = (F_t, t ∈ R_+); P). Customarily, a family {P_ξ}_{ξ∈E} of probability measures on (Ω; F) is defined by

P_{ξ_0}{•} = P{• | ξ(0) = ξ_0}.
By construction P_{ξ_0}{ξ(0) = ξ_0} = 1. Under the original probability measure P, ξ_t starts at t = 0 with µ, a probability distribution on (E; E), where E is the Borel σ-algebra of E. For the simple reason that

P{•} = ∫_{ξ∈E} P_ξ{•} µ(dξ)

one sometimes uses the notation P_µ for the original probability measure. We shall freely work with all these probability measures. y_t being a Y-valued process, it is called the discrete component of the state. The instants of jumps
{T_k}, k ∈ {1, 2, ...}, of the random process y_t form an increasing sequence of positive random variables. Here we assume that for any starting point ξ ∈ E and each t ≥ 0

E_ξ [ ∑_{k≥1} 1_{{0 < T_k ≤ t}} ] < ∞.
Hence almost surely finitely many jumps take place in any finite time interval. This assumption holds naturally in applications. x_t has values in M_y as long as y_t = y; it is called the continuous component of the state. Namely, x_t solves a SDS with coefficients depending on y_t; so there is a switch in the dynamics of the continuous component of the state at each jump time of its discrete component. We denote these coefficients a(x, y) and B(x, y); the former is the drift coefficient of the SDS and the latter is its diffusion coefficient. It is sufficient that for each y

• a(x, y) and B(x, y) be locally Lipschitz continuous functions in x;
• sup_x ( |a(x, y)| + ∥B(x, y)∥ ) / (1 + |x|) < ∞
to ensure the existence of a continuous solution of the SDS which is pathwise or strongly unique; see Chapter VIII of [5]. This solution has almost surely continuous sample paths on every [T_{k−1}, T_k[ (for homogeneity of notation we set T_0 = 0). We further assume that the continuous component of the state x_t is everywhere continuous, i.e., for all T_k

x(T_k) = x(T_k^−).    (3)
Clearly then, the M_y's must have the same dimension and be at least overlapping. Under assumption (3), the jump times of the discrete component of the state y_t are exactly the jump times of the HSDS ξ_t. Besides, these jump times are F-stopping times of two kinds: totally inaccessible for spontaneous jumping, or predictable if the jumps are triggered by some events. A stopping time S with respect to the filtration F = (F_t, t ∈ R_+) is a random time such that for each t ∈ R_+ the event {S ≤ t} belongs to the σ-algebra F_t [6,7]. Predictable jump times occur, for instance, when y_t is a feedback, y_t = φ(x_t). In the latter case they may be the passage times (up-crossings and down-crossings) of a threshold by some bounded continuous function of the continuous process x_t; see [8]. For spontaneous jumping there is one further ingredient: a hazard rate or intensity denoted z_t. This is a continuous-time stochastic process with values in R_+. Totally inaccessible jump times are related to z_t via the generalized exponential distribution

P(T_k > t | F_{T_{k−1}}) = exp( −∫_{T_{k−1}}^t z_s ds ) 1_{{T_{k−1} ≤ t}} + 1_{{T_{k−1} > t}},  t ∈ R_+.
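Sampling a totally inaccessible jump time from this generalized exponential law can be sketched by inversion: draw an Exp(1) threshold and return the first time the accumulated hazard ∫ z_s ds exceeds it. The piecewise-constant intensity path below is an invented example.

```python
import numpy as np

# Sketch: sample T_k from P(T_k > t) = exp(-int_0^t z_s ds) by inversion.
# The intensity path z_s (piecewise constant on a grid) is invented.
rng = np.random.default_rng(3)

dt = 0.01
z = np.concatenate([np.full(200, 0.5), np.full(400, 2.0)])  # assumed z_s
hazard = np.cumsum(z) * dt                                  # int_0^t z_s ds

def sample_jump_time():
    e = rng.exponential(1.0)          # Exp(1) threshold
    idx = np.searchsorted(hazard, e)  # first grid time with hazard >= e
    if idx >= len(hazard):
        return np.inf                 # no jump within the grid horizon
    return (idx + 1) * dt

samples = np.array([sample_jump_time() for _ in range(20_000)])
finite = samples[np.isfinite(samples)]
print(finite.mean())
```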
Without loss of generality, we further assume that z_t is a (state) feedback: at the current time t, it may depend only on t and ξ_{t−}. If one thinks in terms of state extension (the extended state being ξ′_t = (t, ξ_t)), then z_t = h(t, ξ_{t−}) is a particular case of z_t = h(ξ_{t−}), not the converse. In fact, a HSDS ξ_t with spontaneous jumping is Markovian if and only if z_t = h(t, ξ_{t−}) or z_t = h(ξ_{t−}); it is time-homogeneous Markovian only in the latter case. Predictable jump times, regarded as a point process, do not have a hazard rate process. We define the active or essential boundary of the state space as the set of boundary points

ξ̄ = (x̄, ȳ) ∈ ⋃_{y∈Y} ∂M_y
almost surely reachable (within predictable hitting times, since x_t is continuous). Here we assume that the active boundary is reflective, i.e., everywhere ξ_t reaches the boundary ∂E, it is instantaneously reflected. The last ingredient of the HSDS is a transition kernel: it gives the conditional probability distribution on Y under P_{(x_0,y)} of the post-jump state y(T_1) given the conditioning event

{ x(T_1) = x(T_1^−) = x, y(T_1^−) = y }.

That is,

Q(x, y, y′) = P_{(x_0,y)}( y(T_1) = y′ | x(T_1) = x(T_1^−) = x, y(T_1^−) = y )

for y, y′ ∈ Y with y′ ≠ y and x_0, x ∈ M_y. (Note the independence from x_0.) Obviously,
∑_{y′∈Y, y′≠y} Q(x, y, y′) = 1.
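Drawing the post-jump discrete state y′ from the kernel Q then amounts to sampling from a distribution on Y \ {y}; the kernel below, including its dependence on x, is purely illustrative.

```python
import numpy as np

# Sketch: draw the post-jump state y' from the reset kernel Q(x, y, .),
# a probability distribution over Y \ {y}. The kernel values and their
# (hypothetical) dependence on x are invented for illustration.
rng = np.random.default_rng(4)
Y = [0, 1, 2]

def Q(x, y):
    """Return the candidate states Y \\ {y} and their probabilities."""
    others = [v for v in Y if v != y]
    weights = np.array([1.0 + 0.1 * x * (v + 1) for v in others])
    return others, weights / weights.sum()

def sample_post_jump(x, y):
    others, p = Q(x, y)
    return rng.choice(others, p=p)

draws = [sample_post_jump(x=0.5, y=1) for _ in range(1000)]
print(set(draws))        # never contains the pre-jump state y = 1
```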
Remark 2. The HSDS described above is inspired by the model proposed in [9], but there are major differences. In [9],
• there are boundary points of the state space almost surely reachable where the state process jumps instantaneously; there is no reflection from this active boundary;
• the continuous component of the state has jumps of two kinds: totally inaccessible for spontaneous jumping, or predictable for the jumps from the boundary of the state space;
• the discrete component of the state has only forced jumps from the boundary of the state space; • the My ’s are disjoint by assumption, even when d(y) = n for all y ∈ Y ; so the continuous component of the state jumps necessarily as and when its discrete component does. In this paper,
• there are boundary points of the state space almost surely reachable where the state process is reflected instantaneously; there is no jumping from this active boundary;
• the discrete component of the state has jumps of two kinds: totally inaccessible for spontaneous jumping, or predictable for forced jumping;
• forced jumps of the discrete component of the state, if any, may be triggered by the continuous component of the state;
• the continuous component of the state has almost surely continuous sample paths.

It is clear that the proposed HSDS can be applied for modeling our optimal control problem; what remains is to identify all its ingredients. Here the discrete component of the state is a matrix-valued process

y_t = { u_i(j, t) }  i ∈ {1, ..., K}, j ∈ {1, ..., J}
y_t has possible values in Y = {0, u_max}^{J×K}. The continuous component of the state is a vector-valued process

x_t = (x_i(t))  i ∈ {1, ..., K}
x_t solves a multi-dimensional SDS with drift coefficients depending on y_t

dx_i(t) = −∑_{j=1}^J π_j λ_i(j) u_i(j, t) dt + dW_i(t) + dz_i(t)    (4)

with given initial conditions

x(0) = (x_i(0))  i ∈ {1, ..., K}.

x_t is a continuous process with values in the positive orthant R_+^K regardless of the value of y_t. Therefore, we can omit the dependence of M_y on y in this case. Furthermore, everywhere x_t reaches the boundary ∂M, it is instantaneously reflected. In fact, starting from any interior point of its space R_+, x_i(t) reaches the boundary 0 at a predictable hitting time which is almost surely finite because of the negative sign of the drift

−∑_{j=1}^J π_j λ_i(j) u_i(j, t) dt.
Finally, the optimal control problem is to minimize

E[ ∫_0^∞ ∑_{i=1}^K x_i(t) dt ]

over both the matrix-valued process y_t and the scalar-valued process z_t.

3. Solution by dynamic programming

Remark 3. In the case of extreme controls, sharp equality constraints such as

∑_{i=1}^K u_i(j, t) = const,  j ∈ {1, ..., J},
are not convenient; otherwise scheduling would become strictly equivalent to one and the same problem, time slot scheduling; but in greater generality the excess resource may be power, time slot, bandwidth, etc., or all simultaneously. For that reason the constraint on the allocated excess resource is relaxed, through averaging with respect to the channel stationary distribution, to the inequalities

∑_{j=1}^J π_j u_i(j, t) ≤ u_ave,  i ∈ {1, ..., K}.
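Checking these averaged inequalities for a candidate bang-bang allocation is immediate; the numbers below are invented.

```python
import numpy as np

# Sketch: verify the relaxed constraints sum_j pi_j u_i(j,t) <= u_ave for
# every i, for a bang-bang allocation u_i(j) in {0, u_max}. All values
# are illustrative.
pi = np.array([0.4, 0.6])
u_max, u_ave = 2.0, 1.0
u = np.array([[u_max, 0.0],    # rows: queues i, columns: channel states j
              [u_max, 0.0]])

avg = u @ pi                   # component i: sum_j pi_j u_i(j)
print(avg, bool(np.all(avg <= u_ave)))
```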
Remark 4. For a controlled ordinary differential system

(d/dt) φ(t) = g(φ(t), u(t)),  φ(0) = x    (5)

the usual class of controls is that of ordinary (open loop) controls, i.e., measurable functions u : R_+ → U_0 for which (5) is uniquely solvable. Carathéodory's theorem gives sufficient conditions for the existence and uniqueness of a solution. Relaxed controls are a generalization of ordinary controls. Intuitively, the idea is that at each time t, instead of choosing a fixed value u(t) ∈ U_0, we randomize over the set of possible values U_0, i.e., we choose a value according to some probability distribution v_t(du). The mean value of the velocity vector under this randomization is ∫_{U_0} g(x, u) v_t(du) and the corresponding trajectory φ(t) satisfies

(d/dt) φ(t) = ∫_{U_0} g(φ(t), u) v_t(du),  φ(0) = x.    (6)

The class of relaxed controls is the set of measurable functions v : R_+ → P(U_0). If g(x, u) is bounded and continuous, then the function

g̃(x, t) = ∫_{U_0} g(x, u) v_t(du)
satisfies Carathéodory's conditions for any relaxed control v, and thus (6) has a unique solution. In [2] the authors generalize this notion of relaxed controls for the optimal control of their one-dimensional stochastic differential system. To do this, they use as admissible ordinary controls those which yield an ergodic SDS, and then for relaxation they use the set of stationary distributions corresponding to the set of admissible controls.

Now we introduce a Lagrange multiplier β ∈ R_+; the constrained optimal control problem above is equivalent to a family, parameterized by β, of unconstrained optimal control problems where the objective is to minimize

E[ ∫_0^∞ ∑_{i=1}^K ( x_i(t) − β ( u_ave − ∑_{j=1}^J π_j u_i(j, t) ) ) dt ].
Written in compact notation, the criterion to minimize is given by

J = E[ ∫_0^∞ c(ξ_t) dt ] = E[ ∫_0^∞ c(x_t, y_t) dt ]

where the cost rate is

c(ξ_t) = c(x_t, y_t) = ∑_{i=1}^K ( x_i(t) − β ( u_ave − ∑_{j=1}^J π_j u_i(j, t) ) ).
The dynamic programming principle [10] yields Bellman's equation, to be solved for a function f(ξ) on E:

min_{y ∈ {0, u_max}^{J×K}, z ∈ R_+} [ G^{y,z} f(x, y) + c(x, y) ] = 0

where

G^{y,z} f(x, y) = −∑_{i=1}^K ∑_{j=1}^J π_j λ_i(j) y_ij ∂f(x, y)/∂x_i + (1/2) ∑_{i,k=1}^K A_ik ∂²f(x, y)/∂x_i ∂x_k + z ∑_{y′∈Y, y′≠y} Q(x, y, y′) ( f(x, y′) − f(x, y) ).

A is the intensity of the K-dimensional Wiener process W(t), i.e., the covariance matrix of W(1). The domain D(G) of G is the set of real-valued functions on E which are twice continuously differentiable with respect to x, measurable with respect to y, and satisfying for every i ∈ {1, ..., K}

∂f/∂x_i (x̄, ȳ) > 0 if x̄_i = 0.    (7)
This condition tells us that the boundary ∂E is composed of reflecting faces. In fact (x̄, ȳ) ∈ ∂E means that x̄_1 = 0 or x̄_2 = 0 ... or x̄_K = 0. Thus Bellman's equation has Eq. (7) as boundary condition for the moment. Recall that

c(x, y) = ∑_{i=1}^K ( x_i − β ( u_ave − ∑_{j=1}^J π_j y_ij ) ).
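To make the assembly of G^{y,z} f(x, y) + c(x, y) concrete, the sketch below evaluates it pointwise for an arbitrary smooth quadratic test function f (not the value function, so the residual need not vanish), with z = 0 so that the reset-kernel term drops out; all parameter values are invented.

```python
import numpy as np

# Sketch: evaluate G^{y,0} f(x,y) + c(x,y) for a quadratic test f.
# f is NOT the true value function; this only shows the mechanics of
# assembling the generator. All parameter values are invented.
K = 2
pi = np.array([0.4, 0.6])
lam = np.array([[1.0, 2.0], [1.5, 0.5]])   # lambda_i(j)
A = np.array([[0.5, 0.1], [0.1, 0.3]])     # Wiener intensity
beta, u_ave, u_max = 1.0, 1.0, 2.0

def grad_f(x):                 # gradient of f(x) = sum_i (x_i - x_i^2 / 4)
    return 1.0 - x / 2.0

def hess_f(x):                 # Hessian of the same f (constant here)
    return -0.5 * np.eye(K)

def residual(x, y):
    """G^{y,0} f + c for control matrix y; z = 0 kills the jump term."""
    g = grad_f(x)
    drift_term = -np.sum(((lam * y) @ pi) * g)
    diffusion_term = 0.5 * np.sum(A * hess_f(x))
    cost = np.sum(x - beta * (u_ave - y @ pi))
    return drift_term + diffusion_term + cost

y = np.array([[u_max, 0.0], [0.0, u_max]])
print(residual(np.array([1.0, 2.0]), y))
```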
The expression to minimize with respect to y_ij ∈ {0, u_max} is

−π_j ( λ_i(j) ∂f(x, y)/∂x_i − β ) y_ij.

Here a threshold form for the optimal controls appears:

ŷ_ij = u_max if λ_i(j) ∂f/∂x_i − β > 0,
ŷ_ij = 0 if λ_i(j) ∂f/∂x_i − β ≤ 0.    (8)
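Rule (8) is a componentwise comparison and can be sketched in a few lines; the gradient values stand in for a computed ∂f/∂x_i and are invented.

```python
import numpy as np

# Sketch of the bang-bang rule (8): allocate u_max to pair (i, j) exactly
# when lambda_i(j) * df/dx_i exceeds beta. The gradient values below are
# placeholders for a computed df/dx_i.
u_max, beta = 2.0, 1.0
lam = np.array([[1.0, 2.0], [1.5, 0.5]])    # lambda_i(j)
grad = np.array([0.9, 0.4])                 # df/dx_i at current x (assumed)

y_hat = np.where(lam * grad[:, None] > beta, u_max, 0.0)
print(y_hat)
```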
Let Γ be characterized by the algebraic equations

λ_i(j) ∂f(x, y)/∂x_i − β = 0,  i ∈ {1, ..., K}, j ∈ {1, ..., J}.

In the multi-dimensional case Γ is an infinite set of points of the state space E. Γ is composed of active junctions: everywhere ξ_t reaches Γ, it jumps instantaneously. The jump times T_k are necessarily predictable stopping times. These may also be seen as the passage times through zero of continuous functions of the continuous process x_t: λ_i(j) ∂f/∂x_i − β. There is no spontaneous jumping or, equivalently, the intensity z_t is the zero process; the last term of G^{y,z} f(x, y), with z as a factor,
vanishes. Eventually, it turns out that the optimal control must be a feedback, ŷ = φ(x), and hence Bellman's equation is to be solved for a function f(x) on R_+^K. It follows that the expression below vanishes automatically:

∑_{y′∈Y, y′≠ŷ} Q(x, ŷ, y′) ( f(x) − f(x) ) = 0,  x ∈ R_+^K.

This is also a second reason for the last term of G^{ŷ,z} f(x), with z as a factor, to vanish. Using the notation

( λ_i(j) ∂f/∂x_i − β )^+ = max( 0, λ_i(j) ∂f/∂x_i − β ),

Bellman's equation reduces to

(1/2) ∑_{i,k=1}^K A_ik ∂²f(x)/∂x_i ∂x_k − ∑_{i=1}^K ∑_{j=1}^J π_j ( λ_i(j) ∂f(x)/∂x_i − β )^+ u_max + ∑_{i=1}^K ( x_i − β u_ave ) = 0

with the boundary condition: for every i ∈ {1, ..., K}

∂f/∂x_i (x̄) > 0 if x̄_i = 0.    (9)
This is a piecewise-defined elliptic partial differential equation (PDE). To see this, order the positive numbers β/λ_i(j); without loss of generality, say that for every i ∈ {1, …, K}

$$\frac{\beta}{\lambda_i(1)} > \frac{\beta}{\lambda_i(2)} > \cdots > \frac{\beta}{\lambda_i(J)}.$$

Then Bellman's equation develops into a batch of (J+1)^K PDEs. In some unknown subregion D_1 of R_+^K characterized by
$$\frac{\beta}{\lambda_i(J)} > \frac{\partial f(x)}{\partial x_i} \qquad i \in \{1, \ldots, K\}$$

it is given by

$$\frac{1}{2} \sum_{i,k=1}^{K} A_{ik} \frac{\partial^2 f(x)}{\partial x_i \partial x_k} + \sum_{i=1}^{K} (x_i - \beta u_{\mathrm{ave}}) = 0.$$
In some other unknown subregion D_2 of R_+^K characterized by

$$\frac{\beta}{\lambda_1(J-1)} > \frac{\partial f(x)}{\partial x_1} > \frac{\beta}{\lambda_1(J)} \qquad\qquad \frac{\beta}{\lambda_i(J)} > \frac{\partial f(x)}{\partial x_i} \quad i \in \{2, \ldots, K\}$$

it is given by

$$\frac{1}{2} \sum_{i,k=1}^{K} A_{ik} \frac{\partial^2 f(x)}{\partial x_i \partial x_k} + \sum_{i=1}^{K} (x_i - \beta u_{\mathrm{ave}}) - \pi_J \left( \lambda_1(J) \frac{\partial f(x)}{\partial x_1} - \beta \right) u_{\max} = 0.$$
The reasoning is similar for the remaining subregions.

3.1. Main result: existence

We now show that under ad hoc conditions Bellman's equation admits a smooth solution. First let us try the form

$$f(x) = \sum_{i=1}^{K} f_i(x_i) \qquad x \in \mathbb{R}_+^K.$$

Then ∂f/∂x_i depends only on x_i, namely

$$\frac{\partial f}{\partial x_i}(\bar{x}) = f_i'(\bar{x}_i) \qquad \bar{x} \in \mathbb{R}_+^K.$$

Right away we see from Eq. (9) that

$$f_i'(0) > 0 \tag{10}$$
for every i ∈ {1, …, K}. Now let us assume ∂f/∂x_i strictly decreasing; this assumption is not made innocently, in view of Remark 5 (see Section 3.2): the upper bound in Eq. (25) should be close to f(x). Decoupling the variables makes the summands f_i(x_i), i ∈ {1, …, K}, play symmetric roles. Bellman's equation now yields a batch of J + 1 second order ordinary differential equations for each summand f_i(x), x ∈ R_+. For k ≠ i the batch of ODEs relative to f_k(x) is the same as the batch relative to f_i(x), except that A_ii, λ_i(j) are replaced by A_kk, λ_k(j) respectively. In what follows we omit the dependence of f_i(x), A_ii and λ_i(j) on the index i ∈ {1, …, K}. Since the function f(x) must be twice differentiable on R_+, f'(x) is strictly decreasing, and

$$\frac{\beta}{\lambda(1)} > \frac{\beta}{\lambda(2)} > \cdots > \frac{\beta}{\lambda(J)},$$

there exist numbers 0 ≤ c_1 < c_2 < ⋯ < c_J < ∞ such that

$$f'(c_j) = \frac{\beta}{\lambda(j)} \qquad j \in \{1, \ldots, J\}.$$

We must have f'(x) > β/λ(1) for 0 ≤ x < c_1,

$$\frac{\beta}{\lambda(j-1)} > f'(x) > \frac{\beta}{\lambda(j)}$$

in the interval ]c_{j−1}, c_j[ for each j ∈ {2, …, J}, and β/λ(J) > f'(x) for c_J < x < ∞. Then we are looking for a function f(x)
satisfying the equation

$$\frac{1}{2} A f''(x) + (x - \beta u_{\mathrm{ave}}) - \sum_{j=1}^{J} \pi_j \left( \lambda(j) f'(x) - \beta \right) u_{\max} = 0 \tag{11}$$

with β/λ(1) < f'(x) in the interval [0, c_1[; satisfying the equation

$$\frac{1}{2} A f''(x) + (x - \beta u_{\mathrm{ave}}) - \sum_{j=j'}^{J} \pi_j \left( \lambda(j) f'(x) - \beta \right) u_{\max} = 0 \tag{12}$$

with β/λ(j') < f'(x) < β/λ(j'−1) in the interval ]c_{j'−1}, c_{j'}[ for each j' ∈ {2, …, J}; and satisfying the equation

$$\frac{1}{2} A f''(x) + (x - \beta u_{\mathrm{ave}}) = 0 \tag{13}$$

with f'(x) < β/λ(J) in the interval [c_J, ∞[. In addition, f(x) must connect smoothly between the piecewise parts: for each j ∈ {1, …, J}

$$f(c_j^-) = f(c_j^+) \qquad f'(c_j^-) = f'(c_j^+) = \frac{\beta}{\lambda(j)} \qquad f''(c_j^-) = f''(c_j^+).$$
A general solution of an nth-order ODE is a solution containing n arbitrary independent constants of integration. A particular solution is derived from the general solution by setting the constants to particular values, often chosen to fulfill a set of initial conditions. In what follows, we solve the initial value problem relative to Eqs. (11)–(13) (here the solution involves the initial conditions f(0) and f'(0)). Eqs. (11)–(13) are respectively equivalent to Eqs. (14)–(16) below:

$$\frac{1}{2} A f''(x) - \sum_{j=1}^{J} \pi_j \lambda(j) u_{\max} f'(x) + x - \beta \left[ u_{\mathrm{ave}} - u_{\max} \right] = 0 \tag{14}$$

$$\frac{1}{2} A f''(x) - \sum_{j=j'}^{J} \pi_j \lambda(j) u_{\max} f'(x) + x - \beta \left[ u_{\mathrm{ave}} - \sum_{j=j'}^{J} \pi_j u_{\max} \right] = 0 \tag{15}$$

$$\frac{1}{2} A f''(x) + x - \beta u_{\mathrm{ave}} = 0. \tag{16}$$
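The equivalence of Eqs. (11) and (14) rests only on Σ_j π_j = 1. A quick numerical check, with illustrative values of our own choosing, makes this concrete:

```python
import random

random.seed(1)
beta, u_ave, u_max, A = 1.0, 0.8, 1.0, 1.0e3   # illustrative values
lam = [10.0, 20.0]                              # lambda(1), lambda(2)
pi = [0.2, 0.8]                                 # sum(pi) == 1 is what the identity needs

def lhs_11(x, fp, fpp):
    # left-hand side of Eq. (11), all positive parts active
    return 0.5*A*fpp + (x - beta*u_ave) - sum(p*(l*fp - beta)*u_max for p, l in zip(pi, lam))

def lhs_14(x, fp, fpp):
    # left-hand side of Eq. (14)
    return 0.5*A*fpp - sum(p*l for p, l in zip(pi, lam))*u_max*fp + x - beta*(u_ave - u_max)

for _ in range(100):
    x, fp, fpp = random.uniform(0, 100), random.uniform(0, 1), random.uniform(-1, 1)
    assert abs(lhs_11(x, fp, fpp) - lhs_14(x, fp, fpp)) < 1e-9
```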
If we denote

$$B = \frac{1}{\frac{1}{2} A} \qquad b_j = B \sum_{j'=j}^{J} \pi_{j'} \lambda(j') u_{\max} \quad j \in \{1, \ldots, J\}$$

$$\tilde{b}_0 = \beta u_{\mathrm{ave}} \qquad \tilde{b}_1 = \beta \left[ u_{\mathrm{ave}} - u_{\max} \right] \qquad \tilde{b}_j = \beta \left[ u_{\mathrm{ave}} - \sum_{j'=j}^{J} \pi_{j'} u_{\max} \right] \quad j \in \{1, \ldots, J\}$$

then we have

$$f''(x) = b_1 f'(x) + B\left( -x + \tilde{b}_1 \right) \tag{17}$$

$$f''(x) = b_j f'(x) + B\left( -x + \tilde{b}_j \right) \tag{18}$$

$$f''(x) = B\left( -x + \tilde{b}_0 \right). \tag{19}$$
From Eqs. (17)–(19) we obtain

• for x ∈ [0, c_1[

$$f'(x) = f'(0)\, e^{b_1 (x-0)} + B \int_0^x e^{b_1 (x-y)} \left( -y + \tilde{b}_1 \right) dy$$

i.e.,

$$f'(x) = \left[ f'(0) - B \left( \frac{0}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \right] e^{b_1 (x-0)} + B \left( \frac{x}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \qquad 0 \le x < c_1$$

• for each j ∈ {2, …, J}, for x ∈ ]c_{j−1}, c_j[

$$f'(x) = f'(c_{j-1})\, e^{b_j (x-c_{j-1})} + B \int_{c_{j-1}}^x e^{b_j (x-y)} \left( -y + \tilde{b}_j \right) dy$$

i.e.,

$$f'(x) = \left[ f'(c_{j-1}) - B \left( \frac{c_{j-1}}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) \right] e^{b_1 (x-c_{j-1})} + B \left( \frac{x}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) \qquad c_{j-1} < x < c_j$$

• for x ∈ [c_J, ∞[

$$f'(x) = f'(c_J) + B \int_{c_J}^x \left( -y + \tilde{b}_0 \right) dy$$

i.e.,

$$f'(x) = f'(c_J) + B \left[ -\frac{1}{2} \left( x^2 - c_J^2 \right) + \tilde{b}_0 \left( x - c_J \right) \right] \qquad c_J < x < \infty.$$
f'(0) and f'(c_j) for j ∈ {1, …, J} are J + 1 arbitrary independent constants of integration. Eq. (10) gives us a condition on the first constant of integration: f'(0) > 0. The J conditions

$$f'(c_j^+) = \frac{\beta}{\lambda(j)} \qquad j \in \{1, \ldots, J\}$$

yield the last J constants of integration and the solution f'(x) over the intervals ]c_1, c_2[, …, [c_J, ∞[. Then f'(x), x ∈ R_+, is given by
$$f'(x) = \left[ f'(0) - B \left( \frac{0}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \right] e^{b_1 (x-0)} + B \left( \frac{x}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \qquad 0 \le x < c_1$$

$$f'(x) = \left[ \frac{\beta}{\lambda(j-1)} - B \left( \frac{c_{j-1}}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) \right] e^{b_1 (x-c_{j-1})} + B \left( \frac{x}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) \qquad c_{j-1} < x < c_j,\ j \in \{2, \ldots, J\}$$

$$f'(x) = \frac{\beta}{\lambda(J)} + B \left[ -\frac{1}{2} \left( x^2 - c_J^2 \right) + \tilde{b}_0 \left( x - c_J \right) \right] \qquad c_J < x < \infty. \tag{20}$$
The J conditions

$$f'(c_j^-) = \frac{\beta}{\lambda(j)} \qquad j \in \{1, \ldots, J\}$$

yield J nonlinear algebraic equations to be solved in the unknown thresholds c_1, …, c_J:

$$\left[ f'(0) - B \left( \frac{0}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \right] e^{b_1 (c_1-0)} + B \left( \frac{c_1}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) = \frac{\beta}{\lambda(1)}$$

$$\left[ \frac{\beta}{\lambda(j-1)} - B \left( \frac{c_{j-1}}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) \right] e^{b_1 (c_j-c_{j-1})} + B \left( \frac{c_j}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) = \frac{\beta}{\lambda(j)} \qquad j \in \{2, \ldots, J\}. \tag{21}$$
The system (21) is solved top-down: the first equation gives c_1; with c_1 known, the second equation gives c_2, and so on. It appears clearly from Eqs. (17)–(19), written as

$$f''(x) = b_1 f'(x) + B\left( -x + \tilde{b}_1 \right) \qquad 0 \le x < c_1$$
$$f''(x) = b_j f'(x) + B\left( -x + \tilde{b}_j \right) \qquad c_{j-1} < x < c_j,\ j \in \{2, \ldots, J\}$$
$$f''(x) = B\left( -x + \tilde{b}_0 \right) \qquad c_J < x < \infty,$$

and from the expressions of f'(x) on each subinterval, that the conditions

$$f''(c_j^-) = f''(c_j^+) \qquad j \in \{1, \ldots, J\}$$

are automatically satisfied. From Eq. (20) we obtain
$$f(x) = \left[ f'(0) - B \left( \frac{0}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \right] \frac{1}{b_1} \left( e^{b_1 x} - 1 \right) + \frac{B}{2 b_1} x^2 + B \left( \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) x + f(0) \qquad 0 \le x < c_1. \tag{22}$$

For j ∈ {2, …, J},

$$f(x) = \left[ \frac{\beta}{\lambda(j-1)} - B \left( \frac{c_{j-1}}{b_j} + \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) \right] \frac{1}{b_1} e^{b_1 (x-c_{j-1})} + \frac{B}{2 b_j} x^2 + B \left( \frac{1}{b_j^2} - \frac{\tilde{b}_j}{b_j} \right) x + K_{j-1} \qquad c_{j-1} < x < c_j \tag{23}$$

$$f(x) = -\frac{B}{6} x^3 + \frac{B \tilde{b}_0}{2} x^2 + \left[ \frac{\beta}{\lambda(J)} + \frac{B c_J^2}{2} - B \tilde{b}_0 c_J \right] x + K_J \qquad c_J < x < \infty. \tag{24}$$
Finally, the last J conditions

$$f(c_j^-) = f(c_j^+) \qquad j \in \{1, \ldots, J\}$$

give us the J constants of integration K_1, …, K_J, e.g.,

$$K_1 = \left[ f'(0) - B \left( \frac{0}{b_1} + \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) \right] \frac{1}{b_1} \left( e^{b_1 c_1} - 1 \right) + \frac{B}{2 b_1} c_1^2 + B \left( \frac{1}{b_1^2} - \frac{\tilde{b}_1}{b_1} \right) c_1 + f(0)$$
$$\qquad - \left[ \frac{\beta}{\lambda(1)} - B \left( \frac{c_1}{b_2} + \frac{1}{b_2^2} - \frac{\tilde{b}_2}{b_2} \right) \right] \frac{1}{b_1} - \frac{B}{2 b_2} c_1^2 - B \left( \frac{1}{b_2^2} - \frac{\tilde{b}_2}{b_2} \right) c_1.$$

Here again the computation of K_1, …, K_J is done top-down. The first condition f(c_1^-) = f(c_1^+) gives K_1; with K_1 known, the second condition f(c_2^-) = f(c_2^+) gives K_2, and so on.
A general solution to Bellman's equation, f(x) = Σ_{i=1}^K f_i(x_i), is now at hand, where the initial conditions f_i(0) for i ∈ {1, …, K} are left as degrees of freedom. And f(x) is bounded over R_+^K since B_ii > 0 for all i ∈ {1, …, K} (the covariance matrix A is positive definite). Let us summarize the results obtained so far in the following proposition.
Proposition 1. Bellman's equation has a bounded solution f(x) = Σ_{i=1}^K f_i(x_i), x ∈ R_+^K. This solution entirely determines the optimal control

$$\hat{y}_t = \left( \hat{u}_i(j, t) \right) \qquad i \in \{1, \ldots, K\},\ j \in \{1, \ldots, J\}.$$

For every i ∈ {1, …, K} and any t ∈ R_+, x_i(t) must satisfy either x_i(t) < c_1, or c_{j'−1} < x_i(t) < c_{j'} for some j' ∈ {2, …, J}, or c_J < x_i(t). Then the state feedback û_i(j, t) = φ_ij(x_i(t)) is described as follows.
• If x_i(t) < c_1, then û_i(j, t) = u_max for j ∈ {1, …, J}.
• If c_{j'−1} < x_i(t) < c_{j'} for some j' ∈ {2, …, J}, then û_i(j, t) = 0 for j ∈ {1, …, j'−1} and û_i(j, t) = u_max for j ∈ {j', …, J}.
• If c_J < x_i(t), then û_i(j, t) = 0 for j ∈ {1, …, J}.

Note that the thresholds c_1, …, c_J depend on i, and so do the constants B, b_1, …, b_J, λ(1), …, λ(J) and the initial condition f(0) figuring in Eqs. (22)–(24).

3.2. Main result: uniqueness

Besides existence, we need uniqueness of solutions to Bellman's equation. For elliptic PDEs we consider boundary-value problems. The Dirichlet problem for Bellman's equation is formulated as follows. Given a bounded open region D of R_+^K, e.g.,

$$D = \left] 0, x_{\max} \right[^K,$$

and a function ϕ(x) on its boundary ∂D, find a function g(x)

• continuous on D ∪ ∂D;
• smooth in D, i.e., having continuous partial derivatives ∂g/∂x_i, ∂²g/∂x_i∂x_k;
• satisfying Bellman's equation for x ∈ D;
• satisfying the boundary condition g(x) = ϕ(x) for x ∈ ∂D.
Here we take

$$\varphi(x) = f(x) \qquad x \in \partial D$$

where f(x) is the solution to Bellman's equation presented in Section 3.1. In bounding the domain of the continuous component of the state we modify the criterion to minimize, namely

$$J' = \mathbb{E} \int_0^{\tau} c(x_t, y_t)\, dt + \mathbb{E} \left[ \varphi(x_{\tau}) \right]$$

where τ is the first exit time of the process x_t from D, i.e.,

$$\tau = \begin{cases} \min\{ t : x_t \notin D \} & \text{if there are such } t \\ \infty & \text{otherwise.} \end{cases}$$
In fact the boundary ∂D is composed of reflecting faces and non-reflecting or open faces through which x_t, starting inside D, may exit from D. Now that the time horizon is random, it must be almost surely finite. The behavior of the HSDS after τ does not matter; we may consider that the process ξ_t is stopped or killed at τ.

Proposition 2. f(x) is the unique solution to the Dirichlet problem for Bellman's equation provided E[τ] < ∞, or at least τ < ∞ almost surely.

Proof. f(x) is the minimum value of the criterion J' obtained with the initial condition x(0) being a non-random point x, and of course under the optimal control, i.e., with x_t being the unique (at least in a weak sense) process solution of the closed loop. If there were another solution g(x), x ∈ D ∪ ∂D, to the Dirichlet problem above, it would be given by the same expectation. This result is based on Doob's optional sampling theorem.

Bounding the domain of the continuous component of the state is not only essential to achieve uniqueness; it also says that the queues cannot be arbitrarily large in size, which is quite natural in our application.

Remark 5. With respect to the original criterion

$$J = \mathbb{E} \int_0^{\tau} c(x_t, y_t)\, dt$$

the minimum expected cost under the conditioning event ''x(0) = x'' is given by f(x) − E[ϕ(x_τ)] for each x ∈ D. And we have

$$f(x) - \mathbb{E} \left[ \varphi(x_{\tau}) \right] \le f(x) - \inf_{\bar{x} \in \partial D} f(\bar{x}). \tag{25}$$
We content ourselves with this upper bound (a worst-case scenario) as a satisfactory approximation of the optimal cost. We now see clearly, since f(x) → 0 as x_i → ∞ for all i ∈ {1, …, K}, that the domain D must be chosen large enough that

$$\inf_{\bar{x} \in \partial D} f(\bar{x}) \approx 0$$

and the upper bound is close to f(x). The following proposition completes our statements about the solution of the optimal control problem.

Proposition 3. The optimal control of Proposition 1 is unique.

Proof. While optimizing it turns out that the optimum falls inside the class of feedback controls, i.e., ŷ_t = φ(x_t). Therefore our program is composed of two points:
1. finding a smooth solution of Bellman's equation (this is the result of Section 3.1);
2. thanks to the smoothness of this solution, getting on the one hand the existence of optimal controls, and on the other hand the uniqueness of solutions to the closed loop (i.e., the SDS (4) under the optimal control).
Uniqueness of optimal controls follows immediately since these are given by the feedback equation.

4. Demonstration: two users and two channel states

The implementation details are as follows: K = 2, J = 2, λ_1(1) = λ_2(1) = 10, λ_1(2) = λ_2(2) = 20, π_1 = 0.2, π_2 = 0.8, u_max = 1, u_ave = 0.8 u_max, β = 1, and
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} 10^3.$$
First we recall the notation:

$$B_1 = \frac{1}{\frac{1}{2} A_{11}} \qquad B_2 = \frac{1}{\frac{1}{2} A_{22}}$$

$$b_{11} = B_1 \left( \pi_1 \lambda_1(1) + \pi_2 \lambda_1(2) \right) u_{\max} \qquad b_{12} = B_1 \pi_2 \lambda_1(2) u_{\max}$$

$$b_{21} = B_2 \left( \pi_1 \lambda_2(1) + \pi_2 \lambda_2(2) \right) u_{\max} \qquad b_{22} = B_2 \pi_2 \lambda_2(2) u_{\max}$$

$$\tilde{b}_0 = \beta u_{\mathrm{ave}} \qquad \tilde{b}_1 = \beta \left( u_{\mathrm{ave}} - u_{\max} \right) \qquad \tilde{b}_2 = \beta \left( u_{\mathrm{ave}} - \pi_2 u_{\max} \right).$$
Note that the constants b̃_0, b̃_1, b̃_2 do not depend on i. Second we compute for each i ∈ {1, 2} the thresholds c_{i1}, c_{i2}. For i = 1, c_11 and c_12 solve the following nonlinear algebraic equations

$$\left[ f_1'(0) - B_1 \left( \frac{0}{b_{11}} + \frac{1}{b_{11}^2} - \frac{\tilde{b}_1}{b_{11}} \right) \right] e^{b_{11} (c_{11}-0)} + B_1 \left( \frac{c_{11}}{b_{11}} + \frac{1}{b_{11}^2} - \frac{\tilde{b}_1}{b_{11}} \right) = \frac{\beta}{\lambda_1(1)}$$

$$\left[ \frac{\beta}{\lambda_1(1)} - B_1 \left( \frac{c_{11}}{b_{12}} + \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) \right] e^{b_{11} (c_{12}-c_{11})} + B_1 \left( \frac{c_{12}}{b_{12}} + \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) = \frac{\beta}{\lambda_1(2)}.$$
For i = 2, c_21 and c_22 solve the following nonlinear algebraic equations

$$\left[ f_2'(0) - B_2 \left( \frac{0}{b_{21}} + \frac{1}{b_{21}^2} - \frac{\tilde{b}_1}{b_{21}} \right) \right] e^{b_{21} (c_{21}-0)} + B_2 \left( \frac{c_{21}}{b_{21}} + \frac{1}{b_{21}^2} - \frac{\tilde{b}_1}{b_{21}} \right) = \frac{\beta}{\lambda_2(1)}$$

$$\left[ \frac{\beta}{\lambda_2(1)} - B_2 \left( \frac{c_{21}}{b_{22}} + \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) \right] e^{b_{21} (c_{22}-c_{21})} + B_2 \left( \frac{c_{22}}{b_{22}} + \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) = \frac{\beta}{\lambda_2(2)}.$$
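These systems can be solved top-down numerically, as prescribed for system (21). The following sketch — a plain bisection, with helper names of our own — recovers, for user 1, the thresholds c_11 = 59.838 and c_12 = 60.195 reported at the end of this section (with the choice f_1'(0) = 1 made there):

```python
import math

# Parameters of the two-user example; f_1'(0) = 1 is the initial condition
# chosen at the end of this section.
beta, u_max, u_ave = 1.0, 1.0, 0.8
pi1, pi2, lam1, lam2 = 0.2, 0.8, 10.0, 20.0     # pi_j and lambda_1(j)
B1 = 1.0 / (0.5 * 1.0e3)                        # A_11 = 10^3
b11 = B1 * (pi1 * lam1 + pi2 * lam2) * u_max
b12 = B1 * pi2 * lam2 * u_max
bt1 = beta * (u_ave - u_max)                    # b-tilde_1
bt2 = beta * (u_ave - pi2 * u_max)              # b-tilde_2
fp0 = 1.0                                       # f_1'(0)

def bisect(g, lo, hi, tol=1e-10):
    # plain bisection; assumes g(lo) and g(hi) have opposite signs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(lo) * g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

def eq1(c):
    # first equation of the system for i = 1
    return ((fp0 - B1 * (1.0/b11**2 - bt1/b11)) * math.exp(b11 * c)
            + B1 * (c/b11 + 1.0/b11**2 - bt1/b11) - beta/lam1)

c11 = bisect(eq1, 1.0, 100.0)

def eq2(c):
    # second equation, with c11 already known (top-down resolution)
    return ((beta/lam1 - B1 * (c11/b12 + 1.0/b12**2 - bt2/b12)) * math.exp(b11 * (c - c11))
            + B1 * (c/b12 + 1.0/b12**2 - bt2/b12) - beta/lam2)

c12 = bisect(eq2, c11, c11 + 10.0)

assert abs(c11 - 59.838) < 1e-2 and abs(c12 - 60.195) < 1e-2
```

By the symmetry of the parameters, the same computation with B_2, b_21, b_22 gives c_21 = c_11 and c_22 = c_12.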
The solution to Bellman's equation is given by

$$f(x) = f_1(x_1) + f_2(x_2) \qquad x \in \mathbb{R}_+^2. \tag{26}$$
The first summand f_1(x_1), x_1 ∈ R_+, is given by

$$f_1(x_1) = \left[ f_1'(0) - B_1 \left( \frac{0}{b_{11}} + \frac{1}{b_{11}^2} - \frac{\tilde{b}_1}{b_{11}} \right) \right] \frac{1}{b_{11}} \left( e^{b_{11} x_1} - 1 \right) + \frac{B_1}{2 b_{11}} x_1^2 + B_1 \left( \frac{1}{b_{11}^2} - \frac{\tilde{b}_1}{b_{11}} \right) x_1 + f_1(0) \qquad 0 \le x_1 < c_{11}$$

$$f_1(x_1) = \left[ \frac{\beta}{\lambda_1(1)} - B_1 \left( \frac{c_{11}}{b_{12}} + \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) \right] \frac{1}{b_{11}} e^{b_{11} (x_1-c_{11})} + \frac{B_1}{2 b_{12}} x_1^2 + B_1 \left( \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) x_1 + K_{11} \qquad c_{11} < x_1 < c_{12}$$

$$f_1(x_1) = -\frac{B_1}{6} x_1^3 + \frac{B_1 \tilde{b}_0}{2} x_1^2 + \left[ \frac{\beta}{\lambda_1(2)} + \frac{B_1 c_{12}^2}{2} - B_1 \tilde{b}_0 c_{12} \right] x_1 + K_{12} \qquad c_{12} < x_1 < \infty$$
where

$$K_{11} = \left[ f_1'(0) - B_1 \left( \frac{0}{b_{11}} + \frac{1}{b_{11}^2} - \frac{\tilde{b}_1}{b_{11}} \right) \right] \frac{1}{b_{11}} \left( e^{b_{11} c_{11}} - 1 \right) + \frac{B_1}{2 b_{11}} c_{11}^2 + B_1 \left( \frac{1}{b_{11}^2} - \frac{\tilde{b}_1}{b_{11}} \right) c_{11} + f_1(0)$$
$$\qquad - \left[ \frac{\beta}{\lambda_1(1)} - B_1 \left( \frac{c_{11}}{b_{12}} + \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) \right] \frac{1}{b_{11}} - \frac{B_1}{2 b_{12}} c_{11}^2 - B_1 \left( \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) c_{11}$$

$$K_{12} = \left[ \frac{\beta}{\lambda_1(1)} - B_1 \left( \frac{c_{11}}{b_{12}} + \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) \right] \frac{1}{b_{11}} e^{b_{11} (c_{12}-c_{11})} + \frac{B_1}{2 b_{12}} c_{12}^2 + B_1 \left( \frac{1}{b_{12}^2} - \frac{\tilde{b}_2}{b_{12}} \right) c_{12} + K_{11}$$
$$\qquad + \frac{B_1}{6} c_{12}^3 - \frac{B_1 \tilde{b}_0}{2} c_{12}^2 - \left[ \frac{\beta}{\lambda_1(2)} + \frac{B_1 c_{12}^2}{2} - B_1 \tilde{b}_0 c_{12} \right] c_{12}.$$
The second summand f_2(x_2), x_2 ∈ R_+, is given by

$$f_2(x_2) = \left[ f_2'(0) - B_2 \left( \frac{0}{b_{21}} + \frac{1}{b_{21}^2} - \frac{\tilde{b}_1}{b_{21}} \right) \right] \frac{1}{b_{21}} \left( e^{b_{21} x_2} - 1 \right) + \frac{B_2}{2 b_{21}} x_2^2 + B_2 \left( \frac{1}{b_{21}^2} - \frac{\tilde{b}_1}{b_{21}} \right) x_2 + f_2(0) \qquad 0 \le x_2 < c_{21}$$

$$f_2(x_2) = \left[ \frac{\beta}{\lambda_2(1)} - B_2 \left( \frac{c_{21}}{b_{22}} + \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) \right] \frac{1}{b_{21}} e^{b_{21} (x_2-c_{21})} + \frac{B_2}{2 b_{22}} x_2^2 + B_2 \left( \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) x_2 + K_{21} \qquad c_{21} < x_2 < c_{22}$$

$$f_2(x_2) = -\frac{B_2}{6} x_2^3 + \frac{B_2 \tilde{b}_0}{2} x_2^2 + \left[ \frac{\beta}{\lambda_2(2)} + \frac{B_2 c_{22}^2}{2} - B_2 \tilde{b}_0 c_{22} \right] x_2 + K_{22} \qquad c_{22} < x_2 < \infty$$
where

$$K_{21} = \left[ f_2'(0) - B_2 \left( \frac{0}{b_{21}} + \frac{1}{b_{21}^2} - \frac{\tilde{b}_1}{b_{21}} \right) \right] \frac{1}{b_{21}} \left( e^{b_{21} c_{21}} - 1 \right) + \frac{B_2}{2 b_{21}} c_{21}^2 + B_2 \left( \frac{1}{b_{21}^2} - \frac{\tilde{b}_1}{b_{21}} \right) c_{21} + f_2(0)$$
$$\qquad - \left[ \frac{\beta}{\lambda_2(1)} - B_2 \left( \frac{c_{21}}{b_{22}} + \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) \right] \frac{1}{b_{21}} - \frac{B_2}{2 b_{22}} c_{21}^2 - B_2 \left( \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) c_{21}$$

$$K_{22} = \left[ \frac{\beta}{\lambda_2(1)} - B_2 \left( \frac{c_{21}}{b_{22}} + \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) \right] \frac{1}{b_{21}} e^{b_{21} (c_{22}-c_{21})} + \frac{B_2}{2 b_{22}} c_{22}^2 + B_2 \left( \frac{1}{b_{22}^2} - \frac{\tilde{b}_2}{b_{22}} \right) c_{22} + K_{21}$$
$$\qquad + \frac{B_2}{6} c_{22}^3 - \frac{B_2 \tilde{b}_0}{2} c_{22}^2 - \left[ \frac{\beta}{\lambda_2(2)} + \frac{B_2 c_{22}^2}{2} - B_2 \tilde{b}_0 c_{22} \right] c_{22}.$$
Finally we choose the initial conditions f_1'(0), f_1(0) and f_2'(0), f_2(0) so as to derive a particular solution from the general solution (26). With f_1'(0) = f_2'(0) = 1, we get c_11 = c_21 = 59.838 and c_12 = c_22 = 60.195. We choose f_1(0) = f_2(0) = 0. The minimum expected cost is plotted in Fig. 2. The summand f_1(x_1) is shown in Fig. 3; this plot reveals the chosen domain:

$$D \cup \partial D = [0, 93.223]^2.$$

Fig. 4 zooms in on Fig. 3 to highlight the thresholds of the optimal feedback extreme control.

5. Further stochastic analysis: the Fokker–Planck equation for the distribution density of the queue state

The Fokker–Planck equation is an important analytical tool for characterizing the law of a Markov process. Here the state process under the optimal control is Markovian. Our previous work [9] concerns the generalization of the Fokker–Planck equation to hybrid stochastic differential systems; it does not concern stochastic control or optimization. [11,12] are two applications of this work to sustainable energy systems and pharmacodynamics respectively. While the HSDS presented in [9] is quite general, it does not include the HSDS proposed in this paper (see Remark 2 for a comparison between the two models). Therefore the derivation of the Fokker–Planck equation is handled on a case-by-case basis. The generating operator G of the SDS (4) under the optimal controls û_i(j, t) = φ_ij(x_i(t)), i ∈ {1, …, K}, j ∈ {1, …, J}, is given by

$$G f(x) = -\sum_{i=1}^{K} \sum_{j=1}^{J} \pi_j \lambda_i(j)\, \varphi_{ij}(x_i) \frac{\partial f(x)}{\partial x_i} + \frac{1}{2} \sum_{i,k=1}^{K} A_{ik} \frac{\partial^2 f(x)}{\partial x_i \partial x_k} \qquad x \in D \cup \partial D.$$
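The feedback φ_ij entering G is the threshold policy of Proposition 1, whose three cases condense to û_i(j, t) = u_max exactly when x_i(t) < c_j. A minimal Python sketch, using the user-1 thresholds of Section 4 (the function name is ours):

```python
def u_hat(x_i, c, u_max=1.0):
    # Feedback of Proposition 1 for user i; c = [c_1, ..., c_J] (increasing).
    # The three cases collapse to: transmit at u_max in state j iff x_i < c_j.
    return [u_max if x_i < c_j else 0.0 for c_j in c]

c = [59.838, 60.195]                  # thresholds of the example in Section 4
assert u_hat(59.0, c) == [1.0, 1.0]   # x_i < c_1: transmit in every channel state
assert u_hat(60.0, c) == [0.0, 1.0]   # c_1 < x_i < c_2
assert u_hat(61.0, c) == [0.0, 0.0]   # x_i > c_J: stay silent
```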
Fig. 2. Cost Surface f (x1 , x2 ) = f1 (x1 ) + f2 (x2 ).
Fig. 3. First Summand f1 (x1 ) and the Domain for x1 .
Fig. 4. Zoom in Fig. 3.
The domain D(G) of G is the set of twice continuously differentiable real-valued functions satisfying, for every i ∈ {1, …, K},

$$\frac{\partial f}{\partial x_i}(\bar{x}) > 0 \quad \text{if } \bar{x}_i = 0.$$
The generating operator leads us to its adjoint: the Fokker–Planck operator L_FP together with its domain D(L_FP). Assume for each t, 0 ≤ t ≤ τ, the existence of a density p(x, t) for the distribution of x_t under P. That is, for Λ a Borel-measurable set from D ∪ ∂D,

$$P\{ x_t \in \Lambda \} = \int_{\Lambda} p(x, t)\, dx.$$

We claim that

$$L_{\mathrm{FP}}\, p(x, t) = \sum_{i=1}^{K} \sum_{j=1}^{J} \pi_j \lambda_i(j) \frac{\partial}{\partial x_i} \left[ \varphi_{ij}(x)\, p(x, t) \right] + \frac{1}{2} \sum_{i,k=1}^{K} A_{ik} \frac{\partial^2 p(x, t)}{\partial x_i \partial x_k}.$$

D(L_FP) is the set of distribution densities on D ∪ ∂D satisfying, for x ∈ ∂D_reflect and 0 ≤ t ≤ τ,

$$\mathbf{J}(x, t) \cdot \mathbf{n}_{\mathrm{ext}}(x) = 0 \tag{27}$$

where ∂D_reflect denotes the reflecting faces of ∂D, and n_ext(x) is the exterior normal to ∂D_reflect at x. J(x, t) is the so-called probability current [13,14]; it is given by

$$\mathbf{J}(x, t) = \begin{pmatrix} -\displaystyle\sum_{j=1}^{J} \pi_j \lambda_1(j)\, \varphi_{1j}(x)\, p(x, t) - \frac{1}{2} \sum_{k=1}^{K} A_{1k} \frac{\partial p(x, t)}{\partial x_k} \\ -\displaystyle\sum_{j=1}^{J} \pi_j \lambda_2(j)\, \varphi_{2j}(x)\, p(x, t) - \frac{1}{2} \sum_{k=1}^{K} A_{2k} \frac{\partial p(x, t)}{\partial x_k} \\ \vdots \\ -\displaystyle\sum_{j=1}^{J} \pi_j \lambda_K(j)\, \varphi_{Kj}(x)\, p(x, t) - \frac{1}{2} \sum_{k=1}^{K} A_{Kk} \frac{\partial p(x, t)}{\partial x_k} \end{pmatrix}.$$

(27) is a static relation for each x ∈ ∂D_reflect, i.e. it holds for any t. Then p(x, t) solves the Fokker–Planck equation

$$\frac{\partial p}{\partial t}(x, t) = L_{\mathrm{FP}}\, p(x, t) \qquad x \in D \cup \partial D;\ 0 \le t \le \tau$$

along with

1. the initial condition: p(x, 0) is the distribution density of x(0);
2. the boundary condition J(x, t) · n_ext(x) = 0, x ∈ ∂D_reflect, 0 ≤ t ≤ τ;
3. the normalization condition

$$\int_{D \cup \partial D} p(x, t)\, dx = 1 \qquad 0 \le t \le \tau.$$
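A one-dimensional finite-volume illustration — our own simplified setting: a single queue, unit diffusion coefficient, domain [0, 100], drift taken from the threshold feedback — shows how enforcing zero probability current at the faces preserves the normalization condition numerically:

```python
# Hypothetical 1-D sketch: drift mu(x) = sum_j pi_j lambda(j) phi_j(x) with the
# threshold feedback phi_j, diffusion D. The current J vanishes at both faces,
# so the conservative (finite-volume) update keeps the total mass equal to 1.
L, N, D = 100.0, 200, 1.0
dx, dt = L / N, 0.01                  # dt small enough for the explicit upwind scheme
pi, lam, c, u_max = [0.2, 0.8], [10.0, 20.0], [59.838, 60.195], 1.0

def mu(x):
    # service drift under the threshold policy: phi_j(x) = u_max iff x < c_j
    return sum(p_j * l_j * (u_max if x < c_j else 0.0)
               for p_j, l_j, c_j in zip(pi, lam, c))

p = [1.0 / L] * N                     # initial density: uniform on [0, L]
for _ in range(500):
    # probability current at the faces; J = 0 at x = 0 and x = L (reflection)
    J = [0.0] * (N + 1)
    for i in range(1, N):
        # upwind advection: the drift -mu pushes mass toward the left
        J[i] = -mu(i * dx) * p[i] - 0.5 * D * (p[i] - p[i - 1]) / dx
    p = [p[i] - dt * (J[i + 1] - J[i]) / dx for i in range(N)]

assert abs(sum(p) * dx - 1.0) < 1e-9  # mass conserved by the zero-flux faces
assert min(p) >= 0.0                  # the monotone scheme keeps p a density
```

The mass-conservation assertion is exactly the discrete counterpart of items 2 and 3 above: with J = 0 at both faces, the flux differences telescope and the integral of p is invariant in time.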
Appendix. Bibliographical notes on HSDS

Engineering systems are described in terms of inputs, state variables and outputs. There is inherently uncertainty in each of these. Here we restrict attention to dynamical systems represented by differential equations (as opposed to those described by discrete-time iteration equations, integral equations, or partial differential equations). Modeling uncertainty in such dynamical systems leads to the more general class of stochastic differential systems (SDS). Roughly speaking, an SDS is called hybrid when the uncertainty also concerns the structure of the system. HSDS usually result from a combination or coordination of self-triggered time-driven mechanisms with event-triggered ones. They arise in insurance pricing [15], automated manufacturing [16], and air traffic management [17]. There are different models of HSDS; the main difference between them lies in the way stochasticity enters the models. The list below summarizes the modeling features in some key references of the literature.
• Switching diffusion (between the transitions of the discrete component, the continuous component evolves as a diffusion process): [18].
• Mode-dependent dimension (the dimension of the continuous component depends on the discrete component): [19].
• Spontaneous hybrid jumps (defined by a state-dependent stochastic intensity or rate): [18].
• Forced hybrid jumps (triggered by a guard set): [20,8].
• Transition rate control: [18].
• Forced transition control: [20].
• Continuous control: [20,18].
• Probabilistic reset (for the post-jump state): [19].
• Continuous reset control: [20].
References

[1] R. Buche, H.J. Kushner, Control of mobile communications with time varying channels in heavy traffic, IEEE Trans. Automat. Control 47 (6) (2002) 992–1003.
[2] W. Wu, A. Arapostathis, S. Shakkottai, Optimal power allocation for a time-varying wireless channel under heavy-traffic approximation, IEEE Trans. Automat. Control 51 (4) (2006) 580–594.
[3] H. Baili, M. Assaad, Optimal scheduling and power control in wireless networks with heavy traffic, Math. Comput. Model. Dyn. Syst. (2013) submitted for publication.
[4] J.M. Harrison, M.I. Reiman, Reflected Brownian motion on an orthant, Ann. Probab. 9 (1981) 302–308.
[5] I.I. Gikhman, A.V. Skorokhod, Introduction to the Theory of Random Processes, W. B. Saunders Company, 1969.
[6] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, sixth ed., Springer, 2003.
[7] P.E. Protter, Stochastic Integration and Differential Equations, second ed., Springer, 2005.
[8] R. Malhamé, C.Y. Chong, Electric load model synthesis by diffusion approximation of a high-order hybrid-state stochastic system, IEEE Trans. Automat. Control 30 (9) (1985) 854–860.
[9] J. Bect, H. Baili, G. Fleury, Generalized Fokker–Planck equation for piecewise-diffusion processes with boundary hitting resets, in: 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, July 24–28, 2006.
[10] A.V. Skorokhod, Basic Principles and Applications of Probability Theory, Springer-Verlag, 2005.
[11] J. Bect, Y. Phulpin, H. Baili, G. Fleury, On the Fokker–Planck equation for stochastic hybrid systems—application to a wind turbine model, in: 9th International Conference on Probabilistic Methods Applied to Power Systems, Stockholm, Sweden, June 11–15, 2006.
[12] H. Baili, Stochastic analysis of noncompliance with drug therapy, J. Uncertain Syst. 5 (1) (2011) 38–44.
[13] C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, second ed., Springer-Verlag, 1985.
[14] H. Risken, The Fokker–Planck Equation: Methods of Solution and Applications, second ed., Springer-Verlag, 1989.
[15] M.H.A. Davis, M.H. Vellekoop, Permanent health insurance: a case study in piecewise-deterministic Markov modelling, Mitt. Schweiz. Ver. Versicherungsmathematiker 2 (1995) 177–212.
[16] M.K. Ghosh, A. Arapostathis, S.I. Marcus, Optimal control of switching diffusions with application to flexible manufacturing systems, SIAM J. Control Optim. 30 (6) (1992) 1–23.
[17] G. Pola, M.L. Bujorianu, J. Lygeros, M. De Benedetto, Stochastic hybrid models: an overview with applications to air traffic management, in: IFAC Conference on Analysis and Design of Hybrid Systems, Saint Malo, France, June 16–18, 2003.
[18] M.K. Ghosh, A. Arapostathis, S.I. Marcus, Ergodic control of switching diffusions, SIAM J. Control Optim. 35 (6) (1997) 1952–1988.
[19] M.H.A. Davis, Piecewise deterministic Markov processes: a general class of nondiffusion stochastic models, J. R. Stat. Soc. Ser. B Stat. Methodol. 46 (1984) 353–388.
[20] A. Bensoussan, J.L. Menaldi, Stochastic hybrid control, J. Math. Anal. Appl. 249 (1) (2000) 261–288.