Automatica, Vol. 22, No. 3, pp. 367-370, 1986
Printed in Great Britain.

0005-1098/86 $3.00 + 0.00
Pergamon Journals Ltd.
© 1986 International Federation of Automatic Control

Brief Paper

Optimal Dynamic Routing in Markov Queueing Networks*

KEITH W. ROSS†

Key Words--Markov processes; dynamic programming; queueing control; traffic control; computer communication.

Abstract--Markov decision theory is applied to general Markov queueing networks with finite buffer capacity. Existence of optimal dynamic routing policies is proved for the long-run average and infinite-horizon discounted cases. With the aid of a process that is equivalent to the state process, the subordinated process, fast algorithms are derived for locating an optimal routing policy. A numerical example is given and the application of the theory to computer communication networks is discussed.

* Received 23 March 1985; revised 18 November 1985. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor A. Sage under the direction of Editor G. Guardabassi.
† University of Pennsylvania, Department of Systems Engineering, The Moore School of Electrical Engineering, Philadelphia, Pennsylvania.

1. Introduction

THE APPLICATION of Markov Decision Processes (MDPs) to the dynamic control of queueing networks has become important in recent years as a result of potential applications in computer systems and communication networks. Several researchers have applied MDPs to "small" networks. Ephremides et al. (1980) studied the problem of routing a customer to one of two stations. Rosberg et al. (1982) examined a network of two stations in series while controlling the rate of service for the tail queue; structural properties of the optimal policy were determined. Hajek (1984) generalized the work of Rosberg et al. by allowing for feedback between the two stations. See also Stidham (1985) for many interesting results concerning optimal flow control for networks of two queues.

The application of MDPs to routing in queueing networks with an arbitrary number of stations has barely begun. Existence of an optimal pure policy with the average-cost criterion for this class of problems was proved in the elegant paper by Borkar (1983). However, for general networks, neither algorithms for locating the optimal policy nor the structure of the optimal policy seem to have been investigated. Therefore existence, structure and algorithms for optimal routing policies are studied for the Controlled Markov Queueing Network (CMQN), which is defined as follows.

There are $N$ single-server stations, where station $i$ is assumed to have buffer capacity $1 \le M_i < \infty$ (which includes the customer in service). Let $X_i(t)$ be the line length for station $i$ at time $t$. Then the state of the CMQN at time $t$ is $X(t) = (X_1(t), \dots, X_N(t))$. The CMQN is defined so that under any pure routing policy the state is rendered a Markov process.

When a customer departs from the network, he is routed to a fictitious station 0. Associated with each station $i$ there is a routing set $R_i \subseteq \{0, 1, \dots, N\}$. A customer departing from station $i$ is routed to a station $j \in R_i$ according to a routing policy. However, he cannot be routed to station $k \in R_i$ if the buffer at station $k$ is full, and he is routed back to station $i$ if all buffers in $R_i$ are full. If a customer is at the head of the queue at station $i$, and if he is destined for station $j$, then he receives service at a rate $\mu_{ij}(x)$, where $x$ is the present state of the network. Allowing $\mu_{ij}(x)$ to depend on $j$, the station to which a customer is routed, permits us to apply CMQNs to computer communication networks. Indeed, the rate $\mu_{ij}(x)$ may be interpreted as the capacity of the communication link from station $i$ to $j$. Allowing the rate to also depend on the network state permits us to model multiple processors at a single node.

There are also $L$ exogenous arrival streams, and for each stream $i$ there is an associated routing set $S_i \subseteq \{0, 1, \dots, N\}$. Arriving from a stream $i \in \{1, \dots, L\}$, a customer is routed to a station $j \in S_i$ according to the routing policy. However, he cannot be routed to station $k \in S_i$ if the buffer at station $k$ is full. If all buffers in $S_i$ are full, the arriving customer is lost and routed to node 0. If a customer originates from stream $i$, and if he is destined for station $j$, then he arrives at a rate $\lambda_{ij}(x)$. Note that the CMQN incorporates a flow control mechanism: if $0 \in S_i$, then a customer arriving from stream $i$ will or will not be admitted to the network according to the routing (flow control) policy. In fact, the model allows for the inclusion of a cost due to the refusal of admittance of a customer, either because of the routing policy or because all buffers are full. These ideas are exemplified in Section 4.

The CMQN is quite flexible and can model many practical situations. The problem remains tractable if bulk arrivals and services, priorities and customer destinations (virtual circuits) are permitted. However, for the sake of brevity, a generalized version of the CMQN is not introduced.

In Section 2, the CMQN is adapted to the Markov decision context, and the optimality criteria for the average and discounted cost cases are formulated. With the aid of the subordinated process (Cinlar, Section 8.4, 1975), value-iteration algorithms are given for the CMQN in Section 3. It is well known that the subordinated process is useful for determining the structure of the optimal policy for simple networks (Stidham, 1985; Rosberg et al., 1982; Hajek, 1984). However, for the case of CMQNs, it is shown that the subordinated process can also lead to substantially faster algorithms for locating the optimal policy numerically. For a class of CMQNs, namely feed-forward networks, it is shown that the optimal average cost is state invariant, and a corresponding value-iteration algorithm is given. A numerical example is provided in Section 4.

2. An MDP framework for CMQNs

In order to apply the classical results of the theory of MDPs (Ross, 1970; Bertsekas, 1976), the state space, the action space and the dynamics of the CMQN must be defined. The state space $S$ and the state-dependent action spaces $A(x)$ are defined as follows:
$$S = \{0, \dots, M_1\} \times \cdots \times \{0, \dots, M_N\},$$
$$F(x) = \{j : x_j < M_j,\ 1 \le j \le N\} \cup \{0\},$$
$$S_i(x) = \begin{cases} S_i \cap F(x) & \text{if } S_i \cap F(x) \ne \emptyset \\ \{0\} & \text{if } S_i \cap F(x) = \emptyset \end{cases}$$
$$R_i(x) = \begin{cases} R_i \cap F(x) & \text{if } R_i \cap F(x) \ne \emptyset \\ \{i\} & \text{if } R_i \cap F(x) = \emptyset \end{cases}$$
$$A(x) = \prod_{i=1}^{L} S_i(x) \times \prod_{i=1}^{N} R_i(x) \qquad \text{for } x \in S.$$
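These definitions translate directly into code. The following minimal Python sketch builds $F(x)$, $S_i(x)$, $R_i(x)$ and the product action space $A(x)$; all function and variable names are ours, not the paper's:

    from itertools import product

    def feasible(x, M):
        # F(x): the outside (station 0) plus every station with a free buffer slot
        return {0} | {j for j in range(1, len(M) + 1) if x[j - 1] < M[j - 1]}

    def action_sets(x, M, S_sets, R_sets):
        F = feasible(x, M)
        # S_i(x): stream i routes into S_i intersect F(x); if empty, the customer is lost (station 0)
        Sx = [(s & F) or {0} for s in S_sets]
        # R_i(x): station i routes into R_i intersect F(x); if empty, the customer rejoins station i
        Rx = [(q & F) or {i + 1} for i, q in enumerate(R_sets)]
        return Sx, Rx

    def action_space(x, M, S_sets, R_sets):
        # A(x) = prod_i S_i(x) x prod_i R_i(x)
        Sx, Rx = action_sets(x, M, S_sets, R_sets)
        return list(product(*Sx, *Rx))

    # Two stations in tandem, one arrival stream: S_1 = {0, 2}, R_1 = {0}, R_2 = {1}.
    # With x = (1, 2) and M = (2, 2), station 2 is full, so stream 1 can only reject:
    print(action_sets((1, 2), (2, 2), [{0, 2}], [{0}, {1}]))   # ([{0}], [{0}, {1}])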


If the present state is $x$, then $S_i(x)$ is the set of stations to which a customer may be routed when he arrives to the network from the $i$th arrival stream. There is an analogous interpretation for $R_i(x)$. An element $a$ belonging to the action space $A(x)$ will be expressed by $a = (a_1, a_2, \dots, a_L, a_1', \dots, a_N')$, where $a_i \in S_i(x)$ and $a_i' \in R_i(x)$. A routing policy $g$ establishes "switches" from the arrival streams to the stations, and from the stations to other stations. These switches change, according to the routing policy, as the state process $X_t$ changes. In order to present a more formal definition of a policy, let $0$ be the element of $S$ that corresponds to an empty system, and let $A = A(0)$. Then a routing policy $g$ is a mapping from $S$ to $A$ such that $g(x) \in A(x)$ for all $x \in S$. The notation $g(x) = (g_1(x), \dots, g_L(x), g_1'(x), \dots, g_N'(x))$ will frequently be employed, where $g_i(x) \in S_i(x)$ and $g_i'(x) \in R_i(x)$.

Note that the discussion is limited to the class of pure policies. More complex policies are not considered because the discussion shall be restricted to cost functions and to CMQNs for which there exists an optimal pure policy (Borkar, 1983; Lippman, 1975).

In order to define precisely the state process, recall that $\lambda_{ij}$ and $\mu_{ij}$ are arbitrary non-negative functions of the state space. We make the natural assumption that $\mu_{ij}(x) = 0$ if $x_i = 0$. For notational convenience, write $e_i$ for the element in $S$ with a one for the $i$th component and zeros elsewhere. Fix a routing policy $g$ and order $S$ in an arbitrary manner. Define the components of an $|S| \times |S|$ matrix $A_g$ as follows:

(a) for $(x, x + e_j)$ with $x_j < M_j$, enter $\sum_{i=1}^{L} \lambda_{ij}(x)\,1(g_i(x) = j)$;
(b) for $(x, x - e_j)$ with $x_j > 0$, enter $\mu_{j0}(x)\,1(g_j'(x) = 0)$;
(c) for $(x, x - e_i + e_j)$ with $x_i > 0$ and $x_j < M_j$, enter $\mu_{ij}(x)\,1(g_i'(x) = j)$;
(d) except along the diagonal, all other entries are zero. The entries along the diagonal are such that the sum across any row is zero.

Definition. The state process $(X_t,\ t \ge 0)$ is a Markov process with infinitesimal generator $A_g$ when policy $g$ is in force.

It remains to define the optimization criteria. Let $C$ be a non-negative function defined on $S \times A$, and let $\alpha > 0$. The discounted and long-run average costs take the following forms:
$$V_g(x) = E_g\Big[\int_0^\infty e^{-\alpha t}\, C(X_t, g(X_t))\, dt \,\Big|\, X_0 = x\Big], \qquad V(x) = \inf_g V_g(x),$$
$$\phi_g(x) = \lim_{t \to \infty} E_g\Big[\frac{1}{t}\int_0^t C(X_s, g(X_s))\, ds \,\Big|\, X_0 = x\Big], \qquad \phi(x) = \inf_g \phi_g(x).$$
A policy $g$ is said to be optimal for the discounted criterion [resp. the average criterion] if $V_g(x) = V(x)$ for all $x \in S$ [resp. if $\phi_g(x) = \phi(x)$ for all $x \in S$].
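The generator $A_g$ can be assembled mechanically from rules (a)-(d). A minimal sketch, assuming hypothetical rate functions lam(i, j, x) and mu(i, j, x) and a pure policy given by functions g_S(i, x) (stream switches) and g_R(i, x) (station switches), all feasible by construction:

    import numpy as np
    from itertools import product

    def generator(M, L, lam, mu, g_S, g_R):
        # enumerate S = {0..M_1} x ... x {0..M_N} in a fixed, arbitrary order
        states = list(product(*[range(m + 1) for m in M]))
        index = {x: k for k, x in enumerate(states)}
        A = np.zeros((len(states), len(states)))
        for x in states:
            k = index[x]
            for i in range(1, L + 1):               # rule (a): arrivals
                j = g_S(i, x)
                if j == 0:
                    continue                        # rejected or lost: no state change
                y = list(x); y[j - 1] += 1          # j in S_i(x), so the buffer is free
                A[k, index[tuple(y)]] += lam(i, j, x)
            for i in range(1, len(M) + 1):          # rules (b), (c): services
                if x[i - 1] == 0:
                    continue                        # mu_ij(x) = 0 when x_i = 0
                j = g_R(i, x)
                if j == i:
                    continue                        # routed back to itself: no state change
                y = list(x); y[i - 1] -= 1
                if j != 0:
                    y[j - 1] += 1                   # internal transfer; j == 0 is a departure
                A[k, index[tuple(y)]] += mu(i, j, x)
            A[k, k] = -A[k].sum()                   # rule (d): rows of a generator sum to zero
        return states, A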

3. Value-iteration algorithms for CMQNs

The optimization problems defined in the previous section are semi-Markov decision problems with finite state and action spaces. Therefore, for the discounted case, the value-iteration algorithm given in Ross, Theorem 7.3 (1970) can be immediately applied in order to determine the optimal policy. Indeed, it follows from this theorem that $V = \lim_n V_n$, where $V_{n+1} = T V_n$ and $V_0$ is arbitrary. Here, $T$ is defined on p. 158 of the same reference, and is an operator from $\Phi$ to $\Phi$, where $\Phi$ is the set of all real functions on $S$.

Let $r$ be the maximum rate of the CMQN, given by
$$r = \max_{x \in S}\Big[\sum_{i=1}^{L} \max\{\lambda_{ij}(x) : j \in S_i(x)\} + \sum_{i=1}^{N} \max\{\mu_{ij}(x) : j \in R_i(x)\}\Big].$$
Standard arguments from the theory of MDPs give the following geometric bound on the rate of convergence of $\{V_n\}$:
$$\|V_n - V\| \le \Big(\frac{r}{r + \alpha}\Big)^n \|V_0 - V\|, \qquad (3.1)$$
where $\|\cdot\|$ is the supremum norm on $\Phi$. Although the convergence rate of $\{V_n\}$ is geometric, it turns out that the determination of $T V_n$ requires an enormous amount of computation for networks of reasonable size. This difficulty is not only due to the exponential growth of the state space with the number of stations, but also to a minimization over $A$ that is required to determine $T V_n(x)$ for each $x \in S$. Indeed, the cardinality of $A$, denoted by $|A|$, is on the order of
$$\prod_{i=1}^{L} |S_i| \times \prod_{i=1}^{N} |R_i|.$$

It turns out that $V$ is related to another operator $U$, which gives rise to a new sequence of functions $(\bar V_n,\ n \ge 0)$ converging geometrically with parameter $r/(r+\alpha)$. Moreover, the cardinality of the set over which the operator $U$ must search is on the order of
$$\sum_{i=1}^{L} |S_i| + \sum_{i=1}^{N} |R_i|.$$
Hence, the computational effort required to locate the optimal policy can be substantially reduced if the value-iteration algorithm corresponding to the operator $U$ is employed.

In order to specify precisely the operator $U$, and to establish the above claims, the subordinated process (Cinlar, Section 8.4, 1975) and its relevance to MDPs is briefly reviewed. Denote
$$P_g = I + \frac{A_g}{r}, \qquad (3.2)$$
where $A_g$ is the infinitesimal generator for the fixed policy $g$. Let $\{Y_n\}$ be a Markov chain with transition matrix $P_g$ and let $\{N_t\}$ be an independent Poisson process with constant rate $r$. If the subordinated process is defined by
$$\tilde Y_t = Y_{N_t}, \qquad (3.3)$$
then $\tilde Y_t$ is stochastically equivalent to $X_t$. A straightforward calculation then leads to (see Serfozo, 1979)
$$V_g(x) = \frac{1}{\alpha + r}\, E_g\Big[\sum_{n=0}^{\infty} \Big(\frac{r}{r+\alpha}\Big)^n C(Y_n, g(Y_n)) \,\Big|\, Y_0 = x\Big], \qquad (3.4)$$
so that $V(x) = \frac{1}{\alpha + r} W(x)$, where
$$W(x) = \min_g E_g\Big[\sum_{n=0}^{\infty} \Big(\frac{r}{r+\alpha}\Big)^n C(Y_n, g(Y_n)) \,\Big|\, Y_0 = x\Big].$$
A continuous-time stochastic control problem has therefore been converted to a discrete-time problem.
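In matrix form, (3.2)-(3.4) are the familiar uniformization device. A small sketch using the generator from the previous block (the truncation length n_terms is an arbitrary illustration parameter, not part of the paper):

    import numpy as np

    def subordinated_matrix(A_g, r):
        # (3.2): P_g = I + A_g / r is stochastic when r dominates every exit rate
        P = np.eye(A_g.shape[0]) + A_g / r
        assert np.allclose(P.sum(axis=1), 1.0) and (P >= -1e-12).all()
        return P

    def discounted_cost(P, c, r, alpha, n_terms=2000):
        # (3.4), truncated: V_g = (1/(alpha+r)) * sum_{n>=0} (r/(r+alpha))^n P^n c
        beta = r / (r + alpha)
        V, term = np.zeros_like(c, dtype=float), np.asarray(c, dtype=float)
        for _ in range(n_terms):
            V, term = V + term, beta * (P @ term)
        return V / (alpha + r)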

Applying the discrete-time value-iteration algorithm (Ross, Chapter 6, 1970) to (3.4), the following result is obtained.

Theorem 1. Suppose the cost function $C(x,a)$ can be expressed as
$$C(x,a) = \sum_{i=1}^{N} s_{i a_i'}(x) + \sum_{i=1}^{L} t_{i a_i}(x) \qquad (3.5)$$
for all $x \in S$ and all $a \in A$. Then $V = \frac{1}{\alpha + r} \lim_n \bar V_n$, where $\bar V_{n+1} = U \bar V_n$, $\bar V_0$ is arbitrary, and $U$ is the operator on $\Phi$ given by
$$(Uv)(x) = \frac{r}{r+\alpha}\, v(x) + \sum_{i=1}^{N} \min\Big\{ s_{ij}(x) + \frac{\mu_{ij}(x)}{r+\alpha}\,[v(x + e_j - e_i) - v(x)] : j \in R_i(x) \Big\} + \sum_{i=1}^{L} \min\Big\{ t_{ij}(x) + \frac{\lambda_{ij}(x)}{r+\alpha}\,[v(x + e_j) - v(x)] : j \in S_i(x) \Big\}. \qquad (3.6)$$


Moreover, any policy $g$ that chooses the minimizing action in (3.6) with $v$ replaced by $(\alpha + r)V$ is optimal.
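A minimal sketch of one application of $U$, reusing action_sets from the earlier block; the cost functions s_cost(i, j, x), t_cost(i, j, x) and the rate functions lam, mu are hypothetical stand-ins for $s_{ij}$, $t_{ij}$, $\lambda_{ij}$, $\mu_{ij}$:

    def shift(x, i=None, j=None):
        # state after moving one customer out of station i and into station j
        # (i or j equal to None or 0 means "outside the network")
        y = list(x)
        if i: y[i - 1] -= 1
        if j: y[j - 1] += 1
        return tuple(y)

    def U_step(v, states, M, S_sets, R_sets, lam, mu, s_cost, t_cost, r, alpha):
        # one step of (3.6); the minimization decomposes over streams and stations,
        # so each state costs O(sum |S_i| + sum |R_i|) instead of O(prod |S_i| * prod |R_i|)
        b = 1.0 / (r + alpha)
        v_new, policy = {}, {}
        for x in states:
            Sx, Rx = action_sets(x, M, S_sets, R_sets)
            val, a = r * b * v[x], []
            for i in range(1, len(S_sets) + 1):     # arrival switches, j in S_i(x)
                c = {j: t_cost(i, j, x) + b * lam(i, j, x) * (v[shift(x, None, j)] - v[x])
                     for j in Sx[i - 1]}
                j = min(c, key=c.get); a.append(j); val += c[j]
            for i in range(1, len(M) + 1):          # service switches, j in R_i(x)
                if x[i - 1] == 0:
                    a.append(None)                  # empty station: switch setting immaterial
                    continue
                c = {j: s_cost(i, j, x) + b * mu(i, j, x) * (v[shift(x, i, j)] - v[x])
                     for j in Rx[i - 1]}
                j = min(c, key=c.get); a.append(j); val += c[j]
            v_new[x], policy[x] = val, a
        return v_new, policy

Iterating $\bar V_{n+1} = U \bar V_n$ from an arbitrary $\bar V_0$ and dividing the limit by $(\alpha + r)$ recovers $V$, per Theorem 1; the recorded minimizers converge to the optimal switches.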

Remark. The requirement that the cost function satisfies (3.5) is not very restrictive; one special case is $C(x, a) = C(x)$. It should also be observed that the operator $U$ enjoys a decomposition property, namely, the minimization is performed separately over each node and input stream rather than over the entire action space.

The long-run average-cost criterion is now examined. As in the discounted case, a value-iteration algorithm is given which has a decomposition property. But first a rich class of CMQNs, which model many realistic network topologies, is introduced.

Definition. A CMQN is said to be feed-forward if it has the following properties: (i) the stations can be ordered so that $R_i \subseteq \{0, 1, \dots, i-1\}$ for $i = 1, \dots, N$; (ii) there is an $\varepsilon > 0$ such that $\mu_{ij}(x) > \varepsilon$ whenever $j \in R_i(x)$ and $x_i > 0$.

Theorem 2. For a feed-forward network there is an optimal stationary policy which attains the average cost $\phi(x)$ for all initial states $x$. Moreover, $\phi(x) = w$ for some constant $w$.

Proof. By Serfozo (1979) and Corollary 6 of Ross (1970) it suffices to show that for each initial state $x$ and stationary policy $g$, there is an integer $L$ such that
$$P_g(Y_L = 0 \mid Y_0 = x) > 0, \qquad (3.7)$$
where $\{Y_n\}$ is the subordinated chain. For fixed $x \in S$ and policy $g$, let
$$L_n = n x_n, \qquad L = \sum_{n=1}^{N} L_n,$$
and $z_n = (0, \dots, 0, x_{n+1}, \dots, x_N)$, $z_N = 0$, $z_0 = x$. From the Chapman-Kolmogorov equation,
$$P_g(Y_L = 0 \mid Y_0 = x) \ge \prod_{n=1}^{N} P_g(Y_{L_n} = z_n \mid Y_0 = z_{n-1}), \qquad (3.8)$$
and since the CMQN is feed-forward,
$$P_g(Y_{L_n} = z_n \mid Y_0 = z_{n-1}) \ge (\varepsilon/r)^{nM}, \qquad (3.9)$$
where $M = \max(M_1, M_2, \dots, M_N)$. Combining (3.8) and (3.9),
$$P_g(Y_L = 0 \mid Y_0 = x) \ge (\varepsilon/r)^{MN(N+1)/2} > 0$$

is obtained and the proof is therefore complete.

As a result of Theorem 2, the following algorithm is available to determine the optimal policy for the average-cost criterion (see Bertsekas, p. 345, 1976).

Corollary 1. For a feed-forward CMQN with cost function satisfying (3.5), $\phi = \lim_n H_n(0)$, where $H_n(x)$ is obtained from the following recursive equations:
$$H_{n+1}(x) = h_n(x) + \sum_{i=1}^{N} \min\Big\{ s_{ij}(x) + \frac{\mu_{ij}(x)}{r}\,[h_n(x + e_j - e_i) - h_n(x)] : j \in R_i(x) \Big\} + \sum_{i=1}^{L} \min\Big\{ t_{ij}(x) + \frac{\lambda_{ij}(x)}{r}\,[h_n(x + e_j) - h_n(x)] : j \in S_i(x) \Big\}, \qquad (3.10)$$
$$h_{n+1}(x) = H_{n+1}(x) - H_{n+1}(0), \qquad h_0(x) \text{ arbitrary}.$$

Moreover, with $h = \lim_k h_k$, any policy that chooses the minimizing actions in (3.10) with $h$ replacing $h_n$ is optimal.
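A sketch of one step of (3.10), reusing action_sets and shift from the earlier blocks; relative to the discounted operator, the only changes are the scaling by $1/r$ and the re-centring at the empty state:

    def H_step(h, states, M, S_sets, R_sets, lam, mu, s_cost, t_cost, r):
        # relative value iteration (3.10): compute H_{n+1} from h_n, then
        # re-centre h_{n+1}(x) = H_{n+1}(x) - H_{n+1}(0); phi = lim H_n(0)
        H = {}
        for x in states:
            Sx, Rx = action_sets(x, M, S_sets, R_sets)
            val = h[x]
            for i in range(1, len(S_sets) + 1):       # arrival switches
                val += min(t_cost(i, j, x) + lam(i, j, x) / r * (h[shift(x, None, j)] - h[x])
                           for j in Sx[i - 1])
            for i in range(1, len(M) + 1):            # service switches
                if x[i - 1] > 0:                      # empty stations contribute nothing
                    val += min(s_cost(i, j, x) + mu(i, j, x) / r * (h[shift(x, i, j)] - h[x])
                               for j in Rx[i - 1])
            H[x] = val
        zero = tuple(0 for _ in M)
        h_new = {x: H[x] - H[zero] for x in states}
        return H[zero], h_new                          # running estimate of phi, next h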

4. Example

This paper is concluded with an example that demonstrates the computational advantages of the operator $U$. Let $N = 3$ and $L = 11$. The routing sets are: $R_1 = \{0\}$, $R_2 = \{0,1\}$, $R_3 = \{0,1,2\}$; $S_1 = \{0,1,2,3\}$, $S_2 = \{0,1,2\}$, $S_3 = \{0,1,3\}$, $S_4 = \{0,2,3\}$, $S_5 = \{1,2,3\}$, $S_6 = \{0,1\}$, $S_7 = \{0,2\}$, $S_8 = \{0,3\}$, $S_9 = \{1,2\}$, $S_{10} = \{1,3\}$, $S_{11} = \{2,3\}$.

This feed-forward network consists of three queues in series, with queues 3, 2 and 1 at the head, middle and tail of the network, respectively. However, all customers do not receive service from each queue; they may skip queues depending on the routing policy. The intensities are given by
$$\mu_{ij}(x) = 10j + 30(4 - i) \quad \text{when } x_i > 0, \qquad \lambda_{ij}(x) = (12 - i) + (3 - j),$$
and the cost function is given by
$$C(x,a) = 100 \sum_{i=1}^{3} x_i + \sum_{i=1}^{3} (i - a_i' - 1)\,\mu_{i a_i'}(x) + \sum_{i=1}^{11} (3 - a_i)\,\lambda_{i a_i}(x). \qquad (4.1)$$

The first term in (4.1) is a holding cost proportional to the total number of customers in the network; the second and third terms are costs due to customers skipping queues because of internal and external routings, respectively. For this example, the long-run average cost for a policy $f$, $\phi_f(x)$, consists of two components: the average number of customers in the system and the average number of queues skipped per unit time.

Consider the discounted criterion. The cardinality of the action space is 124,416. Therefore, the operator $T$ of Ross, p. 158 (1970) would require a search in a space with 124,416 elements in order to perform each step of the value-iteration algorithm. However, the operator $U$ has a decomposition property which translates to a search in a space of cardinality 20 for each step of the value-iteration algorithm. The reader can check that for this example the operator $T$ does not have the decomposition property. Using the value-iteration algorithm of Theorem 1, we obtained the optimal routing policy with $\alpha = 20$ and $M_i = 2$ for $i = 1, 2, 3$. The small buffer sizes were chosen to limit the cardinality of the state space, thereby permitting a compact presentation of the numerical results given in Fig. 1. For each state $(x_1, x_2, x_3)$, the value in the column labelled $R_i$ is the station to which a customer is routed if he is served at station $i$. The value in the column $S_i$ is the station to which a customer is routed if he arrives from stream $i$. In other words, after a transition in $X_t$ to a state $(x_1, x_2, x_3)$, the "routing switches" are immediately set according to Fig. 1. These switches remain fixed until the next transition in $X_t$, i.e. an arrival or departure, and then the switches are reset according to the new state. Certain entries are empty in the columns labelled $R_i$ because setting a switch for the corresponding state would be meaningless. For example, if $x = (0, 0, 0)$ the system is empty and the next transition in $X_t$ would not result from a customer service. Thus, the entries corresponding to row $x = (0, 0, 0)$ and columns labelled $R_1$, $R_2$, $R_3$ are empty.
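The two cardinalities quoted above can be checked directly from the routing sets. In the quick sketch below, reading the figure 20 as the number of pairwise comparisons per state, $\sum_i (|S_i| - 1) + \sum_i (|R_i| - 1)$, is our interpretation of the decomposition, not a formula stated in the paper:

    S = [{0, 1, 2, 3}, {0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3},
         {0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}]
    R = [{0}, {0, 1}, {0, 1, 2}]

    size_A = 1
    for s in S + R:
        size_A *= len(s)                     # |A| = prod |S_i| * prod |R_i|
    print(size_A)                            # -> 124416

    print(sum(len(s) - 1 for s in S + R))    # -> 20 comparisons per state under U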

5. Conclusion


It has been shown for the CMQN that the use of the operator $U$ is far superior to that of $T$ in the value-iteration algorithm. The bounds on the rate of convergence of $V_n$ and of $\bar V_n$ are the same; however, the calculation of $(U \bar V_n)(x)$ is generally much simpler than the calculation of $(T V_n)(x)$. This was shown in the example of Section 4. For examples with action spaces of higher cardinality, the numerical advantages are even more evident.

For networks with a small number of stations, the calculation of an optimal routing policy is quite feasible. However, the techniques presented here would not be appropriate for networks with a large number of stations, because the cardinality of the state space becomes unmanageable. For example, if $M_i = M$ for all stations then $|S| = (M+1)^N$, and thus $|S|$ increases exponentially with the number of stations. For large networks the techniques presented here can be used as guidance for finding suboptimal policies. For example, stations can be grouped and/or the dynamics approximated with bulk arrivals and departures so that a new CMQN is created with a manageable state space. For instance, if the actual buffer size for each station is 50, an optimal policy could be calculated for a modified network with bulk size 10 and buffers capable of accepting up to 5 "bulked customers". In this case, the cardinality of the state space of the original network, $51^N$, would be reduced to $6^N$ for the modified network. Thus, it would be of interest to develop a systematic theory for grouping stations and choosing bulk arrival sizes.
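The growth figures are immediate to check (here with $N = 3$ for concreteness in the last line):

    # |S| = (M+1)^N grows exponentially in the number of stations
    for N in (3, 6, 10):
        print(N, (2 + 1) ** N)        # M = 2 as in Section 4: 27, 729, 59049
    # bulking: buffers of 50 (51 levels) vs. bulk size 10 (6 levels) per station
    print(51 ** 3, 6 ** 3)            # N = 3: 132651 vs. 216 states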

Fig. 1. Compact representation of numerical results for the routing example. For each state $(x_1, x_2, x_3)$ the table lists the optimal discounted cost and the optimal switch settings $R_1$, $R_2$, $R_3$ and $S_1, \dots, S_{11}$. (Table not reproduced.)


References

Bertsekas, D. (1976). Dynamic Programming and Stochastic Control. Academic Press, New York.
Borkar, V. S. (1983). Controlled Markov chains and stochastic networks. SIAM J. Control, 21, 652-666.
Cinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, New Jersey.
Ephremides, A., P. Varaiya and J. Walrand (1980). A simple dynamic routing problem. IEEE Trans. Aut. Control, AC-25, 690-698.
Hajek, B. (1984). Optimal control of two interacting service stations. IEEE Trans. Aut. Control, AC-29, 491-499.
Lippman, S. A. (1975). Applying a new device in the optimization of queueing systems. Ops. Res., 23, 687-710.
Rosberg, Z., P. Varaiya and J. Walrand (1982). Optimal control of service in tandem queues. IEEE Trans. Aut. Control, AC-27, 600-609.
Ross, S. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.
Serfozo, R. F. (1979). An equivalence between continuous and discrete time Markov decision processes. Ops. Res., 27, 616-620.
Stidham Jr., S. (1985). Optimal control of admission to a queueing system. IEEE Trans. Aut. Control, AC-30, 705-713.