Operations Research Letters 41 (2013) 351–356
Optimal admission control for tandem loss systems with two stations

Daniel F. Silva (a,∗), Bo Zhang (b), Hayriye Ayhan (a)

(a) H. Milton Stewart School of Systems and Industrial Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, United States
(b) IBM T. J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, United States
∗ Corresponding author. E-mail address: [email protected] (D.F. Silva).
Article history: Received 10 December 2012; Received in revised form 5 February 2013; Accepted 1 April 2013; Available online 10 April 2013.
http://dx.doi.org/10.1016/j.orl.2013.04.001
Abstract

We study a system of two queues in tandem with finite buffers, Poisson arrivals to the first station, and exponentially distributed service times at both stations. Losses are incurred either when a customer is rejected at the time of arrival to the first station or when the second station is full at the time of service completion at the first station. The objective is to determine the optimal admission control policy that minimizes the long-run average cost.

© 2013 Elsevier B.V. All rights reserved.
Keywords: Admission control; Loss systems; Tandem queues; Markov decision processes; Stochastic dynamic programming
1. Introduction

We consider a system with two stations in tandem and one server at each station. Arrivals to the system follow a Poisson process with rate λ, and service times at station i follow an exponential distribution with rate µi, i = 1, 2. This note focuses mainly on the case in which the first station has a buffer capacity of one, while the second station has an arbitrary finite buffer capacity; however, we also have results for systems with arbitrary finite buffers at both stations. Upon each arrival, a gatekeeper has to decide (based on full knowledge of the state of the system) whether to admit or reject the incoming arrival. If an arrival is not admitted, a cost c1 is incurred. If a customer completes service at the first station and at that time the second station is full, the customer is lost and a penalty cost of c2 is incurred. Note that if the first station is full at the time of an arrival, then the incoming customer has to be rejected and the cost c1 is incurred. Our objective is to determine an admission policy at the first station that minimizes the long-run average cost.

Communication systems such as the Internet can be modeled as queueing networks with loss; see, for example, Bertsekas and Gallager [1] or Spicer and Ziedins [12] and the references therein for a description of these loss systems and where they arise. To the best of our knowledge, the first paper to provide exact analytical results for a long-run average cost optimal policy for tandem queues with
loss was Zhang and Ayhan [13], which considered a system with a finite buffer at the first station and a unitary buffer at the second station. They find that the optimal policy is of threshold type in the cost c2 and provide a closed form expression for this threshold (which does not depend on the buffer size). Several related control problems in tandem queues can be found in the literature. The most relevant to this research are Ku and Jordan [5,7,6], Sheu and Ziedins [11], Ghoneim and Stidham [3], Hordijk and Koole [4], Chang and Chen [2] and Leskelä and Resing [8]. In the interest of space, we refer the reader to Zhang and Ayhan [13] for details on these references.

The remainder of this paper is organized as follows. In Section 2, we give a Markov decision process formulation of the problem and establish some properties of the optimal policy for the general problem. We then focus on the more specific case of a unitary buffer at the first station and completely characterize the optimal policy for this case. However, since the exact implementation of the optimal policy can be difficult for larger buffer sizes, in Section 3 we present numerical results showing that a heuristic policy has near optimal long-run average cost performance for larger systems.

2. Characterization of the optimal policy

In this section, we begin by modeling the system as a Markov decision process and proceed to prove the intuitive result that if c1 ≥ c2, then the optimal policy always accepts incoming customers unless the first buffer is full at the time of arrival. Afterwards, we focus on the specific problem of a unitary buffer at the
first station and show that there are only two policies that could be optimal, depending on whether c2 exceeds a threshold. We provide closed form expressions for the threshold when the buffer size at the second station is 10 or less. Finally, we investigate the effects of choosing the wrong policy.

2.1. Preliminaries

We use uniformization (see Lippman [9]), similarly to Zhang and Ayhan [13], to formulate the admission control problem as a discrete-time Markov decision process (MDP). Let Bi, i = 1, 2, be the buffer size at station i. Without loss of generality, we assume µ1 + µ2 + λ = 1. We then have a discrete-time Markov chain with state space S = {(q1, q2) ∈ Z^2 : 0 ≤ q1 ≤ B1, 0 ≤ q2 ≤ B2}, where qi is the number of customers at station i = 1, 2. Let 0 denote rejecting the next potential arrival and 1 denote accepting the next potential arrival. Then the sets of allowable actions are A(q1,q2) = {0} if q1 = B1, and A(q1,q2) = {0, 1} for any other (q1, q2) ∈ S. Let p(·|(q1, q2), d) denote the transition probability when action d is taken in state (q1, q2). Let 1A = 1 if A is true, and 0 otherwise. We have
p(s | (q1, q2), d) =
µ1, if s = ((q1 − 1)+, q2 + 1{q1>0, q2<B2}),
µ2, if s = (q1, (q2 − 1)+),
λ, if s = (q1 + d, q2),

and p(s | (q1, q2), d) = 0 otherwise; the expected one-period reward is r((q1, q2), d) = −λ c1 1{d=0} − µ1 c2 1{q1>0 and q2=B2}.

This model is unichain since, regardless of the policy and the initial state, state (0, 0) can be reached with probability 1 (a long enough interarrival time empties both stations). Therefore, (0, 0) is recurrent, and all the states accessible from (0, 0), together with (0, 0), form a recurrent class; the other states form a set of transient states. Hence, the existence of a deterministic stationary optimal policy is guaranteed by the finiteness of the state space and the action space (see Theorem 9.1.8 in Puterman [10]). Therefore, throughout the remainder of this paper, a policy π is identified with a binary-valued function π : S → {0, 1}. Let Sd ≡ {(q1, q2) ∈ S : 0 ≤ q1 ≤ B1 − 1} be the set of states at which a choice of acceptance or rejection needs to be made. Each possible combination of actions taken at the states in Sd comprises a stationary deterministic policy π. First, we analyze the corresponding finite-horizon model under the expected total reward criterion. Let vn,π(s) be the expected total reward over n periods under policy π if the initial state is s, and let vn(s) be the optimal n-period reward with initial state s, that is, vn(s) = supπ vn,π(s). Let (a)+ = max{a, 0}. Then the optimality equation, for all (q1, q2) ∈ S, is as follows:
vn(q1, q2) = µ1[−c2 1{q1>0 and q2=B2} + vn−1((q1 − 1)+, q2 + 1{q1>0 and q2<B2})] + µ2 vn−1(q1, (q2 − 1)+) + λ max{−c1 + vn−1(q1, q2), vn−1(q1 + 1, q2)},

where the second term inside the maximum is omitted when q1 = B1 (the arrival must then be rejected). The optimal action at (q1, q2) ∈ Sd with n periods remaining is

dn(q1, q2) = 0, if vn−1(q1 + 1, q2) < −c1 + vn−1(q1, q2); 1, if vn−1(q1 + 1, q2) ≥ −c1 + vn−1(q1, q2).  (1)
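The finite-horizon recursion and decision rule (1) can be run directly. Below is a minimal value-iteration sketch; the function name and the numerical rates are ours and purely illustrative. Consistent with Theorem 1 below in the text, whenever c1 ≥ c2 it produces the greedy policy.

```python
# Finite-horizon value iteration for the uniformized MDP above.
# A sketch; the rates below are illustrative (lam + mu1 + mu2 = 1),
# not parameter values from the paper.

def value_iteration(n, lam, mu1, mu2, c1, c2, B1, B2):
    """Return (v, d): the n-period optimal rewards v[(q1, q2)] and the
    decisions d[(q1, q2)] with n periods remaining (1 = accept)."""
    states = [(q1, q2) for q1 in range(B1 + 1) for q2 in range(B2 + 1)]
    v = {s: 0.0 for s in states}
    d = {s: 0 for s in states}
    for _ in range(n):
        new_v = {}
        for (q1, q2) in states:
            # station-1 completion: the customer moves to station 2,
            # or is lost (penalty c2) when station 2 is full
            loss = c2 if (q1 > 0 and q2 == B2) else 0.0
            s1 = (max(q1 - 1, 0), q2 + (1 if q1 > 0 and q2 < B2 else 0))
            s2 = (q1, max(q2 - 1, 0))          # station-2 completion
            reject = -c1 + v[(q1, q2)]
            accept = v[(q1 + 1, q2)] if q1 < B1 else float("-inf")
            d[(q1, q2)] = 1 if accept >= reject else 0
            new_v[(q1, q2)] = (mu1 * (-loss + v[s1]) + mu2 * v[s2]
                               + lam * max(reject, accept))
        v = new_v
    return v, d

# With c1 >= c2 the computed policy is greedy whenever station 1 has room.
_, d = value_iteration(500, 0.3, 0.4, 0.3, c1=5.0, c2=1.0, B1=1, B2=3)
```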
The following proposition is intuitive and will be useful in our analysis. Proposition 1. vn (q1 + 1, q2 ) ≥ vn (q1 , q2 ) − max {c1 , c2 } , ∀q1 < B1 , q2 ≤ B2 , ∀n ∈ Z+ . Proof. Consider two systems with the same number of periods remaining in the decision horizon: system I under any policy π
with the initial state (q1, q2), and system II with the initial state (q1 + 1, q2) for any q1 < B1. Let system II operate under the same policy π as if the initial state were (q1, q2). Mark the last customer at station 1 in system II and always hold the marked customer as late as possible; specifically, serve the marked customer only if there is no one else at his station, and if there is an arrival to his station during his service, his service is preempted (in this case, his remaining service time when he resumes service is equal in distribution to his original service time, by the exponential assumption). Also, if a loss occurs at the marked customer's station before he leaves the system, he is immediately unmarked and treated the same as all other customers afterwards. With the two systems operating in the aforementioned way, there are only two possibilities: (1) the marked customer leaves system II before any difference arises in the occurrence of losses in the two systems; after that, both systems evolve identically and thus the same total cost is incurred in both systems by the end of the decision horizon. (2) One more loss occurs in system II than in system I at one of the two stations (whichever station the marked customer is located at that moment), incurring an extra cost c1 or c2, after which both systems evolve identically. In other words, by initially holding one more customer at station 1, system II incurs either the same total cost as system I, or c1 or c2 more. Because this is true for any policy π, the proposition follows.

2.2. Optimal policy

We use a convention similar to that of Zhang and Ayhan [13] for naming a particular policy. Specifically, consider the policy under which the gatekeeper admits an arrival whenever possible; we call this the greedy policy πG. Now consider the case where c1 ≥ c2. We can use Proposition 1 to obtain the following.

Theorem 1.
If c1 ≥ c2 , the greedy policy πG is optimal for the n-period problem under the expected total cost criterion, ∀n ∈ Z+ . Proof. It follows from c1 ≥ c2 and Proposition 1 that
vn(q1 + 1, q2) ≥ vn(q1, q2) − c1. This, together with Eq. (1), implies the optimality of the greedy policy.

Corollary 1. If c1 ≥ c2, the greedy policy πG is long-run average reward optimal.

For the rest of the paper we focus on the case where B1 = 1 and B2 = B < ∞. Consider the policy under which the gatekeeper admits an arrival whenever the current state (q1, q2) satisfies q1 = 0 and q2 < B; we call this the prudent policy πP throughout. Since the capacity of the first station is now 1, it is intuitive that a deterministic policy that rejects at (0, q2) for some q2 < B cannot be optimal: any arrival admitted into the system at such a state can never be lost, so by rejecting at these states the system would incur a cost of c1 when there is zero probability of incurring c2. The next proposition formalizes this idea.

Proposition 2. It is always optimal to accept at (0, q2) for any q2 < B; conversely, a deterministic policy that rejects at (0, q2) for some q2 < B cannot be optimal.

Proof. We prove the result by induction. It holds for q2 = 0 since rejecting at (0, 0) yields the long-run average cost λc1 per time unit, while accepting at (0, 0) and rejecting at all other states yields λc1δ, where δ, the long-run fraction of time that the resulting system is not in state (0, 0), must be strictly less than 1. Now suppose the desired result holds for q2 = 0, 1, . . . , j − 1, where j < B. We show it is also true for q2 = j.
Consider two systems: system I under any policy π , under which the rejection action is taken at state (0, j), and system II also under π except that the acceptance action is taken at state (0, j). Both systems start with the same initial state. Because, for system I, all states (q1 , q2 ) with q1 + q2 ≥ j + 1 are transient, we can assume that the policy π prescribes the rejection action for all these states. Also, by the induction hypothesis, it suffices for us to assume that the policy π prescribes the acceptance action for (0, q2 ), where q2 = 0, 1, . . . , j − 1. Note that both assumptions are effective for system II as well, because it also operates under π (except at state (0, j)). Consider any sample path. If state (0, j) is never seen by an arrival on this sample path, both systems evolve identically. We now show that, if (0, j) is ever seen by an arrival at some point in time, system II will have no more loss cost than system I from that point, say T1 , to the next time point when both systems reach the same state, say T2 . First, we note that no loss ever occurs at station 2 in either system because j < B and there are never more than j + 1 customers in either system. Second, at any point in time between T1 and T2 the states of the two systems at a customer arrival epoch must have one of the following forms.
• System I at (0, j) and system II at (1, j). The arrival is rejected in both systems.
• System I at (0, i) and system II at (0, i + 1), where i < j. The arrival is accepted in both systems, by the induction hypothesis and, for i = j − 1, by the assumption that the acceptance action is taken at state (0, j) in system II.
• System I at (0, j) and system II at (0, j + 1). The arrival is rejected in both systems.
• System I at (0, i) and system II at (1, i), for some i < j. The arrival is accepted in system I by the induction hypothesis but rejected in system II. Between T1 and T2 this can occur only once, after which both systems immediately reach the same state.

Therefore, in the interval (T1, T2], at most c1 more cost is incurred in system II than in system I. Also, because at time T1 system I incurs a rejection/loss cost c1 while system II incurs no cost, we conclude that system II has no more loss cost than system I from T1 to T2. Because π is arbitrary, it is optimal to accept at (0, j).

So far, we have shown that, for a given value of B and a known set of parameters, either the prudent policy or the greedy policy must be optimal, and that if c1 ≥ c2, then πG is always optimal. This leaves the question of determining which of the two policies is optimal when c1 < c2. To answer it, we compute the long-run average cost under both policies and compare the cost values. Define CG(B) and CP(B) as the long-run average cost of operating a system with buffer B under policies πG and πP, respectively. Let p(q1,q2), (q1, q2) ∈ S, be the stationary distribution of the continuous-time Markov chain (CTMC) model of this system operating under the prudent policy, where S = {(q1, q2) ∈ Z^2 : 0 ≤ q1 ≤ 1, 0 ≤ q2 ≤ B}, and let p̂(q1,q2), (q1, q2) ∈ S, be the stationary distribution of the resulting CTMC when operating under the greedy policy πG. Define
c∗(B) = c1 (λ/µ1) [ ( Σ_{i=0}^{B−1} (p(1,i) − p̂(1,i)) + p(0,B) ) / p̂(1,B) − 1 ].
Proposition 3. If for a fixed buffer size B, c2 > c∗ (B), then CP (B) < CG (B) and πP is long-run average cost optimal for this system; otherwise πG is optimal.
Proof. Note that p(1,B) = 0. Then, for fixed B,

CP(B) = c1 λ ( Σ_{i=0}^{B−1} p(1,i) + p(0,B) ).

Similarly, for the greedy policy,

CG(B) = c1 λ Σ_{i=0}^{B} p̂(1,i) + c2 µ1 p̂(1,B).
The result then follows immediately from Proposition 2 and the comparison of CG(B) with CP(B).

Next we provide closed form expressions for c∗(B) when B ≤ 10 (the computation of c∗(B) for B > 10 is difficult and tedious). Following the procedure described above, we obtain expressions of the form

c∗(B) = c1 ( 1 + α(B)/β(B) ).

We have a closed form expression for α(B), when B ≤ 10, which is

α(B) = µ2^(B+1) Σ_{i=1}^{⌈B/2⌉} [ λ^(B−i) + 1{(B−i)>(i−1)} λ^(i−1) (µ1 + µ2)^(B−2i+1) ] × Σ_{j=0}^{i−1} µ1^(i−j−1) µ2^j binom(i−1, j) (1/j!) Π_{k=1}^{j} (B − i + k);
for β(B) we obtain a separate expression for each B:

β(1) = λµ1 + λµ2 + µ1µ2

β(2) = µ1µ2^2(µ1 + µ2) + λµ2(µ1 + µ2)^2 + λ^2(µ1^2 + µ1µ2 + µ2^2)

β(3) = µ1µ2^3(µ1 + µ2)^2 + λµ2^2(µ1^3 + 3µ1^2µ2 + 4µ1µ2^2 + µ2^3) + λ^2µ2(µ1^3 + 2µ1^2µ2 + 3µ1µ2^2 + 2µ2^3) + λ^3(µ1^3 + µ1^2µ2 + µ1µ2^2 + µ2^3)

β(4) = µ1µ2^4(µ1 + µ2)^3 + λµ2^3(µ1^4 + 4µ1^3µ2 + 8µ1^2µ2^2 + 6µ1µ2^3 + µ2^4) + λ^2µ2^2(µ1^4 + 3µ1^3µ2 + 6µ1^2µ2^2 + 8µ1µ2^3 + 3µ2^4) + λ^3µ2(µ1^4 + 2µ1^3µ2 + 3µ1^2µ2^2 + 4µ1µ2^3 + 3µ2^4) + λ^4(µ1^4 + µ1^3µ2 + µ1^2µ2^2 + µ1µ2^3 + µ2^4)

β(5) = µ1µ2^5(µ1 + µ2)^4 + λµ2^4(µ1 + µ2)^2(µ1^3 + 3µ1^2µ2 + 6µ1µ2^2 + µ2^3) + λ^2µ2^3(µ1^5 + 4µ1^4µ2 + 10µ1^3µ2^2 + 18µ1^2µ2^3 + 16µ1µ2^4 + 4µ2^5) + λ^3µ2^2(µ1^5 + 3µ1^4µ2 + 6µ1^3µ2^2 + 10µ1^2µ2^3 + 13µ1µ2^4 + 6µ2^5) + λ^4µ2(µ1^5 + 2µ1^4µ2 + 3µ1^3µ2^2 + 4µ1^2µ2^3 + 5µ1µ2^4 + 4µ2^5) + λ^5(µ1^5 + µ1^4µ2 + µ1^3µ2^2 + µ1^2µ2^3 + µ1µ2^4 + µ2^5)

β(6) = µ1µ2^6(µ1 + µ2)^5 + λµ2^5(µ1 + µ2)^3(µ1^3 + 3µ1^2µ2 + 7µ1µ2^2 + µ2^3) + λ^2µ2^4(µ1^6 + 5µ1^5µ2 + 15µ1^4µ2^2 + 33µ1^3µ2^3 + 44µ1^2µ2^4 + 27µ1µ2^5 + 5µ2^6) + λ^3µ2^3(µ1^6 + 4µ1^5µ2 + 10µ1^4µ2^2 + 20µ1^3µ2^3 + 33µ1^2µ2^4 + 32µ1µ2^5 + 10µ2^6) + λ^4µ2^2(µ1^6 + 3µ1^5µ2 + 6µ1^4µ2^2 + 10µ1^3µ2^3 + 15µ1^2µ2^4 + 19µ1µ2^5 + 10µ2^6) + λ^5µ2(µ1^6 + 2µ1^5µ2 + 3µ1^4µ2^2 + 4µ1^3µ2^3 + 5µ1^2µ2^4 + 6µ1µ2^5 + 5µ2^6) + λ^6(µ1^6 + µ1^5µ2 + µ1^4µ2^2 + µ1^3µ2^3 + µ1^2µ2^4 + µ1µ2^5 + µ2^6)

β(7) = µ1µ2^7(µ1 + µ2)^6 + λµ2^6(µ1 + µ2)^4(µ1^3 + 3µ1^2µ2 + 8µ1µ2^2 + µ2^3) + λ^2µ2^5(µ1 + µ2)^2(µ1^5 + 4µ1^4µ2 + 12µ1^3µ2^2 + 26µ1^2µ2^3 + 29µ1µ2^4 + 6µ2^5) + λ^3µ2^4(µ1^7 + 5µ1^6µ2 + 15µ1^5µ2^2 + 35µ1^4µ2^3 + 68µ1^3µ2^4 + 93µ1^2µ2^5 + 65µ1µ2^6 + 15µ2^7) + λ^4µ2^3(µ1^7 + 4µ1^6µ2 + 10µ1^5µ2^2 + 20µ1^4µ2^3 + 35µ1^3µ2^4 + 54µ1^2µ2^5 + 55µ1µ2^6 + 20µ2^7) + λ^5µ2^2(µ1^7 + 3µ1^6µ2 + 6µ1^5µ2^2 + 10µ1^4µ2^3 + 15µ1^3µ2^4 + 21µ1^2µ2^5 + 26µ1µ2^6 + 15µ2^7) + λ^6µ2(µ1^7 + 2µ1^6µ2 + 3µ1^5µ2^2 + 4µ1^4µ2^3 + 5µ1^3µ2^4 + 6µ1^2µ2^5 + 7µ1µ2^6 + 6µ2^7) + λ^7(µ1^7 + µ1^6µ2 + µ1^5µ2^2 + µ1^4µ2^3 + µ1^3µ2^4 + µ1^2µ2^5 + µ1µ2^6 + µ2^7)

β(8) = µ1µ2^8(µ1 + µ2)^7 + λµ2^7(µ1 + µ2)^5(µ1^3 + 3µ1^2µ2 + 9µ1µ2^2 + µ2^3) + λ^2µ2^6(µ1 + µ2)^3(µ1^5 + 4µ1^4µ2 + 13µ1^3µ2^2 + 30µ1^2µ2^3 + 37µ1µ2^4 + 7µ2^5) + λ^3µ2^5(µ1^8 + 6µ1^7µ2 + 21µ1^6µ2^2 + 56µ1^5µ2^3 + 124µ1^4µ2^4 + 210µ1^3µ2^5 + 221µ1^2µ2^6 + 116µ1µ2^7 + 21µ2^8) + λ^4µ2^4(µ1^8 + 5µ1^7µ2 + 15µ1^6µ2^2 + 35µ1^5µ2^3 + 70µ1^4µ2^4 + 124µ1^3µ2^5 + 170µ1^2µ2^6 + 130µ1µ2^7 + 35µ2^8) + λ^5µ2^3(µ1^8 + 4µ1^7µ2 + 10µ1^6µ2^2 + 20µ1^5µ2^3 + 35µ1^4µ2^4 + 56µ1^3µ2^5 + 82µ1^2µ2^6 + 86µ1µ2^7 + 35µ2^8) + λ^6µ2^2(µ1^8 + 3µ1^7µ2 + 6µ1^6µ2^2 + 10µ1^5µ2^3 + 15µ1^4µ2^4 + 21µ1^3µ2^5 + 28µ1^2µ2^6 + 34µ1µ2^7 + 21µ2^8) + λ^7µ2(µ1^8 + 2µ1^7µ2 + 3µ1^6µ2^2 + 4µ1^5µ2^3 + 5µ1^4µ2^4 + 6µ1^3µ2^5 + 7µ1^2µ2^6 + 8µ1µ2^7 + 7µ2^8) + λ^8(µ1^8 + µ1^7µ2 + µ1^6µ2^2 + µ1^5µ2^3 + µ1^4µ2^4 + µ1^3µ2^5 + µ1^2µ2^6 + µ1µ2^7 + µ2^8)

β(9) = µ1µ2^9(µ1 + µ2)^8 + λµ2^8(µ1 + µ2)^6(µ1^3 + 3µ1^2µ2 + 10µ1µ2^2 + µ2^3) + λ^2µ2^7(µ1 + µ2)^4(µ1^5 + 4µ1^4µ2 + 14µ1^3µ2^2 + 34µ1^2µ2^3 + 46µ1µ2^4 + 8µ2^5) + λ^3µ2^6(µ1 + µ2)^2(µ1^7 + 5µ1^6µ2 + 17µ1^5µ2^2 + 45µ1^4µ2^3 + 101µ1^3µ2^4 + 164µ1^2µ2^5 + 133µ1µ2^6 + 28µ2^7) + λ^4µ2^5(µ1^9 + 6µ1^8µ2 + 21µ1^7µ2^2 + 56µ1^6µ2^3 + 126µ1^5µ2^4 + 250µ1^4µ2^5 + 411µ1^3µ2^6 + 456µ1^2µ2^7 + 266µ1µ2^8 + 56µ2^9) + λ^5µ2^4(µ1^9 + 5µ1^8µ2 + 15µ1^7µ2^2 + 35µ1^6µ2^3 + 70µ1^5µ2^4 + 126µ1^4µ2^5 + 208µ1^3µ2^6 + 283µ1^2µ2^7 + 231µ1µ2^8 + 70µ2^9) + λ^6µ2^3(µ1^9 + 4µ1^8µ2 + 10µ1^7µ2^2 + 20µ1^6µ2^3 + 35µ1^5µ2^4 + 56µ1^4µ2^5 + 84µ1^3µ2^6 + 118µ1^2µ2^7 + 126µ1µ2^8 + 56µ2^9) + λ^7µ2^2(µ1^9 + 3µ1^8µ2 + 6µ1^7µ2^2 + 10µ1^6µ2^3 + 15µ1^5µ2^4 + 21µ1^4µ2^5 + 28µ1^3µ2^6 + 36µ1^2µ2^7 + 43µ1µ2^8 + 28µ2^9) + λ^8µ2(µ1^9 + 2µ1^8µ2 + 3µ1^7µ2^2 + 4µ1^6µ2^3 + 5µ1^5µ2^4 + 6µ1^4µ2^5 + 7µ1^3µ2^6 + 8µ1^2µ2^7 + 9µ1µ2^8 + 8µ2^9) + λ^9(µ1^9 + µ1^8µ2 + µ1^7µ2^2 + µ1^6µ2^3 + µ1^5µ2^4 + µ1^4µ2^5 + µ1^3µ2^6 + µ1^2µ2^7 + µ1µ2^8 + µ2^9)

β(10) = µ1µ2^10(µ1 + µ2)^9 + λµ2^9(µ1 + µ2)^7(µ1^3 + 3µ1^2µ2 + 11µ1µ2^2 + µ2^3) + λ^2µ2^8(µ1 + µ2)^5(µ1^5 + 4µ1^4µ2 + 15µ1^3µ2^2 + 38µ1^2µ2^3 + 56µ1µ2^4 + 9µ2^5) + λ^3µ2^7(µ1 + µ2)^3(µ1^7 + 5µ1^6µ2 + 18µ1^5µ2^2 + 50µ1^4µ2^3 + 119µ1^3µ2^4 + 207µ1^2µ2^5 + 180µ1µ2^6 + 36µ2^7) + λ^4µ2^6(µ1^10 + 7µ1^9µ2 + 28µ1^8µ2^2 + 84µ1^7µ2^3 + 210µ1^6µ2^4 + 460µ1^5µ2^5 + 862µ1^4µ2^6 + 1208µ1^3µ2^7 + 1064µ1^2µ2^8 + 490µ1µ2^9 + 84µ2^10) + λ^5µ2^5(µ1^10 + 6µ1^9µ2 + 21µ1^8µ2^2 + 56µ1^7µ2^3 + 126µ1^6µ2^4 + 252µ1^5µ2^5 + 460µ1^4µ2^6 + 732µ1^3µ2^7 + 840µ1^2µ2^8 + 532µ1µ2^9 + 126µ2^10) + λ^6µ2^4(µ1^10 + 5µ1^9µ2 + 15µ1^8µ2^2 + 35µ1^7µ2^3 + 70µ1^6µ2^4 + 126µ1^5µ2^5 + 210µ1^4µ2^6 + 328µ1^3µ2^7 + 441µ1^2µ2^8 + 378µ1µ2^9 + 126µ2^10) + λ^7µ2^3(µ1^10 + 4µ1^9µ2 + 10µ1^8µ2^2 + 20µ1^7µ2^3 + 35µ1^6µ2^4 + 56µ1^5µ2^5 + 84µ1^4µ2^6 + 120µ1^3µ2^7 + 163µ1^2µ2^8 + 176µ1µ2^9 + 84µ2^10) + λ^8µ2^2(µ1^10 + 3µ1^9µ2 + 6µ1^8µ2^2 + 10µ1^7µ2^3 + 15µ1^6µ2^4 + 21µ1^5µ2^5 + 28µ1^4µ2^6 + 36µ1^3µ2^7 + 45µ1^2µ2^8 + 53µ1µ2^9 + 36µ2^10) + λ^9µ2(µ1^10 + 2µ1^9µ2 + 3µ1^8µ2^2 + 4µ1^7µ2^3 + 5µ1^6µ2^4 + 6µ1^5µ2^5 + 7µ1^4µ2^6 + 8µ1^3µ2^7 + 9µ1^2µ2^8 + 10µ1µ2^9 + 9µ2^10) + λ^10(µ1^10 + µ1^9µ2 + µ1^8µ2^2 + µ1^7µ2^3 + µ1^6µ2^4 + µ1^5µ2^5 + µ1^4µ2^6 + µ1^3µ2^7 + µ1^2µ2^8 + µ1µ2^9 + µ2^10)

However, obtaining an expression for c∗(B) for B > 10 appears to be difficult, and the results of Section 3 illustrate that one can develop a near optimal heuristic without computing c∗(B) for larger B.

2.3. Trends in probabilities

In order to understand the impact of choosing the wrong policy on the optimal cost, we investigate the properties of the probabilities of being in states where the system is so full that the gatekeeper would reject arrivals under the prudent policy but accept them under the greedy policy. The following propositions follow immediately by computing the stationary distributions of the corresponding continuous-time Markov chains.

Proposition 4. Under πP the stationary probability p(0,B) of being in state (0, B) is strictly decreasing in B for 0 ≤ B ≤ 10.

Now consider the greedy policy.

Proposition 5. Under πG the stationary probability p̂(0,B) of being in state (0, B) is strictly decreasing in B for 0 ≤ B ≤ 10.

Finally, consider state (1, B) under the greedy policy, where the cost c2 may be incurred.

Proposition 6. Under πG the stationary probability p̂(1,B) of being in state (1, B) is strictly decreasing in B for 0 ≤ B ≤ 10.
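The threshold of Proposition 3 and the monotonicity in Propositions 4–6 are easy to check numerically by solving the stationary equations of the two CTMCs directly. The following sketch does so; the helper functions and the numerical rates are ours, for illustration only.

```python
# Solve the stationary distributions of the prudent and greedy CTMCs
# and evaluate the threshold c*(B) of Proposition 3. A sketch with
# illustrative rates; function names are ours, not the paper's.
import numpy as np

def stationary(lam, mu1, mu2, B, greedy):
    """Stationary distribution over states (q1, q2), q1 in {0, 1}, 0 <= q2 <= B."""
    states = [(q1, q2) for q1 in (0, 1) for q2 in range(B + 1)]
    idx = {s: k for k, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for (q1, q2) in states:
        k = idx[(q1, q2)]
        if q1 == 0 and (greedy or q2 < B):        # admit an arrival
            Q[k, idx[(1, q2)]] += lam
        if q1 == 1:                               # station-1 completion;
            Q[k, idx[(0, min(q2 + 1, B))]] += mu1 # customer lost if q2 == B
        if q2 > 0:                                # station-2 completion
            Q[k, idx[(q1, q2 - 1)]] += mu2
    np.fill_diagonal(Q, -Q.sum(axis=1))
    # solve pi Q = 0 together with sum(pi) = 1
    A = np.vstack([Q.T, np.ones(len(states))])
    b = np.zeros(len(states) + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return {s: pi[idx[s]] for s in states}

def c_star(c1, lam, mu1, mu2, B):
    """Threshold of Proposition 3: the prudent policy is optimal iff c2 > c_star."""
    p = stationary(lam, mu1, mu2, B, greedy=False)
    ph = stationary(lam, mu1, mu2, B, greedy=True)
    num = sum(p[(1, i)] - ph[(1, i)] for i in range(B)) + p[(0, B)]
    return c1 * (lam / mu1) * (num / ph[(1, B)] - 1.0)

# Example (case A: lam < mu1 < mu2); for B = 1 the value agrees with the
# closed form c1 (1 + mu2^2 / (lam mu1 + lam mu2 + mu1 mu2)) = 20/11.
print(round(c_star(1.0, 1.0, 2.0, 3.0, 1), 4))  # -> 1.8182
```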
Fig. 1. Average p(0, B) for increasing values of B for cases A–E.
Fig. 2. Average pˆ (0, B) for increasing values of B for cases A–E.
The above propositions lead us to propose the following conjecture. Conjecture 1. Under πP as B → ∞, p(0,B) converges monotonically to some value p∗(0,B) ≥ 0. Under πG as B → ∞ the probabilities pˆ (0,B) and pˆ (1,B) converge monotonically to values pˆ ∗(0,B) ≥ 0 and pˆ ∗(1,B) ≥ 0, respectively. As there is no closed form expression for the stationary probabilities for general values of B, we provide numerical evidence to support our conjecture. We conduct numerical experiments considering a system where B attains all integer values from 1 to 50, and also multiples of 50 up to 500. We generate 1000 sets of parameters independently and solve the problem numerically for each set with each value of the buffer size. In order to generate λ, µ1 , and µ2 , we sample from a continuous uniform distribution between 1 and 100, while c1 , c2 are sampled from a continuous uniform distribution between 1 and 1000. If c1 ≥ c2 , we discard those cost values and generate a new set until c1 < c2 , since we know (from Theorem 1) that if c1 ≥ c2 then πG is optimal. In order to analyze the results we divided the datasets into six cases as explained in Table 1.
Table 1
Cases studied.

Case  Description
A     λ < µ1 < µ2
B     λ < µ2 < µ1
C     µ1 < λ < µ2
D     µ1 < µ2 < λ
E     µ2 < λ < µ1
F     µ2 < µ1 < λ
Fig. 1 illustrates how p(0, B) changes with respect to B for each of the cases above. Fig. 2 shows pˆ (0,B) for different values of B. The results for pˆ (1,B) follow a similar pattern to those shown in Fig. 2 and were omitted in the interest of space. All probability values in these figures were calculated as averages for a fixed buffer size over the experiments described above. Our observations show that the probability of being in the states of interest decreases dramatically as a function of the buffer size until it practically vanishes by B = 10 for cases A and C and by B = 50 for cases B and D but never vanishes for cases E and F (i.e., when µ2 is smaller than µ1 and λ). For cases E and F, as B increases, all probabilities of interest monotonically decrease to a strictly positive value.
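The case labels of Table 1 depend only on the ascending order of the three rates, so when replicating this kind of experiment a small bucketing helper of the following sort (ours, for illustration) suffices.

```python
# Map a parameter set (lam, mu1, mu2) to its case A-F from Table 1,
# i.e., the ascending order of the three rates (ties can be ignored,
# since the sampled rates are continuous). An illustrative helper.
def case_of(lam, mu1, mu2):
    ascending = tuple(name for _, name in
                      sorted([(lam, "lam"), (mu1, "mu1"), (mu2, "mu2")]))
    return {("lam", "mu1", "mu2"): "A",
            ("lam", "mu2", "mu1"): "B",
            ("mu1", "lam", "mu2"): "C",
            ("mu1", "mu2", "lam"): "D",
            ("mu2", "lam", "mu1"): "E",
            ("mu2", "mu1", "lam"): "F"}[ascending]

print(case_of(1.0, 2.0, 3.0))  # -> A   (lam < mu1 < mu2)
```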
These results suggest that choosing a non-optimal policy (among the prudent and greedy policies) may not have a significant impact on the cost when the parameters fall in cases A–D, but might have an impact when they fall in cases E or F. In particular, we conjecture that c∗(10) is a good approximation of c∗(B) for B > 10 in cases A–D; the same may not hold in cases E and F. Thus, it is important to develop heuristic policies that yield good cost performance for larger values of B under these scenarios, as the risk remains of incurring unnecessary costs by choosing the wrong policy.

3. Heuristic and numerical experiments

As we have shown, the optimal policy is characterized by a threshold c∗(B). We provided closed form expressions for c∗(B) for B ≤ 10; however, it is difficult to obtain a closed form expression for B > 10. In the previous section we saw that, for large values of B, the probabilities of being in a full state vanish quickly as B increases for all systems where µ2 > λ and/or µ2 > µ1, but they are not negligible otherwise (namely, in cases E and F). Our numerical results below show that, for a wide range of parameters, using c∗(10) whenever B > 10 still yields near optimal results. For the datasets described in Section 2.3, we calculated both the optimal long-run average cost and the long-run average cost under a heuristic policy, which uses the optimal threshold if B ≤ 10 and uses c∗(10) otherwise. The average percentage of additional cost incurred over the optimal cost by the heuristic policy was less than 0.01% for all six cases A–F. The worst case performance of the heuristic policy was also close to that of the optimal policy, with a worst case additional cost incurred over the optimal cost of 0.08% for case
E and 0.02% for case F, and less than 0.001% for cases A–D. Thus, the heuristic policy has near optimal performance in all cases.

Acknowledgment

This work was supported in part by the National Science Foundation under grants CMMI-0856600 and CMMI-0969747.

References

[1] D. Bertsekas, R. Gallager, Data Networks, second ed., Prentice-Hall, Inc., NJ, 1992.
[2] K.-H. Chang, W.-F. Chen, Admission control policies for two-stage tandem queues with no waiting spaces, Comput. Oper. Res. 30 (4) (2003) 589–601.
[3] H.A. Ghoneim, S. Stidham, Control of arrivals to two queues in series, European J. Oper. Res. 21 (3) (1985) 399–409.
[4] A. Hordijk, G. Koole, On the shortest queue policy for the tandem parallel queue, Probab. Engrg. Inform. Sci. 6 (1992) 63–79.
[5] C.-Y. Ku, S. Jordan, Access control to two multiserver loss queues in series, IEEE Trans. Automat. Control 42 (7) (1997) 1017–1023.
[6] C.-Y. Ku, S. Jordan, Access control of parallel multiserver loss queues, Perform. Eval. 50 (4) (2002) 219–231.
[7] C.-Y. Ku, S. Jordan, Near optimal admission control for multiserver loss queues in series, European J. Oper. Res. 144 (1) (2003) 166–178.
[8] L. Leskelä, J. Resing, A tandem queueing network with feedback admission control, in: Proceedings of the 1st EuroFGI International Conference on Network Control and Optimization, NET-COOP'07, 2007.
[9] S.A. Lippman, Applying a new device in the optimization of exponential queuing systems, Oper. Res. 23 (4) (1975) 687–710.
[10] M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience, 1994.
[11] R.-S. Sheu, I. Ziedins, Asymptotically optimal control of parallel tandem queues with loss, Queueing Syst. 65 (3) (2010) 211–227.
[12] S. Spicer, I. Ziedins, User-optimal state-dependent routeing in parallel tandem queues with loss, J. Appl. Probab. 43 (2006) 274–281.
[13] B. Zhang, H. Ayhan, Optimal admission control for tandem queues with loss, IEEE Trans. Automat. Control 58 (1) (2013) 163–167.