Computers & Operations Research 39 (2012) 2323–2336
Lagrangian relaxation and constraint generation for allocation and advanced scheduling

Yasin Gocgun a, Archis Ghate b,*

a Sauder School of Business, University of British Columbia, Vancouver, Canada
b Industrial and Systems Engineering, University of Washington, Seattle, USA
Available online 25 November 2011

Abstract
Diverse applications in manufacturing, logistics, health care, telecommunications, and computing require that renewable resources be dynamically scheduled to handle distinct classes of job service requests arriving randomly over slotted time. These dynamic stochastic resource scheduling problems are analytically and computationally intractable even when the number of job classes is relatively small. In this paper, we formally introduce two types of problems called allocation and advanced scheduling, and formulate their Markov decision process (MDP) models. We establish that these MDPs are "weakly coupled" and exploit this structural property to develop an approximate dynamic programming method that uses Lagrangian relaxation and constraint generation to efficiently make good scheduling decisions. In fact, our method is presented for a general class of large-scale weakly coupled MDPs that we precisely define. Extensive computational experiments on hundreds of randomly generated test problems reveal that Lagrangian decisions outperform myopic decisions by a statistically significant margin. The relative benefit of Lagrangian decisions is much higher for advanced scheduling than for allocation scheduling.
Keywords: Approximate dynamic programming; Resource allocation; Scheduling
1. Introduction

In many operations management applications, multiple renewable resources need to be dynamically allocated to different types of job service requests arriving randomly over time. Time is often slotted, arriving requests are reviewed periodically, and jobs need to be serviced so as to optimize some system-performance metric. Depending on the application, job classes may be characterized by their arrival rates, resource requirements, revenue generated on completion, waiting and penalty costs, and, more loosely, planner-assigned priorities. In some applications, arriving requests may either be served immediately or rejected; in other situations, arriving requests may be scheduled to future service slots in a booking horizon. The former is called allocation scheduling and the latter is termed advanced scheduling.

Both allocation and advanced scheduling are common in health care operations management. For example, in a diagnostic setting such as a CT or an MRI section, jobs correspond to patient requests for scans of different types (head, abdomen), of varying urgency (stat, routine), and from distinct patient-types (inpatient, outpatient); image scanners are the resources and time is typically slotted into
* Corresponding author. E-mail addresses: [email protected] (Y. Gocgun), [email protected] (A. Ghate). doi:10.1016/j.cor.2011.11.017
15- or 30-min slots [11,15,16,21]. In hospital capacity management, jobs relate to daily elective patient requests for hospital admissions, and resources correspond to equipment, hospital beds, and personnel such as residents, doctors, and nurses [9,19,26]. Patrick et al. [27] recently studied problems where patients of different types are scheduled to future days in a diagnostic facility. Similarly, in their extensive review of surgery scheduling practices, Erdogan and Denton [7] state that patient surgeries may be scheduled several days in advance. An advanced scheduling model of elective surgery scheduling is presented in [10].

Similar situations arise in applications beyond health care. In maintenance and repair facilities, jobs correspond to daily requests for inspecting or repairing different types of machinery, whereas resources are personnel and testing or repair equipment [28]. In a manufacturing system, jobs correspond to orders arriving daily for different products and resources include manufacturing equipment [23,32]. In military applications such as reconnaissance and combat planning, jobs relate to requests for specific mission tasks with airplanes, ships, and crew as the resources [1]. In communication networks, jobs correspond to requests for data, voice, or video transmission and the resource is bandwidth [4,30,36]. In high performance computing facilities, such as a supercomputer or a multi-processor machine with an operating system that allows dynamic resource reconfiguration, jobs correspond to computing tasks and resources to CPUs, memory, bandwidth, and storage space [35].
In this paper, we precisely define certain classes of allocation and advanced scheduling problems and build their Markov decision process (MDP) [5,29,31] models. These MDPs are intractable to exact solution methods. We therefore employ an approximation framework rooted in a key high-level structural observation that we make about these MDPs—the resource constraints are the only link that interconnects decisions made regarding different job classes. So if somehow these constraints were appropriately relaxed as a part of a computational approach, the resulting problem decomposition should help to overcome the curse of dimensionality while searching for approximate solutions. This places our models within the class of weakly coupled MDPs and a recently developed Lagrangian relaxation technique that we review in Appendix A is in principle applicable [2,18]. However, a linear program that needs to be solved in the key step of the Lagrangian relaxation method is intractable in our MDPs. We therefore develop a value function approximation and constraint generation technique to solve this linear program approximately. In fact, we present this approach in the more general context of large-scale weakly coupled MDPs that we precisely define in Section 2, and apply it to our allocation and advanced scheduling MDPs in Sections 3 and 4, respectively.
2. Large-scale weakly coupled MDPs

We define a class of weakly coupled MDPs where, for each $i \in \mathcal{I}$, the sets $X^i$ and $U^i(x^i)$ grow exponentially in some positive integers $K_i$ and $L_i$, respectively. We use $\mathcal{K}_i$ to denote the set $\{1,2,\ldots,K_i\}$. For each $i \in \mathcal{I}$, let $X^i_k$ be finite non-empty sets for $k \in \mathcal{K}_i$, and suppose $X^i \subseteq X^i_1 \times X^i_2 \times \cdots \times X^i_{K_i}$. That is, state $x^i$ is $x^i = (x^i_1, x^i_2, \ldots, x^i_{K_i}) \in X^i$, where $x^i_k \in X^i_k$. We use $\mathcal{L}_i$ to denote the sets $\{1,2,\ldots,L_i\}$ for $i \in \mathcal{I}$. For $i \in \mathcal{I}$ and $x^i \in X^i$, let $U^i_l(x^i)$ be finite non-empty sets for $l \in \mathcal{L}_i$, and suppose that $U^i(x^i) \subseteq U^i_1(x^i) \times U^i_2(x^i) \times \cdots \times U^i_{L_i}(x^i)$. The decision $u^i$ is $u^i = (u^i_1, u^i_2, \ldots, u^i_{L_i}) \in U^i(x^i)$, where $u^i_l \in U^i_l(x^i)$. The MDPs in Sections 3 and 4 have this structure.

For this class, the numbers of variables and constraints in (LLP), although linear in $I$, are exponential in $K_i$ and $L_i$ for each $i$. Thus, solving (LLP) is computationally intractable when either the $K_i$'s or the $L_i$'s are large. Hence we use the value function approximation $\nu_i(x^i) \approx \gamma_i + \sum_{k=1}^{K_i} c_{ik}\,\nu_{ik}(x^i_k)$, where $\nu_{ik} : X^i_k \to \mathbb{R}$ are predetermined functions, and $c_{ik} \ge 0$ and $\gamma_i$ are unknown coefficients. This approximates (LLP) to

$(\mathrm{ALLP})\quad H \triangleq \min\ \frac{\sum_{j=1}^{J}\lambda_j b_j}{1-\alpha} + \sum_{i=1}^{I}\gamma_i + \sum_{i=1}^{I}\sum_{k=1}^{K_i}\left(\sum_{x^i\in X^i}\beta_i(x^i)\,\nu_{ik}(x^i_k)\right)c_{ik}$

subject to

$(1-\alpha)\gamma_i + \sum_{k=1}^{K_i}\left(\nu_{ik}(x^i_k)-\alpha E_{(x^i,u^i)}\nu_{ik}(X'^i_k)\right)c_{ik} + \sum_{j=1}^{J}\lambda_j g_{ij}(x^i,u^i) \ge r_i(x^i,u^i),\quad (x^i,u^i)\in Y^i,\ i\in\mathcal{I},$

$\lambda \ge 0,\quad c_{ik}\ge 0,\quad i\in\mathcal{I},\ k\in\mathcal{K}_i.$

The number of variables in (ALLP) is only $J+\sum_{i\in\mathcal{I}}(K_i+1)$, but the number of constraints is still exponential in $K_i$ and $L_i$ for each $i\in\mathcal{I}$. We therefore use constraint generation¹ to tackle (ALLP). Constraint generation begins with a linear program that includes only a subset $C$ of the functional constraints and all non-negativity restrictions on the variables in (ALLP). This linear program is solved to find an optimal solution. The goal then is to find a violated functional constraint from (ALLP) and add it to $C$.
¹ This is essentially equivalent to using column generation to solve the dual of (ALLP).
The new linear program is then solved and the procedure continues until either no violated constraints can be found or a stopping criterion is met. We present one way to choose a violated constraint from (ALLP). Let $(\mathrm{ALLP})_C$ denote a relaxation of (ALLP) that includes functional constraints from $C$. Also, $\gamma$ and $c$ denote the vectors formed by variables $\gamma_i$ for $i\in\mathcal{I}$, and $c_{ik}$ for $i\in\mathcal{I}$ and $k\in\mathcal{K}_i$, respectively. Suppose that after solving $(\mathrm{ALLP})_C$, we obtain $\gamma^C, c^C, \lambda^C$ as its optimal solution. The violation in the (ALLP) functional constraint corresponding to $i\in\mathcal{I}$ and $(x^i,u^i)\in Y^i$ is given by

$Z_i(x^i,u^i) \triangleq r_i(x^i,u^i) - \sum_{j=1}^{J}\lambda^C_j g_{ij}(x^i,u^i) - (1-\alpha)\gamma^C_i - \sum_{k=1}^{K_i}\left(\nu_{ik}(x^i_k)-\alpha E_{(x^i,u^i)}\nu_{ik}(X'^i_k)\right)c^C_{ik}.$
We find a constraint from (ALLP) with the largest violation and add it to $C$. This "most violated constraint" problem is given by² $Z^* \triangleq \max_{i\in\mathcal{I}}\left(\max_{(x^i,u^i)\in Y^i} Z_i(x^i,u^i)\right)$. Proposition 2.1 below provides a bound on $H$ (see Theorem 6 in [2] for a similar result and proof).

² As long as there exists a violated constraint, an optimal solution to this problem comes from a constraint that is not in $C$. Therefore, for simplicity of notation, the maximization is formulated over all functional constraints in (ALLP) instead of only those that are not in $C$, without loss of optimality.

Proposition 2.1. Suppose that $(\mathrm{ALLP})_C$ has an optimal solution with optimal value $H^C$ for some constraint set $C$. Then $H^C \le H \le H^C + (I/(1-\alpha))Z^*$.

Proof. Since $(\mathrm{ALLP})_C$ is a relaxation of (ALLP), we have $H^C \le H$. Let $\gamma^C, c^C, \lambda^C$ be an optimal solution to $(\mathrm{ALLP})_C$. Then

$H^C = \frac{\sum_{j=1}^{J}\lambda^C_j b_j}{1-\alpha} + \sum_{i=1}^{I}\gamma^C_i + \sum_{i=1}^{I}\sum_{k=1}^{K_i}\left(\sum_{x^i\in X^i}\beta_i(x^i)\nu_{ik}(x^i_k)\right)c^C_{ik}.$
The dual of (ALLP) is given by

$(\mathrm{DALLP})\quad \max\ \sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i} r_i(x^i,u^i)\,z_i(x^i,u^i)$

subject to

$\sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i} g_{ij}(x^i,u^i)\,z_i(x^i,u^i) \le \frac{b_j}{1-\alpha},\quad j\in\mathcal{J},$

$(1-\alpha)\sum_{(x^i,u^i)\in Y^i} z_i(x^i,u^i) = 1,\quad i\in\mathcal{I},$

$\sum_{(x^i,u^i)\in Y^i}\left(\nu_{ik}(x^i_k)-\alpha E_{(x^i,u^i)}\nu_{ik}(X'^i_k)\right)z_i(x^i,u^i) \le \sum_{x^i\in X^i}\beta_i(x^i)\nu_{ik}(x^i_k),\quad i\in\mathcal{I},\ k\in\mathcal{K}_i,$

$z_i(x^i,u^i)\ge 0,\quad (x^i,u^i)\in Y^i,\ i\in\mathcal{I}.$
Since (ALLP) has an optimal solution, so does (DALLP) by strong duality. Let $z^*_i(x^i,u^i)$ be an optimal solution for (DALLP) and note that $H = \sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i} r_i(x^i,u^i)\,z^*_i(x^i,u^i)$. Then using the constraints in (DALLP) in the above expression for $H^C$ we get

$H^C \ge \sum_{j\in\mathcal{J}}\left(\sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i} g_{ij}(x^i,u^i)\,z^*_i(x^i,u^i)\right)\lambda^C_j + (1-\alpha)\sum_{i\in\mathcal{I}}\left(\sum_{(x^i,u^i)\in Y^i} z^*_i(x^i,u^i)\right)\gamma^C_i + \sum_{i=1}^{I}\sum_{k=1}^{K_i}\left(\sum_{(x^i,u^i)\in Y^i}\left(\nu_{ik}(x^i_k)-\alpha E_{(x^i,u^i)}\nu_{ik}(X'^i_k)\right)z^*_i(x^i,u^i)\right)c^C_{ik}$
Y. Gocgun, A. Ghate / Computers & Operations Research 39 (2012) 2323–2336
$= \sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i}\left(r_i(x^i,u^i)-Z_i(x^i,u^i)\right)z^*_i(x^i,u^i),\quad$ by definition of $Z_i(x^i,u^i)$

$= H - \sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i} Z_i(x^i,u^i)\,z^*_i(x^i,u^i),\quad$ by definition of $H$

$\ge H - Z^*\sum_{i\in\mathcal{I}}\sum_{(x^i,u^i)\in Y^i} z^*_i(x^i,u^i),\quad$ because $z^*_i(x^i,u^i)\ge 0$

$= H - \frac{Z^* I}{1-\alpha},\quad$ from the equality constraints in (DALLP). □
This result leads to a natural stopping criterion for constraint generation: terminate when the absolute percentage gap between $H^C$ and $H^C + (I/(1-\alpha))Z^*$, that is, $100\,\left|(I/(1-\alpha))Z^*/H^C\right|$, drops below a tolerance. We apply this approach to allocation and advanced scheduling problems in the next two sections.
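To make the procedure concrete, the following minimal Python sketch (ours, not the authors' MATLAB/CPLEX implementation) shows a generic constraint-generation loop with the stopping rule just described; `solve_restricted` and `most_violated` are hypothetical problem-specific callbacks.

```python
# A generic constraint-generation loop for Section 2's method.
# `solve_restricted(C)` solves (ALLP)_C and returns (solution, H_C);
# `most_violated(sol)` solves the separation problem and returns the
# most violated constraint together with its violation Z*. Assumes H_C != 0.
def constraint_generation(solve_restricted, most_violated, C0, I, alpha, tol=0.5):
    C = list(C0)
    while True:
        sol, H_C = solve_restricted(C)
        cut, Z_star = most_violated(sol)
        gap = 100.0 * abs((I / (1.0 - alpha)) * Z_star / H_C)
        if Z_star <= 0 or gap < tol:   # no violated constraint, or gap below tolerance
            return sol, H_C
        C.append(cut)
```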
3. Application to allocation scheduling

Consider the following class of problems, which we call Dynamic Stochastic Allocation Scheduling Problems (DSALSPs).

1. (Heterogeneous job-types) The index set of job-types is denoted $\mathcal{I} = \{1,2,\ldots,I\}$.
2. (Random arrivals) A random number $D_i$ of type $i$ jobs arrive in one time-period; $D_i$ takes values from the set $\{0,1,\ldots,N_i\}$, where $N_i$ are positive integers. The probability that $n_i$ type $i$ jobs arrive in one period is $p_i(n_i)$. Arrivals across types are independent.
3. (Stochastic service-durations) The service-duration of each type $i$ job is a positive integer-valued random variable $C_i$ independent of all else; $C_i$ takes values in the set $\{1,2,\ldots,T_i\}$ for some positive integer $T_i$. The probability that $C_i$ equals $t_i$ from this set is $\theta_i(t_i)$. We use $\Theta_i$ to denote the cumulative distribution function of $C_i$.
4. (Multiple resource constraints) $\mathcal{J} = \{1,\ldots,J\}$ denotes the set of resources and $b_j$ (a positive integer) is the total quantity of resource $j$ available for consumption in each time-period. In order to perform each job of type $i$, a non-negative integer amount $a_{ij}$ of resource $j$ is required. We assume that for each $i$, there is at least one $j$ such that $a_{ij} > 0$.
5. (Immediate decision) Each arriving job must either immediately start service or be rejected/outsourced/served through overtime. Henceforth we use the word "rejected" to refer to a generic decision of this latter type.
6. (Rewards and costs) The following rewards/costs are received/incurred: (Completion reward) A reward $R_i$ is received at the end of a time-period for completing a type $i$ job. (Penalty cost of rejection) A penalty cost of $G_i$ is incurred at the beginning of a period for each job of type $i$ that is rejected.
7. (Discounted profit objective) The goal is to decide which of the arriving jobs to select for service in each period so as to maximize the total discounted expected profit over an infinite horizon with discount factor $0 < \alpha < 1$. Preemption is not allowed; that is, once a job is started, it cannot be discontinued until completion.

DSALSPs are a variation of problems we introduced in [12], where, among other differences, it was assumed that service-durations were geometrically distributed; hence the state- and action-spaces there were much simpler than those below because the geometric distribution is memoryless. Examples of work that is somewhat related to DSALSPs include Network Revenue Management Problems (NRMPs) [33], Dynamic Stochastic Knapsack Problems (DSKPs), dynamic fleet management problems [13,14,34], discrete-time queueing problems [3,8,20,22,24,25,36,37], and the classic multiclass queuing models with Poisson arrivals [17]. Major differences between DSALSPs and this existing literature can be found in [10,12]. We do not repeat those here for brevity.

3.1. A weakly coupled MDP model of DSALSPs
We now build a weakly coupled MDP model for DSALSPs. The state of this MDP is defined as $x = (x^1,x^2,\ldots,x^I)$, where $x^i = (x^i_0, x^i_1, \ldots, x^i_{T_i-1})$. In this vector, $x^i_k$, for $k = 0,1,\ldots,T_i-1$, is the number of type $i$ jobs that have received service for $k$ time-periods in the past. In particular, $x^i_0$ is the number of type $i$ jobs that arrived in the previous period and need to be either selected for service or rejected. Since the maximum number of type $i$ jobs that can arrive in one period is $N_i$, we have $x^i_0 \le N_i$. We therefore define the set $X^i_0 = \{0,1,\ldots,N_i\}$. Also let $M_i = \min_{j\in\mathcal{J}}\lfloor b_j/a_{ij}\rfloor$ be the largest number of type $i$ jobs that can be in service in any given period. We define sets $X^i_1 = X^i_2 = \cdots = X^i_{T_i-1} = \{0,1,\ldots,M_i\}$ and let

$X^i = \left\{x \in X^i_0\times X^i_1\times\cdots\times X^i_{T_i-1} : \sum_{k=1}^{T_i-1} x^i_k \le M_i\right\}.$

Moreover, let $X = X^1\times X^2\times\cdots\times X^I$. Then the set of all feasible states is characterized by the resource availability as

$\mathcal{X} = \left\{x\in X : \sum_{i=1}^{I} a_{ij}\left(\sum_{k=1}^{T_i-1} x^i_k\right) \le b_j,\ j\in\mathcal{J}\right\}. \quad (1)$

The decision vector is $u = (u^1,u^2,\ldots,u^I)$, where $u^i$ is the number of new type $i$ jobs we select for service from the $x^i_0$ jobs that arrived in the previous period. Therefore, $u^i\le x^i_0$ and $x^i_0-u^i$ jobs are rejected. Let $U^i_1(x^i) = \{0,1,\ldots,x^i_0\}$ and

$U^i(x^i) = \left\{u^i\in U^i_1(x^i) : u^i+\sum_{k=1}^{T_i-1}x^i_k \le M_i\right\}.$

Also let $U(x) = U^1(x^1)\times U^2(x^2)\times\cdots\times U^I(x^I)$. The set $\mathcal{U}(x)\subseteq U(x)$ of all decision vectors feasible in state $x$ is defined by the resource constraints

$\mathcal{U}(x) = \left\{(u^1,u^2,\ldots,u^I)\in U(x) : \sum_{i=1}^{I} a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right) \le b_j,\ j\in\mathcal{J}\right\}. \quad (2)$

Let $q_i(t)$ denote the conditional probability that a type $i$ job that has received service for $0\le t\le T_i-1$ time-periods in the past will be completed at the end of the current time-period (in the case of $t=0$, this is relevant only for jobs that we choose to serve in the current time-period). We have

$q_i(t) = P[C_i=t+1 \mid C_i>t] = \frac{P[C_i=t+1 \cap C_i>t]}{P[C_i>t]} = \frac{\theta_i(t+1)}{1-\Theta_i(t)}. \quad (3)$

Let $Z^i_0$ be the random number of type $i$ jobs that are completed out of the $u^i$ new jobs started in the current period. Thus $Z^i_0$ is Binomial$(u^i,q_i(0))$. Similarly, for $k=1,\ldots,T_i-1$, let $Z^i_k$ be the number of type $i$ jobs completed in the current period out of the $x^i_k$ ongoing jobs. For $k=1,\ldots,T_i-1$, $Z^i_k$ is Binomial$(x^i_k,q_i(k))$. In fact, $Z^i_{T_i-1}=x^i_{T_i-1}$ with probability one because all jobs that have received service for $T_i-1$ periods will certainly be completed in the next period. We define the vector $Z^i=(Z^i_0,Z^i_1,\ldots,Z^i_{T_i-1})$. For $0\le n_i\le N_i$, let

$P_i(x^i,u^i;Z^i,n_i) = p_i(n_i)\left[\binom{u^i}{Z^i_0}(q_i(0))^{Z^i_0}(1-q_i(0))^{u^i-Z^i_0}\prod_{k=1}^{T_i-2}\binom{x^i_k}{Z^i_k}(q_i(k))^{Z^i_k}(1-q_i(k))^{x^i_k-Z^i_k}\right].$
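As an illustration of Eq. (3) and the binomial completion dynamics, here is a short Python sketch (not from the paper; names are ours) that computes the hazard probabilities $q_i(t)$ from the duration pmf and samples one period of completions.

```python
# Sketch of Eq. (3) and one sampled transition for a single job type,
# assuming a duration pmf theta with theta[t-1] = P[C = t].
import numpy as np

rng = np.random.default_rng(0)

def hazard(theta):
    """q[t] = P[C = t+1 | C > t] for t = 0,...,T_i-1."""
    T = len(theta)
    Theta = np.cumsum(theta)                       # cdf of the service duration
    return np.array([theta[t] / (1.0 - (Theta[t - 1] if t > 0 else 0.0))
                     for t in range(T)])

def sample_completions(u, x, q):
    """Binomial thinning: u new jobs face q[0]; x[k] ongoing jobs in
    service for k periods face q[k], k = 1,...,T_i-1."""
    Z0 = rng.binomial(u, q[0])
    Z = [rng.binomial(x[k], q[k]) for k in range(1, len(q))]
    return Z0, Z

theta = np.array([0.2, 0.5, 0.3])                  # P[C=1], P[C=2], P[C=3]
q = hazard(theta)                                  # q[2] = 1: all finish by T_i
print(q, sample_completions(u=3, x=[0, 2, 1], q=q))
```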
This is the joint probability that $n_i$ new type $i$ jobs will arrive and, given state-vector $x^i$ and decision-vector $u^i$ for type $i$, the vector of completed type $i$ jobs will equal $Z^i$. Then with probability $\prod_{i=1}^{I}P_i(x^i,u^i;Z^i,n_i)$, the next state equals $x'=(x'^1,x'^2,\ldots,x'^I)$, where for $i\in\mathcal{I}$, $x'^i=(n_i,\ u^i-Z^i_0,\ x^i_1-Z^i_1,\ \ldots,\ x^i_{T_i-2}-Z^i_{T_i-2})$. In this vector, $u^i-Z^i_0$ is the number of new type $i$ jobs that were started in this period but could not be completed. Therefore, they have now been processed for one time-period. Similarly, $x^i_k-Z^i_k$, for $k=1,2,\ldots,T_i-2$, is the number of type $i$ jobs that have now been processed for $k+1$ time-periods but are not complete. The discounted expected profit accrued in this period equals $f(x,u)=\sum_{i=1}^{I}f_i(x^i,u^i)$, where $f_i(x^i,u^i)=\alpha R_i\left(u^i q_i(0)+\sum_{k=1}^{T_i-1}x^i_k q_i(k)\right)-G_i(x^i_0-u^i)$. Thus Bellman's equations for all $x\in\mathcal{X}$ are now given by

$V(x) = \max_{u\in\mathcal{U}(x)}\left\{\sum_{i=1}^{I}f_i(x^i,u^i)+\alpha E_{(x,u)}V(X')\right\}, \quad (4)$

where the shorthand $X'$ denotes the $I$-dimensional random vector representing the next state and whose probability mass function is $\prod_{i=1}^{I}P_i(x^i,u^i;Z^i,n_i)$. The subscript $(x,u)$ on the expectation operator $E$ emphasizes that this probability mass function is characterized by $x$ and $u$.

This MDP is almost weakly coupled: it satisfies all requirements of a weakly coupled MDP except that the set of feasible states $\mathcal{X}$ in (1) does not have a Cartesian product structure, as state feasibility is defined through linking resource constraints. To endow the state-space with a Cartesian structure, we extend it to $X$ by including (artificial) states in $X\setminus\mathcal{X}$. In each $x^i\in X^i$, we allow the (artificial) action $\mu_i(x^i)=-\sum_{k=1}^{T_i-1}x^i_k$. This particular choice for an artificial action may seem ad hoc at first, but it allows us to conveniently extend the set $\mathcal{U}(x)$ for $x\in\mathcal{X}$ to the set $\mathcal{A}(x)$ for $x\in X$ in (5) below. We first define the set $A^i(x^i)=U^i(x^i)\cup\{\mu_i(x^i)\}$, and for states $x\in X$, we let $A(x)=A^1(x^1)\times A^2(x^2)\times\cdots\times A^I(x^I)$. We then define the set of feasible decisions in state $x\in X$ as

$\mathcal{A}(x)=\left\{(u^1,u^2,\ldots,u^I)\in A(x) : \sum_{i=1}^{I}a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right)\le b_j,\ j\in\mathcal{J}\right\}. \quad (5)$

Now note that $\mu_i(x^i)+\sum_{k=1}^{T_i-1}x^i_k=0$. This implies that (5) does not put any unnecessary restrictions on components of $u$ that are not artificial. After choosing a decision $u\in\mathcal{A}(x)$ whose $i$th component is the artificial decision $\mu_i(x^i)$ in a state with $i$th component $x^i\in X^i$, we (artificially) set the $i$th component of the next state to $x'^i=(n_i,0,0,\ldots,0)$ with probability $p_i(n_i)$. This particular choice of state is based purely on its simplicity. Finally, we set $f_i(x^i,\mu_i(x^i))$, for all $x^i\in X^i$ and $i\in\mathcal{I}$, to $-L$, where $L$ is a sufficiently large positive number.

Let $\Phi(x)$ be the maximum infinite-horizon discounted expected reward obtained if the current state in the transformed MDP is $x\in X$. Then Bellman's equations for the transformed MDP are

$\Phi(x)=\max_{u\in\mathcal{A}(x)}\left\{\sum_{i=1}^{I}f_i(x^i,u^i)+\alpha E_{(x,u)}\Phi(X')\right\},\quad x\in X, \quad (6)$

where the $i$th component of the random vector $X'$ depends on whether $u^i$ is artificial or not. The expectation is with respect to the probability mass function of this random vector, and it is characterized by $(x,u)$. It is easy to show that the addition of artificial states and actions does not alter the optimal values of, and optimal decisions in, the original states (we provided a proof for a different MDP model in [10]). In particular, $V(x)=\Phi(x)$ for every $x\in\mathcal{X}$. Since this new MDP is weakly coupled, we apply Lagrangian relaxation.

3.2. Lagrangian relaxation of the allocation scheduling MDP

By applying the Lagrangian relaxation method to the weakly coupled MDP above, we obtain the relaxed Bellman's equations

$\Phi^\lambda(x)=\max_{u\in A(x)}\left\{\sum_{i=1}^{I}\left[f_i(x^i,u^i)-\sum_{j=1}^{J}\lambda_j a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right)\right]+\sum_{j=1}^{J}\lambda_j b_j+\alpha E_{(x,u)}\Phi^\lambda(X')\right\},\quad x\in X.$

The Lagrangian value function is given by

$\Phi^\lambda(x)=\frac{\sum_{j=1}^{J}\lambda_j b_j}{1-\alpha}+\sum_{i=1}^{I}\Phi^\lambda_i(x^i),$

where, for $x^i\in X^i$,

$\Phi^\lambda_i(x^i)=\max_{u^i\in A^i(x^i)}\left\{f_i(x^i,u^i)-\sum_{j=1}^{J}\lambda_j a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right)+\alpha E_{(x^i,u^i)}\Phi^\lambda_i(X'^i)\right\}.$

In the above equation, $E$ denotes the expectation with respect to the probability mass function of the $i$th component $X'^i$ of the random vector $X'$. The subscript $(x^i,u^i)$ emphasizes that this probability mass function is characterized by the state $x^i$ and decision $u^i$. Using the notation $Y^i=\{(x^i,u^i) : x^i\in X^i,\ u^i\in A^i(x^i)\}$ for $i\in\mathcal{I}$, the Lagrangian linear program is now given by

$(\mathrm{LLP1})\quad H^{\lambda^*}(\beta)=\min\ \frac{\sum_{j=1}^{J}\lambda_j b_j}{1-\alpha}+\sum_{i=1}^{I}\sum_{x^i\in X^i}\beta_i(x^i)\nu_i(x^i)$

subject to

$\nu_i(x^i)-\alpha E_{(x^i,u^i)}\nu_i(X'^i)+\sum_{j=1}^{J}\lambda_j a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right)\ge f_i(x^i,u^i),\quad (x^i,u^i)\in Y^i,\ i\in\mathcal{I},$

$\lambda\ge 0.$

Notice that even though the numbers of variables and constraints in (LLP1) are both linear in $I$, they are exponential in $T_i$ for each $i$. Thus (LLP1) is intractable when the $T_i$'s are large. We therefore apply the value function approximation and constraint generation method developed in Section 2. But first, we show that dropping the functional constraints that correspond to artificial actions from (LLP1) does not alter its optimal solution. For each $i\in\mathcal{I}$, let $\bar Y^i\subseteq Y^i$ be the subset given by $\{(x^i,u^i)\in Y^i : u^i\neq\mu_i(x^i)\}$. That is, $\bar Y^i$ does not include artificial actions and hence is equivalently defined as $\bar Y^i=\{(x^i,u^i) : x^i\in X^i,\ u^i\in U^i(x^i)\}$. Now consider a relaxation of (LLP1), which only includes functional constraints over the sets $\bar Y^i$ for $i\in\mathcal{I}$. It is given by

$(\mathrm{LLP1R})\quad \min\ \frac{\sum_{j=1}^{J}\lambda_j b_j}{1-\alpha}+\sum_{i=1}^{I}\sum_{x^i\in X^i}\beta_i(x^i)\nu_i(x^i)$

subject to

$\nu_i(x^i)-\alpha E_{(x^i,u^i)}\nu_i(X'^i)+\sum_{j=1}^{J}\lambda_j a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right)\ge f_i(x^i,u^i),\quad (x^i,u^i)\in\bar Y^i,\ i\in\mathcal{I},$

$\lambda\ge 0.$

Let $\nu^*_i(x^i)$ for $x^i\in X^i$ and $i\in\mathcal{I}$ be an optimal solution of (LLP1R). Then this solution is also optimal to (LLP1). To establish this, we only need to show that this solution satisfies the functional constraints in (LLP1) that correspond to $\bigcup_{i\in\mathcal{I}}(Y^i\setminus\bar Y^i)$. The left hand side of such a functional constraint corresponding to $x^i\in X^i$ and artificial action $\mu_i(x^i)$ simply equals $\nu^*_i(x^i)-\alpha E_{(x^i,\mu_i(x^i))}\nu^*_i(X'^i)$ because $\mu_i(x^i)+\sum_{k=1}^{T_i-1}x^i_k=0$ by definition of $\mu_i(x^i)$. Let $\nu^{**}=\min_{i\in\mathcal{I}}\{\min_{x^i\in X^i}(\nu^*_i(x^i)-\alpha E_{(x^i,\mu_i(x^i))}\nu^*_i(X'^i))\}$ be the smallest among all such left hand side values. Then our claim holds because $f_i(x^i,\mu_i(x^i))$ is sufficiently small, and in particular we assume it to be smaller than $\nu^{**}$, for all $i\in\mathcal{I}$ and $x^i\in X^i$. Thus it suffices to solve (LLP1R).
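For readers who want to experiment, the sketch below shows one generic way to compute the componentwise Lagrangian value functions $\Phi^\lambda_i$ by value iteration over a small component state space. This is our illustration only: the paper works with (LLP1R) rather than value iteration, and the callbacks are hypothetical stand-ins for the problem data.

```python
# Value iteration for a single component's relaxed Bellman equation.
# `states` enumerates x^i, `actions(x)` gives A^i(x^i), `reward(x, u)` returns
# f_i(x, u) - sum_j lambda_j * a_ij * (u + sum_k x_k), and `trans(x, u)` yields
# (probability, next_state) pairs; all are problem-specific callbacks.
def solve_component(states, actions, reward, trans, alpha, iters=500):
    Phi = {x: 0.0 for x in states}
    for _ in range(iters):                         # fixed-point iteration
        Phi = {x: max(reward(x, u)
                      + alpha * sum(p * Phi[y] for p, y in trans(x, u))
                      for u in actions(x))
               for x in states}
    return Phi
```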
3.3. Affine approximation and constraint generation

We employ affine functions in our componentwise value function approximations. That is, for each $i\in\mathcal{I}$ and $x^i\in X^i$, we set $\nu_i(x^i)\approx\gamma_i+\sum_{k=0}^{T_i-1}c_{ik}x^i_k$, where $\gamma_i$ and $c_{ik}\ge 0$ are unknown coefficients. Here, $c_{ik}$ can be interpreted as the marginal value of having in the system an incomplete type $i$ job that has received service for $k$ time-periods in the past. This simplifies (LLP1R) to

$(\mathrm{ALLP1})\quad H=\min\ \frac{\sum_{j=1}^{J}\lambda_j b_j}{1-\alpha}+\sum_{i=1}^{I}\gamma_i+\sum_{i=1}^{I}\sum_{k=0}^{T_i-1}\left(\sum_{x^i\in X^i}\beta_i(x^i)x^i_k\right)c_{ik}$

subject to

$(1-\alpha)\gamma_i+\sum_{k=0}^{T_i-1}\left(x^i_k-\alpha E_{(x^i,u^i)}X'^i_k\right)c_{ik}+\sum_{j=1}^{J}\lambda_j a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right)\ge f_i(x^i,u^i),\quad (x^i,u^i)\in\bar Y^i,\ i\in\mathcal{I},$

$\lambda\ge 0,\quad c_{ik}\ge 0,\quad i\in\mathcal{I},\ k=0,1,\ldots,T_i-1.$

In our numerical results below, the constraint generation procedure for (ALLP1) starts with an initial set of constraints $C_0$ that includes the constraints corresponding to $x^i_k=0$ for $k=0,1,\ldots,T_i-1$ and $u^i=0$ for all $i\in\mathcal{I}$. It is easy to see that $\gamma_i=0$ for all $i\in\mathcal{I}$, $c_{ik}=0$ for $k=0,1,\ldots,T_i-1$ and all $i\in\mathcal{I}$, and $\lambda_j=0$ for all $j\in\mathcal{J}$ is an optimal solution to this initial problem. Now suppose that, during the constraint generation procedure, after solving $(\mathrm{ALLP1})_C$ with some set of constraints $C$, we obtain $\gamma^C,c^C,\lambda^C$ as its optimal solution. The violation in the functional constraint corresponding to any $(x^i,u^i)\in\bar Y^i$ for any $i\in\mathcal{I}$ is given by

$Z_i(x^i,u^i)\triangleq f_i(x^i,u^i)-(1-\alpha)\gamma^C_i-\sum_{k=0}^{T_i-1}\left(x^i_k-\alpha E_{(x^i,u^i)}X'^i_k\right)c^C_{ik}-\sum_{j=1}^{J}\lambda^C_j a_{ij}\left(u^i+\sum_{k=1}^{T_i-1}x^i_k\right).$

To simplify this, recall that $X'^i_0=n_i$ with probability $p_i(n_i)$ for $0\le n_i\le N_i$. Moreover, $X'^i_1=u^i-Z^i_0$, where $Z^i_0$ is Binomial$(u^i,q_i(0))$. Similarly, for $k=1,2,\ldots,T_i-2$, $X'^i_{k+1}=x^i_k-Z^i_k$, where $Z^i_k$ is Binomial$(x^i_k,q_i(k))$. Therefore, $\sum_{k=0}^{T_i-1}c^C_{ik}E_{(x^i,u^i)}X'^i_k=c^C_{i0}ED_i+c^C_{i1}u^i(1-q_i(0))+\sum_{k=1}^{T_i-2}c^C_{i,k+1}x^i_k(1-q_i(k))$. Also recall that $f_i(x^i,u^i)=\alpha R_i\left(u^iq_i(0)+\sum_{k=1}^{T_i-1}x^i_kq_i(k)\right)-G_i(x^i_0-u^i)$. Using this, after algebraic simplification, $Z_i(x^i,u^i)$ can be expressed as $Z_i(x^i,u^i)=\sum_{k=0}^{T_i-1}d^C_{ik}x^i_k+E^C_iu^i+e^C_i$. Thus the maximum constraint violation problem is given by $Z^*\triangleq\max_{i\in\mathcal{I}}Z^*_i$, where $Z^*_i$ is the optimal value of the linear integer program

$Z^*_i\triangleq\max\ \sum_{k=0}^{T_i-1}d^C_{ik}x^i_k+E^C_iu^i+e^C_i$

subject to

$x^i_0\le N_i,\quad u^i\le x^i_0,\quad u^i+\sum_{k=1}^{T_i-1}x^i_k\le M_i,$

$u^i\ge 0$ and integer, $\quad x^i_k\ge 0$ and integer for $k=0,1,\ldots,T_i-1. \quad (7)$

3.4. Retrieving decisions from the approximate value function

We denote the values of variables obtained after terminating the constraint generation procedure as $\lambda^*_j$ for $j\in\mathcal{J}$, $\gamma^*_i$ for $i\in\mathcal{I}$, and $c^*_{ik}$ for $k=0,1,\ldots,T_i-1$ and $i\in\mathcal{I}$. Now suppose for some state $x\in\mathcal{X}$, we wish to compute a decision vector that is myopic with respect to the approximation

$V(x')=\Phi(x')\approx\Phi^{\lambda^*}(x')=\frac{\sum_{j=1}^{J}\lambda^*_jb_j}{1-\alpha}+\sum_{i=1}^{I}\Phi^{\lambda^*}_i(x'^i)\approx\frac{\sum_{j=1}^{J}\lambda^*_jb_j}{1-\alpha}+\sum_{i=1}^{I}\left(\gamma^*_i+\sum_{k=0}^{T_i-1}c^*_{ik}x'^i_k\right)$

in the right hand side of Bellman's Eq. (4). The decision retrieval problem then simplifies to

$\max_{u\in\mathcal{U}(x)}\left\{\sum_{i=1}^{I}f_i(x^i,u^i)+\frac{\alpha\sum_{j=1}^{J}\lambda^*_jb_j}{1-\alpha}+\alpha\sum_{i=1}^{I}\gamma^*_i+\alpha\sum_{i=1}^{I}\sum_{k=0}^{T_i-1}c^*_{ik}E_{(x^i,u^i)}X'^i_k\right\}.$

We ignore the terms outside the maximization in the discussion below. Then substituting appropriate expressions for $f_i(x^i,u^i)$ and $E_{(x^i,u^i)}$, we get

$\max_{u\in\mathcal{U}(x)}\left\{\sum_{i=1}^{I}\left(\alpha R_i\left(u^iq_i(0)+\sum_{k=1}^{T_i-1}x^i_kq_i(k)\right)-G_i(x^i_0-u^i)\right)+\alpha\sum_{i=1}^{I}\left[c^*_{i0}ED_i+c^*_{i1}u^i(1-q_i(0))+\sum_{k=1}^{T_i-2}c^*_{i,k+1}x^i_k(1-q_i(k))\right]\right\}.$

Finally, ignoring the terms in the objective function that do not depend on $u$ and using the definition of $\mathcal{U}(x)$ in (2), we further rewrite this problem as

$\max\ \sum_{i=1}^{I}\left[\alpha R_iq_i(0)+G_i+\alpha c^*_{i1}(1-q_i(0))\right]u^i$

subject to

$u^i\le x^i_0,\quad i\in\mathcal{I},$

$\sum_{i=1}^{I}a_{ij}u^i\le b_j-\sum_{i=1}^{I}a_{ij}\sum_{k=1}^{T_i-1}x^i_k,\quad j\in\mathcal{J},$

$u^i\ge 0$ and integer, $\quad i\in\mathcal{I}.$

This is a linear integer program that needs to be solved in order to recover decisions on the fly.
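The on-the-fly integer program above is small (one variable per job type), so any MILP solver handles it. The sketch below is our illustration, using SciPy's milp rather than the authors' MATLAB/CPLEX setup, and all numerical data in the example are made up.

```python
# A minimal sketch of the decision-retrieval integer program for allocation
# scheduling. `load[j]` denotes sum_i a_ij * sum_k x_ik, the resource already
# committed to jobs in service.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def retrieve_decision(alpha, R, G, q0, c1, a, b, x0, load):
    """Maximize sum_i [alpha*R_i*q_i(0) + G_i + alpha*c*_{i1}(1-q_i(0))] u_i
    s.t. 0 <= u_i <= x_{i0} integer, sum_i a_ij u_i <= b_j - load_j."""
    I = len(R)
    coef = -(alpha * R * q0 + G + alpha * c1 * (1.0 - q0))  # milp minimizes
    resource = LinearConstraint(a.T, -np.inf, b - load)     # a has shape (I, J)
    bounds = Bounds(np.zeros(I), x0)
    res = milp(c=coef, constraints=[resource], bounds=bounds,
               integrality=np.ones(I))
    return np.round(res.x).astype(int)

# Tiny example with I=3 job types and J=1 resource (all numbers made up).
u = retrieve_decision(alpha=0.85,
                      R=np.array([500., 300., 800.]), G=np.array([50., 20., 80.]),
                      q0=np.array([0.3, 0.5, 0.2]), c1=np.array([10., 5., 25.]),
                      a=np.array([[3.], [2.], [5.]]), b=np.array([10.]),
                      x0=np.array([2., 3., 1.]), load=np.array([2.]))
print(u)
```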
3.5. Numerical experiments

All numerical experiments in this paper were performed on a Dell Optiplex 960 running Windows XP on a 3.16 GHz Intel Core Duo CPU with 3.21 GB RAM. Computer code was written in MATLAB, which called IBM CPLEX for solving linear and integer programs. We used a test bed of randomly generated problems to compare the performance of our Lagrangian policy with a myopic policy. Let $DU[A_1,A_2]$ denote a random integer chosen uniformly from the set of integers $\{A_1,A_1+1,\ldots,A_2\}$. We set $I=5$, $J\in\{1,2\}$, and the maximum number of type $i$ arrivals $N_i=DU[1,5]$ for each $i$. Once $N_i$ was sampled, we generated $N_i+1$ uniform $(0,1)$ random variables denoted $r_0,r_1,\ldots,r_{N_i}$. Then the arrival probability $p_i(k)$ was set to $r_k/\sum_{k=0}^{N_i}r_k$. We set $a_{ij}=DU[1,100]$ for all $i,j$, and $b_j=\lceil\rho\sum_{i=1}^{I}a_{ij}\rceil$ for all $j$, where $\rho$ is called the tightness ratio [6]. We used $\rho\in\{0.25,0.5\}$. The $T_i$'s were set to $1+DU[1,10]$ for each $i$. Once a $T_i$ was sampled, the service-duration probabilities $\theta_i(t_i)$, for $t_i\in\{1,2,\ldots,T_i\}$, were obtained by normalizing $T_i$ uniform $(0,1)$ random variables as explained above for $p_i(\cdot)$. We also used $R_i=DU[1,1000]$ and $G_i=DU[1,100]$. Following [2], we used $\alpha=0.85$. We solved 50 problem instances for each combination of $J$, $\rho$. Recall from Section 3.3 that our initial set of constraints includes the constraints corresponding to $x^i_k=0$ for $k=0,1,\ldots,T_i-1$ and $u^i=0$ for all $i\in\mathcal{I}$. We terminated constraint generation when the absolute percentage gap $100\,(I/(1-\alpha))\,|Z^*/H^C|$ dropped below 0.5%. For each problem instance, the infinite-horizon discounted expected profit accrued by any policy was estimated by averaging the total discounted profit accumulated over 200 independent simulations of 50 periods.
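For reproducibility, the instance-generation recipe just described can be sketched as follows. This is our illustration; in particular, the ceiling in the $b_j$ rule reflects our reading of the tightness-ratio construction.

```python
# Sketch of the random instance generator of Section 3.5; DU(a, b) draws a
# uniform random integer and pmfs are normalized uniform draws.
import numpy as np

rng = np.random.default_rng(42)
DU = lambda a, b: int(rng.integers(a, b + 1))

def make_instance(I=5, J=1, rho=0.25):
    N = np.array([DU(1, 5) for _ in range(I)])
    p = [rng.random(Ni + 1) for Ni in N]
    p = [w / w.sum() for w in p]                     # arrival pmfs p_i(0..N_i)
    a = np.array([[DU(1, 100) for _ in range(J)] for _ in range(I)])
    b = np.ceil(rho * a.sum(axis=0)).astype(int)     # b_j = ceil(rho * sum_i a_ij)
    T = np.array([1 + DU(1, 10) for _ in range(I)])
    theta = [rng.random(Ti) for Ti in T]
    theta = [w / w.sum() for w in theta]             # duration pmfs theta_i(1..T_i)
    R = np.array([DU(1, 1000) for _ in range(I)])
    G = np.array([DU(1, 100) for _ in range(I)])
    return N, p, a, b, T, theta, R, G
```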
Table 1
Results for four allocation scheduling problem sets. For each set, the estimates are averages over 50 instances. Complete results for all instances are provided in Table 3 in Appendix B.

#    J    r     CPU (s)   Lagrangian   Myopic   % Improvement   Significant?   Count
1    1    0.25  3         4271         4187     2.00            Yes            29
2    1    0.5   4         8341         8201     1.70            Yes            35
3    2    0.25  3         3474         3453     0.61            No             26
4    2    0.5   4         7324         7230     1.30            Yes            35
All  -    -     3.5       5852         5768     1.46            Yes            125
Each simulation was started in the "empty" state $x^i_k=0$ for $k=0,1,\ldots,T_i-1$. As suggested in [2], running a simulation for 50 periods approximates the infinite-horizon profits sufficiently well because $0.85^{50}\approx 10^{-4}$ and hence the tail profits beyond the 50th period are relatively insignificant. These profit estimates, averaged over the 50 independent problem instances, are reported in Table 1 for each problem set. The table also lists the average CPU seconds needed for constraint generation and the percentage improvement in profit obtained by the Lagrangian policy over the myopic policy. The table shows that the Lagrangian policy generated higher profit than myopic in all four problem sets. Paired t-tests indicated that this improvement was statistically significant at the 0.05 level in three out of the four problem sets, as stated in the column "Significant?". For each problem set, the number of instances, out of the 50 total, for which the Lagrangian policy outperformed myopic is listed in the column titled "Count". The last row summarizes results over all 200 instances. The average CPU time for constraint generation was 3.5 s and the average improvement equaled $100\times(5852-5768)/5768=1.46\%$. A paired t-test for the difference between profit estimates over the 200 instances was significant at the 0.05 level with p-value of the order $10^{-5}$ and returned $[46.97,\ 122.08]$ as the 95% confidence interval for the true mean of the difference. The Lagrangian policy outperformed myopic in 125 of 200 instances.
4. Application to advanced scheduling

We now consider the following class of problems that we term Dynamic Stochastic Advanced Scheduling Problems (DSADSPs).

1. (Heterogeneous job types) The index set of job types is denoted $\mathcal{I}=\{1,2,\ldots,I\}$.
2. (Random arrivals) A random number $D_i$ of type $i$ jobs arrive in one time-period; $D_i$ takes values from the set $\{0,1,\ldots,N_i\}$, where $N_i$ are positive integers. The probability that $n_i$ type $i$ jobs arrive in one period is $p_i(n_i)$. Arrivals across types are independent.
3. (Multiple resource constraints) $\mathcal{J}=\{1,\ldots,J\}$ denotes the set of resources and $b_j$ (a positive integer) is the total quantity of resource $j\in\mathcal{J}$ available for consumption in each time-period. In order to perform each job of type $i$, a non-negative integer amount $a_{ij}$ of resource $j$ is required. We assume that for each job of type $i$, there is at least one resource $j$ such that $a_{ij}>0$.
4. (Immediate decision) Each arriving job must be scheduled to any one of the next $T$ periods, called the booking horizon. The planner can also reject/outsource/serve jobs using overtime. Henceforth we use the word "reject" to refer to a generic decision of this latter type.
5. (Costs) The following costs are incurred. (Delay cost) Type $i$ jobs are associated with a deadline $0\le D_i\le T$. The cost of scheduling a type $i$ job on day $1\le t\le T$ in the booking horizon is given by the non-negative function $C_i(t;D_i)$. We assume that $C_i(t;D_i)$ is non-decreasing in $t$. (Penalty cost of rejection) The planner incurs a penalty cost of $G_i$ per type $i$ job that is rejected.
6. (Total discounted cost objective) The goal is to schedule and/or reject arriving jobs so as to minimize the total discounted expected cost over an infinite horizon with discount factor $0<\alpha<1$.

This problem is similar to [27], who studied patient scheduling problems in a diagnostic setting. The key differences are that they had only one capacity constraint and all patients "consumed" equal, and in fact unit, capacity. In this sense, our problem statement is a generalization of [27]. Consequently, the state and action spaces in their MDP model are simpler in structure and smaller in size than ours below. We note that, similar to [27], the notion of service-duration is not explicit in our problem statement. It can, however, be incorporated implicitly through resource constraints. For example, in advanced scheduling of elective surgeries, hospitals reserve blocks of fixed durations, say 1 h or 2 h, depending on the type of surgery in an operating room. So in an 8-h day shift, eight 1-h surgeries, or four 2-h surgeries, or four 1-h surgeries and two 2-h surgeries, etc., can be performed. This kind of service-duration can be incorporated into our advanced scheduling framework through a resource constraint where time itself is the resource (see [10] for an application of advanced scheduling to elective surgeries).

4.1. A weakly coupled MDP model of DSADSPs

The state of our MDP model is defined as $x=(x^1,x^2,\ldots,x^I)$, where each $x^i$ is a $(T+1)$-dimensional vector written as $x^i=(x^i_0,x^i_1,\ldots,x^i_T)$. In this vector, $x^i_0$ is the number of type $i$ jobs that arrived in the previous period, and $x^i_k$, for $k=1,\ldots,T$, is the number of type $i$ jobs that are currently scheduled on the $k$th day in the booking horizon. As in the allocation scheduling model, since the maximum number of type $i$ jobs that can arrive in one period is $N_i$, we have $x^i_0\le N_i$. We therefore define the set $X^i_0=\{0,1,\ldots,N_i\}$. Also let $M_i=\min_{j\in\mathcal{J}}\lfloor b_j/a_{ij}\rfloor$ be the largest number of type $i$ jobs that could be scheduled in any given period. We define sets $X^i_1=X^i_2=\cdots=X^i_T=\{0,1,\ldots,M_i\}$ and let $X^i=X^i_0\times X^i_1\times\cdots\times X^i_T$. Finally, let $X=X^1\times X^2\times\cdots\times X^I$. The set $\mathcal{X}$ of all feasible states is then given by

$\mathcal{X}=\left\{x\in X : \sum_{i=1}^{I}a_{ij}x^i_t\le b_j,\ j\in\mathcal{J},\ t=1,2,\ldots,T\right\}. \quad (8)$

That is, state $x$ is feasible if and only if the numbers of jobs of different types that are currently scheduled for each period $t$ in the booking horizon respect the resource constraint for each resource $j$. The decision vector is $u=(u^1,u^2,\ldots,u^I)$, where each $u^i$ is a $T$-dimensional vector written as $u^i=(u^i_1,\ldots,u^i_T)$. Here, $u^i_t$, for $t=1,\ldots,T$, is the number of type $i$ jobs (among the ones that arrived in the previous period) that we choose to schedule on the $t$th day in the booking horizon. For $i\in\mathcal{I}$, $x^i\in X^i$ and $t=1,2,\ldots,T$, let $U^i_t(x^i)=\{0,1,\ldots,x^i_0\}$ and

$U^i(x^i)=\left\{u^i\in U^i_1(x^i)\times\cdots\times U^i_T(x^i) : \sum_{t=1}^{T}u^i_t\le x^i_0,\ \text{and}\ x^i_t+u^i_t\le M_i\ \text{for}\ t=1,2,\ldots,T\right\}.$

Thus $x^i_0-\sum_{t=1}^{T}u^i_t$ is the number of type $i$ jobs that are rejected. Also, for $x\in X$, let $U(x)=U^1(x^1)\times U^2(x^2)\times\cdots\times U^I(x^I)$. The set $\mathcal{U}(x)\subseteq U(x)$ of all decision vectors feasible in state $x\in\mathcal{X}$ is defined as $\mathcal{U}(x)=\{(u^1,u^2,\ldots,u^I)\in U(x) : \sum_{i=1}^{I}a_{ij}(u^i_t+x^i_t)\le b_j,\ j\in\mathcal{J},\ t=1,2,\ldots,T\}$.
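The following small Python sketch (ours; the array shapes and example numbers are assumptions) checks membership of a decision vector $u$ in the feasible set $\mathcal{U}(x)$ just defined.

```python
# Feasibility check for advanced scheduling decisions (Section 4.1).
# x[i] = (x_0, ..., x_T) for type i; u[i] = (u_1, ..., u_T);
# a[i][j] is resource consumption, b[j] availability, M[i] = min_j b_j // a_ij.
def is_feasible(x, u, a, b, M):
    I, T = len(u), len(u[0])
    for i in range(I):
        if sum(u[i]) > x[i][0]:                      # cannot book more than arrived
            return False
        if any(x[i][t + 1] + u[i][t] > M[i] for t in range(T)):
            return False
    for t in range(T):                               # per-day resource constraints
        for j in range(len(b)):
            if sum(a[i][j] * (u[i][t] + x[i][t + 1]) for i in range(I)) > b[j]:
                return False
    return True

# Example with I=2 types, T=3 days, J=1 resource (made-up data): prints True.
print(is_feasible(x=[(2, 1, 0, 0), (1, 0, 2, 0)],
                  u=[(0, 1, 1), (1, 0, 0)],
                  a=[[3], [2]], b=[12], M=[4, 6]))
```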
On selecting decision $u\in\mathcal{U}(x)$ in state $x$, the $i$th component of the next state equals $x'^i=(n_i,\ x^i_2+u^i_2,\ \ldots,\ x^i_T+u^i_T,\ 0)$ with probability $p_i(n_i)$, for $0\le n_i\le N_i$. Note here that $x^i_t$, for $t=2,3,\ldots,T$, is the number of type $i$ jobs previously scheduled on the $(t-1)$st day in the (new) booking horizon. Adding $u^i_t$, for $t=2,3,\ldots,T$, to this yields the total number of type $i$ jobs scheduled on these days. The last component in $x'^i$ is zero because this is the $(T+1)$st day in the (old) booking horizon and hence no jobs can be scheduled there. The next state then is $x'=(x'^1,x'^2,\ldots,x'^I)$ with probability $\prod_{i=1}^{I}p_i(n_i)$. The immediate cost incurred in this period is given by $f(x,u)=\sum_{i=1}^{I}f_i(x^i,u^i)$, where

$f_i(x^i,u^i)=G_i\left(x^i_0-\sum_{t=1}^{T}u^i_t\right)+\sum_{t=1}^{T}u^i_tC_i(t;D_i),\quad \forall x^i\in X^i,\ u^i\in U^i(x^i). \quad (9)$

Let $V(x)$ be the minimum infinite-horizon discounted expected cost incurred starting in state $x\in\mathcal{X}$. Then Bellman's equations for our MDP model are given by

$V(x)=\min_{u\in\mathcal{U}(x)}\left\{f(x,u)+\alpha\sum_{n_1=0}^{N_1}\sum_{n_2=0}^{N_2}\cdots\sum_{n_I=0}^{N_I}\left(\prod_{i=1}^{I}p_i(n_i)\right)V(x'^1,\ldots,x'^I)\right\},\quad x\in\mathcal{X}. \quad (10)$

Similar to the allocation scheduling MDP in Section 3.1, the above MDP is almost weakly coupled: it satisfies all requirements of a weakly coupled MDP except that the set of feasible states $\mathcal{X}$ in (8) does not have a Cartesian product structure, as state feasibility is defined through linking resource constraints. We first convert it into an equivalent weakly coupled MDP as follows. We extend the feasible state-space $\mathcal{X}$ to $X$ by including the (artificial) states $X\setminus\mathcal{X}$. In each $x^i\in X^i$, we allow the (artificial) action $\mu_i(x^i)$ whose components $\mu_i(x^i)_t$, for $t=1,\ldots,T$, are given by $(\mu_i(x^i)_1,\mu_i(x^i)_2,\ldots,\mu_i(x^i)_T)=(-x^i_1,-x^i_2,\ldots,-x^i_T)$. This particular choice of artificial actions allows us to conveniently extend the set $\mathcal{U}(x)$ for $x\in\mathcal{X}$ to the set $\mathcal{A}(x)$ for $x\in X$ in (11) below. We first define the set $A^i(x^i)=U^i(x^i)\cup\{\mu_i(x^i)\}$, and for states $x\in X$, we let $A(x)=A^1(x^1)\times A^2(x^2)\times\cdots\times A^I(x^I)$. We define the set of decisions feasible in state $x\in X$ as

$\mathcal{A}(x)=\left\{(u^1,u^2,\ldots,u^I)\in A(x) : \sum_{i=1}^{I}a_{ij}(u^i_t+x^i_t)\le b_j,\ j\in\mathcal{J},\ t=1,2,\ldots,T\right\}. \quad (11)$

Now note that, owing to the way artificial actions are defined, $\mu_i(x^i)_t+x^i_t=0$ for $t=1,2,\ldots,T$. This implies that (11) does not put any unnecessary restrictions on components of $u$ that are not artificial. After choosing a decision $u\in\mathcal{A}(x)$ whose $i$th component is the artificial decision $\mu_i(x^i)$ in a state with $i$th component $x^i\in X^i$, we (artificially) set the $i$th component of the next state to $x'^i=(n_i,0,\ldots,0)$ with probability $p_i(n_i)$. This particular choice of state was based on its simplicity. Finally, we set the $i$th component of the immediate cost incurred on choosing the artificial action component $\mu_i(x^i)$ in state component $x^i$ to a sufficiently large positive number $B$.

Let $\Phi(x)$ be the minimum infinite-horizon discounted expected cost incurred when starting the transformed MDP in state $x\in X$. Bellman's equations for the transformed MDP are given by

$\Phi(x)=\min_{u\in\mathcal{A}(x)}\left\{f(x,u)+\alpha\sum_{n_1=0}^{N_1}\sum_{n_2=0}^{N_2}\cdots\sum_{n_I=0}^{N_I}\left(\prod_{i=1}^{I}p_i(n_i)\right)\Phi(x'^1,\ldots,x'^I)\right\},\quad x\in X. \quad (12)$

It is easy to show that the addition of artificial states and actions does not alter the optimal values of, and optimal decisions in, the original states (see [10] for a proof). In particular, $V(x)=\Phi(x)$ for every $x\in\mathcal{X}$. Since this new MDP is weakly coupled, we apply Lagrangian relaxation.

4.2. Lagrangian relaxation of the advanced scheduling MDP

Let $\lambda_t=(\lambda_{t1},\lambda_{t2},\ldots,\lambda_{tJ})\ge 0$ be the Lagrange multipliers associated with the $J$ resources on days $1\le t\le T$. Also let $\lambda$ represent the vector $(\lambda_1,\lambda_2,\ldots,\lambda_T)$. For each $x\in X$, the Lagrangian relaxation of Bellman's equations (12) is given by

$\Phi^\lambda(x)=\min_{u\in A(x)}\left\{f(x,u)+\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}\left(\sum_{i=1}^{I}a_{ij}(u^i_t+x^i_t)-b_j\right)+\alpha E_{(x,u)}\Phi^\lambda(X')\right\},\quad \forall x\in X. \quad (13)$

The Lagrangian value function $\Phi^\lambda(x)$ can be expressed as

$\Phi^\lambda(x)=-\frac{1}{1-\alpha}\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}b_j+\sum_{i=1}^{I}\Phi^\lambda_i(x^i), \quad (14)$

where

$\Phi^\lambda_i(x^i)=\min_{u^i\in A^i(x^i)}\left\{f_i(x^i,u^i)+\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}a_{ij}(u^i_t+x^i_t)+\alpha E_{(x^i,u^i)}\Phi^\lambda_i(X'^i)\right\},\quad \forall x^i\in X^i. \quad (15)$

Let $Y^i=\{(x^i,u^i) : x^i\in X^i,\ u^i\in A^i(x^i)\}$ for $i\in\mathcal{I}$. The Lagrangian linear program is given by

$(\mathrm{LLP2})\quad H^{\lambda^*}(\beta)=\max\ -\frac{\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}b_j}{1-\alpha}+\sum_{i=1}^{I}\sum_{x^i\in X^i}\beta_i(x^i)\nu_i(x^i)$

subject to

$\nu_i(x^i)-\alpha E_{(x^i,u^i)}\nu_i(X'^i)-\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}a_{ij}(u^i_t+x^i_t)\le f_i(x^i,u^i),\quad (x^i,u^i)\in Y^i,\ i\in\mathcal{I},$

$\lambda\ge 0.$

This linear program has $TJ+\sum_{i=1}^{I}(N_i+1)(M_i+1)^T$ variables and at most $TJ+\sum_{i=1}^{I}\left(1+(N_i+1)(M_i+1)^T\left((N_i+1)^T+T\right)\right)$ constraints. Thus the numbers of variables and constraints are linear in $I$ but unfortunately still exponential in $T$. Thus (LLP2) is intractable for realistic values of the booking horizon $T$. In Section 4.3, we therefore apply the value function approximation and constraint generation method developed in Section 2.

4.3. Affine approximation and constraint generation

We simplify (LLP2) using the affine approximation $\nu_i(x^i)\approx\gamma_i+\sum_{t=0}^{T}c_{it}x^i_t$, where $\gamma_i$ and $c_{it}$ are unknown coefficients. The coefficients $c_{it}$ can be interpreted as the marginal cost of having a type $i$ job scheduled in period $t$ in the booking horizon. This simplifies (LLP2) to the approximate Lagrangian linear program

$(\mathrm{ALLP2})\quad H=\max\ -\frac{\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}b_j}{1-\alpha}+\sum_{i=1}^{I}\gamma_i+\sum_{i=1}^{I}\sum_{t=0}^{T}\left(\sum_{x^i\in X^i}\beta_i(x^i)x^i_t\right)c_{it}$

subject to

$(1-\alpha)\gamma_i+\sum_{t=0}^{T}\left(x^i_t-\alpha E_{(x^i,u^i)}X'^i_t\right)c_{it}-\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda_{tj}a_{ij}(u^i_t+x^i_t)\le f_i(x^i,u^i),\quad (x^i,u^i)\in Y^i,\ i\in\mathcal{I},$

$\lambda\ge 0,\quad c_{it}\ge 0,\quad i\in\mathcal{I},\ t=0,1,\ldots,T.$

Unlike (ALLP1), it is not easy to identify a set of initial constraints $C_0$ in (ALLP2) that guarantees the existence of an optimal solution to $(\mathrm{ALLP2})_{C_0}$. Empirically, we found after running some initial experiments that starting with certain peculiar constraints corresponding to state-action pairs that include artificial actions helps in this regard (see our numerical experiments in Section 4.5). Therefore, unlike in (ALLP1), we did not drop artificial actions from (ALLP2).
Table 2
Results for 10 advanced scheduling problem sets. For each problem set, the cost estimates reported are averages over 50 randomly generated problem instances. Complete results for these 500 instances are provided in Tables 4 and 5 in Appendix B.

#    I    J    r     CPU (s)   Lagrangian   Myopic   % Improvement   Significant?   Count
1    5    1    0.25  10        15748        18400    14.41           Yes            35
2    5    1    0.5   9         9895         11767    15.91           Yes            32
3    5    2    0.25  9         16120        17126    5.87            Yes            30
4    5    2    0.5   9         12945        13920    7.00            No             25
5    10   1    0.25  30        9737         12812    24.00           Yes            34
6    10   1    0.5   32        1008         903      -11.63          No             11
7    10   2    0.25  29        14978        18853    20.55           Yes            41
8    10   2    0.5   31        1976         2118     6.70            No             13
9    15   1    0.25  60        4758         6661     28.57           Yes            26
10   15   2    0.25  66        4625         8558     45.96           Yes            29
All  -    -    -     28.5      9179         11112    17.39           Yes            276
Suppose that after solving $(\mathrm{ALLP2})_C$ for some set of constraints $C$, we obtain $\gamma^C,c^C,\lambda^C$ as its optimal solution. Similar to (ALLP1), we define $\bar Y^i\triangleq\{(x^i,u^i) : x^i\in X^i,\ u^i\in U^i(x^i)\}$ as the subset of $Y^i$ that excludes artificial actions. The violation in the functional constraint corresponding to any $(x^i,u^i)\in\bar Y^i$ for any $i\in\mathcal{I}$ is given by

$Z^1_i(x^i,u^i)\triangleq(1-\alpha)\gamma^C_i+\sum_{t=0}^{T}\left(x^i_t-\alpha E_{(x^i,u^i)}X'^i_t\right)c^C_{it}-\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda^C_{tj}a_{ij}(u^i_t+x^i_t)-f_i(x^i,u^i).$

To simplify this, recall that $X'^i_0=n_i$ with probability $p_i(n_i)$ for $0\le n_i\le N_i$, and $X'^i_t=x^i_{t+1}+u^i_{t+1}$ for $t=1,2,\ldots,T-1$ with probability one. Finally, $X'^i_T=0$ with probability one. Consequently,

$\sum_{t=0}^{T}E_{(x^i,u^i)}X'^i_t\,c^C_{it}=c^C_{i0}ED_i+\sum_{t=1}^{T-1}c^C_{it}(x^i_{t+1}+u^i_{t+1}).$

Recalling that $f_i(x^i,u^i)=G_i\left(x^i_0-\sum_{t=1}^{T}u^i_t\right)+\sum_{t=1}^{T}u^i_tC_i(t,D_i)$, we get

$Z^1_i(x^i,u^i)=(1-\alpha)\gamma^C_i+(x^i_0-\alpha ED_i)c^C_{i0}+\sum_{t=1}^{T-1}\left(x^i_t-\alpha(x^i_{t+1}+u^i_{t+1})\right)c^C_{it}+x^i_Tc^C_{iT}-\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda^C_{tj}a_{ij}(u^i_t+x^i_t)-G_i\left(x^i_0-\sum_{t=1}^{T}u^i_t\right)-\sum_{t=1}^{T}u^i_tC_i(t,D_i).$

After algebraic simplification, the above is seen to be of the form $Z^1_i(x^i,u^i)=\sum_{t=0}^{T}d^C_{it}x^i_t+\sum_{t=1}^{T}E^C_{it}u^i_t+e^C_i$. Let $Z^{1*}_i$ be the maximum violation among all constraints in the set $\bar Y^i$. This is given by the optimal value of the linear integer program

$Z^{1*}_i=\max\ \sum_{t=0}^{T}d^C_{it}x^i_t+\sum_{t=1}^{T}E^C_{it}u^i_t+e^C_i$

subject to

$x^i_0\le N_i,\quad \sum_{t=1}^{T}u^i_t\le x^i_0,$

$u^i_t+x^i_t\le M_i,\quad t=1,2,\ldots,T,$

$u^i_t\ge 0$ and integer, for $t=1,2,\ldots,T,$

$x^i_t\ge 0$ and integer, for $t=0,1,\ldots,T. \quad (16)$

Now consider any state-action pair $(x^i,\mu_i(x^i))\in Y^i\setminus\bar Y^i$ for any $i\in\mathcal{I}$. The violation in the corresponding constraint is given by $Z^2_i(x^i,\mu_i(x^i))\triangleq(1-\alpha)\gamma^C_i+(x^i_0-\alpha ED_i)c^C_{i0}+\sum_{t=1}^{T}c^C_{it}x^i_t-B$, which follows by algebra identical to that for $Z^1_i(x^i,u^i)$ after noting that $\mu_i(x^i)_t+x^i_t=0$ for $t=1,2,\ldots,T$ and that $f_i(x^i,\mu_i(x^i))=B$. Let $Z^{2*}_i$ be the maximum violation among all constraints in the set $Y^i\setminus\bar Y^i$. It is easy to observe from the definition of $Z^2_i(x^i,\mu_i(x^i))$ that this maximum violation occurs when $x^i_0=N_i$ and $x^i_t=M_i$ for $t=1,2,\ldots,T$. Thus $Z^{2*}_i=(1-\alpha)\gamma^C_i+(N_i-\alpha ED_i)c^C_{i0}+M_i\sum_{t=1}^{T}c^C_{it}-B$. Now we set $Z^*_i=\max(Z^{1*}_i,Z^{2*}_i)$ and the maximum constraint violation problem reduces to $Z^*\triangleq\max_{i\in\mathcal{I}}Z^*_i$. Also note that once the state $x^i_0=N_i$, $x^i_t=M_i$ for $t=1,2,\ldots,T$ for each $i\in\mathcal{I}$ and the corresponding artificial actions have been included in the set of constraints $C$, and an optimal solution to $(\mathrm{ALLP2})_C$ has been found, all artificial actions can be removed from consideration in subsequent iterations of constraint generation. This is because all constraints corresponding to state-action pairs that include artificial actions will be satisfied in these iterations. Consequently, the constraint violation problem in these iterations simplifies to $\max_{i\in\mathcal{I}}Z^{1*}_i$.

4.4. Retrieving decisions from the approximate value function

Let $\lambda^*_{tj}$ for $t=1,2,\ldots,T$ and $j\in\mathcal{J}$, $\gamma^*_i$ for $i\in\mathcal{I}$, and $c^*_{it}$ for $t=0,1,\ldots,T$ and $i\in\mathcal{I}$, denote the values of variables in (ALLP2) after terminating constraint generation. Now suppose that for some state $x\in\mathcal{X}$, we wish to compute a decision vector that is myopic with respect to the approximation

$V(x')=\Phi(x')\approx\Phi^{\lambda^*}(x')=-\frac{\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda^*_{tj}b_j}{1-\alpha}+\sum_{i=1}^{I}\Phi^{\lambda^*}_i(x'^i)\approx-\frac{\sum_{t=1}^{T}\sum_{j=1}^{J}\lambda^*_{tj}b_j}{1-\alpha}+\sum_{i=1}^{I}\left(\gamma^*_i+\sum_{t=0}^{T}c^*_{it}x'^i_t\right).$

After substituting this in the right hand side of Bellman's equation (12) and then following steps similar to Section 3.4, the decision retrieval problem simplifies to the linear integer program

$\min\ \sum_{i=1}^{I}\left[(C_i(1;D_i)-G_i)u^i_1+\sum_{t=2}^{T}\left(C_i(t;D_i)-G_i+\alpha c^*_{i,t-1}\right)u^i_t\right]$

subject to

$\sum_{t=1}^{T}u^i_t\le x^i_0,\quad i\in\mathcal{I},$

$\sum_{i=1}^{I}a_{ij}u^i_t\le b_j-\sum_{i=1}^{I}a_{ij}x^i_t,\quad j\in\mathcal{J},\ t=1,2,\ldots,T,$

$u^i_t\ge 0$ and integer, $\quad i\in\mathcal{I},\ t=1,2,\ldots,T.$

4.5. Numerical experiments

We again used a test bed of randomly generated problems to compare policies. We used $I\in\{5,10,15\}$ and $J\in\{1,2\}$. $N_i$ was set
to $DU[1,20]$ for each $i$. Arrival probabilities $p_i(\cdot)$, resource consumptions $a_{ij}$, and resource availabilities were constructed exactly as in allocation scheduling. We used tightness ratio $\rho\in\{0.25,0.5\}$ for $I=5,10$ and $\rho=0.25$ for $I=15$.³ The booking horizon was set to $T=10$ and $D_i$ was set to $DU[1,T]$. Delay costs were calculated using the expression $C_i(t;D_i)=F_i\max(t-D_i,0)$, for $t=1,\ldots,T$, from [27], with $F_i=DU[1,100]$. We set $G_i=r_iC_i(T;D_i)$, where $r_i$ was a uniform $(0,2)$ random variable. As in allocation scheduling, we used $\alpha=0.85$. We solved 50 randomly generated problem instances for each combination of $I$, $J$, $\rho$. The initial set of constraints in our constraint generation algorithm was constructed using state vectors $x^i=(N_i,s,s,\ldots,s)$ with $s\in\{0,1,\ldots,M_i\}$ for all $i\in\mathcal{I}$, and the real and artificial action vectors $u^i=(0,0,\ldots,0)$ and $u^i=(-s,-s,\ldots,-s)$ for all $i\in\mathcal{I}$, respectively. We terminated constraint generation when $100\,(I/(1-\alpha))\,|Z^*/H^C|$ dropped below 0.5%. The infinite-horizon discounted expected cost incurred by any policy was estimated by averaging the total discounted cost incurred over 200 independent simulations of 50 periods. Each simulation was started in the "empty" state $x^i_t=0$ for $t=0,1,\ldots,T$. These cost estimates, averaged over the 50 independent problem instances for each problem set, are reported in Table 2. The table also lists the average CPU seconds needed for constraint generation and the percentage improvement in cost obtained by the Lagrangian policy.

The table shows that the Lagrangian policy incurs lower cost than the myopic policy in nine of the ten problem sets. In fact, in the one problem set in which the Lagrangian policy was worse than the myopic policy, the difference was not statistically significant at the 0.05 level according to a paired t-test. On the other hand, the improvement obtained by the Lagrangian policy over the myopic policy was significant in seven problem sets as per paired t-tests at the 0.05 level. The three problem sets in which the performance difference was not significant are such that, owing to our problem generation method, the resource availabilities are relatively high and hence a simple decision rule may be adequate. That is, from a computational optimization perspective, these three problem sets perhaps do not have sufficient tradeoffs. Importantly, the Lagrangian policy performed statistically at least as well as the myopic policy in every problem set that we considered. For each problem set, the table also lists the number of instances, out of 50, for which the Lagrangian policy outperformed the myopic. The last row of the table summarizes results over all 500 problem instances. The average CPU time for constraint generation was 28.5 s and the improvement achieved by the Lagrangian policy equaled $100\times(11112-9179)/11112=17.39\%$. A paired t-test for the difference between myopic and Lagrangian cost estimates over the 500 instances was statistically significant at the 0.05 level with p-value of the order $10^{-19}$ and returned $[1518.4,\ 2347.1]$ as the 95% confidence interval for the true mean of the difference. The Lagrangian policy outperformed the myopic policy in 276 of the 500 instances.

³ We empirically observed that larger values of $\rho$ than those considered here generated excessive amounts of resources in relation to our job arrival streams. This essentially removed bottlenecks from many of the problem instances, leading to the existence of zero-cost policies and rendering the problem uninteresting.
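The delay-cost rule from [27] used above is simple enough to state in two lines of Python; the numbers in the example are made up.

```python
# C_i(t; D_i) = F_i * max(t - D_i, 0): zero until the deadline, then linear.
def delay_cost(t, deadline, F):
    return F * max(t - deadline, 0)

print([delay_cost(t, deadline=3, F=10) for t in range(1, 8)])
# -> [0, 0, 0, 10, 20, 30, 40]
```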
5. Conclusions and future work

We proposed a hybrid Lagrangian relaxation and constraint generation approach for the approximate solution of large-scale weakly coupled MDPs. This method was applied to two classes of dynamic stochastic scheduling problems. The performance of the resulting Lagrangian decisions was compared with that of myopic decisions on a test bed of 14 problem sets of 50 instances each using simulations. Lagrangian decisions performed statistically at least as well as myopic decisions in each of the 14 problem sets. The relative benefit of Lagrangian decisions was higher in advanced scheduling problems than in allocation scheduling problems.

As discussed in [10], several additional features could be incorporated into our problem statements for allocation and advanced scheduling. It would be interesting to investigate whether the resulting MDP models continue to be weakly coupled so that our hybrid Lagrangian relaxation and constraint generation approach could be applied. More generally, our hope is that researchers will find new applications of large-scale weakly coupled MDPs and that our method will help to find their approximate solutions.
Acknowledgments This research was funded in part by the Department of Radiology at the University of Washington Medical Center. The authors also appreciate financial support from the National Science Foundation through Grant CCF-0830380. The authors thank the IBM Academic Initiative for making their CPLEX Optimization Studio available at no charge.
Appendix A. Background on weakly coupled MDPs A weakly coupled MDP includes several independent subMDPs joined only through linking constraints [2,18] as described next. Consider an MDP with state-space X ¼ X 1 X 2 X I , where I is a positive integer and X 1 ,X 2 , . . . ,X I are finite non-empty sets. We use I to denote the set f1, 2, . . . ,Ig. For each i A I , let U i ðxi Þ be finite non-empty sets indexed by xi A X i . Also, for each state x ¼ ðx1 , x2 , . . . ,xI Þ A X, let U(x) equal to the Cartesian product U 1 ðx1 Þ U 2 ðx2 Þ U I ðxI Þ. Let J be a positive integer and we use J to denote the set f1, 2, . . . ,Jg in the sequel. Suppose b A RJ . Let g i : ðX i U i Þ-RJ be functions defined for each i A I . The set of feasible actions in state x A X is given by ( ) I X 1 1 2 2 I I 1 2 I i i U ðxÞ ¼ ðu , u , . . . ,u Þ AU ðx ÞU ðx Þ U ðx Þ : g i ðx ,u Þ r b : i¼1
ð17Þ For all x A X and u A UðxÞ, the immediate expected reward rðx,uÞ is P given by rðx,uÞ ¼ Ii ¼ 1 r i ðxi ,ui Þ, where r i : ðX i U i Þ-R is called the immediate expected reward function for component i A I . That is, the immediate expected reward is additively separable. For any x,x0 A X and u A UðxÞ, the transition probability pðx0 ; x,uÞ is Q multiplicatively separable. That is, pðx0 ; x,uÞ ¼ Ii ¼ 1 pi ðx0i ; xi ,ui Þ, where pi ðx0i ; xi ,ui Þ is the probability that the ith component MDP makes a transition from state xi A X i to state x0i A X i on choosing action ui A U i ðxi Þ in state xi. For any x A X, let V(x) be the maximum infinite-horizon discounted expected reward accumulated starting in state x. Thus V : X-R is the optimal value function of this weakly coupled MDP. From the general theory of MDPs [29,31], it is known that this optimal value function is the unique solution of Bellman’s equations ! I I X X Y VðxÞ ¼ max r i ðxi ,ui Þ þ a pi ðx0i ; xi ,ui Þ u A U ðxÞ
i¼1
!
Vðx01 , x02 , . . . ,x0I Þ ,
x0 A X
i¼1
8x A X:
ð18Þ
Feasible decisions that achieve the maximum in (18) define an optimal policy for the MDP. We use Eðx,uÞ to denote an expectation with respect to the probability mass function of the random
2332
Y. Gocgun, A. Ghate / Computers & Operations Research 39 (2012) 2323–2336
vector X0 that represents the next state given that decision u A U ðxÞ was chosen in state x A X. Then the above Bellman’s equations can be rewritten more compactly as ! I X VðxÞ ¼ max r i ðxi ,ui Þ þ aEðx,uÞ VðX0 Þ , 8x A X: ð19Þ u A U ðxÞ
i¼1
In most problems, exact solution of these equations is intractable because X and U ðxÞ for x A X are both exponential in I. Two approximate solution techniques have been proposed to alleviate this curse of dimensionality: the linear programming approach and the Lagrangian approach. The reader is referred to [2,18] and references therein for more details on these. Both these methods yield approximations to the optimal value function V. More specifically, these approximations are statewise upper bounds on optimal values V(x) for x A X. The linear programming method provides a tighter bound but is computationally far more demanding than the Lagrangian method. In fact, when I is large, only the Lagrangian approach is computationally viable (see [2] for theoretical and empirical comparisons of the two methods). Thus our work here builds upon the Lagrangian method and we first briefly review it here.
The relaxation gives a statewise bound on the optimal value function of the weakly coupled MDP; that is, for all x A X, V l ðxÞ ZVðxÞ for any l Z 0. Let bðxÞ 4 0 be a sequence of positive P real numbers indexed by states x A X such that x A X bðxÞ ¼ 1. The weighted values of the original MDP and its relaxation are defined P P as HðbÞ ¼ x A X bðxÞVðxÞ and Hl ðbÞ ¼ x A X bðxÞV l ðxÞ, respectively, and Hl ðbÞ Z HðbÞ for any l Z0. Moreover, for each fixed iA I , let bi ðxi Þ 40 be real numbers indexed by state components xi A X i P such that bi ðxi Þ ¼ fx0 :x0i ¼ xi g bðx0 Þ. Note that X X X X bi ðxi Þ ¼ bðx0 Þ ¼ bðx0 Þ ¼ 1: xi A X i
xi A X i fx0 :x0i ¼ xi g
Also let Y i ¼ fðxi ,ui Þ : xi A X i ,ui A U i ðxi Þg. Then for each fixed i A I , the values V li ðxi Þ for all xi A X i equal to the optimal values of variables ni ðxi Þ in the linear program X bi ðxi Þni ðxi Þ min xi A X i
ni ðxi ÞaEðxi ,ui Þ ni ðX0i Þ þ lT g i ðxi ,ui Þ Z ri ðxi ,ui Þ, for ðxi ,ui Þ A Y i : After collating the above linear programs over all i, we obtain PJ I X X j ¼ 1 lj bj Hl ðbÞ ¼ þ min bi ðxi Þni ðxi Þ 1a i i¼1 i x AX
The relaxation gives a statewise bound on the optimal value function of the weakly coupled MDP; that is, for all $x \in X$, $V^{\lambda}(x) \ge V(x)$ for any $\lambda \ge 0$. Let $b(x) > 0$ be a sequence of positive real numbers indexed by states $x \in X$ such that $\sum_{x \in X} b(x) = 1$. The weighted values of the original MDP and its relaxation are defined as $H(b) = \sum_{x \in X} b(x)V(x)$ and $H^{\lambda}(b) = \sum_{x \in X} b(x)V^{\lambda}(x)$, respectively, and $H^{\lambda}(b) \ge H(b)$ for any $\lambda \ge 0$. Moreover, for each fixed $i \in \mathcal{I}$, let $b_i(x_i) > 0$ be real numbers indexed by state components $x_i \in X_i$ such that $b_i(x_i) = \sum_{\{x' : x'_i = x_i\}} b(x')$. Note that

$$\sum_{x_i \in X_i} b_i(x_i) = \sum_{x_i \in X_i} \; \sum_{\{x' : x'_i = x_i\}} b(x') = \sum_{x' \in X} b(x') = 1.$$

Also let $Y_i = \{(x_i,u_i) : x_i \in X_i,\, u_i \in U_i(x_i)\}$. Then, for each fixed $i \in \mathcal{I}$, the values $V^{\lambda}_i(x_i)$ for all $x_i \in X_i$ equal the optimal values of the variables $\nu_i(x_i)$ in the linear program

$$\min \sum_{x_i \in X_i} b_i(x_i)\nu_i(x_i)$$
$$\nu_i(x_i) - \alpha E_{(x_i,u_i)} \nu_i(X'_i) + \lambda^T g_i(x_i,u_i) \ge r_i(x_i,u_i), \quad (x_i,u_i) \in Y_i.$$

After collating the above linear programs over all $i$, we obtain

$$H^{\lambda}(b) = \frac{\sum_{j=1}^{J} \lambda_j b_j}{1-\alpha} + \sum_{i=1}^{I} \min \sum_{x_i \in X_i} b_i(x_i)\nu_i(x_i)$$
$$\nu_i(x_i) - \alpha E_{(x_i,u_i)} \nu_i(X'_i) + \sum_{j=1}^{J} \lambda_j g_{ij}(x_i,u_i) \ge r_i(x_i,u_i), \quad (x_i,u_i) \in Y_i, \ i \in \mathcal{I}.$$
In order to identify a $\lambda \ge 0$ that yields the tightest upper bound on $H(b)$, an outer minimization over $\lambda$ is performed to get the Lagrangian linear program

$$(\mathrm{LLP}) \quad H^{\lambda^*}(b) \triangleq \min \ \frac{\lambda^T b}{1-\alpha} + \sum_{i=1}^{I} \sum_{x_i \in X_i} b_i(x_i)\nu_i(x_i)$$
$$\nu_i(x_i) - \alpha E_{(x_i,u_i)} \nu_i(X'_i) + \sum_{j=1}^{J} \lambda_j g_{ij}(x_i,u_i) \ge r_i(x_i,u_i), \quad (x_i,u_i) \in Y_i, \ i \in \mathcal{I},$$
$$\lambda \ge 0.$$
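Since (LLP) is an ordinary linear program, it can be assembled mechanically from the component data and handed to any LP solver. The following sketch uses scipy.optimize.linprog; the variable stacking $z = (\lambda, \nu_1, \ldots, \nu_I)$ and all names are our own illustration of the formulation above, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def solve_llp(r, g, P, beta_i, b, alpha):
    """Assemble and solve (LLP).

    r[i]      : (S_i, A_i) rewards;  g[i] : (S_i, A_i, J) coupling terms
    P[i]      : (S_i, A_i, S_i) transition kernels
    beta_i[i] : (S_i,) state weights b_i(x_i);  b : (J,) resource levels b_j
    Decision variables are stacked as z = (lambda, nu_1, ..., nu_I).
    """
    I, J = len(r), len(b)
    sizes = [ri.shape[0] for ri in r]
    offs = np.cumsum([J] + sizes)[:-1]          # offset of nu_i within z
    n = J + sum(sizes)

    c = np.zeros(n)
    c[:J] = b / (1.0 - alpha)                   # lambda^T b / (1 - alpha)
    for i in range(I):
        c[offs[i]:offs[i] + sizes[i]] = beta_i[i]

    rows, rhs = [], []
    for i in range(I):
        S_i, A_i = r[i].shape
        for x in range(S_i):
            for u in range(A_i):
                row = np.zeros(n)
                # -(nu_i(x) - alpha E nu_i(X') + sum_j lam_j g_ij) <= -r_i(x,u)
                row[offs[i] + x] -= 1.0
                row[offs[i]:offs[i] + S_i] += alpha * P[i][x, u, :]
                row[:J] -= g[i][x, u, :]
                rows.append(row)
                rhs.append(-r[i][x, u])

    bounds = [(0, None)] * J + [(None, None)] * sum(sizes)  # lambda >= 0, nu free
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    assert res.status == 0
    lam = res.x[:J]
    nu = [res.x[offs[i]:offs[i] + sizes[i]] for i in range(I)]
    return lam, nu, res.fun                     # multipliers, values, H^{lam*}(b)
```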
The numbers of variables and constraints in (LLP) are $J + \sum_{i=1}^{I} |X_i|$ and $\sum_{i=1}^{I} \sum_{x_i \in X_i} |U_i(x_i)|$, respectively. In particular, both are only linear in $I$. Thus (LLP) can be solved efficiently in some cases, for example in bandit-like problems [2] and the scheduling problems in [12]. Solving (LLP) yields optimal values $\lambda^*$ and $V^{\lambda^*}_i(x_i)$ of the variables. These are plugged into (21) to obtain $V^{\lambda^*}(x)$. A Lagrangian policy is a policy that is myopic with respect to $V^{\lambda^*}$. That is, it is constructed, in theory, by plugging $V^{\lambda^*}$ into the right-hand side of (19) and solving the resulting static optimization problem for every $x \in X$. Performing this calculation for every $x \in X$ is, however, intractable owing to the size of $X$. Fortunately, in practice, a whole policy is not needed a priori; it suffices to select a decision vector in a particular state only if and when the system reaches that state during an actual system run. This is typically accomplished by designing an efficient, problem-specific algorithm.
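For concreteness, one generic (if naive) way to select the Lagrangian decision in the currently visited state $x$ is sketched below: each feasible decision vector is scored by the right-hand side of (19) with $V^{\lambda^*}$ from (21) substituted in, and the separable structure reduces the expectation to componentwise one-step expectations. The brute-force enumeration over $U(x)$ is our own hedged illustration; the problem-specific algorithms mentioned above replace it with an efficient search.

```python
import itertools
import numpy as np

def lagrangian_decision(x, r, P, V, lam, b, alpha, action_sets, feasible):
    """Myopic decision w.r.t. V^{lambda*} in the visited state x.

    x           : tuple (x_1, ..., x_I) of component states
    r[i], P[i]  : component rewards (S_i, A_i) and kernels (S_i, A_i, S_i)
    V[i]        : (S_i,) optimal componentwise values from (LLP)
    action_sets : action_sets[i](x_i) lists U_i(x_i); feasible(x, u) tests u in U(x)
    """
    I = len(x)
    kappa = lam @ b / (1.0 - alpha)            # constant term of (21)
    best_u, best_val = None, -np.inf
    for u in itertools.product(*[action_sets[i](x[i]) for i in range(I)]):
        if not feasible(x, u):
            continue
        # By separability, E_{(x,u)} V^{lam*}(X') = kappa + sum_i P_i[x_i,u_i,:] @ V_i.
        val = (sum(r[i][x[i], u[i]] for i in range(I))
               + alpha * (kappa + sum(P[i][x[i], u[i], :] @ V[i] for i in range(I))))
        if val > best_val:
            best_u, best_val = u, val
    return best_u
```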
Appendix B. Raw data for Tables 1 and 2
Here we include the complete numerical results that were used to obtain the summary statistics presented in Tables 1 and 2.
Table 3
Estimates of profits obtained by Lagrangian (L) and myopic (M) policies for the 200 allocation scheduling test problems summarized in Table 1. The numbers are rounded to save space. Each line below lists the values for instances 1–50 of one problem set, in order.

Problem set 1, L: 5567 1830 1983 12,023 107 7397 4987 213 5488 4524 5042 623 400 839 3930 2756 4603 4670 8477 9537 4634 300 950 10,248 3147 4720 2361 5783 5403 4983 3807 3906 1091 2733 7703 8745 1339 6349 7006 2658 1900 5964 5175 4262 5663 229 1064 9982 3067 4205
Problem set 1, M: 5540 1754 1802 12,041 98 7442 4988 203 5423 4213 5065 462 386 701 4073 2669 4535 4596 8513 9594 4666 287 806 10,184 2930 4531 2029 5824 5450 5019 3891 3868 1137 2634 7675 7117 1471 5994 6970 2710 1503 5812 5244 4313 5759 149 1046 9979 3090 3970
Problem set 2, L: 19,926 7304 5521 5845 5790 8188 21,951 7161 10,157 6302 5373 9389 12,187 3673 3668 3830 8141 9299 8176 6339 9092 9190 9466 10,836 8931 6467 6939 6682 7707 7213 5832 7207 18,603 11,643 6311 6755 8754 9129 4551 8570 6877 9090 11,320 10,102 8437 5053 5901 5618 3237 13,349
Problem set 2, M: 19,653 7388 5341 5810 5758 8224 21,937 7264 9905 6199 5390 9324 11,738 3657 3649 3893 8126 9333 8156 6210 8988 9172 9435 10,843 9016 6178 6868 5688 7594 7322 5624 7190 18,539 9531 6181 6782 8747 8950 4399 7014 6953 8999 11,398 10,077 8546 5032 5747 5651 3161 13,468
Problem set 3, L: 1866 3211 1452 4988 1274 5803 2963 2676 996 6115 2594 139 3490 1082 3603 8292 1646 5611 3218 417 946 3920 2713 7099 3201 8985 478 3744 3083 3943 5066 1939 1701 1760 3694 7143 2307 1250 5648 3136 3675 7407 3088 3931 1643 6073 6050 1607 3547 3486
Problem set 3, M: 1981 3283 1494 4911 1236 5890 2961 2606 839 6004 2569 62 3511 1217 3576 8358 1662 5653 3242 425 486 3888 2747 7095 3328 9087 492 3766 2913 3943 4988 2086 1681 1707 3692 7202 2326 1218 5358 3241 3670 7399 3092 3620 1646 6141 6047 1457 3566 3317
Problem set 4, L: 6325 13,657 16,487 8858 1649 9016 5627 3292 1988 3499 6230 8106 9040 6064 12,433 9173 7891 3675 5647 5710 13,957 3089 10,309 10,174 12,702 5823 8639 4554 5308 9648 6401 4185 7067 8182 3336 4042 7765 8461 7220 4385 6231 16,370 2098 10,820 6442 6428 9409 7234 3062 8488
Problem set 4, M: 6258 13,695 16,485 8682 1539 8951 5668 2989 2091 3648 5858 8136 9041 5995 12,350 9104 7810 3601 4397 5751 13,977 3092 10,172 9994 12,559 5761 8578 4555 5432 9538 6354 4004 6994 8074 3314 4030 7695 8421 7252 4343 6172 16,422 2107 10,734 6311 6487 8452 7193 3046 8403
Table 4
Estimates of costs incurred by Lagrangian (L) and myopic (M) policies for the first 250 of the 500 advanced scheduling test problems summarized in Table 2. The numbers are rounded to save space. Each line below lists the values for instances 1–50 of one problem set, in order.

Problem set 1, L: 9496 862 19,176 750 31,061 3165 8143 7558 14,606 45,067 5248 9201 30,896 1154 11,072 28,192 90 4634 19,737 16,513 15,285 36,938 4826 7144 3335 3074 35,161 392 9214 26,768 3434 819 10,134 40,860 6853 35,861 565 6791 8819 13,461 25,242 4117 19,376 30,527 9263 49,160 32,582 14,092 60,787 5919
Problem set 1, M: 19,234 603 24,838 1230 27,077 6142 14,820 3374 20,760 39,926 6955 9849 29,969 2500 10,478 48,255 626 4722 19,390 16,373 12,623 41,182 7293 10,121 7851 6129 31,228 1761 11,904 35,226 3901 930 8762 51,701 8007 41,964 1946 6174 12,595 21,705 36,793 7029 20,069 24,186 14,460 58,165 39,962 24,977 59,724 4542
Problem set 2, L: 10,919 12,064 2390 3650 8505 4660 175 43,585 562 15,567 10,122 16,867 2175 6570 213 4690 27,818 23,236 2238 6004 26,862 9 14,778 1487 15,556 12,263 346 32,469 6602 3577 2759 2653 4499 1762 3951 3907 15 21,510 24,829 5717 4464 454 556 24,022 430 10,979 968 7467 38,405 19,482
Problem set 2, M: 6119 15,084 2462 647 8773 11,615 804 42,849 1060 21,321 13,561 20,345 6915 13,802 4023 599 37,301 31,280 4022 5536 34,885 760 32,890 919 15,726 12,204 60 22,240 6434 4655 3187 900 5114 12,856 4370 3649 0 30,949 26,300 5186 4496 60 100 24,342 6902 11,175 1891 5281 35,046 27,671
Problem set 3, L: 10,242 36,328 40,499 12,417 2128 5715 7444 6350 22,027 33,124 445 43,458 35,485 14,821 18,732 18,829 7885 7945 12,337 16,021 3528 17,287 6264 5942 38,891 1475 36,586 30,652 8817 35,322 10,503 2814 14,301 14,561 31,173 13,862 34,296 867 22,370 28,465 2411 18,100 15,138 227 15,800 14,731 14,866 2326 7802 4365
Problem set 3, M: 10,075 36,071 38,920 16,777 4433 7179 8422 6321 29,949 31,807 346 40,815 33,702 18,047 19,608 20,495 8799 8517 15,190 16,900 3624 11,877 6334 6345 35,828 4722 33,324 28,136 11,178 34,511 13,277 3525 16,118 15,300 31,872 13,289 31,975 465 27,119 29,969 2823 25,030 21,890 0 30,250 13,259 14,137 1775 10,454 5520
Problem set 4, L: 5831 20,047 3702 15,684 21,907 7863 5854 49,354 25,033 38,578 18,612 24,693 12,585 8341 857 7022 3822 15,604 16,372 2002 1001 27,588 5989 2373 11,234 3071 18,221 6902 3733 7894 3370 20,587 303 11,554 12,465 18,965 7960 17,496 7851 26,223 12,732 17,985 5214 42,981 12,513 6027 1330 9569 12,102 8282
Problem set 4, M: 1770 22,608 2592 14,521 23,360 10,260 5820 44,923 30,129 44,978 21,754 49,103 12,174 8212 1664 5292 2329 17,167 16,277 729 528 34,605 5317 1575 17,446 8051 20,866 7067 5110 5350 4778 19,145 11 20,880 11,461 13,743 8758 15,213 5882 28,182 14,278 8738 9796 39,042 7170 2985 2817 10,674 15,484 15,404
Problem set 5, L: 2593 23,334 2235 1046 8241 11,357 4053 15,200 17,235 4305 1364 1546 12,801 972 3656 4503 5239 3328 4722 5190 6862 13,453 19,977 6881 4640 27,124 6626 3913 10,368 5688 1944 389 9136 9224 23,162 4983 6964 15,408 41,997 4192 7097 5220 2500 24,611 7277 14,712 4607 24,531 25,386 15,090
Problem set 5, M: 5902 24,917 294 14,824 19,282 10,668 6937 19,087 21,780 7897 32 579 15,300 624 6507 5403 6848 6264 9473 4790 7176 24,724 37,783 6018 2498 24,234 5886 12,867 17,404 9448 1580 18 14,944 14,067 25,897 9715 7049 26,329 39,414 8734 12,181 5639 3858 23,249 14,223 14,662 14,469 27,899 30,344 10,920
Table 5
Estimates of costs incurred by Lagrangian (L) and myopic (M) policies for the last 250 of the 500 advanced scheduling test problems summarized in Table 2. The numbers are rounded to save space. Each line below lists the values for instances 1–50 of one problem set, in order.

Problem set 6, L: 44 175 3625 1807 11 1760 5053 1495 5 274 7.9 1 1075 1153 255 186 974 4.4 143 1092 235 155 66 3161 294 26 5044 210 78 922 0 528 145 0 143 11 2114 1107 1053 30 187 1075 0 8357 2615 0 3529 179 0 9
Problem set 6, M: 0 1.5 7186 2 0 47 1996 2406 0 52 0.2 0 1.3 442 0 0.2 182 0 422 106 0 0.2 33 4880 0 0 6316 0 0.3 919 0 197 0 0 14 0 1168 1418 48 0 0.6 30 0 9551 856 0 6883 1.8 0 0
Problem set 7, L: 17,005 26,923 7180 885 12,227 66,421 4947 31,584 2887 3090 40,883 14,720 19,272 21,093 21,426 20,703 4560 13,600 18,279 10,797 19,928 11,064 1551 36,091 5748 29,167 3239 1771 6096 3487 9191 25,084 8675 25,987 54,401 9747 5991 827 2348 1399 872 5284 14,887 20,008 39,468 3420 30,814 2159 7276 4484
Problem set 7, M: 13,303 21,844 7899 1521 25,871 55,642 8006 36,453 3904 12,952 48,132 17,359 35,315 15,245 18,084 21,684 5207 22,835 27,074 22,383 33,149 13,956 2109 38,399 3643 32,342 6131 5255 9789 12,576 11,908 39,826 18,248 30,618 53,996 15,838 12,395 706 9290 1653 1219 5566 15,853 27,547 39,102 10,548 35,540 11,585 12,572 10,580
Problem set 8, L: 5976 1868 829 3122 3184 176 466 397 599 2425 8717 5125 1370 11,315 1626 509 2026 14 319 1149 1820 124 3922 160 5005 104 147 113 1095 267 3675 23 291 2679 20 163 1491 716 64 5845 689 149 1157 8831 131 1981 3161 3684 1.9 120
Problem set 8, M: 1305 1902 355 1843 4865 16 61 2281 55 1553 9960 8263 1030 7801 12,029 103 48 1.9 19 405 666 6 2837 35 3112 29 0 0 899 179 5393 24 48 6774 0 0 523 625 0 2967 933 35 1040 14,425 1781 941 1944 6786 0 18
Problem set 9, L: 13,545 978 3634 19,963 863 5952 1504 10,630 2499 759 9528 502 8278 2836 2457 666 713 9089 1211 1227 521 1038 6494 2889 1963 15,521 620 11,661 1964 5337 4167 4351 2079 12,287 3114 2222 836 2222 13,417 5534 1229 834 1758 1261 1075 11,648 9118 13,253 0 2657
Problem set 9, M: 12,273 4390 973 16,650 83 13,381 553 13,528 5305 32 17,655 7 19,500 2792 3767 3 2762 14,421 2746 109 14 270 6109 2458 452 10,312 346 17,666 3596 9564 8371 7926 3459 12,533 3624 2198 284 705 18,891 13,031 956 510 5331 305 5243 16,047 16,674 29,749 0 5539
Problem set 10, L: 494 408 135 2560 4486 5899 188 7647 3375 837 2812 4032 1880 2102 4268 991 5451 31,650 9220 104 651 4699 4823 4386 1267 919 2984 1380 296 1312 21,059 7569 15,115 2106 3933 714 7591 2706 3911 16,568 4522 2680 1866 955 403 10,036 11 10,053 4081 4156
Problem set 10, M: 170 166 5.2 461 31,538 7889 2.7 8620 4709 368 6168 1906 893 3901 10,149 3465 5851 39,496 5736 0 293 1748 9772 12,557 1712 151 3744 1234 1308 10,967 38,597 13,168 31,745 3238 21,844 230 6339 2393 5948 21,680 6977 15,215 3464 38 243 40,699 1.7 20,165 16,853 4128

References

[1] Aberdeen D, Thiebaux S, Zhang L. Decision-theoretic military operations planning. In: Proceedings of the 14th international conference on automated planning and scheduling, Whistler, British Columbia, Canada; 2004.
[2] Adelman D, Mersereau AJ. Relaxations of weakly coupled stochastic dynamic programs. Operations Research 2008;56(3):712–27.
[3] Ali MM, Song X. A performance analysis of a discrete-time priority queueing system with correlated arrivals. Performance Evaluation 2004;57(3):307–39.
[4] Altman E. Applications of Markov decision processes in communication networks. In: Feinberg E, Shwartz A, editors. Handbook of Markov decision processes: methods and applications. Springer; 2002. p. 489–536 [chapter 16].
[5] Bertsekas DP. Dynamic programming and optimal control, vols. 1 and 2. Nashua, NH, USA: Athena Scientific; 2007.
[6] Chu PC, Beasley JE. A genetic algorithm for the multidimensional knapsack problem. Journal of Heuristics 1998;4(3):63–86.
[7] Erdogan SA, Denton BT. Surgery planning and scheduling: a literature review. Working Paper; 2009.
[8] Gao P, Wittevrongel S, Bruneel H. Discrete-time multiserver queues with geometric service times. Computers and Operations Research 2004;31(1):81–99.
[9] Garg L, McClean S, Meenan B, Millard P. A non-homogeneous discrete time Markov model for admission scheduling and resource planning in a cost or capacity constrained healthcare system. Health Care Management Science 2010;13(2):155–69.
[10] Gocgun Y. Approximate dynamic programming for dynamic stochastic resource allocation with applications to healthcare. PhD thesis, University of Washington, Seattle, WA, USA; 2010.
[11] Gocgun Y, Bresnahan B, Ghate A, Gunn M. A Markov decision process approach to multi-category patient scheduling in a diagnostic facility. Artificial Intelligence in Medicine 2011;53(2):73–81.
[12] Gocgun Y, Ghate A. A Lagrangian approach to dynamic resource allocation. In: Johansson B, Jain S, Montoya-Torres J, Hugan J, Yucesan E, editors. Proceedings of the winter simulation conference, Baltimore; 2010. p. 3330–8.
[13] Godfrey GA, Powell WB. An adaptive dynamic programming algorithm for dynamic fleet management, I: single period travel times. Transportation Science 2002;36(1):21–39.
[14] Godfrey GA, Powell WB. An adaptive dynamic programming algorithm for dynamic fleet management, II: multiperiod travel times. Transportation Science 2002;36(1):40–54.
[15] Green LV, Savin S, Wang B. Managing patient service in a diagnostic medical facility. Operations Research 2006;54(1):11–25.
[16] Gupta D, Wang L. Revenue management for a primary-care clinic in the presence of patient choice. Operations Research 2008;56(3):576–92.
[17] Harrison MJ. Dynamic scheduling of a multiclass queue: discount optimality. Operations Research 1975;23(2):270–82.
[18] Hawkins J. A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA; 2003.
[19] Jahn B, Theurl E, Siebert U, Pfeiffer KP. Tutorial in medical decision modeling incorporating waiting lines and queues using discrete event simulation. Value in Health 2010;13(4):501–6.
[20] Khamisy A, Sidi M. Discrete-time priority queues with two-state Markov modulated arrivals. Stochastic Models 1992;8(2):337–57.
[21] Kolisch R, Sickinger S. Providing radiology health care services to stochastic demand of different customer classes. OR Spectrum 2008;30:375–95.
[22] Laevens K, Bruneel H. Discrete-time multiserver queues with priorities. Performance Evaluation 1998;33(4):249–75.
[23] Li J, Blumenfeld DE, Huang N, Alden JM. Throughput analysis of production systems: recent advances and future topics. International Journal of Production Research 2009;47(14):3823–51.
[24] Luh PB, Miao X, Chang S, Castanon DA. Stochastic task selection and renewable resource allocation. IEEE Transactions on Automatic Control 1989;34(3):335–8.
[25] Ndreca S, Scoppola B. Discrete time GI/Geom/1 queueing system with priority. European Journal of Operational Research 2008;189(3):1403–8.
[26] Nadal Nunes LG, de Carvalho SV, de Cássia Meneses Rodrigues R. Markov decision process applied to the control of hospital elective admissions. Artificial Intelligence in Medicine 2009;47(2):159–71.
[27] Patrick J, Puterman ML, Queyranne M. Dynamic multi-priority patient scheduling for a diagnostic resource. Operations Research 2008;56(6):1507–25.
[28] Pavitsos A, Kyriakidis EG. Markov decision models for the optimal maintenance of a production unit with an upstream buffer. Computers and Operations Research 2009;36(6):1993–2006.
[29] Puterman M. Markov decision processes. New Jersey: John Wiley and Sons; 1994.
[30] Ross KW, Tsang D. The stochastic knapsack problem. IEEE Transactions on Communications 1989;37(7):740–7.
[31] Ross SM. Introduction to stochastic dynamic programming. New York: Academic Press; 1983.
[32] Tolio T. Design of flexible production systems: methodologies and tools. Berlin, Germany: Springer; 2009.
[33] Topaloglu H. Using Lagrangian relaxation to compute capacity-dependent bid prices in network revenue management. Operations Research 2009;57(3):637–49.
[34] Topaloglu H, Powell WB. A distributed decision-making structure for dynamic resource allocation using nonlinear functional approximations. Operations Research 2005;53(2):281–97.
[35] Vengerov D. A reinforcement learning approach to dynamic resource allocation. Engineering Applications of Artificial Intelligence 2007;20(3):383–90.
[36] Walraevens J, Steyaert B, Bruneel H. Performance analysis of a single-server ATM queue with a priority scheduling. Computers and Operations Research 2003;30(12):1807–29.
[37] Walraevens J, Steyaert B, Bruneel H. Analysis of a discrete-time preemptive resume priority buffer. European Journal of Operational Research 2008;186(1):182–201.
[38] Wolsey L. Integer programming. New York, NY, USA: Wiley-Interscience; 1998.