Operations Research Letters 38 (2010) 462–467
Operations Research Letters journal homepage: www.elsevier.com/locate/orl
A new method of proving structural properties for certain class of stochastic dynamic control problems
Weifen Zhuang a,∗, Michael Z.F. Li b
a Desautels Faculty of Management, McGill University, Montreal, Quebec H3A 1G5, Canada
b Nanyang Business School, Nanyang Technological University, Singapore 639798, Singapore
Article info
Article history: Received 19 May 2009; Accepted 4 May 2010; Available online 24 May 2010
Keywords: Stochastic dynamic control; Multimodularity; Structural properties
Abstract: This paper introduces the multimodularity concept to study structural properties of a certain class of stochastic dynamic control problems through a new, efficient approach. We demonstrate that this approach substantially simplifies the proofs of the main results of one recent article and provides an alternative method for two other models in the literature. Crown Copyright © 2010 Published by Elsevier B.V. All rights reserved.
1. Introduction
Stochastic dynamic control problems have a wide range of applications in service capacity management (airlines, hotels, car rentals, healthcare, telecommunications, call centers, etc.) as well as in production capacity management in manufacturing. In particular, they are commonly associated with queueing models, inventory control and production planning. It is well known that characterizing optimal policies for Markovian systems is challenging: without any special structure, identifying the optimal control of a Markov decision process (MDP) is in general computationally infeasible (see [2]). A standard approach is to identify structural properties that yield structured optimal policies and to show that these properties propagate through the dynamic programming recursion. Structural properties of the value function such as monotonicity, convexity and sub/supermodularity are commonly used to characterize optimal policies and to derive comparative statics results for MDP models (see [8]). For two-dimensional systems, structured optimal policies such as monotone switching curve policies are common in the literature. The multimodularity concept has mostly been applied in queueing control (see [6,7,9]). Several recent studies in service capacity allocation, manufacturing, and production and inventory control (see [3,4,11]) have used properties equivalent to multimodularity to characterize the structure of optimal policies without mentioning the multimodularity concept. To our knowledge, multimodularity has not been used systematically to study structural properties of stochastic dynamic control problems.

This paper addresses two key questions. First, which structural properties do we need to prove? Second, can we prove them efficiently? The main methodological contribution of this paper is to introduce the multimodularity concept to study structural properties systematically through a new, efficient approach. In particular, we demonstrate that this treatment substantially simplifies the proofs of the main results of [3] and can be applied to analyze the structural properties of the problems addressed in [4,11].

The rest of the paper is organized as follows. In Section 2, we introduce the multimodularity concept and establish a key lemma that underlies a new approach to showing the propagation of structural properties. In Section 3, we apply our method to re-prove the main results of [3]. Section 4 demonstrates that our method can easily handle the problems addressed in [4,11].

∗ Corresponding author. E-mail addresses: [email protected] (W. Zhuang), [email protected] (M.Z.F. Li).
0167-6377/$ – see front matter. doi:10.1016/j.orl.2010.05.008

2. Multimodularity
Convex functions are tractable in continuous optimization because local optimality guarantees global optimality. However, many stochastic dynamic control problems are defined over a discrete state space. The discrete analog of a convex function, introduced by [5], is the multimodular function. In this paper we focus on two-dimensional stochastic dynamic control problems and study their structural properties. Instead of the general definition of multimodularity originally introduced by [5], we limit our attention to two-dimensional multimodular functions. The following definition, derived from [1, Lemma 2, p. 13], focuses on the operational properties of a two-dimensional multimodular function.

Definition 1. A real-valued function $f:\mathbb{Z}^2\to\mathbb{R}$ is multimodular if and only if, for all $x\in\mathbb{Z}^2$ and for $i,j=1,2$ with $i\neq j$,
$$\Delta_i\Delta_j f(x)\ge 0, \tag{1}$$
$$\Delta_i\Delta_i f(x)\ge \Delta_i\Delta_j f(x), \tag{2}$$
where $\Delta_i f(x)\equiv f(x)-f(x-e_i)$ with $e_1\equiv(1,0)$ and $e_2\equiv(0,1)$.

In the literature, there are various concepts closely related to multimodularity, as summarized in the following definition.

Definition 2. Let $f:\mathbb{Z}^2\to\mathbb{R}$ be any real-valued function. For all $x\in\mathbb{Z}^2$ and for $i,j=1,2$ with $i\neq j$:
(1) $f$ is supermodular if $\Delta_i\Delta_j f(x)\ge 0$;
(2) $f$ is superconvex [6] if $\Delta_i f(x+e_i)\ge\Delta_i f(x+e_j)$;
(3) $f$ is componentwise convex if $\Delta_i\Delta_i f(x)\ge 0$;
(4) $f$ is integer convex [1] if $f(x+d)-f(x)\ge f(x)-f(x-d)$ for all $d\in\mathbb{Z}^2$.

The following lemma is an immediate consequence of the definition of multimodularity.

Lemma 1. A real-valued function $f:\mathbb{Z}^2\to\mathbb{R}$ is multimodular if and only if $f$ is supermodular and superconvex. Furthermore, any multimodular function is componentwise convex and integer convex.

Note that there is a parallel definition of a multimodular function on $\mathbb{Z}^2$ in the concavity context: by reversing the inequalities in Definition 2, such a function is equivalently submodular and subconcave, and it is componentwise concave and integer concave. For ease of presentation, throughout this paper we let $\mathcal{V}$ denote the class of multimodular functions in the usual convexity sense (supermodular and superconvex) and $\mathcal{U}$ the class of multimodular functions in the concavity context (submodular and subconcave).

Two well-established approaches in the literature have been used to show the propagation of structural properties of the value function. One approach, originating from [10,2], is known as the theory of ordered optimal solutions and was recently applied in [4,11]. The other is the case-by-case method, recently applied in [3]. The following lemma provides an efficient alternative for proving structural properties and deriving comparative statics results.

Lemma 2. Let $\{f_i, g_i, \tilde f_i, \tilde g_i : i=1,\dots,m\}$ be four arbitrary real sequences. Assume that for any $i$ and $j$ there exist $k_1,k_2\in\{1,\dots,m\}$ such that
$$f_i-g_{k_1}\le\tilde f_{k_2}-\tilde g_j. \tag{3}$$
Then the following inequality holds:
$$\max\{f_1,\dots,f_m\}-\max\{g_1,\dots,g_m\}\le\max\{\tilde f_1,\dots,\tilde f_m\}-\max\{\tilde g_1,\dots,\tilde g_m\}. \tag{4}$$
Proof. The proof is routine: let $i$ be an index with $f_i=\max_l f_l$ and $j$ an index with $\tilde g_j=\max_l\tilde g_l$, and pick $k_1,k_2$ as in (3). Then $\max_l f_l-\max_l g_l\le f_i-g_{k_1}\le\tilde f_{k_2}-\tilde g_j\le\max_l\tilde f_l-\max_l\tilde g_l$. $\square$

It is important to highlight that examining the structural properties of an operator by using Lemma 2 requires checking only $m^2$ inequalities, whereas the usual case-by-case method requires checking $m^4$ inequalities. In addition, checking (3) is much easier than applying the ordered optimal solutions method. Note that there is a parallel version of Lemma 2 in the minimization context, obtained by reversing the inequalities (3) and (4) and replacing the max operator by min.

Multimodularity guarantees that local optimality implies global optimality, and it ensures that the value differences between two states horizontally, vertically and diagonally are monotone in the state variables; these monotonicity properties are sufficient for the optimality of a monotone switching curve policy for a certain class of stochastic dynamic control problems. In the next two sections, we apply the multimodularity concept and Lemma 2 to prove structural properties for a dynamic resource allocation problem [3] and for production scheduling and inventory control problems [4,11].
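As a small numerical companion to Definitions 1–2 and Lemma 1, the Python sketch below checks conditions (1)–(2) directly and, separately, the supermodular-plus-superconvex characterization, on a finite box of lattice points. It is only a sketch under stated assumptions: the helper names (`shift`, `delta`, `second_delta`, `is_multimodular`, `is_supermodular_superconvex`) and the quadratic test functions are ours, not the paper's, and a finite box can refute multimodularity but cannot prove it on all of $\mathbb{Z}^2$. For a quadratic $f(x)=a x_1^2+b x_2^2+c x_1x_2$, both checks reduce to $c\ge 0$, $2a\ge c$ and $2b\ge c$.

```python
from itertools import product

E = ((1, 0), (0, 1))  # unit vectors e_1, e_2, indexed 0 and 1 here


def shift(x, k, step):
    """Return x + step * e_k."""
    return (x[0] + step * E[k][0], x[1] + step * E[k][1])


def delta(f, x, k):
    """First difference Delta_k f(x) = f(x) - f(x - e_k)."""
    return f(x) - f(shift(x, k, -1))


def second_delta(f, x, i, j):
    """Delta_i Delta_j f(x) = Delta_j f(x) - Delta_j f(x - e_i)."""
    return delta(f, x, j) - delta(f, shift(x, i, -1), j)


def is_multimodular(f, box):
    """Definition 1: conditions (1) and (2) at every x in box, for i != j."""
    for x in box:
        for i, j in ((0, 1), (1, 0)):
            mixed = second_delta(f, x, i, j)
            if mixed < 0 or second_delta(f, x, i, i) < mixed:
                return False
    return True


def is_supermodular_superconvex(f, box):
    """Lemma 1's characterization: supermodular and superconvex on box."""
    for x in box:
        for i, j in ((0, 1), (1, 0)):
            if second_delta(f, x, i, j) < 0:
                return False  # supermodularity fails
            if delta(f, shift(x, i, 1), i) < delta(f, shift(x, j, 1), i):
                return False  # Delta_i f(x + e_i) < Delta_i f(x + e_j)
    return True


def good(x):
    """a = b = 1, c = 1: c >= 0 and 2a = 2b >= c, so multimodular."""
    return x[0] ** 2 + x[1] ** 2 + x[0] * x[1]


def bad(x):
    """c = 3 > 2a: supermodular and componentwise convex, not multimodular."""
    return x[0] ** 2 + x[1] ** 2 + 3 * x[0] * x[1]


BOX = list(product(range(-3, 4), repeat=2))
print(is_multimodular(good, BOX), is_supermodular_superconvex(good, BOX))  # True True
print(is_multimodular(bad, BOX), is_supermodular_superconvex(bad, BOX))    # False False
```

To test membership in the concave class $\mathcal{U}$ instead, the same functions can be applied to $-f$, since $f$ is submodular and subconcave exactly when $-f$ is supermodular and superconvex.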
3. An application to dynamic resource allocation
Let us briefly restate the dynamic resource allocation problem addressed in [3] and present an equivalent but more succinct MDP formulation of it. Consider a medical diagnostic facility serving three types of customers, namely emergency patients, inpatients and outpatients. Emergency patients have priority over the other two types. Each operating day is divided into N service slots of equal length. Emergency patients and inpatients arrive randomly, while outpatients must make an appointment. Let Z1 and Z2 be the numbers of emergency-patient and inpatient arrivals in each service slot, respectively, and let Z3 be the number of outpatient show-ups in each service slot. Z1, Z2 and Z3 are assumed to be 0–1 (Bernoulli) random variables. Denote by pe = P(Z1 = 1) and pn = P(Z2 = 1) the arrival probabilities of emergency and inpatient service requests, respectively. The appointment schedule is represented by a vector a = (a1, . . . , aN), where ai = 1 if the ith service slot has been booked and ai = 0 otherwise (i = 1, . . . , N). Given that each outpatient with an appointment shows up with probability ps, the random variable Z3 has the distribution P(Z3 = 1) = ps ai and P(Z3 = 0) = 1 − ps ai, where ai is the appointment status of service slot i. The system is fully described by the numbers of inpatients and outpatients, (n, s), waiting to be served at the beginning of the ith service period after the decision. The state space for the ith service slot is therefore Si = {(n, s) | 0 ≤ n, s ≤ i − 1}, i = 1, . . . , N. The action space is {(δ1, δ2) | δi ∈ {0, 1}, i = 1, 2, and δ1 + δ2 ≤ 1}, where δ1 and δ2 are the admission controls for inpatients and outpatients, respectively. Clearly, there are only three possible actions: admit one inpatient (δ1 = 1, δ2 = 0), admit one outpatient (δ1 = 0, δ2 = 1), or admit no one (δ1 = 0, δ2 = 0).
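The randomness within one slot is small enough to enumerate explicitly. The minimal Python sketch below assumes, as the model does, that Z1, Z2 and Z3 are independent Bernoulli variables with P(Z3 = 1) = ps·ai; the function name `slot_pmf` and the parameter values are illustrative only. It lists the joint probabilities of the eight combinations of (Z1, Z2, Z3).

```python
from itertools import product


def slot_pmf(p_e, p_n, p_s, a_i):
    """Joint pmf of (Z1, Z2, Z3) in one service slot.

    Assumes Z1 ~ Bernoulli(p_e), Z2 ~ Bernoulli(p_n) and
    Z3 ~ Bernoulli(p_s * a_i) are independent; a_i is 1 if the slot is booked.
    """
    p3 = p_s * a_i  # P(Z3 = 1) = p_s * a_i
    pmf = {}
    for z1, z2, z3 in product((0, 1), repeat=3):
        pmf[(z1, z2, z3)] = ((p_e if z1 else 1 - p_e)
                             * (p_n if z2 else 1 - p_n)
                             * (p3 if z3 else 1 - p3))
    return pmf


booked = slot_pmf(p_e=0.1, p_n=0.3, p_s=0.9, a_i=1)
free = slot_pmf(p_e=0.1, p_n=0.3, p_s=0.9, a_i=0)
print(len(booked), round(sum(booked.values()), 12))  # 8 1.0
print(free[(0, 1, 1)])  # 0.0: nobody can show up in an unbooked slot
```

Each key (z1, z2, z3) pairs a post-decision state (n, s) with the next pre-decision state (n + z2, s + z3), while z1 = 1 means the emergency patient takes the slot's capacity.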
Given the current state and an action, the system moves from period i to period i + 1 into one of eight possible states, with transition probabilities determined by the joint distribution of (Z1, Z2, Z3). The reward structure consists of two parts: revenues rn and rs collected from serving an inpatient and an outpatient, respectively, with rn < rs, and waiting costs per period wn and ws for inpatients and outpatients, respectively, with wn < ws. The end-of-day penalty costs for leaving an inpatient and an outpatient untreated are πn and πs, respectively, with πn > πs. The objective is to allocate capacity dynamically, given any appointment schedule for outpatients, so as to maximize the total expected daily profit from inpatients and outpatients. As long as the emergency arrival in each service period is a 0–1 random variable and Z1, Z2 and Z3 are independent, the underlying stochastic dynamics are Markovian. Therefore, the optimal resource allocation problem for one diagnostic facility becomes a finite-state, finite-horizon MDP with a two-dimensional state space. Given any appointment schedule a, let Via(n, s) be the optimal expected profit-to-go from the ith slot onward after the decision at state (n, s). The optimality equation for the underlying MDP is given by
$$V_i^a(n,s)=-nw_n-sw_s+E\big[H_{i+1}^{1-Z_1,a}(n+Z_2,\ s+Z_3)\big],\qquad (n,s)\in S_i,\ i=1,\dots,N, \tag{5}$$
where
$$\begin{aligned}
H_{i+1}^{1,a}(n,s) &= \begin{cases}
\max\{V_{i+1}^a(n-1,s)+r_n,\ V_{i+1}^a(n,s-1)+r_s\}, & \text{if } n\ge 1,\ s\ge 1,\\
V_{i+1}^a\big((n-1)^+,(s-1)^+\big)+\min(n,1)\,r_n+\min(s,1)\,r_s, & \text{otherwise},
\end{cases}\\
H_{i+1}^{0,a}(n,s) &= V_{i+1}^a(n,s).
\end{aligned}\tag{6}$$
Note that the optimal rationing decision at the beginning of the $(i+1)$st slot is determined by the control operator $H_{i+1}^{1-Z_1,a}(n+Z_2,s+Z_3)$: for any appointment schedule $a$, the capacity rationing decision allocates $1-Z_1$ unit of the facility between serving an inpatient and serving an outpatient, given that $Z_1$ emergency patient and $Z_2$ inpatient arrive and $Z_3$ outpatient shows up during the $i$th slot. The boundary conditions are
$$H_{N+1}^{1,a}(n,s)=H_{N+1}^{0,a}(n,s)=V_{N+1}^a(n,s)=f(n,s), \tag{7}$$
where $f(n,s)$ is the penalty function defined on $S_{N+1}$.

For easy comparison, our Lemma 3 and Propositions 1–3 correspond to Lemmas A1 and A2 and Propositions 1–3 of [3], respectively, whose proofs take more than 40 pages of a technical appendix. We apply multimodularity and Lemma 2 to show structural properties of the value function that are sufficient for monotone optimal switching curve policies and to derive the key comparative statics results. Since this is a maximization problem, we use the multimodularity concept in the concavity context, namely, the class $\mathcal{U}$.

Lemma 3. For the model characterized by (5)–(7), if $f(n,s)\in\mathcal{U}$, then $V_i^a(n,s)\in\mathcal{U}$.

Proof. We prove the result by induction on $i$. First, the boundary condition (7) ensures $V_{N+1}^a(n,s)\in\mathcal{U}$. Now assume $V_{i+1}^a(n,s)\in\mathcal{U}$ for some $i\le N$; we need to show $V_i^a(n,s)\in\mathcal{U}$. Since $H_{i+1}^{0,a}=V_{i+1}^a\in\mathcal{U}$, the linear terms in (5) do not affect these properties, and taking expectations preserves them, it suffices to show $H_{i+1}^{1,a}(n,s)\in\mathcal{U}$. From (6), it suffices to show $\tilde TV_{i+1}^a(n,s)\in\mathcal{U}$, where
$$\tilde TV_{i+1}^a(n,s)\equiv\max\{V_{i+1}^a(n-1,s)+r_n,\ V_{i+1}^a(n,s-1)+r_s\}.$$
Since the revenues collected do not affect the properties of the operator, we drop them for ease of presentation. Redefine the operator $TV_{i+1}^a(n,s)=\max\{V_{i+1}^a(n-1,s),\ V_{i+1}^a(n,s-1)\}$; the remaining task is to show $TV_{i+1}^a(n,s)\in\mathcal{U}$ given $V_{i+1}^a(n,s)\in\mathcal{U}$. By Lemma 1, it suffices to prove submodularity and subconcavity. Writing $V$ for $V_{i+1}^a$ to lighten notation, we first prove submodularity, that is,
$$TV(n,s)-TV(n-1,s)\le TV(n,s-1)-TV(n-1,s-1),$$
or equivalently,
$$\begin{aligned}
&\max\{V(n-1,s),V(n,s-1)\}-\max\{V(n-2,s),V(n-1,s-1)\}\\
&\quad\le\max\{V(n-1,s-1),V(n,s-2)\}-\max\{V(n-2,s-1),V(n-1,s-2)\}.
\end{aligned}\tag{8}$$
Note that
$$\begin{aligned}
V(n-1,s)-V(n-2,s) &\le V(n-1,s-1)-V(n-2,s-1),\\
V(n-1,s)-V(n-1,s-1) &\le V(n-1,s-1)-V(n-1,s-2),\\
V(n,s-1)-V(n-1,s-1) &\le V(n-1,s-1)-V(n-2,s-1),\\
V(n,s-1)-V(n-1,s-1) &\le V(n,s-2)-V(n-1,s-2),
\end{aligned}$$
where the first and fourth inequalities follow from submodularity, and the second and third follow from componentwise concavity, which is implied by the multimodularity of $V$. The four inequalities above enumerate the combinations of optimal actions for $TV(n,s)$ and $TV(n-1,s-1)$, and find feasible actions for $TV(n-1,s)$ and $TV(n,s-1)$ such that inequality (3) holds. Hence inequality (8) follows by Lemma 2.

We now turn to subconcavity. As subconcavity is symmetric in the two coordinates, it suffices to prove the result for $i=1$, that is,
$$TV(n,s-1)-TV(n-1,s-1)\le TV(n-1,s)-TV(n-2,s),$$
or equivalently,
$$\begin{aligned}
&\max\{V(n-1,s-1),V(n,s-2)\}-\max\{V(n-2,s-1),V(n-1,s-2)\}\\
&\quad\le\max\{V(n-2,s),V(n-1,s-1)\}-\max\{V(n-3,s),V(n-2,s-1)\}.
\end{aligned}\tag{9}$$
Note that
$$\begin{aligned}
V(n-1,s-1)-V(n-2,s-1) &\le V(n-2,s)-V(n-3,s),\\
V(n-1,s-1)-V(n-2,s-1) &= V(n-1,s-1)-V(n-2,s-1),\\
V(n,s-2)-V(n-1,s-2) &\le V(n-2,s)-V(n-3,s),\\
V(n,s-2)-V(n-1,s-2) &\le V(n-1,s-1)-V(n-2,s-1),
\end{aligned}$$
by the subconcavity of $V$ (applied twice for the third relation; the second is an identity); hence (9) follows by Lemma 2. Therefore $V_i^a(n,s)\in\mathcal{U}$. $\square$

The essence of Lemma 3 is to show the propagation of multimodularity of the value function. Compared with the case-by-case method in [3], Lemma 3 is more efficient, and using multimodularity automatically eliminates the checking of many redundant properties. With the multimodularity property of the value function, we can define the switching curve as
$$n_i^a(s)=\min\{n\mid V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1)\ge r_s-r_n\}$$
and characterize the optimal policy with ease.

Proposition 1. For any $f\in\mathcal{U}$ and an arbitrary appointment schedule $a$, the optimal capacity allocation policy is characterized by a series of switching curves $\{n_i^a(s), i=1,\dots,N\}$ with the following properties:
(a) Serve an inpatient if and only if $n\ge n_i^a(s)$, and serve an outpatient if and only if $n<n_i^a(s)$;
(b) $n_i^a(s)$ is nondecreasing in $s$.
Proof. The proof is routine and thus omitted.
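The recursion (5)–(7) is small enough to solve numerically, which gives a way to probe Lemma 3 and Proposition 1 on concrete instances. The Python sketch below is ours, not the authors' code, and rests on stated assumptions: a linear penalty f(n, s) = −nπn − sπs, Z3 in slot i with success probability ps·ai, and illustrative parameter values and appointment schedule. Each $V_i^a$ is stored on a square grid that shrinks by one per stage, because $V_i^a(n,s)$ only looks one step up in each coordinate, so no state-space truncation is involved. The hypothetical helper `in_U` tests submodularity and subconcavity (Lemma 1 in the concavity sense) on every quadruple inside a grid, and `switching_curve` evaluates $n_i^a(s)$ from its definition, searching $n\ge 1$.

```python
from itertools import product


def slot_pmf(p_e, p_n, q3):
    """Joint pmf of independent Bernoulli (Z1, Z2, Z3) as (z1, z2, z3, prob)."""
    return [(z1, z2, z3, (p_e if z1 else 1 - p_e) * (p_n if z2 else 1 - p_n)
             * (q3 if z3 else 1 - q3))
            for z1, z2, z3 in product((0, 1), repeat=3)]


def solve(N, a, prm):
    """Backward induction for (5)-(7) with a linear penalty; returns {i: V_i^a}."""
    K = 2 * N + 2  # V_{N+1} is stored on {0, ..., K-1}^2
    V = {N + 1: [[-n * prm["pi_n"] - s * prm["pi_s"] for s in range(K)]
                 for n in range(K)]}  # boundary condition (7)
    for i in range(N, 0, -1):
        W = V[i + 1]
        size = len(W) - 1  # V_i(n, s) looks at most one step up in n and s

        def H(cap, n, s):
            """H_{i+1}^{cap,a}(n, s) from (6); equal to f at i + 1 = N + 1 by (7)."""
            if cap == 0 or i == N:
                return W[n][s]
            if n >= 1 and s >= 1:
                return max(W[n - 1][s] + prm["r_n"], W[n][s - 1] + prm["r_s"])
            return (W[max(n - 1, 0)][max(s - 1, 0)]
                    + min(n, 1) * prm["r_n"] + min(s, 1) * prm["r_s"])

        pmf = slot_pmf(prm["p_e"], prm["p_n"], prm["p_s"] * a[i - 1])
        V[i] = [[-n * prm["w_n"] - s * prm["w_s"]
                 + sum(p * H(1 - z1, n + z2, s + z3) for z1, z2, z3, p in pmf)
                 for s in range(size)]
                for n in range(size)]
    return V


def in_U(g, tol=1e-9):
    """Submodularity and subconcavity on every quadruple inside the grid g."""
    m = len(g)
    for n, s in product(range(1, m), repeat=2):
        if g[n][s] - g[n - 1][s] - g[n][s - 1] + g[n - 1][s - 1] > tol:
            return False  # Delta_1 Delta_2 g > 0 violates submodularity
        if n >= 2 and (g[n][s - 1] - g[n - 1][s - 1]
                       > g[n - 1][s] - g[n - 2][s] + tol):
            return False  # subconcavity for i = 1
        if s >= 2 and (g[n - 1][s] - g[n - 1][s - 1]
                       > g[n][s - 1] - g[n][s - 2] + tol):
            return False  # subconcavity for i = 2
    return True


def switching_curve(V, i, s, prm):
    """n_i^a(s) = min{n >= 1 : V_{i+1}(n-1, s) - V_{i+1}(n, s-1) >= r_s - r_n}."""
    W = V[i + 1]
    for n in range(1, len(W)):
        if W[n - 1][s] - W[n][s - 1] >= prm["r_s"] - prm["r_n"]:
            return n
    return None  # no crossing inside the computed grid


PRM = dict(p_e=0.1, p_n=0.4, p_s=0.9, r_n=1.0, r_s=2.0,
           w_n=0.1, w_s=0.2, pi_n=5.0, pi_s=1.0)  # illustrative values only
N, A = 4, [1, 0, 1, 1]  # hypothetical appointment schedule a
V = solve(N, A, PRM)
print({i: in_U(V[i]) for i in sorted(V)})  # probe Lemma 3 on this instance
print([switching_curve(V, 1, s, PRM) for s in range(1, 4)])
```

The check in `in_U` only covers quadruples inside the computed grid, so it can refute but not prove membership in $\mathcal{U}$; the boundary cases of (6) at $n=0$ or $s=0$ are included in the computation exactly as written there.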
Lemma 2 can also be applied to substantially simplify the proofs of comparative statics results as demonstrated in Propositions 2 and 3.
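Since Lemma 3 and the comparative statics proofs that follow all reduce to verifying condition (3), a quick randomized sanity check of Lemma 2 can be reassuring. The sketch below is ours, not the authors' (the function names are hypothetical): it finds the witnesses $(k_1,k_2)$ by brute force, whereas a proof exhibits one witness pair by hand for each of the $m^2$ pairs $(i,j)$.

```python
import random


def hypothesis_3(f, g, ft, gt):
    """Condition (3): for all i, j some k1, k2 give f[i]-g[k1] <= ft[k2]-gt[j].

    A proof supplies one witness pair (k1, k2) per pair (i, j); here we
    simply search for one by brute force.
    """
    m = len(f)
    return all(any(f[i] - g[k1] <= ft[k2] - gt[j]
                   for k1 in range(m) for k2 in range(m))
               for i in range(m) for j in range(m))


def conclusion_4(f, g, ft, gt):
    """Inequality (4): max f - max g <= max ft - max gt."""
    return max(f) - max(g) <= max(ft) - max(gt)


def random_check(trials=2000, m=3, seed=0):
    """Count instances satisfying (3), and those among them violating (4)."""
    rng = random.Random(seed)
    satisfied = violated = 0
    for _ in range(trials):
        f, g, ft, gt = ([rng.randint(-5, 5) for _ in range(m)] for _ in range(4))
        if hypothesis_3(f, g, ft, gt):
            satisfied += 1
            violated += not conclusion_4(f, g, ft, gt)
    return satisfied, violated


print(random_check())  # Lemma 2 predicts that the second number is 0
```

The minimization version of Lemma 2 can be checked the same way by reversing both inequalities and replacing max by min.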
Proposition 2. For any $f(n,s)=-f_1(n)\pi_n-f_2(s)\pi_s\in\mathcal{U}$, where $f_1(n)$ and $f_2(s)$ are nondecreasing in $n$ and $s$, respectively, the optimal switching curve $\{n_i^a(s), i=1,\dots,N\}$ has the following properties:
(a) $n_i^a(s)$ is nonincreasing in $\pi_n$, $r_n$, $w_n$ and $p_n$;
(b) $n_i^a(s)$ is nondecreasing in $\pi_s$, $r_s$, $w_s$ and $p_s$.

Proof. (a) Let $\mathcal{U}_1$ be the subclass of $\mathcal{U}$ satisfying the following additional property:
$$V(n-1,s,\theta_n)-V(n,s-1,\theta_n)\le V(n-1,s,\tilde\theta_n)-V(n,s-1,\tilde\theta_n)\quad\text{for }\theta_n\le\tilde\theta_n. \tag{10}$$
It is evident that $n_i^a(s,\theta_n)\ge n_i^a(s,\tilde\theta_n)$ if $V_{i+1}^a(n,s,\theta_n)\in\mathcal{U}_1$, where $\theta_n$ can be any one of the parameters $\pi_n$, $w_n$ and $p_n$: a larger value of $V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1)$ can only lower the threshold in the definition of $n_i^a(s)$. The parameter $r_n$ also enters the threshold $r_s-r_n$ and is treated separately below.

Consider $\theta_n=\pi_n$. We now show $V_i^a(n,s,\pi_n)\in\mathcal{U}_1$ by induction on $i$. When $i=N+1$, given that $\pi_n\le\tilde\pi_n$ and $f_1(n)$ is nondecreasing in $n$, we have
$$\begin{aligned}
V_{N+1}^a(n-1,s,\pi_n)-V_{N+1}^a(n,s-1,\pi_n) &= (f_1(n)-f_1(n-1))\pi_n-(f_2(s)-f_2(s-1))\pi_s\\
&\le (f_1(n)-f_1(n-1))\tilde\pi_n-(f_2(s)-f_2(s-1))\pi_s\\
&= V_{N+1}^a(n-1,s,\tilde\pi_n)-V_{N+1}^a(n,s-1,\tilde\pi_n),
\end{aligned}$$
which implies $V_{N+1}^a(n,s,\pi_n)\in\mathcal{U}_1$. Given $V_{i+1}^a(n,s,\pi_n)\in\mathcal{U}_1$, we show $V_i^a(n,s,\pi_n)\in\mathcal{U}_1$, for which it suffices to show $TV_{i+1}^a(n,s,\pi_n)\in\mathcal{U}_1$, that is,
$$TV_{i+1}^a(n-1,s,\pi_n)-TV_{i+1}^a(n,s-1,\pi_n)\le TV_{i+1}^a(n-1,s,\tilde\pi_n)-TV_{i+1}^a(n,s-1,\tilde\pi_n), \tag{11}$$
i.e., writing $V$ for $V_{i+1}^a$,
$$\begin{aligned}
&\max\{V(n-2,s,\pi_n),V(n-1,s-1,\pi_n)\}-\max\{V(n-1,s-1,\pi_n),V(n,s-2,\pi_n)\}\\
&\quad\le\max\{V(n-2,s,\tilde\pi_n),V(n-1,s-1,\tilde\pi_n)\}-\max\{V(n-1,s-1,\tilde\pi_n),V(n,s-2,\tilde\pi_n)\}.
\end{aligned}$$
Note that
$$\begin{aligned}
V(n-2,s,\pi_n)-V(n-1,s-1,\pi_n) &\le V(n-2,s,\tilde\pi_n)-V(n-1,s-1,\tilde\pi_n),\\
V(n-2,s,\pi_n)-V(n-1,s-1,\pi_n) &\le V(n-1,s-1,\pi_n)-V(n,s-2,\pi_n)\le V(n-1,s-1,\tilde\pi_n)-V(n,s-2,\tilde\pi_n),\\
V(n-1,s-1,\pi_n)-V(n-1,s-1,\pi_n) &= V(n-1,s-1,\tilde\pi_n)-V(n-1,s-1,\tilde\pi_n),\\
V(n-1,s-1,\pi_n)-V(n,s-2,\pi_n) &\le V(n-1,s-1,\tilde\pi_n)-V(n,s-2,\tilde\pi_n),
\end{aligned}$$
where the first, third and fifth inequalities follow from (10), i.e., $V_{i+1}^a(n,s,\pi_n)\in\mathcal{U}_1$, and the second inequality follows from the integer concavity of $V_{i+1}^a(n,s,\pi_n)$. Hence (11) follows by Lemma 2. The proof of monotonicity in $w_n$ is the same as that for $\pi_n$.

Now consider $\theta_n=p_n$. We show $V_i^a(n,s,p_n)\in\mathcal{U}_1$ by induction on $i$. Rewrite (5) as
$$V_i^a(n,s,p_n)=-nw_n-sw_s+p_nE_{Z_1,Z_3}\big[H_{i+1}^{1-Z_1,a}(n+1,s+Z_3,p_n)\big]+(1-p_n)E_{Z_1,Z_3}\big[H_{i+1}^{1-Z_1,a}(n,s+Z_3,p_n)\big].$$
When $i=N+1$, given $p_n\le\tilde p_n$, we have
$$V_{N+1}^a(n-1,s,p_n)-V_{N+1}^a(n,s-1,p_n)=V_{N+1}^a(n-1,s,\tilde p_n)-V_{N+1}^a(n,s-1,\tilde p_n),$$
which implies $V_{N+1}^a(n,s,p_n)\in\mathcal{U}_1$. Assume $V_{i+1}^a(n,s,p_n)\in\mathcal{U}_1$. By a proof similar to that of (11), we have $TV_{i+1}^a(n,s,p_n)\in\mathcal{U}_1$, hence $H_{i+1}^{1-Z_1,a}(n,s,p_n)\in\mathcal{U}_1$. We need to show $V_i^a(n,s,p_n)\in\mathcal{U}_1$, that is,
$$V_i^a(n-1,s,p_n)-V_i^a(n,s-1,p_n)-V_i^a(n-1,s,\tilde p_n)+V_i^a(n,s-1,\tilde p_n)\le 0.$$
Writing $H$ for $H_{i+1}^{1-Z_1,a}$, note that
$$\begin{aligned}
&V_i^a(n-1,s,p_n)-V_i^a(n,s-1,p_n)-V_i^a(n-1,s,\tilde p_n)+V_i^a(n,s-1,\tilde p_n)\\
&\quad=p_nE\big[H(n,s+Z_3,p_n)-H(n+1,s+Z_3-1,p_n)-H(n,s+Z_3,\tilde p_n)+H(n+1,s+Z_3-1,\tilde p_n)\big]\\
&\qquad+(1-p_n)E\big[H(n-1,s+Z_3,p_n)-H(n,s+Z_3-1,p_n)-H(n-1,s+Z_3,\tilde p_n)+H(n,s+Z_3-1,\tilde p_n)\big]\\
&\qquad+(\tilde p_n-p_n)E\big[H(n+1,s+Z_3-1,\tilde p_n)-H(n,s+Z_3,\tilde p_n)-H(n,s+Z_3-1,\tilde p_n)+H(n-1,s+Z_3,\tilde p_n)\big]\\
&\quad\le 0,
\end{aligned}$$
where the non-positivity of both the $p_n$ term and the $(1-p_n)$ term follows from $H_{i+1}^{1-Z_1,a}(n,s,p_n)\in\mathcal{U}_1$, and the non-positivity of the $(\tilde p_n-p_n)$ term follows from the subconcavity of $H_{i+1}^{1-Z_1,a}$. Therefore the monotonicity result in $p_n$ is proved.

Finally, consider $\theta_n=r_n$. Note that
$$n_i^a(s,r_n)=\min\{n\mid V_{i+1}^a(n-1,s,r_n)-V_{i+1}^a(n,s-1,r_n)\ge r_s-r_n\},$$
$$n_i^a(s,\tilde r_n)=\min\{n\mid V_{i+1}^a(n-1,s,\tilde r_n)-V_{i+1}^a(n,s-1,\tilde r_n)\ge r_s-\tilde r_n\}.$$
Let $\mathcal{U}_2$ be the subclass of $\mathcal{U}$ satisfying the following additional property:
$$V(n-1,s,r_n)-V(n,s-1,r_n)+r_n\le V(n-1,s,\tilde r_n)-V(n,s-1,\tilde r_n)+\tilde r_n\quad\text{for }r_n\le\tilde r_n. \tag{12}$$
When $i=N+1$, given $r_n\le\tilde r_n$, we have
$$V_{N+1}^a(n-1,s,r_n)-V_{N+1}^a(n,s-1,r_n)=V_{N+1}^a(n-1,s,\tilde r_n)-V_{N+1}^a(n,s-1,\tilde r_n),$$
which, together with $r_n\le\tilde r_n$, implies $V_{N+1}^a(n,s,r_n)\in\mathcal{U}_2$. Assume $V_{i+1}^a(n,s,r_n)\in\mathcal{U}_2$; we need to show $V_i^a(n,s,r_n)\in\mathcal{U}_2$, or equivalently $\tilde TV_{i+1}^a(n,s,r_n)\in\mathcal{U}_2$. That is, we need to show
$$\tilde TV_{i+1}^a(n-1,s,r_n)-\tilde TV_{i+1}^a(n,s-1,r_n)+r_n\le\tilde TV_{i+1}^a(n-1,s,\tilde r_n)-\tilde TV_{i+1}^a(n,s-1,\tilde r_n)+\tilde r_n, \tag{13}$$
i.e., writing $V$ for $V_{i+1}^a$,
$$\begin{aligned}
&\max\{V(n-2,s,r_n)+r_n,\ V(n-1,s-1,r_n)+r_s\}-\max\{V(n-1,s-1,r_n)+r_n,\ V(n,s-2,r_n)+r_s\}+r_n\\
&\quad\le\max\{V(n-2,s,\tilde r_n)+\tilde r_n,\ V(n-1,s-1,\tilde r_n)+r_s\}-\max\{V(n-1,s-1,\tilde r_n)+\tilde r_n,\ V(n,s-2,\tilde r_n)+r_s\}+\tilde r_n.
\end{aligned}$$
Note that
$$\begin{aligned}
[V(n-2,s,r_n)+r_n]-[V(n-1,s-1,r_n)+r_n]+r_n &\le [V(n-2,s,\tilde r_n)+\tilde r_n]-[V(n-1,s-1,\tilde r_n)+\tilde r_n]+\tilde r_n,\\
[V(n-2,s,r_n)+r_n]-[V(n-1,s-1,r_n)+r_n]+r_n &\le [V(n-1,s-1,r_n)+r_n]-[V(n,s-2,r_n)+r_n]+r_n\\
&\le [V(n-1,s-1,\tilde r_n)+r_s]-[V(n,s-2,\tilde r_n)+r_s]+\tilde r_n,\\
[V(n-1,s-1,r_n)+r_s]-[V(n-1,s-1,r_n)+r_n]+r_n &= [V(n-1,s-1,\tilde r_n)+r_s]-[V(n-1,s-1,\tilde r_n)+\tilde r_n]+\tilde r_n,\\
[V(n-1,s-1,r_n)+r_s]-[V(n,s-2,r_n)+r_s]+r_n &\le [V(n-1,s-1,\tilde r_n)+r_s]-[V(n,s-2,\tilde r_n)+r_s]+\tilde r_n,
\end{aligned}$$
where the first, third and fifth inequalities follow from (12), i.e., $V_{i+1}^a(n,s,r_n)\in\mathcal{U}_2$, and the second inequality follows from the integer concavity of $V_{i+1}^a(n,s,r_n)$. Hence (13) follows by Lemma 2, and $n_i^a(s,r_n)\ge n_i^a(s,\tilde r_n)$ for $r_n\le\tilde r_n$.
(b) Applying a technique similar to that in (a), we can show the monotonicity results as required. $\square$

Note that in Proposition 2 of [3], the penalty function is restricted to be linear when showing the monotonicity of $n_i^a(s)$ in $\pi_n$ and $\pi_s$. As shown above, the monotonicity of $n_i^a(s)$ remains true as long as the penalty cost function is separable in $\pi_n$ and $\pi_s$ and nondecreasing in $n$ and $s$, respectively. However, with a linear penalty function, a sharper characterization of the switching curve can be obtained as follows.

Proposition 3. If $f(n,s)=-n\pi_n-s\pi_s$, then the following statements are true:
(a) If $w_n<w_s$, then $V_i^a(n-1,s)-V_i^a(n,s-1)\le V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1)$ and $n_i^a(s)\ge n_{i+1}^a(s)$;
(b) If $\pi_n+r_n+w_n\ge\pi_s+r_s+w_s$ and $w_n\ge w_s$, then $n_i^a(s)=0$ for all $i=1,\dots,N$.

Proof. (a) Denote by $\mathcal{U}_3$ the subclass of $\mathcal{U}$ with the following additional property:
$$V_i^a(n-1,s)-V_i^a(n,s-1)\le V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1). \tag{14}$$
We prove the result by induction on $i$. Given that $w_n<w_s$, when $i=N$ we have $V_N^a(n,s)=-nw_n-sw_s+E[V_{N+1}^a(n+Z_2,s+Z_3)]$ and
$$\begin{aligned}
V_N^a(n-1,s)-V_N^a(n,s-1) &= w_n-w_s+E\big[V_{N+1}^a(n+Z_2-1,s+Z_3)-V_{N+1}^a(n+Z_2,s+Z_3-1)\big]\\
&= \pi_n+w_n-\pi_s-w_s\\
&\le V_{N+1}^a(n-1,s)-V_{N+1}^a(n,s-1)=\pi_n-\pi_s.
\end{aligned}$$
Thus $V_N^a(n,s)\in\mathcal{U}_3$. Assume $V_i^a(n,s)\in\mathcal{U}_3$ for some $i\le N$; we need to show $V_{i-1}^a(n,s)\in\mathcal{U}_3$, for which it suffices to show $TV_i^a(n,s)\in\mathcal{U}_3$, that is,
$$TV_i^a(n-1,s)-TV_i^a(n,s-1)\le TV_{i+1}^a(n-1,s)-TV_{i+1}^a(n,s-1), \tag{15}$$
i.e.,
$$\begin{aligned}
&\max\{V_i^a(n-2,s),V_i^a(n-1,s-1)\}-\max\{V_i^a(n-1,s-1),V_i^a(n,s-2)\}\\
&\quad\le\max\{V_{i+1}^a(n-2,s),V_{i+1}^a(n-1,s-1)\}-\max\{V_{i+1}^a(n-1,s-1),V_{i+1}^a(n,s-2)\}.
\end{aligned}$$
Note that
$$\begin{aligned}
V_i^a(n-2,s)-V_i^a(n-1,s-1) &\le V_{i+1}^a(n-2,s)-V_{i+1}^a(n-1,s-1),\\
V_i^a(n-2,s)-V_i^a(n-1,s-1) &\le V_{i+1}^a(n-2,s)-V_{i+1}^a(n-1,s-1)\le V_{i+1}^a(n-1,s-1)-V_{i+1}^a(n,s-2),\\
V_i^a(n-1,s-1)-V_i^a(n-1,s-1) &= V_{i+1}^a(n-1,s-1)-V_{i+1}^a(n-1,s-1),\\
V_i^a(n-1,s-1)-V_i^a(n,s-2) &\le V_{i+1}^a(n-1,s-1)-V_{i+1}^a(n,s-2),
\end{aligned}$$
where the first, second and fifth inequalities follow from (14), i.e., $V_i^a(n,s)\in\mathcal{U}_3$, and the third follows from the integer concavity of $V_{i+1}^a(n,s)$. Hence (15) follows by Lemma 2, and
$$V_i^a(n-1,s)-V_i^a(n,s-1)\le V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1)\quad\text{if }w_n<w_s.$$
Since $V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1)\ge V_i^a(n-1,s)-V_i^a(n,s-1)\ge r_s-r_n$ for $n=n_i^a(s)$, we have $n_i^a(s)\ge n_{i+1}^a(s)$.
(b) When $w_n\ge w_s$, we can similarly show that $V_{i+1}^a(n-1,s)-V_{i+1}^a(n,s-1)\le V_i^a(n-1,s)-V_i^a(n,s-1)$ and $n_i^a(s)\le n_{i+1}^a(s)$. If $\pi_n+r_n+w_n\ge\pi_s+r_s+w_s$, then $V_N^a(n-1,s)-V_N^a(n,s-1)=\pi_n+w_n-\pi_s-w_s\ge r_s-r_n$, which implies that $n_N^a(s)=0$ and hence $n_i^a(s)=0$ for all $i=1,\dots,N$. $\square$
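The base case of Proposition 3 is a direct computation from (5) and (7) with the linear penalty. The sketch below (the function name `stage_N_gap` and the parameter values are illustrative assumptions) evaluates $V_N^a(n-1,s)-V_N^a(n,s-1)$ by enumerating the eight outcomes of $(Z_1,Z_2,Z_3)$, assuming independence and $P(Z_3=1)=p_s a_N$, and compares it with $\pi_n+w_n-\pi_s-w_s$; it also evaluates the inequality $V_N^a(n-1,s)-V_N^a(n,s-1)\ge r_s-r_n$ used in part (b).

```python
from itertools import product


def stage_N_gap(n, s, prm, a_N):
    """V_N^a(n-1, s) - V_N^a(n, s-1), computed from (5) with H_{N+1} = f by (7)."""
    def f(x, y):
        """Linear penalty of Proposition 3."""
        return -x * prm["pi_n"] - y * prm["pi_s"]

    def V_N(x, y):
        q3 = prm["p_s"] * a_N  # P(Z3 = 1) in slot N
        ev = sum((prm["p_e"] if z1 else 1 - prm["p_e"])
                 * (prm["p_n"] if z2 else 1 - prm["p_n"])
                 * (q3 if z3 else 1 - q3) * f(x + z2, y + z3)
                 for z1, z2, z3 in product((0, 1), repeat=3))
        return -x * prm["w_n"] - y * prm["w_s"] + ev

    return V_N(n - 1, s) - V_N(n, s - 1)


PRM = dict(p_e=0.1, p_n=0.4, p_s=0.9, r_n=1.0, r_s=2.0,
           w_n=0.1, w_s=0.2, pi_n=5.0, pi_s=1.0)  # illustrative, w_n < w_s
gap = stage_N_gap(3, 2, PRM, a_N=1)
print(round(gap, 9))                     # 3.9, i.e. pi_n + w_n - pi_s - w_s
print(gap <= PRM["pi_n"] - PRM["pi_s"])  # True: the i = N step of part (a)
print(gap >= PRM["r_s"] - PRM["r_n"])    # True: first condition of part (b)
```

The gap does not depend on $(n,s)$ or on $a_N$, because the arrival terms cancel in the difference; here $w_n<w_s$, so part (b) itself does not apply to these illustrative values.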
In this section, we have re-examined the main results of [3] by fully exploiting the structural properties implied by multimodularity. In the next section, we demonstrate that this treatment can be applied to a certain class of production scheduling and inventory control problems.

4. An application to production scheduling and inventory control
Ha [4] studies production scheduling for an M/M/1 make-to-stock production system with two products and characterizes the optimal production policy as a monotone hedging-point policy when both products have the same service rate. Zhao et al. [11] study optimal production and inventory transshipment policies for a two-location make-to-stock system, which involves a production decision as well as a demand fulfilling decision at each location. The optimal production decision is characterized as a monotone hedging-point policy, and the optimal demand fulfilling decision as a monotone switching curve policy. Both [4] and [11] used certain properties of multimodularity to characterize the monotone optimal policies without recognizing the connection with multimodularity, and both applied the ordered optimal solutions method to show the structural properties. In this section, we demonstrate that Lemma 2 can be used to show the propagation of multimodularity in a simple and efficient way. Due to space limitations, we focus only on proving structural properties for the key operators without boundaries. The analysis can be modified to allow for boundaries on the state space; readers can refer to [2] for more discussion of boundaries.
The common operator of [4,11] can be defined as follows, with cost parameters omitted:
$$T_1f(x_1,x_2)=\min\{f(x_1,x_2),\ f(x_1+1,x_2),\ f(x_1,x_2+1)\}, \tag{16}$$
where the state $(x_1,x_2)$ represents the inventory levels of product (or location) 1 and 2, respectively. The control operator $T_1f(x_1,x_2)$ in [4] determines the optimal production decision: do not produce, produce product 1, or produce product 2. In [11], $T_1f(x_1,x_2)$ determines the optimal production and transshipment decisions at location 1: do not produce, produce to increase its own inventory, or produce and transship to location 2. There are additional operators in [11] that determine the optimal demand fulfilling decisions, i.e., whether the demand at one location is satisfied from its own inventory or by transshipment from the other location. Here we only illustrate the proof of multimodularity for $T_1f(x_1,x_2)$ in the following lemma; a similar approach can be adopted for the other operators in [11], with some further analysis of the boundaries.

Lemma 4. For the operator $T_1f(x_1,x_2)$ defined by (16), if $f(x_1,x_2)\in\mathcal{V}$ then $T_1f(x_1,x_2)\in\mathcal{V}$.

Proof. It suffices to prove supermodularity and superconvexity. We first prove supermodularity, i.e.,
$$T_1f(x_1,x_2)-T_1f(x_1,x_2-1)\ge T_1f(x_1-1,x_2)-T_1f(x_1-1,x_2-1),$$
or equivalently,
$$\begin{aligned}
&\min\{f(x_1,x_2),f(x_1+1,x_2),f(x_1,x_2+1)\}-\min\{f(x_1,x_2-1),f(x_1+1,x_2-1),f(x_1,x_2)\}\\
&\quad\ge\min\{f(x_1-1,x_2),f(x_1,x_2),f(x_1-1,x_2+1)\}-\min\{f(x_1-1,x_2-1),f(x_1,x_2-1),f(x_1-1,x_2)\}.
\end{aligned}\tag{17}$$
The following relations hold:
$$\begin{aligned}
f(x_1,x_2)-f(x_1,x_2-1) &\ge f(x_1-1,x_2)-f(x_1-1,x_2-1),\\
f(x_1,x_2)-f(x_1,x_2-1) &= f(x_1,x_2)-f(x_1,x_2-1),\\
f(x_1,x_2)-f(x_1,x_2) &= f(x_1-1,x_2)-f(x_1-1,x_2),\\
f(x_1+1,x_2)-f(x_1+1,x_2-1) &\ge f(x_1-1,x_2)-f(x_1-1,x_2-1),\\
f(x_1+1,x_2)-f(x_1+1,x_2-1) &\ge f(x_1,x_2)-f(x_1,x_2-1),\\
f(x_1+1,x_2)-f(x_1,x_2) &\ge f(x_1,x_2)-f(x_1-1,x_2),\\
f(x_1,x_2+1)-f(x_1,x_2) &\ge f(x_1-1,x_2)-f(x_1-1,x_2-1),\\
f(x_1,x_2+1)-f(x_1,x_2) &\ge f(x_1,x_2)-f(x_1,x_2-1),\\
f(x_1,x_2+1)-f(x_1,x_2) &\ge f(x_1-1,x_2+1)-f(x_1-1,x_2),
\end{aligned}$$
where the first, fourth, fifth and ninth inequalities follow from supermodularity, the sixth and eighth follow from componentwise convexity, and the seventh follows from superconvexity and supermodularity through the chain
$$\Delta_2f(x_1,x_2+1)\ge\Delta_2f(x_1+1,x_2)\ge\Delta_2f(x_1-1,x_2).$$
Hence (17) follows by Lemma 2 in the minimization context. We now prove superconvexity; by the symmetry of $T_1$ in the two coordinates, it suffices to prove the result for $i=1$:
$$T_1f(x_1,x_2-1)-T_1f(x_1-1,x_2-1)\ge T_1f(x_1-1,x_2)-T_1f(x_1-2,x_2),$$
i.e.,
$$\begin{aligned}
&\min\{f(x_1,x_2-1),f(x_1+1,x_2-1),f(x_1,x_2)\}-\min\{f(x_1-1,x_2-1),f(x_1,x_2-1),f(x_1-1,x_2)\}\\
&\quad\ge\min\{f(x_1-1,x_2),f(x_1,x_2),f(x_1-1,x_2+1)\}-\min\{f(x_1-2,x_2),f(x_1-1,x_2),f(x_1-2,x_2+1)\}.
\end{aligned}\tag{18}$$
The following relations hold:
$$\begin{aligned}
f(x_1,x_2-1)-f(x_1-1,x_2-1) &\ge f(x_1-1,x_2)-f(x_1-2,x_2),\\
f(x_1,x_2-1)-f(x_1,x_2-1) &= f(x_1-1,x_2)-f(x_1-1,x_2),\\
f(x_1,x_2-1)-f(x_1-1,x_2) &\ge f(x_1-1,x_2)-f(x_1-2,x_2+1),\\
f(x_1+1,x_2-1)-f(x_1,x_2-1) &\ge f(x_1-1,x_2)-f(x_1-2,x_2),\\
f(x_1+1,x_2-1)-f(x_1,x_2-1) &\ge f(x_1,x_2)-f(x_1-1,x_2),\\
f(x_1+1,x_2-1)-f(x_1,x_2-1) &\ge f(x_1-1,x_2+1)-f(x_1-2,x_2+1),\\
f(x_1,x_2)-f(x_1-1,x_2) &\ge f(x_1-1,x_2)-f(x_1-2,x_2),\\
f(x_1,x_2)-f(x_1-1,x_2) &= f(x_1,x_2)-f(x_1-1,x_2),\\
f(x_1,x_2)-f(x_1-1,x_2) &\ge f(x_1-1,x_2+1)-f(x_1-2,x_2+1),
\end{aligned}$$
where the first, fifth, sixth and ninth inequalities follow from superconvexity, the third from integer convexity, the seventh from componentwise convexity, and the fourth from superconvexity and componentwise convexity through the chain
$$\Delta_1f(x_1+1,x_2-1)\ge\Delta_1f(x_1,x_2)\ge\Delta_1f(x_1-1,x_2).$$
Hence (18) follows by Lemma 2 in the minimization context. Therefore $T_1f(x_1,x_2)\in\mathcal{V}$. $\square$
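Lemma 4 can be spot-checked numerically. The sketch below uses our own hypothetical helper names (`T1`, `dd`, `in_V`), and its quadratic test function is an arbitrary example in $\mathcal{V}$ (second differences 2 and 2, mixed difference 1). It applies $T_1$ from (16), once and twice, and tests conditions (1)–(2) of Definition 1 on a finite box of lattice points, ignoring boundaries as the proof does. Such a check can refute but not prove membership in $\mathcal{V}$.

```python
from itertools import product


def T1(f):
    """Operator (16): T1 f(x1, x2) = min{f(x1, x2), f(x1 + 1, x2), f(x1, x2 + 1)}."""
    return lambda x: min(f(x), f((x[0] + 1, x[1])), f((x[0], x[1] + 1)))


def dd(f, x, i, j):
    """Delta_i Delta_j f(x) = f(x) - f(x - e_j) - f(x - e_i) + f(x - e_i - e_j)."""
    e = ((1, 0), (0, 1))
    xi = (x[0] - e[i][0], x[1] - e[i][1])
    xj = (x[0] - e[j][0], x[1] - e[j][1])
    xij = (xi[0] - e[j][0], xi[1] - e[j][1])
    return f(x) - f(xj) - f(xi) + f(xij)


def in_V(f, box):
    """Definition 1 (convex sense) checked at every point of a finite box."""
    return all(dd(f, x, i, j) >= 0 and dd(f, x, i, i) >= dd(f, x, i, j)
               for x in box for i, j in ((0, 1), (1, 0)))


def f(x):
    """A multimodular test function; the linear terms shift its minimizer."""
    return x[0] ** 2 + x[1] ** 2 + x[0] * x[1] - 3 * x[0] + 2 * x[1]


BOX = list(product(range(-4, 5), repeat=2))
print(in_V(f, BOX), in_V(T1(f), BOX), in_V(T1(T1(f)), BOX))  # True True True
```

Because the check uses exact integer arithmetic, a `False` would point to a genuine violation at some grid point rather than to rounding.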
With the multimodularity property of the value function, the remaining part is routine: the structural properties are used to characterize the optimal control policies as monotone switching curve policies. These details are omitted.

Acknowledgements
We are deeply grateful for the constructive comments and feedback from Professor Sridhar Seshadri, the Area Editor, and the referee. All remaining errors are the authors' responsibility.

References
[1] E. Altman, B. Gaujal, A. Hordijk, Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity, Springer, 2003.
[2] P. Glasserman, D.D. Yao, Monotone optimal control of permutable GSMPs, Mathematics of Operations Research 19 (1994) 449–476.
[3] L.V. Green, S. Savin, B. Wang, Managing patient service in a diagnostic medical facility, Operations Research 54 (2006) 11–25.
[4] A.Y. Ha, Optimal dynamic scheduling policy for a make-to-stock production system, Operations Research 45 (1997) 42–53.
[5] B. Hajek, Extremal splittings of point processes, Mathematics of Operations Research 10 (1985) 543–556.
[6] G. Koole, Structural results for the control of queuing systems using event-based dynamic programming, Queueing Systems 30 (1998) 323–339.
[7] G. Koole, Monotonicity in Markov reward and decision chains: theory and applications, Foundations and Trends in Stochastic Systems 1 (2006) 1–82.
[8] J.E. Smith, K.F. McCardle, Structural properties of stochastic dynamic programs, Operations Research 50 (2002) 796–809.
[9] S. Stidham Jr., R. Weber, A survey of Markov decision models for control of networks of queues, Queueing Systems 13 (1993) 291–314.
[10] D.M. Topkis, Minimizing a submodular function on a lattice, Operations Research 26 (1978) 305–321.
[11] H. Zhao, J.K. Ryan, V. Deshpande, Optimal dynamic production and inventory transshipment policies for a two-location make-to-stock system, Operations Research 56 (2008) 400–410.