Multi-time scale procurement planning considering multiple suppliers and uncertainty in supply and demand




Accepted Manuscript

Title: Multi-time Scale Procurement Planning Considering Multiple Suppliers and Uncertainty in Supply and Demand
Authors: Joohyun Shin, Jay H. Lee
PII: S0098-1354(16)30086-2
DOI: http://dx.doi.org/doi:10.1016/j.compchemeng.2016.03.024
Reference: CACE 5408

To appear in: Computers and Chemical Engineering

Received date: 13-8-2015
Revised date: 22-3-2016
Accepted date: 22-3-2016

Please cite this article as: Shin, Joohyun, & Lee, Jay H., Multi-time Scale Procurement Planning Considering Multiple Suppliers and Uncertainty in Supply and Demand. Computers and Chemical Engineering, http://dx.doi.org/10.1016/j.compchemeng.2016.03.024

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Multi-time Scale Procurement Planning Considering Multiple Suppliers and Uncertainty in Supply and Demand

Joohyun Shin¹ and Jay H. Lee¹*

¹ Chemical and Biomolecular Engineering Department, Korea Advanced Institute of Science and Technology, Daejeon, Korea.

Highlights

 A procurement system is studied by considering multiple suppliers, and demand and supply uncertainty.
 The planning problem is formulated as an MDP, and integrated with an MILP scheduling model.
 An approximate framework based on value function approximation is developed.
 A heuristic approximation of the scheduling is developed to reduce the computational load.
 The proposed methodology is examined on benchmark case studies.

Abstract

Inventory management of a procurement system is decomposed into sub-problems according to the time scale of the decisions: long-term planning for ordering raw materials, and short-term scheduling for unloading the orders. To ensure more sustainable and robust operation, the different decision layers should be integrated (reflecting the multi-scale nature of the problem), and supply and demand uncertainty should be considered. In this study, the planning problem is formulated as a Markov decision process (MDP) to incorporate possible realizations of uncertainty into the decision-making process. The MDP planning model is integrated with a scheduling model expressed as an MILP (or closely approximated by a heuristic approach). Decision policies are obtained by solving the MDP problem through an exact value iteration, as well as through an approximate approach intended to alleviate the computational challenges. Through benchmark problems, we compare the results of applying these policies with those of a reference policy obtained without any rigorous integration with scheduling.

Key words: Procurement planning and scheduling, Multi-scale decision making, Supply and demand uncertainty, Markov decision process, Approximate dynamic programming

*Corresponding author

E-mail addresses: [email protected] (Joohyun Shin), [email protected] (Jay H. Lee)

1. Introduction

The supply chain (SC) of a manufacturing system includes the following functional activities: procurement of raw materials from suppliers, transformation of raw materials into valuable products, blending and storage of final products, and distribution of final products to warehouses or customers. Over the whole supply chain, production goals are set to maximize both profit and customer satisfaction subject to constraints of supply, capacity, and operation; this is called SC planning (SCP). Since chemical plants and refineries are complex interconnected systems with many sources of uncertainty, systematic SCP activities can have a significant impact on the overall profit and customer satisfaction (Maravelias & Sung, 2009). The decision-making process in such a complex SC composed of interacting networks is generally decomposed according to temporal as well as spatial scales (Guillén, Badell, Espuna, & Puigjaner, 2006; Gupta & Maranas, 2003; Maravelias & Sung, 2009), as shown in Fig. 1. This work addresses the upstream procurement system for managing suppliers and resources (dashed line in Fig. 1). To ensure optimal inputs to the production system at all times, one needs to manage the inventories of raw materials in consideration of multiple suppliers and the demands of intermediate products. For this, a two-level decision-making problem is usually constructed with different decision epochs (Bengtsson & Nonås, 2010) (Fig. 2): in the upper planning level, external suppliers are selected and order amounts for the selected suppliers are decided, typically on a monthly or weekly basis; in the lower scheduling level, a detailed operational schedule for unloading delivered orders from carriers to storage tanks is decided on a faster time scale, e.g., daily.
Here, one planning period is the overall decision horizon of the scheduling level, and a detailed schedule in every single planning period is required for execution of the plan, subject to various operational constraints, supplier availability decided by lead time, and the continuously realized demand for the corresponding planning period. Traditionally, material procurement planning is decomposed into sub-levels that interrelate in a hierarchical way, i.e., higher-level decisions form given inputs (initial states or constraints) for the decision making at

subordinate levels (Dudek, 2009). However, such one-way communication can sometimes lead to inconsistent or infeasible operation, as stated in Grossmann (2005): at the planning level, the effects of daily inventories and uncertainties are neglected, so it is prone to yield optimistic estimates that cannot be realized at the scheduling level; at the same time, scheduling models assume that key decisions (e.g., the amount of released order) have already been taken, which restricts the set of admissible operation schedules. Therefore, the recent trend in planning has been to promote better integration with the operational level to achieve a globally optimal and sustainable solution by minimizing the gap between the plan and the real execution. In the literature so far, several new formulations and solution algorithms have been proposed to incorporate the multi-scale nature of the problem, as reviewed in Grossmann (2012). However, the existing approaches based on mathematical programming still allow a narrow range of application due to restrictive formulations and computational infeasibility, and thus a new approach needs to be developed. Another significant issue is that there exist various sources of uncertainty along the SC. In particular, supply and demand, which are major factors for inventory management, change exogenously over time. Because their variability significantly affects the operation of the entire process, the operation derived from deterministic or heuristic models can be suboptimal or even infeasible. Thus a more robust and reliable decision policy should be constructed; a movement to capture and account for various types of uncertainties has recently led to a new research cluster within the process systems engineering community. In the procurement system, the long-term variability of demand is captured at the planning level, while uncertainty in carrier arrival is typically relegated to the scheduling level to be handled heuristically.
According to some excellent reviews of planning under uncertainty (Mula, Poler, Garcia-Sabater, & Lario, 2006; Peidro, Mula, Poler, & Lario, 2009), many models exist for random demands, while the case of supply uncertainty has been studied far less in the past, despite its significance. The purpose of this work is to propose a more sustainable and efficient procurement planning strategy by incorporating both supply and demand uncertainties, and by combining the procurement planning with the operational scheduling. For this, the procurement planning is formulated as a discrete Markov decision process (MDP) with two exogenous information variables: lead time (the delay between order release and arrival) and demand. MDP modeling provides a basis for optimal information-based multi-period decision-making by capturing both the physical dynamics and the flow of information. A large-scale multi-period decision problem can be decomposed into sequential single-period decision problems by a state-oriented formulation and the Markov property.

Here, the one-period cost and inventory transition in the planning model are computed from an optimal schedule obtained from a scheduling model, expressed as an MILP (or closely approximated by a heuristic approach). That is, an optimal decision policy for ordering raw materials is constructed by considering operational inventory dynamics and daily uncertainty in supplier (carrier) availability. The MDP-formulated problem is naturally solved by dynamic programming (DP), which is computationally challenging due to the high dimensions of the state and action spaces in most real-scale problems. In this study, we alleviate the 'curse of dimensionality' associated with the exact DP approach by employing an approximate solution method (called approximate dynamic programming, ADP (Powell, 2007)), and thus increase the range of applications; however, the performance loss due to the approximation must be evaluated. The rest of the paper is organized as follows. Section 2 reviews previous work related to two important issues for sustainable operation of the SC: incorporation of uncertainty, and multi-scale integration. Section 3 provides theoretical background on MDPs and their solution algorithms. In Section 4, the studied procurement planning system is described and formulated as an MDP, and the integrated system model is constructed by incorporating an MILP-based scheduling model. Furthermore, an approximation framework and a heuristics-based scheduling model for reducing computation are introduced in the last part of Section 4. The usefulness of the integrated model and the approximate solution are then verified through simple benchmark problems in Section 5. Section 6 concludes the paper.

2. Literature review

A new research paradigm for optimizing process operation strategies is needed to consider robustness and reliability under uncertainty, as well as optimality. Mula et al. (2006) classified the general types of uncertainty models in SCP into four groups: conceptual models, analytical models, simulation models, and artificial-intelligence-based models. Safety stock is one of the most basic inventory managing policies employed to guarantee a certain service level given uncertain demand. One strategy proposed along this line by You and Grossmann (2008) is to define the safety stock as a decision variable in the model. When the uncertainty is addressed with random variables of known probability distributions, the safety stock level is decided by the demand variance during the lead time and a specific service level (Rodriguez, Vecchietti, Harjunkoski, & Grossmann, 2014).

Meanwhile, analytical models, which allow a search for optimal values of decision variables by evaluating a given objective function, have been widely used to date. In particular, models used in the stochastic programming (SP) framework often consider uncertainty using a scenario-based approach (Birge & Louveaux, 2011). Each scenario arises from a particular realization of the uncertainty and is associated with a certain probability of occurrence. The stochastic program is then reformulated as a large-scale deterministic equivalent problem, and a sequence of decisions over time, made prior to knowing the realizations of the random variables pertaining to the future, is found. The approach is generally applied in the context of a two-stage decision problem with recourse, where the decision variables are partitioned into two sets based on whether they use knowledge of the uncertainty realization. Excellent reviews on the wide application of the stochastic programming approach, including production planning, scheduling, and the design and optimization of chemical processing systems, can be found in Peidro et al. (2009) and Sahinidis (2004). However, an MILP or MINLP model that accounts for uncertainty with multiple scenarios can often become extremely large and computationally challenging. Since the SP formulation is limited to problems with only a modest number of scenarios, it is often infeasible to apply to multi-period decision problems of practical significance (Lee, 2014; Sahinidis, 2004). Besides SP, the Markov decision process (MDP) represents another general modelling approach for addressing multi-period decision problems under uncertainty. MDPs are more naturally solved by dynamic programming (DP) rather than by mathematical programming.
A state-oriented formulation decomposes the problem into a sequence of single-period sub-problems, defining the value function as a shadow cost that summarizes the future consequences of taking actions and accounts for all possible future state dynamics (Rust, 2008). Since the DP formulation allows us to derive an optimal information-based decision policy (e.g., an optimal feedback control policy), there is good potential to reduce the performance loss caused by model mismatch in an uncertain reality (Lee & Lee, 2004). Despite this potential, DP has been thought to be largely impractical for real-world problems due to a computational challenge known as the "curse of dimensionality," which is posed by the need to iterate over state and action spaces of typically huge size. Recently, an algorithmic strategy called approximate dynamic programming (ADP) has been developed that combines function approximation techniques with stochastic simulation. Cheng, Subrahmanian, and Westerberg (2003) formulated design and planning under uncertainty as a multi-objective MDP, and the formulated problem was solved by neuro-dynamic programming, which adopts reinforcement learning techniques to systematically approximate the cost-to-go function for obtaining a suboptimal

policy. ADP was used by Choi, Realff, and Lee (2006) to derive a novel inventory managing policy. In this work, stochastic simulation was used to define reduced state and action spaces as well as approximate transition probabilities to facilitate the solution of a dynamic program, which was done with the value iteration approach. More recently, Pratikakis (2008) suggested a real-time ADP approach based on a nonparametric k-nearest neighbor scheme for the value function approximation; it was applied to a high-dimensional light-aromatics supply chain example. Other studies dealing with inventory problems through ADP are surveyed in Katanyukul, Duff, and Chong (2011); however, the usefulness of ADP has not been fully explored in the process systems engineering community. For modelling a multi-scale optimization, a fundamental problem arises from the integration of models across very different timescales, which poses the following question: how to coordinate the optimization over a given time horizon ranging from days to years (Grossmann, 2005). The solution strategies for dealing with integrated planning-scheduling models can be broadly classified into three categories: (a) hierarchical, (b) iterative, and (c) full-space methods (Maravelias & Sung, 2009). Since inconsistent decisions arising from the one-way communication of the hierarchical model are a main source of disruption to sustainable operation, alternative methodologies have been developed. In the iterative model, information from the scheduling sub-models is iteratively fed back to the upper-level problem to include additional integer cuts in the hope of reaching the truly optimal planning decision. Many studies have been published in this category, but it still suffers from a structural problem in the communication between the different levels. The full-space model approach considers an integrated model containing detailed scheduling sub-models for each planning period.
However, the standard mathematical programming formulation for simultaneous planning and scheduling over a fine time grid yields a large-scale optimization problem that is oftentimes computationally infeasible. The most common approach to handle such a multi-scale problem is through decomposition techniques (Grossmann, 2012) such as Lagrangian decomposition, Benders decomposition, and bi-level decomposition.

3. Theoretical Backgrounds

3.1 Markov Decision Process (MDP)

The formulation of a problem as an MDP requires specification of the following elements with time index t (Puterman, 2014):

 State variable s_t and finite state space S
 Exogenous information variable ω_t and finite exogenous space Ω
 Decision variable a_t and finite action space A
 One-period cost function c_t(s_t, a_t)
 State transition function s_{t+1} = S^M(s_t, a_t, ω_{t+1})

A state variable is a function of the history sufficient to compute all the future dynamics of the system. Exogenous information, which models uncertainty and becomes realized only at each corresponding time period, forces us to make decisions before all the information is known. It is expressed in the form of random variables governed by probability distributions or an exogenously provided file of scenarios (Powell et al., 2012). A decision is made by a decision policy π : s → a, which is a map indicating which action to take for any given state and is to be found by the decision maker. The decision maker receives a cost corresponding to the given state and the taken decision. The system state at the next period is determined by the given transition function. The problem of finding the best policy to minimize the overall cost over the entire period (an infinite horizon in this case) is written as Eq. (1). Here, a discount factor 0 < γ ≤ 1 is used to ensure the convergence of the infinite horizon cost.

min_π E[ Σ_{t=0}^{∞} γ^t c_t(s_t, π(s_t)) | s_0 ]    (1)

For convenience, we can represent the optimal infinite horizon cost as V(s_0), which is also referred to as the value function (or cost-to-go). Note that it is a function of the starting state. Invoking the Markov property of the system, meaning that the future state depends on the current state and decision only, the entire multi-period decision problem (Eq. (1)) can be decomposed into sequential one-period decision problems. The optimality equation, called the Bellman equation (Eq. (2)), must be solved for the value function, which includes both the current and future cost of being in the current state. The decision is taken from the set of minimizers of the given cost function. The future cost-to-go should be calculated as an expectation because the state transition contains stochastic effects through the exogenous information, which is random and not yet realized at the current time. Thus the expectation can be derived from the probability of state transition from the current state and given action, which is defined by the distribution of the random vector ω_{t+1} (Eqs. (3) & (4)). Here, 1_X is the indicator function, which is 1 if the statement X is true and 0 otherwise.

V_t(s_t) = min_{a_t} [ c_t(s_t, a_t) + γ E[V_{t+1}(s_{t+1}) | s_t] ]    (2)

E[V_{t+1}(s_{t+1}) | s_t] = Σ_{s'∈S} P(s_{t+1} = s' | s_t, a_t) V_{t+1}(s')    (3)

P(s_{t+1} = s' | s_t, a_t) = Σ_{ω_{t+1}∈Ω_{t+1}} P(ω_{t+1}) 1_{s' = S^M(s_t, a_t, ω_{t+1})}    (4)

In the case of an infinite horizon problem, the problem is thought of as stationary, where the cost function, the transition function, and the process governing the exogenous information do not vary over time (Rust, 2008). One of the most widely used algorithms for solving this problem is value iteration (VI), in which the optimality equation is iteratively computed until it converges, as described in Table 1 (Powell, 2007; Puterman, 2014). The fact that the value function is a function of the state complicates the calculation significantly. The exact value iteration algorithm essentially has to enumerate over the entire state space at each iteration, thus causing the 'curse of dimensionality'. Instead, approximate dynamic programming (ADP) can be an alternative for alleviating the curse of dimensionality associated with solving the optimality equation exactly through backward induction.

3.2 Approximate Dynamic Programming (ADP)

A general structure of the approximate value iteration (AVI) algorithm is shown in Table 2. A set of random samples representing uncertain information (called a sample path) is generated for stepping the simulation forward. At each sampling point, the optimal decision and the value of being in a state are estimated using the value function computed in the previous iteration. The value of the visited state is then updated through some sort of approximation. After that, the next state is computed based on the current state, the optimal decision, and the sample path. These steps are repeated until the number of iterations is large enough to approximate the value function closely. With this forward progression, the optimality equation does not need to be looped over all states, which is what causes the curse of dimensionality. For this, the exact value function needs to be replaced with some sort of approximation.
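As a concrete illustration, the exact VI recursion of Eqs. (2)–(4) can be sketched for a toy single-item inventory MDP. The state space, cost coefficients, and demand distribution below are illustrative assumptions, not the benchmark instance of this paper.

```python
# Tabular value iteration for a toy single-item inventory MDP; all model
# data (states, costs, demand distribution) are illustrative assumptions.
STATES = range(6)                    # inventory levels 0..5
ACTIONS = range(4)                   # order amounts 0..3
DEMANDS = {0: 0.2, 1: 0.5, 2: 0.3}   # P(demand = d)
GAMMA = 0.9                          # discount factor

def cost(s, a, d):
    # setup + variable ordering cost, holding cost, lost-demand penalty
    return (10.0 * (a > 0) + 2.0 * a
            + 1.0 * max(s + a - d, 0)
            + 20.0 * max(d - s - a, 0))

def step(s, a, d):
    # lost-sales inventory balance, truncated to the largest state
    return min(max(s + a - d, 0), max(STATES))

def q_value(V, s, a):
    # expectation over the exogenous demand, cf. Eqs. (3)-(4)
    return sum(p * (cost(s, a, d) + GAMMA * V[step(s, a, d)])
               for d, p in DEMANDS.items())

def value_iteration(tol=1e-6):
    # iterate the Bellman recursion (Eq. (2)) until convergence
    V = {s: 0.0 for s in STATES}
    while True:
        V_new = {s: min(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(V_new[s] - V[s]) for s in STATES) < tol:
            return V_new
        V = V_new

V = value_iteration()
policy = {s: min(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
```

The greedy policy extracted in the last line is exactly the minimizer of the Bellman equation; with a full inventory the sketch orders nothing, as one would expect.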
One of the most basic and analytic approaches is to represent the value function in a parameterized form with several coefficients whose number is much smaller than the number of states (Powell, 2007). In the inventory control problem especially, a linearly separable function approximation (Eq. (5)) with appropriate basis functions is known to be efficient due to the minimal interaction between the states (Kleywegt, Nori, & Savelsbergh, 2002). A basis function is a vector whose elements are indicator variables (e.g., inventory level, or demand). The coefficient vector of the linear model is updated in a recursive manner with an observed sample value v̂^n of being in state s^n. Refer to Powell (2007) for more details of the approximate value iteration algorithm.

V_t(s) ≈ V̄(s) = Σ_{f∈F} θ_f φ_f(s)    (5)
4. Problem Formulation

4.1 System Description

The studied system is shown in Fig. 3. Multiple suppliers i = 1,…,M (with varying characteristics) are considered together as candidates for providing raw materials to a single manufacturer. Decisions on material requirements are generally made to keep operating costs low without disrupting demand fulfillment. In a general inventory management problem, different types of costs can be imposed: (1) the replenishment cost of placing an order, (2) the operational cost of maintaining an inventory (called the inventory holding cost) and unloading orders, and (3) the penalty cost for unsatisfied demand or for stock below a safety level. The cost function can be formulated according to the problem setting. The studied procurement inventory problem is defined by the following characteristics (Katanyukul, 2010): a single type of product is considered with a single stocking point (single echelon), and the inventory is periodically renewed (i.e., inventory is checked and reviewed at a particular time point) with a lost-demand setting. The inventory level is a stochastic variable due to random lead time and demand. Both uncertainties are modelled by Markov chains to capture the time-correlation in randomly changing economic and market conditions (Bendre & Nielsen, 2013; Beyer, Cheng, Sethi, & Taksar, 2010; Song & Zipkin, 1996). Since the supply system is exogenous, lead times are independent of the demand amount and lot size, and depend only on the conditions of the supply system. All suppliers are assumed to be independent of each other, and may have a minimum or maximum (fixed) order capacity. The replenishment cost for each supplier is known and fixed. A replenishment order for multiple suppliers is released on the slower time scale of t. An ordering decision is generally made with the assumption that events occur in the following sequence: stocks ordered at the previous time period, referred to as on-order inventory, arrive at the beginning of the period, and demands of intermediate products are realized all at once. Here, the storage level at time t is renewed by a simple material balance expressed by the current inventory, the given order that will arrive in the corresponding time period, and the realized demand. As a result, the inventory level is assumed to be held constant for the time period t, and a decision about placing a new replenishment order for the next time period is made based on this nominal value. Fig. 4 shows this reference planning model.

However, this rather idealistic scenario is quite different from the real execution. First, not all orders arrive at the beginning of each planning period. The availability of carriers varies from supplier to supplier, each coming with its own (and uncertain) lead time. At the same time, a detailed unloading schedule is subject to various operational constraints and the capacity limitations of the unloading process and the storage tank. Furthermore, demands are continuously realized over the entire period. All these combined, the inventory level of the storage tank changes over the faster time scale. Thus, the detailed inventory dynamics and the operational uncertainty in the lead time should be captured in the planning-level model in order to minimize the gap between planning and real operation.

The above argues for a tighter integration between planning and scheduling, and Fig. 5 shows this connection. In the integrated model, a single time period at the planning level is the whole decision horizon at the scheduling level; given a planning period t, which is divided into H equal-length time segments indexed as h = 1,…,H on a faster time scale, a detailed unloading schedule from the carriers to the storage tank is decided subject to the on-order inventory with time index (t, h).
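The faster-time-scale storage balance described above can be sketched as follows: within one planning period of H segments, carrier arrivals land at uncertain hours and the period demand d_t is drawn down at the constant rate d_t / H. The arrival pattern, tank capacity, and all numbers are illustrative assumptions.

```python
# Fine-scale storage balance over one planning period of H segments,
# following the integrated (t, h) view; all numbers are illustrative.
def simulate_period(x0, arrivals, demand, H, x_max):
    """arrivals: {h: delivered volume at segment h}. Returns the inventory
    trajectory x_{t,h} and the lost demand accumulated over the period."""
    x, lost, traj = x0, 0.0, []
    rate = demand / H                              # constant demand rate
    for h in range(1, H + 1):
        x = min(x + arrivals.get(h, 0.0), x_max)   # unload, capped by tank
        served = min(rate, x)                      # satisfy unit demand
        lost += rate - served                      # unmet demand is lost
        x -= served
        traj.append(x)
    return traj, lost

# One carrier arriving mid-period: demand is lost before it lands.
traj, lost = simulate_period(x0=10.0, arrivals={4: 30.0},
                             demand=40.0, H=8, x_max=50.0)
```

The run illustrates the gap the integrated model targets: under the nominal planning balance the order would cover the demand, yet the fine-scale trajectory loses demand in the segments before the carrier arrives.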
In the procurement planning, the decision policy is to be obtained in an infinite horizon setting to ensure sustainable operation beyond the decision horizon. Orders placed one planning period ago are assumed to arrive within the next planning period, i.e., lead times are smaller than H. In the scheduling model, information on the on-order inventory, the initial inventory state at (t, 0), and the exogenous information on uncertainty captured in the planning model are given as constraints and specified parameters. Conversely, at the planning level, the one-period cost for time t and the system state transition from time (t, 0) to (t+1, 0) (= (t, H)) are calculated based on the optimal (or suboptimal) unloading schedule determined by the scheduling model. Then, by predicting the next state based on this current information, a decision about placing a new replenishment order is made.

4.2 Integrated model


4.2.1

MILP model for scheduling

An MILP model for the scheduling layer is established by referring to the basic crude oil unloading scheduling model suggested by Lee, Pinto, Grossmann, and Park (1996). This model is based on a uniform discretization of time over the given horizon H. The objective function, minimized over the scheduling variables for period t, is defined in the following way:

C_sch,t = min Σ_{h=1}^{H} [ C_h x_{t,h} + C_ld Ld_{t,h} ] + Σ_{i=1}^{M} [ C_un (Tl_{i,t} − Tf_{i,t}) + C_lv Lv_{i,t} ]    (6)

where C_h is the unit inventory holding cost; C_un is the unit unloading cost, which is imposed per unit of laytime independently of the unloading rate; C_ld is the unit penalty cost for 'lost-demand', the unsatisfied demand; and C_lv is the unit penalty cost for 'lost-volume', the difference between the delivered orders and the actually unloaded amounts. The original model contains equality constraints on demand satisfaction and overall unloading amounts, but here they are relaxed into objective-function penalty terms due to the potential feasibility problems that could arise in the integration with the planning layer. The optimal schedule at time t decides the following variables: a binary variable denoting whether carrier i starts unloading at time (t, h) (Xf_{i,t,h}); a binary variable denoting whether carrier i completes unloading at time (t, h) (Xl_{i,t,h}); the unloading initiation and completion times of carrier i at time t (Tf_{i,t} and Tl_{i,t}, respectively); a 0-1 continuous variable denoting whether carrier i is unloading its product at time (t, h) (Z_{i,t,h}); the unloading flow rate from carrier i to the storage tank at time (t, h) (Fu_{i,t,h}); the flow rate from the storage tank to the manufacturer for satisfying the demand at time (t, h) (Fd_{t,h}); the volume of the storage tank at scheduling time (t, h) (x_{t,h}); the lost-demand at time (t, h) (Ld_{t,h}); and the overall lost-volume of carrier i at time t (Lv_{i,t}). The overall MILP model contains many inequality and equality constraints. Eqs. (7) – (13) are related to the unloading operation, expressed by Xf_{i,t,h}, Xl_{i,t,h}, Tf_{i,t}, Tl_{i,t}, Z_{i,t,h}, and the overall delivered order from supplier i for time t, denoted by a_{t,i}. Unloading is impossible if there is no order released to the corresponding supplier, and each carrier starts and completes unloading only once throughout the overall horizon. Eq. (9) gives the relationships between the timing variables and the binary variables for the unloading initiation and completion times.
Unloading from carrier i should be started and completed only during the available time of the corresponding carrier, from Avf,i to Avl,i. Each carrier should complete unloading after it starts. Eq. (12) means that unloading is regarded as occurring full time between Tfi,t and Tli,t, and unloading cost is imposed by this laytime, independent of transfer rate. The unloading process has a capacity limitation; only one carrier can transfer its product to the storage tank 11

at any given time, as shown in Eq. (13). H

H

Xfi ,t ,h  at ,i , and  Xli ,t , h  at ,i  h 1 h 1 H

for i=1,…,M

(7)

H

Xfi ,t , h  1 , and  Xli ,t ,h  1  h 1 h 1 H

for i=1,…,M

(8)

H

tXfi ,t , h  Tfi ,t , and  tXli ,t , h  Tli ,t  h 1 h 1

for i=1,…,M

(9)

Av f ,i  Tfi,t , Tli ,t  Avl ,i for i=1,…,M

(10)

Tf t ,i  Tlt ,i for i=1,…,M

(11)

h

H

k 1

k h

Zi ,t ,h   Xfi ,t , k , and Zi ,t , h   Xli ,t , k for i=1,…,M and h=1,…,H

(12)

n

Z i ,t , h  1  i 1

for h=1,…,H

(13)

The raw material transfer rate from the carrier to the storage tank is restricted by unloading variable Zi,t,h, and the overall volume of raw material transferred from carrier i to the storage tank should be less than the initial product volume in carrier i:

Fu ,min Zi ,t , h  Fui ,t , h  Fu ,max Zi ,t , h for i=1,…,M and h=1,…,H

(14)

H

Fui,t ,h  at ,i  h 1

for i=1,…,M

(15)

Eq. (16) is for the storage tank balance at time (t, h) where xt,0 is the initial storage level, and the volume capacity limitation of the storage tank should be satisfied for all times (Eq. (17)). n

h

h

xt , h  xt ,0   Fui ,t , k   Fdt , k for h=1,…,H

(16)

xmin  xt , h  xmax for h=1,…,H

(17)

i 1 k 1

k 1

Demand is assumed to be constant over the whole scheduling horizon, as dt,h = dt / H, that is, dt is the overall 12

demand for time period t. Transfer rate from the storage tank to the manufacturer for satisfying the demand should be less than the unit demand, and Eq. (19) shows the resulting lost-demand. The lost-volume from the carrier i is defined by Eq. (20) where at,i is the overall delivered order from supplier i for time t.

0 ≤ Fd_{t,h} ≤ d_{t,h}   for h = 1,…,H   (18)

d_{t,h} − Fd_{t,h} = Ld_{t,h}   for h = 1,…,H   (19)

a_{t,i} − Σ_{h=1}^{H} Fu_{i,t,h} = Lv_{i,t}   for i = 1,…,M   (20)

4.2.2 Integrated model

The procurement planning is formulated as a discrete-time, infinite-horizon Markov decision process (MDP) with the following fundamental components. All state and decision variables are expressed as discrete variables, each with its own finite space. The formulated problem is solved by the exact value iteration (VI) algorithm explained in Table 1 in Section 3.1.

• State variable: s_t consists of three categories — the inventory level x_t, the demand d_t, and the lead time l_{t,i} of each supplier i — so that s_t = (x_t, d_t, l_{t,1}, …, l_{t,M}). The state space is then S = X × D × L_1 × … × L_M, where x_t ∈ X = {x_min, …, x_max}, d_t ∈ D = {d_min, …, d_max}, and l_{t,i} ∈ L_i = {l_{i,min}, …, l_{i,max}}.

• Exogenous information: demand and lead time contain random exogenous information, assumed to be normally distributed, and evolve as ξ_{t+1} = ξ_t + ξ̂_{t+1} (e.g., d_{t+1} = d_t + d̂_{t+1}). The exogenous information vector ω_t collects the new information on demand, d̂_t, and on the lead time of each supplier, l̂_{t,i}, at time t; all of its components are independent of one another.

• Decision variable: a_t is a vector containing the order amount placed with each supplier at time t, with components a_{t,i}, the order placed with supplier i. The action space is A = O_1 × … × O_M, where a_{t,i} ∈ O_i = {0, o_{i,min}, …, o_{i,max}}.

The overall cost function (Eq. (21)) comprises the scheduling cost, which is the optimal objective value determined by the scheduling model, and the ordering cost defined by Eq. (22). Since the scheduling model contains exogenous information, the expectation over the whole random outcome space must be taken.

c(s_t, a_t) = C_order(a_t) + E[ C_sch,t(s_t, a_t, ω̂_{t+1}) ]   (21)

C_order(a_t) = Σ_i [ c_setup,i·δ(a_{t,i}) + c_var,i·a_{t,i} ]   (22)

where C_order is the ordering cost composed of the setup cost c_setup,i (a fixed cost incurred for an order of any size) and the unit ordering cost c_var,i; δ(·) is the step function, equal to 0 for zero input and 1 otherwise.
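As a concrete illustration, Eq. (22) can be evaluated directly; the sketch below assumes the order vector and cost parameters are given as plain lists (the numbers are hypothetical):

```python
def ordering_cost(a, c_setup, c_var):
    """Ordering cost of Eq. (22): a fixed setup cost for any nonzero
    order plus a variable cost proportional to the order amount."""
    total = 0.0
    for a_i, setup_i, var_i in zip(a, c_setup, c_var):
        delta = 1 if a_i > 0 else 0  # step function delta(a_i)
        total += setup_i * delta + var_i * a_i
    return total

# Example with three suppliers: orders of 3, 0, and 6 units.
print(ordering_cost([3, 0, 6], c_setup=[10, 20, 20], c_var=[2, 1, 1]))  # prints 42.0
```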

• State transition probability: p(s′ | s_t, a_t), from the current state s_t = (x_t, d_t, l_t) under decision a_t to a future state s′ = (x′, d′, l′), is specified in Eq. (23). The first two factors are the probabilities of the lead-time and demand evolution driven by random information; the inventory transition probability is one if x′ equals the final storage level x*_{t,H} determined by the optimal solution of the MILP model, and zero otherwise.

p(s′ | s_t, a_t) = Π_{i=1}^{M} p(l′_i | l_{t,i}) · p(d′ | d_t) · p(x′ | s_t, a_t, d′, l′)   (23)

p(x′ | s_t, a_t, d′, l′) = 1 if x′ = x*_{t,H}, and 0 otherwise   (24)

The MDP planning model and the MILP model discussed in the previous section are integrated by exchanging information with each other. The MILP model receives the initial inventory state x_{t,0} (in Eq. (16)) and the released order amount a_{t,i} (in Eqs. (7), (15), and (20)) from the planning level. The exogenous information captured at the planning level also enters the scheduling model via Eqs. (25) and (26): the unit demand per scheduling time in Eqs. (18) and (19) is set by Eq. (25), and the availability of each carrier in Eq. (10) is set by the random outcome of the lead time. Conversely, the one-period cost and state transition of the planning-level MDP are calculated from the optimal unloading schedule of the MILP model, as shown in Eqs. (21) and (23), respectively.

d_{t,h} = ( d_t + d̂_{t+1} ) / H   for h = 1,…,H   (25)

Av_{f,i} = l_{t,i} + l̂_{t+1,i}   for i = 1,…,M   (26)

4.3 Approximation algorithm

To alleviate the computational burden of computing the value function in the backward induction, a piecewise linear model is constructed for the value function approximation, explained in Section 4.3.1. In addition, another significant challenge arises from integrating the planning model with the scheduling model: the MILP problem must be solved for every state and action to determine the state transition and one-period cost function of the MDP formulation at the planning level. We thus develop a heuristic approach to estimate the operational cost and inventory transition, based on some simple operational assumptions (Section 4.3.2).

4.3.1 Piecewise linear approximation of the value function

The value function at time t given state s_t, denoted V_t(s_t), incorporates all future costs from time t onward; the linear separable approximation model (Eq. (5)) can be constructed in terms of the three state variables as in Eq. (27). To estimate the marginal effect of each state variable on the value function, the exact value function obtained from the case study (described in the next section) is first analysed for illustration in Figs. 6 and 7 (the overall value function is shown in Fig. 11).

V̄(s_t) = θᵀφ(s_t),  with  φ = [ 1, φ_x, φ_d, φ_l ]ᵀ   (27)

Fig. 6 shows the value function plotted with respect to inventory; it can be approximated piecewise-linearly. In the extremely lost-demand-dominant part, stock-out occurs due to the carriers' inherent lead time. The first interval point x_{p,1} represents the amount of demand realized before any carrier arrives, which is the product of the smallest lead time among the carriers and the unit demand of the current state (Eq. (28)). The marginal effect of the inventory level on the value function is most significant in this first part, because lost demand is inevitable when the inventory level is lower than x_{p,1}. The second part, from x_{p,1} to x_{p,2}, is normally lost-demand dominant. In these two parts, the value function decreases as the inventory level increases, because the penalty for unsatisfied demand is the most important factor. In the remaining third part, above x_{p,2}, the holding cost outweighs the lost-demand penalty since the inventory level is sufficient to meet the demand; excess inventory thus increases the value (cost). Based on these considerations, the basis function for the inventory side is constructed as Eq. (30), and the coefficients are approximated starting from the corresponding unit costs.

x_{p,1} = ⌈ min_i(l_i) · d / H ⌉   (28)

x_{p,2} = x_{p,1} + d_max   (29)

φ_x = [ (x_{p,1} − x)⁺, (x_{p,2} − x)⁺, (x − x_{p,2})⁺ ]ᵀ,  where (·)⁺ = max(·, 0)   (30)

Meanwhile, the value function increases linearly with demand, and its marginal effect on the value function can differ with the current storage state. Fig. 7 shows that demand has little effect on the value function when the system has sufficient inventory (in the holding-cost-dominant part), whereas it becomes more important in the case of an inventory shortage (in the lost-demand-dominant parts). To reflect this, the basis function for the demand side is set as in Eq. (31), with the index j representing the current inventory interval. Variation in the lead time of each supplier is also incorporated (Eq. (32)).

φ_d = ( d − d_min )·( 3 − j + c ),  where x ∈ [ x_{p,j−1}, x_{p,j} ) and c is a constant   (31)

φ_l = l − l_min   (32)
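The basis vector of Eq. (27) can be sketched as below. Two readings are inferred rather than stated outright in the text: the inventory hinges of Eq. (30) are taken as positive parts (consistent with the piecewise-linear shape in Fig. 6), and the demand weight of Eq. (31) is taken as (3 − j + c), with j the inventory interval index and c a constant:

```python
def features(x, d, l, xp1, xp2, d_min, l_min, c=1):
    """Basis vector of Eq. (27): constant term, three inventory hinge
    features (Eq. 30), a demand feature weighted by the inventory
    interval (Eq. 31), and lead-time deviations (Eq. 32)."""
    phi_x = [max(xp1 - x, 0), max(xp2 - x, 0), max(x - xp2, 0)]
    # inventory interval index j: 1 below xp1, 2 in [xp1, xp2), 3 above
    j = 1 if x < xp1 else (2 if x < xp2 else 3)
    phi_d = (d - d_min) * (3 - j + c)
    phi_l = [li - l_min for li in l]
    return [1.0] + phi_x + [phi_d] + phi_l
```

With this layout, the six fitted coefficients of Eq. (49) line up one-to-one with the feature vector for a single uncertain-lead-time supplier.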

4.3.2 Cost and state transition estimation

Even though the value function is approximated by a piecewise linear model, solving the MILP model remains computationally severe when the state and action spaces are high-dimensional. In the studied inventory system, which involves a single product and a single echelon, an unloading schedule and its cost can be estimated heuristically instead of through mathematical programming. Given an unloading sequence q = {q_1, …, q_n}, the unloading point of the i-th carrier is defined by Eq. (33), from the availability of carrier q_i or the unloading point of the next carrier in the sequence. Here, the minimum unloading time UT_i is the initially delivered order amount divided by the maximum unloading rate (⌈·⌉ denotes round-up to the next highest integer).

UP_i = Av_{l,q_n} − UT_{q_n}                     if i = n
       min( UP_{i+1}, Av_{l,q_i} ) − UT_{q_i}    otherwise   (33)

UT_i = ⌈ a_i / F_{u,max} ⌉   (34)

We can then construct a simple premise of the unloading process: unloading starts at time UP_i, or when the storage level drops to zero, as shown in Figs. 8 and 9 (left), respectively. Even if the storage level is zero, however, unloading cannot start before the corresponding carrier becomes available; in this case, lost demand is incurred until the carrier arrives (Fig. 9, right). Furthermore, unloading can be restricted by the tank capacity when the holding amount of product is already high (Fig. 8, right). Accordingly, the inventory values at the vertices and the period parameters in Figs. 8 and 9 are determined through Eqs. (35)–(41). From the starting inventory xs_i for the i-th unloading carrier, the wait period W_i and the lost-demand period L_i are determined by Eqs. (36) and (37), respectively. The inventory level then decreases to xw_i according to the demand realized during the wait period (d is the unit demand per scheduling time). After that, the amount of product UA_i unloaded from the carrier to the storage tank is computed from the order amount and tank capacity by Eq. (39), and the unloading period U_i follows from the value of UA_i. Finally, the inventory level after the unloading period, xu_i, is calculated by Eq. (41). These parameters are computed in turn for all carriers.

xs_i = x_{t,0} if i = 1;  xu_{i−1} otherwise   (35)

W_i = min( ⌈ xs_i / d ⌉, UP_{q_i} )   (36)

L_i = max( Av_{f,q_i} − W_i, 0 )   (37)

xw_i = xs_i − W_i·d   (38)

UA_i = min( a_{q_i}, x_max − xw_i )   (39)

U_i = ⌈ UA_i / F_{u,max} ⌉   (40)

xu_i = xw_i + UA_i − U_i·d   (41)

Hence the scheduling cost and inventory transition can be estimated numerically from these parameters, via Eqs. (42) and (43) respectively. The inventory holding cost is the area of the dotted trapezoid or triangle in Figs. 8 and 9, and the other three costs follow directly from the corresponding parameters U_i, L_i, and UA_i. The inventory finally reaches the storage level after unloading from the last carrier, minus the amount of demand realized during the remaining times. In Eq. (43), all parameters are determined by the optimal unloading sequence q*.

Ĉ_sch = min_q Σ_{i=1}^{n} [ C_h·( ((xs_i + xw_i)/2)·W_i + ((xw_i + xu_i)/2)·U_i ) + C_un·U_i + C_ld·L_i·d + C_lv·( a_{q_i} − UA_i ) ]   (42)

x̂_{t+1} = xu*_n − d·( H − Σ_{i=1}^{n} ( W*_i + L*_i + U*_i ) )   (43)
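The per-carrier recursion of Eqs. (33)–(41) can be sketched as a short loop; carrier data are passed as dictionaries keyed by carrier index (a hypothetical interface for illustration):

```python
import math

def estimate_schedule(q, a, av_f, av_l, x0, x_max, d, fu_max):
    """Heuristic estimate of Eqs. (33)-(41): unloading points are fixed
    backward from the last carrier (Eq. 33), then wait, lost-demand and
    unloading periods are propagated forward through the inventory."""
    n = len(q)
    ut = {i: math.ceil(a[i] / fu_max) for i in q}          # Eq. (34)
    up = [0.0] * n
    up[n - 1] = av_l[q[n - 1]] - ut[q[n - 1]]              # Eq. (33), i = n
    for i in range(n - 2, -1, -1):
        up[i] = min(up[i + 1], av_l[q[i]]) - ut[q[i]]      # Eq. (33)
    xs = x0                                                # Eq. (35), i = 1
    out = []
    for i, qi in enumerate(q):
        w = min(math.ceil(xs / d), up[i])                  # Eq. (36)
        lost = max(av_f[qi] - w, 0)                        # Eq. (37)
        xw = xs - w * d                                    # Eq. (38)
        ua = min(a[qi], x_max - xw)                        # Eq. (39)
        u = math.ceil(ua / fu_max)                         # Eq. (40)
        xs = xw + ua - u * d                               # Eq. (41), next xs
        out.append((w, lost, ua, u, xs))
    return out
```

Evaluating Eq. (42) then reduces to looping this routine over the M! candidate sequences and taking the cheapest one.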

5. Numerical Results

A reference policy (for comparison with the policies using the proposed integrated models) is obtained from a discrete MDP formulation that handles Markovian demand but does not rigorously account for the scheduling operation and lead-time uncertainty. Only the exogenous information on demand is captured, and a penalty for inventory falling below a safety-stock level is included to minimize the occurrence of stock-outs caused by long lead times. The safety-stock level is systematically set to the amount of demand realized before any carrier arrives under fixed lead times, in the same manner as x_{p,1} (Eq. (28)). The integrated model is solved by both the exact method and the approximation approach explained in Section 4.3. Consequently, three different policies are obtained from different combinations of the models and solution algorithms, listed in Table 3. The discount factor, which ensures convergence of the infinite-horizon cost, is set to 0.99 for all three policies. For the approximate value iteration used to obtain Policy 3, the coefficient vector of the linear model is updated by recursive least squares (RLS) through Eqs. (44)–(47) (Powell, 2007). Here, ε̂ⁿ is the error between the prediction V̄(θⁿ⁻¹) from the previously iterated coefficients and the current observation v̂ⁿ, an estimate of the value of being in a state; Bⁿ and γⁿ are computed recursively (refer to Wang (1986) for a more detailed formulation and derivation). More recent observations receive more weight by tuning the parameter λ: smaller values of λ put more weight on recent observations.

θⁿ = θⁿ⁻¹ − (1/γⁿ)·Bⁿ⁻¹·φⁿ·ε̂ⁿ   (44)

ε̂ⁿ = V̄(θⁿ⁻¹) − v̂ⁿ   (45)

Bⁿ = (1/λ)·[ Bⁿ⁻¹ − (1/γⁿ)·Bⁿ⁻¹·φⁿ·(φⁿ)ᵀ·Bⁿ⁻¹ ]   (46)

γⁿ = λ + (φⁿ)ᵀ·Bⁿ⁻¹·φⁿ   (47)
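One RLS step of Eqs. (44)–(47) can be sketched as follows; `theta` is the coefficient column vector, `B` the recursively maintained matrix, `phi` the basis vector at the visited state, and `v_hat` the observed sample value:

```python
import numpy as np

def rls_update(theta, B, phi, v_hat, lam=0.98):
    """One recursive least squares step, Eqs. (44)-(47): the linear
    value-model coefficients are corrected by the prediction error,
    with forgetting factor lam discounting older observations."""
    phi = phi.reshape(-1, 1)
    eps = (phi.T @ theta).item() - v_hat            # Eq. (45)
    gamma = lam + (phi.T @ B @ phi).item()          # Eq. (47)
    theta = theta - (B @ phi) * eps / gamma         # Eq. (44)
    B = (B - (B @ phi @ phi.T @ B) / gamma) / lam   # Eq. (46)
    return theta, B
```

With λ = 1 this reduces to ordinary (stationary) RLS; λ = 0.98, as used in the case studies, biases the fit toward recent observations.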

In order to examine the resulting decision policies in terms of performance under uncertainty and computational complexity, two different procurement systems and decision problems are defined in Table 4. A small-scale case with short planning and scheduling horizons and three different types of suppliers is investigated first. Each supplier offers different lead-time conditions, order limits, and order costs. A larger case is then studied for a more practical application, involving five suppliers and longer decision horizons: daily scheduling decisions are made over a horizon of one month, and monthly planning decisions over a horizon of one year. Unit cost parameters for these case studies are also listed in Table 4; Css is the unit penalty cost for inventory below the safety stock, used only for obtaining the reference policy. An informed, serially correlated uncertainty model is more realistic than independent and identically distributed uncertainty (Bendre & Nielsen, 2013; Bensoussan, 2012). The simplest and most popular way to model correlated uncertainty is to assume that it is realized from a Markov process. Thus, in this work, we consider the situation where the uncertainty is modelled by a Markov chain (MC) with a finite number of states. In practice, such an MC model can be constructed by discretizing the space and estimating the transition probabilities from historical data. The performance of the resulting policies can then be evaluated by Markov chain Monte Carlo (MCMC) simulation. For each simulation run, random sample paths of the exogenous information on demand and lead time are generated over the whole decision horizon (t = 1, …, T) with initial values d₀ = 7 and l₀ = 4.

5.1 Case Study 1

In Case 1, three different types of suppliers are introduced: Supplier 1 has an uncertain lead time varying from 2 to 5 but offers a cheaper setup cost for ordering, while orders released to Suppliers 2 and 3 arrive after fixed lead times but at higher order costs. The maximum unloading flowrate per scheduling period (Fu,max) is set to 3, and detailed supplier parameters are given in Table 5. To reduce excessive computation, the decision space is aggregated in units of Fu,max, because ordering in multiples of Fu,max is beneficial in terms of unloading cost. The inventory and demand spaces are defined with xmin = 0, xmax = 20, dmin = 5, and dmax = 10. In this problem, the size of the action space is 48, and that of the state space is 504. The exogenous information variables for the two uncertain parameters (demand, and the lead time of Supplier 1) have independent normal distributions with the following statistical parameters: μd = 0, σd = 2, μl = 0, and σl = 1. The number of random samples used for the AVI algorithm (N in Table 2) and the forgetting factor for nonstationary RLS (λ in Eqs. (46) and (47)) are set to 150 and 0.98, respectively; these values are decided empirically. Policies 1 and 2 are obtained off-line using exact VI, while the policy table based on the AVI algorithm is updated through simulations. At each planning epoch t, an order is released according to the policy table. The optimal schedule for the corresponding time periods (t, h), h = 1, …, H, is then decided by the MILP model with the given order and realized random outcomes. The inventory level is reviewed by implementing the schedule, and the updated state is used for the next decision. The costs of replenishment and the operational schedule are summed over the whole time horizon. The overall policy evaluation results, averaged over 100 simulation runs, are shown in Table 6. The average cost of the reference policy is much higher than those of the other policies obtained from the integrated models, which capture better details of the operational inventory dynamics and lead-time uncertainty. The reference policy is vulnerable to environmental variability because it uses an optimistic nominal inventory state and fixed lead times. While the average cost of Policy 2, from the integrated model and the exact solution method, is the lowest, its computational burden is significant even for this simple problem, and the approach does not scale well with problem size. In this sense, the proposed method, using the heuristics-based scheduling model and approximate value iteration, represents an attractive trade-off between optimality (average cost) and computational time.
A more detailed theoretical analysis of how the computational scale of the exact and approximate approaches grows with problem size follows. The computational complexity of the value iteration (VI) algorithm is O(S²A) per iteration, where S and A are the sizes of the state and action spaces, respectively. The number of iterations required to reach the optimal value function is polynomial in the number of states and in 1/(1−γ), where γ is the discount factor for the infinite-horizon cost (Littman, Dean, & Kaelbling, 1995); the convergence rate is thus fairly slow as the discount factor approaches 1. In this case study, the VI algorithm had to be iterated 1277 times for the value function to converge with a 10⁻³ error tolerance. In contrast, the same-sized MDP problem can be solved in only N iterations by the AVI algorithm, which is much smaller than the number of iterations required for value function convergence in exact VI; in addition, only O(SA) operations are required per iteration. Using the branch-and-bound algorithm, the computational complexity of MILP can be decomposed into that of the IP and that of the subordinate LP (Till, Engell, Panek, & Stursberg, 2003). Let n, d, and m denote the numbers of variables, integer variables, and constraints, respectively. The number of LP iterations depends linearly on m, and the arithmetic operations per iteration are O(mn). An exact relation between the number of integer variables and the average complexity of IP is still an open issue, but the average complexity varies from O(d) to O(D), where D is the size of the integer variable space (Zhang, 1999). In this case study, n, d, and m are determined by the number of suppliers (M) and the length of the time horizon (H). The decision vector has 3M + 3H + 4MH variables, including 2MH binary variables and 2M integer variables, and the overall number of constraints is 9M + 5H + 8MH, including the lower and upper bounds of the variables. The size of the integer variable space (D) is given by all possible combinations of integer variables, as shown in Eq. (48). All combined, the computational complexity of the studied MILP model can grow to quite a severe level for large values of M and H. The heuristic approach, meanwhile, requires M! calculations, corresponding to all possible orderings of the unloading sequence q of the multiple carriers in Eq. (42). Since M is much smaller than H and the diversity of suppliers is limited in most cases, the heuristic approach relaxes the computational burden. In this case study, the MILP model has 159 variables (including 66 integer variables) and 317 constraints, while only 6 iterations are needed by the heuristic approach.

D = 2^{2MH} · Π_{i=1}^{M} TA_i²,  where TA_i = Av_{l,i} − Av_{f,i} + 1   (48)

The approximation performance of the heuristics-based estimation of the cost and state transition suggested in Section 4.3.2 is first examined by comparing the resulting scheduling costs with those using the MILP model. Fig. 10 shows an example of the scheduling cost calculated by the MILP (solid line) and the heuristics-based approach (dashed line) for all states, given the order decision a = [3, 0, 6], meaning that 3 and 6 units are ordered from Suppliers 1 and 3, respectively. The scheduling costs computed by the heuristic model do not closely match those of the MILP at extremely low or high inventory levels (the far left and right sides of the graph in Fig. 10). However, these extreme inventory states are rarely visited during the simulation runs used for approximating the value function. Since the overall value function is structurally updated with the observed sample values at the visited states, which have normal inventory levels, the inaccuracy of the heuristically estimated scheduling cost and state transition does not greatly affect the value function approximation. For further comparison, the exact value function (converged from exact value iteration) and the approximated value function are graphed in Fig. 11, and the resulting function coefficients of the approximated model are specified in Eq. (49). We claim that the value function can be approximated quite well by the piecewise linear model, based on the small root mean squared error (RMSE) of the approximated value function over all states, defined by Eq. (50).

θ = [ 1636.62  9.24  1.26  0.10  0.54  0.18 ]   (49)

RMSE = √( (1/|S|)·Σ_{s∈S} ( V̄(s) − V(s) )² ) = 1.6365   (50)

5.2 Case Study 2: Larger scale

To examine the applicability of the approach to more general and realistic problems, a larger-scale case study is examined with five suppliers having different supply conditions and a larger random lead-time space (Table 7). Here, the setup cost for replenishment orders is proportional to the reliability of carrier arrival. The maximum unloading flowrate is assumed to be 5, and the resulting aggregated action space has size 162. The inventory and demand spaces are defined with the following parameters: xmin = 0, xmax = 30, dmin = 15, and dmax = 20. The overall size of the state space, covering the inventory capacity and the possible demand and lead-time realizations, is 4464. Unit costs are listed in Table 4, and the other parameters are the same as in Case study 1. The number of random samples N used for AVI is set to 200 in this case. Table 8 shows the results. For this larger-scale problem, the integrated model cannot be solved by the exact method: the MILP model is much larger, with 705 decision variables (including 310 integer variables) and 1395 constraints, and the state and action spaces of the planning level are computationally intractable. That is, the MILP would have to be solved 4464² × 162 times to compute the state transition probability matrix and one-stage cost function of the MDP formulation. Furthermore, the convergence rate of exact VI is expected to be very slow due to the large number of states and the discount factor close to 1. While the exact approach cannot be adopted because of the computational complexity at this problem size, the approximate method achieves about a 14% improvement over the reference policy.

6. Conclusion

In this study, a general inventory control problem for supplying raw materials from multiple suppliers to a manufacturer has been formulated as an MDP, which allows us to incorporate supply and demand uncertainty and consider operational unloading schedules in a natural way. The planning problem was integrated with a MILP model at the scheduling level for computing the one-period state transition and cost of the MDP formulation, and solved by the dynamic programming methods of exact or approximate value iteration. The performance gain from employing the more rigorous integrated model, compared to a policy obtained without considering the operational dynamics and lead-time uncertainty on the faster time scale, was verified through case studies. Our numerical results showed that the proposed method can reduce the overall costs by capturing uncertainties and integrating planning with the operational inventory dynamics and detailed unloading schedule. Since the two issues we considered are common and important in supply chain planning (SCP), the presented strategy can be applied to many other SCP problems that share the need for integration with scheduling and incorporation of Markovian uncertainty. Additionally, an approximation framework using a linear function approximation and heuristics-based estimation of the cost and state transition was tested to reduce the computational burden of the exact method. In the larger-scale case study, the integrated model could not be solved by the exact approach due to computational infeasibility; in this case, the proposed approximation framework was an excellent alternative to the exact solution method. Thus, the proposed method can broaden the applicability of the MDP approach to practically sized problems at an acceptable performance loss.

Acknowledgement


This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2015R1A2A2A01007102).

References

Bendre, A. B., & Nielsen, L. R. (2013). Inventory control in a lost-sales setting with information about supply lead times. International Journal of Production Economics, 142(2), 324-331.

Bengtsson, J., & Nonås, S.-L. (2010). Refinery planning and scheduling: An overview. In Energy, Natural Resources and Environmental Economics (pp. 115-130). Springer.

Bensoussan, A. (2012). Control of inventories with Markov demand. In Stochastic Analysis and Related Topics (pp. 29-55). Springer.

Beyer, D., Cheng, F., Sethi, S. P., & Taksar, M. (2010). Markovian Demand Inventory Models. Springer.

Birge, J. R., & Louveaux, F. (2011). Introduction to Stochastic Programming. Springer Science & Business Media.

Cheng, L., Subrahmanian, E., & Westerberg, A. (2003). Design and planning under uncertainty: issues on problem formulation and solution. Computers & Chemical Engineering, 27(6), 781-801.

Choi, J., Realff, M. J., & Lee, J. H. (2006). Approximate dynamic programming: application to process supply chain management. AIChE Journal, 52(7), 2473-2485.

Dudek, G. (2009). Supply chain management and collaborative planning. In Collaborative Planning in Supply Chains (pp. 5-24). Springer.

Grossmann, I. E. (2005). Enterprise-wide optimization: A new frontier in process systems engineering. AIChE Journal, 51(7), 1846-1857.

Grossmann, I. E. (2012). Advances in mathematical programming models for enterprise-wide optimization. Computers & Chemical Engineering, 47, 2-18.

Guillén, G., Badell, M., Espuna, A., & Puigjaner, L. (2006). Simultaneous optimization of process operations and financial decisions to enhance the integrated planning/scheduling of chemical supply chains. Computers & Chemical Engineering, 30(3), 421-436.

Gupta, A., & Maranas, C. D. (2003). Managing demand uncertainty in supply chain planning. Computers & Chemical Engineering, 27(8), 1219-1227.

Katanyukul, T. (2010). Approximate dynamic programming application to inventory management. Colorado State University.

Katanyukul, T., Duff, W. S., & Chong, E. K. (2011). Approximate dynamic programming for an inventory problem: Empirical comparison. Computers & Industrial Engineering, 60(4), 719-743.

Kleywegt, A. J., Nori, V. S., & Savelsbergh, M. W. (2002). The stochastic inventory routing problem with direct deliveries. Transportation Science, 36(1), 94-118.

Lee, H., Pinto, J. M., Grossmann, I. E., & Park, S. (1996). Mixed-integer linear programming model for refinery short-term scheduling of crude oil unloading with inventory management. Industrial & Engineering Chemistry Research, 35(5), 1630-1641.

Lee, J. H. (2014). Energy supply planning and supply chain optimization under uncertainty. Journal of Process Control, 24(2), 323-331.

Lee, J. M., & Lee, J. H. (2004). Approximate dynamic programming strategies and their applicability for process control: A review and future directions. International Journal of Control, Automation and Systems, 2, 263-278.

Littman, M. L., Dean, T. L., & Kaelbling, L. P. (1995). On the complexity of solving Markov decision problems. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.

Maravelias, C. T., & Sung, C. (2009). Integration of production planning and scheduling: Overview, challenges and opportunities. Computers & Chemical Engineering, 33(12), 1919-1930.

Mula, J., Poler, R., Garcia-Sabater, J., & Lario, F. (2006). Models for production planning under uncertainty: A review. International Journal of Production Economics, 103(1), 271-285.

Peidro, D., Mula, J., Poler, R., & Lario, F.-C. (2009). Quantitative models for supply chain planning under uncertainty: a review. The International Journal of Advanced Manufacturing Technology, 43(3-4), 400-420.

Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons.

Powell, W. B., George, A., Simao, H., Scott, W., Lamont, A., & Stewart, J. (2012). SMART: A stochastic multiscale model for the analysis of energy resources, technology, and policy. INFORMS Journal on Computing, 24(4), 665-682. doi:10.1287/ijoc.1110.0470

Pratikakis, N. (2008). Multistage decisions and risk in Markov decision processes: Towards effective approximate dynamic programming architectures.

Puterman, M. L. (2014). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons.

Rodriguez, M. A., Vecchietti, A. R., Harjunkoski, I., & Grossmann, I. E. (2014). Optimal supply chain design and management over a multi-period horizon under demand uncertainty. Part I: MINLP and MILP models. Computers & Chemical Engineering, 62, 194-210.

Rust, J. (2008). Dynamic programming. London, UK: Palgrave Macmillan.

Sahinidis, N. V. (2004). Optimization under uncertainty: state-of-the-art and opportunities. Computers & Chemical Engineering, 28(6), 971-983.

Song, J.-S., & Zipkin, P. H. (1996). Inventory control with information about supply conditions. Management Science, 42(10), 1409-1419.

Till, J., Engell, S., Panek, S., & Stursberg, O. (2003). Empirical complexity analysis of a MILP-approach for optimization of hybrid systems. In IFAC Conference on Analysis and Design of Hybrid Systems.

Wang, H. (1986). Recursive estimation and time-series analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(6), 1678-1678.

You, F., & Grossmann, I. E. (2008). Design of responsive supply chains under demand uncertainty. Computers & Chemical Engineering, 32(12), 3090-3111.

Zhang, W. (1999). State-Space Search: Algorithms, Complexity, Extensions, and Applications. Springer Science & Business Media.


Fig. 1. Supply chain planning matrix (Maravelias & Sung, 2009)

Fig. 2. Two level decision making in the procurement system

Fig. 3. Studied procurement system with multiple suppliers and a single manufacturer


Fig. 4. Reference planning model

Fig. 5. Multi-scale integration of procurement planning and scheduling


Fig. 6. Piecewise linear function approximation with respect to the inventory level

[Plot residue: value function vs. demand d for inventory sub-intervals 1–3, with fitted linear trends y = 4.7614x + 1631.2, y = 3.4159x + 1624.1, and y = 2.5766x + 1623.8.]

Fig. 7. The marginal effect of demand on the value function


Fig. 8. Estimation of unloading schedules: cases in which unloading starts at the unloading point UPi

Fig. 9. Estimation of unloading schedules: cases in which unloading starts due to a stock-out


Fig. 10. Computed scheduling cost by the MILP model and the heuristics-based model over all states, given an order plan of a = [3, 0, 6]

[Figure: value function (1635 to 1695) over states 0 to 504, solved by exact VI and by approximate VI]

Fig. 11. Comparison of the value functions from exact VI and approximate VI


Table 1. Value Iteration (VI) algorithm for an infinite horizon problem

VI for an infinite horizon problem:
Step 0. Set n ← 1, and initialize V^0(s) for all s ∈ S.
Step 1. For all s ∈ S, compute V^n(s) = min_a [ c(s, a) + γ Σ_{s'∈S} P(s' | s, a) V^{n-1}(s') ].
Step 2. If ||V^n − V^{n-1}|| ≤ ε, return V^n; else set n ← n + 1 and go to Step 1.
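The recursion of Table 1 can be sketched in Python for a small generic MDP. The cost array `c`, transition tensor `P`, and discount factor below are illustrative placeholders, not the paper's procurement model.

```python
import numpy as np

def value_iteration(c, P, gamma=0.9, eps=1e-6):
    # c: (S, A) one-step costs; P: (A, S, S) with P[a, s, s2] = P(s2 | s, a)
    V = np.zeros(c.shape[0])                     # Step 0: V^0(s) = 0
    while True:
        # Step 1: Bellman backup, Q[s, a] = c(s, a) + gamma * sum_s2 P(s2|s,a) V(s2)
        Q = c + gamma * (P @ V).T
        V_new = Q.min(axis=1)
        # Step 2: stop when the sup-norm change falls below eps
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmin(axis=1)       # value function and greedy policy
        V = V_new

# Toy 2-state, 2-action instance (illustrative numbers only)
c = np.array([[1.0, 2.0], [0.5, 0.3]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
V, policy = value_iteration(c, P, gamma=0.9)
```

Because the whole state space is swept on every iteration, this exact VI scales with |S|, which is what motivates the approximate variant of Table 2.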

Table 2. Approximate Value Iteration (AVI) algorithm for an infinite horizon problem

AVI algorithm by forward induction:
Step 0. Set n ← 1, choose an initial state s^1, and initialize V^0(s) for all s ∈ S.
Step 1. Choose a sample path ω = (ω^1, ..., ω^N).
Step 2. For n = 1, ..., N:
    Step 2a. Find the optimal decision a and the sample value v̂^n = min_a [ c(s^n, a) + γ E[ V^{n-1}(s') | s^n ] ].
    Step 2b. Update the value function with the sample value: V^n ← U^V(V^{n-1}, s^n, v̂^n).
    Step 2c. Compute the state transition for stepping forward: s^{n+1} = S^M(s^n, a, ω^{n+1}), and set n ← n + 1.
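A minimal forward-induction sketch of Table 2 with a lookup-table value function is given below. The inventory-style cost and transition functions and the smoothing update (standing in for the update operator U^V) are illustrative assumptions; the expectation in Step 2a is replaced by its single-sample estimate along the sample path.

```python
import random

def avi_forward(s1, omegas, actions, cost, step, gamma=0.9, alpha=0.5):
    V = {}                                   # Step 0: V^0 = 0 for unseen states
    s = s1
    for omega in omegas:                     # Step 2: walk the sample path (Step 1)
        # Step 2a: v_hat = min_a [ c(s,a) + gamma * V(S^M(s,a,omega)) ]
        v_hat, a_star = min(
            (cost(s, a) + gamma * V.get(step(s, a, omega), 0.0), a)
            for a in actions
        )
        # Step 2b: smoothing update playing the role of U^V
        V[s] = (1 - alpha) * V.get(s, 0.0) + alpha * v_hat
        # Step 2c: step forward along the sample path
        s = step(s, a_star, omega)
    return V

# Toy inventory instance (illustrative): state = stock level, action = order size
random.seed(0)
actions = [0, 1, 2]
cost = lambda s, a: 0.1 * s + 1.0 * (a > 0)          # holding + setup cost
step = lambda s, a, d: max(0, min(5, s + a - d))     # bounded stock balance
omegas = [random.randint(0, 2) for _ in range(200)]  # demand sample path
V = avi_forward(s1=3, omegas=omegas, actions=actions, cost=cost, step=step)
```

Only states actually visited along the sample path are updated, which is what lets AVI avoid the full state-space sweep of exact VI.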






Table 3. Tested decision policies for the case studies

|                              | Policy 1: reference               | Policy 2: integrated & exact method | Policy 3: integrated & approximate method |
| Planning model               | MDP formulation with safety stock | MDP formulation                     | MDP formulation                           |
| Uncertainty accounted for    | Demand                            | Demand & lead time                  | Demand & lead time                        |
| Integration with scheduling  | No                                | MILP model                          | Heuristic approach                        |
| Solution algorithm           | Exact VI                          | Exact VI                            | Approximate VI with p.w.l. value function |

Table 4. Unit cost and system parameters for the case studies

|                     |     | Case 1 | Case 2         |
| Decision horizon    | H   | 10     | 30 (one month) |
|                     | T   | 10     | 12 (one year)  |
| Number of suppliers | M   | 3      | 5              |
| Unit cost           | Ch  | 0.1    | 0.1            |
|                     | Cun | 3      | 5              |
|                     | Cld | 10     | 10             |
|                     | Clv | 5      | 5              |
|                     | Css | 10     | 10             |

Table 5. Parameters for the three suppliers in Case study 1

|          | Supplier 1   | Supplier 2 | Supplier 3    |
| Li       | {2, 3, 4, 5} | 2          | 5             |
| Avl,i    | 9            | 6          | 10            |
| Oi       | {0, 3, 6, 9} | {0, 3, 6}  | {0, 6, 9, 12} |
| csetup,i | 1.5          | 2          | 2             |
| cvar,i   | 0.5          | 0.5        | 0.5           |

Table 6. Results: Case study 1

|                               | Policy 1 | Policy 2 | Policy 3 |
| Average cost                  | 197.67   | 171.91   | 172.46   |
| Improvement over Policy 1 (%) | -        | 13.03    | 12.76    |
| Improvement over Policy 2 (%) | -        | -        | -0.32    |
| CPU time (s)                  | 2.28     | 1343.73  | 23.96    |

Table 7. Parameters for the five suppliers in Case study 2

|          | Supplier 1 | Supplier 2 | Supplier 3       | Supplier 4 | Supplier 5  |
| Li       | {5, 6, 7}  | {10, 11}   | {13, 14, 15, 16} | 4          | 15          |
| Avl,i    | 13         | 17         | 26               | 9          | 22          |
| σl,i     | 1          | 0.5        | 1                | -          | -           |
| Oi       | {0, 5, 10} | {0, 5, 10} | {0, 10, 15}      | {0, 5}     | {0, 10, 15} |
| csetup,i | 1.5        | 1.7        | 1.4              | 2          | 2           |
| cvar,i   | 0.5        | 0.5        | 0.5              | 0.5        | 0.5         |



Table 8. Results: Case study 2

|                               | Policy 1 | Policy 2                   | Policy 3 |
| Average cost                  | 606.75   | Computationally infeasible | 519.46   |
| Improvement over Policy 1 (%) | -        | -                          | 14.46    |
| CPU time (s)                  | 21.43    | -                          | 1247.10  |