Finite and infinite-horizon single vehicle routing problems with a predefined customer sequence and pickup and delivery



European Journal of Operational Research 231 (2013) 577–586

Contents lists available at SciVerse ScienceDirect

European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor

Production, Manufacturing and Logistics

D.G. Pandelis a, C.C. Karamatsoukis b,c, E.G. Kyriakidis d,*

a Department of Mechanical Engineering, University of Thessaly, Athinon Avenue, Pedion Areos, 38334 Volos, Greece
b Department of Financial and Management Engineering, University of the Aegean, 41 Kountouriotou Street, 82100 Chios, Greece
c Hellenic Army Academy, Department of Mathematics and Engineering Sciences, 16673 Vari, Greece
d Department of Statistics, Athens University of Economics and Business, Patission 76, 10434 Athens, Greece

Article info

Article history: Received 1 May 2012; Accepted 31 May 2013; Available online 7 June 2013

Keywords: Logistics; Dynamic programming; Routing with pick up and delivery

Abstract

We consider the problem of finding the optimal routing of a single vehicle that starts its route from a depot and picks up from and delivers $K$ different products to $N$ customers that are served according to a predefined customer sequence. The vehicle is allowed during its route to return to the depot to unload returned products and restock with new products. The items of all products are of the same size. For each customer the demands for the products that are delivered by the vehicle and the quantity of the products that is returned to the vehicle are discrete random variables with known joint distribution. Under a suitable cost structure, it is shown that the optimal policy that serves all customers has a specific threshold-type structure. We also study a corresponding infinite-time horizon problem in which the service of the customers is not completed when the last customer has been serviced but continues indefinitely with the same customer order. For each customer, the joint distribution of the quantities that are delivered and the quantity that is picked up is the same at each cycle. The discounted-cost optimal policy and the average-cost optimal policy have the same structure as the optimal policy in the finite-horizon problem. Numerical results are given that illustrate the structural results.

© 2013 Elsevier B.V. All rights reserved.

* Corresponding author. Tel./fax: +30 210 8203503. E-mail addresses: [email protected] (D.G. Pandelis), k.karamatsoukis@fme.aegean.gr (C.C. Karamatsoukis), [email protected] (E.G. Kyriakidis).

0377-2217/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.ejor.2013.05.050

1. Introduction

In the classical vehicle routing problem (VRP) the objective is to find the optimal routing of identical vehicles that start from a depot and deliver goods to a set of geographically dispersed customers. The capacitated vehicle routing problem (CVRP) is a VRP in which the capacity of the vehicles is finite. The quantity of the goods that each customer demands may be constant or a random variable. CVRPs have drawn enormous interest from many researchers during the last 50 years. The CVRP is an NP-hard problem, and a great number of exact algorithms (e.g. branch-and-bound, branch-and-cut, branch-and-cut-and-price methods) and heuristics and metaheuristics (e.g. tabu search, simulated annealing, genetic algorithms, colony optimization) have been developed. The exact algorithms find the global minimum of the cost function, while the heuristics and metaheuristics produce good solutions but do not guarantee optimality. Metaheuristics usually give better solutions than heuristics, but require more computational time. Surveys of various relevant models and solutions can be found in Toth and Vigo (2002), Simchi-Levi et al. (2005), Liong et al. (2008).

The capacitated vehicle routing problem with pickups and deliveries (CVRPPD) is an extension of the CVRP in which the vehicles deliver goods to customers and pick some goods up at customer sites. In most articles dealing with the CVRPPD (see e.g. Toth and Vigo, 1997, 1999; Hernandez-Perez and Salazar-Gonzalez, 2004; Ropke and Pisinger, 2006) the customers are divided into delivery customers and pick-up customers. The delivery customers receive goods from the vehicles and the pick-up customers return goods to the vehicles. The goal is to find optimal routes for a fleet of vehicles to visit the pick-up and the drop-off locations. It can also be assumed (see e.g. Nagy and Salhi, 2005; Bianchessi and Righini, 2007; Gribkovskaia et al., 2007; Zachariadis et al., 2010) that all customers may simultaneously receive and give goods. For example, this situation arises when the vehicles deliver to stores a product (e.g. milk) that expires after a few days. In this case it is reasonable to assume that the vehicles simultaneously provide the customers with new (fresh) products and collect the old (expired) ones.

In the present paper we introduce a simple CVRPPD. In this problem it is assumed that a single vehicle starts its route from a depot and picks up from and delivers $K$ different products to $N$ customers according to a predefined customer sequence $1 \to 2 \to \cdots \to N$. There is no distinction between delivery customers and pick-up customers since all customers may receive and may return products. The vehicle has a single compartment that may carry any quantity of product $i \in \{1, \ldots, K\}$ provided that its total capacity is not exceeded. It is assumed that the items of different products are of the same size. For example, if $K = 2$ and the two products are wine and beer it is assumed that a bottle of wine has the same size as a bottle of beer. The total quantity of products that is collected by the vehicle from each customer and his demands for new products are discrete random variables with known joint probability distribution. The vehicle is allowed during its route to return to the depot to empty returned products and restock with new products. When the vehicle returns to the depot for replenishment, it is not always preferable to load it to its full capacity, as some space must remain available to accommodate the returned products of the next customers. The total demand of each customer for all products is assumed to be less than or equal to the capacity of the vehicle. Furthermore, the total quantity of the items that each customer returns does not exceed the vehicle capacity. It is assumed that the travel costs among all points in the network (depot and customer sites) are known. The objective is to find the policy that minimizes the expected total cost for the service of all customers.

Minis and Tatarakis (2011) introduced the above problem with $K = 1$. They selected as decision epochs of the problem the epochs at which the service of each customer has been completed and presented a dynamic programming formulation. They proved that, after the completion of the service of a customer, it is optimal for the vehicle to proceed to the next customer if its empty space exceeds a critical level that depends on the number of available items for delivery. This is a partial characterization of the optimal policy. Its proof is complicated and does not seem to extend to $K > 1$. They also obtained numerical results and analyzed the performance of the algorithm.
In the present paper we choose as decision epochs of the problem the epochs at which the vehicle visits each customer for the first time, has satisfied as much of the customer's demands for new products as possible and has picked up as much of the returned products as possible. Note that it is possible at these decision epochs to have demands for new products that have not been satisfied if the quantities carried by the vehicle are not sufficient. It is also possible at these decision epochs to have old products that have not been picked up by the vehicle due to the lack of space. However, this choice of decision epochs enables us, for any value of $K \ge 1$, to present for all possible cases suitable dynamic programming equations that have elegant and usable forms. These equations permit us not only to find the optimal policy numerically but also to prove that it has a threshold-type structure that is intuitively reasonable. Specifically, the set of all possible states (i.e. remaining quantities of the products in the vehicle and empty space of the vehicle) after the first visit to each customer consists of four disjoint subsets. If the state belongs to the first subset then the optimal decision is to proceed to the next customer. If the state belongs to the second subset then the optimal decision is to go to the depot for loading and then go to the next customer. If it belongs to the third subset then the optimal decision is to go to the depot to unload any returned products, to restock the owed quantity of the products, to load additional items, to return to the customer to satisfy the owed demands and/or to pick up remaining old products and then proceed to the next customer. If it belongs to the fourth subset then the optimal decision is to make two trips to the depot before going to the next customer. This characterization of the optimal policy is complete and permits us to design a special-purpose efficient dynamic programming algorithm for its determination.
The problem that we described above can be considered as a generalization of the problem introduced by Yang et al. (2000) in which it is assumed that the vehicle delivers to the customers only one product, i.e. $K = 1$, and it does not pick up any old products. Yang et al. developed a dynamic programming approach and proved that for each customer $j \in \{1, \ldots, N-1\}$ there exists a critical number $s_j$ such that the optimal decision, after servicing customer $j$, is to continue to customer $j + 1$ if the remaining quantity in the vehicle is greater than or equal to $s_j$, or to return to the depot for replenishment if it is less than $s_j$. Tsirimpas et al. (2008) studied the problem with pick-up and delivery with $K = 1$ when, for each customer, the quantity that is picked up and the demands for new products are not random variables but constant numbers. They also assumed that the vehicle visits each customer only once and they developed a suitable dynamic programming algorithm for the determination of the optimal policy. Note that Yang et al. (2000) and Tsirimpas et al. (2008) selected as decision epochs for the problems that they studied the epochs at which the service of each customer has been completed. The problem of delivering $K$ different products when the demands for the products are random variables was studied (i) by Tatarakis and Minis (2009) and Pandelis et al. (2012) when each product is stored in its dedicated compartment in the vehicle and (ii) by Tatarakis and Minis (2009) and Pandelis et al. (2013) when all products are stored together in the vehicle's single compartment.

In the present paper, we also study the corresponding infinite time-horizon problem, in which the service of the customers does not stop when the last customer has given the returned items and has received the new products but continues indefinitely with the same customer order. It is assumed that, when the vehicle completes a tour, the demands of each customer for new products and the quantity of returned products are renewed for the next tour and follow the same joint distribution. Using well-known results of Markov decision processes we prove that the discounted-cost optimal policy and the average-cost optimal policy have the same threshold-type structure as the optimal policy in the initial finite-horizon problem.
As mentioned by Minis and Tatarakis (2011), a practical application of the considered problem could be the so-called ex-van sales. In ex-van sales the driver of the vehicle acts as a salesman. He visits his customers (retail outlets, supermarkets, kiosks, stores, etc.) in an area typically according to a predefined sequence. The demands of each customer for new products (e.g. milk, bottles of beer or wine) and the quantity of returned products (e.g. expired milk, empty bottles of beer or wine) are not known in advance but are revealed upon arrival at the customer's site. If a customer's demand for a new product exceeds the quantity that is loaded in the vehicle or if the quantity of returned products exceeds the empty space in the vehicle, the driver has to go to the depot to empty returned products and to restock with new ones. Another example could be the routing of a self-propelled vehicle in a manufacturing shop that transfers discrete parts to workcenters in a predefined sequence (see Rembold et al., 1985). Note that in addition to the main pathway connecting the workcenters, there are spurs connecting each workcenter with the materials warehouse, allowing the return and the reloading of the vehicle. The required quantities for new discrete parts may be stochastic due to failures and they are revealed when the vehicle arrives at a workcenter. If the required quantities for new discrete parts exceed the quantities that are loaded in the vehicle or if the quantity of useless discrete parts exceeds the empty space in the vehicle, then the vehicle has to go to the materials warehouse to empty the useless discrete parts and to replenish with new ones.

The rest of the paper is organized as follows. In the next section we present the finite-horizon pick-up and delivery problem, we define the decision epochs and we present the dynamic programming equations for the determination of the optimal policy.
It is proved that the optimal policy has a specific threshold-type structure. The infinite-horizon problem is presented and analyzed in Section 3. It is proved that the discounted-cost optimal policy and the average-cost optimal policy have the same structure as the finite-horizon optimal policy. Numerical results illustrate the theoretical results of Sections 2 and 3. The conclusions of the paper are given in the last section.

2. The finite-horizon problem

We consider a set of nodes $V = \{0, 1, \ldots, N\}$ with node 0 denoting the depot and the nodes $1, \ldots, N$ corresponding to customers. The customers are serviced in the order $1, \ldots, N$ by a vehicle of capacity $Q$. The vehicle carries $K$ different kinds of products to satisfy customers' demands. The same vehicle is also used to store products that are returned by the customers. We assume that the items of all products are of the same size. The vehicle starts its route with a load of all products that is less than or equal to $Q$ and after servicing all customers it returns to the depot. The demand of customer $j$, $j = 1, 2, \ldots, N$, for product $i$, $i = 1, 2, \ldots, K$, is a discrete random variable $\xi_i^j$ and the total quantity of products returned by customer $j$ is a discrete random variable $w^j$. The joint probability distribution of $\xi_i^j$, $i = 1, \ldots, K$, and $w^j$ is assumed known. The actual demands for new products and the quantity of returned products of each customer become known upon the vehicle's arrival at the customer's site. We assume that the total demand for new products and the quantity of returned products of each customer cannot exceed the vehicle's capacity, that is, $\max_{j=1,2,\ldots,N} \sum_{i=1}^{K} \xi_i^j \le Q$ and $\max_{j=1,2,\ldots,N} w^j \le Q$. When the vehicle visits customer $j$ for the first time it satisfies as much demand as possible and picks up the largest possible quantity of returned products. If part of the demand is not satisfied and/or there is not enough space for all returned products, the vehicle goes to the depot, empties returned products, restocks, and returns to customer $j$. After satisfying the demand of the last customer and picking up all of his returned products, the vehicle returns to the depot.

We denote by $c_{j,j+1}$, $j = 1, 2, \ldots, N-1$, the travel cost between customers $j$ and $j + 1$, and by $c_{j0}$, $j = 1, 2, \ldots, N$, the travel cost between customer $j$ and the depot. These costs can be considered as the costs of the gasoline that the vehicle needs to cover the distances between customers or the distances between customers and the depot. We naturally assume that these distances satisfy the triangle inequality, i.e.

$$c_{j,j+1} \le c_{j0} + c_{0,j+1}, \qquad j = 1, \ldots, N-1.$$

The road network is depicted in Fig. 1.

Fig. 1. The road network for the finite-horizon problem.

Let $z_i$, $i = 1, 2, \ldots, K$, be the load of product $i$ carried by the vehicle after the first visit at a customer's site, and $r$ the empty space; negative values for $z_i$ and $r$ denote the unsatisfied demand for product $i$ and lack of empty space for returned products. Let $z^- = \sum_{i=1}^{K} z_i^-$, with $z_i^- = \min\{0, z_i\}$, and let $r^- = \min(0, r)$. When $z^- = 0$ and $r \ge 0$ (demand for products and empty space fully satisfied), the vehicle either proceeds directly to the next customer or goes to the depot, empties any returned products, restocks with loads $\theta_i$, $i = 1, 2, \ldots, K$, of products $1, 2, \ldots, K$, where $\sum_{i=1}^{K} \theta_i \le Q$, and then visits the next customer. When $z^- < 0$ and/or $r < 0$, the vehicle goes to the depot, empties returned products, and restocks the owed quantity $-z^-$. Then it has the following choices: (i) it loads additional quantities $\theta_i$ of products $1, 2, \ldots, K$, so that at least $-r^-$ empty space remains after delivering the owed quantity $-z^-$, that is, $\sum_{i=1}^{K} \theta_i \le Q + \min(z^-, r^-)$, returns to the customer, satisfies demand and/or picks up remaining returned products, and then proceeds to the next customer and (ii) returns to the customer, satisfies demand and/or picks up remaining returned products, makes a second trip to the depot where it empties returned products, restocks with loads $\theta_i$ of products $1, 2, \ldots, K$, where $\sum_{i=1}^{K} \theta_i \le Q$, and proceeds to the next customer. Our objective is to determine a vehicle routing strategy that minimizes the expected total cost during a visit cycle.

2.1. The optimal routing strategy

We define vectors $\bar z = [z_1, z_2, \ldots, z_K]$, $\bar\xi^j = [\xi_1^j, \xi_2^j, \ldots, \xi_K^j]$, $\bar\theta = [\theta_1, \theta_2, \ldots, \theta_K]$, and denote by $f_j(\bar z, r)$ the minimum expected cost when the load of product $i$ carried by the vehicle after visiting customer $j$ for the first time is equal to $z_i$ and the empty space is equal to $r$. Then, an optimal routing strategy can be determined by the following dynamic programming equations (see e.g. Eq. (6.5) in Bather's (2000) book). For $j = 1, 2, \ldots, N-1$ we have two cases.

Case 1. If $z_1, z_2, \ldots, z_K, r \ge 0$, then

$$f_j(\bar z, r) = \min\{H_j(\bar z, r),\ A_j\}. \tag{1}$$

Case 2. If $\sum_{i=1}^{K} z_i^- + r^- < 0$, then

$$f_j(\bar z, r) = 2c_{j0} + \min\Big\{\tilde H_j\Big(\sum_{i=1}^{K} z_i^-,\ r^-\Big),\ A_j\Big\}. \tag{2}$$

For $z_1, z_2, \ldots, z_K, r \ge 0$ we have

$$H_j(\bar z, r) = c_{j,j+1} + E f_{j+1}\Big(\bar z - \bar\xi^{j+1},\ r + \sum_{i=1}^{K} \min(z_i, \xi_i^{j+1}) - w^{j+1}\Big), \tag{3}$$

for $z^- + r^- < 0$ we have

$$\tilde H_j(z^-, r^-) = c_{j,j+1} + \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q + \min(z^-, r^-)} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q + r^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big), \tag{4}$$

and

$$A_j = c_{j0} + c_{j+1,0} + \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big). \tag{5}$$

In the boundary we have

$$f_N(\bar z, r) = c_{N0} + 2c_{N0}\, \mathbf{1}\Big(\sum_{i=1}^{K} z_i^- + r^- < 0\Big). \tag{6}$$
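The recursion in Eqs. (1)–(6), together with the start-of-route minimization over the initial load, can be sketched in a few lines for the single-product case $K = 1$. This is a minimal illustration, not the authors' implementation (which the paper reports was written in Matlab): the function name `solve`, the helper names, and the input layout (`cost` as a nested list, `dist[j]` as a list of `(xi, w, prob)` outcomes for customer `j`) are assumptions made here for the example.

```python
from functools import lru_cache

def solve(Q, cost, dist):
    """Minimum expected total cost for K = 1 (hypothetical helper, not from the paper).

    cost[a][b]: travel cost between nodes a and b (node 0 is the depot).
    dist[j]:    list of (xi, w, prob) outcomes for customer j = 1..N; dist[0] is unused.
    """
    N = len(dist) - 1

    def expect(j, load, space):
        # Expected cost-to-go when the vehicle arrives at customer j carrying
        # `load` items with `space` free units: the first visit leaves
        # z = load - xi and r = space + min(load, xi) - w.
        return sum(p * f(j, load - xi, space + min(load, xi) - w)
                   for xi, w, p in dist[j])

    @lru_cache(maxsize=None)
    def best_reload(j, theta_max, r_minus):
        # Inner minimization of Eqs. (4) and (5): choose the load theta
        # carried toward customer j, subject to theta <= theta_max.
        return min(expect(j, th, Q + r_minus - th) for th in range(theta_max + 1))

    @lru_cache(maxsize=None)
    def f(j, z, r):
        z_minus, r_minus = min(z, 0), min(r, 0)
        shortfall = z_minus + r_minus < 0
        if j == N:                                   # boundary condition, Eq. (6)
            return cost[N][0] + (2 * cost[N][0] if shortfall else 0)
        A = cost[j][0] + cost[j + 1][0] + best_reload(j + 1, Q, 0)       # Eq. (5)
        if not shortfall:                            # Case 1, Eqs. (1) and (3)
            H = cost[j][j + 1] + expect(j + 1, z, r)
            return min(H, A)
        theta_max = Q + min(z_minus, r_minus)        # Case 2, Eqs. (2) and (4)
        H_tilde = cost[j][j + 1] + best_reload(j + 1, theta_max, r_minus)
        return 2 * cost[j][0] + min(H_tilde, A)

    # Start of the route: pick the initial load at the depot, then visit customer 1.
    return cost[1][0] + best_reload(1, Q, 0)
```

For instance, with two customers that each demand one item and return nothing, a vehicle of capacity 1, and costs $c_{10} = c_{20} = 2$, $c_{12} = 1$, the cheapest plan is to restock at the depot between the two customers, so the sketch should return a total cost of 8.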

Finally, the minimum total expected cost is

$$f_0 = c_{10} + \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q} E f_1\Big(\bar\theta - \bar\xi^{1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{1})\big) - w^{1}\Big). \tag{7}$$

In (3), (4), (5), (7) the expected values are taken with respect to the random vectors $\bar\xi^j$, $j = 1, \ldots, N$, and to the random variables $w^j$, $j = 1, \ldots, N$. The first term in the curly brackets in (1) corresponds to the action of proceeding to the next customer and the second term corresponds to the action of going to the depot for restocking before proceeding to the next customer. The first term in the curly brackets in (2) corresponds to the action of returning to the depot once for restocking before proceeding to the next customer and the second term corresponds to the action of returning to the depot twice for restocking before proceeding to the next customer. In the following lemma we prove monotonicity properties needed for the characterization of the structure of the optimal routing strategy.

Lemma 1. $H_j(\bar z, r)$ and $\tilde H_j(z^-, r^-)$ are non-increasing in each of their arguments.

Proof. To prove the lemma, we will need to show that $f_j(\bar z, r)$ is also non-increasing in its arguments. The proof is by induction on $j$. First, the induction base is established by $f_N(\bar z, r)$ being non-increasing (Eq. (6)). Then, assuming that $f_{j+1}(\bar z, r)$ is non-increasing, we will show that $H_j(\bar z, r)$, $\tilde H_j(z^-, r^-)$, and $f_j(\bar z, r)$ are non-increasing. Function $H_j(\bar z, r)$ is non-increasing by the induction hypothesis and Eq. (3). To show that function $\tilde H_j(z^-, r^-)$ is non-increasing, consider $z'^- \le z^-$ and $r'^- \le r^-$. Then we get from Eq. (4)

$$
\begin{aligned}
\tilde H_j(z^-, r^-) - \tilde H_j(z'^-, r'^-)
&= \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q + \min(z^-, r^-)} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q + r^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \\
&\quad - \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q + \min(z'^-, r'^-)} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q + r'^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \\
&\le \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q + \min(z'^-, r'^-)} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q + r^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \\
&\quad - \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q + \min(z'^-, r'^-)} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q + r'^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \le 0,
\end{aligned}
\tag{8}
$$

where the last inequality follows from the induction hypothesis.

It remains to show that $f_j(\bar z, r)$ is non-increasing. First, we prove the monotonicity property with respect to $z_i$, $i = 1, 2, \ldots, K$. Because of symmetry it suffices to show the result with respect to one of the arguments, say $z_K$. For $\sum_{i=1}^{K-1} z_i^- + r^- < 0$ the result follows directly from the monotonicity of $\tilde H_j$ and Eq. (2). Consider now $z_1, z_2, \ldots, z_{K-1}, r \ge 0$, and let $\bar z^{(K-1)}$ be the vector consisting of the first $K - 1$ elements of $\bar z$. For $0 \le z'_K < z_K$ and $z'_K < z_K < 0$ we get $f_j(\bar z^{(K-1)}, z_K, r) \le f_j(\bar z^{(K-1)}, z'_K, r)$ by the monotonicity of $H_j$ and $\tilde H_j$ respectively (Eqs. (1) and (2)). To complete the proof we still need to show that $f_j(\bar z^{(K-1)}, 0, r) \le f_j(\bar z^{(K-1)}, -1, r)$. From Eq. (1) and the triangle inequality we get

$$
\begin{aligned}
f_j(\bar z^{(K-1)}, 0, r) \le A_j
&= c_{j0} + c_{j+1,0} + \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \\
&\le 2c_{j0} + c_{j,j+1} + \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) = B_j.
\end{aligned}
\tag{9}
$$

From Eqs. (4) and (9) we get

$$
\begin{aligned}
B_j - 2c_{j0} - \tilde H_j(-1, r)
&= \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \\
&\quad - \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q - 1} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \le 0.
\end{aligned}
\tag{10}
$$

From Eq. (2) we have

$$f_j(\bar z^{(K-1)}, -1, r) = 2c_{j0} + \min\{\tilde H_j(-1, r),\ A_j\},$$

which combined with Eqs. (9) and (10) yields $f_j(\bar z^{(K-1)}, 0, r) \le f_j(\bar z^{(K-1)}, -1, r)$.

The monotonicity property with respect to $r$ is proved similarly. The induction hypothesis is used directly throughout the state space except in the proof of $f_j(\bar z, 0) \le f_j(\bar z, -1)$, where $\bar z$ is such that $\sum_{i=1}^{K} z_i^- = 0$. This proof is carried out by the use of equations analogous to (9) (with $f_j(\bar z, 0)$ instead of $f_j(\bar z^{(K-1)}, 0, r)$) and (10), which takes the form

$$
\begin{aligned}
B_j - 2c_{j0} - \tilde H_j(0, -1)
&= \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \\
&\quad - \min_{\bar\theta:\ \sum_{i=1}^{K} \theta_i \le Q - 1} E f_{j+1}\Big(\bar\theta - \bar\xi^{j+1},\ Q - 1 - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \le 0,
\end{aligned}
$$

and is proved similarly to Eq. (8). □

The following theorem characterizes the optimal vehicle routing strategy after it visits customer $j \in \{1, \ldots, N-1\}$ for the first time.

Theorem 1.

(i) For each $z_1, z_2, \ldots, z_K \ge 0$ there exists integer $s_{1j}(z_1, z_2, \ldots, z_K) \ge 0$ such that it is optimal for the vehicle to proceed to customer $j + 1$ if and only if $r \ge s_{1j}(z_1, z_2, \ldots, z_K)$. Moreover, $s_{1j}(z_1, z_2, \ldots, z_K)$ is non-increasing in each of its arguments.
(ii) There exists $r_j \le 0$ such that it is optimal for the vehicle to make two trips to the depot when $z_1, z_2, \ldots, z_K \ge 0$ and $r < r_j$.
(iii) There exists $q_j \le 0$ such that it is optimal for the vehicle to make two trips to the depot when $r \ge 0$ and $\sum_{i=1}^{K} z_i^- < q_j$.
(iv) For each $r < 0$ there exists integer $s_{2j}(r) < 0$ such that it is optimal for the vehicle to make two trips to the depot if and only if $\sum_{i=1}^{K} z_i^- \le s_{2j}(r)$. Moreover, $s_{2j}(r)$ is non-increasing.

Proof. The existence of threshold $s_{1j}(z_1, z_2, \ldots, z_K)$ is a consequence of $H_j(\bar z, r)$ being non-increasing in its arguments. To see why this threshold function is non-increasing in its arguments, assume that $z'_K < z_K$ and $s_{1j}(z_1, z_2, \ldots, z'_K) < s_{1j}(z_1, z_2, \ldots, z_K)$. By the definition of the threshold function, for $z_1, z_2, \ldots, z'_K$ and $r = s_{1j}(z_1, z_2, \ldots, z'_K)$ it is optimal for the vehicle to proceed to the next customer. Because $H_j(\bar z, r)$ is non-increasing, for $z_1, z_2, \ldots, z_K$ and $r = s_{1j}(z_1, z_2, \ldots, z'_K)$ it should still be optimal for the vehicle to proceed to the next customer. However, this is a contradiction because the specific value of $r$ is below the threshold $s_{1j}(z_1, z_2, \ldots, z_K)$. Parts (ii)–(iv) follow from $\tilde H_j(z^-, r^-)$ being non-increasing in its arguments. □

2.2. A special-purpose dynamic programming algorithm

In view of the above theorem, the optimal policy for $K = 1$, i.e. the critical numbers $s_{1j}(z_1) \ge 0$, $0 \le z_1 \le Q$, $s_{2j}(r) \le 0$, $-Q \le r \le -1$, $r_j \le 0$ and $q_j \le 0$ for each customer $j \in \{1, \ldots, N-1\}$ can be found by the following special-purpose dynamic programming algorithm:

Algorithm for the determination of the critical numbers $r_j$, $q_j$, $s_{1j}(z_1)$, $0 \le z_1 \le Q$, and $s_{2j}(r)$, $-Q \le r \le -1$, for customer $j = 1, \ldots, N-1$

Step 0. Set $f_N(z_1, r) = c_{N0} + 2c_{N0}\,\mathbf{1}(z_1^- + r^- < 0)$, $|z_1| \le Q$, $|r| \le Q$, $z_1 + r \le Q$, and $j = N - 1$.
Step 1. Set $r = -1$.
Step 2. (Determination of critical number $r_j$) If $\tilde H_j(0, r) > A_j$, do the following:
    1. Set $r_j = r + 1$.
    2. For $(z_1, r)$ such that $0 \le z_1 \le Q$, $r_j \le r < 0$ set $f_j(z_1, r) = 2c_{j0} + \tilde H_j(0, r)$.
    3. For $(z_1, r)$ such that $0 \le z_1 \le Q$, $-Q \le r < r_j$ set $f_j(z_1, r) = 2c_{j0} + A_j$.
    4. Go to Step 3.
  Otherwise, set $r = r - 1$. If $r = -Q - 1$, do the following:
    1. Set $r_j = -Q$.
    2. For $(z_1, r)$ such that $0 \le z_1 \le Q$, $-Q \le r < 0$ set $f_j(z_1, r) = 2c_{j0} + \tilde H_j(0, r)$.
    3. Go to Step 3.
  Otherwise, go to Step 2.
Step 3. Set $z_1 = -1$.
Step 4. (Determination of critical number $q_j$) If $\tilde H_j(z_1, 0) > A_j$, do the following:
    1. Set $q_j = z_1 + 1$.
    2. For $(z_1, r)$ such that $q_j \le z_1 < 0$, $0 \le r \le Q$ set $f_j(z_1, r) = 2c_{j0} + \tilde H_j(z_1, 0)$.
    3. For $(z_1, r)$ such that $-Q \le z_1 < q_j$, $0 \le r \le Q$ set $f_j(z_1, r) = 2c_{j0} + A_j$.
    4. Go to Step 5.
  Otherwise, set $z_1 = z_1 - 1$. If $z_1 = -Q - 1$, do the following:
    1. Set $q_j = -Q$.
    2. For $(z_1, r)$ such that $-Q \le z_1 < 0$, $0 \le r \le Q$ set $f_j(z_1, r) = 2c_{j0} + \tilde H_j(z_1, 0)$.
    3. Go to Step 5.
  Otherwise, go to Step 4.
Step 5. Set $z_1 = 0$.
Step 6. Set $r = Q - z_1$.
Step 7. (Determination of critical number $s_{1j}(z_1)$) If $H_j(z_1, r) > A_j$, do the following:
    1. Set $s_{1j}(z_1) = r + 1$.
    2. For $0 \le r \le s_{1j}(z_1) - 1$ set $f_j(z_1, r) = A_j$.
    3. For $s_{1j}(z_1) \le r \le Q - z_1$ set $f_j(z_1, r) = H_j(z_1, r)$.
    4. Set $z_1 = z_1 + 1$. If $z_1 \le Q$, go to Step 6. Otherwise, go to Step 8.
  Otherwise, set $r = r - 1$. If $r = -1$, do the following:
    1. Set $s_{1j}(z_1) = 0$.
    2. For $0 \le r \le Q - z_1$ set $f_j(z_1, r) = H_j(z_1, r)$.
    3. Set $z_1 = z_1 + 1$. If $z_1 \le Q$, go to Step 6. Otherwise, go to Step 8.
  Otherwise, go to Step 7.
Step 8. Set $r = -1$.
Step 9. Set $z_1 = -1$.
Step 10. (Determination of critical number $s_{2j}(r)$) If $\tilde H_j(z_1, r) > A_j$, do the following:
    1. Set $s_{2j}(r) = z_1$.
    2. For $-Q \le z_1 \le s_{2j}(r)$ set $f_j(z_1, r) = 2c_{j0} + A_j$.
    3. For $s_{2j}(r) + 1 \le z_1 \le -1$ set $f_j(z_1, r) = 2c_{j0} + \tilde H_j(z_1, r)$.
    4. Set $r = r - 1$. If $r \ge -Q$, go to Step 9. Otherwise, go to Step 11.
  Otherwise, set $z_1 = z_1 - 1$. If $z_1 = -Q - 1$, do the following:
    1. Set $s_{2j}(r) = -Q - 1$.
    2. For $-Q \le z_1 \le -1$ set $f_j(z_1, r) = 2c_{j0} + \tilde H_j(z_1, r)$.
    3. Set $r = r - 1$. If $r \ge -Q$, go to Step 9. Otherwise, go to Step 11.
  Otherwise, go to Step 10.
Step 11. Set $j = j - 1$. If $j \ge 1$ go to Step 1. Otherwise, stop.

2.3. Numerical results

As illustration we present the following example.

Example 1. Suppose that $N = 7$, $Q = 10$, $K = 1$. We give below the symmetric matrix $C = (c_{ij})$, $0 \le i, j \le 7$, whose non-zero elements are the travel costs $c_{j,j+1}$ between customer $j \in \{1, \ldots, 6\}$ and customer $j + 1$ and the travel costs $c_{j0}$ between customer $j \in \{1, \ldots, 7\}$ and the depot. We observe that these costs satisfy the triangle inequality.

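$$
C = \begin{pmatrix}
0 & 7 & 6 & 5 & 4 & 3 & 7 & 6 \\
7 & 0 & 5 & 0 & 0 & 0 & 0 & 0 \\
6 & 5 & 0 & 4 & 0 & 0 & 0 & 0 \\
5 & 0 & 4 & 0 & 3 & 0 & 0 & 0 \\
4 & 0 & 0 & 3 & 0 & 2 & 0 & 0 \\
3 & 0 & 0 & 0 & 2 & 0 & 5 & 0 \\
7 & 0 & 0 & 0 & 0 & 5 & 0 & 2 \\
6 & 0 & 0 & 0 & 0 & 0 & 2 & 0
\end{pmatrix}.
$$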
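The claim that the costs of Example 1 satisfy the triangle inequality can be checked mechanically. The short sketch below encodes the matrix $C$ (row and column 0 are the depot) and tests, for every triangle formed by the depot and two consecutive customers, that no side exceeds the sum of the other two; the function names are illustrative choices made here, not part of the paper.

```python
# Travel-cost matrix of Example 1; node 0 is the depot, nodes 1..7 are customers.
C = [
    [0, 7, 6, 5, 4, 3, 7, 6],
    [7, 0, 5, 0, 0, 0, 0, 0],
    [6, 5, 0, 4, 0, 0, 0, 0],
    [5, 0, 4, 0, 3, 0, 0, 0],
    [4, 0, 0, 3, 0, 2, 0, 0],
    [3, 0, 0, 0, 2, 0, 5, 0],
    [7, 0, 0, 0, 0, 5, 0, 2],
    [6, 0, 0, 0, 0, 0, 2, 0],
]

def is_symmetric(c):
    """True if c[a][b] == c[b][a] for all node pairs."""
    n = len(c)
    return all(c[a][b] == c[b][a] for a in range(n) for b in range(n))

def triangle_violations(c):
    """Customers j for which the triangle (depot, j, j+1) has a side
    longer than the sum of the other two sides."""
    bad = []
    for j in range(1, len(c) - 1):
        a, b, d = c[j][j + 1], c[j][0], c[j + 1][0]
        if a > b + d or b > a + d or d > a + b:
            bad.append(j)
    return bad

print(is_symmetric(C), triangle_violations(C))
```

The check covers all three orientations of each triangle, which is slightly more than the single inequality $c_{j,j+1} \le c_{j0} + c_{0,j+1}$ stated in Section 2; the proof of Lemma 1 also uses the orientation $c_{j+1,0} \le c_{j0} + c_{j,j+1}$.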

We assume that for each customer $j \in \{1, \ldots, 7\}$ the demand $\xi_1^j$ for new items of product 1 and the quantity $w^j$ of returned items are independent and follow the binomial distribution $B(10, 0.4)$, i.e. $\Pr(\xi_1^j = x) = \Pr(w^j = x) = \binom{10}{x} 0.4^x\, 0.6^{10-x}$, $x = 0, \ldots, 10$. In Table 1 below we present, for each customer $j = 1, \ldots, 6$, the critical numbers $s_{1j}(z_1)$, $0 \le z_1 \le Q$, $s_{2j}(r)$, $-Q \le r \le -1$, $r_j$, $q_j$. Note that Parts (i) and (iv) of Theorem 1 are confirmed numerically since for $j \in \{1, \ldots, 6\}$, $s_{1j}(z_1)$, $0 \le z_1 \le 10$, and $s_{2j}(r)$, $-10 \le r \le -1$, are non-increasing in $z_1$ and $r$, respectively. In Figs. 2 and 3 below we present the optimal decision for each state $(z_1, r)$, $|z_1| \le 10$, $|r| \le 10$, $z_1 + r \le 10$, after the first visit to the first and to the third customer. Specifically, the action of proceeding to the next customer is denoted by a dark blue dot, the action of returning to the depot once is denoted by a red square and the action of making two trips to the depot is denoted by a light blue rhombus. We implemented the algorithm by running the corresponding Matlab programs on a personal computer equipped with an Intel Core 2 Duo 2.5 GHz processor and 4 GB of RAM. The computation time (38.81 seconds) of the special-purpose algorithm is considerably smaller than the computation time (58.39 seconds) of the initial dynamic programming algorithm that is based on Eqs. (1)–(6). The minimum total expected cost $f_0 = c_{10} + \min_{0 \le \theta \le Q} E f_1\big(\theta - \xi_1^1,\ Q - \theta + \min(\theta, \xi_1^1) - w^1\big)$ is found to be approximately equal to 65.29. Both algorithms enable us to determine the optimal quantity of product 1 that is loaded in the vehicle when it returns to the depot for replenishment. For example, if after the first visit of the vehicle to customer 1 the state is $(z_1, r) = (-5, 4)$, then the optimal decision is to return to the depot to empty the old items that it carries, load 5 items (owed quantity), return to customer 1 to deliver the owed quantity and then proceed to customer 2.
If after the first visit to customer 1 the state is $(z_1, r) = (-5, 7)$, the optimal decision is to return to the depot, empty returned products, load 5 items (owed quantity), return to customer 1 in or-

Table 1. The critical numbers of the optimal policy: $s_{1j}(z_1)$, $0 \le z_1 \le 10$; $s_{2j}(r)$, $-10 \le r \le -1$; $r_j$; $q_j$.

Customer j    Critical numbers

1    $s_{11}(0) = 11$, $s_{11}(1) = 4$, $s_{11}(2) = 2$, $s_{11}(3) = 1$, $s_{11}(4) = \cdots = s_{11}(10) = 0$; $s_{21}(-1) = \cdots = s_{21}(-5) = -10$, $s_{21}(-6) = -9$, $s_{21}(-7) = \cdots = s_{21}(-10) = -1$; $r_1 = -6$, $q_1 = -9$

2    $s_{12}(0) = 3$, $s_{12}(1) = s_{12}(2) = s_{12}(3) = 1$, $s_{12}(4) = \cdots = s_{12}(10) = 0$; $s_{22}(-1) = \cdots = s_{22}(-7) = -11$, $s_{22}(-8) = -10$, $s_{22}(-9) = s_{22}(-10) = -1$; $r_2 = -8$, $q_2 = -10$

3    $s_{13}(0) = 11$, $s_{13}(1) = 6$, $s_{13}(2) = 2$, $s_{13}(3) = 1$, $s_{13}(4) = \cdots = s_{13}(10) = 0$; $s_{23}(-1) = s_{23}(-2) = s_{23}(-3) = -10$, $s_{23}(-4) = s_{23}(-5) = s_{23}(-6) = -9$, $s_{23}(-7) = \cdots = s_{23}(-10) = -1$; $r_3 = -6$, $q_3 = -9$

4    $s_{14}(0) = 2$, $s_{14}(1) = \cdots = s_{14}(10) = 0$; $s_{24}(-1) = \cdots = s_{24}(-8) = -11$, $s_{24}(-9) = -10$, $s_{24}(-10) = -1$; $r_4 = -9$, $q_4 = -10$

5    $s_{15}(0) = 11$, $s_{15}(1) = 10$, $s_{15}(2) = 9$, $s_{15}(3) = 8$, $s_{15}(4) = 3$, $s_{15}(5) = 2$, $s_{15}(6) = 1$, $s_{15}(7) = \cdots = s_{15}(10) = 0$; $s_{25}(-1) = s_{25}(-2) = s_{25}(-3) = -7$, $s_{25}(-4) = \cdots = s_{25}(-10) = -1$; $r_5 = -3$, $q_5 = -6$

6    $s_{16}(0) = 11$, $s_{16}(1) = 3$, $s_{16}(2) = \cdots = s_{16}(10) = 0$; $s_{26}(-1) = \cdots = s_{26}(-6) = -10$, $s_{26}(-7) = s_{26}(-8) = -9$, $s_{26}(-9) = s_{26}(-10) = -1$; $r_6 = -8$, $q_6 = -9$

3. The infinite-horizon problem

Fig. 2. The optimal decisions after the 1st visit to customer 1.

We consider the problem that we studied in the previous section with the following modification: The service of the customers does not stop when the service of the last customer N has been completed but it continues indefinitely with the same customer order. This means that, after the service of customer N has been completed, the vehicle services again customer 1, customer 2, and so on. Let cN1 denote the travel cost from customer N to customer 1. The road network is depicted in Fig. 4. The demands of the customers for the products and the quantities of the returned products are renewed at successive tours of the vehicle. We assume that for each customer j 2 {1, . . . , N} the distribution of the random vector ðnj1 ; . . . ; njK ; wj Þ that consists of the demands for products 1, . . . , K and the quantity that is given to the vehicle remains the same at each cycle. We suppose that, at each cycle, the vehicle visits each customer, satisfies as much demands as possible and picks up the largest possible quantity of returned products, and chooses one decision among some possible decisions that coincide with the possible decisions in the finite-horizon problem. If the customer’s demands for all products are satisfied and the whole quantity of returned products is picked up by the vehicle, there are two possible decisions: (i) to proceed directly to the next customer and (ii) to go to the depot to empty returned items and restock with new products (not necessarily to the full capacity of the vehicle) and then go to the

Fig. 3. The optimal decisions after the 1st visit to customer 3.

der to deliver the owed quantity and pick up 7 items (remaining returned items), make a second trip to the depot, empty 7 returned items, load 3 new items of the product and go to customer 2.

Fig. 4. The road network for the infinite-horizon problem.

583

D.G. Pandelis et al. / European Journal of Operational Research 231 (2013) 577–586

next customer. If part of the customer’s demands is not satisfied and/or there is not enough empty space in the vehicle to pick up the whole quantity of the returned products, the possible options are: (i) to go to the depot to empty returned items, restock the owed quantity and load additional items of all products (so that there is some remaining empty space for the remaining returned products), return to the customer, deliver the owed quantity and/or pick up the remaining returned products, and then proceed to the next customer and (ii) to go to the depot to empty returned items, restock the owed quantity, return to the customer, deliver the owed quantity and/or pick up the remaining returned products, make a second trip to the depot, empty returned items, restock with new products and proceed to the next customer. It is assumed that the driver selects his decisions at equidistant time epochs s = 0, 1, . . . (e.g. every 12 hours). This means that if, for example, the vehicle visits the third customer and the decision is selected at 8 am then the next decision is selected at 8 pm after the visit at fourth customer’s site. It is also assumed that the time interval between two consecutive decision epochs is greater than the required time for two trips of the vehicle if it follows any of the above decisions. Although we impose these assumptions in order to apply well-known results from the theory of Markov decision processes, there are situations in which these assumptions may hold, as the practical applications that we mentioned in Section 1. Specifically, in the first application (ex-van sales) suppose that the supply of the customers with new products and the collection of expired products does not stop when the service of the last customer has been completed but it continues with the same customer order for a long time horizon. It can be assumed that the driver selects his decisions at equidistant time epochs (e.g. every 12 hours). 
In the second application (self propelled vehicle in a manufacturing shop) it can be assumed that the supply of the workcenters with new discrete parts and the collection of useless discrete parts from the workcenters do not stop when the service of the last workcenter has been completed but they are continued indefinitely at equidistant time epochs with the same order. The routing of the vehicle in the infinite-horizon setting is controlled by a policy p that is a rule for choosing decisions at epochs s = 0, 1, . . . The decision that is chosen by a policy at a decision epoch may depend on the history of the process or may be randomized in the sense that it is chosen by specific probabilities. An appealing class of policies is the class of stationary policies. A stationary policy chooses at each decision epoch a decision that depends only on the current state of the system. The optimization criteria in the infinite-horizon problem are the minimization of the expected total discounted cost and the minimization of the expected long-run average cost per unit time. The expected total discounted cost of a policy p is defined as the expected total cost during an infinite-time horizon if the costs are discounted at a rate a 2 (0, 1) per unit time given that policy p is employed. The expected long-run average cost per unit time of a policy p is defined as the limit as n ? 1 of the expected cost incurred until the nth decision epoch divided by n, given that policy p is employed. Using well-known results of Markov decision processes (see Chapter 6 in Ross (1992)) we will see that, under any one of these criteria, the optimal policy is stationary and has the same structure as the optimal policy in the finite-horizon problem. The state space I of the system consists of all states ðj; z; rÞ, where j = 1, . . . , N is the customer and z ¼ ðz1 ; . . . ; zK Þ and r are the possible loads of products 1, . . . 
, K that remain in the vehicle and the empty space of the vehicle, respectively, after it has visited the jth customer and has satisfied as much demand as possible and picked up the largest possible quantity of returned products.
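To make the two optimization criteria concrete, the following sketch evaluates both of them for a fixed stationary policy. Once such a policy is fixed, the process over the states $(j, z, r)$ is a Markov chain with one cost per state, so a tiny chain is enough to illustrate the definitions. The three-state cyclic chain, its costs and the function names are hypothetical and are not taken from the paper.

```python
def discounted_cost(P, c, alpha, n_steps=2000):
    """Approximate the expected total alpha-discounted cost for every start
    state by iterating V <- c + alpha * P V, starting from V = 0."""
    n = len(c)
    V = [0.0] * n
    for _ in range(n_steps):
        V = [c[i] + alpha * sum(P[i][k] * V[k] for k in range(n))
             for i in range(n)]
    return V


def average_cost(P, c, n_steps=2000):
    """Expected cost incurred over n_steps decision epochs divided by n_steps,
    i.e. a finite-n approximation of the long-run average cost."""
    n = len(c)
    total = [0.0] * n  # expected accumulated cost, per start state
    for _ in range(n_steps):
        total = [c[i] + sum(P[i][k] * total[k] for k in range(n))
                 for i in range(n)]
    return [t / n_steps for t in total]


# Hypothetical chain: deterministic cycle 0 -> 1 -> 2 -> 0, cost 3 in state 0.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
c = [3.0, 0.0, 0.0]

V_disc = discounted_cost(P, c, alpha=0.9)
g_avg = average_cost(P, c)
print(V_disc, g_avg)
```

In this toy chain the discounted cost depends on the start state, whereas the average cost approaches the same value for every start state, which mirrors the role of the state-independent constant in the average-cost criterion.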

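3.1. The structure of the discounted-cost optimal policy and of the average-cost optimal policy

Let $V_n^\alpha(j, z, r)$, $(j, z, r) \in I$, $0 < \alpha < 1$, be the minimum n-step expected discounted cost if the initial state is $(j, z, r) \in I$ and $\alpha$ is the discount factor. This quantity satisfies the following dynamic programming equations for n = 1, 2, . . . If $z_1, \ldots, z_K, r \ge 0$, then

$$V_n^\alpha(j, z, r) = \min\Bigg\{ c_{j,j+1} + \alpha E V_{n-1}^\alpha\Big(j+1,\; z - \xi^{j+1},\; r + \sum_{i=1}^{K} \min\big(z_i, \xi_i^{j+1}\big) - w^{j+1}\Big),$$
$$\qquad c_{j0} + c_{j+1,0} + \alpha \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q} E V_{n-1}^\alpha\Big(j+1,\; \theta - \xi^{j+1},\; Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \Bigg\},$$

and if $z^- + r^- < 0$, then

$$V_n^\alpha(j, z, r) = 2c_{j0} + \min\Bigg\{ c_{j,j+1} + \alpha \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q + \min(z^-, r^-)} E V_{n-1}^\alpha\Big(j+1,\; \theta - \xi^{j+1},\; Q + r^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big),$$
$$\qquad c_{j0} + c_{0,j+1} + \alpha \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q} E V_{n-1}^\alpha\Big(j+1,\; \theta - \xi^{j+1},\; Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \Bigg\}.$$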
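The n-step recursion above can be sketched for a single product (K = 1) as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the instance (N = 3, Q = 2, the travel costs, and independent demand and returns that are uniform on {0, ..., Q}) is hypothetical, as are the names `V`, `expect` and `nxt`. The load quantity is called `theta`, and `min(z, 0)`, `min(r, 0)` play the role of the negative parts $z^-$, $r^-$.

```python
from functools import lru_cache

N, Q, ALPHA = 3, 2, 0.9                          # hypothetical small instance
C0 = {1: 4.0, 2: 5.0, 3: 4.0}                    # depot costs c_{j0} (symmetric)
CN = {1: 3.0, 2: 3.0, 3: 6.0}                    # c_{j,j+1}; customer 3 links to 1
PMF = {x: 1.0 / (Q + 1) for x in range(Q + 1)}   # assumed demand/returns pmf


def nxt(j):
    """Customer that follows j in the cyclic order (N + 1 is customer 1)."""
    return 1 if j == N else j + 1


def expect(n, j1, load, space):
    """E V_{n-1}(j1, load - xi, space + min(load, xi) - w) when the vehicle
    reaches customer j1 with `load` new items and `space` empty slots."""
    return sum(PMF[xi] * PMF[w]
               * V(n - 1, j1, load - xi, space + min(load, xi) - w)
               for xi in PMF for w in PMF)


@lru_cache(maxsize=None)
def V(n, j, z, r):
    """Minimum n-step expected discounted cost from state (j, z, r), K = 1."""
    if n == 0:
        return 0.0
    j1 = nxt(j)
    # Trip to the depot that ends with theta new items in an emptied vehicle.
    restock = C0[j] + C0[j1] + ALPHA * min(
        expect(n, j1, th, Q - th) for th in range(Q + 1))
    if z >= 0 and r >= 0:
        direct = CN[j] + ALPHA * expect(n, j1, z, r)
        return min(direct, restock)
    zm, rm = min(z, 0), min(r, 0)                # negative parts of z and r
    cap = Q + min(zm, rm)                        # largest additional load
    direct = CN[j] + ALPHA * min(
        expect(n, j1, th, Q + rm - th) for th in range(cap + 1))
    return 2 * C0[j] + min(direct, restock)


print(V(3, 1, 0, 2), V(3, 1, -1, 0))
```

Memoizing on `(n, j, z, r)` keeps the sketch short; a production implementation would more likely sweep the state space stage by stage and store the minimizing action for each state.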

We also have that $V_0^\alpha(j, z, r) = 0$, $(j, z, r) \in I$. In the above equations we assume that N + 1 is equal to 1 since the next customer after customer N is customer 1. It can be shown by induction on n that $V_n^\alpha(j, z, r)$ is non-increasing in $z_1, \ldots, z_K, r$ in the same way as we proved that $f_j(z, r)$ is non-increasing in its arguments in Lemma 1. Let $V^\alpha(j, z, r)$, $(j, z, r) \in I$, denote the minimum expected total $\alpha$-discounted cost if the initial state is $(j, z, r) \in I$. This quantity is finite since the state space I is finite. It satisfies the following optimality equations: if $z_1, \ldots, z_K, r \ge 0$, then

$$V^\alpha(j, z, r) = \min\Bigg\{ c_{j,j+1} + \alpha E V^\alpha\Big(j+1,\; z - \xi^{j+1},\; r + \sum_{i=1}^{K} \min\big(z_i, \xi_i^{j+1}\big) - w^{j+1}\Big),$$
$$\qquad c_{j0} + c_{j+1,0} + \alpha \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q} E V^\alpha\Big(j+1,\; \theta - \xi^{j+1},\; Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \Bigg\},$$

and if $z^- + r^- < 0$, then

$$V^\alpha(j, z, r) = 2c_{j0} + \min\Bigg\{ c_{j,j+1} + \alpha \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q + \min(z^-, r^-)} E V^\alpha\Big(j+1,\; \theta - \xi^{j+1},\; Q + r^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big),$$
$$\qquad c_{j0} + c_{0,j+1} + \alpha \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q} E V^\alpha\Big(j+1,\; \theta - \xi^{j+1},\; Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \Bigg\}.$$

It is well known (see Corollary 6.6 in Ross (1992)) that, as $n \to \infty$, $V_n^\alpha(j, z, r) \to V^\alpha(j, z, r)$. Hence $V^\alpha(j, z, r)$ inherits the monotonicity of $V_n^\alpha(j, z, r)$, and the first terms in the curly brackets in the above optimality equations are non-increasing in $z_1, \ldots, z_K, r$ and $z^-, r^-$, respectively. This implies that the $\alpha$-discounted cost optimal policy has the threshold-type structure described in Theorem 1.

We focus now on the minimization of the expected average cost. First we note that the state $(1, \mathbf{0}, 0) \in I$ is accessible from any other state under any stationary policy. From Corollary 6.20 in Ross (1992) it follows that there exist numbers g and $h(j, z, r)$, $(j, z, r) \in I$, such that if $z_1, \ldots, z_K, r \ge 0$, then

$$h(j, z, r) = \min\Bigg\{ c_{j,j+1} - g + E h\Big(j+1,\; z - \xi^{j+1},\; r + \sum_{i=1}^{K} \min\big(z_i, \xi_i^{j+1}\big) - w^{j+1}\Big),$$
$$\qquad c_{j0} + c_{j+1,0} - g + \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q} E h\Big(j+1,\; \theta - \xi^{j+1},\; Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \Bigg\},$$

and if $z^- + r^- < 0$, then

$$h(j, z, r) = 2c_{j0} + \min\Bigg\{ c_{j,j+1} - g + \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q + \min(z^-, r^-)} E h\Big(j+1,\; \theta - \xi^{j+1},\; Q + r^- - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big),$$
$$\qquad c_{j0} + c_{0,j+1} - g + \min_{\theta:\, \sum_{i=1}^{K} \theta_i \le Q} E h\Big(j+1,\; \theta - \xi^{j+1},\; Q - \sum_{i=1}^{K} \big(\theta_i - \min(\theta_i, \xi_i^{j+1})\big) - w^{j+1}\Big) \Bigg\}.$$

The above equations are known as the average-cost optimality equations. The number g is the minimum average cost. It does not depend on the initial state of the system. There also exists a sequence $\alpha_n \to 1$ (see Theorem 6.18 in Ross (1992)) such that

$$h(j, z, r) = \lim_{n \to \infty} \big[ V^{\alpha_n}(j, z, r) - V^{\alpha_n}(1, \mathbf{0}, 0) \big], \quad (j, z, r) \in I.$$

The monotonicity of $V^{\alpha_n}(j, z, r)$ with respect to $z_i$, i = 1, . . . , K, and r implies that $h(j, z, r)$ is non-increasing with respect to $z_i$, i = 1, . . . , K, and r. Therefore, the first terms in the curly brackets in the above average-cost optimality equations are non-increasing in $z_1, \ldots, z_K, r$ and $z^-, r^-$, respectively. Hence the average-cost optimal policy has the same threshold-type structure as the finite-horizon optimal policy and the discounted-cost optimal policy.

3.2. Computation of the average-cost optimal policy

The average-cost optimal policy can be found numerically by the value-iteration algorithm, the policy-iteration algorithm and the linear programming formulation. We refer to Chapter 3 in Tijms (1994) for a detailed description of these algorithms. To implement these algorithms we must specify the one-step transition probabilities and the one-step expected costs. For simplicity we suppose that K = 1, i.e. there is only product 1.

Let $a \in \{0, 1_\theta\}$, $0 \le \theta \le Q$, be the action that is selected when the system at a decision epoch is at state $(j, z_1, r) \in I$ with $z_1, r \ge 0$. We assume that the action a = 0 means that the vehicle goes directly to the next customer, while the action $a = 1_\theta$ means that it goes to the depot, empties any returned products, restocks with $\theta$ items of product 1 and then visits customer j + 1.

Let $a \in \{2_\theta, 3_{\theta'}\}$, $0 \le \theta \le Q + \min(z_1^-, r^-)$, $0 \le \theta' \le Q$, be the action that is selected when the process at a decision epoch is at state $(j, z_1, r) \in I$ with $z_1^- + r^- < 0$. We assume that the action $2_\theta$ means that the vehicle goes to the depot, empties returned products, restocks the owed quantity $-z_1^-$, loads additional product quantity $\theta \in \{0, \ldots, Q + \min(z_1^-, r^-)\}$, returns to customer j, satisfies demand and/or picks up remaining returned products, and proceeds to customer j + 1. The action $3_{\theta'}$ means that the vehicle goes to the depot, empties returned products, loads the owed product quantity $-z_1^-$, returns to customer j, satisfies demand and/or picks up remaining returned products, makes a second trip to the depot where it empties returned products and restocks with $\theta' \in \{0, \ldots, Q\}$ items of product 1, and then proceeds to customer j + 1.

Let $p_{(j,z_1,r)(j+1,z_1',r')}(a)$ be the probability that the state at the next decision epoch will be the state $(j+1, z_1', r')$ if the present state is $(j, z_1, r)$ and the action $a \in \{0, 1_\theta, 2_\theta, 3_{\theta'}\}$ is selected, and let $C((j, z_1, r), a)$ be the corresponding expected cost. We give these quantities below.

If $z_1, r \ge 0$, $|z_1| \le Q$, $|r| \le Q$, $z_1 + r \le Q$, then

$$p_{(j,z_1,r)(j+1,z_1',r')}(0) = \Pr\big(\xi_1^{j+1} = z_1 - z_1',\; w^{j+1} = r + \min(z_1, z_1 - z_1') - r'\big),$$

where $z_1' = z_1, z_1 - 1, \ldots, z_1 - Q$; $r' = r + z_1, \ldots, r - Q$.

If $z_1, r \ge 0$, $|z_1| \le Q$, $|r| \le Q$, $z_1 + r \le Q$, $0 \le \theta \le Q$, then

$$p_{(j,z_1,r)(j+1,z_1',r')}(1_\theta) = \Pr\big(\xi_1^{j+1} = \theta - z_1',\; w^{j+1} = Q - \theta + \min(\theta, \theta - z_1') - r'\big),$$

where $z_1' = \theta, \theta - 1, \ldots, \theta - Q$; $r' = Q, \ldots, -\theta$.

If $z_1^- + r^- < 0$, $|z_1| \le Q$, $|r| \le Q$, $0 \le \theta \le Q + \min(z_1^-, r^-)$, then

$$p_{(j,z_1,r)(j+1,z_1',r')}(2_\theta) = \Pr\big(\xi_1^{j+1} = \theta - z_1',\; w^{j+1} = Q - \theta + r^- + \min(\theta, \theta - z_1') - r'\big),$$

where $z_1' = \theta, \theta - 1, \ldots, \theta - Q$; $r' = Q + r^-, \ldots, -\theta + r^-$.

If $z_1^- + r^- < 0$, $|z_1| \le Q$, $|r| \le Q$, $0 \le \theta' \le Q$, then

$$p_{(j,z_1,r)(j+1,z_1',r')}(3_{\theta'}) = \Pr\big(\xi_1^{j+1} = \theta' - z_1',\; w^{j+1} = Q - \theta' + \min(\theta', \theta' - z_1') - r'\big),$$

where $z_1' = \theta', \theta' - 1, \ldots, \theta' - Q$; $r' = Q, \ldots, -\theta'$.

If $z_1, r \ge 0$, $|z_1| \le Q$, $|r| \le Q$, $z_1 + r \le Q$, then

$$C((j, z_1, r), 0) = c_{j,j+1}, \qquad C((j, z_1, r), 1_\theta) = c_{j0} + c_{0,j+1}, \quad 0 \le \theta \le Q.$$

If $z_1^- + r^- < 0$, $|z_1| \le Q$, $|r| \le Q$, then

$$C((j, z_1, r), 2_\theta) = 2c_{j0} + c_{j,j+1}, \quad 0 \le \theta \le Q + \min(z_1^-, r^-), \qquad C((j, z_1, r), 3_{\theta'}) = 3c_{j0} + c_{0,j+1}, \quad 0 \le \theta' \le Q.$$

As an illustration we present the following example.

Example 2. Suppose that N = 7, Q = 8, K = 1. We give below the symmetric matrix $C = (c_{ij})$, $0 \le i, j \le 7$, whose non-zero elements are the travel costs $c_{j,j+1}$ between customer $j \in \{1, \ldots, 7\}$ and customer j + 1 and the travel costs $c_{j0}$ between customer $j \in \{1, \ldots, 7\}$ and the depot. We observe that these costs satisfy the triangle inequality.

$$C = \begin{pmatrix}
0 & 20 & 15 & 21 & 15 & 20 & 19 & 20 \\
20 & 0 & 15 & 0 & 0 & 0 & 0 & 31 \\
15 & 15 & 0 & 17 & 0 & 0 & 0 & 0 \\
21 & 0 & 17 & 0 & 15 & 0 & 0 & 0 \\
15 & 0 & 0 & 15 & 0 & 18 & 0 & 0 \\
20 & 0 & 0 & 0 & 18 & 0 & 17 & 0 \\
19 & 0 & 0 & 0 & 0 & 17 & 0 & 21 \\
20 & 31 & 0 & 0 & 0 & 0 & 21 & 0
\end{pmatrix}.$$
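The one-step transition probabilities and costs of Section 3.2 can be assembled for the data of Example 2 along the following lines. This is a hedged sketch rather than the authors' code: the function names `transitions` and `step` and the encoding of the actions $0, 1_\theta, 2_\theta, 3_{\theta'}$ as `kind` 0 to 3 with a separate `theta` argument are hypothetical. Demand and returns are taken to be independent and uniform on {0, ..., 8}, as in the example.

```python
Q = 8
PMF = {x: 1.0 / 9 for x in range(Q + 1)}   # uniform demand and returns

C = [  # travel cost matrix of Example 2 (row and column 0 are the depot)
    [0, 20, 15, 21, 15, 20, 19, 20],
    [20, 0, 15, 0, 0, 0, 0, 31],
    [15, 15, 0, 17, 0, 0, 0, 0],
    [21, 0, 17, 0, 15, 0, 0, 0],
    [15, 0, 0, 15, 0, 18, 0, 0],
    [20, 0, 0, 0, 18, 0, 17, 0],
    [19, 0, 0, 0, 0, 17, 0, 21],
    [20, 31, 0, 0, 0, 0, 21, 0],
]


def nxt(j):
    """Customer that follows j in the cyclic order 1, ..., 7, 1, ..."""
    return 1 if j == 7 else j + 1


def transitions(j, load, space):
    """Distribution of the next state (j+1, z1', r') when the vehicle reaches
    customer j+1 with `load` new items and `space` empty slots."""
    probs = {}
    for xi, p_xi in PMF.items():
        for w, p_w in PMF.items():
            state = (nxt(j), load - xi, space + min(load, xi) - w)
            probs[state] = probs.get(state, 0.0) + p_xi * p_w
    return probs


def step(j, z1, r, kind, theta=0):
    """Return (one-step cost, transition distribution) for one action."""
    rm = min(r, 0)                       # negative part of r
    k = nxt(j)
    if kind == 0:                        # proceed directly (z1, r >= 0)
        return C[j][k], transitions(j, z1, r)
    if kind == 1:                        # one depot trip, then customer j+1
        return C[j][0] + C[0][k], transitions(j, theta, Q - theta)
    if kind == 2:                        # depot, back to j, then j+1;
        # theta must not exceed Q + min(min(z1, 0), min(r, 0))
        return 2 * C[j][0] + C[j][k], transitions(j, theta, Q + rm - theta)
    if kind == 3:                        # two depot trips, then j+1
        return 3 * C[j][0] + C[0][k], transitions(j, theta, Q - theta)
    raise ValueError("kind must be 0, 1, 2 or 3")


cost, probs = step(7, -3, 0, 2, theta=5)
print(cost, len(probs), sum(probs.values()))
```

Storing each distribution as a dictionary keyed by the next state avoids allocating the full transition matrix, which is sparse because every transition leads to the next customer in the sequence.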

Table 2
The critical numbers of the average-cost optimal policy: $s_{1j}(z_1)$, $0 \le z_1 \le 8$; $s_{2j}(r)$, $-8 \le r \le -1$; $r_j$, $q_j$.

Customer 1: $s_{11}(0) = 1$; $s_{11}(1) = \cdots = s_{11}(8) = 0$; $s_{21}(-1) = \cdots = s_{21}(-7) = 9$; $s_{21}(-8) = 1$; $r_1 = 7$; $q_1 = 8$
Customer 2: $s_{12}(0) = 4$; $s_{12}(1) = 2$; $s_{12}(2) = 1$; $s_{12}(3) = \cdots = s_{12}(8) = 0$; $s_{22}(-1) = \cdots = s_{22}(-4) = 9$; $s_{22}(-5) = 8$; $s_{22}(-6) = s_{22}(-7) = s_{22}(-8) = 1$; $r_2 = 5$; $q_2 = 8$
Customer 3: $s_{13}(0) = \cdots = s_{13}(8) = 0$; $s_{23}(-1) = \cdots = s_{23}(-8) = 9$; $r_3 = 8$; $q_3 = 8$
Customer 4: $s_{14}(0) = 5$; $s_{14}(1) = 3$; $s_{14}(2) = 2$; $s_{14}(3) = 1$; $s_{14}(4) = \cdots = s_{14}(8) = 0$; $s_{24}(-1) = s_{24}(-2) = s_{24}(-3) = 9$; $s_{24}(-4) = 8$; $s_{24}(-5) = \cdots = s_{24}(-8) = 1$; $r_4 = 4$; $q_4 = 4$
Customer 5: $s_{15}(0) = 2$; $s_{15}(1) = \cdots = s_{15}(8) = 0$; $s_{25}(-1) = s_{25}(-2) = 9$; $s_{25}(-3) = 8$; $s_{25}(-4) = \cdots = s_{25}(-7) = 6$; $s_{25}(-8) = 1$; $r_5 = 7$; $q_5 = 8$
Customer 6: $s_{16}(0) = 5$; $s_{16}(1) = 3$; $s_{16}(2) = 1$; $s_{16}(3) = \cdots = s_{16}(8) = 0$; $s_{26}(-1) = s_{26}(-2) = s_{26}(-3) = 9$; $s_{26}(-4) = 8$; $s_{26}(-5) = 7$; $s_{26}(-6) = s_{26}(-7) = s_{26}(-8) = 1$; $r_6 = 5$; $q_6 = 8$
Customer 7: $s_{17}(0) = 9$; $s_{17}(1) = 8$; $s_{17}(2) = 5$; $s_{17}(3) = 3$; $s_{17}(4) = 2$; $s_{17}(5) = 1$; $s_{17}(6) = s_{17}(7) = s_{17}(8) = 0$; $s_{27}(-1) = 7$; $s_{27}(-2) = 6$; $s_{27}(-3) = \cdots = s_{27}(-8) = 1$; $r_7 = 2$; $q_7 = 6$
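The monotonicity of the critical numbers $s_{1j}(z_1)$ in Table 2 can be checked mechanically. The short sketch below transcribes those entries by hand (the dictionary `S1` and the helper name are hypothetical) and tests that every row is non-increasing in $z_1$; the numbers $s_{2j}(r)$ could be checked in the same way.

```python
# s_{1j}(z1) for z1 = 0, ..., 8, transcribed from Table 2.
S1 = {
    1: [1, 0, 0, 0, 0, 0, 0, 0, 0],
    2: [4, 2, 1, 0, 0, 0, 0, 0, 0],
    3: [0, 0, 0, 0, 0, 0, 0, 0, 0],
    4: [5, 3, 2, 1, 0, 0, 0, 0, 0],
    5: [2, 0, 0, 0, 0, 0, 0, 0, 0],
    6: [5, 3, 1, 0, 0, 0, 0, 0, 0],
    7: [9, 8, 5, 3, 2, 1, 0, 0, 0],
}


def is_non_increasing(values):
    """True if no entry is larger than the entry before it."""
    return all(b <= a for a, b in zip(values, values[1:]))


check = {j: is_non_increasing(row) for j, row in S1.items()}
print(check)
```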

We assume that for each customer $j \in \{1, \ldots, 7\}$ the demand $\xi_1^j$ for new items and the quantity $w^j$ of returned items are independent and follow the uniform distribution on the set {0, 1, . . . , 8}, i.e. $\Pr(\xi_1^j = x) = \Pr(w^j = x) = 1/9$, $x = 0, \ldots, 8$. The standard value-iteration algorithm does not converge in this example. This is due to the periodicity (with period N) of all states of the system under any stationary policy. This problem can be circumvented by a perturbation of the one-step transition probabilities so that a transition from a state to itself with non-zero probability is allowed. Specifically, we take the following new one-step probabilities:

$$\tilde{p}_{(j,z_1,r)(j+1,z_1',r')}(a) = \tau\, p_{(j,z_1,r)(j+1,z_1',r')}(a), \qquad \tilde{p}_{(j,z_1,r)(j,z_1,r)}(a) = 1 - \tau.$$
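The effect of this perturbation can be seen on a toy periodic model. The sketch below is an illustration under stated assumptions, not the routing model itself: the three-state cyclic chain, its costs, the absolute stopping tolerance and the function name `value_iteration` are hypothetical, and the one-step costs are left unchanged by the perturbation. With `tau = 1.0` the probabilities are the original ones; with `tau = 0.5` every state gets a self-transition of probability 0.5.

```python
def value_iteration(mdp, tau=1.0, eps=1e-3, max_iter=500):
    """Value iteration for the average-cost criterion.

    mdp[i] is a list of actions for state i, each given as
    (cost, {next_state: probability}). Every transition probability is
    multiplied by tau and a self-transition of probability 1 - tau is added.
    Returns (iterations used or None, lower bound, upper bound), where the
    bounds are the smallest and largest one-step increase of the values."""
    n = len(mdp)
    V = [0.0] * n
    lo = hi = 0.0
    for it in range(1, max_iter + 1):
        new = [min(cost + tau * sum(p * V[k] for k, p in row.items())
                   + (1.0 - tau) * V[i]
                   for cost, row in mdp[i])
               for i in range(n)]
        diff = [new[i] - V[i] for i in range(n)]
        lo, hi = min(diff), max(diff)
        V = new
        if hi - lo <= eps:
            return it, lo, hi
    return None, lo, hi


# Hypothetical periodic model: cycle 0 -> 1 -> 2 -> 0 under every policy.
# State 0 offers a cheap and an expensive action with the same transition.
mdp = [
    [(3.0, {1: 1.0}), (5.0, {1: 1.0})],
    [(0.0, {2: 1.0})],
    [(0.0, {0: 1.0})],
]

plain = value_iteration(mdp, tau=1.0)
perturbed = value_iteration(mdp, tau=0.5)
print(plain, perturbed)
```

In this toy model the unperturbed iteration never meets the stopping criterion because the value increments keep rotating around the cycle, whereas in the perturbed model the two bounds close in on 1, the average cost of the cycle.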

Fig. 5. The optimal decisions after the 1st visit to customer 6.

The constant $\tau$ in the perturbed probabilities satisfies $0 < \tau < 1$; a reasonable choice is $\tau = 0.5$. The perturbed model has the same average-cost optimal policy as the original model (see p. 209 in Tijms (1994)). We implemented the value-iteration algorithm in the perturbed model and we chose $\varepsilon = 10^{-3}$ as the tolerance number in the stopping criterion of the algorithm. The algorithm converged to the optimal policy after 56 iterations. The required computation time was 12.18 seconds. The average cost of the optimal policy was found to be 44.67.

In Table 2 we present, for each customer j = 1, . . . , 7, the critical numbers $s_{1j}(z_1)$, $0 \le z_1 \le 8$, $s_{2j}(r)$, $-8 \le r \le -1$, $r_j$, $q_j$ that correspond to the average-cost optimal policy. From Table 2 we observe that Parts (i) and (iv) of Theorem 1 are confirmed numerically since, for $j \in \{1, \ldots, 7\}$, $s_{1j}(z_1)$, $0 \le z_1 \le 8$, and $s_{2j}(r)$, $-8 \le r \le -1$, are non-increasing in $z_1$ and $r$, respectively. In Figs. 5 and 6 we present the optimal decision for each state $(z_1, r)$, $|z_1| \le 8$, $|r| \le 8$, $z_1 + r \le 8$, after the first visit to customer 6 and customer 7, respectively.

The value-iteration algorithm enables us to determine the optimal quantity of product 1 that is loaded in the vehicle when it returns to the depot. For example, at the state $(j, z_1, r) = (6, 3, -8)$ the optimal decision is to go to the depot to unload, return to customer 6 in order to pick up the remaining 8 old items, make a second trip to the depot, unload the old items, restock with 7 new items and go to customer 7. If the state is $(j, z_1, r) = (7, -3, 0)$, the optimal decision is to go to the depot, unload any returned items, restock with 5 new items, return to customer 7 to deliver the owed quantity (3 items) and proceed to customer 1.

Fig. 6. The optimal decisions after the 1st visit to customer 7.

4. Conclusions

In this paper we studied a simple capacitated vehicle routing problem with pickup and delivery. It was assumed that a single vehicle delivers K different products to, and collects them from, N clients according to a particular order. There is no distinction between delivery and pickup customers. Each customer demands a random quantity of new or fresh products and gives a random quantity of old or expired products. The actual values of these quantities are revealed only when the vehicle visits the client's site. The items of all products are stored in the single compartment of the vehicle and are of the same size. For example, the vehicle delivers identical packages that contain new products and collects empty packages. Another example could be the supply of the stores of a particular area with many kinds of fresh milk according to a particular order. When the vehicle visits each store it picks up bottles with expired milk and delivers bottles with fresh milk. The vehicle may interrupt its route to return to the depot in order to empty the returned items, restock with new ones and then resume its route.

The objective was to minimize the total expected cost for the service of all customers. We chose as decision epochs the epochs at which the vehicle visits each client for the first time and has satisfied as much of his demand for new products as possible and has collected the largest possible quantity of the old products. This choice of the decision epochs, together with the assumption that the customers are serviced according to a particular order, enabled us to give, for all possible states of the process, suitable dynamic programming equations for the determination of the optimal policy. We showed that the optimal policy has a particular threshold-type structure. In view of this result, we developed a special-purpose dynamic programming algorithm that determines the optimal policy and is considerably faster than the initial dynamic programming algorithm.

We also considered a corresponding infinite-horizon problem in which the service of the customers is not completed as soon as the last customer has been serviced, but continues periodically with the same service order. For each customer the demands for new products and the quantity that is collected are renewed in each cycle and follow the same joint distribution. The decision epochs are again the epochs at which the vehicle arrives at a customer's site for the first time and has delivered as many new products as possible and has collected as many old products as possible. It was assumed that the times between decision epochs are equal. Using standard techniques from Markov decision processes it was proved that the discounted-cost optimal policy and the average-cost optimal policy have the same threshold-type structure as the finite-horizon optimal policy.

Acknowledgments

We would like to thank Dimosthenis Drivaliaris, Ioannis Minis and Christos Tarantilis for useful discussions.

References

Bather, J., 2000. Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions. Wiley, Chichester.
Bianchessi, N., Righini, G., 2007. Heuristic algorithms for the vehicle routing problem with simultaneous pick-up and delivery. Computers and Operations Research 34, 578–594.
Gribkovskaia, I., Halskau, O., Laporte, G., Vlcek, M., 2007. General solutions to the single vehicle routing problem with pickups and deliveries. European Journal of Operational Research 180 (2), 568–584.
Hernandez-Perez, H., Salazar-Gonzalez, J.-J., 2004. A branch-and-cut algorithm for a traveling salesman problem with pickup and delivery. Discrete Applied Mathematics 145, 126–139.
Liong, C.Y., Wan Rosmanira, I., Khairuddin, O., Zirour, M., 2008. Vehicle routing problem: models and solutions. Journal of Quality Measurement and Analysis 4, 205–218.
Minis, I., Tatarakis, A., 2011. Stochastic single vehicle routing problem with delivery and pickup and a predefined customer sequence. European Journal of Operational Research 213, 37–51.
Nagy, G., Salhi, S., 2005. Heuristic algorithms for single and multiple depot vehicle routing problems with pickups and deliveries. European Journal of Operational Research 162, 126–141.
Pandelis, D.G., Kyriakidis, E.G., Dimitrakos, T.D., 2012. Single vehicle routing problems with a predefined customer sequence, compartmentalized load and stochastic demands. European Journal of Operational Research 217, 324–332.
Pandelis, D.G., Karamatsoukis, C.C., Kyriakidis, E.G., 2013. Single vehicle routing problems with a predefined customer order, unified load and stochastic discrete demands. Probability in the Engineering and Informational Sciences 27 (1), 1–23.
Rembold, U., Blume, C., Dillmann, R., 1985. Computer Integrated Manufacturing Technology and Systems. Marcel Dekker, New York.
Ropke, S., Pisinger, D., 2006. A unified heuristic for a large class of vehicle routing problems with backhauls. European Journal of Operational Research 171, 750–775.
Ross, S.M., 1992. Applied Probability Models with Optimization Applications. Dover, New York.
Simchi-Levi, D., Chen, X., Bramel, J., 2005. The Logic of Logistics: Theory, Algorithms and Applications for Logistics and Supply Chain Management. Springer, New York.
Tatarakis, A., Minis, I., 2009. Stochastic single vehicle routing with a predefined customer sequence and multiple depot returns. European Journal of Operational Research 197, 557–571.
Tijms, H.C., 1994. Stochastic Models: An Algorithmic Approach. Wiley, New York.
Toth, P., Vigo, D., 1997. An exact algorithm for the vehicle routing problem with backhauls. Transportation Science 31, 372–385.
Toth, P., Vigo, D., 1999. A heuristic algorithm for the symmetric and asymmetric vehicle routing problems with backhauls. European Journal of Operational Research 113, 528–543.
Toth, P., Vigo, D. (Eds.), 2002. The Vehicle Routing Problem. SIAM, Philadelphia.
Tsirimpas, P., Tatarakis, A., Minis, I., Kyriakidis, E.G., 2008. Single vehicle routing with a predefined customer sequence and multiple depot returns. European Journal of Operational Research 187, 483–495.
Yang, W.-H., Mathur, K., Ballou, R.H., 2000. Stochastic vehicle routing problem with restocking. Transportation Science 34, 99–112.
Zachariadis, E.E., Tarantilis, C.D., Kiranoudis, C.T., 2010. An adaptive memory methodology for the vehicle routing problem with simultaneous pick-ups and deliveries. European Journal of Operational Research 202, 401–411.