Validating vehicle routing zone construction using Monte Carlo simulation


European Journal of Operational Research 206 (2010) 73–85


Production, Manufacturing and Logistics

Jonathan F. Bard a,*, Ahmad I. Jarrah b, Jing Zan a

a Graduate Program in Operations Research, University of Texas, Austin, TX, United States
b Dept. of Decision Sciences, The George Washington University, Washington, DC, United States

Article history: Received 20 January 2009; Accepted 30 January 2010; Available online 17 February 2010

Keywords: Vehicle routing zones; Monte Carlo simulation; Traveling salesman problem; Clustering analysis; Pickup and delivery operations

Abstract

The primary purpose of this paper is to validate a clustering procedure used to construct contiguous vehicle routing zones (VRZs) in metropolitan regions. Given a set of customers with random demand for pickups and deliveries over the day, the goal of the design problem is to cluster the customers into zones that can be serviced by a single vehicle. Monte Carlo simulation is used to determine the feasibility of the zones with respect to package count and tour time. For each replication, a separate probabilistic traveling salesman problem (TSP) is solved for each zone. For the case where deliveries must precede pickups, a heuristic approach to the TSP is developed and evaluated, also using Monte Carlo simulation. In the testing, performance is measured by overall travel costs and the probability of constraint violations. Gaps in tour length, tour time and tour cost are the measures used when comparing exact and heuristic TSP solutions. To test the methodology, a series of experiments were conducted using data provided by a leading shipping carrier for the Pittsburgh area. Currently, the region is divided into 73 VRZs, compared to 64 indicated by the clustering procedure. The simulation results showed that a redesign would yield approximately $334,360 in annual savings without any noticeable deterioration in service. In addition, when the heuristic TSP model was solved in place of the exact model, the average gap in tour duration increased by only 0.16 hours and 0.2 hours for the cases of 73 clusters and 64 clusters, respectively, indicating a small upward bias. However, runtimes decreased by almost 70%.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Transport companies that regularly make pickups and deliveries in a local region are faced with the problem of how best to group customers to minimize the number of vehicles that are required to provide service. Although some businesses like wholesale distributors or online groceries may in theory try to solve a vehicle routing problem (VRP) every day as demand changes (e.g., see Mourgaya and Vanderbeck, 2007), it is more common for shipping carriers to first establish clusters or zones and then leave it to the drivers to determine the daily tours. When the number of customers in each zone is small, as is the case with just-in-time suppliers, or when the density of customers is high, as with postal deliveries, the driver will follow the same route each day skipping those sites with absent demand (Baudin, 2004).

Our interest is in the intermediate case where the customer base is large but the probability that any particular customer will require service on any day may vary widely, so routes have to be replanned continually. This is the environment in which express package carriers such as UPS, FedEx and DHL operate. For companies like these that may have anywhere from 5000 to 100,000 customers in a region, driver work areas are typically designed around the size of a standard vehicle. At the planning level, demand for an average month is used as input to a clustering model (Chiou and Lan, 2001; Ríos-Mercado and Fernández, 2009) whose solution provides the minimum number of vehicles required for an average day. Once the work areas are established, drivers must solve a traveling salesman problem (TSP) to determine an optimal tour for the realized demand. This involves consideration of both pickups and deliveries (P&Ds), and perhaps time windows. Of course, this is only the goal; in practice, few if any companies have invested in the technology necessary to construct routes automatically, never mind derive optimal tours. Instead, they rely on the drivers' familiarity with their territory to minimize travel times.

In the classical vehicle routing zone (VRZ) problem, each zone is expected to contain a certain number of customers, n, on average (Newell and Daganzo, 1986; Ouyang, 2007). The zones are planned optimally for the long run with respect to expected total distance.

* Corresponding author. Tel.: +1 512 471 3076/5; fax: +1 512 232 1489. E-mail addresses: [email protected] (J.F. Bard), [email protected] (A.I. Jarrah), [email protected] (J. Zan).
doi:10.1016/j.ejor.2010.01.045


To accommodate day-to-day fluctuations in demand, slack vehicle capacity in some zones is used to cover overflow demand in neighboring zones. In extreme cases, either additional vehicles are put into service or routes are temporarily redrawn; that is, tour failures are consolidated into secondary "sweeper" routes (Taylor et al., 2001). Typically, the total system cost is formulated as a function of n, the number of vehicles h, and the assignment decision variables. Then, the hierarchical long-term strategic design and short-term operational components are combined into one integrated optimization model and solved to provide a good overall design. This two-step approach is widespread but lacks empirical validation.

The purpose of this paper is to provide insight into the effectiveness of this type of sequential approach to planning and scheduling P&D operations. The analysis is based on zone designs obtained from the metaheuristic developed by Bard and Jarrah (2009) but carries over to any clustering methodology. The major components of their model are embodied in three sets of constraints used to control the size of a cluster. The first set separately restricts the expected package count for pickups and deliveries to the capacity of the chosen vehicle, the second restricts the expected time required to provide service to the working hours available in a day, and the third ensures that all clusters are geographically contiguous, i.e., that they do not overlap. Although not necessary, separate capacity constraints were imposed because it is common for shipping carriers to begin pickups only after making all deliveries, even if that means visiting the same location twice (Hall, 1996). In some contexts, the latter service is referred to as backhauls (e.g., see Ropke and Pisinger, 2006; Salhi and Nagy, 1999).

For the first two constraint sets, the standard approach is to use expected values to specify the parameters. While it is relatively easy to estimate the expected demand and service time for a customer from historical data, it is less obvious how to estimate travel time between customers. The problem lies in the fact that until the cluster is constructed and the demand is realized, it is not possible to pose the corresponding TSP, so travel time is a function of the unknown demand locations. This makes it difficult to validate tour feasibility, which must really be defined in probabilistic terms because all demand is random.

At the analytic level, the primary objective of this paper is to introduce a framework for determining the probability that all customers with realized demand can be serviced without violating vehicle capacity or time constraints. Monte Carlo simulation is used to perform the analysis. A corollary objective is to estimate the cost saving that might accrue to a carrier who adopts the VRZs provided by the Bard–Jarrah (B–J) clustering heuristic.

In the next section, various versions of the VRP and TSP are reviewed. In Section 3, a mathematical formulation of the capacitated clustering problem is given and the need for validation is explained. In Section 4, we outline the procedures used to estimate the model's parameters. This is followed by Section 5, where the probability distributions are estimated and the Monte Carlo simulation is described. During each replication we need to run a TSP code for each cluster. This can be time consuming, especially when the number of customers in a cluster is large.
To reduce runtimes, a TSP heuristic is proposed in Section 6 that separates the pickup from the delivery problem and then optimally merges their results. In Section 7, a series of experiments are performed using data provided by the sponsoring company for the Pittsburgh region. We close with an assessment of our findings.

2. Problem definition and related literature

The classical VRP is defined on a graph G = (V, E) with vertex set V and edge set E, where a vehicle can either pick up or deliver the entire amount of a commodity available or required at a vertex. For our purposes, a route is a circuit over a subset of vertices, starting and finishing at the depot, denoted by 0. The P&D problem involves the construction of at most h routes such that all requests are satisfied, there are no transshipments of commodities, the load on a vehicle never exceeds its capacity, and the sum of route costs is minimized. A less studied variant of this problem is one in which the objective is to minimize the number of vehicles required to service all customers.

Daganzo and Erera (1999) studied the problem of designing logistics systems in the presence of uncertainty. They considered the static vehicle routing problem where the goal is to minimize the transportation cost required to deliver or collect items from a set of scattered customers with vehicles of fixed capacity Q. To simplify the analysis, they first divided the service region into delivery zones having approximately Q units of demand, and then solved a TSP for each zone to get the tours. When the realized demand was above a vehicle's capacity, drivers in neighboring zones were called upon to handle the excess or standby vehicles were dispatched. This is standard practice in the industry. The authors developed a two-stage method for designing this type of redundancy by determining primary delivery zones for the long run in terms of expected total distance, and then choosing the routes of the secondary vehicles in day-to-day operations. Thus if some customers are skipped in the primary visits due to the overload, they can be visited by either re-routed or supplementary vehicle tours. In our work, we also present a method for clustering customers but take the next step and try to validate the results using Monte Carlo simulation.

Building on previous research in VRZ design, Ouyang (2007) proposed several algorithms for clustering customers without the need for manual adjustment. Initially, a set of wedge-shaped zones was created satisfying size and shape requirements. The wedges were then conformally mapped into square zones and a disk model was applied to obtain an approximately optimal partition. Further refinements were carried out by the weighted centroidal Voronoi tessellation algorithm to balance the delivery loads within the zones.

Virtually all approaches to the stochastic VRZ problem, and most heuristics for the deterministic VRP, follow a cluster first – route second philosophy (e.g., see Bräysy and Gendreau, 2005). In the stochastic case, once the zones are fixed, a probabilistic TSP must be solved for each (Liu, 2007). For the most part, heuristics aimed at minimizing expected costs subject to some service constraint have become standard practice. Tang and Hooks (2005), for example, addressed the selective traveling salesperson problem with stochastic service times, travel times, and travel costs.
In their model, they assumed that not all potential customers need or can be visited on any day of the planning horizon, so a subset must be selected. Their objective was to maximize the total expected reward earned by visiting customers minus the travel cost, while guaranteeing that the probability of the tour duration exceeding a given threshold was no more than a certain fraction. Variants of the VRZ problem include the periodic TSP, the VRP with backhauls, and routing problems with stochastic travel times; for example, see Chao et al. (1995), Hemmelmayr et al. (2009), and Toth and Vigo (1999).

3. Formulation of capacitated clustering problem

To place our work in context, it is useful to examine a mathematical formulation of the generic clustering problem. We are given a service region and a set N of n customers whose (X, Y) coordinates are located on a geometric grid. Over the planning horizon T, which is defined by a typical month and a 5-day work week, customer i ∈ N has an average demand p_i for pickups and an average demand d_i for deliveries per day.


In our case, all deliveries must be made before pickups begin, so these two operations must be treated separately. Service is provided by a fleet of homogeneous vehicles, each of which makes one trip per day. Using data for an average day, the objective is to partition the n customers into the minimum number of clusters or routes. In deriving a solution, the clusters should be contiguous, as defined below, and satisfy both capacity and time constraints. Contiguity is a desirable property because it ensures that vehicle tours do not overlap. Although this does not guarantee that the total distance traveled each day will be minimum, due to peculiarities of the road system and the possibility of route failure, it is a general characteristic of m-TSP solutions when capacity is not an issue.

Definition 1. Let {C_1, ..., C_p} be a partition of n data points into p disjoint, collectively exhaustive subsets. The subsets C_k, k = 1, ..., p, are said to be contiguous if conv(C_k) ∩ conv(C_l) = ∅ for all l ≠ k, where conv(Z) is the convex hull of the points in the set Z.

In the development of the model, we make use of the following notation.

Indices and sets
i                          index for pickup and delivery points
k                          index for clusters
(X_i, Y_i), i = 1, ..., n  X- and Y-coordinates of all P&D points in the service region
N                          set of all P&D customers in the service region: N = N_P ∪ N_D

Parameters
n          total number of P&D points in the service region
n_P        number of pickup points
n_D        number of delivery points
p_i        average number of packages to be picked up from customer i each day
d_i        average number of packages to be delivered to customer i each day
p          upper bound on the number of clusters
Q          cluster (vehicle) capacity
TOT        total time available in a day for P&D operations
BREAK      time allocated for breaks during the day
STEM_k     estimated time to drive to and from the depot to cluster k
s_i^P      estimated service time for a pickup at customer i plus drive time to the next customer
s_i^D      estimated service time for a delivery at customer i plus drive time to the next customer
α          objective function weight; α ∈ [0, 1]

Decision variables
x_ik       1 if customer i is assigned to cluster k; 0 otherwise
z_k        1 if cluster k is defined; 0 otherwise
f_k        centroid of cluster k

Model

Minimize
$$\alpha \sum_{i=1}^{n}\sum_{k=1}^{p} x_{ik}\left(\lVert X_i - f_k^X\rVert^2 + \lVert Y_i - f_k^Y\rVert^2\right) + (1-\alpha)\sum_{k=1}^{p} z_k \tag{1a}$$

subject to
$$\sum_{k=1}^{p} x_{ik} = 1, \quad i = 1,\ldots,n \tag{1b}$$
$$\sum_{i=1}^{n_D} d_i x_{ik} \le Q, \quad \sum_{i=1}^{n_P} p_i x_{ik} \le Q, \quad k = 1,\ldots,p \tag{1c}$$
$$\sum_{i=1}^{n} \left(s_i^P + s_i^D\right) x_{ik} \le TOT - BREAK - STEM_k, \quad k = 1,\ldots,p \tag{1d}$$
$$x_{ik} \le z_k, \quad i = 1,\ldots,n, \; k = 1,\ldots,p \tag{1e}$$
$$C_k = \{i : x_{ik} = 1,\ i = 1,\ldots,n\}, \quad k = 1,\ldots,p \ \text{satisfy Definition 1} \tag{1f}$$
$$x_{ik} \in \{0,1\}, \; f_k \in \mathbb{R}^2, \quad i = 1,\ldots,n, \; k = 1,\ldots,p \tag{1g}$$
$$z_k \in \{0,1\}, \quad k = 1,\ldots,p \tag{1h}$$

The objective function (1a) is designed to minimize a weighted sum of the squared distance of each customer to the center of the assigned cluster plus the total number of clusters created. The first term is aimed at satisfying the contiguity requirement. As α is varied from 0 to 1, the solution represents a tradeoff between the smallest number of feasible clusters and the compactness of each cluster. Constraints (1b) ensure that each customer is assigned to exactly one cluster, while (1c) restricts the load on a vehicle at any time to its capacity Q. Because all deliveries are made first, two separate inequalities are required. Constraints (1d) limit the total expected service and travel times to the number of hours available in a day. A discussion of how the parameters $s_i^P$, $s_i^D$ and $STEM_k$ are calculated is provided in Section 4. Unfortunately, $STEM_k$ is really a nonconvex function of the decision variables $x_{ik}$, so any attempt to solve (1) directly would likely require some method of estimating its value. Constraints (1e), in conjunction with the second term in (1a), impose a setup cost each time an additional cluster is created. When α = 1 and the values of $f_k$ are restricted to the n data points, we get the capacitated p-median problem (e.g., see Lorena and Senne, 2004).


The contiguity constraints are stated implicitly in (1f) because they cannot be written efficiently as linear inequalities (cf. Drexl and Haase, 1999). Finally, the variables are defined in constraints (1g) and (1h). In practice, the binary restriction on $z_k$ can be relaxed since these variables will always be integral when $x_{ik}$ = 0 or 1.

Model (1a)–(1h) has been termed a generic continuous centered clustering problem by Negreiros and Palhano (2006), who developed a two-phase solution approach. In phase 1, feasible solutions are constructed with a next-fit/best-fit algorithm. In phase 2, a variable neighborhood search heuristic is used to improve the results. Bard and Jarrah (2009) developed a three-phase approach with the objective of minimizing $\sum_k z_k$ while trying to ensure that the final clusters were as compact as possible. Their algorithm begins with a preprocessing step to aggregate customers who are likely to be in the same cluster. In phase 2, a randomized grid search procedure (RGSP) is used to form the clusters. Improvement is obtained in phase 3 by solving a set covering problem derived from the phase 2 results.

For all VRZ problems, the coefficients in constraints (1c) and (1d) as well as the stem time must be estimated. Our immediate goals are (i) to validate the estimation process used by Bard and Jarrah, and (ii) to gain a better understanding of the probability of tour failure due to violations of these constraints. A secondary objective is to calculate the expected cost savings that the shipping carrier would realize by adopting the zone configuration obtained with the RGSP.

4. Travel time estimation

At each step of any sequential algorithm aimed at solving model (1a)–(1h), it is necessary to check whether adding the next customer to the cluster being constructed will lead to a violation of the time constraint (1d). For each of the n customers, there are three components to the time parameter:

1. Stem time (time to travel from the depot to the first customer in the cluster plus the time to return to the depot from the last customer).
2. Travel time between customers (for customer i, this value is a function of its $n_i$ nearest neighbors and the visit probability $p_i$ of each).
3. Service time (a function of the number of packages plus a fixed time).

In a deterministic setting, which customers are to be visited on any given day, along with their orders, would be known prior to the start of service. For the application we have in mind, these values are only known probabilistically, so it is not possible to actually sequence customers and calculate total route time. Therefore, all three time components must be estimated. We do this using the following data elements: the distance between each customer and the depot, the distance between customers, the expected number of packages delivered per day to each customer, and the visit frequencies (the latter two elements are themselves estimates that reflect an average day in the baseline month used for the case study). In the analysis, the second and third components are calculated separately for pickups and deliveries with the understanding that if both types of service are required, they will be provided by the same driver and vehicle.

To calculate the travel times we developed an ordinal weighting scheme. In brief, consider the $n_i$ closest locations to customer i, and let $R_{i,[j]}$ be the ranking of the jth closest customer to i, where $R_{i,[j]} = j$ for $j = 1, \ldots, n_i$; let $p_j$ be the probability that customer j requires a visit on any day; and let $w_{i,[j]}$ be the "likelihood" that customer [j] follows i on the route. The equations for computing the $w_{i,[j]}$ values are given below, where superscript P = pickups and D = deliveries.

$$w_{i,[j]}^P = \frac{p_j^P\,(n_i - R_{i,[j]} + 1)}{\sum_{l=1}^{n_i} p_l^P\,(n_i - R_{i,[l]} + 1)} \qquad \text{and} \qquad w_{i,[j]}^D = \frac{p_j^D\,(n_i - R_{i,[j]} + 1)}{\sum_{l=1}^{n_i} p_l^D\,(n_i - R_{i,[l]} + 1)} \tag{2}$$

The intent of Eq. (2) is to take into account both the distance of customer j from customer i and the probability of a visit. The denominator of each term normalizes the $w_{i,[j]}$ values so they sum to 1, giving a proximate visit probability. The values of $p_j^P$ and $p_j^D$ are estimated by the frequency of visits per month. Now, to determine the expected time $t_i$ to the next customer for each customer i, let D(i, j) be the distance between customers i and j and let MPH(i, j) be the corresponding travel velocity. The calculations for pickups and deliveries are as follows:

$$t_i^P = \sum_{j=1}^{n_i} w_{i,[j]}^P\,\frac{D(i,[j])}{MPH(i,[j])} \qquad \text{and} \qquad t_i^D = \sum_{j=1}^{n_i} w_{i,[j]}^D\,\frac{D(i,[j])}{MPH(i,[j])} \tag{3}$$
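For readers who prefer code to notation, the following Python sketch mirrors the calculation in Eqs. (2) and (3). The function name and data structures are ours, not the authors'; the distances D and speeds MPH are assumed to be available as nested lookups.

```python
import numpy as np

def expected_time_to_next(i, neighbors, p, D, MPH):
    """Eqs. (2)-(3): expected travel time from customer i to its successor.

    neighbors -- the n_i customers closest to i, sorted so that rank 1 is
                 the closest (R_{i,[j]} = j)
    p         -- visit probabilities (p^P for pickups or p^D for deliveries)
    D, MPH    -- distance and speed lookups, accessed as D[i][j] and MPH[i][j]
    """
    n_i = len(neighbors)
    # Eq. (2): visit probability weighted by the reversed rank, then normalized
    raw = np.array([p[j] * (n_i - rank + 1)
                    for rank, j in enumerate(neighbors, start=1)])
    w = raw / raw.sum()
    # Eq. (3): weighted average of the point-to-point travel times
    times = np.array([D[i][j] / MPH[i][j] for j in neighbors])
    return float(w @ times)
```

The service-plus-travel coefficient of constraint (1d) then follows as s_i = STOP_i + p_i·t_i, with the pickup and delivery cases handled by passing the corresponding probabilities.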

To calculate the $s_i$ values in (1d), we need the expected service time at customer i, which we denote by $STOP_i^P$ and $STOP_i^D$ for pickups and deliveries, respectively. Using these parameters, we define

$$s_i^P = STOP_i^P + p_i^P t_i^P \qquad \text{and} \qquad s_i^D = STOP_i^D + p_i^D t_i^D.$$

A primary goal of our analysis is to validate the accuracy of the above estimates.

5. Monte Carlo analysis

In the original study by Bard and Jarrah, expected values were used for the number of packages picked up and delivered, while travel times were based on Eq. (3). Service times were calculated more accurately but are still estimates because they are based on expected package counts. To clarify some of this uncertainty, we need to determine whether the customers assigned to a particular cluster can be routed in a way that ensures, with some given probability, that the time and capacity constraints are met each day. A company's service standard defines this probability. In light of the random nature of the data, the most direct way to make this determination is to analyze each cluster separately using Monte Carlo simulation. In this procedure, the customer base is sampled to create a realization of those customers who require service on any given day, along with their package counts. For each realization, the exact distance between customers, and between customers and the depot, is calculated and the resulting TSP solved.


If the package count does not exceed the vehicle's capacity, then the TSP solution will tell us whether the time constraint is satisfied for the cluster. If the capacity constraint is violated, we know that a secondary route will be required on that day. Repeating this process many times allows us to construct a histogram of tour times, which can be used to determine, in conjunction with the proportion of times there is a capacity violation, the probability that the service standard is met. Probability values that exceed or fall short of our design criterion indicate a need to modify the clustering procedure to better align it with the time and capacity constraints.

5.1. Sampling distributions

Table 1 displays the input data for the first 10 customers in the Pittsburgh region during the base month, September 2006, after minor aggregation. The first column gives the customer identification number; the second and third columns give the X- and Y-coordinates of the geographic location in degrees of longitude and latitude, respectively; the fourth column indicates the expected number of packages delivered, $g_i^D$; the fifth column gives the average number of delivery stops per day, $f_i^{D,STOP}$, for customer i over the month; and the last column indicates the probability of delivery, $p_i^D$, where $p_i^D = \min\{f_i^{D,STOP}, 1\}$. The same data exist for pickups.

The parameter $g_i^D$ was calculated by dividing the total number of packages delivered to customer i in the base month by 20, the number of working days in that month, while $f_i^{D,STOP}$ was calculated by taking the total number of times a delivery was made and dividing by 20. In most cases, however, $20 \times g_i^D$ (as well as $20 \times f_i^{D,STOP}$ and $20 \times p_i^D$) is not integral because not all pickups and deliveries are reported by the drivers for specific customers. Many are only reported at a 5-digit ZIP level. To account for those instances, the company uniformly allocates the gross volume to the individual customers within the 5-digit ZIP area on the day the pickup or delivery took place. This leads to fractional delivery quantities and fractional stops. For example, assume that 40 packages were originally delivered on a Tuesday to 22 locations identified only by a 5-digit ZIP. If there are 16 customers in that ZIP code area, then each customer would be assigned an additional 40/16 = 2.5 packages and an additional 22/16 = 1.375 stops. In the remainder of this section, we discuss the various components of the simulation.

5.2. Customers

Taking each cluster separately, to determine whether customer i is in a realization and receives a delivery, a random number r is drawn from a uniform distribution U[0, 1]. If $r \le p_i^D$, then customer i is included and a second random number is drawn to determine the package count. The same procedure was used for pickups, but with the probability value $p_i^P$ as the threshold.

5.3. Number of packages

If customer i is in the realization and is scheduled to receive a delivery on a trial day, we need to generate the corresponding demand. Generally, customer characteristics vary widely across a service region. Some clusters have customers with a high demand who require daily pickups or deliveries, and others have customers with an infrequent demand of only one or two packages. It is rare that demand is both high and infrequent. To determine the sampling distributions, we divided all the customers in the service region into 6–14 groups using K-means cluster analysis (e.g., see Hartigan and Wong, 1979). Each group was then manually adjusted so that the number of packages in each followed the same distribution.
The computations were performed with SPSS 16.0 (Yockey, 2007). Not surprisingly, the number of customers in each group ranged from single digits up to several thousand. Taking Pittsburgh with 7291 delivery customers as an example, we found that 9 groups were "optimal". The largest group contained 5850 customers, while the smallest contained 2. In many service regions, the great majority of customers are individuals or small businesses with sparse demand. Only a small number of customers, such as shopping malls, hospitals and retail outlets, have daily demand that approaches the capacity of a vehicle. In those cases, which were viewed as outliers, demand was treated deterministically and fixed at the expected value.

After forming the groups, histograms were constructed to gain a better understanding of how the data were distributed. In a few cases, after investigating a wide range of distributions, it was possible to verify that either the lognormal or the uniform distribution matched the data as determined by a one-sample Kolmogorov–Smirnov test (K–S test). Recall that in this test, the null hypothesis is that the sample comes from the hypothesized distribution. Using a 5% significance level, we based the decision to reject the null hypothesis on the p-value reported for the test: a p-value below the 5% threshold gives strong evidence against the null hypothesis, in which case we reject it.
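A single Monte Carlo replication for one cluster, as described in Sections 5.2 and 5.3, can be sketched as follows. The helper names (sample_packages, tour_time) and the customer record layout are illustrative assumptions on our part, and the capacity and duration limits are passed in as parameters.

```python
import random

def simulate_day(cluster, Q, max_hours, sample_packages, tour_time):
    """One simulated day of P&D operations for a single cluster.

    cluster         -- customer records with visit probabilities p_del and p_pick
    Q               -- vehicle capacity in packages
    max_hours       -- allowed tour duration
    sample_packages -- draws a package count from the customer's fitted group
                       distribution (lognormal, uniform, or empirical)
    tour_time       -- solves the TSP over the realized customers and returns
                       the tour duration in hours
    """
    deliveries, pickups = {}, {}
    for c in cluster:
        if random.random() <= c["p_del"]:        # customer receives a delivery today
            deliveries[c["id"]] = sample_packages(c, "D")
        if random.random() <= c["p_pick"]:       # customer requests a pickup today
            pickups[c["id"]] = sample_packages(c, "P")

    capacity_fail = sum(deliveries.values()) > Q or sum(pickups.values()) > Q
    duration = tour_time(list(deliveries), list(pickups))
    return capacity_fail, duration > max_hours, duration
```

Repeating the call many times and tallying the two failure indicators yields failure probabilities of the kind reported later in Tables 8 and 11.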

Table 1
Sample aggregated delivery input data for Pittsburgh.

Customer, i   X-coordinate, X_i   Y-coordinate, Y_i   Avg. no. deliveries/day, g_i^D   Avg. no. stops/day, f_i^{D,STOP}   Probability of delivery, p_i^D
1             81.0048             40.4801             0.12                             0.06                               0.06
2             80.9874             40.4522             0.12                             0.12                               0.12
3             80.9712             40.4993             0.06                             0.06                               0.06
4             80.9673             40.4990             0.06                             0.12                               0.12
5             80.9508             40.5094             0.06                             0.12                               0.12
6             80.9493             40.4501             0.25                             0.18                               0.18
7             80.9473             40.488              0.06                             0.06                               0.06
8             80.9257             40.5232             0.11                             0.1                                0.1
9             80.924              40.6134             0.17                             0.17                               0.17
10            80.9237             40.4754             0.37                             0.31                               0.31
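The quantities in Table 1 are simple ratios of the monthly totals. A sketch of the aggregation, including the uniform ZIP-level allocation described above, is given below; the function names are ours.

```python
def aggregate_customer(month_packages, month_stops, working_days=20):
    """Derive the Table 1 quantities for one customer from a month of data:
    g_i^D = packages per working day, f_i^{D,STOP} = stops per working day,
    p_i^D = min(f_i^{D,STOP}, 1)."""
    g = month_packages / working_days
    f = month_stops / working_days
    p = min(f, 1.0)
    return g, f, p

def allocate_zip_level(packages, stops, n_customers_in_zip):
    """Spread volume reported only at the 5-digit ZIP level uniformly over the
    customers in that ZIP (e.g., 40 packages over 16 customers -> 2.5 each)."""
    return packages / n_customers_in_zip, stops / n_customers_in_zip
```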


Table 2
Statistics for the eight empirically distributed groups (Pittsburgh deliveries).

                   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7   Group 8
N                  5850      776       482       83        37        32        17        9
Mean               1.20      2.42      3.93      7.34      12.15     17.52     23.55     33.83
Std. dev.          0.31      0.28      0.78      0.86      1.50      1.49      2.04      3.46
Minimum            0.24      2.01      3.00      6.04      10.13     15.09     20.23     29.38
Maximum            2.00      3.00      6.00      9.92      14.92     19.90     27.24     39.64
20th percentile    1.00      2.16      3.20      6.48      10.63     16.09     21.70     30.57
40th percentile    1.00      2.255     3.46      7.06      11.42     16.97     22.62     32.36
60th percentile    1.17      2.45      4.00      7.49      12.56     17.78     24.45     34.78
80th percentile    1.47      2.70      4.70      8.13      13.73     19.21     25.54     38.18
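The empirically distributed groups summarized in Table 2 are sampled by the two-stage draw described in the paragraph that follows the table. A minimal sketch, under the assumption that each of the five percentile intervals carries 20% of the probability mass:

```python
import random

def sample_empirical(breaks):
    """Draw a demand value from an empirical distribution summarized by its
    minimum, 20/40/60/80 percentiles, and maximum (cf. Table 2).
    breaks = [min, p20, p40, p60, p80, max]; the interval is chosen first,
    then the value is drawn uniformly within it."""
    k = random.randrange(5)           # pick one of the five percentile intervals
    lo, hi = breaks[k], breaks[k + 1]
    return random.uniform(lo, hi)

# e.g., group 4 of Table 2: sample_empirical([6.04, 6.48, 7.06, 7.49, 8.13, 9.92])
```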

When no standard distribution passed the K–S test, we used an empirical distribution derived from the minimum, maximum, and 20, 40, 60, and 80 percentile values associated with the group. Realizations were generated by first drawing a random number from U[0, 1] to determine the percentile range, and then drawing a second random number to determine the actual demand from the selected interval. We investigated both the Albany and Pittsburgh regions but only discuss the latter here [see Zan (2008) for the Albany analysis]. For Pittsburgh, the delivery customers were divided into 9 groups, the first 8 being assigned empirical distributions and the 9th being assigned a uniform distribution between the minimum and maximum values of 45.05 and 47.82, respectively. Table 2 gives the statistical information for the first 8 groups. The first row gives the number of customers, N, in each group; the second and third rows, respectively, give the mean and standard deviation of the package count; the next two rows identify the minimum and maximum number of packages; and the remaining rows report the 20–80 percentile thresholds in increments of 20. For the 1250 pickup customers, it was decided that 14 groups were called for, with 6 having uniformly distributed demand and 8 having empirically distributed demand. This left 9 customers that were impossible to group, so again their demand was considered constant.

6. Traveling salesman problem

Each realization of a cluster indicates which customers are to receive pickups and deliveries, as well as the corresponding package counts. To determine whether the time constraint (1d) is satisfied for that realization, a TSP must be solved. For this component of the analysis, we used the Concorde–Linkern software (see Applegate et al., 2006). Concorde contains an exact algorithm and a series of heuristic algorithms for solving the symmetric TSP (amongst other problems). Linkern is an implementation of the chained Lin–Kernighan TSP heuristic. The code is written in ANSI C and is downloadable from the web for academic use. The largest TSP instance we encountered contained about 500 customers and was solved within 2 seconds; many instances were solved in less than 1 second.

In formulating the TSP, we need to define the cost coefficients, which will be stated in units of time. Let T(i, j) = D(i, j)/MPH(i, j) be the travel time between customers i and j, where T(i, j) = T(j, i) for the symmetric case. Once again, it is convenient to distinguish between a pickup and a delivery point although the two may coincide. Accordingly, let 1, ..., n be the indices for the delivery locations in the cluster being analyzed and let n + 1, ..., 2n be the indices for the corresponding pickup locations; that is, i and i + n denote the same location for i = 1, ..., n. Then, the total time to provide either service at customer i's site and transit to customer j is

$$c_{ij}^D = STOP_i^D + T(i,j), \quad i = 0, 1, \ldots, n, \; j = 0, 1, \ldots, 2n, \; i \ne j \tag{4a}$$
$$c_{ij}^P = STOP_i^P + T(i,j), \quad i = 0, n+1, \ldots, 2n, \; j = 0, 1, \ldots, 2n, \; i \ne j \tag{4b}$$

where $STOP_0^D = 0$. In (4b), j ranges over all indices to allow for the case where pickups can be made at any time.

In general, $c_{ij}^D \ne c_{ji}^D$ and $c_{ij}^P \ne c_{ji}^P$ in (4) because $STOP_i^{D(P)} \ne STOP_j^{D(P)}$. This undermines model symmetry. When the requirement that deliveries precede pickups is taken into account, it becomes even clearer that the TSPs resulting from the realization and the construction of the above coefficients are asymmetric. Nevertheless, the special nature of the precedence constraints arising from the fact that deliveries must precede pickups permits a symmetric formulation. This is desirable since virtually all publicly available TSP codes, including Concorde, are for the case in which $c_{ij} = c_{ji}$.

Assuming one cluster is analyzed at a time, to obtain the required model structure it is necessary to define a symmetric cost coefficient, call it $\hat{c}_{ij}$, for each edge (i, j) in the underlying graph. For the 2n + 1 locations, the following eight types of transitions are possible in general: (0–D), (D–0), (D–D), (D–P), (P–D), (P–P), (P–0), (0–P), where D represents a delivery point, P a pickup point, and 0 the depot. Because direction is not a consideration, we only need to specify $\hat{c}_{ij}$ for $i < j = 0, 1, \ldots, 2n$, where $j \ne 0$. Using the coefficients in Eq. (4), we have

$$\hat{c}_{ij} = \begin{cases} \left(c_{ij}^D + c_{ji}^D\right)/2, & i = 0, 1, \ldots, n,\; j = 0, 1, \ldots, n,\; j > i \text{ where } j \ne 0 \\[4pt] \left(c_{ij}^D + c_{ji}^P - c_{ij}\right)/2 + M, & i = 1, \ldots, n,\; j = n+1, \ldots, 2n \\[4pt] \left(c_{ij}^P + c_{ji}^P\right)/2, & i = 0, n+1, \ldots, 2n,\; j = 0, n+1, \ldots, 2n,\; j > i \text{ where } j \ne 0 \end{cases} \tag{5}$$

where $M \ge \sum_{i=0}^{n} \max\{c_{ij}^D : j = 0, 1, \ldots, n\} + \sum_{i=n+1}^{2n} \max\{c_{ij}^P : j = 0, n+1, \ldots, 2n\}$ is an upper bound on the time of a tour and $c_{ij}$ is a parameter used to eliminate double counting of the fixed stop time; that is, $c_{ij}$ equals the fixed time at a customer site if locations i and j coincide, and 0 otherwise. To ensure $\hat{c}_{ij} = \hat{c}_{ji}$, it is necessary to average the travel plus service times on the two edges (i, j) and (j, i), as indicated for all terms in (5). For the second term, M represents a large penalty associated with going between delivery and pickup points.
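A sketch of how the symmetric coefficients of Eq. (5) might be assembled before handing the instance to a symmetric TSP code such as Concorde is shown below. The arrays are assumed to be full (2n + 1) × (2n + 1) tables of the Eq. (4) values, and the function name is ours.

```python
import numpy as np

def symmetric_costs(cD, cP, c_corr, n, M):
    """Build the symmetric coefficients c_hat of Eq. (5).

    cD, cP -- asymmetric costs from Eq. (4); index 0 is the depot,
              1..n are deliveries, n+1..2n the matching pickups
    c_corr -- c_corr[i][j] = fixed stop time if i and j are the same site, else 0
    M      -- penalty exceeding the duration of any feasible tour
    """
    c_hat = np.zeros((2 * n + 1, 2 * n + 1))
    dels = list(range(0, n + 1))                   # depot plus delivery points
    picks = [0] + list(range(n + 1, 2 * n + 1))    # depot plus pickup points
    for i in dels:                                 # delivery-delivery (and depot) edges
        for j in dels:
            if j > i:
                c_hat[i][j] = c_hat[j][i] = (cD[i][j] + cD[j][i]) / 2.0
    for i in picks:                                # pickup-pickup (and depot) edges
        for j in picks:
            if j > i:
                c_hat[i][j] = c_hat[j][i] = (cP[i][j] + cP[j][i]) / 2.0
    for i in range(1, n + 1):                      # delivery-pickup edges carry the penalty M
        for j in range(n + 1, 2 * n + 1):
            c_hat[i][j] = c_hat[j][i] = (cD[i][j] + cP[j][i] - c_corr[i][j]) / 2.0 + M
    return c_hat
```

Because every delivery-to-pickup edge carries the same penalty M, any optimal tour uses exactly one of them, which is the property exploited by Proposition 1 below.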


The following additional notation is used in the statement of the TSP model.

N_0     set of all P&D locations including the depot; N_0 = N ∪ {0}
S       subset of P&D locations
δ(i)    set of P&D locations adjacent to location i ∈ N
E(S)    set of edges in S ⊆ N
x_ij    1 if edge (i, j) is in the solution, 0 otherwise; j > i when j ≠ 0

$$z_{STSP} = \text{Minimize} \; \sum_{i=0}^{2n} \sum_{j=0,\,i+1}^{2n} \hat{c}_{ij}\, x_{ij} \tag{6a}$$

subject to
$$\sum_{j \in \delta(i)} x_{ij} = 2, \quad i = 0, 1, \ldots, 2n \tag{6b}$$
$$\sum_{(i,j) \in E(S)} x_{ij} \le |S| - 1, \quad 3 \le |S| \le (n+1)/2, \; S \subset N_0 \tag{6c}$$
$$x_{ij} \in \{0,1\}, \quad i = 0, 1, \ldots, 2n, \; j = 0, i+1, i+2, \ldots, 2n \tag{6d}$$

The objective function (6a) sums the transition costs between all customers and the depot. Constraints (6b) ensure that there are exactly two edges incident to each customer location and the depot; the implication is that one edge represents an arrival and the other a departure. Subtours are eliminated with constraints (6c), which require that the number of edges in the set S be less than its cardinality. Only subsets whose cardinality is between 3 and (n + 1)/2 are included: subsets of size one and two are eliminated by (6b), while constraint symmetry eliminates those greater than (n + 1)/2. The decision variables are restricted to be binary in constraints (6d) and are only defined for j > i and j = 0. In practice, if a pickup or delivery point i is not in the current realization, we set $x_{ij}$ = 0 for all j = 0, i + 1, ..., 2n. For the general situation, we have the following result.

Proposition 1. Let $z_{TSP}$ be the solution to the original TSP with precedence constraints and let $z_{STSP}$ be the solution to the symmetric version given by Eqs. (6a)–(6d). Then $z_{TSP} = z_{STSP} - M$ when both types of service (i.e., delivery and pickup) exist, and $z_{TSP} = z_{STSP}$ otherwise.

Proof. A check of the coefficients in (5) shows that they are symmetric; that is, $\hat{c}_{ij} = \hat{c}_{ji}$. This is straightforward to see when both i and j correspond to either pickup points or delivery points. For the case in which i is a delivery point and j is a pickup point, $\hat{c}_{ji}$ must be defined in terms of $c_{ji}^P + c_{ij}^D$ to achieve symmetry. In an optimal solution to problem (6) there will be only one transition from some delivery (or pickup) location i to some pickup (or delivery) location j, which incurs the additional cost of M. If there were more than one such transition, the total cost would be at least 2M, which would make the solution suboptimal. When only a single service is being modeled, the original problem is still asymmetric but $\hat{c}_{ij}$ will be defined as the average of either the $c_{ij}^D$ or the $c_{ij}^P$ values. In either of these cases, we would get $z_{TSP} = z_{STSP}$. □

As with all symmetric TSPs, there are at least two solutions, one in either direction. In our case, it is a simple matter to select the one in which deliveries precede pickups. If it is permissible to visit a predetermined subset, S, of pickup locations before all deliveries are made, this can be accommodated by removing the M component from the coefficients in (5) for j ∈ S. The situation in which it is permissible to start making pickups after a predetermined number of deliveries have been made, or at a predetermined time, requires additional constraints that would alter the pure nature of the TSP as it now stands.

6.1. TSP heuristic

Driver work areas may have up to 1500 customers, which implies the need to solve problems with as many as 3000 locations. This can be time consuming even when heuristics are employed. One way to reduce the size of an instance is to break it into two parts and solve each separately. The two tours could then be combined by adding links to and from the depot, and from one delivery location to one pickup location. It would also be necessary to break the link between the first delivery point and one of its neighbors on the optimal tour, and the link between the first pickup point and one of its neighbors. The neighbor at one end of the broken delivery link would be joined to the first pickup location. Also, the neighbor at one end of the broken pickup link would be joined to the depot.
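Before formalizing the merge step as an optimization model, the whole decomposition can be summarized in a short sketch. The TSP solver and the subtour representation (cyclic lists of customer locations, depot excluded, as in Fig. 1) are assumptions on our part; the enumeration below examines the same choices as the model that follows.

```python
def merge_subtours(del_tour, pick_tour, link_time, depot=0):
    """Splice the separately solved delivery and pickup subtours together.

    del_tour, pick_tour -- cyclic lists of locations (depot excluded)
    link_time(a, b)     -- travel plus service time of edge (a, b)

    Enumerates the last delivery i, the first pickup j, and which neighboring
    link of each subtour to break, and returns the cheapest combination.
    """
    best = None
    nd, npk = len(del_tour), len(pick_tour)
    for a in range(nd):                      # candidate last delivery i
        for da in (-1, 1):                   # which adjacent delivery link to break
            i, i_nbr = del_tour[a], del_tour[(a + da) % nd]
            for b in range(npk):             # candidate first pickup j
                for db in (-1, 1):           # which adjacent pickup link to break
                    j, j_nbr = pick_tour[b], pick_tour[(b + db) % npk]
                    delta = (link_time(i, j)            # new delivery-to-pickup link
                             + link_time(depot, i_nbr)  # new depot-to-delivery link
                             + link_time(j_nbr, depot)  # new pickup-to-depot link
                             - link_time(i, i_nbr)      # broken delivery link
                             - link_time(j, j_nbr))     # broken pickup link
                    if best is None or delta < best[0]:
                        best = (delta, i, j, i_nbr, j_nbr)
    return best
```

For a cluster with n customers this is an O(n^2) enumeration; the quadratic integer program below expresses the same choice set in closed form.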
We now formulate the problem of linking the delivery and pickup subtours as a quadratic integer program. To simplify the notation, let us assume that the locations in the optimal solution to the delivery TSP have been relabeled sequentially from 1, ..., n (or n, ..., 1), as have the locations in the optimal solution to the pickup TSP. The following notation will be used.

c_i^D       time from the depot to delivery location i, including service time
c_j^P       time from pickup location j to the depot, including service time
X_i^D       1 if delivery location i is the last to be visited before pickups begin, 0 otherwise
X_j^P       1 if pickup location j is the first to be visited after all deliveries are made, 0 otherwise
X_i^{D,L}   1 if delivery link (i − 1, i) is removed from the optimal delivery subtour and delivery location i is the last to be visited before pickups begin, 0 otherwise
X_i^{D,R}   1 if delivery link (i, i + 1) is removed from the optimal delivery subtour and delivery location i is the last to be visited before pickups begin, 0 otherwise
X_j^{P,L}   1 if pickup link (j − 1, j) is removed from the optimal pickup subtour and pickup location j is the first to be visited after deliveries end, 0 otherwise
X_j^{P,R}   1 if pickup link (j, j + 1) is removed from the optimal pickup subtour and pickup location j is the first to be visited after deliveries end, 0 otherwise


$$\text{Minimize} \quad \sum_{i=1}^{n}\sum_{j=1}^{n} \hat{c}_{ij}\, X_i^D X_j^P + \sum_{i=1}^{n}\left[\left(c_{i-1}^D - c_{i-1,i}^D\right) X_i^{D,L} + \left(c_{i+1}^D - c_{i,i+1}^D\right) X_i^{D,R}\right] + \sum_{j=1}^{n}\left[\left(c_{j-1}^P - c_{j-1,j}^P\right) X_j^{P,L} + \left(c_{j+1}^P - c_{j,j+1}^P\right) X_j^{P,R}\right] \tag{7a}$$

subject to
$$\sum_{i=1}^{n} X_i^D = 1 \tag{7b}$$
$$\sum_{j=1}^{n} X_j^P = 1 \tag{7c}$$
$$X_i^{D,R} + X_i^{D,L} = X_i^D, \quad i = 1, \ldots, n \tag{7d}$$
$$X_j^{P,R} + X_j^{P,L} = X_j^P, \quad j = 1, \ldots, n \tag{7e}$$
$$X_i^D, X_j^P, X_i^{D,R}, X_i^{D,L}, X_j^{P,R}, X_j^{P,L} \in \{0,1\}, \quad i, j = 0, 1, \ldots, n \tag{7f}$$

where $\hat{c}_{ij} = \left(c_{ij}^D + c_{ji}^P - c_{ij}\right)/2$.

The objective function (7a) minimizes the time associated with the links that are added to and removed from the corresponding solutions of the separate delivery and pickup TSPs. The first term accounts for the transition from i to j; in most cases, these locations will coincide. The second and third terms each have two components, which account for the change in time incurred when the two subtours are merged; for example, the coefficient $\left(c_{i-1}^D - c_{i-1,i}^D\right)$ of the variable $X_i^{D,L}$ is the net time associated with removing link (i − 1, i) in the delivery subtour and adding the depot link (0, i − 1). The remaining coefficients have an analogous interpretation. The variables $X_i^D$ and $X_j^P$ do not appear in the latter two terms because they only identify the neighbors i − 1 and i + 1 of the last delivery location i and the neighbors j − 1 and j + 1 of the first pickup location j, respectively, and not the links to be added.

Fig. 1 depicts a possible solution associated with connecting the delivery subtour to pickup location j (the two dashed lines indicate omitted nodes). As shown, $X_i^{D,L} = 1$, so link (i − 1, i) is removed, and since $X_i^{D,R} = 0$, the depot link (0, i − 1) is added. The original subtour (depot not included) 1, 2, ..., i − 1, i, i + 1, ..., n, 1 is now the path 0, i − 1, i − 2, ..., 1, n, n − 1, ..., i + 1, i, j. Although node j is drawn as a unique location, it is more likely to coincide with i or one of the other n − 1 customer sites.

Constraints (7b) and (7c), respectively, identify the last delivery location and the first pickup location in a combined route. Constraint (7d) identifies the delivery link that is removed – either (i − 1, i) or (i, i + 1), depending on whether the resultant path is in the forward or reverse direction with respect to the assumed customer numbering scheme. Its counterpart, constraint (7e), identifies the pickup link that is removed and hence the last pickup location before returning to the depot. Binary restrictions are placed on all the variables in constraint (7f). Because the second and third terms in (7a) automatically account for the times associated with transitions to and from the depot, it is not necessary to introduce additional variables; the corresponding links, such as (0, i − 1) in Fig. 1, are implied by the solution. These developments can be summarized as follows.

Proposition 2. Let $z_{TSP}^P$ and $z_{TSP}^D$ be the respective optimal solutions to model (6) when the pickup and delivery TSPs are solved separately, and let $z_{MERGE}$ be the corresponding solution to (7). Then the smallest upper bound on $z_{TSP}$ that results from the decomposition is $z_{TSP}^P + z_{TSP}^D + z_{MERGE}$.

To linearize the first term in (7a), we need to replace the product $X_i^D X_j^P$ with a (binary) variable $Z_{ij}$ and introduce the following additional constraints.

$$Z_{ij} \ge X_i^D + X_j^P - 1, \quad i, j = 1, \ldots, n \tag{8a}$$
$$Z_{ij} \le X_i^D, \quad i, j = 1, \ldots, n \tag{8b}$$
$$Z_{ij} \le X_j^P, \quad i, j = 1, \ldots, n \tag{8c}$$
$$0 \le Z_{ij} \le 1, \quad i, j = 1, \ldots, n \tag{8d}$$

Fig. 1. Connecting the delivery subtour to the depot and to the pickup subtour.
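Constraints (8a)–(8c) translate almost verbatim into any MIP modeling layer. The toy sketch below uses the PuLP library (our choice, not the authors'); it shows only the linearized first term of (7a) together with (7b) and (7c), with placeholder cost data.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# Z[(i, j)] stands in for the product X^D_i * X^P_j in (7a); constraints
# (8a)-(8c) force Z_ij = 1 exactly when both X^D_i = 1 and X^P_j = 1.
n = 5                                               # toy size
c_hat = [[abs(i - j) + 1.0 for j in range(n + 1)]   # placeholder for Eq. (5) values
         for i in range(n + 1)]

prob = LpProblem("merge_linearized", LpMinimize)
XD = LpVariable.dicts("XD", range(1, n + 1), cat="Binary")
XP = LpVariable.dicts("XP", range(1, n + 1), cat="Binary")
Z = LpVariable.dicts("Z", [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)],
                     lowBound=0, upBound=1)         # (8d): continuous is enough

prob += lpSum(c_hat[i][j] * Z[(i, j)]
              for i in range(1, n + 1) for j in range(1, n + 1))
prob += lpSum(XD[i] for i in range(1, n + 1)) == 1  # (7b)
prob += lpSum(XP[j] for j in range(1, n + 1)) == 1  # (7c)
for i in range(1, n + 1):
    for j in range(1, n + 1):
        prob += Z[(i, j)] >= XD[i] + XP[j] - 1      # (8a)
        prob += Z[(i, j)] <= XD[i]                  # (8b)
        prob += Z[(i, j)] <= XP[j]                  # (8c)
prob.solve()                                        # default CBC solver
```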


Thus, the modified model has 3n² new structural constraints and n² new variables. However, as indicated in (8d), the Z-variables can be treated as continuous because they will always be 0 or 1 when the X-variables are 0 or 1.

7. Computational results

The simulation and related subroutines were coded in Java (Eclipse 3.4.0) and run under Microsoft Windows XP on a PC with 512 MB of RAM and an Intel 2.52 GHz Pentium 4 processor. The TSPs were solved with Concorde, and the decomposition heuristic, models (7) and (8), was solved with CPLEX 8.1. In the analysis, two sets of experiments were conducted for both the Pittsburgh, PA and Albany, NY regions. Again, the aim was to evaluate the B–J heuristic for estimating combined travel and service times, as outlined in Section 4, and to determine the potential cost saving that could be realized by adopting the VRZs obtained with the methodology. An important parameter underlying the analysis is the number of nearest customers used in the calculation of the time-to-next-customer, Eqs. (2) and (3), which is a component of $s_i^P$ and $s_i^D$ in Eq. (1d). For each customer i, $n_i$ was fixed at 10. In the first set of runs, the current configurations were investigated to establish a baseline; in the second set, the configurations obtained by the B–J heuristic were investigated for purposes of comparison. We now present the results for Pittsburgh; those for Albany can be found in Zan (2008).

In each set of runs we simulated 1000 days of operation and recorded the tour length, tour duration and tour cost by cluster, as well as the total cost for the region, for both the exact model (6) and the linearized version of the heuristic model (7). If the total tour time of any cluster was greater than 11 hours, or the total package count for either type of service was greater than the vehicle capacity, Q = 300, a tour failure was noted. In calculating the total costs, two components were taken into account: the first was the fixed cost of the vehicle (number of clusters), and the second was a function of the tour length. Table 3 lists the cost parameters used in the study.

To check the validity of the distributions used to generate demand, for each cluster we calculated the percentage gap between the average of the values obtained from the simulation and the expected value provided by the sponsoring company using the formula:

$$\text{Gap} = 100\% \times \frac{\text{average simulated value} - \text{expected value}}{\text{expected value}}$$

Other comparisons were based on the following statistics:

Duration gap = average simulated duration − expected duration
Cost gap = cost from B–J clustering heuristic − cost from current configuration
TSP gap = duration from heuristic − duration from exact model

7.1. Baseline results for Pittsburgh

The output from the simulation related to package counts is given in Table 4 for the first 10 of the 73 clusters that define the Pittsburgh area. Expected value headings refer to the original data set, while average value headings refer to the simulation output. The first column gives the cluster ID, the second column gives the expected number of delivery packages, the third column gives the average number of simulated delivery packages, the fourth column gives the percentage gap in deliveries, and the fifth through seventh columns give the corresponding values for pickups. When all 73 clusters are taken into account, the mean gap for deliveries is −5.58% with a standard deviation of 6.41%, and the mean gap for pickups is −1.25% with a standard deviation of 6.81%. This suggests that the selected distributions are quite accurate.

The tour duration results for the first 10 clusters are given in Table 5. The column headings are self-explanatory after noting that the suffix "E" refers to the calculations when the TSP was solved exactly, as best as we could determine, and the suffix "H" refers to the calculations when the heuristic model was used. The last column gives the average runtime per replication for the exact method. In all instances, the values are less than 0.9 seconds, implying that the time to simulate a cluster was less than 900 seconds (15 minutes).

Table 3
Cost parameters.

Tour length       Cost per mile ($)
1–200 miles       1
201–250 miles     1.1
251–300 miles     1.2
>300 miles        1.3

Cost/vehicle/day ($): 125
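A daily tour-cost function consistent with Table 3 might look like the sketch below. The paper does not state whether the bracketed per-mile rate applies marginally or to the entire distance; the whole-distance interpretation is assumed here (all tours in the reported tables are under 200 miles, where the rate is $1/mile either way).

```python
def daily_tour_cost(miles):
    """Cost of one vehicle's tour for one day, per Table 3 (sketch)."""
    if miles <= 200:
        rate = 1.0
    elif miles <= 250:
        rate = 1.1
    elif miles <= 300:
        rate = 1.2
    else:
        rate = 1.3
    return rate * miles + 125.0   # $125 fixed cost per vehicle per day
```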

Table 4
Simulated package count results for first 10 of 73 clusters in Pittsburgh.

Cluster   Expected no. deliveries   Avg. no. deliveries   Gap % in deliveries   Expected no. pickups   Avg. no. pickups   Gap % in pickups
1         196.89                    183.61                −6.74                 35.03                  34.05              −2.80
2         151.00                    141.96                −5.99                 85.68                  80.97              −5.50
3         266.53                    229.77                −13.79                27.15                  24.06              −11.38
4         275.58                    268.05                −2.73                 25.26                  26.28              4.04
5         143.98                    128.16                −10.99                97.26                  89.74              −7.73
6         273.63                    265.33                −3.03                 15.20                  13.66              −10.13
7         216.73                    214.06                −1.23                 187.22                 181.34             −3.14
8         259.85                    251.63                −3.16                 32.89                  34.11              3.71
9         205.54                    204.80                −0.36                 17.51                  18.07              3.20
10        261.49                    252.30                −3.51                 118.66                 117.44             −1.03


Table 5
Simulated tour duration results for first 10 of 73 clusters in Pittsburgh.

Cluster   Expected total duration (hours)   Avg. tour duration_E (hours)   Duration gap_E (hours)   Avg. tour length_E (miles)   Avg. tour cost_E ($)   Avg. tour duration_H (hours)   Duration gap_H (hours)   Avg. tour length_H (miles)   Avg. tour cost_H ($)   Duration gap between two models (hours)   Avg. run time per repl. (seconds)
1         10.72    12.12    1.4     142.85    142.85    12.41    1.69    152.23    152.23    0.29    0.796
2         10.88    13.10    2.22    154.76    154.76    13.49    2.61    165.59    165.59    0.39    0.828
3         9.57     10.30    0.73    82.47     82.47     10.61    1.04    86.51     86.51     0.31    0.747
4         7.92     9.84     1.56    89.87     89.87     9.62     1.7     92.33     92.33     0.14    0.840
5         11.00    12.23    1.23    111.26    111.26    12.92    1.92    130.62    130.62    0.69    0.815
6         6.06     7.02     0.96    56.52     56.52     7.16     1.1     59.53     59.53     0.14    0.644
7         10.99    11.89    0.9     87.76     87.76     12.26    1.27    99.12     99.12     0.37    0.825
8         10.02    12.18    2.16    119.81    119.81    13.03    3.01    146.75    146.75    0.85    0.836
9         5.96     6.28     0.32    49.24     49.24     6.33     0.37    49.68     49.68     0.05    0.792
10        6.73     7.27     0.54    59.24     59.24     7.40     0.67    61.50     61.50     0.13    0.696

Table 6
Service time plus time-to-next-customer for 10 customers in Pittsburgh.

Customer ID   Expected value (hours)   Simulated value (hours)   Gap (hours)
4796          0.08                     0.11                      0.03
4769          0.10                     0.11                      0.01
4541          0.07                     0.11                      0.04
4601          0.09                     0.09                      0
4549          0.07                     0.09                      0.02
5122          0.05                     0.08                      0.03
5144          0.08                     0.08                      0
5273          0.12                     0.09                      −0.03
7172          0.10                     0.15                      0.05
7159          0.09                     0.12                      0.03

An important statistic is the duration gap. For almost all clusters, the simulated value was slightly greater than the expected value, indicating a small upward bias. On average, the gap computed using the exact model was 0.59 hours with a standard deviation of 0.74 hours, and the gap computed with the heuristic was 0.75 hours with a standard deviation of 0.83 hours. Thus, our methodology does a reasonably good job of estimating the service time plus time to the next customer, i.e., $s_i^P$ and $s_i^D$. This can be seen in more detail from the results in Table 6, which compares the expected values with the simulated values for 10 randomly selected customers. The last column gives the gap between these two parameters in hours. The average gap for the 10 customers listed in Table 6 is 0.018 hours with a standard deviation of 0.023 hours. To more formally validate the quality of these statistics, we conducted a paired-samples t-test with the null hypothesis being no difference between the expected value and the average simulated value at a significance level of 5%. The test results indicated that the null hypothesis could not be rejected; that is, there is no statistically significant difference between the expected and simulated values.

The next concern is the gap between the exact and heuristic TSP solutions. For the 73 clusters, the difference in duration averaged 0.157 hours with a standard deviation of 0.168 hours. As expected, the gap was always nonnegative, but for 10 clusters it was zero. With respect to cost, histograms of the cluster data suggested normality, which was confirmed by a K–S test. The results are reported in Table 7 for the total cost per day, including $125/vehicle/cluster. The average gap is $233/day with a standard deviation of $41. Runtimes averaged 30 minutes per cluster when the TSP was solved exactly and 9 minutes when the heuristic was used.

The final issue concerns the probability of a route failure due to a violation of either the capacity constraints or the time constraints. The latter is said to occur when a route duration exceeds 11 hours. Table 8 reports the corresponding probabilities for the first 20 clusters. The headings are self-explanatory. The only case in which a noticeable capacity failure occurred was for cluster 18, which is the densest of the 73. It includes a total of 519 customers with an average demand of 299.74 delivery packages and 284.98 pickup packages.
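The two statistical checks used here, the paired-samples t-test on expected versus simulated values and the K–S normality test behind Table 7, are easy to reproduce. A sketch with SciPy and placeholder data (the full per-cluster vectors are not reproduced here):

```python
import numpy as np
from scipy import stats

# placeholder data; in the study these would be the per-cluster (or per-customer)
# expected values and the matching averages over the 1000 simulated days
expected = np.array([10.72, 10.88, 9.57, 7.92, 11.00])
simulated = np.array([12.12, 13.10, 10.30, 9.84, 12.23])

# paired-samples t-test: H0 is that the mean difference is zero (5% level)
t_stat, p_val = stats.ttest_rel(expected, simulated)
reject_h0 = p_val < 0.05

# one-sample K-S test for normality of the simulated daily costs (cf. Table 7)
costs = np.random.default_rng(0).normal(13300, 41, size=1000)   # stand-in sample
ks_stat, ks_p = stats.kstest(costs, "norm", args=(costs.mean(), costs.std(ddof=1)))
```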

Table 7
K–S test report for the total costs per day in Pittsburgh (73 clusters).

                                     Cost from exact model   Cost from heuristic model
N                                    1000                    1000
Normal parameters: Mean              13,301.87               13,534.91
Normal parameters: Std. dev.         41.16                   52.60
Most extreme differences: Absolute   0.026                   0.044
Most extreme differences: Positive   0.026                   0.044
Most extreme differences: Negative   0.013                   0.023
Kolmogorov–Smirnov Z                 0.834                   1.397
p-Value                              0.489                   0.052
Test result                          Normal                  Normal


Table 8
Failure probabilities for first 20 of 73 clusters in Pittsburgh.

Cluster   Probability of capacity failure   Probability of duration failure (exact model)   Probability of duration failure (heuristic model)   Gap of duration failure probability between exact and heuristic model
1         0                                 0.932                                           0.975                                               0.043
2         0                                 0.999                                           1                                                   0.001
3         0                                 0.057                                           0.207                                               0.15
4         0.009                             0                                               0.005                                               0.005
5         0                                 0.980                                           0.998                                               0.018
6         0.003                             0                                               0                                                   0
7         0                                 0.977                                           0.989                                               0.012
8         0                                 0.984                                           1                                                   0.016
9         0                                 0                                               0                                                   0
10        0                                 0                                               0                                                   0
11        0                                 0                                               0                                                   0
12        0                                 0                                               0                                                   0
13        0                                 0                                               0                                                   0
14        0                                 0                                               0                                                   0
15        0                                 1                                               1                                                   0
16        0                                 0.047                                           0.083                                               0.036
17        0                                 0                                               0                                                   0
18        0.502                             0.979                                           0.997                                               0.018
19        0                                 0.084                                           0.236                                               0.152
20        0                                 0.083                                           0.193                                               0.11
...
73 avg.   0.043                             0.196                                           0.217                                               0.021

Not surprisingly, the probability of duration failure for cluster 18 was almost 1, suggesting the need to reassign some of the customers in this cluster to neighboring clusters. In all, 16 clusters had capacity failure probabilities greater than 0. With respect to the TSP solutions, the last column in Table 8 suggests that the gap associated with duration failure for the exact and heuristic models is negligible. This was borne out by a paired t-test using all 73 data points. The bottom row of the table gives the averages for the full set of clusters.

7.2. Improved area design for Pittsburgh

The B–J heuristic indicated that Pittsburgh could be partitioned into 64 rather than 73 clusters without a noticeable degradation in service. In the second set of simulation runs, we performed the same analysis as we did for the baseline. The corresponding statistics are reported in Tables 9–11. From the data in the full version of Table 9 we found that the simulated package counts for both deliveries and pickups differed from the expected values by −5.01% and −0.54% on average, with standard deviations of 5.68% and 4.96%, respectively. The negative biases imply that the fitted distributions should be shifted upwards slightly around their means to get a more accurate representation. To evaluate our service time plus time-to-next-customer calculations, we examined the same customers listed in Table 6 and obtained almost identical results.

With respect to tour durations, the results in Table 10 are equally impressive, indicating that the difference between expected and simulated values averaged only 0.69 hours with a standard deviation of 0.73 hours. Similarly, the difference in tour duration between the exact and heuristic models averaged 0.2 hours with a standard deviation of 0.18 hours.

Table 11 reports the probabilities of capacity and duration failures for the first 10 clusters for both the exact and heuristic models. On average, the capacity failure probability was 0.055 with a standard deviation of 0.17 and the duration failure probability was 0.28 with a standard deviation of 0.41 for the exact method. The 0.28 tour failure probability is higher than desirable, suggesting that our calculation of the time-to-next-customer in Eq. (3) should be adjusted upwards. Rather than considering the 10 nearest customers in Eq. (2), we increased this number to 20, redid the calculations, and reran the simulation. The new results showed that the capacity failures averaged 0.03 and the duration failures averaged 0.19 for the exact method. In all, 19 clusters (compared to 16 above) had nonzero capacity failure probabilities.

Table 9
Simulated package count results for first 10 of 64 clusters in Pittsburgh.

Cluster    Expected no. deliveries    Avg. no. deliveries    Gap % in deliveries    Expected no. pickups    Avg. no. pickups    Gap % in pickups
1          151.00                     142.07                 5.91                   85.68                   79.43               7.29
2          183.45                     166.60                 9.19                   281.98                  298.08              5.71
3          234.33                     216.85                 7.46                   87.52                   92.40               5.58
4          272.22                     267.48                 1.74                   120.45                  122.05              1.33
5          197.46                     184.06                 6.79                   35.03                   34.00               2.94
6          253.57                     250.10                 1.37                   24.60                   22.71               7.68
7          249.01                     220.19                 11.57                  141.89                  144.74              2.01
8          217.39                     197.61                 9.10                   55.45                   58.47               5.45
9          267.30                     250.81                 6.17                   119.39                  118.77              0.52
10         82.54                      81.19                  1.64                   300.00                  310.83              3.61
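The gap percentages in Table 9 appear to be the absolute difference between the expected and simulated average counts, expressed as a percentage of the expected value. A minimal check against the cluster 1 entries:

def gap_pct(expected, simulated):
    # Absolute relative gap between expected and simulated counts, in percent.
    return abs(expected - simulated) / expected * 100

print(round(gap_pct(151.00, 142.07), 2))  # deliveries, cluster 1 -> 5.91
print(round(gap_pct(85.68, 79.43), 2))    # pickups, cluster 1    -> 7.29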


Table 10
Simulated duration results for first 10 of 64 clusters in Pittsburgh. (E = exact TSP model, H = heuristic TSP model; the duration gaps are absolute differences from the expected total duration, and the last column is the heuristic minus exact average tour duration.)

Cluster    Exp. dur. (h)    Dur. E (h)    Gap E (h)    Length E (mi)    Cost E ($)    Dur. H (h)    Gap H (h)    Length H (mi)    Cost H ($)    Gap H-E (h)
1          10.88            13.08         2.2          154.33           154.33        13.46         2.58         165.11           165.11        0.38
2          8.24             8.18          0.06         33.77            33.77         8.32          0.08         35.91            35.91         0.14
3          10.16            10.50         0.34         50.63            50.63         10.66         0.5          53.70            53.70         0.16
4          7.26             8.98          1.72         104.66           104.66        9.01          1.75         107.56           107.56        0.03
5          10.75            12.12         1.37         142.77           142.77        12.41         1.66         151.73           151.73        0.29
6          9.87             10.95         1.08         103.05           103.05        11.15         1.28         105.37           105.37        0.2
7          10.97            11.89         0.92         89.85            89.85         12.08         1.11         93.84            93.84         0.19
8          10.84            12.09         1.25         105.75           105.75        12.20         1.36         107.76           107.76        0.11
9          9.36             10.74         1.38         87.88            87.88         10.87         1.51         98.74            98.74         0.13
10         2.28             2.48          0.2          19.41            19.41         2.51          0.23         19.69            19.69         0.03
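The gap columns of Table 10 can be recovered directly from the duration columns, as the short check below for cluster 1 shows; the exact and heuristic gaps are absolute deviations from the expected total duration, and the final column is the difference between the two simulated averages.

expected, dur_exact, dur_heur = 10.88, 13.08, 13.46   # cluster 1 of Table 10
print(round(abs(dur_exact - expected), 2))   # duration gap, exact     -> 2.2
print(round(abs(dur_heur - expected), 2))    # duration gap, heuristic -> 2.58
print(round(dur_heur - dur_exact, 2))        # gap between the models  -> 0.38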

Table 11
Failure probabilities for the first 10 of 64 clusters in Pittsburgh.

Cluster    Capacity failure prob.    Duration failure prob. (exact model)    Duration failure prob. (heuristic model)    Gap in duration failure prob. (heuristic minus exact)
1          0                         1                                       1                                           0
2          0.196                     0                                       0                                           0
3          0                         0.132                                   0.215                                       0.083
4          0                         0                                       0                                           0
5          0                         0.946                                   0.975                                       0.029
6          0                         0.472                                   0.624                                       0.152
7          0                         0.919                                   0.960                                       0.041
8          0                         0.992                                   0.994                                       0.002
9          0                         0.286                                   0.328                                       0.042
10         0.600                     0                                       0                                           0
...
64 avg.    0.054                     0.284                                   0.313                                       0.029

The costs associated with the 64 clusters are partially enumerated in Table 10. When the exact model was used, they averaged $12,015 per day compared to $12,298 per day for the heuristic. Both passed the K–S test for normality at the 5% level of significance, with p-values of 84.8% and 63.4%, respectively.

7.3. Cost savings

The major cost component of P&D operations is the vehicle. By reducing the number of vehicles from 73 to 64, we save $1125 per day. The total savings when the exact model is used are $1286 per day, of which $161 can be attributed to routing. The corresponding figures for the heuristic are $1237 per day and $108 for routing, so the heuristic slightly underestimates the savings. The final point concerns the frequency of route failures. Although the differences are small, 1.2% for capacity and 9.9% for duration, some additional cost would be incurred in the case of 64 clusters. For the second set of runs, in which the number of locations considered in the time-to-next-customer calculations was doubled from 10 to 20, we obtained a configuration with 68 clusters. The corresponding route failure rates were not significantly different, and the total cost saving for the 68 clusters relative to the baseline was $811 per day.

8. Summary and conclusions

For a given set of vehicle routing zones, Monte Carlo simulation is the only practical way to determine the probability of route failure when customer demand is random. In previous research, we developed a three-phase heuristic for constructing such zones for regional P&D operations. The results of this study, based on cost, tour length, and package count, have lent credence to our construction methodology and have allowed us to calibrate the benefits of a reduced number of driver zones. For the Pittsburgh region, the cost saving of going from 73 clusters to 64 clusters is approximately $811 per day, which translates into $210,860 annually. In both instances, we saw that the route failure statistics were approximately the same, indicating that whatever additional costs would be incurred over the year are not a function of the number of clusters.

In the second part of the study, we compared the results obtained with an exact method for solving the daily TSPs (to the extent that its performance could be verified) against those obtained with our decomposition heuristic. Although the exact method was quick, averaging 1.8 seconds per cluster per replication, simulating each cluster took approximately 2000 seconds, measurably more than the 600 seconds required by the heuristic. There are many problems similar to ours in which the computations may bog down due to the large number of locations requiring a visit, so alternative approaches can be valuable. The heuristic that we developed is applicable when deliveries precede pickups, that is, the backhaul case. For the 73-cluster case, the computations showed an average cost gap of $233 per day, a small fraction of the daily cost. Although this error is tangible, it is easy to compensate for by systematically adjusting the heuristic output downward.
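To make the daily-to-annual conversions explicit, the snippet below annualizes the daily figures quoted above using 260 working days per year, the factor implied by the $811 per day to $210,860 per year conversion; the 260-day figure is our inference rather than a value stated in the text.

WORKING_DAYS = 260   # implied by $811/day -> $210,860/year (assumption)

def annualize(daily_dollars):
    # Convert a daily dollar figure into an annual one.
    return daily_dollars * WORKING_DAYS

print(annualize(811))    # 68-cluster saving vs. the baseline   -> 210,860
print(annualize(1286))   # 64-cluster saving, exact TSP model   -> 334,360
print(annualize(233))    # heuristic daily cost gap, 73 clusters -> 60,580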


Two follow-on projects are under consideration. The first involves the development of a more precise algorithm for generating clusters that takes into account the additional requirement that the aspect ratio (length/width) of a cluster should be within prespecified bounds. A column generation algorithm is now being designed for this purpose. The second is an augmented simulation model for providing solutions to the probabilistic TSP that can serve as driver routes.

References

Applegate, D., Bixby, R.E., Chvátal, V., Cook, W., 2006. The Traveling Salesman Problem: A Computational Study. Princeton University Press, Princeton, NJ.
Bard, J.F., Jarrah, A.I., 2009. Large-scale constrained clustering for rationalizing pickup and delivery operations. Transportation Research Part B 43 (5), 542–561.
Baudin, M., 2004. Lean Logistics: The Nuts and Bolts of Delivering Materials and Goods. Productivity Press, New York.
Bräysy, O., Gendreau, M., 2005. Vehicle routing problem with time windows, Part I: Route construction and local search algorithms. Transportation Science 39 (1), 104–118.
Chao, I.M., Golden, B.L., Wasil, E.A., 1995. A new heuristic for the period traveling salesman problem. Computers & Operations Research 22 (5), 553–565.
Chiou, Y.C., Lan, L.W., 2001. Genetic clustering algorithms. European Journal of Operational Research 135 (2), 413–427.
Daganzo, C.F., Erera, A., 1999. On planning and design of logistics systems for uncertain environments. In: Speranza, M.G., Stahly, P. (Eds.), New Trends in Distribution Logistics, Lecture Notes in Economics and Mathematical Systems, vol. 480. Springer-Verlag, Berlin.
Drexl, A., Haase, K., 1999. Fast approximation methods for sales force deployment. Management Science 45 (10), 1307–1323.
Hall, R., 1996. Pickup and delivery systems for overnight carriers. Transportation Research Part A 30 (3), 173–187.
Hartigan, J.A., Wong, M.A., 1979. A K-means clustering algorithm. Applied Statistics 28 (1), 100–108.
Hemmelmayr, V.C., Doerner, K.F., Hartl, R.F., 2009. A variable neighborhood search heuristic for periodic routing problems. European Journal of Operational Research 195 (3), 791–802.
Liu, Y.H., 2007. A hybrid scatter search for the probabilistic traveling salesman problem. Computers & Operations Research 34 (10), 2949–2963.
Lorena, L.A.N., Senne, E.L.F., 2004. A column generation approach for the capacitated p-median problems. Computers & Operations Research 31 (6), 863–876.
Mourgaya, M., Vanderbeck, F., 2007. Column generation based heuristic for tactical planning in multi-period vehicle routing. European Journal of Operational Research 183 (3), 1028–1041.
Negreiros, M., Palhano, A., 2006. The capacitated centred clustering problem. Computers & Operations Research 33 (6), 1639–1663.
Newell, G.F., Daganzo, C.F., 1986. Design of multiple-vehicle delivery tours—I: A ring-radial network. Transportation Research Part B 20 (5), 345–363.
Ouyang, Y., 2007. Design of vehicle routing zones for large-scale distribution systems. Transportation Research Part B 41 (10), 1079–1093.
Ríos-Mercado, R.Z., Fernández, E., 2009. A reactive GRASP for a commercial territory design problem with multiple balancing requirements. Computers & Operations Research 36 (3), 755–776.
Ropke, S., Pisinger, D., 2006. A unified heuristic for a large class of vehicle routing problems with backhauls. European Journal of Operational Research 171 (3), 750–775.
Salhi, S., Nagy, G., 1999. A cluster insertion heuristic for single and multiple depot vehicle routing problems with backhauling. Journal of the Operational Research Society 50 (10), 1034–1042.
Tang, H., Miller-Hooks, E., 2005. Algorithms for a stochastic selective travelling salesperson problem. Journal of the Operational Research Society 56 (4), 439–452.
Taylor, G.D., Whicker, G.L., Usher, J.S., 2001. Multi-zone dispatching in truckload trucking. Transportation Research Part E 37 (5), 375–390.
Toth, P., Vigo, D., 1999. A heuristic algorithm for the symmetric and asymmetric vehicle routing problems with backhauls. European Journal of Operational Research 113 (3), 528–543.
Yockey, R.D., 2007. SPSS Demystified: A Step-by-Step Guide to Successful Data Analysis. Prentice-Hall, Upper Saddle River, NJ.
Zan, J., 2008. Using Monte Carlo simulation to analyze and validate heuristically determined vehicle routing zones. Thesis, Graduate Program in Operations Research and Industrial Engineering, The University of Texas, Austin, TX.