Omega 41 (2013) 706–720
Contents lists available at SciVerse ScienceDirect
Omega journal homepage: www.elsevier.com/locate/omega
Integrating commercial and residential pickup and delivery networks: A case study Jonathan F. Bard a,n, Ahmad I. Jarrah b,1 a b
Graduate Program in Operations Research and Industrial Engineering, The University of Texas Austin, TX, United States Department of Decision Sciences, School of Business, The George Washington University, Washington, DC 20052, United States
a r t i c l e i n f o
a b s t r a c t
Article history: Received 26 March 2012 Accepted 6 September 2012 Available online 28 September 2012
This paper presents a strategic analysis of the network design problem faced by pickup and delivery companies operating in metropolitan areas and serving two or more classes of customers. The focus is on a division that treats commercial and residential customers separately, a situation motivated by their respective geographic densities and the size and frequency of their demand. In constructing driver work areas, it is necessary to take into account expected demand, vehicle capacity, time on the road, and the aspect ratio of the individual territories. This leads to a capacitated clustering problem with side constraints that has been the subject of intense research over the last decade. Based on a previously developed column generation algorithm, a case study was conducted involving several scenarios that integrate the two networks to determine the extent of the resulting benefits. To no surprise, the analysis showed that a significant reduction in fleet size can be achieved when the two networks are either partially or fully combined. It also showed that small reductions with respect to current practice are possible when they are separately maintained with the added benefit that the geometry of the resultant clusters satisfies certain desirable properties with respect to their contours and aspect ratios. & 2012 Elsevier Ltd. All rights reserved.
Keywords: Network design Pickup and delivery Case study Policy analysis Column generation
1. Introduction Rationalizing driver work areas is a critical step in reducing operational costs of pickup and delivery (P&D) carriers. Over time, these areas grow haphazardly so it is not unusual for two or more drivers to visit the same or nearby locations on the same day or cross paths numerous times. In some cases, this may be unavoidable due to the level of demand or the density of the customers, but in others, it is a symptom of poor planning. It is well known, for example, that optimal TSP tours for multiple vehicles operating in the same service area should not cross paths. For express package carriers, food and beverage distributors, repair services and the like, the problem is one of partitioning the ‘‘customers’’ into a minimum number of clusters such that each cluster satisfies a host of capacity, temporal, and geometric constraints ([4,7,2], [18], [20]). Once the clusters are determined, a driver and vehicle can be assigned to each. Although there is a dynamic and uncertain nature to the problem, system planners typically use average monthly data of customer demand and visit frequencies to design the work areas. n
Corresponding author. Tel.: þ1 512 471 3076; fax: þ1 512 232 1489. E-mail addresses:
[email protected] (J.F. Bard),
[email protected] (A.I. Jarrah). 1 Tel.: þ1 202 994 6739. 0305-0483/$ - see front matter & 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.omega.2012.09.001
The problem addressed in this paper falls under the general heading of capacitated clustering (CCP), and has been studied from both a generic and applications point of view. Mulvey and Beck [17] proposed one of the first models for the CCP with a p-median objective [6]. Their context was sales force territory design but similar work has since appeared in a wide range of fields including pattern recognition [1], data mining [11], general P&D operations [12], postal route formation [13], and restocking of retail outlets [8]. In the last several years, we have been collaborating with a wellknown international package carrier and have developed two distinct algorithms for constructing driver work areas. The first was a random grid-search heuristic that incrementally clusters one customer at a time. It includes both a diversification mechanism for overcoming local optimality and an intensification mechanism for refining the search in the neighborhood of promising solutions [2]. Although its performance was impressive, it was not designed to accommodate practical details related to cluster geometry. This was a major disadvantage for our application since long thin work areas were a common but undesirable aspect of the solutions. In our second algorithm, we took a more precise approach starting with a set covering master problem and used column generation to partition the service area [10]. This involved a decomposition of the feasible region associated with the CCP into a set of overlapping subregions, each associated with a pricing subproblem for generating
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
master problem columns. The distinguishing feature of the subproblems was a novel set of algebraic constraints that were instrumental in obtaining clusters of the desired form. The overall objective was to minimize the number of clusters needed to service all customers while adhering to a set of capacity, time, contiguity, and geometric constraints. In the context of our problem minimizing the number of clusters is equivalent to minimizing the number of vehicles or drivers needed to service expected demand. The primary purpose of this paper is to apply and extend our previous work by investigating policy issues related to the design of nonhomogeneous P&D networks. This involved the accommodation of multiple customer classes, the development of a new procedure to account for pickups, the consideration of different operating days for the different customer classes, and the implementation of an iterative scheme to adjust the available delivery time as a function of the pickup volume. The research is most relevant for carriers that treat different classes of demand differently. The sponsoring company, for example, maintains two separate delivery fleets, one for commercial customers and the other for residential customers, but requires the commercial fleet to make all pickups. The rationale for this operational plan is based on the number of commercial pickups, which is relatively small compared to deliveries. Commercial drivers have sufficient time and capacity to make all their deliveries as well as all the commercial and residential pickups. This leaves the residential drivers more time to make their deliveries which are typically spread out over a much wider area. Accordingly, this leads to the strategic question as to whether significant cost savings can be achieved by combining the networks. Doing so would represent a major departure from current practice and is expected to occasion strong opposition from the workforce. Therefore, if such a change were imposed it would most likely be phased in over several years. Nevertheless, it would be wise to gauge the benefits that could be realized by such a move before disrupting the status quo. To address this issue, we have undertaken a major case study that examines the impact on the networks under each of the following scenarios:
Commercial and residential customers served separately (optimized baseline)
Residential drivers handling both pickups and deliveries rather than just the latter
Reclassification of a subset of residential customers to com
mercial customers who are located in an area that is predominantly commercial, and vice versa All customers served by a single fleet (full integration).
The problem is complicated by the fact that commercial operations span Monday through Friday while residential deliveries span Tuesday through Saturday. Because there are only four days of overlap, one of the challenges has been to achieve a robust design for the latter two scenarios. In performing the analysis, we relied on our column generation algorithm to determine the (minimum) number of clusters required under each scenario. For the optimized baseline, we were able to compare our results with current operations in three representative U.S. cities: Erie, Pittsburgh and Indianapolis, and found that fleet size reductions of up to 19.2% are achievable with full integration. The real advantage provided by our solutions, though, resides in the geometry of the resulting clusters, which are compact, disjoint polyhedra with limited aspect ratios as defined by their maximum horizontal distance to their maximum vertical distance. Such configurations eliminate territorial overlap as well as crisscrossing tours.
707
With respect to basic P&D operations, a parallel line of research centers on the construction of vehicle routing zones (VRZs). The standard approach to problems in this category is to cluster first-route second ([12], [19]). As in our case, much of the VRZ research assumes that demand is random and that the primary goal is to establish robust zones that will meet a certain service standard a given percent of the time; e.g., all customer requests must be met 95% of the time. For a two-stage stochastic programming model of the VRZ problem along with an approximation solution scheme, see [5]. A related but less relevant version of the CCP that has been the subject of extensive study over many years involves political districting. The goal is to partition indivisible population units, such as counties, into a predetermined number of districts such that the units in each district are contiguous, each district is geographically compact and the sum of the populations of the units in any district lies within a predetermined range. In general, the objective is to minimize some measure of compactness [24] for a given number of clusters with limited regard for the geometry of the final configuration (as evidenced by the common practice of gerrymandering). In contrast, the number of clusters and their geometry are critical in our work. In the next section, we give a formal problem statement. In Section 3, we provide an abbreviated mixed-integer program (MIP) for the relevant CCP, an outline of our solution methodology, and a description of the geometric constraints. The master problem and subproblem models used for column generation are presented in Appendix A. In Section 4, a post-processor designed to assign pickup locations to clusters is formulated as a linear program. In Section 5, we further define the scenarios investigated. This is followed in Section 6 by the analysis and a discussion of the results. Insights and conclusions are drawn in Section 7.
2. Problem statement We are given a set N of n locations with coordinates (Xi,Yi), i¼1,y,n, contained within a superimposed rectangular grid that represents the service area. For analytic purposes it is convenient to distinguish the delivery locations, ND DN from the pickup locations, NP DN even though there is considerable overlap. Each location iAND has an average daily demand di given in terms of ‘number of packages,’ and an expected service time tD i expressed in hours. Similarly, the average daily demand for pickups is pi and the service time is tPi , for all iANP. For the procedures used to calculate and validate the service time parameters, see [2] and [3]. It is assumed that the P&D fleet is homogeneous and that each vehicle has a capacity of Q packages. Legal limits restrict the amount of time a driver can be on the road to 12 h a day (TOT¼ 12) but this time is often reduced by 1 h for lunch (BREAK¼1). Thus, the total time available for P&D operations is Tmax ¼TOT BREAK. In addition, it is necessary to drive to and from the work area at the beginning and end of the day. This time is a function of the cluster and is denoted by STEM. A final assumption, following from practical considerations, is that most if not all deliveries precede all pickups, even if the customers are at the same physical location. This division is common in the backhaul literature (e.g., see [21]). Operationally, vans are loaded at the terminal starting at 5:00 am, and within one to 2 h are ready to begin their route. Most deliveries are made before 3:00 pm at which time the van is free to begin pickups. At around 6:00 or 7:00 pm, depending on that day’s volume and customer locations, operations conclude and the driver returns to the terminal. In general terms, the objective of the problem is to partition the locations in N into a minimum number of clusters such that the expected package count of the customers in each cluster does
708
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
not exceed Q and that the expected amount of time required to make all deliveries is within T max. When the commercial and residential networks are treated separately, as they currently are, T max varies between 9 and 10 h depending on the region’s density. Moreover, all pickups are made by the commercial drivers starting around 3:00 pm. Recall that this strategy was implemented because the pickup volume is much smaller than the delivery volume and the pickup locations are much more concentrated; for Pittsburgh, for example, there are 30,880 delivery locations but only 2588 pickup locations. The implication of this arrangement is that we only need to design the clusters for deliveries. Once the final configuration is determined, if i1AND, i2ANP and i1 ¼i2, then i2 is assigned to the same cluster as i1 whenever the capacity and service time constraints permit; otherwise, i2 is assigned to a nearby cluster. To avoid anomalous or extreme shapes, we define the parameter ratio (4 0) and use it in our model to constrain the geometry of each cluster. Fig. 1 depicts the current partition of the Albany, New York service area for the sponsoring company and is representative of cities throughout the U.S. As can be seen, the driver territories lack any rationale, primarily because they evolved over many years without the benefit of systematic planning. In our work, the goal is to redraw the borders so that the resultant clusters are convex, nonoverlapping sets of points. This will be the case whether the scenario calls for separate, combined, or partially integrated networks. In performing the analysis, we insist on the following.
This restriction prevents a cluster from becoming too elongated in either the X or Y direction. Fig. 2 illustrates the configuration obtained with our column generation algorithm used in this study. For the more common problem of maximizing compactness when the number of clusters is fixed, [24] provides eight measures in the context of political districting but reports that none are universally satisfactory. For discussion of the related concept of ‘‘proximity,’’ see [15].
3. Mathematical model and solution methodology For the CCP of interest, it is not possible to construct a linear model that captures its full generality. Instead, we present an abbreviated formulation with implicitly defined variables and constraints that serves as a foundation for our column generation methodology. The model only considers deliveries for a single class of customers (N¼ND) and is based on average daily demand. In the developments, we make use of the following additional notation. p Ck ck zik
Upper bound on the required number of clusters Set of locations that consitute cluster k Cost of cluster k (function of the number of locations) (Binary variable) 1 if customer i is assigned to cluster k, 0 otherwise (Binary variable) 1 if cluster k is defined, 0 otherwise
lk Model
Property 1. Let C be a cluster that contains a set of locations LDN. C is said to be continuous if location iAconv(L) ) iAL, where conv(L) is the convex hull of the locations in the set L. Property 2. Let X max X min be the length of cluster C and let Y max Y min be its width, where X max ¼max{Xi: iAC}, X min ¼min {Xi: iAC}, and similarly for Y max and Y min. C is said to be robust if its aspect ratio r ¼(X max Xmin)/(Y max Y min) lies in the interval [1/ratio, ratio].
Minimize
p X
ck lk
ð1aÞ
k¼1
Subject to
p X
zik ¼ 1,
i ¼ 1,. . ., n
ð1bÞ
k¼1 n X
di zik rQ ,
k ¼ 1,. . ., p
ð1cÞ
i¼1
From Fig. 1 we can readily see that radical modifications to the current configuration would be needed to satisfy these two properties. Clusters that are continuous are naturally dense or compact while the geometric requirement imposed by Property 2 limits the aspect ratio of each cluster to a predetermined range.
n X
tDi zik rT max STEMk ,
k ¼ 1,. . ., p
ð1dÞ
i¼1
zik r lk ,
i ¼ 1,. . ., n;
k ¼ 1,. . ., p
ð1eÞ
29
35 16
38
24
46
8
6
45
11
31 39
34 13 1
49
44
30 15 40 23 5 1848 22 3 36 42 28 43 27
17
4 26
41
9
12 47
7 37
33
19
21 20
14
10
25
Fig. 1. Current set of clusters for Albany, New York.
Fig. 2. Set of clusters for Albany obtained with the column generation algorithm.
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
709
Fig. 3. Several subregions in a service area.
C k ¼ i : zik ¼ 1,
i ¼ 1,. . ., n,
k ¼ 1,. . ., p satisfy Property 1
ð1fÞ
max 1=ratio r ðX max ðzk ÞX min ðzk ÞY min k k ðzk ÞÞ=ðY k k ðzk ÞÞ r ratio, k ¼ 1,. . ., p
ð1gÞ
zik A 0,1, lk A 0,1,
ð1hÞ
i ¼ 1,. . ., n;
k ¼ 1,. . ., p
The objective function (1a) is designed to minimize the total weighted number of clusters (equivalently, drivers) required to accommodate all n locations in the service area. The cost of a cluster k is dominated by the vehicle cost, which can be estimated as the sum of the amortized ownership and operating costs. Because our focus is on long-term planning which involves the partition of a service area into compact work areas (or clusters), our main concern is not with optimizing the transportation routing costs that arise from daily operations. Such costs can vary sharply from one day to the next, and can best be addressed using a daily routing model. Nevertheless, designing compact clusters, or work areas, is supportive of the short-term objective of minimizing daily routing costs. In the implementation ck ¼1 for all k since the vehicles are identical. Constraints (1b) ensure that each location is assigned to exactly one cluster while (1c) defines the capacity constraints. The total expected service time (which includes travel time to next location) is limited in (1d) to no more than the number of hours available in a day. Although not indicated explicitly, STEMk is a nonconvex function of the decision variables zik so any attempt at solution requires some degree of approximation. In our approach, we assume that STEMk is the time to go from the terminal to the nearest location in cluster k. Constraints (1e) in conjunction with (1a) impose a setup cost for each additional cluster that is created. The cluster continuity requirements are stated implicitly in (1f) because they cannot be written efficiently as linear inequalities. Property 2 is enforced with (1g), where the min and max variables are functions of zk ¼(z1k,z2k,y,znk) for a given cluster k. Finally, the variables are defined in constraints (1h). In practice, the binary restriction on lk can be relaxed since these variables will always be integral when zik are 0 or 1. 3.1. Column generation Real instances of model (1) contain anywhere from 5000 to 50,000 locations. For the basic CCP (e.g., see [23]), current
technology limits the reliable attainment of exact solutions with a commercial code like CPLEX to instances with no more than 40 data points, and to instances with up to 100 data points for specialized codes (e.g., see [14,16]). As such, we developed a column generation algorithm loosely based on Dantzig–Wolfe (D–W) decomposition. Recall that the D–W approach is best suited for optimization problems which have a set of complicating constraints that are used to form the master problem and a series of ‘‘staircase’’ constraints that are used to form the pricing subproblems (e.g., see [22]). Since this structure does not naturally exist in our problem in a form that we could exploit, the first step was to artificially partition the feasible region into overlapping subregions, each symmetric and centered at a seed. This allowed us to set up a corresponding number of pricing subproblems using dual variable information from the master problem (models for both problems are given in Appendix A; there is one subproblem for each subregion). More importantly, though, it allowed us to explicitly write out an efficient set of linear constraints (given below) that capture the geometric requirements in Properties 1 and 2. Fig. 3 depicts several subregions within the service area. The white dots at their center represent seeds; the uncovered black dots are locations contained in subregions not shown. The rectangular blocks used to build the subregions derive from the capacity, service time and geometric constraints. By construction, a feasible cluster must lie wholly within one of the rectangles. Note that we decided on overlapping rather than disjoint subregions to increase the diversity of possible solutions. The details can be found in [10]; a simpler version of our column generation algorithm which relies on explicit enumeration is described in [9]. Nevertheless, the major difficulty in applying the D–W approach to (1) is that it requires the creation of a large number of subregions to ensure that each location is adequately covered. This results in an equivalent number of nontrivial mixed-integer subproblems given by Eqs. (5a–5n) in Appendix A.2 that must be solved at each D–W iteration. Essentially, each subregion gives rise to a subproblem, and the solution to each subproblem results in a cluster that will be added to the master problem (4a–4c) in Appendix A.1 if its objective function value is negative. But even after column generation converges, we only have the linear programming solution of (4), not the optimal integer solution. This is because it is necessary to treat the l variables as continuous when solving the master problem (4). All that the resultant LP solution provides is a lower bound on the original objective function (1a).
710
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
At this point, branch and bound is generally used to find integral solutions, leading to what is called branch and price. However, because of the extensive runtimes associated with the subproblems, we decided that branch and bound would be too costly to pursue. As a compromise, we limited our efforts to finding integral solutions to the master problem using a variable fixing heuristic. This allowed us to balance the prospect of obtaining good solutions with the goal of reasonable runtimes. As a first step, we tried to solve the master problem as an IP with a commercial code but this proved impossible. We believe the difficulty stemmed from the relatively high density of the columns – several hundred entries of 1 were typical for a cluster – which resulted in highly fractional solutions for the master problem variables, l. The corresponding search tree grew exponentially large with no signs of convergence after days of computations. Given this experience, we developed the following methodology, the first five steps of which are discussed in [2] and [10]. 1. Aggregate customers who are likely to be in the same cluster in the final solution. 2. Identify seed locations for centering the subregions. 3. Construct a subregion around each seed taking into account the capacity, time and geometric constraints. 4. Use column generation and variable fixing to derive feasible solutions. 5. Remove overlap amongst the clusters to arrive at the final solution. 6. Assign pickup locations to clusters. The first step involves preprocessing the input data to identify compatible customers who are within a maximum distance of each other and then merging them to form an aggregate customer or location. The aim is to reduce the computational burden by limiting the number of locations to no more than 10,000, which translates into an equivalent number of rows in the master problem. It is common to consolidate customer orders when shipping from manufacturing and distribution centers. Care must be taken, though, to avoid creating large orders that might preclude good options when the final clustering is performed. In P&D operations, forming an aggregate customer whose total package count requires 20 or 30% of a vehicle’s capacity, may result in poor solutions. This can be seen by simply viewing the problem as one of bin packing; many fewer bins are likely to be needed for packing a large number of small items than a small number of large items, assuming that the total volume is the same. Similar consideration is given to define population units when constructing political districts, although in that case the aggregation is much greater (e.g., see [15]). Steps 2 and 3 are designed to create the pricing subproblems. The service area is first divided into a set of nonoverlapping rectangles such that each contains a unique subset of the n customers such that their expected demand is approximately 0.05Q packages. For each rectangle, the customer closest to the center is selected as the seed. Depending on the data and the size of the service area, anywhere from 250 to 2300 subregions might be created in Step 3. However, solving all of the corresponding subproblems at each D–W iteration proved too time consuming so we limited the number to 5% of the total or 50, whichever was larger. Tabu search logic was used for subproblem management (see [9]). At Step 4, a modified column generation algorithm is used to populate the master problem starting with an artificial basis. A pseudocode is given in Appendix A. To investigate the various scenarios described in Section 5, it was necessary to make a number of minor modifications to the existing code. After feasibility is obtained, a limited number of new columns are added until some threshold is reached. At that point, a restricted
set covering problem is solved to fix between two and four master problem variables to 1. The rows (locations) that are covered by columns associated with the variables that have a value of 1 in the solution are then removed from the formulation and the process is repeated until about 2000 locations remain uncovered. The resultant master problem is then solved as an IP, terminating with a feasible integral solution to the original problem. Because the subregions overlap, the clusters obtained at Step 4 invariably overlapped as well, so at Step 5 a greedy algorithm is used to associate each location with a unique cluster (see [9]). Given the final configuration, we attempt to associate each pickup location with a unique cluster, this time by solving a relaxed generalized assignment problem at Step 6. The model is described in the next section and represents a new addition to our P&D methodologies. The column generation algorithm was previously run on six data sets provided by the sponsoring company. The results showed that vehicle reductions of up to 22% were possible without compromising service. The analysis was limited, though, to the commercial network so it was unclear how much additional savings could be realized with either partial or full network integration. This issue is taken up in Section 5. However, in order to conduct the analysis, the following modifications to our existing methodology were required. First, we had to allow for two classes of customers (commercial and residential), and conduct runs that reflected scenarios involving complete separation as well as partial and complete integration of the two networks. Since commercial operations span Monday through Friday and residential operations cover Tuesday through Saturday, it was necessary to analyze the Monday, Tuesday through Friday, and Saturday operations separately. Second, it was necessary to conceive and implement a procedure for dealing with pickups, especially when an individual driver did not have sufficient capacity to accommodate all the customers assigned to his route. This led to the formulation of a generalized assignment problem that was then integrated with the existing methodology. An important property of the formulation is its bias towards splitting pickups of only the largest requests. Finally, we developed a structured procedure for allocating delivery and pickup times over the 11 working hours in the day. By iterating over half-hour increments for delivery, we were able to converge to an ‘‘optimal’’ partition of the day that readily generalizes to a majority of regions served by the sponsoring company. 3.2. Implementing the geometric constraints In order to represent the contiguity and compactness requirements implied by Property 1, and to create subproblems for column generation, we impose the following restriction on each cluster. Property 3. Any cluster generated for a subregion should span a symmetric rectangle centered at the seed. By implication, a feasible cluster k must include all locations min within the rectangle determined by the four points ðX min k , Y k Þ, min max max max max min ðX k , Y k Þ. ðX k , Y k Þ, ðX k , Y k Þ that define the ratio constraint (1g). To enforce this requirement, which is implicitly stated in (1f) in the context of column generation, we associate each cluster k with a subregion r and let zir zik, Next, we introduce two new sets of binary variables, four new inequalities, and some additional notation. Let nr denote the number of locations enclosed within subregion r, let List_X¼([1], [2],y, [nr 1]) be a list of the nr 1 locations (minus the seed) ordered by their increasing absolute X distance from the seed, and let List_Y¼(/1S, /2S,y,/nr 1S) be a list of the nr 1 locations ordered by their increasing absolute Y distance from the seed. The
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
711
two new sets of binary variables are xir and yir, i¼1,y,nr, rAR, and the new constraints are
tik
zir r xir ,zir r yir ,zir Zxir þ yir 1,i ¼ 1,. . .,nr
ð1f:1Þ
cik
x½a,r r x½a1,r ,a ¼ nr ,nr 1,. . .,2
ð1f:2Þ
tremain k
y o b 4 ,r r y o b1 4 ,r ,b ¼ nr ,nr 1,. . .,2
ð1f:3Þ
xir A f0,1g,yir A f0,1g,zir A f0,1g,i ¼ 1,. . .,nr
ð1f:4Þ
The first step is to calculate the distance from each pickup location i to the centroid of cluster k and then calculate the corresponding travel time. This value is used to derive the objective function coefficients. We also need to determine the time remaining in each cluster that can be used for pickup operations. The calculations are: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Distik ¼ ð53ðX i X k ÞÞ2 þð69:1ðY i Y k ÞÞ2
The first set of constraints (1f.1) requires that for location i to be in the cluster (that is, for zir ¼1), the accounting variables xir and yir must also be 1. According to constraints (1f.2) and (1f.3), however, the latter variables can only be 1 when those x and y variables associated with locations that are closer to the seed are also 1. A minor simplification to (1f.4) is that zir can be relaxed and treated as continuous rather than binary in the full model. In additional to (1f.1–1f.4), it is possible to include a series of valid inequalities to model (1) that measurably tightens its relaxed feasible region. These are presented in [3] along with their polyhedral properties.
time to travel from location i to centroid of cluster k (hours) cost of assigning the pickup volume associated with location i to cluster k time remaining in cluster k (hours)
tik ¼ Distik =MPHði,kÞ ( cik ¼
0,
if i A C k
tik , otherwise
tremain ¼ TOTBreakSTEM k k
X
tDi zik
i A Ck
4. Assigning pickup locations to clusters Upon obtaining a solution to model (1a–1h), it is necessary to test the feasibility of allocating pickup locations to the set of ‘‘optimal’’ clusters. It is not likely, though, that all pickup locations can be individually assigned to a unique cluster without disaggregating their demand into any number of fractional components; in some cases, the expected demand of a location exceeds the vehicle capacity. We handle this situation by finding a continuous rather than integral solution to a general assignment problem (GAP), letting the optimization process determine how the loads should be split. If the results indicate that there is insufficient time remaining in the day to perform the pickups, a new optimal set of clusters is found but with fewer hours allowed for deliveries. The following additional notation is used to develop the assignment model.Clustering solution n
C Ck
zik ðX k ,Y k Þ STEMk
set of optimal clusters; n ¼9C 9 and C ¼{C1,y,Cn } set of delivery locations in cluster kAC n; nk ¼9Ck9 and k k k C k ¼ i1 ,i2 ,. . .,ink One if delivery location i is in cluster k, 0 otherwise X- and Y-coordinates of centroid of cluster k stem time from cluster k back to the terminal [assumed to be the same as the outbound value determined when model (1) is solved] n
n
n
n
Parameters (Xi,Yi)
X- and Y-coordinates of location i (given in terms of longitude and latitude) MPH(i,k) speed when traveling between location i and the centroid of cluster k Decision variable zik
fraction of pick up volume at location i assigned to cluster k
Computed values Distik
distance from location i to the centroid of cluster k (miles)
where the numerical values in the equation for Distik are needed to convert longitude and latitude measurements to miles. Generalized assignment problem X X Minimize cik zik ð2aÞ i A P k A Cn
Subject to
X
zik ¼ 1, 8i A P
ð2bÞ
k A Cn
X pi zik r Q , 8kA C n
ð2cÞ
iAP
X
tPi zik r tremain ,8k A C n k
ð2dÞ
iAP
zik A ½0,1,8i A P,k A C n
ð2eÞ
The objective function (2a) assigns fractions of pickup demand to the clusters that constitute the optimal set C n. Constraints (2b) ensure that all locations are serviced, while constraints (2c) and (2d), respectively limit the number of packages to be picked up in each cluster to the capacity Q of the vehicle and guarantee that no more than the time remaining is used for the pickups. The variables are restricted to the unit interval in (2e). Ideally, we would like to solve model (2a–2e) as an integer program but practical considerations argue against it. Expected pickup quantities vary widely, ranging from a fraction of a unit to more than a truckload. In the latter case, several vehicles are required to meet demand so some amount of disaggregation is required. To increase the likelihood that pickup location i is assigned to the same cluster that provides deliveries to location i, at least for low volume demand, the objective function coefficients are replaced with the following. ( 1=pi , if i A C k cik ¼ ð3Þ tik , otherwise This definition indicates that the smaller the pickup load at location i the more attractive it is to assign it to the vehicle that makes the deliveries to that location. The functional form in (3) is motivated by the desire to minimize splitting small volumes among different clusters as well as limiting the assignment of small volumes to distant clusters, even if intact. We will see in Section 6 that in the solution to model (2,3), less than 2% of the pickup demand is split.
712
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
5. Scenarios As previously described, the sponsoring company currently operates two separate networks-one for commercial customers and the second for residential customers. Depending on the service area, deliveries take between 8 and 10 h and precede all pickups, which can consume the remainder of the day. Commercial operations span Monday through Friday and include all residential pickups; residential operations take place on Tuesday through Saturday and involve only deliveries. The rationale for this design is historical. Originally, the company only served commercial customers. When it was decided to expand into the residential market, it was not deemed to be costeffective to use a single fleet to serve both sets of customers because residential demand was comparatively small and geographically segregated from the commercial market. Over the last 15 years, the residential network has grown substantially and now has significant overlap with the commercial network. This has led to several questions about changing the way both customers and their service requests are handled. The basic question, of course, is whether or not to integrate the two networks; if not fully, then perhaps partially? Also, if the networks are going to remain independent, should the responsibility for residential pickups be assigned to the residential drivers? To answer these questions, we examine the following four network options. 1. (Optimized baseline) Current design with separate commercial and residential networks. 2. (Integrated) A single, fully integrated network. 3. (Residential pickups) Current design but with the residential drivers making pickups. 4. (80–20%) Reclassification of customers to reflect the character of the neighborhood in which they reside, but leaving two networks. The fourth scenario represents a compromise between the first and second. Commercial customers who are located in an area in which 80% of their 100 closest neighbors are residential customers are reclassified as residential, and vice verse. The objective is to remove some of the overlap in the individual routes that occurs in the two networks while keeping them mostly intact. The 80–20 reclassification was recommended by the sponsor after having done a preliminary study. They did not want to go much below that ratio since any further reduction (say, to 70–30) was shown to result in the shifting of ‘‘too many’’ residential customers to the commercial network, thus undermining the two-network structure. We also found that going much above that ratio (say, to 90–10) reduced the benefits from partial network integration. Although the 80–20 division is somewhat arbitrary it seemed to strike a good balance because it reflects the average division between commercial and residential customers in the integrated scenario. Implementation details are described in the next section.
6. Case study results The analysis centered on three representative service areas located in the eastern portion of the U.S. Table 1 identifies the areas and the average, current vehicle count per day. The last column gives the bin packing lower bound on the number of vehicles needed for all pickup operations and was calculated by dividing the total number of packages by 300—the vehicle capacity. All algorithms were coded in Mosel 3.0.3 and all optimization problems were solved with FICO Xpress Release 7.0.2. A 3.16 GHz Xeon 64-bit PC with 16 GBytes of RAM was used for the computations.
Table 1 Current vehicle count. Service area Commercial Residential Total
LB for pickups
Mon Tue–Fri Sat Erie 21 Pittsburgh 85 Indianapolis 93
11 42 37
21 85 93
32 127 130
11 42 37
15 80 80
In conducting the analysis we report the results for either five or six runs for each service area, depending on the number of hours allowed for deliveries. Our goal was to use as many delivery hours as possible, not to exceed 9, while ensuring that the remaining time is sufficient to handle the pickups for the scenario. The results served as the basis for determining the performance of each network design option mentioned above. Table 2 lists the runs and provides a short explanation of each. After a solution was obtained for the commercial runs, a feasibility check was made by solving the GAP, model (2), to see if sufficient time remained in the day to make the pickups. Initially, the amount of time allowed for deliveries was set at 9 h, but if this proved insufficient for pickups, it was decreased in 1/2-h increments down to a minimum of 7.5 h. The Commercial 2 runs allow 9 h a day for deliveries and are paired with the residential pickups design option. The extra time is justified because the drivers are now only responsible for deliveries and pickups for commercial customers. With respect to reclassification, for each residential location we check its closest 100 neighbors and if 80% of the total delivery packages are commercial, the location is tagged as commercial. Once all residential locations are examined, the tagged locations are reclassified as commercial. Next, we go through commercial locations without pickups and perform a similar check; namely, if 80% of the delivery packages of the 100 closest locations are residential, then the location is tagged as residential. Once all the commercial locations are processed, the tagged locations are made permanent. Typically, many residential locations get reclassified as commercial but only a handful of commercial locations are reclassified as residential. The Commercial-R and ResidentialR runs are used to evaluate the 80–20% scenario. 6.1. Input data and subregion construction The data were provided by an international P&D carrier and reflect average demand for September 2010. Table 3 highlights the input statistics for each area after aggregation. Columns 2 and 3, respectively list the number of original customers and the resultant number of composite locations after aggregation, which is down anywhere from 69% for Erie commercial which has 9317 original customers, to 79% for Pittsburgh commercial which has 31,286 original customers. Similar reductions can be seen for the residential networks. The total number of locations, the average package count per stop, and the average probability of a stop per location per day are reported for deliveries and pickups in columns 4–9. As expected, the latter two statistics increased with respect to the original data. Column 10 gives the average number of customers at an aggregate location. These values increase linearly with the number of original customers. The last two columns report the solution time for the preprocessing step of calculating the distance matrix and the expected time to the next customer for all aggregate locations, and the solution time for the aggregation algorithm, respectively. Table 4 summaries the output of the seed selection algorithm and the subregion construction procedure. Columns 2 and 3 list
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
713
Table 2 Computational experiments. Run
Description
Commercial 1
Optimized current commercial network. The allowable number of deliveries hours for the run is the largest value in the set {7.5, 8.0, 8.5, 9.0} that results in a solution that can feasibly handle the pickups within the hours remaining for each cluster. See Eq. (2d). If Commercial 1 is run with allowable delivery hours o 9, a second run is made with allowable delivery hours ¼ 9. The results are used in the residential pickups scenario. Optimized residential network. Generally, 9 delivery hours are used. An exception is made for Indianapolis, where 10 delivery hours are used due to the rural nature of the service area. Optimized ‘‘reclassified’’ commercial network. Any residential location whose ‘‘neighborhood’’ is 80% commercial is reclassified as a commercial location. The allowable number of delivery hours for the run is the largest value in the set {7.5, 8.0, 8.5, 9.0} that results in a solution that can feasibly handle the pickups within the hours remaining for each cluster. Optimized ‘‘reclassified’’ residential network. Any commercial location without pickups whose ‘‘neighborhood’’ is 80% residential is reclassified as a residential location. The commercial and residential networks are combined and a single fleet is used to service the single network. The allowable number of deliveries hours for the run is the largest value in the set {7.5, 8.0, 8.5, 9.0} that results in a solution that can feasibly handle the pickups within the hours remaining for each cluster.
Commercial 2 Residential Commercial-R
Residential-R Combined
Table 3 Input data for runs after aggregation. Service area
Erie commercial Erie residential Erie commercial-R Erie residential-R Erie combined Pittsburgh commercial (1&2) Pittsburgh residential Pittsburgh commercial-R Pittsburgh residential-R Pittsburgh combined Indianapolis—commercial Indianapolis residential Indianapolis commercial-R Indianapolis residential-R Indianapolis combined
Total no. of original P&D customers
Total no. of aggregate P&D locations
Deliveries
Pickups
Solution times
Total no. locs.
Avg. no. packs/ stop
Avg. prob. Total of stop/loc. no. locs.
Avg. no. packs/ stop
Avg. prob. Avg. no. of stop/loc. cust./ agg.
Travel time calc. (sec)
Agg. process (sec)
9,317 10,425 11,846 7,896 17,076 31,286
2,902 3,682 3,181 3,065 4,829 6,572
2,857 3,682 3,140 3,065 4,811 6,529
2.4 1.3 2.8 1.3 2.7 3.8
0.29 0.17 0.28 0.16 0.25 0.43
453 – 457 – 462 1,689
7.6 – 7.5 – 7.5 15.4
0.44 – 0.44 – 0.43 0.43
6.6 6.1 7.8 5.8 7.3 7.4
6.8 8.5 7.4 7.3 11.0 13.8
3.6 3.5 5.9 1.9 10.1 28.2
40,819 41,344 30,761 57,492 34,561 45,521 46,316 33,766 66,901
6,871 6,937 5,693 8,290 8,316 9,346 8,932 7,525 11,580
6,871 6,898 5,693 8,275 8,264 9,346 8,884 7,525 11,557
1.6 4.5 1.5 4.8 3.7 1.4 4.4 1.4 4.3
0.32 0.43 0.30 0.47 0.39 0.27 0.39 0.25 0.39
– 1,717 – 1,722 1,791 – 1,791 – 1,821
– 15.2 – 15.2 14.9 – 14.8 – 14.8
– 0.43 – 0.43 0.45 – 0.45 – 0.44
9.0 9.4 8.4 10.4 6.8 8.6 8.9 8.2 10.0
14.6 14.6 12.1 17.4 18.8 19.6 18.7 15.8 24.1
46.8 50.2 27.1 90.6 28.9 45.8 51.9 25.4 98.8
Table 4 Seed selection and subregion construction for runs. Service area
No. of seeds
Time to calc. seeds (s)
Avg. no. critical points/subregion
Min–Max critical points
Avg. no. agg. locations/subregion
No. agg. locs./ subregion (range)
Time to generate subregions (s)
Erie commercial Erie residential Erie commercial-R Erie residential-R Erie combined Pittsburgh commercial 1 Pittsburgh commercial 2 Pittsburgh residential Pittsburgh commercial-R Pittsburgh residential-R Pittsburgh combined Indianapolis—commercial Indianapolis Residential Indianapolis Commercial-R Indianapolis Residential-R Indianapolis combined
288 88 300 85 355 1316 1305 486 1405 424 1658 1430 542 1545 464 1871
0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2
17.6 28.5 18.4 26.3 21.9 19.8 19.5 31.9 19.7 31.8 20.7 18.5 41.7 18.8 41.4 21.0
1–50 12–62 1–55 17–58 1–54 1–58 1–60 13–70 1–53 16–70 1–51 1–52 18–76 1–57 19–72 1–61
359.3 898.1 366.2 924.4 446.8 273.5 296.4 648.7 289.2 696.1 264.8 288.5 1,017.3 284.0 1,128.3 311.3
6–987 336–1,722 5–1,013 418–1743 3–1389 3–952 3–1080 116–1551 2–1046 146–1421 3–1239 5–1158 381–1905 5–1249 379–1904 6–1326
44.2 48.2 50.6 43.7 102.5 274.6 297.2 272.1 327.1 233.3 406.4 404.0 650.8 438.1 540.5 740.9
the number of seeds generated and the total time of the calculations, respectively. Columns 4 and 5 report the average number of critical points and their range for each subregion. For a given cluster, a critical point is the closest customer in the longitudinal
direction such that adding it to the cluster would result in a violation of either a capacity or service time constraint. They are an essential component of the construction procedure. The size of model (1) increases by two constraints and two variables for each
714
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
Table 5 Solutions for scenarios by service area. Scenario by service area Erie Optimized base Residential pickups 80–20% Integrated Pittsburgh Optimized base Residential pickups 80–20% Integrated Indianapolis Optimized base Residential pickups 80–20% Integrated
Available delivery hours/clustera
Avg. delivery hours/clusterb
9.0/9.0: 9.0/9.0
8.3/8.5/8.9
57.4
61.7
19/28/9
9.5/12.5/18.2
2/4/2
20
9.0/9.0: 9.0/9.0
8.3/8.5/8.9
65.1
84.7
19/28/9
9.5/12.5/18.2
2/4/2
20
9.0/9.0: 9.0/9.0 9.0/9.0/9.0
8.2/8.4/8.9 8.2/8.6/8.9
59.4 64.3
70.4 75.7
21/29/8 19/26/9
0.0/6.2/27.3 9.5/18.8/18.2
0/3/3 2/6/2
15 28
7.5/7.5: 9.0/9.0
6.7/7.4/8.9
230.4
385.3
81/120/39
4.7/5.5/7.1
4/7/3
35
7.5/9.0: 9.0/9.0
7.4/7.9/8.9
251.0
336.5
81/114/39
4.7/10.2/7.1
4/13/3
59
9.0/9.0: 9.0/9.0 7.5/9.0/9.0
7.6/7.9/8.9 6.7/8.0/8.9
231.0 250.2
323.3 363.4
82/112/30 81/105/39
3.5/11.8/28.6 4.7/19.2/7.1
3/15/12 4/22/3
75 95
9.0/9.0: 10.0/10.0
6.8/7.7/9.9
250.2
429.3
92/130/38
1.1/0.0/ 2.7
1/0/–1
0
9.0/9.0: 10.0/10.0
6.8/7.7/9.9
277.9
490.5
92/130/38
1.1/0.0/ 2.7
1/0/–1
0
9.0/9.0: 10.0/10.0 9.0/9.0/10.0
7.1/7.8/10.0 6.8/7.7/9.9
254.7 281.5
422.3 455.2
96/125/29 4.3/3.8/21.6 92/121/38 1.1/6.9/ 2.7
Lower bound for pickup Available pickup hours/day hours/day
No. of clusters
Reduction (%)
No. vehicles saved
3/5/8 1/9/ 1
No. vehicledays saved
25 36
a The first term a/b in the notation a/b:c/d refers to the available hours for commercial delivery on Mon and Tue–Fri, respectively; the second term c/d refers to the available hours for residential delivery on Tue–Fri and Sat, respectively. For the integrated network only one value is used for the available delivery hours on Mon/Tue–Fri/Sat. b In general, entries with three values a/b/c correspond to results for Mon/Tue–Fri/Sat.
critical point in each subregion. In many cases the minimum number is 1 which is a result of a preprocessing step that removes locations from a subregion when including them in any cluster would lead to a violation of the ratio constraints (1g). Columns 6 and 7, respectively give the average number of aggregate locations and the minimum and maximum for each subregion. The average values range, for example, from 273.5 for Pittsburgh commercial which has 1316 subregions to 359.3 for Erie commercial which has 288 subregions. No discernible relationship exists between the number of aggregate locations in a cluster and the number of subregions, but the density of a subregion plays a partial role. The last column gives the time required to generate the subregions, and is less than 13 minutes in all cases. 6.2. Scenario analysis and computations The results for the four scenarios are highlighted in Table 5 (those interested in the statistics associated with the column generation computations are referred to Appendix B). Column 2 provides the delivery hours available for each cluster. For the first three scenarios there are two separate fields, each containing two entries. The first field contains commercial hours for Monday and then Tuesday through Friday. The second contains residential hours for Tuesday through Friday and Saturday, respectively. We need to isolate Monday and Saturday because only one service is provided on these days. Looking at the scenarios for Indianapolis we see that 10 h are allowed for residential deliveries, while for the other terminals, at most 9 h are allowed. This is because a large portion of Indianapolis is rural and locations are farther apart. This translates into more time on the road for the drivers even though the average package count per location is consistent with the other regions. Column 3 gives the average delivery hours used in each cluster for Monday, Tuesday–Friday, and Saturday, respectively. In general, the most slack exists on Monday when only commercial service is provided. As opposed to residential locations, commercial locations are much more densely packed so the limiting factor is predominantly the capacity, Eq. (1c), of the vehicle rather than
the drive time, Eq. (1d). On Saturday, virtually all the time available for residential operations is used. The lower bound for pickup hours per day reported in column 4 is computed by summing the expected travel time from one location to the next, the expected service time at each location, and the stem time for each cluster [see [3] for more detail] and then dividing the result by the number of clusters for Tue–Fri (middle entry in column 7). The total number of hours available for pickups each day is reported in column 6. The difference between the available hours and the lower bound gives a measure of how tight the pickup operations are likely to be. For the four Erie scenarios, the average difference is 11.6 h/day which is fairly tight. For Pittsburgh and Indianapolis, the differences are 111.5 and 183.3 h/day, respectively, which suggest that the limiting factor is vehicle capacity rather than time for the corresponding clusters. The most critical results are contained in the next three columns which report the total number of clusters, the percent reduction in fleet size with respect to current operations, and the number of vehicles saved. These values are translated into the number of vehicle-days saved per week in the last column. As expected, the integrated scenario provided the best results but there was also tangible improvement when we simply optimized the current fleet. For Erie, the smallest of the three regions, we saved 4 vehicles on Tue–Fri, and 2 vehicles on both Mon and Sat. Identical savings were realized for the second scenario where the residential drivers are required to do pickups as well as deliveries, rather than just deliveries as is now the case. Therefore, it is questionable whether any advantage exists in pursuing this scenario for Erie (as well as for Indianapolis). For the 80–20 scenario, there is also some improvement but less than for the first two. Reclassifying residential locations as commercial and vice versa produced an increase of 2 vehicles on Mon and an increase of 1 vehicle on Tue–Fri. The 1-vehicle decrease on Sat was not sufficient to offset the former increases. Continuing with Erie, the greatest saving was achieved for Tue–Fri when the networks were fully integrated. Six vehicles
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
715
Table 6 Ratio of commercial to residential locations per cluster for integrated network. Service area
Erie Pittsburgh Indianapolis
No. of clusters
26 105 121
Commercial
Residential
Avg. % locations
Max % locations
Min % locations
Avg. % locations
Max % locations
Min % locations
78.3 79.0 81.5
99.9 100.0 100.0
51.7 60.3 44.2
21.7 21.0 18.5
48.3 39.7 55.8
0.1 0.0 0.0
Table 7 Composition of clusters for 80–20% scenario. Service area
No. of clusters (Tue–Fri)
Current no. of vehicles (Tues–Fri)
Mixed (%)
Pure commercial (%)
Pure residential (%)
Avg. comm.-res. ratio for mixed commercial clusters (%–%)
Erie Pittsburgh Indianapolis
29 112 125
32 127 130
93.2 75.0 73.6
3.4 2.7 8.8
3.4 22.3 17.6
94.1–5.9 94.0–6.0 95.1–4.9
were eliminated and 28 vehicle-days were saved, primarily by identifying clusters that better balanced the available capacity and work time. At $125,000 per vehicle per year, this translates into an annual savings of $750,000. Notice that the results for Mon and Sat were the same for the optimized baseline and the integrated scenarios. This is always the case because they are derived from the same runs. Similar results were obtained for Pittsburgh. The optimized baseline decreased the number of vehicles required on Tue–Fri from 127 to 120, or 5.5%, while the total number of vehicle-days saved per week was 35. Also, having residential drivers do their own pickups now noticeably decreased the number of vehicles required by 13 for the middle four days. For the 80–20 scenario, we see a surprisingly large decrease in the number of vehicles required on Sat for residential operations. The drop from 42 to 30 was mostly due to the large number of residential reclassifications coupled with ample commercial capacity. A closer look at the Residential-R run revealed a decrease in the number of aggregate locations to 82.9% of the original number with an accompanying 74.2% decrease in package volume. The 30 vehicles now required seems reasonable since it is 76.9% of the optimized baseline solution of 39. Again, the dominant solution was associated with the integrated scenario, which evidenced a savings of 22 vehicles on Tue–Fri. Using a unit cost of $125,000, this is equivalent to an annual savings of $2,750,000. For Indianapolis, the results were less impressive. Neither the optimized baseline nor the residential pickups scenarios were able to reduce the current fleet size for commercial operations, and, in fact, required one additional vehicle for residential operations. Given that the number of packages per stop for Indianapolis is approximately the same as for Erie and Pittsburgh, we believe that this result can be explained by the relative sparseness of the Indianapolis region and the use of long days. Because locations are more spread out, drivers must travel greater distances to service their customers and hence are forced to work up to 11 h each day. Although we could have extended the number of delivery hours available to 11 h in our runs, we opted for relative consistency across the three regions and only increased the delivery hours to 10 for the Indianapolis residential network. For Tue–Fri in the integrated scenario, we enforced the restriction of 9 h for delivery. Had we relaxed this value to, say, 9.5 or 10 h additional savings would have been realized. To be conservative, though, we imposed the same requirement of 9 h for both commercial and residential deliveries despite the fact that the base case for the latter is 10 h.
6.2.1. Further analysis of mixed-location scenarios Both the fully integrated scenario and the 80–20 scenario include clusters that contain a mixture of commercial and residential customers, and both provide a measurable advantage over the current configuration and the optimized baseline. As the results in Table 5 show, if the two networks are merged, the cost savings would be significant, not only from the approximate 15% reduction in fleet size but from the administrative savings that would be realized by combining, where possible, the two independently run terminals in each region. When full integration is not possible due to contractual restrictions, union agreements or company policies, a compromise as defined by the 80–20 scenario may be the best option. To better understand the implications of these changes, we need to know the degree to which partial or full integration will alter the character of the current clusters. Table 6 provides this information for the fully integrated case. The headings are mostly self-explanatory. For Erie, which would now require 26 rather than 32 vehicles, the average division of commercial to residential locations would be 78.3% to 21.7%. The maximum and minimum percentages are 99.9 and 51.7, respectively for commercial locations and 48.3 and 0.1 for residential locations. The results are quite similar for Pittsburgh and Indianapolis. Interestingly, the minimum percentage of commercial locations in a cluster is 44.2, while the minimum percentage of residential locations in a cluster is virtually 0 in all regions. This is because there are many more commercial customers, and most are concentrated in relatively compact geographic areas. The residential customers are more widely dispersed and generally located in the outskirts of the core business districts. Their numbers in a cluster never exceeded 55.8%. A final observation from the statistics in Table 6 is that the average ratio of commercial to residential locations is approximately 80/20 for all three regions. When we reclassify customers based on their neighborhoods, we get the results reported in Table 7. Compared to the integrated case, the average number of vehicles required increased by 7.2%, but decreased by 8.8% with respect to current number in use (see columns 2 and 3). For Erie, 93.2% of the clusters are mixed after reclassification while only 3.4% are pure commercial and pure residential, on average. The mixed percentages drop to 75.0 for Pittsburgh and 73.6 for Indianapolis but for pure residential, the percentages increase substantially to 22.3 and 17.6, respectively. These two regions have more isolated suburbs than Erie. The last column in Table 7 gives the average ratio of commercial to residential percentages
716
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
them. In practice, such customers are often visited several times during the day. Those with small requests are never split among drivers—it is always possible to load a few more packages.
for the mixed commercial clusters (those with no more than 20% residential locations). In all cases, the ratio is about 95/5. In contrast, the mixed residential clusters are almost 100% residential because there are very few commercial customers in those neighborhoods.
7. Conclusions 6.2.2. Pickup assignments When model (1) is solved for the various runs, only deliveries are explicitly included, the assumption being that sufficient time will exist afterwards to make the pickups. To validate this assumption, we solve model (2) to obtain cluster assignments for each pickup location. Ideally, the clusters obtained from model (1) would be able to handle all the pickups for their delivery customers; if not, we assign the pickups to neighboring delivery clusters. Also, recall that some splitting of pickups is inevitable because the demand for a small number of locations exceeds the capacity of a truck. Table 8 highlights the results obtained with model (2) for all scenarios and regions. For Erie, for example, there is an average of 2.2 h available for pickups per vehicle each day, and 97.6% of the pickup locations are assigned to their corresponding delivery cluster. An additional 1.8% are partially assigned. In total, 97.8% of all pickup locations are kept intact and of those that aren’t, 2% are split between 2 clusters and 0.2% are split among 3 or more clusters. The results for the other two regions are nearly identical. Looking at the columns ‘‘% of pickups assigned to delivery cluster’’ and ‘‘% of pickups not split’’ we see that the former averages 97.8% and the latter averages 98.3%, which more than validates the assignment procedure. In fact, these percentages are much higher than we anticipated given that the sponsoring company has not been able to achieve assignments nearly as consistent, even with a greater number of vehicles. Finally, the column ‘‘Avg. no. pkgs. for split pickups’’ gives the average number of packages at pickup locations that are split between two or more clusters. For Erie, for example, of the 10 ( ¼454 2.2%) pickup locations that are split, the average package count is 62.9 for the optimized baseline. These statistics represents a small fraction of the total number of pickups and can be easily handled by neighboring drivers. Only customers with large requests are split and then only a few of
The principal means by which management can reduce the costs of operating P&D networks is by reducing the number of vehicles used to provide service. The most effective way of doing this is by creating compact, contiguous work areas for the drivers. In the analysis presented here, we have shown that by rationalizing the current network structure of an international carrier, an average savings of 6% can be achieved simply by optimizing the route assignments. If the two separate networks now in use in each region are fully integrated, the new structure would produce savings of 15% on average, and perhaps more if the assumed available time on the road was marginally increased.
Table 9 Column generation phase I computations for runs. Service area
No. iterations
Final no. columns
Time I (min)
IP solution
Erie commercial Erie residential Erie commercial-R Erie residential-R Erie combined Pittsburgh commercial 1 Pittsburgh commercial 2 Pittsburgh residential Pittsburgh commercial-R Pittsburgh residential-R Pittsburgh combined Indianapolis—commercial Indianapolis residential Indianapolis commercial-R Indianapolis residential-R Indianapolis combined
3 2 3 2 3 8 6 4 7 4 7 6 5 8 4 7
98 50 102 60 124 320 295 188 306 167 392 325 229 334 171 402
2.5 4.1 2.9 5.0 4.5 5.7 5.9 8.6 7.0 8.5 8.2 8.2 21.4 8.0 17.5 12.4
26 15 29 10 41 123 127 59 123 44 161 138 56 150 44 195
Table 8 Results for generalized assignment problem for scenarios. Scenarios by service area
Erie Optimized baseline Residential pickups 80–20% Integrated Pittsburgh Optimized baseline Residential pickups 80–20% Integrated Indianapolis Optimized baseline Residential pickups 80–20% Integrated
% of pickups % split not split between 2 clusters
% split among Z3 clusters
62.9
97.8
2.0
0.2
1.8
68.7
98.0
1.8
0.2
99.0 98.7
0.8 1.1
53.5 86.6
98.9 98.7
0.9 1.1
0.2 0.2
3.21
97.2
2.4
420.4
98.3
1.0
0.7
1689
2.95
96.1
2.8
429.0
97.9
1.4
0.7
1717 1722
2.89 3.46
96.4 97.4
2.7 2.2
324.5 396.0
97.8 98.3
1.3 1.0
0.9 0.7
1791
3.30
98.5
1.0
396.0
98.4
0.8
0.8
1791
3.77
98.5
1.0
397.1
98.4
1.2
0.4
1791 1821
3.38 3.76
98.3 98.7
1.3 1.0
363.1 404.0
98.2 98.5
1.1 0.9
0.7 0.6
No. pickup locations
Hours available/day/ vehicle for pickups
% of pickups assigned to % of pickups delivery cluster partially assigned
454
2.20
97.6
1.8
454
3.03
97.6
457 462
2.43 2.91
1689
Avg. no. pkgs. for split pickups
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
717
Table 10 Column generation phase II and overall computations for runs. Service area
Phase II
Erie commercial Erie residential Erie commercial-R Erie residential-R Erie combined Pittsburgh commercial 1 Pittsburgh commercial 2 Pittsburgh residential Pittsburgh commercial-R Pittsburgh residential-R Pittsburgh combined Indianapolis—commercial Indianapolis residential Indianapolis commercial-R Indianapolis residential-R Indianapolis combined
Overall
No. iters
No. sub-problems solved
Avg. time/ subproblem (s)
Time II (min)
Iter. best soln.
IP soln
Total no. cols.
Final no. cols.
Total time (min)
39 39 80 30 108 227 174 114 171 87 220 194 155 174 95 214
1,950 1,458 4,000 1,195 5,400 14,982 11,310 5,700 11,970 4,350 18,260 13,968 7,750 13,398 4,750 20,116
1.4 3.6 1.5 3.7 1.9 1.0 1.2 2.6 1.1 2.7 1.1 1.1 4.2 1.2 4.4 1.4
50.9 94.1 118.5 75.0 242.5 1,354.8 1,211.2 576.4 1,481.6 326.1 2,592.6 1,490.8 979.6 1,822.6 625.7 3,354.4
15 39 15 16 107 217 151 109 166 26 154 149 67 170 88 149
19 9 21 8 26 81 75 39 82 30 105 92 38 96 29 121
1,256 1,069 2,690 970 3,944 11,050 8,368 4,149 8,845 3,214 13,732 10,113 6,291 9,746 3,540 14,927
198 1,009 1,719 910 2,040 6,478 5,549 3,189 6,033 2,249 9,106 6,438 3,322 6,982 2,508 10,509
53.4 98.2 121.4 80.0 247.0 1360.5 1217.1 585.0 1488.6 334.6 2600.8 1499.0 1001.0 1830.6 643.2 3366.8
Table 11 Cluster and overlap statistics for runs. Service area
No. of No. of Avg. no. clusters locations locations/ cluster
Range of locations/ cluster (min–max)
Locations in 1 cluster (%)
Locations in 2 clusters (%)
Locations in 3 clusters (%)
Locations in Z 4 Unique clusters (%) clusters (%)
Erie commercial Erie residential Erie commercial-R Erie cesidential-R Erie combined Pittsburgh commercial 1 Pittsburgh commercial 2 Pittsburgh residential Pittsburgh commercial-R Pittsburgh residential-R Pittsburgh combined Indianapolis—commercial Indianapolis Residential Indianapolis commercial-R Indianapolis combined
19 9 21 8 26 81 75 39 82 30 105 92 38 96 121
1–341 113–605 1–394 122–633 1–435 1–272 1–272 34–423 1–295 35–454 1–442 1–422 114–580 1–452 1–491
80.8 96.4 84.9 81.3 84.5 81.3 80.6 86.9 79.1 86.7 78.9 77.7 83.8 79.8 79.5
16.5 3.6 14.8 18.5 14.6 17.4 17.8 12.6 19.8 14.1 20.3 20.4 15.5 19.0 19.1
2.4 0.0 0.3 0.2 0.9 1.3 1.6 0.5 1.1 0.2 0.8 1.9 0.7 1.2 1.2
0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2
2,902 3,682 3,181 3,065 4,829 6,572 6,572 6,871 6,937 5,693 8,290 8,316 9,346 8,932 11,580
186.6 424.1 174.8 455.9 216.2 97.4 106.1 200.2 103.2 217.4 96.3 112.4 287.4 112.9 116.9
Also included in the study was a compromise scenario in which residential customers were reclassified as commercial if 80% or more of their neighbors were commercial, and vice versa. This option produced savings that were competitive with full integration while having the advantage of being straightforward to adapt in the current business environment. Because drivers ‘‘own’’ territory in their service area, any major change in customer assignments would likely require a substantial payout to a large percentage of the drivers. The corresponding costs might be too high to offset the expected benefits. In addition, integrated deliveries for a service area imply that the residential terminal would have to be closed and that the commercial terminal would have to be expanded or replaced. This could be a costly proposition if adopted nationwide. The final scenario we investigated involved the expansion of the workload of the residential drivers by requiring them to make pickups in addition to their normal deliveries. Although this would free up some time of the commercial drivers who now do all the pickups to spend more time on the road, we found that the savings for Erie and Indianapolis were no greater than were achievable by optimizing the baseline with respect to current practice. Noticeable savings were realized for Pittsburgh, though, because pickup operations were binding in the optimized
5.3 0.0 4.8 0.0 7.7 4.9 5.3 0.0 3.7 0.0 3.8 9.8 0.0 9.4 6.6
baseline on Mon–Fri. Freeing up time led to a reduction of 6 vehicles on those days. To some extent, our findings for commercial operations were similar to the results we obtained previously for the same three regions as well as for three other regions, albeit with older data. In order to extrapolate the current results to other service areas, it would be best to base any conclusions on the number of delivery and pickup customers served, the density of commercial and residential customers, and the square mileage of the region. For example, because Topeka, Kansas is very similar to Erie on each of these measures, we would expect that the same solution properties would hold for both. Extrapolating the results from Pittsburgh to Anaheim, California would be problematic, though, because Anaheim has a much larger customer base and a significantly higher ratio of commercial to residential customers.
Appendix A. Column Generation Master Problem and Subproblems The master problem used in our column generation algorithm is created from model (1) using the objective function (1a) and the demand constraints (1b) to form a set covering model that is solved
718
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
as a linear program. The subproblems, one for each region r, are created from the remaining constraints and have objective functions that correspond to the reduced cost of a generic column in the master problem. The purpose of the subproblems is to find additional columns to include in the master problem. Those whose reduced costs are negative will lead to a reduction in the objective function, or equivalently, a reduction in the number of (fractional) clusters, when the master problem is re-solved with the new columns.
max min X^ r ðX^ r Þ Maximum (minimum) X-coordinate in subregion r; max min ¼ max X i : i A NðrÞ, X^ r ¼ min X i : iA NðrÞ X^ max min r ^ ^ Y r ðY r Þ Maximum (minimum) Y-coordinate in subregion r; max min Y^ r ¼ max Y i : i A NðrÞ, Y^ r ¼ min Y i : iA NðrÞ X-coordinate of the seed location s in subregion r Y-coordinate of the seed location s in subregion r
Xrs Yrs
Decision variables A.1 Master problem In defining the master problem, we make use of the following notation. r R k N(r)
index for subregions set of subregions index for clusters set of locations in subregion r (equivalent to N(s) where s is the seed for subregion r); nr ¼ 9N(r)9 set of all feasible clusters in subregion r cost of cluster k; equal to the pro-rated daily ownership and operating cost for a single vehicle parameter equal to 1 if cluster k in subregion r serves location i, 0 otherwise binary decision variable equal to 1 if cluster k in subregion r is selected, 0 otherwise
K(r) ck Z kir
lkr
MP
y
¼ Minimize
X X
(binary) 1 if location i belongs to the cluster defined over subregion r, 0 otherwise X max ðX min Þ largest (smallest) X-coordinate in the cluster derived r r for subregion r min xmax ir ðxir Þ (binary) 1 if longitudinal coordinate Xi associated with location i is the largest (smallest) in the cluster derived for subregion r Y max ðY min Þ largest (smallest) Y-coordinate in the cluster derived r r for subregion r min umax ir ðuir Þ (binary) 1 if latitudinal coordinate Yi associated with location i is the largest (smallest) in the cluster derived for subregion r STEMr time to drive to and from terminal in subregion r zir
Subproblem for subregion r (SPr) 8 9 < = X yi zir Minimize cðrÞ ¼ c : ;
ck lkr
ð4aÞ
r A R k A KðrÞ
subject to
X X
Z kir lkr Z 1,i r A R k A KðrÞ
X
Subject to ¼ 1,. . .,n
lkr A f0,1g,8r A ,k A KðrÞ,8r A R,k A KðrÞ
ð4bÞ
ð4cÞ
Parameters
tDi zir r TOTBREAKSTEMr
BREAK ratio
tDi yi
ð5cÞ
zir : i ¼ 1,. . .,n satisfies property 3
ð5dÞ
Z X i zir þ X s ð1zir Þ,8i A NðrÞ X max r
ð5eÞ
X
¼ X max r
max
X i xir
,
i A NðrÞ
X
max xmax ¼ 1, xir rzir ,8i A NðrÞ ir
r X i zir þ X s ð1zir Þ,8i A NðrÞ X min r X
¼ X min r
X
min
X i xir ,
i A NðrÞ
min xmin ¼ 1, xir r zir ,8iA NðrÞ ir
¼ Y max r
Y i umax ir ,
i A NðrÞ
X
¼ Y min r
umax ¼ 1,umax r zir ,8iA NðrÞ ir ir
Y i umin ir ,
X
max
ð5iÞ ð5jÞ
umin ¼ 1,umin ir ir r zir ,8i A NðrÞ
ð5kÞ ð5lÞ
i A NðrÞ
X min Þ=69:1ðY max Y min Þ rratio 1=ratio r53ðX max r r r r zir , xir
ð5hÞ
i A NðrÞ
r Y i zir þ Y s ð1zir Þ,8i A NðrÞ Y min r X
ð5gÞ
i A NðrÞ
Z Y i zir þ Y s ð1zir Þ,8i A NðrÞ Y max r X
ð5fÞ
i A NðrÞ
i A NðrÞ
delivery demand at location i total time available in a day for delivery operations (see Table 5) time allocated for breaks during the day aspect ratio bound for a cluster (same as ratio used Section 3.3 to remove points during subregion construction) estimated service time for a delivery at location i plus drive time to next location dual variable associated with constraint (1b) for location i obtained when solving the LP relaxation of model (1)
ð5bÞ
i A NðrÞ
A.2 Subproblems The symbols below are used to create the subproblems. The constraints used to determine the maximum and minimum X- and Y-coordinates in the derived clusters are written for North America where west longitude is encoded as Xi o0 and north latitude is encoded as Yi 40 for all iAN(r).
di zir r Q
i A NðrÞ
X
The objective function in (4a) is aimed at minimizing vehicle ownership. When the fleet is homogeneous, as it is for the company that provided us with data, ck is a constant and (4a) reduces to minimizing the number of clusters. Constraints (4b) ensure that each location is assigned to at least one cluster. Although it would be more accurate to write these constraints as equalities, thus creating a set partitioning problem, set covering problems are much easier to solve so our preference is for the current formulation. Binary restrictions are placed on the variables in (4c).
di TOT
ð5aÞ
i A NðrÞ
ð5mÞ
min
max min , xir ,umax ir ,uir A 0,1,8i A NðrÞ,X r
max min max A ½X rs , X^ r ,X min A ½X^ r ,X rs ,Y max A ½Y rs , Y^ r , r r min A ½Y^ r ,Y rs Y min r
ð5nÞ
As is customary in column generation, the objective function in (5a) ‘‘prices out’’ potential clusters based on the current set of dual variables obtained when MP is solved. The expression
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
P c i A NðrÞ yi zir cðrÞ is the reduced cost of a generic column in model (1) for subregion r, where c is the (assumed constant) unit cost of a vehicle ck and yi Z0 is the ith dual variable. At optimality, ck ðrÞ Z0 for all kAK(r) in model (1), so if the solution to SPr for any rAR has a negative objective function value, then the corresponding cluster denoted by ðZ kir : i ¼ 1,. . .,nÞ is added to the restricted version of (4) in the form of a column. The knapsack constraints in (5b) ensure that the sum of all deliveries does not exceed the capacity of a vehicle. Constraint (5c) ensures that the time available for deliveries is not exceeded for the selected locations. In [2], we explain how the parameters tDi (and tPi ) and are estimated from the given data. The estimation procedure is validated in [3]. The constraints required to compute the stem time STEMr are presented below. Constraint (5d) is an implicit statement of Property 3, which ensures continuity and centrality. Eqs (1f.1–1f.4 are an explicit representation of this property. Constraints (5e) and (5g), respectively place lower and upper bounds on the X max and X min r r variables, while constraints (5f) and (5h) are used to determine the largest and smallest X-coordinates in the derived cluster. The second term in (5e) is needed to allow X max to be less than zero r when one or more of the zir variables is zero. Similar logic applies to (5g). The first two equalities in (5f) ensure that exactly one location is chosen as the largest coordinate while the third set of inequalities ensures that the chosen location is in the cluster. The three sets of constraints in (5h) ensure the same for the smallest X-coordinate. In the absence of (5f) and (5h), X max and X min may r r not assume one of the values in the set {Xi: iAN(r)} when the ratio constraint (5m) is considered. Equivalently, constraints (5i–5l) are used to determine the largest and smallest Y-coordinates in the derived cluster. Once these values are known, the aspect ratio of the rectangle containing the selected locations can be calculated. The two inequalities in (5m) ensure that this ratio r falls within the specified range [1/ratio, ratio]. Each can be linearized by multiplying through by the denominator 69:1ðY max Y min Þ. As mentioned in the text, the r r numerical prefixes 53 and 69.1 convert longitude and latitude to miles. The remaining constraints (5n) define the variables and their bounds. In practice, it is not necessary to write (5e–5l) for all iAN(r), but only for those locations that are candidates for min and max values with respect to the seed. To improve the solution quality, several additional valid inequalities were included in the model. Each is discussed by [10] along with an analysis of their polyhedral properties. Regarding the construction of the objective function in (5a), given a solution to the LP relaxation of model (4), the reduced cost of the kth column for subproblem r is X k ck ðrÞ ¼ ck Z ir yi i A NðrÞ
Since it is assumed that each vehicle has the same cost, ck is replaced with c. Also, since we are interested in a generic column (or cluster) associated with subproblem r and not the kth column, we replace the known value of Z kir with its equivalent variable zir. Finally, noting that yi, for all iAN(r), is a constant at this point in P the computations, we get cðrÞ ¼ c i A NðrÞ yi zir , which is (5a). Stem time calculation. We take the stem time to be twice the time to travel from the terminal indexed by 0 to the nearest location in the derived cluster. For location i, the two-way travel qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi time is T i ¼ 2 ð53ðX i X 0 ÞÞ2 þð69:1ðY i Y 0 ÞÞ2 =MPHð0,iÞ, where the velocity, denoted by MPH, is calculated with the formula given in [2]. To determine STEMr for subregion r, let variable t min ¼ 1 if the ir travel time between 0 and i is the minimum for all i such that zir ¼1 in the solution to SPr, and 0 otherwise. Adding the following constraints to model (5) guarantees that exactly one location in
719
the cluster will be chosen as the basis for the stem time and that location will be the closest to the terminal. STEM r rT i zir þmax T j : j A NðrÞ ð1zir Þ,8i A NðrÞ ð5oÞ STEM r ¼
X
T i t min ir ,
i A NðrÞ
X
t min ¼ 1, ir
t min ir r zir ,
8iA NðrÞ
ð5pÞ
i A NðrÞ
If the terminal is in subregion r, then STEMr ¼0. A.3 Summary of column generation algorithm An outline of the procedural steps involved in solving the work area design problem is now given. Input. Set of locations N and their coordinates {(Xi, Yi): i ¼1,y,n}; expected number of packages to be picked up and delivered per day{(di: i¼1,y,n}; time parameters {tD i : i ¼1,y,n}; vehicle capacity Q; aspect ratio parameter ratio; variable fixing parameters. Output. Set of clusters Kn that that are feasible to model (1) that represent driver work areas. Step 0. (Initialization) Set up restricted MP and specify all parameter values. Perturb objective function and right-hand-side coefficients. Divide the service area into four quadrants centered at the terminal (used for variable fixing). Step 1. (Phase I) Add artificial variables to model (4) and solve the relaxed problem using column generation to get lLP. Remove the artificial variables and set the coefficients of remaining variables to 1. Re-solve MP to get new lLP and rounded pseudo lower bound dLBe Call the variable fixing algorithm to get lIP and UB. IP Put UB’UB 1 þ e and remove all columns with lk ¼ 0, ^ k A K. Put tabu tenure ’+, put Restarting_Active’otrue 4, set Freq_IP¼ 250, set Fixing_Threshold ¼500, and set Freq_Restart¼1000. Step t. (Phase II—General iteration t) Identify subproblems that remain active from iteration t 1; randomly select the remaining subproblems from the inactive pool excluding those that are tabu. \\Active subproblems are those that are solved at the current iteration, and represent about 5% of the total. Only those subproblems whose objective function values were negative and relatively small (first quartile) are carried over from iteration t 1 to t. Solve model (4) with column generation. \\At each D–W iteration new values of lLP and the dual variables yi, i ¼1,y,n, are obtained. Using the dual values, the subproblem objective function (5a) is updated for all active P subregions r to get c i A NðrÞ yi zir . The active subproblems are then solved and if any of their objective function values are negative, the associated columns are added to (4). The process repeats until no columns with negative objective function values are found. At that point, we have the optimal value of yMP for the current set of active subproblems. If yMP has not decreased by more than 0.01 in each of the last five iterations, then go to Step n. \\Termination check Update the tabu tenure of the current and previously active subproblems. If K^ ZFixing_Threshold, then call the variable fixing IPðtÞ algorithm to get l and UBt. \\K^ is the number of
720
J.F. Bard, A.I. Jarrah / Omega 41 (2013) 706–720
Step n.
columns in MP when the column generation component of the algorithm converges. If UBt oUB, then put UB’UBt 1þ e and update the incumbent; else if UBt ZUB, Restarting_Active¼ otrue4, and K^ ZFreq_Restart then put Restarting_ Active’ofalse 4. If K^ ZFreq_Restart and Restarting_Active¼ otrue4 , then restart (remove all columns in MP except those of the incumbent, put tabu tenure ’+); else put Fixing_Threshold ’Fixing_ThresholdþFreq_IP. Put t’t þ1 and continue. IPðnÞ (Final iteration) Call variable fixing algorithm to get l and UBn. If UBn oUB, then update the incumbent. Remove overlap from final set of clusters and report ^ solution as yMP ¼UB and C n ¼{Ck: kAK n D K}, where IP n ^ K ¼{k: lk ¼ 1, kAK } and Ck\Cl ¼+ for all kalAK n.
Appendix B. Computational results for column generation procedure Model (1) is solved with a two-phase column general algorithm designed for large-scale set covering problems. In phase I, the master problem is set up with one row for each delivery location, and an artificial basis is introduced. Using the standard approach, the phase I problem is to minimize the sum of the artificial variables. The computational results are highlighted in Table 9 for each of the five runs associated with the three service areas. In all instances, less than 10 iterations were required to obtain an objective function value of 0. At termination, a feasible solution is available which serves as the initial basis for phase II. The last column in the table reports the corresponding objective function values. The amount of time spent on each run varied from a low of 2.5 min for the smallest instance to 21.4 min for the Indianapolis residential run which is the next to the largest instance with 9346 rows (see Table 3). The phase II results are reported in Table 10. Columns 2–7 highlight the computations, giving the number of iterations, the number of subproblems solved, the average time per subproblem and time to convergences, the iteration at which the best solution was found, and the objective function value (IP soln). Comparing this value with the IP solution reported in the last column of Table 9, we calculate that phase II improved the phase I solution by an average of 33.3%. The algorithm terminates when the improvement in 5 consecutive iterations is o0.01. Runtimes were as low as 50.9 min for Erie commercial and as high as 3354.4 min for Indianapolis combined, the largest instance with 11,557 rows. In all cases, the relaxed LP solutions were highly fractional so the algorithm found it extremely difficult to get integral solutions without extensive enumerations. Although the subproblems solved quickly, the fact that it was necessary to solve the Indianapolis combined subproblem 20,116 times resulted in a lengthy runtime. The last three columns in Table 10 give the total number of unique columns that were generated (because columns are removed periodically from MP, some of them may be regenerated at subsequent iterations), the final number of columns in the master problem upon convergence, and the total runtimes. The
last two columns are highly correlated. The reader interested in the specifics of how the algorithm works is referred to [3]. The final set of statistics concerns the makeup of the clusters and the degree of customer overlap at the end of phase II. The results are reported in Table 11 and are self-explanatory for the most part. The third column gives the number of aggregate locations in each cluster and the fourth column gives the average number of locations per cluster. The percent of locations that appear in only one cluster is included in column 6. On average, 82.6% of the locations meet this condition. The next three columns give a breakdown of the overlap, while the last column indicates the percentage of clusters that had no overlap. For example, for Erie commercial, of the 19 clusters in the solution, 5.3%, or 1 cluster, had a unique set of customers. References [1] Al-Sultan KS, Khan MM. Computational experience on four algorithms for the hard clustering problem. Pattern Recognition Letters 1996;17(3):295–308. [2] Bard JF, Jarrah AI. Large-scale constrained clustering for rationalizing pickup and delivery operations. Transportation Research Part B 2009;43(5):542–61. [3] Bard JF, Jarrah AI, Zan J. Validating vehicle routing zone construction using Monte Carlo simulation. European Journal of Operational Research 2010;206(1):73–85. [4] Daganzo CF. Logistics System Analysis. 4th ed. Berlin: Springer; 2005. [5] Daganzo CF, Erera A. On planning and design of logistics systems for uncertain environments. In: Speranza MG, Stahly P, editors. New trends in distribution logistics of lecture notes in economics and mathematical systems, vol. 480. Berlin: Springer-Verlag; 1999. [6] Daskin MS. Network and discrete location: models, algorithms, and applications. New York: John Wiley & Sons; 1995. [7] Drexl A, Haase K. Fast approximation methods for sales force deployment. Management Science 1999;45(10):1307–23. [8] Gaur V, Fisher ML. A periodic inventory routing problem at a supermarket chain. Operations Research 2004;52(6):813–22. [9] Jarrah AI, Bard JF. Pickup and delivery network segmentation using contiguous geographic clustering. Journal of the Operational Research Society 2011;62(10):1827–43. [10] Jarrah AI, Bard JF. Large-scale pickup and delivery work area design. Computers & Operations Research 2012;39(12):3102–18. ´ lafsson S. An optimization approach to partitional data [11] Kim J, Yang J, O clustering. Journal of the Operational Research Society 2009;60(8): 1069–84. [12] Langevin A, Soumis F. Design of multiple-vehicle delivery tours satisfying time constraints. Transportation Research Part B 1989;23(2):123–38. [13] Laporte G, Chapleau S, Landry P-E, Mercure H. An algorithm for the design of mailbox collection routes in urban areas. Transportation Research Part B 1989;23(4):271–80. [14] Lorena LAN, Senne ELF. A column generation approach for the capacitated p-median problems. Computers & Operations Research 2004;31(6):863–76. [15] Mehrotra A, Johnson E, G.Nemhauser GL. An optimization based heuristic for political districting. Management Science 1998;44(18):1100–14. [16] Mehrotra A, Trick M. Cliques and clustering: a combinatorial approach. Operations Research Letters 1998;22(1):1–12. [17] Mulvey JM, Beck MP. Solving capacitated clustering problems. European Journal of Operational Research 1984;18(3):339–48. [18] Negreiros M, Palhano A. The capacitated centred clustering problem. Computers & Operations Research 2006;33(6):1639–63. [19] Ouyang Y. Design of vehicle routing zones for large-scale distribution systems. Transportation Research Part B 2007;41(10):1079–93. [20] Rı´os-Mercado RZ, Ferna´ndez E. A reactive GRASP for a commercial territory design problem with multiple balancing requirements. Computers & Operations Research 2009;36(3):755–76. [21] Toth P, Vigo D. VRP with backhauls. In: Toth P, Vigo D, editors. The vehicle routing problem. SIAM Monographs on Discrete Mathematics and Applications; 2002. p. 195–224. [22] Vanderbeck F. On Dantzig–Wolfe decomposition in integer programming and ways to perform branching in a branch-and-price algorithm. Operations Research 2000;48(1):111–28. [23] Wolsey LA. Integer programming. New York: Wiley; 1998. [24] Young H. Measuring the compactness of legislative districts. Legislative Studies Quarterly 1988;13(2):105–15.