Available online at www.sciencedirect.com
Transportation Research Part E 44 (2008) 333–349 www.elsevier.com/locate/tre
Identifying geographically diverse routes for the transportation of hazardous materials Yashoda Dadkar a
a,1
, Dean Jones
b,2
, Linda Nozick
a,*
School of Civil and Environmental Engineering, Cornell University, Hollister Hall, Ithaca, NY 14853, United States b Sandia National Laboratories, P.O. Box 5800 MS 1138, Albuquerque, NM 87185-1138, United States Received 4 May 2006; received in revised form 16 October 2006; accepted 19 October 2006
Abstract Often, the carrier/shipper of hazardous materials is interested in a collection of routes with approximately the same performance so that they can switch between different routes to avoid exposing the same population and potentially as a security measure. We develop a K shortest path algorithm for which the performance of each highway facility, with respect to each objective, can be stochastic and can vary over time. We also devise a mixed integer program to identify a subset of paths, which represents an acceptable trade-off between geographic diversity and performance. These models are then applied to a realistic case study. 2006 Elsevier Ltd. All rights reserved. Keywords: Hazardous materials; Stochastic dynamic network; Routing; K shortest path problem; Genetic algorithm
1. Introduction The repetitive transportation of hazardous materials through a community often generates resentment in members of that community. For an excellent survey on hazmat transportation, see List et al. (1991), Current and Marsh (1993) and Erkut et al. (2006). As is observed in Linder-Dutton et al. (1991) if the shipper can be proactive in identifying routes that are equitable they are more likely engender trust in the communities through which they must pass and avoid some of the bitterness that can arise. Gopalan et al. (1990) develop a method to identify an equitable set of routes that minimize the total cost across the collection of routes while keeping the inequity across a predefined set of zones beneath some threshold. Linder-Dutton et al. (1991) extend the analysis in Gopalan et al. (1990) to optimize the order in which the routes developed by the analysis in Gopalan et al. (1990) are used.
*
1 2
Corresponding author. Tel.: +1 607 255 6496; fax: +1 607 255 9004. E-mail addresses:
[email protected] (Y. Dadkar),
[email protected] (D. Jones),
[email protected] (L. Nozick). Tel.: +1 607 254 2939. Tel.: +1 505 284 4886.
1366-5545/$ - see front matter 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.tre.2006.10.010
334
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
The carrier is concerned with finding routes that minimize the economic cost as well as the risk consequences of a release stemming from an accident. Much of the economic cost to the carrier is proportional to the time taken for the shipment to travel from the origin to the destination. Thus we assume that cost minimization can be obtained through minimizing the total travel time. Defining public risk is more complex, but two characteristics – population exposure and accident rate – have gained wide acceptance. Population exposure and accident rate can be combined into a single consequence measure for a route by summing the product of accident rate and exposure across all links in the route. This is the measure of consequence we adopt in this analysis. Erkut and Verter (1998), Erkut and Ingolfsson (2000) and Chang et al. (2005), among others, use the same measure in their hazmat routing studies. Neither the travel time nor the consequence measures are deterministic. They are inherently uncertain since they depend on characteristics like visibility, traffic volumes and activity patterns. Further they vary with the time of the day since the traffic characteristics vary throughout the day. Using the expected values of the attributes ignores the variation resulting from these two causes and could lead to selection of routes that have acceptable expected values but perform ‘‘poorly’’ based on when they are actually used (Nozick et al., 1997). Hence, we represent the uncertainty for each attribute by a probability distribution which varies by the time-of-day. In this application, we assume the carrier uses an objective that is a weighted sum of travel time and consequence, and that across multiple shipments, the carrier wants to be able to use a set of paths that are relatively distinct geographically. Of course, this creates the need for exploring different trade-offs. Paths that result in small changes in the value of the travel time/consequence measure are likely to vary only slightly from the shortest path geographically. To obtain nearly distinct paths geographically, much larger changes in the objective are likely to have to be tolerated. Our concern is to be able to identify the nature of this trade-off effectively in large networks. In order to find sets of paths, we develop a two-step process: develop a ‘‘large’’ set of paths to choose from and an optimization model to select a subset of those paths to use that are ‘‘geographically distinct’’. An algorithm to estimate the first K shortest paths in networks is developed where the routing attributes are random variables that vary according to the time of the day. To the best of our knowledge, Nielsen et al. (2005) is the only stochastic and dynamic K shortest path algorithm. They focus on problem instances for which the departure times are integer and the travel times on links are integer valued discrete random variables based on the departure time. We focus on the use of continuous distributions for link attributes because that is more appropriate to this application area. Also, their focus is on identifying the exact solution to the K shortest path problem. This leads to computational challenges in large networks. Our focus is on the development of a computationally efficient heuristic to estimate the solution in realistic networks. Therefore, we extend Yen’s (1971) deterministic and static K shortest path algorithm to accommodate stochastic and dynamic routing attributes. The new algorithm uses a convolution–propagation method to propagate the means and variances of the uncertain routing attributes along paths, and calculates the covariance between the routing attributes so that the quality of each path can be estimated. We extend Chang et al. (2005) to achieve this. It is also important that the paths identified be geographically distinct in addition to performing well with respect to the objectives. For the purposes of this analysis, the performance of a path is measured by the mean of the distribution of the weighted objectives in this analysis. In other circumstances it may be more appropriate to focus on other percentiles of the distribution. From the set of K shortest paths identified, we want to identify a subset of paths that represent an acceptable trade-off between geographic diversity and performance. Therefore, the second step of the analysis involves the development of an integer program to identify this subset of paths. For this purpose, we define percentage overlap as the ratio of the number of miles two paths have in common to the actual length of one of the paths. Thus the lower the percentage overlap between a pair of paths, the more geographically diverse the two paths are. Notice that this definition implies that the percentage overlap is not symmetric, but depends on the path to which the comparison is made. Depending on the type of hazardous material to be moved and the motivation for using a variety of routes, other measures of geographic diversity could be more appropriate. While Akgu¨n et al. (2000) use essentially the same metric as we use, they do note that sometimes using the number of people in the areas of overlap between the buffer zones of each route is more appropriate.
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
335
Carotenuto et al. (2004) develop a model with similar goals but use a different definition of geographic diversity and hence a different objective. They seek to select a subset of paths that minimize the total number of times multiple hazmat vehicles are on the same link at the same time and the total risk of the set of paths selected. Carotenuto et al. (2007) also focuses on selecting dissimilar paths using greedy add procedures. They enforce geographic diversity by limiting the maximum amount of risk any group of people can be exposed to. Our approach is similar to Kuby et al. (1997), Akgu¨n et al. (2000) and the follow-up paper Thyagarajan et al. (2005). They pursue a minimax objective and consider a metric based on the number of miles in common between the paths selected. Kuby et al. (1997) also includes an ad hoc mechanism to analyze the trade-off between the length of the paths selected and the geographic variety. Akgu¨n et al. (2000) focuses on only maximizing the minimum difference. Our model can be viewed as an extension of the models proposed in Akgu¨n et al. (2000) and Thyagarajan et al. (2005) for which trade-offs between geographic diversity and performance can be explored. We illustrate the application of the entire analysis process on a realistic case study of the highway transportation of hazardous material. The origin and the destination of these shipments are assumed to be Jackson, Mississippi and Tallahassee, Florida, respectively. The contribution of this paper is fourfold. First, we develop a computationally tractable K shortest path algorithm when the arc attributes are stochastic and dynamic and examine the computational performance of this algorithm on several test networks. This algorithm is useful in other contexts where there is a need to identify the K shortest paths in large-scale stochastic and dynamic networks. Second, we build on Akgu¨n et al. (2000) to identify a subset of geographically distinct paths by allowing for the effective exploration of the trade-offs between geographic diversity and performance. Third, we develop an integrated analysis that illustrates how a large set of routes (and schedules) can be developed and a subset of those routes (and schedules) can be selected to facilitate the movement of hazardous materials when the routing attributes are stochastic and dynamic and geographic diversity in the routes selected is important. Fourth, we illustrate the application of this methodology to a realistic case study. The next section describes the K shortest path (KSP) algorithm. The third section evaluates the computational performance and the quality of the solution produced by the algorithm. The fourth section develops an optimization model to select a subset of paths that represent an appropriate trade-off between path quality and geographic diversity. The fifth section develops a realistic case study to illustrate the analysis process. The sixth section describes key conclusions and opportunities for future research. 2. KSP algorithm The KSP algorithm for stochastic and dynamic arc attributes is comprised of two key elements. The first element is a method to construct a probability distribution of the performance of a path. The second element is a method to compute shortest paths such that each successive shortest path generated performs slightly worse. The next subsection focuses on estimating the probability distribution for path performance. The second subsection focuses on the algorithm developed to generate successive paths roughly in decreasing order of quality as measured by the mean of the composite measure. 2.1. Estimating path performance A key element in estimating K shortest paths in stochastic and dynamic networks is the development of a method to construct an estimate of the probability distribution of path performance (as measured by a distribution that is the weighted combination of travel time and the consequence measure). Chang et al. (2005) develop a method to construct an estimate of the probability distribution of travel time and one or more attributes along a path when the link attributes distributions are any continuous distributions. They assume that the distributions of the attributes along a path are approximately normally distributed and that there is independence in the attributes on successive links on a path. He et al. (2002) in a preliminary study using PARAMICS (simulation model) for 15 min in the peak period of a simulation of Irvine Center Drive finds that travel time along a path is approximately normally distributed. However, they do find evidence of correlation in the travel times across successive links. We do not include this extension due to the preliminary nature of the
336
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
results and since currently it is unclear how to obtain the data to estimate this correlation. However, this may become an important element of research in the future. The reader is referred to Chang et al. (2005) for a discussion of the method to estimate the probability distributions for routing attributes along a path when the link attributes distributions are any continuous distributions. We extend this method to include the estimation of the covariance between travel time and consequence measure in order to estimate a probability distribution that is a weighted average of travel time and the consequence measure. This enables us to develop an estimate of path performance. We use the same notation as Chang et al. (2005) to simplify the presentation. The remainder of this subsection presents our method developed to estimate the needed covariance. The covariance between Yj and Cj when the relative importance of the consequence measure to the travel time is W, is given by: CovðY j ; wC j Þ ¼ wðCovðY i ; C i Þ þ E½C i lðy i Þ E½C i E½lðy i Þ þ E½lc ðy i ÞY i Þ wðE½lc ðy i ÞE½Y i E½lc ðy i Þlðy i Þ þ E½lc ðy i ÞE½lðy i ÞÞ
ð1Þ
The terms Cov(Ci, Yi), E[Yi] and E[Ci] are as derived in Chang et al. (2005). The rest of the terms are determined as follows: E½lðy i Þ ¼ dij þ
L X
pli ql ij
ð2Þ
l¼1
E½lc ðy i Þ ¼
L X
pli clij
ð3Þ
l¼1
E½lc ðy i Þlðy i Þ ¼
L X
pli clij ql ij
ð4Þ
l¼1
Chang et al. (2005) uses a series of incomplete expectations of Yi within each interval l to calculate E[Yil(Yi)]. The same procedure can be extended for calculating E[lc(yi)Yi]: E½lc ðy i ÞY i ¼
L X
clij IEli
ð5Þ
l¼1
At first glance it may appear that E[Cil(yi)] is simply the product of E[Ci] and E[l(yi)]. Depending on how the parameters in the model are developed this may be true. However, if there is a relationship between the travel time and the consequence measure there may be some correlation. We develop a mechanism to calculate E[Cil(yi)] that includes the correlation. If no correlation exists the result will simply be the product of E[Ci] and E[l(yi)]. To facilitate the computation of E[Cil(yi)], we divide each interval l into E subintervals of equal duration. Let U el be the end of time subinterval e in time interval l. Let Tl be the end of time interval l. Now, if the path passes through node k immediately before reaching link ij the probability that Yi falls into the lth time interval at node i, pli can be calculated as a sum of the conditional probabilities that the path reached node i within a certain time interval l given that the path left node k within a certain subinterval e of interval r over all possible subintervals for each interval for node k X X e1 pli ¼ P T l1 6 Y i 6 T l ¼ ð6Þ P U r 6 Y k 6 U er P T l1 U e1 6 d ki 6 T l U er r r
Thus E½C i lðy i Þ ¼
e
XX P U e1 6 Y k 6 U er ðC k þ crki Þ r r
e
l e l1 d P T l1 U e1 6 d 6 T U þ q ki ij r r ij
ð7Þ
Computation of E[Cil(yi)] is illustrated with a small example. Assume that there are three nodes in series labeled 1, 2, 3 and that a shipment originates at Node 1 at 8 AM and is bound for Node 3. To make
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
337
Table 1 Routing attributes for example network Link
Type
Travel time
Consequence measure
Day
1–2 2–3
Urban Urban
Night
Day
Night
Mean
SD
Mean
SD
Mean
SD
Mean
SD
0.5 0.9
0.05 0.09
0.4 0.7
0.04 0.07
420 300
14 10
210 150
7 5
calculations easier to understand in this example, assume that the probability distributions of the routing attributes only change from day (8 AM–8 PM) to night (8 PM–8 AM). For this example, the distributions are assumed to be normal and are as listed in Table 1. Also assume that the relative importance of the consequence measure to the travel time (W) is 1 and that the values for the routing attributes are not normalized. The convolution–propagation method developed by Chang et al. (2005) is used to propagate the values for means and variances of travel time and consequence measure over the route 1–2–3. Eq. (1) is used to calculate the covariance between the travel time and the consequence measure for the route 1–2–3, i.e. Cov(Y3, C3). As mentioned earlier, all the terms except E[C2l(y2)] in this equation can be calculated using the convolution–propagation method developed by Chang et al. (2005) but calculating E[C2l(y2)] requires the use of conditional probabilities. To facilitate these estimations, each day is divided into 24 equal intervals of 1 h each and further this 1 h interval is divided into 20 equal subintervals. Since we are interested in calculating E[C2l(y2)], we focus on the consequence measure at Node 2 (C2), the travel time over link 2–3 (d23) and the conditional probabilities of arriving at Node 2 during a time interval given that the shipment left Node 1 during certain intervals. For example, the probability that the route arrives at Node 2 within the time interval 8–9 can be estimated as a sum of conditional probabilities P f8 6 Y 2 < 9g ¼ P f8 6 Y 2 < 9j7 6 Y 1 < 8g þ P f8 6 Y 2 < 9j8 6 Y 1 < 9g
ð8Þ
However, since Node 1 is the first node that the shipment started from, there is no uncertainty about when the shipment departed from it. However, for the sake of this explanation, assume that the departure time from Node 1 (Y1) follows the probability distribution N(8, 0.1). Thus the conditional probabilities can be estimated by calculating a series of probabilities for Y1 as well as d12 within the subintervals and then summing them over the multiple intervals as illustrated by the following equations: P f8 6 Y 2 < 9j7 6 Y 1 < 8g ¼ P f7:95 6 Y 1 < 8:0g P f0:05 6 d 12 < 0:1g þ P f7:90 6 Y 1 < 8:0g P f0:1 6 d 12 < 0:15g þ þ P f7:05 6 Y 1 < 8:0g P f0:95 6 d 12 < 1:0g P f8 6 Y 2 < 9j8 6 Y 1 < 9g ¼ P f8:0 6 Y 1 < 8:95g P f0:0 6 d 12 < 0:05g þ P f8:0 6 Y 1 < 8:9g P f0:05 6 d 12 < 0:1g þ þ P f8:0 6 Y 1 < 8:05g P f0:9 6 d 12 < 0:95g
ð9Þ ð10Þ
Note that we assume that waiting is not considered, i.e. the shipment leaves the node as soon as it reaches it. Thus for the case where d12 lies within subinterval 0.3–0.35 and the shipment reaches Node 2 during interval 8–9, Y1 should be within the subintervals 7.7–8.65 which is further split into two: 7.7–8.0 and 8–8.65 since the probability distributions for the routing attributes vary before and after 8 AM. According to our assumptions, at Node 1 travel time (Y1) has the probability distribution N(8, 0.1) and the probability distribution for travel time over link 1–2 (d12), depending upon whether the route left Node 1 during the time interval 7–8 or 8–9, is either N(0.4, 0.04) or N(0.5, 0.05). These probability distributions are used to estimate the conditional probabilities. Breaking the probability into conditional probabilities allows us to associate values of C2 and d23 with each subinterval and to find E[C2l(y2)] by summing the components for all possible combinations for d12 and Y1. C2 is given by C1 + c12(Y1) where the consequence measure for the link 1–2 (c12) is determined by the time at which the shipment departs from Node 1 (Y1). Similarly the travel time on link 2–3 (d23) is determined by the time at which the shipment arrives at Node 2, i.e. Y1 + d12
338
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
E½C 2 lðy 2 Þ ¼ P f8:0 6 Y 1 < 8:95g ðC 1 þ c12 ðY 1 ÞÞ P f0:0 6 d 12 < 0:05g d 23 ðY 1 þ d 12 Þ þ P f7:95 6 Y 1 < 8:0g ðC 1 þ c12 ðY 1 ÞÞ P f0:05 6 d 12 < 0:1g d 23 ðY 1 þ d 12 Þ þ P f8:0 6 Y 1 < 8:9g ðC 1 þ c12 ðY 1 ÞÞ P f0:05 6 d 12 < 0:1g d 23 ðY 1 þ d 12 Þ þ þ P f8:0 6 Y 1 < 8:05g ðC 1 þ c12 ðY 1 ÞÞ P f0:9 6 d 12 < 0:95g d 23 ðY 1 þ d 12 Þ
ð11Þ
Consequence measure (C1) at Node 1 is set to probability distribution N(0, 0) since Node 1 is the origin node. The probability distribution for consequence measure over link 1–2 (c12) depends upon whether the route left Node 1 during the time intervals 7–8 or 8–9 and is either N(420, 14) or N(210, 7). Thus the two corresponding components of the summation for E[C2l(y2)] in the case that d23 lies within subinterval 0.3–0.35 and that the route reaches Node 3 during interval 8–9 are: P f7:7 6 Y 1 < 8:0g ðC 1 þ c12 ðY 1 ÞÞ P f0:3 6 d 12 < 0:35g d 23 ðY 1 þ d 12 Þ ¼ 0:5 210 0:1 0:9 ¼ 9:45 P f8:05 6 Y 1 < 8:65g ðC 1 þ c12 ðY 1 ÞÞ P f0:3 6 d 12 < 0:35g d 23 ðY 1 þ d 12 Þ ¼ 0:5 420 0:1 0:9 ¼ 18:9 Repeat these set of calculations for all possible d12 and Y2 to estimate E[C2l(y2)] as per Eq. (11) and substitute this estimated value into Eq. (1) to get an estimate ofCov(Y3, C3). For this example, these estimates are 283.5 and 6.29, respectively. 2.2. Description of the algorithm This algorithm is an extension of Yen’s algorithm (1971) and it estimates the first K shortest paths in networks with a single routing objective and for which the performance of each link in the network with respect to that routing objective is stochastic and varies over time. Again, we use a weighted sum of two different attributes, which yields a single routing objective. We adopt the notation from Yen (1971) in our description of the new algorithm to facilitate the comparison of the original algorithm with this extension. Throughout the algorithm, we maintain two-ordered lists of paths. One is the list of estimated shortest paths found to date (e.g. if we have found X paths of the K this list has X paths). The second list is ‘‘extra paths’’ that have been identified but each time they were considered for inclusion on the first list they were worse than some other path also under consideration. We refer to the first list as list A and the second as list B. For an N-node network, we define the following notation: • i = 1, 2, . . ., N, be the nodes where (1) is the origin and (N) is the sink; • (1)–(i)– –(j), i 5 j 5 5 1, be the path from (1) to (j), passing through (i), . . .; • Ak = (1)–(2k)–(3k)– –(Qk) –(N), k = 1, 2, . . ., K, be the estimated kth shortest path from (1) to (N) where (2k), (3k), . . ., (Qk) are, respectively, the 2nd, 3rd, . . ., Qkth nodes of the estimated kth shortest path. To illustrate the algorithm we develop a small example network with 12 bidirectional links as shown in Fig. 1. The values for each link represent the distance in miles of the link and the type of the link (u implies urban link and r indicates rural). These values allow for estimations of probability distributions for the travel time and consequence measure for each of the links. We are interested in finding the paths from Node 1 to Node 7 departing Node 1 at 12 midnight. There are 32 paths from 1 to 7 in this example.
13, u 2
14, u
4 12, u 20, r
17, r
1
28, r 22, r
8, u
6 11, r
25, r
19, r 3
10, u
5
Fig. 1. Example network.
7
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
339
First, estimate the shortest path through the network for travel time and the consequence measure separately using the means of the distributions, respectively. We use an adaptation of the algorithm in Chang et al. (2005) to estimate the shortest path in stochastic and dynamic networks. The key adaptations are that we consider only one objective, we branch from the partial path with the smallest mean value and we apply the dominance rule to any two partial paths arriving at the same intermediate node regardless of the difference in the time of arrival of the two paths at that intermediate node. Let the mean of the travel time distribution and the mean of the consequence distribution for the estimated shortest paths with respect to travel time and the consequence measure be denoted pt and pc, respectively. Yp and Cp are defined as the travel time and the consequence measure for path p, respectively. The mean and variance of the distribution for path performance is then given by Eqs. (12) and (13), respectively: E½Y p wE½C p þ pt pc
ð12Þ
Var½Y p w2 Var½C p Cov½Y p ; C p þ þ 2w 2 2 pt pc pt pc
ð13Þ
where W is the relative importance of the consequence measure to the travel time. Notice that Eqs. (12) and (13) normalize the two characteristics, travel time and the consequence measure. Next, estimate the shortest path through the network when the objective is to minimize the mean of the distribution for the weighted average of travel time and the consequence measure (as given in Eq. (12)). Again, use the method described in the previous section to estimate the distribution for travel time and the consequence measure and the resultant distribution of performance. This yields A1. A1 = 1–3–5–7 for the example network. In order to estimate Ak, i.e. the kth (k = 2, 3, . . ., K) shortest path, the k 1 shortest paths (i.e. 1 A , A2, . . ., Ak1) have to be already estimated. Thus for the example network, assume that we are searching for the fourth path, i.e. k = 4 and the second and third paths are already estimated using this algorithm as 1–3–4–7 and 1–3–4–5–7, respectively. For each of i = 1, 2, . . ., Qk1, do the following: (Note that for our example, k 1 = 3 and Qk1 = 5.) (a) Check if the subpath consisting of the first i nodes of Ak1 in sequence coincide with the subpath consisting of the first i nodes of Aj in sequence for j = 1, 2, . . ., k 1. If so, eliminate the link iq- where (q) is the (i + 1)st node of Aj; otherwise, make no changes. Note that the links iq are eliminated for computations in iteration k only. They should be replaced before iteration k + 1 starts. We define: • Aki as the estimated shortest path among all the paths that branch from Ak1 at node (i). • Rki as the root of Aki , i.e. the subpath of Aki that coincides with Ak1 , i.e. (1)–(2k)–(3k)– –(ik) in Aki ; i k k k • S i as the spur of Ai , i.e. the end portion of Ai that has only one node coinciding with Ak1 , i.e. (ik)– – i k (N) in Ai . For our example, say we are looking at i = 3 and hence at Aki ¼ A43 . We check paths A1, A2 and A3 to see if any of these paths contains the root R43 , i.e. the subpath created by the first i = 3 nodes of A3, i.e. 1–3–4. Since A2 and A3 satisfy this criteria, we find the (i + 1)st, i.e. the 4th node of each, i.e. Node 5 and Node 7, respectively, and eliminate the links 4–5 and 4–7. A1 is not considered since it does not contain the subpath 1–3–4. Thus 1–3–4 is the root ðR43 Þ for A43 . Note that the links 4–5 and 4–7 will be reinstated before the next iteration. (b) Estimate the shortest path from (i) to (N), not allowing it to pass through those nodes that are included in the root or through those links that are eliminated in step (a). The former ensures that Aki is loopless, i.e. does not contain any node more than once and the latter ensures that the new path found is not the same as any of the older paths found so far. Note if there is more than one subpath from (i) to (N) that have the minimum length, choose any arbitrary one of them and denote it by S ki . Again, use the methods in the previous subsection to estimate the shortest path and the normalization given in Eq. (12). For our example, say we are still looking at i = 3 and hence at Aki ¼ A43 . We want to find the spur ðS 43 Þ for A43 . The spur is found by estimating the shortest path from Node 4 to Node 7 without using the links 4–5
340
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
and 4–7 or any of the nodes contained in the root, i.e. Node 1 and Node 3. Note that the probability distributions for the travel time and the consequence measure at Node 4 are as propagated for the root R43 (1–3–4) and are used to estimate the shortest path as 4–6–7. Thus 4–6–7 is the spur ðS 43 Þ for A43 . (c) Find Aki by joining Rki and S ki . Then add Aki to List B. For our example, at i = 3 we found R43 ¼ 1–3–4 as the root and S 43 ¼ 4–6–7 as the spur for A43 . Thus 4 A3 ¼ 1–3–4–6–7. We add this path to List B. Find from List B the path that has the minimum mean composite measure (using the normalization given in Eq. (12)). Move this path to List A and repeat the procedure till k = K, i.e. K paths are found. It is important to note that this procedure is a heuristic because at each step we use shortest path calculations that are only estimates. 3. Solution quality and computational performance To evaluate the computational performance and solution quality of this algorithm, we evaluate the use of the algorithm on nine sample networks. The objective is again to minimize the weighted average of travel time and the consequence measure. For the purposes of this analysis we assume the following: 1. Link travel time is modeled as a sum of free-flow time plus an exponentially distributed random delay with a mean of 10% of the free flow travel time. Even though the random delay on a link is modeled as a constant plus an exponential random delay we assume that the travel time across a path is approximately normal based on the Central Limit Theorem and as also assumed in Chang et al. (2005). 2. Accident probability varies according to a gamma distribution based on the work of Nembhard (1994). The scale parameter is based on whether the link is urban or rural and whether the link is traversed in the day or night. The day-time accident rate is assumed to hold between 8 AM and 8 PM and the night-time accident rate is assumed to hold between 8 PM and 8 AM based on the relative changes in heavy truck accident rates found by Harwood et al. (1993). 3. The major hazards are fire or vapor exposition, or both, in the event of an accident. Thus the major at risk population is other traveler on the highway near the shipment, rather than the population residing at some distance from the road. We assume that the on-link exposure can be described by the normal distribution and varies for urban freeways and rural freeways according to the hourly distributions developed by Tittemore et al. (1972) and New York State Department of Transportation (1982), respectively. Table 2 reports the number of nodes, arcs and the total number of paths in the networks as identified by path enumeration. For example, the first network has 153 nodes and 179 bidirectional arcs. There are almost 900,000 paths from the source to the sink node in this network. A key indicator of solution quality is the percentage of paths correctly identified by the algorithm for a given number requested. For example, if the top 1000 shortest paths are requested how many of the actual
Table 2 Dimensions of the sample networks Network ID
Number of nodes
Number of bidirectional arcs
Number of paths
1 2 3 4 5 6 7 8 9
153 172 148 91 89 80 52 49 55
179 201 175 106 107 93 62 57 63
896,458 923,776 2,055,450 3344 2876 1437 323 212 130
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
341
1000 max. paths identified paths correctly identified
Number of paths correctly identified
800
600
400
200
0 0
200
400
600
800
1000
Number of paths estimated
Fig. 2. Number of paths correctly identified for sample network 1.
Table 3 Values of K, percentage of paths correctly generated and computation time for the sample networks Sample network
Maximum K requested
Percentage of paths correctly generated
Time taken to generate K paths (s)
1 2 3 4 5 6 7 8 9
1000 1000 1000 500 500 500 100 100 100
97.8 98.8 89.4 98.4 98.8 99.1 99.8 98.5 99.7
420 258 294 46 64 34 3 4 2
top 1000 shortest paths are correctly identified? Fig. 2 illustrates this calculation for network 1 for K from 1 to 1000. The solid line indicates for each number K from 1 to 1000 how many paths were correctly identified. The dashed line gives the maximum number of paths, which could be correctly identified, i.e. K. The ratio of the area of the triangles created by the solid and the dashed lines with the X-axis is a measure of the percentage of paths correctly identified when a fixed number of paths are requested. For this network this ratio is about 98%. Table 3 identifies for each network the maximum number of paths to be estimated by the algorithm, the average number of paths correctly identified for a range of values from 1 K as well as the computation time to estimate the K shortest paths for each sample network. All code is programmed in C++ and all the experiments are run on a Pentium 4 3.4 GHz PC. In general, the algorithm has correctly generated the appropriate paths in the correct order. 4. Path selection This section develops a mixed integer program to identify a fixed number of paths, N, which represent an acceptable trade-off between the geographic diversity in the routes and the distribution of the composite measure. Geographic diversity in the routes is measured by the largest overlap between any two paths in the set of N paths. The quality of the set of N paths with respect to the composite measure is given by the maximum value for the mean of the distribution. The MIP is then formulated as follows:
342
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
min
o þ bm
ð14Þ
such that
ðxi þ xj 1Þaij o 6 0 8i; j bi xi m 6 0 8i X xi ¼ N
ð15Þ ð16Þ
o;m
ð17Þ
i
xi ¼ 0; 1
8i
ð18Þ
where aij is the percentage overlap of path i with path j, bi is the normalized value of the composite measure for path i, b reflects the relative importance of the maximum values of the composite measure mean in comparison to the value for the maximum percentage overlap across the subset of paths, N is the number of paths to be selected, xi is an indicator variable that takes on a value of 0 if the path is not selected and a value of 1 if it is, o is the maximum overlap across the identified subset of paths and m is the maximum value of composite measure mean. Eq. (15) defines the relationship between the value of maximum overlap and the paths selected. Eq. (16) defines the relationship between the largest value of the composite measure mean and the paths selected. Eq. (17) limits the number of paths selected. Eq. (18) requires that each path either be selected or not. By varying the value of b the trade-off frontier between performance and geographic diversity can be obtained. This MIP has K binary variables if the first K shortest paths are used. In the interest of program size, however, it may be more appropriate to generate a very large number of paths using a K shortest path algorithm and then use every ith path in the optimization where i could be 20 or 50, etc. Akgu¨n et al. (2000) also uses all of the K shortest paths. They also experiment with using an iterative penalty method which is similar to using every ith path in a K shortest path solution. We use a genetic algorithm (GA) to estimate the optimal solution. Since we are interested in a set of N paths, the string used in GA consists of the N path IDS included in this set. At each iteration, the current population is changed through crossover and mutation into an offspring population. Two parents (a mother and a father) are chosen from the current population using tournament selection, i.e. they are chosen at random, biased by solution quality as measured by the value of the objective function. A crossover point R is chosen at random such that 0 6 R 6 N. An offspring is produced using the first R entries of the mother and R–N entries from the father chosen sequentially starting from the end of the string such that there are no repetitions. If mutation occurs, an entry in the offspring string is chosen at random and changed to different random path ID that does not already exist in the string. We use elitism to transfer a fixed number of the best solutions from the parent population to the offspring generation without crossover or mutation. The elitism mechanism guarantees that a fixed number of the best population members are carried forward. The trade-off frontier can be generated by running the GA multiple times for different values of b. However, we take an alternative approach. That approach is to adapt the fitness function such that the GA solves the optimization for all values of b simultaneously. Each solution to the optimization can be described by the maximum overlap and the maximum average composite measure. Hence, the fitness function used is the perpendicular distance of these two measures to a piecewise linear estimate of the efficient frontier updated based on each successive generation created. 5. Case study To illustrate the use of the algorithm described above on a complete analysis, we consider routing a hypothetical shipment from Jackson, Mississippi to Tallahassee, Florida over the highway network. We wish to identify five paths that represent an acceptable balance between geographic diversity and performance from a set of 10,000 paths. The network is shown in Fig. 3 and has 1173 nodes and 1403 bidirectional links where the interstate roads are highlighted. It is roughly 10 times larger than the largest network considered in Section 3. We assume that the shipment departs Jackson at 7 AM and that we generate 10,000 paths, using the algorithm described above, to choose from. The case study focuses on a single departure time. It is straightforward to extend the analysis to simultaneously consider a variety of different departure times.
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
343
Fig. 3. The case-study network.
For the purposes of the case study, we use the same assumptions to develop the distributions for travel time, accident probability and population exposure as those used for the sample networks described in the previous section. We focus on minimizing the mean of the distribution for the composite measure when both travel time and the consequence measure is equally important. In other applications and depending on how the underlying attribute distributions are constructed, it may be important to consider the means and some measure of variability. The normalization used to calculate bi is as given in Eq. (19), where p is an index over all paths and as previously defined, Yp and Cp denote the travel time and the consequence measure for path p, respectively 0 1 E½Y i min E½Y p E½C i min E½C p p p A þ ð19Þ bi ¼ 50 @ maxp E½Y p min E½Y p maxp E½C p min E½C p p
p
Notice that the term in the parenthesis in Eq. (19) has a natural range from 0 to 2. Since the percentage overlap has a natural range from 0% to 100% we rescale the term in parenthesis in Eq. (19) by multiplying by 50 so the values of bi also have a natural range of 0–100. The string used in the GA to solve the optimization described in the previous section is five entries long and each entry is a path ID number. We use one point crossover with 75% probability, tournament selection, elitism, 50% probability of random mutation, 30 trials, 100 individuals in a generation, and 20,000 generations. The fitness function described in the previous section is used. Fig. 4 illustrates the tradeoff frontier between maximum overlap and the maximum mean for the composite measure. 50
Maximum Measure
40
(22.77%, 25)
30
20
(34.25%, 12.2)
10
0 20
30
40
50
60
70
80
90
100
Maximum Overlap (%)
Fig. 4. Trade-off frontier between maximum overlap and maximum measure.
344
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
Two reasonable solutions in Fig. 4 are (22.8%, 25) and (34.3%, 12.2). Figs. 5 and 6 display the paths represented by these two points. Tables 4 and 5 give the characteristics of the paths represented by these two solutions. For example, path 2614 is included in the set of five paths that correspond to the solution (22.7%, 25). This path has a composite measure with a mean of 12.91 and variance of 29.55. This path extends east on I-20, then east on US 80, south on US 43, southeast on US 84, south on US 331 and finally east on I-10 into Tallahassee, Florida. The largest overlap this path has with any of the other four paths in the solution is about 23% with path 8053. Path 8053 similarly, begins via west bound I-10 but rather than turning north on US 331 continues on I-10 through Georgia and into Mississippi before turning north on US 49. This second path, 8053 is shorter but since it extends into the urban areas of Mobile, Alabama and Biloxi, Mississippi it has higher exposure than path 2614. It is useful to comment that from Figs. 5 and 6, some pairs of routes look quite similar but their performance appears to be somewhat different. For example consider paths 8185 and 2117. From the figures, these paths appear to be the same except for a small portion of the route near Mobile, Alabama. The mean of the
Fig. 5. Maps of paths for (22.7%, 25).
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
345
Fig. 6. Maps of paths for (34.3%, 12.2).
Table 4 Probability distributions estimated for paths for (22.7%, 25) Path ID
2614 7641 8053 8067 8185
Travel time
Consequence measure
Composite measure
Mean (h)
Variance (h2) (102)
Mean (people– hours) (102)
Variance (people– hours)2 (·104)
Mean
Variance
8.68 8.72 8.26 8.81 8.69
1.49 2.87 2.52 1.57 2.74
26.85 31.80 31.98 33.23 33.90
12.43 17.72 20.29 15.12 21.49
12.91 20.68 20.53 17.12 24.94
29.55 44.64 48.30 36.46 51.55
Maximum percentage overlap (with path ID)
22.71 22.11 22.77 22.25 22.44
(8053) (2614) (8185) (2614) (8053)
346
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
Table 5 Probability distributions estimated for paths for (34.3%, 12.2) Path ID
995 2056 2117 2617 3240
Travel time
Consequence measure
Composite measure
Maximum percentage overlap (with path ID)
Mean (h)
Variance (h2) (102)
Mean (people– hours) (102)
Variance (people– hours)2 (·104)
Mean
Variance
8.64 8.06 8.50 8.56 8.49
1.51 2.75 3.65 3.16 2.62
26.20 28.52 27.26 27.22 27.96
12.06 12.30 15.41 15.02 12.09
11.42 7.67 11.20 11.92 12.19
28.78 32.39 39.68 38.71 30.89
33.50 33.36 28.62 33.99 34.25
(2617) (2056) (995) (3240) (2617)
composite measure for 8185 is 24.94 where as the mean for the composite measure for 2117 is only 11.20. In reality these paths overlap by about 70%, considerably less than is obvious from the figures. Further, 8185 is more heavily urban and therefore it has larger values for the consequence measure. Figs. 7 and 8 show the probability distributions for the composite measure for the five different paths for each solution. As the emphasis on the maximum value of the mean of the consequence distribution across the five paths increases, the paths become less geographically distinct as illustrated by comparing Figs. 5 and 6. As the paths become less geographically distinct, the composite measure probability distributions become more similar. This can be observed in the two solutions illustrated below by noticing that in Fig. 8 the five distributions are more similar and are shifted to the left in comparison to those in Fig. 7. It is important to notice that the probability distributions are non-zero for negative values. This is the result of the assumption of normality.
0.2
Probability
2614 0.15
7641 8053
0.1
8067 8185
0.05 0 -10
0
10
20
30
40
50
Composite Measure Fig. 7. Probability distributions for composite measure at (22.8%, 25).
0.2
Probability
995 0.15
2056 2117 2617 3240
0.1 0.05 0 -10
0
10
20
30
40
50
Composite Measure Fig. 8. Probability distributions for composite measure at (34.25%, 12.2).
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
347
Probability
0.4 0.3 0.2 0.1 0 -10
10
0
20
30
40
Composite Measure Fig. 9. Probability distribution for the composite measure obtained both analytically and via simulation for path 2614.
Fig. 9 illustrates the probability distribution for path 2614 estimated analytically and an estimate of that same probability distribution developed using simulation. The mean obtained through simulation and analytically is about 13.1 and 12.9, respectively. Similarly, the variance obtained through simulation and analytically is about 33.4 and 29.6, respectively. 6. Conclusions and opportunities for further research The objective of this paper was to develop a computationally efficient method to identify a small set of routes for use in the routing of hazardous materials shipments. In order to achieve this we develop a K shortest path algorithm where the arc attributes are stochastic and vary over time. This algorithm is based on the integration of the notion of propagating means and variances of attribute distributions across a path as developed in Chang et al. (2005) and the structure for generating K shortest paths for static and deterministic networks as developed by Yen (1971). This results in a very efficient algorithm for generating successive paths in a network from a fixed origin to a fixed destination where each successive path is, in general, slightly worse than the previous path. The K shortest path algorithm is used to generate a large set of routes which is used as input to a mixed integer optimization problem. The mixed integer optimization seeks to identify a relatively small subset of routes of fixed size so as to strike a balance between the quality of the worst path and the largest amount of overlap between any two paths. This balance is defined by the decision-maker’s judgment about the relative importance of solution quality and geographic diversity. We develop a genetic algorithm for use as the solution procedure. Finally, a case study on a large network illustrates that the methods can be applied in a realistic setting. This paper contributes to the literature by developing a computational efficient K shortest path algorithm for stochastic and dynamic networks. This algorithm is valuable in a variety of application areas including the routing of hazardous materials. This paper extends the formulation to select dissimilar paths given in Akgu¨n et al. (2000) and Thyagarajan et al. (2005) to allow for the estimation of the trade-off frontier between geographic diversity and performance. Finally, this paper creates and illustrates a methodology to identify geographically diverse paths for the routing of hazardous materials and applies that methodology to a real-world case study. There are at least three areas for additional research. The first area is the development of a K shortest path algorithm for which geographic diversity is considered in the development of the paths. K shortest path algorithms generally apply a fixed set of calculations against a good path to generate a new path which is generally just a little worse in quality than the original path. Typically this results in a minor modification to the original path. Though every iteration of the algorithm generates a different path, it typically takes many iterations to produce a substantially different path. If the goal is to produce a stream of paths, which are simply an ordering of many of the paths in the network based on solution quality, this is a very reasonable procedure. However, if only a small set of paths, which are geographically diverse, is needed, many of the paths generated via a K shortest path algorithm must be discarded. Research, which effectively integrates geographic diversity into
348
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
the process of generating alternative paths, could prove very useful. We have overcome this difficulty by creating a post-processor, which is the mixed integer program to integrate the goal of geographic diversity into the analysis. If the path generation algorithm were reconfigured to include geographic diversity, this second step might not be needed. A second important area for additional research is the development of better data for use in these types of algorithms. The analyses we have done as part of this research demonstrate the calculations, which are needed, and the data we have used allow us to perform those calculations. However, the data we have used are the result of blending the results of several different data collection efforts for different purposes and often on slightly different facilities or types of vehicles. This is useful as an illustration of the analysis but we could improve the quality of decision-making in this area substantially by collecting data, which is more accurate to the goals of the analysis. A final important area for additional research is the development of better visualization tools. All of the examples in this paper have assumed that the decision-maker knows the relative importance of each of the attributes. These relative importance values are then used to guide the K shortest path algorithm to generate the appropriate paths. They are also used in the optimization for the valuation of each route-departure time combination. In practice the decision-maker may have a general sense of their preferences between each of the attributes; however, they will likely still want to understand how the final subset of routes might vary based on changes in these relative priorities. Developing an understanding of this is difficult because each route-departure time combination has multiple probability distributions that describe its performance. When the decisionmaker is confronted with multiple sets of route-departure time combinations along with the geography for each route it is difficult for a human to identify the key differences. Therefore the development of effective representations for summary information is important along with the development of effective measures for the difference between two sets of paths. Acknowledgments The authors are grateful to two anonymous referees for their useful and valuable comments that helped improve this manuscript significantly. References Akgu¨n, V., Erkut, E., Batta, R., 2000. On finding dissimilar paths. European Journal of Operational Research 121, 232–246. Carotenuto, P., Galiano, G., Giordani, S., 2004. A metaheuristic approach for hazardous materials transportation. In: Fahrion, R., Oswald, M., Reinelt, G. (Eds.), Operations Research Proceedings OR 2003. Springer, Berlin, pp. 119–126. Carotenuto, P., Giordani, S., Ricciardelli, S., 2007. Finding minimum equitable risk routes for hazmat shipment. Computers Operations Research 34, 1304–1327. Chang, T., Nozick, L.K., Turnquist, M.A., 2005. Multi-objective path finding in stochastic dynamic networks. Transportation Science 39, 383–399. Current, J., Marsh, M., 1993. Multi-objective transportation network design: taxonomy annotation. European Journal of Operational Research 65, 4–19. Erkut, E., Ingolfsson, A., 2000. Catastrophe avoidance models for hazardous materials route planning. Transportation Science 34, 165– 179. Erkut, E., Verter, V., 1998. Modeling of transport risk for hazardous materials. Operations Research 46, 625–642. Erkut, E., Tjandra, S., Verter, V., 2006. Hazardous materials transportation. In: Barnhart, C., Laporte, G. (Eds.), Handbooks in Operations Research & Management Science: Transportation, vol. 14. Gopalan, R., Kolluri, K., Batta, R., Karwan, M., 1990. Modeling equity of risk in the transportation of hazardous materials. Operations Research 38, 961–973. Harwood, D.W., Viner, J.G., Russell, E.R., 1993. Procedure for developing truck accident release rates for hazmat routing. Journal of Transportation Engineering 119, 189–199. He, R., Liu, H., Kornhauser, A., Ran., B., 2002. Study travel time variability from probe vehicle data. In: Proceedings of the 7th International Conference on Applications of Advanced Technology in Transportation, Cambridge, MA, pp. 16–23. Kuby, M., Zhongyi, X., Xiaodong, X., 1997. A minimax method for finding the k best differentiated paths. Geographical Analysis 29, 298–313. Linder-Dutton, L., Batta, R., Karwan, M., 1991. Equitable sequencing of a given set of hazardous materials shipments. Transportation Science 25, 124–137.
Y. Dadkar et al. / Transportation Research Part E 44 (2008) 333–349
349
List, G., Mirchandani, P.B., Turnquist, M.A., Zografos, K.G., 1991. Modeling analysis for hazardous materials transportation: risk analysis, routing/scheduling, facility location. Transportation Science 25, 146–156. Nembhard, D.A., 1994. Heuristic path selection in graphs with non-order preserving reward structure, Ph.D. dissertation, The University of Michigan, Ann Arbor, MI. New York State Department of Transportation, 1982. Estimation of current traffic, Internal Memorandum, Albany, NY. Nielsen, L.R., Pretolani, D., Andersen K.A., 2005. K-shortest paths in stochastic time-dependent networks. Working Paper. Nozick, L.K., List, G.F., Turnquist, M.A., 1997. Integrated routing scheduling in hazardous materials transportation. Transportation Science 31, 200–215. Thyagarajan, K., Batta, R., Karwan, M., Szczerba, R.J., 2005. Planning for dissimilar paths for military units. Military Operations Research 10, 25–41. Tittemore, L.H., Birdsall, M.R., Hill, D.M., Hammond, R.H., 1972. An analysis of urban area travel by time of day, report FH-11-7519, US Department of Transportation, Federal Highway Administration, Washington DC. Yen, J.Y., 1971. Finding the K shortest loopless paths in a network. Management Science 17, 712–716.