Accident Analysis and Prevention 135 (2020) 105358
An optimal network screening method of hotspot identification for highway crashes with dynamic site length
Jinwoo Lee (a), Koohong Chung (b), Ilia Papakonstantinou (c), Seungmo Kang (b), Dong-Kyu Kim (d, *)
(a) The Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, 193, Munji-ro, Yuseong-gu, Daejeon, 34051, Republic of Korea
(b) School of Civil, Environmental and Architectural Engineering, Korea University, 145 Anam-ro, Seongbuk Gu, Seoul, 02841, Republic of Korea
(c) Department of Civil and Urban Engineering, New York University, Brooklyn, NY, 11201, United States
(d) Department of Civil and Environmental Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
* Corresponding author.
ARTICLE INFO
Keywords: Highway Safety; Hotspot Identification; Network Screening; Dynamic Site Length; Optimization; Empirical Bayesian Estimate
ABSTRACT
We propose a novel network screening method for identifying hotspots (i.e., sites that suffer from high collision concentration and have high potential for safety improvement). The method is based on an optimization framework that maximizes the sum of a selected safety measure over all hotspots, subject to a resource constraint for conducting detailed engineering studies (DES). The proposed method allows the length of each hotspot to be determined dynamically based on constraints the users impose. The Dynamic Site Length (DSL) method is solved with Dynamic Programming, which is shown to find a close-to-optimal solution with computationally feasible complexity. The screening method is demonstrated using historical crash data from extended freeway routes in San Francisco, California. Using the Empirical Bayesian (EB) estimate as a safety measure, we compare the performance of the proposed DSL method with two conventional screening methods, Sliding Window (SW) and Continuous Risk Profile (CRP), in terms of their optimal objective value (i.e., performance in detecting sites under the highest risk). Moreover, their spatio-temporal consistency is compared through the site and method consistency tests. Findings show that DSL can outperform SW and CRP: it investigates more hotspots under the same amount of resources allocated to DES, pinpoints hotspot locations with greater accuracy, and shows improved spatio-temporal consistency.
1. Introduction
In an effort to alleviate the adverse effects of traffic crashes (Murray et al., 1996) with limited government resources, government agencies developed data-driven procedures (AASHTO, 2010) to detect hotspots (i.e., sites that suffer from high collision concentration and have high potential for safety improvement) and to allocate resources so as to maximize the reduction in crash frequency (Harwood et al., 2010). The performance of such an organized effort depends heavily on the type of network screening procedure and safety measure. There is a rich literature comparing various safety measures in the hotspot identification (HSID) process (e.g., Cheng and Washington, 2005; Montella, 2010) in terms of their performance consistency, applicability under data availability, and appropriateness for different purposes of use. Moreover, to better predict the safety level, numerous studies have sought to overcome the limitations of the currently used methods; these are well documented in review papers such as Mannering and Bhat (2014), Savolainen et al. (2011), and Lord and Mannering (2010). However, finding a better screening method has received less attention than developing a better safety measure, even though some studies emphasize that improving screening methods can bring additional benefits in terms of robustness and consistency in HSID (e.g., Grembek et al., 2012; Kwon et al., 2013; Medury and Grembek, 2016; Chung and Ragland, 2019). The Simple Ranking (SR) and Sliding Window (SW) methods are commonly used network screening techniques (AASHTO, 2010; Harwood et al., 2010) associated with a strict definition of candidate sites in terms of their lengths and endpoint locations. One of the critical limitations of the SR method, which is based on pre-segmented candidate sites, is that it does not consider the crash frequency adjacent to a site, so it can select only a portion of a potential hotspot. Also, for both methods, the resulting hotspots can vary markedly depending on the predetermined site length (Kwon et al., 2013). If hotspot
https://doi.org/10.1016/j.aap.2019.105358 Received 17 March 2019; Received in revised form 20 September 2019; Accepted 5 November 2019 0001-4575/ © 2019 Published by Elsevier Ltd.
length is set too short, these methods can suffer from a high false positive rate due to random fluctuation in the data; that is, a highway section that is not actually risky can be detected as a hotspot. Such false positives unnecessarily increase the costs of Detailed Engineering Studies (DES) and delay the investigation of real hotspots. Increasing the investigation site length to mitigate the false positive rate can increase the false negative rate when a true hotspot is short relative to the investigation site: it may go undetected because its significance is averaged out along the investigation window. To address the shortcomings of the SR and SW methods, Chung and Ragland (2007) developed the Continuous Risk Profile (CRP), which estimates robust crash frequencies by filtering out the noise in the data and identifies hotspots with dynamic site length. A hotspot is identified where its CRP values are higher than a threshold defined as the upper confidence interval of the safety performance function (SPF), which refers to the observed mathematical relationship between a safety outcome, typically crash frequency, and explanatory variables of similarly grouped highways, ramps, and intersections (Tegge et al., 2010). Although the CRP is reported to be reproducible over multiple sites (Chung et al., 2009) and can address some of the limitations of other network screening procedures (Grembek et al., 2012), the endpoint locations of its hotspots can still be affected by random fluctuations and by bias in the SPF (Lee et al., 2016). Biased hotspot endpoint locations become an issue when the agency budget for site investigation is limited to a certain covered length or number of hotspots. To address this issue, Medury and Grembek (2016) proposed a screening method based on Dynamic Programming (DP), which maximizes the frequency of crashes involving pedestrians within identified hotspots of flexible site length.
This procedure, however, does not correct the regression-to-the-mean (RTM) bias but only counts the naïve number of crashes, and agencies' resource constraints are not considered. To this end, this paper develops a novel Dynamic Site Length (DSL) method for detecting hotspots that can maximize the value of any kind of safety performance measure, such as the expected average crash frequency with empirical Bayesian (EB) adjustment, which accounts for the RTM bias effectively (Hauer, 1997; Hauer et al., 2002b; Huang et al., 2009; Montella, 2010; Wu et al., 2014). The proposed DSL method can also be used to strategically allocate resources and determine the number of hotspots to be investigated annually. The problem formulation with dynamic site length is presented in Section 2, followed by a detailed explanation of the proposed method. Section 3 discusses the logic behind the DSL method. The performance of the DSL method is measured by the evaluation tests explained in Section 4. The results of the performance tests are reported in Section 5. This paper ends with brief concluding remarks in Section 6.
2. DSL screening method
The proposed DSL screening method identifies hotspots with dynamic site length so as to maximize the value of the selected safety measure. Hotspots are detected along an extended highway system consisting of $K$ corridors, each of which spans the range $x_{init}^k \le x^k \le x_{term}^k$, where $x^k$ is the post-kilometer of a location on the $k$th corridor for all $k = 1, \ldots, K$. The decision variables include the number of hotspots on the $k$th corridor and within the whole network, denoted by $N^k$ and $N$ respectively, and the upstream and downstream endpoints of the hotspots on each corridor, designated by $S^k = \{s_1^k, \ldots, s_{N^k}^k\}$ and $E^k = \{e_1^k, \ldots, e_{N^k}^k\}$ for all $k$. The hotspots chosen within a corridor are indexed starting from upstream, and overlapping of hotspots is not allowed, i.e., $x_{init}^k \le s_n^k < e_n^k < s_{n+1}^k < e_{n+1}^k \le x_{term}^k$, $\forall n < N^k$. The objective is to maximize the total safety measure, denoted by $y$, within the length covered by the selected hotspots, as expressed in Equation 1.

$$y^* = \max_{N,S,E} y_{N,S,E} = \max_{N^k,S^k,E^k,\,\forall k} \sum_{k=1}^{K} y^k_{N^k,S^k,E^k} = \max_{N^k,S^k,E^k,\,\forall k} \sum_{k=1}^{K} \sum_{n=1}^{N^k} y_n^k(s_n^k, e_n^k), \tag{1}$$

where $y_{N,S,E}$ refers to the total safety measure covered by the selected hotspots from the whole system, i.e., the objective value to maximize by determining the decision variables $N$, $S$, and $E$; $y^k_{N^k,S^k,E^k}$ denotes the total safety measure covered by the selected $N^k$ hotspots from corridor $k$; $y_n^k$ stands for the safety measure of hotspot $n$ on the $k$th corridor between its endpoints $s_n^k$ and $e_n^k$; $S$ represents the set of $S^k$ for all $k$; $E$ is defined as the set of $E^k$ for all $k$; $S^k$ is the set of upstream endpoints of the selected hotspots from the $k$th corridor, $S^k = \{s_1^k, \ldots, s_n^k, \ldots, s_{N^k}^k\}$; and $E^k$ stands for the set of downstream endpoints of the selected hotspots from the $k$th corridor, $E^k = \{e_1^k, \ldots, e_n^k, \ldots, e_{N^k}^k\}$.

Equation 2 shows the constraint on the total length of hotspots that can be investigated within the limited agency budget. It states that the sum of site lengths over all hotspots identified from the $K$ corridors, denoted by $L_{N,S,E}$, should not exceed the given value $B$ that is reserved for conducting DES (i.e., safety investigation).

$$\text{s.t.} \quad L_{N,S,E} = \sum_{k=1}^{K} \sum_{n=1}^{N^k} (e_n^k - s_n^k) \le B. \tag{2}$$

If the objective value is the same for multiple optimal sets of $(N, S, E)$ from the above optimization problem, it would be more economical to select an option with lower agency costs. In this case, the agency cost is considered as the lower-level objective in the bi-level hierarchical optimization, as well as being constrained by $B$. The original objective function is modified as Equation 3 by introducing a positive infinitesimal term, $\varepsilon \to 0^+$.

$$P^* = \max_{N,S,E} \left[ y_{N,S,E} - \varepsilon L_{N,S,E} \right], \tag{3}$$

where $P$ represents the objective value of the bi-level hierarchical optimization. The upper limit of site length shown in Equation 4 considers the future countermeasure activity applied on a site (Medury and Grembek, 2016), while Equation 5 shows the minimum length of a hotspot. The minimum length condition ensures that hotspots too short to have meaningful statistical significance are not selected. The impacts of the values of the site-specific constraints, Equations 4 and 5, on the optimal results are evaluated both analytically in Section 3 and empirically in Section 5.

$$e_n^k - s_n^k \le l_{max}, \quad \forall n, k; \tag{4}$$

$$e_n^k - s_n^k \ge l_{min}, \quad \forall n, k. \tag{5}$$

As a special case of fixed site length, if $l_{max} = l_{min}$, the maximization of the sum of the selected safety measure (defined in Equation 1 or Equation 3) is equivalent to the identification of the locations with the highest safety measure. Also, Equation 2 becomes identical to the traditional HSID constraint shown in Equation 6, which limits the total number of selected sites, e.g., the top 10 identified locations. Here, $N^u$ stands for the upper limit of the site number.

$$\text{s.t.} \quad N = \sum_{k=1}^{K} N^k \le N^u = \frac{B}{l_{max}} = \frac{B}{l_{min}}. \tag{6}$$
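To make the formulation concrete, the following minimal Python sketch evaluates a candidate set of hotspots against Equations 1 to 5. It is only an illustration: the function name `evaluate_solution`, the `measure` callback, and the tolerance handling are assumptions introduced here, not part of the paper.

```python
from typing import Callable, Dict, List, Tuple

Hotspot = Tuple[float, float]  # (upstream endpoint s, downstream endpoint e), in km


def evaluate_solution(
    hotspots: Dict[int, List[Hotspot]],
    measure: Callable[[int, float, float], float],
    budget_km: float,
    l_min: float,
    l_max: float,
    eps: float = 1e-9,
    tol: float = 1e-9,
) -> Tuple[bool, float, float, float]:
    """Return (feasible, y, L, P) for hotspots keyed by corridor index k.

    `measure(k, s, e)` is a hypothetical callback returning the safety
    measure y_n^k(s, e) of a site; `eps` plays the role of the small
    positive term in Equation 3.
    """
    total_y = 0.0
    total_len = 0.0
    feasible = True
    for k, sites in hotspots.items():
        ordered = sorted(sites)  # index hotspots from upstream to downstream
        for i, (s, e) in enumerate(ordered):
            length = e - s
            # Equations 4 and 5: site length must lie within [l_min, l_max].
            if length > l_max + tol or length < l_min - tol:
                feasible = False
            # Hotspots within a corridor must not overlap.
            if i > 0 and s < ordered[i - 1][1] - tol:
                feasible = False
            total_y += measure(k, s, e)   # Equation 1
            total_len += length           # left-hand side of Equation 2
    # Equation 2: total covered length must not exceed the budget B.
    if total_len > budget_km + tol:
        feasible = False
    # Equation 3: bi-level objective that breaks ties toward shorter coverage.
    return feasible, total_y, total_len, total_y - eps * total_len
```

For instance, with a uniform density of 2 crashes per km as the measure, three sites of 0.30, 0.40, and 0.24 km are feasible under B = 1.0 km with bounds of 0.24 and 0.40 km, but become infeasible if B is reduced to 0.9 km.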
3. Solution methodology
As an optimization solution of the DSL method, we develop a dynamic-programming-based method applicable to any safety measure $y$ that guarantees (i) close-to-optimal results and (ii) solution complexity that is polynomial in the size of the network and the historical crash data. The DSL method comprises two layers: the system-level problem and the corridor-level problems. The overall process starts
Fig. 1. A conceptual overview of the dynamic-programming-based solution algorithm of the Dynamic Site Length (DSL) screening method for crash hotspot identification (HSID).
from decomposing the system-level problem into K corridor-level problems using the Lagrangian method. Each decomposed corridor-level problem is solved using a dynamic programming approach, and the optimal Lagrangian multiplier is found based on the corridor-level results to identify the optimal hotspot locations. A graphical illustration of the DSL solution process is shown in Fig. 1 to aid the detailed explanation of the method presented in this section.
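The two-layer process outlined above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation. It assumes an additive safety measure given per discretized segment (so the measure of a site is the sum of its segment values), expresses the minimum and maximum site lengths as whole numbers of segments (`m_min`, `m_max`), and searches for the multiplier by bisection, which is a technique chosen here for simplicity because the text does not prescribe a search procedure. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Site = Tuple[int, int]  # (start segment index j', end segment index j)


def corridor_dp(seg_y: List[float], gap: float, m_min: int, m_max: int,
                lam: float, eps: float = 1e-9) -> List[Site]:
    """Corridor-level recursion: maximize sum(y) - (lam + eps) * covered length."""
    q = len(seg_y)
    prefix = [0.0]
    for v in seg_y:
        prefix.append(prefix[-1] + v)
    value = [0.0] * (q + 1)   # best dual value up to segment j; value[0] = 0
    choice = [-1] * (q + 1)   # -1 means option 1; otherwise the chosen j'
    for j in range(1, q + 1):
        value[j] = value[j - 1]                      # option 1: no site ends at j
        for jp in range(max(0, j - m_max), j - m_min + 1):
            # option 2: a site spanning segments jp+1..j ends at j
            cand = (value[jp] + (prefix[j] - prefix[jp])
                    - (lam + eps) * gap * (j - jp))
            if cand > value[j]:
                value[j], choice[j] = cand, jp
    sites, j = [], q
    while j > 0:                                     # trace back the chosen sites
        if choice[j] < 0:
            j -= 1
        else:
            sites.append((choice[j], j))
            j = choice[j]
    return sites[::-1]


def screen_network(corridors: List[List[float]], gap: float, m_min: int,
                   m_max: int, budget_km: float,
                   iters: int = 50) -> Tuple[float, List[List[Site]]]:
    """System level: bisection (an assumption) on the multiplier lam."""
    def solve(lam: float) -> Tuple[List[List[Site]], float]:
        res = [corridor_dp(y, gap, m_min, m_max, lam) for y in corridors]
        covered = gap * sum(j - jp for sites in res for jp, j in sites)
        return res, covered

    res, covered = solve(0.0)
    if covered <= budget_km + 1e-9:
        return 0.0, res                              # budget is not binding
    # At hi no site has a positive reduced value, so nothing is selected.
    lo, hi = 0.0, max(max(y) for y in corridors) / gap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if solve(mid)[1] > budget_km + 1e-9:
            lo = mid                                 # still over budget
        else:
            hi = mid                                 # feasible: try a smaller lam
    return hi, solve(hi)[0]
```

Because the returned solution is the dual solution for the smallest feasible multiplier found, it may leave part of the budget unused when no combination of sites fills it exactly; this mirrors the close-to-optimal property claimed for the method rather than guaranteeing primal optimality.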
3.1. Problem decomposition
We adopt the Lagrangian relaxation method to solve the system-level problem consisting of the system-level objective function (Equation 3) and the system-level constraint (Equation 2). The original problem and the problem resulting from the Lagrangian method are called the "primal problem" and the "dual problem" respectively. The dual problem is given as follows with the non-negative Lagrangian multiplier $\lambda$:

$$D^{**} = \inf_{\lambda \ge 0} \max_{N,S,E} \left[ y_{N,S,E} - \varepsilon L_{N,S,E} - \lambda (L_{N,S,E} - B) \right] = \inf_{\lambda \ge 0} \left[ y(\lambda) - \varepsilon L(\lambda) - \lambda (L(\lambda) - B) \right], \tag{7}$$

where $D^{**}$ designates the optimal dual value while $P^*$ in Equation 3 is the optimal primal value; $\varepsilon$ is the positive infinitesimal term of Equation 3; and $y(\lambda)$ and $L(\lambda)$ are the total safety measure and the covered length of the strategy that maximizes the inner problem for a given $\lambda$. Double- and single-star notations are used to denote optimality for the dual and primal problems respectively. The Lagrangian dual function is decomposable into $K$ corridor-level dual functions, $\mathcal{L}^k(\lambda)$, because the corridor-level decision variables, $N^k$, $S^k$, and $E^k$, do not affect the corridor-level Lagrangian dual value of any other corridor $k' \ne k$. Thus, the Lagrangian dual function is found by:

$$\max_{N,S,E} \left[ y_{N,S,E} - (\lambda + \varepsilon) L_{N,S,E} \right] + \lambda B = \sum_{k=1}^{K} \max_{N^k,S^k,E^k} \left[ y^k_{N^k,S^k,E^k} - (\lambda + \varepsilon) L^k_{N^k,S^k,E^k} \right] + \lambda B = \sum_{k=1}^{K} \mathcal{L}^k(\lambda) + \lambda B, \tag{8}$$

where $L^k_{N^k,S^k,E^k} = \sum_{n=1}^{N^k} (e_n^k - s_n^k)$ is the length covered on corridor $k$, and the decomposed dual problem for corridor $k$ is:

$$\mathcal{L}^k(\lambda) = \max_{N^k,S^k,E^k} \left[ y^k_{N^k,S^k,E^k} - (\lambda + \varepsilon) L^k_{N^k,S^k,E^k} \right]. \tag{9}$$

The solution methodology for the corridor-level dual problem and that for the system-level dual and primal problems are explained in Sections 3.2 and 3.3, respectively.

3.2. Solution for the corridor-level problem
Corridor $k$ is discretized with gap $\Delta$ and divided into $Q^k$ $(= (x_{term}^k - x_{init}^k)/\Delta)$ segments that are indexed by $j$. The downstream endpoint location of segment $j$ is denoted by $x_j^k$ and calculated as $x_{init}^k + j\Delta$, except for the last segment, indexed as $Q^k$, whose $x_j^k$ is $x_{term}^k$. The initial point $x_{init}^k$ is set as $x_0^k$. The corridor-level Lagrangian function is converted into the Bellman equation as:

$$\mathcal{L}_j^k(\lambda) = \max\{\text{option 1}, \text{option 2}\}, \quad j = 1, \ldots, Q^k; \tag{10}$$

$$\text{option 1} = \mathcal{L}_{j-1}^k(\lambda),$$

$$\text{option 2} = \mathcal{L}_{j'}^k(\lambda) + y(x_{j'}^k, x_j^k) - (\lambda + \varepsilon)(x_j^k - x_{j'}^k), \quad \forall j' < j, \quad \text{s.t.} \quad x_j^k - l_{max} \le x_{j'}^k \le x_j^k - l_{min};$$

$$\mathcal{L}_0^k = 0. \tag{11}$$

In Equation 10, $\mathcal{L}_j^k(\lambda)$ is defined as the maximum corridor-level dual objective value associated with $\lambda$, shown in Equation 9, achieved along the range from the start of the road up to the point $x_j^k$, i.e., $x^k \in [x_{init}^k, x_j^k]$. There are two categories of option at the $j$th iteration to choose the maximum dual value up to $x_j^k$. The first option is not to include the location $x_j^k$ as the downstream endpoint of any hotspot. The second option is to include a candidate site defined between $x_{j'}^k$ and $x_j^k$, which can include multiple sub-options for different $j'$. The number of sub-options is the number of segments between $x_j^k - l_{max}$ and $x_j^k - l_{min}$. Lastly, the initial condition is given in Equation 11. The Bellman equation is solved recursively for every $j$ from 1 to $Q^k$. The complexity of this recursive algorithm is

$$O\left( \frac{(x_{term}^k - x_{init}^k)(l_{max} - l_{min})}{\Delta^2} \right),$$

which is linear in the length of the corridor and inversely related to the square of the minimum gap $\Delta$. In practice, we do not need a very accurate location of the endpoints of a hotspot, so a reasonably large $\Delta$ could be accepted. In the California practice, 0.016 km is the minimum unit of post-kilometer, so a $\Delta$ lower than 0.016 km does not bring additional benefit. The optimal corridor-level Lagrangian dual value, $\mathcal{L}^k(\lambda)$, is found as $\mathcal{L}_{Q^k}^k(\lambda)$, and the optimal number of hotspots, $N^k(\lambda)$, and their locations, $S^k(\lambda)$ and $E^k(\lambda)$, are obtained as $N_{Q^k}^k$, $S_{Q^k}^k = \{s_1^k, \ldots, s_{N_{Q^k}^k}^k\}$ and $E_{Q^k}^k = \{e_1^k, \ldots, e_{N_{Q^k}^k}^k\}$ respectively, found by:

at the $j$th iteration starting from 1:

$$\{N_j^k, S_j^k, E_j^k\} = \begin{cases} \{N_{j-1}^k, \; S_{j-1}^k, \; E_{j-1}^k\}, & \text{if option 1 is maximum} \\ \{N_{j'^*}^k + 1, \; S_{j'^*}^k \cup \{x_{j'^*}^k\}, \; E_{j'^*}^k \cup \{x_j^k\}\}, & \text{if option 2 is maximum} \end{cases} \tag{12}$$

at the initial stage, $j = 0$:

$$N_0^k = 0, \quad S_0^k = \emptyset \quad \text{and} \quad E_0^k = \emptyset, \tag{13}$$

where $j'^*$ is the optimal index selected in Equation 10 maximizing $\mathcal{L}_j^k(\lambda)$ in Option 2. $N_j^k$ refers to the optimal number of hotspots up to the point $x_j^k$, and $S_j^k$ and $E_j^k$ stand for the sets of optimal upstream and downstream endpoints of the $N_j^k$ hotspots detected up to $x_j^k$ respectively. In Equation 13, $\emptyset$ means an empty set. While the optimal objective value is unique, there might be multiple optimal strategies of $N^k(\lambda)$, $S^k(\lambda)$ and $E^k(\lambda)$. If multiple optimal strategies are found, practitioners need to determine the best solution based on additional criteria on top of the selected safety measure and covered length.

3.3. Solution for the system-level problem
Once we solve the corridor-level optimization problem, Equation 9, under a given value of $\lambda \in (0, \infty)$, associated with the site-level constraints defined in Equations 4 and 5, $y(\lambda) - (\lambda + \varepsilon) L(\lambda)$ is calculated, and the optimal Lagrangian multiplier, $\lambda^*$, for the system-level dual problem can be found. Note that a specific $(P^{**}, L^{**})$ can be associated with multiple $\lambda^{**}$ and multiple sets of $(N^{**}, S^{**}, E^{**})$. If the dual covered length $L^{**}$ is equal to $B$, the dual objective, $\inf_{\lambda \ge 0} \max_{N,S,E} P$, is the same as the primal objective, $\max_{N,S,E} P$, so strong duality holds. In other words, the optimal dual strategy, $(N^{**}, S^{**}, E^{**})$, is the same as the optimal primal strategy, $(N^*, S^*, E^*)$. However, if $L^{**} < B$, the optimal dual strategy is not always the same as the optimal primal strategy. Fig. 2 shows an example of this case. Owing to the discretization with $\Delta$, $(P^*, L^*)$ and $(P^{**}, L^{**})$ for continuous $B$ are discrete. Based on this discreteness, suppose that we find three dual solutions indexed by 1, 2, and 3, as represented in Fig. 2, associated with a range of different Lagrangian multipliers, and that there is no other dual solution within this range. For all Lagrangian multipliers within $(\lambda_1, \lambda_2]$, the optimal dual values, $(P^{**}, L^{**})$, are found as Solution 2. If the budget is given as $B_1$, $L^{**}$ of Solution 2 is equal to $B_1$, so the primal solution is the same as the dual solution. For a budget equal to $B_2$, the dual solution is Solution 1, but it is possible that there exists an optimal primal solution, represented as Solution 4, within the gray triangle in Fig. 2. If Solution 4 were above the triangle, it would have been found as a dual solution, which is contradictory. If Solution 4 were below the triangle, it would be inferior to Solution 1. Therefore, $P$ of Solution 4 in the triangle, which is not a dual solution, should be higher than that of Solution 1 and lower than that of Solution 2. If such a primal solution exists (e.g., Solution 4), the dual solution (e.g., Solution 1) is not the primal solution but a close-to-optimal solution. In this paper, we consider the dual solution acceptable for the following reasons: (i) the dual solution is easy to find by the method proposed in Section 3.2 with polynomial complexity; (ii) if $L^{**} \approx B$, the dual solution is approximately the same as the primal solution, and this condition is satisfied more easily as the highway system and its crash history data grow larger; and (iii) otherwise, the dual solution is close-to-optimal but has a higher safety measure density than the primal solution, i.e., $P^{**}/L^{**} > P^*/L^*$, which is another desirable property in terms of increasing the effectiveness of DES investment.

Fig. 2. Optimal total hotspot lengths and safety measures associated with the different upper limits of covered length for both primal and dual problems.

4. Evaluation test
The performance of the proposed DSL method is compared with two conventional screening methods, SW and CRP, which report hotspots in fixed and flexible lengths respectively (Chung and Ragland, 2019). These methods are elaborated in Appendix A. As shown in Table 1, SW and CRP have their own strengths. However, SW is forced to have a fixed window length, meaning that it can suffer from false positives when the window length is too short and false negatives when it is too long. CRP addresses some of these issues by allowing the SPF and the excess collision risk to define the endpoints of the sites. This can, however, result in a site length that is much longer or shorter than the desirable range. Moreover, CRP can be applied only to performance measures that estimate a performance threshold, while SW is compatible with any type of performance measure. DSL imposes empirically determined minimum and maximum site lengths and strives to combine the advantages of both CRP, which lets a detected hotspot have flexible site length, and SW, which provides better compatibility. However, DSL requires an understanding of optimization theories such as Dynamic Programming, while the other two methods are relatively simple. The three screening methods are evaluated based on their optimal objective value (i.e., performance in detecting sites under the highest risk) and their spatio-temporal consistency using the site and method consistency tests. Total rank difference and false identification rate have been investigated in other studies (e.g., Cheng and Washington, 2008) to evaluate the performance of different screening methods. Total rank difference is not evaluated in the present study because the test is inapplicable to sites that are identified with dynamic length. The false identification test is also not evaluated because it is only possible for simulated crash data where the true hotspot locations are deterministically known.
Table 1. Summary of screening methods.

| Screening method | Accounts for the bias due to fixed site length | Applicable to various safety measures | Algorithm complexity |
|---|---|---|---|
| Sliding Window (SW) | No | Yes | Simply enumerates all candidate sites by a predetermined increment and selects the highest sites |
| Continuous Risk Profile (CRP) | Partially (endpoints of each site are uncontrollable) | Only to safety measures estimating a performance threshold (a) | Plots a CRP and finds sections where the CRP is above the threshold |
| Dynamic Site Length (DSL) | Yes (endpoints of each site are controllable within a given range) | Yes | Needs understanding of optimization theories (such as Dynamic Programming) |

(a) Examples of safety measures that estimate a performance threshold are found in Exhibit 4-8 of the Highway Safety Manual (HSM by AASHTO, 2010), such as Relative Severity Index and Excess Expected Average Crash Frequency with EB Adjustments.
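As a point of comparison for the SW row of Table 1, the sketch below enumerates fixed-length windows over per-segment safety measure values and keeps the highest-scoring ones. It is an assumption-laden illustration: the greedy rule for skipping overlapping windows, the function name `sliding_window`, and its parameters are introduced here and are not taken from the procedure in Appendix A.

```python
from typing import List, Tuple


def sliding_window(seg_y: List[float], window: int, n_sites: int) -> List[Tuple[int, int]]:
    """Pick up to n_sites non-overlapping windows of `window` segments.

    The window moves by one segment at a time; candidates are ranked by
    their summed safety measure and chosen greedily (an assumption).
    """
    prefix = [0.0]
    for v in seg_y:
        prefix.append(prefix[-1] + v)
    # Enumerate every candidate window (start index, summed measure).
    candidates = [(prefix[i + window] - prefix[i], i)
                  for i in range(len(seg_y) - window + 1)]
    # Highest score first; ties are broken toward the upstream window.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    chosen: List[Tuple[int, int]] = []
    for _, start in candidates:
        if len(chosen) == n_sites:
            break
        end = start + window
        # Keep the window only if it does not overlap an already chosen site.
        if all(end <= s or e <= start for s, e in chosen):
            chosen.append((start, end))
    return sorted(chosen)
```

With a window of three segments, the best window around a two-segment peak necessarily includes a low-value neighbor, which illustrates how a fixed site length can average out a short hotspot.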
4.1. Site Consistency Test
The site consistency test evaluates the temporal consistency of the crash number within the identified hotspots. The test output $T_1$ is defined as the summation of absolute error (SAE) of the actual crash numbers between period 1 and period 2 for the hotspots identified in period 1. A lower value of $T_1$ indicates less temporal fluctuation between two subsequent periods.

$$T_1 = \frac{\sum_{k=1}^{K} \sum_{n=1}^{N^k_{\text{period 1}}} \left| \#_{\text{period 1}}(s^k_{n,\text{period 1}}, e^k_{n,\text{period 1}}) - \#_{\text{period 2}}(s^k_{n,\text{period 1}}, e^k_{n,\text{period 1}}) \right|}{\sum_{k=1}^{K} N^k_{\text{period 1}}}, \tag{14}$$

where $N^k_{\text{period 1}}$ is the number of hotspots identified on corridor $k$ in period 1 based on the crash data during the last $T$ years; $s^k_{n,\text{period 1}}$ and $e^k_{n,\text{period 1}}$ are the endpoint locations of the $n$th hotspot; and $\#_t(s_n^k, e_n^k)$ is the adjusted number of crashes within the site between $s_n^k$ and $e_n^k$ in year $t$.

4.2. Method Consistency Test
The method consistency test examines the proportion of the overlapped distance of two sets of hotspots identified in consecutive periods to the total summation of hotspot length. A value of $T_2$ closer to 1 indicates that the HSID method is spatially consistent.

$$T_2 = \frac{\sum_{k=1}^{K} \int_{x_{init}^k}^{x_{term}^k} I\left(\exists n, x \in [s^k_{n,\text{period 1}}, e^k_{n,\text{period 1}}]\right) I\left(\exists n, x \in [s^k_{n,\text{period 2}}, e^k_{n,\text{period 2}}]\right) dx}{B}. \tag{15}$$

The inside of the integral is 1 if location $x$ is included in both hotspot sets identified in periods 1 and 2.

5. CASE STUDY
Traffic collision data (see Table 2) obtained from the Interstate-80 Westbound from post-kilometer 0.74 to 112.90 and the Interstate-880 Northbound near San Francisco, California, from 2010 to 2014 are evaluated using the DSL, SW, and CRP methods together with the set of SPFs developed by Kwon et al. (2013) following HSM convention. Empirical evaluation of the SPFs developed by Kwon et al. indicated improved fit to data over existing SPFs, and these SPFs are used in our case study. Both corridors are segmented discretely by the roadway groups. Within each section, the SPF, exposure, and overdispersion factor are assumed to be constant. To identify hotspot locations in 2013, 2014, and 2015 respectively, we use the crash data collected in the previous three years, i.e., $T = 3$, together with the corresponding AADT data. A number of different safety measures can be used to detect high collision concentration locations (HCCL), and their pros and cons are discussed in HSM. Three safety measures are selected: Average Crash Frequency with SPF adjustments (ACF), Expected Average Crash Frequency with EB adjustment (EB), and Excess Expected Average Crash Frequency with EB Adjustment, which is also known as Potential for Safety Improvements (PSI). These safety measures, which are mathematically defined in Appendix B, are compared in the DSL method. The site and method consistency tests indicate that EB is the most appropriate safety measure (see Appendix C), and this result is consistent with previous research evaluating safety measures in the simple ranking method (e.g., Cheng and Washington, 2008; Montella, 2010). Thus, hereafter, the comparison among the three screening methods is conducted using EB.

Table 2. Summary of the crash data (data source: SWITRS by California Highway Patrol).

| Corridors (located within Caltrans District 4) | k = 1 (I80W) | k = 2 (I880N) |
|---|---|---|
| Freeway length (km), $(x_{term}^k - x_{init}^k)$ | 112.16 | 71.47 |
| Number of crashes, $C_t^k$, t = 2010 | 510 | 419 |
| Number of crashes, $C_t^k$, t = 2011 | 529 | 487 |
| Number of crashes, $C_t^k$, t = 2012 | 586 | 487 |
| Number of crashes, $C_t^k$, t = 2013 | 584 | 465 |
| Number of crashes, $C_t^k$, t = 2014 | 504 | 507 |
| Number of roadway sections segmented by the predetermined roadway grouping categorization | 44 | 62 |

5.1. Performance Comparison among Screening Methods
The length of candidate sites in the SW method, $l_{SW}$, is predetermined for three scenarios: (i) "Short" case where $l_{SW}$ = 0.24 km; (ii) "Medium" case where $l_{SW}$ = 0.32 km; and (iii) "Long" case where $l_{SW}$ = 0.40 km. The CRP site length is flexibly given by the excess CRP over a threshold set based on the SPF, but, as pointed out in Table 1, the site length and endpoint locations of each hotspot are not directly controllable. Fig. 3 conceptually illustrates the CRP method with two different thresholds. Using Threshold 1, CRP will detect only a single hotspot, Site 1, while using Threshold 2 will detect two hotspots, Site 1 and Site 2, with different endpoints. Notice that using Threshold 2 results in a longer segment length for Site 1. In the CRP method, the sole decision variable is the CRP threshold (i.e., the specified upper confidence interval), and the site length and endpoint locations of all identified hotspots are automatically determined based on the selected threshold. The DSL method, in contrast, allows controlling the site length and endpoints within the given range from $l_{min}$ to $l_{max}$. Thus, among the three screening methods, the DSL method has the greatest degrees of freedom. To investigate the sensitivity of the DSL method to different ranges of site length, we consider two scenarios of $l_{min}$ and $l_{max}$: (i) "Narrow" case where $l_{min}$ = 0.24 km and $l_{max}$ = 0.40 km; and (ii) "Wide" case where $l_{min}$ = 0.16 km and $l_{max}$ = 0.48 km. Table 3 summarizes all scenarios for the three screening methods. The moving increment of the SW method and the discretization gap of the DSL method are both set to 0.016 km. Fig. 4 shows the total EB of the hotspots detected in 2013, using the three methods with respect to the cumulative site length of the detected
Fig. 4. Total Potential for Safety Improvement (PSI) (crashes/year) included in hotspots identified by three hotspot identification methods: Sliding Window (SW), Continuous Risk Profile (CRP), and Dynamic Site Length (DSL). SW Short: lSW = 0.24 km; SW Medium: lSW = 0.32 km; SW Long: lSW = 0.40 km; DSL Narrow: lmin = 0.24 km, lmax = 0.40 km; DSL Wide: lmin = 0.16 km, lmax = 0.48 km.
Fig. 3. Conceptual illustration of the Continuous Risk Profile (CRP) method.
hotspots. Note that the combined site length on the graph ranges from the minimum site length to 2% of the whole network. For a given covered length, the DSL method detects the hotspots with the highest sum of EB estimates. Due to its lower flexibility of site length and its additional constraints, the SW method is less efficient than the DSL method, as expected. Lastly, the CRP method shows the lowest total EB estimates because it determines hotspot locations based on the relative EB rather than by attempting to maximize the total EB covered by the combined hotspot length. If a threshold-based safety measure such as PSI is chosen for the comparison instead of the non-threshold-based measure EB, CRP may show better performance in identifying sites associated with a high safety measure. In this case study, we select a non-threshold-based safety measure to demonstrate the limitation of CRP. Between the two scenarios of the DSL method, "DSL Narrow" and "DSL Wide", the results improve as more flexibility of site length is given. For the SW method, there is no clear trend between the fixed site length and the total EB estimate. This is because a site length that is too short or too long can yield false positive or false negative errors respectively, and the effects of both directions of error are usually combined for a specific site length over a network. The improved performance of DSL over the CRP and SW curves can be related to multiple factors. The DSL method selects the boundaries of candidate sites with greater flexibility, which allows the hotspots to have a shorter length. Fig. 5 plots the average hotspot length for various scenarios of the three screening methods. The average site length of the DSL method is shorter under higher flexibility of site length, i.e., more hotspots are detected, and the narrow sections around the highest peaks can be selectively identified as hotspots. For this reason, even though the covered length
Fig. 5. Average hotspot length for different hotspot identification methods: Sliding Window (SW), Continuous Risk Profile (CRP), and Dynamic Site Length (DSL). SW Short: lSW = 0.24 km; SW Medium: lSW = 0.32 km; SW Long: lSW = 0.40 km; DSL Narrow: lmin = 0.24 km, lmax = 0.40 km; DSL Wide: lmin = 0.16 km, lmax = 0.48 km.
increases, the average site length remains close to the minimum site length, and only the total number of hotspots increases. On the other hand, in the CRP method, both the average site length and the total number of identified hotspots increase with the total covered length, as depicted in Fig. 3. In Fig. 5, the SW Short scenario shows that its fixed site length, 0.24 km, is lower than the average site length of the DSL Narrow scenario (the blue dotted line is below the bold black line). Between these two scenarios, the DSL method identifies a smaller number of hotspots
Table 3. Summary of the considered scenarios.

Screening method | Site length | Scenario (site length range) | Fixed site length, lSW [km] | Minimum hotspot length, lmin [km] | Maximum hotspot length, lmax [km]
SW | Fixed | Short | 0.24 | - | -
SW | Fixed | Medium | 0.32 | - | -
SW | Fixed | Long | 0.40 | - | -
CRP | Flexible but uncontrollable | - | - | - | -
DSL | Flexible and controllable | Narrow | - | 0.24 | 0.40
DSL | Flexible and controllable | Wide | - | 0.16 | 0.48
associated with a longer average site length than the SW method, but the DSL method’s greater flexibility in selecting the boundary locations of the candidate sites contributes to improving the performance.

In the site consistency test, we identify hotspots in 2013 for different values of the maximum total covered length, B, based on the crash data collected in 2010, 2011, and 2012, and calculate SAEs between 2013 and 2014 and between 2013 and 2015 for the identified hotspots. The method consistency test shown in Equation 15 is conducted for the three sets of identified hotspots for years 2013, 2014, and 2015, denoted by {S**, E**}_2013, {S**, E**}_2014, and {S**, E**}_2015 respectively. The DSL method shows the greatest temporal consistency between the top 0.5% and 1% of the entire network. However, as the total covered length increases, owing to the higher average site length of SW Medium and CRP, their temporal consistency becomes higher than that of SW Short and DSL Narrow, whose average site length is still short (see Fig. 5). The effect of random fluctuation on the EB estimate is reduced as the site length increases. Within the range of B from 0.4% to 1.3% of the entire network length, the DSL method shows higher spatial consistency than the other methods. Since practice is mainly interested in the top 0.5% to 1% of hotspots, the DSL method can be considered a consistent screening method based on the results of the consistency tests.
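As a small illustration of the site consistency test, the sketch below sums the two SAE values for a set of hotspots identified in 2013. It assumes that an SAE is the sum, over the identified hotspots, of the absolute difference in the safety measure between two years; the exact definition is given with the test's equation earlier in the paper. The function name and all numbers are hypothetical.

```python
from typing import Sequence


def sum_absolute_errors(base: Sequence[float], later: Sequence[float]) -> float:
    """Sum of |base_n - later_n| over the same ordered list of hotspots (assumed SAE form)."""
    if len(base) != len(later):
        raise ValueError("both years must cover the same hotspots")
    return sum(abs(b - l) for b, l in zip(base, later))


# Hypothetical EB estimates [crashes/year] for three hotspots identified in 2013.
eb_2013 = [12.0, 9.5, 7.0]
eb_2014 = [11.0, 10.0, 7.5]
eb_2015 = [10.0, 9.0, 8.0]

# Summation of the two SAE values (2013 vs 2014 and 2013 vs 2015); lower is more consistent.
site_consistency = (sum_absolute_errors(eb_2013, eb_2014)
                    + sum_absolute_errors(eb_2013, eb_2015))
print(site_consistency)
```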
5.2. Comparison of sites detected by SW and DSL

While the lengths of hotspots detected by CRP are indirectly determined by a threshold, the length of sites detected by SW is completely determined by its predetermined window length, and that of DSL is directly controllable within the range from lmin to lmax. In this sub-section, the performance of SW and DSL is investigated when they report the same number of hotspots as well as the same total combined site length. Within approximately 1% of the entire network (B ≈ 2 km), the screening results in 2013 are shown in Table 4. The DSL Narrow scenario defined with lmin = 0.24 km and lmax = 0.40 km is selected as the reference case, where seven hotspots are detected, and their average site length is 0.272 km. Since the DSL method tends to detect a relatively short site around a peak location in order to maximize the objective value, the average site length is close to lmin, as shown in Fig. 5. Note that the average site length of DSL is controllable by adjusting lmin if it is too short to apply in practice. However, because SW and CRP cannot pinpoint the peaks due to their lower flexibility in site length, the performance comparison would be biased toward DSL if their average site length were longer than that of DSL. Also, as mentioned in the previous sub-section, a shorter site length of SW than that of DSL yields worse results due to false positive errors. Thus, we set lSW to 0.272 km, the average site length of DSL Narrow, so that the average site length and the number of detected hotspots of SW are the same as those of DSL. Because the length of each site is uncontrollable in CRP, we exempt CRP from the comparison.

Table 4 shows the values of the parameters used in SW and DSL. The covered length and the average hotspot length are identical, but DSL reports much higher spatial consistency than SW (70.5% versus 45.6%). A significant improvement in Total EB is also observed (91.96 versus 81.76 crashes/year). The results of the temporal consistency test are comparable. The greater flexibility and controllability of endpoint locations in the proposed method could have contributed to the improved performance of DSL.

Table 4. Summary of optimal results of the hotspot screening methods including Sliding Window (SW) and Dynamic Site Length (DSL), where the average hotspot length is adjusted to 0.272 km, and the total covered length is constrained to 1% of the entire network (B = 2 km).

Group | Item | SW | DSL
Common factors | Number of hotspots, N | 7 | 7
Common factors | Covered length, L [km] | 1.90 | 1.90
Common factors | Average hotspot length [km] | 0.272 | 0.272
Setting | Minimum hotspot length, lmin [km] | - | 0.24
Setting | Maximum hotspot length, lmax [km] | - | 0.40
Setting | Fixed hotspot length, lSW [km] | 0.272 | -
Screening results | Total EB [crashes/year] | 81.76 | 91.96
Screening results | Site (temporal) consistency test | 39 | 41
Screening results | Method (spatial) consistency test | 45.6% | 70.5%

6. Conclusions and future work

This paper aims to design a flexible screening method that locates hotspots in terms of the highest safety measure on a roadway network. The screening method is formulated as a constrained optimization model, where the summation of hotspot lengths cannot exceed a given length or the total number of hotspots is limited. The optimization problem is solved using DP-based screening techniques so that a close-to-optimal result is guaranteed, and its computational complexity is polynomial in the size of the problem. The formulation accounts for historical crash data for the roadway, evaluating their importance according to the years of collection.

The performance of the proposed model, together with two existing network screening methods, SW and CRP, has been evaluated using crash data collected for five years from 2010 to 2014 over two extended freeway sections near San Francisco. Three sets of hotspots in 2013, 2014, and 2015 are identified based on the three years of data from 2010 to 2012, from 2011 to 2013, and from 2012 to 2014 respectively. Using the EB estimate as a safety measure, it is shown that the proposed DSL method outperforms the two other screening methods in terms of both the total safety measure within the detected hotspots and the spatio-temporal consistency of HSID, for the range of total covered length by identified hotspots from 0.5% to 1% of the entire network length.

Lastly, the conventional screening methods can identify hotspots with a pre-determined or semi-flexible site length, but if a detected segment is defined with a site length that is too long or too short, it must be adjusted before further investigation. This is the reason why we developed a method for finding dynamic endpoints of the hotspot, which is free from the limitations of the segmentation method, and the proposed DSL method is optimized in this respect.

In our follow-up study, we plan to evaluate our DP screening method together with the recently developed non-parametric and artificial intelligence (AI) models for better crash prediction, including neural network models (Zeng et al., 2016; Zeng and Huang, 2014; Chimba and Sando, 2009; Abdelwahab and Abdel-Aty, 2001). Moreover, the Full Bayesian (FB) method (Cheng et al., 2017; Persaud et al., 2010; Huang et al., 2009; Washington and Oh, 2006; Davis and Yang, 2001) and the improved EB method based on a finite mixture of negative binomial regression models (Zou et al., 2018) can be tested with the DSL method as safety measure alternatives. Our future work will also aim to include additional decision factors about the safety countermeasures that can be applied to the identified hotspots (Oh and Park, 2014) and the corresponding agency costs, as well as simulation work that investigates the false identification rate for different screening methods.

Acknowledgment
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2019R1H1A1080045 and 2018R1A2B6005729).
Appendix A. SW and CRP methods

A.1 Sliding Window (SW) method

In the SW method described in HSM, a window with a fixed length $l_{SW}$ is moved discretely along corridor $k$ from $x^k_{init}$ to $x^k_{term}$ by a specified increment, written here as $\Delta$. The corridor is divided into $M^k$ segments according to the highway group classification, and they are indexed by $m$ associated with the post-kilometer range of $(s_m^k, e_m^k)$. The total number of segments in the whole network is designated by $M$. At every increment, the window's location is indexed by $j$. The upstream and downstream endpoints of window $j$ on corridor $k$ are denoted by $s_j^k$ and $e_j^k$ respectively, and they satisfy $s_{j+1}^k = s_j^k + \Delta$, $s_1^k = x^k_{init}$, $e_{j+1}^k = \min\{e_j^k + \Delta, x^k_{term}\}$, and $e_j^k = s_j^k + l_{SW}$.

HSM identifies a risky segment as a hotspot according to the highest value of the safety measure recorded by a window within the segment, where the hotspot is not the window of the highest safety measure but the whole section including the window. However, the segmentation is usually conducted based not on the safety characteristics but on the route types and other site features, so it is arbitrary to consider the whole segment as a hotspot. In this paper, we propose a modified SW method that picks the top-ranked windows from all segments as hotspot candidates and identifies a given number of hotspots among them. The mathematical formulation, comprised of the objective function (Equation A1) and the site number constraint (Equation A2), is presented below. The optimization problem can be understood as an augmented formulation of the original system-level problem (Equation 2 and Equation 3) with additional constraints such as a fixed site length and limited locations of hotspots. The decision variable $z_m^k$ is binary and identifies whether the riskiest window within segment $m$ on corridor $k$ is selected as a hotspot or not. Equation A3 describes the sliding process along segment $m$. The window with the highest value of $y$ is selected as the representative of the segment, and the value is used as $y_m^k$. We slightly modified the boundary condition of the SW algorithm presented in HSM to avoid the situation that a specific window represents multiple segments at the same time.

$$y_{SW} = \max_{z} y_z = \max_{z_m^k \in \{0,1\},\ \forall k, m} \sum_{k=1}^{K} \sum_{m=1}^{M^k} z_m^k\, y_m^k, \tag{A1}$$

$$\text{s.t.},\quad L_z = \sum_{k=1}^{K} \sum_{m=1}^{M^k} z_m^k\, l_{SW} \le B. \tag{A2}$$

$$y_m^k = \max_{j}\ y(s_j^k, e_j^k), \quad \text{s.t.}\quad s_m^k \le \frac{s_j^k + e_j^k}{2} < e_m^k \tag{A3}$$
The solution algorithm of the above problem is:
Step 1. For each segment, find $y_m^k$ based on the SW process depicted in Equation A3;
Step 2. Sort $y_m^k$ for all $m$ and $k$ from the highest to the lowest;
Step 3. Select the top $\lfloor B / l_{SW} \rfloor$ segments, and their representative windows are identified as the hotspots.
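The three steps above can be sketched in Python for a single corridor. This is a minimal illustration rather than the authors' implementation: the function `modified_sw`, its arguments, and the crash-count measure are hypothetical, and the sketch only slides full-length windows instead of truncating the last window at the corridor end.

```python
import math
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]


def modified_sw(
    segments: List[Interval],
    x_init: float,
    x_term: float,
    l_sw: float,
    step: float,
    budget: float,
    measure: Callable[[float, float], float],
) -> List[Tuple[float, Interval]]:
    """Return the selected (measure, window) pairs, highest measure first."""
    eps = 1e-9  # guards against floating-point round-off in the floor operations
    best: Dict[int, Tuple[float, Interval]] = {}

    # Step 1: slide the window and keep the best window per segment (Equation A3).
    n_windows = int(math.floor((x_term - x_init - l_sw) / step + eps)) + 1
    for j in range(n_windows):
        s = x_init + j * step
        e = s + l_sw
        mid = 0.5 * (s + e)
        for m, (s_m, e_m) in enumerate(segments):
            # A window represents only the segment that contains its midpoint.
            if s_m <= mid < e_m:
                y = measure(s, e)
                if m not in best or y > best[m][0]:
                    best[m] = (y, (s, e))
                break

    # Step 2: rank the segment representatives from highest to lowest measure.
    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)

    # Step 3: keep the top floor(B / l_SW) representatives.
    n_select = int(math.floor(budget / l_sw + eps))
    return ranked[:n_select]


crashes = [0.5, 0.6, 0.7, 1.5]  # hypothetical crash locations [km]


def crash_count(s: float, e: float) -> int:
    """Toy safety measure: number of crashes inside the window [s, e]."""
    return sum(1 for d in crashes if s <= d <= e)


hotspots = modified_sw(
    segments=[(0.0, 1.0), (1.0, 2.0)],
    x_init=0.0, x_term=2.0, l_sw=0.5, step=0.25, budget=0.5,
    measure=crash_count,
)
print(hotspots)
```

In practice the toy `crash_count` would be replaced by the chosen safety measure, such as the EB estimate of the window.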
The complexity of the above enumerative algorithm is $O\left(\sum_{k=1}^{K} \frac{(x^k_{term} - x^k_{init})\, l_{SW}}{\Delta^2}\right)$, which is also linear in the total length of the system and inversely related to the square of the increment $\Delta$. Specifically, the corridor-level complexity, $O\left(\frac{(x^k_{term} - x^k_{init})\, l_{SW}}{\Delta^2}\right)$, has a dimensionality similar to that of the DSL method, $O\left(\frac{(x^k_{term} - x^k_{init})(l_{max} - l_{min})}{\Delta^2}\right)$.
A.2 Continuous Risk Profile (CRP) method

To filter out the negative effects of random fluctuation on the HSID results, the CRP method was developed (Chung et al., 2009). CRP is a continuous curve defined from discrete crash data with a specific smoothing function, written here as $\psi$. Any threshold-based safety measure can be used to plot a CRP. For example, depending on whether the RTM bias is corrected through the EB method, the curve can be defined as Equation A4 (without correction) or Equation A5 (with correction). The smoothing function should satisfy the conditions presented in Equation A6. The width of the smoothing shape is designated by $l_{CRP}$, and, outside of the shape, the value of $\psi$ is zero, where $l_{CRP}$ is determined by the maximum difference between the actual crash location and the reported crash location. The area of $\psi$ with respect to $x$ is one since it stands for a single crash, and the variance of $\psi(\cdot)$, $\int x^2 \psi(x)\,dx$, is empirically given by the reporting accuracy of crash locations. In Equation A5, $\phi^k(x)$ denotes the overdispersion factor defined in Appendix B.

ACF without EB adjustment:

$$CRP^k(x) = \sum_{i \in [1, \dots, C_t^k],\ t} \frac{\mu^k(x)}{\sum_{t=1}^{T} \mu_t^k(x)}\, \psi(d_{i,t}^k - x) \tag{A4}$$

With EB adjustment:

$$CRP^k(x) = \frac{1}{1 + \frac{\sum_{t=1}^{T} \mu_t^k(x)}{\phi^k(x)}}\, \mu^k(x) + \left(1 - \frac{1}{1 + \frac{\sum_{t=1}^{T} \mu_t^k(x)}{\phi^k(x)}}\right) \sum_{i \in [1, \dots, C_t^k],\ t} \frac{\mu^k(x)}{\sum_{t=1}^{T} \mu_t^k(x)}\, \psi(d_{i,t}^k - x) \tag{A5}$$

$$\psi(x) > 0 \ \text{ if } x \in \left(-\frac{l_{CRP}}{2}, \frac{l_{CRP}}{2}\right), \quad \psi(x) = 0 \ \text{ otherwise, and} \quad \int \psi(x)\, dx = 1 \tag{A6}$$
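To make Equations A4 and A6 concrete, the following sketch evaluates a CRP without EB adjustment at one location. It rests on two assumptions that are not from the paper: a triangular smoothing function (one of many shapes satisfying Equation A6) and an SPF that is constant over the years, so that the ratio of the target-year prediction to the summed yearly predictions reduces to 1/T. The function names and crash locations are hypothetical.

```python
from typing import Sequence


def triangular_kernel(u: float, l_crp: float) -> float:
    """Smoothing function: positive on (-l_crp/2, l_crp/2), zero outside, unit area."""
    half = l_crp / 2.0
    if abs(u) >= half:
        return 0.0
    return (2.0 / l_crp) * (1.0 - abs(u) / half)


def crp_without_eb(x: float, crashes_by_year: Sequence[Sequence[float]], l_crp: float) -> float:
    """Equation A4 under the constant-SPF assumption: kernel sum divided by T."""
    n_years = len(crashes_by_year)
    total = sum(
        triangular_kernel(d - x, l_crp)
        for year in crashes_by_year
        for d in year
    )
    return total / n_years  # [crashes/km-year]


# Hypothetical crash locations [km] for T = 3 years on one corridor.
crashes_by_year = [[1.0], [1.0, 1.1], []]
value_at_peak = crp_without_eb(1.0, crashes_by_year, l_crp=0.4)
value_far_away = crp_without_eb(2.0, crashes_by_year, l_crp=0.4)
print(value_at_peak, value_far_away)
```

A wider `l_crp` spreads each crash over a longer stretch, which lowers and broadens the peaks of the profile.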
We identified risky locations where the CRP is higher than a specific upper confidence interval, $CI_{CRP} \in (0,1)$, of the predicted safety measure in the target year. As described in Equation A7 and Equation A8, the objective is to find hotspots where the CRP values are higher than the threshold, $\mu^k(x) + \Phi^{-1}(CI)\,\sigma^k(x)$, and the summation of their lengths is close to but not higher than $B$. Here, $\Phi$ and $\sigma^k(x)$ are the CDF and the over-dispersed standard deviation of the presumed distribution of annual crash frequency per kilometer at $x$ on corridor $k$ respectively.

$$CI_{CRP} = \min\, CI \tag{A7}$$

$$\text{s.t.},\quad L_{CI} = \sum_{k=1}^{K} \int_{x^k_{init}}^{x^k_{term}} I\left(CRP^k(x) - \left(\mu^k(x) + \Phi^{-1}(CI)\,\sigma^k(x)\right) > 0\right) dx \le B. \tag{A8}$$
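The threshold search in Equations A7 and A8 can be sketched on a discretized corridor as follows. The sketch makes several assumptions that are not from the paper: the presumed distribution is taken to be standard normal (so the inverse CDF comes from `statistics.NormalDist`), the integral is replaced by a sum over grid cells of width `dx`, and bisection is used in place of Newton's method because the covered length is a step function of CI. It relies only on the covered length being non-increasing in CI. All names and numbers are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


def covered_length(ci: float, crp: Sequence[float], mu: Sequence[float],
                   sigma: Sequence[float], dx: float) -> float:
    """Discretized L_CI: total length of cells whose CRP exceeds the threshold."""
    z = NormalDist().inv_cdf(ci)  # assumed standard normal inverse CDF
    return dx * sum(1 for c, m, s in zip(crp, mu, sigma) if c - (m + z * s) > 0)


def smallest_ci(crp: Sequence[float], mu: Sequence[float], sigma: Sequence[float],
                dx: float, budget: float, tol: float = 1e-6) -> float:
    """Smallest CI in (0, 1) whose covered length does not exceed the budget B."""
    lo, hi = 1e-9, 1.0 - 1e-9
    if covered_length(hi, crp, mu, sigma, dx) > budget:
        raise ValueError("budget cannot be met even at the highest confidence level")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if covered_length(mid, crp, mu, sigma, dx) <= budget:
            hi = mid  # feasible: try a lower confidence level
        else:
            lo = mid  # infeasible: too much length is flagged
    return hi


# Hypothetical profile on four 0.1 km cells with a budget of a single cell.
crp = [1.0, 5.0, 9.0, 3.0]
mu = [2.0, 2.0, 2.0, 2.0]
sigma = [1.0, 1.0, 1.0, 1.0]
ci_crp = smallest_ci(crp, mu, sigma, dx=0.1, budget=0.1)
print(round(ci_crp, 4))
```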
Since $L_{CI}$ is non-increasing with $CI$, it is possible to find the unique $CI_{CRP} \in (0,1)$ by numerical algorithms such as Newton’s method. If one is interested in the top 1% of total roadway length, i.e., $B = 0.01 \sum_{k=1}^{K} (x^k_{term} - x^k_{init})$, the initial value of $CI$ could be reasonably selected as 0.99 in a numerical method. Note that the final $CI_{CRP}$ obtained from the numerical process could deviate from the initial value as much as 0.99. The difference increases with the reporting accuracy (i.e., the variance of the smoothing function), the minimum site length, and the SPF bias.

Appendix B. Safety performance measures

Average crash frequency with SPF adjustments (ACF), expected average frequency with EB adjustments (EB), and the excess expected average crash frequency with EB adjustments, which is also known as Potential for Safety Improvements (PSI), are used as safety measures in the present study. As noted in the Highway Safety Manual (HSM by AASHTO, 2010) and several studies (Wright et al., 1988; Carlin and Louis, 2008; Hauer et al., 2002a; Meza, 2003), ACF fails to take into account the RTM bias. The SPF-adjusted ACF value of a specific site $n$ associated with endpoints $(s_n^k, e_n^k)$ located on corridor $k$ can be formulated as:
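$$ACF(s_n^k, e_n^k) \triangleq \sum_{i \in \{1, \dots, C_t^k\},\ t} \frac{\mu^k(d_{i,t}^k)}{\sum_{t=1}^{T} \mu_t^k(d_{i,t}^k)}\, I\left(s_n^k \le d_{i,t}^k \le e_n^k\right) = \frac{\sum_{t=1}^{T} \#_t(s_n^k, e_n^k)}{T}, \tag{B1}$$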
where $\mu_t^k(x)$ is the predicted crash frequency [crashes/km-year] at location $x$ on corridor $k$ expected on similar segments in year $t$ during the past $T$ years ($t \in \{1, \dots, T\}$), which is found by SPFs; $\mu^k(x)$ is the predicted crash frequency in the target year; and $I(\cdot)$ is the indicator function, whose output is one if the input statement is true and zero otherwise. The number of crashes that occurred along corridor $k$ in year $t$ is $C_t^k$. The location of each crash, indexed by $i$, is denoted by $d_{i,t}^k$, subject to $d_{i,t}^k \le d_{i',t}^k$ for $1 \le i < i' \le C_t^k$ and all $t$. The set of $d_{i,t}^k$ for all $i$ included in $I_t^k$ is $d_t^k$.

In order to correct the RTM bias, the level of the safety performance of site $n$ on corridor $k$ during the target prediction year can be estimated by the EB method as Equation B2, denoted by $EB(s_n^k, e_n^k)$. The value is the weighted summation of the predicted crash frequency during the target prediction year, which is the integrated SPF values $\mu^k(x)$ for all $x$ from $s_n^k$ to $e_n^k$, and the SPF-adjusted crash frequency of the target segment from the historical data. (Estimate of the expected crash frequency for an entity = weight × expected crash frequency estimated by SPF + (1 − weight) × count of crashes on this entity)

$$EB(s_n^k, e_n^k) \triangleq \int_{s_n^k}^{e_n^k} \frac{1}{1 + \frac{\sum_{t=1}^{T} \mu_t^k(x)}{\phi^k(x)}}\, \mu^k(x)\, dx + \sum_{i \in \{1, \dots, C_t^k\},\ t} \left(1 - \frac{1}{1 + \frac{\sum_{t=1}^{T} \mu_t^k(d_{i,t}^k)}{\phi^k(d_{i,t}^k)}}\right) \frac{\mu^k(d_{i,t}^k)}{\sum_{t=1}^{T} \mu_t^k(d_{i,t}^k)}\, I\left(s_n^k \le d_{i,t}^k \le e_n^k\right). \tag{B2}$$

The weight in Equation B2 at post-kilometer $x$ is defined as $\frac{1}{1 + \sum_{t=1}^{T} \mu_t^k(x) / \phi^k(x)}$, where $\phi^k(x)$ is the overdispersion factor estimated per unit distance [crashes/km]. As can be seen in Equation B2, the more years of data are collected, the higher the weight of the historical data on this site. The sample variance of the crash rate is typically observed to be higher than the mean, so a plethora of literature has shown that crash rates follow other distributions, such as the Negative Binomial model, the Poisson-lognormal model, and the Hierarchical Poisson model, rather than the Poisson distribution (e.g., Huang et al., 2009). As the data are more overdispersed, i.e., the sample variance is higher than its mean value, $EB(s_n^k, e_n^k)$ depends more on the historical data instead of the SPF. However, even if plenty of historical data are available, data that are too old may not be relevant to the current and future situations, because the site-specific traffic environments and transportation technologies related to safety are rapidly changing. Therefore, decision-makers should take care to select the appropriate duration of historical data to identify hotspots for future planning. In practice, $T$ is usually set as three years (e.g., Hauer, 1997; Cheng and Washington, 2005). Finally, PSI is the excess expected frequency defined as the difference between the unbiased expected crash frequency of the target site, $EB(s_n^k, e_n^k)$, and the crash frequency expected on similar segments as:
$$PSI(s_n^k, e_n^k) \triangleq EB(s_n^k, e_n^k) - \int_{s_n^k}^{e_n^k} \mu^k(x)\, dx. \tag{B3}$$
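The three safety measures can be illustrated with a short numerical sketch. It assumes, purely for simplicity, that the SPF prediction `mu` [crashes/km-year] and the overdispersion factor `phi` are constant along the corridor and across years, in which case the SPF adjustment ratio becomes 1/T and the EB weight becomes 1 / (1 + T·mu/phi). These simplifications, the function names, and the numbers are assumptions made for illustration and are not taken from the case study.

```python
from typing import Sequence


def acf(crashes_by_year: Sequence[Sequence[float]], s: float, e: float) -> float:
    """Average crash frequency of site [s, e] with a constant SPF (ratio 1/T)."""
    n_years = len(crashes_by_year)
    in_site = sum(1 for year in crashes_by_year for d in year if s <= d <= e)
    return in_site / n_years


def eb_weight(mu: float, phi: float, n_years: int) -> float:
    """EB weight on the SPF prediction: 1 / (1 + T * mu / phi)."""
    return 1.0 / (1.0 + n_years * mu / phi)


def eb(crashes_by_year: Sequence[Sequence[float]], s: float, e: float,
       mu: float, phi: float) -> float:
    """EB estimate: weight * integrated SPF + (1 - weight) * historical frequency."""
    w = eb_weight(mu, phi, len(crashes_by_year))
    return w * mu * (e - s) + (1.0 - w) * acf(crashes_by_year, s, e)


def psi(crashes_by_year: Sequence[Sequence[float]], s: float, e: float,
        mu: float, phi: float) -> float:
    """PSI: EB estimate minus the frequency expected on similar segments."""
    return eb(crashes_by_year, s, e, mu, phi) - mu * (e - s)


# Hypothetical three-year history; 24 crashes fall inside the site [1.0, 1.5] km.
history = [[1.1] * 8 + [0.2], [1.2] * 9, [1.4] * 7 + [1.9]]
site = (1.0, 1.5)
print(acf(history, *site))                    # observed frequency [crashes/year]
print(eb(history, *site, mu=10.0, phi=15.0))  # shrunk toward the SPF prediction
print(psi(history, *site, mu=10.0, phi=15.0))
```

With these numbers the weight is 1/3, so the EB estimate lies between the SPF prediction for the site (5 crashes/year) and the observed frequency (8 crashes/year), closer to the latter.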
Appendix C. Consistency test results for three safety measures in the DSL method

To compare the different safety measures, ACF, EB, and PSI, in the DSL method, both site and method consistency tests are conducted. Figs. C1 and C2 represent the results of the tests when the maximum and minimum hotspot lengths, lmax in Equation 4 and lmin in Equation 5, are 0.40 km and 0.24 km respectively. The hotspots in 2013, which are identified based on the crash data collected in 2010, 2011, and 2012, are used to calculate SAEs between 2013 and 2014 and between 2013 and 2015 in the site consistency test. The relationship between the summation of the two values of SAEs and the proportion of B to the total network length is presented in Fig. C1. As observed here, the EB estimate shows the best temporal consistency of crash
Fig. C1. Results of the site consistency test for the Dynamic Site Length (DSL) screening method: Summation of absolute errors for the Average Crash Frequency (ACF), Empirical Bayesian (EB), and Potential Safety Improvement (PSI) methods. The x-axis stands for the proportion of the covered length by identified hotspots to the length of the entire network, 183.63 km.
Fig. C2. Results of the method consistency test for the Dynamic Site Length (DSL) screening method: Ratio of overlapped sections by the hotspots identified in 2013, 2014, and 2015 for the Average Crash Frequency (ACF), Empirical Bayesian (EB), and Potential Safety Improvement (PSI) methods.
number for the detected hotspots until the total covered length reaches 1.75% of the total network length. A lower SAE means better temporal consistency. After the point of B at 1.75%, the three safety measures show similar consistency, as the identified hotspot sets overlap. Note that current practice investigates only the top few hotspots (Kwon et al., 2013; Caltrans, 2002), and even this strict requirement yields more hotspot locations than can be investigated, creating a backlog; evaluating 1% of the whole network length would therefore be adequate. Within approximately 1% of the entire network (B ≈ 2 km), it can be concluded that the EB method is the most temporally consistent in the DSL method.

The sets of hotspots for years 2013, 2014, and 2015 are used in the method consistency test. The test output stands for the proportion of the length of the sections overlapped by the three sets of hotspots, and the results for the three safety measures are plotted in Fig. C2. For those three consecutive years, it is shown that the DSL method based on the EB estimates identified the highest number of reproducible hotspots for different B up to 1.6% of the entire network. The curves start from zero, because the hotspots identified in different years are not the same. Considering the top 1% of the length of the entire network, it is also reasonable to infer that the EB estimate has relatively high spatial consistency compared to the ACF and PSI methods.
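The overlap computation behind the method consistency test can be sketched as below for hotspots on a single corridor. The normalizing length is left as an explicit argument, since the denominator of the ratio is defined with the test's equation earlier in the paper and is not restated here; dividing by the budget B is only an assumption of this example. The function names and the intervals are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def intersect(a: Sequence[Interval], b: Sequence[Interval]) -> List[Interval]:
    """Sections covered by both sets; each set holds non-overlapping (start, end) pairs."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return sorted(out)


def overlap_ratio(hotspot_sets: Sequence[Sequence[Interval]], reference_length: float) -> float:
    """Length shared by all yearly hotspot sets divided by a reference length."""
    common = list(hotspot_sets[0])
    for other in hotspot_sets[1:]:
        common = intersect(common, other)
    return sum(e - s for s, e in common) / reference_length


# Hypothetical hotspots [km] identified for three consecutive target years.
sets_by_year = [
    [(1.0, 1.5), (4.0, 4.5)],    # 2013
    [(1.25, 1.75), (6.0, 6.5)],  # 2014
    [(1.0, 1.5), (4.0, 4.5)],    # 2015
]
ratio = overlap_ratio(sets_by_year, reference_length=1.0)
print(ratio)
```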
References

AASHTO, 2010. Highway Safety Manual, 1st ed. American Association of State Highway and Transportation Officials, Washington, DC.
Abdelwahab, H., Abdel-Aty, M., 2001. Development of artificial neural network models to predict injury severity in traffic accidents at signalized intersections. Transportation Research Record: Journal of the Transportation Research Board 1746, 6–13.
California Highway Patrol, Statewide Integrated Traffic Records System, https://www.chp.ca.gov/programs-services/services-information/switrs-internet-statewide-integrated-traffic-records-system, (Accessed 3 March 2019).
Caltrans, 2002. Table C Task Force: Summary Report of Task Force’s Findings and Recommendations. California Department of Transportation, Sacramento.
Carlin, B., Louis, T., 2008. Bayes and empirical Bayes methods for data analysis. CRC Press.
Cheng, W., Washington, S.P., 2005. Experimental Evaluation of Hotspot Identification Methods. Accident Analysis & Prevention 37 (5), 870–881.
Cheng, W., Washington, S.P., 2008. New Criteria for Evaluating Methods of Identifying Hot Spots. Transportation Research Record: Journal of the Transportation Research Board 2083, 76–85.
Cheng, W., Gill, G.S., Dasu, R., Xie, M., Jia, X., Zhou, J., 2017. Comparison of Multivariate Poisson lognormal spatial and temporal crash models to identify hot spots of intersections based on crash types. Accident Analysis & Prevention 99, 330–341.
Chimba, D., Sando, T., 2009. The prediction of highway traffic accident injury severity with neuromorphic techniques. Advances in Transportation Studies 2009 (19), 17–26.
Chung, K., Ragland, D.R., 2007. A Method for Generating a Continuous Risk Profile for Highway Collisions. In: Presented at 86th Annual Meeting of the Transportation Research Board. Washington, D.C.
Chung, K., Ragland, D., Madanat, S., Oh, S., 2009. The Continuous Risk Profile Approach for the Identification of High Collision Concentration Locations on Congested Highways. Transportation and Traffic Theory 2009: Golden Jubilee. pp. 463–480.
Chung, K., Ragland, D., 2019. Discussion of theoretical and practical challenges in developing high collision concentration location detection procedures. International Journal of Urban Sciences 23 (1), 1–15.
Davis, G., Yang, S., 2001. Bayesian identification of high-risk intersections for older drivers via Gibbs sampling. Transportation Research Record: Journal of the Transportation Research Board 1746, 84–89.
Grembek, O., Kim, K., Kwon, O., Lee, J., Liu, H., Park, M., Washington, S., Ragland, D., Madanat, S., 2012. Experimental Evaluation of the Continuous Risk Profile (CRP) Approach to the Current Caltrans Methodology for High Collision Concentration Location Identification. State of California Department of Transportation.
Harwood, D.W., Torbic, D.J., Richard, K.R., Meyer, M.M., 2010. SafetyAnalyst: Software Tools for Safety Management of Specific Highway Sites. Publication FHWA-HRT-10-063. Midwest Research Institute, Kansas City, MO.
Hauer, E., 1997. Observational Before–After Studies in Road Safety – Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety. Elsevier Science, Inc., Tarrytown, New York.
Hauer, E., Kononov, J., Allery, B., Griffith, M., 2002a. Screening the Road Network for Sites with Promise. Transportation Research Record: Journal of the Transportation Research Board 1784, 27–32.
Hauer, E., Harwood, D., Council, F., Griffith, M., 2002b. Estimating Safety by the Empirical Bayes Method: a Tutorial. Transportation Research Record: Journal of the Transportation Research Board 1784, 126–131.
Huang, H., Chin, H., Haque, M., 2009. Empirical Evaluation of Alternative Approaches in Identifying Crash Hot Spots: Naive Ranking, Empirical Bayes, and Full Bayes Methods. Transportation Research Record: Journal of the Transportation Research Board 2103, 32–41.
Kwon, O., Park, M., Yeo, H., Chung, K., 2013. Evaluating the Performance of Network Screening Methods for Detecting High Collision Concentration Locations on Highways. Accident Analysis & Prevention 51, 141–149.
Lee, J., Chung, K., Kang, S., 2016. Evaluating and Addressing the Effects of Regression to the Mean Phenomenon in Estimating Collision Frequencies on Urban High Collision Concentration Locations. Accident Analysis & Prevention 97, 49–56.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice 44 (5), 291–305.
Mannering, F.L., Bhat, C.R., 2014. Analytic methods in accident research: Methodological frontier and future directions. Analytic Methods in Accident Research 1, 1–22.
Medury, A., Grembek, O., 2016. Dynamic Programming-Based Hot Spot Identification Approach for Pedestrian Crashes. Accident Analysis & Prevention 93, 198–206.
Meza, J.L., 2003. Empirical Bayes Estimation Smoothing of Relative Risks in Disease Mapping. Journal of Statistical Planning and Inference 112 (1), 43–62.
Montella, A., 2010. A comparative analysis of hotspot identification methods. Accident Analysis & Prevention 42 (2), 571–581.
Murray, C.J., Lopez, A.D., World Health Organization, 1996. The Global Burden of Disease: A Comprehensive Assessment of Mortality and Disability from Diseases, Injuries, and Risk Factors in 1990 and Projected to 2020: Summary.
Oh, J., Park, D., 2014. Analysis on crash reduction factors for road segment safety. International Journal of Urban Sciences 18 (3), 396–403.
Persaud, B., Lan, B., Lyon, C., Bhim, R., 2010. Comparison of empirical Bayes and full Bayes approaches for before–after road safety evaluations. Accident Analysis & Prevention 42 (1), 38–43.
Savolainen, P.T., Mannering, F.L., Lord, D., Quddus, M.A., 2011. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. Accident Analysis & Prevention 43 (5), 1666–1676.
Tegge, R., Jo, J.-H., Ouyang, Y., 2010. Development and Application of Safety Performance Functions for Illinois. Publication FHWA-ICT-10-066. Illinois Center for Transportation, Urbana, IL.
Washington, S., Oh, J., 2006. Bayesian methodology incorporating expert judgment for ranking countermeasure effectiveness under uncertainty: Example applied to at grade railroad crossings in Korea. Accident Analysis & Prevention 38 (2), 234–247.
Wright, C., Abbess, C., Jarrett, D., 1988. Estimating the Regression-To-Mean Effect Associated with Road Accident Black Spot Treatment: Towards a More Realistic Approach. Accident Analysis & Prevention 20 (3), 199–214.
Wu, L., Zou, Y., Lord, D., 2014. Comparison of sichel and negative binomial models in hot spot identification. Transportation Research Record 2460 (1), 107–116.
Zeng, Q., Huang, H., 2014. A stable and optimized neural network model for crash injury severity prediction. Accident Analysis & Prevention 73, 351–358.
Zeng, Q., Huang, H., Pei, X., Wong, S.C., Gao, M., 2016. Rule extraction from an optimized neural network for traffic crash frequency modeling. Accident Analysis & Prevention 97, 87–95.
Zou, Y., Ash, J.E., Park, B.J., Lord, D., Wu, L., 2018. Empirical Bayes estimates of finite mixture of negative binomial regression models and its application to highway safety. Journal of Applied Statistics 45 (9), 1652–1669.