Transportation Research Part C 36 (2013) 72–82
Contents lists available at ScienceDirect
Transportation Research Part C journal homepage: www.elsevier.com/locate/trc
A novel visible network approach for freeway crash analysis

Jianjun Wu a, Mohamed Abdel-Aty b, Rongjie Yu b,d,⇑, Ziyou Gao c

a State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China
b Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32816-2450, United States
c MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing 100044, China
d School of Transportation Engineering, Tongji University, 4800 Cao’an Road, 201804 Shanghai, China
Article info
Article history: Received 8 March 2013 Received in revised form 24 June 2013 Accepted 12 August 2013
Keywords: Freeway safety; Crash data; Safety analysis; Visible network
Abstract
Freeway crashes have attracted considerable attention in recent years, leading to the development of various methodologies to unveil crash occurrence mechanisms, which fall into two general modeling approaches: parametric and non-parametric. In this paper, a novel visible network approach is proposed to analyze crash characteristics with real-time traffic and weather data. In the proposed model, the traffic states prior to crash occurrence are extracted from real-time data, and crashes are mapped as nodes in the network. Each node contains information on the most hazardous factors related to crash occurrence, as selected by the random forest algorithm. With the help of a transfer (binary encoding) technique, links are connected between the nodes according to the state values. Complete freeway crash evolution networks are then obtained by analyzing one year of crash data (including real-time weather and traffic variables) on I-70 in the state of Colorado. Additionally, the method is used to analyze single- and multi-vehicle crashes separately to identify their distinct characteristics. Compared with traditional analysis methods, the proposed visible approach is easy to extend, transfer and apply; it makes the effects of the various contributing factors on a traffic crash easy to identify, and the model can be inspected visually. Moreover, the crash contributing factors identified in this study are beneficial for designing advanced early-warning and risk assessment systems in the context of real-time highway management.
© 2013 Elsevier Ltd. All rights reserved.
⇑ Corresponding author. E-mail address: [email protected] (R. Yu).
0968-090X/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.trc.2013.08.005
J. Wu et al. / Transportation Research Part C 36 (2013) 72–82

1. Introduction
With increasing densities of roadways and vehicles, real-time crash risk analysis has received much attention in recent years, and timely prediction of incidents has become a critical field in freeway traffic management. To improve highway safety, researchers have developed various methods that incorporate different types of data and have proposed a variety of countermeasures (Yu et al., 2013). In this direction, detailed crash data (e.g., crash type, crash location (Sobhani et al., 2013; Abdel-Aty et al., 2007), weather condition and road condition), together with effective statistical techniques for analyzing crash frequency data, enable better identification of crash contributing factors and probabilities.
To model crash characteristics, a wide variety of methods have been employed. The most popular is the Poisson regression method suggested by Jovanis and Chang (1986), which has been extended many times since it was proposed. Various methods have also been developed to address issues arising from crash data (over-dispersion, under-dispersion, etc.). These include Poisson-Gamma (Oh et al., 2006), generalized estimating equation (Lord and Persaud, 2000), generalized additive (Xie and Zhang, 2008), random-effects Poisson (Johansson, 1996; Wang et al., 2009) and bivariate Poisson (Miaou and Lord, 2003; Aguero-Valverde and Jovanis, 2009; N’Guessan, 2010) models, as well as neural network, Bayesian neural network and support vector machine approaches (Abdelwahab and Abdel-Aty, 2002; Chang, 2005; Riviere et al., 2006; Xie et al., 2007; Li et al., 2008; Gregoriades and Mouskos, 2013). A more detailed review can be found in Lord and Mannering (2010).
To analyze crash characteristics accurately, various types of contributing factors are considered, the most important of which are weather conditions and traffic variables. Caliendo et al. (2007) used hourly rainfall data and transformed it into a binary indicator of daily pavement surface status (dry or wet). Miaou et al. (2003) also used a surrogate variable to indicate wet pavement conditions. Malyshkina et al. (2009) investigated the effects of precipitation, snowfall amounts and temperature on crashes. Traffic variables also play a vital role in crash occurrence studies: traffic density (Kononov et al., 2011), traffic flow scenarios (Noland and Quddus, 2004) and Annual Average Daily Traffic (Chang and Chen, 2005) have been shown to be key determinants of freeway crash frequencies. For example, it was found that once a critical traffic density is reached, the crash occurrence likelihood increases at a faster rate with further increases in traffic density. However, using only weather-related variables or aggregated traffic data loses the most valuable information, namely the pre-crash traffic status. Therefore, to improve crash frequency analysis, both real-time weather conditions and traffic variables should be considered. Yu et al. (2013) employed a Bayesian inference approach with random-effects Poisson models to establish safety performance functions and found that weather condition variables, especially precipitation, play a key role in crash occurrence models.
Although these traditional methods for analyzing crash frequency data have been applied for years, all of them have limitations. First, they require appropriate hypothesized model forms with adjustable parameters that differ between datasets, so they cannot easily be extended or transferred to another roadway. Second, they are difficult to inspect visually and complex to carry out when treating large quantities of real-time crash data. In particular, it is difficult to identify which situations have a higher probability of crash occurrence under combined influencing factors, such as the combination of crash location, weather, crash time and road geometrics.
In recent years, network analysis has become an important tool for uncovering complexity in transportation networks, where the flow properties of the transported entities, e.g., traffic flow, are of primary interest. There has also been some research on the relationship between networks and time series. Zhang and Small (2006) analyzed complex networks built from pseudoperiodic time series, in which different time series correspond to different network structures. From the perspective of dynamics, scale-free structure has been found to appear spontaneously in coupled logistic maps when linking is based on dynamical considerations (Gong and van Leeuwen, 2003) rather than on the preferential attachment usually cited in the evolution of scale-free networks. However, these results are difficult to apply in real data analysis. Li et al. (2007) proposed a network model based on the traffic flow states generated by the Nagel–Schreckenberg traffic model and the modified comfortable driving traffic model. Wu et al. (2008) built a connection between chaotic time series and complex networks in the car-following model.
Developing such techniques is relevant here because, in the emerging topic of network complexity, a road crash can be viewed as a complex system affected by many factors, and these factors can be obtained from real-time traffic data. The techniques developed in this paper may therefore be viewed as a step toward a novel statistical approach to real-time road crash data analysis. In this study, we develop a visible network model and apply it to freeway crash data analysis by mapping real-time freeway crash data to a network. The network model is based on a simple and fast computational method, the visible graph algorithm, which converts the data into a graph. Because it is visible (i.e., it can be represented in graph form) and simple to model, the network approach has advantages for rapidly developing real-time applications.

2. Data description
This study focuses on a 15-mile mountainous section of I-70 in Colorado. It has been demonstrated that weather conditions, e.g., snow and visibility, have a great influence on crash frequencies (Ahmed et al., 2011). Yu et al. (2013) employed real-time weather data (visibility, precipitation and temperature), freeway geometric characteristics data and real-time traffic data (speed, volume and lane occupancy) in developing safety performance functions. Our research is based on the four datasets listed below:
(1) Dataset D1. One year of crash data, from August 2010 to August 2011, provided by the Colorado Department of Transportation (CDOT). The data contain crash time, crash date, crash direction, milepost, etc. A total of 252 crashes were documented within the study period.
(2) Dataset D2. Road segment geometric characteristics data from the Roadway Characteristics Inventory (RCI). The 15-mile freeway section was split into 120 homogeneous segments (60 in each direction) according to the main segmentation criterion of roadway alignment homogeneity, using the RCI data. Both horizontal and vertical alignments were scrutinized. A minimum segment length of 0.1 mile was used to avoid the low-exposure problem and the large statistical uncertainty of crash rates in short segments.
(3) Dataset D3. Real-time weather data recorded by 6 weather stations along the study section, including temperature, visibility and 1-h precipitation.
(4) Dataset D4. Real-time traffic data detected by 30 Remote Traffic Microwave Sensor (RTMS) radars, 15 in each direction, providing speed, volume and lane occupancy information. RTMS data corresponding to each crash were extracted by a matching algorithm (Yu and Abdel-Aty, 2013). From these data we obtain the average, standard deviation and coefficient of variation of speed, volume and occupancy over the 5-min intervals before and after the crash time recorded in dataset D1, at the crash location as well as at the upstream and downstream detectors.

3. Visible network model
Li et al. (2007) built a connection between traffic flow evolution and complex networks, in which traffic flow is represented by a network. In this paper, we develop a tool for analyzing roadway crash data, named the visible network, which maps a network from the real-time data prior to crash occurrence. The first step of the algorithm is to select from the original data the key factors related to crash occurrence, which are then used as inputs to the visible network algorithm. Two types of factors are generally related to crashes: traffic factors and weather factors. Geometric design factors are coherent with traffic factors, since geometric changes are reflected in traffic flow, and traffic flow factors are more directly related to crash occurrence. Therefore, variable selection is first used to select the key traffic factors related to crash occurrence. In addition, three weather factors, namely temperature, visibility and precipitation, have been shown to play an important role in roadway crashes (Yu et al., 2013). The selected traffic factors and these three weather factors are therefore used here.

3.1. Variable selection
Many variables contribute to crash occurrence, and selecting the important ones is a data analysis problem in its own right. Random forest (RF) models have recently been widely used to rank variable importance in traffic safety studies (Abdel-Aty and Haleem, 2011; Ahmed and Abdel-Aty, 2012). Unlike classification and regression tree models, random forest models provide unbiased error estimates and do not require a cross-validation dataset (Breiman, 2000). While each tree is grown, about one-third of the data are left out of its training sample; these become the out-of-bag (OOB) data, which are used to obtain an unbiased estimate of variable importance as trees are added to the forest. In this paper, a random forest model is used to perform variable selection. The importance of a variable is estimated by monitoring how much the prediction error increases when the OOB data for that variable are permuted while all other variables are left unchanged (Liaw and Wiener, 2002). The R package "randomForest" (Liaw and Wiener, 2002) was employed to rank variable importance, with m = 4 (the mtry argument, i.e., four variables randomly sampled as candidates at each split) and a total of 200 trees. Fig. 1 shows the final variable importance ranking with "Mean Decrease Gini" as the selection criterion, i.e., the total decrease in Gini node impurity from splits on each variable, averaged over all trees; the permutation-based measure described above is reported separately by the package as the mean decrease in accuracy. The figure shows that DAS (downstream average speed), AS (average speed at the crash location), DSDO (downstream standard deviation of occupancy), SDO (standard deviation of occupancy at the crash location) and AO (average occupancy at the crash location) are the most important traffic factors. Hence, these five variables were used as inputs, along with the three weather variables, in the modeling steps. Table 1 provides the final result of the variable selection procedure.

Fig. 1. Variable importance provided by random forest.

Table 1 Main factors for crash data.
Factor group | Variable | Description
Weather variables | TEMP | Temperature
Weather variables | FV | Forward visibility
Weather variables | PPTN | Precipitation
Traffic variables | AS | Average speed at crash location
Traffic variables | AO | Average occupancy at crash location
Traffic variables | DAS | Downstream average speed
Traffic variables | SDO | Standard deviation of occupancy at crash location
Traffic variables | DSDO | Downstream standard deviation of occupancy

3.2. Visible network algorithm
3.2.1. Construct state space
States of the crash sequence are first built from the data to describe crash occurrence under all key factors; each state describes the characteristics of one crash record. In this study, the state set S of freeway crashes contains all the information carried by the 8 factors. For example, the state of the first crash record is defined as S1 = {TEMP1, FV1, PPTN1, AS1, AO1, DAS1, SDO1, DSDO1}. The state set S = {S1, S2, S3, …, Sn}, where n is the total number of freeway crash records, therefore includes all the information affecting crash occurrence that was extracted with the random forest algorithm. In other words, each state corresponds to a combination of weather conditions and traffic variables.
3.2.2. Divide the region for factor values
To simplify the problem, the value range of each factor is divided into several regions by critical values Vc. For example, two critical values are given for the forward visibility FV: Vc1 = 1.5 and Vc2 = 1. If the value before division satisfies FVoriginal > 1.5, the value after division is FVdivision = 3, indicating high visibility. If 1 < FVoriginal ≤ 1.5, FVdivision = 2, indicating middle visibility. FVdivision = 1 is assigned to the rest of the data (FVoriginal ≤ 1), indicating low visibility. FV values of several crash records before and after division are shown in Table 2. The critical values and division criteria for the other factors are given in Table 3.

Table 2 FV values for different crash records before and after division.
FVoriginal | FVdivision | FVoriginal | FVdivision
2.7 | 3 | 0.1 | 1
5.9 | 3 | 7.1 | 3
0.9 | 1 | 1.4 | 2
0.5 | 1 | 2.7 | 3
3.5 | 3 | 1.3 | 2
1.0 | 2 | 2.5 | 3
1.3 | 2 | 0.6 | 1
Division criterion: if FVoriginal > 1.5, FVdivision = 3 (high visibility); if 1 < FVoriginal ≤ 1.5, FVdivision = 2 (middle visibility); if FVoriginal ≤ 1, FVdivision = 1 (low visibility).

Table 3 Division criteria of other factors for the real traffic data.
Factor | Critical value | Division criterion
TEMP | Vc = 32 | If TEMPoriginal > 32, TEMPdivision = 2; if TEMPoriginal ≤ 32, TEMPdivision = 1
PPTN | Vc = 0/1 | Two levels: 0, 1
AS | 10 mph intervals in [0, 70], except the first value 5 | ASdivision = 70, 60, 50, 40, 30, 20, 10, 5
AO | Vc = 3/5/10 | AOdivision = 1, 3, 5, 10
DAS | 10 mph intervals in [0, 70], except the first value 5 | DASdivision = 70, 60, 50, 40, 30, 20, 10, 5
SDO | Vc = 3/5/10 | SDOdivision = 1, 3, 5, 10
DSDO | Vc = 3/5/10 | DSDOdivision = 1, 3, 5, 10

The values chosen as division criteria are based on the following reasons.
(1) Vc = 32 is chosen because 32 °F corresponds to 0 °C, the freezing point, which is used to differentiate snow seasons from dry seasons. Previous studies indicate that crashes are more likely to happen during snow seasons than during dry seasons.
(2) The splitting criteria for average speed are based on graphical examination, and the group interval is set to 10 mph. The relationship between average speed and crash occurrence is shown in Fig. 2; the areas marked with red dots clearly show high crash frequencies. The critical values stand for speed intervals rather than point average speeds; for example, the value 30 means that 30 ≤ AS < 40 mph. According to these criteria, average speeds are divided into 8 levels: 5, 10, 20, 30, 40, 50, 60 and 70. DAS likewise has 8 levels.
(3) Four levels of occupancy are defined based on the relationship between volume and occupancy shown in Fig. 3: 1 ≤ AOoriginal < 3, 3 ≤ AOoriginal < 5, 5 ≤ AOoriginal < 10 and AOoriginal ≥ 10, corresponding to low, moderate, high and very high occupancy conditions, respectively. The threshold values (1, 3, 5, 10) represent these intervals in Table 3.

Fig. 2. The frequency distribution of average speed.
Fig. 3. The relationship between volume and occupancy.

3.2.3. Transfer Si to binary digits
To find the critical factor combinations that contribute to crashes, all factor values of each crash record must be combined and the relationship between state and traffic crash determined. Because the factors have different division criteria, the combined values are easier to handle when transferred into binary digit chains. In a digit chain, the value of factor k is described by a binary digit sequence Bk of length mk. To keep every crash record consistent, a fixed length of binary digits is assigned to each factor according to its maximum value. For example, if the maximum value of ASdivision (the value of AS after division) in the crash data were 77, whose binary representation is 1001101, the fixed length of this binary digit sequence would be 7. Another record with ASdivision = 60 would then be represented as B4 = {0111100}, with the fixed length m4 = 7. In this way a binary digit sequence is obtained for each factor k, and together these sequences contain all the information of the crash record. The binary digit sequence of state Si can therefore be represented by Li = {B1, B2, B3, …, BK}, with total length

$$M_i = \sum_{k=1}^{K} m_k,$$

where i = 1, 2, 3, …, n indexes the crash records in the order of the crash evolution. The binary digit state Li can be transferred to the value

$$p_i = \sum_{m'=1}^{M_i} 2^{m'-1} L_i^{(m')},$$

where $L_i^{(m')}$ denotes the m'-th digit of Li counted from the right (least significant) end. In the evolution process of roadway crashes, a series L = {L1, L2, L3, …, Ln} is obtained. Fig. 4 illustrates the binary digit sequence Li of the ith crash record.

Fig. 4. Illustration of binary digit sequence Li of the ith crash data. [B1 = 10 (m1 = 2), B2 = 11 (m2 = 2), B3 = 001 (m3 = 3), B4 = 0111100 (m4 = 7), B5 = 0001 (m5 = 4), B6 = 1000110 (m6 = 7), B7 = 0001 (m7 = 4), B8 = 0001 (m8 = 4); concatenated L1 = 101100101111000001100011000010001, M1 = 33, p1 ≈ 6.0042 × 10^9.]

Fig. 5. Illustration of network generation process. [Panels Step 1 to Step 5 show states L1 to L4 being added in turn; legend: new node, existed node, link.]

Fig. 6. The network of crash evolution and its link statistical graph. [Right panel: number of links (0 to 10) versus node index (0 to 200).]
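The division rules of Section 3.2.2 and the encoding of Section 3.2.3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the bit widths m1, …, m8 = 2, 2, 3, 7, 4, 7, 4, 4 are read from Fig. 4; any recorded precipitation is mapped to PPTN = 1; speeds below 10 mph are mapped to the level 5; occupancy-type values below 3 (including values below 1) are mapped to level 1; and SDO and DSDO reuse the occupancy thresholds of Table 3. All function names are hypothetical. Python's int(chain, 2) treats the rightmost digit as the 2^0 place, matching the definition of pi.

```python
# Minimal sketch (assumptions labelled): discretize one crash record per Tables 2-3
# and encode it as the binary chain L_i with total length M_i and value p_i.

# Assumed bit widths m_k, in the factor order B1..B8 of Fig. 4
WIDTHS = [("TEMP", 2), ("FV", 2), ("PPTN", 3), ("AS", 7),
          ("AO", 4), ("DAS", 7), ("SDO", 4), ("DSDO", 4)]


def divide_temp(t):
    # Above freezing (32 F) -> 2, at or below -> 1
    return 2 if t > 32 else 1


def divide_fv(fv):
    # High (> 1.5) -> 3, middle (1, 1.5] -> 2, low (<= 1) -> 1
    if fv > 1.5:
        return 3
    return 2 if fv > 1 else 1


def divide_pptn(p):
    # Assumption: any recorded precipitation -> 1, none -> 0
    return 1 if p > 0 else 0


def divide_speed(v):
    # 10 mph bins labelled by lower bound (30 means 30 <= v < 40), capped at 70;
    # assumption: anything below 10 mph gets the lowest level, 5
    if v >= 70:
        return 70
    if v < 10:
        return 5
    return int(v // 10) * 10


def divide_occ(o):
    # Occupancy-type factors (AO, SDO, DSDO): levels 1, 3, 5, 10;
    # assumption: values below 3 (including below 1) map to level 1
    for level in (10, 5, 3):
        if o >= level:
            return level
    return 1


DIVIDERS = {"TEMP": divide_temp, "FV": divide_fv, "PPTN": divide_pptn,
            "AS": divide_speed, "AO": divide_occ, "DAS": divide_speed,
            "SDO": divide_occ, "DSDO": divide_occ}


def encode_levels(levels):
    """Concatenate fixed-width codes B_1..B_8 into L_i; return (L_i, M_i, p_i)."""
    parts = []
    for name, width in WIDTHS:
        code = format(levels[name], "b")
        if len(code) > width:
            raise ValueError(f"{name}={levels[name]} needs more than {width} bits")
        parts.append(code.zfill(width))  # left-pad to the fixed length m_k
    chain = "".join(parts)
    # int(..., 2) reads the rightmost digit as the 2**0 place
    return chain, len(chain), int(chain, 2)


def encode_record(raw):
    """Discretize a raw record (dict with the 8 factors) and encode it."""
    levels = {name: DIVIDERS[name](raw[name]) for name, _ in WIDTHS}
    return encode_levels(levels)


if __name__ == "__main__":
    # Divided values read from Fig. 4 (first crash record)
    fig4 = {"TEMP": 2, "FV": 3, "PPTN": 1, "AS": 60, "AO": 1,
            "DAS": 70, "SDO": 1, "DSDO": 1}
    print(*encode_levels(fig4))
    # -> 101100101111000001100011000010001 33 6004196881
```

Applied to the divided values of Fig. 4, the sketch reproduces the chain L1 and M1 = 33 shown there; the exact value is p1 = 6004196881, which the figure appears to report rounded. The divide_fv rule also reproduces every row of Table 2 except the record with FVoriginal = 1.0, which the table lists as 2 while the stated criterion gives 1.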
3.2.4. Generate the network
The visible network has two important elements: nodes and links. In the proposed approach, each binary digit state Li is regarded as a node in the visible network, so the first state L1 corresponds to the first node V1. For each crash record, the following steps are performed:
Step 1. Generate nodes. States obtained at different time steps but with the same pi value are deemed to be the same state. If Li is a newborn state, a new node is added to the network; if Li already exists in the network, no new node is generated for the ith record.
Step 2. Connect links. If a new node is generated, a link is created between this node and its precursor. If Li corresponds to an existing node, a link is created between the existing node and its precursor. Multiple links are prohibited.
Fig. 5 illustrates the network generation process.
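Steps 1 and 2 can be sketched in a few lines of Python. The sketch assumes that the "precursor" of record i is the node of record i − 1 in time order, and that a record repeating its precursor's state adds no self-loop; the paper does not state how that case is handled. The function name build_visible_network and the toy state values are hypothetical; in practice the pi values from the previous step would be passed in crash order.

```python
# Minimal sketch (assumptions labelled) of Steps 1-2: map a time-ordered
# sequence of state values p_i to nodes and undirected links.


def build_visible_network(states):
    """Return (node_of_state, links) for a time-ordered list of p_i values.

    node_of_state maps each distinct p_i to a node index 1, 2, ... in order of
    first appearance; links is a set of undirected links stored as
    (smaller index, larger index) pairs.
    """
    node_of_state = {}
    links = set()
    prev_node = None
    for p in states:
        # Step 1: equal p_i values share one node; a newborn state adds a node
        if p not in node_of_state:
            node_of_state[p] = len(node_of_state) + 1
        node = node_of_state[p]
        # Step 2: link the node to its precursor (assumed: previous record's node).
        # The set forbids multiple links; skipping self-loops is an assumption.
        if prev_node is not None and prev_node != node:
            links.add((min(prev_node, node), max(prev_node, node)))
        prev_node = node
    return node_of_state, links


if __name__ == "__main__":
    # Hypothetical state sequence L1, L2, L3, L2, L4 (cf. Fig. 5)
    nodes, links = build_visible_network([11, 22, 33, 22, 44])
    print(len(nodes), sorted(links))   # -> 4 [(1, 2), (2, 3), (2, 4)]
```

Storing each link as an ordered pair in a set is what enforces the "no multiple links" rule: revisiting state 22 after state 33 produces the pair (2, 3) again, which the set silently ignores.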
Fig. 7. Principal components graph and crash distribution for all records. [Right panel: log(P(C)) (0 to -4.5) versus log(C) (0 to 2.5), divided into a low and a high crash occurrence frequency region.]
Table 4 Typical crash characteristics with high occurrence frequency.
TEMP | FV | PPTN | AS | AO | DAS | SDO | DSDO
≤32 | ≤1 | 0 | 40 ≤ AS < 50 | ≤3 | 40 ≤ DAS < 50 | ≤3 | ≤3
≤32 | ≤1 | 0 | 30 ≤ AS < 40 | 3 < AO ≤ 5 | 30 ≤ DAS < 40 | ≤3 | ≤3
≤32 | >1.5 | 0 | 40 ≤ AS < 50 | ≤3 | 40 ≤ DAS < 50 | ≤3 | ≤3
By construction, the graph that this algorithm extracts from a state sequence is always connected (every node can be reached from every other node, because each record's node is linked to the node of its precursor, so consecutive records form a path through the network) and undirected (the algorithm defines no direction on the links).
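The connectivity property can be checked mechanically with a breadth-first search. A minimal sketch, assuming nodes are numbered 1..N and links are stored as unordered pairs (both conventions of this illustration, as is the name is_connected): because every record after the first is linked to its precursor, the search should reach every node of a network produced by the generation steps.

```python
# Minimal sketch: breadth-first check that an undirected network is connected.
from collections import deque


def is_connected(num_nodes, links):
    """True if nodes 1..num_nodes form a single component under the given links."""
    if num_nodes <= 1:
        return True
    adjacency = {v: set() for v in range(1, num_nodes + 1)}
    for a, b in links:
        adjacency[a].add(b)  # links are undirected, so record
        adjacency[b].add(a)  # each one in both directions
    seen = {1}
    queue = deque([1])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == num_nodes


if __name__ == "__main__":
    # Links of a hypothetical 4-node network built from L1, L2, L3, L2, L4
    print(is_connected(4, {(1, 2), (2, 3), (2, 4)}))  # True
    print(is_connected(4, {(1, 2), (3, 4)}))          # False
```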
4. Results and discussion
4.1. Total crash analysis
With the network representing the crash data built, the complete evolution network, with 171 nodes and 207 links for 252 crash records, and its link-number statistics are shown in Fig. 6. Each node has a different value, as shown in Fig. 6 (right), reflecting the different states of the crash data with real-time traffic and weather information. The network can be re-plotted as shown in Fig. 7 using the principal components layout offered by the NetDraw software (Borgatti, 2002), which spatially separates nodes with larger and smaller connectivities. Nodes 29, 54 and 3 have a large number of links, meaning that these states occur more frequently than others; a disproportionate share of crashes therefore occurred in the states corresponding to nodes 29, 54 and 3. The degree distribution of the nodes (the degree of a node is the number of links it has to other nodes) is also shown in Fig. 7 as a log–log graph. It approximately follows a power law: a few nodes have a high degree, while most nodes have a low degree. In other words, a few states have high crash frequencies, while most states occur with low frequency in the crash network. Three regions of crash occurrence frequency can be distinguished: low, middle and high. In the high-occurrence (red) region, four states account for almost 30% of the crashes; in the middle region, about 20 states account for about 50%; and the many remaining states in the low region account for about 20%. Analyzing the nodes with higher crash occurrence frequency yields the combined factor information in Table 4. Most of these crashes occurred under combined conditions such as TEMP ≤ 32, FV ≤ 1, PPTN = 0, 40 ≤ AS < 50, AO ≤ 3, 40 ≤ DAS < 50, SDO ≤ 3 and DSDO ≤ 3.
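The degree distribution behind the log–log plots can be computed directly from the link set. In this sketch, P(C) is taken as the fraction of nodes with degree C, and the logarithm base is a parameter because the paper does not state which base its axes use; the function names and the toy star network are hypothetical.

```python
# Minimal sketch: degree distribution P(C) and log-log points for an
# undirected network given as a set of (a, b) links over nodes 1..N.
import math
from collections import Counter


def degree_distribution(num_nodes, links):
    """Return {C: P(C)}, the fraction of the num_nodes nodes that have degree C."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    # Nodes without links get degree 0 (Counter returns 0 for missing keys)
    counts = Counter(degree[v] for v in range(1, num_nodes + 1))
    return {c: k / num_nodes for c, k in sorted(counts.items())}


def log_log_points(dist, base=10.0):
    """Points (log C, log P(C)) for a log-log plot; degree 0 is skipped."""
    return [(math.log(c, base), math.log(p, base)) for c, p in dist.items() if c > 0]


if __name__ == "__main__":
    links = {(1, 2), (2, 3), (2, 4)}   # hypothetical 4-node star
    dist = degree_distribution(4, links)
    print(dist)                        # {1: 0.75, 3: 0.25}
    print(log_log_points(dist))
```

A roughly straight, downward-sloping cloud of these points on the full network is what the power-law reading of Fig. 7 refers to; the high-degree tail corresponds to the states whose factor combinations are listed in Table 4.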
The factor combinations in Table 4 are useful for crash risk assessment under real-time road and weather conditions, and advanced warnings based on them with a real-time data feed can support proactive traffic management strategies. The typical factor values for crashes with high occurrence frequency also indicate that weather and real-time traffic conditions have a great influence on road crashes: below the freezing point and with poor visibility, combined with the other conditions, the probability of crash occurrence increases substantially. This is easily understood, because bad weather generally increases crash occurrence. These factors also interact with each other, which makes the picture more complex. In fact, many microscopic driving behaviors that cannot be detected directly are hidden in these real-time data. Under bad weather conditions, drivers would be expected to drive more carefully and the road to be more congested than usual. However, for the high-occurrence states, the average speed is mostly 40 ≤ AS < 50 and the average occupancy is below 3. Thus the combination of relatively high average speed (40 ≤ AS < 50), low traffic occupancy (less than 3) and bad weather (low temperature and low visibility) carries a larger probability of a crash than other situations, and this combined condition, identified with the visible network model, contributes substantially to crash occurrence. Another interesting finding is that the high-occurrence states do not include precipitation, which might indicate that drivers are more careful during frequent precipitation periods and on segments where precipitation is consistently high.

4.2. Single-vehicle and multi-vehicle crash analysis
In this study, single-vehicle (SV) and multi-vehicle (MV) crashes are also analyzed separately to account for the differences between the two crash types. The same modeling approach is applied, and descriptive summaries of the high occurrence frequencies are given in Figs. 8 and 9. Fig. 8 presents the crash evolution graph and crash distribution for SV crashes; the high-occurrence crash states are nodes 8 and 32. These two states show that SV crashes are more likely to occur in two cases: bad weather conditions with high average traffic occupancy, and good visibility and cold weather with high average speed. The latter is reasonable, because in good visibility conditions drivers pay less attention to their driving behavior (e.g., speeding). The results differ for MV crashes, as shown in Fig. 9: MV crashes are more influenced by weather and road traffic conditions. Low-speed crashes are more likely to be multi-vehicle crashes when the downstream average speed is also low. In addition, in bad weather together with high average traffic occupancy, stop-and-go driving conditions would increase the probability of multi-vehicle crashes. The combined factor values for SV and MV crashes are shown in Table 5. Compared with SV crashes, the average speed prior to the crash time and the visibility had a more pronounced influence on MV crashes.

Fig. 8. The principal components graph and distribution for single-vehicle crashes. [Right panel: log(P(C)) (0 to -5) versus log(C) (0 to 2.5), divided into a low and a high crash occurrence frequency region.]
Fig. 9. The principal components graph and distribution for multi-vehicle crashes. [Right panel: log(P(C)) (0 to -5) versus log(C) (0 to 2), divided into a low and a high crash occurrence frequency region.]

Table 5 Combined factor values of SV and MV crashes.
TEMP | FV | PPTN | AS | AO | DAS | SDO | DSDO
≤32 | >1.5 | 0 | 40 ≤ AS < 50 | 3 < AO ≤ 5 | 40 ≤ DAS < 50 | ≤3 | ≤3
≤32 | ≤1 | 0 | 40 ≤ AS < 50 | ≤3 | 40 ≤ DAS < 50 | ≤3 | ≤3
≤32 | ≤1 | 0 | 30 ≤ AS < 40 | 3 < AO ≤ 5 | 30 ≤ DAS < 40 | ≤3 | ≤3

4.3. Comparison with two typical models
Table 6 compares the visible network method with two typical modeling approaches: parametric models (e.g., Poisson regression, Poisson-Gamma, etc.) and non-parametric models (e.g., neural networks, support vector machines, etc.). The main advantages of the visible method are modeling simplicity and visibility, both of which are important in engineering applications.

Table 6 Comparisons between the visible model and two traditional analysis approaches.
Criterion | Parametric model | Non-parametric model | Visible model
Data | Influenced by the characteristics of the data, and cannot be transferred to other data (Lord and Mannering, 2010) | A larger data size is needed, and it cannot be transferred to other data (Xie et al., 2007) | No special requirement for crash data, and can be generalized to other data
Function form | An accurate functional form is needed | Mainly computed by a "black-box" approach (Lord and Mannering, 2010) | Does not need an accurate functional form
Complexity | The estimation process is easy (Lord and Mannering, 2010), but complexity increases with the total number of variables | The estimation process is complex (Lord and Mannering, 2010), and solution time increases with data size | Easy to carry out, with high computational efficiency
Visible | Mathematical methods and estimations | Black-box | Visual graph
Extendibility | Difficult to extend because many methods are affected by the data pattern | Easy to extend because there is no data pattern requirement | Very easy to extend because all the data can be represented by a graph
Combined critical value | Cannot determine the combined critical value | Cannot determine the combined critical value | Can determine the combined critical values that contribute to traffic crashes
Application scope | Very accurate statistical result and mathematical model | Accurate statistical result and mathematical model | An accurate statistical result is not required

5. Conclusions
In this paper, a novel visible network method has been adopted to analyze freeway crashes with real-time data. The traffic and weather conditions of the crashes are represented as nodes, and links connect the nodes to their precursors. Through the visible network, states with higher crash occurrence probability can be identified. Combinations of factors can be obtained by transferring the states, and conclusions on frequent crash occurrence conditions have been reached. Moreover, the different characteristics of SV and MV crashes have been investigated with the same approach.
Applied to real crash data, the method identifies the combined conditions that contribute greatly to crash occurrence, e.g., relatively high average speed (40 ≤ AS < 50), low traffic occupancy (less than 3) and bad weather (low temperature and low visibility). In addition, many crashes happen under low-speed traffic flow conditions with bad weather and low downstream average speed. Bad weather together with high average occupancy produces stop-and-go driving conditions, which would increase the probability of multi-vehicle crashes. Finally, for field applications, advanced warnings can be provided to drivers based on real-time crash risk assessment. Compared with traditional crash analysis methods, the proposed method also has the following advantages:
(1) Visible graphs are provided as the result of crash analysis, and the results can be directly applied in traffic management centers.
(2) Combinations of crash contributing factors can be identified with high accuracy.
(3) State distributions of crash data can be provided with the network analysis technique.
(4) This method can be extended to analyze and identify crash injury severity and crash-prone locations.

Although the proposed methodology has clear merits, it also has some application limitations. First, the visible network approach requires a reliable traffic surveillance system to archive real-time traffic flow data. Second, choosing the values used as division criteria is a key challenge, since these values influence the results. In addition, the computational complexity depends on the number of levels into which each variable is divided and on the values chosen: the more levels, the more complex the application. Besides traffic flow and weather, other factors are also likely to have significant effects on crash occurrence. Examples include the geometric design parameters of the freeway, the traffic composition and the presence of motorcycles. Future studies can include these factors and analyze their effects on crash occurrence.

Acknowledgement

We thank the Colorado Department of Transportation (CDOT) for providing the data. This paper is partly supported by the National Basic Research Program of China (2012CB725400), NSFC (71210001, 71131001), the Program for New Century Excellent Talents in University (NCET-120764) and FANEDD (201170).

References

Abdel-Aty, M., Dhindsa, A., Gayah, V., 2007. Considering various ALINEA ramp metering strategies for crash risk mitigation on freeways under congested regime. Transportation Research Part C 15 (2), 113–134.
Abdel-Aty, M., Haleem, K., 2011. Analyzing angle crashes at unsignalized intersections using machine learning techniques. Accident Analysis and Prevention 43, 461–470.
Ahmed, M., Abdel-Aty, M., 2012. The viability of using automatic vehicle identification data for real-time crash prediction. IEEE Transactions on Intelligent Transportation Systems 13, 459–468.
Abdelwahab, H.T., Abdel-Aty, M.A., 2002. Artificial neural networks and logit models for traffic safety analysis of toll plazas. Transportation Research Record 1784, 115–125.
Aguero-Valverde, J., Jovanis, P.P., 2009. Bayesian multivariate Poisson log-normal models for crash severity modeling and site ranking. Paper Presented at the 88th Annual Meeting of the Transportation Research Board, Washington, DC.
Ahmed, M., Huang, H., Abdel-Aty, M., Guevara, B., 2011. Exploring a Bayesian hierarchical approach for developing safety performance functions for a mountainous freeway. Accident Analysis and Prevention 43, 1581–1589.
Borgatti, S., 2002. NetDraw: Graph Visualization Software. Harvard Analytic Technologies, Cambridge.
Breiman, L., 2000. Some Infinity Theory for Predictor Ensembles. Tech. Report 579, Dept. of Statist., Univ. of Calif., Berkeley.
Caliendo, C., Guida, M., Parisi, A., 2007. A crash-prediction model for multilane roads. Accident Analysis and Prevention 39, 657–670.
Chang, L.Y., 2005. Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Safety Science 43 (8), 541–557.
Chang, L.Y., Chen, W.C., 2005. Data mining of tree-based models to analyze freeway accident frequency. Journal of Safety Research 36, 365–375.
Gong, P., van Leeuwen, C., 2003. Emergence of scale-free network with chaotic units. Physica A 321, 679–688.
Gregoriades, A., Mouskos, K.C., 2013. Black spots identification through a Bayesian networks quantification of accident risk index. Transportation Research Part C 28, 28–43.
Johansson, P., 1996. Speed limitation and motorway casualties: a time series count data regression approach. Accident Analysis and Prevention 28 (1), 73–87.
Jovanis, P.P., Chang, H.L., 1986. Modeling the relationship of accidents to miles traveled. Transportation Research Record 1068, 42–51.
Kononov, J., Lyon, C., et al., 2011. Relating flow speed and density of urban freeways to functional form of an SPF. In: Compendium of Papers CD-ROM, Transportation Research Board 2011 Annual Meeting, Washington, DC.
Li, X.G., Gao, Z.Y., Li, K.P., Zhao, X.M., 2007. Relationship between microscopic dynamics in traffic flow and complexity in networks. Physical Review E 76, 016110.
Li, X., Lord, D., Zhang, Y., Xie, Y., 2008. Predicting motor vehicle crashes using support vector machine models. Accident Analysis and Prevention 40 (4), 1611–1618.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News 2 (3), 18–22.
Lord, D., Mannering, F., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transportation Research Part A 44, 291–305.
Lord, D., Persaud, B.N., 2000. Accident prediction models with and without trend: application of the generalized estimating equations procedure. Transportation Research Record 1717, 102–108.
Malyshkina, N., Mannering, F., Tarko, A., 2009. Markov switching negative binomial models: an application to vehicle accident frequencies. Accident Analysis and Prevention 41, 217–226.
Miaou, S.-P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus Empirical Bayes. Transportation Research Record 1840, 31–40.
Miaou, S.P., Song, J.J., Mallick, B.K., 2003. Roadway traffic crash mapping: a space–time modeling approach. Journal of Transportation and Statistics 6, 33–58.
N’Guessan, A., 2010. Analytical existence of solutions to a system of nonlinear equations with application. Journal of Computational and Applied Mathematics 234, 297–304.
Noland, R.B., Quddus, M.A., 2004. A spatially disaggregate analysis of road casualties in England. Accident Analysis and Prevention 36 (6), 973–984.
Oh, J., Washington, S.P., Nam, D., 2006. Accident prediction model for railway–highway interfaces. Accident Analysis and Prevention 38 (2), 346–356.
Riviere, C., Lauret, P., Ramsamy, J.F.M., Page, Y., 2006. A Bayesian neural network approach to estimating the energy equivalent speed. Accident Analysis and Prevention 38 (2), 248–259.
Sobhani, A., Young, W., Sarvi, M., 2013. A simulation based approach to assess the safety performance of road locations. Transportation Research Part C 32, 144–158.
Wang, C., Quddus, M.A., Ison, S., 2009. The effects of area-wide road speed and curvature on traffic casualties in England. Journal of Transport Geography 17 (5), 385–395.
Wu, J.J., Sun, H.J., Gao, Z.Y., 2008. Mapping to complex networks from chaos time series in the car following model. In: The 6th International Conference on Traffic and Transportation Studies, pp. 397–407.
Xie, Y., Lord, D., Zhang, Y., 2007. Predicting motor vehicle collisions using Bayesian neural networks: an empirical analysis. Accident Analysis and Prevention 39 (5), 922–933.
Xie, Y., Zhang, Y., 2008. Crash frequency analysis with generalized additive models. Transportation Research Record 2061, 39–45.
Yu, R.J., Abdel-Aty, M., Ahmed, M., 2013. Bayesian random effect models incorporating real-time weather and traffic data to investigate mountainous freeway hazardous factors. Accident Analysis and Prevention 50, 371–376.
Yu, R.J., Abdel-Aty, M., 2013. Utilizing support vector machine in real-time crash risk evaluation. Accident Analysis and Prevention 51, 252–259.
Zhang, J., Small, M., 2006. Complex network from pseudoperiodic time series: topology versus dynamics. Physical Review Letters 96, 238701.