Effect of driver’s age and side of impact on crash severity along urban freeways: A mixed logit approach


JSR-01087; No of Pages 10 Journal of Safety Research xxx (2013) xxx–xxx

Contents lists available at SciVerse ScienceDirect

Journal of Safety Research journal homepage: www.elsevier.com/locate/jsr


Kirolos Haleem ⁎, Albert Gan 1


Department of Civil and Environmental Engineering, Florida International University, 10555 West Flagler Street, EC 3680, Miami, FL 33174


Abstract

Introduction: This study identifies geometric, traffic, environmental, vehicle-related, and driver-related predictors of crash injury severity on urban freeways.

Method: The study takes advantage of the mixed logit model’s ability to account for unobserved effects that are difficult to quantify and may affect the model estimation, such as the driver’s reaction at the time of the crash. Five years of crashes occurring on 89 urban freeway segments throughout the state of Florida in the United States were used. Examples of severity predictors explored include traffic volume, distance of the crash to the nearest ramp, and detailed categories of driver’s age, vehicle type, and side of impact. To show how the parameter estimates could vary, a binary logit model was compared with the mixed logit model.

Results: The at-fault driver’s age, traffic volume, distance of the crash to the nearest ramp, vehicle type, side of impact, and percentage of trucks were found to significantly influence severity on urban freeways. Additionally, young at-fault drivers were associated with a significant increase in severity risk relative to other age groups. Some variables in the binary logit model yielded illogical estimates because the model ignores random variation in the estimation. Since the at-fault driver’s age and side of impact were significant random parameters in the mixed logit model, an in-depth investigation was performed. Back, left, and right impacts had the highest risk among middle-aged drivers, followed by young drivers, very young drivers, and finally, old and very old drivers.

Conclusions: The study provided a promising approach, the random forest technique, to screening the predictors before fitting the mixed logit model. Furthermore, potential countermeasures were proposed to reduce the severity of side impacts due to lane changing, such as devising side crash avoidance systems.

© 2013 National Safety Council and Elsevier Ltd. All rights reserved.

Article history: Received 7 July 2012; received in revised form 13 February 2013; accepted 18 April 2013; available online xxxx

Keywords: Mixed logit; Severity; Random forest; Urban freeway; Driver’s age; Side of impact

1. Introduction


High-speed roadways such as freeways have continued to be a research focus due to their high correlation with injury severity (Malyshkina & Mannering, 2008; Renski, Khattak, & Council, 1999). Identifying the significant predictors of crash injury severity along these facilities can help to select more effective countermeasures that can better tackle the underlying safety deficiencies. This paper proposes to apply the mixed logit model as a more robust approach to modeling injury severity. The model has been shown to be effective in analyzing injury severity (Anastasopoulos & Mannering, 2011; Gkritza & Mannering, 2008; McFadden & Train, 2000; Milton, Shankar, & Mannering, 2008; Pai, Hwang, & Saleh, 2009). The model is characterized by its capability to account for unobserved predictors, which is a highly desirable property in severity studies due to the difficulty in quantifying some features, such as the driver behavior and reaction at the time of a crash (see Kim, Ulfarsson, Kim, & Shankar, 2011).


⁎ Corresponding author. Tel.: +1 321 276 7889; fax: +1 305 348 2802. E-mail addresses: khaleemm@fiu.edu (K. Haleem), gana@fiu.edu (A. Gan). 1 Tel.: +1 305 348 3116; fax: +1 305 348 2802.


As demonstrated by McFadden and Train (2000) and Train (2009), the mixed logit model, also referred to as the mixed multinomial logit model, the random parameters logit model, and the error-components logit model, is a highly flexible model that can be used to approximate different random utility functions. The model can account for the standard multinomial logit model limitations, while allowing for random variation across the observations and unrestricted substitution patterns, as well as correlation in unobserved features across time. In this model, some parameters are held fixed while others are allowed to be random. The random effects mixed logit model is preferred to the fixed effects model since it allows time-invariant explanatory variables, such as gender and age, to be used as main effects. The application of the mixed logit model in analyzing traffic crash injury severity on urban freeways that typically carry a high number of commuters has not been extensively explored in safety studies. According to the Traffic Safety Facts (National Highway Traffic Safety Administration, 2007), urban fatalities increased by 8% from 1998 to 2007, while rural fatalities decreased by 9% during the same time span; the significance of this increase is highlighted by the fact that, nationwide, Florida was ranked second after California for urban fatalities.

0022-4375/$ – see front matter © 2013 National Safety Council and Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jsr.2013.04.002

Please cite this article as: Haleem, K., & Gan, A., Effect of Driver’s Age and Side of Impact on Crash Severity along Urban Freeways: A Mixed Logit Approach, Journal of Safety Research (2013), http://dx.doi.org/10.1016/j.jsr.2013.04.002


This study makes use of five years of crashes (2001 to 2005) that occurred on 89 urban freeway segments throughout the state of Florida in the United States. The primary objective is to help improve the selection of countermeasures by identifying effective predictors of injury severity on urban freeways, applying the mixed logit approach to account for unobserved (or latent) effects. A second objective is to find out how the model coefficients vary between the mixed logit model and the multinomial logit model. The latter model belongs to the same family as the mixed logit model and has been successfully applied in severity studies (Carson & Mannering, 2001). A third objective is to compare the goodness-of-fit of the fitted mixed and multinomial logit models to help in recommending a robust approach for severity analysis on urban freeways.

The next section of this paper reviews studies that applied the mixed logit and multinomial logit approaches to model injury severity. This is followed by a description of the data and explored variables, and an overview of the mixed and multinomial logit models. The results from the fitted mixed and multinomial logit models are then presented and compared. The final section concludes with the key findings and provides recommendations for further research.

2. Literature review

This section reviews severity studies that applied the mixed logit model and the multinomial logit model, as well as severity analyses on freeways.

One of the first studies that applied the mixed logit model for safety analysis is by Gkritza and Mannering (2008). They analyzed safety belt usage in single- and multi-occupant vehicles using data collected from an observational roadside survey of safety belt use in Indiana. They hypothesized that the mixed logit approach offered more flexibility while capturing individual-specific heterogeneity from roadway characteristics, driver behavior, and vehicle types. The authors found that for single-occupant vehicles, male drivers, truck and van drivers, and those driving in the morning were less likely to use safety belts. For multiple-occupant vehicles, both front-seat occupants were more likely to be restrained in sport utility vehicles (SUVs) and along interstate highways, while driving in a van and in the afternoon decreased this likelihood.

In another pioneering study, Milton et al. (2008) used the mixed logit model to study the injury severity distribution of crashes on highway segments in Washington. They found that volume-related variables such as average daily traffic, average daily truck traffic, truck percentage, number of interchanges per mile, and weather features such as snowfall, were better modeled as random parameters, while roadway variables, such as the number of horizontal curves, number of grade breaks per mile, and pavement friction, were better modeled as fixed.

Afterwards, Pai et al. (2009) examined the characteristics of auto–motorcycle crashes related to gap-acceptance at T-junctions in the United Kingdom, using the British accident injury database from 1991 to 2005. The authors used the mixed logit approach to model the factors contributing to motorists’ right-of-way (ROW) violation. It was concluded that motorcycles’ ROW was more likely to be violated on non-built-up roads and in poor lighting conditions.

Several more recent studies can be found in the literature. Kim, Ulfarsson, Shankar, and Mannering (2010) used police-reported crashes from 1997 to 2000 from North Carolina to investigate pedestrian injury severity predictors using the mixed logit approach. They discovered several factors that more than double the pedestrian fatal injury risk, including darkness with no streetlights, trucks, freeway facilities, speeding involvement, and drinking while driving.

Malyshkina and Mannering (2010) empirically investigated the effect of highway design exceptions on the frequency and severity of crashes. Design exceptions are cases in which highways are constructed or maintained without meeting the specified guidelines (e.g., for design speed, lane width, shoulder width). Crash data were extracted from the State of Indiana’s crash records for the period of 2003 to 2007. The authors classified the severity into three levels: fatal, injury (possible, evident, and disabling), and property damage only. They mapped these crashes on 35 segments that had design exceptions at bridges, and 13 segments that had design exceptions along the roadways. Malyshkina and Mannering further used the mixed logit model and concluded that crashes in urban areas had a lower probability of injury, whereas high speed limits were associated with a higher likelihood of injury crashes. Moreover, they found that the current process for granting design exceptions did not have a significant impact on crash frequency or severity.

Anastasopoulos and Mannering (2011) compared the fixed and random parameter logit (or mixed logit) models in the context of their data structure using five-year data from interstate highways in the State of Indiana. The authors explored two types of data within each model: detailed crash-specific data and general non-detailed data, including the injury outcome of the crash and some roadway and traffic features. The authors concluded that the mixed logit model using less detailed data could provide an acceptable level of estimation accuracy.

Another recent study from Moore, Schneider, Savolainen, and Farzaneh (2011) looked into why the mixed logit model could help avoid the shortcomings of previously explored severity models (e.g., the multinomial logit model). The authors explained that the main advantages of the mixed logit lie in its ability to relax the independence from irrelevant alternatives (IIA) property and to allow for heterogeneity in parameter estimates across the observations. The authors investigated both the multinomial logit and mixed logit models to identify those geometric, environmental, and crash type characteristics affecting bicyclists’ injury severity at intersection and non-intersection locations. The data were extracted from 2002 to 2008 for bicycle-involved crashes in Ohio. For crashes occurring at intersection locations, the probability of bicyclist injury severity increases if the bicyclist is not wearing a helmet, if the motorist is under the influence of alcohol, if the motor vehicle involved in the crash is a van, if the motor vehicle strikes the side of the bicycle, and if the crash occurs on a horizontal curve that has a grade. For non-intersection location crashes, the likelihood of bicyclist severity increases if the bicyclist is under the influence of drugs, if the motorist is under the influence of alcohol, if the motor vehicle strikes the side of the bicycle, and if the motor vehicle involved in the crash is a truck.

The latest studies found in the literature include Kim et al. (2011) and Ye and Lord (2011). Kim et al. (2011) developed a mixed logit model of driver injury severity in single-vehicle crashes to explore the effect of driver age on those particular crashes. They used crash data from 2003 to 2004 from the California Highway Patrol data records. The identified factors that increased fatal injury probability included older drivers (65 years or above), male drivers, drunk driving, older drivers driving older vehicles, and darkness with no streetlights. Ye and Lord (2011) compared the three most frequently used severity models (the multinomial logit, the ordered probit, and the mixed logit) based on the sample sizes required for an effective estimation of the parameters. To achieve this objective, they used a Monte Carlo approach with simulated and observed crash data. The authors concluded that the mixed logit model required the largest sample size. The recommended sample sizes for the multinomial logit, ordered probit, and mixed logit models are 1,000, 2,000, and 5,000, respectively.

Among the studies that examined the multinomial logit model, Shankar and Mannering (1996) modeled the injury severity resulting from vehicle/motorcycle crashes using crash data from 1989 to 1994 in Washington. The authors concluded that helmet usage could reduce injury severity in motorcycle crashes. Carson and Mannering (2001) analyzed the impact of ice-warning signs on crash severity along roadway sections and did not find them to have a clear impact. In another study, Ulfarsson and Mannering (2004) analyzed the difference in male and female injury severity, and concluded that hitting a guardrail could have opposite severity effects for the two genders. Golob, Recker, and Pavlis (2008) examined the effect of traffic flow


parameters on severity while collecting traffic data from 8,000 loop detector locations, spaced 0.5 to 0.8 km apart along California’s freeways. They found that when the road transitions from free-flow to congested conditions or vice versa, there is no effect on injury severity in the left and interior lanes. Most recently, Geedipally, Turner, and Patil (2011) analyzed motorcycle crashes in Texas’s urban and rural areas, using six years of crash data from 2003 to 2008. They developed separate models for urban and rural crashes and, for urban crash injury severity, found that alcohol, gender, lighting conditions, and the existence of horizontal and vertical curves were significant factors. For rural areas, the effects of similar factors were observed, in addition to the rider’s age and crash type.

Regarding severity studies on freeways, Shankar, Mannering, and Barfield (1996) analyzed the injury severity of single-vehicle crashes on rural freeways, and the significant features were environmental conditions, highway design, vehicle characteristics, and driver characteristics. Renski et al. (1999) analyzed the effect of speed limit increases on the most severe occupant injury in single-vehicle crashes on North Carolina’s interstate highways. One of their findings is that increasing speed limits from 104.6 to 112.7 km/h did not have a significant impact on severity. Malyshkina and Mannering (2008) investigated the influence of raising the speed limits in 2005 on vehicles’ crash injury severity for Indiana’s interstate highways, using crash data from 2004 (for the before period) and 2006 (for the after period). They found that raising the speed limits did not have a significant impact on severity. Most recently, Haleem and Gan (2011) compared the crash injury severity along two critical facilities: freeways and arterials. An interesting finding of this investigation was that SUVs and pick-up trucks showed a severity increase on freeways and a decrease on arterials.

It can be concluded from the above review that applications of the mixed logit framework to analyze crash injury severity on urban freeways have been relatively limited. The use of the mixed logit model in this study makes it possible to identify the non-uniform effects of the crash injury severity predictors across the observations. Moreover, this non-uniformity in parameter estimation should be accounted for when proposing countermeasures on urban freeways.

3. Data and variables settings

In this study, five years of statewide police-reported crashes (2001 through 2005) on Florida’s freeways were used. The data were extracted from the Crash Analysis Reporting (CAR) system maintained by the Florida Department of Transportation (FDOT). Crash records were then screened to include only urban freeway crashes. This resulted in a total of 56,727 crashes. This count is well above the minimum desirable number of 5,000 observations suggested by Ye and Lord (2011) for the mixed logit model estimation. The crash records contain those environmental, driver, and vehicle-related features deemed integral to the analysis.

The other relevant geometric and traffic features for analysis were obtained from FDOT’s Roadway Characteristics Inventory (RCI) database. The RCI includes roadway-related data such as functional classification, number of lanes, shoulder width, and median types. The crash and RCI databases were merged, and the necessary variable features from both databases were appended together. Notably, the RCI divides the roadway into homogeneous sections and subsections of variable lengths, with the section limit defined by any change in the roadway geometric or traffic characteristics. In this study, a total of 89 freeway segments were included, where a segment is representative of a unique roadway ID. For example, Interstate-95 spans many counties in Florida; thus, it has various roadway IDs (segments).

As part of the data preparation, some variables were categorized based on their distribution. An example is categorizing the speed limit into low speed limit (less than or equal to 95 km/hr or 60 mph) and high speed limit (greater than 95 km/hr or 60 mph). Another example is categorizing the driver’s age into very young, middle, old, and so on based on previous studies (see Abdel-Aty, Chen, & Schott, 1998). Additionally, possible correlations among the variables (an indication of collinearity) were examined, and no significant correlations were found. The variables used in this study, along with their descriptive statistics, are shown in Table 1. This study makes use of some new variables that had not been extensively examined, such as the distance of the crash to the nearest on/off-ramp location. Moreover, a comprehensive list of vehicle types for the at-fault driver (e.g., passenger cars, vans, SUVs, pick-ups, trucks, recreational vehicles (RVs)), driver’s age, and sides of impact was investigated.

4. Methodology

4.1. Mixed logit framework

Traditionally, crash injury severity has been modeled with approaches that assume the effect of each variable stays fixed across the observations (Kim et al., 2010). These approaches include, but are not limited to, the multinomial logit, the ordered probit, and the nested logit. While these approaches have been applied in many severity studies to highlight the effect of various roadway, traffic, and driver predictors of injury severity, they cannot model the influence of unobserved predictors (such as physical health and driver behavior, as illustrated by Kim et al., 2010). The mixed logit approach, on the other hand, is able to do so by allowing the parameter estimates to vary randomly across the observations. In general, a full set of both observed and unobserved variables is difficult to collect, and some of these unobserved variables can only be gathered via crash-test dummies (Kim et al., 2010). In the absence of data for unobserved predictors, the mixed logit model becomes an attractive option. According to Hensher and Greene (2003), the mixed logit model is considered the most promising discrete choice model currently available. Also, as described in McFadden and Train (2000), Bhat (2001), and Milton et al. (2008), the mixed logit approach is effective for modeling crash injury severity.

This study follows the path of Milton et al. (2008), Pai et al. (2009), and Train (2009). The function $U_{jn}$ that determines the probability of crash injury severity category $j$ (severe or non-severe) on an urban freeway segment $n$ is:

$$U_{jn} = \beta_j' X_{jn} + \varepsilon_{jn} \tag{1}$$

where:

$\beta_j'$: vector of parameters to be estimated;
$X_{jn}$: vector of explored variables (geometric, traffic, environmental, driver-related, and vehicle-related); and
$\varepsilon_{jn}$: random error term that is iid (independently and identically distributed) extreme value.

As pointed out in Train (2009), $\beta_j$ or $\varepsilon_{jn}$ cannot be observed. Thus, the probability $P_n(j)$ of injury severity $j$ (severe or non-severe) from all injury severity categories $J$ (representing both severe and non-severe injuries) on an urban freeway segment $n$ is the following:

$$P_n(j) = \frac{\exp\left(\beta_j' X_{jn}\right)}{\sum_{J} \exp\left(\beta_j' X_{jn}\right)} \tag{2}$$
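To make Eq. (2) concrete, the following minimal Python sketch evaluates the logit probabilities for a fixed set of coefficients, with the denominator summing over the severity categories. The function name, covariate values, and coefficients are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def severity_probabilities(betas, x):
    """Logit probabilities of Eq. (2) for one observation.

    betas: one coefficient vector per severity category.
    x: vector of explored variables for the observation.
    """
    utilities = [sum(b * v for b, v in zip(beta_j, x)) for beta_j in betas]
    # Subtract the largest utility before exponentiating for numerical stability.
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs: a constant plus two illustrative covariates.
x = [1.0, 11.7, 2.56]
betas = [
    [0.0, 0.0, 0.0],       # non-severe, normalized to zero as the baseline
    [-1.5, -0.10, 0.05],   # severe (made-up coefficients)
]
probs = severity_probabilities(betas, x)  # [P(non-severe), P(severe)]
```

With two categories and the baseline normalized to zero, the second probability is the familiar logistic function of the severe-category utility.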

Table 1. Full description of explored variables and associated summary statistics.

| Variable characteristic | Variable name | Summary statistics |
| --- | --- | --- |
| Continuous variables | | |
| Geometric | Right shoulder width (ft) | M⁎ = 9.96, SD⁎ = 2.24 |
| Geometric | Left shoulder width (ft) | M = 7.40, SD = 3.94 |
| Geometric | Median width (ft) | M = 44.59, SD = 31.24 |
| Geometric | Ln(section length in ft) | M = 6.18, SD = 1.28 |
| Geometric | Ln(crash distance to the nearest ramp in ft) | M = 2.56, SD = 2.98 |
| Traffic | Ln(AADT) | M = 11.70, SD = 0.52 |
| Traffic | Percentage of trucks | M = 9.44, SD = 4.40 |
| Categorical variables | | |
| Environmental | Lighting condition | F⁎(daylight) = 38,726 (68.27%); F(dusk) = 1,217 (2.14%); F(dawn) = 936 (1.65%); F(dark street light) = 11,788 (20.78%); F(dark no street light) = 4,060 (7.16%) |
| Environmental | Weather condition | F(cloudy) = 13,267 (23.39%); F(rainy) = 8,362 (14.74%); F(foggy) = 201 (0.35%); F(other) = 146 (0.26%); F(clear) = 34,751 (61.26%) |
| Traffic | Speed limit | F(≤95 km/hr) = 24,781 (43.68%); F(>95 km/hr) = 31,946 (56.32%) |
| Traffic | Hour of crash | F(morning peak) = 10,315 (18.18%); F(morning off-peak) = 12,509 (22.05%); F(afternoon peak) = 14,988 (26.43%); F(night/dawn off-peak) = 18,915 (33.34%) |
| Geometric | Road surface type | F(blacktop) = 11,668 (20.57%); F(other) = 690 (1.22%); F(gravel/stone) = 44,369 (78.21%) |
| Geometric | Road surface condition | F(wet/slippery) = 4,441 (7.83%); F(dry) = 52,286 (92.17%) |
| Driver-related | At-fault driver’s age | F(very young: 15 ≤ age ≤ 19) = 4,405 (7.77%); F(young: 20 ≤ age ≤ 24) = 10,028 (17.67%); F(middle: 25 ≤ age ≤ 64) = 39,408 (69.47%); F(old: 65 ≤ age ≤ 79) = 2,388 (4.21%); F(very old: age ≥ 80) = 498 (0.88%) |
| Driver-related | Alcohol involvement of the at-fault driver | F(alcohol involved) = 3,124 (5.51%); F(drug) = 175 (0.31%); F(alcohol and drug) = 175 (0.31%); F(no alcohol or drug) = 53,253 (93.87%) |
| Vehicle-related | At-fault driver’s vehicle type | F(vans) = 3,628 (6.40%); F(SUVs and pick-ups) = 10,048 (17.71%); F(trucks) = 5,487 (9.67%); F(RVs) = 73 (0.13%); F(buses) = 182 (0.32%); F(motorcycles) = 878 (1.55%); F(autos) = 35,895 (63.28%); F(other) = 536 (0.94%) |
| Vehicle-related | At-fault driver’s crash impact | F(back-side) = 6,084 (10.73%); F(left) = 3,526 (6.22%); F(right) = 3,524 (6.21%); F(top) = 334 (0.59%); F(front) = 39,320 (69.31%); F(other) = 3,939 (6.94%) |

⁎ M = mean, SD = standard deviation, and F = crash frequency for each involved level (percentages in parentheses).

Eq. (2) describes the probability conditioned on $\beta_j$, which is unknown; thus, the unconditioned choice probability (or the mixed logit probability) is the integral of $P_n(j)$ over all possible values of $\beta_j$, as follows:

$$P_{jn} = \int \frac{\exp\left(\beta_j' X_{jn}\right)}{\sum_{J} \exp\left(\beta_j' X_{jn}\right)} \, f(\beta \mid \theta) \, d\beta \tag{3}$$

where $f(\beta \mid \theta)$ is the density function of $\beta$ and $\theta$ is the vector of parameters for the assumed distribution (e.g., the mean and the variance for the normal distribution). For example, if $f(\beta \mid \theta)$ is a normal density, then $\beta \sim N(b, W)$, with $b$ and $W$ denoting the components of the $\theta$ vector. From Eq. (3), $\beta$ can now account for segment-specific variations of the effect of $X$ on injury probabilities. The mixed logit probabilities are a weighted average for different values of $\beta$ across freeway segments, where some components are held fixed and some are randomly distributed. For the random parameters, the mixed logit weights are determined by the density function $f(\beta \mid \theta)$. The normal distribution of the random parameters is widely used and has been shown to be superior to the uniform, log-normal, and triangular distributions (see Ben-Akiva & Bolduc, 1996; Gkritza & Mannering, 2008; Mehndiratta, 1996; Milton et al., 2008; Moore et al., 2011; Revelt & Train, 1998). However, in this study, both the normal and uniform distributions were explored so as to select the distribution that would yield plausible parameter estimates.

The mixed logit model requires that a segment included in the analysis has at least one observation in each of the injury severity levels. The initial attempt in the analysis was to include all five levels of injury severity (i.e., property damage only, possible injury, non-incapacitating injury, incapacitating injury, and fatal injury). However, with five injury severity levels, the number of eligible segments was found to be reduced significantly, despite the relatively large dataset. Consequently, the five injury severity levels were aggregated into binary severe and non-severe levels, and the binary response was used. It was much easier for a segment to have observations in each of two levels than in each of five. Note that the severe level in this study includes incapacitating and fatal injuries, whereas the non-severe level includes property damage only, possible injury, and non-incapacitating injury. In total, there were 91.11% non-severe injuries and 8.89% severe injuries.

The parameter estimates of a mixed logit model are computed via simulated maximum likelihood estimation. This is because standard maximum likelihood estimation is computationally overwhelming due to the complex numerical integration of the logit formula over the distribution of the random parameters. The simulated maximum likelihood estimation uses Halton draws (200 draws in this study), a number that has been shown to provide efficient parameter estimates (see Anastasopoulos & Mannering, 2011; Gkritza & Mannering, 2008; Milton et al., 2008; Pai et al., 2009). In this study, the LIMDEP software package (Econometric Software, Inc.) is used to estimate the mixed logit model. Further, a 5% significance level is used to test the statistical significance of variables in the mixed logit model.
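The simulation step described above can be sketched as follows. This is a minimal illustration, not the LIMDEP procedure used in the study: it approximates the integral in Eq. (3) for the binary case by averaging logit probabilities over Halton-based normal draws of a single random coefficient. The function names, the choice of one random coefficient on the first covariate, and the base-2 Halton sequence are all assumptions made for the example.

```python
import math
from statistics import NormalDist


def halton(index, base):
    """Return the index-th element (index >= 1) of the Halton sequence in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def simulated_severe_probability(x, fixed_beta, random_mean, random_sd, n_draws=200):
    """Approximate the mixed logit probability of a severe outcome.

    The coefficient on x[0] is treated as normally distributed with the given
    mean and standard deviation; the coefficients on x[1:] are held fixed.
    The non-severe category is the baseline with utility zero.
    """
    std_normal = NormalDist()
    total = 0.0
    for r in range(1, n_draws + 1):
        # Map a Halton point in (0, 1) to a draw of the random coefficient.
        z = std_normal.inv_cdf(halton(r, 2))
        beta_random = random_mean + random_sd * z
        utility = beta_random * x[0] + sum(b * v for b, v in zip(fixed_beta, x[1:]))
        total += 1.0 / (1.0 + math.exp(-utility))
    return total / n_draws
```

In a full estimation, this simulated probability would enter a log-likelihood that is maximized over the means and standard deviations of the random parameters; the sketch only shows how the probability itself is simulated. Setting the standard deviation to zero recovers the ordinary binary logit probability.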

4.2. Multinomial logit framework

Crash injury severity is inherently ordered; thus, the use of the ordered approaches for modeling (e.g., the ordered probit and the ordered logit) is relevant in most severity studies. However, some other studies have applied the multinomial logit approach, and their reasoning is documented in Washington, Karlaftis, and Mannering (2003), Savolainen and Mannering (2007), and Malyshkina and Mannering (2010). As mentioned, two issues related to ordered probability models might arise. The first issue is the under-reporting of noninjury crash level that could yield biased and inconsistent parameter


403 404 405 406 407 408 409 410 411 412 413 414 415 416 417

\[
\log\left(\frac{\pi_j}{\pi_J}\right) = \alpha_j + \beta_j x, \qquad j = 1, 2, \ldots, J-1 \tag{4}
\]

where:

$\alpha_j$: intercept to be estimated for each of the “J-1” models;
$\beta_j$: vector of parameter estimates for each of the “J-1” models; and
$x$: vector of explored variables.

The possible number of equations is “J-1” and the number of parameters to be estimated is “(J-1) * (k + 1)”, assuming k variables (excluding the intercept). Note that the parameters are estimated via the standard maximum likelihood estimation method. The probability of each category except the baseline within the response y is estimated as follows:

\[
\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \beta_h x)}, \qquad j = 1, 2, \ldots, J-1 \tag{5}
\]

The probability of the baseline category “J” is estimated as:

\[
\pi_J = \frac{1}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \beta_h x)} \tag{6}
\]
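The category probabilities in Eqs. (5) and (6) can be sketched in a few lines. This is an illustrative sketch that uses the linear predictor of Eq. (4), intercept plus coefficients times variables, for each non-baseline category; the function name and argument layout are assumptions, not part of the study's software.

```python
import math
from typing import List, Sequence


def multinomial_logit_probabilities(alphas: Sequence[float],
                                    betas: Sequence[Sequence[float]],
                                    x: Sequence[float]) -> List[float]:
    """Return [pi_1, ..., pi_J] with the last category J as the baseline.

    alphas[j] and betas[j] hold the intercept and coefficient vector of the
    j-th of the J-1 non-baseline logit equations.
    """
    scores = [math.exp(a + sum(b_k * x_k for b_k, x_k in zip(b, x)))
              for a, b in zip(alphas, betas)]
    denominator = 1.0 + sum(scores)
    probabilities = [s / denominator for s in scores]
    probabilities.append(1.0 / denominator)  # baseline category J
    return probabilities
```

With a single non-baseline equation (J = 2) the function returns the familiar binary logit probabilities, which is the case applied in this study.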

5. Model estimation

In this study, the response has only two categories (i.e., J = 2), denoting severe and non-severe injuries. Therefore, the multinomial logit model reduces to a binary logit model. The binary logit model is estimated using the SAS package (2002). As with the mixed logit model, a 5% significance level is used to test the statistical significance of the variables in the binary logit model.


5.1. Mixed logit model results


A major issue while fitting the mixed logit model is the initial determination of the random parameters (see Moore et al., 2011). In fact, the process of determining the random parameters is not straightforward: many trials would have to be performed until a significant set of parameters is estimated. Some studies (e.g., Moore et al., 2011) recommended starting with all possible variables and then removing them one at a time; however, this is not achievable in some cases, especially for a relatively large number of observations. Rather, an efficient technique to screen the severity predictors before fitting a model is needed. A robust data mining method used efficiently in previous studies for variable selection is the random forest technique (Haleem et al., 2010; Harb et al., 2009), which is implemented in this study to help compile a reduced list of predictors before fitting the mixed logit model. This approach, having demonstrated its capability through successful use herein, can be recommended before fitting a mixed logit model to facilitate the modeling procedure.

The variable importance ranking from the random forest technique is shown in Fig. 1. A total of 50 trees were used to grow the forest, and this number was deemed sufficient to yield reliable results. Using the node purity measure (an indication of how important the variable is), the explored variables were ranked in descending order from the most to the least important. Using a cut-off value of 200, nine variables were chosen to enter the model; in order, these variables are the logarithm of the crash distance, logarithm of section length, logarithm of AADT, vehicle type, trucks percentage, median width, crash impact, crash time, and driver’s age.

The fitted mixed logit model for urban freeways using 200 simulated Halton draws is shown in Table 2. Goodness-of-fit statistics for the model, including log-likelihood at convergence, log-likelihood at zero, Pseudo R², and Akaike information criterion (AIC), are also shown in the same table. The normal and uniform distributions of the random parameters were examined, and the normal distribution was shown to yield more plausible estimates. The parameters found random were those that yielded statistically significant standard deviations for the normal distribution (see Milton et al., 2008; Train, 2009). On the other hand, if their estimated standard deviations were not statistically different from zero, the parameters were held fixed across the freeway segments. The back, left, and right-side impacts, as well as very young at-fault driver’s age, were found to be random parameters in the model. Note that the last column in Table 2 shows the elasticities or marginal effects of severe injury probability. The marginal effects are the partial derivatives of the probability of crash injury severity with respect to the vector of independent variables (Zhang, 2010). The marginal effects also depict the effect of a change in a certain independent variable on the probability of an injury severity level. For continuous


Since the random forest is used in this study as a variable screening technique before fitting the mixed logit model (as will be shown later), a brief overview is instructive. Random forest was introduced by Breiman (2001), and is considered one of the pioneering machine learning techniques for screening exogenous variables (Abdel-Aty & Haleem, 2011; Haleem, Abdel-Aty, & Santos, 2010; Harb, Yan, Radwan, & Su, 2009). In this technique, a number of trees are grown by randomly selecting observations from the original dataset with replacement, then searching over a randomly selected subset of variables at each split until the variable importance is ranked (Haleem et al., 2010). The application of the random forest to datasets having binary responses (as used in this study) has been successful and can be viewed in Harb et al. (2009) and Sparks (2009).
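The two sources of randomness described above (sampling observations with replacement and restricting each split to a random subset of variables) can be sketched as follows. This is only an illustration of the sampling step, not a full random forest and not the software used in this study; the function name and its arguments are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_and_feature_subset(n_observations: int,
                                 variables: Sequence[str],
                                 n_candidates: int,
                                 rng: random.Random) -> Tuple[List[int], List[str]]:
    """Draw one bootstrap sample of row indices and one random subset of candidate variables."""
    # Observations are sampled with replacement, so some rows repeat and others are left out.
    rows = [rng.randrange(n_observations) for _ in range(n_observations)]
    # Candidate variables for a split are sampled without replacement.
    candidates = rng.sample(list(variables), n_candidates)
    return rows, candidates


# Illustrative call; the variable names are shortened labels, not dataset column names.
rng = random.Random(0)
rows, candidates = bootstrap_and_feature_subset(
    10, ["ln_aadt", "ln_crash_distance", "vehicle_type", "crash_impact", "driver_age"], 2, rng)
```

In a full forest, the bootstrap sample would be drawn once per tree and the candidate subset redrawn at every split; the importance ranking then aggregates the node purity gains across all trees.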


4.3. Random forest technique


estimates, but this is not the case for unordered models (such as the mixed and multinomial logit models). According to Yamamoto, Hashiji, and Shankar (2008), under-reporting cannot impact the estimation of mixed and multinomial logit models (i.e., non-ordered models). The reason is that these models are structurally flexible, in that independent variables are not forced to be the same across all severity levels. This is not the case for ordered models. The structural flexibility is relevant as it allows different sets of independent variables to be selected as significant predictors of different injury severity levels. This flexibility in the model structure of mixed and multinomial logit models could thus yield unbiased and consistent parameter estimates. On the other hand, the lack of flexibility in the injury severity distribution of ordered probability models could result in biased estimates. The second issue with ordered models is that they restrict the variables’ effects, such that a variable that increases the likelihood of the most (least) severe level must also decrease the likelihood of the least (most) severe level; unordered models do not impose this restriction.

For comparison with the mixed logit model, the multinomial logit model is also implemented in this study. As shown in Haleem (2009) (originally documented in Agresti, 2007), for the multinomial model, the category counts of the response variable follow a multinomial distribution. This approach is mainly used to model nominal responses when the order of the categories is not of concern. Let j = 1, 2, 3, …, J index the categories of the response y (crash injury severity), where J denotes the number of categories. Also, let {π1, …, πJ} denote the response categories’ probabilities, satisfying ∑jπj = 1. The multinomial logit model then pairs each of the response categories with a baseline category. Assuming that the last category “J” is the baseline, the possible “J-1” logit models are the following:


variables (e.g., Ln(AADT)), the marginal effect measures the influence of a unit change in an independent variable on the probability of severe injury occurrence. For categorical or dummy variables, the marginal effect represents the percentage by which the variable is associated with severe injury occurrence.

5.1.1. Random parameters interpretation

The back-side crash impact is normally distributed with a mean of -0.151 and a standard deviation of 0.426. With this, one can calculate the probability that the distribution is less than zero. Since the distribution follows a Z-distribution, the following equation applies:

\[
Z = \frac{0 - \text{mean parameter estimate}}{\text{standard deviation of estimate}} \tag{7}
\]

Thus, Z = (0 − (−0.151)) / 0.426 = 0.35. From the Z tables, at a Z-value of 0.35, the corresponding probability is 63.68%. This means that 63.68% of the distribution is less than 0 and 36.32% is greater than 0. In other words, in 63.68% of the segments, back-side crashes are associated with a lower severe injury probability. Hence, back impacts are not as severe as the other impact types. In addition, the severe injury elasticity is -0.104%. This implies that back-side impacts are 0.104% less likely to be associated with severe injuries compared to the other impacts.

The left-side crash impact is normally distributed with a mean of 0.158 and a standard deviation of 0.605. This means that 39.74% of the distribution is less than 0 and 60.26% is greater than 0. This implies that, in a large portion of the freeway segments (60.26%), left-side crash impacts are associated with a higher severe injury probability. This might be explained by the fact that left impacts occur on the driver’s side; hence, they strongly influence crash severity.

The last type of impact, right-side crash impact, is normally distributed with a mean of 0.039 and a standard deviation of 0.644. Given these estimates, 47.61% of the distribution is less than 0 and 52.39% is greater than 0. This means that in slightly more than half of the segments (52.39%), right-side crash impacts are associated with a higher severe injury probability, possibly due to passengers’ vulnerability to right-side impacts wherein the vehicle’s door cannot absorb the force of the crash.

Finally, very young at-fault driver’s age is normally distributed with a mean of -0.294 and a standard deviation of 0.466. Given these estimates, 73.57% of the distribution is less than 0 and 26.43% is greater than 0. This implies that in 73.57% of the freeway segments, very young drivers are 0.142% less likely to receive severe injuries when compared to other age groups. This conforms to the findings of Haleem and Abdel-Aty (2010) and is mainly due to their stronger physical condition.
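The shares reported for the four random parameters follow from Eq. (7) and the standard normal distribution. The sketch below reproduces them under one assumption: the Z-value is rounded to two decimals before the lookup, as a printed Z table would require. The function name is illustrative.

```python
import math


def share_below_zero(mean: float, sd: float, round_z: bool = True) -> float:
    """Share of a normal(mean, sd) distribution that lies below zero."""
    z = (0.0 - mean) / sd
    if round_z:
        z = round(z, 2)  # mimic a two-decimal Z-table lookup (assumption)
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Means and standard deviations of the random parameters as reported in Table 2.
random_parameters = {
    "back impact": (-0.151, 0.426),
    "left impact": (0.158, 0.605),
    "right impact": (0.039, 0.644),
    "very young at-fault driver": (-0.294, 0.466),
}
for name, (mean, sd) in random_parameters.items():
    print(f"{name}: {100 * share_below_zero(mean, sd):.2f}% below zero")
```

Setting `round_z=False` gives the exact normal probabilities, which differ from the tabulated values only in the first or second decimal of the percentage.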

Table 2 Mixed Logit Model Estimates.

| Variable Description | Simulated Maximum Likelihood Estimate | Standard Error | P-value | Severe injury elasticity (%) |
|---|---|---|---|---|
| Random Parameters (standard deviation of parameters’ normal distribution in parentheses) | | | | |
| Impact: Back | -0.151 (0.426) | 0.046 (0.209) | 0.001 (0.041) | -0.104 |
| Impact: Left | 0.158 (0.605) | 0.054 (0.205) | 0.003 (0.003) | 0.005 |
| Impact: Right | 0.039 (0.644) | 0.056 (0.184) | 0.480 (0.000) | -0.037 |
| At-fault driver age: Very young (15 ≤ age ≤ 19) | -0.294 (0.466) | 0.054 (0.238) | 0.000 (0.050) | -0.142 |
| Fixed Parameters | | | | |
| Intercept for severe injury | 2.992 | 0.038 | 0.000 | n/a |
| Ln(AADT) | -0.630 | 0.024 | 0.000 | -35.778 |
| Ln(crash distance to the nearest ramp) | 0.558 | 0.007 | 0.000 | 5.288 |
| Median width | 0.004 | 0.000 | 0.000 | 1.081 |
| Percentage of trucks | 0.016 | 0.002 | 0.000 | 0.781 |
| Crash hour: Morning off-peak (11:01-15:00) | -0.284 | 0.035 | 0.000 | -0.277 |
| Crash hour: Afternoon peak (15:01-19:00) | -0.259 | 0.032 | 0.000 | -0.299 |
| Vehicle type: Trucks | -0.344 | 0.048 | 0.000 | -0.154 |
| Vehicle type: Buses | 2.423 | 0.178 | 0.000 | 0.066 |
| Vehicle type: Motorcycles | 1.710 | 0.089 | 0.000 | 0.220 |
| At-fault driver age: Young (20 ≤ age ≤ 24) | 0.078 | 0.034 | 0.024 | 0.071 |
| Ln(section length) | -0.710 | 0.013 | 0.000 | -21.082 |
| Number of observations | 56,727 | | | |
| Log-likelihood at convergence | -9,724.88 | | | |
| Log-likelihood at fitting the intercept | -17,012.55 | | | |
| McFadden’s Pseudo R² | 0.43 | | | |
| AIC | 19,489 | | | |

Fig. 1. Variable Importance Ranking Using the Random Forest Technique. (Axis: node purity value.)

5.1.2. Fixed parameters interpretation

For the fixed parameters, increasing the logarithm of AADT significantly reduces the probability of crash injury severity. This result is consistent with other severity studies (e.g., Haleem & Abdel-Aty, 2010; Klop & Khattak, 1999; Milton et al., 2008). A possible explanation is the speed reduction at relatively high AADTs, causing injury severity to decline. Moreover, the severe injury elasticity is -35.778%. This implies that increasing the logarithm of AADT by one unit reduces the severe injury probability by around 36%. A new finding from this study is that increasing the logarithm of the distance of the crash to the nearest on/off-ramp location significantly increases the severe injury likelihood. This shows that the likelihood of severe injury near ramp locations is relatively lower, since drivers are often more attentive at merging and diverging areas and tend to reduce their speeds. On the other hand, away from ramp junctions, drivers usually speed at mid-segments and might change lanes at high speeds, which could explain the high severity risk. This conforms to Xie, Zhang, and Liang (2009), who also observed lower injury severity risk near junctions. Increasing the trucks percentage also significantly increases the severe injury probability on urban freeways, which agrees with the


A multinomial logit model (or, more specifically, a binary logit model) for urban freeways has been fitted, and Table 4 shows only the estimated coefficients of those parameters that were significant in the mixed logit model per Table 2. The goodness-of-fit statistics, such as log-likelihood at convergence, log-likelihood at zero, Pseudo R², and AIC, are also displayed in Table 4. The odds ratios of the fitted parameters are displayed as well. For the purpose of the study, this model will be compared to the mixed logit approach in order to recommend the best model, based on the performance of the observed parameters and the goodness-of-fit statistics. From Table 4, it is strange to observe the positive coefficient for the morning peak period, which can be explained by the fact that the binary logit model holds all the parameters as fixed in the estimation procedure and ignores the random effect, thereby causing illogical parameter estimates in some situations. Of all the investigated vehicle types, buses and motorcycles are shown to have the highest risk of severe injury compared to autos (around five times riskier), which concurs with the mixed logit model’s finding. Front-side impacts of the at-fault vehicle have a higher severe injury risk (10% riskier) compared to back impacts. A possible interpretation is that front impacts of at-fault vehicles are mostly rear-end collisions, and rear-end crashes are considered severe at high speeds (see Khattak, 2001). Very young at-fault drivers experience the highest significant reduction in the severe injury likelihood compared to the very old. This shows that the weak physical condition of old drivers may contribute to their lower ability to withstand severe injuries (also shown by Abdel-Aty et al., 1998). The increase in the median width has almost no noticeable impact on the severe injury likelihood. It is strange to observe the negative coefficient for truck percentage, which might be because the binary logit model holds all the parameters as fixed.


6. Model comparison and study applications


By comparing the AIC, Pseudo R², and log-likelihood values of the mixed and binary logit models in Tables 2 and 4, it is clear that the mixed logit model fits the data better (it has a lower AIC and a higher Pseudo R² and log-likelihood at convergence). Also, to compare the mixed and the binary (or standard) logit models, a likelihood ratio test (LRT) is performed (see Kim et al., 2010; The University of California at Berkeley, 2000). The LRT statistic follows a chi-square distribution, and is estimated by subtracting the deviance “-2 log-likelihood” from


5.2. Multinomial logit model results


and 50.8% is greater than zero. This shows that in slightly more than half of the freeway segments, dusk is associated with an increase in the severe injury likelihood among very young drivers compared to other lighting conditions. Similar to the dusk lighting conditions, dark conditions with and without street lights are also associated with an increase in the severe injury likelihood among very young drivers. This is possibly because teenage drivers are more likely to drive in the dark compared to other age groups, and are thus more severity prone. It is interesting to find that increasing the inside shoulder width increases the probability of crash injury severity among very young drivers. The random variation of the dark with street light condition in the young model reveals that 39.36% of the distribution is less than zero and 60.64% is greater than zero. This is similar to the very young model’s finding and shows that dark lighting conditions increase the severe injury likelihood among very young and young drivers. Dawn lighting conditions are associated with a reduction in the severe injury likelihood among young drivers, possibly because they are more attentive in the early morning. A similar finding is observable for back-side impacts, which shows that back impacts on young drivers are not as severe as other impact types.


finding of Das and Abdel-Aty (2010). The fact that trucks could hinder vision and block horizontal sight distance could encourage the following drivers to make risky lane change maneuvers. The increase in median width by one foot (0.30 m) increases the severe injury probability by around 1%. This is possibly due to the high collision risk between parked and traveled vehicles with high speeds for relatively narrow medians. In the afternoon peak period, the severe injury likelihood is significantly reduced, possibly due to the higher AADTs at the afternoon peak. Similarly, there is a reduction in the probability of severe injury at the morning off-peak, despite its relatively low AADT (possibly because drivers are more attentive and focused in the morning). As anticipated, buses and motorcycles significantly increase the severe injury risk. For buses, this can be explained by the fact that bus occupants have an additional risk of colliding with each other, thus increasing the risk of severe injuries, which was also discussed in the report by the National Transportation Safety Board (1999). For motorcycles, it is high-speed crashes without a helmet that could contribute to an increased severe injury risk (see Ouellet & Kasantikul, 2006). It is interesting to find a reduction in the severe injury likelihood for trucks as compared to other types, which might be due to truck drivers’ attentiveness and experience, since they are accustomed to driving on interstate freeways for long periods of time and in inclement conditions. Young at-fault drivers are associated with a significant increase in the severe injury risk relative to other groups, possibly because they may exhibit carelessness and lack of driving skills. This conforms to the study by McGwin and Brown (1999). To validate this finding, an investigation was conducted on the relationship between speed and different age groups, and the results are shown in Fig. 2. This figure shows that old and very old drivers have a higher risk of severe injuries (compared to their younger counterparts) on freeway segments with speed limits greater than 95 km/hr. This could be attributed to the weaker physical condition of old drivers. On the other hand, for freeway segments with speed limits less than or equal to 95 km/hr, Fig. 2 shows that young drivers have the highest severe injury risk and old drivers encounter the lowest. This could be explained by the driving inexperience of young drivers and their tendency to take risky maneuvers even at relatively low speed limits. Lastly, as freeway homogeneity spreads over a relatively large distance, the severe injury probability is significantly reduced. This shows that changes in the freeway cross-section could negatively impact severity.

From Table 2, it can be deduced that very young and young are the two significant at-fault age groups in the model. Thus, separate mixed logit models have been estimated for each to uncover more insightful severity predictors affecting each group, as shown in Table 3. The common variables in Tables 2 and 3 have the same signs, and the following interpretation only highlights new and interesting findings. The random variation of the dusk lighting condition in the very young model reveals that 49.2% of the distribution is less than zero

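Fig. 2. Distribution of Severe Injury Proportion for Different Age Groups by Speed Limit. (Horizontal axis: age group, i.e., very young, young, middle, old, and very old; vertical axis: percentage of severe injuries; series: speed limit > 95 km/hr and speed limit <= 95 km/hr.)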


both models, where the degrees of freedom are the difference between the models’ number of estimable parameters. The LRT statistic is equal to -2 [-14,602.51 - (-9,724.88)] = 9,755.26 with 18 degrees of freedom. The result indicates that the standard logit model can be rejected relative to the mixed logit with over 99.99% confidence. Hence, the mixed logit model is superior to the standard logit model in terms of the parameters’ interpretation, as well as the goodness-of-fit. The random variation of the estimates across the observations in the mixed logit model makes it suitable for accurately modeling crash injury severity on urban freeways, as well as for identifying significant predictors.
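The LRT statistic and the McFadden Pseudo R² values can be checked directly from the log-likelihoods reported in Tables 2 and 4. The following is a sketch under stated assumptions: the function names are illustrative, and the chi-square tail probability uses a closed form that holds only for even degrees of freedom, which happens to cover the 18 degrees of freedom reported here.

```python
import math


def likelihood_ratio_statistic(ll_restricted: float, ll_unrestricted: float) -> float:
    """LRT statistic: difference between the two models' deviances (-2 log-likelihood)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def mcfadden_pseudo_r2(ll_model: float, ll_intercept_only: float) -> float:
    """McFadden's Pseudo R-squared: 1 - LL(model) / LL(intercept only)."""
    return 1.0 - ll_model / ll_intercept_only


lrt = likelihood_ratio_statistic(-14602.51, -9724.88)  # binary logit vs. mixed logit
p_value = chi_square_sf_even_df(lrt, 18)
r2_mixed = mcfadden_pseudo_r2(-9724.88, -17012.55)
r2_binary = mcfadden_pseudo_r2(-14602.51, -17012.55)
```

The tail probability is far below 0.0001 (it underflows to zero in floating point), which is in line with rejecting the standard logit model at the confidence level stated above.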

In addition to the LRT, to test whether a significant difference exists between the standard errors associated with the common significant variables in the mixed and binary logit models in Tables 2 and 4, a paired t-test is performed. It is concluded that the standard errors for the same variables in the mixed logit model are significantly lower than those in the binary logit model (P-value = 0.038). This implies that the mixed logit model is preferred to the binary logit model. Since back, left, and right-side crash impacts, as well as very young at-fault drivers, were significant random parameters in the mixed logit model in Table 2, an in-depth investigation was performed to examine

Table 3 Mixed Logit Model Estimates for Very Young and Young Age Groups.

| Variable Description | Very Young: Simulated Maximum Likelihood Estimate | Very Young: Standard Error | Very Young: P-value | Young: Simulated Maximum Likelihood Estimate | Young: Standard Error | Young: P-value |
|---|---|---|---|---|---|---|
| Random Parameters (standard deviation of parameters’ normal distribution in parentheses) | | | | | | |
| Dusk lighting condition | 0.095 (4.459) | 1.957 (1.795) | 0.961 (0.013) | N/S⁎ | | |
| Dark (street light) lighting condition | N/S | | | 0.300 (1.105) | 0.192 (0.458) | 0.118 (0.015) |
| Fixed Parameters | | | | | | |
| Intercept for severe injury | 4.387 | 1.048 | 0.000 | 3.496 | 0.275 | 0.000 |
| Ln(AADT) | N/S | | | -0.814 | 0.153 | 0.000 |
| Ln(crash distance to the nearest ramp) | 0.803 | 0.202 | 0.000 | 0.632 | 0.052 | 0.000 |
| Median width | N/S | | | 0.010 | 0.002 | 0.000 |
| Inside shoulder width | 0.191 | 0.090 | 0.035 | N/S | | |
| Lighting: Dawn | N/S | | | -2.473 | 0.671 | 0.000 |
| Lighting: Dark (street light) | 1.465 | 0.684 | 0.032 | N/A⁎⁎ | | |
| Lighting: Dark (no street light) | 5.826 | 1.683 | 0.000 | 0.600 | 0.270 | 0.026 |
| Crash hour: Afternoon peak (15:01-19:00) | -1.759 | 0.713 | 0.013 | N/S | | |
| Crash hour: Night/dawn off-peak (19:01-24:00) & (24:01-7:00) | -2.383 | 0.917 | 0.009 | N/S | | |
| Crash impact: Back | N/S | | | -0.663 | 0.244 | 0.006 |
| Ln(section length) | -0.384 | 0.163 | 0.019 | -1.054 | 0.098 | 0.000 |
| Log-likelihood at convergence | -29.88 | | | -297.52 | | |
| Log-likelihood at fitting the intercept | -188.53 | | | -1422.33 | | |
| McFadden’s Pseudo R² | 0.84 | | | 0.79 | | |
| AIC | 79.76 | | | 615.04 | | |

⁎ N/S means not significant. ⁎⁎ N/A means not applicable.

Table 4 Binary Logit Model Estimates.

| Variable Description | Maximum Likelihood Estimate⁎ | P-value | Odds Ratio | 95% Odds Ratio Wald Confidence Limits (lower) | 95% Odds Ratio Wald Confidence Limits (upper) |
|---|---|---|---|---|---|
| Intercept for severe injury | 8.656 (0.417) | 0.000 | | | |
| Ln(AADT) | -0.718 (0.031) | 0.000 | 0.488 | 0.458 | 0.519 |
| Ln(crash distance to the nearest ramp) | 0.306 (0.006) | 0.000 | 1.359 | 1.342 | 1.376 |
| Crash hour: Morning peak (7:01-11:00) | 0.065 (0.033) | 0.048 | 0.979 | 0.875 | 1.094 |
| Crash hour: Morning off-peak (11:01-15:00) | 0.009 (0.031) | 0.770 | 0.925 | 0.830 | 1.030 |
| Crash hour: Afternoon peak (15:01-19:00) | -0.162 (0.029) | 0.000 | 0.779 | 0.709 | 0.856 |
| Crash hour: Night/dawn off-peak (19:01-24:00) & (24:01-7:00) | Baseline | | | | |
| Vehicle type: Trucks | -0.625 (0.085) | 0.000 | 0.755 | 0.669 | 0.851 |
| Vehicle type: Buses | 1.307 (0.171) | 0.000 | 5.212 | 3.651 | 7.442 |
| Vehicle type: Motorcycles | 1.231 (0.100) | 0.000 | 4.833 | 4.094 | 5.705 |
| Vehicle type: Autos | Baseline | | | | |
| Impact: Back | -0.248 (0.054) | 0.000 | 0.899 | 0.809 | 0.999 |
| Impact: Left | 0.078 (0.059) | 0.186 | 1.246 | 1.107 | 1.403 |
| Impact: Right | -0.009 (0.060) | 0.869 | 1.141 | 1.011 | 1.289 |
| Impact: Front | Baseline | | | | |
| At-fault driver age: Very young (15 ≤ age ≤ 19) | -0.121 (0.060) | 0.044 | 0.707 | 0.513 | 0.973 |
| At-fault driver age: Young (20 ≤ age ≤ 24) | -0.045 (0.046) | 0.329 | 0.762 | 0.562 | 1.035 |
| At-fault driver age: Very old (age ≥ 80) | Baseline | | | | |
| Ln(section length) | -0.431 (0.013) | 0.000 | 0.650 | 0.632 | 0.668 |
| Median width | 0.005 (0.000) | 0.000 | 1.005 | 1.004 | 1.006 |
| Percentage of trucks | -0.014 (0.004) | 0.000 | 0.986 | 0.978 | 0.994 |
| Number of observations | 56,727 | | | | |
| Log-likelihood at convergence | -14,602.51 | | | | |
| Log-likelihood at fitting the intercept | -17,012.55 | | | | |
| McFadden’s Pseudo R² | 0.14 | | | | |
| AIC | 29,273 | | | | |

⁎ Standard errors in parentheses.
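For the continuous predictors in Table 4, the odds ratio and its Wald confidence limits can be recovered by exponentiating the coefficient and the coefficient plus or minus 1.96 standard errors. The sketch below illustrates this; the function name is an assumption, and small discrepancies in the third decimal are expected because the tabulated coefficients and standard errors are rounded.

```python
import math
from typing import Tuple


def odds_ratio_with_wald_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Ln(AADT) row of Table 4: estimate -0.718, standard error 0.031.
aadt_or, aadt_low, aadt_high = odds_ratio_with_wald_ci(-0.718, 0.031)
```

The categorical rows of Table 4 do not follow this simple relation (for example, the back-impact coefficient of -0.248 would give about 0.78 rather than the tabulated 0.899), which suggests that the class variables were coded against their baselines in a different way; this is an inference from the numbers, not something stated in the source.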


Fig. 3. Severity Risk for Back, Left, and Right Impacts on Different Age Groups.


This study identified geometric, traffic, driver-related, and vehicle-related predictors of crash injury severity along a critical high-speed facility, the urban freeway, using the mixed logit model together with a


Table 5 Severity Risk for Crash Impact by At-fault Driver’s Age.

| Crash impact | At-fault driver's age: Very old & old⁎ | At-fault driver's age: Very young | At-fault driver's age: Young | At-fault driver's age: Middle |
|---|---|---|---|---|
| Back | 1 | 1.23 | 2.89 | 12.68 |
| Left | 1 | 1.43 | 3.57 | 10.74 |
| Right | 1 | 1.85 | 5.08 | 17.62 |

⁎ Reference group.

comprehensive statewide dataset. Some rarely explored severity predictors were considered, such as the distance of the crash to the nearest ramp junction and detailed levels of vehicle types, driver's age, and sides of impact. The mixed logit approach has the flexibility to capture freeway segment heterogeneity and to accommodate latent variables that are difficult to quantify and might affect the model estimation. The distance of the crash to the nearest ramp was a significant geometric-related variable in the mixed logit model, whereas the logarithm of traffic volume, percentage of trucks, and time of the crash (e.g., the morning off-peak and afternoon peak) were found to be significant traffic-related predictors. The significant vehicle-related variables included the at-fault driver’s vehicle type (trucks, buses, and motorcycles) and the side of the at-fault driver’s crash impact (back, left, and right-side). The at-fault driver’s age was also a significant driver-related predictor. Moreover, in investigating the influence of one of the essential predictors, crash impact, on different age groups, it was noticed that back, left, and right impacts had the highest risk among middle-aged drivers, followed by young drivers, very young drivers, and finally, old and very old drivers. The two mixed logit models for very young and young at-fault drivers revealed that dark lighting conditions increase crash injury severity among both age groups compared to other conditions. Moreover, dawn lighting conditions and back-side impacts were significant predictors of injury severity among young drivers. Both variables were associated with a reduction in the probability of severe injury.

This study also provided a promising approach to screening the predictors before fitting the mixed logit model using the random forest technique. It is commonly believed that the mixed logit model is fitted by starting with all possible variables and then removing them one at a time; however, this is not feasible when the dataset involves a large number of observations, as in this study. The random forest technique assisted in assembling a screened list of variables to be entered in the mixed logit model. Also, to capture the variation of parameters’ coefficients across different models, the standard binary logit model, which ignores the random variation and holds all parameters as fixed, was examined and compared to the mixed logit one. It was found that the standard logit model could yield illogical estimates. The goodness-of-fit statistics show that the mixed logit model outperformed the standard logit model.

Future research could focus on exploring the effect of the interaction of two or more variables while fitting the mixed logit model, which could help to better interpret variables influencing crash injury severity. Exploring other data mining techniques for variable screening and selection (e.g., classification and regression trees) before fitting the mixed logit model could be of interest as well.


7. Concluding remarks


the effect of these three crash impacts on the at-fault drivers’ age. From the observed crashes on the analyzed freeway segments, the proportion of severe injuries in each category was estimated (e.g., back impact on middle-aged drivers, left impact on very young drivers). The lowest severe injury proportion was observed for very old and old drivers under all three impacts, so the severity risk for these drivers was assigned a value of 1 for each impact (acting as the reference group). The severity risk for very young, young, and middle-aged drivers was then estimated relative to the very old and old drivers. Table 5 shows the severity risk tabulation of the different age groups for the back, left, and right impacts. For example, back impacts on young drivers are 2.89 times riskier than back impacts on very old and old drivers. This can be better observed in Fig. 3. The three impact types carry the highest risk among middle-aged drivers, followed by young drivers and then very young drivers. Also, right-side impacts have the highest risk for middle-aged drivers (17.62 times riskier than for very old and old drivers). These right-side impacts could be attributed to changing lanes at high speeds.

To reduce side impacts due to lane changing, two strategies (primary and secondary, as proposed by Peek-Asa & Kraus, 1996) can be recommended. The primary strategy aims to prevent crash occurrence, while the secondary strategy aims to reduce the injury severity of the crash. A possible primary countermeasure is to conduct campaigns that convey the hazards of changing lanes at high speeds. Another possibility is to devise side crash avoidance systems, as found in Hetrick (1997). These systems are in-vehicle warning devices that alert drivers to a potential crash risk. Secondary strategies could stress the use of crashworthy systems within each vehicle, which may include seatbelts and airbags. These crashworthy systems are important to absorb the energy of the impact and direct it away from drivers and passengers.

In addition, based on the findings of this study, an effective strategy to reduce the likelihood of severe injury along urban freeways includes restricting buses’ and motorcycles’ access to freeways at specific hours of the day, since these vehicle types were associated with the highest severity risk. Given that an increase in truck percentage could worsen severity by blocking the line of sight, setting some restrictions on trucks’ access to freeways could also be helpful. Moreover, since motorcycles were associated with a high severe injury risk on urban freeways, a potential primary countermeasure could be to require motorcyclists to wear bright clothes (Peek-Asa & Kraus, 1996). A secondary countermeasure might involve stressing the use of helmets and energy-absorbing leg protectors. To reduce the crash injury severity of bus occupants, a possible remedy is to assess the use of passenger restraints to prevent the lateral movement of passengers at the time of a crash, thus reducing the chances of passengers colliding with each other.
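The relative severity risk tabulation described for Table 5 (severe injury proportion of each age group divided by that of the reference group, for a given side of impact) can be illustrated with a minimal sketch. The crash counts below are hypothetical placeholders, not the study's data; the function names are illustrative, and pooling very old and old drivers into a single reference group is an assumption made here for simplicity.

```python
from typing import Dict, Tuple

# (severe_crashes, total_crashes) for one side of impact, keyed by age group
Counts = Dict[str, Tuple[int, int]]


def severity_proportion(severe: int, total: int) -> float:
    """Share of crashes in a category that resulted in severe injury."""
    if total <= 0:
        raise ValueError("total crashes must be positive")
    return severe / total


def relative_severity_risk(counts: Counts, reference: str) -> Dict[str, float]:
    """Severity proportion of each group divided by that of the reference group.

    The reference group therefore always receives a risk of 1.
    """
    ref_prop = severity_proportion(*counts[reference])
    if ref_prop == 0:
        raise ValueError("reference group has no severe crashes")
    return {
        group: severity_proportion(severe, total) / ref_prop
        for group, (severe, total) in counts.items()
    }


# Hypothetical counts for back impacts (NOT taken from the paper).
hypothetical_back_impact_counts: Counts = {
    "very young": (12, 200),
    "young": (30, 300),
    "middle-aged": (45, 300),
    "old and very old": (5, 100),  # assumed pooled reference group
}

risks = relative_severity_risk(hypothetical_back_impact_counts, "old and very old")
for group, risk in risks.items():
    print(f"{group}: {risk:.2f}")
```

A value above 1 for a group means its severe injury proportion exceeds that of the reference group for the same side of impact, which is how a figure such as "2.89 times riskier" would be read.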


K. Haleem, A. Gan / Journal of Safety Research xxx (2013) xxx–xxx

Please cite this article as: Haleem, K., & Gan, A., Effect of Driver’s Age and Side of Impact on Crash Severity along Urban Freeways: A Mixed Logit Approach, Journal of Safety Research (2013), http://dx.doi.org/10.1016/j.jsr.2013.04.002


Acknowledgment

The authors wish to thank the Florida Department of Transportation for providing the data used in this study.

References

Abdel-Aty, M., Chen, C., & Schott, J. (1998). An Assessment of the Effect of Driver Age on Traffic Accident Involvement Using Log-linear Models. Accident Analysis and Prevention, 30(6), 851–861.
Abdel-Aty, M., & Haleem, K. (2011). Analyzing Angle Crashes at Unsignalized Intersections Using Machine Learning Techniques. Accident Analysis and Prevention, 43(1), 461–470.
Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed.). Wiley Series.
Anastasopoulos, P., & Mannering, F. (2011). An Empirical Assessment of Fixed and Random Parameter Logit Models Using Crash- and Non-crash-specific Injury Data. Accident Analysis and Prevention, 43(3), 1140–1147.
Ben-Akiva, M., & Bolduc, D. (1996). Multinomial Probit with a Logit Kernel and a General Parametric Specification of the Covariance Structure. Working Paper, Department of Civil Engineering, MIT.
Bhat, C. (2001). Quasi-random Maximum Simulated Likelihood Estimation of the Mixed Multinomial Logit Model. Transportation Research Part B, 17(1), 677–693.
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.
Carson, J., & Mannering, F. (2001). The Effect of Ice Warning Signs on Ice-accident Frequencies and Severities. Accident Analysis and Prevention, 33, 99–109.
Crash Analysis Reporting System. http://tlhost01.dot.state.fl.us/bluezone/FDOT_Session/default.htm (Accessed January 2008).
Das, A., & Abdel-Aty, M. (2010). A Genetic Programming Approach to Explore the Crash Severity on Multi-lane Roads. Accident Analysis and Prevention, 42(2), 548–557.
Econometric Software, Inc. LIMDEP Software Package. http://limdep.com (Accessed March 2012).
Geedipally, S., Turner, P., & Patil, S. (2011). An Analysis of Motorcycle Crashes in Texas Using a Multinomial Logit Model. Paper Presented at the 90th Annual Meeting of the Transportation Research Board, Washington, D.C.
Gkritza, K., & Mannering, F. (2008). Mixed Logit Analysis of Safety-belt Use in Single- and Multi-occupant Vehicles. Accident Analysis and Prevention, 40, 443–451.
Golob, T., Recker, W., & Pavlis, Y. (2008). Probabilistic Models of Freeway Safety Performance Using Traffic Flow Data as Predictors. Safety Science, 46, 1306–1333.
Haleem, K. (2009). Comprehensive Analytical Investigation of the Safety of Unsignalized Intersections. PhD Dissertation. Orlando, FL: University of Central Florida.
Haleem, K., & Abdel-Aty, M. (2010). Examining Traffic Crash Injury Severity at Unsignalized Intersections. Journal of Safety Research, 41(4), 347–357.
Haleem, K., Abdel-Aty, M., & Santos, J. (2010). Multiple Applications of the Multivariate Adaptive Regression Splines in Predicting Rear-end Crashes at Unsignalized Intersections. Transportation Research Record, 2165, 33–41.
Haleem, K., & Gan, A. (2011). Identifying Traditional and Non-traditional Predictors of Crash Injury Severity on Major Urban Roadways. Traffic Injury Prevention, 12(3), 223–234.
Harb, R., Yan, X., Radwan, E., & Su, X. (2009). Exploring Precrash Maneuvers Using Classification Trees and Random Forests. Accident Analysis and Prevention, 41(1), 98–107.
Hensher, D., & Greene, W. (2003). The Mixed Logit Model: The State of Practice. Transportation, 30(2), 133–176.
Hetrick, S. (1997). Examination of Driver Lane Change Behavior and the Potential Effectiveness of Warning Onset Rules for Lane Change or Side Crash Avoidance Systems. MS Thesis. Blacksburg, VA: Virginia Polytechnic Institute and State University.
Khattak, A. (2001). Injury Severity in Multi-Vehicle Rear-end Crashes. Transportation Research Record, 1746, 59–68.
Kim, J., Ulfarsson, G., Kim, S., & Shankar, V. (2011). A Mixed Logit Model Approach to Investigate Effects of Age on Driver-Injury Severity in Single-Vehicle Accidents. Paper Presented at the 90th Annual Meeting of the Transportation Research Board, Washington, D.C.
Kim, J., Ulfarsson, G., Shankar, V., & Mannering, F. (2010). A Note on Modeling Pedestrian-injury Severity in Motor-vehicle Crashes with the Mixed Logit Model. Accident Analysis and Prevention, 42, 1751–1758.
Klop, J., & Khattak, A. (1999). Factors Influencing Bicycle Crash Severity on Two-Lane, Undivided Roadways in North Carolina. Transportation Research Record, 1674, 78–85.
Malyshkina, N., & Mannering, F. (2008). Analysis of the Effect of Speed Limit Increases on Accident Injury Severities. Transportation Research Record, 2083, 122–127.
Malyshkina, N., & Mannering, F. (2010). Empirical Assessment of the Impact of Highway Design Exceptions on the Frequency and Severity of Vehicle Accidents. Accident Analysis and Prevention, 42, 131–139.
McFadden, D., & Train, K. (2000). Mixed MNL Models for Discrete Response. Journal of Applied Econometrics, 15, 447–470.
McGwin, G., & Brown, D. (1999). Characteristics of Traffic Crashes among Young, Middle-aged, and Old Drivers. Accident Analysis and Prevention, 31(3), 181–198.
Mehndiratta, S. (1996). Time-of-day Effects in Inter-city Business Travel. PhD Dissertation. Berkeley: University of California.
Milton, J., Shankar, V., & Mannering, F. (2008). Highway Accident Severities and the Mixed Logit Model: An Exploratory Empirical Analysis. Accident Analysis and Prevention, 40(1), 260–266.
Moore, D., Schneider, W., IV, Savolainen, P., & Farzaneh, M. (2011). Mixed Logit Analysis of Bicyclist Injury Severity Resulting from Motor Vehicle Crashes at Intersection and Non-intersection Locations. Accident Analysis and Prevention, 43(3), 621–630.
National Highway Traffic Safety Administration (2007). Traffic Safety Facts. Washington, DC: National Center for Statistics and Analysis, US Department of Transportation.
National Transportation Safety Board (1999). Bus Crashworthiness Issues. Highway Special Investigation Report, NTSB/SIR-99/04. Washington, DC: Author.
Ouellet, J., & Kasantikul, V. (2006). Motorcycle Helmet Effect on a Per-Crash Basis in Thailand and the United States. Traffic Injury Prevention, 7(1), 49–54.
Pai, C., Hwang, K., & Saleh, W. (2009). A Mixed Logit Analysis of Motorists’ Right-of-way Violation in Motorcycle Accidents at Priority T-junctions. Accident Analysis and Prevention, 41, 565–573.
Peek-Asa, C., & Kraus, J. (1996). Injuries Sustained by Motorcycle Riders in the Approaching Turn Crash Configuration. Accident Analysis and Prevention, 28(5), 561–569.
Renski, H., Khattak, A., & Council, F. (1999). Effect of Speed Limit Increases on Crash Injury Severity: Analysis of Single-vehicle Crashes on North Carolina Interstate Highways. Transportation Research Record, 1665, 100–108.
Revelt, D., & Train, K. (1998). Mixed Logit with Repeated Choices. The Review of Economics and Statistics, 80, 647–657.
Roadway Characteristic Inventory. http://webapp01.dot.state.fl.us/Login/default.asp (Accessed May 2010).
SAS Institute, Inc. (2002). Version 9 of the SAS System for Windows. Cary, NC.
Savolainen, P., & Mannering, F. (2007). Probabilistic Models of Motorcyclists’ Injury Severities in Single- and Multi-vehicle Crashes. Accident Analysis and Prevention, 39, 955–963.
Shankar, V., & Mannering, F. (1996). An Exploratory Multinomial Logit Analysis of Single-Vehicle Motorcycle Accident Severity. Journal of Safety Research, 27(3), 183–194.
Shankar, V., Mannering, F., & Barfield, W. (1996). Statistical Analysis of Accident Severity on Rural Freeways. Accident Analysis and Prevention, 28(3), 391–401.
Sparks, J. (2009). A Comparison of Data Mining Methods for Binary Response Variables in Direct Marketing. PhD Dissertation. Chicago: University of Illinois.
The University of California at Berkeley (2000). Mixed Logit Workshop, Econometrics Laboratory. http://elsa.berkeley.edu/eml/qca_reader/7b.mixed.pdf (Accessed March 2011).
Train, K. (2009). Discrete Choice Methods with Simulation (2nd ed.). Cambridge, UK: Cambridge University Press Publication.
Ulfarsson, G., & Mannering, F. (2004). Differences in Male and Female Injury Severities in Sport-utility Vehicle, Minivan, Pickup and Passenger Car Accidents. Accident Analysis and Prevention, 36, 135–147.
Washington, S., Karlaftis, M., & Mannering, F. (2003). Statistical and Econometric Methods for Transportation Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
Xie, Y., Zhang, Y., & Liang, F. (2009). Crash Injury Severity Analysis Using Bayesian Ordered Probit Models. ASCE: Journal of Transportation Engineering, 135(1).
Yamamoto, T., Hashiji, J., & Shankar, V. (2008). Underreporting in Traffic Accident Data, Bias in Parameters and the Structure of Injury Severity Models. Accident Analysis and Prevention, 40, 1320–1329.
Ye, F., & Lord, D. (2011). Comparing Three Commonly Used Crash Severity Models on Sample Size Requirements: Multinomial Logit, Ordered Probit and Mixed Logit Models. Paper Presented at the 90th Annual Meeting of the Transportation Research Board, Washington, D.C.
Zhang, H. (2010). Identifying and Quantifying Factors Affecting Traffic Crash Severity in Louisiana. PhD Dissertation. Baton Rouge, LA: Louisiana State University.


Kirolos Haleem is currently a Research Associate at Florida International University (FIU), Miami, FL, U.S.A. He earned his Ph.D. in December 2009 in Civil (Transportation) Engineering from the University of Central Florida, Orlando, FL, U.S.A. His areas of expertise are traffic safety analysis, application of statistical and econometric models and data mining techniques in Transportation Engineering, traffic operations, traffic simulation, and ITS. At FIU, he conducts traffic safety research and supervises Ph.D. and master’s students. Professor Albert Gan is his post-doctoral research supervisor.

Albert Gan is a Professor of Transportation Engineering in the Civil and Environmental Engineering Department at Florida International University (FIU). He is also the Deputy Director of the Lehman Center for Transportation Research (LCTR) at FIU. His areas of research include highway safety, traffic simulation, ITS, GIS, transit planning, and demand modeling. Dr. Gan has authored or co-authored over 150 refereed papers, technical reports, and articles. He has been the developer of more than a dozen transportation software systems, including the nationally known Florida Transit Information System (FTIS).
