Accident Analysis and Prevention 40 (2008) 502–510
Quantile regression provides a fuller analysis of speed data Paul Hewson ∗ School of Mathematics and Statistics, University of Plymouth, Drake Circus, Plymouth PL4 8AA, United Kingdom Received 21 December 2006; received in revised form 3 August 2007; accepted 8 August 2007
Abstract Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a “community speed watch” programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated. © 2007 Elsevier Ltd. All rights reserved. Keywords: Quantile regression; Speed; Intervention; B-spline; Non-parametric; Community speed watch
∗ Tel.: +44 1752 232778; fax: +44 1752 232780. E-mail address: [email protected].
0001-4575/$ – see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.aap.2007.08.007

1. Introduction

There is a large body of literature which can be interpreted as evidence that increased relative vehicle speed may be associated with increased collision risk (Finch et al., 1994; Ledolter and Chan, 1996; Kloeden et al., 1997; Quimby et al., 1999; Taylor et al., 2000, 2002; Ossiander and Cummings, 2002; Patterson et al., 2002; Vernon et al., 2004). Consequently, excess speed is widely, although not universally, accepted as a marker of “road danger”, and as a result evidence of speed reduction following an intervention may be regarded as a success measure for any action taken. Additionally, evidence for the effectiveness of interventions such as speed cameras, reviewed, for example, in Pilkington and Kinra (2005) and Wilson et al. (2007), can also be taken as evidence that relatively higher speed may be a collision risk factor.

Underlying some of these analyses are assumptions about the homogeneity of the vehicles being studied. When attention is focused on drivers, there is a broad evidence base which indicates that habitually faster drivers present a greater collision risk than habitually slower drivers. For example, Corbett (1995) categorised motorists as “conformers”, who already obeyed speed limits, “deterred”, who modified their speed following the introduction of speed cameras, “manipulators”, who slowed down for cameras alone, and “defiers”, who carried on driving in excess of the speed limit regardless. Corbett and Simon (1999) noted a higher self-reported collision risk among “defiers” and “manipulators”, but suggested that prosecution was an effective deterrent for all except the latter group. Similar work for the Scottish Office (Stradling, 2003) also suggested that drivers who had been issued with speeding tickets had elevated collision histories. In an earlier study, Stradling and Meadows (2002) reported on 2880 recipients of speeding tickets issued in Glasgow (of which they received 510 responses) in which they identified three groups of drivers: those who said they were now more speed sensitive and slowed down generally (41–56% of all responses), those who said they were camera sensitive and slowed down only at camera sites (31–32% of responses), and those who said they had not altered their behaviour (14–15% of responses). This work therefore appears to suggest that there are groups of high risk drivers, either high “violators” or “defiers”, who tend to drive fast. There is nevertheless potential for confounding
within this, as being a young male tends to be a marker both for each group and for high speed profiles, but these behavioural findings correspond with observations from speed recording data. To give one example, Keenan (2004) reports limited observations of four camera sites, which indicated higher speeds 500 m upstream and downstream of camera sites. He suggested that the conspicuity of the cameras may have increased manipulative activity upstream whilst conversely affording some speed lowering downstream.

There has been more formal work associating speed recording data on roads with their collision profiles. In many modelling exercises it is common to incorporate information both on the average and variance of the speed; such modelling work has indicated that both mean speed and the coefficient of variation of speed are important risk factors (Taylor et al., 2002). It may therefore be important to note more recent findings (Dey et al., 2006) which have distinguished unimodal and bimodal speed distribution curves. This latter work highlighted that the ratio of 15th percentile to mean speeds may be a better marker of bimodality than the ratio of 85th percentile to mean speeds; the implication is that the speed profile of a road may be more complex than the two summary statistics (mean speed and coefficient of variation) imply.

Taking together the earlier points about non-homogeneity of the driving population, the importance of the coefficient of variation in various modelling exercises, the evidence that higher speed drivers present higher risk and the empirical evidence for the complexity of vehicle speed data, it would appear that a regression modelling strategy which can consider a range of quantiles may provide further insights. Ultimately, the goal of using quantile regression would be to make a broader assessment of speed when attempting to analyse speed data.
This paper will, however, demonstrate one feature of quantile regression: the ability to determine whether the 85th percentile (or any other percentile) speed can be considered to have changed significantly. Assessment of the 85th percentile is a ubiquitous feature in the evaluation of many published and nonpublished safety interventions even where the actual collision reduction cannot be determined. To take just one example, Harre (2003) conducted an experiment on a stretch of road with traffic flows of 700–750 vehicles per hour. Three scenes were presented on a kerb at the experimental site: a null scene, a scene containing children playing and finally a scene containing children waiting to cross the road. One thousand four hundred and forty-six readings of the speed of cars passing these scenes were taken with any one of the road-side scenes being acted out. There was negligible difference in the mean speeds of traffic passing the three scenes; observed mean speeds were 55.6 kph (7.5 S.D.), 54.3 kph (6.2 S.D.) and 52.8 kph (6.9 S.D.), respectively. However, no significance tests are given for the 85th percentile speeds, which were reported as 62.3 kph at the null scene, 59.5 kph at the first experimental scene and 59.2 kph at the second experimental scene. Whilst it is extremely important to know that the mean speeds were little altered, it would also be of value to know whether the 85th percentile could be considered to have been lowered significantly in the presence of the children. There are a number of algorithms proposed to estimate confidence intervals for quantiles; to give a
recent example, Hutson (1999) proposed using an approximation based on fractional order statistics. However, where we have to make a number of comparisons, as may be the case in an experimental design, or we wish to account for cyclical patterns such as speeds varying by time of day, it may be more appropriate to use an approach based on linear models. This paper therefore proposes the use of quantile regression when assessing speed data. At its simplest, this means we have a conventional hypothesis test (based on a linear model), analogous to the t-test, which allows a determination as to whether the 85th percentile speeds have changed. However, quantile regression allows us to study any percentile between 0 and 100%, and it may well be that the 85th percentile is not the best predictor of collision risk. One major advantage of using a modelling approach to studying quantiles is that all the extensions to regression, such as non-parametric smoothers, are available. This means that we also have a sensitive way of handling data collected over time when the relationship changes over time; this is commonly seen in speed data, where patterns vary during the day, during the week and, were sufficient data available, seasonally.

The structure of the paper is as follows. A brief overview of quantile regression will be given in Section 2. We illustrate the technique on two sets of data which will be described in association with the results of their analysis in Section 3, the former example using an experimental design and the latter using non-parametric smoothers to isolate the effect of temporal fluctuation. A brief discussion and some conclusions are presented in Section 4.
2. Quantile regression

Quantile regression extends the least squares estimate of conditional means to a range of models estimating conditional quantile functions (Koenker and Bassett, 1978). A number of reviews have appeared recently (Yu et al., 2003; Koenker and Hallock, 2001). For conventional regression, we wish to model the response $y$ as $y_i = x_i^T \beta + \epsilon_i$ given $x_i$, a vector of covariates $x_i = (1, x_{i1}, \ldots, x_{ip})$ with $p$ possibly equal to one; $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ is the corresponding vector of coefficients and $\epsilon_i \sim N(0, \sigma^2)$. Where we aim to model the conditional mean response, the Gaussian likelihood is given by

$$L(\beta) \propto \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2 \right\} \qquad (1)$$
and maximising L(β) over β gives the familiar least squares estimates. Our interest is not in estimating the conditional mean, rather we are interested in estimating a conditional quantile, 100α%.
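As a small illustration of the statement above, the following Python sketch computes the least squares estimates for a single covariate in closed form and checks that nearby coefficient values give a larger residual sum of squares, and hence a smaller value of the likelihood in Eq. (1). The data are simulated and the function names `ols_line` and `sum_of_squares` are hypothetical; none of this comes from the original analysis.

```python
import random

def ols_line(xs, ys):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def sum_of_squares(b0, b1, xs, ys):
    """Residual sum of squares; Eq. (1) is largest where this is smallest."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Simulated speeds (mph) against hour of day; purely illustrative values.
rng = random.Random(42)
hours = [rng.uniform(0, 24) for _ in range(200)]
speeds = [35.0 - 0.2 * h + rng.gauss(0, 5) for h in hours]

b0_hat, b1_hat = ols_line(hours, speeds)
best = sum_of_squares(b0_hat, b1_hat, hours, speeds)
print(round(b0_hat, 2), round(b1_hat, 3), round(best, 1))
```

Because the sum of squares is a convex quadratic in the coefficients, any perturbation away from the closed-form solution cannot decrease it.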
We can state the likelihood for such a model formally as
$$L_\alpha(\beta) \propto \exp\left\{ -\sum_{i=1}^{n} \rho_\alpha(y_i - x_i^T \beta) \right\} \qquad (2)$$
which is similar in structure to the conventional regression likelihood set out in Eq. (1), although we have incorporated an asymmetric loss function $\rho_\alpha$ which requires further explanation. Consider that the 100α% quantile of the residuals $\epsilon$ is the value which has 100α% of values smaller than it and 100(1 − α)% of values larger than it. Quantile regression consists of finding estimates $\hat{\beta}$ where 100α% of the residuals are below zero, and 100(1 − α)% are above zero. We can formally identify the positive and negative residual values by means of an indicator function $I_A$ on the set $A$:

$$I_A(\epsilon) = \begin{cases} 1 & \epsilon \in A \\ 0 & \epsilon \notin A \end{cases}$$
and use this indicator function to give a formal definition of conditional quantiles. The step which allowed the development of model fitting for quantile regression was to use standard decision theory to define the loss function $\rho_\alpha(\epsilon)$ as follows:

$$\rho_\alpha(\epsilon) = \alpha I_{(-\infty,0]}(\epsilon) - (1 - \alpha) I_{[0,\infty)}(\epsilon) \qquad (3)$$
for any value of α between 0 and 1. It has been shown that finding the value of βˆ maximising the likelihood of the quantile regression problem equates to finding the value of βˆ which minimises this loss function (Koenker, 2005, p. 33). Essentially, this function finds the difference between the product of α and the sum of the negative residuals and the product of (1 − α) and the sum of the positive residuals. It is not possible to fit such a model using the simple methods available for ordinary least squares regression, but linear programming techniques are available which can minimise Eq. (3) and model fitting software is widely available (Koenker, 2006); we use this software throughout. As will be noted later, current practice when recording speeds is to round to the nearest integer. This is problematic in quantile regression and can lead to an excessively high number of zero residuals during model fitting which means that there are a number of values of βˆ which minimise Eq. (3). Ideally, we would wish to work with speed data that had been recorded with greater precision, but there are two accessible ways of working round this problem. The first method is to use jittering, which is commonly used with discrete data such as count data (Koenker, 2005, p. 259). We have experimented with adding a small amount of random noise to the speed recordings using both the observed data sets and simulation studies and it appears to work satisfactorily. However, in this paper we present more conservative results here based on the assumption that the data follow a censoring process. It is possible to assume left and right censored observations within the quantile regression framework (Koenker, 2005, p. 41). If we regard yi as our censored (rounded) observation of
“true” vehicle speed $y_i^{(*)}$ we essentially assume:

$$y_i = \begin{cases} \underline{y}_i & \text{if } y_i^{(*)} > \underline{y}_i \\ y_i^{(*)} & \text{otherwise} \\ \bar{y}_i & \text{if } y_i^{(*)} < \bar{y}_i \end{cases}$$

where the underscore and overscore denote rounding down and up, respectively.

The basic quantile regression model can be used in a manner analogous to the “ordinary least squares” linear model. For example, when considering the effectiveness of an intervention it is possible to use a simple treatment factor for the before–after contrast and use the ratio of the subsequent parameter estimate to its standard error as a t-ratio in determining the statistical significance of the change in the αth quantile speed. For the purposes of demonstration in this paper, widely available data based upon a quasi-experimental design with control sites and two “after” recordings will be used. We also examine more recent data in which the use of quantile regression permits the use of a nonparametric smoother to model cyclical patterns in speeds during the day.

2.1. Smoothing splines

By using quantile regression rather than a series of independent hypothesis tests, it is possible to incorporate non-parametric smoothing functions to deal with temporal patterns. There are a number of alternative approaches; we briefly outline one possible method based on B-splines. Perhaps the most obvious features which can be anticipated are congestion during peak travelling times acting to slow vehicles down and differences in weekend and holiday periods. Essentially, instead of considering a relationship with a covariate $x$, we are modelling relative to some smooth function $f(x)$. For example, one could consider modelling:

$$y_i = f(x_i) + \epsilon_i \qquad (4)$$
where $y_i$ is the dependent variable, $x_i$ the independent variable (covariate), $f$ is a convenient smooth function and, for conventional regression purposes, $\epsilon_i \sim N(0, \sigma^2)$. However, when considering the quantile regression application, we wish to maximise:

$$L_\alpha(\beta) \propto \exp\left\{ -\sum_{i=1}^{n} \rho_\alpha(y_i - f(x_i)) \right\}$$
There are a range of methods available for fitting a smooth function; here we use regression splines. We choose a series of basis functions such that function $j$ is given by $b_j(x)$ and $f$ can be written as

$$f(x) = \sum_{j=1}^{q} b_j(x) \beta_j$$
which provides a convenient linear model. In other words, in order to fit a realistic curve to the temporal patterns in speed data, we actually fit a number of shorter pieces of curves. These are then joined up at “knots” (very formally these “knots” can be considered as breakpoints in the third derivative of these curves).
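Because the model is linear in its coefficients, fitting a conditional quantile reduces to minimising a piecewise linear objective. The Python sketch below illustrates this for a straight line with simulated data. It makes two assumptions that are not taken from the paper: it uses the usual Koenker and Bassett form of the check loss, $\rho_\alpha(u) = u(\alpha - I(u < 0))$, and instead of a linear programming solver it simply enumerates every line through two observations, relying on the fact that a minimiser of this kind of objective can always be found among such lines. The function names are hypothetical and the approach is only practical for small samples.

```python
import random

def check_loss(u, alpha):
    """Usual check loss: alpha * u for u >= 0, (alpha - 1) * u for u < 0."""
    return u * (alpha - (1.0 if u < 0 else 0.0))

def objective(b0, b1, xs, ys, alpha):
    """Sum of check losses of the residuals from the line b0 + b1 * x."""
    return sum(check_loss(y - (b0 + b1 * x), alpha) for x, y in zip(xs, ys))

def fit_quantile_line(xs, ys, alpha):
    """Brute-force search over all lines through two observations."""
    best = None
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                continue  # a vertical line is not a valid candidate
            b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
            b0 = ys[i] - b1 * xs[i]
            value = objective(b0, b1, xs, ys, alpha)
            if best is None or value < best[0]:
                best = (value, b0, b1)
    return best[1], best[2]

# Simulated data: 50 speeds (mph) against hour of day, illustrative only.
rng = random.Random(7)
n, alpha = 50, 0.85
xs = [rng.uniform(0, 24) for _ in range(n)]
ys = [30.0 + 0.5 * x + rng.gauss(0, 4) for x in xs]

b0_q, b1_q = fit_quantile_line(xs, ys, alpha)
# Proportion of observations strictly below the fitted 85th percentile line.
frac_below = sum(1 for x, y in zip(xs, ys) if y < b0_q + b1_q * x - 1e-9) / n
print(round(b0_q, 2), round(b1_q, 3), frac_below)
```

With an intercept in the model, roughly 100α% of the observations fall below the fitted line, which is the defining property described earlier in this section.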
Fig. 1. The amis data from Cambridgeshire: box and whiskers plot denoting vehicle speeds reported before (B), after (A) or a while after (L) a sign was introduced to the intervention site. Intervention sites denoted by (S), control sites (no sign) denoted by (N).
Fuller explanation of these smooth functions is given by Hastie (1992), and we note that from the available choices we use B-splines. These have the property that they are strictly local, in other words each basis function is zero beyond the (m + 3)th adjacent, evenly spread, knot. Another way of considering these is given by Wood (2006): essentially, for a $k$ parameter B-spline basis we define $k + m + 1$ knots $x_1 < x_2 < \cdots < x_{k+m+1}$. The first and the last $m + 1$ knot locations are arbitrary as the splines are evaluated over $[x_{m+2}, x_k]$. The splines, of order $(m + 1)$, are given by

$$f(x) = \sum_{j=1}^{q} B_j^m(x) \beta_j \qquad (5)$$

with recursively defined B-spline basis functions:

$$B_j^m(x) = \frac{x - x_j}{x_{j+m+1} - x_j} B_j^{m-1}(x) + \frac{x_{j+m+2} - x}{x_{j+m+2} - x_{j+1}} B_{j+1}^{m-1}(x), \qquad j = 1, \ldots, q \qquad (6)$$

and

$$B_j^{-1}(x) = \begin{cases} 1 & x_j \le x < x_{j+1} \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$
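The recursion in Eqs. (6) and (7) can be written directly as a short recursive function. The Python sketch below is illustrative only: it uses zero-based indexing, evenly spaced integer knots, and $k + m + 2$ knots so that the support of every basis function is covered by the knot sequence (this knot count is an assumption of the sketch). The names `bspline` and `basis_row` are hypothetical.

```python
def bspline(x, knots, j, m):
    """Evaluate B_j^m(x) by the recursion of Eqs. (6) and (7), zero-based j."""
    if m == -1:
        # Base case, Eq. (7): indicator of the half-open interval between knots.
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    left = (x - knots[j]) / (knots[j + m + 1] - knots[j])
    right = (knots[j + m + 2] - x) / (knots[j + m + 2] - knots[j + 1])
    return (left * bspline(x, knots, j, m - 1)
            + right * bspline(x, knots, j + 1, m - 1))

def basis_row(x, knots, k, m):
    """One row of the design matrix: all k basis functions evaluated at x."""
    return [bspline(x, knots, j, m) for j in range(k)]

m = 2   # m + 1 = 3, i.e. cubic basis functions
k = 8   # number of basis functions (coefficients)
knots = [float(i) for i in range(k + m + 2)]   # 12 evenly spaced knots

row = basis_row(5.5, knots, k, m)
nonzero = sum(1 for b in row if b > 0)
print([round(b, 4) for b in row], nonzero)
```

Inside the interior of the knot range the basis functions sum to one, and only a handful of adjacent functions are non-zero at any point, which is the "strictly local" property noted above. Stacking such rows for each observation gives a design matrix that can be used like any other set of covariates.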
There is some trade-off between the fit of the curve to the data and the smoothness of the curve; these can be controlled by minimising:

$$\sum_{i=1}^{n} \rho_\alpha(y_i - f(x_i))^2 + \lambda \int f''(x)^2 \, dx \qquad (8)$$
subject to adjustments in the value of λ. However, for the purposes of this presentation, the B-splines will be used as a set of additive terms in the linear predictor only.

Having discussed the models we will be using, we next consider two sets of data regarding vehicle speed. The first will demonstrate a conventional fixed effects model, analogous to ANOVA, in a quantile regression framework using publicly available data. The second will use automatically collected data and demonstrate the use of the non-parametric smoother as a means of smoothing out daily patterns in speed profiles.

3. Data and model fitting results

The R software (R Development Core Team, 2005) was used throughout this study; the quantreg library (Koenker, 2006) was used for the quantile regression. To deal with the rounding, which was only problematic when using the data from Cambridgeshire, we used the “fcen” algorithm, the Fitzenberger implementation of Powell’s (fixed) censored quantile regression estimator (Powell, 1986), available within quantreg.

3.1. Speed data from Cambridgeshire

The amis dataset is a widely available dataset published in association with a book by Davison and Hinkley (1997). Within that work, these data are used to demonstrate use of the bootstrap. It is reasonably straightforward to use this re-sampling method to determine whether the 85th percentile has changed when comparing two groups of data. However, we use these data as there are in fact observations on approximately 100 vehicle speeds at each of 14 pairs of sites recorded before, after and some time after warning signs were installed, making a total of six experimental conditions for each pair of sites. The sites are paired to form a matched control and intervention site.
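For the simple two-group comparison mentioned above, a percentile bootstrap might look like the following Python sketch. It is not the analysis of Davison and Hinkley (1997) and does not use the amis data: the speeds are simulated, the empirical quantile rule and the names `quantile` and `bootstrap_diff_ci` are choices made for illustration, and a percentile interval is only one of several possible bootstrap intervals.

```python
import math
import random

def quantile(values, alpha):
    """Empirical quantile: the ceil(alpha * n)-th smallest value."""
    s = sorted(values)
    return s[max(0, math.ceil(alpha * len(s)) - 1)]

def bootstrap_diff_ci(before, after, alpha=0.85, reps=500, seed=1):
    """95% percentile bootstrap interval for the after-minus-before quantile."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping the group sizes.
        b = [rng.choice(before) for _ in before]
        a = [rng.choice(after) for _ in after]
        diffs.append(quantile(a, alpha) - quantile(b, alpha))
    diffs.sort()
    return diffs[int(0.025 * reps)], diffs[int(0.975 * reps) - 1]

# Simulated speeds (mph); the 5 mph shift is an arbitrary illustrative value.
rng = random.Random(0)
before = [rng.gauss(40, 5) for _ in range(300)]
after = [rng.gauss(35, 5) for _ in range(300)]

lo, hi = bootstrap_diff_ci(before, after)
print(round(lo, 2), round(hi, 2))
```

An interval that excludes zero would be read as evidence that the 85th percentile has changed. The approach becomes awkward once there are several factors or blocking terms, which is where the regression formulation is more convenient.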
Unfortunately, Davison and Hinkley (1997) give little further information about the intervention or the data collection; it seems likely that the readings have been obtained manually but we can infer nothing about the timings of the readings—whether the three periods could be considered comparable in terms of weekday, time of day or season. Nevertheless, given that these data are publicly available
they seem like an ideal resource to demonstrate any analytical technique as it allows other workers to compare results. Fig. 1 provides “box and whiskers” plots (Chambers et al., 1983) for each of the matched control and intervention sites in the amis data, before, after and some time after the signs were introduced. The plots have been presented in the format of a trellis plot (Becker et al., 1994). It would appear from this representation that the intervention sites generally had lower speeds than their matched sites during the before period. The fitted models have the form:

$$\eta = X\beta \qquad (9)$$
where $X$ is a design matrix consisting of an intercept, treatment contrasts for the period (“after” and “long time after”), warning sign and the interaction between period and warning sign. It is therefore possible to partition $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$, where $\beta_0$ denotes the intercept, $\beta_{1A}$ denotes the contrast between “before” and “after”, $\beta_{1L}$ denotes the contrast between “before” and “long time after”, $\beta_2$ denotes the contrast between unsigned and signed members of a pair, $\beta_{1A:2}$ denotes the interaction between the after contrast and the sign contrast, and effectively tells us about the effect of the sign for the after period; likewise $\beta_{1L:2}$ tells us about the effect of the sign for the long time after period. A number of other parameters, $\beta_{3:15}$, block out the differences between pairs of sites.

The corresponding linear model fitted by ordinary least squares fits these data poorly, with an adjusted $R^2$ of only 0.2. Nevertheless, for comparison purposes, the results from fitting this linear model can be reported: the intercept $\beta_0$ was estimated as 38.23 (S.E. 0.15), “after” ($\beta_{1A}$) as 0.91 (S.E. 0.22), “long after” ($\beta_{1L}$) as 1.5 (S.E. 0.21), “warning sign” ($\beta_2$) as −1.7 (S.E. 0.21) and the key interaction terms “sign:after” ($\beta_{1A:2}$) as −1.66 (S.E. 0.31) and “sign:long after” ($\beta_{1L:2}$) as −0.38 (S.E. 0.31). This confirms the visual impression that the sites receiving the intervention tended to have a lower speed at all times (before and after); it also suggests that speeds in general increased after the intervention period, which may well be a seasonal, time of day or time of week effect. However, the key result is the interaction term; both are negative, indicating some effect of the signs in terms of speed reduction. Nevertheless, the effect is not significant during the later period, possibly suggesting that the benefits of the intervention were short lived. We therefore consider the results obtained from the quantile regression.
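To make the structure of the design matrix concrete, the Python sketch below builds one row of $X$ for a given period and sign status and combines it with the ordinary least squares estimates quoted above. The blocking terms for pairs of sites are omitted, so the predictions apply to the baseline pair only; the function names and the string labels for the periods are hypothetical.

```python
def design_row(period, sign):
    """Row of X: intercept, after, long after, sign, after:sign, long after:sign."""
    after = 1 if period == "after" else 0
    long_after = 1 if period == "long_after" else 0
    s = 1 if sign else 0
    return [1, after, long_after, s, after * s, long_after * s]

def linear_predictor(row, beta):
    """eta = x' beta for a single row of the design matrix."""
    return sum(x * b for x, b in zip(row, beta))

# OLS estimates reported in the text, in the same order as design_row.
beta_ols = [38.23, 0.91, 1.5, -1.7, -1.66, -0.38]

for period in ("before", "after", "long_after"):
    for sign in (False, True):
        eta = linear_predictor(design_row(period, sign), beta_ols)
        print(period, "sign" if sign else "no sign", round(eta, 2))

# Before-to-after change at signed sites minus the same change at unsigned
# sites; by construction this recovers the after:sign interaction estimate.
signed_change = (linear_predictor(design_row("after", True), beta_ols)
                 - linear_predictor(design_row("before", True), beta_ols))
unsigned_change = (linear_predictor(design_row("after", False), beta_ols)
                   - linear_predictor(design_row("before", False), beta_ols))
print(round(signed_change - unsigned_change, 2))
```

The same rows can be reused for each quantile regression in place of the OLS coefficients, since only the loss function changes between the two kinds of model.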
The main results are summarised in Table 1, again we omit the estimates for the specific sites as these blocking terms are not important to the conclusions.
Table 1
Parameter estimates (standard errors) from fitting five quantile regression models (15th, 30th, 50th, 70th and 85th percentiles) to the amis data

Parameter | 15th | 30th | 50th | 70th | 85th
Intercept (β0) | 32 (0.36) | 35 (0.23) | 38 (0.53) | 41 (0.68) | 45 (0.99)
After (β1A) | 0 (0.47) | 0 (0.21) | 0 (0.50) | 0 (0.56) | 0 (0.80)
Long after (β1L) | 1 (0.48) | 0 (0.32) | 0 (0.50) | 1 (0.59) | 0 (0.71)
Warning sign (β2) | −1 (0.39) | −1 (0.36) | −1 (0.47) | −1 (0.55) | −1 (0.86)
Interaction, after:sign (β1A:2) | −1 (0.50) | −2 (0.40) | −2 (0.72) | −2 (0.70) | −2 (1.23)
Interaction, long after:sign (β1L:2) | 0 (0.49) | 0 (0.51) | 0 (0.57) | −1 (0.83) | 0 (1.09)
Fig. 2. Parameter estimates for βInteraction (After:Sign) denoting parameter and 95% confidence interval.
Table 1 indicates that the intercept speed increases with the percentile, a useful validity check. None of the interactions between “sign” and “long-after” can be considered significant in terms of a t-test derived from the ratio of the parameter to its standard error. However, four of the interaction terms between the immediate “after” and “sign” can be considered significant, with the notable exception of the 85th percentile where the standard error has increased considerably. With that exception, it appears that speed reduction may be lowest at the 15th percentile and greater at the other percentiles. Fig. 2 depicts the key result in terms of the initial effect of the signs graphically. This reinforces the point that with only 100 replicates (approximately) per site, the standard errors are quite large and the 95% confidence intervals are very broad. However, all but one of these parameters are “significant” in that the t-ratio formed from the ratio of the parameter to its standard error are significantly large. This clearly demonstrates that the quantile regression approach offers a very simple way to obtain hypothesis test type results on the effectiveness of an intervention with regard to various percentiles, although more data are required before making a definitive statement about the significance of the speed reduction at the 85th percentile.
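The t-ratios discussed above can be reproduced from the after:sign row of Table 1. The Python sketch below assumes a two-sided test at the 5% level using the normal critical value of 1.96; the reference distribution actually used is not stated in the text, so this threshold is an assumption, as are the variable names.

```python
# After:sign interaction estimates and standard errors from Table 1,
# keyed by percentile.
after_sign = {
    15: (-1, 0.50),
    30: (-2, 0.40),
    50: (-2, 0.72),
    70: (-2, 0.70),
    85: (-2, 1.23),
}

CRITICAL = 1.96  # assumed two-sided 5% normal critical value

t_ratios = {}
intervals = {}
for pct, (est, se) in after_sign.items():
    t_ratios[pct] = est / se
    # Approximate 95% confidence interval of the kind plotted in Fig. 2.
    intervals[pct] = (est - CRITICAL * se, est + CRITICAL * se)

significant = [pct for pct, t in t_ratios.items() if abs(t) > CRITICAL]

for pct in after_sign:
    lower, upper = intervals[pct]
    print(pct, round(t_ratios[pct], 2), round(lower, 2), round(upper, 2))
print(significant)
```

Under this assumption the 85th percentile is the only one whose interval includes zero, in line with the discussion above, and the 15th percentile is close to the threshold.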
3.2. Speed data from Northamptonshire community speed watch initiative

We now demonstrate the use of quantile regression on speed data collected by Northamptonshire County Council and Northamptonshire Police in conjunction with wider activities in relation to a “community speed watch” programme. As the name implies, this is a community-led speed reduction campaign, where volunteers carry out visible speed monitoring activities. These volunteers have no statutory powers, but advisory letters can be sent by the Police to the registered keeper of any cars observed exceeding the speed limit, and the police are able to task enquiry officers to investigate any particularly problematic behaviour. The programme runs from March to October, and consists of three cycles of alternate weeks where volunteers are present for a week and then absent for a week. The volunteers are clearly identified in visible jackets which brand them as “community speed watch” operators. The data reported here relate to a programme initiated on the 19th of February in the village of Aynho at a site on Banbury Road. It is hoped that the presence of the volunteers encourages drivers to observe the speed limit. Traffic speed data were collected before and after the 6-week period by a radar-based automatic counter/classifier.

It is not the purpose of this paper to report on the effectiveness or otherwise of the “community speed watch” programme; it is clear that many confounding factors could explain any apparent changes in speed. The “before” data correspond to the period 5th to 12th February 2007, and the “after” data correspond to the period 2nd to 9th April. There may well be seasonal differences between the two time periods which explain any speed changes. Clearly, these are matters which can only be addressed adequately by careful study design. Nevertheless, these are useful data with which to explore the potential for quantile regression. Firstly, by way of illustrating the data, Figs. 3 and 4 depict “before” and “after” speeds on the Wednesday of the respective monitoring periods for inbound and outbound traffic, respectively. It is clear that the pattern of recorded speed varies
Fig. 3. Automatically collected data from Banbury Road, Aynho, Northamptonshire (Southeasterly/inbound travel) depicted as a boxplot by hour. Separate plots shown for “before” and “after” implementation of a community speedwatch programme. The relative number of vehicles is denoted by the width of the boxes, with wider boxes reporting more vehicle speeds.
Fig. 4. Automatically collected data from Banbury Road, Aynho, Northamptonshire (Northwesterly/outbound travel) depicted as a boxplot by hour. Separate plots shown for “before” and “after” implementation of a community speedwatch programme. The relative number of vehicles is denoted by the width of the boxes, with wider boxes reporting more vehicle speeds.
throughout the day, and for this reason careful analysis of the speed data is important. For example, the early hours of the morning can be considered as presenting a greater risk of high severity collisions, and it is apparent that vehicle speeds tend to be greater at this time. Without accounting for the temporal patterns, it is possible to induce some further confounding; for example, if the relative numbers of nighttime/daytime vehicles were to alter, the summary statistics could alter even though the speed related risk profiles at any time of day could be unaltered. Furthermore, it may be the case that following an intervention, one may be more interested in changes in speeds at these times than any other. Conversely, around amenities such as shops and schools, one may be more interested in speed profiles at different times. It could therefore be the case that in understanding interventions, simplistic measures of success and failure at the designated times may under-represent the reality of an intervention; equally, pooling all the data into a single summary may also aggregate out important details. This paper will demonstrate that it is possible to model the full profile in a way which makes maximum use of all the available data, and also consider very carefully the quantiles of the speed distribution. Manual model building suggested that a model built upon an eighth order B-spline provided a suitable fit to these data. An example of the results is plotted in Fig. 5, which superimposes the seven quantile regression fits (for the 5th, 15th, 30th, 50th,
Fig. 5. Model predictions for 5th, 15th, 30th, 50th, 70th, 85th and 95th percentiles superimposed on a boxplot of the original data; Southeasterly travel.
70th, 85th and 95th percentiles) over a boxplot of the original data for the southeasterly (inbound) travel on Wednesday. It is of little surprise that at every quantile the models indicate higher speeds during the early hours of the morning than during the day. The overall mean speeds for these data are 35.15 mph “before” and 34.31 mph “after”; considering inbound and outbound separately we have 33.39 mph “before” and 32.28 mph “after” inbound and 36.97 mph “before” and 36.31 mph “after” outbound. Conventional hypothesis tests (using the t-test) are all highly significant. The corresponding figures for the 85th percentiles are 42 mph both “before” and “after” when considering all the data, 41 mph “before” and 40 mph “after” when considering inbound speeds, and 43 mph for both “before” and “after” when considering outbound speeds. Results for fitting the seven quantile regression models, as well as the ordinary least squares model, are given in Table 2. When comparing these parameter estimates with the overall summary statistics it should be noted that these models are centred at midnight. Taken as a whole, these results suggest that the inbound speeds are significantly slower than the outbound speeds at every quantile considered, although the differences are greatest for the 30th and 50th percentiles, and smallest for the 85th and 95th percentile speeds. When comparing speeds by day relative to a Sunday, it is clear that the greatest differences are seen amongst the lowest quantiles modelled; for example, the 5th percentile speed on Wednesday is 2.33 mph greater and
on a Friday 3.05 mph slower when compared to the baseline level. Smaller differences are generally seen amongst the higher percentiles.

Table 2
Parameter estimates for all parameters from fitting seven quantile regression models and an ordinary least squares regression to speeds collected in Banbury Road, Aynho

Parameter | 5th | 15th | 30th | 50th | 70th | 85th | 95th | OLS
Intercept (β0) | 31.06 | 36.64 | 40.34 | 44.28 | 46.59 | 49.61 | 54.13 | 43.09
βInbound | −3.00 | −3.99 | −4.67 | −4.62 | −3.69 | −2.77 | −2.02 | −3.57
βAfter | 0.34 | −0.07 | −0.34 | −0.15 | −0.12 | −0.51 | −0.66 | 0.13
βAfter:Inbound | −0.54 | −0.47 | −0.42 | −0.61 | −0.61 | −0.23 | −0.09 | −0.54
βDay:Monday | −1.35 | −0.51 | −0.50 | −0.55 | −0.63 | −0.74 | −0.91 | −0.76
βDay:Tuesday | 1.84 | 1.12 | 1.17 | 1.37 | 1.62 | 1.89 | 2.29 | 1.53
βDay:Wednesday | 2.33 | 0.57 | 0.21 | −0.00 | −0.22 | −0.13 | 0.04 | 0.33
βDay:Thursday | −0.53 | −0.33 | −0.14 | −0.18 | −0.24 | −0.14 | −0.19 | −0.24
βDay:Friday | −3.05 | −0.96 | −0.59 | −0.47 | −0.30 | −0.16 | −0.38 | −0.79
βDay:Saturday | −2.60 | −0.81 | −0.52 | −0.23 | −0.10 | −0.08 | −0.00 | −0.61

The columns headed 5th to 95th are the quantile regression (percentile) fits; OLS denotes the ordinary least squares fit.

However, the potential for the quantile regression model is perhaps demonstrated by examining the parameter estimates in association with their standard errors. Generally speaking, one may assume that the parameter estimates associated with the 5th and 95th percentiles have a larger standard error than any other quantile, a feature of the data sparsity at the extreme quantiles. In Table 3, it is the pattern of significance which dominates. At the lower quantiles, generally speaking, the parameters for the overall speed difference are not significant, but the interaction terms are, suggesting that speed has been more influenced inbound than outbound. The same finding is seen with the OLS regression. However, when the higher percentiles are considered, although the total speed reduction is smaller, it is the overall speed difference which is significant, and the inbound speeds are not slowed significantly more than the outbound speeds. Having used the smoothing splines it has been possible to ensure that these results are not confounded with cyclical daily patterns in the speed profiles.

Table 3
Parameter estimates and standard errors for βAfter and βAfter:Inbound, the estimates for the overall change of speed and interaction between change of speed and direction before and after implementation of community speed watch programme in Banbury Road, Aynho

Percentile | βAfter (S.E.) | βAfter:Inbound (S.E.)
5th | 0.33894 (0.27100) | −0.53832 (0.15006)
15th | −0.06796 (0.16868) | −0.46602 (0.10182)
30th | −0.33774 (0.13806) | −0.41707 (0.08383)
50th (median) | −0.15256 (0.17045) | −0.60581 (0.10898)
70th | −0.11577 (0.19355) | −0.61447 (0.13251)
85th | −0.51441 (0.23174) | −0.23159 (0.15652)
95th | −0.66115 (0.31012) | −0.08741 (0.21881)
OLS | 0.13141 (0.14429) | −0.54451 (0.09090)

Non-significant results have been denoted in italics. Results are given for seven quantiles as well as conventional ordinary least square regression (OLS).

It would also appear that the greatest differences in speed reduction have been seen among the “middle” quantiles, i.e. the 30th, 50th (median) and 70th. Had we been
primarily concerned about reducing higher quantile speeds, given the belief that these drivers represent a disproportionately high risk, the quantile regression suggests that these drivers have been less affected than drivers who had been travelling at slower speeds initially. Given the lack of study design it is vitally important not to over-interpret these results, but one possible interpretation is that drivers who basically intend to stay within the speed limit have been more encouraged to do just that than other drivers.

4. Discussion and conclusion

This paper has demonstrated the use of a very general method which can be used to determine whether the 85th percentile speed, or any other percentile, either differs or can be considered to have changed “significantly”. By doing this in a (quantile) regression framework there is no restriction to two-group comparisons; indeed, quantile regression can be undertaken using similar design concepts to the ubiquitous ordinary least squares model. We have demonstrated here results from a designed experiment where, essentially, instead of using an ANOVA to determine equality of means, we can use the ratio of parameter estimate to standard error to determine whether the 85th percentile (or any other percentile) speed has changed sufficiently to be considered statistically significant. The theory also provides for confidence intervals on these parameters. We have also demonstrated the use of quantile regression in a temporal situation where B-splines were used to model out the intra-day variation in vehicle speeds. Whilst the examination of the Northamptonshire Community Speed Watch data is necessarily limited by the study design, we hope this demonstrates a technique which, when evaluating an intervention, offers one method by which an investigator can determine whether or not the 85th percentile speed has been significantly changed.
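As a worked illustration of the ratio of parameter estimate to standard error, the following Python sketch applies it to the β_After values reported in Table 3. It assumes a standard normal reference distribution for the ratio, which is an assumption made for this sketch only: the text does not state which reference distribution or standard-error method was used. The function and variable names are likewise illustrative.

```python
from statistics import NormalDist

# beta_After estimates and standard errors as reported in Table 3.
BETA_AFTER = {
    "5th": (0.33894, 0.27100),
    "15th": (-0.06796, 0.16868),
    "30th": (-0.33774, 0.13806),
    "50th": (-0.15256, 0.17045),
    "70th": (-0.11577, 0.19355),
    "85th": (-0.51441, 0.23174),
    "95th": (-0.66115, 0.31012),
    "OLS": (0.13141, 0.14429),
}


def wald_summary(estimate, std_err, level=0.95):
    """Ratio of estimate to standard error, with a two-sided p-value and
    confidence interval from a normal approximation (an assumption here)."""
    std_normal = NormalDist()
    ratio = estimate / std_err
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(ratio)))
    # Critical value for a two-sided interval at the requested level.
    crit = std_normal.inv_cdf(0.5 + level / 2.0)
    return {
        "ratio": ratio,
        "p_value": p_value,
        "ci": (estimate - crit * std_err, estimate + crit * std_err),
        "significant": p_value < 1.0 - level,
    }


if __name__ == "__main__":
    for label, (est, se) in BETA_AFTER.items():
        s = wald_summary(est, se)
        print(f"{label:>4}: ratio={s['ratio']:6.2f}  "
              f"95% CI=({s['ci'][0]:.3f}, {s['ci'][1]:.3f})")
```

Under this approximation the ratio for the 85th percentile is about −2.22, so its interval excludes zero, whereas the ratio for the 5th percentile is about 1.25 and its interval includes zero.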
We note that quantile regression is a widely used econometric technique, and as demonstrated here can be fitted entirely with freely available software. Some cautions are in order. Rounding of the speed data is problematic when conducting quantile regression. We have presented results fitted to speed data assuming a left and right censored process. Other results, not presented here, have relied
on “jittering” the speed readings by adding a small amount of random noise. Simulation studies suggest that this would be unnecessary if speeds were recorded in units of tenths or hundredths (of either mph or kph). This need not cause any computer storage problems, as the data could still be recorded as an integer number of tenths or hundredths, essentially moving the decimal point one or two places during recording. The floating point representation could be added purely at the presentation stage.

Secondly, it cannot be over-emphasised that study design and data collection issues are, as always, paramount. Cause and effect can only be determined in the presence of a randomised controlled trial. This may not have been the case with the amis data, so it is not possible to state that the signs caused the changes in speed. Equally, with the Northamptonshire community speed watch data, interesting though the results are, we cannot claim a causal association between the speed reductions and the programme.

Thirdly, as discussed in Section 2, we have used the B-splines in a particularly simple way; further development of the model fitting to incorporate the smoothness penalty term is necessary.

Fourthly, it would also be desirable to treat the paired sites as random effects rather than fixed effects when model fitting; however, the necessary theory for mixed effects quantile regression models still needs further development.

Nevertheless, given the free availability of software to fit the basic quantile regression models, we anticipate that this form of analysis may become much more common when considering speed data, and hope that these simple examples will serve to illustrate the potential benefits of this technique.
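The rounding caution can be illustrated with a short Python sketch of the two remedies mentioned: jittering rounded readings, and recording speeds as integer tenths. The uniform noise distribution, its width, and all function names are assumptions for illustration; the text says only that a small amount of random noise was added.

```python
import random


def jitter(speeds, width=1.0, seed=None):
    """Add uniform noise in [-width/2, width/2] to each reading to break
    the ties created by rounding (uniform noise is an assumption)."""
    rng = random.Random(seed)
    half = width / 2.0
    return [s + rng.uniform(-half, half) for s in speeds]


def to_tenths(speed):
    """Record a speed as an integer number of tenths of a unit."""
    return round(speed * 10)


def from_tenths(tenths):
    """Recover the floating point speed at the presentation stage."""
    return tenths / 10


if __name__ == "__main__":
    rounded = [30, 30, 30, 31, 31]  # hypothetical whole-mph readings
    print(jitter(rounded, width=1.0, seed=1))
    print(to_tenths(31.4), from_tenths(to_tenths(31.4)))
```

Keeping the stored value as an integer means ties arise only between readings that agree to the nearest tenth, which is the situation in which the text suggests jittering would no longer be needed.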
Acknowledgements

Many thanks are due to Matt O’Connell and Police Sergeant Nick Gough of Northamptonshire Police and James Butlin of Northamptonshire County Council for providing the Banbury Road, Aynho data as well as assistance in completing this work and for providing information on the Northamptonshire community speed watch programme. Thanks are also due to Richard Kingsley-Smith of Torbay Council for providing radar collected speed data which initiated this study, and to Richard Toomey of Traffic Technology Ltd. for providing further data which was used at an earlier stage of this study. I would like to thank the referees for detailed and helpful feedback on an earlier draft of this paper.