Evaluation and Program Planning, Vol. 11, pp. 255-266, 1988 Printed in the USA. All rights reserved.
0149-7189/88 $3.00 + .00
Copyright © 1988 Pergamon Press plc
METHODOLOGIES FOR THE EVALUATION OF LOCAL TRAFFIC SAFETY PROGRAMS With an Application
to New Jersey DWI Programs
DAVID LEVY
Federal Trade Commission
ABSTRACT
Three methodological techniques are used to explore the impact of DWI (driving while impaired) programming in the State of New Jersey. The programs consisted of a communitywide Task Force whose membership was drawn from local police agencies, alcohol treatment facilities, and public schools, an intensified public education and information campaign, and the use of roadside sobriety checkpoints. The first technique used cross-sectional data from 21 counties to examine fatalities and accidents. The second technique used pooled cross-sectional county data in a covariance model to examine program effects over time. The third technique relied upon intervention analysis, a form of time series, to ascertain the effects of the programs in one county.
INTRODUCTION

In recent years, evidence of program effectiveness has become important to formulating traffic safety policy. Many empirical studies have attempted to evaluate program effectiveness (see, e.g., the studies of changes in the minimum legal drinking age laws (U.S. GAO, 1987)). Typically, they examine the effect of an intervention (i.e., the policy) on some outcome variable, such as traffic fatalities. This paper will discuss a number of statistical methodologies that may be used to evaluate traffic safety programs, particularly those affecting alcohol-related problems. An evaluation study of local New Jersey Driving While Intoxicated (DWI) programs will be used as the illustrative case.¹ The case study assesses the effects of the three separate programs in reducing fatality and accident rates in participating New Jersey counties. Three different types of studies were conducted. The first study used cross-sectional data (for the 21 New Jersey counties) on a year-by-year basis to explain fatalities or accidents. Multivariate analysis was employed to distinguish demographic effects and other county factors from those of the intervention. The second study adopted a covariance model that pooled cross-sectional (county level) data over years to examine the initial and delayed effects of the programs. Simple controls for variation over counties and years were employed. Finally, a more sophisticated form of time series analysis, intervention analysis (with Box-Jenkins models), was adopted. Monthly data for one of the counties which actively participated in the three major programs were examined. The emphasis of this paper will be the application, i.e., the strengths and limitations, of the methodologies
This paper is based on a study done for the National Highway Traffic Safety Administration and the New Jersey Office of Highway Safety, while the author was at Rutgers University. The original study was done with Peter Asch and Dennis Shea, but the author bears complete responsibility for the contents of this paper. Requests for reprints should be sent to David Levy, PhD, 4545 Connecticut Avenue, N.W., Apt. 1011, Washington, DC 20008.
¹The original study was done jointly with Peter Asch and Dennis Shea and was funded by the National Highway Traffic Safety Administration and the New Jersey Office of Highway Safety.
employed. In focusing on the methodologies themselves, the statistical results of each study will be discussed only briefly, without detailed presentation or explanation.² Before discussing the methodologies, we begin with a description of the programs and of the outcome and intervention measures used in the study.

DESCRIPTION OF PROGRAMS AND THE OUTCOME AND INTERVENTION MEASURES

New Jersey, like many other states in recent years, has attempted to reduce the number of alcohol-related fatalities and accidents on its roads through a variety of programs. These programs were implemented at the county level with supervision and funding from the state and federal government. In New Jersey, two types of programs have been implemented.

1. Informational programs such as SOBER (“Stay Off the Bottle, Enjoy the Road”) and DWI (“Driving While Intoxicated”) Task Force attempt to improve highway safety by means of pamphlets, public service announcements, seminars for students and other consciousness-raising efforts about the dangers of drinking and driving.
2. Preventive and punitive programs such as the Strike Force provide overtime funding to police departments on a county-wide basis to operate roadside sobriety checkpoints to stop vehicles and check for intoxicated drivers.

TABLE 1
PROGRAM START DATES

Atlantic Bergen Burlington Camden Cape May Cumberland Essex Gloucester Hudson Hunterdon Mercer Middlesex Monmouth Morris Ocean Passaic Salem Somerset Sussex Union Warren

S.O.B.E.R.
10/82 10/82 10/84 10/82 10/84 10/82 10/82 10/82 10/81 10/82 10/83 10/80 10/80 10/81 10/84 10/83 10/84 10/82 10/82 10/84

DWI/TF
Strike Force
10185
i 183
5183
I 185 9116184 2186 5186 3186 3184 6184 1183
5185
5i82
a/a4 11184 5184 8034
I!86 1i83
3184
5185 8183 1185 1 i85

These programs were implemented at various times between 1980 and 1986. The SOBER programs have been in effect for the longest period of time and have been implemented in most counties. The DWI Task
Force and Strike Force Programs are more recent and have not been as widely implemented. The starting dates of the three programs for each New Jersey county are reported in Table 1.

While there may be other targets of these programs, such as more general problems associated with alcoholism, the traffic safety programs were evaluated in terms that we considered to be their primary concern, the number of lives saved or accidents avoided. (Other measures are less commonly used for evaluating alcohol and traffic safety programs, such as the number of convictions or the percentage of drivers intoxicated (based on random surveys of drivers on the roadways).) The measurement of accidents and fatalities raises a number of issues regarding the completeness of the measure and the ability to detect alcohol-related behavior.

One outcome measure is an “alcohol-related” accident or fatality. In New Jersey, it is defined by the reporting police officer, and represents a subjective judgment by that officer. A test of blood alcohol content has not necessarily been administered. The problems inherent in such measures are well known, particularly the arbitrariness of the definition (see, for example, Levy, Voos, Johnson, & Klein, 1978). (Another measure considered was alcohol-related accidents based on blood alcohol content. However, because not all fatal victims are tested and those tested are likely to be systematically chosen, a problem of sample selection bias is likely to arise.) In fact, the definition may vary across counties and may change in response to community outcry over alcohol-related accidents or as part of the program implementation process.

A second type of dependent variable is the “single-vehicle-nighttime” measure, an oft-used surrogate for alcohol-involved fatalities and accidents (see, e.g., Levy et al., 1978; Wagenaar, 1983). (Night is defined as the time between 6 p.m. and 5:59 a.m.)
The major virtue of this type of measure is its objectivity: there is virtually no judgment involved in classifying a single-vehicle-nighttime event. The drawback of the measure lies in its broadness. Although a preponderance (approximately 70%) of single-vehicle-nighttime accidents and fatalities are believed to be alcohol-related, the category includes some events that have nothing to do with alcohol consumption and excludes other events that are alcohol-related.
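The objectivity of the single-vehicle-nighttime measure can be illustrated with a minimal sketch. The function names and the record fields (`vehicles_involved`, `crash_time`) are hypothetical and chosen for illustration; only the 6 p.m. to 5:59 a.m. definition of night comes from the text.

```python
from datetime import time


def is_nighttime(crash_time: time) -> bool:
    """Night runs from 6:00 p.m. through 5:59 a.m., as defined in the text."""
    return crash_time >= time(18, 0) or crash_time < time(6, 0)


def is_single_vehicle_nighttime(vehicles_involved: int, crash_time: time) -> bool:
    """Classify a crash record; no judgment about alcohol involvement is needed."""
    return vehicles_involved == 1 and is_nighttime(crash_time)


# Hypothetical records: (number of vehicles, time of crash).
records = [(1, time(23, 30)), (2, time(23, 30)), (1, time(14, 0)), (1, time(5, 59))]
flags = [is_single_vehicle_nighttime(n, t) for n, t in records]
```

The rule uses only two recorded facts, which is why the measure is objective; it is also why the measure sweeps in crashes unrelated to alcohol and misses alcohol-related daytime or multi-vehicle crashes.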
The total fatality and total accident measures are the most inclusive of the outcome measures. They include all the individuals who are the cause and the victims of the accidents. Because all occurrences of either fatalities or accidents (including non-alcohol-related and pedestrian) are included in the measure, they are most relevant for gauging overall effectiveness from a policy point of view (see, e.g., Cook & Tauchen, 1984). (From a policy perspective, it might be desirable to further distinguish between effects on pedestrians and occupants of motor vehicles or between different types of vehicles (e.g., cars, motorcycles, and trucks).) However, because the purpose of this particular evaluation was to examine the effect of programs targeted at alcohol-related problems, the ability of this measure to detect effects of the programs is not as great as that of more directly alcohol-related measures. To deal with potential problems in the different outcome measures available, we employed a number of measures as dependent variables in the estimation equations reported below, including:

1. “Alcohol-related” fatalities
2. “Alcohol-related” accidents
3. Single-vehicle-nighttime fatalities
4. Single-vehicle-nighttime accidents
5. Total fatalities
6. Total accidents
On balance, the objectivity and close relationship with alcohol involvement of the single-vehicle-nighttime measures make them preferable to the other measures which were considered for gauging alcohol-related fatalities and accidents. Additionally, they have been widely employed in the empirical literature on alcohol-traffic safety relationships (making findings comparable across studies), and their use, although subject to known limitations, is relatively noncontroversial. For this reason and because of limited ability to statistically discriminate when using other measures, many of the conclusions drawn from the study were based on the single-vehicle-nighttime fatality and accident measures. Specifically, the equations using other fatality and accident measures had less explanatory power in general and tended to yield lower t-statistics, especially for the intervention variables.

The fatality and accident data were obtained from the New Jersey Office of Highway Safety and are also available from the National Highway Traffic Safety Administration, National Center for Statistics and Analysis, U.S. Department of Transportation.

Since some of the counties participated in the programs at certain points and others did not, the intervention could be measured by the presence of the program in a particular county at a particular point in time. A simple dummy variable, also known as an indicator variable, was used for that purpose. Different variables were constructed for each of the three programs. The exact formulation of these variables is discussed below. Dummy variable measures of programs do not distinguish differences in the intensity of commitment to the various programs in different counties and at different points in time; if one county actively participates, while another does so casually, both are included as implicitly equal participants. Consequently, we experimented with a variety of intensity measures, including the number of individuals participating, staff hours contributed, dollars spent and arrests or convictions by police (in the case of the Strike Force program). However, use of some of these measures did not yield appreciably different results. In addition, we had difficulties in getting consistent measures across counties and time. Further, each of these measures raises questions on the weighting given to the enforcement inputs, for example, staff hours versus pure monetary expenditures. For these reasons, we relied primarily on the simple dummy variables.

CROSS SECTIONAL MULTIVARIATE ANALYSIS

The first empirical approach, multivariate regression analysis, attempts to explain outcomes by specific causal factors, including the intervention measures. Thereby, this methodology attempts to distinguish program effects from other distinct determinants of alcohol and traffic safety. Regression equations were estimated for different years with observations for each of New Jersey’s counties. Fatalities or accidents within a county were postulated to be a function of population, county characteristics, and DWI programs, that is,

\[ \mathrm{OUTCOME}_n = b_0 + \sum_{i=1} b_{1,i} X_{i,n} + \sum_{j=1} b_{2,j} P_{j,n} + e_n \]
where n designates the county, i stands for the ith demographic variable, j for the jth program, OUTCOME for the fatality or accident rate, X for the demographic variables or county characteristics of interest, and P for the DWI program. The dependent variables have been discussed above. To control for differences in population across counties, accidents and fatalities were measured as rates, standardizing by population in the county. (More desirable would be to deflate by vehicle miles travelled or by licensed drivers, but these data were not available at a county level.) The independent variables we employed are those commonly identified in the traffic-safety literature as important determinants of alcohol-involved
accidents (see, e.g., Asch & Levy, 1987). They included population density, computed as county population as reported in the 1980 Census divided by county square miles; age distribution, the median age of county residents in 1980; mean income, the average income of county households in 1980; sex distribution, the number of males per 100 females in the county in 1980; per capita sales in drinking and eating establishments, the nominal dollar sales of licensed restaurants, liquor stores, etc. in the county in 1983;³ and, where fatalities were used, hospital density, the number of hospitals in the county in 1980 divided by county square miles. These variables were generally reported in the City and County Data Book and Census of Population for counties in New Jersey. Many of these variables were only available for 1980, the Census year, and that limited our ability to combine equations for different years.

In evaluating a particular model, the sign of the variables is important. The sign of the first three variables (population density, age distribution, and mean income) was expected to be negative (i.e., higher values will imply lower accident and fatality rates), since rural, young, poor populations tend to drink and drive more frequently. The signs on the variables sex distribution and sales per capita should be positive: females are known to drink and drive less than males, and higher sales of alcoholic beverages may imply a higher incidence of drunk driving, thus higher fatality and accident rates. Ready availability of medical facilities should increase the probability of survival for a given accident experience; thus we anticipate that the hospital density measure will exert a negative effect on fatalities. (As noted above, the hospital density variable is expected to affect fatality, but not accident, rates.)
Other county variables that were employed in some equations but dropped because they were not significant include dummy variables indicating counties that border on New York, Pennsylvania, or Delaware;⁴ the percentage of county residents who were black or Hispanic; the percentage of residents below the poverty level; the percentage of residents who drove to work; the median education level of residents over age 25; and the number of alcohol outlets per capita. The DWI programs were introduced into the equations in the form of dummy variables, where:

\[ P_{j,n} = \begin{cases} 1 & \text{if the $n$th county participated in the $j$th program in the year examined,}^{5} \\ 0 & \text{if the county did not participate} \end{cases} \]

³This measure has been used in other studies as a substitute for actual sales by liquor stores and drinking establishments (bars, gin joints, speakeasies, etc.), which is available only for counties with over 500 establishments. In New Jersey, these data were unavailable for Salem County. Consequently, equations were estimated without Salem County using the actual sales of liquor in the other counties. The results differed very little from those discussed below.
⁴These variables were to account for interstate travel, since these border states do not have the same DWI programs as New Jersey.
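A cross-sectional regression of this kind, with an outcome rate explained by a county characteristic and a program dummy, can be sketched with ordinary least squares. This is a minimal illustration and not the study's estimation code: the helper names `solve` and `ols`, the six made-up counties, and the coefficient values are all assumptions, and the data are constructed without noise so that the fit is exact.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows, intercept first."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical counties: (population density, program dummy P = 1 if participating).
rows = [(1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1), (5.0, 0), (6.0, 1)]
# Made-up rates generated as 10 - 0.5 * density - 2 * program, with no error term.
rates = [10 - 0.5 * d - 2 * p for d, p in rows]
b0, b_density, b_program = ols(rows, rates)
```

The coefficient on the dummy is read as the difference in the outcome rate between participating and non-participating counties, holding the other regressors fixed. With only one observation per county, each added regressor uses up a degree of freedom, which matters for a 21-county cross section.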
The equations were estimated in both logarithmic and linear form and the error terms were checked (using a Glejser test) for differing variances due to uneven populations across counties (i.e., heteroscedasticity). To deal specifically with this problem, equations were also estimated using a log odds ratio model. (The log odds ratio model also corrects for the truncation of the fatality rate measures at zero and one. For further discussion, see Maddala (1984), who shows that the proper distributional assumptions are made for the dependent variable in this formulation.) The equations were estimated for the two years, 1983 and 1984, during which most of the programs were in effect.

The single-vehicle fatality and accident measures performed best in the equations. The signs on the demographic variables were generally as expected (the sign on per capita sales in drinking and eating establishments, and the coefficients on the age variable were not as predicted but were not statistically significant). Counties displaying low income, rural, heavily male populations with high per capita sales in drinking and eating establishments had generally higher rates of fatalities and accidents. The program variables were sometimes positive and sometimes negative, but were not significant in any equation estimated. Similar results were obtained using log and linear forms of the model and using different measures of accident and fatality rates.

The lack of significant program effects may mean that the alcohol traffic safety countermeasure programs were ineffective or that the statistical model was not well enough specified to statistically discriminate among variable influences. Failure to discriminate would not be entirely surprising. There are many factors that might explain cross sectional variation in the outcome measures which are not included in the model. For example, some townships and counties were developing independent programs of their own.
Omission of these and other variables is likely to bias the coefficients of the included variables. In addition, as noted above, the dummy variables did not reflect the “strength” or “magnitude” of the county DWI programs. Thus, a county could be actively involved in combating DWI through some other program, yet not show up as a participant in the dummy variables. In addition, the point-in-time regressions cannot reflect
the trend of fatalities and crashes within counties. A county might succeed in slowing the growth of accidents through the use of DWI programs over a period of years, but this type of “success” might not be apparent in cross sectional analyses. Finally, efforts to measure program effects more precisely were inhibited by limited degrees of freedom in the cross section analysis, due to the limited number of observations (one per county). (When other variables were added, the adjusted R-squared fell.)

POOLED CROSS SECTION TIME SERIES ANALYSIS

Given the lack of success in attempting to isolate the intervention effects using a cross sectional analysis, alternative methods were considered that would incorporate changes over time across the different counties. One method used by Williams, Zador, Harris, and Karpf (1983) to examine the effects of changes in the minimum legal drinking age was to estimate the change in fatalities when the new program was implemented relative to a control county which did not implement the program during the same period. However, it was felt that there was an absence of adequate control counties and a lack of observations to conduct meaningful statistical tests for more than a few counties. A preferable alternative was a covariance model that combined county data for different years, and controls for county-specific and time-specific effects (see, e.g., Cook & Tauchen, 1984).⁶ Covariance analysis is a form of regression analysis. The model was estimated for the 21 New Jersey counties for the years 1979-1985. Even though none of the programs began until 1980, it was desirable to allow sufficient time before the programs were implemented to distinguish county effects from program effects. The model was of the form:

\[ \mathrm{OUTCOME}_{i,k} = b_0 + \sum_{j=1} b_{1,j} P_j + \sum_{i=1} b_{2,i} C_i + \sum_{k=1} b_{3,k} T_k + e_{i,k} \]

where $P_j$ stands for a set of program dummy variables to be described below; $C_i$ for a set of county dummy variables (= 1 for the ith county, = 0 otherwise); and $T_k$ for a set of time dummy variables (= 1 for the kth year, = 0 otherwise). $\mathrm{OUTCOME}_{i,k}$ is a measure of the fatality or accident rate in county i and year k. The outcome variables are the fatality and accident measures divided by the county’s population in a particular year to transform them into per capita terms. In this model, the county dummy variables are included to account for factors specific to the individual counties (factors which do not change or change slowly over time), such as the demographic variables discussed above; differences in enforcement efforts (other than for the programs under investigation); and any other county-specific idiosyncrasies. The time dummy variables are included to account for factors specific to each year (i.e., factors that are common to all counties, but change over time), such as the change in the State’s minimum legal drinking age, and statewide enforcement efforts for other than the DWI programs. We considered including the demographic variables from the cross sectional analysis in the previous section (e.g., mean income, population density, and sales per capita), but most of these variables are Census data and were only available for one of the sample years. In any event, these variables were unlikely to change much over the sample period, and, therefore, should be captured in the county dummy variables.

⁶An error components model was also estimated and yielded similar results. This type of model is more efficient (i.e., loses fewer degrees of freedom and makes better use of information), but relies on the validation of an important statistical assumption (regarding noncorrelation of the error terms and interventions), which is dubious for this model.

The intervention measures were our central concern since they indicate the effect of the programs. The program variables should reflect the impact of each of the different DWI programs after controlling for the various factors noted above. The variables indicate the presence or absence of each program in each county and year. The value of each variable depends on whether that program was in effect in a given county and year. If not, the variable takes a value of zero. If so, the value of the variable is defined by the weighting system described below. In the years in which the programs were implemented, each program variable is multiplied by m/12, where m is the number of months in which the particular program was in effect. A subject of interest is the pattern of influence of the programs on the fatality and accident rates.
One possibility is that the programs would have a significant impact in their first year of operation, but then show declining effects over time. Alternatively, it was possible that the main impact of the program occurred only after a “start-up” period and a realization by drivers of the new efforts at DWI enforcement. To examine for different effects over time, we first added dummy variables to the equation for first year effects. (For measurement purposes, the first year of a program is considered to be the first year in which the program was in effect for more than six months.) This yielded a high degree of multicollinearity and generally implausible results. The results for the regular program variables tended to remain the same sign, while the results for the first year impact variables displayed various signs, but were in all cases insignificant. The intervention variables were generally insignificant. Consequently, the first year dummy variables were dropped. However, the results from the analysis along with an examination of the results from the time series intervention analysis described below and an analysis of goodness of fit were used to distinguish a series of weights for the number of years in which the program was in existence. (The goodness of fit analysis involved experimenting with different weights and seeing which provided the best fit in terms of the explanatory power of the entire equation and of the intervention effects individually.) The weights that were found to provide the best fit for the intervention variables were the following:
               SOBER    DWITF    Strike Force
First Year     1.0      0.5      0.5
Second Year    0.5      1.0      1.0
Third Year     0.25     0.5      0.75
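As a rough sketch of how such weights might be turned into a program variable for a given county and year, the lookup below returns zero before a program starts and the tabulated weight afterwards. The function name and arguments are hypothetical, `first_year` is assumed to have been determined already by the six-month rule in the text, the partial-year m/12 scaling is left out, and holding the third-year weight for later years is an assumption, since the table lists only three years.

```python
# Year-by-year weights copied from the table of best-fitting weights.
WEIGHTS = {
    "SOBER": (1.0, 0.5, 0.25),
    "DWITF": (0.5, 1.0, 0.5),
    "Strike Force": (0.5, 1.0, 0.75),
}


def program_variable(program, first_year, year):
    """Weighted intervention variable for one county, program, and calendar year.

    first_year is None for a county that never adopted the program.
    """
    if first_year is None or year < first_year:
        return 0.0
    weights = WEIGHTS[program]
    years_elapsed = year - first_year  # 0 in the first program year
    # Assumption (not stated in the text): reuse the third-year weight afterwards.
    return weights[min(years_elapsed, len(weights) - 1)]


# Hypothetical county whose SOBER program counts as starting in 1982.
sober_path = [program_variable("SOBER", 1982, y) for y in range(1979, 1986)]
```

Replacing a 0/1 dummy with these weights lets a single coefficient per program trace out a time profile, at the cost of fixing the shape of that profile before estimation.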
These weights reflect what appear to be distinctive impact patterns. Specifically, the weights on the SOBER variable reflect an abrupt first-year effect that declines sharply thereafter. In contrast, the Strike Force Program effect was gradual, increasing and remaining relatively stable after the first year; whereas the DWI Task Force effect occurred gradually but declined more sharply after the second year.

The equations were estimated for each of the outcome variables discussed in the Introduction above. In every instance, the dependent accident or fatality variable was in logarithmic form,⁷ and equations were corrected for heteroscedasticity.⁸ In addition to the independent variables described above, lagged values of the dependent accident and fatality measures were introduced into some equations (the lags are one year). These measures were intended to correct for any statistical problems associated with autocorrelation patterns within counties. Because the coefficient on the lagged dependent variable generally was insignificant in the accident equation, the equations were also reestimated without this variable.⁹

For equations explaining total fatality and accident rates, the effects of SOBER were sometimes of the wrong sign and generally not significant. A significant negative effect (as expected) was observed for the DWI Task Force variable on total accidents, and a nearly significant effect for the Strike Force program variable on total fatalities was observed. Equations estimated with the measures of alcohol-related accidents and fatality rates also yielded limited results. Intervention effects were generally insignificant and sometimes of the sign opposite that predicted. Inspection of the data indicated very little variation for the different counties during the time period examined, which probably explains the poor results.

Of potentially greater interest are the safety effects of the DWI programs on the alcohol-related and objective single-vehicle-nighttime events. While the effects of each of the programs had the predicted negative effect in just about all cases, none of the programs had a significant impact on single-vehicle-nighttime fatality rates. Yet in every instance, one or two programs exerted a significant effect on the single-vehicle-nighttime accident rate. The effects of the SOBER variable were generally insignificant (although they were very close to significant in some cases); but the Strike Force program had a significant accident-reducing effect in every instance.

The analysis of total fatality and accident rates, the alcohol-related fatality and accident rates and the single-vehicle-nighttime fatality rates did not generally provide statistically significant results. This appeared to be due to the small number of fatalities at the county level and the high degree of randomness over the time period examined. However, the results from the single-vehicle-nighttime accident equations indicated significant effects for the Strike Force programs. The coefficient of this variable indicated a reduction in single-vehicle-nighttime accidents of between 10 and 15% when the program reached its peak effectiveness. The results indicated somewhat weaker (both statistically and in terms of magnitude) effects for the DWI Task Force program, in the range of 5-10%.

Generally, the results revealed that the DWI Task Force and Strike Force programs had a significant effect in reducing single-vehicle accident rates. The exact pattern of influence, however, remains problematic. A model of one county which employed all three programs might better reveal the patterns and relationships between the programs and fatality and accident rates.

⁷Equations estimated in linear form yielded similar results. However, given the truncation of the dependent variable at zero and one, the assumption of log normality is more appropriate.
⁸I.e., for the influence of varied population sizes among the counties. To correct for this problem, the method adopted by Cook and Tauchen (1984) was used.
⁹Significance is at the five percent level (using two-tailed tests) in the di
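The logic of the covariance model used in this section, in which county and year dummies absorb everything that is fixed within a county or common to a year, can be sketched as follows. For a balanced panel, regressing on a full set of county and year dummies gives the same slope as first subtracting county means and year means from each variable. The sketch is a simplification under stated assumptions: one program variable instead of three, a hypothetical function name, and made-up, noise-free data for three counties and four years.

```python
def two_way_within(y, x):
    """Slope on x after removing county and year effects (balanced panel).

    y and x are lists of rows: one row per county, one column per year.
    """
    n, t = len(y), len(y[0])

    def demean(a):
        grand = sum(map(sum, a)) / (n * t)
        row = [sum(r) / t for r in a]
        col = [sum(a[i][k] for i in range(n)) / n for k in range(t)]
        return [[a[i][k] - row[i] - col[k] + grand for k in range(t)] for i in range(n)]

    yd, xd = demean(y), demean(x)
    num = sum(xd[i][k] * yd[i][k] for i in range(n) for k in range(t))
    den = sum(xd[i][k] ** 2 for i in range(n) for k in range(t))
    return num / den


# Hypothetical program indicator: county 0 adopts in year 2, county 1 in year 3, county 2 never.
program = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
county_fx = [2.0, 3.0, 4.0]      # made-up county-specific levels
year_fx = [0.0, 0.1, 0.2, 0.3]   # made-up statewide year effects
# Log outcome built as county effect + year effect - 0.1 * program.
outcome = [[county_fx[i] + year_fx[k] - 0.1 * program[i][k] for k in range(4)]
           for i in range(3)]
effect = two_way_within(outcome, program)
```

Because the program effect is identified only from changes within a county relative to the statewide movement in the same year, counties whose outcomes barely vary over the sample period contribute little, which is consistent with the weak fatality results reported above.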
TIME SERIES ANALYSIS OF BERGEN COUNTY

The third methodology adopted was a time series analysis, which focused on the pattern of effects of the different programs. The dependent variable in this analysis was a series of monthly observations. To incorporate trends from the DWI programs and other factors, the analysis follows the procedure outlined by Box and Tiao (1975) for analyzing interventions in a time series quasi-experiment. (See Cook & Campbell (1983) or Glass (1968) for a discussion of intervention models.) This methodology separates the effects of the interventions, that is, the DWI programs, from other trends in the series, including seasonal patterns. Corrections have also been made for autocorrelation in the series (which violates the assumption of independence of error terms in the standard model). The mathematical form of the model is spelled out in Appendix A.

The estimation procedure consisted of three steps:

1. Identification of a parsimonious Autoregressive Integrated Moving Average (ARIMA) model for the dependent variable and the form of the relationship between the dependent and the intervention variable(s).
2. Estimation of the parameters of the model by means of iterated nonlinear least squares by Marquardt’s method.
3. Diagnostic checking of the model and the residuals with respect to its parsimony, ability to account for the pattern of the original data, and white noise properties of the error term.

If the model is subsequently found to be inadequate by the diagnostic check, a new form is specified and estimated until a satisfactory model is found.

In specifying an ARIMA process within the intervention model, care must be taken since the interruption of the time series by the intervention may distort the autocorrelation function (acf) and partial autocorrelation function (pacf) which are used to identify the ARIMA model. This case proved to be no exception.
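The way an intervention can distort the acf can be seen in a small sketch. The function below computes ordinary sample autocorrelations; the two monthly series are made up for illustration (they are not the Bergen County data), one with a constant level and one with a drop in level halfway through.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag around the overall mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


# Hypothetical series: the same month-to-month alternation, with and without a level shift.
no_shift = [10.0, 12.0] * 24
with_shift = [10.0, 12.0] * 12 + [6.0, 8.0] * 12

r_no_shift = acf(no_shift, 2)
r_with_shift = acf(with_shift, 2)
```

Measured around a single overall mean, the shifted series has long runs on one side of that mean, so its low-order autocorrelations turn strongly positive even though the month-to-month behavior is unchanged. This is the pattern that can make a series look nonstationary when the underlying cause is the intervention itself.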
The acf of single-vehicle-nighttime accidents for the entire time period indicated all the usual features of a nonstationary series, and also appeared to be autoregressive and seasonal. However, a look at the raw data revealed that the nonstationarity could be an effect of the intervention. After May 1983, only one month of the next 20 has an accident rate above the period mean. Similarly, 23 out of 24 months after January 1983, and 25 out of 27 months after October 1982 have accident rates below the mean. One method to deal with this problem is to first specify a tentative form of the intervention model, then to specify the ARIMA model using the preintervention data and/or the residuals from the estimation
of the model including the intervention. It is not required that the tentative intervention model be correct a priori. Analyzing the residuals after estimating the complete model should reveal any inadequacies. Four standard forms were considered for estimating the shape of the program intervention:

1. Abrupt, permanent intervention effects
2. Gradual, permanent intervention effects
3. Abrupt, temporary intervention effects
4. Gradual, temporary intervention effects
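These four shapes can be sketched with the first-order forms commonly used in intervention analysis: a step or pulse input passed through the filter omega / (1 - delta B). This is an illustration under stated assumptions, not the formulation in Appendix A, which is not reproduced here. The function names and parameter values are hypothetical, and the "gradual, temporary" profile in particular is built as the gap between two pulse decays purely for illustration.

```python
def transfer(inputs, omega, delta):
    """First-order response: y[t] = delta * y[t-1] + omega * x[t]."""
    out, prev = [], 0.0
    for x in inputs:
        prev = delta * prev + omega * x
        out.append(prev)
    return out


def intervention_shapes(n, start, omega=-1.0, delta=0.5):
    """Effect profiles over n periods for an intervention beginning at index start."""
    step = [1.0 if t >= start else 0.0 for t in range(n)]   # on from start onward
    pulse = [1.0 if t == start else 0.0 for t in range(n)]  # on at start only
    slow = transfer(pulse, omega, delta)
    fast = transfer(pulse, omega, delta / 2)
    return {
        "abrupt_permanent": [omega if s else 0.0 for s in step],
        "gradual_permanent": transfer(step, omega, delta),
        "abrupt_temporary": slow,
        # Assumption: a rise-then-decay profile formed from two decays of different speed.
        "gradual_temporary": [s - f for s, f in zip(slow, fast)],
    }


profiles = intervention_shapes(n=8, start=2)
```

In this sketch omega sets the size of the initial impact and delta, between 0 and 1, sets how quickly the effect builds up or dies away; the gradual, permanent profile approaches omega / (1 - delta).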
The simple “abrupt, permanent” effect model implies that the effect is instantaneous and stays constant over time. The “gradual start, permanent duration” model implies an effect which increases over time and then stays constant at a particular level. The “abrupt, temporary” effect model implies an instantaneous effect which gradually tapers off. Finally, the “gradual start, temporary duration” model would involve an increasing effect followed by a declining effect. The mathematical formulation of these models is presented in Appendix A.

The choice of the intervention and of the final model is influenced by a number of considerations that include:
1. A priori notions (e.g., the importance of delay before drivers become aware of the programs, or knowledge that resources applied to the programs tend to increase over time) and patterns suggested by previous investigation (e.g., examination of the raw accident and fatality data, and the results of the covariance analysis above).
2. Correlations among the independent and dependent variables.
3. Significance of estimated effects, measured roughly by their t-statistics.
4. The correlation functions of the error terms (reflected in the Q-statistic, Schwarz Bayesian Criterion, and Akaike’s Information Criterion, each providing a measure of the white noise properties of the error terms).

After estimating the model, the coefficients were examined to determine whether they were, in fact, due to some other simultaneously occurring event (e.g., general awareness of the public unrelated to implementation of the program). This determination was approached by employing a control county that had not adopted the program over the relevant time period (see, e.g., Wagenaar, 1983). After estimating an ARIMA model for the control county, the intervention effect
for the original county was inserted into the control county equations to check whether the same model explained accident and fatality experience in counties that did not adopt the program. If the error terms of the two equations were independent, the estimated parameters divided by the common residual variance would have a t-distribution, so that standard t-tests of intervention effects could be conducted.

The time series analysis focused on a single county, Bergen County. Bergen County implemented a SOBER program in October 1982, a DWI Task Force program in January 1983, and a Strike Force program in May 1983. The Bergen County programs were of special interest because they acted as pilot programs for other counties in New Jersey. In estimating intervention models, a prime difficulty is distinguishing the intervention effect from other trends in the outcome variable. This problem is compounded when there is more than one intervention. It was important to pick a time period of sufficient length to be able to distinguish the intervention effects from time trends. Approximately eighteen months after each intervention began were needed to capture the time profile of the intervention. Using monthly data, we examined the period from 1979 to 1985. A separate model was estimated for each dependent variable and for various combinations of the program interventions. The analysis of the alcohol-involved accident measure did not provide any interesting results, so the total and single-vehicle-nighttime accident and fatality measures were used instead.
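The program start dates and sample period given above translate directly into monthly intervention variables. The sketch below builds step and pulse dummies for January 1979 through December 1985; the helper names and the dictionary layout are hypothetical, and the abbreviations SOBER, DWITF, and STFRC follow Table 2.

```python
START_YEAR = 1979
N_MONTHS = 7 * 12   # January 1979 through December 1985

# Start dates (year, month) of the Bergen County programs as stated in the text.
PROGRAM_START = {
    "SOBER": (1982, 10),
    "DWITF": (1983, 1),
    "STFRC": (1983, 5),
}


def month_index(year, month):
    """Zero-based position of a calendar month within the sample period."""
    return (year - START_YEAR) * 12 + (month - 1)


def step_dummy(year, month):
    """0 before the program starts, 1 from the start month onward."""
    start = month_index(year, month)
    return [1 if t >= start else 0 for t in range(N_MONTHS)]


def pulse_dummy(year, month):
    """1 in the start month only, 0 elsewhere."""
    start = month_index(year, month)
    return [1 if t == start else 0 for t in range(N_MONTHS)]


steps = {name: step_dummy(*date) for name, date in PROGRAM_START.items()}
pulses = {name: pulse_dummy(*date) for name, date in PROGRAM_START.items()}

# Number of post-intervention months available for each program.
print({name: sum(series) for name, series in steps.items()})
```

Because the three step variables switch on within seven months of one another, they overlap for most of the post-intervention period, which illustrates why separating the individual program effects is difficult.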
Results

Bivariate transfer functions were estimated for each of the models. The dependent variable was estimated in log form, in order to control for heteroskedasticity. The log form also provided higher t-statistics, lower Q-statistics, and better convergence properties than the linear model. Upon examining the autocorrelation and partial autocorrelation functions for each outcome variable, a parsimonious ARIMA model (without interventions) was identified for each dependent variable as follows:

Single-vehicle-nighttime accidents and total accidents: (1,0,0)(1,0,0)_12; that is, an autoregressive model with an AR(1) term and an AR(12) term.

Single-vehicle-nighttime fatalities and total fatalities: these variables showed no violation of the assumption of independent residuals, and, thus, no autoregressive or moving-average terms were added.

The intervention effects were first estimated using gradual start, permanent duration models, because this was felt to be the most plausible form on a priori grounds. Then other models were estimated based on inspection of the cross-correlation matrix. More complex variants of this simple model were investigated, primarily by testing a “gradual start, temporary duration” intervention model. None of these models proved superior to the simpler model (Q-statistics were higher and t-statistics were lower), and they were therefore rejected in the interests of parsimony. (For the gradual, permanent model, the impact in a given month is found by multiplying the numerator term by the denominator term raised to the power of month minus one. Thus, the first month the impact is just the numerator term. The twelfth month is just the numerator term multiplied by the denominator term raised to the eleventh power. See Cook and Campbell (1983) for further discussion.)
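The month-by-month rule in the parenthetical can be written out as a short calculation. The sketch below applies that rule with hypothetical numerator and denominator values and, because the dependent variable is in log form, converts each log-scale effect to a percentage change with 100 * (exp(effect) - 1). That conversion is a common convention assumed here; the paper does not state how its percentages were derived.

```python
import math


def monthly_log_effects(numerator, denominator, months):
    """Effect in month k (k = 1, 2, ...) as numerator * denominator ** (k - 1)."""
    return [numerator * denominator ** (k - 1) for k in range(1, months + 1)]


def percent_change(log_effect):
    """Percentage change in the outcome implied by an effect on its logarithm."""
    return 100.0 * (math.exp(log_effect) - 1.0)


# Hypothetical parameter values, chosen only for illustration.
NUMERATOR = -0.27
DENOMINATOR = 0.9

effects = monthly_log_effects(NUMERATOR, DENOMINATOR, 12)
for month, effect in enumerate(effects, start=1):
    print(month, round(percent_change(effect), 1))
```

With a denominator term below one, each successive monthly term is smaller in magnitude than the last, so the sequence either tapers off (if the terms are read as levels) or accumulates toward a finite limit (if they are read as increments).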
Effects on Single-Vehicle-Nighttime Accidents

Three models for the single-vehicle-nighttime accident rate were estimated. In each case, the AR(1) term became insignificant and was dropped as the interventions were included. Model (1) specified the Strike Force program variable as a gradual, permanent intervention, and the SOBER variable as an abrupt, temporary intervention. Model (2) retained the Strike Force program variable but substituted the DWI Task Force (DWITF) variable for the SOBER variable, also as an abrupt, temporary intervention. Although the effect of the SOBER variable was slightly more significant, model (2) was marginally superior in terms of white-noise properties. Model (3) used all three interventions, specified as in the simpler models. Model (3) yields larger values and lower significance levels for the Q-statistic for longer lags of the error terms. It nevertheless appears to be an effective model for discerning program effects. Other models utilizing alternative specifications of the program interventions yielded substantially inferior results. With all three programs included, the denominator terms of the interventions declined, suggesting that the DWI Task Force and SOBER program variables should not be modelled as producing permanent effects.

Table 2 interprets the results of the three models in terms of estimated percentage declines in single-vehicle-nighttime accidents attributable to the DWI program interventions. These declines are in a sense lower-limit estimates that ignore the possibility of interactions among the program effects. Conceivably, the impact of two or three simultaneous programs will exceed the simple sum of the separate effect of each. The results indicate that, under Model (1), the SOBER program abruptly reduces accidents by about 24% in the first month; this effect slowly declines, so that after three years, the accident reduction effect is less than one percent. The Strike Force program shows steady growth from its inception in May 1983: it initially reduced accidents by about 4%, but shows an accident reduction effect of 29.5% two years later. Interpretation of program effects estimated under
TABLE 2
ESTIMATED PATTERNS OF DECLINE IN SINGLE-VEHICLE-NIGHTTIME ACCIDENT RATES

         Model (1)           Model (2)           Model (3)
Month    SOBER     STFRC     DWITF     STFRC     SOBER     DWITF     STFRC
10/82    -24%      -         -         -         -26.3%    -         -
11/82    -21%                                    -16.6%
12/82    -19.2%                                  -10.5%
1/83     -17.2%              -31.6%              -6.6%     -26.6%
2/83     -15.4%              -20.5%              -4.2%     -17.1%
3/83     -13.7%              -13.1%              -2.7%     -11%
4/83     -12.3%              -8.5%               -1.7%     -7.1%
5/83     -11%      -4.3%     -5.5%     -5.5%     -1.1%     -4.6%     -6.4%
6/83     -9.9%     -8%       -3.6%     -9.2%     *         -2.9%     -11.4%
7/83     -8.8%     -11.1%    -2.3%     -12.7%    †         -1.9%     -15.4%
8/83     -7.9%     -13.4%    -1.5%     -15.5%              -1.2%     -18.5%
9/83     -7.1%     -16.1%    -1.0%     -17.7%              *         -20.9%
10/83    -6.4%     -18.1%    *         -19.6%              †         -22.8%
5/84               -25.8%              -25.81%                       -28.6%
10/84    -1.7%
5/85               -29.5%              -27.6%                        -29.8%
10/85    -.4%
Limit              -29.66%             -27.9%                        -29.9%
models (2) and (3) follows similar patterns. Strike Force effects build gradually, resulting in very substantial accident reductions within one year. DWI Task Force and SOBER program effects appear substantial initially, but decline very rapidly over time. The results were generally consistent for the time series examination of Bergen County, which shows peak accident reductions of about 25 to 35% for all programs.

Generally, the results were quite strong and indicate significant program effects on single-vehicle-nighttime accidents. However, it is important to check whether the intervention effects are due to some other simultaneously occurring event. The intervention effects from the Bergen County model were inserted into a model for the control county, as described above. Because most counties implemented a DWI Task Force or SOBER program before June 1985, we focused on the Strike Force program variable. The ideal control county then would be one which does not share a border with Bergen County, has similar county demographics to Bergen County, did not implement a strike force program during the time period examined, and has an accident rate which is not significantly correlated with that of Bergen County. Unfortunately, no such county exists in New Jersey. Consequently, counties were selected as controls that met two of the criteria. The models were applied to five control counties (Essex, Hudson, Passaic, Somerset, and Union). Models (1) and (2) yielded similar results to those for model (3) when applied to the control counties. The results generally failed to indicate significant effects for the Strike Force and SOBER interventions, indicating that the Bergen effects were not from simultaneously occurring events. The DWI Task Force effects, however, were generally significant, but that is likely to be due to the overlap of DWI Task Force programs in many of the control counties (e.g., Passaic implemented DWI Task Force in January 1983 and Somerset implemented DWI Task Force in May 1985). The proximity of Passaic and Essex counties to Bergen County might also have caused spillover effects from the different programs in Bergen County into those neighboring counties. (The error terms of the control models were slightly correlated with those of Bergen County, which also may indicate spillover effects.) The t-tests continued to indicate significant intervention effects of similar magnitude.

Effects on Single-Vehicle-Nighttime Fatalities

The three models were applied to single-vehicle-nighttime fatalities. Model (1) specified the Strike Force variable as an abrupt, permanent intervention. Model (2) did the same for DWI Task Force, and Model (3) for SOBER. The results indicated that each program individually has a significant fatality-reducing effect; but if more than one intervention is included in the estimation equation, none was significant. These results are less interesting than those for single-vehicle-nighttime accidents, but they shed some light. They show that single-vehicle-nighttime fatalities have declined as
DWI programs have been introduced, but they do not attribute a defined proportion of the declines to specific programs. The results for control counties, however, indicated that effects for each of the interventions were significant in some of the cases, especially the DWI Task Force program, casting some doubt on the robustness of the results.
Effects on Total Accidents and Total Fatalities

The DWI programs yielded few significant effects on total accident and fatality rates. Our suspicion was that these rates are simply too broad-based to reflect the impact of programs designed to affect only alcohol-related events.

SUMMARY AND CONCLUSIONS
Three different methodologies were applied to examining the effects of DWI programs in New Jersey. They had mixed success. Least successful was the cross-sectional analysis by year. The primary drawbacks appear to be a very small number of observations, and, perhaps, our failure to incorporate relevant factors and to control adequately for differences in programs across counties. The covariance analysis was more successful, particularly in explaining single-vehicle-nighttime accidents. This methodology examined traffic safety countermeasures across counties and over time, and provided an overall picture of their effects. One drawback of this analysis is that the beneficial effects of the programs in one county may spill over to other counties, especially those which are contiguous. This would lead to underestimates of the true effects of the programs. The final methodology was the time series analysis, which was applied to Bergen County. The methodology is limited in its ability to distinguish the effects of different programs within a particular county, and again by its inability to distinguish spillover effects in neighboring counties.

With these limitations in mind, the application of the three methodologies to the evaluation of DWI programs did provide interesting results. The broadest conclusion to emerge is that the programs adopted to reduce alcohol-related traffic accidents and fatalities have had beneficial effects in New Jersey. For all three methodologies, collinearities among the three major programs, all of which were introduced in many counties during a relatively brief period, complicate assessments of their individual effects on the traffic safety experience. The analyses indicate that New Jersey’s DWI programs have had a major impact on single-vehicle-nighttime accidents, a proxy for alcohol-related accidents. Similar, although somewhat less consistent, effects are observed for single-vehicle-nighttime fatalities. The magnitude of the results for particular counties may be understated due to spillover effects, which both lead to underestimates of effects for programs within the counties and fail to take into account the benefits to neighboring counties. While we were unable to find significant results for the total fatality and accident measures, this may be due to spillover effects or an inability
of these measures to discriminate among program effects. The time pattern of effects varies sharply among the programs. Both DWI Task Force and SOBER appear to produce accident reductions only temporarily. It may be that these programs should be implemented on a sporadic rather than a continuing basis. In contrast, the Strike Force program appears to gain impact over time; when “mature,” its accident reducing effect is quite dramatic. Further analysis extending the time series would be useful in determining whether this impact continues. A cost-benefit analysis of the programs is the next step toward a complete program evaluation. Based on the above analyses, the Strike Force program, in particular, appears to provide very important traffic safety benefits. In evaluating the programs, account should also be taken of the costs associated with each activity. At the very least, the relevant comparisons of program costs and benefits would provide a basis for considering the magnitude of the “DWI program expenditure” that is necessary to (a) prevent an accident, or (b) save a life. Even without direct reference to “value of life” questions, such estimated magnitudes would indicate whether DWI programs are likely to be more effective than alternative public safety activities. Given the limited resources of public and private funds, manpower and public support, what is the best use of these resources? The usefulness of this kind of information to public policy officials hardly requires elaboration. In conclusion, further analysis along similar lines to this study might be conducted in future evaluations of local programs. The covariance and the time series methodologies are recommended as complementary approaches; the covariance methodology provides information more useful to gauging the average effects across counties and the time series analysis focuses on the time pattern of effects in a particular locale. 
Extensions of the three methodologies adopted above might also be considered. While the multivariate analysis was not particularly successful, it might provide more illuminating results if there were a longer time series and data on causal factors (i.e., the explanatory variables other than the intervention effects) over time. Multivariate regression might be better applied over time to investigate changes in the outcome variables. This is likely to be true if explanatory variables omitted from the estimation equations change less over time than they do across local units. (See Graham and Garber (1984) for an example of a multivariate analysis across time.) Causal factors might also be added to the covariance analysis to explicitly incorporate factors other than the intervention variables. Finally, more sophisticated time series techniques might be employed. The time series analysis might be conducted for other counties with equations estimated as a system of equations, or, with sufficient data for relevant explanatory variables, multivariate time series analysis (MARMA) might be employed to incorporate causal factors other than the interventions (see, e.g., Makridakis, Wheelwright, & McGee, 1983).
REFERENCES

ASCH, P., & LEVY, D. (1987). Does the minimum drinking age affect traffic fatalities? The Journal of Policy Analysis and Management, 6(Winter), 180-92.

BOX, G.E.P., & TIAO, G.C. (1975). Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association, 70(March), 70-79.

COOK, P.J., & TAUCHEN, G. (1984). The effects of the minimum drinking age regulation on youthful auto fatalities, 1970-1977. Journal of Legal Studies, 13(January), 169-190.

COOK, T. & CAMPBELL, D. (1983). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.

GLASS, G. (1968). Analysis of data on the Connecticut speeding crackdown as a time series quasi-experiment. Law and Society Review, 1(August), 55-76.

GRAHAM, J., & GARBER, R. (1984). Evaluating the effects of automobile safety legislation. Journal of Policy Analysis and Management, 3(Winter), 206-221.

LEVY, P., VOAS, R., JOHNSON, P. & KLEIN, T. (1978). An evaluation of the Department of Transportation’s Alcohol Safety Action Projects. Journal of Safety Research, 10(Winter), 162-176.

MADDALA, G.S. (1984). Limited dependent and qualitative variables in econometrics. Cambridge: Cambridge University Press.

MAKRIDAKIS, S., WHEELWRIGHT, S., & McGEE, V. (1983). Forecasting methods and applications. New York: Wiley.

U.S. GENERAL ACCOUNTING OFFICE. (1987, March). Drinking age laws, an evaluation synthesis of their impact on highway safety. Washington, DC: Author.

WAGENAAR, A. (1983). Alcohol, young drivers and traffic accidents. Lexington: Lexington Books.

WILLIAMS, A., ZADOR, P. HARRIS, S. & KARPF, R. (1983). The effect of raising the legal minimum drinking age on involvement in fatal crashes. Journal of Legal Studies, 12(January), 169-79.
APPENDIX A
THE INTERVENTION MODEL

In its most general form, the seasonal ARIMA model can be written as:

\[
Y_t = \frac{(1 - a_1 B^{s} - a_2 B^{2s} - \cdots - a_Q B^{Qs})(1 - A_1 B - A_2 B^2 - \cdots - A_q B^q)\, u_t + b_0}{(1 - c_1 B^{s} - c_2 B^{2s} - \cdots - c_P B^{Ps})(1 - C_1 B - C_2 B^2 - \cdots - C_p B^p)(1 - B^{s})^{D}(1 - B)^{d}}
\]

where the notation is as follows:
p: the order of the AR process
d: the degree of nonseasonal differencing
q: the order of the MA process
P: the order of the seasonal AR process
D: the degree of seasonal differencing
s: the seasonal span
u: a white noise error term
b_0: a constant term
B: a backshift operator, i.e., B X_t = X_{t-1}
c, C: the seasonal and nonseasonal AR parameters respectively
a, A: the seasonal and nonseasonal MA parameters respectively

A general form of the intervention process can be written:

\[
Y_t = \frac{w_0 - w_1 B - w_2 B^2 - \cdots - w_j B^j}{1 - v_1 B - v_2 B^2 - \cdots - v_i B^i}\, I_{t-m}
\]

where
w, v: determine the manner in which the intervention affects the dependent variable
B: is a backshift operator
m: is a delay parameter
I: is a step or pulse function representing the intervention

Although the form of these models appears quite complicated, the actual structure is often quite simple because the ARIMA process is usually limited to only a few periods. For example, the constant effect model contains just the w_0 term. The gradual start, permanent duration model is:

\[
Y_t = \frac{wB}{1 - vB}\, S_{t-m},
\]

where S_{t-m} is just I_{t-m} - I_{t-m-1}. The gradual start, temporary duration model is:

\[
Y_t = \frac{wB}{1 - v_1 B - v_2 B^2}\, S_{t-m}.
\]
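As a worked illustration of why the first-order form produces a gradual approach to a permanent level, the rational lag can be expanded as a geometric series. The derivation below assumes |v| < 1 and, for the second line, assumes that the input is a unit step that switches on at the start of the program; both are assumptions made here for exposition rather than statements from the paper.

\[
\frac{wB}{1 - vB}\, S_{t-m} = w \sum_{k=0}^{\infty} v^{k} S_{t-m-1-k}, \qquad |v| < 1 .
\]

\[
\text{Effect after } n \text{ periods} = w\,(1 + v + \cdots + v^{n-1}) = w\,\frac{1 - v^{n}}{1 - v} \;\longrightarrow\; \frac{w}{1 - v} \quad \text{as } n \to \infty .
\]

Under these assumptions the first-period effect is w, each later period adds a smaller increment, and the long-run level is w / (1 - v). If the input is instead a one-period pulse, the same expansion gives an effect of w v^{n-1} in period n, which starts at w and tapers off.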