Analysis of the temporal properties in car accident time series


Physica A 387 (2008) 3299–3304 www.elsevier.com/locate/physa

Luciano Telesca ∗, Michele Lovallo

Istituto di Metodologie per l’Analisi Ambientale, CNR, C.da S.Loja, 85050 Tito (PZ), Italy

Received 9 August 2007; received in revised form 18 October 2007

Available online 16 January 2008

Abstract

In this paper we study the time-clustering behavior of sequences of car accidents, using data from a database freely available on the internet. The Allan Factor analysis, a method well suited to investigating time-dynamical behavior in point processes, reveals that the car accident sequences are characterized by a general time-scaling behavior with cyclic components superimposed. These results indicate that the time dynamics of the events are not Poissonian but long-range correlated, with periodicities ranging from 12 h to 1 year. © 2008 Elsevier B.V. All rights reserved.

Keywords: Car accidents; Allan factor

Traffic jams and car accidents have become an important problem in modern society because of their many economic, social, cultural and resource management implications. A car accident can be defined as a sudden and unexpected event associated with the operation of a car, in which a person suffers death or serious injury, or in which the car receives substantial damage. Car accidents may lead to fatalities, to injuries, or only to vehicle damage, and are the leading cause of mortality for adolescents and young adults worldwide [1]. In total, since motorisation began, more than 30 million deaths have been attributed to accidents, which is more than the number of soldiers killed in the 1914–1918 and 1939–1945 world wars combined [2].

Recently, car accidents have attracted the attention of physicists, who have investigated the dynamical behavior of traffic flow using car-following models, cellular automata models [3–5], gas-kinetic models, and hydrodynamic models [6–8], revealing how complex the physical phenomena underlying traffic flow are. To our knowledge, studies dealing with traffic flow and car accidents have focused only on the probability of occurrence of accidents in several models of traffic flow and, in particular, on the topological structure of traffic systems, which would be mainly responsible for the occurrence of car accidents [9–14]. Correlations of car accidents with alcohol consumption patterns [15] and with age [16] have also been investigated. A study of the time structure of car accident sequences has not yet been carried out. Thus, in the present paper, car accident data downloaded from a public internet database have been analyzed in order to identify and describe their temporal fluctuations. The number of data points is 1 044 796, which is large enough to obtain significant results.
∗ Corresponding author. E-mail address: [email protected] (L. Telesca). doi:10.1016/j.physa.2008.01.055

The belief that car accidents are independent events, uncorrelated at short as well as at long timescales, seems to be quite widespread. This belief implies that car accidents occur completely randomly in time, so that no kind of “memory” can be detected in the sequence of accidents. As investigated in previous studies, the topological structure of traffic systems could be the main reason for the occurrence of car accidents. Therefore, the road network, at small as well as at large scales, with a more or less complex topology and a relatively low or high number of intersections and nodes, could be considered as the basis for a sequence of car accidents to be correlated in time. Thus, the question is: how is it possible to reveal the presence of such time correlations in car accidents?

A first measure that allows time-clustering in a point process to be detected is the coefficient of variation $C_v = \sigma_t/\mu_t$, where $\sigma_t$ and $\mu_t$ are respectively the standard deviation and the mean of the interevent time series. If $C_v = 1$ the process is Poissonian, while if $C_v > 1$ the process is clusterized. The examined data have $C_v \sim 7$, which indicates that the car accident sequence is clusterized in time.

Another quantitative measure of time-clustering is the interevent–interval histogram (IIH), which is an estimate of the probability density function of the interevent times. For Poissonian point processes, the IIH behaves as a decreasing exponential function of the interevent times, while for clusterized processes it generally behaves as a power law $T^{-(1+\alpha)}$, where T is the interevent time and α the scaling exponent. Fig. 1 shows the IIH, plotted in log-log scales. The IIH shows approximately two scaling regions, separated by a crossover at an interevent time of about 1 h. The region of short interevent intervals is characterized by a slope of about 1.4, while the slope in the region of long interevent intervals is about 4.6. The power-law behavior of the IIH indicates that the process is not Poissonian. The two regions correspond to low and high time-clustering respectively.

Fig. 1. Interevent–interval histogram of the car accident sequence from 1975 to 2004.

Nevertheless, the $C_v$ and the IIH do not furnish details about the timescales that are involved in the clustering phenomenon. Furthermore, they give no quantitative information about the strength of the correlation.
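The two first-order measures described above can be sketched in a few lines of Python. The reference value $C_v = 1$ follows from the fact that a Poisson process has exponentially distributed interevent times, whose standard deviation equals their mean. Everything in the sketch is an illustrative assumption rather than the procedure used in the paper: the function names, the use of the population standard deviation, and the logarithmic binning of the histogram are not specified in the text.

```python
import math
from statistics import mean, pstdev


def interevent_times(event_times):
    """Return the gaps between consecutive events (input need not be sorted)."""
    t = sorted(event_times)
    return [b - a for a, b in zip(t, t[1:])]


def coefficient_of_variation(intervals):
    """Cv = sigma_t / mu_t of the interevent times (population std, an assumption)."""
    return pstdev(intervals) / mean(intervals)


def log_binned_iih(intervals, bins_per_decade=5):
    """Estimate the interevent-interval density on logarithmically spaced bins.

    Returns (bin_centre, density) pairs. Zero-length intervals are skipped
    because they cannot be placed on a logarithmic axis.
    """
    positive = [x for x in intervals if x > 0]
    if not positive:
        return []
    lo = math.floor(math.log10(min(positive)))      # lowest decade covered
    hi = math.floor(math.log10(max(positive))) + 1  # first decade not covered
    n_bins = (hi - lo) * bins_per_decade
    counts = [0] * n_bins
    for x in positive:
        k = int((math.log10(x) - lo) * bins_per_decade)
        counts[min(k, n_bins - 1)] += 1
    iih = []
    for k, c in enumerate(counts):
        left = 10 ** (lo + k / bins_per_decade)
        right = 10 ** (lo + (k + 1) / bins_per_decade)
        # Normalise by the bin width so that the estimate is a density.
        density = c / (len(positive) * (right - left))
        iih.append((math.sqrt(left * right), density))
    return iih
```

Plotting the non-empty bins of `log_binned_iih` on log-log axes would give a curve comparable in kind to Fig. 1; a straight segment of slope $-(1+\alpha)$ would correspond to the power-law regime mentioned in the text.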
Thus, the approach proposed in this paper is based on the analysis of second-order statistical characteristics of the process, in order to derive information about the existence of time-correlation structures. Many processes, from medicine to earth and environmental sciences [17], are represented by stochastic point processes, where each event is mainly characterized by its occurrence time and a parameter describing its intensity [18]. Car accidents share with those processes the point-like time structure: they occur discretely in time and are characterized by a higher or lower degree of severity, which could be given by the number of fatalities or injured people and/or the amount of structural car damage. The main point to focus on is the behavior of such a point process: if it is Poissonian, the car accidents occur completely randomly in time and are independent events; if it is clusterized, the car accidents are correlated. A clusterized point process is characterized by a time-scaling behavior, quantified by a scaling exponent, which furnishes information on the strength of the correlation structure. A value of the scaling exponent close to zero is typical of Poissonian point processes [19].

Although several freely available internet databases of car accidents exist, in this paper we analyzed the dataset of car accidents available from the Fatality Analysis Reporting System (FARS) (http://www-fars.nhtsa.dot.gov). This database contains information about car accidents within the United States; the period from 1975 to 2004 has been analyzed. The database appears to be very complete and contains enough records to support statistically significant analyses.


Investigating the temporal properties of a time series implies determining and estimating parameters able to capture its inner dynamics. The standard, well-known method of the power spectral density (PSD) S(f), obtained by means of a Fourier Transform (FT) of the series, can perform this investigation. The PSD gives information on how the power of the series is concentrated in various frequency bands [20], allowing the identification of periodic, multi-periodic or non-periodic frequency patterns. Usually the logarithmic power spectrum plot, that is, the power spectrum plotted in log-log scales, is used to analyze broadband behavior. The power-law dependence (linear on a log-log plot) of the PSD, given by $S(f) \sim f^{-\beta}$, is a hallmark of the presence of time-scaling in the data. The properties of the series can be further classified in terms of the numerical value of the spectral exponent β: β = 0 characterizes white noise and β ≠ 0 pink noise, indicating the absence or presence of time correlations respectively. In particular, a positive β indicates that the low-frequency fluctuations are more powerful than the high-frequency fluctuations, so that the series is characterized by long-range correlations. On the other hand, a negative β indicates that the high-frequency fluctuations are more powerful than the low-frequency fluctuations, so that the series is characterized by short-range correlations.

A car accident sequence is a temporal point process, and the application of the FT is not possible. Instead, the Allan Factor method, described in the following paragraphs, allows the detection of time-clustering behavior and, therefore, of correlation structures in a point process. The series can be represented by a finite sum of Dirac delta functions centered on the occurrence times $t_i$:

$$y(t) = \sum_{i=1}^{N} \delta(t - t_i) \tag{1}$$

where N represents the number of events recorded. Then, dividing the time axis into equally spaced contiguous counting windows of duration τ, which is called the timescale, we produce a sequence of counts $\{N_k(\tau)\}$, with $N_k(\tau)$ denoting the number of events in the kth window [21]:

$$N_k(\tau) = \int_{t_{k-1}}^{t_k} \sum_{j=1}^{N} \delta(t - t_j)\,dt. \tag{2}$$
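In practice, Eq. (2) amounts to counting how many occurrence times fall into each window of width τ. The Allan Factor that the paper defines next, in Eq. (3), is then a ratio built from these counts. The sketch below is a minimal standard-library illustration under stated assumptions: the function names, the optional `t_start` and `t_end` arguments, and the choice to discard an incomplete final window are not taken from the paper.

```python
def window_counts(event_times, tau, t_start=None, t_end=None):
    """Counts N_k(tau) in contiguous windows of width tau, as in Eq. (2).

    Only complete windows inside [t_start, t_end) are kept (an assumption);
    the bounds default to the first and last event time.
    """
    t0 = min(event_times) if t_start is None else t_start
    t1 = max(event_times) if t_end is None else t_end
    n_windows = int((t1 - t0) // tau)
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t0) // tau)
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts


def allan_factor(event_times, tau, t_start=None, t_end=None):
    """AF(tau) = <(N_{k+1} - N_k)^2> / (2 <N_k>), following Eq. (3)."""
    counts = window_counts(event_times, tau, t_start, t_end)
    if len(counts) < 2:
        raise ValueError("need at least two complete windows")
    mean_count = sum(counts) / len(counts)
    if mean_count == 0:
        raise ValueError("no events fall inside the counting windows")
    sq_diffs = [(b - a) ** 2 for a, b in zip(counts, counts[1:])]
    return (sum(sq_diffs) / len(sq_diffs)) / (2 * mean_count)
```

Evaluating `allan_factor` on a set of logarithmically spaced values of τ and plotting the result in log-log scales would give a curve of the kind the paper discusses. For a homogeneous Poisson process the value should stay close to 1 at every timescale, while a strictly periodic sequence sampled at its own period gives 0.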

This sequence is a discrete random process of non-negative integers. An important feature of this representation is that it preserves the correspondence between the discrete time axis of the counting process $\{N_k\}$ and the “real” time axis of the underlying point process, so that correlation in the process $\{N_k\}$ refers to correlation in the underlying point process [19]. Such a process may be called fractal when a number of relevant statistics exhibit scaling with related scaling exponents, which indicates that the represented phenomenon contains clusters of points over a relatively large set of timescales [22,23]. We can define the following quantity, the Allan Factor (AF),

$$\mathrm{AF}(\tau) = \frac{\langle (N_{k+1}(\tau) - N_k(\tau))^2 \rangle}{2 \langle N_k(\tau) \rangle}, \tag{3}$$

which is related to the variability of successive counts [24]. If the point process is time-clusterized, then the AF varies with the timescale τ with a power-law form:

$$\mathrm{AF}(\tau) = 1 + \left(\frac{\tau}{\tau_1}\right)^{\alpha} \tag{4}$$

over a large range of timescales τ. The exponent α quantifies the strength of time-clusterization; $\tau_1$ is the fractal onset time and marks the lower limit for significant scaling behavior in the AF, so that for $\tau \ll \tau_1$ the time-scaling property becomes negligible [25]. If α = 0, the AF is flat at all timescales and the point process is Poissonian, which means that the series is memoryless and constituted by independent events; if α > 0, the process is characterized by time-scaling behavior, which means that the series is time-correlated.

Fig. 2 shows the result of the Allan Factor analysis performed over the whole car accident series recorded in the USA from 1975 to 2004. Several features can be pointed out:

(i) The sequence is not strictly Poissonian, because the AF curve is not flat over all the timescales investigated.

(ii) A periodicity of 12 h is very clear, indicated by the drop in the AF curve; it could be related to the increase of car traffic in the morning and in the evening of a working day.

(iii) After the 12 h cycle, the increase of the curve is interrupted by a very small plateau (indicating a purely random behavior of the series) up to the timescale of 24 h, after which the curve increases again. The 24 h timescale can be related to the daily solar cycle, which may have some influence on the occurrence of car accidents.

(iv) The 3.5 day periodicity is quite interesting, and can be connected with the increase of traffic at the weekend.

(v) The 1 week periodicity is quite reasonable, given the weekly cycle of car traffic.

(vi) The 10 day periodicity may be just a linear superposition of the weekly and mid-weekly cycles.

(vii) After a small plateau, the AF curve increases as a power law with a scaling exponent of about 1.77, which indicates a rather high degree of clusterization of the car accidents. The power-law behavior starts from a timescale of approximately 13.8 days; this timescale might be related to the short-period oscillation of about 13.5 days observed in several solar parameters, such as solar wind, solar emissions and sunspot area [26]. Correlations of car accidents with cosmic or geophysical activity, or with sunspot frequencies, have also been postulated [27]. The scaling behavior over timescales from about 2 weeks up to less than 6 months indicates that the process is positively correlated.

(viii) The 6 month and 1 year periodicities can be explained by seasonal and meteo-climatic factors.

Fig. 2. Allan Factor of the car accident sequence from 1975 to 2004.

Fig. 3 shows the comparison between the AF for the whole sequence (red curve) and the AFs of the single states (black curves). Most of the single-state AF curves share the same features as the AF curve of the whole sequence, in terms of periodicities and correlations.

Fig. 3. Allan Factor of the car accident sequences from 1975 to 2004: the red curve shows the AF for the whole sequence, while the black curves show the AF for the single states.

Fig. 4 shows the annual AF curves. The curves are very similar, with a slight offset due to the different number of events in each year. The close similarity of the annual curves indicates that the process is quite stationary.

Fig. 4. Allan Factor of the annual car accident sequences from 1975 to 2004.

We have performed a detailed analysis of the temporal properties of the car accident sequence recorded in a freely available online database. The sequence is not independent, uncorrelated and memoryless, but is characterized by a correlation structure with superimposed periodicities ranging from 12 h to 1 year.

References

[1] W. Odero, P. Garner, A. Zwi, Trop. Med. Int. Health 2 (1997) 445.
[2] P. Wells, J. Cleaner Production 15 (2007) 1116.
[3] K. Nagel, M. Schreckenberg, J. Phys. I 2 (1992) 2221.
[4] M. Fukui, Y. Ishibashi, J. Phys. Soc. Jpn. 65 (1996) 1868.
[5] X. Yang, W. Zhang, K. Qiu, Y. Zhao, Phys. Rev. E 73 (2006) 016126.
[6] D. Chowdhury, L. Santen, A. Schadschneider, Phys. Rep. 329 (2000) 199.
[7] D. Helbing, Rev. Modern Phys. 73 (2001) 1067.
[8] T. Nagatani, Rep. Prog. Phys. 65 (2002) 1331.
[9] N. Boccara, H. Fuks, Q. Zeng, J. Phys. A 30 (1997) 3329.
[10] D.-W. Huang, Y.-P. Wu, Phys. Rev. E 63 (2001) 022301.
[11] D.W. Huang, J. Phys. A 31 (1998) 6167.
[12] X.-Q. Yang, Y.-Q. Ma, J. Phys. A 35 (2002) 10539.
[13] X.-Q. Yang, Y.-Q. Ma, Y.-M. Zhao, J. Phys. A 37 (2004) 4743.
[14] D.W. Huang, W.C. Tseng, Phys. Rev. E 64 (2001) 057106.
[15] S. Wells, S. Macdonald, Accident Anal. Prevention 31 (1999) 663.
[16] R. De Raedt, I. Ponjaert-Kristoffersen, Accident Anal. Prevention 33 (2001) 809.
[17] S.B. Lowen, M.C. Teich, Phys. Rev. E 47 (1993) 992; R.G. Turcott, S.B. Lowen, E. Li, D.H. Johnson, C. Tsuchitani, M.C. Teich, Biol. Cybern. 70 (1994) 209; L. Telesca, V. Cuomo, V. Lapenna, M. Macchiato, Geophys. Res. Lett. 28 (2001) 4323; L. Telesca, G. Colangelo, V. Lapenna, M. Macchiato, J. Hydrol. 296 (2004) 234; L. Telesca, G. Amatulli, R. Lasaponara, M. Lovallo, A. Santulli, Ecol. Model. 185 (2005) 531.
[18] D.R. Cox, V. Isham, Point Processes, Chapman and Hall, London, 1980.
[19] S. Thurner, S.B. Lowen, M.C. Feurstein, C. Heneghan, H.G. Feichtinger, M.C. Teich, Fractals 5 (1997) 565.
[20] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1990.
[21] S.B. Lowen, M.C. Teich, Fractals 3 (1995) 183.
[22] S.B. Lowen, M.C. Teich, SPIE Chaos Biol. Medicine 2036 (1993) 64.
[23] M.C. Teich, C. Heneghan, S.B. Lowen, R.G. Turcott, in: A. Aldroubi, M. Unser (Eds.), Wavelets in Medicine and Biology, CRC Press, Boca Raton, FL, 1996, p. 383.
[24] D.W. Allan, Proc. IEEE 54 (1966) 221; J.A. Barnes, D.W. Allan, Proc. IEEE 54 (1966) 176.
[25] S.B. Lowen, M.C. Teich, J. Acoust. Soc. Amer. 99 (1996) 3585.
[26] K. Mursula, B. Zieger, J. Geophys. Res. 101 (1996) 27077.
[27] M. Ausloos, R. Lambiotte, Physica A 362 (2006) 513.