Physica A 405 (2014) 303–315
Dynamic analysis of traffic time series at different temporal scales: A complex networks approach
Jinjun Tang a,∗, Yinhai Wang a,b, Hua Wang a, Shen Zhang a, Fang Liu c
a School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
b Department of Civil and Environmental Engineering, University of Washington, Seattle, WA 98195-2700, USA
c School of Energy and Transportation Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China
Highlights
• We measure the complexity of traffic time series at different temporal scales.
• Complex networks are constructed using the correlation coefficients among days.
• The proper critical threshold of the networks is estimated from the derivative of density.
• Some statistical properties of the complex networks are analyzed.
• We exploit the periodicity in traffic time series.
Article info
Article history: Received 9 June 2013; received in revised form 17 February 2014; available online 17 March 2014
Keywords: Complex networks; Dynamics; Traffic time series; Complexity; Periodicity
Abstract
The analysis of traffic flow dynamics is an important step toward advanced traffic management and control in Intelligent Transportation Systems (ITS). Complexity and periodicity are two fundamental properties of traffic dynamics. In this study, we first use the Lempel–Ziv algorithm to measure, at different temporal scales, the complexity of traffic flow data collected from loop detectors on a freeway. Second, to gain more insight into the complexity and periodicity of traffic time series, we construct complex networks from them by considering each day as a cycle and each cycle as a single node. The optimal threshold of the complex networks is estimated from the distribution of density and its derivative. The complex networks are then analyzed in terms of statistical properties such as average path length, clustering coefficient, density, average degree and betweenness. Finally, taking the 2 min aggregated data as an example, we use the correlation coefficient matrix, adjacency matrix and closeness to explore the periodicity of weekdays and weekends in traffic flow data. The findings indicate that complex networks are a practical tool for exploring the dynamics of traffic time series. © 2014 Elsevier B.V. All rights reserved.
∗ Corresponding author. Tel.: +86 15204638963. E-mail address: [email protected] (J. Tang).
http://dx.doi.org/10.1016/j.physa.2014.03.038
0378-4371/© 2014 Elsevier B.V. All rights reserved.

1. Introduction
Traffic jams, accidents and air pollution have attracted growing attention in countries all over the world. Developed countries have researched Intelligent Transportation Systems (ITS) since the 1980s, and ITS is now widely considered an effective means of solving traffic problems. Advanced Traveler Information Systems (ATIS) provide travelers with information on road network conditions to guide their route choices. Advanced Traffic Signal Control Systems (ATSCS) apply optimal control strategies to improve traffic conditions and relieve congestion. The performance of these systems depends on a reliable and complete analysis of traffic time series, which includes two main parts: (1) accurate traffic flow prediction; (2) in-depth data mining and traffic flow modeling. Recently, many researchers have focused on applying reliable models to achieve high forecasting accuracy, producing a variety of results, especially in short-term forecasting [1–5]. Yin et al. [1] used a fuzzy-neural approach to improve traffic flow prediction accuracy. Qi and Ishak [2] employed a stochastic approach to predict short-term traffic speed during peak periods on a freeway in Orlando. Tigran et al. [3] introduced a real-time prediction model based on the spectral characteristics of traffic flow time series. Dunne and Ghosh [4] proposed a new multivariate short-term traffic flow prediction methodology for uncongested and congested regimes. Zheng et al. [5] introduced a neural network model combined with conditional probability theory and Bayes' rule; experiments on Singapore's Ayer Rajah Expressway showed that the combined model outperforms the individual predictors. Exploring the nonlinear dynamics and periodicity of traffic flow is another significant research field [6–11]. Liu et al. [6] developed a traffic flow model based on cell transmission that incorporates the frictional effect in managed lane (ML) systems. Gao et al. [7] established an improved traffic flow model that considers the velocity-adaptation effect between neighboring vehicles, and their simulation results showed the improvements of the proposed model.
Herrmann and Kerner [8] compared the modeling of traffic flow by microscopic and macroscopic models, and the results showed that the models exhibit the local cluster effect. Marasco [9] used a nonlinear hydrodynamic model of traffic flow to describe driver behavior and the evolution of car density and flow at tollgates. He et al. [10] used Recurrence Plots (RP) and Recurrence Quantification Analysis (RQA) parameters to analyze the periodicity of traffic flow in an empirical study. Vlahogianni et al. [11] detected nonlinearity and non-stationarity in traffic flow time series using RP and RQA parameters. Dynamic analysis focuses on the microscopic (space headway, time headway, speed, etc.) and macroscopic (density, occupancy, periodicity, etc.) characteristics of traffic flow. The aim of traffic flow forecasting is to use current and historical data to estimate the traffic flow state one or more steps ahead. Sound dynamic analysis of traffic flow is the basis for constructing a stable and effective prediction model; therefore, a full understanding of its dynamic characteristics can improve prediction accuracy. The literature shows that these prediction algorithms perform well in certain applications and circumstances. However, because the models differ in nature, this performance depends strongly on the characteristics of the data. In addition, the various traffic flow models rest on certain assumptions, and their advantages are mainly demonstrated through simulation. Because traffic flow is influenced by many complex factors, prediction and traffic flow models cannot completely describe the irregular, random and unsteady features of traffic systems, and simulation results cannot perfectly reflect the real condition of traffic systems.
Considering both the advantages and disadvantages of these methods, a good method should appropriately describe the actual state of traffic flow and analyze the nonlinear dynamics of traffic systems. In the past few years, complex network theory has developed remarkably and has been applied to complex systems in various fields [12–23]. Recently, it has been introduced into the study of time series, attracting the interest of many researchers. By transforming time series into complex networks, many scholars have proposed various approaches to explore the dynamics of time series [24–31]. In these methods, constructing a proper complex network from a time series is one of the most significant issues. Dong [32] and Karimi [33] summarized three methods for constructing a complex network from a time series: (1) Zhang et al. [34,35] proposed a method to study the dynamics of ECG (electrocardiogram) time series using complex networks. In this method, the optimal cycle length of the time series is first calculated; then each cycle is considered a node, and the relations between cycles are treated as edges. (2) Lacasa et al. [36–38] introduced the visibility algorithm to create complex networks based on graph theory. A series of applications using visibility graphs to study the dynamics of time series show its potential advantages [39–43]. Wang et al. [43] studied macroeconomic series with the complex network approach and discovered the relationship between the characteristics of the complex network and the statistical features of the original time series. These works indicate that the converted complex network inherits some important features of the original time series. (3) Xu et al. [44] employed a method to map time series into complex networks based on phase space reconstruction.
In this method, two key issues attract attention from different research fields: (1) algorithms for reconstructing the phase space [45–47]; (2) methods for determining the critical threshold of the complex network [27,28,48,49]. All these studies suggest that the statistical features of complex networks offer a new viewpoint and a helpful method for exploring the dynamic characteristics of time series. Traffic flow time series differ from other time series: they are quasi-periodic in the long term, with monthly, daily and hourly variations, and fluctuate irregularly and nonlinearly in the short term. Complex networks are therefore capable of supplying comprehensive statistical characteristics of the dynamics of traffic time series from a new angle. Current work on traffic flow prediction and analysis mainly focuses on data at a single time scale; results on traffic flow time series at multiple time scales are rarely published ([43] P.J. Shang et al., 2006; [4] Dunne and Ghosh, 2012). Large-time-scale data usually reveal the long-term variation trends of a time series, whereas small-time-scale data reveal its local changes. For traffic flow, travelers wish not only to understand the state of the traffic system over days or hours but also to know the traffic volume
Fig. 1. Time series of traffic flow at different temporal scales.
in the short term, for example over 2 or 5 min. Research on the dynamic features of traffic flow at different scales is therefore necessary for improving prediction and modeling accuracy. In this paper, we use complex networks to analyze the dynamics of traffic flow time series collected at different temporal scales: 2, 5, 10, 15, 30 and 60 min. The L–Z algorithm [50] is first adopted to measure the complexity of the traffic time series at each aggregation level. We then employ the method of Ref. [34] to map the traffic flow time series into networks: by considering individual cycles as nodes and the correlations among cycles as edges, we construct the complex networks and calculate their global statistical properties. Furthermore, the betweenness distribution and the relation between betweenness and clustering coefficient are used to analyze the dynamics of the traffic time series. Finally, taking the 2 min flow data as an example, the correlation coefficient matrix, adjacency matrix, closeness distribution and degree distribution are used to explore the periodicity of traffic time series. The remainder of this paper is organized as follows. In Section 2, we introduce the measured traffic flow data and calculate their complexity. In Section 3, we construct complex networks from the traffic flow data and discuss their statistical properties. In Section 4, the periodicity of the traffic flow data is discovered and analyzed in the complex network domain. The final section summarizes the findings of the paper and offers some conclusions.
2. Data collection and complexity measure
This study is based on field data from a freeway in Harbin, where each link consists of two lanes per direction. The raw data are collected by loop detectors in each lane, and the detection infrastructure measures vehicle volume, speed and occupancy.
The available data were collected from January 1 to December 31, 2011, at intervals of 2, 5, 10, 15, 30 and 60 min. Fig. 1 shows the measured data for 9 days, including 5 workdays and 4 weekend days. Traffic time series often exhibit irregular and complex behavior and change abruptly when entering or leaving a congested period. This phenomenon is particularly evident in the 2 min data, which show stochastic and nonlinear characteristics caused by the dynamics and complexity of the traffic system. Quantitative indicators of this difference between large- and small-time-scale data are rarely reported in the transportation field. As research on traffic systems has deepened, the correlation dimension and Lyapunov exponent from chaos theory have been used to analyze the nonlinearity of traffic flow [51–53]. However, these methods depend heavily on the original data and are sensitive to noise; moreover, they require abundant data to obtain reliable results. To quantify features at different time scales, we use the L–Z algorithm to measure the complexity of the traffic flow data. The L–Z complexity algorithm, proposed by Lempel and Ziv in 1976, characterizes the degree of order or disorder and the development of spatiotemporal patterns [50]. Because it requires only a small amount of data and is robust to noise, the L–Z algorithm has been widely applied to measure complexity in time series [54–57]. For a time series, the L–Z algorithm is implemented as follows:
Step 1: Given a time series with n elements $(y_1, y_2, \ldots, y_n)$, a string $\{s_1 s_2 \ldots s_n\}$ is constructed by the rule: if $y_i > \bar{y}$ (where $\bar{y}$ is the mean value of the time series), then $s_i = 1$; otherwise $s_i = 0$. The string $\{s_1 s_2 \ldots s_n\}$ is thus a 0–1 sequence derived from the time series.
Fig. 2. Complexities of time series of different lengths: (a) 2–10 min; (b) 15–60 min. The horizontal and vertical axes represent the number of days and the complexity $C_N$, respectively.
Step 2: Let $c(n)$ denote the complexity of the string $\{s_1 s_2 \ldots s_n\}$. Let S and Q be two character sequences and SQ their concatenation. $SQ\pi$ denotes SQ with its last character deleted, and $v(SQ\pi)$ is the set of all distinct substrings of $SQ\pi$.
Step 3: Initialize $c(n) = 1$, $S = \{s_1\}$ and $Q = \{s_2\}$, so that $SQ\pi = \{s_1\}$. In general, suppose $S = \{s_1 s_2 \ldots s_r\}$ and $Q = \{s_{r+1}\}$, and check whether Q belongs to $v(SQ\pi)$. If it does, Q is a simple repetition of an existing substring of $\{s_1 s_2 \ldots s_r\}$ and the complexity remains unchanged; Q is then extended to $\{s_{r+1} s_{r+2}\}$ ($SQ\pi$ is updated accordingly) and the check is repeated. This continues until Q no longer belongs to $v(SQ\pi)$; the complexity is then increased by one, Q is appended to S, and a new Q is started from the next character (for example, if $Q = \{s_{r+1} s_{r+2}\}$, the new Q is $\{s_{r+3}\}$).
Step 4: Repeat the above procedure until all elements of $\{s_1 s_2 \ldots s_n\}$ have been processed. The resulting $c(n)$ is the complexity of the given time series.
Step 5: Following the Lempel–Ziv algorithm, the normalized complexity measure is defined by

$$C_N(n) = \frac{c(n)}{b(n)} \in [0, 1] \qquad (1)$$

where $b(n)$ is the asymptotic value of $c(n)$ for a random binary sequence:

$$b(n) = \lim_{n \to \infty} c(n) \approx \frac{n}{\log_2 n}. \qquad (2)$$
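Steps 1–5 and Eqs. (1)–(2) can be sketched in Python as below. This is a minimal, illustrative implementation of the standard Lempel–Ziv (1976) phrase count in the form described above, assuming mean-based binarization (Step 1) and the normalization $b(n) = n/\log_2 n$; the function names are hypothetical, and the inputs in the `__main__` block are toy series, not the Harbin detector data. One detail the steps leave implicit is handled in the usual way: if the sequence ends while Q is still a repetition, that final phrase is also counted.

```python
import math
import random


def binarize(series):
    """Step 1: s_i = 1 if y_i exceeds the series mean, otherwise 0."""
    mean = sum(series) / len(series)
    return "".join("1" if y > mean else "0" for y in series)


def lz76_complexity(s):
    """Steps 2-4: number of phrases c(n) in the Lempel-Ziv (1976) parsing.

    S is s[:start] and Q is s[start:start + length]. Q keeps growing while it
    occurs in SQ-pi (= s[:start + length - 1]); once it no longer does,
    c(n) is incremented and Q is appended to S.
    """
    n = len(s)
    if n == 0:
        return 0
    c, start, length = 1, 1, 1        # the first symbol is always a phrase
    while start + length <= n:
        q = s[start:start + length]
        # str.find with an end bound searches only inside s[:start + length - 1]
        if s.find(q, 0, start + length - 1) != -1:
            length += 1               # Q is a repetition: extend it
            if start + length > n:    # sequence ended inside a repetition:
                c += 1                # count that last phrase as well
                break
        else:
            c += 1                    # Q is new: count it,
            start += length           # append it to S,
            length = 1                # and start a fresh Q
    return c


def normalized_lz_complexity(series):
    """Step 5, Eqs. (1)-(2): C_N = c(n) / (n / log2 n)."""
    s = binarize(series)
    n = len(s)
    return lz76_complexity(s) * math.log2(n) / n


if __name__ == "__main__":
    # Illustrative inputs only (not the Harbin detector data).
    profile = [1, 1, 2, 5, 9, 6, 4, 4, 5, 8, 7, 3]   # toy "daily" shape
    periodic = profile * 100                          # strictly periodic
    rng = random.Random(0)
    noise = [rng.gauss(0, 1) for _ in range(4000)]    # white noise
    print(round(normalized_lz_complexity(periodic), 3))  # small
    print(round(normalized_lz_complexity(noise), 3))     # near 1
```

To trace a curve like those in Fig. 2, `normalized_lz_complexity` would be applied to the first m days of a series for increasing m. The substring search makes the sketch roughly quadratic in n, which is still fast for series of about 10^4 samples.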
The Lempel–Ziv algorithm counts the number of distinct patterns contained in a given finite sequence. If a time series contains many repeated sequences, its complexity is low: for a regular periodic time series, $C_N$ tends to 0 as $n \to \infty$. If a time series contains random sequences, its complexity is high: for a completely random time series, $C_N$ tends to 1 as $n \to \infty$. Fig. 2 shows the complexity of traffic flow calculated by the L–Z algorithm at different temporal scales. At each scale, the complexity converges to a stable value once the time series is long enough. In Fig. 2(a), $C_N$ of the 2 min data stabilizes at 0.266 after 16 days (11 520 samples in total). $C_N$ of the 5 min series stabilizes at 0.205 after 23 days (6624 samples), and that of the 10 min series at 0.193 after 30 days (4320 samples). In Fig. 2(b), $C_N$ of the 15 min series tends to 0.195 after 35 days (3360 samples), that of the 30 min series stabilizes at 0.185 after 45 days (2160 samples), and that of the 60 min series stabilizes at 0.199 after 55 days (1320 samples). Fig. 2 thus shows that the complexity of the 2 min traffic flow series is higher than that of the data at the other scales. Complexity reflects the randomness of a finite sequence, and traffic flow data at different time scales yield different complexity values. Data collected at small time scales show obvious fluctuation and randomness, so their complexities take larger values. As the time scale increases, the data are gradually smoothed, which weakens the randomness and generally lowers the complexity (the 10–60 min series all lie between 0.185 and 0.205).
3. Traffic flow complex networks
3.1. Construction of complex network
To convert traffic time series into complex networks, we consider each day as a node and the correlation between days as edges. Traffic flow data for the 365 days of 2011 are used to create the network. Because the number of samples per day is constant, every day can be treated as a cycle. We therefore use the correlation coefficient to characterize the similarity
Fig. 3. Complex network converted from traffic flow data with 365 nodes: (a) density versus threshold; (b) derivative of density versus threshold.

Table 1
Empirical statistical indicators of complex networks from traffic time series.
Scale    rc      Average path length   Clustering coefficient   Density   Average degree
2 min    0.920   1.411                 0.798                    0.288     105.870
5 min    0.960   1.659                 0.752                    0.191      70.622
10 min   0.970   1.900                 0.749                    0.214      78.896
15 min   0.975   1.720                 0.748                    0.208      76.841
30 min   0.980   1.895                 0.750                    0.219      80.593
60 min   0.985   1.920                 0.742                    0.215      79.340
between two cycles, which does not require phase space reconstruction [35]. For each pair of cycles (days) $Y_i$ and $Y_j$, the correlation coefficient is defined as

$$C_{ij} = \frac{\sum_{k=1}^{L} \left[Y_i(k) - \langle Y_i \rangle\right]\left[Y_j(k) - \langle Y_j \rangle\right]}{\sqrt{\sum_{k=1}^{L} \left[Y_i(k) - \langle Y_i \rangle\right]^2 \cdot \sum_{k=1}^{L} \left[Y_j(k) - \langle Y_j \rangle\right]^2}} \qquad (3)$$

where L is the dimension of the vector for one day ($L$ = 720, 288, 144, 96, 48 and 24 at scales of 2, 5, 10, 15, 30 and 60 min, respectively), $Y_i(k)$ and $Y_j(k)$ are the traffic volumes in the kth interval of the ith and jth days, and $\langle Y_i \rangle$ and $\langle Y_j \rangle$ are the mean volumes of the ith and jth days: $\langle Y_i \rangle = \sum_{k=1}^{L} Y_i(k)/L$ and $\langle Y_j \rangle = \sum_{k=1}^{L} Y_j(k)/L$. C is a symmetric matrix whose elements $C_{ij}$ lie in $[-1, 1]$. Regarding each vector as a node, $C_{ij}$ describes the state of connection between nodes i and j. Choosing a critical threshold $r_c$, the correlation matrix C is converted into an adjacency matrix D as follows:

$$D_{ij} = \begin{cases} 1, & |C_{ij}| \ge r_c \\ 0, & |C_{ij}| < r_c \end{cases} \qquad (4)$$
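Eqs. (3) and (4) map directly onto array operations. The sketch below, which assumes NumPy and uses hypothetical function names, reshapes a volume series into one row per day (cycle), evaluates the correlation of Eq. (3) for all pairs of days at once, and thresholds it as in Eq. (4). One deliberate deviation: the diagonal of D is zeroed to remove self-loops before any network statistics are computed, whereas the plot of the adjacency matrix keeps the diagonal. The two-peak and one-hump profiles in the demo are synthetic and purely illustrative.

```python
import numpy as np


def daily_matrix(flow, samples_per_day):
    """Reshape a 1-D volume series to shape (days, L); row i is Y_i."""
    flow = np.asarray(flow, dtype=float)
    n_days = len(flow) // samples_per_day       # drop an incomplete final day
    return flow[: n_days * samples_per_day].reshape(n_days, samples_per_day)


def correlation_matrix(days):
    """Eq. (3): C_ij for every pair of daily profiles (rows of `days`)."""
    centered = days - days.mean(axis=1, keepdims=True)   # Y_i(k) - <Y_i>
    norms = np.sqrt((centered ** 2).sum(axis=1))          # root sum of squares
    return (centered @ centered.T) / np.outer(norms, norms)


def adjacency_matrix(corr, r_c):
    """Eq. (4): D_ij = 1 if |C_ij| >= r_c, else 0; self-loops removed."""
    adj = (np.abs(corr) >= r_c).astype(int)
    np.fill_diagonal(adj, 0)
    return adj


if __name__ == "__main__":
    # Synthetic example: 14 "days" of 720 two-minute counts (L = 720).
    rng = np.random.default_rng(1)
    t = np.linspace(0, 24, 720, endpoint=False)
    weekday = np.exp(-((t - 8) ** 2)) + np.exp(-((t - 18) ** 2))  # two peaks
    weekend = np.exp(-((t - 14) ** 2) / 8)                         # one hump
    shapes = [weekend if d % 7 in (0, 6) else weekday for d in range(14)]
    flow = np.concatenate([s + 0.05 * rng.standard_normal(720) for s in shapes])
    C = correlation_matrix(daily_matrix(flow, 720))
    D = adjacency_matrix(C, 0.92)
    print(D.sum(axis=1))   # degree of each synthetic day
```

`np.corrcoef(days)` returns the same matrix as `correlation_matrix(days)`; the explicit form is kept only to mirror Eq. (3). A day with constant volume would make a norm zero and should be removed beforehand.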
According to Eq. (4), node i is connected to node j if $D_{ij} = 1$ and disconnected from it if $D_{ij} = 0$. An appropriate critical threshold must therefore be selected to create the complex network from traffic time series. If $r_c$ is too small, nodes with weak relations will be connected; as $r_c$ increases, the number of edges decreases gradually. If $r_c$ is too large, some valuable connections will be filtered out, and the number of edges decreases rapidly. Accordingly, $r_c$ should be set to a value at which the complex network captures the characteristics of the time series. Fig. 3 shows the density of the networks and its derivative versus the threshold $r_c$, where density is defined as the number of edges divided by the maximum possible number of edges. As Fig. 3(b) shows, the derivative of the density reaches a minimum, and we select $r_c$ at this minimum, where the clustering property of the cycles is maintained. Viewed from the opposite direction, taking the 2 min series as an example, the number of edges increases at the maximum rate as $r_c$ is lowered from 0.92 to 0.915, whereas thresholds above 0.92 produce slower changes in the number of edges. The appropriate threshold for each time scale is listed in Table 1, together with some statistical indicators describing the basic features of the complex networks.
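The threshold search can be sketched as follows, assuming NumPy. It computes the density of the thresholded network over a grid of candidate $r_c$ values and differentiates it numerically; choosing the grid point with the most negative derivative is one reading of the selection rule described above (the steepest change in the number of edges), not necessarily the authors' exact procedure. The function names are hypothetical.

```python
import numpy as np


def density_curve(corr, thresholds):
    """Network density for each candidate r_c.

    Density = edges / (N(N-1)/2), i.e. the fraction of distinct day pairs
    whose |C_ij| reaches the threshold.
    """
    n = corr.shape[0]
    pairs = np.abs(corr[np.triu_indices(n, k=1)])   # each unordered pair once
    return np.array([(pairs >= r).mean() for r in thresholds])


def select_threshold(corr, thresholds):
    """Return (r_c, density, derivative) with r_c at the steepest density drop.

    The grid and the finite differences of np.gradient are assumptions;
    a finer grid sharpens r_c but makes the derivative noisier.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    dens = density_curve(corr, thresholds)
    slope = np.gradient(dens, thresholds)            # d(density) / d(r_c)
    return thresholds[np.argmin(slope)], dens, slope
```

With a daily correlation matrix `C` computed from Eq. (3), a call such as `select_threshold(C, np.arange(0.80, 0.995, 0.005))` would scan $r_c$ in steps of 0.005.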
Fig. 4. Cumulative degree distribution of the complex networks converted from traffic flow data at different time scales. The horizontal axis is the node degree k and the vertical axis is the cumulative probability distribution of degrees p(k).
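The indicators in Table 1 and the cumulative degree distribution of Fig. 4 can all be computed from the adjacency matrix. The following sketch assumes NumPy, an unweighted and undirected network without self-loops, and hypothetical function names. The paper does not state how disconnected node pairs (such as isolated weekend nodes) enter the average path length, so this version averages only over pairs joined by a path; that choice is an assumption and changes the result when isolated nodes are present.

```python
from collections import deque

import numpy as np


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]).tolist():
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_indicators(adj):
    """Average path length, clustering coefficient, density, average degree."""
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    deg = a.sum(axis=1)
    # Average over all ordered pairs that are connected by some path.
    total = pairs = 0
    for s in range(n):
        for t, d in shortest_path_lengths(a, s).items():
            if t != s:
                total += d
                pairs += 1
    # Local clustering: triangles through i = (A^3)_ii / 2, over k_i(k_i-1)/2.
    triangles = np.diag(a @ a @ a) / 2
    possible = deg * (deg - 1) / 2
    clustering = np.divide(triangles, possible,
                           out=np.zeros(n), where=possible > 0)
    return {
        "average_path_length": total / pairs if pairs else float("inf"),
        "clustering_coefficient": clustering.mean(),
        "density": deg.sum() / (n * (n - 1)),
        "average_degree": deg.mean(),
    }


def cumulative_degree_distribution(adj):
    """P(K >= k) for each degree k that occurs in the network."""
    deg = np.asarray(adj).sum(axis=1)
    ks = np.unique(deg)
    return ks, np.array([(deg >= k).mean() for k in ks])
```

Nodes with fewer than two neighbours are given a clustering coefficient of 0, a common convention; excluding them instead would raise the averages.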
Small-world behavior was originally characterized by a small network diameter or average path length together with a high clustering coefficient. We compute the average path length and clustering coefficient of the constructed networks and find that all of them have high clustering coefficients (0.742–0.798) and small average path lengths (1.411–1.920). This indicates that the networks constructed from the traffic time series have the small-world property, meaning that any two nodes are connected by a short path. The degree of a node is the number of edges incident to it, and the degree distribution p(k) is the probability that a node has degree k. Fig. 4 shows the cumulative degree distributions of the complex networks converted from traffic time series at different time scales. Nodes in the 2 min network have higher degrees than those in the other networks, which indicates that the 2 min traffic flow data contain abundant information. All the distributions can be fitted by power law functions. Fig. 4 also shows that few nodes have large degrees while most nodes have relatively low degrees. This means that the traffic flow data of most days in the year have distinct variation features, and only a small number of days share similar properties.
3.2. Betweenness distribution
The preceding sections measured the complexity of the time series at different temporal scales and explored some statistical characteristics of the complex networks converted from them. These indicators reflect the global topological properties of a network, but sometimes we also need to understand how important a single node is. In a social network, for example, some nodes have small degrees but act as connectors between two communities, so removing them disconnects the communities.
In complex network theory, this characteristic is described by betweenness, which was first proposed in social network studies [58]. The betweenness of node i is defined as

$$B_i = \sum_{j \neq l \neq i} \frac{N_{jl}(i)}{N_{jl}} \qquad (5)$$
where $N_{jl}(i)$ is the number of shortest paths from node j to node l that pass through node i, and $N_{jl}$ is the total number of shortest paths from node j to node l. A node with high betweenness lies on many of the shortest paths between other nodes, which reflects its influence in the network. Fig. 5 compares the betweenness distributions of the complex networks at different temporal scales. In all of them, nodes with low betweenness account for a high proportion and nodes with high betweenness for a low proportion, as seen in Fig. 5(a)–(f), and the distributions follow exponential functions. However, the average betweenness differs among the networks: 0.337, 0.275, 0.299, 0.291, 0.309 and 0.307 for 2, 5, 10, 15, 30 and 60 min, respectively. Furthermore, Fig. 6 shows that for the 2 min data the relation between betweenness and clustering coefficient is well fitted by a negative correlation, whereas for the other scales it is difficult to find a proper function relating the two. The clustering coefficient of a node is the ratio of the number of edges among its neighbors to the maximum possible number of such edges, and it is a quantitative measure of the local properties of a complex network. In the plane of betweenness versus clustering coefficient, nodes with large clustering coefficients tend to have small betweenness at every temporal scale. Nodes with large betweenness usually act as bridges connecting two clusters; compared with nodes inside clusters, nodes between clusters therefore usually have larger betweenness and smaller clustering coefficients. In addition, for the 2 min data, the clustering coefficients are restricted approximately to [0.7, 1.0] (Fig. 6(a)).
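Eq. (5) is usually evaluated with Brandes' algorithm, which counts shortest paths by breadth-first search from every node and then accumulates each node's share $N_{jl}(i)/N_{jl}$. The sketch below, assuming NumPy and an unweighted undirected graph, is one such implementation with a hypothetical function name. It halves the raw sum so that each unordered pair (j, l) counts once and optionally divides by $(N-1)(N-2)/2$; the paper does not say which normalization produced its average betweenness values, so this scaling is an assumption.

```python
from collections import deque

import numpy as np


def betweenness(adj, normalized=True):
    """Eq. (5) via Brandes' algorithm for an unweighted, undirected graph."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    nbrs = [np.flatnonzero(row).tolist() for row in adj]
    bc = np.zeros(n)
    for s in range(n):
        # Forward pass: BFS from s, counting shortest paths sigma[v] = N_sv.
        order, preds = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: delta[v] = sum over targets t of N_st(v) / N_st.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    bc /= 2.0                     # every unordered pair (j, l) was seen twice
    if normalized and n > 2:
        bc /= (n - 1) * (n - 2) / 2.0
    return bc
```

Pairing the output with per-node clustering coefficients gives the scatter used in the betweenness versus clustering coefficient analysis; on a 4-cycle, for instance, every node receives a raw betweenness of 0.5 because it carries one of the two shortest paths between its neighbours.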
For the other temporal scales (Fig. 6(b)–(f)), the clustering coefficients spread over a wider range of approximately [0.3, 1.0], which may explain the sparser distribution in the plane. In summary, the complex network converted from the 2 min traffic time series not only retains abundant information but
Fig. 5. Betweenness distribution for the complex networks: (a) 2 min; (b) 5 min; (c) 10 min; (d) 15 min; (e) 30 min; (f) 60 min.
also, consistent with the conclusions of the previous sections, reflects the rapid and intense fluctuations of the real traffic system. In the next section, we take the 2 min data as an example and discuss the periodicity of traffic flow data using complex network theory.
4. Statistical characteristics versus periodicity
Three statistical features, namely the plot of the correlation coefficient matrix, the plot of the adjacency matrix and closeness, are used to analyze the periodicity of traffic time series. (These three indicators show similar features at every temporal scale, so we take the 2 min data as an example to study the periodicity.) Fig. 7 shows the plot of the correlation coefficients between days (cycles); the color bar represents the correlation coefficient values from large to small. Fig. 7(a) reveals considerable cyclic patterns, and Fig. 7(b) is a magnified view of the region marked by the black circles in Fig. 7(a).
Fig. 6. Clustering coefficient–betweenness correlation for the complex networks: (a) 2 min; (b) 5 min; (c) 10 min; (d) 15 min; (e) 30 min; (f) 60 min.
Four patterns are evident: a large red square, a small red square, a yellow band and a light diagonal line. The large red square has high correlation coefficients and comprises 5 cycles, reflecting the similarity of the traffic volume time series on weekdays, shown in Fig. 7(c). The small red square also has high correlation coefficients but comprises only 2 cycles, reflecting the similarity of the traffic volume time series at weekends, shown in Fig. 7(d). The yellow band represents the weak correlation between weekdays and weekends, whose fluctuations can be compared in Fig. 7(c) and (d). The diagonal line represents the self-correlation of each cycle, which equals 1. By choosing a proper threshold $r_c$, the correlation coefficient matrix C is transformed into the adjacency matrix D, which contains the edge information of the complex network. Fig. 8 plots the adjacency matrix of the 2 min data for $r_c$ = 0.92. Compared with Fig. 7(a), entries with low correlation coefficients are filtered out, and what remains are the weekday periodicities and the diagonal line.
Fig. 7. Plot of the correlation coefficients of the complex network based on 2 min traffic flow data in 2011, with each day treated as a cycle (365 days in total). (a) Global scale; (b) local scale of the region marked by the black circle in (a); (c) traffic volume of cycles 87 to 91, marked by the large black circle in (b), i.e., the workdays from 28 March (Monday) to 1 April (Friday); (d) traffic volume of cycles 92 and 93, marked by the small black circle in (b), i.e., the weekend of 2 April (Saturday) and 3 April (Sunday). In (c) and (d), the x axis indicates sampling time and the y axis the traffic volume.
Fig. 8. Plot of the adjacency matrix of the complex network based on 2 min traffic flow data in 2011 (365 cycles in total), with $r_c$ = 0.92. Black points represent elements with $D_{ij}$ = 1, the horizontal and vertical axes are node indices, and nz is the number of elements with $D_{ij}$ = 1. (b) is a magnified view of (a). Some squares are incomplete because the correlation coefficients of some nodes are smaller than $r_c$, so their edges are filtered out.
Exploring the closeness of the network reveals an interesting result, shown in Fig. 9. Fig. 9(a) shows the closeness of the nodes and its distribution. The closeness values fall into three parts: nodes in part 1 have low closeness, limited to [0, 0.05]; nodes in part 2 have medium closeness, limited to [0.1, 0.3]; and nodes in part 3
Fig. 9. (a) Closeness and its distribution for the complex network based on 2 min traffic flow data in 2011 (365 cycles in total); (b) traffic volume time series corresponding to the three parts in (a), showing data for only 6 days in part 1 and part 3. Cycles 3, 4, 5, 6, 7 and 10 correspond to 3–7 and 10 January (weekdays) in part 1. Cycles 180, 237 and 266 correspond to 29 May, 25 August and 23 September (special days) in part 2. Cycles 1, 2, 8, 9, 15 and 16 correspond to 1, 2, 8, 9, 15 and 16 January (weekends) in part 3.
have high closeness and are restricted to [0.1, 0.3]. This division can also be seen in the distribution of closeness. By identifying the cycles corresponding to the nodes, we find that the three parts correspond to different types of traffic volume time series. As shown in Fig. 9(b), the nodes in part 1 represent workdays, with obvious morning and evening peaks. The nodes in part 2 represent days with special events, for example traffic accidents or road construction, and show only morning peaks. The nodes in part 3 represent weekends, without obvious morning and evening peaks. Closeness is an important statistical characteristic of complex networks, reflecting how easily a node can be reached from the other nodes. Nodes with low closeness have few connecting edges, which means that their correlations with the other nodes are weak; conversely, nodes with high closeness have many connecting edges and strong correlations with the other nodes. Fig. 10 shows the degree distribution of the complex network from the 2 min traffic flow data. Weekdays and weekends can be distinguished by their degree, and the degree also shows how many days are correlated with a given day. The distribution shows that traffic flow has similar features on most weekdays and behaves differently at weekends. In Fig. 11, all the networks consist of three parts: a big cluster, a small cluster and isolated nodes. By comparing the node labels with the cycle numbers, we find that the big cluster represents weekdays, the small cluster represents some special days and the isolated nodes indicate weekends, consistent with Fig. 10. Nodes with large degree form the big cluster, nodes with small degree form the small cluster, and nodes with zero degree are isolated. As the time scale increases, the number of isolated nodes decreases.
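The three groups described above (big cluster, small cluster and isolated nodes) are the connected components of the thresholded network, and closeness is computed from breadth-first distances. The sketch below assumes NumPy and hypothetical function names. Because the paper does not give its closeness formula, this version uses a common scaling for disconnected graphs (attributed to Wasserman and Faust), $\frac{r}{N-1}\cdot\frac{r}{\sum d}$, where r is the number of nodes reachable from a node and $\sum d$ their total distance. Under this definition isolated nodes score 0, so it will not necessarily reproduce the values in Fig. 9.

```python
from collections import deque

import numpy as np


def _bfs(adj, source):
    """Distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in np.flatnonzero(adj[v]).tolist():
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """Closeness scaled for disconnected graphs; 0 for isolated nodes."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    out = np.zeros(n)
    for u in range(n):
        dist = _bfs(adj, u)
        reach = len(dist) - 1              # nodes reachable from u
        if reach > 0:
            out[u] = (reach / (n - 1)) * (reach / sum(dist.values()))
    return out


def components(adj):
    """Connected components, largest first (isolated nodes have size 1)."""
    adj = np.asarray(adj)
    seen, comps = set(), []
    for start in range(adj.shape[0]):
        if start not in seen:
            comp = sorted(_bfs(adj, start))
            seen.update(comp)
            comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def split_groups(adj):
    """Multi-node clusters (largest first) and the list of isolated nodes."""
    comps = components(adj)
    clusters = [c for c in comps if len(c) > 1]
    isolated = [c[0] for c in comps if len(c) == 1]
    return clusters, isolated
```

Mapping each node index back to its calendar date (node i is the (i+1)th day of 2011) is what would turn `split_groups` into the weekday, special-day and weekend grouping discussed above.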
This indicates that detecting weekends among all cycles becomes difficult, because traffic flow data are smoothed at large time scales and the correlation coefficients between weekdays and weekends increase. From Table 1 and Fig. 4(a), the network from the 2 min time scale not only distinguishes three types of traffic states but also has a shorter average path length and a larger clustering coefficient, density and average degree than the networks from the other time scales, which suggests that the network built from 2 min data preserves more of the information in the original time series. The network from the 5 min data has a smaller density and average degree than the other networks because of its many isolated nodes. The networks from the 10 and 15 min data have similar statistical properties, such as clustering coefficient, density and average degree, and so do the networks from the 30 and 60 min data. In Fig. 12, the 365 days of 2011 are clearly classified into three groups. Red circles represent weekdays and comprise 247 nodes, blue squares represent special days and comprise 47 nodes, and black triangles represent weekends and comprise 71 nodes. The red-circle nodes have higher clustering coefficients and larger degrees, and they correspond to the big cluster in Fig. 11(a). Both the clustering coefficient and the degree of the black-triangle nodes equal zero, which indicates that these nodes are dissimilar to the other nodes and
can be observed as the isolated nodes in Fig. 11(a). The remaining nodes (blue squares) have moderate clustering coefficients and degrees and form the small cluster in Fig. 11(a). In summary, using complex networks as a tool, we can not only explore the topological features of all days through network characteristics such as average path length, clustering coefficient, density, average degree and betweenness, but also examine individual nodes and their relationships with other nodes through indicators such as closeness and degree distribution. Complex networks thus provide a new viewpoint for exploring the dynamics and periodicity of traffic time series.

Fig. 10. Degree distribution of the complex network based on traffic flow data at the 2 min scale in 2011 (365 cycles in total). (a) Degree of the 365 days. (b) Degree of 35 days in 2011; the upper red circle marks weekdays and the lower red circle marks weekends.

5. Conclusions

The characterization of traffic dynamics is the foundation of traffic prediction and control. Although many prediction and control approaches achieve good performance in case studies, these algorithms rest on different assumptions about the underlying statistical properties of traffic flow. Therefore, how to discover the real dynamic characteristics of traffic flow becomes a key question before choosing a model for prediction or control. To answer this question, statistical features of traffic flow such as complexity and periodicity should be explored. Complexity reflects the randomness of a traffic flow time series, while periodicity, a basic property, reflects the long-term regularity in the variation of traffic volumes. Furthermore, studying volume series aggregated over different time windows, for example 2, 5, 10, 15, 30 and 60 min, reveals how different traffic patterns evolve in time. In this paper, we measure the complexity and explore the periodicity of traffic time series based on time aggregations from 2 to 60 min. We find that the complexity of the 2 min data is larger than that of the data at the other scales, which indicates that the 2 min data fluctuate more rapidly and intensely.
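The Lempel–Ziv complexity compared across time scales can be illustrated with a minimal sketch of the classic Lempel–Ziv (1976) phrase count. The coarse-graining step (1 above the median, 0 otherwise) and the normalization C = c(n) log2(n) / n are common choices in the literature and are assumptions here, not necessarily the exact variant used in this study; the function names are hypothetical.

```python
import math
from statistics import median


def binarize(series):
    """Coarse-grain a series into a 0/1 string around its median (assumed scheme)."""
    m = median(series)
    return "".join("1" if x > m else "0" for x in series)


def lz76_phrase_count(s):
    """Number of phrases c(n) in the Lempel-Ziv (1976) parsing of string s.

    A new phrase is extended while it can still be copied from the text
    that precedes its last symbol; the first symbol is always a phrase.
    """
    n = len(s)
    if n == 0:
        return 0
    count, i = 1, 1  # i = start index of the current phrase
    while i < n:
        length = 1
        # grow the phrase while it already occurs earlier in the sequence
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1  # the phrase ends here (or the sequence runs out)
        i += length
    return count


def normalized_lz(series):
    """Normalized complexity C = c(n) * log2(n) / n of the binarized series."""
    s = binarize(series)
    n = len(s)
    if n < 2:
        return 0.0
    return lz76_phrase_count(s) * math.log2(n) / n


print(lz76_phrase_count("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
```

A more irregular series needs more new phrases to be described, so a larger normalized value corresponds to the "more rapid and intense fluctuations" attributed to the 2 min data.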
By analyzing the derivative of density with respect to the threshold, we take the threshold at which this derivative reaches its minimum as the appropriate threshold for transforming the traffic time series into complex networks. Some statistical properties (average path length, clustering coefficient and average degree) are calculated, and the networks built from the data at every sampling scale show the small-world property. To further compare the characteristics of the different sampling scales, betweenness is calculated; the betweenness distributions all follow exponential functions. For the 2 min data, the relation between betweenness and clustering coefficient is well fitted by a negative correlation. Finally, we use the correlation coefficient matrix, adjacency matrix and closeness to explore the periodicity in traffic time series. The correlation coefficient plot reveals the similarity among weekdays and among weekends, which reflects periodicity over the year. By analyzing the distribution of closeness, the days in 2011 can be divided into three parts: weekdays, weekends and special days. Overall, an in-depth analysis of the statistical properties of traffic flow can improve the accuracy of the prediction and control models used in traffic systems and consequently increase the efficiency of traffic management. It also provides a theoretical basis for traffic planning and design and improves their rationality.

Acknowledgment

This research was funded by the National Natural Science Foundation of China (Grant Nos. 51138003 and 51329801).
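The threshold-selection step summarized in the conclusions can be sketched as follows, under stated assumptions: each day (cycle) is one node, two days are linked when the Pearson correlation of their volume profiles is at least the threshold, density is 2E/(N(N-1)), and the chosen threshold is read as the grid point where the finite-difference derivative of density is most negative. That reading of "minimum" and all function names are assumptions for illustration, not the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def day_correlations(series, samples_per_day):
    """Split a long series into daily cycles (dropping a partial last day)
    and correlate every pair of days."""
    days = [series[k:k + samples_per_day]
            for k in range(0, len(series) - samples_per_day + 1, samples_per_day)]
    return [[pearson(a, b) for b in days] for a in days]


def density(corr, threshold):
    """Edge density of the graph linking days i != j with corr[i][j] >= threshold."""
    n = len(corr)
    edges = sum(1 for i in range(n) for j in range(i + 1, n)
                if corr[i][j] >= threshold)
    return 2 * edges / (n * (n - 1))


def derivative(xs, ys):
    """Finite-difference dy/dx: central differences inside, one-sided at the ends."""
    last = len(xs) - 1
    out = []
    for i in range(len(xs)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        out.append((ys[hi] - ys[lo]) / (xs[hi] - xs[lo]))
    return out


def select_threshold(corr, thresholds):
    """Return the threshold where d(density)/d(threshold) is smallest (assumed rule),
    together with the density curve and its derivative."""
    rho = [density(corr, t) for t in thresholds]
    drho = derivative(thresholds, rho)
    best = min(range(len(thresholds)), key=lambda i: drho[i])
    return thresholds[best], rho, drho
```

For 2 min data a day has 720 samples, so `day_correlations(volumes, 720)` would give the 365 x 365 matrix for one year; the threshold grid (for example `[k / 100 for k in range(101)]`) is likewise an illustrative choice.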
Fig. 11. Complex networks corresponding to traffic flow data at different time scales: (a) 2 min; (b) 5 min; (c) 10 min; (d) 15 min; (e) 30 min; (f) 60 min.
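The split of each network in Fig. 11 into a big cluster, a small cluster and isolated nodes amounts to finding connected components. A minimal sketch, assuming the network is given as a symmetric 0/1 adjacency matrix and that the "big cluster" is the largest component with more than one node; the function names and the toy matrix are hypothetical.

```python
from collections import deque


def degrees(adj):
    """Degree of every node in a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def components(adj):
    """Connected components found by breadth-first search, largest first."""
    n = len(adj)
    seen = [False] * n
    comps = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in range(n):
                if adj[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
        comps.append(sorted(comp))
    return sorted(comps, key=len, reverse=True)


def split_network(adj):
    """Return (largest cluster, other multi-node clusters, isolated nodes)."""
    comps = components(adj)
    clusters = [c for c in comps if len(c) > 1]
    isolated = sorted(c[0] for c in comps if len(c) == 1)
    big = clusters[0] if clusters else []
    return big, clusters[1:], isolated


# Toy example: days 0-2 mutually linked, days 3-4 linked, day 5 isolated.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]]
print(split_network(adj))  # ([0, 1, 2], [[3, 4]], [5])
print(degrees(adj))        # [2, 2, 2, 1, 1, 0]
```

A node forms a single-node component exactly when its degree is zero, which is the correspondence between isolated nodes and zero degree used in the discussion of Figs. 10 and 11.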
Fig. 12. Clustering coefficient versus degree for the complex network corresponding to the 2 min time scale data.
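Each point in Fig. 12 pairs a node's degree with its local clustering coefficient. A minimal sketch of that computation, assuming the standard definition C_i = 2e_i / (k_i(k_i - 1)), where e_i is the number of edges among the k_i neighbours of node i, and the common convention C_i = 0 when k_i < 2 (so isolated nodes sit at the origin of such a plot); the names and the toy matrix are hypothetical.

```python
def neighbours(adj, i):
    """Indices of the nodes adjacent to node i."""
    return [j for j, linked in enumerate(adj[i]) if linked and j != i]


def local_clustering(adj, i):
    """Fraction of possible edges among node i's neighbours that are present."""
    nb = neighbours(adj, i)
    k = len(nb)
    if k < 2:
        return 0.0  # convention for nodes with fewer than two neighbours
    links = sum(1 for a in range(k) for b in range(a + 1, k) if adj[nb[a]][nb[b]])
    return 2 * links / (k * (k - 1))


def degree_vs_clustering(adj):
    """(degree, clustering coefficient) pair for every node."""
    return [(len(neighbours(adj, i)), local_clustering(adj, i)) for i in range(len(adj))]


# Toy example: triangle 0-1-2, node 3 attached to node 0, node 4 isolated.
adj = [[0, 1, 1, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
for node, (k, c) in enumerate(degree_vs_clustering(adj)):
    print(node, k, round(c, 3))
```

In the toy graph node 0 has degree 3 but only one link among its neighbours (C = 1/3), while nodes 1 and 2 have degree 2 and C = 1; node 4, like the black-triangle nodes in Fig. 12, has both degree and clustering coefficient equal to zero.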
References

[1] H.B. Yin, S.C. Wong, J.M. Xu, C.K. Wong, Urban traffic flow prediction using a fuzzy-neural approach, Transp. Res. C 10 (2002) 85–98.
[2] Y. Qi, S. Ishak, Stochastic approach for short-term freeway traffic prediction during peak periods, IEEE Trans. Intell. Transp. Syst. 99 (2012) 1–13.
[3] T.T. Tchrakian, B. Basu, M. O'Mahony, Real-time traffic flow forecasting using spectral analysis, IEEE Trans. Intell. Transp. Syst. 13 (2012) 519–526.
[4] S. Dunne, B. Ghosh, Regime-based short-term multivariate traffic condition forecasting algorithm, J. Transp. Eng. 138 (2012) 455–466.
[5] W.Z. Zheng, D.H. Lee, Q.X. Shi, Short-term freeway traffic flow prediction: Bayesian combined neural network approach, J. Transp. Eng. 132 (2006) 114–121.
[6] X.Y. Liu, G.H. Zhang, Y.T. Lao, Y.H. Wang, Modeling traffic flow dynamics on managed-lane facility: approach based on cell transmission model, Transp. Res. Rec. 2278 (2012) 163–170.
[7] K. Gao, R. Jiang, B.H. Wang, Q.S. Wu, Discontinuous transition from free flow to synchronized flow induced by short-range interaction between vehicles in a three-phase traffic flow model, Physica A 388 (2009) 3233–3243.
[8] M. Herrmann, B.S. Kerner, Local cluster effect in different traffic flow models, Physica A 255 (1998) 163–188.
[9] A. Marasco, Nonlinear hydrodynamic models of traffic flow in the presence of tollgates, Math. Comput. Modelling (2002) 549–559.
[10] Z.C. He, Z.T. Li, J.M. Zhao, Statistical method for assessing the periodicity of traffic flow time series, in: 17th ITS World Congress, Busan, 2010.
[11] E.I. Vlahogianni, M.G. Karlaftis, J.C. Golias, Statistical methods for detecting nonlinearity and non-stationarity in univariate short-term time-series of traffic volume, Transp. Res. C 14 (2006) 351–367.
[12] L.A. Adamic, B.A. Huberman, Growth dynamics of the World Wide Web, Nature 401 (1999) 131.
[13] A.L. Barabási, Scale-free networks: a decade and beyond, Science 325 (2009) 412–413.
[14] S.H. Strogatz, Exploring complex networks, Nature 410 (2001) 268–276.
[15] R.J. Williams, N.D. Martinez, Simple rules yield complex food webs, Nature 404 (2000) 180–183.
[16] R. Albert, A.L. Barabási, Statistical mechanics of complex networks, Rev. Modern Phys. 74 (2002) 47–97.
[17] J. Ren, W.X. Wang, B. Li, Y.C. Lai, Noise bridges dynamical correlation and topology in complex oscillator networks, Phys. Rev. Lett. 104 (2010) 058701.
[18] H. Chang, B.B. Su, Y.P. Zhou, D.R. He, Assortativity and act degree distribution of some collaboration networks, Physica A 383 (2007) 687–702.
[19] X.H. Yang, G. Chen, B. Sun, S.Y. Chen, W.L. Wang, Bus transport network model with ideal n-depth clique network topology, Physica A 390 (2011) 4660–4672.
[20] Y.Z. Chen, N. Li, D.R. He, A study on some urban bus transport networks, Physica A 376 (2007) 747–754.
[21] E. Estrada, N. Hatano, Communicability graph and community structures in complex networks, Appl. Math. Comput. 214 (2009) 500–511.
[22] Y.D. Chen, L. Li, Y. Zhang, J.M. Hu, X.X. Jin, Fluctuations in urban traffic networks, Modern Phys. Lett. B 22 (2008) 101–115.
[23] J.J. Wu, Z.Y. Gao, H.J. Sun, Simulation of traffic congestion with SIR model, Modern Phys. Lett. B 18 (2004) 1537–1542.
[24] C. Piccardi, L. Calatroni, F. Bertoni, Clustering financial time series by network community analysis, Internat. J. Modern Phys. C 22 (2010) 35–50.
[25] T. Nakamura, T. Tanizawa, Networks with time structure from time series, Physica A 391 (2012) 4704–4710.
[26] M.A. Kramer, U.T. Eden, S.S. Cash, E.D. Kolaczyk, Network inference with confidence from multivariate time series, Phys. Rev. E 79 (2009) 061916.
[27] Z.K. Gao, N.D. Jin, A directed weighted complex network for characterizing chaotic dynamics from time series, Nonlinear Anal. 13 (2012) 947–952.
[28] Y. Yang, H.J. Yang, Complex network-based time series analysis, Physica A 387 (2008) 1381–1386.
[29] N. Marwan, J.F. Donges, Y. Zou, R.V. Donner, J. Kurths, Complex network approach for recurrence analysis of time series, Phys. Lett. A 373 (2009) 4246–4254.
[30] R.V. Donner, Y. Zou, J.F. Donges, N. Marwan, J. Kurths, Ambiguities in recurrence-based complex network representations of time series, Phys. Rev. E 81 (2010) 1–4.
[31] P. Caraiani, Characterizing emerging European stock markets through complex networks: from local properties to self-similar characteristics, Physica A 391 (2012) 3629–3637.
[32] Y. Dong, W.W. Huang, Z.H. Liu, S.G. Guan, Network analysis of time series under the constraint of fixed nearest neighbors, Physica A 392 (2013) 967–973.
[33] S. Karimi, A.H. Darooneh, Measuring persistence in a stationary time series using the complex network theory, Physica A 392 (2013) 287–293.
[34] J. Zhang, J.F. Sun, X.D. Luo, K. Zhang, T. Nakamura, M. Small, Characterizing pseudo periodic time series through the complex network approach, Physica D 237 (2008) 2856–2865.
[35] J. Zhang, M. Small, Complex network from pseudoperiodic time series: topology versus dynamics, Phys. Rev. Lett. 96 (2006).
[36] L. Lacasa, B. Luque, F. Ballesteros, J. Luque, J. Nuno, From time series to complex networks: the visibility graph, Proc. Natl. Acad. Sci. 105 (2008) 4972–4975.
[37] L. Lacasa, B. Luque, J. Luque, J.C. Nuno, The visibility graph: a new method for estimating the Hurst exponent of fractional Brownian motion, Europhys. Lett. 86 (2009) 30001.
[38] L. Lacasa, R. Toral, Description of stochastic and chaotic series using visibility graphs, Phys. Rev. E 82 (2010) 036120.
[39] Y. Yang, J. Wang, H. Yang, J. Mang, Visibility graph approach to exchange rate series, Physica A 388 (2009) 4431–4437.
[40] X.H. Ni, Z.Q. Jiang, W.X. Zhou, Degree distributions of the visibility graphs mapped from fractional Brownian motions and multifractal random walks, Phys. Lett. A 373 (2009) 3822–3896.
[41] W.J. Xie, W. Zhou, Horizontal visibility graphs transformed from fractional Brownian motions: topological properties versus Hurst index, Physica A 390 (2011) 3592–3601.
[42] G. Gutin, T. Mansour, S. Severini, A characterization of horizontal visibility graphs and combinatorics on words, Physica A 390 (2011) 2421–2428.
[43] N. Wang, D. Li, Q.W. Wang, Visibility graph analysis on quarterly macroeconomic series of China based on complex network theory, Physica A 391 (2012) 6543–6555.
[44] X. Xu, J. Zhang, M. Small, Superfamily phenomena and motifs of networks induced from time series, Proc. Natl. Acad. Sci. 105 (2008) 19601–19605.
[45] H.S. Kim, R. Eykholt, J.D. Salas, Nonlinear dynamics, delay times, and embedding windows, Physica D 127 (1999) 48–60.
[46] P. Grassberger, I. Procaccia, Characterization of strange attractors, Phys. Rev. Lett. 50 (1983) 346–349.
[47] N.H. Packard, J.P. Crutchfield, J.D. Farmer, R.S. Shaw, Geometry from a time series, Phys. Rev. Lett. 45 (1980) 712–716.
[48] Z. Gao, N. Jin, Complex network from time series based on phase space reconstruction, Chaos 19 (2009) 033137.
[49] J.J. Tang, Y.H. Wang, F. Liu, Characterizing traffic time series based on complex network theory, Physica A 392 (2013) 4192–4201.
[50] A. Lempel, J. Ziv, On the complexity of finite sequences, IEEE Trans. Inf. Theory 1 (1976) 75–93.
[51] L.A. Wastavino, B.A. Toledo, J. Rogan, R. Zarama, V. Munoz, J.A. Vadivia, Modeling traffic on crossroads, Physica A 15 (2007) 411–419.
[52] K.P. Li, Z.Y. Gao, Nonlinear dynamics analysis of traffic time series, Modern Phys. Lett. B 18 (2004) 1395–1402.
[53] P.J. Shang, X.W. Li, S. Kamae, Nonlinear analysis of traffic time series at different temporal scales, Phys. Lett. A 357 (2006) 314–318.
[54] F.T. Liu, Y. Tang, Improved Lempel–Ziv algorithm based on complexity measurement of short time series, in: Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Haikou, 2007, pp. 771–776.
[55] Y.F. Tang, S.L. Liu, R.H. Jiang, Y.H. Liu, Correlation between detrended fluctuation analysis and the Lempel–Ziv complexity in nonlinear time series analysis, Chin. Phys. B 22 (2013) 030504.
[56] M. Aboy, R. Hornero, D. Abásolo, D. Álvarez, Interpretation of the Lempel–Ziv complexity measure in the context of biomedical signal analysis, IEEE Trans. Biomed. Eng. 53 (2006) 2282–2288.
[57] J. Hu, J. Gao, C. Principe, Analysis of biomedical signals by the Lempel–Ziv complexity: the effect of finite data size, IEEE Trans. Biomed. Eng. 53 (2006) 12.
[58] L.C. Freeman, A set of measures of centrality based on betweenness, Sociometry 40 (1977) 35–41.