Classifying the traffic state of urban expressways: A machine-learning approach


Transportation Research Part A xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Transportation Research Part A journal homepage: www.elsevier.com/locate/tra

Classifying the traffic state of urban expressways: A machine-learning approach

Zeyang Cheng a,b,c, Wei Wang d, Jian Lu a,b,c,⁎, Xue Xing d

a Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing 211189, China
b Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China
c School of Transportation, Southeast University, Nanjing 211189, China
d College of Transportation, Jilin University, Changchun 130022, China

ARTICLE INFO

Keywords: Urban expressways; Traffic state; Machine-learning; Fuzzy c-means (FCM) clustering; Classification performance

ABSTRACT

The classification of the urban traffic state and its application is an important part of intelligent transportation systems (ITS): it not only helps traffic managers grasp the traffic operation situation and analyze congestion, but also provides travelers with more traffic information and helps them avoid congestion. Thus, an accurate traffic state classification method would be very practical for urban traffic management. The primary objective of this study is to classify the urban traffic state using a machine-learning method, namely fuzzy c-means (FCM) clustering, in which the classification results are determined from the corresponding clustering labels. The approach has two parts. First, a new classification indicator, the ample degree of the road network, is proposed; together with traffic flow, speed and occupancy, it forms a comprehensive classification indicator system. Then, the traditional FCM clustering approach is improved in two regards, i.e., the fuzzy membership function and the weighting of the samples, and these improvements enhance the clustering performance. As a result, an improved machine-learning method (i.e., the improved FCM clustering approach) is developed and used to conduct the clustering analysis with real-world traffic flow data. Next, a case study of Shanghai is used to guide the study process, which consists of data processing, clustering analysis and method comparison. Other methods (i.e., the support vector machine (SVM) method, the decision tree method, the k-nearest neighbor (KNN) method and the traditional FCM clustering approach) are introduced for comparison with the improved FCM clustering approach.
The discussion shows the superiority of the proposed method (e.g., compared with the traditional FCM clustering approach, the objective function value of the improved method decreased by 31.11%, and the cluster center error also shows a descending trend), and it outperforms the other methods in classification performance (e.g., the overall classification accuracy of the improved FCM method increased by 10.10%, 5.45%, 30.92% and 35.66% in comparison with the traditional FCM method, SVM method, decision tree method and KNN method, respectively). Additionally, the NMI and ROC curve results also illustrate the superiority of the improved FCM method over the other methods. These comparison results suggest that the improved FCM clustering approach is feasible and that the results can be used in advanced traffic management systems, where they may serve as a reference for releasing accurate traffic state information and preventing traffic congestion and risk.

⁎ Corresponding author at: Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China; School of Transportation, Southeast University, Nanjing 211189, China.
E-mail addresses: [email protected] (Z. Cheng), [email protected] (W. Wang), [email protected] (J. Lu), [email protected] (X. Xing).

https://doi.org/10.1016/j.tra.2018.10.035

0965-8564/ © 2018 Published by Elsevier Ltd.

Please cite this article as: Cheng, Z., Transportation Research Part A, https://doi.org/10.1016/j.tra.2018.10.035

1. Introduction

Traffic state classification is a critical part of intelligent transportation systems (ITS): it both reflects the operating state of the road from a holistic perspective and provides an important basis for traffic control and dynamic traffic-flow guidance. By classifying and describing the state of road traffic, traffic managers can grasp the current traffic situation in a timely manner and then develop corresponding measures for relieving congestion. In addition, accurate traffic state classification provides theoretical support for the intelligent development of urban road traffic, because it can recommend more reasonable trips and enhance the level of service of the road network. The foundation of traffic state classification is traffic data collection and processing, after which several algorithms are used to analyze the processed data. Next, the analysis results are compared to the standard (i.e., predefined) traffic states. From this comparison, the state of the current transportation system is obtained, and the corresponding traffic management strategy can be implemented. In general, existing traffic state classification methods can be divided into two types: direct and indirect (Wen et al., 2012). The direct approach classifies and evaluates the current traffic state level by recognizing video images, while the indirect approach classifies the traffic state by analyzing traffic-flow data obtained from traffic detectors (Jochen and Jo, 2011). This study uses the indirect classification method. Accordingly, the question of exactly how to classify and evaluate the traffic state based on existing traffic-flow data is of vital importance, and the result may be used to mitigate congestion or risk (Xu et al., 2012, 2013; Mallah et al., 2017; Hao et al., 2017).
Recently, machine learning has been successfully applied in many fields, especially data mining (Austin et al., 2013), artificial intelligence (Jr et al., 2016) and pattern recognition (Mahesh et al., 2013). Owing to its good performance and high efficiency, machine learning has also been used to classify and predict traffic states. In this study, a machine-learning method based on an improved FCM clustering approach is developed to classify the traffic state using real-world traffic-flow data from Shanghai’s north-south elevated expressway. Several methods are compared with the improved FCM clustering approach, and the comparison results are used to assess the performance of the proposed method. Thus, classification results for the urban traffic state can be obtained.

2. Literature review

Traffic state classification has drawn increasing attention from scholars in recent years. Because studies differ in the definitions and classification indicators they consider, the literature forms a heterogeneous landscape (Dainotti et al., 2012). Cluster analysis is a type of unsupervised machine-learning algorithm that can realize the pattern classification of traffic-flow data without any prior knowledge. Several clustering algorithms have been used to classify the traffic state, primarily the FCM method, the k-means method and their improved versions. Lozano et al. (2009) and Montazeri-Gh and Fotouhi (2011) presented k-means algorithms for congestion classification and traffic condition recognition, respectively, and both performed well in recognizing the traffic state. Lin et al. (2010) studied network traffic classification based on an improved k-means algorithm in which the variance of flow attributes was used to initialize cluster centers. Next, scarce labeled flows were selected to construct a mapping from the clusters to the predefined set of traffic classes. Experimental results illustrated that the improved method was better than the normal k-means algorithm in both overall accuracy and square error value. In the study of Azimi and Zhang (2010), the k-means method and the FCM method were combined to classify freeway traffic flow conditions. The results provided a means of reasonably categorizing oversaturated flow conditions. In the area of FCM clustering, Chen (2005), Sun et al. (2014), and Silgu and Celikoglu (2015) examined traffic state classification and traffic-flow patterns based on fuzzy clustering.
Several studies applied the FCM clustering method to recent massive-data analyses of microscopic driver behavior, the congestion status of road segments, and segment-based emissions (Sun and Elefteriadou, 2014; Zhang et al., 2017; Sun et al., 2018). Because of their varying motivations, these studies had different research objectives and results. In short, they illustrated that the FCM clustering method is well suited to analyzing the traffic-flow state. In addition to the above methods, spectral clustering has been used to analyze the traffic state (Yang et al., 2017). With the development of artificial intelligence (AI) technology in the field of transportation, AI-based machine learning has provided a novel perspective for studying the traffic flow state. Some scholars have studied traffic state classification with taxonomic and machine-learning approaches. Numerous studies have analyzed traffic-flow conditions with machine-learning approaches such as support vector machines (SVM), k-nearest neighbor (KNN), decision trees and Bayesian methods. Yu et al. (2013) studied urban road traffic condition patterns with the SVM method, establishing a classification indicator system that included traffic volume, average speed and occupation ratio. The results showed that the SVM kernel function can separate different patterns from traffic flows with high classification accuracy, but that study compared only the classification results of different SVM kernel functions and made no comparison with other methods. Cao et al. (2017) proposed an accurate traffic classification model based on the SPP-SVM method. Comparison results showed that this approach was superior to the traditional SVM method and other algorithms (i.e., Naïve Bayes, k-NN, and RBF Network) in classification accuracy, dimension, and elapsed time. Xu et al. (2016) presented a real-time road traffic state measurement algorithm based on the Kernel-KNN method, in which speed and traffic-flow data were extracted to establish the reference sequences of road traffic running characteristics. The results indicated that this algorithm is feasible and can achieve high accuracy. Perez-Montesinos et al. (2011) introduced a model in the form of a decision tree that was calibrated to translate traffic states from traffic operation and signal control patterns. Test results indicated that the traffic flow state detection methodology performed well and could be efficiently developed on large data sets. Zhang et al. (2015) proposed a robust statistical traffic classification by combining supervised and unsupervised machine-learning techniques. An empirical study on real-world traffic data confirmed the effectiveness of the proposed scheme. Additionally, a dynamic data-driven local traffic state estimation method was proposed by Antoniou et al. (2013). In their study, a series of preexisting methods (e.g., model-based clustering, kNN classification and classification using neural networks) were integrated to evaluate the traffic state, and the method was shown to outperform current state-of-the-art methods. Readers can also refer to Zhang et al. (2014) and Zhang et al. (2012) for an overview of Bayesian analysis in classifying the traffic state. In these studies, however, the analysis of classification performance is insufficient. For instance, classification accuracy was not discussed in Zhang et al. (2014), and only overall accuracy was used to evaluate classification performance in Zhang et al. (2012). Other approaches that yielded diverse classification results have also been presented; they include both dynamic and non-dynamic classification methods.
In the field of dynamic research, several works have analyzed the performance of dynamic approaches to traffic state patterns, drawing on the fundamental diagram (i.e., the speed-flow fundamental diagram) and reaching statistical conclusions (Celikoglu, 2013; Silgu and Çelikoğlu, 2014). Celikoglu (2014) identified variations of traffic-flow patterns on freeways using a dynamic approach and introduced a piecewise linearization to the cell transmission model (CTM) to study discrete flow pattern variations; the results of this method proved promising. Celikoglu and Silgu (2016) also studied multivariate clustering with the goal of extending dynamic traffic-flow pattern classification, i.e., dynamic classification and nonhierarchical clustering. The results were considered to have practical application value in incident detection and control. Furthermore, García-Ródenas et al. (2016) used the k-means algorithm for daily traffic pattern identification, selecting flow, speed, and occupancy as the indicators. That method transformed dynamic traffic data at the network level into a pseudo-covariance matrix, which collects dynamic correlations between the road links. The results suggest that these methods can effectively detect the current daily traffic pattern and can be used in intelligent traffic management systems. Among non-dynamic classification studies, Kong et al. (2009) proposed an urban traffic state estimation approach that fuses multisource information: the mean speeds from single-loop detectors and GPS probe vehicles were fused to analyze the traffic state. Test results demonstrated that the proposed approach performed well in urban traffic applications. A real-time freeway traffic state estimation based on the extended Kalman filter was also studied, with promising results for subsequent development (Wang and Papageorgiou, 2005).
Palmieri and Fiore (2009) proposed a nonlinear method of traffic classification based on circulation; this method was not significantly affected by dynamic changes in traffic ports and was relatively stable. Bhaskar et al. (2014) established a model that used loop data for seamless travel time and density estimation; the model was validated using both real and simulated data. The results indicated that the traffic state classification was accurate. Tsubota et al. (2015) proposed two methods for estimating the traffic state variables of signalized arterial sections: one based on cumulative vehicle counts, and another based on vehicle trajectories from taxi GPS logs. The results showed that the trajectory-based method successfully captured the features of traffic states and can be a good estimator of network-wide traffic states. As discussed above, different ideas correspond to various results and are represented by different classification indicators and algorithms. However, two aspects have been studied relatively infrequently: the use of multiple classification indicators, especially more than three; and the comprehensive analysis of classification accuracy. This paper addresses these gaps. It introduces a classification indicator system based on four indicators, i.e., traffic flow, speed, occupancy, and ample degree, which makes the analysis more comprehensive. Next, an improved FCM clustering algorithm based on machine learning that can overcome unbalanced sample distribution issues is presented, thereby improving the clustering performance and accuracy of the traffic state classification. Subsequently, a more comprehensive analysis of the classification performance and accuracy based on the classification learner is carried out.
This analysis includes a comparison of the objective function values and cluster center errors of the FCM clustering algorithm before and after improvement. It also compares the overall classification accuracy, user classification accuracy, producer classification accuracy, commission classification error, omission classification error and ROC curves between the proposed method and other methods (i.e., the traditional FCM clustering method proposed by Chen, 2005 and Sun et al., 2014; the KNN method presented by Xu et al., 2016; the decision tree method provided by Perez-Montesinos et al., 2011; and the SVM method proposed by Yu et al., 2013). These are the main contributions of this paper to the state of the art.

The remainder of this paper is structured as follows. In Section 3, the methodology is proposed, including the definition of traffic state classification levels, the determination of traffic state classification indicators, and the corresponding algorithms. In Section 4, a case study of Shanghai is conducted, in which the improved FCM clustering algorithm based on machine learning is used to analyze the samples and the classification results of traffic states are obtained. Next, comparisons between the proposed method and other methods are made. In Section 5, the comparison results are discussed. In Section 6, a brief review of this paper is presented and future research is discussed.

3. Methodology

In this section, the traffic state classification levels and indicators used in the proposed algorithm are first defined. Next, the improved FCM algorithm based on machine learning and the details of its improvements are introduced. Finally, the iteration process of the improved FCM algorithm is presented.

3.1. Traffic state classification level definition

Many types of traffic state classification levels have been proposed, such as the “Highway Capacity Manual (HCM)” method in America, the “Improved HCM Method” in Japan and the “Level of Service (LOS)” method in China; for example, some studies have evaluated the level of service of mid-block bicycle lanes and pedestrian crosswalks (Lu et al., 2017; Kadali and Vedagiri, 2015). In this section, the traffic state classification levels are defined by combining the existing methods and the relevant road characteristics; thus, these classification levels are broadly applicable and reasonable. The traffic state classification levels and corresponding traffic states are defined by speed (see Table 1); the values in Table 1 are speeds (km/h) for each traffic state and road type. There are five traffic states, i.e., fluent, basic fluent, slight congestion, moderate congestion and severe congestion (Li et al., 2009), corresponding to classification levels 1, 2, 3, 4 and 5. Consequently, five clusters are used in the subsequent clustering study of this paper.

3.2. Traffic state classification indicators determination

Many indicators have been applied to traffic state classification in previous studies, including macroscopic indicators and microscopic indicators. Macroscopic indicators describe the running characteristics of traffic and include traffic flow, speed, density, occupancy and queue length. Microscopic indicators describe the operational characteristics of vehicles relative to each other in the road network and include time headway and space headway. In general, speed, traffic flow and density are the most commonly used indicators (Jiang et al., 2008).
In addition, travel time, delay, the occupancy-to-flow-rate ratio and the occupancy-to-speed ratio have been adopted in some studies (Jiang, 2004; Levinson and Lomax, 1996). The purpose of this paper is to classify the traffic state of an urban expressway network. Previous researchers have used a single indicator (e.g., speed or traffic volume), a combination of two indicators (speed and traffic flow, density and traffic flow, or speed and density) or three indicators (traffic flow, speed and density) to study this problem, but very few studies have used more than three. This paper uses four indicators: traffic flow, speed, occupancy and ample degree. The ample degree is a new indicator that has not been defined in previous studies. It is defined as the ratio of the remaining traffic flow to the maximum traffic flow in the road network at a given time, and it is presented below to better reflect the characteristics of the expressway network.

$$\rho_{it} = \frac{q_{i\max} - q_{it}}{q_{i\max}} \qquad (1)$$

where $\rho_{it}$ denotes the ample degree of section $i$ at time $t$, $i$ is the detected section of the urban expressway, $q_{i\max}$ is the maximum traffic flow passing section $i$ and is a fixed value (we assume that $q_{i\max}$ is lower than the maximum capacity), $q_{it}$ is the detected traffic flow in section $i$ at time $t$, and $q_{i\max} - q_{it}$ denotes the remaining traffic flow in section $i$ at time $t$.

3.3. Traffic state classification methods

The classification method in this study is primarily the improved FCM clustering algorithm; the traditional FCM clustering algorithm is introduced first. This section therefore contains three parts: an introduction of the traditional FCM clustering algorithm, an introduction of the improved FCM clustering algorithm, and the iteration process of the improved FCM algorithm.

3.3.1. Introduction of the traditional FCM clustering algorithm

As a machine-learning method, the traditional FCM clustering algorithm is a typical approach primarily used to classify data that are distributed in a multidimensional data space into a certain number of categories. The key concept involves converting the clustering issue into a mathematical problem that can be solved using sample classification methods. In the general procedure, a sample set $X = \{x_1, x_2, \ldots, x_n\}$ is divided into $c$ fuzzy sets in accordance with certain criteria, where $c$ is the number of categories given in advance. Next, the cluster center of each category is determined so as to minimize an objective function of non-similarity indicators. Clustering results are shown in the cluster center matrix $K$ and clustering renderings. The objective function of the traditional FCM clustering algorithm is given in Eq. (2).

Table 1
Traffic state classification levels definition (speeds in km/h).

| Road type | Fluent | Basic fluent | Slight congestion | Moderate congestion | Severe congestion |
|---|---|---|---|---|---|
| Classification level | 1 | 2 | 3 | 4 | 5 |
| Urban expressways | > 60 | 40–60 | 30–40 | 20–30 | < 20 |
| Trunk road | > 45 | 35–45 | 25–35 | 15–25 | < 15 |
| Secondary trunk road | > 35 | 25–35 | 18–25 | 12–18 | < 12 |
| Branch road | > 25 | 18–25 | 12–18 | 8–12 | < 8 |
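As an illustration, Eq. (1) and the urban-expressway thresholds of Table 1 can be evaluated for a single detector record as in the following minimal Python sketch. The function names and the example flow values are hypothetical, and the handling of speeds that fall exactly on a shared boundary (e.g., 40 km/h, assigned here to the less congested level) is an assumption, since Table 1 does not specify it.

```python
def ample_degree(q_it: float, q_max: float) -> float:
    """Eq. (1): share of the maximum flow of a section that is still unused."""
    if q_max <= 0:
        raise ValueError("q_max must be positive")
    return (q_max - q_it) / q_max


def expressway_level(speed_kmh: float) -> int:
    """Map a speed (km/h) to level 1 (fluent) .. 5 (severe congestion).

    Thresholds follow the urban-expressway row of Table 1. Boundary values
    shared by two ranges are assigned to the less congested level; this
    tie-breaking rule is an assumption of the sketch.
    """
    if speed_kmh > 60:
        return 1
    for lower_bound, level in ((40, 2), (30, 3), (20, 4)):
        if speed_kmh >= lower_bound:
            return level
    return 5
```

For a section with an assumed maximum flow of 200 pcu/5 min and a detected flow of 150 pcu/5 min, `ample_degree(150, 200)` evaluates to 0.25, i.e., a quarter of the maximum flow remains unused.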

$$\min J(X, \mu, K) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^{m} d_{ij}^{2} \qquad (2)$$

$$\text{s.t.} \quad \begin{cases} \sum_{i=1}^{c} \mu_{ij} = 1 \\ 0 \leqslant \mu_{ij} \leqslant 1 \\ 0 < \sum_{i=1}^{c} \mu_{ij} < n \\ i = 1, 2, \ldots, c \\ j = 1, 2, \ldots, n \end{cases}$$

where $X$ denotes a sample set, $n$ represents the total number of samples, and $c$ is the number of categories and of clustering centers. $\mu_{ij}$ denotes the element in the $i$th row and $j$th column of matrix $\mu$, $d_{ij} = \lVert x_j - k_i \rVert$ is the Euclidean distance between the $i$th cluster center and the $j$th sample, and $m$ is the weighted index, whose optimal range is from 1.5 to 2.5 (it is set to 2 in this study). Eq. (2) shows that a lower objective function value corresponds to a better clustering result. Therefore, the requirements for minimizing the objective function, obtained by constructing Lagrangian multipliers and taking partial derivatives with respect to all input parameters, are expressed in Eqs. (3) and (4):

$$\mu_{ij} = 1 \Big/ \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \qquad (3)$$

$$k_i = \sum_{j=1}^{n} (\mu_{ij})^{m} x_j \Big/ \sum_{j=1}^{n} (\mu_{ij})^{m}, \quad i = 1, 2, \ldots, c \qquad (4)$$

where $\mu_{ij}$ denotes the fuzzy membership matrix, and $k$ represents the fuzzy cluster center matrix.

3.3.2. Introduction of the improved FCM clustering algorithm

A new machine-learning method, the improved FCM clustering algorithm, is proposed by improving the traditional FCM clustering algorithm. Two aspects are changed: the fuzzy membership degree is improved, and weighted processing of the samples is introduced. These improvements can enhance the clustering renderings (see Fig. 4) and improve both clustering performance and classification accuracy (see Figs. 5, 6, 8, and 9); they are examined in detail in the method comparison section.

(1) Fuzzy membership degree improvement

In the traditional model, the sum of all fuzzy membership degrees of a sample is 1, but this normalization is effective only when using ideal samples. If the sample data are not perfect, the clustering process can be ineffective. For example, when a sample point is far from the cluster centers, its strict membership degree in each category will decrease. However, normalization requires a relatively higher membership for all categories, which affects the clustering results of the algorithm. Thus, the membership degree assessment is improved by setting the sum of all membership degrees to $N$. The improved membership degree can be expressed as follows:

$$\mu'_{ij} = N \Big/ \sum_{k=1}^{c} \left( \frac{d'_{ij}}{d'_{kj}} \right)^{2/(m-1)} \qquad (5)$$

where $\mu'_{ij}$ denotes the improved fuzzy membership degree, $N$ is the number of samples, and $d'_{ij} = \lVert x_j - k'_i \rVert$ is the new Euclidean distance.

(2) Weighted processing of sample data

In a database with $N$ data points, each point carries different information, i.e., the sample points have different effects on classification. Some points play a more significant role in the classification, while others may be less important. For instance, in a sample set with noisy points, the noisy points should be the less important points. Thus, a weight is assigned to each point to distinguish the differences between sample points. This weighting process is not included in the traditional FCM algorithm. The weights are determined as follows:

$$\omega_j = \omega'_j \Big/ \sum_{j=1}^{N} \omega'_j \qquad (6)$$

where $\omega'_j$ represents the weight given to a sample, $\omega'_j = \sum_{j=1}^{n} d'_{ij} / N$, $N$ is the total number of samples, and $d'_{ij}$ is the new Euclidean distance. The introduction of weights can be used to identify similar samples. For relatively dense points, because they are close to the center point, the weights may also be highly similar. In such cases, the weights of noisy points are smaller, and their negative effects on classification can be mitigated. The updated matrix of the cluster center also changes after sample weighting is introduced, and it is expressed in Eq. (7).
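The update rules can be summarized in code. The following is a minimal pure-Python sketch, not the authors' implementation: it alternates the membership update of Eq. (3) with a center update in which each sample carries a weight normalized as in Eq. (6). Several choices are assumptions of the sketch: the raw weights are supplied by the caller (uniform weights recover the traditional algorithm), the center update is normalized by the weighted sum so that each center remains a weighted mean of the samples, memberships are kept on the unit scale (a common factor such as N in Eq. (5) cancels in such a normalized update), and the initial centers are drawn at random from the samples. The function name `fcm` and its parameters are hypothetical.

```python
import math
import random


def fcm(samples, c, m=2.0, weights=None, theta=1e-5, max_iter=200, seed=0):
    """Fuzzy c-means with optional per-sample weights (illustrative sketch).

    samples : list of equal-length lists of floats
    c       : number of clusters
    m       : weighted index (fuzzifier), m > 1
    weights : raw sample weights (assumed input); None means uniform
    theta   : stop when the norm of the center change falls below theta
    Returns (centers, memberships), with memberships[i][j] for cluster i
    and sample j, as computed in the last iteration.
    """
    n, dim = len(samples), len(samples[0])
    raw = [1.0] * n if weights is None else list(weights)
    total = sum(raw)
    w = [r / total for r in raw]  # normalization as in Eq. (6)

    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(samples, c)]  # assumed initialization
    power = 2.0 / (m - 1.0)

    for _ in range(max_iter):
        # d[i][j]: Euclidean distance between center i and sample j,
        # floored to avoid division by zero when a sample sits on a center.
        d = [[max(math.dist(k, x), 1e-12) for x in samples] for k in centers]

        # Membership update, Eq. (3).
        u = [[1.0 / sum((d[i][j] / d[k][j]) ** power for k in range(c))
              for j in range(n)] for i in range(c)]

        # Weighted center update (normalized form, an assumption of the sketch).
        new_centers = []
        for i in range(c):
            coef = [w[j] * u[i][j] ** m for j in range(n)]
            denom = sum(coef)
            new_centers.append([
                sum(coef[j] * samples[j][t] for j in range(n)) / denom
                for t in range(dim)
            ])

        # Stopping rule: norm of the change in the cluster center matrix.
        shift = math.sqrt(sum((a - b) ** 2
                              for old, new in zip(centers, new_centers)
                              for a, b in zip(old, new)))
        centers = new_centers
        if shift < theta:
            break

    return centers, u
```

Calling `fcm(points, c=5)` on rows of standardized (traffic flow, speed, occupancy, ample degree) values would return five centers and a 5 × n membership matrix; each sample can then be labeled with the cluster of highest membership.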

The updated cluster center and the objective function of the improved FCM algorithm are given in Eqs. (7) and (8), respectively.

$$k_i'^{(b+1)} = \sum_{j=1}^{N} \omega_j \left( \mu_{ij}'^{(b+1)} \right)^{m} x_j \Big/ \sum_{j=1}^{N} \left( \mu_{ij}'^{(b+1)} \right)^{m}, \quad i = 1, 2, \ldots, c \qquad (7)$$

$$\min J'(X', \mu', K') = \omega_j \sum_{i=1}^{c} \sum_{j=1}^{N} (\mu'_{ij})^{m} d_{ij}'^{2} \qquad (8)$$

$$\text{s.t.} \quad \begin{cases} \sum_{i=1}^{c} \mu'_{ij} = N \\ 0 \leqslant \mu'_{ij} \leqslant 1 \\ 0 < \sum_{i=1}^{c} \mu'_{ij} < N \\ i = 1, 2, \ldots, c \\ j = 1, 2, \ldots, N \end{cases}$$

Fig. 1. Outlier detection results and sample distribution after outlier removal ((a) outliers of traffic flow, (b) traffic flow after outlier removal, (c) outliers of speed, (d) speed after outlier removal, (e) outliers of occupancy, (f) occupancy after outlier removal).

3.3.3. Iteration process of the improved FCM algorithm

The iteration process of the improved FCM algorithm includes the following steps.

Step 1: Standardizing the sample. In this paper, the range method is applied to standardize the sample, and the elements of the standardized sample are as follows.

$$x'_{ij} = \frac{x_{ij} - \min_{1 \leqslant j \leqslant N} \{x_{ij}\}}{\max_{1 \leqslant j \leqslant N} \{x_{ij}\} - \min_{1 \leqslant j \leqslant N} \{x_{ij}\}} \qquad (9)$$

Step 2: Initializing the sample. c is the number of cluster centers and θ is the iteration threshold. The cluster center matrix K(0) is initialized, and the counter is set to b = 0.
Step 3: Calculating the improved fuzzy membership u′(b).
Step 4: Updating the fuzzy cluster center matrix K(b+1).
Step 5: Calculating the new objective function and determining whether ∥K(b) − K(b+1)∥ < θ. If so, the algorithm stops; otherwise, b = b + 1, and the process returns to Step 3 to continue the iteration.

4. Case study

This section is composed of data processing, clustering analysis and method comparison. Real-world traffic datasets from the Shanghai north-south elevated expressway are first processed by eliminating the outliers. Next, the proposed machine-learning approach (i.e., the improved FCM clustering algorithm) is applied to the processed sample datasets. In the method comparison section, the improved FCM clustering algorithm is first compared to the traditional FCM clustering algorithm in terms of objective function value and cluster center error. Afterward, the classification accuracy of the proposed machine-learning approach is compared with that of other machine-learning methods (i.e., the traditional FCM method proposed by Chen, 2005 and Sun et al., 2014; the KNN method presented by Xu et al., 2016; the decision tree method provided by Perez-Montesinos et al., 2011; and the SVM method proposed by Yu et al., 2013). These comparisons serve to estimate clustering performance and classification accuracy.

Fig. 2. Distribution of samples ((a) traffic flow distribution, (b) speed distribution, (c) occupancy distribution, (d) ample degree distribution). Axes: time of day (6:00–24:00) against traffic flow (pcu/5 min), speed (km/h), occupancy (%) and ample degree.

4.1. Data processing

The samples were collected by loop detectors on the north-south elevated expressway of Shanghai, and the detected section spans from Yan-an East Road to Gong-he Xin Road. The total length of the selected sections is approximately 7 km. Twenty-eight loop detectors were installed in seven sections, and the sample data include traffic flow, speed and occupancy. The data were collected from 6:00 to 23:00 on one day at a detection interval of 5 min. It should be noted that these data have been shown to be effective in classifying the traffic state (Bing et al., 2015). Because some loop detectors may be damaged or unstable (at least to some extent), the data collected from them may have quality problems, for example a large number of outliers, which would have a negative effect on the clustering result. Thus, the data should be processed first. In this paper, data processing mainly involves the elimination of outliers.
RapidMiner Studio is used to detect outliers, a process that involves the following three steps:

Step 1: Loading the data and preparing for the analysis. Normalization of the data is a typical step when comparing attributes of different natures. In this case, we use the Z-transformation as the normalization method to ensure that the standard deviations are equal, so that an outlier has a clear meaning in every dimension of the problem.

Step 2: We apply the clustering operator to the data to find coherent groups in the traffic flow parameter list (traffic flow, speed and occupancy). Next, we compute the outlier score of each sample using the local outlier factor (LOF) mechanism.

Step 3: We de-normalize the data by applying the reverse normalization model, thereby recovering the original data. We then filter the examples to obtain one data set with the outliers and another with the rest, using “outlier = 1.5” as the threshold.

The outlier detection results and the sample distribution after outlier removal are shown in Fig. 1. After data processing, 5000 data points of each traffic flow parameter (traffic flow, speed, occupancy) are obtained. The traffic-flow parameters changing over time are described in Fig. 2. It should be noted that the ample degree is calculated according to Eq. (1) (see Fig. 2(d)). Fig. 2(a), (b), and (c) show the actual traffic characteristics of urban expressways, and they are consistent with the Shanghai north-south elevated expressway. In Fig. 2(d), the ample degree exhibits periodic changes and is, to a large extent, negatively correlated with the occupancy. The ample degree decreases before 8:00 and increases after 20:00, while the occupancy exhibits the opposite trends at these times. In addition, the ample degree and the occupancy also move in approximately opposite directions at the morning and evening peaks, which reflects the regularity of traffic phenomena and suggests that the ample degree is a relatively reasonable quantitative indicator. Fig. 3 shows the real traffic state classification level based on speed (see Table 1), in which the numbers of samples at the different levels are listed; this is treated as the reference standard in the method comparison section.

Fig. 3. Traffic state classification level (horizontal axis: time/h, 6:00 to 24:00; vertical axis: traffic state classification level; level 1: 955 samples, level 2: 2264 samples, level 3: 651 samples, level 4: 970 samples, level 5: 160 samples).

4.2. Significance test of the classification indicators

The t-test is used to assess the significance level of the indicators (i.e., traffic flow, speed, occupancy and ample degree) that were used to classify the traffic states. The one-sample t-test is adopted to perform this analysis, and a 95% confidence level is set. It should be noted that the normality and the homogeneity of variance of these indicators should be tested first (this is the basis for performing the t-test); after this, the one-sample t-test can be performed. We have verified the normal distribution and homogeneity of variance. The statistical description of these indicators and the t-test results are shown in Tables 2 and 3, respectively. Table 3 shows that the significance values of traffic flow, speed, occupancy and ample degree are all lower than 0.05, so the indicators used to classify the traffic states are statistically significant.
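The one-sample t statistic reported for each indicator can be approximated from summary statistics alone. The sketch below assumes the test value of 0 used in Table 3 and a sample size of n = 5000 (the number of data points stated in Section 4.1). The function name `one_sample_t` and the normal-approximation critical value 1.96, which is reasonable only because n is large, are assumptions made for illustration and are not part of the original analysis.

```python
import math

def one_sample_t(mean, std, n, test_value=0.0, z_crit=1.96):
    """Return (t, (lower, upper)) for a one-sample t-test from summary statistics.

    z_crit = 1.96 approximates the 95% critical value of the t distribution for large n.
    """
    se = std / math.sqrt(n)        # standard error of the mean
    diff = mean - test_value       # mean difference from the test value
    t = diff / se                  # t statistic
    return t, (diff - z_crit * se, diff + z_crit * se)

# Speed (km/h): mean and standard deviation as listed in Table 2
t_speed, ci_speed = one_sample_t(mean=46.41, std=15.139, n=5000)
print(round(t_speed, 1), tuple(round(x, 2) for x in ci_speed))
```

With these inputs the statistic comes out close to the value listed for speed in Table 3, and the interval bounds agree with the tabulated ones to two decimals.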

4.3. Clustering analysis

The improved FCM clustering algorithm is used to analyze the samples from which the outliers have been removed (i.e., traffic flow, speed, occupancy and ample degree; see Figs. 1 and 2). The clustering results are shown in Fig. 4, which clearly presents the different traffic states. There is almost no noise in Fig. 4(a)–(f), which suggests that the sample points processed using the improved FCM clustering algorithm have high membership degrees for their classification levels. Additionally, the samples are distributed in a relatively uniform manner, and the similarity between points is considerable, indicating that the clustering method proposed in this paper produces effective clustering results. Five traffic state classification levels can be observed in the clustering results; they are denoted by different colors. Red, yellow, purple, green and light blue represent the fluent state (level 1), the basic fluent state (level 2), slight congestion (level 3), moderate congestion (level 4) and severe congestion (level 5), respectively. The clustering center matrix is listed in Table 4. User accuracy and producer accuracy (see Eqs. (11) and (12)) are used to indicate the effectiveness of the five state classifications, and the results are shown in Table 5. The user accuracy and producer accuracy in Table 5 are all above 90%, which indicates that the five-state clustering is highly effective.

4.4. Method comparison

In this section, two comparison schemes are implemented to evaluate the clustering performance and classification accuracy. The first scheme compares the objective function value and cluster center error between the traditional FCM method and the improved FCM method, with the goal of showing that the improved FCM method is effective in clustering performance.
The second scheme compares the classification accuracy of the improved FCM method with that of other algorithms, namely the traditional FCM method, the KNN method, the decision tree method and the SVM method. This comparison is intended to further validate the classification accuracy of the improved FCM method. Together, the two comparison schemes reflect the good performance of the method proposed in this research.
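For orientation, the traditional FCM baseline used in both comparison schemes alternates between updating fuzzy memberships and cluster centers while its objective function decreases. The sketch below is a minimal one-dimensional version of that standard algorithm with fuzzifier m = 2. It does not include the membership-function and sample-weighting improvements proposed in this paper, and the function name `fcm_1d`, the initial centers and the toy speed values are assumptions chosen for illustration.

```python
def fcm_1d(xs, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means on scalar data; returns (centers, objective history)."""
    history = []
    for _ in range(iters):
        # squared distance of every sample to every center (eps avoids division by zero)
        d2 = [[(x - c) ** 2 + eps for c in centers] for x in xs]
        # membership update: u_ij = 1 / sum_k (d2_ij / d2_ik)^(1/(m-1))
        u = [[1.0 / sum((dij / dik) ** (1.0 / (m - 1)) for dik in row) for dij in row]
             for row in d2]
        # objective J = sum_i sum_j u_ij^m * d2_ij, recorded before the centers move
        history.append(sum(uij ** m * dij
                           for urow, drow in zip(u, d2)
                           for uij, dij in zip(urow, drow)))
        # center update: mean of the samples weighted by u_ij^m
        centers = [sum(urow[j] ** m * x for urow, x in zip(u, xs)) /
                   sum(urow[j] ** m for urow in u)
                   for j in range(len(centers))]
    return centers, history

speeds = [58, 60, 62, 59, 24, 26, 25, 23]   # hypothetical speeds in km/h
centers, history = fcm_1d(speeds, centers=[50.0, 30.0])
```

The recorded objective values never increase from one iteration to the next, which is the property that makes the final objective value a usable, if indirect, indicator when two FCM variants are compared.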

4.4.1. Comparison of the objective function value and cluster center error

For consistency with the above, the clustering center is selected based on speed. The speeds of urban expressways in Table 1 or Fig. 3 are selected as the reference standard (to simplify the analysis, the average speed of each level in Table 1 is selected, i.e., the speed of level 1 of urban expressways is 60 km/h, the speed of level 2 is 50 km/h, the speed of level 3 is 35 km/h, the speed of level 4 is 25 km/h, and the speed of level 5 is 20 km/h). The cluster center error results are shown in Table 6. The comparisons of the objective function value and the cluster center error are shown in Figs. 5 and 6, respectively.
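The percentage errors listed in Table 6 can be reproduced, to within rounding, by dividing the absolute gap between a cluster-center speed and its reference speed by the larger of the two values. This normalization is inferred here from the tabulated numbers rather than stated in the text, so the helper `center_error` below should be read as a sketch under that assumption.

```python
def center_error(center, reference):
    """Relative gap (%) between a cluster-center speed and the reference speed.

    Assumption: the gap is normalized by the larger of the two speeds,
    which matches the values in Table 6 to within rounding.
    """
    return abs(center - reference) / max(center, reference) * 100.0

reference   = [60, 50, 35, 25, 20]   # reference speeds of levels 1 to 5 (km/h)
traditional = [62, 58, 53, 33, 27]   # traditional FCM speed centers from Table 6
improved    = [59, 57, 52, 31, 23]   # improved FCM speed centers from Table 6

for level, (ref, trad, imp) in enumerate(zip(reference, traditional, improved), start=1):
    print(level, round(center_error(trad, ref), 2), round(center_error(imp, ref), 2))
```

Under this reading, the improved FCM center is closer to the reference speed than the traditional one at every level.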

Table 2
One-sample statistics of the classification indicators.

Classification indicator    Mean      Std. Deviation   Std. Error Mean
Traffic flow (pcu/5 min)    126.74    21.736           0.307
Speed (km/h)                46.41     15.139           0.214
Occupancy (%)               21.03     10.974           0.155
Ample degree                0.3330    0.11442          0.00162

Table 3
One-sample t-test of the classification indicators (test value = 0).

Classification indicator    t        Sig. (2-tailed)   Mean difference   95% confidence interval
                                                                         Lower      Upper
Traffic flow (pcu/5 min)    412.17   0.000             126.735           126.13     127.34
Speed (km/h)                216.68   0.000             46.406            45.99      46.83
Occupancy (%)               135.45   0.000             21.027            20.72      21.33
Ample degree                205.73   0.000             0.3301            0.3298     0.3362

4.4.2. Comparison of the classification accuracy

In this part, the compared approaches are the improved FCM method, the traditional FCM method, the SVM method, the decision tree method and the KNN method. The classification accuracy is composed of three kinds of accuracies, i.e., the overall classification accuracy, user accuracy and producer accuracy, and two kinds of errors (commission error and omission error). The definitions of these accuracies and errors follow. The confusion matrix $A_{ij}$ ($1 \le i \le 5$, $1 \le j \le 5$) is introduced to better interpret the classification accuracies and errors. In the confusion matrix, the column ($j$) represents the predicted class, which is the x-coordinate; the row ($i$) denotes the true class, which is the y-coordinate. The correctly classified samples are distributed along the diagonal of the confusion matrix ($A_{ii}$ or $A_{jj}$).

(1) Overall classification accuracy

Overall classification accuracy is the ratio of the correctly classified samples to the total samples. Therefore, in the confusion matrix, it is the ratio of the sum of the diagonal elements to the sum of all elements. The equation is as follows:

$$\eta_o = \frac{\sum_{i=1}^{5} A_{ii}}{\sum_{i=1}^{5} \sum_{j=1}^{5} A_{ij}} \tag{10}$$

where $\eta_o$ denotes the overall classification accuracy, $\sum_{i=1}^{5} A_{ii}$ denotes the correctly classified samples, and $\sum_{i=1}^{5} \sum_{j=1}^{5} A_{ij}$ denotes the total samples.

(2) User accuracy

User accuracy is the ratio of the correctly classified samples of one class to the total classification samples of that class. In the confusion matrix, it is the ratio of the diagonal element of one class to the sum of the elements in its row. The equation is as follows:

$$\eta_{ui} = \frac{A_{ii}}{\sum_{j=1}^{5} A_{ij}} \tag{11}$$

where $\eta_{ui}$ denotes the user accuracy, $A_{ii}$ is the number of correctly classified samples of the $i$th class, and $\sum_{j=1}^{5} A_{ij}$ is the total samples of the $i$th class.

(3) Producer accuracy

Producer accuracy is the ratio of the correctly classified samples of one class to the actual total samples of that class. In the confusion matrix, it is the ratio of the diagonal element of one class to the sum of the elements in its column. The equation is as follows:

$$\eta_{pj} = \frac{A_{jj}}{\sum_{i=1}^{5} A_{ij}} \tag{12}$$

where $\eta_{pj}$ represents the producer accuracy, $A_{jj}$ denotes the correctly classified samples of the $j$th class, and $\sum_{i=1}^{5} A_{ij}$ is the total samples of the $j$th class.

(4) Commission error

Commission error is the ratio of the falsely classified samples of one class to the total classification samples of that class. In the confusion matrix, it is the ratio of the sum of the off-diagonal elements in the row of one class to the sum of all elements in that row. The equation is as follows:

$$\delta_{ci} = \frac{\left(\sum_{j=1}^{5} A_{ij}\right) - A_{ii}}{\sum_{j=1}^{5} A_{ij}} \tag{13}$$

where $\delta_{ci}$ denotes the commission error, $A_{ii}$ represents the correctly classified samples of the $i$th class, and $\sum_{j=1}^{5} A_{ij}$ denotes the total samples of the $i$th class.

(5) Omission error

Omission error is the ratio of the incorrectly classified samples of one class to the actual total samples of that class. In the confusion matrix, it is the ratio of the sum of the off-diagonal elements in the column of one class to the sum of all elements in that column. The equation is as follows:

$$\delta_{oj} = \frac{\left(\sum_{i=1}^{5} A_{ij}\right) - A_{jj}}{\sum_{i=1}^{5} A_{ij}} \tag{14}$$

where $\delta_{oj}$ denotes the omission error, $A_{jj}$ denotes the correctly classified samples of the $j$th class, and $\sum_{i=1}^{5} A_{ij}$ denotes the total

samples of the $j$th class.

Fig. 4. Clustering results of the improved FCM algorithm ((a) clustering result between traffic flow and speed, (b) clustering result between traffic flow and occupancy, (c) clustering result between occupancy and speed, (d) clustering result between occupancy and ample degree, (e) clustering result between ample degree and speed, and (f) clustering result between traffic flow and ample degree). In each panel, the clusters are labelled fluent, basic fluent, slight congestion, moderate congestion and severe congestion.

Table 4
Cluster center matrix of the improved FCM algorithm.

Classification level   Traffic flow (pcu/5 min)   Speed (km/h)   Occupancy (%)   Ample degree
Level 1                89                         59             9               0.53
Level 2                119                        57             12              0.38
Level 3                141                        52             16              0.25
Level 4                145                        31             30              0.23
Level 5                122                        23             39              0.36

Table 5
User accuracy and producer accuracy of the five classification levels.

Classification level     User accuracy (%)   Producer accuracy (%)
Classification level 1   98.11               96.59
Classification level 2   98.19               98.49
Classification level 3   93.39               93.25
Classification level 4   94.07               95.49
Classification level 5   90.63               98.69

Table 6
Error of cluster center of traditional and improved FCM method.

Level   Reference standard (km/h)   Traditional algorithm (km/h)   Error (%)   Improved algorithm (km/h)   Error (%)
1       60                          62                             3.22        59                          1.67
2       50                          58                             13.79       57                          12.28
3       35                          53                             33.96       52                          32.69
4       25                          33                             24.24       31                          19.35
5       20                          27                             25.93       23                          13.04

Fig. 5. Comparison of objective function iterations (improved FCM method and traditional FCM method).

Fig. 6. Comparison of cluster center error (horizontal axis: class 1 to 5; vertical axis: cluster center error (%); traditional FCM method and improved FCM method).

After these accuracies and errors are defined, the classification learner in MATLAB R2017 is used to test them according to different approaches. Three steps must be performed. In step 1, we import the samples that have been processed (i.e., traffic flow, speed, occupancy, ample degree and traffic state classification level 1, 2, 3, 4, and 5 based on the speed cluster center obtained by the

improved FCM method) into the classification learner. In step 2, the response and predictors are determined: the traffic state classification level is selected as the response, and traffic flow, speed, occupancy and ample degree are regarded as the predictors. In step 3, cross-validation is chosen as the validation method; it protects against overfitting by partitioning the data set into folds and estimating the accuracy on each fold. Finally, we run the classification learner with the different methods. The confusion matrix under each method is obtained and shown in Fig. 7.

Fig. 7. Confusion matrix obtained by different methods ((a) improved FCM method, (b) traditional FCM method, (c) SVM method, (d) decision tree method, (e) KNN method). In each panel, the rows give the true class and the columns give the predicted class (classes 1 to 5).

Based on the confusion matrices (see Fig. 7), which show the sample numbers of the different levels, the normalized mutual information (NMI) between the reference standard (see Fig. 3) and each method (i.e., the improved FCM method, the traditional FCM method, the SVM method, the decision tree method and the KNN method) is calculated. The results of the NMI are shown in Table 7. The closer the NMI is to 1, the better the classification effect. Table 7 illustrates that the NMI between the reference standard and the improved FCM method is the largest, which means that the classification result of the improved FCM method is more similar to the reference standard than those of the other methods. Next, the overall classification accuracy, user accuracy, producer accuracy, commission error and omission error are calculated according to Eqs. (10)–(14). Fig. 8 shows the comparison of overall classification accuracy between the compared approaches. Stacked bar charts are used to show the user accuracy, producer accuracy, commission error and omission error from class 1 to class 5 (corresponding to classification levels 1 to 5) of the different methods; they are presented in Fig. 9. Additionally, ROC curves are plotted to evaluate the classifier performance of the different methods. As shown in Fig. 10, the horizontal and vertical coordinates represent the false positive rate and the true positive rate, respectively, and AUC represents the area under the ROC curve. AUC reflects classifier performance well: the larger the AUC, the better the classifier performance.

5. Discussion

In this section, the comparison results are analyzed to discuss the validity of the proposed method. The objective function of the FCM clustering method (see Eq. (2)) is to be minimized, so in general the clustering performance is better when the objective function value is smaller. Fig. 5 illustrates the objective function iteration results of both the improved FCM method and the traditional FCM method, and it indicates that the objective function value of the improved FCM clustering method decreased by 31.11% compared with the traditional method, which may indirectly suggest that the clustering result of the improved FCM clustering method is better. Moreover, it can be observed from Fig. 6 that the errors of the improved FCM method from class 1 to class 5 are all lower than those of the traditional FCM method. Hence, there is preliminary evidence that the improved FCM method is better than the traditional FCM method in clustering performance. To further verify the effectiveness of the improved FCM method, NMI is used as the performance comparison indicator. The NMI between the reference standard and the improved FCM method is the largest (0.777); it is larger than the NMI between the reference standard and the other methods (i.e., the NMI between the reference standard and the traditional FCM method is 0.651, that for the SVM method is 0.704, that for the decision tree method is 0.505, and that for the KNN method is 0.573).
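To make the NMI comparison concrete, the sketch below computes normalized mutual information directly from a confusion matrix of counts. The paper does not state which normalization it uses; the geometric-mean form $I(T;P)/\sqrt{H(T)\,H(P)}$ is assumed here, and the function name `nmi_from_confusion` and the small 3 × 3 matrix are hypothetical. The code also assumes that at least two true classes and two predicted classes are non-empty, so that neither entropy is zero.

```python
import math

def nmi_from_confusion(matrix):
    """NMI between true labels (rows) and predicted labels (columns) of a count matrix."""
    total = sum(sum(row) for row in matrix)
    row_p = [sum(row) / total for row in matrix]          # distribution of true classes
    col_p = [sum(col) / total for col in zip(*matrix)]    # distribution of predicted classes
    # mutual information I(T;P), summed over non-empty cells
    mi = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count:
                p = count / total
                mi += p * math.log(p / (row_p[i] * col_p[j]))
    # entropies H(T) and H(P)
    h_t = -sum(p * math.log(p) for p in row_p if p)
    h_p = -sum(p * math.log(p) for p in col_p if p)
    return mi / math.sqrt(h_t * h_p)

# Hypothetical 3-class confusion matrix (not taken from Fig. 7)
example = [[90, 10, 0],
           [5, 80, 15],
           [0, 10, 90]]
print(nmi_from_confusion(example))
```

A perfectly diagonal matrix gives an NMI of 1, and a matrix whose rows are proportional to one another gives 0, which is why values closer to 1 indicate closer agreement with the reference standard.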
The NMI analysis suggests that the classification of the improved FCM method may be better than that of the other methods (i.e., the traditional FCM method, the SVM method, the decision tree method and the KNN method).

Table 7
NMI comparison between reference standard and other methods.

NMI comparison                                                   NMI value
NMI between reference standard and improved FCM method           0.777
NMI between reference standard and traditional FCM method        0.651
NMI between reference standard and SVM method                    0.704
NMI between reference standard and decision tree method          0.505
NMI between reference standard and KNN method                    0.573

Furthermore, the classification accuracy and the ROC curves of the improved FCM method are contrasted with those of the other methods (i.e., the traditional FCM method, the SVM method, the decision tree method and the KNN method). The confusion matrix obtained from the classification learner is the foundation for calculating the classification accuracy. Fig. 8 shows that the overall classification accuracy of the improved FCM method is the highest, up to 96.67%; the classification accuracy of the other methods is 87.8% (traditional FCM method), 91.67% (SVM method), 73.84% (decision tree method) and 71.26% (KNN method). The user accuracy (see Fig. 9(a)) and the producer accuracy (see Fig. 9(b)) of the improved FCM method from class 1 to class 5 account for the largest proportion among the methods, and the average user accuracy from class 1 to class 5 of the improved FCM method is 95.04%, which is higher by 24.1%, 6.24%, 29.17% and 27.93% (each difference expressed as a share of the improved FCM value) than that of the traditional FCM method (average user accuracy 72.14%), the SVM method (89.11%), the decision tree method (67.32%) and the KNN method (68.5%), respectively. Similarly, the improved FCM method also raises the average producer accuracy by 18.59%, 9.76%, 25% and 26.83%, respectively, compared with the traditional FCM method, the SVM method, the decision tree method and the KNN method. Although the commission error at class 1 (see Fig. 9(c)) and the omission error at class 2 (see Fig. 9(d)) of the improved FCM method are higher than those of the SVM method, its average commission error and omission error from class 1 to class 5 are much lower than those of the other methods.

Fig. 8. Overall classification accuracy comparison of different methods.

Fig. 9. Comparison of user accuracy, producer accuracy, commission error and omission error of different methods ((a) comparison of user accuracy, (b) comparison of producer accuracy, (c) comparison of commission error, (d) comparison of omission error).

Finally, Fig. 10 demonstrates that the AUC values of the improved FCM method, the SVM method, the traditional FCM method, the decision tree method and the KNN method are 0.99, 0.98, 0.89, 0.88 and 0.85, respectively, indicating that the improved FCM method in this study provides the best classification performance. The above analysis also indicates that the improved FCM method is superior to the other methods in classification accuracy.
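The accuracy and error measures compared above follow directly from a confusion matrix via Eqs. (10)–(14). The sketch below applies those definitions exactly as written in this paper (rows are the true class, columns are the predicted class, user accuracy is row-based and producer accuracy is column-based). The function name `classification_metrics` and the 3 × 3 matrix are hypothetical; the matrix is not taken from Fig. 7.

```python
def classification_metrics(A):
    """Accuracies and errors of Eqs. (10)-(14); rows = true class, columns = predicted class."""
    n = len(A)
    row_sum = [sum(A[i]) for i in range(n)]
    col_sum = [sum(A[i][j] for i in range(n)) for j in range(n)]
    overall = sum(A[i][i] for i in range(n)) / sum(row_sum)   # Eq. (10)
    user = [A[i][i] / row_sum[i] for i in range(n)]           # Eq. (11)
    producer = [A[j][j] / col_sum[j] for j in range(n)]       # Eq. (12)
    commission = [1.0 - u for u in user]                      # Eq. (13): (row sum - A_ii) / row sum
    omission = [1.0 - p for p in producer]                    # Eq. (14): (column sum - A_jj) / column sum
    return overall, user, producer, commission, omission

# Hypothetical 3-class confusion matrix
A = [[90, 10, 0],
     [5, 80, 15],
     [0, 10, 90]]
overall, user, producer, commission, omission = classification_metrics(A)
```

Because Eqs. (13) and (14) are the complements of Eqs. (11) and (12), a method with high user and producer accuracy in a class necessarily has low commission and omission error in that class.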

Fig. 10. ROC curves of different methods (horizontal axis: false positive rate; vertical axis: true positive rate; the diagonal is the base line; AUC: KNN method 0.85, decision tree method 0.88, traditional FCM method 0.89, SVM method 0.98, improved FCM method 0.99).
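The AUC values in Fig. 10 are areas under ROC curves. The paper does not describe how the curves were built for the five-class problem, so the sketch below only shows the generic last step for a single curve: a trapezoidal approximation of the area from a list of (false positive rate, true positive rate) points. The function name `auc_trapezoid` and the ROC points are hypothetical.

```python
def auc_trapezoid(fpr, tpr):
    """Area under an ROC curve given points sorted by increasing false positive rate."""
    area = 0.0
    for k in range(1, len(fpr)):
        # trapezoid between consecutive ROC points
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
    return area

# Hypothetical ROC points, including the (0, 0) and (1, 1) end points
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]
auc = auc_trapezoid(fpr, tpr)
```

The base line from (0, 0) to (1, 1) has an area of 0.5 under this rule, so a curve that bows toward the upper-left corner yields an AUC closer to 1.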

Hence, the argument that the proposed method outperforms the other methods is strengthened.

6. Conclusion

An accurate and rapid traffic state classification can provide a foundation for issuing traffic information and preventing traffic congestion, which has an important effect on modern traffic management. This study presents an effective machine-learning method to classify the traffic state, in which a comprehensive classification indicator system based on speed, traffic flow, occupancy and ample degree is established. The traditional FCM clustering algorithm is improved in both the fuzzy membership degree and the weighting of samples. Next, comparisons between the proposed method and other machine-learning methods (e.g., the SVM method, the decision tree method, the KNN method and the traditional FCM clustering method) are conducted. The comparison results for the objective function value and cluster center error initially verify the feasibility of the improved FCM method. Comparisons of the NMI, the classification accuracies and errors (i.e., overall classification accuracy, user accuracy, producer accuracy, commission error and omission error) and the ROC curves are further provided to show the superiority of the improved FCM method over the other methods; such detailed analyses of classification performance are scarce in previous studies (e.g., Lin et al., 2010; Cao et al., 2017; Perez-Montesinos et al., 2011; Antoniou et al., 2013; Zhang et al., 2014; Zhang et al., 2012). Finally, the Shanghai north-south elevated expressway in China is evaluated as a case study to validate the method, and the case study shows good results in classifying the traffic state, i.e., through the classification results, the state to which the traffic data belong can be determined intuitively. Therefore, the method proposed in this paper is effective in classifying the traffic state of urban expressways.
The results of this study will have important implications for urban road traffic management and control (i.e., traffic managers will have a clearer understanding of the traffic status of the road network and will grasp the basic running state of the urban expressways, on the basis of which traffic management strategies such as reasonable traffic-flow guidance under heavy traffic volume and variable speed limits under low-speed conditions for the safe transition of vehicles can be implemented). This study may also provide a reference for classifying the traffic state of urban expressways by using real-world traffic flow data. Accordingly, this research contributes to both engineering practice and theoretical research in the field of urban traffic engineering. However, before the results of this study are used in engineering practice and theoretical research, several further efforts are needed. For example, the data used in this paper are primarily loop detector data for Shanghai from a few years ago, and the real traffic state based on current traffic-flow data for other cities may show a different result given the increasing complexity of traffic conditions. In addition, traffic crashes may also have an effect on urban road traffic states, i.e., a low speed may sometimes be caused by traffic accidents rather than by congestion. Therefore, the authors suggest that future studies should focus on these issues, i.e., more comprehensive studies based on real-time traffic flow and crash data in other cities need to be performed. Finally, it should be noted that the traffic state classification level in this study is based on speed (see Fig. 3), but the traffic state can also be classified by parameters such as occupancy and traffic volume, or by a combination of these parameters.
Additionally, though there are five predefined traffic state classification levels in this study, the proposed method is also suitable for other numbers of traffic state classification levels (e.g., three or four levels). Thus, future research should also be conducted to address these issues.

Acknowledgments

This research was supported by the Jiangsu Province Science and Technology Project (Grant No. BY 2016076-05) and the Scientific Research Foundation of Graduate School of Southeast University (No. YBPY1886). The authors would like to thank the editor and the anonymous reviewers for their constructive comments and valuable suggestions to improve the quality of the article.

References

Azimi, M., Zhang, Y., 2010. Categorizing freeway flow conditions by using clustering methods. Transp. Res. Rec. 2173, 105–114. https://doi.org/10.3141/2173-13.
Austin, P.C., Tu, J.V., Ho, J.E., Levy, D., Lee, D.S., 2013. Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes. J. Clin. Epidemiol. 66 (4), 398–407. https://doi.org/10.1016/j.jclinepi.2012.11.008.
Antoniou, C., Koutsopoulos, H.N., Yannis, G., 2013. Dynamic data-driven local traffic state estimation and prediction. Transp. Res. Part C: Emerg. Technol. 34, 89–107. https://doi.org/10.1016/j.trc.2013.05.012.
Bhaskar, A., Tsubota, T., Kieu, L.M., Chung, E., 2014. Urban traffic state estimation: fusing point and zone-based data. Transp. Res. Part C: Emerg. Technol. 48, 120–142. https://doi.org/10.1016/j.trc.2014.08.015.
Bing, Q.C., Gong, B.W., Yang, Z.S., Lin, C.Y., Xin, Q., 2015. Traffic state identification for urban expressway based on projection pursuit dynamic cluster model. J. Southwest Jiaotong Univ. 50 (6), 1165–1169.
Chen, D., 2005. Classification of traffic flow situation of urban freeways based on fuzzy clustering. J. Transp. Syst. Eng. Inform. Technol. 5 (1), 62–67.
Celikoglu, H.B., 2013. An approach to dynamic classification of traffic flow patterns. Comput.-Aided Civ. Infrastruct. Eng. 28 (4), 273–288. https://doi.org/10.1111/j.1467-8667.2012.00792.x.
Celikoglu, H.B., 2014. Dynamic classification of traffic flow patterns simulated by a switching multimode discrete cell transmission model. IEEE Trans. Intell. Transp. Syst. 15 (6), 2539–2550. https://doi.org/10.1109/TITS.2014.2317850.
Celikoglu, H.B., Silgu, M.A., 2016. Extension of traffic flow pattern dynamic classification by a macroscopic model using multivariate clustering. Transp. Sci. 50 (3), 966–981. https://doi.org/10.1287/trsc.2015.0653.
Cao, J., Fang, Z.Y., Qu, G.N., Sun, H.Y., Zhang, D., 2017. An accurate traffic classification model based on support vector machines. Int. J. Network Manage. 27 (1), e1962. https://doi.org/10.1002/nem.1962.
Dainotti, A., Pescape, A., Claffy, K., 2012. Issues and future directions in traffic classification. Network IEEE 26 (1), 35–40. https://doi.org/10.1109/MNET.2012.6135854.
García-Ródenas, R., López-García, M.L., Sánchez-Rico, M.T., 2016. An approach to dynamical classification of daily traffic patterns. Comput.-Aided Civ. Infrastruct. Eng. 32 (2017), 191–212. https://doi.org/10.1111/mice.12226.
Hao, N., Feng, Y., Zhang, K., Tian, G., Zhang, L., 2017. Evaluation of traffic congestion degree: an integrated approach. Int. J. Distrib. Sens. Netw. 13 (7). https://doi.org/10.1177/1550147717723163. 155014771772316.
Jiang, G.Y., 2004. Technologies and Applications of the Identification of Road Traffic Conditions. China Communications Press, Beijing.
Jiang, G.Y., Guo, H.F., Wu, C.T., 2008. Identification method of urban road traffic conditions based on inductive coil data. J. Jilin Univ. 38, 37–42.
Jochen, E., Jo, D., 2011. Using principal curves to analyze traffic patterns on freeways. Transportmetric. 7 (3), 229–246. https://doi.org/10.1080/18128600903500110.
Jr, B.G., Mitra, A.P., Mitra, S.A., Almal, A.A., Steven, K.E., Skinner, D.G., Fry, D.W., Lenehan, P.F., Worzel, W.P., Cote, R.J., 2016. Use of artificial intelligence and machine learning algorithms with gene expression profiling to predict recurrent nonmuscle invasive urothelial carcinoma of the bladder. J. Urol. 195 (2), 493–498. https://doi.org/10.1016/j.juro.2015.09.090.
Kong, Q.J., Li, Z., Chen, Y., Liu, Y., 2009. An approach to urban traffic state estimation by fusing multisource information. IEEE Trans. Intell. Transp. Syst. 10 (3), 499–511. https://doi.org/10.1109/TITS.2009.2026308.
Kadali, B.R., Vedagiri, P., 2015. Evaluation of pedestrian crosswalk level of service (LOS) in perspective of type of land-use. Transp. Res. Part A: Policy Pract. 73, 113–124. https://doi.org/10.1016/j.tra.2015.01.009.
Levinson, H., Lomax, L., 1996. Developing a travel time congestion index. Transp. Res. Rec. 1564, 1–10. https://doi.org/10.3141/1564-01.
Li, Q.Q., Gao, D.Q., Yang, B.S., 2009. Urban road traffic status classification based on fuzzy support vector machine. J. Jilin Univ. 39 (2), 131–134.
Lozano, A., Malay, G., Nieddu, L., 2009. An algorithm for the recognition of levels of congestion in road traffic problems. Math. Comput. Simul. 79, 1926–1934. https://doi.org/10.1016/j.matcom.2007.06.008.
Lin, G., Xin, Y., Niu, X., Jiang, H., 2010. Network traffic classification based on semi-supervised clustering. J. China Univ. Posts Telecommun. 17, 84–88. https://doi.org/10.1016/S1005-8885(09)60577-X.
Lu, B., Liu, P., Chan, C., Li, Z., 2017. Estimating level of service of mid-block bicycle lanes considering mixed traffic flow. Transp. Res. Part A: Policy Pract. 101, 203–217. https://doi.org/10.1016/j.tra.2017.04.031.
Montazeri-Gh, M., Fotouhi, A., 2011. Traffic condition recognition using the k-means clustering method. Sci. Iran. 18, 930–937. https://doi.org/10.1016/j.scient.2011.07.004.
Mahesh, Pal, Maxwell, Aaron.E., Warner, Timothy.A., 2013. Kernel-based extreme learning machine for remote-sensing image classification. Remote Sens. Lett. 4 (9), 853–862. https://doi.org/10.1080/2150704X.2013.805279.
Mallah, R.A., Quintero, A., Farooq, B., 2017. Distributed classification of urban congestion using VANET. IEEE Trans. Intell. Transp. Syst. 3 (9), 2435–2442. https://doi.org/10.1109/TITS.2016.2641903.
Palmieri, F., Fiore, U., 2009. A Nonlinear, Recurrence-based Approach to Traffic Classification. Elsevier North-Holland, Inc.
Perez-Montesinos, J., Dixon, M.P., Kyte, M., 2011. Detection of stop bar traffic flow state. Transp. Res. Rec. 2259, 132–140. https://doi.org/10.3141/2259-12.
Sun, D., Liu, X., Ni, A., Peng, C., 2014. Traffic congestion evaluation method for urban arterials: case study of Changzhou, China. Transp. Res. Rec. 2461, 9–15. https://doi.org/10.3141/2461-02.
Sun, D.J., Elefteriadou, L., 2014. A driver behavior-based lane-changing model for urban arterial streets. Transp. Sci. 48 (2), 184–205. https://doi.org/10.1287/trsc.1120.0435.
Silgu, M.A., Çelikoğlu, H.B., 2014. K-means clustering method to classify freeway traffic flow patterns. Pamukkale Univ. J. Eng. Sci. 20 (6), 232–239.
Silgu, M.A., Celikoglu, H.B., 2015. Clustering traffic flow patterns by fuzzy c-means method: some preliminary findings. In: International Conference on Computer Aided Systems Theory. Springer International Publishing, pp. 756–764.
Sun, D.J., Zhang, K., Shen, S., 2018. Analyzing spatiotemporal traffic line source emissions based on massive didi online car-hailing service data. Transp. Res. Part D: Transp. Environ. 62, 699–714. https://doi.org/10.1016/j.trd.2018.04.024.
Tsubota, T., Bhaskar, A., Nantes, A., 2015. Comparative analysis of traffic state estimation: cumulative counts-based and trajectory-based methods. Transp. Res. Rec. 2491. https://doi.org/10.3141/2491-05.
Wang, Y., Papageorgiou, M., 2005. Real-time freeway traffic state estimation based on extended Kalman filter: a general approach. Transp. Res. Part B: Methodol. 39 (2), 141–167. https://doi.org/10.1016/j.trb.2004.03.003.
Wen, C.J., Zhan, Y.Z., Jia, K.E., 2012. General equalization fuzzy C-means clustering algorithm. Syst. Eng. Theory Pract. 32 (12), 2751–2755.
Xu, C.C., Liu, P., Wang, W., Li, Z.B., 2012. Evaluation of the impacts of traffic states on crash risks on freeways. Accid. Anal. Prev. 47, 162–171. https://doi.org/10.1016/j.aap.2012.01.020.
Xu, F., He, Z., Sha, Z., Sun, W., Zhuang, L., 2013. Traffic state evaluation based on macroscopic fundamental diagram of urban road network. Procedia-Soc. Behav. Sci. 96, 480–489. https://doi.org/10.1016/j.sbspro.2013.08.056.
Xu, D.W., Wang, Y.D., Jia, L.M., Li, H.J., Zhang, G.J., 2016. Real-time road traffic states measurement based on Kernel-KNN matching of regional traffic attractors. Measurement 94, 862–872. https://doi.org/10.1016/j.measurement.2016.08.038.
Yu, R., Wang, X., Zheng, J., Wang, H., 2013. Urban road traffic condition pattern recognition based on support vector machine. J. Transp. Syst. Eng. Inform. Technol. 13, 130–136. https://doi.org/10.1016/S1570-6672(13)60097-5.
Yang, S.Y., Wu, J.P., Qi, G.Q., Tian, K., 2017. Analysis of traffic state variation patterns for urban road network based on spectral clustering. Adv. Mech. Eng. 9 (9). https://doi.org/10.1177/1687814017723790.
Zhang, J., Chen, C., Xiang, Y., Zhou, W., 2012. Internet traffic classification by aggregating correlated naive Bayes predictions. IEEE Trans. Inf. Forensics Secur. 8 (1), 5–15. https://doi.org/10.1109/TIFS.2012.2223675.
Zhang, J., Wang, X., Ma, L., Tan, D., 2014. Research on traffic flow states identification method based on dynamic Bayesian networks. Trans. Beijing Inst. Technol. 34 (1), 45–49.
Zhang, J., Chen, X., Xiang, Y., Zhou, Wan, Wu, J., 2015. Robust network traffic classification. IEEE/ACM Trans. Networking 23 (4), 1257–1270.
Zhang, K., Sun, D.J., Shen, S., Zhu, Y., 2017. Analyzing spatiotemporal congestion pattern on urban roads based on taxi GPS data. J. Transp. Land Use 10 (1), 675–694. https://doi.org/10.5198/jtlu.2017.954.