Mining regular behaviors based on multidimensional trajectories


Expert Systems With Applications 66 (2016) 106–113

Contents lists available at ScienceDirect

Expert Systems With Applications journal homepage: www.elsevier.com/locate/eswa

Mining regular behaviors based on multidimensional trajectories

Xinlong Pan∗, You He∗, Haipeng Wang∗, Wei Xiong, Xuan Peng

Institute of Information Fusion, Naval Aeronautical and Astronautical University, Yantai 264001, China

Article info

Article history: Received 7 May 2016; Revised 7 September 2016; Accepted 8 September 2016; Available online 9 September 2016

Keywords: Regular behavior; Clustering; Multidimensional trajectory; Hausdorff distance

Abstract

In the information fusion domain, mining regular behaviors is very important for task classification, anomaly behavior detection, situation assessment, and threat estimation. Regular behaviors can be mined by clustering the multidimensional trajectories accumulated in all kinds of electronic information systems. Most trajectory clustering methods group trajectories that are close in spatial position into one cluster, so they cannot distinguish behaviors whose spatial positions are similar but whose moving speeds and directions are quite different. Some sub-trajectory clustering methods, whose similarity measures consider segment direction, speed, and angle, can solve this problem, but they are not suitable for application scenarios that require clustering whole trajectories. In this paper, we propose a multidimensional trajectory clustering algorithm that mines regular behaviors by considering the attribute, type, position, velocity, and course characteristics, and we evaluate it in two experiments. This research is helpful for mining all the regular behaviors in different application scenarios and has broad prospects in expert and intelligent systems. © 2016 Elsevier Ltd. All rights reserved.

∗ Corresponding author. http://dx.doi.org/10.1016/j.eswa.2016.09.015 0957-4174/© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

With the development of target detection methods and the improvement of multi-sensor information fusion technology, a wide variety of targets can be detected, tracked, and identified, and continuous, stable trajectories can be formed for them. Massive numbers of trajectories are stored and accumulated in all kinds of information processing systems, early warning surveillance systems, air and maritime traffic control systems, and video monitoring systems, and these trajectories contain a large amount of information and knowledge. Data mining, an important step of knowledge discovery in databases, is interdisciplinary in character and has been widely used in various fields; within it, trajectory data mining has increasingly become a hot spot. In the information fusion domain, regular behaviors can be mined from the multidimensional trajectories accumulated in electronic information systems. A regular behavior is a behavior that a target routinely exhibits in a given application scenario. Mining regular behaviors belongs to high-level information fusion and is very important for task classification, anomaly behavior detection, situation assessment, and threat estimation. A target trajectory is usually composed of multidimensional data points, and regular behaviors can be mined by clustering these multidimensional trajectories.

A great deal of work has been done on time series clustering and trajectory clustering. Liao (2005) summarized the concepts, distance measures, clustering algorithms, evaluation methods, and application fields of time series clustering. Berkhin (2010) surveyed clustering techniques in data mining. Jeung, Man, and Jensen (2007) used trajectories to analyze the behavior patterns of moving targets. Zheng (2015) gave a systematic exposition of research and technology in trajectory data mining, analyzed the relations and differences among existing technologies, and listed some public datasets. A model-based clustering method was proposed to cluster target trajectories (Gaffney & Smyth, 1999). Zhang, Huang, and Tan (2006) compared different similarity measures for trajectory clustering in outdoor surveillance scenes. A spline-based trajectory clustering method was presented and applied in a coastal surveillance scenario (Dahlbom & Niklasson, 2007). Lee, Han, and Whang (2007) presented a sub-trajectory clustering algorithm based on a partition-and-group framework and a geometric distance measure. Morris and Trivedi (2009) experimentally evaluated different similarity measures and clustering algorithms for trajectory clustering. Atev, Miller, and Papanikolopoulos (2010) introduced a vehicle trajectory clustering algorithm with a similarity measure based on the Hausdorff distance. Frank (2010) proposed a trajectory clustering method based on point alignment, with which regular flight routes can be mined by clustering flight trajectories. Gariel, Srivastava, and Feron (2011) proposed a trajectory clustering method based on the longest common subsequence similarity measure, together with a framework for aircraft monitoring. A sub-trajectory clustering method was presented using a similarity measure based on segment direction, speed, and angle (Yuan, Xia, Zhang, Zhou, & Ji, 2012). Debnath, Tripathi, and Elmasri (2013) introduced a sub-trajectory clustering algorithm based on a combination of spatial and non-spatial features. A length-scale directive Hausdorff similarity measure was presented for trajectory clustering (Hao et al., 2013). Wu, Yeh, and Chen (2013) proposed a k-means trajectory clustering algorithm based on both spatial shift distance and temporal speed distance. Bermingham and Lee (2015) proposed a trajectory clustering methodology based on computing n-dimensional minimum bounding boxes.

Most trajectory clustering methods group trajectories that are close in spatial position into the same cluster, so they cannot distinguish behaviors whose spatial positions are similar but whose moving speeds and directions are quite different. Some sub-trajectory clustering methods, whose similarity measures consider segment direction, speed, and angle, can solve this problem, but they are not suitable for application scenarios that require clustering whole trajectories. In some application scenarios, such as air and maritime traffic control, targets may follow similar routes while their moving speeds and directions are quite different. An approach that can distinguish such behaviors is therefore very helpful for mining all the regular behaviors in an application scenario. In this paper, we propose a multidimensional trajectory clustering algorithm that mines regular behaviors by considering the attribute, type, position, velocity, and course characteristics, and that clusters whole trajectories.
First, considering the position, velocity, and course characteristics, we construct a multi-factor Hausdorff distance as the similarity measure. Second, we propose a multidimensional trajectory clustering algorithm based on the density clustering idea. Third, we introduce three evaluation indexes that assess the regular behavior mining result by evaluating the quality of the trajectory clusters. Finally, we conduct two experiments to investigate the performance of the proposed algorithm.

2. Preliminaries

2.1. Multidimensional trajectory

A trajectory is a sequence of target information data points. By application scenario, trajectories can be divided into early warning surveillance trajectories, air and maritime traffic control trajectories, video monitoring trajectories, and so on. By target type, they can be divided into aircraft, ship, vehicle, pedestrian, animal, and tornado trajectories, and so on. A trajectory is usually a multidimensional sequence composed of multidimensional data points. For example, in automatic dependent surveillance broadcast (ADS-B) systems, each trajectory data point usually contains the flight number, time, longitude, latitude, altitude, velocity, course, and other characteristics. In early warning surveillance information processing systems, each trajectory data point usually contains the target number, attribute, category, quantity, type, aircraft/ship number, time, longitude, latitude, altitude, velocity, course, and other characteristics. A trajectory dataset can be represented as follows:

$$TD = \{TR_1, TR_2, \ldots, TR_n\} \tag{1}$$

TD represents a trajectory dataset, i ∈ [1, n] is the trajectory index, and n is the total number of trajectories. TRi is a multidimensional trajectory composed of multidimensional data points in time sequence:

$$TR_i = \{P_{i1}, P_{i2}, \ldots, P_{im}\} \tag{2}$$

Pij represents the jth data point in TRi, j ∈ [1, m] is the data point index, and m is the total number of data points; m can differ from one trajectory to another. Each trajectory data point Pij is a multidimensional characteristic vector:

$$P_{ij} = (\text{number}, \text{attribute}, \text{category}, \ldots, \text{time}, \text{longitude}, \text{latitude}, \text{altitude}, \text{velocity}, \text{course}) \tag{3}$$
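
Eqs. (1) to (3) map naturally onto a nested record type. The Python sketch below is one possible layout, not the authors' implementation: the class and field names (`TrajectoryPoint`, `Trajectory`, `extra`) are hypothetical, positions are assumed to be already projected to planar x/y, and attribute and category are stored once per trajectory rather than per point, which assumes they do not change along a track. Characteristics that vary between data sources go into the open-ended `extra` dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrajectoryPoint:
    """One data point P_ij: a fixed core plus optional extra characteristics."""
    time: float
    x: float          # position, assumed already projected to planar units
    y: float
    velocity: float
    course: float     # radians
    extra: Dict[str, object] = field(default_factory=dict)  # e.g. altitude, flight number


@dataclass
class Trajectory:
    """TR_i: time-ordered data points plus trajectory-level labels (assumption)."""
    number: int
    attribute: int    # e.g. 1 = friend, 2 = foe (coding from Section 3.2)
    category: int     # e.g. 1 = military aircraft, 2 = civilian aircraft, ...
    points: List[TrajectoryPoint] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep the points in time sequence, as Eq. (2) requires.
        self.points.sort(key=lambda p: p.time)


TrajectoryDataset = List[Trajectory]  # TD = {TR_1, ..., TR_n}
```

Because m may differ between trajectories, `points` is an ordinary list rather than a fixed-size array.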

For different trajectories, the multidimensional characteristics of Pij are not necessarily the same.

2.2. Clustering algorithms

Clustering is an important part of data mining that divides samples into multiple classes or clusters. Samples in the same cluster are highly similar, whereas samples in different clusters are dissimilar (Chen, Han, & Yu, 1997; Silva et al., 2014). Clustering lets us identify dense and sparse regions and find interesting relationships between global distribution patterns and data attributes. In data mining, clustering can be used as an independent tool to obtain the data distribution; by observing the characteristics of each cluster, we can focus on specific clusters for further analysis. Clustering can also serve as a preprocessing step for other algorithms (such as characteristic analysis and classification), which then operate on the generated clusters. Clustering algorithms can be roughly divided into four types. First, partitioning-based clustering algorithms, such as k-means (Kanungo et al., 2002) and k-medoids (Park & Jun, 2009), require the number of clusters to be set in advance and then assign each sample to the nearest cluster according to the similarity measure. Second, hierarchical clustering algorithms (Bouguettaya, Yu, Liu, Zhou, & Song, 2015; Li, Deng, Wang, Feng, & Fan, 2014; Peebles, 1999) include agglomerative and divisive hierarchical clustering, depending on whether the hierarchical decomposition is formed bottom-up or top-down. Depending on the similarity measure, hierarchical clustering algorithms include single-link, complete-link, group average, Ward, BIRCH (Zhang, 1996), CURE (Guha, Rastogi, & Shim, 1998), etc. Third, clustering algorithms based on statistical models, such as the expectation maximization (EM) algorithm (Ordonez & Omiecinski, 2005; Yang, Lai, & Lin, 2012), are grounded in statistical theory and assume that the dataset is generated by a statistical process. Fourth, density-based algorithms (Chen & Tu, 2007; Sander, Ester, Kriegel, & Xu, 1998) look for high-density regions separated by lower-density regions and classify each independent high-density region as a cluster. Depending on the definition of density, typical algorithms include DBSCAN (Ester, Kriegel, Sander, & Xu, 1996), OPTICS (Ankerst, Breunig, Kriegel, & Sander, 1999), DENCLUE (Hinneburg & Keim, 1998), etc. Density-based clustering algorithms group samples according to the density of their spatial distribution without requiring the number of clusters to be set, which makes them particularly suitable for datasets with unknown contents. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) was proposed by Ester et al. (1996). It classifies high-density regions into clusters and can find clusters of arbitrary shape in a spatial database with noise. Its central idea is that the neighborhood density of each sample in a cluster must exceed a certain threshold.

3. Mining regular behaviors

3.1. Similarity measure

Characteristic selection chooses the most effective characteristics from a set of characteristics in order to reduce the dimension of the characteristic space. Assume that Y = {y1, y2, ..., yD} is a characteristic space with D-dimensional characteristic vectors and X = {x1, x2, ..., xn} is the reduced characteristic space. Then X is a subset of Y, and each xi corresponds to a characteristic yi in Y. Trajectories contain many characteristics, so they must be selected before clustering. Trajectory data point Pij is a vector of multidimensional characteristics. We select the attribute, category, position, velocity, and course characteristics to mine the regular behaviors. The attribute and category characteristics are used to construct the labels of regular behaviors, while the position, velocity, and course characteristics are used to construct the similarity measure between two multidimensional trajectories.

The Hausdorff distance measures the degree of similarity between two sets and is well known in computational geometry and image processing (Gao & Leung, 2002; Huttenlocher, Klanderman, & Rucklidge, 1993). It is widely used in shape matching and pattern recognition. Given two trajectories TRA and TRB, the Directed Hausdorff Distance (DHD) from TRA to TRB is:

$$\vec{\delta}_H(TR_A, TR_B) = \max_{P_a \in TR_A} \min_{P_b \in TR_B} \{\mathrm{dist}(P_a, P_b)\} \tag{4}$$

where dist(Pa, Pb) is the Euclidean distance between point Pa in TRA and point Pb in TRB. Thus $\vec{\delta}_H(TR_A, TR_B)$ is the maximum, over the points of TRA, of the distance to the closest point in TRB. If TRA and TRB represent two shapes, the DHD reflects the degree of similarity between them. The DHD does not require every part of TRB to match a part of TRA, so it measures similarity well even when TRA and TRB are incomplete. Fig. 1 shows the DHD between two polygonal curves.

Fig. 1. Illustration of the DHD between two polygonal curves.

The Euclidean distance dist(Pa, Pb) only takes the static position characteristic into account and ignores dynamic characteristics such as velocity and course. In this paper, we define the multi-factor distance mfdist(Pa, Pb), which considers the position, velocity, and course characteristics of points Pa and Pb:

$$\mathrm{mfdist}(P_a, P_b) = w_d \cdot \mathrm{dist}(P_a, P_b) + w_v \cdot |v_{P_a} - v_{P_b}| + w_\theta \cdot |\theta_{P_a} - \theta_{P_b}| \tag{5}$$

vPa and vPb are the velocities of points Pa and Pb, and θPa and θPb are their courses; |vPa − vPb| is the velocity distance and |θPa − θPb| is the course distance between Pa and Pb. wd, wv, and wθ are the weight factors of the position, velocity, and course characteristics, respectively. Their values depend on the specific application scenario, and they satisfy wd ≥ 0, wv ≥ 0, wθ ≥ 0, and wd + wv + wθ = 1. Based on mfdist(Pa, Pb), the DHD can be extended into a higher-dimensional space. The directed multi-factor Hausdorff distance (DMFHD) is constructed as follows:

$$\vec{\delta}_M(TR_A, TR_B) = \max_{P_a \in TR_A} \min_{P_b \in TR_B} \{\mathrm{mfdist}(P_a, P_b)\} \tag{6}$$

The DMFHD measures the degree of similarity between the high-dimensional trajectories TRA and TRB. Like the DHD, it does not require every part of TRB to match a part of TRA, so it measures similarity well even when TRA and TRB are incomplete. In this paper, we choose the multi-factor Hausdorff distance (MFHD), the symmetric form of the DMFHD, as the similarity measure between two multidimensional trajectories:

$$\delta_M(TR_A, TR_B) = \max\{\vec{\delta}_M(TR_A, TR_B), \vec{\delta}_M(TR_B, TR_A)\} \tag{7}$$
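
Eqs. (5) to (7) can be sketched directly in Python. This is a minimal, brute-force sketch under stated assumptions, not the authors' MATLAB code: each point is a hypothetical `(x, y, velocity, course)` tuple with planar positions and courses in radians, the weights default to equal values (an assumption), and the course difference is taken literally as |θPa − θPb| with no angular wrap-around, exactly as Eq. (5) is written. Setting wv = wθ = 0 reduces the directed form to the DHD of Eq. (4).

```python
import math
from typing import Sequence, Tuple

# Hypothetical point layout: (x, y, velocity, course), planar x/y, course in radians.
Point = Tuple[float, float, float, float]
Weights = Tuple[float, float, float]  # (w_d, w_v, w_theta)


def mfdist(pa: Point, pb: Point, w: Weights) -> float:
    """Multi-factor distance of Eq. (5)."""
    wd, wv, wt = w
    position = math.hypot(pa[0] - pb[0], pa[1] - pb[1])  # Euclidean dist(Pa, Pb)
    velocity = abs(pa[2] - pb[2])                        # |v_Pa - v_Pb|
    course = abs(pa[3] - pb[3])                          # |theta_Pa - theta_Pb|, no wrap-around
    return wd * position + wv * velocity + wt * course


def directed_mfhd(tra: Sequence[Point], trb: Sequence[Point], w: Weights) -> float:
    """Directed MFHD of Eq. (6): the worst best-match distance from TR_A to TR_B."""
    return max(min(mfdist(pa, pb, w) for pb in trb) for pa in tra)


def mfhd(tra: Sequence[Point], trb: Sequence[Point],
         w: Weights = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Symmetric MFHD of Eq. (7); the weights must be >= 0 and sum to 1."""
    if min(w) < 0 or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return max(directed_mfhd(tra, trb, w), directed_mfhd(trb, tra, w))


if __name__ == "__main__":
    route = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0)]
    reversed_route = [(x, y, v, math.pi) for x, y, v, _ in route]  # same path, opposite course
    print(mfhd(route, reversed_route, (1.0, 0.0, 0.0)))  # position only: 0.0
    print(mfhd(route, reversed_route, (0.5, 0.0, 0.5)))  # position + course: pi/2
```

The usage lines show the motivating case from the introduction: two trajectories on the same route in opposite directions are indistinguishable under a position-only distance but separated once the course term carries weight. Because position, velocity, and course have different units, the weights also act as scale factors; the sketch leaves their choice to the application, as the text does. Each call costs O(|TRA| · |TRB|) point comparisons.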

3.2. Multidimensional trajectory clustering algorithm

Density-based clustering algorithms can not only discover clusters of arbitrary shape but also control cluster coverage through their density parameters. Here, we propose a multidimensional trajectory clustering algorithm (MTCA) based on the density clustering idea, which uses the multi-factor Hausdorff distance (MFHD) as the similarity measure between two multidimensional trajectories. We use MTCA to perform unsupervised clustering of multidimensional trajectories and then mine the regular behaviors of all kinds of moving targets. MTCA relies on the following definitions.

Definition 1. (ε-neighborhood of a trajectory, Nε(TRi)) Let TRi be an arbitrary trajectory in the trajectory dataset TD. The ε-neighborhood of TRi is defined as:









$$N_\varepsilon(TR_i) = \{TR_j \in TD \mid \delta_M(TR_i, TR_j) \le \varepsilon\} \tag{8}$$

Definition 2. (Core trajectory) TRi is a core trajectory wrt. ε and MinTRs if its ε-neighborhood contains at least MinTRs trajectories: |Nε(TRi)| ≥ MinTRs.

Definition 3. (Border trajectory) TRi is a border trajectory wrt. ε and MinTRs if it is not a core trajectory but belongs to the ε-neighborhood of a core trajectory.

Definition 4. (Directly density-reachable) A trajectory TRj is directly density-reachable from trajectory TRi wrt. ε and MinTRs if TRj ∈ Nε(TRi) and |Nε(TRi)| ≥ MinTRs. Direct density-reachability is symmetric between two core trajectories.

Definition 5. (Density-reachable) A trajectory TRj is density-reachable from trajectory TRi wrt. ε and MinTRs if there is a chain of trajectories Q1, Q2, ..., Ql with Q1 = TRi and Ql = TRj such that Qk+1 is directly density-reachable from Qk for every k (1 ≤ k < l).

Definition 6. (Density-connected) A trajectory TRj is density-connected to trajectory TRi wrt. ε and MinTRs if there is a trajectory TRo such that both TRj and TRi are density-reachable from TRo.

Definition 7. (Trajectory cluster) A trajectory cluster TC wrt. ε and MinTRs is a non-empty subset of TD satisfying the following conditions: (1) ∀TRi, TRj: if TRi ∈ TC and TRj is density-reachable from TRi, then TRj ∈ TC. (2) ∀TRi, TRj ∈ TC: TRj is density-connected to TRi.

Note that a trajectory cluster contains at least MinTRs trajectories. Two lemmas follow from these definitions.

Lemma 1. Let TRi be a trajectory in TD with |Nε(TRi)| ≥ MinTRs. Then the set O = {TRO | TRO ∈ TD and TRO is density-reachable from TRi} is a cluster.

Lemma 2. Let Ck be a cluster and let TRi be any trajectory in Ck with |Nε(TRi)| ≥ MinTRs. Then the set O = {TRO | TRO ∈ TD and TRO is density-reachable from TRi} is equal to cluster Ck.

MTCA consists of two steps:

Step 1: Calculate the similarity measure between target trajectories based on the MFHD proposed in Section 3.1, and determine Nε(TRi) for each TRi using the input parameter ε.

Step 2: Cluster the trajectories according to the density-based idea. First, set clusterID = 1, mark all the trajectories in TD as unclassified, and select a trajectory at random. If this trajectory is a core trajectory, allocate it to cluster C_clusterID, and also allocate to C_clusterID the trajectories in its ε-neighborhood that do not belong to other clusters. Then traverse the ε-neighborhood of this core trajectory; whenever a neighbor is itself a core trajectory, allocate the trajectories in its ε-neighborhood that do not belong to other clusters to C_clusterID. Repeat this process until cluster C_clusterID cannot be extended. Then increase clusterID by 1, select another unclassified trajectory, and perform the same process. Repeat until the cluster dataset C cannot be extended. (See Algorithm 1.)

MinTRs can be set to 4, since 4 is the value usually used in density-based clustering algorithms and gives relatively good clustering results. The value of ε depends on the application scenario; some work has addressed how to determine it (Jahirabadkar & Kulkarni, 2014).

The regular behaviors of moving targets are reflected by the multidimensional trajectory clustering result. Each cluster is regarded as a regular behavior, and its label is constructed by concatenating the attribute number, category number, and cluster ID. In the early warning surveillance domain, the target attribute is either friend or foe, represented by the numbers 1 and 2, while the target category is military aircraft, civilian aircraft, warship, or civilian ship, represented by the numbers 1, 2, 3, and 4. For instance, suppose we obtain a cluster dataset C = {C1, C2, C3} by clustering the multidimensional trajectories of enemy military aircraft. Then we find three regular behaviors of these targets, represented by the labels 211, 212, and 213.

3.3. Evaluation methods

Regular behaviors are mined by clustering multidimensional trajectories, so how well they are mined can be evaluated through the quality of the trajectory clustering. Evaluation methods fall into two classes: extrinsic methods and intrinsic methods (Han & Kamber, 2001). When a benchmark is available, we use an extrinsic method to compare the clustering result with the benchmark. Without a benchmark, we use an intrinsic method to evaluate the compactness and separation of the clusters. In this paper, we use precision and recall to evaluate clustering quality when a benchmark is available. The precision of a trajectory indicates how many other trajectories in the same cluster have the same class as the trajectory. The recall of a trajectory reflects how many trajectories of the same class are assigned to the same cluster. Let C be a clustering of the multidimensional trajectory dataset TD = {TR1, TR2, ..., TRn}. L(TRi) is the class of TRi, determined by the benchmark, and C(TRi) is the cluster ID of TRi in C. For two trajectories TRi and TRj (1 ≤ i, j ≤ n, i ≠ j), the correctness of the relation between TRi and TRj in clustering C is defined as:







$$\mathrm{Correctness}(TR_i, TR_j) = \begin{cases} 1, & L(TR_i) = L(TR_j) \Leftrightarrow C(TR_i) = C(TR_j) \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

The precision is defined as:

$$\mathrm{Precision} = \frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{TR_j:\, i \ne j,\, C(TR_i) = C(TR_j)} \mathrm{Correctness}(TR_i, TR_j)}{\left| \{TR_j \mid i \ne j,\, C(TR_i) = C(TR_j)\} \right|} \tag{10}$$

The recall is defined as:

$$\mathrm{Recall} = \frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{TR_j:\, i \ne j,\, L(TR_i) = L(TR_j)} \mathrm{Correctness}(TR_i, TR_j)}{\left| \{TR_j \mid i \ne j,\, L(TR_i) = L(TR_j)\} \right|} \tag{11}$$

Algorithm 1 MTCA.
Input: 1. A trajectory dataset TD = {TR1, TR2, ..., TRn}
       2. Two parameters ε and MinTRs
Output: A trajectory cluster dataset C = {C1, C2, ..., Cm}
/* Step 1 */
for each TRi, TRj ∈ TD ∧ i ≠ j do
    $\vec{\delta}_M(TR_i, TR_j) = \max_{P_a \in TR_i} \min_{P_b \in TR_j} \{\mathrm{mfdist}(P_a, P_b)\}$
    $\delta_M(TR_i, TR_j) = \max\{\vec{\delta}_M(TR_i, TR_j), \vec{\delta}_M(TR_j, TR_i)\}$
    if δM(TRi, TRj) ≤ ε then
        Nε(TRi) ← TRj
    end if
end for
/* Step 2 */
Set clusterID = 1
Mark all the trajectories in TD as unclassified
for each TRi ∈ TD and TRi is unclassified do
    if |Nε(TRi)| ≥ MinTRs then
        Allocate TRi to cluster C_clusterID
        for each TRj ∈ Nε(TRi) do
            if TRj is unclassified then
                Allocate TRj to cluster C_clusterID
                if |Nε(TRj)| ≥ MinTRs then
                    Insert Nε(TRj) into Nε(TRi)
                end if
            end if
        end for
    end if
    Increase clusterID by 1
end for
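
Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation, and several details are assumptions: `distance` is any symmetric trajectory distance (for example an implementation of the MFHD of Eq. (7)), so each pair is evaluated once; following the i ≠ j loop of Algorithm 1, a trajectory is not counted in its own ε-neighborhood (Eq. (8) read literally would include it, which shifts the core threshold by one); trajectories never reached from a core trajectory keep the hypothetical ID 0, read here as noise or irregular behavior; and the cluster ID is advanced only when a cluster has actually been built, so IDs are consecutive. The queue plays the role of "Insert Nε(TRj) into Nε(TRi)". The helper `behavior_label` concatenates the attribute number, category number, and cluster ID as described in Section 3.2.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
UNCLASSIFIED = 0  # hypothetical marker; real clusters are numbered from 1


def mtca(trajectories: Sequence[T], distance: Callable[[T, T], float],
         eps: float, min_trs: int = 4) -> List[int]:
    """Density-based clustering of trajectories following Algorithm 1.

    Returns one cluster ID per trajectory; UNCLASSIFIED (0) marks trajectories
    that were never reached from a core trajectory.
    """
    n = len(trajectories)

    # Step 1: epsilon-neighborhoods. The distance is assumed symmetric (as the
    # MFHD of Eq. (7) is), so each unordered pair is evaluated once.
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if distance(trajectories[i], trajectories[j]) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Step 2: grow a cluster from each unclassified core trajectory.
    labels = [UNCLASSIFIED] * n
    cluster_id = 1
    for i in range(n):
        if labels[i] != UNCLASSIFIED or len(neighbors[i]) < min_trs:
            continue
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] != UNCLASSIFIED:
                continue  # already in this or another cluster
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_trs:  # TR_j is a core trajectory
                queue.extend(neighbors[j])     # "Insert N_eps(TR_j) into N_eps(TR_i)"
        cluster_id += 1  # advanced only after a cluster has been built
    return labels


def behavior_label(attribute: int, category: int, cluster_id: int) -> str:
    """Regular-behavior label: attribute, category and cluster ID concatenated."""
    return f"{attribute}{category}{cluster_id}"
```

For example, enemy (attribute 2) military aircraft (category 1) in cluster 3 would be labeled `behavior_label(2, 1, 3)`, i.e. "213". Step 1 needs O(n²) distance evaluations, which dominates the cost when each distance is an MFHD.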

When no benchmark is available for the trajectory dataset, we must use an intrinsic method to evaluate the quality of the clusters. Here, we use the silhouette coefficient to evaluate the compactness and separation of the clusters. Suppose TD = {TR1, TR2, ..., TRn} is partitioned into a cluster dataset C = {C1, C2, ..., Cm}. The average multi-factor Hausdorff distance from TRi to the other trajectories in its cluster is defined as:



$$a(TR_i) = \frac{\sum_{TR_j \in C_p,\, TR_i \ne TR_j} \delta_M(TR_i, TR_j)}{|C_p| - 1} \tag{12}$$

where TRi ∈ Cp and 1 ≤ p ≤ m. The minimum, over all clusters not containing TRi, of the average multi-factor Hausdorff distance from TRi to the trajectories of that cluster is defined as:

$$b(TR_i) = \min_{C_q:\, 1 \le q \le m,\, q \ne p} \left\{ \frac{\sum_{TR_j \in C_q} \delta_M(TR_i, TR_j)}{|C_q|} \right\} \tag{13}$$

The silhouette coefficient of TRi is defined as:

$$s(TR_i) = \frac{b(TR_i) - a(TR_i)}{\max\{a(TR_i), b(TR_i)\}} \tag{14}$$
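
The three evaluation indexes can be sketched in Python from a list of benchmark classes L(TRi), a list of cluster IDs C(TRi) and, for the silhouette coefficient, a precomputed matrix of MFHD values. The function names are hypothetical, and two behaviors are assumptions not stated in the text: a trajectory with no other trajectory in its cluster (for precision) or class (for recall) contributes 1.0, because Eqs. (10) and (11) would otherwise divide by zero; and, following a common convention, a trajectory alone in its cluster, or a clustering with a single cluster, gets a silhouette value of 0. How noise trajectories are scored is not specified in the text, so the caller decides which IDs to pass in.

```python
from typing import Sequence


def _bcubed(classes: Sequence[int], clusters: Sequence[int], by_cluster: bool) -> float:
    """Average of Eq. (10) (by_cluster=True) or Eq. (11) (by_cluster=False)."""
    n = len(classes)
    group = clusters if by_cluster else classes
    total = 0.0
    for i in range(n):
        # TR_j (j != i) sharing TR_i's cluster (precision) or class (recall).
        peers = [j for j in range(n) if j != i and group[j] == group[i]]
        if not peers:
            total += 1.0  # assumption: singleton counts as fully correct
            continue
        # Correctness, Eq. (9): same class <=> same cluster.
        correct = sum((classes[i] == classes[j]) == (clusters[i] == clusters[j])
                      for j in peers)
        total += correct / len(peers)
    return total / n


def precision(classes: Sequence[int], clusters: Sequence[int]) -> float:
    return _bcubed(classes, clusters, by_cluster=True)


def recall(classes: Sequence[int], clusters: Sequence[int]) -> float:
    return _bcubed(classes, clusters, by_cluster=False)


def silhouette(dist: Sequence[Sequence[float]], clusters: Sequence[int]) -> float:
    """Mean of Eq. (14) over all trajectories, from a precomputed MFHD matrix."""
    n = len(clusters)
    ids = sorted(set(clusters))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if j != i and clusters[j] == clusters[i]]
        if not own or len(ids) < 2:
            scores.append(0.0)  # assumption: singleton / single cluster scored 0
            continue
        a = sum(dist[i][j] for j in own) / len(own)                      # Eq. (12)
        b = min(sum(dist[i][j] for j in range(n) if clusters[j] == q)
                / clusters.count(q) for q in ids if q != clusters[i])   # Eq. (13)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)   # Eq. (14)
    return sum(scores) / n
```

As a quick check, putting six trajectories from two classes of three into a single cluster gives a recall of 1.0 but a precision of 0.4, since each trajectory shares its cluster with five others of which only two are in its class.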

Fig. 2. Illustration for the velocity and course of a point in a trajectory.

The value of a(TRi) reflects the compactness of the cluster containing TRi: the smaller a(TRi), the more compact the cluster and the more similar the behaviors of the targets in it. The value of b(TRi) reflects the separation between TRi and the other clusters: the greater b(TRi), the farther TRi is from the other clusters and the more its behavior differs from those of the targets in other clusters. The silhouette coefficient lies between −1 and 1. When it is close to 1, the cluster containing TRi is compact and far from the other clusters, which is the desirable situation. When it is negative (b(TRi) < a(TRi)), the behavior of the current target is more similar to another cluster than to its own, which is undesirable and should be avoided. To measure the overall quality of a clustering, we use the average silhouette coefficient over all the trajectories in the dataset.

4. Experiments

In this section, we first apply the multidimensional trajectory clustering algorithm (MTCA) to a simulated trajectory dataset of labeled behaviors in a military scenario and investigate its performance (Section 4.1). Next, we apply MTCA to an automatic dependent surveillance broadcast (ADS-B) trajectory dataset received in China (Section 4.2). The experiments are implemented in MATLAB.

4.1. Experiment in military scenario

In this experiment, we generate a trajectory dataset that simulates the flights of enemy military aircraft in a certain airspace. We then apply MTCA to this public dataset to mine regular behaviors in a military scenario, and evaluate its performance with both extrinsic and intrinsic methods.

4.1.1. Dataset

We create a new public dataset based on the trajectory generation program¹ written by Piciarelli, Micheloni, and Foresti (2008). First, we generate a dataset of trajectories with a two-dimensional position characteristic, and then we add the attribute, category, velocity, and course characteristics. Because we simulate the flights of enemy military aircraft in a certain airspace, the attribute is foe and the category is military aircraft. The motion direction of a target runs from point (xi, yi) to the next point (xi+1, yi+1), and the time intervals between adjacent points are equal. As shown in Fig. 2, the velocity and course of point (xi, yi) can be calculated by (15) and (16).



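$$v_i = \frac{\sqrt{(y_{i+1} - y_i)^2 + (x_{i+1} - x_i)^2}}{t} \tag{15}$$

$$\theta_i = \begin{cases} \arctan\dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}, & x_{i+1} - x_i > 0 \\[4pt] \arctan\dfrac{y_{i+1} - y_i}{x_{i+1} - x_i} + \pi, & x_{i+1} - x_i < 0 \end{cases} \tag{16}$$

where t is the time interval between adjacent points.

¹ http://avires.dimi.uniud.it/papers/trclust/create_ts2.m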
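
Eqs. (15) and (16) can be sketched in Python as below. Eq. (16) leaves the case xi+1 − xi = 0 undefined and says nothing about the last point of a trajectory; the sketch assumes a vertical course of ±π/2 (and 0 for a stationary target) in the first case and reuses the final segment's values for the last point. These choices, and the function names, are assumptions rather than details from the text. Note that Eq. (16) returns angles in (−π/2, 3π/2), which differs from the (−π, π] range of `math.atan2` by 2π for segments pointing into the third quadrant; because Eq. (5) compares courses by a plain absolute difference, the range convention changes the course distance, so the sketch follows Eq. (16) literally.

```python
import math
from typing import List, Sequence, Tuple


def course_eq16(dx: float, dy: float) -> float:
    """Course as in Eq. (16): arctan(dy/dx), shifted by pi when dx < 0."""
    if dx > 0:
        return math.atan(dy / dx)
    if dx < 0:
        return math.atan(dy / dx) + math.pi
    # dx == 0 is not covered by Eq. (16); assumption: straight up/down, 0 if stationary.
    if dy > 0:
        return math.pi / 2
    if dy < 0:
        return -math.pi / 2
    return 0.0


def velocity_and_course(xy: Sequence[Tuple[float, float]],
                        t: float) -> List[Tuple[float, float]]:
    """Per-point (v_i, theta_i) from consecutive positions sampled every t time units."""
    out: List[Tuple[float, float]] = []
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        v = math.hypot(x1 - x0, y1 - y0) / t            # Eq. (15)
        out.append((v, course_eq16(x1 - x0, y1 - y0)))  # Eq. (16)
    if out:
        out.append(out[-1])  # assumption: the last point reuses the final segment
    return out
```

For a unit square traversed counterclockwise from the origin with t = 1, every point gets velocity 1 and the courses are 0, π/2, π, and π.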

Fig. 3. Plot of all the trajectories in the dataset. [Axes: X and Y, both from −2.5 to 2.5.]

The dataset² contains 1200 multidimensional trajectories corresponding to 8 regular behaviors and 300 multidimensional trajectories corresponding to irregular behaviors. First, we generate 1200 trajectories from 4 different clusters and 300 trajectories from 300 different clusters, with the randomness parameter set to 0.5. According to the scenario, we set the attribute label to 2 and the category label to 1. The velocity and course characteristics are calculated by Eqs. (15) and (16). This yields 1500 multidimensional trajectories. Next, we use code 211 to represent the first regular behavior of the enemy military aircraft, which corresponds to the first cluster of 300 trajectories. We randomly select 150 of the 300 trajectories in the second cluster and double their velocity. As a result, the second cluster is divided into two clusters corresponding to two different regular behaviors, labeled 212 and 213. We randomly select 150 of the 300 trajectories in the third cluster and reverse their course. As a result, the third cluster is divided into two clusters corresponding to two different regular behaviors, labeled 214 and 215. We randomly select 200 of the 300 trajectories in the fourth cluster, then double the velocity of 100 of them and reverse the course of the other 100. As a result, the fourth cluster is divided into three clusters corresponding to three different regular behaviors, labeled 216, 217, and 218. Finally, the dataset is created by randomly shuffling the order of the 1500 multidimensional trajectories. Fig. 3 shows the plot of all the trajectories in the dataset.
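
The cluster-splitting steps above can be sketched as follows. This is a hypothetical reconstruction, not the authors' generator: a trajectory is taken to be a list of `(x, y, velocity, course)` tuples, "double the velocity" is read as scaling only the velocity characteristic, and "reverse the course" as adding π to the course and mapping the result back into the interval [−π/2, 3π/2) used by Eq. (16), with positions left unchanged. Whether the authors also reversed the point order is not stated, and the function names are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float, float, float]  # (x, y, velocity, course)
Trajectory = List[Point]


def double_velocity(tr: Trajectory) -> Trajectory:
    """Scale the velocity characteristic by 2; positions are unchanged."""
    return [(x, y, 2.0 * v, c) for x, y, v, c in tr]


def reverse_course(tr: Trajectory) -> Trajectory:
    """Rotate the course by pi, keeping it in [-pi/2, 3*pi/2)."""
    return [(x, y, v, c + math.pi if c < math.pi / 2 else c - math.pi)
            for x, y, v, c in tr]


def split_cluster(cluster: List[Trajectory], k: int,
                  modify: Callable[[Trajectory], Trajectory],
                  rng: random.Random) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Apply `modify` to k randomly chosen trajectories; return (unchanged, modified)."""
    chosen = set(rng.sample(range(len(cluster)), k))
    unchanged = [tr for i, tr in enumerate(cluster) if i not in chosen]
    modified = [modify(tr) for i, tr in enumerate(cluster) if i in chosen]
    return unchanged, modified
```

For the second cluster, one call with k = 150 and `double_velocity` yields the two groups labeled 212 and 213. The fourth cluster can be handled with two successive calls (100 trajectories with doubled velocity, then 100 of the remaining 200 with reversed course), which leaves three groups of 100.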

4.1.2. Results and discussion

We apply MTCA, proposed in Section 3.2, to the dataset and obtain a trajectory cluster dataset C = {C1, C2, ..., C8}. To express the regular behaviors of the enemy military aircraft, we select in each cluster the trajectory with the largest ε-neighborhood as the representative trajectory. We only need to plot the representative trajectories and mark their moving direction; the length of the line between adjacent points indicates the speed of the target. Fig. 4 shows the plot of regular behaviors. The regular behaviors are very hard to discern in Fig. 3, whereas the 8 regular behaviors labeled 211 to 218 are expressed intuitively in Fig. 4.
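
Choosing the representative trajectories can be sketched as below, assuming the cluster IDs and ε-neighborhoods come from a DBSCAN-style implementation of MTCA in which an ID of 0 or below means unclassified. The function name and data layout are hypothetical; ties go to the trajectory that appears first, and the neighborhood size counts all neighbors rather than only those in the same cluster, which is an assumption.

```python
from typing import Dict, Sequence


def representatives(labels: Sequence[int],
                    neighbors: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Map each cluster ID (> 0) to the index of its member with the most neighbors."""
    best: Dict[int, int] = {}
    for i, cluster in enumerate(labels):
        if cluster <= 0:
            continue  # unclassified / noise trajectories have no representative
        if cluster not in best or len(neighbors[i]) > len(neighbors[best[cluster]]):
            best[cluster] = i  # strict '>' keeps the first trajectory on ties
    return best
```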

² http://qiannao.com/file/airadar/95e2e371/

Fig. 4. Plot of regular behaviors. [Axes: X and Y; behaviors labeled 211 to 218.]

Fig. 5. Plot of the average precision dependent on the size of trajectory dataset. [Axes: size of dataset (0 to 1500) and precision (0 to 1).]

Fig. 6. Plot of the average recall dependent on the size of trajectory dataset. [Axes: size of dataset (0 to 1500) and recall (0 to 1).]

Fig. 7. Plot of the average silhouette coefficient dependent on the size of trajectory dataset. [Axes: size of dataset (0 to 1500) and silhouette coefficient (0 to 1).]

To evaluate the clustering results, we calculate the average precision, average recall, and average silhouette coefficient of all the trajectories following the methods in Section 3.3. The results are: precision = 95.71%, recall = 95.94%, and silhouette coefficient = 76.32% (i.e., 0.7632). The average precision shows that trajectories in the same cluster correspond to the same regular behavior with high probability. The average recall shows that trajectories corresponding to the same regular behavior are assigned to the same cluster with high probability. The average silhouette coefficient shows that the clusters are compact and well separated; that is, trajectories in the same cluster correspond to very similar behaviors and trajectories in different clusters correspond to very different behaviors. On this dataset, MTCA performs well at mining regular behaviors. We also apply MTCA to datasets of increasing size and evaluate the clustering results for each size. Figs. 5, 6, and 7 show the average precision, average recall, and average silhouette coefficient, respectively, as functions of the dataset size. Figs. 5 and 6 show that while the dataset is too small to distinguish the 8 regular behaviors, the average precision and recall both increase gradually. Once the dataset is large enough, the average precision and recall remain at a high level. That is, as long as the dataset is large enough to distinguish the different regular behaviors, MTCA ensures with high probability that trajectories in the same cluster correspond to the same regular behavior and that trajectories corresponding to the same regular behavior are assigned to the same cluster. Fig. 7 shows that the average silhouette coefficient likewise remains at a high level once the dataset is large enough. That is, as long as the dataset is large enough to distinguish the different regular behaviors, MTCA places trajectories with very similar behaviors in the same cluster and trajectories with very different behaviors in different clusters. The experimental results show that MTCA performs well at mining regular behaviors in this military scenario and could have useful implications for expert and intelligent systems for military aircraft surveillance.

4.2. Experiment in civilian scenario

In this experiment, we process a trajectory dataset from an automatic dependent surveillance broadcast (ADS-B) system in China. Then, we implement MTCA on this multidimensional trajectory

112

X. Pan et al. / Expert Systems With Applications 66 (2016) 106–113 4

4

x 10

x 10 1

1

0.5

0.5

122 126 124

0 121

Z

Z

0

125

123

-0.5

-0.5

-1 -1

-1.5 -2 -2

-1.5 -2

0 0

0 0

-2 2

5

5

4

x 10

6

5

x 10

4

-6

X

-6

Y

X

Y

x 10

-4

2

5

x 10

-4

-2

Fig. 10. Plot of regular behaviors in three-dimensional space.

Fig. 8. Plot of trajectories in three-dimensional space.

5

5

0

0

x 10

x 10

-0.5 122

-1

-1

125

123 -1.5 -2

Y

Y

-3

124

126

-2 -2.5

121 -3

-4

-3.5 -4

-5

-4.5 -6 -2

-1

0

1

2

3

X

4

5

-5 -2

-1

5

x 10

0

1 X

2

3

4 5

x 10

Fig. 9. Plot of trajectories in two-dimensional plane.

Fig. 11. Plot of regular behaviors in two-dimensional plane.

dataset to mine regular behaviors in a civilian scenario. And the clustering results of MTCA are compared to the realistic flight routes in the airspace.
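The dataset preparation in Section 4.2.1 converts each point's longitude, latitude and altitude into a local rectangular coordinate system with its origin at Beijing Capital International Airport. The exact transformation is not specified. The code below is a minimal Python sketch that assumes the WGS-84 ellipsoid and an east-north-up (ENU) frame. It is not the authors' implementation. The following are hypothetical placeholders:

- the function names;
- the (lon, lat, alt, speed, course) point layout;
- the approximate origin values in `PEK_ORIGIN`.

Altitude is assumed to be in metres above the ellipsoid. Raw ADS-B altitudes may need unit and datum conversion first.

```python
import math

# WGS-84 ellipsoid constants (assumption: the paper does not name the datum).
WGS84_A = 6378137.0                 # semi-major axis in metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Earth-centred, Earth-fixed coordinates (metres) of a geodetic point."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)  # prime-vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, h, lat0_deg, lon0_deg, h0):
    """East, north, up (metres) of a point relative to the origin (lat0, lon0, h0)."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h)
    x0, y0, z0 = geodetic_to_ecef(lat0_deg, lon0_deg, h0)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    # Rotate the ECEF offset into the local tangent frame at the origin.
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up


# Hypothetical origin (lat, lon, elevation in metres): an approximate position of
# Beijing Capital International Airport; replace with the surveyed reference point.
PEK_ORIGIN = (40.08, 116.58, 35.0)


def to_local(track, origin=PEK_ORIGIN):
    """Map (lon, lat, alt, speed, course) points to (x, y, z, speed, course)."""
    lat0, lon0, h0 = origin
    return [(*geodetic_to_enu(lat, lon, alt, lat0, lon0, h0), speed, course)
            for lon, lat, alt, speed, course in track]
```

ENU is only a rotation and translation of ECEF coordinates, so straight-line distances between points are preserved exactly. However, the up axis drifts away from altitude as distance from the origin grows, by roughly d²/(2R), or about 20 km at d = 500 km. If the intended frame keeps altitude as the vertical axis, use `alt` in place of the computed `up` in `to_local`.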

4.2.1. Dataset

We select 237 trajectories received in May 2015 from an ADS-B system in China to construct a multidimensional trajectory dataset. Each trajectory consists of a number of multidimensional data points, from which we directly extract the longitude, latitude, altitude, velocity and course. The attribute of every trajectory is friend, and the category is civilian aircraft.

To calculate the similarity measure between trajectories, we transform the longitude, latitude and altitude of each data point from the geographic coordinate system into a local rectangular coordinate system. The origin of this system is the geographic position of Beijing Capital International Airport. Fig. 8 shows the trajectories in three-dimensional space, and Fig. 9 shows them in the two-dimensional plane.

4.2.2. Results and discussion

We apply the MTCA proposed in Section 3.2 to the dataset and obtain a set of trajectory clusters C = {C_1, C_2, ..., C_6}. To express the regular behaviors of the civilian aircraft, we again select the representative trajectory of each cluster: the trajectory with the most trajectories in its ε-neighborhood. We plot the representative trajectories and mark the moving direction of the civilian aircraft. Each regular behavior is labeled by concatenating the attribute number 1, the category number 2 and the cluster number 1–6. Fig. 10 shows the regular behaviors in three-dimensional space, and Fig. 11 shows them in the two-dimensional plane.

To evaluate the performance of MTCA in this civilian scenario, we compare the mined regular behaviors, labeled 121 to 126, with the actual flight routes in this airspace. The regular behaviors correspond to the flight routes as follows:

- 121: Beijing to Jinan
- 122: Beijing to Yantai
- 123: Zhengzhou to Beijing
- 124: Yantai to Beijing
- 125: Beijing to Yantai
- 126: Jinan to Beijing

These correspondences show that MTCA performs well at mining regular behaviors in this civilian scenario. They also suggest that MTCA could make civil aviation surveillance systems more intelligent.

5. Conclusion

In this paper, we proposed a method to mine regular behaviors by clustering multidimensional trajectories. Taking position, velocity and course into account, we constructed two distances: a multi-factor distance mfdist(a, b) between two data points, and a multi-factor Hausdorff distance (MFHD) between two trajectories. Based on the idea of density-based clustering, we proposed the multidimensional trajectory clustering algorithm (MTCA), which uses MFHD as its similarity measure. We also introduced three evaluation indexes for assessing the results of regular behavior mining.

We conducted one experiment in a military scenario and one in a civilian scenario. The results showed that MTCA performs well at mining regular behaviors and has broad prospects in the information fusion domain. In future work, we will conduct experiments on large-scale multidimensional trajectory datasets from AIS, ADS-B and radar systems. Several further research directions build on this work:

- distributed multidimensional trajectory clustering algorithms;
- anomalous behavior mining algorithms;
- online detection of anomalous behaviors;
- online classification of regular behaviors.

Acknowledgments

This work has been supported by the following sources in China:

- the Key Laboratory of Information Perception and Fusion Technology;
- the National Natural Science Foundation (nos. 61531020, 91538201 and 61471383);
- the Key Project of Science and Technology in Shandong Province (no. 2015ZDZX01001).

The authors would like to acknowledge the researchers who carried out previous work on clustering trajectories.