A* algorithm with dynamic weights for multiple object tracking in video sequence

Accepted Manuscript

Title: Multiple Object Tracking Using A* Association Algorithm
Authors: Zhenghao Xi, Heping Liu, Shengchun Tang, Yang Zheng
PII: S0030-4026(15)00492-1
DOI: http://dx.doi.org/doi:10.1016/j.ijleo.2015.06.020
Reference: IJLEO 55655
To appear in: Optik - International Journal for Light and Electron Optics
Received date: 28-4-2014
Accepted date: 5-6-2015

Please cite this article as: Z. Xi, H. Liu, S. Tang, Y. Zheng, Multiple Object Tracking Using A∗ Association Algorithm, Optik - International Journal for Light and Electron Optics (2015), http://dx.doi.org/10.1016/j.ijleo.2015.06.020 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Multiple Object Tracking Using A* Association Algorithm

Zhenghao Xi1, Heping Liu1, Shengchun Tang2,1, Yang Zheng1

1 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China

2 Electronic Information and Control National Experimental Teaching Center, Beijing Information Science and Technology University, Beijing 100101, China

E-mail: [email protected]

Abstract

Persistently tracking multiple objects in environments with varying illumination and clutter is very challenging. In this paper, an A*-based tracking association algorithm is developed to address this problem. Multiple object tracking is formulated as an integer programming problem on a flow network. Under this framework, the integer assumption is relaxed to give a standard linear programming problem, so the global optimal solution can be obtained quickly by the A* algorithm with dynamic weights. The proposed method avoids the difficulties of integer programming and, more importantly, compared with other recent similar methods, has lower worst-case complexity together with better tracking accuracy and robustness in complex environments. The simulation results demonstrate that the method has a lower time cost and can satisfy real-time requirements.

Key words: Multi-object tracking, A* algorithm, flow network model, varying illumination

Multiple object tracking is an active topic in computer vision. Robust tracking of objects is important for many computer vision applications, such as human-computer interaction, video surveillance and intelligent navigation [1-2]. Besides a high-performance detection algorithm as an auxiliary, high-quality multi-object tracking also needs a tracking algorithm that can handle complex cases such as illumination changes, occlusion and clutter [3]. Data association (DA) is a popular approach to multi-object tracking. Frequently used techniques include the nearest neighbor method [4], joint probabilistic data association (JPDA) [5] and methods based on neural networks [6].

The effectiveness of the above DA methods is closely tied to the detection accuracy of the detector in adjacent frames, so these typical approaches are sensitive to false positives and false negatives. If an object is not detected in a frame but is detected in the previous and following frames, it is a false negative. A false positive is mistaking the tracked object 'A' for object 'B'. Although this problem can be addressed by designing a statistical trajectory model with filtering [7-8], estimating the trajectories with maximum posterior probability is NP-complete.

Many recent papers have proposed approaches to this problem. Giebel et al. [9] used sampling and particle filtering to remove clutter from the same object and to reduce the likelihood of encountering the NP-complete case. This method can obtain relatively accurate trajectories but requires a sufficient number of sampling points. Perera et al. [10] divided a long sequence into several short ones, yielding many short tracks, and linked them using Kalman filtering. This avoids the NP-complete problem. The accuracy of this method is inversely related to the length of the short sequences: the shorter the track, the better the tracking. However, excessive division increases the computation time and prevents objects from being tracked over a long time. Fleuret et al. [11] processed trajectories individually over long sequences, using a reasonable greedy dynamic programming (DP) scheme to choose the order. These approaches, while effective, cannot achieve the global optimum.

Zhang's approach [12] relies on an optimization method based on a min-cost network flow framework to find the global optimum for multiple object tracking, but the two proposed algorithms have many defects in practice, and their complexity is polynomial. Under this framework, Berclaz et al. [13] formulated multi-object tracking as an integer programming (IP) problem and reduced it to linear programming (LP). By relying on the k-shortest paths (KSP) algorithm to optimize the LP problem, their approach reduced the complexity enough to perform robust multi-object tracking in real time. However, KSP lacks the motion model used by DP, and DP's tendency to ignore fragmentary trajectories makes DP more robust. Pirsiavash et al. [14] continued the work of Zhang; their greedy algorithm obtains the global optimal solution for $K = 1$ in $O(N)$ but only approximate solutions for $K > 1$ in $O(KN)$, where $K$ is the unknown optimal number of unique tracks.

By contrast, we combine the models of Zhang and Berclaz and, on this basis, develop a more efficient A* association algorithm with dynamic weights (A*AADW) to solve the multi-object tracking problem. The A*AADW algorithm obtains the global solution directly, without greedy optimization, and it outperforms the above state-of-the-art algorithms in both worst-case complexity and solving time. The main contributions are as follows:

1) A general integer programming formulation of a min-cost network flow framework is introduced for multi-object tracking, which filters out false positives and false negatives more conveniently and naturally using A*AADW.

2) To solve the integer programming formulation of the proposed framework and obtain the global optimum, we propose a novel, faster and more efficient A*AADW algorithm, which is also very robust.

3) Extensive experimental validation.

The rest of this paper is organized as follows. In Section 1, we formulate an IP problem using the min-cost network flow framework and relax it to a continuous LP. Section 2 presents the proposed A* association algorithm with dynamic weights for the relaxation of the original integer assumption. Section 3 covers object localization and long sequence segmentation. Section 4 presents the experimental results together with the evaluation metrics. Finally, conclusions are drawn in Section 5.

1. Network flow framework

The target motion in multi-object tracking is well described by the relationship between neighboring locations in adjacent frames, which is what the DP method uses in a min-cost network flow framework. We define an objective function for multi-object tracking equivalent to that of [13]. The likelihood of object presence is estimated by the marginal posterior probability in every frame, from which the potential object trajectories are obtained.

1.1. Min-cost flow model

an

We formulate the multi-object tracking as a whole process, in which the objective location of each time continuously and discretely changes with time. A directed 3D spatiotemporal group with random

M

variable k is used to describe the video sequence. k   x, y , t  , x  V

(1)

d

Where k denotes any location of an object in this spatiotemporal group, V is the set of all

te

space-time locations in a sequence, x and y are the pixel positions of the target in the transverse

Ac ce p

and longitudinal axes respectively, and t is every instant of time. For any location k at time t , the neighborhood N  k   1,2, ,K  denotes the locations an object can reach at time t  1 . A single track, as an ordered set of state vectors T   k1 , , k N  and

X  T1 , , TL  is a set of the collection of tracks. We assume that the tracking tracks independently of each other, and describe the network flow framework of multi-object tracking using the dynamic model as follows:

$$P(X) = \prod_{T \in X} P(T) \tag{2}$$

where

$$P(T) = P_{source}(k_1) \left( \prod_{n=1}^{N-1} P(k_{n+1} \mid k_n) \right) P_{sink}(k_N) \tag{3}$$

$P_{source}(k_1)$ is the probability of a track starting at location $k_1$, and $P_{sink}(k_N)$ is the probability of a track ending at location $k_N$.

In the spatial coordinate set $V$, we introduce a binary indicator variable $\varphi_{i,k}$ for the directed flow from location $i$ to location $k$, which stands for the number of objects moving from $i$ to $k$. $\varphi_{i,k}$ is 1 when the space-time locations $i$ and $k$ are consecutive locations of some track, with $i$ at time $t$ and $k$ at time $t + 1$; when $i$ and $k$ share the same spatial position, the object remains at the same spatial location between times $t$ and $t + 1$. The variable $\varphi_{i,k}$ is subject to the following constraints:

$$\forall k:\quad \sum_{i:\, k \in N(i)} \varphi_{i,k} = \varphi_k = \sum_{j \in N(k)} \varphi_{k,j} \tag{4}$$

$$\forall i:\quad \sum_{k \in N(i)} \varphi_{i,k} \le 1 \tag{5}$$
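The constraints on the flow variable can be checked mechanically. The following is a minimal sketch, assuming the constraints take the usual form of flow conservation at every location and at most one unit of flow leaving each location. The function name `is_feasible_flow`, the node labels and the dictionary layout are illustrative assumptions and do not come from the original method.

```python
from collections import defaultdict


def is_feasible_flow(flows, source="source", sink="sink"):
    """Check a candidate flow against conservation and unit-capacity rules.

    flows maps a directed edge (i, k) to its flow value, which must be 0 or 1.
    The virtual source and sink nodes are exempt from conservation.
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (i, k), value in flows.items():
        if value not in (0, 1):  # binary indicator variable
            return False
        outflow[i] += value
        inflow[k] += value
    for node in set(inflow) | set(outflow):
        if node in (source, sink):
            continue
        # Everything that enters a location must leave it, one object at most.
        if inflow[node] != outflow[node] or outflow[node] > 1:
            return False
    return True


# One track passing through locations a1 -> a2 (hypothetical labels).
single_track = {("source", "a1"): 1, ("a1", "a2"): 1, ("a2", "sink"): 1}
# An infeasible flow: one unit enters a1 but two units leave it.
split_track = {("source", "a1"): 1, ("a1", "a2"): 1, ("a1", "b2"): 1,
               ("a2", "sink"): 1, ("b2", "sink"): 1}
```

Under these assumptions, `is_feasible_flow(single_track)` holds while `is_feasible_flow(split_track)` does not, because a single object cannot split into two.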

Let the random variable $M_k$ stand for the true presence of an object at location $k$ in space-time. At every time $t$, the detector checks every location of the tracking zone. The marginal posterior probability of an object being present is estimated as

$$\rho_k = \hat{P}(M_k = 1 \mid I_t) \tag{6}$$

where $I_t$ is the single image at frame $t$. We write $m = \{m_k\}$ for a feasible set of presence likelihoods of objects in $V$, obtained by the method of §3.1, and $M$ is the set of all $M_k$. The presence likelihood of the objects given the set of tracks $X$ is

$$P(M = m \mid X) = \prod_{k \in X} P(M_k = m_k \mid X) \tag{7}$$

Since the $M_k$ are conditionally independent given $X$, we can infer the maximum a posteriori estimate of the tracks from the presence likelihoods of the objects:

$$
\begin{aligned}
X^* &= \arg\max_X P(X)\, P(M = m \mid X) && (8) \\
&= \arg\max_X \prod_{T \in X} P(T) \prod_{k \in X} P(M_k = m_k \mid X) && (9) \\
&= \arg\max_X \sum_{T \in X} \log P(T) + \sum_{k \in X} \log P(M_k = m_k \mid X) && (10) \\
&= \arg\max_X \sum_{T \in X} \log P(T) + \sum_{k} \left[ (1 - m_k) \log P(M_k = 0 \mid X) + m_k \log P(M_k = 1 \mid X) \right] && (11) \\
&= \arg\max_X \sum_{T \in X} \log P(T) + \sum_{k} m_k \log \frac{P(M_k = 1 \mid X)}{P(M_k = 0 \mid X)} && (12) \\
&= \arg\max_X \sum_{T \in X} \log P(T) + \sum_{k} m_k \log \left( \frac{\rho_k}{1 - \rho_k} \right) && (13)
\end{aligned}
$$

where Eq. 11 follows from Eq. 10 because $m_k$ is 0 or 1 according to Eq. 5, and Eq. 12 is obtained by ignoring a term that does not depend on $m_k$. The cost of a directed flow between neighboring locations of any adjacent frames is defined as

$$c(e_{k,n}) = -\log \left( \frac{\rho_k}{1 - \rho_k} \right) \tag{14}$$

and the total cost between any two locations in $V$ is

$$C(e_{i,j}) = \sum_{e_{k,n} \in e_{i,j},\ n \in N(k)} c(e_{k,n}) \tag{15}$$

1.2. Integer programming formulation

In our framework, because objects can enter and leave the tracking area, we introduce the additional source and sink nodes defined in the model of Berclaz [13]. The formulation in Eqs. 8-13 translates naturally into an integer program:

$$
\begin{aligned}
\text{Minimize}\quad & C(\varphi) = \sum_{i,j} C(e_{i,j})\, \varphi_{i,j} + \sum_i C(e_{source,i})\, \varphi_{source,i} + \sum_i C(e_{i,sink})\, \varphi_{i,sink} \\
\text{Subject to}\quad & \forall k:\ \sum_{i:\, k \in N(i)} \varphi_{i,k} = \varphi_k = \sum_{j \in N(k)} \varphi_{k,j}, \\
& \forall i:\ \sum_{j \in N(i)} \varphi_{i,j} \le 1
\end{aligned} \tag{16}
$$

where the constraints are the same as Eq. 4 and Eq. 5, and $\varphi^* = \arg\min C(\varphi)$ is the optimal solution of the IP. $C(e_{source,i})$ is the total cost from the source node to the locations of the track, and $C(e_{i,sink})$ is the total cost from the locations of the track to the sink node. Figure 1 shows a simple flow network constructed for multi-object tracking, with costs $c_{i,j}$ on the edges between locations, $c_{source,i}$ on the edges leaving the source and $c_{i,sink}$ on the edges entering the sink.

Fig. 1. The simple flow network model (nodes: source, frame 1, frame 2, frame 3, sink).

The costs in it are defined as follows:

$$c(e_{source,i}) = -\log P_{source}(k_i) \tag{17}$$

$$c(e_{i,sink}) = -\log P_{sink}(k_i) \tag{18}$$
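The cost definitions above can be illustrated numerically. The sketch below assumes that the cost of a track is the sum of its source cost, its per-location detection costs and its sink cost; the function names and the clamping constant `EPS` are illustrative assumptions introduced only to keep the logarithms finite.

```python
import math

EPS = 1e-9  # assumed clamp so that log(0) never occurs


def detection_cost(rho):
    """Cost of using a location with detection probability rho (Eq. 14)."""
    rho = min(max(rho, EPS), 1.0 - EPS)
    return -math.log(rho / (1.0 - rho))


def boundary_cost(p):
    """Cost of a source or sink edge with probability p (Eqs. 17 and 18)."""
    return -math.log(max(p, EPS))


def track_cost(rhos, p_source, p_sink):
    """Total cost of a track, assuming the individual costs simply add up."""
    return (boundary_cost(p_source)
            + sum(detection_cost(r) for r in rhos)
            + boundary_cost(p_sink))
```

A location with a detection probability above 0.5 has a negative cost and therefore lowers the total cost of any track that passes through it, whereas a probability below 0.5 yields a positive cost. This sign change is why the network contains negative edge costs.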

Solving the integer program with standard methods is NP-complete. In general, variants of the simplex algorithm [15-16] or interior-point methods [17-18] can be used to solve the relaxed problem. However, these algorithms have very high worst-case time complexities. In [13] and [14], the KSP and SSP methods successfully relax the integer assumption to a continuous linear program, but each has its own deficiencies. We use the proposed A*AADW algorithm to compensate for the deficiencies of these methods.

2. A* association algorithm with dynamic weights

In this paper, we propose the A* algorithm with dynamic weights to solve the relaxed integer program in the network flow framework; the worst-case complexity of this algorithm is $O(KN)$. The global optimal solution obtained by the proposed algorithm improves the reliability and efficiency of the tracking. The network flow model must have two particular properties for our algorithm to apply:

1) All edges and nodes are independent of each other, and all edges have unit capacity.

2) The network is a directed acyclic graph (DAG).

2.1. A* association algorithm

Let $C$ be the total cost value of any location in space $V$, and let $E$ be the set of edges between neighboring locations of adjacent frames. State transitions between nodes of the dynamic model take place along $E$, so the DAG $G(V, E, C)$ completely describes the flow of an object in the min-cost flow model. Define the residual graph $G_r(\varphi)$ to be the same as the original $G(V, E, C)$, except that the flow runs from the current location to the ending location. We use the A* algorithm to perform a heuristic search in the directed acyclic residual graph $G_r(\varphi)$, which finds the path of minimum total cost between two nodes.

Because tracked objects may appear within and leave the tracking area, we introduce two additional virtual nodes, source and sink, that denote the potential positions where an object appears and disappears, respectively. We then form a new DAG in which the virtual nodes link to all reachable locations, and we use the neighbors of birth and end to replace the original positions, as shown in Figure 2.

The algorithm steps are as follows.

We create the Open list and the Closed list. The Open list records all nodes that are still considered in the search for the shortest path, and the Closed list records the nodes that are no longer considered.

1) We put the initial position $l_{birth}$ into the Open list and calculate the total cost of the path $f_{birth,i}$ from $l_{birth}$ to the current position $l_i$:

$$g(l_i) = cost(f_{birth,i}) = \sum_{e_{k,n} \in f_{birth,i}} c(e_{k,n}) \tag{19}$$

The cost from $l_i$ to a potential position $l_j$, $l_j \in N(l_i)$, of the next frame is calculated as follows:

$$g'(l_j) = cost(f_{i,j}) = c(e_{i,j}) \tag{20}$$

The estimated total cost of any path $f_{j,end}$ from $l_j$ to the potential terminal position is calculated by Eq. 21, and $l_{birth}$ is put into the Closed list.

$$h(l_{end}) = cost(f_{j,end}) = \sum_{e_{k',n'} \in f_{j,end}} c(e_{k',n'}) \tag{21}$$

2) We put the neighborhood of $l_i$, $l_j \in N(l_i)$, into the Open list, obtain the position $l_j^*$ that satisfies Eq. 22, and put $l_j^*$ into the Closed list.

$$l_j^* = \arg\min_{l_j} F(l_j), \quad F(l_j) = g(l_i) + g'(l_j) + h(l_{end}) \tag{22}$$

3) We empty the Open list and update the current location to $l_j^*$, $l_j^* \in N(l_i)$.

4) Steps 1-3 are iterated until $l_{end}$ is added to the Open list (because $l_{birth}$ has already been added to the Closed list, it is not added to the Open list again). We output all the searched locations in the Closed list in first in, first out (FIFO) order, which gives the shortest path:

$$f^*_{birth,end} = (l_{birth}, l_i^*, l_j^*, \dots, l_{end}), \quad j \in N(i) \tag{23}$$

Fig. 2. Illustration of the A* association algorithm. (a) Computing the total cost of any path between the birth and the current position. (b) In the residual graph, searching for the next potential location in the neighborhood of the current position and computing its cost. (c) Estimating the total cost of any path between the next potential location and the end. (d) Computing the next potential location, then updating the current position to that location and updating the residual graph. The legend distinguishes all edges among the positions that can be reached, all edges between the virtual positions and the potential locations that can be reached, and the optimal path obtained by the proposed algorithm.

Figure 2 shows the simple processing steps of the A* association algorithm, where the birth point is the node where an object was first discovered and the end point is the node where it was last discovered. Each relaxation operation of the A* association algorithm is a heuristic search for the likely presence of the object in the next frame. The $n$-th relaxation operation ensures that the path is the shortest among all paths of depth $n$. Because the number of edges on a shortest path in the residual graph does not exceed $|V| - 1$, the path obtained using the A* algorithm is the shortest one. The optimization process of the A* association algorithm is comparable with that of [14], which used the SSP algorithm with an additional greedy method, meaning that A* can find the global minimum. The convergence of the A* association algorithm is proved below.
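The search described in this subsection can be sketched with a textbook A* over a small layered DAG. This is a minimal illustration and not the authors' implementation: it keeps a priority queue rather than emptying the Open list at each step, and it assumes the heuristic is the exact cost-to-go obtained by a backward pass over a topological order, which is never an overestimate and therefore remains valid with negative edge costs. The graph, the node labels and the `weight` parameter are hypothetical.

```python
import heapq


def cost_to_go(graph, goal, order):
    """Exact remaining cost to goal, by backward relaxation over a topological order."""
    h = {node: float("inf") for node in order}
    h[goal] = 0.0
    for u in reversed(order):
        for v, c in graph.get(u, []):
            h[u] = min(h[u], c + h[v])
    return h


def a_star(graph, start, goal, h, weight=1.0):
    """Return (cost, path) of a cheapest start-goal path; weight scales the heuristic."""
    open_heap = [(weight * h[start], 0.0, start, [start])]
    closed = set()
    while open_heap:
        _, g, u, path = heapq.heappop(open_heap)
        if u == goal:
            return g, path
        if u in closed:
            continue
        closed.add(u)
        for v, c in graph.get(u, []):
            if v not in closed and h[v] != float("inf"):
                g_new = g + c
                heapq.heappush(open_heap, (g_new + weight * h[v], g_new, v, path + [v]))
    return float("inf"), []


# Two frames with two candidate locations each; the costs are illustrative only.
graph = {
    "source": [("a1", 10.0), ("b1", 10.0)],
    "a1": [("a2", -2.0), ("b2", -0.5)],
    "b1": [("b2", -1.0)],
    "a2": [("sink", 10.0)],
    "b2": [("sink", 10.0)],
}
order = ["source", "a1", "b1", "a2", "b2", "sink"]
h = cost_to_go(graph, "sink", order)
best_cost, best_path = a_star(graph, "source", "sink", h)
```

With `weight` left at 1.0 the search returns the cheapest path. A weight above 1 trades that guarantee for speed, which is the idea behind the dynamic weighting.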

2.2. The proof of the optimal solution

Lemma 1 [19] (Hoffman-Kruskal theorem): Let $A$ be an $m \times n$ integral matrix. Then, for each integer vector $b \in \mathbb{R}^m$, the vertices of the polyhedron $\{x : Ax \le b,\ x \ge 0\}$ are integral if and only if $A$ is a totally unimodular matrix.

Lemma 2 [20]: For an arbitrary node $n$ in the A* algorithm, $h(n)$ is an estimated distance from $n$ to the terminal node. If the estimated distance is no more than the actual distance between the two points, the globally optimal solution can be found.

The integer programming problem $U_1$ and the corresponding relaxed linear programming problem $U_2$ are considered as follows:

$U_1$: $\min cx$; s.t. $Ax \le b$, $x \ge 0$, and $x$ an integer vector.

$U_2$: $\min cx$; s.t. $Ax \le b$, $x \ge 0$,

where $c$ and $b$ are known vectors and $A$ is the known constraint matrix, all of appropriate dimensions.

Theorem: In the DAG, if $A$ is a totally unimodular matrix, then the relaxed linear programming problem $U_2$ can be solved by the A* algorithm, and the globally optimal solution of the integer programming problem $U_1$ is obtained as well.

Proof: The set $\{x : Ax \le b,\ x \ge 0\}$ denotes a bounded polyhedron of feasible solutions, one vertex of which represents the optimal solution. Lemma 1 shows that the vertices must be nonnegative integers. Considering the specific elements of $A$ and that the vertices must lie between 0 and 1, we conclude that every vertex coordinate of the polyhedron is either 0 or 1. In effect, the A* algorithm accelerates the iterative process of obtaining the optimal solution by solving the linear program step by step. Since, in the DAG, the estimated distance $h(n)$ is no more than the distance from $n$ to the terminal node, Lemma 2 guarantees that the A* algorithm finds the globally optimal solution. The total unimodularity of $A$ ensures that each basic feasible solution is integral, which means that the relaxed linear program always converges to the optimal solution of the original integer program.
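Total unimodularity, on which the proof relies, can be verified by brute force for a small matrix: every square submatrix must have determinant -1, 0 or 1. The sketch below is an illustration only. It uses the node-arc incidence matrix of a tiny directed graph (a standard example of a totally unimodular matrix) and not the constraint matrix of the tracking problem; all names are assumptions.

```python
from itertools import combinations


def det(m):
    """Integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, val in enumerate(m[0]):
        if val == 0:
            continue
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * val * det(minor)
    return total


def is_totally_unimodular(a):
    """True if every square submatrix of a has determinant in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for size in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), size):
            for c in combinations(range(cols), size):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Rows: nodes s, a, b, t. Columns: edges s->a, s->b, a->t, b->t, a->b.
# Entry -1 marks the tail of an edge and +1 marks its head.
incidence = [
    [-1, -1, 0, 0, 0],
    [1, 0, -1, 0, -1],
    [0, 1, 0, -1, 1],
    [0, 0, 1, 1, 0],
]
not_tu = [[1, 1], [-1, 1]]  # determinant 2, so not totally unimodular
```

The check is exponential in the matrix size, so it is only practical for toy examples such as this one.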

The total unimodularity of $A$ has been proved in [13]. To speed up the screening of the large number of nodes in the initial stage of the A* algorithm, and to then ignore some object movement in the later stage, we apply a dynamic weight to the A* algorithm.

2.3. Dynamic weights

To help find the optimal solution quickly and accurately, we prioritize speed in the initial stage of the search and precision in the later stage, which is achieved by adding a dynamic weight $\omega$ to Eq. 22:

$$F(l_{t+n+1,u}) = g(l_{t+n,j}) + g'(l_{t+n+1,u}) + \omega \cdot h(l_{t+n+1,u}) \tag{24}$$

where

$$\omega = 1 + \frac{x - (n + 1)}{x} \tag{25}$$

$x$ is the number of nodes in one search, and $n$ is the number of frames between the initial tracking node and the destination node. In the initial stage of the search, the current position of the object is far from the target position and $\omega$ is relatively large. Thus, the A* association algorithm proceeds quickly at first, and the linear program quickly converges to the vicinity of the target position. In the later stage, the target position is closer and $\omega$ approaches 1. Thus, the A* association algorithm prioritizes precision to reduce the blindness of the search.

2.4. The worst case complexity

Berclaz et al. [13] and Pirsiavash et al. [14] described the KSP and SSP algorithms, respectively. The worst-case complexity of these two algorithms is $O(KN \log N)$, where $K$ is the unknown optimal number of unique tracks and $N$ is the number of frames of the video sequence. Because Pirsiavash used different methods to obtain the solution for different values of $K$, the specific complexity of this algorithm depends on the value of $K$.

Dijkstra's algorithm is recognized as an effective method for computing the shortest path in $O(N \log N)$, and the Bellman-Ford algorithm, at $O(NE)$, is simpler. Unfortunately, our proposed min-cost flow network contains negative edge costs, although the directed weighted graph $G$ contains no negative-weight cycle. Fortunately, we can use the simpler A* association algorithm in this network. For the DAG $G(V, E, C)$, the worst case appears when the current node expands into as many as 3 sub-nodes that are not in the Open list. Because the Open list is emptied in each iteration, the number of nodes in the Open list does not grow. Therefore, for a video sequence of $N$ frames, the worst-case complexity of the proposed algorithm is $O(KN)$, where $K$ is the number of optimal paths found using the A* association algorithm in the DAG. Generally, $K \le 3$. In effect, obtaining the tracking curves with the A* association algorithm amounts to obtaining the optimal solution with a heuristic Dijkstra algorithm. The heuristic nature of the A* association algorithm makes the search more goal-directed and reduces unnecessary calculations.

3. Target localization and long sequence processing

High-quality multi-object tracking requires a reliable tracker, a detector that accurately segments and locates multiple objects, and a pre-processing method that improves the performance of the algorithm.

3.1. Target detection and localization

To obtain accurate targets for the tracker, we establish a background model with the improved codebook algorithm and extract the observed characteristics of the tracked object using the foreground/background subtraction method in [21]. Using the method of [22], we segment individual objects that were initially merged. We then obtain from the detector the probability distributions of the objects in the plane, which serve as the input of the A*AADW algorithm. Some selected frames of the target localization are illustrated in Fig. 3.

Fig. 3. Separating merged objects and locating them with probability distribution

Tracking over the full camera field of view increases the processing time of the algorithm and consumes a significant portion of the limited memory resources. Because most of the estimated presence probabilities are equal to 0, we can exploit this characteristic to reduce the number of nodes and the computational cost. In addition, we limit the potential birth area of targets to reduce the amount of computation. The proposed method also checks the maximum detection probability of each location $k$ within a given spatiotemporal neighborhood of each frame $t$:

$$\max_{j:\ \|j - k\| < \tau_1,\ t - \tau_2 < t_j < t + \tau_2} \rho_j \tag{26}$$

If the value at a location is below the set threshold, no object is considered able to reach that location, and all flows to and from it are removed from the model. This reduces the number of required variables and constraints by an order of magnitude. In our experiment, we pruned the graph with a radius of $\tau_1 = \tau_2 = 3$.

3.2. Long sequence processing

In theory, processing a whole long video sequence with the A*AADW algorithm yields the global optimum over the entire tracking time, but it requires considerable computation time. To address this issue, we split the long sequence into segments of 100 frames, which yields good results with a delay of less than 0.5 seconds between input and output and can be performed in real time.

Fig. 4. Illustration of segment processing of a long video sequence (the long sequence is split into a first, a second and a third segment)
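The segmentation of a long sequence can be sketched as a simple windowing routine. The sketch assumes that consecutive segments share frames, with each new segment starting 10 frames before the previous one ends, and that frame ranges are half-open; the function name and its parameters are illustrative assumptions.

```python
def split_with_overlap(num_frames, segment_len=100, overlap=10):
    """Return half-open (start, end) frame ranges that cover num_frames.

    Each segment after the first starts `overlap` frames before the end of
    the previous one, so neighboring segments share those frames.
    """
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be non-negative and smaller than segment_len")
    segments = []
    start = 0
    while start < num_frames:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start = end - overlap
    return segments


segments = split_with_overlap(250)
```

For a 250-frame sequence this yields three segments, the last of which is shorter than 100 frames. The shared frames are where the flow-consistency constraint between segments would be imposed.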

To maintain temporal consistency across segments, we use multi-frame overlay, as shown in Fig. 4, and add the last 10 frames of the previously optimized segment to the first 10 frames of the current one. We then force the sum of the flows out of every location in the first 10 frames of the current segment to be consistent with the total flows out of the last locations of the objects in the last 10 frames of the previous one. This effectively solves the problem of losing targets at the segment boundaries.

$$\forall k \in \{1, \dots, K\}:\quad \sum_{j \in N(i)} \varphi^{1}_{i,j} = \sum_{i \in N(k)} \varphi^{1}_{k,i} = \Phi_k \tag{27}$$

where $\Phi_k$ is the total flow at the last position $k$ of an object appearing in the last 10 frames of the previous segment. For the corresponding first position $j$ of an object appearing in the first 10 frames of the current segment, the total flow into it is equal to the flow out of position $k$ and also equal to the total flow out of any potential position $i$ of any object between $k$ and $j$. This is implemented as an additional constraint in our framework. If we cannot find the tracked target in the first 10 frames of the current segment, the proposed method searches for the object in the $t'$ frames after the current one. In our experiment, we set $t' = 10$. If the tracked target is found in one of these $t'$ frames, that frame becomes the first frame of the current segment; otherwise the tracking fails.

4. Experimental results

In our simulation, video sequences with different characteristics were selected from the PETS09, CAVIAR, BEHAVEDATA and ETHMS datasets; their challenges are summarized in Table 1. The selected sequences cover almost all the problems that commonly occur in multi-object tracking.

Table 1. The challenges of the experimental sequences
Sequences: Multipleflow, OneStopMoveEnter, Fightmargaret, CrowdS2view8, Seq04left
Challenge categories: Scaling, Occ, Pose, Clutter, Ill, Dynamic background, Blur

4.1. Parameter setting

In the training period, a detector is built with the background subtraction method of the improved codebook model. We combine the detection result with the activity scope of the object, updated in real time by foreground/background segmentation, and estimate the location of the object with high probability. Because the size of the activity scope and the number of pixels of the object differ between sequences, our method generates 900-1000 detections per frame in each video sequence. We set the log-likelihood ratio of each detection to the negative score of the linear detector. We use a bounded dynamic model: the cost $c_{i,j}$ between two locations of consecutive frames that overlap spatially (i.e., an object remains at a location) is defined as 0. The costs from the virtual positions to the neighborhoods of birth and end are $c_{source,birth} = 10$ and $c_{end,sink} = 10$, respectively.

4.2. Evaluation metrics

Let $GT_{i,t}$ be the $i$-th ground truth bounding box in the $t$-th frame and $TR_{i,t}$ the corresponding tracked bounding box. The overlap $C_{i,t}$ of the $i$-th object in the $t$-th frame is defined as the ratio between the area of the intersection $GT_{i,t} \cap TR_{i,t}$ and the area of the union $GT_{i,t} \cup TR_{i,t}$ [23]:

$$C_{i,t} = \frac{AREA(GT_{i,t} \cap TR_{i,t})}{AREA(GT_{i,t} \cup TR_{i,t})} \tag{28}$$

In our experiment, we set the threshold of $C_{i,t}$ to 0.5, which means that tracking is considered successful when the overlap ratio of the ground truth bounding box and the tracked bounding box exceeds 0.5. Our results are evaluated using the Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP) of the standard CLEAR2006 metrics [24]:

$$MOTA = 1 - \frac{\sum_t \left( c_m(m_t) + c_f(fp_t) + c_s \right)}{\sum_t g_t} \tag{29}$$

$$MOTP = \frac{\sum_{i,t} C_{i,t}}{\sum_t Nm_t} \tag{30}$$

where $g_t$ is the number of ground truth objects in the $t$-th frame, $Nm_t$ is the number of mapped objects in the $t$-th frame, $m_t$ is the missed detection count and $fp_t$ is the false positive count for each frame. $c_s = \log(\text{ID-SWITCHES}_t)$, where $\text{ID-SWITCHES}_t$ is the number of ID mismatches in frame $t$ with respect to the mapping in frame $t - 1$. We start the count from 1 because of the log function. $c_m$ and $c_f$ are the cost functions for missed detections and false positives, respectively. The values used for the weighting functions in Eq. 29 were $c_m = c_f = 1$. Fig. 5 shows the histograms of MOTA and MOTP in the experiment using the A*AADW algorithm.
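The overlap ratio and the two metrics can be computed in a few lines. The sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, unit weights for misses and false positives, and a base-10 logarithm for the ID-switch term (the text writes only "log", so the base is an assumption). ID-switch counts start at 1, so a frame without switches contributes nothing. All function names are illustrative.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2), as in Eq. 28."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def mota(misses, false_pos, id_switches, ground_truth, c_m=1.0, c_f=1.0):
    """MOTA over per-frame lists; id_switches entries are counted from 1."""
    errors = sum(c_m * m + c_f * fp + math.log10(s)
                 for m, fp, s in zip(misses, false_pos, id_switches))
    return 1.0 - errors / sum(ground_truth)


def motp(overlaps_per_frame):
    """MOTP: summed overlap of mapped pairs divided by the number of mapped pairs."""
    mapped = sum(len(frame) for frame in overlaps_per_frame)
    total = sum(sum(frame) for frame in overlaps_per_frame)
    return total / mapped if mapped else 0.0
```

For example, two 2 x 2 boxes shifted by one pixel horizontally overlap with ratio 1/3, which is below the 0.5 threshold and would therefore not count as a successful match.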

Fig. 5. MOTA (a) and MOTP (b) measures applied to the results of the recent state-of-the-art methods (Zhang's method 2, KSP, SSP+DP) and the proposed tracker on the experimental sequences (OneStopMoveEnter, Multipleflow, Seq04left, Fightmargaret, CrowdS2view8)

4.3. Results analysis

To ensure a unique identification for each tracked target, we use different colors to indicate the order. The video sequences for our experiment are those listed in Table 1. The detection results obtained by the method of §3.1 serve as the input of our algorithm. We then test the performance of multi-object tracking under false positives, false negatives and a dynamic background, respectively.

Performance test for false negatives: The sequences are Multipleflow and CrowdS2view8 from the PETS09 dataset. Typical results are shown in Fig. 6 and Fig. 7. In particular, the former uses pedestrians wearing bright yellow coats as the tracked objects. Although the probability of false negatives increases greatly because of occlusion by non-tracked objects, the proposed algorithm ensures persistent tracking of each object throughout the whole tracking process (the color of the tracking box does not change). The experiment on CrowdS2view8 verifies the robustness of the proposed algorithm when targets leave through an unrestricted exit area and reappear shortly afterwards.

Fig 6 The typical results of Multipleflowview1 (Frame: 11, 16, 22, 42, 49, 75)

Fig 7 The typical results of CrowdS2view8 (Frame: 13, 24, 38, 56, 66, 94)

Performance test for false positives: The sequences are Fightmargaret of the BEHAVEDATA dataset and OneStopMoveEnter of the CAVIAR dataset. Typical results are shown in Fig. 8 and Fig. 9. We used the method of §3.1 for detection and localization. The detection accuracy of this method is not greatly improved compared with the POM of [11], but because of the superior solving ability and interference resistance of the A*AADW algorithm, we can still track multiple objects stably and in a timely manner in the presence of false positives.

Fig 8 The results of Fightmargaret (Frame: 700, 749, 770, 810, 960, 1000)

Fig 9 The results of OneStopMoveEnter (Frame: 182, 206, 224, 244, 261, 284)

Performance test for dynamic background: The experiment sequence must satisfy two conditions:
1) The available probability distribution of the dynamic background of the sequence needs to be relatively consistent. Only in this way can the algorithm quickly obtain the location of an object for tracking.
2) The targets should have fixed access areas in the tracking ground. Because the tracking ground is moving, the potential area in which objects can enter and exit changes. We therefore require the borders of the camera field of view to be the area through which all objects enter and exit.

The sequence is Seq04left from the ETHMS dataset. We obtain the object characteristics by combining skin color with the method of [25], and we show typical results in Fig. 10. The detection and localization method of §3.1 only considers the available probability distribution of the target characteristic in the tracking ground and does not relate to the background conditions. Therefore, the video sequence for our experiment requires a consistent available probability distribution. This constraint limits, to some extent, the conditions under which performance against a dynamic background can be tested, but it does not affect the conclusion that multi-object tracking using the A*AADW algorithm in a dynamic background is robust.

Fig 10 The results of Seq04left (Frame:20,90,100,168,220,250)

4.4. Simulation analysis

All of the above experiments were performed on a Windows XP PC equipped with a 2.7 GHz Pentium(R) Dual-Core CPU and 8 GB of memory. The software platform was Visual Studio 2010 and OpenCV 2.2.

[Figure 11: two line plots of average tracker errors (pixel) versus frames; legend: Zhang's method 2, KSP, SSP+DP, Proposed]

Fig 11 The comparison of the average tracking errors on CrowdS2view8 of the PETS09 dataset (a) and Fightmargaret of the BEHAVEDATA dataset (b)

We compare the proposed algorithm with Zhang’s method 2 [12], Berclaz’s KSP [13] and Pirsiavash’s SSP [14] on S2L1view8 of the PETS09 dataset and Fightmargaret of the BEHAVEDATA dataset in terms of the average tracking error; the results are shown in Fig. 11. We also compare the algorithms in terms of tracking accuracy. Fig. 12 shows the detection rate versus false positives per image (FPPI) for the above algorithms.
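As a rough illustration of how such a curve can be produced, the sketch below sweeps a confidence threshold over pooled detections and reports one (FPPI, detection rate) point per threshold. The threshold sweep, the function name `fppi_curve` and the `(score, is_true_positive)` input format are assumptions made for this example; the text does not state how the curves in Fig. 12 were generated.

```python
def fppi_curve(detections, num_ground_truth, num_frames, thresholds):
    """Return a list of (FPPI, detection rate) points, one per threshold.

    detections: (score, is_true_positive) pairs pooled over all frames,
    where is_true_positive marks a detection matched to a ground truth object.
    """
    if num_ground_truth <= 0 or num_frames <= 0:
        raise ValueError("num_ground_truth and num_frames must be positive")
    points = []
    for thr in thresholds:
        kept = [is_tp for score, is_tp in detections if score >= thr]
        true_pos = sum(kept)              # matched detections above the threshold
        false_pos = len(kept) - true_pos  # unmatched detections above the threshold
        points.append((false_pos / num_frames, true_pos / num_ground_truth))
    return points

# Toy data: five detections over 10 frames containing 4 ground truth objects.
toy_detections = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.3, False)]
for fppi, rate in fppi_curve(toy_detections, 4, 10, [0.75, 0.5, 0.2]):
    print(f"FPPI = {fppi:.2f}, detection rate = {rate:.2f}")
```

Lowering the threshold moves along the curve toward higher detection rates at the price of more false positives per image, which is the trade-off plotted in Fig. 12.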

[Figure 12: two plots of detection rate versus false positives per image, with a logarithmic horizontal axis from 10^-3 to 10^0; legend: SSP+DP, KSP, Proposed]

Fig 12 Detection rate versus false positives per image on CrowdS2view8 of the PETS09 dataset (a) and Fightmargaret of the BEHAVEDATA dataset (b)

Fig. 11 shows that the tracking errors of these algorithms do not differ significantly in the absence of clutter and varying illumination. However, when an object is tracked for a long time in the presence of false positives and false negatives, our proposed algorithm shows an obvious advantage. Although Zhang’s method 2 can handle the occupancy problem under simple assumptions, when false negatives and false positives occur frequently, the assumptions it requires result in omissions and eventually lead to tracking failure. As illustrated in Fig. 12, when the target detection rates are similar, the A*AADW algorithm, which emphasizes search speed and performance, controls FPPI better than KSP and SSP. The superiority of the proposed algorithm stems from the heuristic search of the A* association algorithm, which finds the global optimal solution faster.
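To make the idea of a heuristic search with depth-dependent weights concrete, the sketch below implements a classical dynamic-weighting A* search (in the style of Pohl) on a small hand-made graph. It is not the A*AADW algorithm: the weight schedule, the parameters `epsilon` and `depth_bound`, the function name and the toy graph are all assumptions chosen for illustration, and the weights actually used by A*AADW are those defined in the algorithm section of the paper.

```python
import heapq

def dynamic_weight_astar(graph, h, start, goal, epsilon=0.5, depth_bound=10):
    """Best-first search with priority f = g + w(depth) * h.

    graph: dict mapping node -> list of (neighbor, edge_cost)
    h:     dict mapping node -> heuristic estimate of the cost to the goal
    The weight w starts at 1 + epsilon and decays to 1 at depth_bound.
    Returns (path_cost, path) or None if the goal is unreachable.
    """
    def priority(g, node, depth):
        w = 1.0 + epsilon * max(0.0, 1.0 - depth / depth_bound)
        return g + w * h[node]

    open_heap = [(priority(0.0, start, 0), 0.0, 0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        _, g, depth, node, path = heapq.heappop(open_heap)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale heap entry: a cheaper route to node is known
        for nxt, cost in graph.get(node, []):
            g_next = g + cost
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                entry = (priority(g_next, nxt, depth + 1), g_next,
                         depth + 1, nxt, path + [nxt])
                heapq.heappush(open_heap, entry)
    return None

# Toy graph with an admissible heuristic (hypothetical values).
toy_graph = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 2.0), ("t", 5.0)],
    "b": [("t", 1.0)],
    "t": [],
}
toy_h = {"s": 3.0, "a": 2.0, "b": 1.0, "t": 0.0}
print(dynamic_weight_astar(toy_graph, toy_h, "s", "t", epsilon=0.5, depth_bound=3))
```

With `epsilon = 0` the search reduces to plain A*. A larger `epsilon` pushes the search harder toward the goal early on, which typically expands fewer nodes but may return a path that is not the cheapest one.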

We compare the detection rate and false positives of our A*AADW algorithm with the above state-of-the-art methods on the ETHMS dataset and the CAVIAR dataset in Table 2. A*AADW attains the lowest false positives per image on both datasets (0.77 on ETHMS and 0.051 on CAVIAR), while its detection rate equals that of KSP on ETHMS (77.2%) and is close to it on CAVIAR (89.4% versus 89.8%). In addition, as shown in Fig. 13, the run time of the A*AADW algorithm is significantly shorter than that of the other three state-of-the-art algorithms.

Table 2 Our algorithm performance compared with the previous state-of-the-art for the ETHMS and CAVIAR datasets

Dataset   Algorithm          Detection rate (%)   False positives per image (%)
ETHMS     Zhang’s method 2   70.4                 0.97
ETHMS     KSP                77.2                 0.86
ETHMS     SSP+DP             72.5                 0.89
ETHMS     A*AADW             77.2                 0.77
CAVIAR    Zhang’s method 2   76.4                 0.105
CAVIAR    KSP                89.8                 0.057
CAVIAR    SSP+DP             84.4                 0.636
CAVIAR    A*AADW             89.4                 0.051

4.5. Run time

We evaluate the speed of our proposed tracking algorithm on the 25 fps video sequences of the BEHAVEDATA dataset. The run-time curves of A*AADW and the above state-of-the-art algorithms are shown in Fig. 13, where the vertical axis (run time) is plotted on a log scale. The solver of Zhang’s method 2 does not converge even after a long running time. For a video of 1000 frames, the KSP solver takes approximately 20 seconds and SSP takes 0.9 seconds, whereas our A*AADW solver takes only 0.15 seconds.
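These timings can be related to the 25 fps frame rate with a short calculation. The sketch below uses only the figures quoted above; the helper name `realtime_factor` is hypothetical, and the calculation assumes that the quoted times cover the association solver alone and not detection.

```python
FPS = 25           # frame rate of the BEHAVEDATA sequences
NUM_FRAMES = 1000  # length of the video quoted in the text

# Solver times in seconds for 1000 frames, as reported above.
solver_seconds = {"KSP": 20.0, "SSP": 0.9, "A*AADW": 0.15}

def realtime_factor(solver_time, num_frames=NUM_FRAMES, fps=FPS):
    """Ratio of video duration to solver time (above 1 means faster than real time)."""
    video_seconds = num_frames / fps
    return video_seconds / solver_time

for name, seconds in solver_seconds.items():
    per_frame_ms = 1000.0 * seconds / NUM_FRAMES
    print(f"{name}: {per_frame_ms:.2f} ms per frame, "
          f"{realtime_factor(seconds):.1f}x real time")
```

A 1000-frame clip at 25 fps lasts 40 seconds, so the KSP solver runs at about twice real time, whereas the 0.15 seconds of A*AADW correspond to 0.15 ms per frame.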

[Figure 13: run time (seconds, log scale) versus frames, from 0 to 2000 frames; legend: Proposed, Zhang's method 2, KSP, SSP+DP]

Fig 13 The comparison of the run time

5. Conclusions

To address the false positives and false negatives of multi-object tracking in environments with varying illumination and clutter, we proposed a more reliable tracker within the flow network framework. Within the min-cost flow framework established by integer programming theory, we combined the A* algorithm with dynamic weights to develop the A*AADW algorithm. We used this novel algorithm to relax the integer assumption and to identify the global optimal solution successfully. The resulting algorithm better handles short-time false positives and false negatives in multi-object tracking, and it is strongly robust. The highlight of this paper is that the global optimal solution of the relaxed LP can be found more quickly using A*AADW. The experimental results indicate that the proposed algorithm helps to improve trajectory consistency, to handle serious occlusion between multiple objects and to overcome interference from varying illumination, and that it also satisfies the real-time requirement. Compared with other similar state-of-the-art algorithms, A*AADW has obvious advantages. Tracking multiple different types of targets in scenes with a dynamic background in real time will be the focus of future investigations.

Acknowledgements

This work is jointly supported by the National Natural Science Foundation of China (Grant No. 61372090).

References

1. Kim I S, Choi H S, Yi K M, Choi J Y, Kong S G. Intelligent visual surveillance-a survey. International Journal of Control, Automation, and Systems, 2010, 8(5): 926-939
2. Hou Zhi-Qiang, Han Chong-Zhao. A survey of visual tracking. Acta Automatica Sinica, 2006, 32(4): 603-617
3. Jiang Ming-Xin, Wang Hong-Yu, Liu Xiao-Kai. A Multi-target Tracking Algorithm Based on Multiple Cameras. Acta Automatica Sinica, 2012, 38(4): 531-539
4. Zhou Hongren. A survey of multiple targets tracking technique. ACTA Aeronautica et Astronautica Sinica, 1986, 7(1): 1-10
5. Yu Q, Medioni G. Multiple-target tracking by spatiotemporal monte carlo markov chain data association. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(12): 2196
6. Serratosa F, Alquezar R, Amezquita N. A probabilistic integrated object recognition and tracking framework. Expert Systems with Applications, 2012, 39(8): 7302
7. E Maggio, M Taj, A Cavallaro. Efficient multi-target visual tracking using random finite sets. IEEE Transactions on Circuits and Systems for Video Technology, 2008, 18(8): 1016
8. Sharp I, Yu K, Sathyan T. Positional accuracy measurement and error modeling for mobile tracking. IEEE Transactions on Mobile Computing, 2012, 11(6): 1021
9. J Giebel, D Gavrila, C Schnorr. A bayesian framework for multi-cue 3D object tracking. In: Proceedings of European Conference on Computer Vision. Prague, Czech Republic: Springer Verlag, 2004. 241-252
10. Perera A G A, Srinivas C, Hoogs A, Brooksby G, Wensheng Hu. Multi-object tracking through simultaneous long occlusions and split-merge conditions. In: Proceedings of 24th IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2006. 666-673
11. Fleuret F, Berclaz J, Lengagne R, Fua P. Multi-camera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 267-282
12. L Zhang, Y Li, R Nevatia. Global Data Association for Multi-Object Tracking Using Network Flows. In: Proceedings of 26th IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA, 2008. 342-349
13. Berclaz J, Fleuret F, Tueretken E, Fua P. Multiple Object Tracking Using K-Shortest Paths Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(9): 1806-1819
14. Hamed Pirsiavash, Deva Ramanan, Charless C. Fowlkes. Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects. In: Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs: IEEE, 2011. 1201-1208
15. B. Aghezzaf, T. Ouaderhman. An interactive interior point algorithm for multi objective linear programming problems. Operations Research Letters, 2001, 29(4): 163-170
16. Gonzalez-Lima Maria D., Oliveira Aurelio R. L., Oliveira Danilo E. A robust and efficient proposal for solving linear systems arising in interior-point methods for linear programming. Computational Optimization and Applications, 2013, 56(3): 573-597
17. Wei Jingxuan, Zhang Mengjie. Simplex model based evolutionary algorithm for dynamic multiobjective optimization. In: Proceedings of AI 2011: Advances in Artificial Intelligence. Western Australia: Springer Verlag, 2011. 372-381
18. Khan Izaz Ullah, Ahmad Tahir, Maan Normah. A simplified novel technique for solving fully fuzzy linear programming. Journal of Optimization Theory and Applications, 2013, 159(2): 536-546
19. Dimitri P B. Convex Optimization Theory [M]. Beijing: Tsinghua University Press, 2011.
20. Wang Guiping, Wang Yan, Ren Jiachen. Graph Theory, implementation and application [M]. Beijing: Peking University Press, 2011.
21. Mohamad Hoseyn Sigari, Mahmood Fathy. Real-time background modeling/subtraction using two-layer codebook model. In: Proceedings of the International Multi-conference of Engineers and Computer Scientists 2008. Hong Kong, P.R. China, 2008: 19-21
22. Bugeau A, Perez P. Track and Cut: Simultaneous Tracking and Segmentation of Multiple Objects with Graph Cuts. EURASIP Journal on Image and Video Processing, 2008, Article 317278: 1
23. Huaping Liu, Mingyi Yuan, Fuchun Sun, Jianwei Zhang. Spatial Neighborhood-Constrained Linear Coding for Visual Object Tracking. IEEE Transactions on Industrial Informatics, 2014, 10(1): 469-480
24. Rangachar Kasturi, Dmitry Goldgof, Padmanabhan Soundararajan, Vasant Manohar, John Garofolo, Rachel Bowers, Matthew Boonstra, Valentina Korzhova, Jing Zhang. Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 319-335
25. Guan Chen-Ning, Juang Chia-Feng, Chen Guo-Cyuan. Face localization using fuzzy classifier with wavelet-localized focus color features and shape features. Digital Signal Processing, 2012, 22(6): 961-970