Particle swarm optimization–Markov Chain Monte Carlo for accurate visual tracking with adaptive template update

Junseok Kwon
School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea
E-mail address: [email protected]
Highlights
• A PSO–MCMC sampler is proposed, which can simultaneously infer the target state and template.
• Using PSO–MCMC, a novel tracking system is developed that is superior to a similar tracker, PMCMC, in Kwon et al. (2016).
• An adaptive updating strategy for target templates is presented, which better describes target appearances.
Article info
Article history: Received 1 May 2017; Received in revised form 29 November 2018; Accepted 7 April 2019; Available online xxxx.
Keywords: Particle swarm optimization; Markov chain Monte Carlo; Visual tracking; Adaptive template update
Abstract

A novel tracking method is proposed, which infers a target state and appearance template simultaneously. With this simultaneous inference, the method accurately estimates the target state and robustly updates the target template. The joint inference is performed by using the proposed particle swarm optimization–Markov chain Monte Carlo (PSO–MCMC) sampling method. PSO–MCMC is a combination of particle swarm optimization (PSO) and Markov chain Monte Carlo (MCMC) sampling, in which the PSO evolutionary algorithm and MCMC aim to find the target state and appearance template, respectively. The PSO can handle multi-modality in the target state and is therefore superior to a standard particle filter. Thus, PSO–MCMC achieves better tracking accuracy than the recently proposed particle MCMC. Experimental results demonstrate that the proposed tracker adaptively updates the target template and outperforms state-of-the-art tracking methods on a benchmark dataset.
1. Introduction

Visual tracking aims to find the exact position and scale of a target in each frame. Recently, many algorithms have achieved great success in visual tracking in real-world environments [1–8]. These algorithms are able to handle several types of appearance changes in the target, including illumination changes, occlusion, and pose variations. However, if these changes are very extreme, the algorithms cannot follow them and fail to track the target. In order to track a target with severe appearance changes, several methods [9–11] update the appearance model of the target over time and use the enhanced appearance model to improve visual tracking accuracy.

The aforementioned tracking methods [9–11] assume that the updated appearance model is better at describing the current target appearance than any previous models. However, this assumption encounters the following fundamental problem. In these methods, the appearance model is typically updated based on tracking results up to the current frame. However, because
visual trackers cannot perfectly track the target, the tracking results inevitably include errors. These errors force the appearance model to be updated incorrectly. As a result, the updated appearance model cannot accurately describe the current target and causes the tracker to fail.

The preliminary study [12] estimates the target state and the ground plane position simultaneously. Conventional methods such as expectation–maximization (EM) [13] and Gibbs sampling [14] first estimate the target state and then find the ground plane position based on the estimated target state. The ground plane is then used to track the target. This process repeats until the best target state and ground plane position are found. Recently, a Bayesian algorithm [15] was proposed to estimate the target state and ground plane position iteratively. However, this approach encounters an error propagation problem: an erroneous target state produces a bad ground plane, which then re-contaminates the tracking state. The method in [12] solves this error propagation problem by jointly estimating the target state and ground plane position. It uses the particle MCMC, which was proposed in [16] and recently used for visual tracking in [17]. The particle MCMC consists of two stochastic algorithms: a particle filter and MCMC. The particle filter infers the target state and the
MCMC estimates the ground plane in [12]. However, the particle filter used in [12,17] has shown poor inference ability when the target state is represented by a multi-modal distribution.

PSO methods have been widely used for visual tracking problems. Ren et al. [18] proposed a multi-task particle swarm optimization algorithm for visual tracking. The method consists of several modules, namely, a PSO-based tracking module, a discovery module, and a contour module, which track the existing targets, detect new targets, and determine the contours of targets, respectively. Hu et al. [19] presented a probabilistic variant of PSO, namely, a resampling cellular quantum-behaved PSO for visual tracking. This method can better balance the global and local search than traditional PSO algorithms. While the aforementioned methods show accurate tracking results, the proposed method further improves the PSO optimization algorithm by adding an MCMC sampling method. The method can jointly infer multiple variables (i.e., configuration and appearance templates), whereas traditional methods handle a single variable (i.e., configuration).

In order to solve the aforementioned problems of the particle MCMC and the PSO, a new joint inference method called particle swarm optimization–MCMC (PSO–MCMC) is proposed. PSO [20–24] is a population-based stochastic optimization technique that has achieved great success in handling nonlinear, non-differentiable, and multi-modal distributions. For these reasons, PSO typically yields more accurate tracking results than a traditional particle filter. PSO is combined with the MCMC sampling method, using PSO to accurately infer the target state and the MCMC method to adaptively determine the target template for appearance updating. The proposed method simultaneously finds the best target state and target template using PSO–MCMC. Fig. 1 illustrates the core concept and shows the best target state and appearance template found by the proposed method.

The objective function for the target state and template inference is presented in Section 2. The PSO–MCMC algorithm that finds a solution for the objective function is discussed in Section 3. In Section 4, the PSO–MCMC based tracking system is described. Section 5 discusses the experimental settings and results, and Section 6 concludes the paper.

2. Joint inference of the target state and template

The proposed visual tracking algorithm aims to simultaneously find the best state \hat{X}_t and target appearance template \hat{T}_t given the observations up to time t, Y_{1:t}. In order to achieve this goal, a joint posterior p(X_t, T_t | Y_{1:t}) is modeled, and the \hat{X}_t and \hat{T}_t that maximize p(X_t, T_t | Y_{1:t}) are searched for:

\hat{X}_t, \hat{T}_t = \arg\max_{X_t, T_t} p(X_t, T_t | Y_{1:t}),    (1)
where X_t = {x_t, y_t, s_t} denotes the (x, y)-position and scale of the target at time t, and T_t denotes the target appearance template. As shown in (1), X_t and T_t are represented as a joint probability. Thus, they should be inferred simultaneously for accurate estimation.

\hat{X}_t and \hat{T}_t can typically be found by using the MCMC sampling method [25]. The MCMC method obtains multiple samples and uses them to describe p(X_t, T_t | Y_{1:t}). The sample that maximizes p(X_t, T_t | Y_{1:t}) is then selected as \hat{X}_t and \hat{T}_t. For this purpose, the MCMC operates in two steps: the proposal step and the acceptance step. In the proposal step, the MCMC proposes a new sample, X_t^{new}, T_t^{new}, based on the current sample, X_t, T_t:

Q(X_t^{new}, T_t^{new}; X_t, T_t) = N\big((X_t^{new}, T_t^{new}); (X_t, T_t), \Sigma^2\big),    (2)
where Q(\cdot) denotes a proposal function. The proposal function is typically modeled by using a normal distribution N with mean (X_t, T_t) and variance \Sigma^2. The MCMC then determines whether the proposed sample (X_t^{new}, T_t^{new}) is acceptable by using the following acceptance ratio:

\alpha_{MCMC} = \min\left[1, \frac{p(X_t^{new}, T_t^{new} | Y_{1:t}) \, Q\big((X_t, T_t); (X_t^{new}, T_t^{new})\big)}{p(X_t, T_t | Y_{1:t}) \, Q\big((X_t^{new}, T_t^{new}); (X_t, T_t)\big)}\right],    (3)

where the MCMC accepts the new sample with the acceptance probability \alpha_{MCMC}. With these two steps, the MCMC obtains N samples of X_t^{(n)} for n = 1, ..., N, and M samples of T_t^{(m)} for m = 1, ..., M.

However, the joint posterior p(X_t, T_t | Y_{1:t}) in (3) resides in a higher-dimensional space than a conventional single posterior such as p(X_t | Y_{1:t}). In order to describe a higher-dimensional space, the MCMC typically requires additional samples. However, because of the additional computational cost this incurs, the number of samples is limited in practice. Alternatively, a more sophisticated sampling strategy can be developed, which efficiently describes the joint posterior with a limited number of samples. It is also more difficult to implement the joint proposal distribution N\big((X_t^{new}, T_t^{new}); (X_t, T_t), \Sigma^2\big) in (2) compared to a conventional proposal distribution such as N(X_t^{new}; X_t, \Sigma^2): in the joint proposal distribution, proposing X_t and T_t simultaneously is non-trivial. The following section explains how these issues are solved by the proposed PSO–MCMC method.
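For reference, a minimal Python sketch of the standard Metropolis–Hastings procedure in (2)–(3), applied directly to the joint sample (X_t, T_t), is given below. The function joint_posterior and the proposal standard deviations are hypothetical placeholders rather than the implementation used in this paper; with a symmetric Gaussian proposal, the Q terms in (3) cancel.

import numpy as np

def metropolis_hastings_joint(joint_posterior, x0, t0, n_iter=1000,
                              sigma_x=1.0, sigma_t=0.05, rng=None):
    """Standard MCMC over the joint sample (X_t, T_t), as in (2)-(3).

    joint_posterior(x, t) must return a value proportional to
    p(X_t, T_t | Y_1:t); x is a state vector and t a template array.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = np.asarray(x0, float), np.asarray(t0, float)
    p_cur = joint_posterior(x, t)
    samples = []
    for _ in range(n_iter):
        # Proposal step: perturb the state and template jointly, as in (2).
        x_new = x + rng.normal(0.0, sigma_x, size=x.shape)
        t_new = t + rng.normal(0.0, sigma_t, size=t.shape)
        p_new = joint_posterior(x_new, t_new)
        # Acceptance step: the Gaussian proposal is symmetric, so the ratio
        # in (3) reduces to p(X_new, T_new | Y) / p(X, T | Y).
        if rng.uniform() < min(1.0, p_new / max(p_cur, 1e-12)):
            x, t, p_cur = x_new, t_new, p_new
        samples.append((x.copy(), t.copy()))
    return samples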
3. PSO-MCMC sampling method

The PSO–MCMC method combines PSO [20] with MCMC [25]. The optimality of this combination can be verified mathematically. PSO–MCMC can describe a high-dimensional posterior p(X_t, T_t | Y_{1:t}) and efficiently obtain samples from it: the PSO obtains samples of X_t, while the MCMC simultaneously obtains samples of T_t. PSO–MCMC improves the particle MCMC proposed in [16] by replacing the particle filter with PSO. Note that this substitution is non-trivial; its details are described in the following sections.

PSO–MCMC consists of two steps, the proposal and acceptance steps, similar to the standard MCMC described in Section 2. In the proposal step, a new state X_t^{new} and appearance template T_t^{new} are proposed by the joint proposal function Q(X_t^{new}, T_t^{new}; X_t, T_t). Because it is difficult to implement a joint proposal distribution in practice, the joint distribution is divided into two individual distributions, each of which is represented by a single variable, X_t^{new} or T_t^{new}:

Q(X_t^{new}, T_t^{new}; X_t, T_t) = Q(T_t^{new}; T_t) \times p(X_t^{new} | T_t^{new}, Y_{1:t}),    (4)

where Q(T_t^{new}; T_t) proposes T_t^{new} based on T_t, and p(X_t^{new} | T_t^{new}, Y_{1:t}) proposes X_t^{new} based on T_t^{new}.

In the acceptance step, the proposed X_t^{new} and T_t^{new} are either accepted or rejected based on the acceptance ratio:

\frac{p(X_t^{new}, T_t^{new} | Y_{1:t}) \, Q\big((X_t, T_t); (X_t^{new}, T_t^{new})\big)}{p(X_t, T_t | Y_{1:t}) \, Q\big((X_t^{new}, T_t^{new}); (X_t, T_t)\big)}
= \frac{p(X_t^{new} | T_t^{new}, Y_{1:t}) \, p(T_t^{new} | Y_{1:t}) \, Q(T_t; T_t^{new}) \, p(X_t | T_t, Y_{1:t})}{p(X_t | T_t, Y_{1:t}) \, p(T_t | Y_{1:t}) \, Q(T_t^{new}; T_t) \, p(X_t^{new} | T_t^{new}, Y_{1:t})}
= \frac{p(T_t^{new} | Y_{1:t}) \, Q(T_t; T_t^{new})}{p(T_t | Y_{1:t}) \, Q(T_t^{new}; T_t)},    (5)

where the second equality holds because of (4). In (5), p(X_t, T_t | Y_{1:t}) is reduced to p(T_t | Y_{1:t}), while Q\big((X_t^{new}, T_t^{new}); (X_t, T_t)\big) is reduced to Q(T_t^{new}; T_t). Thus, both the high-dimensionality problem in p(X_t, T_t | Y_{1:t}) and the implementation problem in Q\big((X_t^{new}, T_t^{new}); (X_t, T_t)\big) are resolved. Although the joint distribution p(X_t, T_t | Y_{1:t}) is reduced to the marginalized distribution p(T_t | Y_{1:t}) in (5), the samples obtained by PSO–MCMC still describe the original posterior p(X_t, T_t | Y_{1:t}); this is proven mathematically in [16]. The MCMC step for appearance template estimation is explained in Section 3.1, and the PSO algorithm for target state inference is described in Section 3.2.
Fig. 1. Core concept of the proposed method. The method simultaneously finds the target state and appearance template. Conventional methods consider finding the target state and estimating the appearance template as separate processes.
3.1. Markov chain Monte Carlo in PSO-MCMC

The MCMC in PSO–MCMC proposes a new appearance template T_t^{new} via the proposal function Q(T_t^{new}; T_t) derived in (4). The sampled T_t^{new} is accepted with the acceptance probability \alpha_{PSO-MCMC}:

\alpha_{PSO-MCMC} = \min\left[1, \frac{p(T_t^{new} | Y_{1:t}) \, Q(T_t; T_t^{new})}{p(T_t | Y_{1:t}) \, Q(T_t^{new}; T_t)}\right].    (6)
In (6), the proposal function is implemented as

Q(T_t^{new}; T_t) = N(T_t^{new}; T_t, \Sigma_T^2),    (7)
where N is a normal distribution with mean T_t and variance \Sigma_T^2. In (7), T_t is a two-dimensional image patch with three channels; thus, N should be a multivariate distribution. However, for more efficient computation, each pixel in T_t is proposed independently with a univariate distribution. It is assumed that the target appearance changes smoothly over time. With this assumption, the proposed appearance template becomes more accurate:

T_t^{new} = c_1 T_t^{new'} + c_2 \hat{T}_{t-1} + c_3 \hat{T}_1,    (8)

where c_1, c_2, and c_3 are weighting parameters, T_t^{new'} denotes the raw template proposed by (7), and \hat{T}_{t-1} and \hat{T}_1 are the best appearance templates at frames t − 1 and 1, respectively.
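A minimal sketch of the template proposal in (7)–(8), assuming the template is stored as a floating-point image array; the function name propose_template and its defaults are illustrative only (the default variance and weights follow the settings reported later in Section 5.1).

import numpy as np

def propose_template(T_cur, T_best_prev, T_best_first,
                     var_T=5.0, c=(0.5, 0.25, 0.25), rng=None):
    """Propose T_t^new: per-pixel Gaussian perturbation as in (7), followed by
    the weighted combination with the best templates at t-1 and 1 as in (8)."""
    rng = np.random.default_rng() if rng is None else rng
    # (7): each pixel is perturbed independently by a univariate Gaussian.
    T_raw = T_cur + rng.normal(0.0, np.sqrt(var_T), size=T_cur.shape)
    # (8): smooth the raw proposal with the previous and initial best templates.
    c1, c2, c3 = c
    return c1 * T_raw + c2 * T_best_prev + c3 * T_best_first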
The remaining task is to design p(T_t | Y_{1:t}) from (6):

p(T_t^{new} | Y_{1:t}) = \frac{1}{N} \sum_{n=1}^{N} p(T_t^{new}, X_t^{(n)} | Y_{1:t}),    (9)
where X_t^{(n)} is the nth sample obtained by PSO, which is explained in the following section. Because p(T_t^{new}, X_t^{(n)} | Y_{1:t}) \propto p(Y_{1:t} | T_t^{new}, X_t^{(n)}) \, p(T_t^{new}, X_t^{(n)}), the likelihood p(Y_{1:t} | T_t^{new}, X_t^{(n)}) is designed instead of p(T_t^{new}, X_t^{(n)} | Y_{1:t}):

p(Y_{1:t} | T_t^{new}, X_t^{(n)}) = e^{-\lambda \times f\big(Y_t(X_t^{(n)}), \, T_t^{new}\big)},    (10)

where \lambda is a parameter and Y_t(X_t^{(n)}) indicates the image patch described by the configuration X_t^{(n)}. In (10), Y_t(X_t^{(n)}) is compared with the reference appearance template T_t^{new} by using a similarity measure f(\cdot). In this paper, the diffusion distance in [26] is used as the similarity measure.
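The marginal template likelihood in (9)–(10) can be sketched as follows. The histogram difference below is only a stand-in for the diffusion distance of [26], and crop_patch is a hypothetical helper that extracts the patch Y_t(X_t^{(n)}) from the frame.

import numpy as np

def patch_likelihood(patch, template, lam=0.1, n_bins=16):
    """(10): likelihood of one particle; a simple histogram distance is used
    here in place of the diffusion distance f(.) of [26]."""
    h1, _ = np.histogram(patch, bins=n_bins, range=(0, 255), density=True)
    h2, _ = np.histogram(template, bins=n_bins, range=(0, 255), density=True)
    f = np.abs(h1 - h2).sum()            # placeholder similarity measure
    return np.exp(-lam * f)

def template_marginal(frame, particles, template, crop_patch, lam=0.1):
    """(9): p(T_t^new | Y_1:t) approximated by the mean particle likelihood."""
    weights = np.array([patch_likelihood(crop_patch(frame, x), template, lam)
                        for x in particles])
    return weights.mean(), weights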
3.2. Particle swarm optimization in PSO-MCMC

PSO is inspired by the social behavior of bird flocking [20]. It is initialized with a group of N random particles X_t^{(n)} for n = 1, ..., N. All particles have their own velocities. The particles are moved so that they describe the posterior p(T_t^{new} | Y_{1:t}) in (9) well, based on two simple principles. One is that the movement of each particle is guided by the best position it has achieved so far. The other is that the overall movement is guided by the best position obtained so far by any particle in the group. The velocity of the nth particle X_t^{(n)} at the (i + 1)th iteration is computed as follows:

v_{i+1}^{(n)} = v_i^{(n)} + \psi_1 u_1 (\hat{X}_{1:t}^{(n)} - X_t^{(n)}) + \psi_2 u_2 (\hat{X}_{1:t} - X_t^{(n)}),    (11)

where v_{i+1}^{(n)} denotes the velocity of X_t^{(n)} at the (i + 1)th iteration,
\hat{X}_{1:t}^{(n)} indicates the best position of the nth particle from time 1 to t, and \hat{X}_{1:t} is the best position among all particles up to the current time. In (11), u_1 \in (0, 1) and u_2 \in (0, 1) denote uniformly distributed random numbers, and \psi_1 and \psi_2 are weighting parameters. The position of X_t^{(n)} is updated as follows:

X_t^{(n)} = X_t^{(n)} + v_{i+1}^{(n)}.    (12)
Then, the weight of the particle, w(X_t^{(n)}), is calculated by

w(X_t^{(n)}) \equiv p(Y_{1:t} | T_t^{new}, X_t^{(n)}),    (13)

where p(Y_{1:t} | T_t^{new}, X_t^{(n)}) is introduced in (10). This process is repeated until the N particle positions converge. Fig. 2 illustrates the PSO–MCMC sampling method.
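One PSO iteration, covering (11)–(13), might look like the following sketch; the per-particle and global best positions are assumed to be maintained between iterations, and pso_step is an illustrative name rather than the paper's implementation.

import numpy as np

def pso_step(X, V, X_best, X_gbest, weights_fn, psi1=2.0, psi2=2.0, rng=None):
    """One PSO iteration over N particles.

    X, V          : (N, d) particle positions and velocities
    X_best        : (N, d) best position of each particle so far
    X_gbest       : (d,)   best position among all particles so far
    weights_fn(P) : particle weights w(X^(n)) as in (13), i.e. the
                    likelihood (10) of each row of P under T_t^new
    """
    rng = np.random.default_rng() if rng is None else rng
    u1, u2 = rng.uniform(), rng.uniform()
    # (11): velocity update towards the per-particle and global bests.
    V = V + psi1 * u1 * (X_best - X) + psi2 * u2 * (X_gbest - X)
    # (12): position update.
    X = X + V
    # (13): particle weights under the current template proposal.
    w = weights_fn(X)
    # Refresh the per-particle and global best positions.
    improved = w > weights_fn(X_best)
    X_best = np.where(improved[:, None], X, X_best)
    X_gbest = X_best[np.argmax(weights_fn(X_best))]
    return X, V, X_best, X_gbest, w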
Fig. 2. PSO–MCMC sampler.
Algorithm 1: PSO–MCMC Tracker

Input: \hat{X}_{t-1} and \hat{T}_{t-1}
Output: \hat{X}_t and \hat{T}_t
1: for m = 1 to M do
2:    Propose a new target template T_t^{(m)} using (8).
3:    Sample N particles X_t^{(n)} for n = 1, ..., N using (12).
4:    Calculate the weight of each particle, w(X_t^{(n)}), using (10) and (13).
5:    Compute p(T_t^{new} | Y_{1:t}) from w(X_t^{(n)}) using (9).
6:    Accept or reject the proposed target template T_t^{(m)} based on (6).
7: end for
8: Find the best target state and template, \hat{X}_t and \hat{T}_t, using (1).
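Algorithm 1 could be organised along the lines of the following sketch, which composes the hypothetical helpers introduced above (propose_template, pso_step, patch_likelihood, and crop_patch); the particle initialization, the symmetric treatment of the template proposal in the acceptance test, and the selection of the best joint sample are simplifications, not the exact procedure of the paper.

import numpy as np

def pso_mcmc_track(frame, X_prev, T_prev, T_first, crop_patch,
                   M=20, N=100, n_pso_iter=5, lam=0.1, rng=None):
    """One frame of the PSO-MCMC tracker (Algorithm 1), as a simplified sketch."""
    rng = np.random.default_rng() if rng is None else rng
    T_cur, p_cur = T_prev, 1e-12
    X_hat, T_hat, best_score = X_prev, T_prev, -np.inf
    for _ in range(M):                                       # template chain (MCMC)
        # Line 2: propose a new template with (7)-(8).
        T_new = propose_template(T_cur, T_prev, T_first, rng=rng)
        # Lines 3-4: PSO over target states under T_new, cf. (11)-(13).
        X = X_prev + rng.normal(0.0, 2.0, size=(N, X_prev.size))
        V = np.zeros_like(X)
        weights_fn = lambda P: np.array(
            [patch_likelihood(crop_patch(frame, x), T_new, lam) for x in P])
        X_best, X_gbest = X.copy(), X[np.argmax(weights_fn(X))]
        for _ in range(n_pso_iter):
            X, V, X_best, X_gbest, w = pso_step(X, V, X_best, X_gbest,
                                                weights_fn, rng=rng)
        # Line 5: marginal template likelihood (9) as the mean particle weight.
        p_new = w.mean()
        # Line 6: accept/reject with (6); the template proposal is treated as
        # symmetric here, so only the marginal likelihoods remain in the ratio.
        if rng.uniform() < min(1.0, p_new / max(p_cur, 1e-12)):
            T_cur, p_cur = T_new, p_new
            # Line 8: keep the best accepted joint sample, cf. (1).
            score = w.max() * p_new
            if score > best_score:
                X_hat, T_hat, best_score = X[np.argmax(w)], T_new, score
    return X_hat, T_hat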
4. PSO-MCMC based tracker

Given the best target state and template, \hat{X}_{t-1} and \hat{T}_{t-1}, at frame t − 1, the proposed tracker seeks the next best target state and template, \hat{X}_t and \hat{T}_t, at frame t. The tracker first proposes a new target template T_t^{(m)} using (8). It then samples N particles X_t^{(n)} for n = 1, ..., N, which are candidates for target states, by using (12). Each sample has a weight w(X_t^{(n)}) calculated using (13), where p(Y_{1:t} | T_t^{new}, X_t^{(n)}) in (13) is estimated using (10). With the average weight of the N particles, the probability p(T_t^{new} | Y_{1:t}) can be computed by using (9). p(T_t^{new} | Y_{1:t}) is then used in the acceptance probability (6) to determine whether the proposed target template T_t^{(m)} is accepted. This process is repeated M times. Finally, M × N target states and M target templates are obtained, and the best target state and template are selected based on (1). This process is summarized in Algorithm 1.

5. Experiment

5.1. Experimental settings

N = 100 particles were used for each PSO process and M = 20 samples for the MCMC process. \Sigma_T^2 in (7) was set to 5. c_1, c_2, and c_3 in (8) were set to 0.5, 0.25, and 0.25, respectively. \lambda in (10) was set to 0.1. \psi_1 and \psi_2 in (11) were set to 2. The parameter values were found empirically (e.g., Fig. 4) and were fixed throughout all experiments. For the compared methods, the parameters that gave the best performance were chosen. The proposed algorithm was implemented using Visual C++ 2015 and OpenCV 2.4.12.

The proposed method was compared with 30 state-of-the-art tracking methods based on [27], including SCM [28], STRUCK [29], ASLA [30], TLD [31], CXT [32], VTD [33], VTS [34], CSK [35], and PMCMC [12]. For the comparison, the 50 test sequences from [27] were used. For IVT, ASLA, and the proposed method, 2000 samples were used to track the target.

Two evaluation metrics, namely precision and success rate, were used. Precision measures the Euclidean distance between the ground-truth and estimated centers of the target bounding box; the precision plot describes the percentage of frames in which this center location error is less than a given threshold [27]. The success rate indicates the ratio of successfully tracked frames to total frames. Tracking in a frame is considered successful if its score is greater than a given threshold, where the score is defined as |B_e \cap B_g| / |B_e \cup B_g|. Here, B_e \cap B_g is the overlapping region of the estimated bounding box B_e and the ground-truth bounding box B_g, while B_e \cup B_g is the region covered by either B_e or B_g. The success plot depicts the success rate for a threshold varying between 0 and 1.
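The two metrics can be computed as in the sketch below, where bounding boxes are assumed to be (x, y, w, h) tuples; the 20-pixel and 0.5 thresholds are common example values, not quantities specified in this paper.

import numpy as np

def center_error(box_est, box_gt):
    """Euclidean distance between box centers (used for the precision plot)."""
    cx_e, cy_e = box_est[0] + box_est[2] / 2, box_est[1] + box_est[3] / 2
    cx_g, cy_g = box_gt[0] + box_gt[2] / 2, box_gt[1] + box_gt[3] / 2
    return np.hypot(cx_e - cx_g, cy_e - cy_g)

def overlap_score(box_est, box_gt):
    """|Be ∩ Bg| / |Be ∪ Bg| (used for the success plot)."""
    x1 = max(box_est[0], box_gt[0]); y1 = max(box_est[1], box_gt[1])
    x2 = min(box_est[0] + box_est[2], box_gt[0] + box_gt[2])
    y2 = min(box_est[1] + box_est[3], box_gt[1] + box_gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_est[2] * box_est[3] + box_gt[2] * box_gt[3] - inter
    return inter / union if union > 0 else 0.0

def precision_and_success(est_boxes, gt_boxes, dist_thr=20.0, iou_thr=0.5):
    errs = np.array([center_error(e, g) for e, g in zip(est_boxes, gt_boxes)])
    ious = np.array([overlap_score(e, g) for e, g in zip(est_boxes, gt_boxes)])
    precision = np.mean(errs <= dist_thr)   # fraction of frames within threshold
    success = np.mean(ious > iou_thr)       # fraction of frames above threshold
    return precision, success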
5.2. Analysis of the proposed tracker

The excellent performance of the proposed tracker stems from two main factors: the PSO–MCMC sampler and the adaptive template updating. In order to demonstrate the advantages of the PSO–MCMC sampler and the adaptive template updating, the tracker was analyzed in a component-wise manner.
5.2.1. Experiment for the PSO-MCMC sampler

In order to demonstrate the advantages of the proposed sampler, PSO–MCMC was compared with two other samplers: particle MCMC (PMCMC) and MCMC. To facilitate this experiment, the PMCMC tracker [12] was modified to estimate the best target state and template simultaneously. The MCMC tracker in [25] was used, which estimates the best target state and template iteratively. Fig. 3 demonstrates that the PSO–MCMC based tracker outperforms the trackers based on PMCMC and MCMC. The MCMC sampler produced inaccurate tracking results because it estimates the best target state and template iteratively; if the estimated target template contains an error, the target state obtained by using that template is also contaminated. The PSO–MCMC sampler jointly estimates the best target state and template, so this error propagation problem does not occur. The particle filter employed in the PMCMC sampler cannot handle the multi-modality in the target distribution, which results in worse tracking performance than the PSO used in PSO–MCMC, as shown in Fig. 3.

For a sampler to describe a target distribution perfectly, an infinite number of samples is required. In practice, however, only a limited number of samples can be used because of computational cost. An efficient sampler is therefore needed that can describe a target distribution accurately with a limited number of samples. PSO–MCMC is superior to PMCMC and MCMC because its sampling strategy is more efficient. For a fair comparison, the same numbers of samples and particles were used: PSO–MCMC and PMCMC used M = 20 samples for the MCMC part and N = 100 particles for the PSO or particle filter part, while MCMC used MN samples. Thus, PSO–MCMC, PMCMC, and MCMC have the same computational complexity of O(MN).

In order to demonstrate the efficiency of the PSO–MCMC sampling strategy, the following experiment was conducted. PSO–MCMC tracking was performed with three different numbers of samples: 500, 1000, and 2000. In Fig. 4, PSO–MCMC(100 × 20), PSO–MCMC(50 × 20), and PSO–MCMC(25 × 20) denote the PSO–MCMC that used 100 × 20, 50 × 20, and 25 × 20 particles in total. As shown in Fig. 4, the tracking accuracy of PSO–MCMC does not decrease significantly as the number of samples decreases.
Fig. 3. Performance of PSO–MCMC. PSO–MCMC was compared with two other sampling methods: PMCMC and MCMC.
Fig. 4. Performance of PSO–MCMC. The PSO–MCMC was evaluated by using different numbers of samples: 25 × 20, 50 × 20, 100 × 20, 100 × 10, and 100 × 5.
Fig. 5. Performance of adaptive template update. The proposed target templates for the Bird1, Soccer, Shaking, and Deer sequences are shown in order from left to right.
Fig. 6. Quantitative comparison with other tracking methods. Among the 30 tested tracking methods, the results of the top 9 tracking methods were displayed.
The PSO–MCMC tracker produced more accurate results with 500 samples than PMCMC and MCMC produced with 2000 samples. In Fig. 4, PSO–MCMC(100 × 10) and PSO–MCMC(100 × 5) denote the PSO–MCMC when M = 10 and M = 5 with the fixed
Table 1
Attribute-based performance in terms of average precision.

Attribute                   | SCM   | STRUCK | ASLA  | TLD   | CXT   | VTD   | VTS   | CSK   | PMCMC | PSO–MCMC
Illumination variation (25) | 0.594 | 0.558  | 0.517 | 0.537 | 0.501 | 0.557 | 0.573 | 0.481 | 0.751 | 0.815
Out-of-plane rotation (39)  | 0.618 | 0.597  | 0.518 | 0.596 | 0.574 | 0.620 | 0.604 | 0.540 | 0.823 | 0.894
Scale variation (28)        | 0.672 | 0.639  | 0.552 | 0.606 | 0.550 | 0.597 | 0.582 | 0.503 | 0.779 | 0.846
Occlusion (29)              | 0.640 | 0.564  | 0.460 | 0.563 | 0.491 | 0.545 | 0.534 | 0.500 | 0.785 | 0.853
Deformation (19)            | 0.586 | 0.521  | 0.445 | 0.512 | 0.422 | 0.501 | 0.487 | 0.476 | 0.829 | 0.900
Motion blur (12)            | 0.339 | 0.551  | 0.278 | 0.518 | 0.509 | 0.375 | 0.375 | 0.342 | 0.714 | 0.775
Fast motion (17)            | 0.333 | 0.604  | 0.253 | 0.551 | 0.515 | 0.352 | 0.353 | 0.381 | 0.730 | 0.793
In-plane rotation (31)      | 0.597 | 0.617  | 0.511 | 0.584 | 0.610 | 0.599 | 0.579 | 0.547 | 0.781 | 0.848
Out of view (6)             | 0.429 | 0.539  | 0.333 | 0.576 | 0.510 | 0.462 | 0.456 | 0.379 | 0.705 | 0.765
Background clutter (21)     | 0.578 | 0.585  | 0.496 | 0.428 | 0.443 | 0.571 | 0.578 | 0.585 | 0.780 | 0.847
Low resolution (4)          | 0.305 | 0.545  | 0.156 | 0.349 | 0.371 | 0.168 | 0.187 | 0.411 | 0.477 | 0.518
Whole                       | 0.649 | 0.656  | 0.532 | 0.608 | 0.575 | 0.576 | 0.575 | 0.545 | 0.810 | 0.880
Bold denotes the best tracking results. Among the 30 tested tracking methods, the results of the top 9 tracking methods were reported.

Table 2
Attribute-based performance in terms of average success rate.

Attribute                   | SCM   | STRUCK | ASLA  | TLD   | CXT   | VTD   | VTS   | CSK   | PMCMC | PSO–MCMC
Illumination variation (25) | 0.473 | 0.428  | 0.429 | 0.399 | 0.368 | 0.420 | 0.429 | 0.369 | 0.509 | 0.555
Out-of-plane rotation (39)  | 0.470 | 0.432  | 0.422 | 0.420 | 0.418 | 0.434 | 0.425 | 0.386 | 0.528 | 0.576
Scale variation (28)        | 0.518 | 0.425  | 0.452 | 0.421 | 0.389 | 0.405 | 0.400 | 0.350 | 0.473 | 0.516
Occlusion (29)              | 0.487 | 0.413  | 0.376 | 0.402 | 0.372 | 0.403 | 0.398 | 0.365 | 0.523 | 0.571
Deformation (19)            | 0.448 | 0.393  | 0.372 | 0.378 | 0.324 | 0.377 | 0.368 | 0.343 | 0.540 | 0.589
Motion blur (12)            | 0.298 | 0.433  | 0.258 | 0.404 | 0.369 | 0.309 | 0.304 | 0.305 | 0.524 | 0.572
Fast motion (17)            | 0.296 | 0.462  | 0.247 | 0.417 | 0.388 | 0.302 | 0.300 | 0.316 | 0.528 | 0.576
In-plane rotation (31)      | 0.458 | 0.444  | 0.425 | 0.416 | 0.452 | 0.430 | 0.416 | 0.399 | 0.497 | 0.542
Out of view (6)             | 0.361 | 0.459  | 0.312 | 0.457 | 0.427 | 0.446 | 0.443 | 0.349 | 0.554 | 0.605
Background clutter (21)     | 0.450 | 0.458  | 0.408 | 0.345 | 0.338 | 0.425 | 0.428 | 0.421 | 0.537 | 0.586
Low resolution (4)          | 0.279 | 0.372  | 0.157 | 0.309 | 0.312 | 0.177 | 0.168 | 0.350 | 0.341 | 0.372
Whole                       | 0.499 | 0.474  | 0.434 | 0.437 | 0.426 | 0.416 | 0.416 | 0.398 | 0.532 | 0.580
Bold denotes the best tracking results. Among the 30 tested tracking methods, the results of the top 9 tracking methods were reported.

Table 3
Runtime for several visual trackers.

     | SCM  | STRUCK | ASLA | TLD  | CXT  | VTD  | VTS | CSK | PMCMC | PSO–MCMC
FPS  | 0.09 | 9.84   | 1.5  | 23.3 | 11.3 | 15.2 | 4.2 | 151 | 13.2  | 10.7

The tracking algorithms were run on a machine with a 3.5 GHz i7 CPU.
N = 100, respectively. As demonstrated in Fig. 4, PSO–MCMC produced more accurate tracking results with more samples for target templates, at a higher computational cost.

5.2.2. Experiment for adaptive template updating

Fig. 5 shows examples of the target templates generated by (8). In (8), a new template is formed via a weighted linear combination of the currently proposed template, the best template at the previous frame, and the best template at the initial frame. Because the new template contains the target appearance at the current, previous, and initial frames, it is able to cover appearance changes in the target. The template is adaptively updated by the MCMC sampling process over all frames. Thus, the proposed tracker robustly tracks targets in real-world sequences.

5.3. Comparison with other trackers

5.3.1. Quantitative comparison

As shown in Fig. 6, the proposed tracker shows the best performance in terms of center location error and success rate, vastly outperforming PMCMC, STRUCK, and TLD. The test videos for the comparison include severe occlusions, pose variations, and illumination changes. The proposed tracker could still track the target accurately because it adaptively updates the target template to reflect the current tracking environment. Additionally, the tracker jointly estimates the target state and template with the proposed PSO–MCMC sampler. Other trackers, such as STRUCK and TLD, do not estimate the target state and template simultaneously; thus, the error propagation problem causes them to fail at tracking the targets.
Table 4
One-sided Wilcoxon rank sum test with significance level 0.01.

             | PSO–MCMC | PMCMC | Standard PSO | MCMC
PSO–MCMC     | 0        | 35    | 42           | 50
PMCMC        | 7        | 0     | 37           | 43
Standard PSO | 3        | 9     | 0            | 21
MCMC         | 0        | 2     | 10           | 0
PMCMC did not encounter this issue. However, it could not accurately estimate the target state using a unimodal particle filter, particularly when there were several objects with appearances similar to the target.

Tables 1 and 2 report the average precision and success rate for specific types of appearance changes in the videos. One can see that the proposed tracker consistently achieves good tracking performance over all types of appearance changes. Table 3 shows the frames per second (FPS) for several visual trackers. PSO–MCMC runs almost in real time. Although PSO–MCMC jointly estimates multiple variables, it is fast because the sampling process can be conducted in a smaller space than the original space. Nevertheless, the samples of PSO–MCMC exactly follow the original distribution, which can be verified mathematically.

A one-sided Wilcoxon rank sum test justifies the statistical significance of the results. For each pair of algorithms (algorithm A, algorithm B), Table 4 shows how often algorithm A significantly surpasses algorithm B. For example, PSO–MCMC outperformed standard PSO on 42 of the 50 videos, while standard PSO outperformed PSO–MCMC on 3 of the 50 videos. As demonstrated in Table 4, PSO–MCMC provides a more efficient and accurate way of proposing new target configurations and new target templates. Note that these good properties of PSO–MCMC come from the joint inference of target configurations and target templates, while the inference can be performed in a smaller space than in traditional joint inference methods.
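Such a pairwise comparison can be carried out per video as in the sketch below; scores_a and scores_b are hypothetical lists holding the per-frame scores of trackers A and B on each video, and a recent SciPy is assumed for the one-sided alternative.

from scipy.stats import ranksums   # requires SciPy >= 1.7 for `alternative`

def count_significant_wins(scores_a, scores_b, alpha=0.01):
    """Count the videos on which tracker A significantly surpasses tracker B
    under a one-sided Wilcoxon rank sum test at significance level alpha."""
    wins = 0
    for s_a, s_b in zip(scores_a, scores_b):      # per-video score samples
        _, p_value = ranksums(s_a, s_b, alternative="greater")
        if p_value < alpha:
            wins += 1
    return wins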
Fig. 7. Qualitative comparison with other tracking methods. The red and yellow bounding boxes represent the tracking results of the proposed tracker and PMCMC, respectively.
5.3.2. Qualitative comparison

Fig. 7 presents qualitative tracking results for the top two trackers: the proposed method and PMCMC. The Biker sequence contains abrupt motions. The Bird1 and Soccer sequences include severe occlusions. Illumination changes occur in the Iron-man and Skating sequences. The MotorRolling and Skiing sequences include severe pose variations. In these challenging sequences, the proposed tracker robustly tracked the target owing to its adaptive target template update. In contrast, the PMCMC tracker easily drifted into the background when severe target appearance changes occurred.

6. Conclusion and discussion

A novel PSO–MCMC sampling algorithm is proposed, which can jointly estimate a target state and template. PSO–MCMC combines PSO with an MCMC sampler. Using the PSO–MCMC sampler, an accurate visual tracking system is developed, which adaptively updates the target template. Thus, the proposed tracker accurately tracks the target in real-world tracking environments. Experimental results demonstrate that the proposed method outperforms several state-of-the-art tracking methods on a benchmark dataset.

When multiple variables need to be inferred, two well-known approaches are widely used: expectation–maximization (EM) and Gibbs sampling (GS). Because the multiple variables typically reside in a high-dimensional space, which makes the inference difficult, EM and GS approximate the original high-dimensional space by a low-dimensional space. For example, to infer the multiple variables in the joint distribution, EM alternates between computing the expected log-likelihood under the current parameter and maximizing this expected log-likelihood, where the log-likelihood can be defined in a low-dimensional space. On the other hand, GS constructs the conditional distribution of each variable, conditioned on the other variables, and then alternately infers each variable from its conditional distribution, where the distribution is typically defined in a low-dimensional space. However, the main drawback of these methods is that they approximate the original joint distribution by multiple marginal distributions. Thus, the results are not optimal for the original distribution but only sub-optimal for the approximated distributions. Contrary to the aforementioned methods, the proposed PSO–MCMC jointly infers multiple variables from the original distribution instead of approximating the distribution. Thus, PSO–MCMC can produce samples that exactly follow the original distribution, while saving computational cost because the inference can provably be conducted in a smaller space. As a result, the proposed method can estimate more accurate states and templates of targets during visual tracking.

Acknowledgments

This work was supported by Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIP) (No. 2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).

Conflict of interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.04.014.

References

[1] M. Godec, P.M. Roth, H. Bischof, Hough-based tracking of non-rigid objects, in: ICCV, 2011.
[2] B. Han, L. Davis, On-line density-based appearance modeling for object tracking, in: ICCV, 2005.
[3] S. Hong, B. Han, Visual tracking by sampling tree-structured graphical models, in: ECCV, 2014.
[4] L. Sevilla-Lara, E. Learned-Miller, Distribution fields for tracking, in: CVPR, 2012.
[5] M. Danelljan, F.S. Khan, M. Felsberg, J. van de Weijer, Adaptive color attributes for real-time visual tracking, in: CVPR, 2014.
[6] R. Timofte, J. Kwon, L. Van Gool, PICASO: Pixel correspondences and soft match selection for real-time tracking, CVIU 153 (2016) 162–153.
[7] J. Kwon, K.M. Lee, Adaptive visual tracking with minimum uncertainty gap estimation, TPAMI 39 (1) (2017) 18–32.
[8] J. Kwon, R. Timofte, L. Van Gool, Leveraging observation uncertainty for robust visual tracking, CVIU 158 (2017) 62–71.
[9] R.T. Collins, Y. Liu, M. Leordeanu, Online selection of discriminative tracking features, PAMI 27 (10) (2005) 1631–1643.
[10] A.D. Jepson, D.J. Fleet, T.F.E. Maraghi, Robust online appearance models for visual tracking, PAMI 25 (10) (2003) 1296–1311.
[11] D.A. Ross, J. Lim, R. Lin, M. Yang, Incremental learning for robust visual tracking, IJCV 77 (1) (2008) 125–141.
[12] J. Kwon, R. Dragon, L. Van Gool, Joint tracking and ground plane estimation, SPL 23 (11) (2016) 1514–1517.
[13] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B Stat. Methodol. 39 (1) (1977) 1–38.
[14] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, PAMI 6 (6) (1984) 721–741.
[15] J. Kwon, R. Dragon, L. Van Gool, Tracking by switching state space models, CVIU 153 (2016) 29–36.
[16] C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, J. R. Stat. Soc. Ser. B Stat. Methodol. 72 (3) (2010) 269–342.
[17] J. Roh, D.W. Park, J. Kwon, K.M. Lee, Visual tracking using joint inference of target state and segment-based appearance models, in: APSIPA, 2013.
[18] Y. Ren, B. Xu, P. Zhu, M. Lu, D. Jiang, A multicell visual tracking algorithm using multi-task particle swarm optimization for low-contrast image sequences, Appl. Intell. 45 (4) (2016) 1129–1147.
[19] J. Hu, W. Fang, W. Ding, Visual tracking by sequential cellular quantum-behaved particle swarm optimization algorithm, in: BIC-TA, 2016.
[20] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: ICNN, 1995.
[21] M. Clerc, J. Kennedy, The particle swarm—explosion, stability, and convergence in a multidimensional complex space, IEEE Trans. Evol. Comput. 6 (1) (2002) 58–73.
[22] M.P. Wachowiak, R. Smolikova, Y. Zheng, An approach to multimodal biomedical image registration utilizing particle swarm optimization, IEEE Trans. Evol. Comput. 8 (3) (2004) 289–301.
[23] D. Parrott, X. Li, Locating and tracking multiple dynamic optima by a particle swarm model using speciation, IEEE Trans. Evol. Comput. 10 (4) (2006) 440–458.
[24] X. Zhang, W. Hu, S. Maybank, X. Li, M. Zhu, Sequential particle swarm optimization for visual tracking, in: CVPR, 2008.
[25] Z. Khan, T. Balch, F. Dellaert, MCMC-based particle filtering for tracking a variable number of interacting targets, PAMI 27 (11) (2005) 1805–1918.
[26] H. Ling, K. Okada, Diffusion distance for histogram comparison, in: CVPR, 2006.
[27] Y. Wu, J. Lim, M.-H. Yang, Online object tracking: A benchmark, in: CVPR, 2013.
[28] W. Zhong, H. Lu, M.-H. Yang, Robust object tracking via sparsity-based collaborative model, in: CVPR, 2012.
[29] S. Hare, A. Saffari, P.H.S. Torr, Struck: Structured output tracking with kernels, in: ICCV, 2011.
[30] X. Jia, H. Lu, M.-H. Yang, Visual tracking via adaptive structural local sparse appearance model, in: CVPR, 2012.
[31] Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection, PAMI 34 (7) (2012) 1409–1422.
[32] T.B. Dinh, N. Vo, G. Medioni, Context tracker: Exploring supporters and distracters in unconstrained environments, in: CVPR, 2011.
[33] J. Kwon, K.M. Lee, Visual tracking decomposition, in: CVPR, 2010.
[34] J. Kwon, K.M. Lee, Tracking by sampling trackers, in: ICCV, 2011.
[35] J.F. Henriques, R. Caseiro, P. Martins, J. Batista, Exploiting the circulant structure of tracking-by-detection with kernels, in: ECCV, 2012.