An incremental Bhattacharyya dissimilarity measure for particle filtering




ARTICLE IN PRESS Pattern Recognition 43 (2010) 1244–1256


Pattern Recognition journal homepage: www.elsevier.com/locate/pr

Anbang Yao (a, *), Guijin Wang (a), Xinggang Lin (a), Xiujuan Chai (b)

(a) Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
(b) Nokia Research Center, Beijing 100013, China

Article info

Article history: Received 5 January 2009; received in revised form 12 September 2009; accepted 22 September 2009

Abstract

The dissimilarity between a target descriptor and a particle descriptor is a crucial quantity in particle filtering (PF), yet the widely used Bhattacharyya dissimilarity (BD) is not discriminative enough. This paper presents an incremental Bhattacharyya dissimilarity (IBD) for comparing the histogram-based descriptors (HBDs) used in particle weight estimation. IBD is defined by incorporating an incremental similarity matrix (ISM) into the BD. The ISM imposes incremental similarity beliefs on the matched bin patches of the two input histograms and enables cross-bin interaction during the comparison, which improves the ability to discriminate particles located on the object from those positioned in the background. We propose a robust approach to compute the ISM by jointly exploiting spatial and temporal attributes. In addition, to handle target appearance changes and deformations, a classification-inspired target model update strategy is presented. Together, these components yield an effective and robust tracking algorithm. Experimental results show that IBD has promising discriminative capability compared with other state-of-the-art dissimilarity measures. Moreover, the IBD-based PF tracker achieves competitive tracking performance, especially under partial occlusion and background clutter.

© 2009 Elsevier Ltd. All rights reserved.

Keywords: Visual tracking; Particle filtering; Mixture of Gaussian; Bhattacharyya dissimilarity

* Corresponding author (A. Yao). Tel.: +86 10 62781291; fax: +86 10 62770317.
doi:10.1016/j.patcog.2009.09.024

1. Introduction

Visual tracking is a fundamental requirement of today's emerging applications such as surveillance [1], intelligent traffic navigation [2], human–computer interaction [3] and video indexing [4]. Among state-of-the-art tracking methods [5], particle filtering (PF, i.e. Condensation [6]), originally proposed by Isard and Blake to model the dynamics of general non-linear and non-Gaussian systems, has recently achieved considerable success in many tracking-related applications. Compared with other stochastic filtering methods such as the Kalman filter [7], the most appealing merit of PF is that it provides a general framework for estimating and propagating the posterior probability density function (pdf) of the state variable regardless of the underlying distribution. In practical tracking tasks, the standard PF usually has four key steps: (1) re-sampling of the initial particle set, (2) propagation of the re-sampled particle set, (3) measurement of the propagated particle set, and (4) estimation of the object position. These steps are highly interrelated. To achieve sound results in each step, numerous approaches have been proposed to address the following questions. How should the re-sampling strategy be reliably constructed [8]? How should the posterior pdf be precisely estimated [9]? How should different cues be effectively associated [10]? How should the state transition model be accurately adjusted [11]?

In essence, all of these problems are closely tied to the particle weight (i.e. the normalized dissimilarity/similarity between a target descriptor and a particle descriptor). This paper focuses on an effective dissimilarity measure for particle weight estimation when histogram-based descriptors (HBDs) are used to represent the target of interest and the particles. As quantized and compact distributions of particular contents (e.g. intensity, color, edge orientation, gradients, higher-order derivatives, filter responses, etc.) in the image regions of interest, HBDs have many advantages for tracking rigid or non-rigid objects in real applications. For example, several popular tracking systems [12–15] use color histograms as descriptors, and qualitative evaluations show that color histograms are simple and robust to object scale change, object rotation and partial occlusion. Tracking systems based on other HBDs, e.g. intensity histograms [16], gradient orientation histograms [17] and edge orientation histograms [18], have also been reported in recent years.

Existing HBD-based PF trackers employ many dissimilarity measures to estimate particle weights. Among them, the Bhattacharyya dissimilarity (BD) [12,13] is one of the most widely used. As a divergence-type bin-to-bin dissimilarity measure [15], BD only compares the values in identically indexed bins of two histograms. That is, BD neglects the similarities between neighboring bins; hence it is not discriminative, especially for matching high-dimensional HBDs [19]. Instead of BD, the Kullback–Leibler divergence (KLD) is employed in [20] to compute the similarity between a reference histogram and a candidate histogram. KLD measures how inefficient, on average, it would be to code one histogram using the other as the code-book. However, KLD is non-symmetric and sensitive to histogram binning [21]. The authors of [18] adopt the Minkowski-form L2 distance (L2-D) of Euclidean space to achieve robust comparisons. Like BD, L2-D applies only pair-wise comparisons of identically indexed bins; therefore, it is better suited to tasks based on accumulative content matching. In [22], Wang et al. present a similarity measure based on a joint spatial-color mixture of Gaussians. This measure is constructed to exploit the clustered statistics of color contents and coordinates, and thus it is not suitable for measuring dissimilarities between HBDs. In addition, it is not trivial to set an appropriate number of Gaussian kernels and to update the parameters of each kernel. Besides these, other cross-bin dissimilarity measures such as the quadratic-form distance (QFD) [23], the earth mover's distance (EMD) [24] and the diffusion distance (DD) [25] can also be employed. To achieve reliable comparisons, QFD uses cross-bin information via a similarity matrix. EMD treats the matching of two distributions as a transportation problem and obtains the optimal solution by linear optimization. In contrast with QFD and EMD, DD models the difference between two HBDs as a temperature field. However, these cross-bin measures were primarily proposed for, and are well suited to, content-based applications (e.g. image retrieval, object recognition, shape matching and texture analysis), which limits their suitability for measuring particles.
Several systematic studies [25–28] have investigated which state-of-the-art dissimilarity measure works best for each of these content-based tasks. Targeting reliable particle weight estimation, we present an incremental Bhattacharyya dissimilarity (IBD). The main contributions of this paper are as follows: (1) an incremental similarity matrix (ISM) is embedded in the BD for better comparisons; the ISM works as a bin-mixing matrix and enables cross-bin interaction; (2) partially motivated by the work of [29], a robust approach is proposed to compute the ISM in the joint spatial-temporal space, into which multiple cues can be naturally integrated; (3) based on the spatial ISM, an effective classification-inspired target model update strategy is presented to adapt to object appearance changes and deformations; and (4) a complete IBD-based PF tracking algorithm is given. The remainder of this paper is organized as follows. Section 2 first clarifies the importance of the dissimilarity measure from a theoretical point of view, and then points out the desirable properties and the limitations of BD when HBDs are used as inputs. Section 3 describes the proposed tracking algorithm in detail, including the definition of IBD, the approach for calculating the ISM, and the target model update strategy. Section 4 presents experiments showing the discriminative capability of IBD and the tracking accuracy of our algorithm. Conclusions are given in Section 5.

2. Problem formulation

2.1. The importance of dissimilarity measure

PF [6] is theoretically based on Bayesian inference. Denote by $x_t$ the object state at time $t$ and by $Z_t = \{z_1, z_2, \ldots, z_t\}$ all the observations up to time $t$. The tracking problem is to recursively estimate the posterior pdf $p(x_t \mid Z_t)$ in the state space. Assuming that the observation $z_t$ is sequentially and conditionally independent of the state sequence $X_t = \{x_1, x_2, \ldots, x_t\}$, the posterior pdf $p(x_t \mid Z_t)$ can be expressed as

$$p(x_t \mid Z_t) = \frac{p(z_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Z_{t-1})\, dx_{t-1}}{p(z_t \mid Z_{t-1})} \qquad (1)$$

where the normalizing term $p(z_t \mid Z_{t-1})$ is a positive constant. Since the observation likelihood function $p(z_t \mid x_t)$ and the state transition model $p(x_t \mid x_{t-1})$ usually take non-linear and non-Gaussian forms, an analytic expression of (1) is unavailable in most cases. PF offers an approximate solution by representing the posterior pdf $p(x_{t-1} \mid Z_{t-1})$ with a set of particles $\{s^k_{t-1}, \pi^k_{t-1}\}_{k=1}^{K}$, where $s^k_{t-1}$ is the state and $\pi^k_{t-1}$ the weight of the $k$th particle, and $K$ is the number of particles. Then (1) can be approximated as

$$p(x_t \mid Z_t) \propto p(z_t \mid x_t) \sum_{k=1}^{K} \pi^k_{t-1}\, p(x_t \mid s^k_{t-1}) \qquad (2)$$

In each frame, the recursive implementation of (2) is usually split into four stages. First, given the particle set $\{s^k_{t-1}, \pi^k_{t-1}\}_{k=1}^{K}$ at time $t-1$, a re-sampling procedure is performed according to the weight set $\{\pi^k_{t-1}\}_{k=1}^{K}$ to suppress possible particle degeneracy. Second, based on the state transition model $p(x_t \mid x_{t-1})$, the re-sampled particle set $\{\hat{s}^k_{t-1}, \hat{\pi}^k_{t-1}\}_{k=1}^{K}$ is propagated to obtain the state set $\{s^k_t\}_{k=1}^{K}$ at time $t$. Third, for the propagated state set $\{s^k_t\}_{k=1}^{K}$, the corresponding weight set $\{\pi^k_t\}_{k=1}^{K}$ is re-estimated and normalized by (3), which is also used to normalize the weight sets $\{\pi^k_{t-1}\}_{k=1}^{K}$ and $\{\hat{\pi}^k_{t-1}\}_{k=1}^{K}$:

$$\pi^k_t = \frac{p(z^k_t \mid s^k_t)}{\sum_{k_1=1}^{K} p(z^{k_1}_t \mid s^{k_1}_t)} \qquad (3)$$

In this paper, we choose the widely used observation likelihood function [10,11,20–22,30]

$$p(z^k_t \mid s^k_t) \propto e^{-d^2(s^k_t)/2\sigma^2} \qquad (4)$$

Here $d(s^k_t)$ denotes the dissimilarity between the target descriptor and the particle descriptor at state $s^k_t$, with $0 \le d(s^k_t) \le 1$, and $\sigma^2$ is the observation variance, which we set to 0.25 following [22,30]. Finally, the object center at time $t$ is estimated from the propagated particle set $\{s^k_t, \pi^k_t\}_{k=1}^{K}$, in general as its weighted mean state:

$$E(x_t) = \sum_{k=1}^{K} \pi^k_t\, s^k_t \qquad (5)$$
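The four stages can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: multinomial re-sampling and a Gaussian random-walk transition model are our assumptions (the text above does not fix either), `motion_std` is a hypothetical parameter, and `dissimilarity` stands for any measure with values in [0, 1], such as BD or IBD.

```python
import numpy as np

def likelihood(d, sigma2=0.25):
    """Observation likelihood of Eq. (4), up to a constant: exp(-d^2 / (2 sigma^2))."""
    return np.exp(-np.asarray(d) ** 2 / (2.0 * sigma2))

def pf_step(states, weights, dissimilarity, rng, motion_std=2.0, sigma2=0.25):
    """One PF iteration following the four stages above.

    states:  (K, D) array of particle states s^k_{t-1}
    weights: (K,) normalized weights pi^k_{t-1}
    dissimilarity: callable mapping one state to a value d in [0, 1]
    """
    K = len(states)
    # Stage 1: multinomial re-sampling according to the previous weights (assumed scheme)
    idx = rng.choice(K, size=K, p=weights)
    resampled = states[idx]
    # Stage 2: propagation with a Gaussian random walk (assumed transition model)
    propagated = resampled + rng.normal(0.0, motion_std, size=resampled.shape)
    # Stage 3: weights from Eq. (4), normalized as in Eq. (3)
    d = np.array([dissimilarity(s) for s in propagated])
    w = likelihood(d, sigma2)
    w = w / w.sum()
    # Stage 4: weighted mean state, Eq. (5)
    estimate = w @ propagated
    return propagated, w, estimate

# Toy usage: a hypothetical dissimilarity that grows with distance from a known target
rng = np.random.default_rng(0)
target = np.array([120.0, 80.0])

def toy_d(s):
    return min(1.0, float(np.linalg.norm(s - target)) / 50.0)

states = target + rng.normal(0.0, 10.0, size=(500, 2))
weights = np.full(500, 1.0 / 500)
states, weights, estimate = pf_step(states, weights, toy_d, rng)
```

In a tracker, `toy_d` would be replaced by the histogram dissimilarity between the target model and the region at each particle state.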

From the above description, it is clear that re-sampling particles, measuring particles and locating the object center are all highly correlated with the estimation of particle weights. Since the particle weight is obtained by normalizing a function of the dissimilarity between a target descriptor and a particle descriptor (Eqs. (3) and (4)), developing a robust and effective dissimilarity measure is a core issue for PF. From both theoretical and practical points of view, an ideal dissimilarity measure should exhibit favorable discriminative capability, a well-defined basin of attraction around the true object center, high efficiency and easy implementation.

2.2. Definition of BD

BD is widely used in existing PF trackers [12,13]. Its original form was introduced by Kailath [31] for signal selection problems. Given two pdfs $p_1(x)$ and $p_2(x)$, BD is defined as

$$d = -\ln\left(\int \sqrt{p_1(x)\, p_2(x)}\, dx\right) \qquad (6)$$


Clearly, (6) is bounded by zero and infinity, i.e. $0 \le d \le +\infty$. As clarified by Kailath, (6) is not a true metric because it does not satisfy the triangle inequality. To obtain a metric structure, the BD used in this paper is the form constructed by Comaniciu et al. [15] for comparing two discrete histogram-based distributions. Specifically, given an $N$-bin target histogram $\mathbf{q}$ and an $N$-bin particle histogram $\mathbf{p}$ with $\sum_{i=1}^{N} q_i = 1$ and $\sum_{j=1}^{N} p_j = 1$, Comaniciu's BD is

$$d = \sqrt{1 - \sqrt{\mathbf{p}}^{T}\sqrt{\mathbf{q}}} \qquad (7)$$

where square roots of vectors are taken element-wise, so that $\sqrt{\mathbf{p}}^{T}\sqrt{\mathbf{q}} = \sum_{i=1}^{N}\sqrt{p_i q_i}$. Clearly, (7) is bounded by zero and one, i.e. $0 \le d \le 1$. The BD defined by (7) has several desirable properties: (1) it is a true metric; (2) the term $\sqrt{\mathbf{p}}^{T}\sqrt{\mathbf{q}}$ can be interpreted as the cosine of the angle between the two unit vectors $(\sqrt{q_1}, \sqrt{q_2}, \ldots, \sqrt{q_N})^T$ and $(\sqrt{p_1}, \sqrt{p_2}, \ldots, \sqrt{p_N})^T$, so (7) has a clear geometric interpretation; (3) since its inputs are normalized histograms, it is relatively insensitive to target scale changes in visual tracking; (4) it requires only $O(N)$ operations and is therefore efficient to compute; and (5) interesting and useful relations between BD and KLD and the Kolmogorov–Smirnov distance (KSD) [32] have been discussed theoretically in [31,33,34].

On the other hand, when traditional HBDs are used to represent the target of interest and the particles, BD has the following limitations: (1) its bin-to-bin comparison assumes that the two input discrete distributions are strictly aligned, so BD only accumulates the dissimilarities over identically indexed bins and neglects the latent similarities across different bins; (2) owing to the effect of histogram binning, BD is not discriminative for matching high-dimensional HBDs [19]; and (3) traditional HBDs usually discard the spatial layout information of the image region of interest, which implies that BD-based PF trackers are, to some degree, susceptible to partial occlusion and severe background clutter. To illustrate these disadvantages, Fig. 1 shows four synthesized image patterns and the corresponding dissimilarities computed with the intensity-histogram-based BD. In these 256 × 256 images, the intensity values are uniformly distributed over 0–255, while the spatial layouts differ. We take the first image as the reference image and the others as candidate images. Whatever number of bins is chosen, the BD between the intensity histogram of the reference image and that of each candidate image is always 0; that is, BD regards the four image patterns as identical. These results clearly show that BD cannot robustly handle the multi-modal tracking problems that frequently arise in cluttered scenarios.
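The following sketch computes (7) and reproduces the effect described for Fig. 1. The test images are hypothetical stand-ins (the exact patterns of Fig. 1 are not specified here): each contains every intensity 0–255 equally often, arranged differently, so their histograms, and hence their BD, coincide for any number of bins.

```python
import numpy as np

def intensity_histogram(img, n_bins):
    """Normalized N-bin histogram of 8-bit intensities."""
    counts, _ = np.histogram(img, bins=n_bins, range=(0, 256))
    return counts / counts.sum()

def bhattacharyya(p, q):
    """Comaniciu's BD, Eq. (7): sqrt(1 - sum_i sqrt(p_i * q_i))."""
    rho = np.sqrt(p) @ np.sqrt(q)
    return float(np.sqrt(max(0.0, 1.0 - rho)))  # clamp tiny negative round-off

# Hypothetical stand-ins for Fig. 1: same intensity content, different layouts
# (horizontal ramp, vertical ramp, mirrored ramp, randomly shuffled pixels).
ref = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
rng = np.random.default_rng(1)
candidates = [ref.T, ref[:, ::-1], rng.permutation(ref.ravel()).reshape(256, 256)]
for n_bins in (8, 32, 256):
    q = intensity_histogram(ref, n_bins)
    print(n_bins, [round(bhattacharyya(intensity_histogram(c, n_bins), q), 6) for c in candidates])
# each line prints three zeros, e.g. "8 [0.0, 0.0, 0.0]"
```

Because the histogram discards pixel positions, no choice of `n_bins` separates these layouts; this is limitation (3) above.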

3. The proposed method

To address the above limitations, two key factors must be considered. One is the spatial configuration of the object, and the other is the effect of histogram binning. By densely encoding the spatial attributes and introducing cross-bin interaction into the comparison, the proposed IBD addresses both factors.

3.1. Defining the IBD

The basic idea of IBD is to embed an incremental similarity matrix (ISM) in the traditional BD for consistent comparison between a target HBD and a particle HBD. The ISM works as a bin-mixing matrix and enables cross-bin interaction. Consequently, compared with BD, IBD better discriminates the particles on the object from those in the background. Given $\mathbf{q}$ and $\mathbf{p}$ as in (7), IBD is defined as

$$d = \sqrt{1 - \sqrt{\mathbf{q}}^{T} W \sqrt{\mathbf{p}}} \qquad (8)$$

Here, the ISM $W = [w_{ij}] \in \mathbb{R}^{N \times N}$ satisfies $0 \le w_{ij} \le 1$ and $w_{ij} = w_{ji}$ for all $i, j$, and $\sum_j w_{ij} = 1$ if $W$ is not a diagonal matrix. The element $w_{ij}$ denotes the incremental similarity belief between the matched bins $q_i$ and $p_j$. Note that larger values of $w_{ij}$ lead to smaller dissimilarity values than smaller ones do. Since $\mathbf{q}$ and $\mathbf{p}$ are normalized histograms, we have $0 \le \sqrt{\mathbf{q}}^{T} W \sqrt{\mathbf{p}}$. Moreover, using $\sqrt{q_i}\sqrt{p_j} \le (q_i + p_j)/2$ and the symmetry of $W$ (which, together with unit row sums, gives unit column sums),

$$\begin{aligned}
\sqrt{\mathbf{q}}^{T} W \sqrt{\mathbf{p}} &= \sum_{i=1}^{N}\sum_{j=1}^{N}\sqrt{q_i}\,w_{ij}\sqrt{p_j} = \sum_{i=1}^{N}\sqrt{q_i}\left(w_{i1}\sqrt{p_1} + \cdots + w_{iN}\sqrt{p_N}\right)\\
&\le \sum_{i=1}^{N}\left[w_{i1}\frac{q_i + p_1}{2} + \cdots + w_{iN}\frac{q_i + p_N}{2}\right]\\
&= \sum_{i=1}^{N}\left[\frac{q_i}{2} + \frac{w_{i1}p_1}{2} + \cdots + \frac{w_{iN}p_N}{2}\right] && \big(\textstyle\sum_j w_{ij} = 1\big)\\
&= \sum_{i=1}^{N}\frac{q_i}{2} + \sum_{j=1}^{N}\frac{p_j}{2} && \big(\textstyle\sum_i w_{ij} = 1\big)\\
&= 1 && \big(\textstyle\sum_i q_i = 1,\ \sum_j p_j = 1\big)
\end{aligned} \qquad (9)$$

Hence, IBD is bounded by zero and one, i.e. $0 \le d \le 1$. Such an ISM therefore has the advantage that the dissimilarity does not need to be renormalized to the range [0, 1]. Considering that the ISM is a symmetric square matrix, the IBD defined above is a positive semi-definite dissimilarity measure. We also emphasize that if the ISM is the identity matrix, IBD reduces to BD. That is, BD is a special case of IBD, and IBD can be regarded as a generalized BD. In terms of its expression form, (8) is somewhat similar to QFD, which is defined as

$$d = \sqrt{(\mathbf{p} - \mathbf{q})^{T} A\, (\mathbf{p} - \mathbf{q})} \qquad (10)$$

However, QFD differs from IBD as follows: (1) QFD does not have a clear geometric interpretation; (2) in (10), $A = [a_{ij}] \in \mathbb{R}^{N \times N}$ with $0 \le a_{ij} \le 1$ for all $i \ne j$ and $a_{ii} = 1$; since $a_{ii} = 1$, QFD always sets the similarity confidence between $q_i$ and $p_i$ to 1; (3) if $\mathbf{q} = \mathbf{p}$, QFD yields a dissimilarity of 0, so the matrix $A$ has no effect in this case; in other words, QFD also treats the images shown in Fig. 1 as identical; and (4) QFD was originally proposed by Hafner [23] for content-based image retrieval, and its matrix $A$ should be learned from a large number of training samples. Even though a pre-learned matrix $A$ can reduce the run-time cost, QFD cannot be directly used for weighting particles, because a well-collected training set of target samples is usually unavailable in real-time tracking tasks. The experiments presented in Section 4 further verify this claim.

Fig. 1. Different synthesized image patterns and their dissimilarities computed with the intensity histogram based BD (d = 0 for every candidate image).
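A minimal sketch of (8) and (10) follows. The matrix `W` below is a hypothetical ISM chosen only to satisfy the constraints stated for (8) (symmetric, entries in [0, 1], unit row sums); it is not computed with the method of Section 3.2. It shows that IBD reduces to BD when W = I, that cross-bin beliefs soften the penalty for a one-bin shift, and that QFD returns 0 for identical histograms regardless of A.

```python
import numpy as np

def ibd(q, p, W):
    """IBD, Eq. (8): sqrt(1 - sqrt(q)^T W sqrt(p))."""
    s = np.sqrt(q) @ W @ np.sqrt(p)
    return float(np.sqrt(max(0.0, 1.0 - s)))

def qfd(q, p, A):
    """QFD, Eq. (10): sqrt((p - q)^T A (p - q))."""
    diff = p - q
    return float(np.sqrt(max(0.0, diff @ A @ diff)))

N = 6
eye = np.eye(N)
# Hypothetical ISM: symmetric, entries in [0, 1], rows (and columns) summing to 1;
# each bin gives a belief of 0.25 to each of its two circular neighbours.
shift = np.roll(eye, 1, axis=1)
W = 0.5 * eye + 0.25 * (shift + shift.T)

e0, e1, e3 = eye[0], eye[1], eye[3]      # all mass in bin 0, 1 or 3
print(round(ibd(e0, e1, eye), 4))        # BD (W = I): 1.0, bins never interact
print(round(ibd(e0, e1, W), 4))          # IBD: 0.866, a one-bin shift is partly forgiven
print(round(ibd(e0, e3, W), 4))          # IBD: 1.0, distant bins still do not match

A = 0.5 * (eye + np.ones((N, N)))        # a_ii = 1, a_ij = 0.5 for i != j
print(qfd(e0, e0, A))                    # 0.0: identical histograms, whatever A is
```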

3.2. Calculating the ISM

We aim to incorporate into the ISM as much of the information as is needed to measure the latent relations between the target region of interest and the particle regions. To that end, we first construct the ISM $W^s$ in the spatial space and the ISM $W^t$ in the temporal space. Then, the final ISM $W$ in the joint spatial-temporal space is obtained by filtering $W^s$ and $W^t$.

3.2.1. Calculating the ISM in the spatial space

According to biological knowledge of the human visual system, spatial information helps to distinguish the object of interest from outliers [35]. To incorporate spatial information into tracking systems, various approaches have been proposed in the literature [13–15,19,20,22,36]. In [13], Yang et al. divide the region of interest into several rectangular blocks and measure the similarities between multiple sub-histogram-based models and candidates block by block. Comaniciu et al. [15] employ a position-weighted approach to partially exploit spatial information. However, both approaches use spatial information only loosely. In [14,19,22], the authors focus on precisely estimating the joint spatial-color distributions with well-defined kernel functions. Unlike these works, the objective of our spatial ISM $W^s$ is to densely encode the spatial layout of pixel contents in the image region of interest and to further exploit these spatial attributes in the cross-bin comparisons. Our approach can be viewed as an important extension of the work presented in [29].

Let $f = \{u(x_m, y_m)\}_{m=1,\ldots,M}$ be the feature set of a local image region $R$ centered at $(x_0, y_0)$, where $u(x_m, y_m)$ is the $d$-dimensional feature vector at the 2-D pixel coordinate $(x_m, y_m)$ and $M$ is the number of pixels in $R$. In general, $u(x_m, y_m)$ can be intensity, color, edge orientation, gradients, higher-order derivatives, filter responses, etc. The normalized $N$-bin histogram $\mathbf{h}$ of $R$ is then constructed as

$$h_i = \frac{1}{M}\sum_{m=1}^{M} \delta\big(g(u(x_m, y_m)), i\big), \qquad i = 1, \ldots, N \qquad (11)$$

where $\delta(\cdot,\cdot)$ is the Kronecker delta and $g: u(x_m, y_m) \to \{1, \ldots, N\}$ maps $u(x_m, y_m)$ to a specific bin of $\mathbf{h}$ via a quantization process. We represent the spatial attributes of the pixels mapped to the $i$th bin patch of $\mathbf{h}$ as $s(i) = \{N_i, \boldsymbol{\mu}_i, \Sigma_i\}$, where $N_i$ is the number of pixels in the $i$th bin patch, and $\boldsymbol{\mu}_i$ and $\Sigma_i$ are the mean vector and covariance matrix of their spatial features, respectively. The spatial feature vector of the $m$th pixel is $v_m = [d_m, \theta_m]^T$, where $d_m$ is the normalized Euclidean distance from $(x_m, y_m)$ to the region center $(x_0, y_0)$ and $\theta_m$ is the corresponding orientation. In our formulation, $\Sigma_i$ is assumed to be a diagonal matrix, and $d_m$ and $\theta_m$ are obtained from

$$d_m = \sqrt{\left(\frac{x_m - x_0}{b_w}\right)^2 + \left(\frac{y_m - y_0}{b_h}\right)^2}, \qquad
\theta_m = \frac{1}{2\pi}\begin{cases}
\omega_m, & y_m - y_0 \le 0 \,\cap\, x_m - x_0 \ge 0\\
\pi - \omega_m, & y_m - y_0 < 0 \,\cap\, x_m - x_0 < 0\\
\pi + \omega_m, & y_m - y_0 > 0 \,\cap\, x_m - x_0 < 0\\
2\pi - \omega_m, & y_m - y_0 > 0 \,\cap\, x_m - x_0 > 0
\end{cases} \qquad (12)$$

where $b_w$ and $b_h$ are the bandwidths, which we set to half the width and half the height of the object, respectively, and

$$\omega_m = \arctan\left|\frac{y_m - y_0}{x_m - x_0}\right|$$

Given $\mathbf{q}$ and $\mathbf{p}$ as in (7), the element $w^s_{ij}$ of the matrix $W^s$ is computed and normalized by

$$w^s_{ij} = \exp\!\Big(-a\,\big(\boldsymbol{\mu}_{q[i]} - \boldsymbol{\mu}_{p[j]}\big)^{T}\big(\Sigma_{q[i]} + \Sigma_{p[j]}\big)^{-1}\big(\boldsymbol{\mu}_{q[i]} - \boldsymbol{\mu}_{p[j]}\big)\Big), \qquad
w^s_{ij} \leftarrow \frac{w^s_{ij} + w^s_{ji}}{\sum_{j=1}^{N}\big(w^s_{ij} + w^s_{ji}\big)} \qquad (13)$$

Here, $a$ is a positive normalizing factor. Since (13) is an exponential function, the parameter $a$ should be chosen according to the object size: to achieve a well-defined basin of attraction around the true object center, large values can be used for large objects, while small values are more reliable for small objects. In this paper, we use the default value $a = 2$ for comparisons. In effect, (13) can be interpreted as the spatial Mahalanobis similarity between paired bin patches of a target region and a particle region; the spatial cross-bin similarities are thus densely encoded by the ISM $W^s$. With $W^s$, the problem illustrated in Fig. 1 can easily be handled by IBD. Compared with the previous works [13–15,19,20,22,29,36], our approach differs in the following respects: (1) both coordinate and orientation information are densely encoded; (2) cross-bin interaction is introduced; (3) our approach exploits the spatial attributes statistically and does not depend on the properties of any kernel function [19,20] or on other preprocessing procedures (e.g. target learning [22]); (4) our approach is suitable for histograms extracted from different features, while [14,19,22] mainly address the association of colors and coordinates; and (5) if cross-bin similarities and orientation information are not considered, our approach reduces to the work of [29]. Therefore, with respect to (1), (2) and the definition of the ISM, the proposed approach is an important extension of [29]. Furthermore, we point out that, without orientation information and cross-bin interaction, [29] also regards the images shown in Fig. 1 as identical, mainly because of their symmetric structures.

3.2.2. Calculating the ISM in the temporal space

In practice, since the object and the background scene usually change over time, temporal attributes should be incorporated into the ISM to augment the discriminative capability of IBD. To that end, we also construct the ISM $W^t$ in the temporal space. Denote by $R_{t_0}, \ldots, R_{t-1}$ the estimated object regions in the previous frames from time $t_0$ to $t-1$; we keep their histograms $\mathbf{h}_{t_0}, \ldots, \mathbf{h}_{t-1}$ as the temporal attributes of the object. From this histogram set, we first calculate a mean histogram $\mathbf{h}^m$. Then, given an $N$-bin particle histogram $\mathbf{p}$, the element $w^t_{ij}$ of the matrix $W^t$ is chosen as

$$w^t_{ij} = \exp\!\Big(-b\,\big(1 - \min(p_j, h^m_i)/\max(p_j, h^m_i)\big)^2\Big), \qquad
w^t_{ij} \leftarrow \frac{w^t_{ij} + w^t_{ji}}{\sum_{j=1}^{N}\big(w^t_{ij} + w^t_{ji}\big)} \qquad (14)$$

Here, similar to the parameter $a$ in (13), $b$ is also a positive normalizing factor. With respect to the exponential function (14), the parameter $b$ is related to the possible maximal ratio (PMR) of the smaller value to the larger value in the two matched bins of the input histograms. To achieve a well-defined basin of attraction around the true object center, small values of $b$ can be used for large PMRs, while large values are more reliable for small PMRs. In this paper, we use the default value $b = 2$ for comparisons. Note that the mean histogram $\mathbf{h}^m$ is an intrinsic average that inherits the latent properties of the previous histogram set, and it serves as the ground-truth histogram of the temporal space. Consequently, (14) enforces effective cross-bin comparisons between different bins of the particle histogram and the mean histogram over the time domain.

3.2.3. Generating the ISM in the joint spatial-temporal space

From the ISMs $W^s$ and $W^t$, we obtain the final ISM $W$ in the joint spatial-temporal space through filtering:

$$w_{ij} = k(i, j)\,\big((1 - \lambda)\, w^s_{ij} + \lambda\, w^t_{ij}\big), \qquad
w_{ij} \leftarrow \frac{w_{ij} + w_{ji}}{\sum_{j=1}^{N}\big(w_{ij} + w_{ji}\big)} \qquad (15)$$

Here, $\lambda$ ($0 \le \lambda \le 1$) is a weighting coefficient that sets the proportions of the spatial and temporal attributes in the weighting process. In general, when the object appearance changes quickly over time, the temporal attributes should be given more importance, i.e. $\lambda > 0.5$; otherwise, we set $\lambda \le 0.5$. In this paper, we use the default value $\lambda = 0.2$ for experiments. For the kernel function $k(i, j)$, we recommend

$$k(i, j) = \begin{cases} 1 - \left\lVert \dfrac{i - j}{b_l} \right\rVert, & \text{if } \left\lVert \dfrac{i - j}{b_l} \right\rVert \le 1\\[4pt] 0, & \text{otherwise} \end{cases} \qquad (16)$$

where $b_l$ is a normalizing factor, which we set to the dimension of the matched cross-bins. The insight behind (16) is that, for the matched bins of two input histograms, identically indexed or neighboring bins are considered more similar than bins that are far from each other. Furthermore, the key property of (15) is that it induces a weighting process in which the contribution of $W^t$ in a specific frame is properly taken into account.
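The sketch below strings together (11)–(16) for a single target/particle pair. It is an illustration under stated assumptions, not the authors' code: pixels are assumed to be already quantized (the output of g), covariances are kept diagonal as in the text, a small `eps` regularizes empty or constant bin patches, the ratio in (14) is set to 1 when both bins are empty, and `b_l = 3` is our own choice (it keeps only the five central diagonals). All function names and the toy data are hypothetical.

```python
import numpy as np

def spatial_features(coords, center, bw, bh):
    """v_m = [d_m, theta_m] of Eq. (12) for an (M, 2) array of (x, y) pixel coordinates."""
    dx = coords[:, 0] - center[0]
    dy = coords[:, 1] - center[1]
    d = np.sqrt((dx / bw) ** 2 + (dy / bh) ** 2)
    # atan2(-dy, dx) mapped to [0, 2*pi) reproduces the four quadrant cases of Eq. (12)
    theta = np.mod(np.arctan2(-dy, dx), 2.0 * np.pi) / (2.0 * np.pi)
    return np.stack([d, theta], axis=1)

def bin_stats(bins, feats, n_bins, eps=1e-6):
    """s(i) = {N_i, mu_i, diag(Sigma_i)} per bin patch; eps keeps variances positive (assumption)."""
    counts = np.bincount(bins, minlength=n_bins)
    mu = np.zeros((n_bins, 2))
    var = np.full((n_bins, 2), eps)
    for i in np.nonzero(counts)[0]:
        f = feats[bins == i]
        mu[i] = f.mean(axis=0)
        var[i] = f.var(axis=0) + eps
    return counts, mu, var

def sym_row_normalize(w):
    """Normalization used in Eqs. (13)-(15): (w_ij + w_ji) / sum_j (w_ij + w_ji)."""
    s = w + w.T
    return s / s.sum(axis=1, keepdims=True)

def spatial_ism(mu_q, var_q, mu_p, var_p, a=2.0):
    """Eq. (13): Mahalanobis similarity between bin patches, diagonal covariances."""
    diff = mu_q[:, None, :] - mu_p[None, :, :]
    var = var_q[:, None, :] + var_p[None, :, :]
    return sym_row_normalize(np.exp(-a * np.sum(diff ** 2 / var, axis=2)))

def temporal_ism(p, h_mean, b=2.0):
    """Eq. (14); two empty bins are treated as a perfect match (ratio 1, our assumption)."""
    lo = np.minimum(p[None, :], h_mean[:, None])   # entry (i, j) pairs h^m_i with p_j
    hi = np.maximum(p[None, :], h_mean[:, None])
    ratio = np.divide(lo, hi, out=np.ones_like(hi), where=hi > 0)
    return sym_row_normalize(np.exp(-b * (1.0 - ratio) ** 2))

def joint_ism(ws, wt, lam=0.2, b_l=3.0):
    """Eqs. (15)-(16): triangular kernel k(i, j) applied to the lambda-weighted blend."""
    i, j = np.indices(ws.shape)
    k = np.clip(1.0 - np.abs(i - j) / b_l, 0.0, None)
    return sym_row_normalize(k * ((1.0 - lam) * ws + lam * wt))

# Toy usage (hypothetical data): a 20x20 region already quantized to 8 bins, compared with
# a rotated copy that has the same histogram but a different spatial layout.
rng = np.random.default_rng(0)
n_bins = 8
img = rng.integers(0, n_bins, size=(20, 20))
ys, xs = np.mgrid[0:20, 0:20]
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
feats = spatial_features(coords, center=(9.5, 9.5), bw=10.0, bh=10.0)

counts_q, mu_q, var_q = bin_stats(img.ravel(), feats, n_bins)
counts_p, mu_p, var_p = bin_stats(np.rot90(img).ravel(), feats, n_bins)
q, p = counts_q / counts_q.sum(), counts_p / counts_p.sum()

# q stands in for the mean histogram h^m of previous frames
W = joint_ism(spatial_ism(mu_q, var_q, mu_p, var_p), temporal_ism(p, q))
d = float(np.sqrt(max(0.0, 1.0 - np.sqrt(q) @ W @ np.sqrt(p))))  # IBD, Eq. (8)
```

The shared normalization makes every row of each ISM sum to one, matching the row-sum constraint on W stated with (8).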
It is now worth highlighting the advantages of calculating the ISM with the proposed method: (1) the ISM embodies the information contained in the spatial structure of the object as well as its temporal appearance variations; (2) it provides a natural way to fuse multiple cues; (3) the HBDs of any regions have the same dimension, so cross-bin similarities can be measured between regions of any size, without being restricted to a constant window size; and (4) as a result, the proposed IBD limits the effect of particles in the background and improves tracking accuracy.

3.2.4. Speeding up the computation process

When we evaluate the dissimilarity between a target region and a particle region with the proposed approach, the most expensive parts are computing the spatial layout attributes $s(i) = \{N_i, \boldsymbol{\mu}_i, \Sigma_i\}$ and performing the cross-bin comparisons. Since IBD has quadratic complexity in the number of bins, a naive implementation is an obstacle for real-time tracking. We employ two techniques to significantly speed up the computation. First, we follow the basic idea of the recently proposed integral histogram to drastically reduce the cost of computing $s(i) = \{N_i, \boldsymbol{\mu}_i, \Sigma_i\}$; readers are referred to [36,37] for details. Second, based on the observation in [23,24] that matching the affinities between all possible cross-bins sometimes overestimates the mutual similarity of two distributions, we perform particular cross-bin comparisons rather than naive all-pairs comparisons to achieve the necessary discrimination. For particular cross-bin comparisons, we use the concept of a $P$-diagonal ISM, where $P$ is an odd integer. In mathematics, a square matrix whose elements off the main diagonal are all zero is called a diagonal matrix. Analogously, an ISM $W = [w_{ij}] \in \mathbb{R}^{N \times N}$ whose elements with $|i - j| > (P - 1)/2$ are all zero is called a $P$-diagonal ISM. Hence, for a $P$-diagonal ISM, the true cross-bin similarities at the zeroed positions are neglected. To demonstrate the effect of different numbers of cross-bin comparisons, Fig. 2 shows example results on different targets. In these experiments, 3-, 5-, 17-, 65- and 1023-diagonal ISMs are used, and 8 × 8 × 8-bin RGB color histograms are taken as descriptors (with $N = 512$ bins, the 1023-diagonal ISM keeps every entry, i.e. it corresponds to the naive comparison). For the temporal ISM, the mean histogram $\mathbf{h}^m$ of each target is calculated from the manually labeled object regions in the first five frames of its test set (the test set of each target consists of six consecutive frames collected from diverse video sequences, and the last frame is used as the target image). Example target images are shown as the first images of Fig. 2(a)–(c), and the other images are the results obtained with the 3-, 5-, 17-, 65- and 1023-diagonal ISMs, respectively. These results are obtained by calculating the IBD-based observation likelihood between the target histogram (from the green rectangle) and each particle histogram (from the red rectangle); we refer readers to (4) and (8)–(16) for computational details. Each particle region has the same size as the target, and its center is scanned over the whole target region. Compared with naive cross-bin comparisons, IBDs with particular numbers of cross-bin comparisons produce moderately different observation likelihood surfaces. However, their scores are sufficient to discriminate the particles nearer to the target center from those far from it.
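A P-diagonal ISM can be represented as a band mask, and (8) can then be evaluated by visiting only the retained diagonals, which costs O(PN) rather than O(N²) operations. This is a sketch with hypothetical function names; it uses dense NumPy arrays for clarity and does not include the integral-histogram speed-up of [36,37].

```python
import numpy as np

def p_diagonal_mask(n_bins, P):
    """True where an entry belongs to a P-diagonal ISM, i.e. |i - j| <= (P - 1) / 2."""
    if P < 1 or P % 2 == 0:
        raise ValueError("P must be a positive odd integer")
    i, j = np.indices((n_bins, n_bins))
    return np.abs(i - j) <= (P - 1) // 2

def ibd_banded(q, p, W, P):
    """Eq. (8) restricted to the P central diagonals of W: O(P * N) work instead of O(N^2)."""
    sq, sp = np.sqrt(q), np.sqrt(p)
    n = len(sq)
    half = min((P - 1) // 2, n - 1)      # no diagonal lies beyond offset n - 1
    s = 0.0
    for k in range(-half, half + 1):
        diag = np.diagonal(W, k)         # entries W[i, i + k]
        if k >= 0:
            s += np.sum(sq[: n - k] * diag * sp[k:])
        else:
            s += np.sum(sq[-k:] * diag * sp[: n + k])
    return float(np.sqrt(max(0.0, 1.0 - s)))

# With N = 8 x 8 x 8 = 512 bins, P = 1023 keeps every entry (the naive full comparison),
# while P = 5 keeps 5 * 512 - 6 = 2554 of the 262144 entries.
print(p_diagonal_mask(512, 1023).all(), p_diagonal_mask(512, 5).sum())
```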
In other words, near-optimal basins of attraction around the target center are achieved with particular numbers of cross-bin comparisons. We recommend choosing $P$ from {3, 5, 7, 9, 11, 13, 15, 17}; in this paper we use the 5-diagonal ISM, i.e. only the main diagonal and the four nearest diagonals (two on each side) of the ISM are computed. As a result, the computational cost of the cross-bin comparisons is greatly reduced. Apart from these two techniques, another important fact is that, in many tracking tasks, the high-dimensional HBDs of image regions concentrate on a small number of bins [15] (in Fig. 2, the number of non-zero bins ranges from 34 to 71). Altogether, we achieve a relatively efficient calculation of IBD in the experiments. However, real-time operation remains a problem for high-dimensional non-sparse HBDs.

3.3. Updating the target model

Under object appearance changes and deformations, the target model should be properly updated when persistent tracking is desired. In our algorithm, we update the target histogram from a classification-inspired perspective. Based on the single-diagonal spatial ISM $W^s$ between the target histogram and the histogram of the currently estimated object region, we classify the bins of the current histogram into two categories: bins with large similarity beliefs, which we call primary components, and bins with small similarity beliefs, which we call non-primary components. In the update stage, our aim is to increase the proportion of the primary components and decrease the proportion of the non-primary components in a non-linear manner. To this end, our update strategy consists of a measurement rule and an update rule. Specifically, the measurement rule is

$$\forall s \in \{t-2,\, t-1,\, t\}: \quad d(x_s) < T \qquad (17)$$

ARTICLE IN PRESS A. Yao et al. / Pattern Recognition 43 (2010) 1244–1256


Fig. 2. Example results of IBD under different numbers of cross bin comparisons.

Here, $T$ is a positive threshold with default value 0.8, and $d(x_s)$ is the IBD based dissimilarity between the target histogram and the histogram of the estimated object region at time $s$. Note that the measurement rule must hold over three consecutive frames, which suppresses possible false positive judgments. When the measurement rule holds, the update rule is

$$\hat{q} = a^{1 - W_s(q, q_t)} \, \frac{q + q_t}{2}, \qquad \hat{q}_i \leftarrow \frac{\hat{q}_i}{\sum_{j=1}^{N} \hat{q}_j} \qquad (18)$$

Here, $a$ is a positive base number with $0 < a \le 1$, and $W_s(q, q_t)$ denotes the single diagonal spatial ISM between the target histogram and the histogram of the estimated object region at time $t$. Note that (18) is non-linear, and the weight $a^{1 - W_s(q, q_t)}$ falls into the range $[a, 1]$. Parameter $a$ therefore sets the ratio of the minimal weight to the maximal weight. In this paper, we use its default value 0.2; that is, the maximal weight is five times the minimal weight, and hence an unequal update is achieved.
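The measurement rule (17) and update rule (18) can be sketched in plain Python as follows. This is an illustrative reading, not the authors' code: histograms are lists of floats, the diagonal entries $w^s_{ii}$ of $W_s(q, q_t)$ are assumed to be already computed and to lie in [0, 1], and $a^{1 - W_s(q, q_t)}$ is applied bin-wise as $a^{1 - w^s_{ii}}$. The function names are hypothetical.

```python
from collections.abc import Sequence


def measurement_rule(dissimilarities: Sequence[float], threshold: float = 0.8) -> bool:
    """Rule (17): allow an update only if d(x_s) < T for s = t-2, t-1, t (the last three frames)."""
    recent = list(dissimilarities)[-3:]
    return len(recent) == 3 and all(d < threshold for d in recent)


def update_target_histogram(q: Sequence[float], q_t: Sequence[float],
                            w_diag: Sequence[float], a: float = 0.2) -> list[float]:
    """Rule (18), read bin-wise: q_hat_i = a**(1 - w_ii) * (q_i + q_t_i) / 2, then normalized."""
    if not 0.0 < a <= 1.0:
        raise ValueError("a must satisfy 0 < a <= 1")
    if not len(q) == len(q_t) == len(w_diag):
        raise ValueError("q, q_t and w_diag must have the same length")
    # Bins with large similarity beliefs (w_ii near 1) keep a weight near 1;
    # bins with small similarity beliefs (w_ii near 0) are damped toward weight a.
    raw = [a ** (1.0 - w) * (qi + qti) / 2.0 for qi, qti, w in zip(q, q_t, w_diag)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("updated histogram has no mass")
    return [r / total for r in raw]


if __name__ == "__main__":
    history = [0.9, 0.5, 0.6, 0.7]  # hypothetical IBD values d(x_s), newest last
    if measurement_rule(history):
        # One primary bin (w_ii = 1) and one non-primary bin (w_ii = 0), equal input mass.
        print(update_target_histogram([0.5, 0.5], [0.5, 0.5], [1.0, 0.0]))
```

With the default $a = 0.2$, the primary bin ends up with five times the mass of the non-primary bin, which is the 5:1 maximal-to-minimal weight ratio stated above.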


Consequently, the key property of (18) is that it performs an importance-weighted update. In our update rule, the spatial parameters of the target histogram are also updated by

$$\hat{\mu}_{q[i]} = (1 - w^s_{ii})\, \mu_{q[i]} + w^s_{ii}\, \mu_{q_t[i]}$$

$$\hat{\Sigma}_{q[i]} = (1 - w^s_{ii})\, \Sigma_{q[i]} + w^s_{ii} \left( \frac{1}{N_{q_t[i]}} \sum_{m=1}^{N_{q_t[i]}} \left(v^m_{q_t[i]} - \hat{\mu}_{q[i]}\right)\left(v^m_{q_t[i]} - \hat{\mu}_{q[i]}\right)^T \right) \qquad (19)$$

Here, $N_{q_t[i]}$ is the number of pixels in the $i$th bin patch of histogram $q_t$, and $v^m_{q_t[i]}$ denotes the spatial feature vector of the $m$th pixel in the $i$th bin patch. Our target model update approach differs from the existing approaches [12,13,15,37] in two aspects: (1) we introduce a classification-inspired ingredient into the update procedure, i.e., the contribution of spatial-like bins is enhanced while that of the other bins is reduced, which is quite different from the equal-update approaches of [12,13,15]; and (2) the spatial ISM plays an important role in the update stage, so our update approach takes advantage of spatial attributes and improves the robustness of the tracking algorithm.

4. Performance evaluation

The performance evaluation includes two parts. The first part compares the discriminative capabilities of IBD and other state of the art dissimilarity measures. The second part compares the tracking accuracy of the proposed algorithm with that of other state of the art trackers. All experiments are conducted on a Pentium 2.5 GHz PC with 512 MB of memory.

4.1. Discriminative capability

To evaluate discriminative capability, we compare our IBD with eight other dissimilarity measures, namely the χ² statistics dissimilarity (χ²-SD) [24], L2-D, KSD, KLD, EMD-L1 [38], DD, QD and BD, on a carefully collected dataset. The dataset contains 252 images of 7 common tracking object categories: pedestrian, face, head, hand, car, cup and ping pong ball. For each object category, we collect six sub-sets from diverse video sequences. Each sub-set has six successive frames that include a variety of object appearance changes and background clutter. In the experiments, the last frame of each sub-set is used as the target image, and the mean histogram $h_m$ of our approach is calculated from the manually labeled object regions of the other five frames. The same mean histogram $h_m$ is also used as the ground truth to calculate the similarity matrix of QD. The code of EMD-L1 and DD is available at [39], and the other comparative dissimilarity measures are strictly implemented according to [15,23–26]. All other parameter settings are the same as those for Fig. 2. Here, we present two sets of representative experiments.

In the first set of experiments, we take 8×8×8-bin histograms of the RGB color space as the descriptors. Fig. 3 gives typical results on different target images; the other results are broadly similar to those in Fig. 3. It can be seen that BD gives very similar scores for many particle regions. That means BD cannot effectively discriminate the particles in the object from those in the background, a conclusion also reached in [22]. Since χ²-SD, L2-D and KLD are also bin-to-bin comparisons, their results are very similar to those of BD. KSD and EMD-L1 cannot output stable observation likelihood surfaces because their matching properties are not sufficient for measuring particles [24]. Since the similarity matrix of QD is not learned from a well collected sample set, QD performs poorly, which further supports our claim in Section 3.1. Among the comparative dissimilarity measures, DD performs relatively better, mainly owing to its multi-scale comparisons. Since our proposed IBD encodes the spatial and temporal attributes of the object of interest, it consistently exhibits a nearly concave observation likelihood surface whose peak is precisely located at the object center, so a well defined basin of attraction is persistently achieved. Fig. 4 shows more experimental results of IBD on the target images of four other tracking object categories; competitive results are again obtained. These results can also be compared with those given in [22].

In the second set of experiments, we run IBD on three other HBDs: intensity histograms [16], edge orientation histograms [18] and gradient orientation histograms [17]. Fig. 5 illustrates representative results of IBD on the woman target image of Fig. 2(a). These results show that the cross-bin IBD performs favorably in measuring HBD based particles. Because IBD can effectively discriminate high quality particles from low quality particles, it lays the groundwork for accurately locating the object center in tracking tasks.

4.2. Tracking accuracy

To demonstrate the performance of the proposed tracking algorithm, we compare it with a BD based PF-tracker, a kernel based tracker [15] and a hybrid tracker [40]. For our tracker and the BD based PF-tracker, the number of particles is 200, and the state transition model is the widely used second order autoregressive model

$$x_t = a_{t1} x_{t-1} + a_{t2} x_{t-2} + n_t, \quad n_t \sim N(0, 1), \quad a_{t1} = 2,\ a_{t2} = -1 \qquad (20)$$

The kernel based tracker runs in the RGB color space with 8×8×8 bins, and the hybrid tracker is strictly implemented according to the pseudo-code and default parameters given in [40]. To adapt to target appearance changes and deformations, a BD based update rule [12] is employed in the BD based PF-tracker, and the proposed classification-inspired update strategy is used in our tracker. We run these four trackers on various challenging test sequences available at [41–43]. The results are reported as the position errors between the centers of the tracking results and those of the manually labeled ground truth; ideally, a good tracker keeps these errors close to zero. Here, we present three sets of representative results to show the ability of our tracker to handle a variety of difficulties.

Test sequence 1 has 250 frames with a resolution of 320×240 pixels. In this sequence, the car mainly undergoes background clutter because the road color is very similar to the color of the car. In our tracker, histograms of the RGB color space with 8×8×8 bins are employed as the descriptors. Fig. 6 illustrates some result frames, and Fig. 9(a) shows the relative error curves. Note that the kernel based tracker loses the car at frame 195, when the background clutter becomes heavy, and does not recover for the remaining frames. Similarly, the hybrid tracker drifts into the background at frame 224 and is also unable to recover. Although the BD based PF-tracker follows the car through nearly the whole sequence, it often fails to reach the accurate center. This is mainly due to two reasons. First, BD cannot robustly discriminate the particles in the car from the particles in the background.


Second, the target model is easily trapped at a false position under the BD based update rule. In contrast, the proposed IBD encodes the spatial and temporal attributes of the car, which makes the re-sampled particles concentrate densely on the car-centered region. In addition, the classification-inspired update strategy enhances the weights of the primary components while reducing the weights of the non-primary components, so the target model is reliably updated. With these two effective ingredients, our tracker locates the object center more accurately throughout the sequence.

Test sequence 2 has 230 frames with a resolution of 384×288 pixels. In this video, the background color is similar to the color of the woman's trousers, and the man's coat and trousers have a color similar to that of the woman's coat. In addition, the woman undergoes partial occlusion. In our algorithm, we use 32-bin gradient orientation histograms as the descriptors. From the results shown in Figs. 7 and 9(b), it can be seen that the other three trackers all quickly drift to the man when the woman is partially occluded (from frame 194 to the last frame). For our tracker, the ISM helps capture the inherent relation between the spatial and temporal attributes of the woman. In addition, our tracker uses the spatial cross-bin similarity to appropriately measure the woman's appearance changes. Consequently, our algorithm is robust to partial occlusion, and the target model is not easily corrupted by outliers. Overall, our tracker is again more effective and robust on this sequence.

Test sequence 3 is even more challenging: the man's face mainly undergoes partial occlusion caused by his hands. We use 32-bin intensity histograms as the descriptors in our tracker. According to the results in Figs. 8 and 9(c), when the man's face is partially occluded by his moving hands, the BD based PF-tracker often drifts away from the target. The kernel based tracker and the hybrid tracker perform relatively better, but they still cannot robustly handle the distraction caused by the similar colors of the hands. By adopting IBD and the proposed model update strategy, our tracking algorithm handles these difficulties robustly and precisely.

In these experiments, we also calculate the mean position errors of the four trackers on the three test sequences. On the car sequence, the mean errors of the kernel based tracker, the hybrid tracker, the BD based PF-tracker and our tracker are 21.9062, 15.331, 11.6737 and 4.3594 pixels, respectively. On the woman sequence they are 16.8970, 17.0386, 21.3840 and 3.1087 pixels, and on the face sequence 4.6925, 5.3397, 8.3277 and 2.62 pixels, respectively. The mean errors of our tracker are thus much smaller than those of the other three trackers. We also note that our tracker may obtain competitive tracking results on these sequences when other suitable HBDs are used as the descriptors.

Fig. 3. Comparison of nine dissimilarity measures: (a) the observation likelihoods of χ²-SD (#1), L2-D (#2), KSD (#3), KLD (#4), EMD-L1 (#5), DD (#6), QD (#7), BD (#8) and IBD (#9) on the target image of the woman, shown as the first image of Fig. 2(a); (b) and (c) correspond to the comparative results on the target images of the car (the first image of Fig. 2(b)) and the face (the first image of Fig. 2(c)), respectively.

Fig. 4. Results of IBD on different object categories: (a) target images and (b) results.

Fig. 5. Results of IBD under the inputs of different HBDs (the target image of the woman is used): (a) intensity histogram (32 bins), (b) edge orientation histogram (32 bins), and (c) gradient orientation histogram (32 bins).

Fig. 6. Result frames of the car sequence. The results of our tracker, the kernel based tracker, the BD based PF-tracker and the hybrid tracker are indicated by magenta, red, green and blue rectangles, respectively. From left to right, the frame indexes are 18, 51, 53, 112, 113, 142, 202 and 212. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Result frames of the woman sequence. From left to right, the frame indexes are 10, 28, 69, 128, 185, 195, 202 and 208.

Fig. 8. Result frames of the face sequence. From left to right, the frame indexes are 4, 12, 23, 24, 37, 39, 44 and 49.

Fig. 9. Comparison of the tracking accuracy between our tracker and the other three trackers on three different video sequences: (a) car, (b) woman, and (c) face.
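For readers reproducing the comparison, the plain-Python sketch below shows two pieces of the experimental protocol: one propagation step of the second-order autoregressive model (20) with $a_{t1} = 2$, $a_{t2} = -1$ (which amounts to a constant-velocity prediction plus Gaussian noise), and the mean center position error used to summarize the trackers. Assumptions not specified in the text: the model is applied independently to each state component, and the position error is the Euclidean distance in pixels between the tracked and ground-truth centers. The function names are hypothetical.

```python
import math
import random
from collections.abc import Sequence
from typing import Optional


def propagate_ar2(x_prev: Sequence[float], x_prev2: Sequence[float],
                  a1: float = 2.0, a2: float = -1.0, noise_std: float = 1.0,
                  rng: Optional[random.Random] = None) -> list[float]:
    """One step of Eq. (20) per state component: x_t = a1*x_{t-1} + a2*x_{t-2} + n_t."""
    rng = rng if rng is not None else random.Random()
    # n_t is drawn from N(0, noise_std**2); Eq. (20) uses a standard normal (noise_std = 1).
    return [a1 * p1 + a2 * p2 + rng.gauss(0.0, noise_std) for p1, p2 in zip(x_prev, x_prev2)]


def mean_center_error(tracked: Sequence[Sequence[float]],
                      truth: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance (assumed metric) between tracked and ground-truth (x, y) centers."""
    if len(tracked) != len(truth) or not tracked:
        raise ValueError("need equally long, non-empty sequences of (x, y) centers")
    errors = [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(tracked, truth)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Noise-free step: with a1 = 2, a2 = -1 the center keeps its last velocity.
    print(propagate_ar2([102.0, 51.0], [100.0, 50.0], noise_std=0.0))  # [104.0, 52.0]
    print(mean_center_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))  # 2.5
```

In an actual particle filter, each of the 200 particles would carry its own two-step history and receive independent noise; the single call above only illustrates the transition itself.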

5. Conclusion

Particle weight plays an important role in PF. In this paper, we have presented a new dissimilarity measure, IBD, for estimating particle weights when HBDs are used as the descriptors. The discriminative capability of IBD derives largely from an ISM. The ISM imposes similarity beliefs on the paired bin patches of two input histograms and enables a cross-bin interaction that reliably discriminates the particles in the object from those in the background. To implement IBD, we present an effective approach to computing the ISM in the joint spatial-temporal space. The benefit of the joint spatial-temporal space is that it strikes a weighted balance between the temporal attributes and the spatial configuration of the target. Furthermore, we propose a classification-inspired target model update method to handle object appearance changes and deformations. The key idea of our update strategy is to increase the proportion of the spatial-like bins while decreasing the proportion of the other bins in a non-linear manner. Experiments on real data show that the proposed IBD exhibits promising discriminative performance compared with other state of the art dissimilarity measures. Compared with three other state of the art trackers, the IBD based PF algorithm also performs better in overcoming the difficulties caused by partial occlusion and background clutter. In its current form, our work focuses primarily on the estimation of particle weights. The proposed IBD may also be useful for other applications involving dissimilarity judgments. In the future, we plan to examine the strengths and weaknesses of IBD in other applications such as image retrieval and object recognition. Based on the results of IBD, we believe that particular combinations of intensity, gradient, texture and coordinates can be crucial for achieving promising results in these applications.

Acknowledgments

The authors would like to thank the associate editor and the anonymous reviewers for their constructive comments and suggestions, which improved the quality of the paper. They would also like to thank Haibing Lin for providing the source code of DD and EMD-L1 for comparison. This work was partially supported by Nokia Research Center, Beijing, China.

Appendix A. Supplementary material

Supplementary data associated with this article can be found in the online version at 10.1016/j.patcog.2009.09.024.

References

[1] T. Zhao, R. Nevatia, Tracking multiple humans in complex situations, IEEE Trans. Pattern Anal. Mach. Intell. 26 (9) (2004) 1208–1221.
[2] W. Hu, X. Xiao, D. Xie, T. Tan, S. Maybank, Traffic accident prediction using 3-D model-based vehicle tracking, IEEE Trans. Veh. Technol. 53 (3) (2004) 677–694.
[3] B. Stenger, A. Thayananthan, P.H.S. Torr, R. Cipolla, Model-based hand tracking using a hierarchical Bayesian filter, IEEE Trans. Pattern Anal. Mach. Intell. 28 (9) (2006) 1372–1384.
[4] Z. Yin, R. Collins, On-the-fly object modeling and tracking, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[5] A. Yilmaz, M. Shah, Object tracking: a survey, ACM Comput. Surv. 38 (4) (2006) 1–45.
[6] M. Isard, A. Blake, Condensation—conditional density propagation for visual tracking, Int. J. Comput. Vision 29 (14) (1998) 5–28.
[7] R.E. Kalman, A new approach to linear filtering and prediction theory, Trans. ASME Ser. D J. Basic Eng. 82 (1960) 35–45.
[8] M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Trans. Signal Process. 50 (2) (2002) 174–188.
[9] B. Han, D. Comaniciu, Y. Zhu, L.S. Davis, Incremental density approximation kernel-based Bayesian filtering for object tracking, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2004, pp. 638–644.
[10] M.N. Francesc, A. Sanfeliu, D. Samaras, Dependent multiple cue integration for robust tracking, IEEE Trans. Pattern Anal. Mach. Intell. 30 (4) (2008) 670–685.
[11] Y. Li, H. Ai, T. Yamashita, S. Lao, M. Kawade, Tracking in low frame rate video: a cascade particle filter with discriminative observers of different life spans, IEEE Trans. Pattern Anal. Mach. Intell. 30 (10) (2008) 1720–1740.
[12] K. Nummiaro, E.B. Koller-Meier, L.V. Gool, Object tracking with an adaptive color-based particle filter, in: Proceedings of Symposium for Pattern Recognition of the DAGM, 2002, pp. 355–360.
[13] K. Nummiaro, E. Koller-Meier, L.V. Gool, An adaptive color-based particle filter, Image Vision Comput. 21 (1) (2003) 99–110.
[14] P. Perez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, in: Proceedings of ECCV, 2002, pp. 661–675.
[15] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2000, pp. 142–149.
[16] R.T. Collins, Y. Liu, M. Leordeanu, Online selection of discriminative tracking features, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2008) 1631–1643.
[17] X. Liu, T. Yu, Gradient feature selection for online boosting, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8.
[18] C. Yang, R. Duraiswami, L.S. Davis, Fast multiple object tracking via a hierarchical particle filter, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005, pp. 212–219.
[19] C. Yang, R. Duraiswami, L.S. Davis, Efficient mean-shift tracking via a new similarity measure, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2005, pp. 176–183.
[20] A. Elgammal, R. Duraiswami, L.S. Davis, Probabilistic tracking in joint feature-spatial spaces, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2003, pp. 781–788.
[21] T.M. Cover, J.A. Thomas, Elements of information theory, in: Wiley Series in Telecommunications, Wiley, New York, USA, 1991.
[22] H. Wang, D. Suder, K. Schindler, C. Shen, Adaptive object tracking based on an effective appearance filter, IEEE Trans. Pattern Anal. Mach. Intell. 29 (9) (2007) 1661–1667.
[23] J. Hafner, H.S. Sawhney, W. Equitz, M. Flickner, W. Niblack, Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. Pattern Anal. Mach. Intell. 17 (7) (1995) 729–736.
[24] Y. Rubner, C. Tomasi, L.J. Guibas, The earth mover's distance as a metric for image retrieval, Int. J. Comput. Vision 40 (2) (2000) 99–121.
[25] H. Ling, K. Okada, Diffusion distance for histogram comparison, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2006, pp. 246–253.
[26] J. Puzicha, J.M. Buhmann, C. Tomasi, Y. Rubner, Empirical evaluation of dissimilarity measures for color and texture, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1999, pp. 1165–1173.
[27] J. Puzicha, T. Hofmann, J.M. Buhmann, Non-parametric similarity measures for unsupervised texture segmentation and image retrieval, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 1997, pp. 267–272.
[28] S. Nicu, T. Qi, S.L. Michael, S.H. Thomas, Similarity matching in computer vision and multimedia, Comput. Vision Image Understanding 100 (3) (2008) 309–311.
[29] S.T. Birchfield, S. Rangarajan, Spatiograms versus histograms for region-based tracking, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2005, pp. 1158–1163.
[30] J. Lichtenauer, M. Reinders, E. Hendriks, Influence of the observation likelihood function on particle filtering performance in tracking applications, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2004, pp. 767–772.
[31] T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol. 15 (1) (1967) 52–60.
[32] D. Geman, S. German, C. Graffigne, P. Dong, Boundary detection by constrained optimization, IEEE Trans. Pattern Anal. Mach. Intell. 12 (7) (1990) 609–628.
[33] A.K. Jain, On an estimate of the Bhattacharyya distance, IEEE Trans. Syst. Man Cybern. 6 (11) (1976) 763–766.
[34] A. Djouadi, O. Snorrason, F. Garber, The quality of training sample estimates of the Bhattacharyya coefficient, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 92–97.
[35] M.S. Lew, Principles of Visual Information Retrieval, Springer, New York, 2001.
[36] F. Porikli, Integral histogram: a fast way to extract histograms in Cartesian spaces, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2005, pp. 829–836.
[37] F. Porikli, O. Tuzel, P. Meer, Covariance tracking using model update based on Lie algebra, in: Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), 2006, pp. 728–735.
[38] H. Ling, K. Okada, EMD-L1: an efficient and robust algorithm for comparing histogram-based descriptors, in: Proceedings of ECCV, 2006, pp. 330–343.
[39] ⟨http://vision.ucla.edu/~hbling/main.htm⟩.
[40] E. Maggio, A. Cavallaro, Hybrid particle filter and mean shift tracker with adaptive transition model, in: Proceedings of the IEEE Acoustic, Speech and Signal Processing (ICASSP), 2005, pp. 221–224.
[41] ⟨http://groups.inf.ed.ac.uk/vision/caviar/caviardata1⟩.
[42] ⟨http://vision.stanford.edu/~birch/headtracker/seq⟩.
[43] ⟨http://iris.usc.edu/~icohen/projects/uav/index.xml⟩.

About the Author—ANBANG YAO received his B.S. degree from Nanjing University of Science and Technology, China, in 2002 and his M.S. degree from Tsinghua University, China, in 2005. He is currently a Ph.D. candidate in the Department of Electronic Engineering at Tsinghua University. His research interests include object recognition, object tracking and feature selection.

About the Author—GUIJIN WANG received the B.S. and Ph.D. degrees (with honor) from the department of Electronic Engineering, Tsinghua University, China, in 1998 and 2003, respectively. He is currently an associate professor of the department of Electronic Engineering at Tsinghua University.

About the Author—XINGGANG LIN received the Ph.D. degree and M.S. degree both in information science from Kyoto University, Japan, in 1986 and 1982, respectively, and the B.S. degree in Electronic Engineering from Tsinghua University, China, in 1970. He joined the Department of Electronic Engineering at Tsinghua University, Beijing, China in 1986 where he has been a full professor since 1990.

About the Author—XIUJUAN CHAI received her M.S. and Ph.D. degrees in computer science from Harbin Institute of Technology, China, in 2002 and 2007, respectively. She is currently a postdoctoral researcher at Nokia Research Center, Beijing, China. Her research interests include pattern recognition, image processing, face recognition, gesture analysis and human computer interaction.