An object tracking method based on guided filter for night fusion image

Xiaoyan Qian, Yuedong Wang, Lei Han

College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Abstract: Online object tracking is a challenging problem, as it entails learning an effective model to account for appearance changes caused by intrinsic and extrinsic factors. In this paper, we propose a novel online object tracking algorithm based on the guided image filter for accurate and robust night fusion image tracking. Firstly, frame difference is applied to produce a coarse target, which helps to generate the observation models. Under the restriction of these models and the local source image, the guided filter generates a sufficient and accurate foreground target. Then accurate boundaries of the target can be extracted from the detection results. Finally, timely updating of the observation models helps to avoid tracking drift. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.

Keywords: target tracking; guided filter; color fusion image; observation model
1. Introduction

Object tracking is a fundamental problem with wide applications and a rich literature. It is significant for applications such as airport security, construction site safety, patient safety and hospital asset management [1-3]. However, variable lighting, quick motion and random occlusions present difficulties for real-time surveillance, which tend to cause erroneous object detection and trajectories. Wang et al. proposed that combining the advantages of different visual sensors can improve the robustness of object detection and tracking, and this is a promising field [4]. Image fusion aims to integrate multiple images derived from different sensors into a composite image that is more suitable for the purposes of human visual perception or computer-processing tasks [5].

Recently, more and more researchers are interested in using different sensors to improve tracking performance [1,5-10]. They developed pixel-, feature- and decision-level fusion techniques for tracking. Pixel-level methods work in the spatial domain or in the transform domain. Image fusion at the pixel level amounts to the integration of low-level information, and the fused image can maintain the rich color of the visual image and meanwhile pop out the target in the infrared image [1]. Some researchers improved tracking performance using appearance features extracted from the fused results [6]. Qian et al. [5] proposed an object observation model based on a 2D color histogram and gave a local matting tracking scheme for night fused images. The feature-level algorithms typically calculate features from each image and fuse their properties to realize object tracking [4,7]. Zhao et al. [7] proposed an object tracking method based on infrared and visible dual-channel video. It extracted the Hue, Saturation and Value color features in the visible image and used mean shift to estimate the object location in the visible image, while the contour feature in the infrared image realized accurate tracking [8,9]. Liu et al. fused tracking in color and infrared images using joint sparse representation [9]. In that paper, a similarity induced by joint sparse representation is designed to construct the likelihood function of a particle filter tracker so that the color visual spectrum and thermal spectrum images can be fused for object tracking. Decision-level methods get the tracking results separately in each source image and then use the outputs of the initial object tracking as inputs to a fusion algorithm to produce the object state. For example, Conaire et al. [10] proposed a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features.

In this paper, we focus on object tracking for pixel-level fused images. A tracking method typically consists of three components: an observation model (e.g. contours [11] and histograms of oriented gradients [12,13]), which evaluates the likelihood of an observed image patch belonging to the object class; a dynamic model, which aims to describe the states of an object over time (e.g., Kalman filter [14] and particle filter [15,16]); and a search strategy for finding the likely states in the current frame (e.g. sliding window [17] and mean shift [18]). In this paper, we propose a robust generative tracking algorithm using an appearance model that considers the effects of occlusion to alleviate tracking drift.
In order to develop effective observation models for object tracking, early works tended to construct the model by describing the target itself [19], while recently the adoption of context information has become very popular [20]. Although these methods more or less suffer from inaccuracy in the estimation of foreground and background, which will cause tracking drift, they motivate us to design an accurate observation model which can capture the information from the target and its neighboring context. To focus on the alleviation of the after-effects caused by foreground/background labeling errors, an accurate boundary of the target could mostly reduce such errors. There are several tracking methods that try to obtain the boundary of the foreground [5,21,22]. Tracking using active contours is one way to extract and track object boundaries [23]. However, active contour tracking heavily relies on curve matching and is not designed for complex shape deformation; therefore, it cannot handle large deformation. Image segmentation is another direct and popular solution to separate the foreground from the background [24]. In [21], a shape-based method was proposed to match the contour of a prior shape to the current image, but contour matching alone may not achieve good results since the useful information within the target boundary is not taken into account. In [5], a matting tracking method was proposed. Compared with image segmentation, matting can exploit the linear compositing equations in the alpha channel instead of directly handling the complexity of the color image. Therefore, it may achieve better foreground/background separation performance. In addition, the adaptive appearance model automatically generates scribbles in each frame, which makes the foreground/background separation rely only on the scribble. Their model adaptation largely excludes the ambiguity of foreground and background, thus these methods significantly alleviate the drift problem in tracking.

Benefiting from the matting tracking, we propose a robust generative tracking algorithm based on the guided image filter in this paper. Different from matting tracking, the detection of the foreground target can be produced by filtering a raw appearance scribble under the guidance of the source image. There are three advantages of applying the guided image filter to tracking. One is that its time complexity is independent of the window radius, which allows us to select an arbitrary kernel size; the second is that it avoids solving the matting Laplacian matrix, which brings faster tracking. The last one is that this filter can well maintain the edge information of the guidance image, which allows us to obtain the real contour, so object scaling and rotation can be handled by obtaining an accurate and robust boundary. In addition, our discriminative model adaptation largely excludes the ambiguity of foreground and background, thus significantly alleviating the drift problem in tracking. The adaptive appearance model can handle partial occlusion and other challenging factors. Experiments and evaluations on fusion image sequences bear out that the proposed algorithm is efficient and effective for robust object tracking. Fig.1 gives the flow chart of our algorithm.

2. Guided image filter

The guided image filter is an edge-preserving smoothing filter. It avoids the gradient reversal artifacts that may appear in detail enhancement. The key assumption of the guided filter is a local linear model between the guidance I and the filter output O:

O_i = a_k I_i + b_k, \quad \forall i \in w_k \qquad (1)

where (a_k, b_k) are linear coefficients assumed to be constant in a window w_k centered at the pixel k. This local linear model ensures that O has an edge only if I has an edge, because \nabla O = a \nabla I. By minimizing the difference between O and the filter input P, the linear coefficients are determined by:
a_k = \frac{\frac{1}{|w|}\sum_{i \in w_k} I_i P_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k \qquad (2)

Here, \mu_k and \sigma_k^2 are the mean and variance of I in w_k, |w| is the number of pixels in w_k, \bar{p}_k = \frac{1}{|w|}\sum_{i \in w_k} P_i is the mean of P in w_k, and \epsilon is a regularization parameter.

Fig.1 The flow chart of our algorithm
The relationship among I, P and O can be rewritten in the form of an image filter as follows:

O_i = \sum_j W_{ij}(I) P_j \qquad (3)

Here, W_{ij}(I) is the kernel weight, which can be explicitly expressed by:

W_{ij}(I) = \frac{1}{|w|^2} \sum_{k:(i,j) \in w_k} \left( 1 + \frac{(I_i - \mu_k)(I_j - \mu_k)}{\sigma_k^2 + \epsilon} \right) \qquad (4)

Compared with the closed-form solution to matting [22], we find that the elements of the matting Laplacian matrix L can be directly given by the guided filter kernel weight:

L_{ij} = |w| (\delta_{ij} - W_{ij}) \qquad (5)

where \delta_{ij} is the Kronecker delta. So if there is a reasonably good guess of the matte, we can run the guided filtering process to produce a fine alpha matte just like Laplacian matting. By simple boundary extraction, the opacity map can give an accurate and whole detection of the tracking object.
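For concreteness, the following Python/NumPy sketch shows one standard way to realize the filtering step of Eqs. (1)-(2) with mean (box) filters, which is what makes the cost independent of the window radius. The function and parameter names (r, eps) are illustrative choices, not values taken from the paper:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, r=8, eps=1e-3):
    """Guided filtering of input P under gray guidance I (Eqs. (1)-(2))."""
    I = I.astype(np.float64)
    P = P.astype(np.float64)
    size = 2 * r + 1                                    # window w_k of radius r
    mean_I = uniform_filter(I, size)                    # mu_k
    mean_P = uniform_filter(P, size)                    # p_bar_k
    corr_IP = uniform_filter(I * P, size)
    var_I = uniform_filter(I * I, size) - mean_I ** 2   # sigma_k^2
    a = (corr_IP - mean_I * mean_P) / (var_I + eps)     # a_k, Eq. (2)
    b = mean_P - a * mean_I                             # b_k, Eq. (2)
    mean_a = uniform_filter(a, size)                    # average coefficients over windows
    mean_b = uniform_filter(b, size)
    return mean_a * I + mean_b                          # output O, Eq. (1)
```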
3. Object tracking via guided filter

In this section, we propose an algorithm for object tracking in fused sequences with the guided filter. The main content includes the construction of the observation model, target detection from the guided filter and the update of the observation model. Just like matting tracking, we can also fit the guided filter naturally into the tracking framework by filling the gaps, including constructing the appearance model, giving the search strategy, generating the matte and performing model updating.

3.1 Observation model

Since color is a useful clue to discriminate the foreground from the background in color fused images, we select the most discriminative colors for the foreground and its surrounding background respectively as the observation model. Given a frame with known foreground and background (the target field is initialized by a frame difference algorithm at the first two frames), we first get the 2D color histogram of the foreground and the histogram of the background in the Lab color space. The Lab model is a perceptually uniform color space and there is little correlation between the axes, so different operations on different color channels can be applied with confidence that undesirable cross-channel artifacts will not occur. In addition, Lab is the most complete color model conventionally used to describe all the colors visible to the human eye. There is no loss when an image is processed in Lab space, which has been applied to color image fusion successfully.
Thus we can get accurate colors respectively for the foreground and its background. Then the log-likelihood ratio of these two histograms can be easily calculated with respect to each bin:

L(i,j) = \log \frac{H_f(i,j) + \lambda}{H_b(i,j) + \lambda}

where L(i,j) denotes the log-likelihood ratio of the (i,j)-th bin, H_f and H_b are the 2D color histograms of the foreground and the background, and \lambda is a very small constant to avoid infinity when H_f or H_b approach 0. It is obvious that when one color mainly appears in the foreground and rarely in its surrounding background, its L value would be high, and vice versa. So we consider the (i,j) color bin as a distinctive color for the foreground if L(i,j) is larger than a threshold; otherwise the color is regarded as a distinctive color for the background. In this way, two discriminative color lists of the foreground and background are defined respectively (denoted by C_f and C_b), which contain the most distinctive colors against each other.
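As an illustration of Section 3.1, the sketch below builds the observation model from a frame with known foreground/background masks: it computes the 2D (a, b) histograms in Lab space, forms the per-bin log-likelihood ratio, and keeps the discriminative bins. The bin count, the additive form of the small constant and the threshold value are assumptions made for illustration, not values given in the paper:

```python
import numpy as np
import cv2  # only used for the RGB -> Lab conversion

def build_observation_model(image_rgb, fg_mask, bg_mask, bins=32, lam=1e-6, thresh=0.5):
    """Discriminative (a, b) color bins of foreground vs. surrounding background (Section 3.1)."""
    lab = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2Lab)
    ab = lab[..., 1:3].reshape(-1, 2).astype(np.float64)
    def hist2d(mask):
        vals = ab[mask.reshape(-1).astype(bool)]
        h, _, _ = np.histogram2d(vals[:, 0], vals[:, 1],
                                 bins=bins, range=[[0, 256], [0, 256]])
        return h / max(h.sum(), 1.0)                    # normalized 2D color histogram
    H_f, H_b = hist2d(fg_mask), hist2d(bg_mask)
    L = np.log((H_f + lam) / (H_b + lam))               # per-bin log-likelihood ratio
    C_f = np.argwhere(L > thresh)                       # distinctive foreground bins
    C_b = np.argwhere(L <= thresh)                      # remaining bins treated as background
    return C_f, C_b
```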
3.2 Target locating from guided filter

To make the guided filter work well in object tracking, we need a scribble estimation as an input, which gives rough background and foreground labels in the current frame. Supposing that the observation model has been updated at the former frame, the scribble estimation at the current frame is automatically generated as follows: for each pixel, if its color belongs to C_f, the pixel is marked as foreground and its value is labeled as 1; if its color belongs to C_b, the pixel is regarded as background and set to 0.

Then the guided filter produces a fine foreground detection from the scribble estimation. To accelerate detection, we use the gray image, which has the same local field as the current estimation, as the guidance image. Under the guidance of this gray image, the fine foreground can be produced from the estimation:

O_i = a_k I_i + b_k, \quad \forall i \in w_k \qquad (6)

a_k = \frac{\frac{1}{|w|}\sum_{i \in w_k} I_i P_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k \qquad (7)

where O is the fine output, I denotes the local gray source image and P is the scribble estimation.
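A minimal sketch of this target-locating step is given below, assuming the discriminative foreground list C_f produced in Section 3.1 and reusing the guided_filter function sketched in Section 2. Pixels falling in a foreground bin are labeled 1 and all other pixels 0 (a simplifying assumption), and the scribble is then refined by guided filtering per Eqs. (6)-(7):

```python
import numpy as np
import cv2

def scribble_from_model(image_rgb, C_f, bins=32):
    """Raw scribble map P: 1 where a pixel's (a, b) bin is a discriminative foreground bin."""
    lab = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2Lab)
    bin_w = 256 // bins
    a_idx = lab[..., 1] // bin_w                        # bin index along the a channel
    b_idx = lab[..., 2] // bin_w                        # bin index along the b channel
    fg_lut = np.zeros((bins, bins))
    fg_lut[C_f[:, 0], C_f[:, 1]] = 1.0                  # look-up table of foreground bins
    return fg_lut[a_idx, b_idx]                         # scribble estimation P in {0, 1}

# Refined foreground map in the local search field, reusing guided_filter from Section 2:
#   alpha = guided_filter(gray_patch, scribble_from_model(fused_patch, C_f))
```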
3.3 Boundary extraction

An accurate boundary that clearly divides the target from the background can alleviate drift in the succeeding tracking modules. The guided result is a continuous map of the foreground. The values near the boundary of the target are hardly 0 or 1 but somewhere between them. Therefore, to remove such ambiguity we set a threshold T to cut this map. Supposing that \alpha_i is the value of p_i (p_i is a pixel in the current local field): if \alpha_i > T, then p_i is marked as foreground and \alpha_i = 255; else it is regarded as background and \alpha_i = 0. With the refined field, we can use a simple edge detection operator such as Sobel to extract an accurate boundary.
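The boundary-extraction step can be sketched as follows; the threshold T = 0.5 is illustrative, since the paper does not specify its value:

```python
import numpy as np
import cv2

def extract_boundary(alpha, T=0.5):
    """Cut the continuous foreground map at T and extract the boundary with a Sobel operator."""
    refined = np.where(alpha > T, 255, 0).astype(np.uint8)     # binary target field
    gx = cv2.Sobel(refined, cv2.CV_64F, 1, 0, ksize=3)         # horizontal gradient
    gy = cv2.Sobel(refined, cv2.CV_64F, 0, 1, ksize=3)         # vertical gradient
    boundary = (np.hypot(gx, gy) > 0).astype(np.uint8) * 255   # non-zero gradient marks the edge
    return refined, boundary
```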
3.4 Update of observation model

During tracking, the background color may change largely, while the foreground may also vary due to deformation and occlusion. Therefore, the observation model should be updated to remove invalid colors and add new discriminative colors. According to the location of the target at the current frame, we recalculate the 2D histograms for the new background and foreground and then extract discriminative colors for them using the log-likelihood ratio of these two histograms, as introduced in Section 3.1. For each extracted foreground color C_i^t in the current frame, we compare it with each color in the former frame's color lists C_f^{t-1} and C_b^{t-1} according to three cases: (1) if C_i^t \in C_b^{t-1}, then C_i^t is no longer a discriminative color for the background and will be removed from the current C_b^t (which is initialized by C_b^{t-1}); (2) if C_i^t \notin C_b^{t-1} and C_i^t \in C_f^{t-1}, no update is performed; (3) if C_i^t \notin C_b^{t-1} and C_i^t \notin C_f^{t-1}, this color is considered a new discriminative color for the foreground and is added to the current C_f^t (which is initialized by C_f^{t-1}). Similarly, we compare each newly extracted background discriminative color with C_f^{t-1} and C_b^{t-1}. After adaptation, the two color lists C_b^t and C_f^t contain the newest colors as well as maintaining previous color information that is still valid.
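The three-case adaptation of the color lists can be summarized in the sketch below. The background branch mirrors the foreground one, which the paper only states briefly ("Similarly, we compare..."), so that part is an assumption:

```python
def update_color_lists(new_fg, new_bg, prev_fg, prev_bg):
    """Three-case adaptation of the discriminative color lists (bins given as (i, j) tuples)."""
    cur_fg, cur_bg = set(prev_fg), set(prev_bg)   # C_f^t and C_b^t start from the former lists
    for c in new_fg:                              # newly extracted foreground colors C_i^t
        if c in cur_bg:
            cur_bg.discard(c)                     # case 1: no longer background-distinctive
        elif c not in cur_fg:
            cur_fg.add(c)                         # case 3: new foreground-distinctive color
        # case 2: already in the foreground list -> no update
    for c in new_bg:                              # assumed symmetric pass for background colors
        if c in cur_fg:
            cur_fg.discard(c)
        elif c not in cur_bg:
            cur_bg.add(c)
    return cur_fg, cur_bg
```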
4. Experiments

The proposed algorithm is implemented in MATLAB and runs on a 2.4 GHz dual-core Pentium PC with 4 GB memory. During tracking, if the size of the target is …, the surrounding region with a size of 1/16 … is considered as its local neighboring background. We use 3 image fusion sequences produced by our former work in the experiments.

We compare our tracking method with several bottom-up methods, including mean-shift tracking [14], contour tracking [23] and matting tracking [5]. Some results are shown in Figs. 2 and 3. In experiment 1, the tracked pedestrian undergoes large deformation and occlusion in some frames. Mean-shift tracking loses the hot target after Frame 17 when the target moves in a large range (see the first row). The other three tracking methods can all handle deformation and partial occlusion. Our method always keeps an accurate boundary of the target and can cut the body from the background (the last row). In the second sequence, the camera keeps moving from frame 7. Mean shift and particle filter suffer from unreliable matching and lose track after frame 7. Compared with mean-shift tracking, contour tracking is more robust, but other contours appear from frame 13, and simple contour detection causes tracking failure. The matting method and guided tracking can still successfully handle this complex situation and accurately cut the body from its background. That is because these two methods carry information from the former frame into the current frame. The corresponding source frame also helps to detect the target.
Fig.2 Tracking results on Frames 12-19 of the first sequence (from top to bottom: mean-shift method, active contour tracking, matting tracking and our method)
Fig.3 Tracking results on Frames 7-14 of the second sequence (from top to bottom: mean-shift method, active contour tracking, matting tracking and our method)
Performance evaluation is an important issue that requires sound criteria in order to fairly assess the strength of tracking algorithms. Quantitative evaluation of object tracking typically involves computing the difference between the predicted and the ground-truth center locations, as well as their average values. We define the center difference as the Euclidean distance:

d_i = \sqrt{(x_i^T - x_i^G)^2 + (y_i^T - y_i^G)^2} \qquad (8)

where d_i denotes the center difference, and (x_i^T, y_i^T) and (x_i^G, y_i^G) are the centers of the current tracking result and the ground truth. Their average value is defined by the mean difference:

\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i \qquad (9)

Here, N is the number of frames and d_i is the center difference of the i-th frame.

Fig. 4 gives the center difference for the sequences of Figs. 2 and 3. It can be seen that all four methods can track correctly if the target moves in a small range and the camera keeps still. But once the target moves in a large range, as can be found in Fig. 2, mean-shift tracking causes a terrible position error. The other three methods still obtain small errors because they can extract effective appearance information, including contour and color, in their search field. The mean error values are 22.9988, 0.9290, 3.1720 and 2.3400 respectively for the four methods. In Fig. 3, when the camera keeps moving from Frame 7, the position error of the mean-shift method rises rapidly because it loses the target. And from frame 13, contour tracking begins to detect wrong contours and loses the target gradually, so its position error becomes bigger and bigger. On the contrary, our method and matting tracking obtain stable and small position errors. The mean error values are 52.5169, 12.1254, 3.1117 and 2.7156 respectively.

On the other hand, the tracking overlap rate indicates the stability of each algorithm, as it takes the size and pose of the target object into account. Given the tracking result R_T of each frame and the corresponding ground truth R_G, the overlap rate is defined by the PASCAL VOC [25] criterion:

S = \frac{area(R_T \cap R_G)}{area(R_T \cup R_G)} \qquad (10)

An object is regarded as being successfully tracked when the rate is above 0.5. Fig. 5 shows the overlap rates of each tracking algorithm for all the sequences. The average overlap rates for Figs. 2 and 3 are 0.3641, 0.8807, 0.7047, 0.9048 and 0.1778, 0.6888, 0.8006, 0.8483 respectively. Overall, our tracker performs favorably against the other algorithms.

In addition, Table 1 gives the running time of the four algorithms. Although mean-shift tracking is the fastest, its robustness is the worst when the object moves in a large range and the camera keeps shifting. Compared with the other three algorithms, contour tracking sometimes has the best performance, at the cost of running speed. Fortunately, our algorithm not only has robust tracking results just like matting tracking but also has good speed.

Fig.4 Quantitative comparison of center difference: (a) UN Camp; (b) Trees
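For reference, the two evaluation criteria of Eqs. (8)-(10) can be computed as in the following sketch; bounding boxes are assumed to be given as (x, y, w, h) tuples:

```python
import numpy as np

def center_errors(centers_pred, centers_gt):
    """Per-frame center difference (Eq. (8)) and its mean over the sequence (Eq. (9))."""
    d = np.linalg.norm(np.asarray(centers_pred, float) - np.asarray(centers_gt, float), axis=1)
    return d, d.mean()

def overlap_rate(box_t, box_g):
    """PASCAL VOC overlap criterion (Eq. (10)); boxes are (x, y, w, h)."""
    x1, y1 = max(box_t[0], box_g[0]), max(box_t[1], box_g[1])
    x2 = min(box_t[0] + box_t[2], box_g[0] + box_g[2])
    y2 = min(box_t[1] + box_t[3], box_g[1] + box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_t[2] * box_t[3] + box_g[2] * box_g[3] - inter
    return inter / union if union > 0 else 0.0    # success when the rate exceeds 0.5
```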
Table 1 Running time of different algorithms

         Mean-shift   Contour tracking   Matting tracking   Guided tracking
Fig.1    1.6968       20.8397            5.9646             3.1814
Fig.2    1.3227       22.9231            4.1082             2.8151
Fig.5 Quantitative comparison of overlap score: (a) UN Camp; (b) Trees

5. Conclusions

This paper presents a robust tracking algorithm via the guided filter. In this work, we explicitly take deformation, partial occlusion and camera motion into account for the appearance update. Experiments on challenging image sequences demonstrate that our tracking algorithm performs favorably against several state-of-the-art algorithms. In future work, we plan to do further research on tracking models and model updating schemes in order to make our algorithm suitable for multiple object tracking.

Acknowledgements

This study is supported by the China Postdoctoral Science Foundation (Grant No. 20110491415) and the NSFC Peiyu Funds of NUAA (NN2012049).

References

[1] M.V. Parvatikar, G.S. Phadke. Comparative study of different image fusion techniques. International Journal of Scientific Engineering and Technology, 2014, 3(4): 375-379.
[2] K.Q. Huang, X.T. Chen, Y.F. Kang, T.N. Tan. Intelligent visual surveillance: a review. Chinese Journal of Computers, 2014, 37(49): 1-28.
[3] H.P. Liu, F.C. Sun. Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 2012, 55(3): 590-599.
[4] J. Wang, D. Chen, S. Li, Y. Yang. Infrared and visible fusion for robust object tracking via local discrimination analysis. Journal of Computer-Aided Design & Computer Graphics, 2014, 26(6): 870-878.
[5] X.Y. Qian, L. Han, Y.Y. Cheng. An object tracking method based on local matting for night fusion image. Infrared Physics & Technology, 2014, 67: 455-461.
[6] A.L. Chan, S.R. Schnelle. Fusing concurrent visible and infrared videos for improved tracking performance. Optical Engineering, 2013, 52(1).
[7] G. Zhao, Y. Bo, M. Yin. An object tracking method based on infrared and visible dual-channel video. Journal of Electronics & Information Technology, 2012, 34(3): 529-534.
[8] M. Kass, A. Witkin, D. Terzopoulos. Snakes: active contour models. International Journal of Computer Vision, 1988, 1: 321-331.
[9] H.P. Liu, F.C. Sun. Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 2012, 55(3): 590-599.
[10] C.O. Conaire, N.E. O'Connor, A. Smeaton. Thermal-visual feature fusion for object tracking using multiple spatiogram trackers. Machine Vision and Applications, 2008, 19(5-6): 483-494.
[11] Y. Wu, J. Lim, M.H. Yang. Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 2013: 2411-2418.
[12] N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[13] F. Tang, S. Brennan, Q. Zhao, H. Tao. Co-tracking using semi-supervised support vector machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[14] D. Comaniciu, V. Ramesh, P. Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564-575.
[15] Y. Li, H. Ai, T. Yamashita, S. Lao, M. Kawade. Tracking in low frame rate video: a cascade particle filter with discriminative observers of different life spans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(10): 1728-1740.
[16] P. Perez, C. Hue, J. Vermaak, M. Gangnet. Color-based probabilistic tracking. In: Proceedings of the European Conference on Computer Vision, 2002: 661-675.
[17] H. Grabner, H. Bischof. On-line boosting and vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006: 260-267.
[18] D. Comaniciu, V. Ramesh, P. Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564-575.
[19] M. Kim, S. Kumar, V. Pavlovic, H.A. Rowley. Face tracking and recognition with visual constraints in real-world videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[20] Y. Wu, J. Fan. Contextual flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[21] T. Schoenemann, D. Cremers. Globally optimal shape-based tracking in real-time. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[22] J.L. Fan, X.H. Shen, Y. Wu. Scribble tracker: a matting-based approach for robust tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1633-1645.
[23] T.F. Chan, L.A. Vese. Active contours without edges. IEEE Transactions on Image Processing, 2001, 10(2): 266-278.
[24] H. Zhou, X. Li, G. Schaefer, M.E. Celebi, P. Miller. Mean shift based gradient vector flow for image segmentation. Computer Vision and Image Understanding, 2013, 117: 1004-1016.
[25] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results.
Highlights

1> An automatic guided image filter is applied to fusion image tracking.
2> A simple boundary extraction scheme and model updating make the tracking more accurate and robust.
3> The method is validated through subjective and objective metrics.