To appear in: Infrared Physics & Technology. Received 25 June 2015. DOI: 10.1016/j.infrared.2015.11.005 (PII: S1350-4495(15)00268-6)


An object tracking method based on guided filter for night fusion image

Xiaoyan Qian, Yuedong Wang, Lei Han

College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, 210016, Nanjing, China

Abstract: Online object tracking is a challenging problem as it entails learning an effective model to account for appearance changes caused by intrinsic and extrinsic factors. In this paper, we propose a novel online object tracking method based on the guided image filter for accurate and robust night fusion image tracking. Firstly, frame difference is applied to produce the coarse target, which helps to generate the observation models. Under the restriction of these models and the local source image, the guided filter generates a sufficient and accurate foreground target. Then accurate boundaries of the target can be extracted from the detection results. Finally, timely updating of the observation models helps to avoid tracking drift. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.

Keywords: target tracking; guided filter; color fusion image; observation model

1. Introduction

Object tracking is a fundamental problem with wide applications and a rich literature. It is significant for applications such as airport security, construction site safety, patient safety and hospital asset management [1-3]. However, variable lighting, quick movement and random occlusions present difficulties for real-time surveillance, and tend to cause erroneous object detections and trajectories. Wang et al. proposed that combining the advantages of different visual sensors can improve the robustness of object detection and tracking, and this is a promising field [4]. Image fusion aims to integrate multiple images derived from different sensors into a composite image that is more suitable for the purposes of human visual perception or computer-processing tasks [5].

Recently, more and more researchers have become interested in using information from different sensors to improve tracking performance [1,5-10]. They developed pixel-, feature- and decision-level fusion techniques for tracking. Pixel-level methods work in the spatial domain or in the transform domain. Image fusion at the pixel level amounts to the integration of low-level information, and the fused image can maintain the rich color of the visual image while popping out the target in the infrared image [1]. Some researchers improved the tracking performance using appearance features extracted from the fused results [6]. Qian et al. [5] proposed an object observation model based on 2D color histograms and gave a local matting tracking scheme for night fused images. Feature-level algorithms typically calculate features from each image and fuse their properties to realize object tracking [4,7]. Zhao et al. [7] proposed an object tracking method based on infrared and visible dual-channel video. It extracted the Hue, Saturation and Value color features in the visible image and used mean shift to estimate the object location there, while the contour feature in the infrared image realized accurate tracking [8,9]. Liu et al. fused tracking in color and infrared images using joint sparse representation [9]; a similarity induced by joint sparse representation is designed to construct the likelihood function of a particle filter tracker so that the color visual spectrum and thermal spectrum images can be fused for object tracking. Decision-level methods obtain tracking results separately in each source image and then use the outputs of the initial object tracking as inputs to a fusion algorithm that produces the object state. For example, Conaire et al. [10] proposed a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features.

In this paper, we focus on object tracking for pixel-level fused images. A tracking method typically consists of three components: an observation model (e.g. contours [11] and histograms of oriented gradients [12,13]), which evaluates the likelihood of an observed image patch belonging to the object class; a dynamic model which aims to describe the states of an object over time (e.g., Kalman filter [14] and particle filter [15,16]); and a search strategy for finding the likely states in the current frame (e.g. sliding window [17] and mean shift [18]). In this paper, we propose a robust generative tracking algorithm using an appearance model that considers the effects of occlusion to alleviate tracking drift.

In order to develop effective observation models for object tracking, early works tended to construct the model by describing the target itself [19], while recently the adoption of context information has become very popular [20]. Although these methods more or less suffer from inaccuracy in the estimation of foreground and background, which causes tracking drift, they motivate us to design an accurate observation model which can capture information from the target and its neighboring context.

To focus on alleviating the aftereffects caused by foreground/background labeling errors, an accurate boundary of the target can mostly reduce such errors. There are several tracking methods that try to obtain the boundary of the foreground [5,21,22]. Tracking using active contours is one way to extract and track object boundaries [23]. However, active contour tracking heavily relies on curve matching and is not designed for complex shape deformation; therefore, it cannot handle large deformations. Image segmentation is another direct and popular solution to separate the foreground from the background [24]. In [21], a shape-based method was proposed to match the contour of a prior shape to the current image, but contour matching alone may not achieve good results since the useful information within the target boundary is not taken into account. In [5], a matting tracking method was proposed. Compared with image segmentation, matting can exploit the linear compositing equations in the alpha channel instead of directly handling the complexity of the color image. Therefore, it may achieve better foreground/background separation performance. In addition, the adaptive appearance model automatically generates scribbles in each frame, which makes the foreground/background separation rely only on the scribble. Such model adaptation largely excludes the ambiguity between foreground and background, so these methods significantly alleviate the drift problem in tracking.

Benefiting from matting tracking, we propose a robust generative tracking algorithm based on the guided image filter in this paper. Different from matting tracking, the detection of the foreground target is produced by filtering a raw appearance scribble under the guidance of the source image. There are three advantages of applying the guided image filter to tracking. One is that the time complexity is independent of the window radius, which allows us to select an arbitrary kernel size; the second is that it avoids solving the matting Laplacian matrix, which brings faster tracking; the last is that this filter can well maintain the guidance's edge information, which allows us to obtain the real contour, so object scaling and rotation can be handled by obtaining an accurate and robust boundary. In addition, our discriminative model adaptation largely excludes the ambiguity between foreground and background, thus significantly alleviating the drift problem in tracking. The adaptive appearance model can handle partial occlusion and other challenging factors. Experiments and evaluations on fusion image sequences bear out that the proposed algorithm is efficient and effective for robust object tracking. Fig.1 gives the flow chart of our algorithm.

Fig.1 The flow chart of our algorithm

2. Guided image filter

The guided image filter is an edge-preserving smoothing filter. It avoids the gradient reversal artifacts that may appear in detail enhancement. The key assumption of the guided filter is a local linear model between the guidance $I$ and the filter output $O$:

$$O_i = a_k I_i + b_k, \quad \forall i \in w_k \qquad (1)$$

where $(a_k, b_k)$ are linear coefficients assumed to be constant in a window $w_k$ centered at the pixel $k$. This local linear model ensures that $O$ has an edge only if $I$ has an edge, because $\nabla O = a \nabla I$. By minimizing the difference between $O$ and the filter input $P$, the linear coefficients are determined by:

$$a_k = \frac{\frac{1}{|w|}\sum_{i \in w_k} I_i P_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k \qquad (2)$$

Here, $\mu_k$ and $\sigma_k^2$ are the mean and variance of $I$ in $w_k$, $|w|$ is the number of pixels in $w_k$, and $\bar{p}_k = \frac{1}{|w|}\sum_{i \in w_k} P_i$ is the mean of $P$ in $w_k$.

The relationship among $I$, $P$ and $O$ can be rewritten in the form of an image filter as follows:

$$O_i = \sum_j W_{ij}(I) P_j \qquad (3)$$

Here, $W_{ij}(I)$ is the kernel weight, which can be explicitly expressed by:

$$W_{ij}(I) = \frac{1}{|w|^2}\sum_{k:(i,j)\in w_k}\left(1 + \frac{(I_i-\mu_k)(I_j-\mu_k)}{\sigma_k^2+\epsilon}\right) \qquad (4)$$

Compared with the closed-form solution to matting [22], we find that the elements of the matting Laplacian matrix $L$ can be directly given by the guided filter kernel weight:

$$L_{ij} = |w|(\delta_{ij} - W_{ij}) \qquad (5)$$

where $\delta_{ij}$ is the Kronecker delta. So if there is a reasonably good guess of the matte, we can run the guided filtering process to produce a fine alpha matte just like Laplacian matting. By simple boundary extraction, the opacity map can give an accurate and complete detection of the tracked object. Just like matting tracking, we can fit the guided filter naturally into the tracking framework by filling in the remaining pieces: constructing the appearance model, giving the search strategy, automatically generating the matte and performing model updating.
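
For concreteness, the box-filter form of Eqs. (1)-(2) can be written in a few lines of vectorized code. The following is a minimal sketch under the assumption of a single-channel (gray) guidance image; the function name, radius and epsilon are illustrative choices, not the settings used in the paper:

```python
import cv2
import numpy as np

def guided_filter(I, P, radius=8, eps=1e-3):
    """Minimal gray-guidance guided filter following Eqs. (1)-(2) (illustrative sketch).

    I : guidance image, float32 in [0, 1]
    P : filter input (e.g. a 0/1 scribble map), same size and type as I
    """
    ksize = (2 * radius + 1, 2 * radius + 1)

    def box(x):
        # normalized box filter = local mean over the (2r+1) x (2r+1) window w_k
        return cv2.boxFilter(x, -1, ksize)

    mean_I  = box(I)
    mean_P  = box(P)
    corr_IP = box(I * P)
    corr_II = box(I * I)

    var_I  = corr_II - mean_I * mean_I          # sigma_k^2
    cov_IP = corr_IP - mean_I * mean_P          # (1/|w|) sum I_i P_i - mu_k p_k

    a = cov_IP / (var_I + eps)                  # Eq. (2)
    b = mean_P - a * mean_I

    # Average the (a_k, b_k) of all windows covering each pixel, then apply Eq. (1)
    return box(a) * I + box(b)
```

A typical call would pass the gray local field (scaled to [0, 1]) as the guidance and the raw scribble as the input, yielding a continuous foreground map.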

3. Object tracking via guided filter

In this section, we propose an algorithm for object tracking in fused sequences with the guided filter. The main content includes the construction of the observation model, target detection from the guided filter, and the update of the observation model.

3.1 Observation model

Since color is a useful clue to discriminate the foreground from the background in color fused images, we select the most discriminative colors of the foreground and its surrounding background respectively as the observation model. Given a frame with known foreground and background (the target field is initialized automatically by a frame difference algorithm over the first two frames), we first compute the 2D color histogram of the foreground and the histogram of the background in the Lab color space. The Lab model is a perceptually uniform color space and there is little correlation between the axes, so different operations can be applied to different color channels with confidence that undesirable cross-channel artifacts will not occur. In addition, Lab is the most complete color model conventionally used to describe all the colors visible to the human eye; there is no loss when an image is processed in Lab space, and it has been applied to color image fusion successfully. Thus we can get accurate colors for the foreground and its surrounding background respectively. Then the log-likelihood ratio of these two histograms can be easily calculated with respect to each bin:

$$L(i,j) = \log\frac{H_f(i,j)+\varepsilon}{H_b(i,j)+\varepsilon}$$

where $L(i,j)$ denotes the log-likelihood ratio of the $(i,j)$ bin ($1 \le i \le N_1$, $1 \le j \le N_2$, with $N_1$ and $N_2$ the numbers of intensity levels on the two color channels), $H_f$ and $H_b$ are the 2D color histograms of the foreground and the background, and $\varepsilon$ is a very small constant that avoids infinity when $H_f$ or $H_b$ approaches 0. It is obvious that when one color mainly appears in the foreground and rarely in its surrounding background, its $L$ value will be high, and vice versa. So we consider the $(i,j)$ color bin as a distinctive color for the foreground if $L(i,j)$ is larger than a threshold; otherwise the color is regarded as a distinctive color for the background. In this way, two discriminative color lists of the foreground and background are defined respectively (denoted by $C_f$ and $C_b$), which contain the most distinctive colors against each other.
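
To make the bookkeeping concrete, a small sketch of this observation model (2D histogram over the two chromatic channels of Lab plus the log-likelihood ratio) could look as follows. The function name, bin count, threshold and epsilon are illustrative choices introduced here, not values reported in the paper, and the foreground mask is assumed to come from the frame-difference initialization:

```python
import cv2
import numpy as np

def build_color_model(patch_bgr, fg_mask, n_bins=32, thresh=0.5, eps=1e-6):
    """Discriminative-color observation model of Section 3.1 (illustrative sketch).

    patch_bgr : local field containing the target and its surrounding background
    fg_mask   : boolean mask of the (coarse) foreground inside the patch
    Returns the sets C_f and C_b of foreground/background-distinctive (a, b) bins.
    """
    lab = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2LAB)
    ab = lab[..., 1:].reshape(-1, 2)                    # keep the two chromatic channels
    bins = (ab // (256 // n_bins)).astype(np.int32)     # quantize to n_bins x n_bins
    fg = fg_mask.reshape(-1)

    H_f = np.zeros((n_bins, n_bins))
    H_b = np.zeros((n_bins, n_bins))
    np.add.at(H_f, (bins[fg, 0], bins[fg, 1]), 1)       # foreground 2D histogram
    np.add.at(H_b, (bins[~fg, 0], bins[~fg, 1]), 1)     # background 2D histogram
    H_f /= max(H_f.sum(), 1)                            # normalize to frequencies
    H_b /= max(H_b.sum(), 1)

    L = np.log((H_f + eps) / (H_b + eps))               # log-likelihood ratio per bin
    C_f = set(map(tuple, np.argwhere(L > thresh)))      # foreground-distinctive bins
    C_b = set(map(tuple, np.argwhere(L <= thresh)))     # otherwise background-distinctive
    return C_f, C_b
```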

3.2 Target locating from guided filter

To make the guided filter work well in object tracking, we need a scribble estimation as an input, which may include falsely labeled background and foreground in the current frame. Supposing that the observation model has been updated at the former frame, the scribble estimation for the current frame is automatically generated as follows: for each pixel, if its color belongs to $C_f$, the pixel is marked as foreground and its value is labeled as 1; if its color belongs to $C_b$, the pixel is regarded as background and set to 0.

Then the guided filter produces a fine foreground detection from the scribble estimation. To accelerate the detection, we use the gray image, which has the same local field as the current estimation, as the guidance image. Under the guidance of this gray image, the fine foreground is produced from the estimation:

$$O_i = a_k I_i + b_k, \quad \forall i \in w_k \qquad (6)$$

$$a_k = \frac{\frac{1}{|w|}\sum_{i \in w_k} I_i P_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k \qquad (7)$$

where $O$ is the fine output, $I$ denotes the local gray source image and $P$ is the scribble estimation.
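
Putting Sections 3.1 and 3.2 together, a sketch of the per-frame detection step might look like the code below. It reuses the hypothetical `guided_filter` and `C_f`/`C_b` from the previous sketches, and the quantization step mirrors the one used when the model was built; none of this is the paper's actual implementation:

```python
import cv2
import numpy as np

def detect_foreground(local_bgr, C_f, n_bins=32):
    """Scribble from the color model, refined by guided filtering (Section 3.2 sketch)."""
    lab = cv2.cvtColor(local_bgr, cv2.COLOR_BGR2LAB)
    bins = (lab[..., 1:] // (256 // n_bins)).astype(np.int32)

    # Raw scribble: 1 for foreground-distinctive colors; background-distinctive
    # or unlisted colors stay 0, as described in the text.
    lut = np.zeros((n_bins, n_bins), np.float32)
    for (bi, bj) in C_f:
        lut[bi, bj] = 1.0
    scribble = lut[bins[..., 0], bins[..., 1]]

    # The gray image of the same local field serves as the guidance (Eqs. (6)-(7))
    gray = cv2.cvtColor(local_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    alpha = guided_filter(gray, scribble)      # continuous foreground map in [0, 1]
    return alpha
```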

3.3 Boundary extraction

An accurate boundary that clearly divides the target from the background can alleviate drift in the succeeding tracking modules. The guided result is a continuous map of the foreground: the values near the boundary of the target are hardly 0 or 1 but lie somewhere in between. Therefore, to remove such ambiguity we set a threshold $T$ to cut this map. Supposing that $\alpha_i$ is the guided-filter value of pixel $p_i$ in the current local field: if $\alpha_i > T$, then $p_i$ is marked as foreground and $\alpha_i = 255$; otherwise it is regarded as background and $\alpha_i = 0$. With the refined field, we can use a simple edge detection operator such as Sobel to extract an accurate boundary.
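
A minimal version of this thresholding and edge-extraction step (the threshold value below is arbitrary, not the paper's setting) could be:

```python
import cv2
import numpy as np

def extract_boundary(alpha, T=0.5):
    """Cut the continuous foreground map at T, then take a Sobel edge map (Section 3.3)."""
    mask = np.where(alpha > T, 255, 0).astype(np.uint8)   # refined binary foreground field

    gx = cv2.Sobel(mask, cv2.CV_32F, 1, 0, ksize=3)       # horizontal gradient
    gy = cv2.Sobel(mask, cv2.CV_32F, 0, 1, ksize=3)       # vertical gradient
    boundary = cv2.magnitude(gx, gy) > 0                  # non-zero gradient = boundary
    return mask, boundary
```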

3.4 Update of observation model

During tracking, the background color may change significantly, while the foreground may also vary due to deformation and occlusion. Therefore, the observation model should be updated to remove invalid colors and add new discriminative colors. According to the location of the target in the current frame, we recalculate the 2D histograms for the new background and foreground and then extract discriminative colors for them using the log-likelihood ratio of these two histograms, as introduced in Section 3.1. Each extracted foreground color $C_i^t$ in the current frame is compared with each color in the former frame's color lists $C_f^{t-1}$ and $C_b^{t-1}$ according to three cases:

If $C_i^t \in C_b^{t-1}$, then $C_i^t$ is no longer a discriminative color for the background and is removed from the current $C_b^t$ (which is initialized by $C_b^{t-1}$).

If $C_i^t \notin C_b^{t-1}$ and $C_i^t \in C_f^{t-1}$, no update is performed.

If $C_i^t \notin C_b^{t-1}$ and $C_i^t \notin C_f^{t-1}$, this color is considered a new discriminative color for the foreground and is added to the current $C_f^t$ (which is initialized by $C_f^{t-1}$).

Similarly, we compare each newly extracted background discriminative color with $C_f^{t-1}$ and $C_b^{t-1}$. After adaptation, the two color lists $C_b^t$ and $C_f^t$ contain the newest colors while maintaining previous color information that is still valid.
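
The three-case rule translates directly into set operations on the two color lists. A compact sketch, treating the lists as Python sets of histogram bins as in the earlier illustrative code, is shown below; the symmetric pass for the new background colors follows the description above:

```python
def update_color_model(new_C_f, new_C_b, prev_C_f, prev_C_b):
    """Section 3.4 update rule applied to discriminative-color sets (illustrative sketch)."""
    C_f, C_b = set(prev_C_f), set(prev_C_b)   # current lists start from the former frame

    for c in new_C_f:                  # newly extracted foreground-distinctive colors
        if c in prev_C_b:
            C_b.discard(c)             # case 1: no longer discriminative for background
        elif c in prev_C_f:
            pass                       # case 2: already known, nothing to do
        else:
            C_f.add(c)                 # case 3: new discriminative foreground color

    for c in new_C_b:                  # symmetric pass for background-distinctive colors
        if c in prev_C_f:
            C_f.discard(c)
        elif c in prev_C_b:
            pass
        else:
            C_b.add(c)

    return C_f, C_b
```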

4. Experiments

The proposed algorithm is implemented in MATLAB and runs on a Pentium 2.4 GHz dual-core PC with 4 GB memory. During tracking, if the size of the target is …, the surrounding region with size of 1/16 … is considered as its local neighboring background. We use three image fusion sequences produced by our former work in the experiments.

We compare our tracking method with several bottom-up tracking methods, including mean-shift tracking [14], contour tracking [23] and matting tracking [5]. Some results are shown in Figs.2 and 3. In experiment 1, the tracked pedestrian undergoes large deformation and occlusion in some frames. Mean-shift tracking loses the hot target after Frame 17 when the target moves over a large range (see the first row). The other three tracking methods can all handle the deformation and partial occlusion. Our method always keeps an accurate boundary of the target and can cut the body from the background (the last row). In the second sequence, the camera keeps moving from frame 7. Mean shift and particle filter suffer from unreliable matching and lose track after frame 7. Compared with mean-shift tracking, contour tracking is more robust, but other contours appear from frame 13 and simple contour detection causes tracking failure. The matting method and guided tracking can still successfully handle this complex situation and accurately cut the body from its background, because these two methods carry information from the former frame into the current frame; the corresponding source frame also helps to detect the target.

Fig.2 Tracking results on frames 12-19. From top to bottom: mean-shift tracking, active contour tracking, matting tracking and our method

Fig.3 Tracking results on frames 7-14. From top to bottom: mean-shift tracking, active contour tracking, matting tracking and our method

Performance evaluation is an important issue that requires sound criteria in order to fairly assess the strength of tracking algorithms. Quantitative evaluation of object tracking typically involves computing the difference between the predicted and the ground-truth center locations, as well as their average value. We define the Euclidean distance as the center difference, which is described by:

$$d_i = \sqrt{(x_i^T - x_i^G)^2 + (y_i^T - y_i^G)^2} \qquad (8)$$

where $d_i$ denotes the center difference, and $(x_i^T, y_i^T)$ and $(x_i^G, y_i^G)$ are the centers of the current tracking result and the ground truth. Their average value is defined by the mean difference:

$$\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i \qquad (9)$$

Here, $N$ is the number of frames and $d_i$ is the center difference of the $i$th frame. Fig.4 gives the center difference for the sequences in Figs.2 and 3. It can be seen that all four methods track correctly when the target moves within a small range and the camera keeps still. But once the target moves over a large range, as in Fig.2, mean-shift tracking produces a large position error. The other three methods still obtain small errors because they can extract effective appearance information, including contour and color, within their search field. The mean error values are 22.9988, 0.9290, 3.1720 and 2.3400 respectively for the four methods. In Fig.3, when the camera keeps moving from Frame 7, the position error of the mean-shift method rises rapidly because it loses the target. From frame 13, contour tracking begins to detect the wrong contour and gradually loses the target, so its position error grows larger and larger. On the contrary, our method and matting tracking obtain stable and small position errors. The mean error values are 52.5169, 12.1254, 3.1117 and 2.7156 respectively.

On the other hand, the tracking overlap rate indicates the stability of each algorithm, as it takes the size and pose of the target object into account. Given the tracking result $R_T$ of each frame and the corresponding ground truth $R_G$, the overlap rate is defined by the PASCAL VOC [25] criterion:

$$score = \frac{area(R_T \cap R_G)}{area(R_T \cup R_G)} \qquad (10)$$

An object is regarded as being successfully tracked when the rate is above 0.5.
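
Both criteria are straightforward to compute. A small sketch is given below, assuming axis-aligned boxes in (x, y, w, h) form; the function names are chosen here for illustration only:

```python
import numpy as np

def center_error(box_t, box_g):
    """Euclidean distance between box centers, Eq. (8)."""
    cx_t, cy_t = box_t[0] + box_t[2] / 2, box_t[1] + box_t[3] / 2
    cx_g, cy_g = box_g[0] + box_g[2] / 2, box_g[1] + box_g[3] / 2
    return np.hypot(cx_t - cx_g, cy_t - cy_g)

def overlap_rate(box_t, box_g):
    """PASCAL VOC overlap, Eq. (10): intersection area over union area."""
    x1 = max(box_t[0], box_g[0])
    y1 = max(box_t[1], box_g[1])
    x2 = min(box_t[0] + box_t[2], box_g[0] + box_g[2])
    y2 = min(box_t[1] + box_t[3], box_g[1] + box_g[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = box_t[2] * box_t[3] + box_g[2] * box_g[3] - inter
    return inter / union if union > 0 else 0.0

# Mean center error over a sequence, Eq. (9); success = overlap rate above 0.5
# mean_err = np.mean([center_error(t, g) for t, g in zip(tracks, ground_truth)])
```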

Fig.5 shows the overlap rates of each tracking algorithm for all the sequences. The average overlap rates for Figs.2 and 3 are 0.3641, 0.8807, 0.7047, 0.9048 and 0.1778, 0.6888, 0.8006, 0.8483 respectively. Overall, our tracker performs favorably against the other algorithms. In addition, Table 1 gives the running time of the four algorithms. Although mean-shift tracking is the fastest, its robustness is the worst when the object moves over a large range and the camera keeps shifting. Compared with the other three algorithms, contour tracking sometimes has the best performance, at the cost of running speed. Fortunately, our algorithm not only gives robust tracking results like matting tracking but also runs at a good speed.

Fig.4 Quantitative comparison of center difference: (a) UN Camp; (b) Trees

Table 1 Running time of different algorithms

         Mean-shift    Contour tracking    Matting tracking    Guided tracking
Fig.1    1.6968        20.8397             5.9646              3.1814
Fig.2    1.3227        22.9231             4.1082              2.8151

Fig.5 Quantitative comparison of overlap score: (a) UN Camp; (b) Trees

5. Conclusions

This paper presents a robust tracking algorithm via the guided filter. In this work, we explicitly take partial occlusion, deformation and camera motion into account for the appearance update. Experiments on challenging image sequences demonstrate that our tracking algorithm performs favorably against several state-of-the-art algorithms. In future work, we plan to do further research on tracking models and model updating schemes in order to make our algorithm suitable for multiple object tracking.

Acknowledgements

This study is supported by the China Postdoctoral Science Foundation (Grant No. 20110491415) and the NSFC Peiyu Funds of NUAA (NN2012049).

References

[1] M.V. Parvatikar, G.S. Phadke. Comparative study of different image fusion techniques. International Journal of Scientific Engineering and Technology, 2014, 3(4): 375-379.
[2] K.Q. Huang, X.T. Chen, Y.F. Kang, T.N. Tan. Intelligent visual surveillance: a review. Chinese Journal of Computers, 2014, 37(49): 1-28.
[3] H.P. Liu, F.C. Sun. Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 2012, 55(3): 590-599.
[4] J. Wang, D. Chen, S. Li, Y. Yang. Infrared and visible fusion for robust object tracking via local discrimination analysis. Journal of Computer-Aided Design & Computer Graphics, 2014, 26(6): 870-878.
[5] X.Y. Qian, L. Han, Y.Y. Cheng. An object tracking method based on local matting for night fusion image. Infrared Physics & Technology, 2014, 67: 455-461.
[6] A.L. Chan, S.R. Schnells. Fusing concurrent visible and infrared videos for improved tracking performance. Optical Engineering, 2013, 52(1).
[7] G. Zhao, Y. Bo, M. Yin. An object tracking method based on infrared and visible dual-channel video. Journal of Electronics & Information Technology, 2012, 34(3): 529-534.
[8] M. Kass, A. Witkin, D. Terzopoulos. Snakes: active contour models. International Journal of Computer Vision, 1988, 1: 321-331.
[9] H.P. Liu, F.C. Sun. Fusion tracking in color and infrared images using joint sparse representation. Science China Information Sciences, 2012, 55(3): 590-599.
[10] C. O Conaire, N.E. O'Connor, A. Smeaton. Thermal-visual feature fusion for object tracking using multiple spatiogram trackers. Machine Vision and Applications, 2008, 19(5/6): 483-494.
[11] Y. Wu, J. Lim, M.H. Yang. Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 2013: 2411-2418.
[12] N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[13] F. Tang, S. Brennan, Q. Zhao, H. Tao. Co-tracking using semi-supervised support vector machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[14] D. Comaniciu, V. Ramesh, P. Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564-575.
[15] Y. Li, H. Ai, T. Yamashita, S. Lao, M. Kawade. Tracking in low frame rate video: a cascade particle filter with discriminative observers of different life spans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(10): 1728-1740.
[16] P. Perez, C. Hue, J. Vermaak, M. Gangnet. Color-based probabilistic tracking. In: Proceedings of the European Conference on Computer Vision, 2002: 661-675.
[17] H. Grabner, H. Bischof. On-line boosting and vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006: 260-267.
[18] D. Comaniciu, V. Ramesh, P. Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5): 564-575.
[19] M. Kim, S. Kumar, V. Pavlovic, H.A. Rowley. Face tracking and recognition with visual constraints in real-world videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[20] Y. Wu, J. Fan. Contextual flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[21] T. Schoenemann, D. Cremers. Globally optimal shape-based tracking in real-time. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[22] J.L. Fan, X.H. Shen, Y. Wu. Scribble tracker: a matting-based approach for robust tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1633-1645.
[23] T.F. Chan, L.A. Vese. Active contours without edges. IEEE Transactions on Image Processing, 2001, 10(2): 266-278.
[24] H. Zhou, X. Li, G. Schaefer, M.E. Celebi, P. Miller. Mean shift based gradient vector flow for image segmentation. Computer Vision and Image Understanding, 2013, 117: 1004-1016.
[25] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results.

Highlights

1> The automatic guided image filter is applied to fusion image tracking.
2> A simple boundary extraction scheme and model updating make the tracking more accurate and robust.
3> The method is validated through subjective and objective metrics.