Object detection using multi-resolution mosaic in image sequences

Signal Processing: Image Communication 20 (2005) 233–253
www.elsevier.com/locate/image
Ju-Hyun Cho, Seong-Dae Kim
Department of Electrical Engineering and Computer Science, Division of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 373-1, Kusong-dong, Yusong-gu, Taejon, 305-701, Republic of Korea
Received 24 May 2004; accepted 10 December 2004

Abstract

Object detection in image sequences plays a very important role in many applications such as surveillance, tracking and recognition, and coding systems. This paper proposes a unified framework for background subtraction, a very popular approach to object detection in image sequences, and proposes an algorithm using spatio-temporal thresholding and a truncated variable adaptation rate (TVAR) for object detection and background adaptation, respectively. In particular, when the camera moves and zooms in to track the target, we generate a multi-resolution mosaic, made up of several background mosaics with different resolutions, and use it for object detection. Experimental results in various environments show that the average performance of the proposed algorithm is good.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Object detection; Background subtraction; Spatio-temporal thresholding; Variable adaptation rate; Multi-resolution mosaic; Global illumination change

1. Introduction

Object detection in image sequences plays a very important role in many research areas related to computer vision, so there has been much research on the problems of detecting objects in image sequences [3,7,9,11,16,18,19,20,21,24,25,26,27]. In general, an object detector should be robust to noise and adaptive to illumination changes in real environments. It is also very important to detect objects as fast as possible in real-time applications. Background subtraction [3,7,9,11,18,19,24,25,27], which uses the difference picture (DP) [3,7,9,11,16,18,19,20,21,24,25,27] between the current image and the background image, is the most popular approach because it satisfies almost all requirements for an object detector. We describe

Corresponding author. Tel.: +82 42 869 5430; fax: +82 42 869 8570. E-mail address: [email protected] (J.-H. Cho).

0923-5965/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.image.2004.12.001

a unified framework for background subtraction and propose an algorithm using spatio-temporal thresholding for object detection with a spatio-temporal distance metric. When the camera moves to track the detected object, global motion estimation (GME) and global motion compensation (GMC) must in general be done first [1,2,4,10,12,15,22]. For object detection in image sequences, however, it is very useful to generate a background mosaic [15,22] and use it instead of discarding the past parts of the images. In a visual surveillance system, camera motion is typically regular and repeated, so the background mosaic is all the more helpful for object detection. The word 'background' means that the mosaic does not include any object, which is possible if we use the previous results of object detection. Another reason to use a background mosaic is that GME against a background mosaic is more accurate than against the previous image: there is no error induced by moving objects, and the GME error accumulated over the whole sequence is relatively small. But if the camera zooms in on the target to obtain a higher-resolution image, loss of information cannot be avoided, because the current image is warped and stitched into the reference background mosaic, which was generated earlier and has lower resolution than the current image. This loss of information is very serious for object detection, so we propose a method to generate a multi-resolution mosaic and use it for object detection. In generating the multi-resolution mosaic, we also focus on another topic: GME under global illumination change (GIC). GME between two images has many applications such as image registration [4,10,12], object detection and tracking, panorama mosaic generation and so on.
Among the related algorithms we focus on the optical-flow-based iterative parameter estimation method [1,2,15,22], which is known for its relatively good performance in some environments. This algorithm assumes that the intensity of a pixel does not change along the motion trajectory, the so-called brightness constancy of the optical flow equation. But this assumption does not hold when

there are illumination changes between the two images, which may be induced by the light source, moving objects, the camera response and so on. We are especially interested in GIC, the illumination analogue of global motion (GM). To handle GIC, the popular previous algorithms use a simple linear GIC model as in (1) [1], but this model is not accurate in some real environments:

I_2(x') = a I_1(x) + b,   (1)

where I_1(x) and I_2(x') are the intensities at the first and second image coordinates x = [x y]^T and x' = [x' y']^T, respectively, and the GIC parameter [a b]^T is constant between the two images. In this paper we propose a saturated linear GIC model based on some assumptions about the image acquisition process, and estimate the GM and GIC parameters at the same time by minimizing a global cost function [1], which outperforms independent optimization. In addition, to obtain more accurate parameters in the presence of outliers, for example moving objects, we use outlier rejection with two stop criteria during the iteration.

1.1. Overall system

In this subsection we briefly describe the workflow of the proposed object detection system. The overall system and brief descriptions of the symbols used are shown in Fig. 1 and Table 1, respectively. The multi-resolution mosaic is composed of several background mosaics with different resolutions. The system starts from one background mosaic and generates another one with higher resolution whenever the resolution of the current image exceeds the maximum resolution of the multi-resolution mosaic. So the number of background mosaics is not determined in advance, but increases with the resolution changes of the current image. Each background mosaic in the multi-resolution mosaic has mean and covariance images with the same dimension as the feature used. After initialization, those mean and covariance images are used to detect objects and are updated according to certain criteria. The proposed system is recursive, so the previous result is the starting point of a new

Fig. 1. The proposed overall system.

iteration. We first estimate the GM and GIC parameters between the previous corresponding mosaic and the current image. Here 'corresponding' means the mosaic whose resolution is the most similar to that of the current image among all the mosaics in the multi-resolution mosaic. The corresponding mosaic is therefore suitable for object detection, and must be selected from the multi-resolution mosaic using the calculated representative resolution (RR). We decide RR using the resolution descriptor (RD), which is defined and estimated at each pixel of the reference background mosaic, the mosaic with the lowest resolution. With the recalculated corresponding motion parameter m_c and the extracted feature X(x, t), we generate the spatio-temporal distance metrics Dist_t(x, t) and Dist_s(x, t). Before extracting the feature, we compensate for GIC in the current image with the estimated GIC parameter c. Of course, the mean and covariance images of the corresponding background mosaic, μ_t(x_c, t - 1) and Σ_t(x_c, t - 1), are also used to generate the distance metrics. After thresholding with the spatio-temporal distance metrics, the object mask image is generated. Finally, the multi-resolution mosaic is updated according to certain criteria using the detection result.

Section 2 discusses GME under GIC, and Section 3 introduces the multi-resolution mosaic. In Section 4 we propose spatio-temporal thresholding and TVAR for object detection and background adaptation,

Table 1
Symbol descriptions in Fig. 1, which shows the proposed overall system

Symbol              Description
x                   Current image coordinate
x_pc                Previous corresponding background mosaic coordinate
x_c                 Corresponding background mosaic coordinate
x_i                 Coordinate of each background mosaic (with different resolution) in the multi-resolution mosaic
I(x, t)             Intensity at t
Ĩ(x, t)             GIC-compensated intensity at t
X(x, t)             Feature using intensity and gradient at the same time at t
Dist_t(x, t)        Temporal distance metric at t
Dist_s(x, t)        Spatial distance metric at t
μ_t^I(x_pc, t - 1)  Intensity component of the temporal mean of the previous corresponding background mosaic at t - 1
μ_t(x_c, t - 1)     Temporal mean of the corresponding background mosaic at t - 1
Σ_t(x_c, t - 1)     Temporal covariance of the corresponding background mosaic at t - 1
μ_t(x_i, t)         Temporal mean of each background mosaic in the multi-resolution mosaic at t
Σ_t(x_i, t)         Temporal covariance of each background mosaic in the multi-resolution mosaic at t
c                   GIC parameter
m_pc                Motion parameter between the previous corresponding background mosaic and the current image
m_c                 Motion parameter between the corresponding background mosaic and the current image
m_{i-1,i}           Motion parameter between background mosaics with different resolutions
RR                  Representative resolution of the current image

respectively. Experimental results are shown in Section 5, and conclusions are given in Section 6.

2. Global motion estimation under GIC

Fig. 2. Image acquisition process: the light source L(x, λ, t) and the surface response R(x, λ) reach the sensor S(x, λ), producing q(x, t); exposure/gain control h(q(x, t), t) yields Q(x, t); and the camera response f(Q(x, t)) gives the recorded intensity I(x, t).

2.1. GIC model

A general image acquisition process is shown in Fig. 2, where L(x, λ, t) and R(x, λ) are the spectral distribution of the light source and the spectral response of the surface material, respectively. Here we ignore the temporal variation of the surface material. S(x, λ) is the ideal sensor response and h(q(x, t), t) is the exposure and gain control function. The camera response, for example gamma correction, is f(Q(x, t)). It is not easy to know the time-varying functions L(x, λ, t) and h(q(x, t), t) exactly, so we adopt the assumptions in (2), i.e. we assume that every function can be decomposed into a time-varying component and a time-invariant one:

L(x, λ, t) = L(t) L(x, λ),
q(x, t) = L(t) ∫_λ L(x, λ) R(x, λ) S(x, λ) dλ = L(t) q_0(x),
Q(x, t) = h(q(x, t), t) = a_E(t) a_G(t) q(x, t),
f(Q(x, t)) = min{B + β Q(x, t)^γ, C},
I(x, t) = min{B + β [a_E(t) a_G(t) L(t) q_0(x)]^γ, C} = min{B + β a(t)^γ q_0(x)^γ, C},   (2)

where λ is the wavelength, q(x, t) is the ideal sensor output, and Q(x, t) is the sensor output after auto exposure and gain control. a_E(t) and a_G(t) are the time-varying exposure and gain control functions of


the camera, respectively, and B and C are the black level and the clipping level of the camera. Using (2) we can show the relationship between two images obtained at t_1 and t_2, respectively, as in (3) [13]:

I(x, t_2) = min{B + A (I(x, t_1) - B), C},   A = (a(t_2)/a(t_1))^γ.   (3)

From (3) we propose a saturated linear GIC model as in (4) and Fig. 3(a):

G_1(I_1(x)) = (1 + c_0) I_1(x) + c_1,
G_2(I_1(x)) = c_2 + 255,
G(I_1(x), c) = G_1(I_1(x)) + [G_2(I_1(x)) - G_1(I_1(x))] u(I_1(x) - L),
L = (c_2 - c_1 + 255)/(1 + c_0),   c = [c_0 c_1 c_2]^T,   (4)

where G(I_1(x), c) is a GIC function made up of the two linear functions G_1(I_1(x)) and G_2(I_1(x)) given the GIC parameter c, and u(I_1(x) - L) is the shifted unit step function. Fig. 3(b), the cross histogram between two images obtained under an illumination change such as sunrise, shows that the proposed GIC model is correct in some cases.

Fig. 3. The proposed GIC model: (a) a saturated linear GIC model, (b) the cross histogram between the two images under an illumination change like sunrise. The thin line shows the histogram using the estimated GIC parameter.

2.2. Iterative parameter estimation

We want to estimate the GM and GIC parameters at the same time using an optical-flow-based iterative method, the most popular approach because of its relatively good performance. Of course, the brightness constancy of the optical flow equation must be modified using GIC compensation. Among the many gradient-descent-based iterative parameter estimation methods, we choose a compositional Gauss–Newton method, which uses composite functions for warping and GIC as in (5) [1,2]:

x' = W(x, m),   x'' = W(x', δm),   W(x, 0) = x,
I'_1(x) = G(I_1(x), c),   I''_1(x) = G(I'_1(x), δc),   G(I_1(x), 0) = I_1(x),   (5)
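Because G_1(L) = G_2(L) at the knee L, the saturated model (4) collapses to a single clipped line when 1 + c_0 > 0; a minimal sketch (the function name is ours, and the clip level 255 follows the paper's 8-bit setting):

```python
import numpy as np

def gic(I1, c):
    """Saturated linear GIC model of Eq. (4), assuming 1 + c0 > 0.

    G1(I) = (1 + c0) * I + c1 rises linearly until it reaches the
    saturation level G2 = c2 + 255; above the knee L the output clips,
    so the whole mapping is min(G1(I), c2 + 255).
    """
    c0, c1, c2 = c
    g1 = (1.0 + c0) * I1 + c1          # linear part G1
    return np.minimum(g1, c2 + 255.0)  # saturation at G2 = c2 + 255
```

With c = 0 the mapping is the identity on the valid intensity range, which matches the G(I_1(x), 0) = I_1(x) condition used in (5).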

where W(x, m) is an image coordinate warping function given motion parameter m, and G(I_1(x), c) is the GIC function in (4). x and x' are the first and second image coordinates, respectively. We use a planar model for the warping function as in (6); note that the warp is the identity when the given parameter is zero, as it should be:

x = [x y]^T,   x' = [x' y']^T,   m = [m_0 m_1 ... m_7]^T,
x' = ((1 + m_0) x + m_1 y + m_2) / (m_6 x + m_7 y + 1),
y' = (m_3 x + (1 + m_4) y + m_5) / (m_6 x + m_7 y + 1).   (6)
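The planar warp (6) is straightforward to implement; a minimal sketch (the function name is ours):

```python
def warp_planar(x, y, m):
    """Planar (eight-parameter projective) warp W(x, m) of Eq. (6).

    m is the parameter list [m0, ..., m7]; m = 0 gives the identity.
    """
    m0, m1, m2, m3, m4, m5, m6, m7 = m
    denom = m6 * x + m7 * y + 1.0
    xp = ((1.0 + m0) * x + m1 * y + m2) / denom
    yp = (m3 * x + (1.0 + m4) * y + m5) / denom
    return xp, yp
```

For example, m = [0, 0, 1, 0, 0, -2, 0, 0] is a pure translation by (+1, -2), and all eight zeros reproduce the input point.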

The GM parameter m and GIC parameter c are estimated at the same time using an iterative compositional parameter estimation method that minimizes a global cost function E as in (7):

E = Σ_x [I_2(W(W(x, m), δm)) - G(G(I_1(x), c), δc)]^2.   (7)

Using a first-order Taylor series expansion, we obtain the incremental parameter δp, which includes the GM and GIC parameters, in closed form as in (8). The final step is the parameter update, shown in (9) [2,22]:

E ≈ Σ_x [I_2(x') + (∂I_2(x'')/∂x'') (∂x''/∂δm) δm - I'_1(x) - (∂I''_1(x)/∂δc) δc]^2 = E_a,
δp = [δm δc]^T,
E_a = Σ_x [I_2(x') - I'_1(x) + ((∂I_2(x'')/∂x'') (∂x''/∂δm), -∂I''_1(x)/∂δc) δp]^2,
∂E_a/∂δp = 0  ⇒  δp = H^{-1} S^T e,
S = [(∂I_2(x'')/∂x'') (∂x''/∂δm), -∂I''_1(x)/∂δc],
H = Σ_x S^T S,   e = I'_1(x) - I_2(x').   (8)

scale = δm_6 m_2 + δm_7 m_5 + 1,
m_0 ← (δm_0 m_0 + δm_1 m_3 + δm_2 m_6 + δm_0 + m_0)/scale,
m_1 ← (δm_0 m_1 + δm_1 m_4 + δm_2 m_7 + δm_1 + m_1)/scale,
m_2 ← (δm_0 m_2 + δm_1 m_5 + δm_2 + m_2)/scale,
m_3 ← (δm_3 m_0 + δm_4 m_3 + δm_5 m_6 + δm_3 + m_3)/scale,
m_4 ← (δm_3 m_1 + δm_4 m_4 + δm_5 m_7 + δm_4 + m_4)/scale,
m_5 ← (δm_3 m_2 + δm_4 m_5 + δm_5 + m_5)/scale,
m_6 ← (δm_6 m_0 + δm_7 m_3 + δm_6 + m_6)/scale,
m_7 ← (δm_6 m_1 + δm_7 m_4 + δm_7 + m_7)/scale,
c_0 ← δc_0 c_0 + δc_0 + c_0,
c_1 ← δc_0 c_1 + δc_1 + c_1.   (9)
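The motion part of the update (9) is exactly the composition of the incremental homography with the current one, renormalized so that the bottom-right matrix entry stays 1 (the 'scale' term). A sketch with our own helper names:

```python
import numpy as np

def to_matrix(m):
    """3x3 homography for the planar model of Eq. (6)."""
    m0, m1, m2, m3, m4, m5, m6, m7 = m
    return np.array([[1.0 + m0, m1, m2],
                     [m3, 1.0 + m4, m5],
                     [m6, m7, 1.0]])

def update_motion(m, dm):
    """Compositional update of Eq. (9): compose dM with M, renormalize."""
    P = to_matrix(dm) @ to_matrix(m)
    P = P / P[2, 2]                  # division by 'scale' in Eq. (9)
    return [P[0, 0] - 1.0, P[0, 1], P[0, 2],
            P[1, 0], P[1, 1] - 1.0, P[1, 2],
            P[2, 0], P[2, 1]]
```

Expanding (δM · M)[0, 2], for instance, gives δm_0 m_2 + δm_1 m_5 + δm_2 + m_2, term by term the m_2 row of (9), and (δM · M)[2, 2] is the 'scale' denominator.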

Fig. 4 shows the performance of the algorithms using the different GIC models. The two images were obtained with a handheld camera, and we generated GIC manually in the second image, i.e. c = [0.5 -20.0 65.0]^T. Here we use a hierarchical parameter estimation method for speed, so the number of iterations is not proportional to the exact elapsed time, because different hierarchy levels consume different amounts of time per iteration. In Fig. 4(a) we can see that the proposed GIC model outperforms the other cases in that it converges fast with the smallest error (normalized per pixel). Fig. 4(b) and (c) show the result of image alignment and the corresponding cross histogram. The lower row is

Fig. 4. Error analysis and image alignment: (a) normalized total error per pixel during iterations, (b) the result of image alignment, and (c) the cross histogram of the two images. In (b) and (c) the first and second rows are the results using a simple linear GIC model and the proposed GIC model, respectively.

the result when using the proposed GIC model, which shows good performance. The case with no GIC model cannot even converge properly.

2.3. Outlier rejection

From experimental results we find that the GIC parameter is generally more sensitive to outliers than the GM parameter, because it is related to the intensity itself. Assume that one image contains objects and the other does not, which is the case when using the background mosaic. Even if the geometric alignment is done perfectly, it is not

possible to find the correct GIC parameter if the objects have very different intensities from the background. So we use two stop criteria during the iterations in order to reject outliers. We stop the first set of iterations when the absolute temporal gradient of the total error is less than th_1, the first stopping threshold, which means that the GM parameter has almost converged. Once the first stop criterion is satisfied, only the pixels at which the total error is smaller than a threshold e_th are considered in the second set of iterations, which uses the second stopping threshold th_2. Fig. 5 shows that outlier rejection is necessary for GIC parameter estimation. Fig. 5(a) shows two images under GIC caused by a dark and relatively big object. The lower row in Fig. 5(b) and (c) is the result using outlier rejection with the two stop criteria, i.e. e_th = 40.0, th_1 = 0.01, th_2 = 0.001. From the cross histogram in Fig. 5(b), we see that the estimated parameter is more accurate when outlier rejection is used. Fig. 5(c) shows the absolute difference between the two images after GIC compensation; it also reveals that outlier rejection is important for object detection, because the compensation error with outlier rejection is relatively small.
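The two-stop-criteria idea can be illustrated with a toy sketch of ours (not the paper's full joint GM/GIC estimator): it fits only the simple linear model I_2 = a I_1 + b by gradient descent on synthetic normalized data, and the thresholds and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
I1 = rng.uniform(0.0, 1.0, 2000)     # normalized reference intensities
I2 = 1.2 * I1 + 0.05                 # true linear GIC: a = 1.2, b = 0.05
I2[:200] += 0.5                      # 10% outlier pixels (a "dark object" analogue)

def fit_stage(I1, I2, a, b, stop_th, lr=0.5, max_iter=5000):
    """Gradient descent on E = mean((a*I1 + b - I2)^2); stop when |dE| < stop_th."""
    E_prev = np.inf
    for _ in range(max_iter):
        r = a * I1 + b - I2
        E = np.mean(r * r)
        if abs(E_prev - E) < stop_th:
            break
        E_prev = E
        a -= lr * 2.0 * np.mean(r * I1)
        b -= lr * 2.0 * np.mean(r)
    return a, b

# first iterations over all pixels, stopped by the first criterion (th1 analogue)
a, b = fit_stage(I1, I2, 1.0, 0.0, stop_th=1e-7)
# keep only pixels whose error is below e_th, then iterate with the second criterion
inlier = (a * I1 + b - I2) ** 2 < 0.05          # e_th (illustrative)
a, b = fit_stage(I1[inlier], I2[inlier], a, b, stop_th=1e-9)
```

After the second stage, (a, b) is close to the true (1.2, 0.05) despite the outliers, whereas the stage-1 estimate is biased by them.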


3. Multi-resolution mosaic

3.1. Resolution descriptor

RD(x^ref) is a descriptor expressing the resolution change between the reference background mosaic and the current image, defined and calculated at the reference background mosaic coordinate as in (10) [15]:

x = W(x^ref, m),
RD(x^ref) = [ |∂x/∂x^ref|   |∂x/∂y^ref|
              |∂y/∂x^ref|   |∂y/∂y^ref| ],   (10)

where W(x^ref, m) is a warping function between the reference background mosaic coordinate x^ref and the current image coordinate x. If we use a planar model like (6), RD becomes (11):

scale = m_6 x^ref + m_7 y^ref + 1,
RD(x^ref) = (1/scale) [ |1 + m_0 - m_6 x|   |m_1 - m_7 x|
                        |m_3 - m_6 y|       |1 + m_4 - m_7 y| ].   (11)
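A numerical sketch of (10)–(12) for the planar model (function names are ours, and the loop over every reference pixel is deliberately naive):

```python
import numpy as np

def rd(xref, yref, m):
    """Resolution descriptor RD(x_ref) of Eq. (11) for the planar model."""
    m0, m1, m2, m3, m4, m5, m6, m7 = m
    scale = m6 * xref + m7 * yref + 1.0
    x = ((1 + m0) * xref + m1 * yref + m2) / scale   # warped point in the
    y = (m3 * xref + (1 + m4) * yref + m5) / scale   # current image
    return np.abs(np.array([[1 + m0 - m6 * x, m1 - m7 * x],
                            [m3 - m6 * y, 1 + m4 - m7 * y]])) / scale

def representative_resolution(width, height, m):
    """RR of Eq. (12): product of the mean needed pixel counts n_x, n_y."""
    n = np.array([rd(x, y, m) @ np.ones(2)      # n(x_ref) = RD(x_ref) [1 1]^T
                  for y in range(height) for x in range(width)])
    mean_n = n.mean(axis=0)                     # [mean(n_x), mean(n_y)]
    return mean_n[0] * mean_n[1]
```

For the identity warp RD is the identity matrix everywhere, so RR = 1; a pure 2x zoom (m_0 = m_4 = 1) gives RR = 4, i.e. four current-image pixels per reference pixel.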

Fig. 5. The effect of outlier rejection: (a) two images under GIC, (b) the cross histogram between the two images, (c) absolute DP after GIC compensation. The thin line in (b) is the histogram using the estimated GIC parameter. In (b) and (c) the second row is the result using outlier rejection.

Fig. 6. Representative resolution (RR) calculation using RD at each reference background mosaic pixel: (a) RR calculation using RD, and (b) the resolution change in an image sequence. In (b) the dotted line represents RR and the crossed line shows the corresponding RL.

Now we can obtain the number of pixels needed in the current image to express the resolution, and we propose RR as in (12) and Fig. 6:

n(x^ref) = [n_x^ref  n_y^ref]^T = RD(x^ref) dx^ref,   dx^ref = [1 1]^T,
mean(n(x^ref)) = [mean(n_x^ref)  mean(n_y^ref)]^T,
RR = mean(n_x^ref) × mean(n_y^ref),   (12)

where n_x^ref × n_y^ref is the number of pixels needed to express the resolution at x in the current image when the corresponding number of pixels is 1 × 1 at x^ref. But we want RR, a single representative value for the current image, so we take the mean of n(x^ref) and use it to compute RR. Fig. 6(b) shows the change of RR and the quantized resolution levels (RLs) through an image sequence, where RLs will be described in detail below. We can see that the resolution of the last image is about six times as high as that of the first one, i.e. the camera is zooming in a lot.

3.2. Generation of multi-resolution mosaic

We first define a quantized RL as in (13), considering the results of object detection, and assign it to each background mosaic. For example, if RR is between RL_2 and RL_3, we assign RL_3 to the background mosaic:

RL_i = a^i   (i = 1, 2, 3, ...),   (13)

where a is a constant to be determined. If a is too large, object detection suffers because too much information may be lost within one RL; if a is too small, a lot of memory is needed to store and maintain the multi-resolution mosaic. From the simulation results, we typically choose a = 1.5. To generate the multi-resolution mosaic we first estimate GM between the previous corresponding background mosaic and the current image, as mentioned before. Finally, RR and the corresponding background mosaic are determined as in Fig. 7, using (12) and (14); the case pc = 1 and c = 2 is shown:

x_1 = W(x_0, m_{0,1}) = M_{0,1} x_0,
x_2 = W(x_1, m_{1,2}) = M_{1,2} x_1,
x = W(x_pc, m_pc) = M_pc x_pc,
if pc = 1 and c = 2:
x = M_1 x_1 = M_1 M_{0,1} x^ref  ⇒  m_RR,
x = M_1 M_{1,2}^{-1} x_2  ⇒  m_c,   (14)

where m_RR is the motion parameter used to determine RR, and M_i and M_{i-1,i} are warping matrices adopted to explain the geometric relations between the coordinates more easily. The other symbols are described in Table 1. Finally, we update the background mosaics in the multi-resolution mosaic that have lower resolution than the RR of the current image after object detection. If RR is higher than the maximal RL in the multi-resolution mosaic, we generate a new background mosaic and assign the newly generated maximal RL to it.

Fig. 8(a) shows two generated multi-resolution mosaics with a = 1.5. The image sequences were obtained using a moving handheld camera which zooms in and out through the sequences. Each


Fig. 7. The generation of multi-resolution mosaic.

multi-resolution mosaic has five background mosaics with five different RLs. Fig. 8(b) shows the change of resolution in each sequence. In each case we failed to generate a proper single background mosaic, because the resolution gap between the background mosaic and the highest-resolution current image is too large, i.e. we could not estimate the correct GM.
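The quantization rule (13), together with the assignment described above (RR between RL_{i-1} and RL_i maps to level i), can be sketched as (our own helper; a = 1.5 as chosen in the paper):

```python
import math

def resolution_level(rr, a=1.5):
    """Eq. (13): RL_i = a**i.

    An RR falling between RL_{i-1} and RL_i is assigned level i
    (e.g. RR between RL2 and RL3 maps to RL3); RR <= a maps to RL1.
    """
    i = max(1, math.ceil(math.log(rr) / math.log(a)))
    return i, a ** i
```

For example, with a = 1.5 an RR of 2.0 lies between RL_1 = 1.5 and RL_2 = 2.25 and is assigned level 2, while an RR of 3.0 lies between RL_2 and RL_3 = 3.375 and is assigned level 3.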

4. Object detection in image sequences

4.1. A unified framework for background subtraction

Background subtraction is an algorithm for object detection [3,7,9,11,18,19,24,25,27] which uses the DP between the current image and the background image, and many methods have been proposed to solve specific problems. For example, a mixture-of-Gaussians background model has been proposed to handle regularly moving background such as waving trees [24], and a Kalman filter has been used to estimate and maintain the background model [3]. Using the DP between adjacent frames as well as background subtraction can also be helpful [9,25]. Here we propose a unified framework for background subtraction, made up of the following three criteria.

I. What kind of features are used?
II. What kind of distance metric is used to decide, by thresholding, whether each pixel is object or background?
III. What is the adaptation rule?

The first criterion is about the features used in the algorithms. The most popular and simplest one is the intensity I(x, t) of each pixel in an image, where x and t are the spatial and temporal coordinates, respectively. We decide that a pixel is object if its intensity changes not because of noise or illumination changes but because an object covers the pixel. The main problem, then, is finding the cause of an intensity change. Occasionally gradual


Fig. 8. Two multi-resolution mosaics with 5 RLs: (a) generated multi-resolution mosaic, and (b) the change of RR and RL in each image sequence.

or abrupt illumination changes can disturb the decision. So the edge E(x, t) or the gradient G(x, t) = [G_x(x, t) G_y(x, t)]^T is often used because of its robustness to illumination changes [11,23,27]. In general, however, such spatial refinement has been applied after the temporal change detection, i.e. no feature-level fusion of the different information occurs. A simple decision-level fusion of information may cause misdetection or false detection when a dominant feature is very noisy [5,8,14]. That is why we propose an algorithm using the feature X(x, t) = [I(x, t) G(x, t)^T]^T, formed by feature-level fusion of the intensity I(x, t) and the gradient G(x, t).

The second criterion is about the distance metric with which the decision is made by thresholding. The simplest distance metric is the difference Dist(x, t) = I(x, t) - B(x, t'), where I(x, t) is the measured intensity at (x, t) and B(x, t') is the background estimated at (x, t'). Here t' = t_0 means that there is no adaptation at all; this is very simple but applies only when there are no serious illumination changes. If t' = t - 1, the background is updated to B(x, t) immediately after object detection. In each case a pixel at (x, t) is considered object if Dist(x, t) > γ and background otherwise, where γ is the threshold to be decided. So the most important thing is to find a proper threshold: if γ is too small, there will be many false detections in an image; if γ is too large, the misdetection rate will increase. The previous related work has usually considered temporal statistics. Each pixel is generally modeled as a statistically independent normal process [3,7,9,11,18,19,24,25,27], and a temporal distance

metric Dist_t(x, t) is proposed as follows:

I(x, t) ~ N(μ_t(x, t), σ_t^2(x, t)),
Dist_t(x, t) = [I(x, t) - μ_t(x, t - 1)]^2 / σ_t^2(x, t - 1),   (15)

where the subscript t indicates 'temporal' and μ_t(x, t) and σ_t^2(x, t) are the temporal mean and variance of the background model, respectively, estimated at (x, t). The pixel is considered object if Dist_t(x, t) > γ_t. In that case the threshold γ_t is determined by the temporal statistics of each pixel process. We can find that Dist_t(x, t) has a χ^2 distribution [6,17], so if we choose γ_t = 3.841 and decide that a pixel is background, the pixel is estimated to be background with probability Pr(x ∈ background) = 0.95.

We have considered only the temporal statistics of each pixel so far, but in fact there are correlations between pixels. So we can also assume that an image obtained at t = t_1 has spatial statistics due to spatial noise n_{t_1}(x) from the sensor and environment, as in (16) [21]:

I_{t_1}(x) = s_{t_1}(x) + n_{t_1}(x),   (16)

where I_{t_1}(x), s_{t_1}(x) and n_{t_1}(x) are the intensity, signal and noise, respectively, measured or estimated at (x, t_1). Thus, if we assume that the spatial noise has a normal distribution and use a different kind of distance metric, the spatial distance metric Dist_s(x, t) in (17), the pixel is considered object if Dist_s(x, t) > γ_s:

[I(x, t) - μ_t(x, t)] ~ N(0, σ_s^2(t)),
Dist_s(x, t) = [I(x, t) - μ_t(x, t - 1)]^2 / σ_s^2(t),   (17)

where the subscripts t and s indicate 'temporal' and 'spatial', respectively, and the spatial variance σ_s^2(t) is estimated as in (18):

σ_s^2(t) = (1/MN) Σ_{y=0}^{N-1} Σ_{x=0}^{M-1} [I(x, t) - μ_t(x, t - 1)]^2,   (18)

where M and N are the width and height of the image, respectively. In that case the threshold γ_s is determined by the spatial statistics in a similar way to γ_t, i.e. Dist_s(x, t) also has a χ^2 distribution. We will use both distance metrics and propose spatio-temporal thresholding for object detection in the latter part of this paper.
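As a minimal intensity-only sketch of (15)–(18) (helper names are ours; the paper's full version uses the fused feature and covariance of Section 4.2):

```python
import numpy as np

GAMMA_T = 3.841   # chi-square threshold, Pr(background) = 0.95 for 1 d.o.f.
GAMMA_S = 3.841   # illustrative; the spatial threshold is chosen the same way

def temporal_distance(I, mu_t, var_t):
    """Eq. (15): squared distance to the per-pixel temporal background model."""
    return (I - mu_t) ** 2 / var_t

def spatial_distance(I, mu_t):
    """Eqs. (17)-(18): same residual, normalized by the image-wide spatial variance."""
    var_s = np.mean((I - mu_t) ** 2)
    return (I - mu_t) ** 2 / var_s

def object_mask(I, mu_t, var_t, gamma_t=GAMMA_T, gamma_s=GAMMA_S):
    """A pixel is labeled object only if both distances exceed their thresholds."""
    return (temporal_distance(I, mu_t, var_t) > gamma_t) & \
           (spatial_distance(I, mu_t) > gamma_s)
```

A single pixel that deviates strongly from the background mean exceeds both thresholds, while pixels consistent with the model pass neither.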


The last criterion is about background adaptation. Generally, recursive algorithms update their parameters using both the previously estimated parameters and the current measured value in order to be adaptive; background subtraction likewise needs to be adaptive to illumination changes. One of the simplest adaptations is to use a running mean as in (19) when updating the temporal mean of the background model [9]:

μ_t(x, t) = ((t - 1)/t) μ_t(x, t - 1) + (1/t) I(x, t).   (19)

We can see that as t increases, the estimated mean approaches the previously estimated value, i.e. the current measured value is given less and less weight. On the other hand, a very popular method is to use a constant weighted sum as in (20) [3,7,9,11,19,24,25,27]:

μ_t(x, t) = α μ_t(x, t - 1) + (1 - α) I(x, t),
σ_t^2(x, t) = α σ_t^2(x, t - 1) + (1 - α) [I(x, t) - μ_t(x, t)]^2,   (20)

where α is a constant weight between 0 and 1, generally determined manually, so the adaptation rate is fixed. But we need to control the adaptation rate in order to maintain proper statistics and increase the performance of the algorithm. In other words, if a pixel is thought to be object with very high probability, no adaptation should be done, or the pixel should be adapted with a very low adaptation rate in order to maintain the statistics properly. On the other hand, if a pixel is thought to be background with very high probability, it should be adapted with a higher adaptation rate than other background pixels with lower probability. A very similar idea has been tested using two constant weights [18]: a very large one applied to the object pixels and a small one assigned to the background pixels. But we need a more sophisticated adaptation method, so we propose an adaptation function of the adaptation rate (AR) as in (21), determined by the temporal statistics and described in detail later:

μ_t(x, t) = f(AR) μ_t(x, t - 1) + (1 - f(AR)) I(x, t),
σ_t^2(x, t) = f(AR) σ_t^2(x, t - 1) + (1 - f(AR)) [I(x, t) - μ_t(x, t)]^2,   (21)

where f is a function of AR.

4.2. Statistical background model

We use the multi-resolution mosaic as a statistical background model made up of mean and covariance images, so the proposed algorithm can work even when the camera moves and zooms in to track the detected object. We first initialize the multi-resolution mosaic, assuming that the extracted feature has a spatio-temporal normal distribution as in (22) [6,17]:

X(x, t) ~ N(μ_t(x, t), Σ_t(x, t)),
[X(x, t) - μ_t(x, t)] ~ N(0, Σ_s(t)),   (22)

where X(x, t) is the feature extracted from the GIC-compensated current image as in (23), which uses the intensity and gradient at the same time at the feature level, and μ_t(x, t) and Σ_t(x, t) are the temporal mean and covariance of each pixel at (x, t) as in (24). In fact we have the mean and covariance images in the background mosaic coordinate x_i, i.e. μ_t(x_i, t) and Σ_t(x_i, t), so we need to compensate GM in order to obtain μ_t(x, t) and Σ_t(x, t) in the current image coordinate. In doing so, some interpolation cannot be avoided; this will be discussed in detail later. Σ_s(t) is the spatial covariance estimated over the image at t as in (25). The meaning of the subscripts t and s has been mentioned before.

Ĩ(x, t) = G^{-1}(I(x, t), c),
X(x, t) = [Ĩ(x, t)  G_x(x, t)  G_y(x, t)]^T,   (23)

where Ĩ(x, t) is the GIC-compensated intensity at (x, t) and G(I(x, t), c) is the GIC function mentioned in Section 2. G_x(x, t) and G_y(x, t) are the elements of the gradient G(x, t) in the x and y directions, respectively, estimated at (x, t) as in (26).

μ_t(x, t) = [μ_t^I(x, t)  μ_t^G(x, t)^T]^T,
Σ_t(x, t) = [ σ_t^I(x, t)^2   0^T
              0               Σ_t^G(x, t) ],   (24)

where the intensity and gradient are uncorrelated because we use the Sobel operator to obtain the gradient as in (26). The assumptions about the spatio-temporal statistics in (22), (24) and (25) are thought to hold in many cases.

Σ_s(t) = [ σ_s^I(t)^2   0^T
           0            Σ_s^G(t) ],
σ_s^I(t)^2 = (1/MN) Σ_{y=0}^{N-1} Σ_{x=0}^{M-1} [I(x, t) - μ_t^I(x, t - 1)]^2,
Σ_s^G(t) = (1/MN) Σ_{y=0}^{N-1} Σ_{x=0}^{M-1} [G(x, t) - μ_t^G(x, t - 1)] [G(x, t) - μ_t^G(x, t - 1)]^T,   (25)

where M and N are the width and height of the image, respectively.

Gx ðx; tÞ ¼

1 X 1 X

~ þ i; y þ j; tÞ; Gy ðx; tÞ aij Iðx

i¼1 j¼1

¼

1 X 1 X

~ þ i; y þ j; tÞ; bij Iðx

i¼1 j¼1

2

1

  16 2 aij ¼ 6 44 1 2 1   16 0 bij ¼ 6 44 1 a00 ¼ b00 ¼ 0:

0 a00 0 2 b00 2

1

3

7 27 5; 1 1

3

7 0 7 5; 1 ð26Þ
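For illustration, the feature extraction of (23) and (26) can be sketched as follows (function names are ours; a grayscale image that has already been GIC-compensated is assumed):

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels as in Eq. (26), with the 1/6 normalization; signs are
# up to the usual convolution/correlation orientation convention.
A = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]], dtype=float) / 6.0   # a_ij, for G_x
B = A.T                                          # b_ij, for G_y

def extract_feature(img_gic):
    """Build X(x,t) = [I~, G_x, G_y] per pixel from a GIC-compensated
    grayscale image (H x W float array), as in Eq. (23)."""
    gx = convolve(img_gic, A, mode="nearest")
    gy = convolve(img_gic, B, mode="nearest")
    return np.stack([img_gic, gx, gy], axis=-1)  # H x W x 3 feature image
```

On a perfectly flat (texture-free) region both gradient channels are zero, which is consistent with the remark below that shadows, being homogeneous, are rarely detected from the gradient alone.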

Considering (23)–(26), we initialize the multi-resolution mosaic as follows. We start from the reference background mosaic with the lowest RL. The first image I(x, t0) is assigned to the mean image as in (27), and we assign the maximum values to the covariance image because we know nothing about the statistics of the background model at the starting point.

μ_t(x, t0) = [I(x, t0)  G_x(x, t0)  G_y(x, t0)]^T,
Σ_t(x, t0) = [[127.5², 0, 0], [0, 255.0², 0], [0, 0, 255.0²]],  (27)

where t0 denotes the starting point of the recursive algorithm. If we need to generate a new background mosaic with the next RL, as mentioned in Section 3, we assign the previous mean and covariance images to the new background mosaic as its initial images.

4.3. Object detection using spatio-temporal thresholding

After selecting the corresponding background mosaic as mentioned in Section 3, we decide whether each pixel belongs to an object or to the background by using the proposed spatio-temporal statistics. The spatio-temporal distance metric, a kind of Mahalanobis distance, is first estimated after GMC as in (28). The decision is then made by thresholding: if Dist_t(x, t) > γ_t and Dist_s(x, t) > γ_s, the pixel is classified as object; otherwise it is classified as background. The thresholds γ_t and γ_s are determined from the temporal and spatial statistics, respectively, as mentioned before. We choose γ_t = 11.345 and γ_s = 7.815 in our algorithm, which means that a pixel classified as background is background with temporal probability Pr_t(x ∈ background) = 0.99 and spatial probability Pr_s(x ∈ background) = 0.95:

Dist_t(x, t) = [X(x, t) − μ_t(x, t − 1)]^T Σ_t⁻¹(x, t − 1) [X(x, t) − μ_t(x, t − 1)],
Dist_s(x, t) = [X(x, t) − μ_t(x, t − 1)]^T Σ_s⁻¹(t) [X(x, t) − μ_t(x, t − 1)].  (28)

When GM is compensated, the coordinate transform between the current image and the background mosaic brings some unavoidable interpolation. Interpolation during the object detection process occurs in the corresponding background mosaic coordinate x_c, whereas during the model update process it occurs in the current image coordinate x. There are of course many interpolation methods, but if we choose one as in Fig. 9, two candidates can be considered, as in (29). One is the direct interpolation of the estimated metrics Dist_i(x, t); the other is the indirect interpolation using the interpolated mean and covariance themselves, i.e. μ̃(x, t − 1) and Σ̃(x, t − 1), as in Fig. 9:

Fig. 9. Bilinear interpolation in the object detection process.

Dist_d(x, t) = Σ_{i=1}^{4} w_i Dist_i(x, t),
Dist_i(x, t) = [X(x, t) − μ_i(x_c, t − 1)]^T Σ_i(x_c, t − 1)⁻¹ [X(x, t) − μ_i(x_c, t − 1)],
Dist_ind(x, t) = [X(x, t) − μ̃(x, t − 1)]^T Σ̃(x, t − 1)⁻¹ [X(x, t) − μ̃(x, t − 1)],
μ̃(x, t − 1) = Σ_{i=1}^{4} w_i μ_i(x_c, t − 1),
Σ̃(x, t − 1) = Σ_{i=1}^{4} w_i Σ_i(x_c, t − 1),  (29)

where w_i is the constant interpolation weight and the subscripts d and ind stand for 'direct' and 'indirect', respectively. If we adopt bilinear interpolation as in Fig. 9, it can be proved using (31) that (30) always holds:

Dist_d(x, t) ≥ Dist_ind(x, t),  (30)

w_1 = x_1 y_1,  w_2 = x_2 y_1,  w_3 = x_1 y_2,  w_4 = x_2 y_2,
x_1 + x_2 = y_1 + y_2 = 1.  (31)

In addition, some simulation results show the following interesting fact. If the four neighborhood


pixels are homogeneous, the two interpolated distance metrics Dist_d(x, t) and Dist_ind(x, t) are almost the same. Otherwise, Dist_d(x, t) tends to be much larger than Dist_ind(x, t), because the largest of the estimated distances Dist_i(x, t) dominates the interpolated value. If we used Dist_d(x, t) instead of Dist_ind(x, t) for object detection, the false detection rate would increase; thus the proposed algorithm uses Dist_ind(x, t) for object detection.

4.4. Adaptation with TVAR

Here we describe the model update process, the final step in a cycle of the overall system. The necessity of adapting to illumination changes has been mentioned before; in addition, we want to control AR when necessary. Thus the proposed adaptation rule, which uses a truncated variable adaptation rate (TVAR), is explained in detail. We first consider a general adaptation rule as in (32):

μ_t(x^i, t) = f(AR) μ_t(x^i, t − 1) + (1 − f(AR)) X(x^i, t),
Σ_t(x^i, t) = f(AR) Σ_t(x^i, t − 1) + (1 − f(AR)) [X(x^i, t) − μ_t(x^i, t)] [X(x^i, t) − μ_t(x^i, t)]^T,  (32)

where x^i is each background mosaic coordinate with RL_i in the multi-resolution mosaic, and X(x^i, t) is the GM-compensated feature obtained using the transformation in (14) and Fig. 7. Here we update only the background mosaics whose resolutions are not higher than that of the corresponding mosaic, because we want to use correct information and not to waste information. f is a function of AR, for example f(AR) = e^{−AR} [9]. With such an exponential function, AR can span from 0 to ∞, which makes sense: if AR = 0, then f(AR) = 1 and no adaptation occurs; if AR = ∞, then f(AR) = 0, which implies extremely fast adaptation, in which case the estimated parameter is simply replaced by the current value. Thus it is very important to find a suitable AR, but this is not always easy. If we can determine AR from each pixel's statistics, a more proper decision can be made in the next cycle of the system. A simple idea is to choose an AR that is a function of Dist_t(x^i, t) as in (33), i.e. to choose a high AR when Dist_t(x^i, t) is very small, which means that the pixel is very probably background:

AR(x^i, t) = 1 / Dist_t(x^i, t),  (33)

where Dist_t(x^i, t) is the GM-compensated distance metric in each background mosaic coordinate, calculated using the warping function and interpolation. The idea seems good, but because Dist_t(x^i, t) ≈ 0 at almost all background pixels, AR(x^i, t) becomes too high for the algorithm to adapt properly to illumination changes. Thus we propose the concept of TVAR as in (34) and Fig. 10:

TVAR_1(x^i, t) = min{a, max{1/b, 1 / Dist_t(x^i, t)}},
TVAR_2(x^i, t) = a [max{0, 1 − √(Dist_t(x^i, t) / b)}]²,
TVAR_3(x^i, t) = a [max{0, 1 − Dist_t(x^i, t) / b}],
TVAR_4(x^i, t) = a √(max{0, 1 − Dist_t(x^i, t)² / b²}),  (34)

where a and b are truncation constants. Simulation results show that TVAR outperforms CAR or VAR.

Fig. 10. Adaptation rules: (a) constant adaptation rate (CAR), (b) variable adaptation rate (VAR), and (c) TVAR.
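A minimal sketch of TVAR_4 from (34), together with the mean update of (32), assuming f(AR) = e^{−AR} and the constants a = 0.2, b = 40 used in Fig. 13 (function names are ours):

```python
import numpy as np

def tvar4(dist_t, a=0.2, b=40.0):
    """TVAR_4 of Eq. (34): AR = a * sqrt(max(0, 1 - Dist_t^2 / b^2)).
    a and b are the truncation constants (values as in Fig. 13)."""
    return a * np.sqrt(np.maximum(0.0, 1.0 - dist_t**2 / b**2))

def update_mean(mu_prev, x, ar):
    """One step of the general adaptation rule (32) for the mean,
    with f(AR) = exp(-AR)."""
    f = np.exp(-ar)
    return f * mu_prev + (1.0 - f) * x

# A pixel with small Dist_t (very probably background) gets a high AR,
ar_bg = tvar4(1.0)
# while a pixel beyond the truncation point b is not adapted at all,
ar_fg = tvar4(50.0)
# so foreground evidence never leaks into the background model.
assert ar_fg == 0.0 and ar_bg > ar_fg
```

The truncation is what distinguishes TVAR from the plain VAR of (33): the rate is bounded above (no runaway adaptation where Dist_t ≈ 0) and forced to zero for clearly foreground pixels.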


Fig. 11. The effect of using different features for object detection: (a) current image, (b) detection result using I(x, t) only, (c) detection result using G(x, t) only, (d) detection result using I(x, t) and G(x, t) at the same time at the decision level, and (e) detection result using X(x, t).
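To make the contrast between Fig. 11(d) and (e) concrete: decision-level fusion thresholds each feature's distance separately and combines the boolean outcomes, whereas feature-level fusion thresholds a single joint distance on X(x, t). A toy sketch (the OR combination, the thresholds, and the names are our illustration, not the exact rule used in the experiments):

```python
# Decision-level fusion: two independent per-feature decisions combined
# afterwards (here with OR). Toy chi-square-like thresholds.
def decision_level(d_int, d_grad, g_int=6.63, g_grad=9.21):
    return (d_int > g_int) or (d_grad > g_grad)

# Feature-level fusion: one decision on the joint distance over X(x,t).
def feature_level(d_joint, g_joint=11.34):
    return d_joint > g_joint

# A pixel whose intensity alone is slightly off triggers the OR rule,
# while the joint test still keeps it as background - illustrating why
# decision-level fusion in Table 2 has near-zero MDR but high FDR.
hit_dec = decision_level(7.0, 1.0)    # True
hit_feat = feature_level(8.0)         # False
```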

Table 2
The effect of using different features for object detection in terms of miss detection rate (MDR) and false detection rate (FDR): (b)–(e) are the results using each corresponding method as in Fig. 11

            1st sequence        2nd sequence        3rd sequence
Method   MDR (%)  FDR (%)   MDR (%)  FDR (%)   MDR (%)  FDR (%)
(b)      13.9     9.34      9.82     3.74      7.53     0.28
(c)      28.8     0.62      15.24    0.41      32.88    0.47
(d)      0.02     13.8      0.04     19.33     0.04     12.23
(e)      1.02     4.30      0.51     2.78      0.02     3.29
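The MDR and FDR entries of Table 2 follow the definitions given later in (35); they can be computed from binary detection and ground-truth masks, e.g. (mask conventions and names are ours):

```python
import numpy as np

def mdr_fdr(detected, truth):
    """MDR/FDR of Eq. (35) from boolean object masks (True = object).
    Both rates are normalized by the number of true object pixels,
    which is why FDR can exceed 100% (cf. Table 3)."""
    n_true = truth.sum()
    miss  = np.logical_and(~detected, truth).sum()
    false = np.logical_and(detected, ~truth).sum()
    return float(100.0 * miss / n_true), float(100.0 * false / n_true)

truth    = np.array([[1, 1, 0, 0]], dtype=bool)
detected = np.array([[1, 0, 1, 0]], dtype=bool)
rates = mdr_fdr(detected, truth)  # (50.0, 50.0)
```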


Fig. 12. The effect of spatio-temporal thresholding for object detection: (a) background mosaic, (b) current image, (c) detection result using temporal statistics only, and (d) detection result using spatio-temporal statistics.
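The thresholds γ_t = 11.345 and γ_s = 7.815 of Section 4.3 are the 0.99 and 0.95 quantiles of a chi-square distribution with 3 degrees of freedom, matching the 3-dimensional feature X(x, t). A sketch of the per-pixel decision of (28) (function names are ours):

```python
import numpy as np
from scipy.stats import chi2

# For a background pixel the squared Mahalanobis distance of the 3-D
# feature is chi-square distributed with 3 degrees of freedom, which is
# where gamma_t = 11.345 and gamma_s = 7.815 come from.
gamma_t = chi2.ppf(0.99, df=3)   # ~11.345
gamma_s = chi2.ppf(0.95, df=3)   # ~7.815

def is_object(x, mu_t, cov_t_inv, cov_s_inv):
    """Spatio-temporal decision of Section 4.3: a pixel is object only
    if BOTH distances of Eq. (28) exceed their thresholds."""
    d = x - mu_t
    dist_t = d @ cov_t_inv @ d
    dist_s = d @ cov_s_inv @ d
    return bool(dist_t > gamma_t and dist_s > gamma_s)
```

Requiring both tests to fire is what suppresses the accumulated-error false detections of Fig. 12(c): a pixel that drifts temporally but stays within the spatial statistics of the frame is still kept as background.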

5. Experimental results

This section presents experimental results showing the performance of the proposed object detection algorithm. To compare results fairly, each experiment is conducted under identical conditions except for the single one whose effect we want to observe. Image sequences were obtained with a handheld camera, indoors and outdoors; an IR camera as well as a CCD camera was used for a few of the outdoor sequences.

5.1. Feature-level fusion

Fig. 11 shows the performance of algorithms using different features on three sequences: two indoor sequences from a static CCD camera and one outdoor sequence from a static IR camera. Using the feature X(x, t) outperforms the other three cases. Even when the features I(x, t) and G(x, t) are used together at the decision level, as in Fig. 11(d), the performance is not as good as that of the proposed algorithm using X(x, t). To give a quantitative comparison, we define the miss detection rate (MDR) and false detection rate (FDR) in an image as in (35), and report the results in Table 2:

MDR (%) = (No. of miss-detected object pixels / No. of true object pixels) × 100,
FDR (%) = (No. of falsely detected object pixels / No. of true object pixels) × 100.  (35)

Table 3
The effect of spatio-temporal thresholding for object detection in terms of MDR and FDR: (c) and (d) are the results using each corresponding method as in Fig. 12

            1st sequence        2nd sequence
Method   MDR (%)  FDR (%)   MDR (%)  FDR (%)
(c)      2.62     263.2     0.00     190.5
(d)      0.00     1.54      1.23     5.83

From Fig. 11 and Table 2, the averaged performance of the proposed method is the best among them when MDR and FDR are considered together. The decision-level fusion yields MDRs of about zero but relatively high FDRs. The FDRs obtained with X(x, t) are also not negligible, because the Sobel gradient operator produces thick edges. Shadows are unlikely to be detected when using G(x, t) only, because shadows are very

Fig. 13. The effect of adaptation using TVAR for object detection: (a) detection result using CAR with a = 0.2, (b) detection result using VAR, and (c) detection result using TVAR4 with a = 0.2, b = 40.


homogeneous themselves and in general carry no texture information. But shadows are not our concern here.

5.2. Spatio-temporal thresholding

Fig. 12 and Table 3 show the effect of spatio-temporal thresholding for object detection. Two image sequences are used: one from a moving IR camera and the other from a moving CCD camera. In each case the camera only pans and tilts, without zooming in or out. Because the background mosaic is updated after each detection cycle, false or miss detection errors may accumulate and propagate through the whole sequence. An algorithm using the temporal statistics only cannot overcome this situation; its FDRs in Table 3 are extremely high. Especially near edges, serious false detection occurs because we use X(x, t), which also incorporates the gradient.

5.3. TVAR

Fig. 13 shows whole-image object detection results obtained with different adaptation rules. The method using TVAR shows relatively good performance. The miss detection rate increases with CAR because the temporal statistics

Fig. 14. Object detection in various environments: (a) background mosaic, (b) current image, and (c) detection result.


is corrupted. If we use VAR instead, the false detection rate increases because of its extremely fast adaptation: even a small fluctuation in X(x, t) can make Dist_t(x, t) and Dist_s(x, t) very large.

5.4. Results in various environments

Fig. 14 shows that the averaged performance of the proposed algorithm is good in various environments. Each sequence was obtained with a handheld CCD camera, indoors or outdoors. In the last sequence there is miss detection because of the camouflage problem, i.e. the intensity of the person's shirt is very similar to that of the background.

Fig. 15 shows the effect of the multi-resolution mosaic for object detection. Two sequences were obtained from a handheld CCD camera that not only moves but also zooms in and out. For each sequence, object detection is performed using the multi-resolution mosaic shown in Fig. 8(a); in each case there are 5 RLs, and the estimated resolution change is shown in Fig. 8(b). The result using a single background mosaic, as in Fig. 15(b), reveals that it cannot cope with the resolution change of the images: there is a great deal of miss detection and false detection. On the other hand, the proposed method using the multi-resolution mosaic shows relatively good performance.

Fig. 16 shows the effect of GIC compensation for object detection in three sequences. From the estimated cross histograms in Fig. 16(d), we can see that GIC occurred; a very dark object or a light source in the background is responsible for the GIC in each case. Without GIC compensation we cannot even obtain the correct GM, as mentioned in Section 2, and the effect on object detection is also serious, as in Fig. 16(b). The result in

Fig. 15. Object detection using a multi-resolution mosaic: (a) current image, (b) detection result using a single background mosaic, and (c) detection result using a multi-resolution mosaic.


Fig. 16. The effect of GIC compensation for object detection: (a) current image, (b) detection result without GIC compensation, (c) detection result with GIC compensation, and (d) the cross histogram between the background mosaic and the current image. The thin line shows the histogram using the estimated GIC parameter.

Fig. 16(c) shows that the proposed method performs well.
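The GIC function G(I; c) itself is specified in Section 2 and is not reproduced here. Purely as an illustration of the compensation step Ĩ = G⁻¹(I; c), a global gain/offset stand-in (the linear model and all names are our assumption, not the actual GIC function of Section 2) could be sketched as:

```python
import numpy as np

# Stand-in GIC model: I_cur ~ g * I_bg + o, fitted by least squares
# over co-located background pixels (the paper instead estimates the
# parameter c from the cross histogram of Fig. 16(d)).
def estimate_gic(i_cur, i_bg):
    g, o = np.polyfit(i_bg.ravel(), i_cur.ravel(), 1)
    return g, o

def compensate(i_cur, g, o):
    """Invert the illumination change: I~ = (I - o) / g."""
    return (i_cur - o) / g

bg  = np.linspace(0.0, 255.0, 64).reshape(8, 8)
cur = 1.2 * bg + 10.0            # simulated global illumination change
g, o = estimate_gic(cur, bg)
restored = compensate(cur, g, o)  # ~equal to bg
```

Without this step, the residual I − μ_t is biased by the illumination change everywhere in the frame, which is why both GME and detection degrade as in Fig. 16(b).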

6. Conclusions

In this paper we proposed an algorithm for object detection in image sequences using a multi-resolution mosaic. When the camera moves and zooms in on something to track the detected object, it proved very useful to generate a multi-resolution mosaic and use it for object detection, rather than discarding the earlier parts of the images. We also found that GIC seriously affects not only GME but also object detection, so it should be compensated before object detection. Finally, we described a unified framework for background subtraction and proposed an algorithm using spatio-temporal thresholding and TVAR for object detection and background adaptation, respectively.

References

[1] Y. Altunbasak, R.M. Mersereau, A.J. Patti, A fast parametric motion estimation algorithm with illumination and lens distortion correction, IEEE Trans. Image Processing 12 (4) (2003) 395–408.
[2] S. Baker, I. Matthews, Lucas-Kanade 20 years on: a unifying framework, Internat. J. Comput. Vision 56 (3) (2004) 221–255.
[3] M. Boninsegna, A. Bozzoli, A tunable algorithm to update a reference image, Signal Processing: Image Communication 16 (2000) 353–365.
[4] H.-M. Chen, P.K. Varshney, Automatic two-stage IR and MMW image registration algorithm for concealed weapons detection, Proc. Vision, Image Signal Processing 148 (4) (2001) 209–216.
[5] I.J. Cox, A review of statistical data association techniques for motion correspondence, Internat. J. Comput. Vision 10 (1) (1993) 53–66.
[6] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley-Interscience, New York, 2001.
[7] A. Elgammal, R. Duraiswami, D. Harwood, L.S. Davis, Background and foreground modeling using nonparametric kernel density estimation for visual surveillance, Proc. IEEE 90 (2002) 1151–1163.
[8] A.H. Gunatilaka, B.A. Baertlein, Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection, IEEE Trans. Pattern Anal. Machine Intell. 23 (6) (2001) 577–589.
[9] S. Huwer, H. Niemann, Adaptive change detection for real-time surveillance applications, in: Proceedings of Visual Surveillance, 2000, pp. 37–46.
[10] M. Irani, P. Anandan, Robust multi-sensor image alignment, in: Proceedings of the Sixth International Conference on Computer Vision, 1998, pp. 959–966.
[11] O. Javed, K. Shafique, M. Shah, A hierarchical approach to robust background subtraction using color and gradient information, in: Proceedings of Motion and Video Computing, 2002, pp. 22–27.
[12] Y. Jianchao, Image registration based on both feature and intensity matching, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 3, 2001, pp. 1693–1696.
[13] H.-Y. Kim, Moving object extraction methods and a video coding scheme for visual surveillance, Doctoral Thesis, Department of EE & CS, Division of EE, KAIST, 2004.
[14] V. Koval, The competitive sensor fusion algorithm for multi sensor systems, in: International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technique and Application, 2001, pp. 65–68.
[15] C.-W. Lee, Hierarchical mosaic construction using a resolution map and progressive residual motion estimation, Doctoral Thesis, Department of EE & CS, Division of EE, KAIST, 2004.
[16] J.S. Lee, K.Y. Lee, S.D. Kim, Moving target tracking algorithm based on the confidence measure of motion vectors, in: Proceedings of the International Conference on Image Processing, 2001, pp. 369–372.
[17] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering, second ed., Addison-Wesley, Reading, MA, 1994.
[18] L. Marcenaro, F. Oberti, C.S. Regazzoni, Multiple objects color-based tracking using multiple cameras in complex time-varying outdoor scenes, in: Second IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, December 2001.
[19] E.P. Ong, B.J. Tye, W.S. Lin, M. Etoh, An efficient video object segmentation scheme, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 4, 2002, pp. 3361–3364.
[20] P.D. Picton, Tracking and segmentation of moving objects in a scene, in: Proceedings of the Third International Conference on Image Processing and Its Applications, 1989, pp. 389–393.
[21] P.L. Rosin, Thresholding for change detection, in: Proceedings of ICCV '98, 1998, pp. 274–279.
[22] H.-Y. Shum, R. Szeliski, Panoramic image mosaics, Technical Report, Microsoft Research, 1997, pp. 1–50.
[23] K. Skifstad, R. Jain, Illumination-independent change detection for real world sequences, Comput. Vision Graphics Image Process. 46 (1989) 387–399.
[24] C. Stauffer, W.E.L. Grimson, Adaptive background mixture models for real-time tracking, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999, pp. 246–252.
[25] K. Toyama, J. Krumm, B. Brumitt, B. Meyers, Wallflower: principles and practice of background maintenance, in: Proceedings of the International Conference on Computer Vision, vol. 1, 1999, pp. 255–261.
[26] E. Trucco, A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1998.
[27] F. Ziliani, A. Cavallaro, Image analysis for video surveillance based on spatial regularization of a statistical model-based change detection, in: Proceedings of the International Conference on Image Analysis and Processing, 1999, pp. 1108–1111.
E. Trucco, A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, Englewood Cliffs, NJ, 1998. F. Ziliani, A. Cavallaro, Image analysis for video surveillance based on spatial regularization of a statistical model-based change detection, in: Proceedings of the International Conference on Image Analysis and Processing, 1999, pp. 1108–1111.