A comparison between feature-based and EM-based contour tracking


Image and Vision Computing 24 (2006) 1218–1232

Arthur E.C. Pece a,b,*, Anthony D. Worrall c

a Heimdall Vision, Bjørnsonsvej 29, 2500 Valby, Denmark
b Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
c School of Systems Engineering, University of Reading, RG6 6AY Reading, UK

* Corresponding author. Address: Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark. E-mail address: [email protected] (A.E.C. Pece).

Received 7 April 2003; received in revised form 6 June 2005; accepted 7 June 2005

Abstract

Most active-contour methods are based either on maximizing the image contrast under the contour or on minimizing the sum of squared distances between contour and image 'features'. The Marginalized Likelihood Ratio (MLR) contour model uses a contrast-based measure of goodness-of-fit for the contour and thus falls into the first class. The point of departure from previous models consists in marginalizing this contrast measure over unmodelled shape variations. The MLR model naturally leads to the EM Contour algorithm, in which pose optimization is carried out by iterated least-squares, as in feature-based contour methods. The difference with respect to other feature-based algorithms is that the EM Contour algorithm minimizes squared distances from Bayes least-squares (marginalized) estimates of contour locations, rather than from 'strongest features' in the neighborhood of the contour. Within the framework of the MLR model, alternatives to the EM algorithm can also be derived: one of these alternatives is the empirical-information method. Tracking experiments demonstrate the robustness of pose estimates given by the MLR model, and support the theoretical expectation that the EM Contour algorithm is more robust than either feature-based methods or the empirical-information method. © 2005 Elsevier B.V. All rights reserved.

Keywords: Generative model; Active contour; EM algorithm; Kalman filter; Empirical information matrix

1. Introduction

Active contour methods find application to tracking when camera motion prevents the use of background subtraction methods, and/or when only specific kinds of objects need to be tracked and the shape, but not the appearance, of these objects is known a priori.

Most active-contour methods can be classified as feature-based, if the pose of the object is optimized by minimizing squared distances between contours and image features; and contrast-based, if the pose of the object is optimized by maximizing the norm of the image gradient (or some related measure) underlying the contour. A recent overview of active-contour methods can be found in [13].

Feature-based methods have found wide application in tracking [2,3,17,35,37], but feature extraction is a process

0262-8856/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.imavis.2005.06.013

notoriously sensitive to noise, which leads to instabilities in tracking. Contrast-based methods include the original snake model [20], the model-based tracker by Kollnig and Nagel [21], and several methods based on image statistics [12,38,34]. These methods have the disadvantage of not taking explicitly into account unmodelled variations of the contour shape. In addition, gradient-based optimization of the object pose/shape is not easily applicable to the underlying model (although it has been shown [21] that the Levenberg–Marquardt method can be applied).

The MLR (Marginalized Likelihood Ratio) contour model [29,31] is contrast-based, but allows for random, unmodelled shape variations. The basic assumptions of the model are as follows: (1) grey-level values of nearby pixels are correlated if both pixels belong to the object being tracked, or both belong to the background, but there are no correlations if the two pixels are on opposite sides of an object boundary; (2) the shape of the contour is subject to random local variability.


The first assumption leads to the use of a likelihood ratio as the objective function to be maximized in pose/shape refinement. The same principle is adopted in [12,38,34]. The second assumption implies that the likelihood ratio must be marginalized over possible deformations of the contour. A similar principle is adopted in [23]. Taking the two assumptions together means that no features need to be detected and matched to the model, leading to greater robustness against noise, while at the same time local shape variations are taken explicitly into account.

As in many scenarios involving marginalization, the EM algorithm [24] is a natural choice for optimization of the marginalized likelihood ratio. Other modified Newton methods are also applicable.

The MLR model and the EM Contour algorithm have been developed in previous publications, as applied to vehicle tracking with 3D models [31,28,6,5] or eye tracking with 2D models [14–16]. The general MLR model is discussed more extensively in [29]. This paper contains a shorter introduction to the model (in the specific form described in the Appendix of [29]) and focuses on developing Newton-like methods for pose optimization based on the MLR model. The three methods considered in this paper are the EM Contour algorithm, an empirical-information method [25], and a feature-based contour algorithm. Both theoretical and experimental comparisons between these methods are presented.

No performance comparison can conclusively prove the superiority of an entire class of methods: the experiments with the empirical-information method are meant primarily to show that the EM algorithm is not the only feasible optimizer for the MLR model; the implementation of the feature-based algorithm is meant primarily to illustrate the similarities in practice between the EM Contour algorithm and the feature-based approach. Taken together with the theory developed in Section 3, however, the performance comparisons suggest that tracking with the EM Contour algorithm is more robust, while the computational cost is approximately equal for all algorithms.

Section 2 describes the MLR contour model. Section 3 derives modified Newton methods for pose refinement based on the MLR model. Section 4 extends the algorithms from pose refinement to tracking (Kalman filtering). Section 5 describes the results obtained by tracking motor vehicles. Finally, Section 6 contains a discussion of these comparisons.


2. Likelihood model

The object state is described by an m-vector $x(t)$ which is a function of time $t$. Given the state and a geometrical model of the object (which can be a 2D model or a 3D model), the object contour is projected onto the image plane. The contour is then used to estimate the likelihood of the image, given the object state.

2.1. The observation

A finite set of $n$ sample points on the contour are used to estimate the likelihood. The image coordinates and unit normals to these sample points are computed from the geometric model together with the estimated state parameters. The normal line to a sample point will be called the observation line. Due to the aperture problem, only the normal component of the displacement of the object boundary can be locally detected. Therefore, only the intersection between the object boundary and the observation line is of interest in the pose refinement algorithms. A distinction must also be made between the predicted intersection (i.e. the contour intersection) and the actual intersection: these differ not only because of errors in the state estimate, but also because of errors in the geometric model.

In the following, the symbol $\nu_i$ will be used for the coordinate on the observation line indexed by $i$. The symbol $\mu_i$ will be used for the coordinate of the contour intersection. The distance between contour and actual intersection is denoted by $\varepsilon_i$. The actual intersection is of course unknown: in general it is only possible to estimate a pdf (probability density function) over values of $\varepsilon_i$. The subscript $i$ will be dropped when not needed.

The grey-level profile on observation line $i$ will be denoted by $I_i(\nu)$. Using a digital computer, only a finite set of grey levels can be measured on the observation line. Given regularly-spaced sampling of grey levels (with spacing $\Delta\nu$ and bilinear interpolation) we define the observation as $\mathbf{I}_i = \{ I_i(j\,\Delta\nu) \mid j \in \mathbb{Z} \}$. In the following, the subscript $j$ will always denote location on the observation line. Fig. 1 illustrates the meaning of the symbols $\mu$, $\nu$, $\varepsilon$, $\Delta\nu$.
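To make the definition of the observation concrete, the sketch below samples grey levels along an observation line with bilinear interpolation. It is a minimal illustration, not the authors' implementation: the function name, the 3-sigma sampling range, the (row, column) image indexing and the absence of bounds checking are all our assumptions.

```python
import numpy as np

def sample_observation_line(image, point, normal, sigma, dnu=1.0):
    """Sample the grey-level profile I(j*dnu) along the normal to a contour
    sample point; `point` is the (x, y) image position of the sample point
    and `normal` the unit normal to the contour there."""
    # The useful range of sampling is set by the width sigma of the Gaussian
    # window (Fig. 1); we assume a cut-off at 3*sigma on each side.
    jmax = int(np.ceil(3.0 * sigma / dnu))
    j = np.arange(-jmax, jmax + 1)
    xs = point[0] + j * dnu * normal[0]
    ys = point[1] + j * dnu * normal[1]
    # Bilinear interpolation at the (generally non-integer) sample positions;
    # positions are assumed to fall inside the image.
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0
    profile = (image[y0, x0] * (1 - fx) * (1 - fy)
               + image[y0, x0 + 1] * fx * (1 - fy)
               + image[y0 + 1, x0] * (1 - fx) * fy
               + image[y0 + 1, x0 + 1] * fx * fy)
    return j * dnu, profile   # line coordinates nu_j and grey levels I(nu_j)
```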


2.2. Likelihoods of grey-level differences

In this paper, we consider a specific form of the MLR model: the general model is described in [29] and the specific model described here is analyzed in more detail in the Appendix of [29].

Fig. 1. The yellow line represents a model contour, mismatched with the object boundary, either because of misalignment or because of shape variability. The green line represents the normal to a sample point on the contour. $\nu$ is the coordinate on the normal line, $\mu$ is the coordinate of the intersection with the contour, and $\varepsilon$ is the distance between the intersections of the normal line with the contour and with the object boundary. Grey levels are sampled on the normal with a regular spacing $\Delta\nu$. The useful range of sampling of grey levels is determined by the width $\sigma$ of the Gaussian window (see text). (For interpretation of the reference to colour in this legend, the reader is referred to the web version of this article.)



In the specific model, it is assumed that the pdf of the observation depends only on grey-level differences (gld's). Clearly, relaxing this assumption might lead to better tracking performance: this is left for future work.

We define a binary indicator variable $h(\mu)$, with value 1 if the modelled boundary (i.e. the boundary modelled by the contour used to define the observation line) intersects the observation line between $\mu - \Delta\nu/2$ and $\mu + \Delta\nu/2$, and value 0 otherwise. Note that the absence of the modelled boundary does not imply the absence of an edge: there are edges within the background as well as within the object, due to unmodelled objects, textures, etc.

Given no modelled boundary between image locations $\mu - \Delta\nu/2$ and $\mu + \Delta\nu/2$, the pdf of the observation is defined as:

$$f_L(\mathbf{I}; \mu) \stackrel{\mathrm{def}}{=} f\left[ I(\mu - \Delta\nu/2),\, I(\mu + \Delta\nu/2) \mid h(\mu) = 0 \right] \tag{1}$$

Research on the statistics of natural images shows that the pdf $f_L$ is well approximated by a generalized Laplacian [18]. Defining $\Delta I(\mu) = I(\mu + \Delta\nu/2) - I(\mu - \Delta\nu/2)$, the generalized Laplacian is of the form

$$f_L(\mathbf{I}; \mu) = \frac{1}{Z_L} \exp\left( -\left| \frac{\Delta I(\mu)}{\lambda} \right|^{\beta} \right), \qquad Z_L = 2\lambda\,\Gamma(1 + 1/\beta) \tag{2}$$

where $Z_L$ is a normalization constant, $\lambda$ is a parameter that depends on the distance $\Delta\nu$, and $\beta$ is a parameter related to the kurtosis of the pdf. For natural images, $\beta$ is in the range (0.5, 1.0). Experiments show that $\beta = 1$ is a good approximation for tracking purposes and allows fast computation of $\log f_L$.

This model differs from previous active-contour models based on image statistics [12,34,36,38] in making use of a parameterized pdf for gld's (with a single fitted parameter), instead of empirical histograms of gld's. It also differs in using image statistics at locations with no modelled object boundaries, rather than statistics at locations with no object boundaries whatsoever.

Given a modelled boundary between image locations $\mu - \Delta\nu/2$ and $\mu + \Delta\nu/2$, the pdf of the observation is defined as:

$$f_E(\mathbf{I}; \mu) \stackrel{\mathrm{def}}{=} f\left[ I(\mu - \Delta\nu/2),\, I(\mu + \Delta\nu/2) \mid h(\mu) = 1 \right] \tag{3}$$

Given that grey levels observed on opposite sides of a modelled boundary are statistically independent, this pdf can be assumed to be uniform:

$$f_E(\mathbf{I}; \mu) = 1/q \tag{4}$$

where $q$ is the number of grey levels.
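As an illustration of Eqs. (2) and (4), the sketch below evaluates the log-likelihood of a grey-level difference under the two hypotheses, with $\beta = 1$ as used for tracking. This is our own minimal rendering; the function names and the default $q = 256$ grey levels are assumptions.

```python
import numpy as np

def log_f_L(dI, lam):
    """Log of the generalized Laplacian (Eq. (2)) with beta = 1:
    f_L = exp(-|dI|/lam) / (2*lam)."""
    return -np.abs(dI) / lam - np.log(2.0 * lam)

def log_f_E(q=256):
    """Log of the uniform pdf under the boundary hypothesis (Eq. (4))."""
    return -np.log(q)
```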

2.3. Likelihood ratios

As in previous work [4,34], the optimal contour location is found by maximizing the likelihood ratio

$$f_R(\mathbf{I} \mid \mu) \stackrel{\mathrm{def}}{=} \frac{f_E(\mathbf{I}; \mu)}{f_L(\mathbf{I}; \mu)} \tag{5}$$

The motivation for considering likelihood ratios is that the observation depends on the contour location on the image plane: the likelihoods of observations computed for different contour locations are not comparable, because they are likelihoods of different observations. A better evaluation function is given by the likelihood of the entire image as a function of contour location: the entire image does not depend on the contour location, although its likelihood does. Given a modelled boundary at location $\mu_i$ on observation line $i$ and the grey-level profile $\mathbf{I}_i$ on the same line, the likelihood of the image can be decomposed as follows:

$$f(\text{image} \mid \text{boundary at } \mu_i) = f(\text{image} \mid \text{no contour})\, f_R(\mathbf{I}_i \mid \mu_i) \tag{6}$$

The first multiplicative term on the right-hand side is a constant independent of the presence and location of the contour. Therefore, only the likelihood ratio needs to be considered in the optimization of the contour pose/shape.

2.4. The marginalized likelihood ratio model

The likelihood ratio, as defined above, has the problem of not being smooth in the space of the object pose: a displacement of $\Delta\nu$ of the contour on the observation line means that the gld, and therefore the likelihood ratio, is no longer measured across the object boundary. This problem not only makes pose refinement more difficult: it also implies that, even at the optimal pose, any error in the geometrical model will make the likelihood ratio an inappropriate measure of goodness-of-fit.

There are at least two possible solutions to this problem: blurring of grey-level profiles and marginalization of likelihood ratios. The MLR model is based on marginalization over all possible positions of the object boundary on the measurement line. The underlying assumption of the model is that a random shape deformation happens at each sample point on the contour. The prior pdf $f_D(\varepsilon)$ of the deformation is assumed to be Gaussian:

$$f_D(\varepsilon) \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-\varepsilon^2}{2\sigma^2} \right) \tag{7}$$

The geometric noise includes all unmodelled shape variability. For simplicity, it is assumed to be independent and identically distributed on all sample points on the contours. The assumption of independence will break down if the spacing between sample points is small.

The likelihood ratio of the observation $\mathbf{I}$ given the contour location $\mu$ and the deformation $\varepsilon$ is, of course, $f_R(\mathbf{I} \mid \mu + \varepsilon)$. The joint likelihood ratio of the observation $\mathbf{I}$ and the deformation $\varepsilon$, given the contour location $\mu$, is given by

$$f(\mathbf{I}, \varepsilon \mid \mu) = f_R(\mathbf{I} \mid \mu + \varepsilon)\, f_D(\varepsilon) \tag{8}$$

The marginalized likelihood ratio of the observation is obtained by integrating over all possible deformations. The integral can be approximated as a summation over the discrete set of possible deformations $\varepsilon_j = j\,\Delta\nu - \mu$:

$$f_M(\mathbf{I} \mid \mu) = \sum_j f_R(\mathbf{I} \mid j\,\Delta\nu)\, f_D(\varepsilon_j)\, \Delta\nu \tag{9}$$
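A direct transcription of Eq. (9) follows; `nu` holds the sampling coordinates $j\,\Delta\nu$ and `dI` the corresponding grey-level differences. Function and argument names are ours, with $\beta = 1$ and $q = 256$ assumed as above.

```python
import numpy as np

def marginalized_likelihood_ratio(nu, dI, mu, lam, sigma, q=256):
    """f_M(I|mu) of Eq. (9): the likelihood ratio f_R (Eq. (5)) marginalized
    over the Gaussian deformation prior f_D (Eq. (7))."""
    dnu = nu[1] - nu[0]
    eps = nu - mu                            # deformations eps_j = j*dnu - mu
    f_R = (2.0 * lam / q) * np.exp(np.abs(dI) / lam)   # f_E/f_L with beta = 1
    f_D = np.exp(-eps**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return np.sum(f_R * f_D * dnu)
```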


In the following, negative logarithms of likelihood ratios will be used for convenience:

$$h(\mathbf{I} \mid \mu) \stackrel{\mathrm{def}}{=} -\log f_M(\mathbf{I} \mid \mu) = \log \frac{\sqrt{2\pi}\,\sigma}{\Delta\nu} + \log \frac{q}{2\lambda} - \log \sum_j \exp\left( \frac{|\Delta I(j\,\Delta\nu)|}{\lambda} - \frac{\varepsilon_j^2}{2\sigma^2} \right) \tag{10}$$

The negative log-likelihood ratio $h(\mathbf{I} \mid \mu)$ will also be called the point evaluation function. A summation of point evaluation functions over all sample points gives the negative log-likelihood ratio for the entire contour, which will be called the contour evaluation function.

2.5. Corrections for interference and statistical dependencies

When combining evidence from all the sample points on the contours, statistical dependencies between observations must be taken into account. This problem is present in all contour models, but it is particularly troublesome when the geometric model is composed of straight line segments, as in the experiments reported below: in this scenario, long model line segments become accidentally aligned with long, high-contrast background image edges, e.g. marks on the road. The most effective way that we found to reduce the problem is to weigh the point evaluation functions by a factor proportional to the square root of the length (in the image plane) of the corresponding line segment.

Line segments close to each other create another problem: large values of gld's can arise from either segment, but using the above estimation technique would give a high log-likelihood to both segments. To reduce this problem, the weight of each sample point is reduced when another line segment is within a distance of $2\sigma$ on the observation line.

More specifically, define the distance $L_i$ (in pixels) between the two outermost sampling points on the segment including sample point $i$, and the number $n_i$ of sampling points on the segment. The weighting term for the segment length is $\zeta_i = \sqrt{L_i}/n_i$. Now suppose that one or more other line segments intersect the observation line at distances $d_k$ from sample point $i$, with $d_k < 2\sigma$. The weighting term for the interfering segments is

$$u_i = \left[ 1 + \sum_{|d_k| < 2\sigma} \frac{2\sigma - |d_k|}{2\sigma} \right]^{-1}$$

The weight for sample point $i$ is $w_i = \zeta_i u_i$. The contour evaluation function is the weighted sum of the point evaluation functions:

$$H(x) = \sum_{i=1}^{n} w_i\, h(\mathbf{I}_i \mid \mu_i(x)) \tag{11}$$
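The point and contour evaluation functions of Eqs. (10) and (11) can then be sketched as below; the log-sum-exp formulation keeps Eq. (10) numerically stable for large gld's. Again a minimal illustration under the same assumptions, with the weights $w_i$ supplied by the caller.

```python
import numpy as np

def point_evaluation(nu, dI, mu, lam, sigma, q=256):
    """Negative log-likelihood ratio h(I|mu) of Eq. (10)."""
    dnu = nu[1] - nu[0]
    eps = nu - mu
    # np.logaddexp.reduce computes log(sum(exp(...))) stably
    return (np.log(np.sqrt(2.0 * np.pi) * sigma / dnu) + np.log(q / (2.0 * lam))
            - np.logaddexp.reduce(np.abs(dI) / lam - eps**2 / (2.0 * sigma**2)))

def contour_evaluation(lines, mus, weights, lam, sigma):
    """Contour evaluation function H(x) of Eq. (11): weighted sum of point
    evaluation functions; `lines` is a list of (nu, dI) pairs, one per sample
    point, and `weights` holds the w_i = zeta_i * u_i of Section 2.5."""
    return sum(w * point_evaluation(nu, dI, mu, lam, sigma)
               for (nu, dI), mu, w in zip(lines, mus, weights))
```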


3. Optimization of the marginalized likelihood ratio

The contour evaluation function given by the MLR model can be used in conjunction with a particle filter for tracking, as shown in [16,29]. Given the speed advantage of Kalman filtering, however, it is of interest whether the log-posterior pdf of the state (log-prior pdf minus contour evaluation function) can be robustly maximized by a gradient-based method. If this is feasible, then the innovation in a Kalman filter can be computed efficiently.

In this section and the following, the subscript $k$ denotes iteration number in optimization for a single image frame, e.g. $x_k$ is the state at the $k$th iteration; this subscript will be omitted when not needed. For simplicity, the symbol $h_i$ will be used for $h(\mathbf{I}_i \mid \mu_i)$.

3.1. Gradient and Hessian of the evaluation function

For small perturbations $\delta x$ of the state variables, the observable component $\delta\mu_i$ of contour displacement is an approximately linear function of the perturbation:

$$\delta\mu_i \approx J_i^T \cdot \delta x \tag{12}$$

where $J_i \stackrel{\mathrm{def}}{=} \nabla_x \mu_i$ is a column of the $m \times n$ Jacobian matrix $J$. The Jacobian matrix is obtained by differentiation of the geometric model (including the projective model in the case of 3D geometric models).

The gradient of the evaluation function is given by

$$\nabla_x H(x) = \sum_{i=1}^{n} w_i \nabla_x h_i = \sum_{i=1}^{n} w_i \frac{\partial h_i}{\partial \mu_i} J_i \tag{13}$$

The Hessian of the evaluation function is given by

$$\nabla_x^2 H(x) = \sum_{i=1}^{n} w_i \nabla_x \left( \frac{\partial h_i}{\partial \mu_i} J_i \right) \approx \sum_{i=1}^{n} w_i \frac{\partial^2 h_i}{\partial \mu_i^2} J_i \cdot J_i^T \tag{14}$$

The approximation in Eq. (14) is equivalent to turning the full Newton method into the Gauss–Newton method (see chapter 10 in [7]).

Define the $n$-vector $\dot{h}$ with elements $\dot{h}_i = w_i\, \partial h_i / \partial \mu_i$ and the $n \times n$ diagonal matrix $\ddot{H}$ with diagonal elements $\ddot{H}_{ii} = w_i\, \partial^2 h_i / \partial \mu_i^2$. By using these definitions, Eqs. (13) and (14) can be written in matrix form:

$$\nabla_x H(x) = J \cdot \dot{h} \tag{15}$$

$$\nabla_x^2 H(x) = J \cdot \ddot{H} \cdot J^T \tag{16}$$

Define the distance $\delta x_k \stackrel{\mathrm{def}}{=} x - x_k$ of an arbitrary state $x$ from the current state estimate. Using the gradient and Hessian, we can write the Taylor expansion of the evaluation function around the state estimate $x_k$ at iteration $k$:

$$H(x) \approx H(x_k) + \delta x_k^T \cdot J_k \cdot \dot{h}_k + \frac{1}{2}\, \delta x_k^T \cdot J_k \cdot \ddot{H}_k \cdot J_k^T \cdot \delta x_k \tag{17}$$

where the subscript $k$ in $J_k$, $\dot{h}_k$, $\ddot{H}_k$ indicates evaluation at $x_k$.

3.2. Center of probability mass

The first derivative of the point evaluation function is given by

$$\frac{\partial h}{\partial \mu} = \frac{-1}{f_M(\mathbf{I} \mid \mu)} \frac{\partial f_M}{\partial \mu} \tag{18}$$



$$\frac{\partial h}{\partial \mu} = \frac{-1}{f_M(\mathbf{I} \mid \mu)} \sum_j f_R(\mathbf{I} \mid j\,\Delta\nu) \frac{\partial f_D(\varepsilon_j)}{\partial \mu} \tag{19}$$

Eq. (19) is obtained by remembering that the derivative $\partial f_R / \partial \mu$ is zero, because the likelihood ratio $f_R(\mathbf{I} \mid j\,\Delta\nu)$ depends only on the value of $\Delta I(j\,\Delta\nu)$ and not on the contour location.

It is convenient to define the conditional distribution of the indicator variables $p_j \stackrel{\mathrm{def}}{=} p(h_j = 1 \mid \mathbf{I}, \mu)$, given by

$$p_j = \frac{f_R(\mathbf{I} \mid \mu + \varepsilon_j)\, f_D(\varepsilon_j)\, \Delta\nu}{f_M(\mathbf{I} \mid \mu)} \tag{20}$$

We further introduce the concept of center of mass of the observation given the contour location $\mu$, i.e. the first moment of $p_j$:

$$\hat{\nu} \stackrel{\mathrm{def}}{=} \sum_j p_j\, j\,\Delta\nu \tag{21}$$

The center of mass is the BLS (Bayes' Least Squares) estimate of the location of the object boundary on the normal to the contour. These definitions allow us to rewrite the first derivative of the point evaluation function with respect to $\mu$:

$$\frac{\partial h}{\partial \mu} = -\sum_j p_j \frac{\varepsilon_j}{\sigma^2} \tag{22}$$

$$\frac{\partial h}{\partial \mu} = \frac{\mu - \hat{\nu}}{\sigma^2} \tag{23}$$

This means that going down the gradient of the evaluation function is equivalent to decreasing the distances between contour and centers of mass. Of course, the centers of mass depend on the current contour location: as the contour moves at each iteration, the centers of mass need to be re-estimated.
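In code, the quantities of Eqs. (20)–(23) reduce to a softmax followed by a first moment; the normalization by $f_M$ makes all constant factors cancel. A sketch under the same assumptions as above:

```python
import numpy as np

def center_of_mass(nu, dI, mu, lam, sigma):
    """Responsibilities p_j (Eq. (20)), center of mass nu_hat (Eq. (21)) and
    the resulting derivative dh/dmu = (mu - nu_hat)/sigma**2 (Eq. (23))."""
    log_p = np.abs(dI) / lam - (nu - mu)**2 / (2.0 * sigma**2)
    p = np.exp(log_p - log_p.max())   # constants (2*lam/q, dnu, ...) cancel
    p /= p.sum()
    nu_hat = np.sum(p * nu)           # BLS estimate of the boundary location
    return p, nu_hat, (mu - nu_hat) / sigma**2
```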

3.3. Second derivative of the point evaluation function

From Eq. (16), it can be seen that a sufficient condition for the Hessian to be positive definite is that

(1) the Jacobian $J$ is of full rank (i.e. enough sample points are visible to resolve the state of the object) and
(2) the diagonal matrix $\ddot{H}$ has no negative elements and a sufficient number of positive elements to resolve the pose from the corresponding sample points.

When using 3D models, it is not possible to guarantee that the Jacobian is always full rank, as this depends on the viewpoint. Which diagonal elements of $\ddot{H}$ need to be positive also depends on the viewpoint. On the other hand, a sufficient condition for the Hessian to be positive semidefinite is simply that the matrix $\ddot{H}$ has no negative elements: if this is the case, then prior constraints can be used to regularize the problem and make the Hessian positive definite.

Differentiating Eq. (18), we obtain the second derivative of the point evaluation function:

$$\frac{\partial^2 h}{\partial \mu^2} = \frac{\partial}{\partial \mu}\left( \frac{-1}{f_M} \frac{\partial f_M}{\partial \mu} \right) = \left( \frac{1}{f_M} \frac{\partial f_M}{\partial \mu} \right)^2 - \frac{1}{f_M} \frac{\partial^2 f_M}{\partial \mu^2} \tag{24}$$

Using Eqs. (22) and (23), we obtain

$$\frac{\partial^2 h}{\partial \mu^2} = \left( \sum_j p_j \frac{\varepsilon_j}{\sigma^2} \right)^2 - \sum_j p_j \frac{\varepsilon_j^2}{\sigma^4} + \frac{1}{\sigma^2} = -\frac{1}{\sigma^4} \sum_j p_j\, (j\,\Delta\nu - \hat{\nu})^2 + \frac{1}{\sigma^2} \tag{25}$$

which shows that the second derivative (and therefore the corresponding diagonal element of the matrix $\ddot{H}$) is negative when the second moment from the mean of $p_j$ is larger than $\sigma^2$. It follows that the standard Newton method cannot be used to maximize the evaluation function.

3.4. Approximate Newton methods

A standard solution to the problem posed by a non-positive definite Hessian is the Levenberg–Marquardt method ([32], chapter 15). However, in its standard form, this method involves a series of line optimizations, which implies a larger number of function evaluations. For this reason, it is desirable to use methods that fall into the general class of the delta algorithm [19], in which $\ddot{H}$ is simply replaced by a diagonal weighting matrix $A$ which is guaranteed to be positive definite. Obviously, the weighting matrix should be as close as possible to $\ddot{H}$ to increase the speed of convergence.

One algorithm in this class can be easily obtained by following the suggestion in [33] (page 130) of taking only the first term on the right-hand side of Eq. (24). In this case, the diagonal elements of $A$ are given by

$$A_{ii} = w_i \left( \frac{\partial h_i}{\partial \mu_i} \right)^2 \tag{26}$$

It is easy to see that, in this case, the weighting matrix $A$ is positive semidefinite. Given the similarity between the matrix $J \cdot A \cdot J^T$ obtained in this way and the empirical Fisher information matrix as defined in [25], this method will be defined as the EICo (Empirical-Information Contour) algorithm in the following. There is also an analogy to the Gauss–Newton method, because only the squared first derivative is taken, neglecting the second derivative in Eq. (24).

Alternatively, one could take only the last term on the right-hand side of Eq. (25) and obtain

$$A_{ii} = \frac{w_i}{\sigma^2} \tag{27}$$

It is easy to see that the matrix $A$ constructed in this way is positive definite. (However, the matrix $J_k A_k J_k^T$ is still positive semi-definite if the Jacobian is not full rank.) This second choice can be justified by the application of the EM algorithm to the contour model, as shown in Appendix A. In the following, the resulting method will be defined as the EMCo (EM Contour) algorithm. Other general algorithms of this type can be found in [19].
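The following sketch contrasts the two weighting matrices and carries out one (ML) delta-algorithm step $\delta x = -(J A J^T)^{-1} J \dot{h}$. It is our own minimal illustration, with $J$ stored as an $m \times n$ array whose columns are the $J_i$.

```python
import numpy as np

def eico_weights(w, dh_dmu):
    """EICo diagonal weights (Eq. (26)): positive semidefinite."""
    return w * dh_dmu**2

def emco_weights(w, sigma):
    """EMCo diagonal weights (Eq. (27)): positive definite."""
    return w / sigma**2

def delta_step(J, h_dot, A_diag):
    """One delta-algorithm step: solve (J A J^T) dx = -J h_dot, where A is the
    diagonal matrix replacing the second-derivative matrix of Eq. (16)."""
    JA = J * A_diag                   # scales column i of J by A_ii
    return np.linalg.solve(JA @ J.T, -(J @ h_dot))
```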


3.5. A feature-based contour algorithm

It can be easily seen that the EMCo algorithm reduces to iterated (weighted) least-squares minimization of distances between contour and centers of mass. In other words, the main practical difference between the EMCo algorithm and feature-based contour methods is in the squared distances whose sum is minimized.

This difference can also be described, to a first approximation, as the difference between the BLS estimate (center of mass) and the ML estimate (strongest feature) of the location of the object boundary. However, if the location of the strongest feature is defined as $\nu_{\mathrm{ML}} \stackrel{\mathrm{def}}{=} \arg\max_\nu f_R(\mathbf{I} \mid \nu)$, then tracking would be practically impossible in a cluttered scene. Therefore, $f_R$ is usually maximized only within a short distance $\theta$ from the contour. In other words, the usual definition of 'strongest feature' is really

$$\nu_{\mathrm{strong}} \stackrel{\mathrm{def}}{=} \arg\max_\nu \left[ f_R(\mathbf{I} \mid \nu)\, r((\nu - \mu)/\theta) \right] \tag{28}$$

where the rectangular function $r$ is defined as

$$r(\nu) = \begin{cases} 1 & (|\nu| \leq 1) \\ 0 & (|\nu| > 1) \end{cases} \tag{29}$$

The use of a rectangular window leads to another problem: features suddenly pop into view or out of view when they are close to $\mu \pm \theta$. This problem suggests the use of the MAP feature with location $\nu_{\mathrm{MAP}}$ defined as

$$\nu_{\mathrm{MAP}} \stackrel{\mathrm{def}}{=} \arg\max_\nu \left[ f_R(\mathbf{I} \mid \nu)\, f_D(\nu - \mu) \right] \tag{30}$$

Using $\nu_{\mathrm{MAP}}$ rather than $\nu_{\mathrm{strong}}$ means substituting the rectangular window with a Gaussian window. A Gaussian window is more robust against noise, since features do not suddenly pop into and out of view. The definition of the MAP feature suggests performance comparisons between the EMCo algorithm and a (feature-based) contour algorithm which is identical, except for the substitution of 'center of mass' by 'MAP feature'. This algorithm will be defined as the MAP-feature algorithm in the following. Note that, unlike the EMCo and EICo algorithms, the MAP-feature algorithm is heuristic and there are no results about its convergence properties.

4. Kalman filtering

4.1. Dynamical model

The tracker is based on recursive estimation of the object state, given the image sequence up to the current frame. By Bayes' theorem, the posterior pdf is proportional to the product of prior pdf and likelihood ratio. The prior pdf is given by a dynamical model, described by an equation of the form

$$dx(t) = D[x(t)]\, dt + S^{1/2}\, db(t) \tag{31}$$

where $D$ describes the deterministic dynamics of the system, $S$ is a diagonal matrix whose diagonal elements are the variances of the dynamical noise on the corresponding state variables, and $b$ is $m$-dimensional Brownian motion [26].

The specific motion model used in the experiments is the steering-angle model introduced in [22]. This model includes $m = 5$ state variables: X and Y coordinates on the ground plane, orientation on the ground plane, velocity, and steering angle (angle of the front wheels with respect to the vehicle main axis). The pose variables that can be resolved from a single frame are the X and Y coordinates and the orientation on the ground plane. Camera calibration is required to define the ground plane and thereby reduce the number of pose variables from 6 (three translational and three rotational in 3D) to 3.

Having specified the system, it is straightforward to implement the prediction step of an iterated extended Kalman filter [1] for the state variables. The prediction step gives the central estimate $\hat{x}$ and covariance matrix $\hat{C}$ for the state variables at the next frame. These are the parameters of the prior pdf $f_P$ of the state variables:

$$f_P(x) = \frac{1}{Z_P} \exp\left( -\frac{1}{2}\, \Delta x^T(t) \cdot \hat{C}^{-1}(t) \cdot \Delta x(t) \right) \tag{32}$$

where $Z_P = (2\pi)^{m/2} |\hat{C}(t)|^{1/2}$ is a normalization factor and $\Delta x(t) \stackrel{\mathrm{def}}{=} x - \hat{x}(t)$ (not to be confused with $\delta x_k = x - x_k$ introduced in Section 3.1). Adding the negative log-prior pdf to the contour evaluation function (Eq. (11)), the objective function to be minimized (negative log-posterior pdf) becomes:

$$G(x) = H(x) - \log f_P(x) = \sum_{i=1}^{n} w_i\, h[\mathbf{I}_i \mid \mu_i(x)] + \frac{1}{2}\, \Delta x^T \cdot \hat{C}^{-1} \cdot \Delta x + \log Z_P \tag{33}$$
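Given the pieces above, the objective of Eq. (33) can be assembled as follows, up to the constant $\log Z_P$ which does not affect the minimizer. This reuses the hypothetical `contour_evaluation` helper sketched earlier; `mu_of_x` stands for the geometric/projective model mapping the state to the contour intersections.

```python
import numpy as np

def objective(x, x_hat, C_hat_inv, mu_of_x, lines, weights, lam, sigma):
    """Negative log-posterior G(x) of Eq. (33), dropping the constant log Z_P."""
    H = contour_evaluation(lines, mu_of_x(x), weights, lam, sigma)  # Eq. (11)
    dx = x - x_hat                                                  # Delta x
    return H + 0.5 * dx @ C_hat_inv @ dx
```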

4.2. Bayesian contour algorithms

By using either the EMCo or EICo algorithm, the evaluation function is locally approximated as

$$\tilde{H}(x) = H(x_k) + \delta x_k^T \cdot J_k \cdot \dot{h}_k + \frac{1}{2}\, \delta x_k^T \cdot J_k \cdot A_k \cdot J_k^T \cdot \delta x_k \tag{34}$$

where $A_k$ is the weighting matrix evaluated at $x_k$. The procedure for extending ML estimation to MAP estimation with delta algorithms is quite simple: the negative log-prior pdf is added to the approximate evaluation function, to obtain the approximate objective function to be minimized:

$$\tilde{G}(x) = H(x_k) + \delta x_k^T \cdot J_k \cdot \dot{h}_k + \frac{1}{2}\, \delta x_k^T \cdot J_k \cdot A_k \cdot J_k^T \cdot \delta x_k + \log Z_P + \frac{1}{2}\, \Delta x^T \cdot \hat{C}^{-1} \cdot \Delta x \tag{35}$$

The matrix $J_k \cdot A_k \cdot J_k^T$ is positive (semi-)definite by construction. Given that $\hat{C}^{-1}$ is an inverse covariance matrix,



it is always positive definite if the dynamical system has been modelled realistically. Therefore, $\tilde{G}$ is quadratic and can be minimized in a single Newton step.

Minimizing $\tilde{G}$ is equivalent to minimizing the expression

$$\underline{G}(x) = \frac{1}{2} \left\| \begin{pmatrix} A_k^{1/2} \cdot J_k^T \\ \hat{C}^{-1/2} \end{pmatrix} \cdot \delta x_k + \begin{pmatrix} A_k^{-1/2} \cdot \dot{h}_k \\ \hat{C}^{-1/2} \cdot (x_k - \hat{x}) \end{pmatrix} \right\|^2 \tag{36}$$

where we have used the equality $\Delta x = x - \hat{x} = \delta x_k + x_k - \hat{x}$, and $\hat{C}^{-1/2}$ is the Cholesky decomposition of the inverse covariance of the state. The difference between $\tilde{G}$ and $\underline{G}$ is an additive constant which has no effect on the location of the minimum; the advantage of minimizing $\underline{G}$ is that the solution can be obtained by singular value decomposition, as outlined in [32], page 676.

We are now in a position to formulate the tracking algorithm as applied to one image frame (using the double subscript $ki$ for iteration $k$, sample point $i$):

• Prediction step: use the dynamical model to obtain prior estimates of the state and its covariance at the current frame, from the posterior estimates at the previous frame (as in the well-known iterated extended Kalman filter).
• Innovation step: iterate to convergence:
  E: use the geometric model to obtain the current estimates of $\mu_{ki}$, $i = 1, \ldots, n$, from the current estimates of the state variables $x_k$; then compute the centers of mass $\hat{\nu}_{ki}$ from $\mu_{ki}$ and the image evidence $\mathbf{I}_{ki}$ using Eq. (21);
  M: compute the Newton step $\delta x_k$ by minimizing $\tilde{G}$ in Eq. (35) (or equivalently $\underline{G}$ in Eq. (36)) using the current estimates of the deformations $\hat{\nu}_{ki} - \mu_{ki}$ and the Jacobian $J_k$.

The innovation step has been decomposed into E and M sub-steps for clarity. However, the algorithm description is valid not only for the Bayesian EMCo algorithm, but also for the Bayesian EICo algorithm, or any other form of Bayesian delta algorithm: the only difference is in the choice of the matrix $A$ in Eq. (36), which is used in the M sub-step. In addition, by substituting 'MAP features' for 'centers of mass', the description becomes applicable to the MAP-feature algorithm. The tracker is illustrated by means of a block diagram in Fig. 2.

4.3. Covariance of the state variables

Once the algorithm has converged to a minimum of the objective function, the Hessian of the objective function is guaranteed to be positive semi-definite. Therefore, it can be safely added to the inverse covariance of the Kalman prediction to obtain the inverse covariance $C$ of the posterior estimate:

$$C^{-1} = J \cdot \ddot{H} \cdot J^T + \hat{C}^{-1} \tag{37}$$
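A minimal sketch of the M sub-step follows: it builds the stacked system of Eq. (36) and solves it by linear least squares (np.linalg.lstsq, which uses the SVD internally), then forms the posterior inverse covariance of Eq. (37). Names are ours; `A_diag` must be strictly positive, as it is for the EMCo weights.

```python
import numpy as np

def m_step(J, h_dot, A_diag, x_k, x_hat, C_inv_sqrt, H_ddot_diag):
    """Newton step dx_k minimizing Eq. (36), plus the posterior inverse
    covariance of Eq. (37). C_inv_sqrt is a square root (e.g. Cholesky
    factor) of the inverse prior covariance."""
    a = np.sqrt(A_diag)
    M = np.vstack([(J * a).T, C_inv_sqrt])          # [A^(1/2) J^T; C^(-1/2)]
    r = np.concatenate([h_dot / a, C_inv_sqrt @ (x_k - x_hat)])
    dx, *_ = np.linalg.lstsq(M, -r, rcond=None)     # minimizes ||M dx + r||^2
    C_post_inv = (J * H_ddot_diag) @ J.T + C_inv_sqrt.T @ C_inv_sqrt  # Eq. (37)
    return dx, C_post_inv
```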

Note that, in general, the likelihood is not Gaussian, and therefore the posterior pdf is not Gaussian either.

Fig. 2. Schematic diagram illustrating the operation of the EMCo tracker: the E step takes the image and the current estimate of the state as inputs, and computes the centers of mass; while the M step takes the centers of mass and the prior estimate of the state as inputs and computes a new state estimate. At convergence (broken arrows), the final posterior state estimate is propagated through the Kalman prediction module, to obtain the prior state estimate for the next frame.

Even if the posterior pdf were Gaussian, the non-linear dynamics of the system would make the prior pdf non-Gaussian at the next image frame. In spite of these problems, approximating the prior pdf as a Gaussian function works well in practice.

4.4. Multi-scale implementation

In our implementation, optimization is performed at multiple scales. More specifically, the innovation step is repeatedly applied to every frame of the image sequence with decreasing values of $\sigma$. At each scale $\sigma_s$, $0 \leq s \leq \ell$, define the initial state estimate $x_0^{(s)}$ before the first iteration and the final state $x_\Omega^{(s)}$ at convergence. At the largest scale $\sigma_0$, the initial estimate $x_0^{(0)}$ is given by the Kalman prediction; at all subsequent scales $\sigma_s$, $0 < s \leq \ell$, the initial estimate is given by the final estimate for the previous scale: $x_0^{(s)} = x_\Omega^{(s-1)}$.

4.5. Increasing robustness

When the algorithm is applied to 3D models, a non-increasing objective function is guaranteed if the linearized perspective is a valid approximation within the distance between $x_k$ and $x_{k+1}$: at the following iteration, full perspective is used to re-estimate the contour locations, the objective function, and the linearized perspective. Given that linearization is not always a good approximation, it is necessary to test that the objective function does not increase over iterations.

The objective function depends not only on the state and on the image, but also on $\sigma$. While optimizing at scale $\sigma_s$, one could test for a non-increasing objective function either at the current scale or at the final scale $\sigma_\ell$. We opted for the second option, and the test for a non-increasing objective function is not performed after every iteration, but only after convergence at any given scale.

More specifically, denote by $G_s(x)$ the objective function measured at scale $\sigma_s$. At any scale $\sigma_s$, $0 \leq s \leq \ell$, the objective function $G_\ell(x)$ is measured before and after optimization, resulting in the two measures $G_\ell(x_0^{(s)})$ and $G_\ell(x_\Omega^{(s)})$. If $G_\ell(x_\Omega^{(s)}) > G_\ell(x_0^{(s)})$, then the state is reset to $x_0^{(s)}$ before optimization is applied again at scale $\sigma_{s+1}$.
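The coarse-to-fine schedule with the reset test can be written compactly as below; `refine` and `G_final` are placeholders of our own for the per-scale innovation iteration and for the objective measured at the finest scale $\sigma_\ell$. A sketch, not the authors' code:

```python
def multiscale_innovation(x_prior, sigmas, refine, G_final):
    """Apply the innovation step at decreasing scales (Section 4.4), resetting
    the state whenever the finest-scale objective increased (Section 4.5)."""
    x0 = x_prior                          # x_0^(0): the Kalman prediction
    for s in sigmas:                      # sigmas sorted from coarse to fine
        x_conv = refine(x0, s)            # iterate E/M sub-steps at scale s
        # keep x_conv only if it did not increase the finest-scale objective
        x0 = x0 if G_final(x_conv) > G_final(x0) else x_conv
    return x0                             # final posterior state estimate
```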

A further improvement in performance is achieved by searching, when appropriate, for the lowest objective function on a grid centered at $x_0^{(s)}$. Specifically, if $G_\ell(x_\Omega^{(s)}) > G_\ell(x_0^{(s)})$, then $G_\ell$ is evaluated on a grid of states defined by

$$x_a \stackrel{\mathrm{def}}{=} x_0^{(s)} + a \odot (x_0^{(s)} - x_\Omega^{(s)}) \tag{38}$$

where $\odot$ denotes element-wise multiplication between vectors, and $a$ is an $m$-vector with elements equal to one of $\{-1, 0, 1\}$ for the pose variables, and equal to zero for the other state variables (velocity and steering angle). Given that the dynamical model includes 3 pose variables, the grid includes $3^3 = 27$ states. The state in the grid with the lowest objective function is taken as the starting point $x_0^{(s+1)}$ for optimization at the next scale, or (if $s = \ell$) it is taken as the final posterior estimate of the object state.

5. Tracking results

The contour trackers were tested on the PETS 2000 test sequence [9] and the PETS 2001 test sequence 1 (camera 1) [8].

5.1. Model parameters

The input noise is Gaussian with zero mean and a diagonal covariance matrix. The diagonal elements of this covariance matrix were adjusted manually to optimize the tracking of the second vehicle in the PETS 2000 sequence, between frames 380 and 630, with the EMCo tracker. The parameters of the dynamical model are listed in Table 1, together with the values used in the performance comparisons. The physical dimensions of the variances are obtained by considering the equation

$$\hat{C}(t + \Delta t) = C(t) + J_D \cdot C(t) \cdot J_D^T + S \cdot \Delta t \tag{39}$$

where $\Delta t$ is the time interval between video frames (or, in the case of an iterated Kalman filter, between iteration steps) and $J_D$ is the Jacobian of the system dynamics $D$ (Eq. (31)). From this equation it can be seen that the physical units of the noise variance terms are equal to the squared units of the corresponding state variables, divided by the time unit.

Table 1. Noise parameters of the dynamical model

Parameter        Meaning                            Value
$S_v$            Variance of velocity input         $10^2$ m$^2$/s$^3$
$S_\psi$         Variance of steering-angle input   $10^2$ degree$^2$/s
$S_\phi$         Variance of orientation input      $8^2$ degree$^2$/s
$S_X$, $S_Y$     Variance of position input         $0.25^2$ m$^2$/s

The geometric model is the shape of the 'average' object being tracked. In the context of this paper, the objects of interest are motor vehicles; the geometric model of the average vehicle (in the South of England) was obtained in previous research [10]. It is straightforward to project the 3D boundaries between the facets of the geometric model onto the image plane, to obtain the expected image contours. The software used for the projective geometry and hidden-line removal has been employed in previous research on tracking (e.g. [36,37,30,31]). The hidden-line removal is based on the algorithm of Cyrus and Beck with the Liang–Barsky modification ([11], Section 3.12.4).

The parameters that have yet to be specified are the range of scales ($\sigma$ values) at which the pose refinement takes place, and the termination criterion at each scale. The range of values of $\sigma$ in pixels was determined as follows: if $F$ is the focal length of the camera and $d$ is the distance from camera to object, then the initial value of $(d/F)\sigma$ was equal to twice the standard deviation of the prior estimate of vehicle position (clipped to the range 0.25–0.5 m). Starting with this value, pose refinement was carried out at progressively finer scales ($\sigma$ being decreased by a factor of 1.4 at each scale), down to a value of $(d/F)\sigma = 0.1$ m on the ground plane. At each scale, the convergence criterion is that the RMS displacement of the vehicle center on the ground plane, from one iteration to the next, becomes less than $0.01\,(d/F)\sigma$. A maximum number of 10 iterations was allowed at each scale.

The tracking was carried out at 5 frames/s (i.e. every fifth frame of the test sequences was used). For computational savings, the spacing of sample points was increased in relation to the square root of the length of the corresponding line segment.

5.2. Initialization

The contour trackers are initialized when a moving vehicle is detected and its pose and velocity are estimated. This is accomplished by the Cluster Tracker described elsewhere [27]. Briefly, a reference image is subtracted from the current image. A new object is detected when the image difference has a significant amount of energy at a location distinct from the locations of objects already detected. The locations and sizes of objects are estimated by EM clustering of the image pixels, with one cluster corresponding to the background and one additional cluster for each moving object.

When an object is no longer in contact with the image boundaries, the centroid and covariance of the corresponding cluster give initial estimates of the location and size of the object: the center of the object is assumed to be on the ray defined by the centroid, at an elevation of 0.2 m with respect to the ground plane; and the size of the object is assumed to be such that, if the object were compact, flat and elliptical in shape, and located 0.2 m above the ground plane, it would generate a cluster of pixels with the observed covariance.

If the estimated size is compatible with a car, initial estimates for the orientation and velocity of the object are computed from the position estimates obtained by the Cluster Tracker in two subsequent frames. The initial steering angle is assumed to be zero.

5.3. Tracking results on the PETS 2000 test sequence

This sequence includes three vehicles: a saloon, a hatchback, and a van. Figs. 3–5 show representative frames obtained with the EMCo, MAP-feature, and EICo trackers.



Fig. 3. Tracking of the first vehicle in the PETS 2000 sequence. Left column, EMCo tracker; center column, MAP-feature tracker; right column, EICo tracker.

It can be seen that the geometric model does not fit the hatchback and especially the van. Nonetheless, all the vehicles could be tracked through the image sequence (at least with the EMCo tracker). From visual inspection, it seems that this robust tracking is due to the fit of the models to the bottom of the vehicles, more than to marginalization over shape deformations: the geometric models are fitted to the lower edges of the vehicles even when this leads to a mismatch between the upper part of the contour and the upper edges of the vehicles.

The first vehicle (saloon) was tracked until it disappeared from view, with all trackers. However, only the EMCo tracker could recover the correct pose when the vehicle was far away, as shown in Fig. 3, frame 295: the other trackers managed to follow the vehicle by hanging on to its trailing edge.

Fig. 4. Tracking of the second vehicle in the PETS 2000 sequence. Left column, EMCo tracker; center column, MAP-feature tracker; right column, EICo tracker.



Fig. 5. Tracking of the third vehicle in the PETS 2000 sequence. Left column, EMCo tracker; center column, MAP-feature tracker; right column, EICo tracker.

The second vehicle (hatchback) was tracked well until it started turning left; then a lag appeared with all three trackers between the true and estimated orientations of the vehicle, as shown in Fig. 4, frame 575. This lag was only slightly smaller with the EMCo tracker. Both the EMCo tracker and the MAP-feature tracker recovered, however, and the pose was correctly estimated when the hatchback stopped in its parking slot, as shown in Fig. 4, frame 670. On the other hand, the EICo tracker never recovered the correct hatchback pose, as shown in the same figure.

The third vehicle (van) has a much stronger contrast with the background, due to its white color. Its upper image edge overlaps for a few frames with the lower-left edge of the hatchback. Nonetheless, both the EMCo tracker and the MAP-feature tracker maintained a correct estimate of the poses of both vehicles, as shown in Fig. 5, frame 815. The pose estimate of the hatchback is also unperturbed when a pedestrian (who happens to be the guest editor of this special issue) passes in front of the hatchback (Fig. 5, frame 915a). The van is successfully tracked all the way by every tracker, due to its strong contrast (Fig. 5, frame 915b).

In conclusion, both the EMCo tracker and the MAP-feature tracker perform well on the PETS 2000 sequence, although the MAP-feature tracker does not fit the saloon pose when the saloon is far from the camera (Fig. 3, frame 295). The EICo tracker is less robust, as shown by the tracking failure with the hatchback.

5.4. Tracking results on the PETS 2001 test sequence 1 (camera 1)

Frames 500–1000 were used from this sequence. These frames include a hatchback and a van. All parameters were kept the same as in the experiments on the PETS 2000 sequence.

The hatchback was tracked well by all trackers until it reached its parking slot, as shown in Fig. 6. The pose of the van was successfully recovered by all three methods in the first frame in which it was detected, as shown in Fig. 7, frame 710. This is in spite of an unfavorable viewing angle, a long distance from the camera, an inappropriate geometric model, and a less-than-perfect camera calibration (note how the wire-frame looks small compared to the real vehicles surrounding it). However, only the EMCo tracker could successfully track the van from this starting point: the MAP-feature tracker and the EICo tracker were 'distracted' by the row of parked vehicles, resulting in tracking failures, as shown in Fig. 7, frames 720 and 785.

In order to further compare the trackers, the MAP-feature tracker and the EICo tracker were re-initialized on the van



Fig. 6. Tracking of the first vehicle in the PETS 2001 sequence. Left column, EMCo tracker; center column, MAP-feature tracker; right column, EICo tracker.

Fig. 7. Tracking of the second vehicle in the PETS 2001 sequence, up to frame 785. Left column, EMCo tracker; center column, MAP-feature tracker; right column, EICo tracker.



Fig. 8. Tracking in the PETS 2001 sequence: frames 835–990. Left column, EMCo tracker; center column, MAP-feature tracker; right column, EICo tracker.

starting from frame 745, and from that point they could track the van successfully.

In this sequence, the van occludes the hatchback almost completely when passing in front of it. All three trackers performed very poorly under near-total occlusion, as can be seen in Fig. 8, frames 835 and 865: the strong image edge between the white van and the background drags the hatchback model away from the hatchback, in spite of the fact that occlusion between vehicles is modelled in the system (but collisions between vehicles are not modelled). The estimated pose of the hatchback then becomes unstable, as can be seen in Fig. 8, frame 990(a). On the other hand, the van is tracked successfully by all trackers, as can be seen in Fig. 8, frame 990(b).

In conclusion, the PETS 2001 sequence shows once more that the EMCo tracker is more robust than the MAP-feature tracker or the EICo tracker when the vehicle is very far from the camera. This sequence also shows that all three trackers cope poorly with almost-total occlusion (which is hardly surprising).

5.5. Computational cost

In all the contour trackers considered in this paper, the computational cost depends mostly on the extraction of image grey levels from the observation lines. Therefore, differences in computational cost between the EMCo, EICo, and MAP-feature trackers are best estimated by the number of iterations required for convergence, rather than by the empirical CPU time, which depends on the implementation. Average numbers of iterations per frame per object in the PETS 2000 test sequence are given in Table 2. As can be seen, the computational cost is not significantly different between trackers.



Table 2. Numbers of pose-refinement iterations per frame per vehicle

Tracker        Average
EMCo           120.0
MAP-feature    112.3
EICo           117.6

6. Conclusions

The performance comparisons presented above show that three different algorithms derived from the MLR contour model can be used for tracking. The EMCo algorithm has the best performance (by a narrow margin over the MAP-feature algorithm), but there are a couple of qualifications to be made: (1) the dynamical parameters were tuned using the EMCo tracker, which might give the EMCo tracker an advantage; (2) five vehicles are not sufficient to draw firm conclusions.

The parameter tuning is unlikely to seriously affect the results, because it was carried out by tracking the hatchback, which is always less than 50 m from the camera, while the most serious tracking failures happened when vehicles were very far from the camera. However, a different parameter tuning would have been necessary if the purpose of this study had been to provide ratings for the trackers. With respect to the second point, a better performance comparison could be carried out using image sequences including more vehicles, such as the dt_passat03 sequence (http://i21www.ira.uka.de/image sequences/), and evaluating the tracking performance automatically as in [6]. Again, this would be appropriate only if the intent had been to rate the performance of the trackers.

In fact, it is the theoretical analysis that provides the main motivation for the use of the EMCo algorithm. It has been shown that the weighting matrix $A$ defined by the EMCo algorithm (Eq. (27)) is positive definite, while the weighting matrix defined by the EICo algorithm (Eq. (26)) is only positive semi-definite: this means that the EMCo algorithm is expected to be more robust. The analysis and numerical experiments in [25] also confirm this expectation. As for the MAP-feature algorithm, it is surprising how well it performs in practice, given that it is a heuristic method with no proven convergence properties. Considering that the computational cost, as well as the amount of programming work, are almost the same for all algorithms, we conclude that the EMCo algorithm has a theoretical advantage and no apparent theoretical or practical disadvantages, compared to the alternative methods.

It must be emphasized that all three algorithms are based on the same MLR model: the robustness of the trackers is mainly due to the rigorous probabilistic formulation, the use of image statistics, and the marginalization over a Gaussian 'window'; the particular choice of optimization algorithm is less important.

That the trackers cope badly with almost total occlusion is not surprising: if no edges of the object being tracked are visible, then the contour will be pulled towards edges in the background,

or edges of the occluding object. Collision detection would be helpful in tracking with 3D models. Ultimately, however, the problem of occlusion requires higher-level reasoning.

The active contour methods considered in this paper are boundary-based in the sense that they only exploit image information at the estimated object boundaries. Boundary-based contour methods tend to be sensitive to image texture: in highly-textured image regions, large grey-level differences are detected on all observation lines, destabilizing the trackers. The EMCo algorithm should be less sensitive than feature-based algorithms, because the center of mass is a more robust estimator than the 'strongest feature'. However, the test sequences did not allow testing of this prediction.

In principle, all the trackers compared in this paper should be robust against even drastic changes of illumination: the initialization method can break down (since it is based on temporal image differences), but the contour trackers can still work as long as the $\lambda$ parameter of the generalized Laplacian pdf (Eq. (2)) is updated. Of course, it is necessary to avoid saturation of the camera, otherwise the pdf of gld's is no longer a generalized Laplacian.

In conclusion, the MLR model and the EMCo algorithm have been shown to be theoretically sound, robust in practice, and computationally inexpensive. This is demonstrated not only by the experiments reported in this paper, but also by further work on both vehicle tracking [6] and eye tracking [16]. The MLR model and EMCo algorithm have potential applications to many other tracking and segmentation problems. An implementation of the EMCo tracker is included in the motris software package developed at the University of Karlsruhe, freely available from http://i21www.ira.uka.de/motris/. Future work will include combining the EMCo algorithm with particle filtering and extending the MLR model to color images.

Acknowledgements

James Ferryman provided a valuable stimulus to research in visual tracking by providing the PETS test sequences and organizing the PETS workshops as well as this special issue. Discussions with Bo Markussen were very helpful in improving the quality of this paper.

Appendix A. The EM Contour algorithm

The EM method is a general iterative method for ML or MAP estimation, useful when the underlying model includes some confounding hidden variables in addition to the observed variables (grey levels) and the state variables that need to be estimated. The theory of the EM method ([24], chapter 3) proves that (a) the objective function does not increase from one iteration to the following and (b) the algorithm converges to an extremum of the objective function.

In the case of the MLR model, the confounding variables are the deformations. We shall begin by considering a single sample point. If the location $\nu$ of the object boundary on this


observation line were known, one could estimate (and maximize in state space) the complete-data log-likelihood, i.e. the logarithm of the joint likelihood of observation and deformation, given the state (see Eq. (8)):

$$\log f(\mathbf{I}, \varepsilon \mid \mu(x)) = \log\left[ f_R(\mathbf{I} \mid \nu)\, f_D(\nu - \mu(x)) \right] \tag{A1}$$

Given that the boundary location is unknown, the EM theory prescribes maximizing the expected value of the complete-data log-likelihood. This expected value, which will be called the EM functional for brevity, depends on the pdf of the boundary location. Define the conditional pdf of the boundary location (given the current state estimate $x_k$ and the observation):

$$f(\nu \mid \mu(x_k), \mathbf{I}) = \frac{f_R(\mathbf{I} \mid \nu)\, f_D(\nu - \mu(x_k))}{\int f_R(\mathbf{I} \mid \nu')\, f_D(\nu' - \mu(x_k))\, d\nu'} \tag{A2}$$

The EM functional $\mathcal{Q}$ is obtained by integrating the complete-data log-likelihood under the conditional pdf of the boundary location:

$$\mathcal{Q}(x \mid x_k) \stackrel{\mathrm{def}}{=} \int f(\nu \mid \mu(x_k), \mathbf{I}) \log\left[ f_R(\mathbf{I} \mid \nu)\, f_D(\nu - \mu(x)) \right] d\nu \tag{A3}$$

The distinction between $x$ and $x_k$ is crucial: the former is the argument of the complete-data log-likelihood, while the latter is the current state estimate (and argument of the conditional pdf of the boundary location).

As in the case of Eq. (9), the integrals in Eqs. (A2) and (A3) can be approximated by a summation over a discrete set of deformations. This means that the pdf $f(\nu \mid \mu(x_k), \mathbf{I})$ is replaced by the probability distribution $p_{kj}$ defined by Eq. (20) (the double subscript $kj$ denotes iteration $k$ and coordinate $j\,\Delta\nu$ on the observation line). By this discretization, the EM Contour algorithm becomes similar to the EM clustering algorithm (e.g. [24], Section 2.7), in which a vector of indicator variables is defined for each data point to be clustered. The discretized version of the EM functional is given by:

$$\mathcal{Q}(x \mid x_k) = \sum_j p_{kj} \log\left[ f_R(\mathbf{I} \mid j\,\Delta\nu)\, f_D(j\,\Delta\nu - \mu(x))\, \Delta\nu \right] = \log\frac{2\lambda}{q} + \log\frac{\Delta\nu}{\sqrt{2\pi}\,\sigma} + \sum_j p_{kj} \left( \frac{|\Delta I(j\,\Delta\nu)|}{\lambda} - \frac{(j\,\Delta\nu - \mu(x))^2}{2\sigma^2} \right) \tag{A4}$$

Define

$$c_k \stackrel{\mathrm{def}}{=} \log\frac{2\lambda}{q} + \log\frac{\Delta\nu}{\sqrt{2\pi}\,\sigma} + \sum_j p_{kj} \left( \frac{|\Delta I(j\,\Delta\nu)|}{\lambda} - \frac{(j\,\Delta\nu - \hat{\nu})^2}{2\sigma^2} \right) \tag{A5}$$

Note that $c_k$ is a constant independent of $x$ (depending only on $x_k$). Remembering the definition of the center of mass (Eq. (21)), the EM functional reduces to

$$\mathcal{Q}(x \mid x_k) = c_k - \frac{\left[\hat{\nu} - \mu(x)\right]^2}{2\sigma^2} \tag{A6}$$

The EM functional for the entire contour is, of course, the sum of the functionals of all sample points:

$$Q(x \mid x_k) = \sum_i w_i\, \mathcal{Q}_i(x \mid x_k) = C_k - \sum_i w_i \frac{\left[\hat{\nu}_i - \mu_i(x)\right]^2}{2\sigma^2} \tag{A7}$$

where $C_k \stackrel{\mathrm{def}}{=} \sum_i w_i c_{ki}$. Note that the EM functional must be maximized at each iteration. It can be seen that the EM functional is a weighted sum of squared distances between contour and centers of mass. Therefore, the iteration of the EM Contour algorithm can be described very succinctly as follows:

E: For all sample points on the contours, estimate the centers of probability mass; M: Minimize the weighted sum of squared distances between sample points and corresponding centers of mass.
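For a single sample point and a one-dimensional 'state' (the contour intersection $\mu$ itself), the whole iteration fits in a few lines. The toy data at the bottom place a strong gld near $\nu = 2$ and the iteration converges there from $\mu = 0$; the sketch is purely illustrative, with parameter values chosen arbitrarily.

```python
import numpy as np

def em_contour_1d(nu, dI, mu0, lam, sigma, n_iter=20):
    """EM Contour iteration for one sample point. E step: responsibilities
    p_kj (Eq. (20)) and center of mass (Eq. (21)). M step: with mu itself as
    the only free parameter, the quadratic EM functional (Eq. (A6)) is
    maximized exactly by moving mu to the center of mass."""
    mu = mu0
    for _ in range(n_iter):
        log_p = np.abs(dI) / lam - (nu - mu)**2 / (2.0 * sigma**2)
        p = np.exp(log_p - log_p.max())
        p /= p.sum()
        mu = np.sum(p * nu)               # M step: mu <- nu_hat
    return mu

# Toy observation line: a large grey-level difference near nu = 2
nu = np.arange(-10.0, 11.0)
dI = np.where(np.abs(nu - 2.0) < 0.5, 40.0, 2.0)
print(em_contour_1d(nu, dI, mu0=0.0, lam=10.0, sigma=3.0))  # -> close to 2
```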

References

[1] Y. Bar-Shalom, T.E. Fortmann, Tracking and Data Association, Academic Press, London, 1988.
[2] A. Baumberg, D.C. Hogg, Learning flexible models from image sequences, in: Proceedings of the European Conference on Computer Vision: ECCV'94, LNCS 800, Springer, Berlin, 1994, pp. 299–308.
[3] A. Blake, M. Isard, Active Contours, Springer, Berlin, 1998.
[4] J. Coughlan, A. Yuille, C. English, D. Snow, Efficient deformable template detection and localization without user initialization, Computer Vision and Image Understanding 78 (3) (2000) 303–319.
[5] H. Dahlkamp, A. Ottlik, H.-H. Nagel, Comparison of edge-driven algorithms for model-based motion estimation, in: Proceedings of the Workshop on Spatial Coherence for Visual Motion Analysis: SCVMA'04, 2004.
[6] H. Dahlkamp, A.E.C. Pece, A. Ottlik, H.-H. Nagel, Differential analysis of two model-based vehicle tracking approaches, in: Pattern Recognition, Proceedings of the 26th DAGM Symposium, LNCS 3175, Springer, 2004, pp. 71–78.
[7] J.E. Dennis Jr., R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
[8] J.M. Ferryman, J.L. Crowley (Eds.), Proceedings of the Second IEEE International Workshop on Performance Evaluation in Tracking and Surveillance: PETS, 2001.
[9] J.M. Ferryman, A.D. Worrall (Eds.), Proceedings of the First IEEE International Workshop on Performance Evaluation in Tracking and Surveillance: PETS, 2000.
[10] J.M. Ferryman, A.D. Worrall, G.D. Sullivan, D. Baker, Visual surveillance using deformable models of vehicles, Robotics and Autonomous Systems 19 (1997) 315–335.
[11] J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, Computer Graphics, second ed. in C, Addison-Wesley, Reading, MA, USA, 1996.
[12] D. Geman, B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Transactions on PAMI 18 (1) (1996) 1–14.
[13] R. Hanek, M. Beetz, The contracting curve density algorithm: fitting parametric curve models to images using local self-adapting separation criteria, International Journal of Computer Vision 59 (3) (2004) 233–258.
[14] D.W. Hansen, A.E.C. Pece, Iris tracking with feature free contours, in: Proceedings of the Workshop on Analysis and Modelling of Faces and Gestures: AMFG 2003, October 2003.



[15] D.W. Hansen, A.E.C. Pece, Eye typing off the shelf, in: Proceedings of the International Conference on Computer Vision and Pattern Recognition: CVPR, vol. 2, 2004, pp. 159–164.
[16] D.W. Hansen, A.E.C. Pece, Eye tracking in the wild, Computer Vision and Image Understanding 98 (1) (2005) 155–181.
[17] C. Harris, Tracking with rigid models, in: A. Blake, A. Yuille (Eds.), Active Vision, MIT Press, Cambridge, MA, 1992, pp. 59–73.
[18] J. Huang, D. Mumford, Statistics of natural images and models, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: CVPR'99, vol. 1, 1999, pp. 541–547.
[19] B. Jørgensen, The delta algorithm and GLIM, International Statistical Review 52 (3) (1984) 283–300.
[20] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, in: Proceedings of the International Conference on Computer Vision: ICCV'87, 1987, pp. 259–268.
[21] H. Kollnig, H.-H. Nagel, 3D pose estimation by directly matching polyhedral models to gray value gradients, International Journal of Computer Vision 23 (3) (1997) 283–302.
[22] H. Leuck, H.-H. Nagel, Automatic differentiation facilitates OF-integration into steering-angle-based road vehicle tracking, in: Proceedings of the International Conference on Computer Vision and Pattern Recognition: CVPR'99, vol. 2, 1999, pp. 360–365.
[23] J. MacCormick, A. Blake, A probabilistic contour discriminant for object localisation, in: Proceedings of the International Conference on Computer Vision: ICCV'98, 1998, pp. 390–395.
[24] G.J. McLachlan, T. Krishnan, The EM Algorithm and Extensions, Wiley, New York, 1997.
[25] I. Meilijson, A fast improvement to the EM algorithm on its own terms, Journal of the Royal Statistical Society, Series B 51 (1989) 127–138.
[26] B. Øksendal, Stochastic Differential Equations, Springer, Berlin, 1995.
[27] A.E.C. Pece, Generative-model-based tracking by cluster analysis of image differences, Robotics and Autonomous Systems 39 (3–4) (2002) 181–194.
[28] A.E.C. Pece, The Kalman-EM contour tracker, in: Proceedings of the Third Workshop on Statistical and Computational Theories of Vision: SCTV, 2003.
[29] A.E.C. Pece, Contour tracking based on marginalized likelihood ratios, Image and Vision Computing, in press.
[30] A.E.C. Pece, A.D. Worrall, A statistically-based Newton method for pose refinement, Image and Vision Computing 16 (8) (1998) 541–544.
[31] A.E.C. Pece, A.D. Worrall, Tracking with the EM contour algorithm, in: Proceedings of the 7th European Conference on Computer Vision: ECCV, LNCS 2350, Springer, Berlin, 2002, pp. 3–17.
[32] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C, second ed., Cambridge University Press, Cambridge, 1992.
[33] R.A. Redner, H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Review 26 (1984) 195–239.
[34] H. Sidenbladh, M.J. Black, Learning the statistics of people in images and video, International Journal of Computer Vision 54 (1/2/3) (2003) 183–209.
[35] R.S. Stephens, Real-time 3D object tracking, in: Proceedings of the Alvey Vision Conference, 1989, pp. 85–90.
[36] G.D. Sullivan, Visual interpretation of known objects in constrained scenes, Philosophical Transactions of the Royal Society of London, Series B 337 (1992) 361–370.
[37] A.D. Worrall, J.M. Ferryman, G.D. Sullivan, D. Baker, Pose and structure recovery using active models, in: Proceedings of the British Machine Vision Conference: BMVC'95, 1995, pp. 137–146.
[38] A.L. Yuille, J.M. Coughlan, Fundamental limits of Bayesian inference: order parameters and phase transitions for road tracking, IEEE Transactions on PAMI 22 (2) (2000) 160–173.

[27] A.E.C. Pece, Generative-model-based tracking by cluster analysis of image differences, Robotics and Autonomous Systems 39 (3–4) (2002) 181–194. [28] A.E.C. Pece. The Kalman-EM contour tracker, in Proceedings of the Third Workshop on Statistical and Computational Theories of Vision: SCTV, 2003. [29] A.E.C. Pece. Contour tracking based on marginalized likelihood ratios. Image and Vision Computing, in press. [30] A.E.C. Pece, D. Worrall, A statistically-based Newton method for pose refinement, Image and Vision Computing 16 (8) (1998) 541–544. [31] A.E.C. Pece, D. Worrall, Tracking with the EM contour algorithm, Proceedings of the 7th European Conference on Computer and Vision: ECCV, LNCS 2350, Springer, Berlin, 2002. pp. 3–17. [32] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C (2nd edition), Cambridge University Press, Cambridge, 1992. [33] R.A. Redner, F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Review 26 (1984) 195–239. [34] H. Sidenbladh, J. Black, Learning the statistics of people in images and video, International Journal of Computer Vision 54 (1/2/3) (2003) 183– 209. [35] R.S. Stephens, Real-time 3D object tracking. In Proceedings of the Alvey Vision Conference, pp. 85–90, 1989. [36] G.D. Sullivan, Visual interpretation of known objects in constrained scenes, Philosophical Transactions of the Royal Society of London, Series B 337 (1992) 361–370. [37] A.D. Worrall, J.M. Ferryman, G.D. Sullivan, D. Baker. Pose and structure recovery using active models, in Proceedings of the British Machine Vision Conference: BMVC’95, 1995, pp. 137–146. [38] A.L. Yuille, M. Coughlan, Fundamental limits of Bayesian inference: order parameters and phase transitions for road tracking, IEEE Transactions on PAMI 22 (2) (2000) 160–173.