Incremental estimation of image flow using a Kalman filter

Incremental estimation of image flow using a Kalman filter

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION Vol. 3, No. 1, March, pp. 39-57, 1992 Incremental Estimation of Image Flow Using a Kal...

13MB Sizes 8 Downloads 153 Views

JOURNAL

OF VISUAL

COMMUNICATION

AND

IMAGE

REPRESENTATION

Vol. 3, No. 1, March, pp. 39-57, 1992

Incremental Estimation of Image Flow Using a Kalman Filter AJIT

SINGH

Advanced Image Processing Project, Siemens Corporate Research, Princeton, New Jersey 08540 Received October 18, 1990; accepted April 29, 1991

is not discussed in this paper because of space limitations Many applications of visual motion, such as navigation and is that it serves to unify a very wide class of existing tracking, require that image flow be estimated in an on-line, incre- techniques or image-flow computation. The issue of unifimental fashion. Kahnan filtering provides a robust and efficient cation is discussed in [26]. Before the framework is demechanism for recording image-flow estimates along with their scribed, a brief review of the state of the art is in order. uncertainty and for integrating new measurements with the existFor reasons of computational efficiency as well as ing estimates. In this paper, the fundamental form of motion physiological plausibility, image-flow computation must information in time-varying imagery-conservation hrformamake use of local spatial and temporal neighborhoods. It tion-is recovered along with its uncertainty from a pair of imis well understood [4, 211 that by using local measureages using a correlation-based approach. As more images are acments alone, the true velocity can be recovered only in quired, this information is integrated temporally and spatially those image regions that have sufficient local intensity using a Kahnan filter. The uncertainty in the estimates decreases variation, such as intensity corners and textured regions. with the progress of time. This framework is shown to behave very well at the discontinuities of the flow field. Algorithms based on This constitutes the well-known aperture problem. Vethis framework are used to recover image flow from a variety of locity must be propagated from regions of full informaimage sequences. 0 1991 Academic Press, Inc. tion, such as corners, to regions of partial or no information. This implies that any approach to local computation of image flow must incorporate two functional steps. In the first step, local information about velocity is recov1. INTRODUCTION ered using the image-intensity distribution in small spaImage flow is a commonly used representation for vi- tiotemporal neighborhoods. In the second step, the local sual motion. It assigns to each point on the visual field a information in small spatial neighborhoods is integrated two-dimensional velocity vector that depicts the projecto recover the flow field that satisfies certain smoothness tion of the instantaneous three-dimensional velocity of criteria. The past research is summarized below in light the corresponding point in the scene. Typically, all the of these two steps. A detailed review can be seen in [2, information that is available about a dynamic scene is an 251. Most of the current frameworks for image-flow comimage sequence. The image-flow field must be computed putation use one of the following three basic approaches from the image sequence. Furthermore, many real-time for the first step mentioned above: (i) correlation-based applications of image flow, such as navigation and trackapproach [4, 241, (ii) gradient-based approach [9, 11, 16, ing, require that image flow be recovered, along with an 21, 301, and (iii) spatiotemporal energy-based approach associated measure of uncertainty, in an on-line, incre[ 1, 141. The output of the first step is in the form of initial mental fashion. estimates that are updated iteratively in the second step. This paper describes a Kalman filtering-based frameFor the second step, the current frameworks use either a work for image-flow computation. The principal advansmoothness constraint [4, 15, 16, 211 or the analytical tages offered by this framework are as follows. (i) Covaristructure of image flow [19, 291. ante matrices (or alternatively, confidence measures) are In a generalization proposed in [26], I had termed the associated with the estimate of image flow at each stage information recovered in the first step as conservation of computation. (ii) Kalman filtering allows fusion of new information because it is derived from the imagery by measurements of image flow with the existing estimates using the assumption of conservation of some image on the basis of covariance information; this leads to pro- property over time. Typically, this property is intensity gressive reduction of uncertainty. (iii) It is possible to [ll, 16, 211, some spatiotemporal derivative of intensity estimate certain types of discontinuous flow fields with[93, or intensity distribution in a small spatial neighborout any a priori knowledge about the location of discontihood [4, 241, etc. Similarly, the information used in the nuities. The flow fields thus recovered are not blurred at second step was referred to as neighborhood information motion boundaries. A contribution of this framework that because it is derived by using the knowledge of velocity 39 1047-3203/92 $3.00 Copyright Q 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

40

AJIT SINGH

_

@t-l -

Delay -

@t-l -

U t-1

+

Delay e

-+

Ut-1

ut-

FIG. 1. A schematic diagram of the Kalman titer.

variation over small spatial neighborhoodsin the visual field. I had shown that recoveringimage flow amounted to performingan optimal combination of the two types of information. In essence,the first step resulted in estimates of image flow along with their uncertainty. The second step iteratively updated the estimates (and reduced the uncertainty) by fusing information from small spatial neighborhoods. In the framework discussedin this paper, the process of updating velocity estimatesis extendedto use spatial as well as temporal neighborho0ds.rKalman filtering is usedas a mechanismfor performing such updatesincrementally over time. The rest of the paperis devotedto a descriptionof the Kalman filtering-basedframework and its comparisonwith the various existing frameworks reviewed above. The organizationof the paper is as follows. An overview of the framework is given in Section 2. Computationalaspectsof various componentsof the framework are discussedin Section 3. Implementation details are given in Section 4 and the results of applying this framework to a variety of image sequencesare described in Section 5. Finally, concluding remarks are given in Section 7.

In Fig. 1, the portion to the left of the dotted line depicts the measurementmodel, and it can be summarized as follows. The quantity to be estimatedis the state vector U, whoseevolution over time is specifiedthroughthe statetransition matrix at-i and addition of Gaussianprocessnoise with a covarianceQt.-i, as shown in Eq. (1). Further, the statevector U, is relatedlinearly to the measurementvector D, through the measurementmatrix H, and addition of GaussianSensornoise with a covariance Rt, as shown in Eq. (2). The processnoise and the sensor noise are assumedto be uncorrelated.It is important to note that the measurementmatrix H, is neededbecause, in general,the measurementD, is not in the samecoordinate spaceas the U,, the quantity of interest. Furthermore, the measurementD,, in general,is never free of error-hence the needto estimate U,. The portion to the right of the dotted line in Fig. 1

Measurement model u, = WJ,-,

+ WI,

D, = HJJ, + tr, 2.

AN OVERVIEW

OF THE FRAMEWORK

Kalman filtering providesa convenientplatform for incremental estimation (of a quantity) in a setting where multiple measurements(that relate to the quantity) are acquiredover time. A schematicdiagram of the Kalman filter is shown in Fig. 1 and its associatedequationsare summarizedin Fig. 2. It is apparentfrom the figuresthat the filter is basedon a linear measurement model and it operates in two phases-prediction phase and update phase-to obtain an estimate that is unbiasedand has a minimum mean-squarederror [ 121.A brief descriptionof the underlying model and the two phasesis given below.

Q3

l, - NW, R3

~hA:l = 0

(1)

(2) (3)

Prediction phase

0; =cp,-,o:-, P;

= (PI-,P:-,a:-,

(4) + Q,m,

(5)

Update phase 0:

= 0‘;

+ KJD,

P:

= [I - K,HJP;

K, = P;H;[H,P;H:

I Spatial and temporal coherence of image flow has been used in several recent works on image flow [5,7,9].

7, - W',

- H,ti;]

(6) (7)

+ R,1-’

FIG. 2. Kalman filter equations.

(8)

IMAGE-FLOW

41

ESTIMATION

Temporal Update

,.

,ti “t .,

-

~~~~~

i______ .. .... .... ..

Evolution

-

i,___,..__..___ .... j -+

Ut-1

Ut

FIG. 3. A scheme for image-flow estimation embedded in the Kalman filter. The blocks in dotted lines are implicit and do not have to be implemented.

correspondsto the estimator (which estimatesthe state vector U, and its covariance). As mentioned above, it operatesin two phases-prediction and update. In the prediction phase,the previous state estimate oI:-i and its covariance PLi are extrapolated to the predicted state vector 0; and its covarianceP; . The predicted covariante is used to compute the Kalman gain K,. In the update phase, the measurementresidual D, - H,U; is weighted by the Kalman gain K, and added to the predicted state 0; to yield the updatedstate 0:. In essence, the Kalman gain servesas a weighting function that uses the relative uncertaintiesof the previousestimateand the current measurementto produce a new estimate. Figure 3 shows the proposed image-flow estimation schemeembeddedin the Kalman filter discussedabove. It is composedof the following componentsand is described in detail in the next section. 1. Conservation. I will show in the next section that conservationinformation is in the form of an image-flow vector U, = (uc, u,) for each pixel, accompaniedby a covariancematrix Sc.2A comparisonbetweenFig. 1 and Fig. 3 reveals that recovering conservationinformation correspondsto measurement.Therefore,D, = U, and R, = S,. Further, since the objective is to estimate image flow, the state vector is the image-flowvector itself. This implies that the matrix H, is a 2 x 2 identity matrix (that is, in this specific problem of image-flow estimation, the measurementandthe estimateare in the samecoordinate space). Conservation information can be recovered in this form usingany of the three basicapproaches-correlation-based,gradient-based,and spatiotemporalenergybased[26]. For the sake of simplicity, I use the correlation-basedapproachin this paper. 2. Temporal update. This amounts to computing the Kalman gain K, and updatingthe predictedstateestimate 0; and its covarianceP; . I refer to the updatedstateas * The subscript “c”

characterizes conservation.

f?:i and the updatedcovariance as P:‘, the superscript ‘7” standingfor intermediate. This is becausethey must undergo spatial update before the “true” state 0: and covarianceP: can be obtained. 3. Spatial update. The estimates of image flow obtainedfrom conservationinformation can be noisy, especially in the areas of uniform intensity. Spatial update servesto “fill in” these areas. 4. Prediction. This involves implementing@-I based on the knowledge about how image flow at a point evolves with time. The output of this stage is the state estimate Z?; and its covarianceP; . 3.

3.1.

COMPUTATIONAL

Conservation

DETAILS

Information

An explicit assumption on which moit image-flow computation techniques are based is that some image property is conservedover time. In other words, in each image of a sequence,the projection of a given moving point in the scenewill have the same value of the conservedproperty. Factors that affect the robustnessof the choice of conservedproperty are illumination, type of motion (rotational/translational), noise and digitization effects, etc. 14,251.I usethe Laplacian of intensity (computed by the difference-of-Gaussians operationusing the masks suggestedby Burt [S])as the conservedproperty.3 Basedon the assumptionof conservationestimating im3 Anandan [4] gives adequate justification for using the Laplacian. First, as compared to intensity itself, it is relatively insensitive to the changes in illumination. It filters out the dc component. Second, a Laplacian pyramid serves to decompose the original imagery into its spatial frequency components, and can be used in hierarchical computation of image flow. This is useful if image velocity can range from very small to very large. In many experiments discussed in [26], I use the Laplacian pyramid to estimate image flow. Finally, compared to several other invariants, such as spatiotemporal D’Alembertian [9], the Laplacian is computationaly simpler.

42

AJIT SINGH

age flow using the correlation-based approach [4] amounts to an explicit search for the best match for a given pixel of an image (Laplacian) in a searcharea in subsequentimages of the sequence.The extent of the searchareacan be decidedon the basisof a priori knowledgeabout the maximum possibledisplacementbetween two imagesor by usinga hierarchical strategy[4]. Correlation gives a response, i.e., a matching strength,at each pixel in the searcharea. Thus, the searcharea can be visualized as covered with a “response distribution.” Anandan [4] had shown that using the sum of squared differences (SSD) offers several computational advantagesover correlation. Using SSD, which is a measureof mismatch, one obtains an “error distribution” over the searcharea. The procedurefor obtaining error distribution and converting it into responsedistribution is discussedbelow. For eachpixel 9(x, y) at location (x, y) in the previous (Laplacian)image .%i(acquiredat time t - l), a correlation window w,, of size (2n + 1) x (2n + 1) is formed aroundthe pixel. A searchwindow Wr,of size (2N + 1) x (2N + 1)is establishedaroundthe pixel at location (x, y) in the current (Laplacian)image 92 (acquiredat time r). The (2N + 1) x (2N + 1) sampleof error distribution is computedusing sum of squareddifferencesas

i=-n

-

ously between zero and unity over the entire range of error. I suggestthat responsedistribution be interpreted as follows. Each point in the searcharea is a candidatefor the “true match.” However, a point with a small responseis lesslikely to be the true match than a point with a high response.Assuming that the time elapsedbetween two successiveimagesis unity, eachpoint in the search area representsa point in u - u space.Thus, response distribution could be interpretedas a frequencydistribution in velocity space-the responseat a point depicting the frequency of occurrence, or the likelihood, of the correspondingvalue of velocity. This interpretation allows one to use a variety of estimation-theoretictechniquesto computevelocity andassociatea notion of confidence with it. Specifically, the quantity that we are trying to measureis the true velocity U, = (u,, u,). With the interpretationgiven above,we know the frequencyof occurrence?&(u, u) of various valuesof velocity (u, u) = (u,, u,) + (e,, e,) over the searcharea. The quantity (e,, e,) is the error associatedwith the point (u, u), i.e., its deviation from the true velocity. One can obtain an estimate of the true velocity using a weighted-least-squares approach[6]. This estimate,denotedby U, = (uc, uc),is given by

j=-n

9*(x

+

IA

+

-N-(u,us+N.

i,

y + u + jp, (9)

W”~.,(U, u)u uc = &z,sRR,(u, u) -

(11)

By interpreting (e,, e,) as the deviation of the point The (2N + 1)3<(2N + 1)sampleof responsedistribution (u, u) from the true velocity, asdiscussedabove,we have is computedas imposed an additive error model [6]. Further, if one makesthe following assumptions,it is possibleto associGRc(u,u) = e-Kisc(u,v),-N I u, u 5 +N. w-v ate a notion of a covariancematrix with the estimate. 1. The errors have a zero mean. A sufficient (but not The choice of an exponentialfunction for converting ernecessary) condition for the errors to have zero mean is ror distribution into responsedistribution is basedprithat the response distribution be symmetrical aroundthe marily on computational reasons.4First, it is well betrue velocity. It is possible to satisfy this condition in havedwhen error approacheszero. Second,the response regions of smoothly varying intensity if oneusesa hierarobtained with an exponential function varies continuchical implementation[4]. In this implementation,an approximate location of the point in u - u spacethat corre4 Converting error distribution into a response distribution through an spondsto the true velocity is known from lower levels of exponential function is crucial in developing a Bayesian interpretation resolution. A symmetric search window is established of this framework. The sum of squared differences essentially gives an energy distribution in velocity space. The exponential of energy (with a aroundthis point to refinethe velocity estimate.With this construction, the searchwindow is approximately symminus sign), appropriately normalized, gives a Gibbs probability distrimetric aboutthe true velocity. Also, sincethe region is of bution [lo, 281. Further, the assumption of spatial and temporal coherence of image flow could be formulated as an a priori probability distrismoothly varying intensity, the responsedistribution is bution. These two distributions could be used in a Bayesian approach to unimodal, with its modeapproximately at the true velocrecover an estimate of image flow that has maximum a posteriori probaity. This resultsin an approximately zero-meanerror. In bility. It can be shown that this estimate would be identical to the a single-resolutionimplementation,the searchwindow is estimate obtained later in this paper (Eq. (20)), under the assumptions symmetric aboutthe origin of u - u space,not about the outlined in Section 3.3.

IMAGE-FLOW

64

(b)

43

ESTIMATION

(cl

FIG. 4. Response distribution over the search window for some representative examples; the darker the pixel, higher the response. The labels “high” and “low” refer to the confidence measures associated with the eigenvectors.

true velocity. This causesthe mean error to deviatefrom zero. Similarly, if the region under considerationis textured, it may lead to multiple modes within the search window, only one of which is at the true velocity. This also causesthe mean error to deviate from zero. 2. The errors are independent. This assumptionis almost always violated becausethe responsedistribution at various pixels in the search window is obtained using overlappingcorrelationwindows. Second,in hierarchical implementation, an error incurred at a pixel at a coarse resolution is transmitted to several neighboringpixels in finer levels. This causes the errors to be dependent. However, as the experimentsin the next section suggest, the violation of this assumptiondoes not seemto have a major effect on image-flow estimates. Under these assumptions,the covariancematrix associated with this estimate is given by

where the summation is carried out over -N 5 u, u 5 +N. It is known [6] that reciprocalsof the eigenvaluesof the covariance matrix serve as confidencemeasuresassociatedwith the estimate, along the directions given by the corresponding eigenvectors. Figure 4 shows the eigenvectorsand the correspondingconfidencemeasures for some typical responsedistributions.5Further, these 5 In interpreting response distribution, we have assumed that it is unimodal. This assumption does get violated in the presence of texture, espeically if the size of the search window is greater than the scale of

eigenvectorscorrespondto the principal axesof response distribution. Principal axeshavebeenusedpreviously by Scott [24] to representvelocity. Summarizing, there are three essential steps underlying the computation of conservationinformation. They are: (i) selectingthe conservedquantity and deriving it from intensity imagery, (ii) computing error distribution and responsedistribution over the searchareain the velocity space]and (iii) interpreting responsedistribution, i.e., computing an estimate of velocity UCalong with a covariance matrix S,. As discussedearlier, UCand S, serve as the parametersD, and R, for the measurement model of the Kalman filter. Before leaving this subsection,the following clarification would be in order. In interpretingthe responsedistribution, I haveassumedthat it is unimodal. This assumption does get violated in the presence of texture, especiallyif the size of the searchwindow is greaterthan the scale of intensity variations. The weighted-leastsquaresapproachusedabove“averagesout” the various peaks,giving an incorrect estimateof velocity. However, sincethe “spread” of the distribution is largein this case (as comparedto the situation where the responsedistribution has a single well-defined peak), the confidence associatedwith the estimate will be low. In essence,althoughthe procedurefor interpreting the responsedistribution gives an incorrect estimateif the distribution is not unimodal, it does associatea low confidence with the (incorrect) estimate. intensity variations. The weighted-least-squares approach used above “averages out” the various peaks, giving an incorrect estimate of velocity. However, since the “spread” of the distribution is large in this case (as compared to the situation where the response distribution has a single well-defined peak), the confidence associated with the estimate will be low. In essence, although the procedure for interpreting the response distribution gives an incorrect estimate if the distribution is not unimodal, it does associate a low confidence with the (incorrect) estimate. One could perhaps do better by identifying the “correct” mode and using it to determine the estimate of velocity and its covariante matrix. We are currently investigating this approach.

44 3.2.

AJIT SINGH

Temporal

Update

The objective of this stage is to integrate the new “measurements” (output of the conservation stage) with predicted estimates of image flow. This is done by computing the Kalman gain K, and using it to update the state vector U, (image flow) and its covariance P,. From Eq. (8), the Kalman gain is given by K, = P;HT[H,P;HT

Substituting

+ R,]-‘.

(13)

H, = Z and R, = S,, one can obtain K, = P;[P;

+ S,]-‘.

(14)

mation, a weighted-least-squares estimate of velocity, 0, can be computed. Further, assuming additive, zeromean, and independent errors, a covariance matrix, S, , can be associated with this estimate. The estimate and the covariance matrix thus obtained serve as the “opinion” of the neighborhood regarding the velocity of the central pixel. Quantitatively, if the neighborhood size is (2~ + 1) x (2w + l), the velocities of these (2w + l)* pixels map to the points (Ui, Vi) in u - u space (where 1 5 i I (2w + l)*) and if the weight assigned to the point (ui , Vi) is CRn(Ui, Vi), the weighted-least-squares-based estimate a = (ii, V) of velocity of the central pixel is given by

%~“%I(4 u)u ii = ~,Z”9tR,(U,u)

Substituting this in Eq. (6) along with D, = UC and H, = I, the updated image-flow estimate is given by ir:’

= 0;

+ PJP;

+ S,]-‘[UC

- OF],

0u) v=W”%zG4 Iz,~“ckR,(U,

(17)

(15)

and because Eqs. (7) and (8) can be combined to give P: = [(P;)-I + (HTR,H,)-*]-I, the covariance associated with this estimate is given by

and the covariance matrix associated with this estimate is given by ISi Yin(Uiy

P:’ = [(PT)-’ 3.3.

Spatial

+ (s,)-yl.

(16)

Update

The objective of spatial update is propagate velocity from regions of low uncertainty (or conversely, high confidence), such as corners and textured regions, to those of high uncertainty, such as edges and regions of uniform intensity. Spatial update can be viewed as a part of Kalman filtering that incorporates prior knowledge about the nature of spatial variation (or smoothness) of the flow field. Assume for a moment that the velocity of each pixel in a small neighborhood around the pixel under consideration is known. One could plot these velocities as points in u - u space giving a neighborhood velocity distribution. Some typical distributions are shown in Fig. 5. What can one say about the velocity of the central pixel (which is unknown)? Barring the case where the central pixel lies in the vicinity of a motion boundary, it is reasonable to assume that it is “similar” to velocities of the neighboring pixels. In statistical terms, the velocity of each point in the neighborhood can be thought of as a measurement of the velocity of the central pixel. It is reasonable to assume that all of these measurements are not equally reliable-they must be weighted differently if used to compute an estimate of velocity of the central pixel. I weight the velocities of various pixels in the neighborhood according to their distance from the central pixel-the larger the distance, the smaller the weight. Specifically, I use a Gaussian mask. Based on this infor-

xi

'I2 =

IfSi %n(Uiy

i

xi

UJ(Ui - ii)*

an(Ui,

vi)

UJ(Ui - E)(Ui an(ui

- 6)

9 Ui)

IZj S?dR,(Ui,Ui)(Ui - U)(Ui - 6) 2;

CRn(Ui,

Ej %,(*iq..Uj)(Uj

\

Vi)

- U)*

ii Gi,(Ui, ii)

I

7 (18)

/

where the summation is carried out over 1 I i I (2w + 1)2. What do we do with this estimate and covariance matrix? It can be used as “prior” knowledge (given that sufficient time has elapsed since the estimation process began) that can be fused with the newly acquired knowledge (the estimate 0:’ and the covariance matrix PTi). It can be shown [26, 61 that the updated estimate 0: that weights the two sources of information appropriately to minimize the squared error is given by (P:‘)-‘(OI:

- @i)

+ S,‘(@

- 0) = 0.

(19)

In this equation, Dand S, are derived on the assumption that velocity of each pixel in the neighborhood is known in advance from an independent source. This assumption is invalid in practice. Hence, u and S, are unknown and the velocity 0: cannot be derived directly from Eq. (19). However, Eq. (19) is available at all the pixels in any given neighborhood in the image. If condi-

IMAGE-FLOW

45

ESTIMATION

Av/ X

X

u

(a)

(b)

FIG.5.

Velocity distribution

(cl

for some representative

tions discussed below are satisfied, we essentially have a system of coupled linear equations that can be solved by an iterative technique such as the Gauss-Siedel relaxation algorithm [22]. The iterative solution can be written as [22] (d;)k+l (o;)o

= [(PT’)-’ = G:i

+ S,‘]-’

[(PT’)-10:’

u

+ sp] cw

and the covariance matrix P: associated with the final estimate of velocity is given by [(P:‘)-’ + S;‘]-I, where S;’ is computed from the final iteration. The eigenvalues of this matrix depict the confidence measures corresponding to the final estimate. This matrix serves several purposes. Qualitatively, it helps to identify those regions in the image that have the most reliable image-flow estimates from the viewpoint of applicability to high-level interpretation. Quantitatively, it serves as an essential input to procedures for incremental computation of scene geometry [18]. The conditions that must be satisfied for the iterative solution to converge are discussed below. First, for Eq. (19) to represent a system of coupled linear equations, S, must be a constant and must be known in advance. As shown in the experiments discussed later, this is true after a sufficient number of frames have been processed. Second, for the iterative procedure to converge irrespective of the value of initial estimate (@)O, both (P:‘)-’ and S;’ must be positive definite. As discussed in [27], this criterion is generally satisfied in real imagery except in pathological cases such as absolutely uniform (intensity) regions. Finally, for the iterative scheme to converge to the optimal solution, the two estimates @’ and USCmust be independent. This condition is clearly violated in our implementation, because ak is computed from the velocities (from the previous iteration) in a small spatial neighborhood. Further research needs to be done in order to determine the overall effect that violating the independence assumption has on the final estimates of velocity. In all the experiments that we have conducted,

neighborhoods.

the flow fields have been recovered to a very good accuracy (see Section 5). I show below that each iteration of the iterative procedure amounts to performing an optimal combination of conservation information with neighborhood information derived from the previous iteration. This can be used to give a geometrical interpretation of the iterative procedure and to explain the performance of this framework at motion boundaries. Geometrical interpretation. metrical interpretation of the use the following result from refer to it as the “information proof of the theorem is given

In order to give iterative procedure, estimation theory fusion theorem.” A in [26].

a geoI will [12]. I formal

INFORMATIONFUSIONTHEOREM. There are n sensors, each giving a measurement Xi of an unknown quantity X with an additive, zero-mean, and independent error Vi, where Xi, X, and Vi are p X 1 vectors. Each error Vi is characterized by its p x p covariance matrix Si. The optimal estimate X for the unknown quantity X and its associated covariance matrix 3 are given as follows. The optimal& is meant to imply that the error in the fused estimate is unbiased and is minimal in a mean-squared sense. 8 = [s;’ + s;’ + - * * + S,‘]-qspx, + . . . + S,‘X,] 5 = [s;’

+ s;’

+ * * * + S,‘]-1.

+ s;‘xz

(21)

It is clear that Eq. (20) is identical in its form to Eq. (21) for n = 2. Thus, Eq. (20) can be interpreted as follows. At each iteration, conservation information and neighborhood information act as two sensors giving oTi and ok as their respective measurements of the velocity U, with error characteristics given by P:’ and S,, respectively. The updated velocity ( oj:i)k+ l is the optimal combination of the two measurements as suggested by the theorem. It is important to note that the covariance associated with

46

AIIT SINGH

Neighbxhocd Infomatio?

COlWXUtiOtt Information

lnfomation



FIG. 6. A geometrical interpretation

of the iterative procedure.

the combined estimate (tiI:i)k+* is given by [(P:‘))i + (S,)-II-i. Given that both sensors are measuring the same mean, the resultant covariance is smaller than the individual covariances. This can be geometrically interpreted as shown in Fig. 6. Conservation information and neighborhood information at any iteration are represented by estimates accompanied by regions of uncertainty. At each iteration, the updated velocity is represented by an estimate that lies somewhere between the conservation and neighborhood estimates. Furthermore, the region of uncertainty associated with the new estimate is smaller than either of the two regions of uncertainty associated with conservation or neighborhood information. Performance at motion boundaries. In quantifying neighborhood information, I have assumed so far that the neighborhood velocity distribution forms a single cluster in u - u space. Obviously, this assumption does not hold true at motion boundaries, surfaces slanting away from the viewpoint, rotating surfaces, etc. In the following discussion, I will analyze the performance of the framework at motion boundaries-the other scenarios are similar. Specifically, I will show that (i) the procedure discussed above for using neighborhood information is still justified and (ii) in the absence of texture,6 it does a better job of preserving the step discontinuities in the flow field than smoothing-based procedures [4, 161. For this discussion, recall that each of the two estimates 0:’ and uk maps to a point in u - u space. Similarly, each of the two covariante matrices P:’ and S, maps to an ellipse that has its center at the respective estimate and that has its major 6 In this context, texture is meant to imply an intensity variation whose scale is smaller than the size of the search window.

and minor axes equal to the eigenvalues of the covariance matrix. Therefore, each iteration amounts to finding a point in u - u space that has the minimum weighted sum of squared perpendicular distances from the axes of the two ellipses-the eigenvalues serving as weights. The behavior of this procedure in the vicinity of a motion boundary is depicted in Fig. 7a. For the conservation ellipse E,, only the major axis is shown because the minor axis will be very small in this region. In other words, all that conservation information tells (with high confidence) about the velocity of the central pixel is that it lies somewhere along the major axis of the ellipse E, . Velocities of neighboring points are also plotted from the previous iteration. Assuming that there is no texture in the vicinity of the boundary and the intensity is smoothly varying (i.e., conservation information is reliable) and that the boundary corresponds to a step discontinuity in the flow field, the velocities of neighboring points form two clusters in u - u space. As a result, the minor axis of the neighborhood ellipse E, will be very small. In other words, all that neighborhood information tells (with high confidence) about the velocity of the central pixel is that it lies somewhere along the major axis of the ellipse E, . Since the correct velocity will lie in one of the two clusters, this opinion of neighborhood is correct. In other words, the iterative update procedure developed for the nonboundary pixels is justified even for pixels that lie on a motion boundary.7 In order to show that this method performs better at ’ The problem of using a neighborhood information at a motion boundary is that of interpreting a multimodal distribution. This is similar to the problem of multimodality of response distribution encountered earlier in the case of conservation information. As discussed earlier, the ideal solution is to identify the “correct” mode and to use it for the interpretation task. For the problem of using neighborhood information at a motion boundary, this amounts to throwing away the “outhers” (the pixels not belonging to the same object as the central pixel of the neighborhood). Recently, there has been some effort in applying outlier rejection to prevent smoothing of image flow at discontinuities 171.The applicability of this technique to our approach is currently under investigation.

A

(a) FIG. 7.

(b) Performance at motion boundaries.

IMAGE-FLOW

discontinuities as compared to smoothing, the result of smoothing[4, 161is shownin Fig. 7b. A smoothingprocedure such as that of Anandan [4] or Horn and Schunck [16] will place the updatedvelocity on the line D in u u spacethat is perpendicularto- the - major axis of E, and that passesthroughthe point (u, V) correspondingto the average velocity in the local neighborhood. This is clearly inappropriatebecauseit is known with high confidencethat the updatedvelocity lies on the major axis of E,. Also, this leads to blurring of the flow field at the discontinuity. The propagation procedure discussed in this section, on the other hand,placesthe updatedvelocity (approximately) at the intersection of the two major axes. This is justified becausethere is a high confidence associatedwith eachof the two major axes. Further, the updatedvelocity will be closer to the cluster with which the conservationinformation at the pixel is most consistent. As the experiments in Section 5 (and [26]) show, this limits the blurring of flow fields at the motion boundaries substantially. Someblurring is inevitable, however, becauseof two reasons.First, since the correlation window is not infinitesimally small, the initial estimate of velocity (derived from conservation information) at a pixel is affectedby the motion of its neighboringpixels. Second, during the iterative updateprocedure,the nonzero confidence associated with the two minor axes causesthe estimate of velocity to be slightly offset from its correct value. Further comparison of our approach with some existing ones (with respectto performanceat discontinuities) is given in Section 6. 3.4.

Prediction

The prediction stagemust predict imageflow as well as its covariance for each pixel in the current image. In essence,prediction amounts to defining @, the way in which image flow, i.e., the state vector, evolves over time. Our definition of Q,is basedon the assumptionof temporal coherence. That is, the velocity at a point “does not changedrastically” over small periodsof time. The prediction schemediscussedbelow is basedon that used by Matthies et al. for disparity extrapolation [18]. Scenemotion shifts the point X,-1 in the previous imageto the point X, in the currentimage. It is assumedthat U; at point X, is the same as the velocity @-, at point X,-i. Therefore,Q,is simply the function that warps X,-, to X, . In general,this warping function will predict image flow at locations that do not coincide with pixel locations in the new image. This is becausethe image-flowvectors in the previous image do not necessarilyhave both compoentsas exact integers.Therefore, the predictedimageflow field needsto be resampledin order to obtain the estimates at pixel locations. For this purpose, a unit squareis formed around eachpixel in the current image. The correct predictedestimateat this pixel is taken to be

47

ESTIMATION

a bilinear interpolation of the velocities of warped pixels that lie within this square. It must be pointed out that the prediction schemeoutlined above is basedon the assumptionof temporal coherence,which is essentiallya heuristic. A morerigorous schemecould be designedif more information about the underlying scenestructure and motion is available. For instance,if the sceneis known to be planarand its velocity is known to be constant, one could use projective geometry and kinematics to show that the variation of image flow in both spaceand time is quadratic in nature [29]. This relationship could be used as the basis of an “exact” model for @. Computationally, such a scheme would be much lessexpensive,comparedto the warpingbasedschemediscussedabove. Uncertainty of image flow will increaseduring prediction due to inaccuraciesin modelingtemporal coherence, camera optics, etc. This is accountedfor by a simple multiplicative amplification of covariance: P; = (1 + &>2P:_I.

(22)

The amplified covariance is warped and interpolated in exactly the samefashion as the imageflow itself. 4. IMPLEMENTATION For conservationinformation, one must establish the parametersN, n, and k in order to compute response distribution. The choice of N dependson the maximum possibledisplacementof a pixel betweentwo frames. If the displacement is small (of the order of one to two pixels per frame), N = 2 (i.e., a 5 x 5 searchwindow) is appropriate.If the displacementis large, one can still use N = 2 along with a hierarchical searchstrategy [4]. The value of n is decidedon the basisof how many neighbors should contribute their opinion in estimation of velocity of the point under consideration.Too small a neighborhood leadsto noisy estimates.Too large a neighborhood tendsto smoothout the estimates.I haveusedn = 1 (i.e., a 3 x 3 window). The parameterk is essentiallya normalization factor. In the implementationusedhere, k is chosen in such a way that the maximum responsein the searchwindow is a fixed number (close to unity). For temporal and spatial update, inversion of various matrices posesproblems when one or more of the eigenvalues are zero or very small. For this reason, singular value decompositionis used for matrix inversion. Also, for spatial update, one must establish the value of the parameterw andsomecriteria to stop the iterative update process.The role of w is similar to that of n; henceit is chosen to be unity. In the experiments reported in this paper, ony one iteration of spatial update is performed per frame. More iterationsof spatialupdatewould reduce

48

AJIT SINGH

the number of frames required to reach an acceptable level of accuracy. This will, however, require that more time be allowed between successive frames. Finally, for the prediction stage, the parameter E is chosen to be 0.05. 5. EXPERIMENTS The experiments described in this section can be divided into two categories-quantitative and qualitative. A detailed description of the objectives, methodology, and results of each category of experiments is given below. Quantitative experiments. The general objective of this category of experiments is to judge the quantitative correctness of the flow fields. In order to accomplish this, the “ground-truth” flow field must be known. Typically it is possible to know (or compute) the ground-truth flow field only if(i) the motion is synthetically generated, e.g., by warping a given image in some known fashion, or (ii) the camera motion and the depth of each point in the

scene are exactly known. The second scenario is considered in the experiment that follows. For the sake of brevity, only one experiment is described. The imagery for this experiment is selected in such a way that the flow field does not have any discontinuities, simply because it is very difficult to come up with the ground-truth flow field in the presence of discontinuities. Specifically, the scene is composed of a textured poster rigidly mounted on a precision translation table. A 512 x 512 camera is mounted on the table as well, but its (translational) motion can be accurately controlled. The poster is placed facing the camera and slanted in such a way that (i) the optical axis is not perpendicular to the plane of the poster and (ii) the distance between the camera an the poster is very small (about 12 inches). Both these arrangements help make the resulting flow field interesting even when the camera is undergoing a pure translation. Twelve frames are shot at regular intervals as the camera translates horizontally by 0.6 inch, in a plane perpendicular to its optical axis. Between the first two frames, the image displacement is roughly 4 pixels where the poster is closest to the camera and roughly 2 pixels

(a>

(b)

cc>

(4

FIG. 8. The poster experiment: (a) 1st frame, (b) 4th frame, (c) 8th frame, and (d) 12th frame.

IMAGE-FLOW

ESTIMATION

6)

(b)

(cl

Cd)

49

FIG. 9. The poster experiment: (a) and (b) confidence measures computed as the eigenvalues of the covariance matrix P: at time t = 2; (c) and (d) the corresponding confidence measures at time t = 12.

where the poster is the farthest from the camera. The corresponding displacements between the last two framesare 6 and3 pixels, respectively.The exact amount of cameratranslation as well as the distanceof the lens from the rigid mount is recorded. The camera is then calibrated and its focal length is determined.The correct flow field for each pair of frames is determined using perspective projection equations [29]. The images are low-pass filtered and subsampledto get a resolution of 128 x 128using Burt’s technique [8]. Both components of imagevelocity at eachpoint are divided by 4 to get the correct flow field correspondingto the reduced image size.* Four frames of the sequences(lst, 4th, 8th, and 12th) are shown in Figs. 8a, 8b, 8c, and 8d, respectively. Figures 9a and 9b show the two confidencemeasures(in* Actually, the reduced-size imagery will correspond to image flow that is not exactly equal to the original image flow reduced in magnitude by a factor of 4. This is because of the intensity changes that accompany low-pass altering and subsampling. Due to lack of a quantitative characterization of these changes, I do not account for them.

verse of the eigenvaluesof P:) after the first two frames have beenprocessed(i.e., at t = 2). As expected,(i) the regionswhere the first confidencemeasureis high correspond to the edges in the intensity image and (ii) the regionswhere both confidencemeasuresare high correspondto comers in the intensity image. Figures9c and 9d showthe correspondingconfidencemeasuresat t = 12.It is apparentthat in general(i) at any given pixel, the confidencegrows over time and (ii) as time progresses,confidence “propagates outward” from the regions of high initial confidence.These two effects can be attributed to temporal and spatialupdate,respectively.Figures 10and 11show the image-flowestimate I!?: at t = 2 and t = 12, respectively. The graph in Fig. 12 tracks the percentage root-mean-squarederror in the image-flowestimates(the error being the magnitude of the vector difference between the ground-truthvelocity and the estimatedvelocity). Computed over the entire image, the root-meansquarederror after 12frames is 2.7%. If, however, only the 25% most confident pixels are considered,the rootmean-squarederror after 12frames is only 1.2%.

AJIT SINGH

Squared Error 30.

0

0

Computed over entire image x

“\

Computed over the pixels lying top 25% confidence levels.

x

I

Frames

2

4

6

8

10

12

FIG. 10. The poster experiment: the estimate of image-flow field 0: atr = 2.

FIG. 12. Root-mean-squared tion of time.

Qualitative experiments. The objective of this category is to judge the qualitative correctness of flow fields recovered by the algorithm in realistic environments. The first experiment uses 12 frames of an image sequence of a road scene shot at the Siemens facility in Munich. Figure 13 shows the lst, 4th, 8th, and 12th frames (clockwise from top left). The only moving objects in the scene are

the two cars (because of the wind, there is small movement around the trees too). The ground-truth flow field is not known. The original frames are 256 x 256 in resolution and are subsampled to 128 x 128 for computing image flow. The top left and right panels of Fig. 14 show the two confidence measures (inverses of the eigenvalues of P:) at t = 2, and the bottom left and right panels show the corresponding confidence measures at t = 12. Once again, it is apparent that in general (i) at any given pixel, the confidence grows over time and (ii) as time progresses, confidence propagates outward from the regions of high initial confidence. Figures 15 and 16 show the image-flow estimate 0: at t = 2 and t = 12, respectively. Clearly, the regions corresponding to the cars show noticeable image flow. Some of the sky region also shows movement, which is incorrect. We attribute this to “video shatter” (since the sequence was acquired using a camcorder). It is noteworthy that this region has very low confidence even after sufficient frames have been processed. This is because there is usually very little spatial and temporal coherence in the shatter. Figure 17 shows the image-flow estimates at t = 12 at a higher resolution in the regions corresponding to the two cars. The top-left and bottom-left images show the flow field in a square (roughly 20 x 20 pixels in size) surrounding each car. The top-right and bottom-right images show the same flow fields superimposed on the corresponding regions of the original intensity image. It is noteworthy that the image flow on the car on the right (bottom images) does not bleed into the thin pole that occludes it.

FIG. 11. The poster experiment: the estimate of image-flow field 0: at t = 12.

error in velocity estimates as a func-

IMAGE-FLOW

ESTIMATION

51

FIG. 13. The road-scene experiment (clockwise from top left): 1st. 4th, 8th, and 12th frame of the image sequence.

The second experiment uses a relatively simpler scene. Its objective is to demonstrate that the spatial update procedure incorporated in this framework performs much better at the discontinuities in the flow field, as compared to the conventional smoothing-based techniques [4, 161. This experiment uses a toy train on a flat (and mostly dark) table. Twelve images are shot as the train rolls forward. The motion is largely translational, except in the vicinity of the wheels where it has a small rotational component. Furthermore, the motion boundaries are expected to show up primarily as step discontinuties in the flow field. The images are 256 x 242 in resolution and the maximum image motion is about 3 pixels per frame. For image-flow computations, the images are low-pass filtered and subsampled to get a resolution of 128 x 121. At this level of resolution, the maximum image flow is expected to be between 1 and 1.5 pixels per frame. Figure 18 shows the flow field 0: at t = 12, superim-

posed on the wire frame of the train (obtained from the 11th frame). For the sake of comparison, Fig. 19 shows the corresponding flow field obtained with conventional smoothing [4, 161 (with the smoothing factor (Yset to 0.5), also superimposed on the same wireframe. In order to obtain this flow field, the spatial update block in Fig. 3 is replaced by the smoothing scheme used by Anandan [4]. A comparison of Figs. 18 and 19 clearly shows that our spatial update procedure does an excellent job of preserving motion boundaries. It is apparent that there is very little “bleeding” of velocity from the train into the background in Fig. 18. On the other hand, there is considerable blurring of motion boundaries in Fig. 19. In [26], I have shown about half a dozen experiments that demonstrate the performance of this technique and compare it with that of smoothing-based methods. For the sake of brevity, I do not reproduce them here. Figures 20a and 20b show the confidence measures obtained as eigenvalues of the covariance matrix P: at t = 12.

52

AJIT SINGH

FIG. 14. The road-scene experiment: (top left and right) confidence measures computed as the eigenvalues of the covariance matrix P: at time f = 2; (bottom left and right) the corresponding confidence measures at time f = 12.

6.

RELATIONSHIP

WITH

EXISTING

FRAMEWORKS

In this section, I compare the framework presented above with some of the existing frameworks. I use the essential blocks of Fig. 1, i.e., conservation, spatial update, prediction, and temporal update, as the basis of comparison. 6.1.

Conservation

Znformtion

Our approach is inspired primarily by the work of Anandan [4] and that of Scott [24]. There are, however, some salient differences between our approach and the above mentioned. We will attempt to clarify these differences in the following discussion. Anandan uses the SSD itself as a measure of (mis)match. He assumes that there is a well-defined minimum in the SSD surface and uses the distance of the

minimum from the pixel under consideration as the estimate of displacement. He then computes the principal curvatures of the SSD surface at this minimum and uses them as confidence measures. It is hard to justify this procedure for the following reason. If there are more than one minima in the search window, Anandan’s procedure will pick up one of them as the best match, and it might assign a high confidence to it if the principal curvatures at this minimum are large. Using normalized moments of inertia, however, will give an estimate that averages out the two peaks in the response distribution and will assign a lower confidence to the estimate. This is appropriate because of the high uncertainty associated with local measurement of motion in this example. It must be noted that Anandan uses a coarse-to-fine search strategy that can disambiguate between two possible matches. In that case it may be appropriate to assign a high confidence to

FIG. 15. The road-scene experiment: the estimate of image-flow field 0: at f = 2.

FIG. 16. The road-scene experiment:

the estimate of image-flow field 0: at f = 12.

54

AJIT SINGH

FIG. 17. The road-scene experiment (top left and bottom left): flow field in a small region around each car; (top right and bottom iright) the flow field superimposed on the corresponding regions of the intensity image.

the (correct) estimate. However, the coarse-to-fine search strategy does not always guarantee correct disambiguation. The procedure for deriving velocity estimates and confidence measures in our framework is more similar to Scott’s than Anandan’s. However, it differs from Scott’s procedure in the following. First, whereas Scott uses the principal-axis-based representation for velocity, we use the estimate-covariance-based representation. Even though the two representations are conceptually identical, the estimate-covariance representation is computationally easy to use. Also, it allows the problem of velocity computation to be viewed as a parameter-estimation problem that can be solved by using standard estimationtheoretic techniques. Second, Scott converts error distribution into response distribution by using a rational function given by k&k2 + k&. On the other hand, we use an exponential which is better behaved in terms of its range

and its characteristics when error approaches zero. This offers a considerble computational advantage.9 Also, as discussed earlier, the choice of the exponential function can be justified from the viewpoint of Bayesian interpretation. 6.2.

Spatial Update

Our scheme for incorporating neighborhood information in spatial update has two distinct features that serve as the basis for comparison with existing techniques. 9 Computation of response distribution involves parameters that must be adjusted manually (e.g., the parameters k in our approach or the parameters k, , k2, and kg in Scott’s approach). The choice of these parameters affects the shape of the response distribution and, hence, the estimate and the covariance matrix associated with it. We have not pursued any quantitative comparison of our approach with that of Scott, in terms of the effects of free parameters on the estimates of velocity.

IMAGE-FLOW

ESTIMATION

55

FIG. 18. The toy-train experiment: flow field 0: at t = 12 superimposed on the wire frame of the train.

FIG. 19. The toy-train experiment: flow field 0: at t = 12, obtained using conventional smoothing-based spatial update, superimposed on the wire frame of the train.

First, it has an explicit notion ofpostpropagation confidence measures. While the idea of prepropagation confidence measures (associated with conservation information) has been used before [4,24], that of postpropagation confidence measures is novel. As discussed in Section 3, the postpropagation confidence-measures can serve as a very useful input to systems that use image flow to recover surface geometry. Second, it provides a simple mechanism to prevent blurring of the flow field at motion boundaries. Some of the current approaches for velocity propagation do have mechanisms for this purpose. Notable among them are those of Koch and co-workers [17], Aisbett [3], Schunck [23], Nagel [20, 211, and Waxman and Wohn [29]. Also, Anandan’s scheme [4] has a provision for incorporating motion boundaries, although it has not been imple-

mented. We will attempt to give a brief comparison of the method shown in the previous section to these existing techniques. The recent technique suggested by Koch and co-workers [17] draws upon the theory of Markov random fields [13] and applies the powerful idea of line processes as a supplement to Horn and Schunck’s method to explicitly prevent smoothing across motion boundaries. This technique has shown remarkable promise on both synthetic and real imagery. It, however, uses a priori knowledge about the possible locations of motion boundaries. Regions with either high intensity gradients or high velocity gradients are marked as possible motion boundaries and line processes are started at these locations. The iterative update procedure generates a flow field with appropriate motion boundaries. If an intensity discontinuity does not

(4 FIG.

20.

(b)

The toy-train experiment: (a) and (b) confidence measures computed as the eigenvalues of the covariance matrix P: at time t = 12.

AJIT SINGH

56

correspondto a motion discontinuity, the corresponding line processdies eventually.The procedureshown in this paperis different from that of Koch and co-workers becausevelocity propagationis suppressedin a regiononly if there is a motion discontinuity suggestedby conservation information (in the form of multiple clusters in the neighborhoodvelocity distribution). Also, this method is computationallymuch less expensive. The approachof Aisbett [3] is essentially smoothingbasedand is similar to that of Horn and Schunck. However, the smoothnessterm in the minimization constraint is weightedby the underlyingintensity gradientin such a way that the regions of high intensity gradient are allowed to have large values of velocity gradient. Indirectly, this method allows discontinuities to emerge in the smoothedflow field in the regionsof strongintensity gradients.This approachhas an underlying shortcoming that it assumesthat all intensity edgescorrespondto motion boundariesand that motion boundariesoccur only at intensity edges.Obviously, this assumptionis not always true. Similarly, Schunck’s approach for discontinuous flow fields introducesa novel idea of constraint-lineclustering. However, this approachalso relies, in part, on identification of motion boundariesas discontinuities in the underlying intensity function. Nagel’s approach [20, 211introduces the idea of an oriented smoothnessconstraint. In essence,the regions where the intensity function has a high gradientand/or a high curvature in a particular direction are allowed to violate the smoothnesscriterion in that direction. Thus, the flow field is not blurred in the regionsof high gradient or high curvature.This approachalso, however,assumes that all such regions correspondto motion boundaries. This assumptionis not always true. Waxman and Wohn [29] suggestthat motion boundariescan be detectedduring the minimization processby thresholdingthe magnitude of error residual.Their approachhas shownpromising resultson synthetic imagerybut its ,performancewith real imagery is yet to be established. 6.3.

Prediction

and Temporal

Update

The schemefor prediction andtemporal updateusedin this framework is inspiredby the work of Matthies et al. [18] on Kalman filtering-baseddepth estimation. I have adaptedtheir formulation (for estimatingvelocity instead of depth) without any significant modification. 7. CONCLUSION

In this paper, I have shown a new framework for recovering image flow from time-varying imagery. This framework implements the three essential components for recovering image flow-conservation information, temporal update,and spatial update-within the Kalman filtering formalism. Someof the distinctive featuresof the framework are summarizedbelow.

1. Becauseof the statistical nature of the procedure usedto representand updatevelocity in spaceand time, there is an explicit notion of confidencemeasuresassociated with the velocity estimate at each pixel, at each stageof computation. The conservationinformation associatesa confidencemeasurewith the “raw” measurementsof velocity. This confidenceis updatedduring temporal and spatial update. The idea of confidence measureshas been used before with conservationinformation [4] as well as temporalupdate[18]. However, our formulation of spatial update of confidencemeasuresis novel. In essence, our formulation accounts for the “spread” (in velocity space)of neighborhoodvelocities in addition to their “average” that hasbeenused in earlier formulations [15, 161. 2. The spatial updateproceduredoes a much better job of preservingthe stepdiscontinuitiesin the flow field, especiallyin the absenceof texture in the vicinity of such discontinuities, than the classic smoothing-basedupdate procedures[4, 161.I have demonstratedthis for the roadsceneand toy-train sequencesin the previous section. Propagationproceduresusedin severalframeworksproposedin the recentpast [3, 15, 17,21, 23, 291are capable of preservingmotion boundaries.However, the procedure usedin this framework is different from them in the following respects: (i) it gives image flow in the entire visual field, not just at the edges;(ii) it doesnot require any a priori knowledgeabout the location of the boundaries; (iii) it doesnot assumethat all intensity edgescorrespondto motion boundariesand vice versa; (iv) it does not use high-orderderivatives of the intensity function; and (v) it is computationally simple. There are severalways in which this framework canbe extendedand improved. First, with respectto conservationinformation, the behavior of the responsedistribution needsto be analyzed in greater detail, especially for the multimodal case. Also, the predictedestimate 0; can be used as an additional input to the block that recoversconservationinformation. This estimate can assist in determiningthe location and extent of the searchwindow. Second, in the current version of the framework, the spatial updateprocedureutilizes only the estimateof velocity at neighboringpixels. It doesnot utilize the covariante matrix associatedwith the estimate.It appearsplausible that the knowledge of covariance matrix might assist in identifying motion discontinuities, thus making the velocity-propagationprocedureeven more robust at discontinuities. Also, as discussedearlier, the problem of spatialupdateat a motion boundaryis that of interpreting a multimodal distribution. The ideal solutionto this problem is to identify the “correct” modeand to useit for the interpretation task. This amounts to throwing away the “outliers” (the pixels not belongingto the sameobject as the central pixel of the neighborhood).The applicability

IMAGE-FLOW

of outlier rejection to the framework shownaboveis currently under investigation. Finally, the prediction schemeused in the framework is basedon the assumptionof temporal coherence,which is essentially a heuristic. As discussedearlier, a more rigorous scheme could be designedif more information about the underlying scenestructure and motion is available. For instance,if the sceneis known to be planar and its velocity is known to be constant, one could use projective geometry and kinematics to show that the variation of image flow in both spaceand time is quadratic in nature.This relationshipcould be usedto formulate temporal evolution of image flow. REFERENCES I. E. H. Adelson and J. R. Bergen, Spatiotemporal energy models for the perception of motion, J. Opt. Sot. Amer. 2, 1985, 284-299. 2. J. K. Aggarwal, and N. Nandhakumar, On the Compuuztion of Motion from Sequences of Images-A Review. Tech. Rep. TR-882-47, Computer Vision Research Center, University of Texas at Austin, 1988. 3. J. Aisbett, Optical-flow with an intensity weighted smoothing, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 1989, 512-522. 4. P. Anandan, Measuring Visual Motion from Image Sequences, Ph.D. thesis, COINS Department, University of Massachusetts, Amherst, 1987. 5. J. Amspang, Ph.D. thesis, Department of Computer Science, University of Copenhagen, 1988. 6. J. V. Beck and K. J. Arnold, Parameter Estimation in Engineering and Science, Wiley, New York, 1977. 7. M. Black and P. Anandan, A framework for computing motion over time, in Proceedings, 3rd International Conference on Computer Vision, Osaka, Japan, Dec. 4-6, 1990. 8. P. J. Burt, The pyramid as a structure for efficient computation, in Multiresolution Image Processing and Analysis (A. Rosenfeld, Ed.), pp. 6-37, Springer-Verlag, New York/Berlin, 1984. 9. B. F. Buxton and H. Buxton, Computation of optic flow from the motion of edge features in image sequences, Image Vision Comput. 2, 1984. 10. J. Clarke and A. Yuille, Data Fusion for Sensory Information Processing, Khtwer Academic, Dordrecht, The Netherlands, 1990. 11. W. Enkelmann, Investigations of multigrid algorithms for estimation of optical flow fields in image sequences, Comput. Vision Graphics Image Process. 43, 1988, 150-177. 12. A. Gelb (Ed.), Applied Optimal Estimation, MIT Press, Cambridge, MA, 1988. 13. S. Geman and D. Geman, Stochastic relaxation, gibbs distribution and the bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. InteN. 6, 1984, 721-741. 14. D. Heeger, A model for extraction of image flow, in First Znternational Conference on Computer Vision, 1987. 15. E. C. Hildreth, The Measurement of Visual Motion, MIT Press, Cambridge, MA, 1983. 16. B. K. P. Horn and B. Schunck, Determining optical flow, ArtiJiciul Intelligence, 17, 1981, 185-203. 17. J. Hutchinson, K. Koch, and C. Mead, Computing motion using analog and binary resistive networks, Computer, 1988, 52-63. 18. L. Matthies, R. Szeliski, and T. Kanade, Kalman filter-based algorithms for estimating depth from image-sequences, in Proceedings,

ESTIMATION

57

2nd International Conference on Computer Vision, Tampa, FL, 1988, pp, 199-213. 19. D. W. Murray and B. F. Buxton, Reconstructing the optic flow field from edge motion: An examination of two different approaches, in First Conference on AI Applications, Denver, 1984. 20. H. H. Nagel, Displacement vectors derived from second order intensity variations in image sequences, Comput. Vision Graphics Image Process. 21, 1983, 85-117. 21. H. H. Nagel, On the estimation of dense displacement maps from image sequences, in Proceedings, ACM Motion Workshop, Toronto, 1983, pp. 59-65. 22. A. Ralston and P. Rabinowitz, A First Course in Numerical Analysis, McGraw-Hill, New York, 1978. 23. B. Schunck, Image flow: Fundamentals and algorithms, in Motion Understanding: Robot and Human Vision (J. K. Martin and W. N. Aggarwal, Eds.), pp. 23-68, Kluwer Academic, Dordrecht, The Netherlands, 1988. 24. G. L. Scott, Local and Global interpretation of Moving Images, Morgan Kauffman, Los Altos, CA, 1988. 25. A. Singh, Image-Flow Estimation: An Analytical Review, Tech. Rep. TN-89-085, Philips Laboratories, Briarcliff Manor, NY, 1989. 26. A. Singh, Image-Flow Computation: An Estimation-Theoretic Framework, Unification and Integration, Ph.D. thesis, Department of Computer Science, Columbia University, 1990. 27. M. A. Snyder, On the mathematical foundations of smoothness constraints for the determination of optical flow and for surface reconstruction, in Proceedings, IEEE Workshop on Visual Motion, 1989, pp. 107-115. 28. R. Szeliski, Bayesian Model of Uncertainty in Low-Level Vision, Ph.D. thesis, Department of Computer Science, Carnegie Mellon University, 1988. 29. A. M. Waxman and K. Wohn, Contour evolution, neighborhood deformation and global image flow: Planar surfaces in motion, Znternnt. J. Robotics 4, 1985, 95-108. 30. A. M. Waxman, J. Wu, and F. Bergholm, Convected activation profiles and measurement of visual motion, in Proceedings, IEEE CVPR, Ann Arbor, Michigan, 1988, pp. 717-722.

AJIT SINGH received a B.Tech in electrical engineering from the Institute of Technology, Varanasi, India, in 1985, an M.S. in computer engineering from Syracuse University in 1986, and a Ph.D. in computer science from Columbia University in 1990. Currently, he is working in the image processing group at Siemens Corporate Research in Princeton, New Jersey. He is also an adjunct faculty member in the computer science department at Columbia University. Between 1987 and 1989, he worked at Philips Laboratories in Briarcliff Manor, New York. His research interests include image processing, computer vision, biomedical imaging, neurobiology, and statistical estimation theory. He is a member of the editorial board of the IEEE Computer Society Press and the author of the forthcoming book Optic Flow Computation: A Unified Perspective.