Optical flow estimation for motion-compensated compression

Image and Vision Computing 31 (2013) 275–289

Wei Chen ⁎, Richard P. Mied
Naval Research Laboratory, Remote Sensing Division, Washington, DC 20375, United States
⁎ Corresponding author. Tel.: +1 202 767 7078. E-mail address: [email protected] (W. Chen).
This paper has been recommended for acceptance by Yiannis Andreopoulos. http://dx.doi.org/10.1016/j.imavis.2013.01.002

Article history: Received 29 May 2012; received in revised form 24 October 2012; accepted 22 January 2013.

Keywords: Optical flow; Optical flow determination; Optical flow estimation; Displacement estimation; Velocity estimation; Motion compensation; Motion-compensated prediction; Motion-compensated interpolation; Motion-compensated compression

Abstract

The computation of optical flow within an image sequence is one of the most widely used techniques in computer vision. In this paper, we present a new approach to estimating the velocity field for motion-compensated compression. It is derived from a nonlinear system built on the direct temporal integral of the brightness conservation constraint equation, i.e., the Displaced Frame Difference (DFD) equation. To solve the nonlinear system of equations, an adaptive framework is used, which employs velocity field modeling, a nonlinear least-squares model, Gauss–Newton and Levenberg–Marquardt techniques, and an algorithm for the progressive relaxation of the over-constraint. The three criteria by which successful motion-compensated compression is judged are: 1) the fidelity with which the estimated optical flow matches the ground truth motion, 2) the relative absence of artifacts and "dirty window" effects in frame interpolation, and 3) the cost of coding the motion vector field. We base our estimated flow field on a single minimized target function, which leads to motion-compensated predictions without incurring penalties in any of these three criteria. In particular, we compare our algorithm's results with those from Block-Matching Algorithms (BMA), and show that with nearly the same number of displacement vectors per fixed block size, the performance of our algorithm exceeds that of BMA on all three criteria. We also test the algorithm on synthetic and natural image sequences, and use it to demonstrate applications for motion-compensated compression. Published by Elsevier B.V.

1. Introduction

The determination of object motion and its representation in sequential images has been studied in several disciplines, e.g., digital image processing, digital video coding, computer vision, and remote sensing data interpretation. Motion analysis, determination, and representation are crucial for the removal of temporal redundancy. A large compression ratio with high-fidelity motion picture quality requires accurate large-displacement, long-temporal-range motion estimation for efficient transmission of compressed image sequences. The creation of more efficient and effective algorithms for estimating optical flow is therefore very important, and significant effort has been devoted to solving the optical flow estimation problem. To place this body of literature in a meaningful context, we present most of these works in a block diagram (Fig. 1).

Almost all existing optical flow estimation models and algorithms assume the image intensity obeys a brightness constancy constraint [3–5]. The inverse problem of estimating a velocity or displacement map (i.e., the optical flow) is under-constrained, because the two unknown velocity components must be derived from this single conservation equation at each pixel. To solve the under-constrained problem, several constraints on the displacement field, such as smoothness and other assumptions, have been proposed. Typical smoothness assumptions include Horn and Schunck's regularization constraint [3], a uniform velocity assumption in a block (or template) (Lucas and Kanade [4,5]; Shi and Tomasi [16]), an intensity-gradient conservation constraint (Nagel et al. [12–14]; Nesi [15]), modeling the motion field as a Markovian random field (Konrad and Dubois [20]), and velocity field modeling with bilinear or B-spline functions (Chen et al. [28,29]). Traditionally, scientists have focused their efforts on extending the many other methods of estimating the optical flow [1–35]. Some of the algorithms have been implemented in hardware [36–39].

Most realistic image sequences in computer vision applications contain multiple objects moving against a static background and with respect to one another; that is, the velocity field can be discontinuous over the image. In order to handle the discontinuities at the transition boundaries between a static background and mobile objects, Chen et al. [28] proposed bilinear modeling of the motion field. This numerical model solves the under-constrained problem successfully. However, the optical flow equation is derived from a differential form of the conservation constraint (i.e., a first-order Taylor expansion), and is only valid for infinitesimal motion.


Fig. 1. Block diagram representing the research literature and the currently proposed work (frames with yellow background). Highlighted is the numerical approach to flow field estimation proposed in this paper.

Therefore, a motion field based on the optical flow equation can be estimated successfully only for small-displacement motion.

Three criteria are used to evaluate the derived optical flow: 1) comparison between the retrieved and ground truth optical flow fields, 2) frame interpolation, and 3) performance of motion-compensated compression. An evaluation using only the first method with some special datasets may not provide sufficiently stringent tests of the capability of high-performance estimators. The comparison with ground truth optical flow depends on the type of motion, the texture morphology, and the scale of displacement. The most important feature of motion-compensated compression is not only how well an estimated optical flow matches the physical motion. Equally important are how well the motion pictures are synthesized, with minimal distortion and without artifacts and dirty window effects, and the low coding cost of the motion vector field. Although the last performance test is the most important for a variety of applications in computer vision, a successful motion estimator should demonstrate excellent performance on all three tests over a range of displacement scales from small to large. Many motion estimators and video compression algorithms perform far from optimally by themselves; motion estimation and video compression techniques must also interface compatibly.

The estimator adopted in the international standards for digital video compression is the block-matching algorithm (BMA) [16–19] (or overlapped BMA). Compared with ground truth optical flow, the BMA method is less accurate for flow field estimation, but performs better for motion-compensated prediction (MCP) and interpolation (MCI) in realistic video coding applications. Current popular approaches based on the optical flow equation may outperform BMA in the optical flow comparison test for some specific datasets, but cannot pass the overall tests. For this reason, the BMA method is still the adopted estimator today.

Global (or energy-based) approaches usually employ the brightness constancy constraint combined with a prior constraint on the motion field weighted by a penalty parameter [1–3]. However, a major issue with a weighting parameter is choosing its optimal value. Several different weighting parameter values have been suggested [1,3,25,26], because the correct optimal value depends upon the specific ground truth flow field. In the present paper, we depart from the established weighting parameter approach. Instead, we employ a quantity derived from the nonlinear model and minimize it by varying a large number of unknown parameters (the average velocity or displacement field). Since numerous local minima exist in image data applications (especially those having large featureless regions), we have found it necessary to develop new algorithms for solving the problem.

To improve the performance of the velocity estimation, especially for large-displacement motion, we replace the standard differential form with a direct temporal integral of the optical flow conservation constraint, the Displaced Frame Difference (DFD) equation, and obtain a nonlinear system. To solve the inverse problem of flow field estimation, we propose an adaptive framework and employ more stringent performance criteria for motion-compensated compression. Our numerical approach to flow field estimation is highlighted in Fig. 1.

This paper develops a generic approach that can deliver high performance for both flow field estimation and motion-compensated compression. A difficulty we face is that a moving image scene necessarily contains both featureless and texture-rich regions. Our goal is to develop a single motion estimation technique that treats both types of regions within the same formalism.

This paper is organized as follows: In Section 2, a set of nonlinear system equations with the velocity field model is derived. Section 3 introduces algorithms for this estimator. In Section 4, we validate the new algorithms by deriving velocity from synthetic tracer motion within a numerical ocean model, and apply the new technique to video image sequences. Finally, conclusions are drawn in the last section.

2. An over-constrained system

2.1. Brightness constancy constraint

If we designate I(x, y, t) as the intensity at coordinates (x, y) and time t, and the velocity vector of the optical flow as v(x, y, t) = (u(x, y, t), v(x, y, t))^T, we may write a differential form of the brightness constancy constraint (or optical flow) equation as

$$\frac{dI(\mathbf{r}(t), t)}{dt} = \left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla\right) I(\mathbf{r}(t), t) = 0. \tag{1}$$

In order to constrain the image scenes at times t = t₁ and t = t₂, we integrate Eq. (1) from time t₁ to t₂:

$$\int_{t_1}^{t_2} \frac{dI(\mathbf{r}(t), t)}{dt}\, dt = I(\mathbf{r}(t_2), t_2) - I(\mathbf{r}(t_1), t_1) \equiv 0,$$

where r(t₁) and r(t₂) are the position vectors at times t₁ and t₂. If a displacement vector field is defined by

$$\Delta\mathbf{r} = \mathbf{r}(t_2) - \mathbf{r}(t_1) = \mathbf{v}\,\Delta t,$$


then the direct integral form of the brightness constancy constraint, or Displaced Frame Difference (DFD) equation, is given by

$$\mathrm{DFD} = I(\mathbf{r}(t_1) + \Delta\mathbf{r},\, t_2) - I(\mathbf{r}(t_1),\, t_1) = 0, \tag{2}$$
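To make Eq. (2) operational, the DFD residual for a candidate displacement field can be evaluated by sampling the second frame at the displaced positions. The sketch below is a minimal NumPy illustration under our own naming (the paper gives no code); the bilinear sampler anticipates the interpolation introduced later in Section 3.1.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at real-valued coordinates (x, y) by bilinear interpolation."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1.001)
    y = np.clip(y, 0, h - 1.001)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    fx, fy = x - x0, y - y0
    return (img[y0,     x0    ] * (1 - fx) * (1 - fy) +
            img[y0,     x0 + 1] * fx       * (1 - fy) +
            img[y0 + 1, x0    ] * (1 - fx) * fy +
            img[y0 + 1, x0 + 1] * fx       * fy)

def dfd(I1, I2, u, v, dt=1.0):
    """Displaced frame difference, Eq. (2): I(r + v*dt, t2) - I(r, t1)."""
    h, w = I1.shape
    xx, yy = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    return bilinear_sample(I2, xx + u * dt, yy + v * dt) - I1
```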

where the time difference Δt equals t₂ − t₁. Obviously, the linear approximation (Eq. (1)) can be obtained from the DFD Eq. (2) for small Δr and Δt.

The differential form of the optical flow Eq. (1) is linear in the velocity; consequently it is valid only for infinitesimal, or nearly infinitesimal, motions. Using two successive frames, we can estimate the partial derivative with respect to time in Eq. (1). For this reason, Eq. (1) is more correctly viewed as describing the motion at an intermediate time between t₁ and t₂. In contrast, the two intensity terms in the DFD Eq. (2) correspond to the initial and final states of the motion at times t₁ and t₂. Employing the DFD equation for motion estimation can therefore achieve higher accuracy than the optical flow Eq. (1), especially for larger-scale displacement motions.

2.2. Velocity field modeling

Eq. (2) describes the evolution of optical flow across two successive images. An image has N_x × N_y pixels, but 2N_x N_y unknown velocity components (u, v). To solve the under-constrained problem, one efficient approach is to expand the velocity field in bilinear polynomial functions or two-dimensional B-spline functions [28,29]. As a trade-off between simplicity and computational efficiency, we use bilinear polynomial functions to represent the velocity field [28]. We partition the image domain into a number of sub-domain blocks (or tiles), each of which contains an n_x × n_y array of pixels (Fig. 2). In general, any two-dimensional function can be approximated by Lagrange's bilinear function

$$f(x, y) = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} f\left(p + \alpha n_x,\; q + \beta n_y\right) H_{p+\alpha n_x,\, q+\beta n_y}(x, y), \tag{3}$$

where the function H_{a,b}(x, y) is defined by

$$H_{a,b}(x, y) = \frac{1}{n_x n_y} \begin{cases} (n_x - x + p)(n_y - y + q) & (a = p \cap b = q) \\ (x - p)(n_y - y + q) & (a = p + n_x \cap b = q) \\ (n_x - x + p)(y - q) & (a = p \cap b = q + n_y) \\ (x - p)(y - q) & (a = p + n_x \cap b = q + n_y). \end{cases}$$
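The four branches of H_{a,b} are the standard tensor-product Lagrange (bilinear) weights over a tile. A compact sketch of the equivalent evaluation, with names of our own choosing:

```python
def bilinear_weights(x, y, p, q, nx, ny):
    """Weights H_{a,b}(x, y) for the four tile corners (p,q), (p+nx,q),
    (p,q+ny), (p+nx,q+ny), matching the four branches of the definition."""
    sx, sy = (x - p) / nx, (y - q) / ny   # local coordinates in [0, 1]
    return {(p,      q     ): (1 - sx) * (1 - sy),
            (p + nx, q     ): sx       * (1 - sy),
            (p,      q + ny): (1 - sx) * sy,
            (p + nx, q + ny): sx       * sy}
```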

Fig. 2. The image array is divided into sub-arrays (or tiles) with n = n_x = n_y pixels per tile; here, n = 4. Node points (marked with solid squares) of the bilinear approximation are indexed by p and q.

The quantized indices p and q are functions of x and y, given by

$$\{p, q\} = \left\{ \left\lfloor \frac{x}{n_x} \right\rfloor n_x,\; \left\lfloor \frac{y}{n_y} \right\rfloor n_y \right\},$$

where ⌊⌋ denotes the integer (floor) operator. The {p, q} serve as block (tile) indices, since the integer operator increments them by unity after each additional n_x or n_y pixels. We denote a function of discrete pixel variables i and j as f_ij(t) = f(i, j, t) or f_ij = f(i, j). The two velocity components in an image scene can then be expressed by the bilinear polynomials

$$\mathbf{v}_{ij} = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} \mathbf{v}_{p+\alpha n_x,\, q+\beta n_y}\, H_{p+\alpha n_x,\, q+\beta n_y}(i, j), \tag{4}$$

where v_ij = v_ij(t₁) = (u_ij, v_ij)^T are the velocity component magnitudes.

The bilinear form within each block is actually quite adept at representing functions that may possess discontinuities. We demonstrate this with two examples of discrete functions, one continuous and one with a discontinuity. Their bilinear approximations (patched images) via Eq. (3) with block size parameters n_x = n_y = 2 are shown in Fig. 3. Comparing the original and patched images and the 3D-mesh plots in Fig. 3, we find that the bilinear expression in Eq. (3) represents both continuous and discontinuous functions well.

All velocities can be calculated with Eq. (4) from the velocities v_pq on the node points. Velocity vectors v_ij = (u_ij, v_ij)^T for all i ≠ p or j ≠ q in the DFD equations are no longer independent variables when n = n_x = n_y > 1, except on node points. We can adjust the block size parameter n > 1 to control the number of interpolation points, which sets the resolution of the velocity field and the degree of the over-constraint. The system is over-constrained if n > 1, because we have made a simplifying assumption about the form of the velocity field in Eq. (4).

In pixel index notation, the DFD equation in Eq. (2) becomes

$$\mathrm{DFD}_{ij} = I\left(i + u_{ij}\Delta t,\; j + v_{ij}\Delta t,\; t_2\right) - I_{ij}(t_1) = 0. \tag{5}$$

Eq. (5) is now a function of the two velocity components, which depend on the node point velocities through Eq. (4), i.e.,

$$\mathrm{DFD}_{ij} = \mathrm{DFD}_{ij}\left(u_{ij}, v_{ij}\right) = \mathrm{DFD}_{ij}\left(u_{pq}, v_{pq}\right).$$

All independent DFD_ij equations involve only the smaller number of independent velocities on nodes, where the velocity indices are i = p(i) and j = q(j). The total number of DFD_ij equations is N = N_x × N_y for an N_x × N_y image sequence. The number of node points shown in Fig. 2 is

$$N_{\mathrm{node}} = \left( \frac{N_x - 1}{n_x} + 1 \right) \left( \frac{N_y - 1}{n_y} + 1 \right).$$

The total number of independent velocity field unknowns with two components u_pq and v_pq is 2 × N_node. This system is clearly over-constrained, because the number of DFD_ij equations over all pixels exceeds the number of independent velocity components u_pq and v_pq (i.e., N > 2 × N_node) when the block sizes are n_x > 1 and n_y > 1. We solve the over-constrained system using the nonlinear least-squares model described in the next section to estimate the velocity field u_pq and v_pq when n_x > 1 and n_y > 1.
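As an illustration of Eq. (4), the sketch below expands node-point velocities into a dense per-pixel field, assuming square tiles (n = n_x = n_y) and node arrays indexed by tile-grid coordinates; all names are our own.

```python
import numpy as np

def dense_from_nodes(u_node, v_node, n, Nx, Ny):
    """Expand node velocities (Eq. (4)) into a dense Nx-by-Ny field by
    bilinear interpolation of the node values over n-by-n pixel tiles."""
    jj, ii = np.meshgrid(np.arange(Ny), np.arange(Nx), indexing='ij')
    p, q = ii // n, jj // n                        # tile indices {p, q}
    sx, sy = (ii - p * n) / n, (jj - q * n) / n    # local tile coordinates
    p1 = np.minimum(p + 1, u_node.shape[1] - 1)    # clamp at the last node
    q1 = np.minimum(q + 1, u_node.shape[0] - 1)

    def interp(f):
        return (f[q,  p ] * (1 - sx) * (1 - sy) + f[q,  p1] * sx * (1 - sy) +
                f[q1, p ] * (1 - sx) * sy       + f[q1, p1] * sx * sy)

    return interp(u_node), interp(v_node)
```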


Fig. 3. Two examples of functions (original images) and their approximations (down-sampled and patched images).

2.3. A nonlinear least-squares model

The presence of quantization errors and noise means the DFD_ij equations are never identically zero; however, we can choose a set of v_pq for which they are minimized in a least-squares sense. Accordingly, we define a cost function based on Eq. (5) as

$$\mathrm{MSE} = \frac{1}{N} \sum_{i,j} \mathrm{DFD}_{ij}^2, \tag{6}$$

where i and j run over all pixels in the N = N_x × N_y image (i ∈ [0, N_x − 1] ∩ j ∈ [0, N_y − 1]). Minimizing the cost function MSE for given indices k and l over all node points in an image, we write the iterative equations for solving the optical flow field, based on the Gauss–Newton and Levenberg–Marquardt algorithms, as

$$\mathbf{v}_{kl}^{(m+1)} = \mathbf{v}_{kl}^{(m)} - \left( A_{kl}^{(m)} \right)^{-1} B_{kl}^{(m)}, \tag{7}$$

where

$$A_{kl}^{(m)} = \begin{pmatrix} (\lambda + 1) \displaystyle\sum_{i,j \in \Omega_{kl}} \left( \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \right)^{\!2} & \displaystyle\sum_{i,j \in \Omega_{kl}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} \\ \displaystyle\sum_{i,j \in \Omega_{kl}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} & (\lambda + 1) \displaystyle\sum_{i,j \in \Omega_{kl}} \left( \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} \right)^{\!2} \end{pmatrix}$$

and

$$B_{kl}^{(m)} = \begin{pmatrix} \displaystyle\sum_{i,j \in \Omega_{kl}} \mathrm{DFD}_{ij}^{(m)} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \\ \displaystyle\sum_{i,j \in \Omega_{kl}} \mathrm{DFD}_{ij}^{(m)} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} \end{pmatrix}.$$

The Levenberg–Marquardt factor λ ≥ 0 is adjusted at each iteration to guarantee that the MSE converges. If a smaller value of the factor λ is used, the algorithm is closer to the Gauss–Newton method with second-order convergence. The Levenberg–Marquardt method improves convergence properties greatly in practice, and has become the standard for nonlinear least-squares routines. More importantly, the summation domain is now reduced from the entire image plane to only a local region Ω_kl, so that

$$\sum_{i,j} \;\rightarrow\; \sum_{i,j \in \Omega_{kl}} = \sum_{i=k-n_x+1}^{k+n_x-1} \; \sum_{j=l-n_y+1}^{l+n_y-1}.$$

Modeling the velocity field with our bilinear functions allows both continuous and discontinuous behaviors in the motion field (Fig. 3). In the interior of each block (within the n_x × n_y region), the expressions are C² continuous, and C¹ continuous on the block boundaries. These expressions represent typical physical variations appropriately.

2.4. Maximum motion-compensated prediction

Most works in the literature use additional constraints to estimate the motion field, and often address the inverse problem by minimizing an objective function containing a weighting (penalty) parameter and more than one cost term [1,3]. For example, a global cost function may be given by

$$E_{\mathrm{global}} = E_{\mathrm{data}} + \alpha\, E_{\mathrm{prior}}.$$

Data energy terms of this sort measure the errors of the DFD (DFD_ij = MCP_ij − I_ij(t₁)) or optical flow equations describing an image sequence. The prior energy term E_prior with parameter α is an additional constraint, and α is optimized for the best fit between the ground truth and the solution. It is difficult to find a single optimal value of this parameter in realistic applications where the ground truth flow field is unknown, and the MCP is not optimized.
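Returning to Eq. (7): each node update is a damped 2 × 2 normal-equation solve over the support region Ω_kl. A schematic sketch, assuming the residual and derivative arrays for one node have been precomputed (all names ours):

```python
import numpy as np

def lm_node_update(u, v, dfd, d_du, d_dv, lam):
    """One Levenberg-Marquardt step of Eq. (7) for a single node (k, l).
    dfd, d_du, d_dv: residuals DFD_ij and their derivatives w.r.t. the node
    velocity (u_kl, v_kl), restricted to the support region Omega_kl."""
    A = np.array([[(1 + lam) * np.sum(d_du * d_du), np.sum(d_du * d_dv)],
                  [np.sum(d_du * d_dv), (1 + lam) * np.sum(d_dv * d_dv)]])
    B = np.array([np.sum(dfd * d_du), np.sum(dfd * d_dv)])
    du, dv = np.linalg.solve(A, B)    # A^{-1} B
    return u - du, v - dv             # v_kl^{(m+1)} = v_kl^{(m)} - A^{-1} B
```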


A Peak Signal-to-Noise Ratio (PSNR) may be used as an error measure between the MCP or MCI and the original image; it is defined by

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\mathrm{MSE}} \right).$$

The iterations in Eq. (7) are derived from a least-squares principle and lead directly to a solution for the displacement field with a minimized target function MSE, or a maximized PSNR for the MCP. Since the bilinear velocity field model is determined by the partition of the image domain into blocks, the target function PSNR is a unique optimized function without any additional weighting parameter. Therefore, the MCP image computed with the displacement field estimated from this over-constrained system can be optimized.

The adjustable block size approach, which uses a smaller number of velocity components on nodes to generate a fully dense velocity field, provides a powerful capability for motion-compensated compression. If the block size shown in Fig. 2 is n × n for images of dimension N_x × N_y, then the number of transmitted or stored motion vectors for the proposed and BMA estimators equals [(N_x − 1)/n + 1] × [(N_y − 1)/n + 1] and (N_x/n) × (N_y/n), respectively. With almost the same number of displacement vectors for a fixed block size, the current framework (a motion field with C¹ continuity obtained by a global optimization strategy) can provide much more accurate performance than the BMA (a motion field with C⁰ continuity obtained by local search strategies), which is the currently adopted standard for video coding.

3. Numerical algorithms

Detailed implementations of this motion estimator include the computation of the MCP function, the partial derivative calculations, the progressive relaxation of the over-constraint algorithm, and the iteration procedures. These are described in this section.

3.1. Computation of motion-compensated prediction

The motion-compensated prediction I(i + u_ijΔt, j + v_ijΔt, t₂) may involve positions that fall between pixels in an image and must be evaluated by an interpolation function. In order to compute the motion-compensated predictions, the general bilinear interpolation function in Eq. (3) is utilized as follows:

$$I\left(i + u_{ij}\Delta t,\; j + v_{ij}\Delta t,\; t_2\right) = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} I_{p+\alpha,\, q+\beta}(t_2)\, H_{p+\alpha,\, q+\beta}\left(i + u_{ij}\Delta t,\; j + v_{ij}\Delta t\right),$$

where the function H_{a,b}(x, y) is evaluated with n_x = n_y = 1 (the tile size in Fig. 2 for this interpolation equals one pixel), and {p, q} = {p(i + u_ijΔt), q(j + v_ijΔt)}.

3.2. Computation of the partial derivatives

The evaluation of the partial derivatives of the intensity with respect to velocity in Eq. (7) on a node requires computation of the spatial gradient:

$$\left\{ \frac{\partial \mathrm{DFD}_{ij}}{\partial u_{kl}},\; \frac{\partial \mathrm{DFD}_{ij}}{\partial v_{kl}} \right\} = \Delta t \left\{ \frac{\partial u_{ij}}{\partial u_{kl}} \left. \frac{\partial I(x,\, j + v_{ij}\Delta t,\, t_2)}{\partial x} \right|_{x = i + u_{ij}\Delta t},\; \frac{\partial v_{ij}}{\partial v_{kl}} \left. \frac{\partial I(i + u_{ij}\Delta t,\, y,\, t_2)}{\partial y} \right|_{y = j + v_{ij}\Delta t} \right\}.$$

The derivatives of the velocity at a pixel with respect to the velocity at a node point are given by

$$\frac{\partial u_{ij}}{\partial u_{kl}} = \frac{\partial v_{ij}}{\partial v_{kl}} = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} H_{kl}(i, j)\, \delta_{k,\, p+\alpha n_x}\, \delta_{l,\, q+\beta n_y},$$

where δ is the Kronecker delta symbol. Image intensity possesses noise at the pixel level, and the calculation of these derivatives tends to amplify it. In order to improve the accuracy of the numerical differentiation, we implement the partial derivatives with central differences (with mask coefficients {1, −8, 0, 8, −1}/12 or {−1, 9, −45, 0, 45, −9, 1}/60). After we evaluate the spatial derivatives of the intensity with respect to x and y on each pixel using this numerical differentiation, we smooth the resulting gradient fields with a Gaussian low-pass filter with standard deviation from 0.375 to 1.125 pixels. Finally, using the general interpolation function in Eq. (3) with n_x = n_y = 1, we can calculate values of the spatial derivatives in the above equations at any position (x, y) = (i + u_ijΔt, j + v_ijΔt) in the image scene.
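A sketch of this gradient pipeline: the five-point central-difference mask {1, −8, 0, 8, −1}/12 followed by Gaussian smoothing. We use scipy.ndimage here as a convenience; the paper does not specify an implementation.

```python
import numpy as np
from scipy.ndimage import correlate1d, gaussian_filter

def smoothed_gradients(img, sigma=0.75):
    """Spatial intensity gradients via the 5-point central-difference mask
    {1, -8, 0, 8, -1}/12, then smoothed with a Gaussian low-pass filter
    (sigma chosen inside the paper's stated 0.375-1.125 pixel range)."""
    mask = np.array([1.0, -8.0, 0.0, 8.0, -1.0]) / 12.0
    # correlate1d applies the mask as written along the chosen axis
    gx = correlate1d(img.astype(float), mask, axis=1, mode='nearest')
    gy = correlate1d(img.astype(float), mask, axis=0, mode='nearest')
    return gaussian_filter(gx, sigma), gaussian_filter(gy, sigma)
```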

3.3. Progressive relaxation of the over-constraint

Since real-world image scenes contain featureless regions, the estimated motion fields are not unique in most cases. This is problematic, because moving objects can be observed physically only through the changing texture morphology associated with motion in an image sequence. The same physical motion may not be observable or measurable in featureless regions. The determination of displacement vectors within texture-rich or texture-poor morphologies is the well-known aperture problem [1,3]. A physical system with unique or multiple possible displacements, observed through a spatially variable texture field, allows the mathematical model to yield multiple solutions for the displacement vector at a fixed pixel. Unfortunately, the linear optical flow equation yields only one solution in featureless texture fields, which is almost certainly not the correct answer. In contrast, the multiple motion solutions (roots) of the nonlinear DFD equation are far more likely to contain a correct answer consistent with the motion we actually track. Clearly, all featureless regions in an image sequence can produce a collection of motions, all of which are possible candidates for the answer compatible with observation.

Our goal in this paper is to seek motion fields consistent with the actual physics rather than merely mathematically plausible answers. This involves the challenging task of sorting through the multiple minima delivered by the nonlinear optimization; this is what we call the Global Optimal Solution (GOS). However, the derived GOS may contain vectors that are not mutually consistent with their neighbors in featureless regions. To remedy this potentially unrealistic result, we propose the Progressive Relaxation of the Over-Constraint (PROC) algorithm. Using the block size parameter n to control the degree of the over-constraint from higher to lower during the iteration procedure (PROC is detailed in the next subsection), we regulate the velocity field progressively to achieve a final valid solution. The initial value n₀ of the parameters n_x = n_y is selected to be greater than n (the preset block size parameter, representing a lower degree of over-constraint). We then progressively relax (decrease) the parameters n_x and n_y from their initially larger values to smaller ones, decreasing their value every Nth iteration until they approach the preset value n.
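Structurally, PROC is an outer schedule wrapped around the least-squares iteration of Section 2.3. A skeleton sketch under stated assumptions: init_field, lm_sweep, and mse are hypothetical placeholders for the machinery of Eqs. (6)–(7), and the λ schedule follows Section 3.4.

```python
def estimate_flow(I1, I2, n, dn=2, lam0=1e-3, lam_max=1e5, max_iters=100):
    """PROC skeleton: start over-constrained at n0 = n + dn, then relax the
    block size toward the preset n; the inner loop is Levenberg-Marquardt."""
    field = init_field(I1.shape, value=0.01)          # hypothetical: initial vectors = 0.01
    for n0 in range(n + dn, n - 1, -1):               # progressive relaxation of n0
        lam = lam0
        for _ in range(max_iters):
            trial = lm_sweep(field, I1, I2, n0, lam)  # hypothetical: Eq. (7) over all nodes
            if mse(trial, I1, I2) < mse(field, I1, I2):
                field = trial                         # MSE decreased: accept the step
            else:
                lam *= 10.0                           # not converging: increase damping
                if lam > lam_max:
                    break                             # give up at this block size
    return field
```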


A criterion for the consistency of a motion vector (v = Δr when Δt = 1) with its neighbors in featureless regions is defined by

$$C = \frac{1}{N_c} \sum_{i,j} \begin{cases} \displaystyle\sum_{k=i-r}^{i+r} \sum_{l=j-r}^{j+r} \sqrt{\left(u_{ij} - u_{kl}\right)^2 + \left(v_{ij} - v_{kl}\right)^2} & \left[ G_{ij} \le \varepsilon \cap (k \ne i \cup l \ne j) \right] \\ 0 & \text{otherwise,} \end{cases}$$

where G_ij is the gradient of the intensity I_ij at a pixel, used to evaluate the texture morphology:

$$G_{ij} = \sqrt{ \left( \frac{\partial I_{ij}}{\partial x} \right)^{\!2} + \left( \frac{\partial I_{ij}}{\partial y} \right)^{\!2} },$$

N_c is the number of pixels in the featureless region where G_ij ≤ ε, ε is a threshold on the gradient G_ij, and r defines a pixel neighborhood around the fixed pixel (i, j). A smaller value of the criterion C corresponds to a higher consistency of each vector with its neighbors. The experimental evaluation of the PROC performance by this consistency criterion, and a more detailed discussion of PROC, are given in Section 4.6.
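A direct, loop-based transcription of our reading of the criterion (names ours; clarity preferred over speed; boundary pixels within r of the edge are skipped):

```python
import numpy as np

def consistency(u, v, grad_mag, eps, r=1):
    """Consistency criterion C: disagreement of each vector with its
    (2r+1)^2 - 1 neighbours, accumulated over featureless pixels (G_ij <= eps)
    and normalized by the count N_c of such pixels."""
    h, w = u.shape
    total, n_c = 0.0, 0
    for i in range(r, h - r):
        for j in range(r, w - r):
            if grad_mag[i, j] > eps:
                continue                    # texture-rich pixel: skip
            n_c += 1
            for k in range(i - r, i + r + 1):
                for l in range(j - r, j + r + 1):
                    if k == i and l == j:
                        continue            # exclude the center vector itself
                    total += np.hypot(u[i, j] - u[k, l], v[i, j] - v[k, l])
    return total / n_c if n_c else 0.0
```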

Fig. 4. Convergence properties of the PSNR vs. number of converged iterations using the PROC algorithm for an image sequence.

3.4. Iteration procedures

All initial velocity vectors (Δr = v when Δt = 1) are preset to 0.01 for most displacement scale ranges. When the displacement scale is very large, the initial velocity vectors can be assigned from the vectors derived by the BMA method. The iteration depends upon the block size parameter n, so the MSE in Eq. (6) may be written MSE(n). To start the iteration, we select a Levenberg–Marquardt factor λ = 10⁻³ and n₀ = n + Δn (Δn > 0) for the PROC procedure, compute MSE^(m)(n₀) from Eq. (6) with the initial velocity field, iterate Eq. (7) to a solution, and update MSE^(m+1)(n₀) with the newly iterated velocity field, where m is the iteration index. If this iteration does not converge, we increase λ by a factor of 10 and repeat the previous iteration until a converged velocity field is found; otherwise, we continue iterating. As is typical of Gauss–Seidel algorithms, Eq. (7) is updated with the current velocity vectors v_kl^(m) to enable calculation of v_kl^(m+1). The iteration proceeds at a fixed block size n₀ and terminates when Δv_kl^(m) = v_kl^(m+1) − v_kl^(m) ≅ 0, or when λ or the iteration count for the fixed block size n₀ reaches its maximum bound (λ_max = 10⁵). We then set n₀ = n₀ − 1 and iterate again, until n₀ = n. The velocity vectors corresponding to the global minimum of MSE(n) are the derived vectors, which best fit the optical flow displacement in the image pair.

A typical convergence curve of PSNR versus converged iteration index for estimating the velocity field with the PROC algorithm is shown in Fig. 4. The second-order convergence of the Gauss–Newton and Levenberg–Marquardt methods is evident in Fig. 4: the PSNR for the image sequence rises sharply within the first five iterations and approaches a steady value of 43 dB after forty-four iterations.

4. Experiments

The performance tests of the estimators include a benchmark test for optical flow and an error evaluation of frame interpolation. In the benchmark test, the optical flow field estimated from an image sequence is compared with a ground truth flow field using average angular and magnitude errors. The error evaluation of frame interpolation is an indirect test, which compares the ground truth image with the image interpolated by motion-compensated interpolation (MCI) using a ground truth or estimated flow field. This second evaluation method is suited to the applications of motion-compensated compression and frame rate up-conversion.

In the applications of motion-compensated compression, a large temporal compression ratio requires a motion field that spans several frames, so that all frames except a reference frame (the first or last one) can be synthesized by MCP or MCI from the single estimated motion field. In these applications [41], what matters is not only how well the estimated motion field agrees with the physical motion, but how well motion pictures can be predicted or interpolated by the MCP and MCI techniques using that motion field, with minimal distortion and without artifacts or dirty window effects.

Several test datasets are available [40,41]. Middlebury is one such dataset recently introduced in computer vision. Unfortunately, almost all of the motion fields in these test cases are zoom-in/out, translational, or close to translational motion. The Middlebury dataset also has some issues: the flow fields lack spatial variability, the image scenes contain featureless regions, and large observable errors exist in the ground truth flow fields. Two examples of ground truth flow field errors in the Middlebury dataset are shown in Fig. 5, and an error analysis of the ground truth flow fields is given in Appendix A.

We extend the test cases beyond the simple spatial motion variability common in computer vision to a more complicated case containing rotational and deformational motion (Fig. 6a). In addition, we use synthetic texture-rich image sequences based on a simulation model [28,29] for benchmark evaluation. Because of the problems with the Middlebury dataset, we tested only its interpolation performance for the purpose of motion-compensated compression. Moreover, a comparison between the linear and nonlinear models for optical flow estimation and three example applications of motion-compensated compression are also demonstrated in this section.

For convenience of comparison, we refer to the proposed approach as Optical Flow Estimation for Motion-Compensated Compression (OFEMCC). We implement the Horn–Schunck (H–S) method with the improvements suggested by Barron et al. [1], using a smoothed gradient (Gaussian low-pass filter) because it produces more accurate results on these test cases. We also implement the 2D-CLG method developed by Bruhn et al. [25] and BMA as comparison techniques.


Fig. 5. Two velocity fields lacking spatial variability: (a) Yosemite and (b) Venus ground truth flow fields.

4.1. Error measurement

To evaluate the errors involved in OFEMCC, we apply it to a numerical model, because the model velocities that generate the observed tracer motion are known exactly. We employ angular and magnitude measures of error [1,41], and use the mean values of these errors to evaluate the performance of the velocity estimation for this numerical model image sequence. The velocity may be written as v = (u, v, w = 1), so that the mean values of the angular and magnitude errors between the correct velocity v̂_ij and our estimate v_ij are

$$\mathrm{AAE} = \frac{1}{N} \sum_{i,j} \arccos\left( \frac{u_{ij}\hat{u}_{ij} + v_{ij}\hat{v}_{ij} + 1}{\sqrt{u_{ij}^2 + v_{ij}^2 + 1}\, \sqrt{\hat{u}_{ij}^2 + \hat{v}_{ij}^2 + 1}} \right)$$

and

$$\mathrm{AME} = \frac{1}{N} \sum_{i,j} \sqrt{\left(u_{ij} - \hat{u}_{ij}\right)^2 + \left(v_{ij} - \hat{v}_{ij}\right)^2},$$

where the angular errors are in degrees, N = N_x × N_y, and v = Δr (Δt = 1). The average angular and magnitude errors (AAE and AME) between the correct velocity v̂ and an estimate v are used to evaluate the performance of the velocity estimations.

The interpolation error (IE) [41] between the MCI image and the ground truth intermediate image is measured by the root-mean-square error

$$\mathrm{IE} = \sqrt{ \frac{1}{N} \sum_{i,j} \left[ I_{\mathrm{MCI}}(i, j) - I_{\mathrm{GT}}(i, j) \right]^2 },$$

or by the PSNR measurement.

4.2. Benchmark evaluations

We use the Yosemite sequence [40,41] without clouds and a numerical model solution (see Chen et al. [28]) as benchmarks. The simulation image sequences are used in the test cases between times t = 18 h to 20 h and t = 16 h to 20 h. The second simulation image sequence (Δt = 4 h) provides the larger displacement vector field. The AAE and AME results for the motion fields estimated by the OFEMCC, H–S (new), and 2D-CLG methods on the Yosemite and simulation datasets are shown in Tables 1 and 2.
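The error measures of Section 4.1 are straightforward to compute; the sketch below implements AAE, AME, IE, and PSNR for dense fields (NumPy, names ours; the arccos argument is clipped for numerical safety).

```python
import numpy as np

def aae(u, v, ut, vt):
    """Average angular error (degrees), velocities lifted to (u, v, 1)."""
    num = u * ut + v * vt + 1.0
    den = np.sqrt(u**2 + v**2 + 1.0) * np.sqrt(ut**2 + vt**2 + 1.0)
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))).mean()

def ame(u, v, ut, vt):
    """Average magnitude error (pixels)."""
    return np.hypot(u - ut, v - vt).mean()

def interpolation_error(I_mci, I_gt):
    """Root-mean-square interpolation error (IE)."""
    return np.sqrt(np.mean((I_mci.astype(float) - I_gt.astype(float))**2))

def psnr(I_pred, I_ref):
    """Peak signal-to-noise ratio for 8-bit imagery."""
    mse = np.mean((I_pred.astype(float) - I_ref.astype(float))**2)
    return 10.0 * np.log10(255.0**2 / mse)
```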

Fig. 6. (a) An average of the vector field generated by the simulation model is superimposed on the image at time t1 = 16 h. (b) A vector field estimated by OFEMCC using data with Δt = 4 h is superimposed on the image at time t2 = 20 h.


Table 1
Average angular and magnitude errors between the results [25,41] from the different methods with 100% density. Data refer to the Yosemite sequence without clouds.

Method    AAE     AME
H–S       2.64°   0.140
2D-CLG    2.64°   0.10
OFEMCC    3.66°   0.411

Table 2
Average angular and magnitude errors between the results from the different methods with 100% density. Data refer to the ocean simulation model.

Method    AAE (Δt = 2 h)   AME (Δt = 2 h)   AAE (Δt = 4 h)   AME (Δt = 4 h)
H–S       10.2°            0.374            14.9°            0.965
2D-CLG    11.4°            0.431            16.8°            1.09
OFEMCC    6.30°            0.221            6.56°            0.390

The numerical model ground truth vectors (Δt = t₂ − t₁ = 4 h) are shown in Fig. 6a. For comparison, the vectors estimated by OFEMCC with block size parameter n = 2 are shown in Fig. 6b. The false-color images (192 × 192 pixels) at times t = 16 h and 20 h appear in the background of Fig. 6a and b, respectively. The two velocity vector fields in Fig. 6 are in good agreement, each showing a distribution of eddies with diameters on the order of 25 pixels, and more gently curved tendril structures with larger radii of curvature.

We show the AAE (Fig. 7) and AME (Fig. 8) results as functions of the block size parameter n for OFEMCC, and of the weighting parameter α for the H–S and 2D-CLG methods, using the two simulation image sequences with Δt = 2 and 4 h. The tests with the Yosemite and simulation data using the three estimators exhibit two different outcomes: the methods with Horn and Schunck's regularization (smoothness) constraint agree well with the Yosemite data, while the proposed method agrees well with the simulation data.

As mentioned above, most image scenes, such as Yosemite (around the waterfalls) and the Middlebury data, contain featureless regions. The motion fields within these regions are not unique, because the initial and final positions of a moving particle cannot be physically observed and determined in a featureless region (the aperture, or ill-posed, problem). A stronger additional (smoothness) constraint, such as Horn and Schunck's regularization, may provide a retrieved motion field that agrees well with the physical (ground truth) motion field if most of the motions are simple zoom in/out, translational, or close to translational, as in Fig. 5. However, the smoothness constraint carries penalties, demonstrated here with the simulation image sequence for complicated rotational and deformational motion fields: higher-frequency spatial variability, or flow field gradients, may be filtered out by the smoothness constraint and processing. The information lost through convolution and stronger smoothness constraints cannot be evaluated correctly by test cases whose flow fields lack spatial variability.

The curves for H–S and 2D-CLG in Figs. 7 and 8 show that the AAE and AME are very sensitive to the value of the weighting parameter; it is therefore difficult to find a fixed optimal value of the parameter in real-world applications where the ground truth flow field is unknown. The performance demonstration with both smaller and larger displacement vector fields in Figs. 7 and 8 indicates that the linear approximation used in the optical flow equation is valid only for estimating smaller displacement fields. The growth of the AAE and AME errors as the displacement scale increases reflects the nonlinearity of the displacement vector field.

4.3. Frame interpolation test

The frame interpolation test with the Middlebury dataset is shown in Table 3. The IE values in Table 3 are also plotted in bar-chart form (Fig. 9) for quick comparison. The images interpolated using the ground truth (GT) flow field and the flow fields estimated by H–S, 2D-CLG, BMA, and OFEMCC are compared with the intermediate (ground truth) frames between frames 10 and 11. Two examples of interpolated Venus and Urban2 frames (using the ground truth flow fields, the motion fields estimated by OFEMCC, and the original intermediate frames) are shown in Figs. 10 and 11. These interpolated frames are designated #10 + 1/2 in the figures. The PSNR values of the images interpolated by the ground truth and OFEMCC flows are labeled in Figs. 10 and 11. Comparing the MCI images synthesized from the ground truth flows and the OFEMCC flows in Figs. 10 and 11, we see that the distortions caused by the ground truth flows occur in the same regions labeled in Fig. A.1 (Appendix A). By the IE measure, the frames interpolated by the proposed method for all eight Middlebury datasets are better than those of all other methods, including the ground truth flow field.

Fig. 7. Plots of error measurement generated by the estimators with ocean simulation model data at time t1 = 18 h and t2 = 20 h: (a) AAE and (b) AME vs. block size parameter n for the OFEMCC and for the H–S and 2D-CLG with weighting parameter α.


Fig. 8. Plots of error measurement generated by the estimators with ocean simulation model data at time t1 = 16 h and t2 = 20 h: (a) AAE and (b) AME vs. block size parameter n for the OFEMCC and for the H–S and 2D CLG with weighting parameter α.

The larger errors in the flow fields estimated by the H–S and 2D-CLG methods for large-scale displacements, such as the Urban2 and Urban3 datasets, confirm that the linear optical flow equation is valid only for motions with small-scale displacements. The frame interpolation test also confirms the error issue in the ground truth flows of the Middlebury datasets discussed in Appendix A: the observable distortions in the MCI images synthesized from the ground truth flows, together with the error evaluations in Table 3, provide sufficient experimental evidence for this conclusion.

4.4. Comparison of linear and nonlinear models

The theoretical analysis and benchmark comparison between the DFD equation and the optical flow equation indicate that the former outperforms the latter, especially for large-scale displacement. The performance differences in the frame interpolation test of the previous subsection, using Middlebury datasets with small and large displacement scales, confirm this conclusion. In what follows, we demonstrate the performance differences in frame interpolation between the linear and nonlinear equations on the same image sequences over a range of displacement scales.

models indicate that the moving objects (white taxi in the scenes) derived by the motion fields do not have the common distortions, artifacts, and dirty window effects for this small-scale displacement motion except for the BMA method. The MCP and MCI images using the motion fields estimated by the four methods from the larger displacement sequences are shown in Fig. 13. There are nine taxi frames in these sequences. The motion fields are estimated by the BMA and OFEMCC methods using the block size = 4 × 4. The artifacts, dirty window effects, and distortions of the moving object can be observed on the MCP and MCI images using the BMA, H–S, and 2D-CLG derived motion fields from the large-scale motion as shown in Fig. 13. The synthesized MCP and MCI images demonstrate that the OFEMCC method yields overall more accurate predictions and interpolations without neither artifacts nor dirty window effects, including the observable distortion in comparison with the BMA, H–S and 2D-CLG methods as shown in Fig. 13. As seen in Fig. 13, it is impossible to warp the last frame to the first frame employing the linear optical flow equation for large displacement estimation. Comparing the relative positions between the tail of the white taxi and the parked white beetle on the MCI 9 and MCP 1 images by the H–S and 2D-CLG methods as well as the original images, we can find that the white taxi in the MCP 1 image appears not to move, but remains in the MCI 9 position during the entire sequence. The MCP and MCI images, using the motion field estimated by the BMA method, exhibit artifacts and dirty window effects only, but without observable distortion. The BMA method is based on two assumptions: 1. brightness conservation constraint (using the DFD equation), and, 2. uniform displacement field in a block template for the matching algorithm. The uniform displacement field assumption in a block template for the BMA approach is not always validated for all motion cases. For a simple example, the BMA method cannot handle the cases in

Table 3
Evaluation results of interpolation error (IE) and Peak Signal-to-Noise Ratio (PSNR) on the Middlebury dataset (gray images).

IE/PSNR (dB)   Dimetrodon   Grove2      Grove3      Hydrangea   RubberWhale   Urban2      Urban3      Venus
GT Flow        2.73/39.4    7.49/30.6   13.1/25.8   9.41/28.7   2.63/39.7     4.52/35.0   6.52/31.8   6.87/31.4
H–S            2.66/39.6    8.16/29.9   14.5/24.9   5.45/33.4   2.55/40.0     11.6/26.9   8.81/29.2   6.02/32.5
2D-CLG         2.66/39.6    8.26/29.8   14.5/24.9   5.55/33.2   2.59/39.9     11.6/26.9   8.89/29.1   5.80/32.9
BMA            3.69/36.8    8.32/29.7   13.5/25.5   6.93/31.3   2.83/39.1     5.09/34.0   7.53/30.6   7.60/30.5
OFEMCC         2.28/41.0    5.80/32.9   8.67/29.4   3.75/36.7   1.81/43.0     3.12/38.2   4.11/35.9   4.00/36.1


Fig. 9. The interpolation error for all eight Middlebury datasets between the ground truth frames and the interpolated frames using the ground truth (GT) flow and the flow fields estimated by the H–S (new), 2D-CLG, BMA, and OFEMCC.

As a simple example, the BMA method cannot handle cases in which multiple motions with different speeds and directions occur within one block template, because such complicated motions violate the uniform-motion assumption. Nevertheless, the moving white taxi in the BMA MCP and MCI images is correctly located spatially, although artifacts and dirty window effects are present. The frame interpolation tests in Figs. 12 and 13 provide experimental evidence that motion estimation with the direct temporal integral of the brightness conservation constraint (the DFD equation) performs more accurately overall than the optical flow equation, especially for large-scale displacement motion.

4.5. Motion-compensated compression

Two standard image sequences, Foreman and Army, are employed to demonstrate the performance of removing temporal redundancy in image sequences. The motion field is estimated from the first and last image frames of a multi-frame sequence, and all intermediate frames are dropped. The MCP and MCI motion pictures are then reconstructed by the motion-compensated techniques using the estimated motion field. The PSNR is used to evaluate the distortion between the interpolated and original images.

As in the BMA method, a major feature of the proposed framework is that the full, continuous motion field can be generated from the smaller number of on-node velocity vectors of Fig. 2, in contrast to other approaches, so fewer motion field parameters need be stored or transmitted for digital video coding. However, the motion field estimated by the OFEMCC method is continuous between blocks and much more accurate than that of the BMA. To demonstrate this feature, we choose a block size of 8 × 8 for these two test cases.

The original Foreman frames 28 to 32 are shown in the first row of Fig. 14, and the MCP and MCI frames produced by OFEMCC in the second row. The flow fields are estimated by the OFEMCC, H–S, 2D-CLG, and BMA methods from the red-band frames 28 and 32 (a five-frame sequence). The block size parameter n equals eight (8 × 8). The total number of displacement components (N_x × N_y × 2/n²) for transmission or storage by OFEMCC and BMA is 3168; for the H–S and 2D-CLG methods it is 202752.

The total compression ratio (CR) is the product of the spatial CR_s and the temporal CR_t if we do not consider spectral compression. Accounting for the two components of the motion vector (the factor 2 in the denominator), the temporal compression ratio is

$$CR_t = \frac{bM}{b + \frac{2}{n^2}},$$

where b, M, and n are the number of bands of the image (color b = 3, gray b = 1), the total number of frames in the sequence (from first to last), and the block size parameter, respectively. When computing CR_t for the different techniques, the parameters b and M are the same; the block size parameter n is set to 1 for the other techniques and is greater than 1 for the proposed and BMA techniques. The total compression ratio in the Foreman test case (b = 3, M = 5, and n = 8) is thus

$$CR = CR_t \times CR_s \approx 5 \times CR_s.$$

Using the proposed framework, we therefore have great potential to obtain a larger compression ratio with acceptable distortion, depending on the spatial compression ratio CR_s of the video coder. Similarly, we applied the proposed estimator to the Army image sequence (584 × 388), frames 8 to 13. The flow field is estimated from the red-band frames 8 and 13. The original frames and the color frames interpolated with this single flow field are depicted in Fig. 15. The temporal CR_t in the Army test case (b = 3, M = 6, and n = 8) is approximately 5.94.
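The temporal compression ratio is easy to verify numerically; this snippet reproduces the CR_t values quoted for Foreman, Army, and the Taxi sequence of Fig. 13.

```python
def temporal_cr(b, M, n):
    """Temporal compression ratio CR_t = b*M / (b + 2/n^2)."""
    return b * M / (b + 2.0 / n**2)

print(round(temporal_cr(3, 5, 8), 2))   # Foreman: 4.95
print(round(temporal_cr(3, 6, 8), 2))   # Army:    5.94
print(round(temporal_cr(1, 9, 4), 2))   # Taxi:    8.0
```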

Fig. 10. Venus images (420 × 380), Left: interpolation image using ground truth flow; Center: original (ground truth) image frame # 10 + 1/2; Right: interpolation image using the flow estimated by the OFEMCC.


Fig. 11. Urban2 images (640 × 480), Left: interpolation image using ground truth flow; Center: original (ground truth) image frame # 10 + 1/2; Right: interpolation image using the flow estimated by the OFEMCC.

Fig. 12. White taxi image sequences (96 × 72 × 3) and interpolated frames using the motion fields estimated by the OFEMCC, BMA, H–S, and 2D-CLG methods. The PSNR values labeled on the figure are the average of the MCP and MCI images compared to the original images, respectively.

Fig. 13. White taxi image sequences (96 × 72 × 9) and interpolated frames using the motion fields estimated by the OFEMCC, BMA, H–S, and 2D-CLG. The PSNR values labeled on the figure are averages of the MCP and MCI images compared to the original images, respectively.


Fig. 14. Original Foreman image (352 × 288) sequences (frames 28–32), and interpolated frames using the flow fields estimated by the OFEMCC (block size: 8 × 8), H–S, 2D-CLG, and BMA (block size: 8 × 8).

The temporal compression ratios, mean PSNR values, and block size parameters n for the Foreman, Army, and Taxi datasets obtained by the different methods are listed in Table 4. The parameters in the temporal compression ratio for the Taxi dataset of Fig. 13 are b = 1, M = 9, and n = 4. Visual comparison of the synthesized and original images indicates that the proposed framework yields accurate predictions and interpolations overall, without any artifacts or dirty window effects, in contrast with the other three methods, as shown in Figs. 12 and 16. Moreover, the variable block sizes for different applications can reduce the stored and transmitted motion field parameters and increase the temporal compression ratio significantly, as shown in Table 4. The single target function of the MSE or PSNR yields optimal motion-compensated prediction and interpolation images for digital video compression applications.

Fig. 15. The synthesized Army motion pictures using the flow field estimated by the OFEMCC with block size 8 × 8.

Table 4
Examples of temporal compression ratios, mean PSNR values, and block size parameters by different motion estimation methods.

{CR_t, PSNR (dB), n}   Foreman            Army               Taxi
H–S                    {3, 27.6, 1}       {3.6, 30.5, 1}     {3, 21.4, 1}
2D-CLG                 {3, 26.6, 1}       {3.6, 26.3, 1}     {3, 20.6, 1}
BMA                    {4.95, 28.3, 8}    {5.94, 31.3, 8}    {8, 23.4, 4}
OFEMCC                 {4.95, 29.5, 8}    {5.94, 34.1, 8}    {8, 32.9, 4}

4.6. Performance of the PROC algorithm

The adaptive framework employs the PROC algorithm to improve the performance of the optical flow estimation. The consistency of neighborhood vectors in featureless regions at a fixed pixel point can be evaluated by the criterion defined in Section 3.3. Here, we demonstrate the difference with and without the PROC algorithm by the frame interpolation and the consistency criterion, using the Taxi sequence (96 × 72 × 3). An image of the intensity gradient for Taxi frame 1 and its histogram are shown in Fig. 16. The histogram of the gradients indicates that the Taxi images contain large featureless areas.

The motion fields for the Taxi image sequence (frames 1 and 3) are derived with and without PROC. Two intermediate frames are synthesized by the MCI technique employing these motion fields, as shown in Fig. 17. The consistency criteria C with r = 1, with and without the PROC algorithm, are 1.23 and 2.25, respectively. The higher performance of the frame interpolation with the PROC algorithm corresponds to the lower value of C, consistent with the analysis of the PROC algorithm in Section 3.3. Clearly, the PROC algorithm helps the nonlinear system approach a global optimal solution and seek a motion field in which each vector is consistent with its neighbors in featureless regions. The distortions of the MCI images, evaluated by PSNR, are 35.1 dB and 31.7 dB, respectively. The dirty window effect can be seen in the MCI image inferred from the motion field without the PROC algorithm, as shown in Fig. 17. The comparison of the PSNR values of the two MCI images in Fig. 17 indicates that the distortion of the image derived with the PROC algorithm is reduced dramatically.

Fig. 16. An image of the intensity gradient of Taxi frame 1 and a histogram of the gradients.

Fig. 17. Two MCI images using the motion fields with and without the PROC algorithm.

5. Conclusion

In this paper, we have presented an adaptive framework for solving the optical flow problem for motion-compensated compression. Using the nonlinear DFD equations, velocity field modeling, and a least-squares model, we formulate iterative equations based on the Gauss–Newton and Levenberg–Marquardt algorithms. We also propose an algorithm for the progressive relaxation of the over-constraint on the flow field, in which each vector becomes consistent with its neighbors.

The overarching goal of optical flow estimation is to make the observed (actually tracked) motion consistent with the physical one (i.e., the ground truth). However, these two motion fields are usually inconsistent, especially within featureless image regions. Since the displacement vector of a moving particle in a featureless image scene cannot be physically observed, measured, or determined (an ill-posed problem), the solution of this physical system is not unique. Imposing a stronger smoothness constraint and spatial/temporal convolution processing may regulate the estimated flow field to match some special test cases in which the optical flow has weak spatial variability, as shown in Fig. 5. However, the smoothness constraint may not hold for all test cases. The penalties for using the stronger smoothness constraint and processing have been demonstrated through the simulation image sequences with complicated rotational and deformational motion fields in Fig. 6. Higher-frequency spatial variability of the flow field may be filtered out by the smoothness constraints and processing. The information lost through convolution and stronger smoothness constraints cannot be adequately evaluated by test cases whose flow fields lack spatial variability. In this regard, image sequences with simple flow fields do not provide sufficiently challenging tests of the capability of the system proposed in this paper.


For motion-compensated compression, how well an estimated optical flow matches the physical motion is important; additional concerns are how well the synthesized motion pictures exhibit minimal distortion, freedom from artifacts and dirty window effects, and a low motion vector coding cost. The estimated flow field in this paper is based on a single minimized target function, which leads to optimized motion-compensated predictions for motion-compensated compression applications without any penalty parameters. With almost the same number of displacement vectors at a fixed block size for both the proposed method and BMA, the proposed method provides much more accurate performance than the BMA, which is the currently adopted standard for video coding.

The benchmark tests indicate that for flow fields with stronger spatial variability (the simulation data), the new framework outperforms the other tested methods. In the frame interpolation tests, the frames interpolated by the proposed method for all eight Middlebury datasets, as measured by the IE, are better than those of all other methods, including the ground truth flow field. The frame interpolation tests also provide experimental evidence that the nonlinear DFD equation performs more accurately overall than the optical flow equation, especially for large-scale displacement motion. In the demonstration of motion-compensated compression, the MCP and MCI images synthesized by the proposed framework yield more accurate predictions and interpolations overall, with no artifacts or dirty window effects, in contrast with the other three methods. In summary, the proposed framework offers great potential for larger compression ratios with acceptable distortion in video coding.

Acknowledgements

This research work was supported by the Office of Naval Research through the project WU-4279-02 at the Naval Research Laboratory.

Appendix A. Error analysis for Middlebury ground truth

To assess how accurate ground truth motion fields such as those of the Middlebury datasets [41] really are, we employ a simple test: we measure the distortion of an image synthesized from the ground truth flow field by motion-compensated prediction. We use the ground truth motion fields Δr to synthesize the Venus and Urban2 images (I(r, t1) ← I(r + Δr, t2)), as shown in Fig. A.1. Residues are clearly visible in the synthesized images. These distortions in the MCP images indicate that the ground truth flow fields in the Middlebury datasets introduce large and observable errors.

Fig. A.1. The residue in MCP images synthesized by the ground truth flow field indicates that the ground truth flow fields in Middlebury introduce large and observable errors.
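This check is straightforward to reproduce. A minimal sketch of the backward-warping prediction I(r, t1) ← I(r + Δr, t2) with bilinear sampling follows; the residue between the true and predicted frames then exposes errors in the supplied flow field. The array layout and names (mcp_from_flow, a flow array with (dx, dy) channels) are assumptions, and occlusions or unknown-flow sentinels present in real ground truth data would need explicit masking.

import numpy as np

def mcp_from_flow(I2, flow):
    # Predict frame 1 by sampling frame 2 at the flow-displaced positions:
    # I_hat(r, t1) = I(r + dr, t2). flow has shape (H, W, 2) with channels
    # (dx, dy); sampling is bilinear, clamped at the image borders.
    H, W = I2.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    xq = np.clip(xs + flow[..., 0], 0.0, W - 1.001)
    yq = np.clip(ys + flow[..., 1], 0.0, H - 1.001)
    x0, y0 = xq.astype(int), yq.astype(int)
    fx, fy = xq - x0, yq - y0
    I2f = I2.astype(float)
    return ((1 - fy) * ((1 - fx) * I2f[y0, x0] + fx * I2f[y0, x0 + 1])
            + fy * ((1 - fx) * I2f[y0 + 1, x0] + fx * I2f[y0 + 1, x0 + 1]))

# Residue map: nonzero values flag pixels where the supplied flow fails
# to reproduce frame 1 from frame 2.
# residue = np.abs(I1.astype(float) - mcp_from_flow(I2, flow_gt))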

References

[1] J.L. Barron, D.J. Fleet, S.S. Beauchemin, Performance of optical flow techniques, Int. J. Comput. Vis. 12 (1) (1994) 43–77.
[2] C. Stiller, J. Konrad, Estimating motion in image sequences: a tutorial on modeling and computation of 2D motion, IEEE Signal Process. Mag. 16 (4) (1999) 70–91.
[3] B. Horn, B. Schunck, Determining optical flow, Artif. Intell. 17 (1981) 185–203.
[4] B.D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, Proc. DARPA Image Understanding Workshop, 1981, pp. 121–130.
[5] B.D. Lucas, Generalized image matching by the method of differences, PhD thesis, Carnegie Mellon Univ., 1984.
[6] J. Bigun, G.H. Granlund, J. Wiklund, Multidimensional orientation estimation with applications to texture analysis and optical flow, IEEE Trans. Pattern Anal. Mach. Intell. 13 (8) (1991) 775–790.
[7] L. Alvarez, J. Esclarín, M. Lefébure, J. Sánchez, A PDE model for computing the optical flow, Proc. XVI Congreso de Ecuaciones Diferenciales y Aplicaciones, Las Palmas de Gran Canaria, Spain, 1999, pp. 1349–1356.
[8] G. Aubert, R. Deriche, P. Kornprobst, Computing optical flow via variational techniques, SIAM J. Appl. Math. 60 (1) (1999) 156–182.
[9] M.J. Black, P. Anandan, The robust estimation of multiple motions: parametric and piecewise smooth flow fields, Comput. Vision Image Underst. 63 (1) (1996) 75–104.
[10] F. Heitz, P. Bouthemy, Multimodal estimation of discontinuous optical flow using Markov random fields, IEEE Trans. Pattern Anal. Mach. Intell. 15 (12) (1993) 1217–1232.
[11] A. Kumar, A.R. Tannenbaum, G.J. Balas, Optic flow: a curve evolution approach, IEEE Trans. Image Process. 5 (4) (1996) 598–610.
[12] H.H. Nagel, Constraints for the estimation of displacement vector fields from image sequences, Proc. Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, vol. 2, 1983, pp. 945–951.
[13] H.H. Nagel, W. Enkelmann, An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1986) 565–593.
[14] H.H. Nagel, Extending the 'oriented smoothness constraint' into the temporal domain and the estimation of derivatives of optical flow, in: O. Faugeras (Ed.), Computer Vision — ECCV '90, Lecture Notes in Computer Science, vol. 427, Springer, Berlin, 1990, pp. 139–148.
[15] P. Nesi, Variational approach to optical flow estimation managing discontinuities, Image Vision Comput. 11 (7) (1993) 419–439.
[16] F. Glazer, et al., Scene matching by hierarchical correlation, Proc. IEEE Comput. Vision Pattern Recognition Conf., Washington, DC, June 1983.
[17] H. Ghanbari, M. Mills, Block matching motion estimation: new results, IEEE Trans. Circuits Syst. 37 (1990) 649–651.
[18] V. Seferidis, M. Ghanbari, General approach to block-matching motion estimation, Opt. Eng. 32 (July 1993) 1464–1474.
[19] J. Shi, C. Tomasi, Good features to track, Proc. CVPR, 1994, pp. 593–600.
[20] J. Konrad, E. Dubois, Estimation of image motion fields: Bayesian formulation and stochastic solution, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Apr. 1988, pp. 1072–1075.
[21] M. Proesmans, et al., Determination of optical flow and its discontinuities using non-linear diffusion, in: J.-O. Eklundh (Ed.), Computer Vision — ECCV '94, Lecture Notes in Computer Science, vol. 801, Springer, Berlin, 1994, pp. 295–304.
[22] J. Weickert, C. Schnörr, A theoretical framework for convex regularizers in PDE-based computation of image motion, Int. J. Comput. Vis. 45 (3) (2001) 245–264.
[23] D.J. Fleet, A.D. Jepson, Computation of component image velocity from local phase information, Int. J. Comput. Vis. 5 (1) (1990) 77–104.
[24] B. Galvin, B. McCane, K. Novins, D. Mason, S. Mills, Recovering motion fields: an analysis of eight optical flow algorithms, Proc. 1998 British Machine Vision Conference, Southampton, England, 1998.
[25] A. Bruhn, J. Weickert, C. Schnörr, Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods, Int. J. Comput. Vis. 61 (3) (2005) 211–231.

[26] N. Papenberg, A. Bruhn, T. Brox, S. Didas, J. Weickert, Highly accurate optic flow computation with theoretically justified warping, Int. J. Comput. Vis. 67 (2) (2006) 141–158.
[27] S. Uras, F. Girosi, A. Verri, V. Torre, A computational approach to motion perception, Biol. Cybern. 60 (1988) 79–87.
[28] W. Chen, R.P. Mied, C.Y. Shen, Near-surface ocean velocity from infrared images: global optimal solution to an inverse model, J. Geophys. Res. 113 (2008) C10003, http://dx.doi.org/10.1029/2008JC004747.
[29] W. Chen, A global optimal solution with higher order continuity for the estimation of surface velocity from infrared images, IEEE Trans. Geosci. Remote Sens. 48 (4) (2010) 1931–1939.
[30] W.B. Thompson, Exploiting discontinuities in optical flow, Int. J. Comput. Vis. 30 (3) (1998) 163–173.
[31] J.D. Robbins, A.N. Netravali, Recursive motion compensation: a review, in: T.S. Huang (Ed.), Image Sequence Processing and Dynamic Scene Analysis, Springer-Verlag, Berlin, Germany, 1983, pp. 76–103.
[32] C. Cafforio, F. Rocca, The differential method for motion estimation, in: T.S. Huang (Ed.), Image Sequence Processing and Dynamic Scene Analysis, Springer-Verlag, New York, 1983, pp. 104–124.
[33] D.R. Walker, K.R. Rao, Improved pel-recursive motion compensation, IEEE Trans. Commun. COM-32 (Oct. 1984) 1128–1134.
[34] J. Shen, W.-Y. Chan, A novel code excited pel-recursive motion compensation algorithm, IEEE Signal Process. Lett. 8 (4) (April 2001).
[35] D.J. Fleet, A.D. Jepson, Computation of component image velocity from local phase information, Int. J. Comput. Vis. 5 (1) (1990) 77–104.
[36] Z. Wei, D.-J. Lee, B.E. Nelson, J.K. Archibald, B.B. Edwards, FPGA-based embedded motion estimation sensor, Int. J. Reconfig. Comput. 2008 (2008), Article ID 636135.
[37] G. Botella, A. García, M. Rodríguez-Álvarez, E. Ros, U. Meyer-Baese, M.C. Molina, Robust bioinspired architecture for optical flow computation, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (4) (April 2010) 616–630, http://dx.doi.org/10.1109/TVLSI.2009.2013957.
[38] D. González, G. Botella, S. Mookherjee, U. Meyer-Bäse, A. Meyer-Bäse, NIOS II processor-based acceleration of motion compensation techniques, Proc. SPIE 8058 (2011) 80581C, http://dx.doi.org/10.1117/12.883684.
[39] V. Mahalingam, K. Bhattacharya, N. Ranganathan, H. Chakravarthula, R.R. Murphy, K.S. Pratt, A VLSI architecture and algorithm for Lucas–Kanade-based optical flow computation, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (2010) 29–38.
[40] B. McCane, K. Novins, D. Crannitch, B. Galvin, On benchmarking optical flow, Comput. Vision Image Underst. 84 (2001) 126–143.
[41] S. Baker, D. Scharstein, J.P. Lewis, S. Roth, M.J. Black, R. Szeliski, A database and evaluation methodology for optical flow, Int. J. Comput. Vis. 92 (2011) 1–31, http://dx.doi.org/10.1007/s11263-010-0390-2.