Image and Vision Computing 31 (2013) 275–289
Optical flow estimation for motion-compensated compression

Wei Chen, Richard P. Mied
Naval Research Laboratory, Remote Sensing Division, Washington, DC 20375, United States
Article history: Received 29 May 2012; received in revised form 24 October 2012; accepted 22 January 2013.

Keywords: Optical flow; Optical flow determination; Optical flow estimation; Displacement estimation; Velocity estimation; Motion compensation; Motion-compensated prediction; Motion-compensated interpolation; Motion-compensated compression
Abstract

The computation of optical flow within an image sequence is one of the most widely used techniques in computer vision. In this paper, we present a new approach to estimate the velocity field for motion-compensated compression. It is derived by a nonlinear system using the direct temporal integral of the brightness conservation constraint equation, or Displaced Frame Difference (DFD) equation. To solve the nonlinear system of equations, an adaptive framework is used, which employs velocity field modeling, a nonlinear least-squares model, Gauss–Newton and Levenberg–Marquardt techniques, and an algorithm of the progressive relaxation of the over-constraint. The three criteria by which successful motion-compensated compression is judged are: 1) the fidelity with which the estimated optical flow matches the ground truth motion, 2) the relative absence of artifacts and "dirty window" effects in frame interpolation, and 3) the cost to code the motion vector field. We base our estimated flow field on a single minimized target function, which leads to motion-compensated predictions without incurring penalties in any of these three criteria. In particular, we compare our proposed algorithm results with those from Block-Matching Algorithms (BMA), and show that with nearly the same number of displacement vectors per fixed block size, the performance of our algorithm exceeds that of BMA on all three points. We also test the algorithm on synthetic and natural image sequences, and use it to demonstrate applications for motion-compensated compression.

Published by Elsevier B.V.
1. Introduction

The determination of object motion and its representation in sequential images has been studied in several disciplines, e.g., digital image processing, digital video coding, computer vision, and remote sensing data interpretation. Motion analysis, determination, and representation are crucial for the removal of temporal redundancy. A large compression ratio with high-fidelity motion picture quality requires accurate large-displacement, long temporal range motion estimation for efficient transmission of compressed image sequences. The creation of more efficient and effective algorithms for estimating optical flow is therefore very important. For these reasons, significant effort has been devoted to solving the optical flow estimation problem. To place this body of literature in a meaningful context, we present most of these works in a block diagram (Fig. 1). Almost all existing optical flow estimation models and their algorithms assume the image intensity obeys a brightness constancy constraint [3–5]. The inverse problem for estimating a velocity or displacement map (i.e., the optical flow) is found to be under-constrained,
because the two unknown velocity components must be derived from this single conservation equation at each pixel. To solve the under-constrained problem, several constraints on the displacement field, such as smoothness and other assumptions, have been proposed. Typical smoothness assumptions include Horn and Schunck's regularization constraint [3], a uniform velocity assumption in a block (or template) (Lucas and Kanade [4,5]; Shi and Tomasi [16]), an intensity-gradient conservation constraint (Nagel et al. [12–14]; Nesi [15]), modeling the motion field as a Markovian random field (Konrad and Dubois [20]), and velocity field modeling with bilinear or B-spline functions (Chen et al. [28,29]). Traditionally, scientists have focused their efforts on extending many other methods of estimating the optical flow [1–35]. Some of the algorithms have been implemented in hardware [36–39].

Most realistic image sequences in computer vision applications contain multiple objects moving against a static background and with respect to one another. That is, the velocity field can be discontinuous over the image. In order to handle the discontinuities at the transition boundaries between a static background and mobile objects, Chen et al. [28] proposed bilinear modeling of the motion field. This numerical model solves the under-constrained problem successfully. However, the optical flow equation is derived from a differential form of the conservation constraint (i.e., a first-order Taylor expansion), and is only valid for infinitesimal motion. Therefore, the motion field
Fig. 1. Block diagram representing literature research and the current proposed work (frames in yellow background). Highlighted in this figure is one of the numerical approaches for the flow field estimation proposed in this paper.
based on the optical flow equation can be estimated successfully only for small-displacement motion.

Three criteria are used to evaluate the derived optical flow: 1) comparison between the retrieved and the ground truth optical flow fields, 2) frame interpolation, and 3) performance of motion-compensated compression. An evaluation using only the first method with some special datasets may not provide a sufficiently stringent test of the capability of high-performance estimators. The comparison with ground truth optical flow depends on the type of motion, the texture morphology, and the scale of displacement. The most important feature of motion-compensated compression is not only how well an estimated optical flow matches the physical motion. Equally important are how well the motion pictures are synthesized with minimized distortion, without artifacts and dirty window effects, and the low coding cost of the motion vector field. Although the last performance test is the most important for a variety of applications in computer vision, a successful motion estimator should demonstrate excellent performance on all three tests over a range of displacement scales from small to large. Many motion estimators and video compression algorithms perform far from optimally by themselves; motion estimation and video compression techniques must also interface compatibly with each other.

The estimator adopted in the international standards for digital video compression is the block-matching algorithm (BMA) [16–19] (or overlapped BMA). Compared with ground truth optical flow, the BMA method is less accurate for flow field estimation, but performs better for motion-compensated prediction (MCP) and interpolation (MCI) in realistic video coding applications. Current popular approaches based on the optical flow equation may outperform the BMA methods in the optical flow comparison test for some specific datasets, but cannot pass the overall tests. For this reason, the BMA method is still adopted as an estimator today.

The global (or energy-based) approaches usually employ the brightness constancy constraint combined with a prior constraint on the motion field with a weighting (penalty) parameter [1–3]. However, a major issue emerges when using a weighting parameter: choosing its optimal value. Several different weighting parameter values have been suggested [1,3,25,26], because the correct optimal value depends upon the specific ground truth flow field. In the present paper, we depart from the established weighting parameter approach. Instead, we employ a quantity derived from the nonlinear model and minimize it by varying a large number of unknown parameters (the average velocity or displacement field). Since there exist numerous local minima in image data applications, especially those having large featureless regions, we have found it necessary to develop new algorithms for solving the problem.
To improve the performance of the velocity estimation, especially for large displacement motion, we replace the standard differential form with a direct temporal integral of the optical flow conservation constraint (or Displaced Frame Difference (DFD)) equation, and create a nonlinear system. To solve the inverse problem of the flow field estimation, we propose an adaptive framework and employ more stringent performance criteria when applying the motion-compensated compression. Our numerical approach for the flow field estimation proposed in this paper is highlighted in Fig. 1. This paper develops a generic approach that can deliver high performance for both flow field estimation and motion-compensated compression. A difficulty we face is that a moving image scene necessarily contains both featureless and texture-rich regions. Our goal in this paper is to develop a single motion estimation technique, which incorporates the same formalism to treat both types of regions together.

This paper is organized as follows: In Section 2, a set of nonlinear system equations with the velocity field model is derived. Section 3 introduces algorithms for this estimator. In Section 4, we deal with the validation of the new algorithms by deriving velocity from synthetic tracer motion within a numerical ocean model, and apply the new technique to video image sequences. Finally, conclusions are drawn in the last section.

2. An over-constrained system

2.1. Brightness constancy constraint

If we designate I(x, y, t) as the intensity specified in (x, y) coordinates and time t, and the velocity vector of the optical flow is v(x, y, t) = (u(x, y, t), v(x, y, t))^T, we may write a differential form of the brightness constancy constraint (or optical flow) equation as

\[ \frac{dI(\mathbf{r}(t), t)}{dt} = \left( \frac{\partial}{\partial t} + \mathbf{v} \cdot \nabla \right) I(\mathbf{r}(t), t) = 0. \tag{1} \]
In order to constrain the image scenes at times t = t₁ and t = t₂, we integrate Eq. (1) from time t₁ to t₂:

\[ \int_{t_1}^{t_2} \frac{dI(\mathbf{r}(t), t)}{dt}\, dt = I(\mathbf{r}(t_2), t_2) - I(\mathbf{r}(t_1), t_1) \equiv 0, \]

where r(t₁) and r(t₂) are the position vectors at times t₁ and t₂. If a displacement vector field is defined by

\[ \Delta \mathbf{r} = \mathbf{r}(t_2) - \mathbf{r}(t_1) = \mathbf{v}\, \Delta t, \]
then the direct integral form of the brightness constancy constraint, or Displaced Frame Difference (DFD) equation, is given by

\[ \mathrm{DFD} = I(\mathbf{r}(t_1) + \Delta \mathbf{r},\ t_2) - I(\mathbf{r}(t_1),\ t_1) = 0, \tag{2} \]

where the time difference Δt is equal to t₂ − t₁.
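For concreteness, Eq. (2) can be evaluated numerically by sampling the second frame at the displaced, generally non-integer, positions; the bilinear sampler below plays the role of the n_x = n_y = 1 interpolation described later in Section 3.1. This is a minimal Python/NumPy sketch with our own function names and conventions, not the authors' code.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at real-valued coordinates (x, y); coordinates near the
    border are clamped so the four neighboring pixels always exist."""
    x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] +
            fx * (1 - fy) * img[y0, x0 + 1] +
            (1 - fx) * fy * img[y0 + 1, x0] +
            fx * fy * img[y0 + 1, x0 + 1])

def dfd(I1, I2, u, v, dt=1.0):
    """Displaced Frame Difference of Eq. (2): I(r + v*dt, t2) - I(r, t1)."""
    ny, nx = I1.shape
    x, y = np.meshgrid(np.arange(nx), np.arange(ny))
    return bilinear_sample(I2.astype(float), x + u * dt, y + v * dt) - I1.astype(float)
```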
Obviously, the linear approximation (Eq. (1)) can be obtained from the DFD Eq. (2) for small Δr and Δt.

The differential form of the optical flow Eq. (1) is linear in the velocity; consequently it is valid only for infinitesimal, or nearly infinitesimal, motions. Using the two successive frames, we can estimate the partial derivatives with respect to time in Eq. (1). For this reason, Eq. (1) is more correctly viewed as describing the motion at an intermediate time between times t₁ and t₂. On the other hand, the two intensity terms in the DFD Eq. (2) correspond to the initial and final states of motion at times t₁ and t₂. It is clear that employing the DFD equation for motion estimation can achieve higher accuracy compared with the optical flow Eq. (1), especially for larger-scale displacement motions.

2.2. Velocity field modeling

Eq. (2) describes the evolution of optical flow in two successive images. An image has N_x × N_y pixels, but 2N_x × N_y unknown velocity components (u, v). To solve the under-constrained problem, one efficient approach is to expand the velocity field in bilinear polynomial functions or two-dimensional B-spline functions [28,29]. As a trade-off between simplicity and computational efficiency, we use bilinear polynomial functions to represent the velocity field [28]. We partition the image domain into a number of sub-domain blocks (or tiles), each of which contains an n_x × n_y array of pixels (Fig. 2). In general, any two-dimensional function can be approximated by Lagrange's bilinear function

\[ f(x, y) = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} f\left(p + \alpha n_x,\ q + \beta n_y\right) H_{p+\alpha n_x,\, q+\beta n_y}(x, y), \tag{3} \]

where the function H_{a,b}(x, y) is defined by

\[ H_{a,b}(x, y) = \frac{1}{n_x n_y} \begin{cases} (n_x - x + p)(n_y - y + q), & a = p \cap b = q \\ (x - p)(n_y - y + q), & a = p + n_x \cap b = q \\ (n_x - x + p)(y - q), & a = p \cap b = q + n_y \\ (x - p)(y - q), & a = p + n_x \cap b = q + n_y. \end{cases} \]

Fig. 2. The image array is divided into sub-arrays (or tiles) with n = n_x = n_y pixels per tile; here, n = 4. Node points (marked with solid squares) of the bilinear approximation are indexed with p and q.
The quantized indices p and q are functions of x and y and are given by

\[ \{p, q\} = \left\{ \left\lfloor \frac{x}{n_x} \right\rfloor n_x,\ \left\lfloor \frac{y}{n_y} \right\rfloor n_y \right\}, \]
where ⌊·⌋ denotes the integer (floor) operator. The {p, q} serve as block (tile) indices, since the integer operator increments them by unity after additional n_x or n_y pixels are counted. We denote a function with discrete variables i and j on a pixel as f_ij(t) = f(i, j, t) or f_ij = f(i, j). The two velocity components in an image scene can be expressed by the following bilinear polynomials:

\[ \mathbf{v}_{ij} = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} \mathbf{v}_{p+\alpha n_x,\, q+\beta n_y}\, H_{p+\alpha n_x,\, q+\beta n_y}(i, j), \tag{4} \]

where v_ij = v_ij(t₁) = (u_ij, v_ij)^T are the velocity component magnitudes.

The bilinear form within each block is actually quite adept at representing functions that may possess discontinuities. We demonstrate this with two examples of discrete functions, one with continuity and another with discontinuity. Their bilinear approximations (patched images) in Eq. (3) for block size parameters n_x = n_y = 2 are shown in Fig. 3. Comparing the original and patched images and the 3D-mesh plots in Fig. 3, we find that the bilinear expression in Eq. (3) can represent both continuous and discontinuous functions well.

All velocities can be calculated with Eq. (4) using the velocities on node points, v_pq. Velocity vectors v_ij = (u_ij, v_ij)^T for all i ≠ p or j ≠ q in the DFD equations are no longer independent variables when n = n_x = n_y > 1, except on node points. We can adjust the block size parameter n > 1 to control the number of interpolation points, which is related to the resolution of the velocity field and the degree of the over-constraint. The system is over-constrained if n > 1, because we have made a simplifying assumption about the form of the velocity field in Eq. (4). In pixel index notation, the DFD equation in Eq. (2) becomes

\[ \mathrm{DFD}_{ij} = I\left(i + u_{ij}\Delta t,\ j + v_{ij}\Delta t,\ t_2\right) - I_{ij}(t_1) = 0. \tag{5} \]

Eq. (5) is now a function of the two velocity components, which depend on the node point velocities through Eq. (4), i.e.,

\[ \mathrm{DFD}_{ij} = \mathrm{DFD}_{ij}\left(u_{ij}, v_{ij}\right) = \mathrm{DFD}_{ij}\left(u_{pq}, v_{pq}\right). \]
All independent DFD_ij equations contain only the smaller number of independent velocities on nodes when the velocity indices are i = p(i) and j = q(j). The total number of DFD_ij equations is N = N_x × N_y for an N_x × N_y image sequence. The number of node points shown in Fig. 2 is given by

\[ N_{\mathrm{node}} = \left( \left\lfloor \frac{N_x - 1}{n_x} \right\rfloor + 1 \right) \left( \left\lfloor \frac{N_y - 1}{n_y} \right\rfloor + 1 \right). \]

The total number of independent velocity field unknowns with the two components u_pq and v_pq is 2 × N_node. It is clear that this system is over-constrained, because the number of DFD_ij equations for all pixels is greater than the number of independent velocity components u_pq and v_pq (i.e., N > 2 × N_node) if the block sizes are n_x > 1 and n_y > 1. We can solve the over-constrained system using the nonlinear least-squares model described in the next section to estimate the velocity field u_pq and v_pq when the parameters n_x > 1 and n_y > 1.
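The node-to-pixel expansion of Eq. (4) is straightforward to implement. The sketch below assumes square tiles n = n_x = n_y and node arrays indexed by tile row and column (our convention, not the paper's); the bilinear weights are the H_{a,b} of Eq. (3) written out explicitly.

```python
import numpy as np

def dense_velocity(u_node, v_node, n, Nx, Ny):
    """Expand node-point velocities to a dense per-pixel field (Eq. (4))."""
    j, i = np.meshgrid(np.arange(Ny), np.arange(Nx), indexing='ij')
    p, q = i // n, j // n                        # tile indices (cf. {p, q})
    fx, fy = (i - p * n) / n, (j - q * n) / n    # fractional position in tile
    p1 = np.minimum(p + 1, u_node.shape[1] - 1)  # clamp at the last node column
    q1 = np.minimum(q + 1, u_node.shape[0] - 1)  # clamp at the last node row

    def interp(f):
        return ((1 - fx) * (1 - fy) * f[q, p] + fx * (1 - fy) * f[q, p1] +
                (1 - fx) * fy * f[q1, p] + fx * fy * f[q1, p1])

    return interp(u_node), interp(v_node)
```

The node arrays here have (⌊(N_y − 1)/n⌋ + 1) × (⌊(N_x − 1)/n⌋ + 1) entries, matching N_node above; only these 2 × N_node values are free parameters of the over-constrained system.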
Fig. 3. Two examples of functions (original images) and their approximations (down-sampled and patched images).
2.3. A nonlinear least-squares model

The presence of possible quantization errors and noise suggests the DFD_ij equations are never identically zero; however, we can choose a set of v_pq for which they are minimized in a least-squares sense. Accordingly, we define a cost function based on Eq. (5) as

\[ \mathrm{MSE} = \frac{1}{N} \sum_{i,j} \mathrm{DFD}_{ij}^2, \tag{6} \]
where i and j run over all pixels in the N = N_x × N_y image (i ∈ [0, N_x − 1] ∩ j ∈ [0, N_y − 1]). Minimizing the cost function MSE for given indices k and l for all node points in an image, we write the iterative equations for solving the optical flow field, based on the Gauss–Newton and Levenberg–Marquardt algorithms, as

\[ \mathbf{v}_{kl}^{(m+1)} = \mathbf{v}_{kl}^{(m)} - \left( A_{kl}^{(m)} \right)^{-1} B_{kl}^{(m)}, \tag{7} \]

where

\[ A_{kl}^{(m)} = \begin{pmatrix} (\lambda + 1) \displaystyle\sum_{i,j \in \Omega_{kl}} \left( \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \right)^2 & \displaystyle\sum_{i,j \in \Omega_{kl}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} \\ \displaystyle\sum_{i,j \in \Omega_{kl}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} & (\lambda + 1) \displaystyle\sum_{i,j \in \Omega_{kl}} \left( \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} \right)^2 \end{pmatrix} \]

and

\[ B_{kl}^{(m)} = \begin{pmatrix} \displaystyle\sum_{i,j \in \Omega_{kl}} \mathrm{DFD}_{ij}^{(m)} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial u_{kl}^{(m)}} \\ \displaystyle\sum_{i,j \in \Omega_{kl}} \mathrm{DFD}_{ij}^{(m)} \frac{\partial \mathrm{DFD}_{ij}^{(m)}}{\partial v_{kl}^{(m)}} \end{pmatrix}. \]

The Levenberg–Marquardt factor λ ≥ 0 is adjusted at each iteration to guarantee that the MSE converges. If a smaller value of the factor λ is used, the algorithm is closer to the Gauss–Newton method with second-order convergence. This Levenberg–Marquardt method can improve convergence properties greatly in practice, and has become the standard of nonlinear least-squares routines. More importantly, the summation domain is now reduced from the entire image plane to only a local region Ω_kl, so that

\[ \sum_{i,j} \rightarrow \sum_{i,j \in \Omega_{kl}} = \sum_{i=k-n_x+1}^{k+n_x-1} \; \sum_{j=l-n_y+1}^{l+n_y-1}. \]

Modeling the velocity field with our bilinear functions allows both continuous and discontinuous behaviors in the motion field (Fig. 3). In the interior of each block (within the n_x × n_y region), the expressions are C² continuous, but C¹ continuous on the block boundary. The expressions of the velocity or displacement field represent typical physical variations appropriately.

2.4. Maximum motion-compensated prediction

Most works in the literature use additional constraints to estimate the motion field and often address the inverse problem by minimizing an objective function containing a weighting (penalty) parameter and more than one cost term [1,3]. For example, a global cost function may be given by

\[ E_{\mathrm{global}} = E_{\mathrm{data}} + \alpha\, E_{\mathrm{prior}}. \]
Data energy statements of this sort are used to measure the errors of the DFD (DFD_ij = MCP_ij − I_ij(t₁)) or optical flow equations describing an image sequence. The prior energy term (E_prior) with a parameter α is an additional constraint, and α is optimized for the best fit between the ground truth and the solution. It is difficult to find a single optimal value of this parameter in realistic applications if the ground truth flow field is unknown. Moreover, the MCP is not optimized.
A Peak Signal-to-Noise Ratio (PSNR) may be used as an error measure between the MCP or MCI image and the original image; it is defined by

\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{\mathrm{MSE}} \right). \]

The iterations in Eq. (7) are derived from a least-squares principle and lead directly to a solution of the displacement field with a minimized target function MSE, or a maximized PSNR for the MCP. Since the bilinear velocity field modeling is determined by the partition of the image domain into blocks, the target function PSNR is a unique optimized function without any additional weighting parameter. Therefore, the MCP image using the estimated displacement field based on this over-constrained system can be optimized.

The adjustable block size approach, which uses a smaller number of velocity components on nodes to generate a fully dense velocity field, provides a powerful capability for motion-compensated compression. If the block size shown in Fig. 2 is n × n for images having dimension N_x × N_y, then the number of transmitted or stored motion vectors for the proposed and BMA estimators equals [(N_x − 1)/n + 1] × [(N_y − 1)/n + 1] and (N_x/n) × (N_y/n), respectively. Using almost the same number of displacement vectors for a fixed block size in both approaches, the current framework (a motion field with C¹ continuity obtained by a global optimization strategy) can provide much more accurate performance than the BMA (a motion field with C⁰ continuity obtained by local searching strategies), which is the currently adopted standard for video coding.

3. Numerical algorithms

Detailed implementations for this motion estimator include computation of the MCP function, partial derivative calculations, the progressive relaxation of the over-constraint algorithm, and the iteration procedures. These are described in this section.

3.1. Computation of motion-compensated prediction

The solution of the motion-compensated prediction I(i + u_ij Δt, j + v_ij Δt, t₂) may yield variables that fall between pixels in an image and must be evaluated by an interpolation function. In order to compute the motion-compensated predictions, the general bilinear interpolation function in Eq. (3) is utilized as follows:

\[ I\left(i + u_{ij}\Delta t,\ j + v_{ij}\Delta t,\ t_2\right) = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} I_{p+\alpha,\, q+\beta}(t_2)\, H_{p+\alpha,\, q+\beta}\left(i + u_{ij}\Delta t,\ j + v_{ij}\Delta t\right), \]

where the function H_{a,b}(x, y) is evaluated with n_x = n_y = 1 (the tile size in Fig. 2 for this interpolation function of the MCP equals one pixel), and {p, q} = {p(i + u_ij Δt), q(j + v_ij Δt)}.

3.2. Computation of the partial derivatives

The evaluation of the partial derivatives of the intensity with respect to velocity in Eq. (7) on a node point requires computation of the spatial gradient, which is given by

\[ \left\{ \frac{\partial \mathrm{DFD}_{ij}}{\partial u_{kl}},\ \frac{\partial \mathrm{DFD}_{ij}}{\partial v_{kl}} \right\} = \Delta t \left\{ \frac{\partial u_{ij}}{\partial u_{kl}} \left. \frac{\partial I\left(x,\ j + v_{ij}\Delta t,\ t_2\right)}{\partial x} \right|_{x = i + u_{ij}\Delta t},\ \frac{\partial v_{ij}}{\partial v_{kl}} \left. \frac{\partial I\left(i + u_{ij}\Delta t,\ y,\ t_2\right)}{\partial y} \right|_{y = j + v_{ij}\Delta t} \right\}. \]

The derivatives of velocity on a pixel with respect to velocity on a node point are given by

\[ \frac{\partial u_{ij}}{\partial u_{kl}} = \frac{\partial v_{ij}}{\partial v_{kl}} = \sum_{\alpha=0}^{1} \sum_{\beta=0}^{1} H_{kl}(i, j)\, \delta_{k,\, p+\alpha n_x}\, \delta_{l,\, q+\beta n_y}, \]

where δ is the Kronecker delta symbol. Image intensity possesses noise at the pixel level, and the calculation of these derivatives tends to increase this noise. In order to improve the computational accuracy of the numerical differentiation, we implement the partial differentiation with central differences (with mask coefficients {1, −8, 0, 8, −1}/12 or {−1, 9, −45, 0, 45, −9, 1}/60). After we evaluate the spatial derivatives of the intensity with respect to x and y on each pixel using the numerical differentiation method, we smooth the resulting gradient fields using a Gaussian low-pass filter with standard deviation from 0.375 to 1.125 pixels. Finally, using the general interpolation function in Eq. (3) with n_x = n_y = 1, we can calculate values of the spatial derivatives in the above equations at any position (x, y) = (i + u_ij Δt, j + v_ij Δt) in an image scene.
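As an illustration of this recipe, the following sketch (our SciPy-based rendering, with the five-point mask) computes the smoothed spatial gradients:

```python
import numpy as np
from scipy.ndimage import correlate1d, gaussian_filter

def smoothed_gradients(I, sigma=0.75):
    """Spatial intensity derivatives via the five-point central-difference
    mask {1, -8, 0, 8, -1}/12, then Gaussian low-pass smoothing (the paper
    quotes standard deviations between 0.375 and 1.125 pixels)."""
    d = np.array([1.0, -8.0, 0.0, 8.0, -1.0]) / 12.0  # weights at offsets -2..+2
    If = I.astype(float)
    Ix = correlate1d(If, d, axis=1, mode='nearest')   # d/dx (along columns)
    Iy = correlate1d(If, d, axis=0, mode='nearest')   # d/dy (along rows)
    return gaussian_filter(Ix, sigma), gaussian_filter(Iy, sigma)
```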
3.3. Progressive relaxation of the over-constraint

Since there are featureless regions in real-world image scenes, the estimated motion fields are not unique in most cases. This is problematic, because moving objects can be observed physically only through the changing texture morphology associated with motion in an image sequence. In contrast, the same physical motion may not be physically observed and measured in featureless regions. The determination of a displacement vector, or motion observation, within texture-rich or texture-poor morphologies is the well-known aperture problem [1,3]. The physical system, with unique or multiple possible displacements observed through this spatially variable texture field, allows the mathematical model to provide multiple solutions of the displacement vectors at a fixed pixel. Unfortunately, the linear optical flow equation yields only one solution in these featureless texture fields, which is almost certainly not the correct answer. On the other hand, the multiple motion solutions (roots) from the nonlinear DFD equation are far more likely to contain a correct answer consistent with the motion we actually track. Clearly, all featureless regions in an image sequence can produce a collection of motions, which are all possible candidates for the answer compatible with observation.

Our goal in this paper is to seek motion fields consistent with the actual physics rather than merely mathematically plausible answers delivered by the technique. This involves the challenging task of sorting through the multiple minima delivered by the nonlinear optimization; this is what we call the Global Optimal Solution (GOS). However, the derived GOS may not contain vectors which are mutually consistent with their neighbors in featureless regions. To remedy this potentially unrealistic result, we propose the Progressive Relaxation of the Over-Constraint (PROC) algorithm in this paper. Using the block size parameter n to control the degree of the over-constraint from higher to lower during the iteration procedure (PROC is detailed in the next subsection), we regulate the velocity field progressively to achieve a final valid solution. The initial value of the parameters n₀ = n_x = n_y is selected to be greater than n (the preset block size parameter representing a lower degree of over-constraint). We then progressively relax (decrease) the parameters n_x and n_y from their initially larger values to smaller ones, decreasing their value every Nth iteration, until they approach the preset value n.
A criterion for the consistency of a motion vector (v = Δr when Δt = 1) with its neighbors in featureless regions is defined by

\[ C = \frac{1}{N_c} \sum_{i,j} \begin{cases} \sqrt{ \displaystyle\sum_{k=i-r}^{i+r} \sum_{l=j-r}^{j+r} \left[ \left( u_{ij} - u_{kl} \right)^2 + \left( v_{ij} - v_{kl} \right)^2 \right] }, & G_{ij} \le \varepsilon \cap (k \ne i \cup l \ne j) \\ 0, & \text{otherwise}, \end{cases} \]

where G_ij is the gradient magnitude of the intensity I_ij on a pixel, used for texture morphology evaluation, and is given by

\[ G_{ij} = \sqrt{ \left( \frac{\partial I_{ij}}{\partial x} \right)^2 + \left( \frac{\partial I_{ij}}{\partial y} \right)^2 }; \]

N_c is the number of pixels in the featureless region where G_ij ≤ ε, ε is a threshold on the gradient G_ij, and r sets a pixel neighborhood around the fixed pixel (i, j). A smaller value of the criterion C corresponds to a higher consistency of the vector with its neighbors. The experimental evaluation of the PROC performance by this consistency criterion, and a more detailed discussion of the PROC, are given in Section 4.6.
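A direct transcription of the consistency criterion into Python follows (our reading of the formula; since the center pixel contributes zero to the sum, the k ≠ i ∪ l ≠ j exclusion is automatic):

```python
import numpy as np

def consistency(u, v, G, eps, r=1):
    """Consistency criterion C of Section 3.3: neighborhood velocity deviation
    accumulated over featureless pixels (gradient magnitude G <= eps),
    normalized by the number N_c of such pixels."""
    Ny, Nx = u.shape
    total, n_c = 0.0, 0
    for i in range(r, Ny - r):
        for j in range(r, Nx - r):
            if G[i, j] > eps:
                continue                         # texture-rich pixel: skipped
            du = u[i - r:i + r + 1, j - r:j + r + 1] - u[i, j]
            dv = v[i - r:i + r + 1, j - r:j + r + 1] - v[i, j]
            total += np.sqrt(np.sum(du ** 2 + dv ** 2))
            n_c += 1
    return total / max(n_c, 1)
```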
Fig. 4. Convergence properties of the PSNR vs. number of converged iterations using the PROC algorithm for an image sequence.
3.4. Iteration procedures

All initial velocity vectors (Δr = v when Δt = 1) are preset to 0.01 for most displacement scale ranges. When the displacement scale is very large, the initial values of the velocity vectors can be assigned from the vectors derived by the BMA method. The actual iteration calculation depends upon the block size parameter n, so the MSE in Eq. (6) may be written MSE(n). To start the iteration, we select a value of the Levenberg–Marquardt factor λ = 10⁻³ and n₀ = n + Δn (Δn > 0) for the PROC procedure, find MSE^(m)(n₀) from Eq. (6) with the initial velocity field, iterate Eq. (7) to a solution, and update the new MSE^(m+1)(n₀) using the newly iterated velocity field, where m is the index of the iteration. If this iteration does not converge, we increase λ by a factor of 10 and repeat the previous iteration until a converged velocity field is found. Otherwise, we continue iterating. As is typical of all Gauss–Seidel algorithms, Eq. (7) is updated with the current velocity vectors v_kl^(m) to enable calculation of v_kl^(m+1). The iteration proceeds using a fixed value of the block size n₀ and terminates when Δv_kl^(m) = v_kl^(m+1) − v_kl^(m) ≅ 0, or when λ or the number of iterations for the fixed block size n₀ approaches its maximum bound (λ_max = 10⁵). We then select n₀ = n₀ − 1 and iterate again, until n₀ = n. The velocity vectors corresponding to the global nadir in MSE(n) are the derived vectors, which are the best fit to the optical flow displacement in the image pair.

A typical convergence curve of PSNR versus the converged iteration index for estimating the velocity field using the PROC algorithm is shown in Fig. 4. The second-order convergence properties of the Gauss–Newton and Levenberg–Marquardt methods are evident in Fig. 4: the PSNR for the image sequence increases sharply within the first five iterations and approaches a static value of 43 dB after forty-four iterations.
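Putting the pieces together, the sketch below outlines this procedure, reusing dfd(), dense_velocity(), bilinear_sample(), and smoothed_gradients() from the earlier sketches. It is a simplified rendering under stated assumptions, not the authors' implementation: in particular, the Jacobian omits the per-pixel bilinear weight H_kl(i, j), and node values are re-sampled from the dense field each time n decreases.

```python
import numpy as np

def mse(D):
    """Mean-squared DFD residual, the target function of Eq. (6)."""
    return float(np.mean(D ** 2))

def lm_update(I1, I2, u_node, v_node, n, lam):
    """One Gauss-Seidel sweep of the node update of Eq. (7); the 2x2 system
    A_kl * step = B_kl is assembled from DFD residuals and image gradients
    over the local support Omega_kl of each node."""
    Ny, Nx = I1.shape
    u, v = dense_velocity(u_node, v_node, n, Nx, Ny)
    D = dfd(I1, I2, u, v)
    Ix, Iy = smoothed_gradients(I2)
    xg, yg = np.meshgrid(np.arange(Nx), np.arange(Ny))
    gx = bilinear_sample(Ix, xg + u, yg + v)   # dDFD/du at displaced positions
    gy = bilinear_sample(Iy, xg + u, yg + v)   # dDFD/dv at displaced positions
    nu, nv = u_node.copy(), v_node.copy()
    for q in range(u_node.shape[0]):
        for p in range(u_node.shape[1]):
            i0, i1 = max((p - 1) * n, 0), min((p + 1) * n, Nx)
            j0, j1 = max((q - 1) * n, 0), min((q + 1) * n, Ny)
            Ju = gx[j0:j1, i0:i1].ravel()
            Jv = gy[j0:j1, i0:i1].ravel()
            d = D[j0:j1, i0:i1].ravel()
            A = np.array([[(lam + 1.0) * (Ju @ Ju), Ju @ Jv],
                          [Ju @ Jv, (lam + 1.0) * (Jv @ Jv)]])
            B = np.array([d @ Ju, d @ Jv])
            if abs(np.linalg.det(A)) > 1e-12:  # skip degenerate (featureless) nodes
                step = np.linalg.solve(A, B)
                nu[q, p] -= step[0]
                nv[q, p] -= step[1]
    return nu, nv

def estimate_flow(I1, I2, n_final=4, n0=8, sweeps=30, lam_max=1e5):
    """Levenberg-Marquardt sweeps with lambda adaptation, inside the PROC
    schedule relaxing the block size from n0 down to n_final."""
    Ny, Nx = I1.shape
    u = np.full((Ny, Nx), 0.01)                # preset initial velocities
    v = np.full((Ny, Nx), 0.01)
    for n in range(n0, n_final - 1, -1):       # progressive relaxation
        u_node, v_node = u[::n, ::n].copy(), v[::n, ::n].copy()
        lam, err = 1e-3, mse(dfd(I1, I2, u, v))
        for _ in range(sweeps):
            cu, cv = lm_update(I1, I2, u_node, v_node, n, lam)
            du, dv = dense_velocity(cu, cv, n, Nx, Ny)
            new_err = mse(dfd(I1, I2, du, dv))
            if new_err < err:                  # accept; lean toward Gauss-Newton
                u_node, v_node, u, v, err = cu, cv, du, dv, new_err
            elif lam * 10.0 <= lam_max:        # reject; increase damping
                lam *= 10.0
            else:
                break
    return u, v
```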
4. Experiments

The performance test of the estimators includes a benchmark test for optical flow and an error evaluation of frame interpolation. The optical flow field estimated from an image sequence is compared with a ground truth flow field using average angular and magnitude errors as the benchmark tests. The error evaluation of frame interpolation is an indirect test, in which the comparison is between the ground truth image and the image interpolated by motion-compensated interpolation (MCI) using a ground truth or estimated flow field. The second evaluation method is adapted for applications of motion-compensated compression and frame rate up-conversion. In applications of motion-compensated compression, a large temporal compression ratio requires a motion field that spans several frames, so that all frames except a reference frame (the first or last) can be synthesized by the MCP or MCI based on the single estimated motion field. In these applications [41], what is important is not only how well the estimated motion field agrees with the physical motion, but how well the synthesized motion pictures, with minimized distortion and without artifacts and dirty window effects, can be predicted or interpolated by MCP and MCI techniques using the motion field.

Several test datasets are available [40,41]. Middlebury is one such dataset recently introduced in computer vision. Unfortunately, almost all of the motion fields in these test cases are zoom-in/out, translational, or close-to-translational motion. The Middlebury dataset has some issues: the flow fields lack spatial variability; image scenes have featureless regions; and large observable errors exist in the ground truth flow fields. Two examples of ground truth flow field errors in the Middlebury dataset are shown in Fig. 5, and the error analysis of the ground truth flow fields is given in Appendix A.

We extend the test cases having simple spatial motion variability in computer vision to a more complicated case containing rotational and deformational motion (Fig. 6a). In addition, we use synthetic texture-rich image sequences based on a simulation model [28,29] for benchmark evaluation. Because of the problems with the Middlebury dataset, we tested only the interpolation performance with it, for the purpose of motion-compensated compression. Moreover, a comparison between the linear and nonlinear models for optical flow estimation, and three example applications of motion-compensated compression, are also demonstrated in this section. For convenience of comparison, we refer to the proposed approach as Optical Flow Estimation for Motion-Compensated Compression (OFEMCC). We implement the method created by Horn–Schunck (H–S) using the improvement techniques suggested by Barron et al. [1] with a smoothed gradient (Gaussian low-pass filter), because it produces more accurate results in these test cases. We also implement the method (2D-CLG) developed by Bruhn et al. [25] and BMA as comparison techniques.
Fig. 5. Two velocity fields for lack of spatial variability: (a) Yosemite and (b) Venus ground truth flow fields.
4.1. Error measurement

To evaluate the errors involved in OFEMCC, we apply it to a numerical model, because the model velocities that generate the observed tracer motion are known exactly. We employ angular and magnitude measures of error [1,41], and use the mean values of these errors to evaluate the performance of the velocity estimations for this numerical model image sequence. Velocity may be written as v = (u, v, w = 1), so that the mean values of the angular and magnitude errors between the correct velocity v̂_ij and our estimate v_ij are

\[ \mathrm{AAE} = \frac{1}{N} \sum_{i,j} \arccos \left( \frac{ u_{ij} \hat{u}_{ij} + v_{ij} \hat{v}_{ij} + 1 }{ \sqrt{ u_{ij}^2 + v_{ij}^2 + 1 }\ \sqrt{ \hat{u}_{ij}^2 + \hat{v}_{ij}^2 + 1 } } \right) \]

and

\[ \mathrm{AME} = \frac{1}{N} \sum_{i,j} \sqrt{ \left( u_{ij} - \hat{u}_{ij} \right)^2 + \left( v_{ij} - \hat{v}_{ij} \right)^2 }, \]

where angular errors are in degrees, N = N_x × N_y, and v = Δr (Δt = 1). The average angular and magnitude errors (AAE and AME) between the correct velocity v̂ and an estimate v are used to evaluate the performance of the velocity estimations.

The interpolation error (IE) [41] between the MCI image and the ground truth intermediate image is measured by the root-mean-square error

\[ \mathrm{IE} = \sqrt{ \frac{1}{N} \sum_{i,j} \left[ I_{\mathrm{MCI}}(i, j) - I_{\mathrm{GT}}(i, j) \right]^2 }, \]

or by the PSNR measurement.

4.2. Benchmark evaluations

We use the Yosemite sequence [40,41] without clouds and a numerical model solution (see Chen et al. [28]) as benchmarks. The simulation image sequences are used in test cases between times t = 18 h to 20 h and t = 16 h to 20 h. The second simulation image sequence (Δt = 4 h) provides the larger displacement vector field. The AAE and AME results of the motion fields estimated by the OFEMCC, H–S (new), and 2D-CLG methods on the Yosemite and simulation datasets are shown in Tables 1 and 2.
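These error measures translate directly into NumPy; the dtype casts and the arccos clamp below are our own defensive additions.

```python
import numpy as np

def aae_ame(u, v, u_gt, v_gt):
    """Average angular error (degrees) and average magnitude error, Section 4.1."""
    num = u * u_gt + v * v_gt + 1.0
    den = np.sqrt(u ** 2 + v ** 2 + 1.0) * np.sqrt(u_gt ** 2 + v_gt ** 2 + 1.0)
    aae = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))).mean()
    ame = np.sqrt((u - u_gt) ** 2 + (v - v_gt) ** 2).mean()
    return aae, ame

def ie_and_psnr(I_mci, I_gt):
    """Interpolation error (root-mean-square) and PSNR between an MCI frame
    and the ground truth intermediate frame."""
    err2 = np.mean((I_mci.astype(float) - I_gt.astype(float)) ** 2)
    return np.sqrt(err2), 10.0 * np.log10(255.0 ** 2 / err2)
```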
Fig. 6. (a) An average of the vector field generated by the simulation model is superimposed on the image at time t1 = 16 h. (b) A vector field estimated by OFEMCC using data with Δt = 4 h is superimposed on the image at time t2 = 20 h.
Table 1
Average angular and magnitude errors for the different methods with 100% density (results from [25,41]). Data refer to the Yosemite sequence without clouds.

Method  | AAE   | AME
H–S     | 2.64° | 0.140
2D-CLG  | 2.64° | 0.10
OFEMCC  | 3.66° | 0.411
Table 2
Average angular and magnitude errors for the different methods with 100% density. Data refer to the ocean simulation model.

Method  | AAE (Δt = 2 h) | AME (Δt = 2 h) | AAE (Δt = 4 h) | AME (Δt = 4 h)
H–S     | 10.2°          | 0.374          | 14.9°          | 0.965
2D-CLG  | 11.4°          | 0.431          | 16.8°          | 1.09
OFEMCC  | 6.30°          | 0.221          | 6.56°          | 0.390
The numerical model ground truth vectors (Δt = t₂ − t₁ = 4 h) are shown in Fig. 6a. For comparison, vectors estimated by the OFEMCC with the block size parameter n = 2 are shown in Fig. 6b. We show the false-color presentation of the images (192 × 192 pixels) at times t = 16 h and 20 h in the backgrounds of Fig. 6a and b, respectively. The two velocity vector fields in Fig. 6 are in good agreement, with each showing a distribution of eddies having diameters on the order of 25 pixels and more gently curved tendril structures with larger radii of curvature.

We show the AAE (Fig. 7) and AME results (Fig. 8) as a function of the block size parameter n for the OFEMCC, and of the weighting parameter α for the H–S and 2D-CLG methods, using the two simulation image sequences with Δt = 2 and 4 h. The tests with the Yosemite and simulation data using the three estimators exhibit two different outcomes: the methods with Horn and Schunck's regularization (or smoothness) constraint agree well with the Yosemite data, while the results estimated by the proposed work agree well with the simulation data. As mentioned above, most image scenes, such as Yosemite (around the waterfall) and the Middlebury data, contain some featureless regions. The motion fields within these regions are not unique, because the initial and final positions of a moving particle cannot be physically observed and determined in a featureless region (i.e., the aperture or ill-posed problem). A stronger additional (smoothness) constraint, such as the Horn and Schunck regularization, may provide a retrieved motion field that agrees well with the physical (ground truth) motion field if most of the motions are simple zoom-in/out, translational, or close-to-translational motions, as in Fig. 5. However, the penalties of the stronger smoothness constraint and processing have been demonstrated with the simulation image sequence for complicated rotational and deformational motion fields: higher-frequency spatial variability, or flow field gradients, may be filtered out by the smoothness constraint and processing. The information lost through convolution and stronger smoothness constraints cannot be evaluated correctly by test cases whose flow fields lack spatial variability.

The curves for the H–S and 2D-CLG in Figs. 7 and 8 show that the AAE and AME are very sensitive to the value of the weighting parameter; therefore, it is difficult to find a fixed optimal value of the parameter in real-world applications where the ground truth flow field is unknown. The performance demonstration using both smaller and larger displacement vector fields in Figs. 7 and 8 indicates that a linear approximation, such as is used in the optical flow equation, is valid only for estimating smaller displacement fields. The growth of the AAE and AME errors as the scale increases reflects the fact that the displacement vector field problem is nonlinear.

4.3. Frame interpolation test

The frame interpolation test with the Middlebury dataset is shown in Table 3. The IE in Table 3 is also plotted in bar-chart form (Fig. 9) for quick comparison. The interpolated images using the ground truth (GT) flow field and the flow fields estimated by the H–S, 2D-CLG, BMA, and OFEMCC are compared to the intermediate (ground truth or original) frames between frames 10 and 11. Two examples of the interpolated Venus and Urban2 frames (using the ground truth flow fields, the motion fields estimated by the OFEMCC, and the original intermediate frames) are shown in Figs. 10 and 11. These interpolated frames are designated #10 + 1/2 in the figures. The PSNR values of the images interpolated by the ground truth and OFEMCC flows are labeled in Figs. 10 and 11. Comparing the MCI images synthesized by the ground truth flows and the OFEMCC flows in Figs. 10 and 11, we see that the distortions caused by the ground truth flows can be observed in the same regions labeled in Fig. A.1 (Appendix A).

The performance of the interpolated frames measured by the IE for all eight Middlebury datasets shows that the proposed method gives better results than all the other methods, including the ground truth flow field. The larger errors appearing in the flow fields estimated by
Fig. 7. Plots of error measurement generated by the estimators with ocean simulation model data at time t1 = 18 h and t2 = 20 h: (a) AAE and (b) AME vs. block size parameter n for the OFEMCC and for the H–S and 2D-CLG with weighting parameter α.
Fig. 8. Plots of error measurement generated by the estimators with ocean simulation model data at time t1 = 16 h and t2 = 20 h: (a) AAE and (b) AME vs. block size parameter n for the OFEMCC and for the H–S and 2D CLG with weighting parameter α.
the H–S and 2D-CLG methods for large-scale displacements, such as the Urban2 and Urban3 datasets, confirm that the linear optical flow equation is valid only for motions with small-scale displacements. The frame interpolation test also confirms the error issues in the ground truth flows of the Middlebury datasets described in Appendix A, because the observable distortions of the MCI images synthesized with the ground truth flows, together with the error evaluations in Table 3, provide ample experimental evidence for this conclusion.

4.4. Comparison of linear and nonlinear models

The theoretical analysis and benchmark comparison between the DFD equation and the optical flow equation indicate that the former outperforms the latter, especially for large-scale displacement. The performance differences in the frame interpolation test of the previous subsection, using Middlebury datasets with small and large displacement scales, confirm this conclusion. In what follows, we demonstrate the performance differences in frame interpolation between the linear and nonlinear equations using the same image sequences over a range of displacement scales.

The first white Taxi image sequence (96 × 72 × 3) runs from the first to the third frame, with a smaller displacement scale; the second sequence (96 × 72 × 9) runs from the first to the ninth frame, with a displacement scale four times larger than the first. The optical flows estimated by all methods are based on the image pairs of the first and third, and first and ninth, frames. The MCP and MCI images synthesized from the OFEMCC, BMA, H–S, and 2D-CLG flows with the small displacement scale are shown in Fig. 12. All parameters of the estimators are optimized for the frame interpolation. The average PSNR values for both the MCP and MCI images using the four different estimated fields are labeled in Fig. 12. The results of the MCP and MCI images for both the linear and nonlinear
models indicate that the moving object (the white taxi in the scenes) derived from the motion fields does not exhibit the common distortions, artifacts, and dirty window effects for this small-scale displacement motion, except for the BMA method.

The MCP and MCI images using the motion fields estimated by the four methods from the larger displacement sequence are shown in Fig. 13. There are nine taxi frames in this sequence. The motion fields are estimated by the BMA and OFEMCC methods using a block size of 4 × 4. Artifacts, dirty window effects, and distortions of the moving object can be observed in the MCP and MCI images using the BMA, H–S, and 2D-CLG derived motion fields for the large-scale motion, as shown in Fig. 13. The synthesized MCP and MCI images demonstrate that the OFEMCC method yields overall more accurate predictions and interpolations, with neither artifacts nor dirty window effects nor observable distortion, in comparison with the BMA, H–S, and 2D-CLG methods, as shown in Fig. 13. As seen in Fig. 13, it is impossible to warp the last frame to the first frame employing the linear optical flow equation for large displacement estimation. Comparing the relative positions of the tail of the white taxi and the parked white beetle in the MCI 9 and MCP 1 images from the H–S and 2D-CLG methods, as well as in the original images, we find that the white taxi in the MCP 1 image appears not to move, but remains in its MCI 9 position during the entire sequence. The MCP and MCI images using the motion field estimated by the BMA method exhibit artifacts and dirty window effects only, without observable distortion.

The BMA method is based on two assumptions: 1) a brightness conservation constraint (using the DFD equation), and 2) a uniform displacement field within a block template for the matching algorithm. The uniform displacement field assumption in a block template is not valid for all motion cases. As a simple example, the BMA method cannot handle cases in
Table 3
Evaluation results of interpolation error (IE) and Peak Signal-to-Noise Ratio (PSNR) on the Middlebury dataset (gray images).

IE/PSNR (dB) | Dimetrodon | Grove2    | Grove3    | Hydrangea | RubberWhale | Urban2    | Urban3    | Venus
GT flow      | 2.73/39.4  | 7.49/30.6 | 13.1/25.8 | 9.41/28.7 | 2.63/39.7   | 4.52/35.0 | 6.52/31.8 | 6.87/31.4
H–S          | 2.66/39.6  | 8.16/29.9 | 14.5/24.9 | 5.45/33.4 | 2.55/40.0   | 11.6/26.9 | 8.81/29.2 | 6.02/32.5
2D-CLG       | 2.66/39.6  | 8.26/29.8 | 14.5/24.9 | 5.55/33.2 | 2.59/39.9   | 11.6/26.9 | 8.89/29.1 | 5.80/32.9
BMA          | 3.69/36.8  | 8.32/29.7 | 13.5/25.5 | 6.93/31.3 | 2.83/39.1   | 5.09/34.0 | 7.53/30.6 | 7.60/30.5
OFEMCC       | 2.28/41.0  | 5.80/32.9 | 8.67/29.4 | 3.75/36.7 | 1.81/43.0   | 3.12/38.2 | 4.11/35.9 | 4.00/36.1
Fig. 9. The interpolation error for all eight Middlebury datasets between the ground truth frames and the interpolated frames using the ground truth (GT) flow and the flow fields estimated by the H–S (new), 2D-CLG, BMA, and OFEMCC.
which there are multiple motions with different speeds and directions within a block template, because such complicated motions violate the uniform motion field assumption. Nevertheless, the moving white taxi in the MCP and MCI images from the BMA method is correctly located spatially, although artifacts and dirty window effects are present. The frame interpolation tests in Figs. 12 and 13 provide experimental evidence that motion estimation with the direct temporal integral of the brightness conservation constraint (DFD) equation performs more accurately overall than the optical flow equation, especially for large-scale displacement motion.
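For reference, a minimal full-search block matcher of the kind these comparisons assume is sketched below. It is a generic textbook BMA, one integer vector per block, not the particular implementation benchmarked in the paper.

```python
import numpy as np

def bma_full_search(I1, I2, block=4, search=8):
    """Full-search block matching: for each block-by-block tile of I1, find
    the integer displacement into I2 minimizing the sum of squared DFDs."""
    Ny, Nx = I1.shape
    u = np.zeros((Ny // block, Nx // block))
    v = np.zeros_like(u)
    for bj in range(Ny // block):
        for bi in range(Nx // block):
            y0, x0 = bj * block, bi * block
            ref = I1[y0:y0 + block, x0:x0 + block].astype(float)
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ys, xs = y0 + dy, x0 + dx
                    if ys < 0 or xs < 0 or ys + block > Ny or xs + block > Nx:
                        continue               # candidate block outside image
                    cand = I2[ys:ys + block, xs:xs + block].astype(float)
                    err = np.sum((cand - ref) ** 2)
                    if err < best:
                        best, u[bj, bi], v[bj, bi] = err, dx, dy
    return u, v
```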
4.5. Motion-compensated compression

Two standard image sequences, Foreman and Army, are employed to demonstrate the performance of removing temporal redundancy in image sequences. The motion field is estimated from the first and last image frames of a multiple-frame sequence, and all intermediate frames are dropped. The MCP and MCI motion pictures are then interpolated by the motion-compensated techniques using the estimated motion field. The PSNR is used to evaluate the distortion between the interpolated and the original images.

As in the BMA method, a major feature of the proposed framework is that the full, continuous motion field can be generated from a smaller number of on-node velocity vectors (Fig. 2) for video coding, in comparison to other approaches. The proposed adaptive framework can therefore store or transmit a smaller number of motion field parameters, like the BMA method for digital video coding. However, the motion field estimated by the OFEMCC method is continuous between two blocks, and has much higher accuracy compared to the BMA. In order to demonstrate this feature, we choose the block size parameters to be 8 × 8 for these two test cases.

The original frames of the Foreman sequence from 28 to 32 are shown in the first row of Fig. 14. The MCP and MCI frames from the OFEMCC are depicted in the second row of Fig. 14. The flow fields are estimated by the OFEMCC, H–S, 2D-CLG, and BMA methods from the red band frames between 28 and 32 (a five-frame sequence). The block size parameter n is equal to eight (8 × 8). The total number of displacement components (N_x × N_y × 2/n²) for transmission or storage by the OFEMCC and BMA is 3168. The total number of displacement components for transmission or storage by the H–S and 2D-CLG methods is 202,752.

The total compression ratio (CR) is the product of the spatial ratio CR_s and the temporal ratio CR_t, if we do not consider spectral compression. Taking the two components of the motion vector into account (the factor 2 in the denominator), the temporal CR_t is given by

\[ \mathrm{CR}_t = \frac{bM}{b + \dfrac{2}{n^2}}, \]

where b, M, and n are the number of bands of the image (color b = 3 and gray b = 1), the total number of frames in the sequence (from first to last), and the block size parameter, respectively. To compute the CR_t for different techniques, the parameters b and M are the same; only the block size parameter n differs, which we set ≡ 1 for the other techniques and > 1 for the proposed and BMA techniques. The total compression ratio in the Foreman test case (b = 3, M = 5, and n = 8) is thus

\[ \mathrm{CR} = \mathrm{CR}_t\, \mathrm{CR}_s \approx 5\, \mathrm{CR}_s. \]
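The CR_t formula is easy to check numerically; the helper below (our name) reproduces the temporal ratios quoted for the three test cases.

```python
def temporal_cr(b, M, n):
    """Temporal compression ratio CR_t = b*M / (b + 2/n**2)."""
    return b * M / (b + 2.0 / n ** 2)

# Values quoted in the paper (cf. Table 4):
print(temporal_cr(3, 5, 8))   # Foreman: ~4.95
print(temporal_cr(3, 6, 8))   # Army:    ~5.94
print(temporal_cr(1, 9, 4))   # Taxi:     8.0
```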
Using the proposed framework, there is great potential to obtain a larger compression ratio with acceptable distortion, depending on the spatial compression ratio CR_s of the video coding. We also applied the proposed estimator to the Army image sequence (584 × 388), from frame 8 to frame 13. The flow field is estimated from the red band frames between 8 and 13. The original frames and the frames interpolated in color using the same flow field are depicted in Fig. 15. The temporal CR_t in the Army test case (b = 3, M = 6, and n = 8) is approximately equal to 5.94. The temporal compression ratios, mean PSNR, and block size parameter n for the Foreman, Army, and Taxi datasets by the different methods
Fig. 10. Venus images (420 × 380), Left: interpolation image using ground truth flow; Center: original (ground truth) image frame # 10 + 1/2; Right: interpolation image using the flow estimated by the OFEMCC.
Fig. 11. Urban2 images (640 × 480), Left: interpolation image using ground truth flow; Center: original (ground truth) image frame # 10 + 1/2; Right: interpolation image using the flow estimated by the OFEMCC.
Fig. 12. White taxi image sequences (96 × 72 × 3) and interpolated frames using the motion fields estimated by the OFEMCC, BMA, H–S, and 2D-CLG methods. The PSNR values labeled on the figure are the average of the MCP and MCI images compared to the original images, respectively.
Fig. 13. White taxi image sequences (96 × 72 × 9) and interpolated frames using the motion fields estimated by the OFEMCC, BMA, H–S, and 2D-CLG. The PSNR values labeled on the figure are averages of the MCP and MCI images compared to the original images, respectively.
Fig. 14. Original Foreman image (352×288) sequences (frames: 28–32), interpolated frames using flow field estimated by the OFEMCC (block size: 8× 8), H–S, 2D-CLG, and BMA (block size: 8×8).
are listed in Table 4. The parameters in the temporal compression ratio for the Taxi dataset in Fig. 13 are b = 1, M = 9, and n = 4. Visual comparison of the synthesized and original images indicates that the proposed framework yields accurate predictions and interpolations overall, without any artifacts or dirty window effects, compared with the other three methods, as shown in Figs. 12 and 16. Moreover, the variable block sizes for different applications can reduce the stored and transmitted motion field parameters and
Fig. 15. The synthesized Army motion pictures using the flow field estimated by the OFEMCC with block size 8 × 8.
Table 4
Examples of temporal compression ratios, mean PSNR values, and block size parameters {CR_t, PSNR (dB), n} for different motion estimation methods.

Method  | Foreman         | Army            | Taxi
H–S     | {3, 27.6, 1}    | {3.6, 30.5, 1}  | {3, 21.4, 1}
2D-CLG  | {3, 26.6, 1}    | {3.6, 26.3, 1}  | {3, 20.6, 1}
BMA     | {4.95, 28.3, 8} | {5.94, 31.3, 8} | {8, 23.4, 4}
OFEMCC  | {4.95, 29.5, 8} | {5.94, 34.1, 8} | {8, 32.9, 4}
increase the temporal compression ratio significantly, as shown in Table 4. The single target function of the MSE or PSNR can result in optimal motion-compensated prediction and interpolation images for digital video compression applications.

4.6. Performance of the PROC algorithm

The adaptive framework employs the PROC algorithm to improve the performance of optical flow estimation. The consistency of neighborhood vectors in featureless regions around a fixed pixel point can be evaluated by the criterion defined in Section 3.3. Here, we demonstrate the difference with and without the PROC algorithm by the frame interpolation and consistency criterion using the Taxi sequences (96 × 72 × 3). An image of the intensity gradient for Taxi frame 1 and its histogram are shown in Fig. 16. The histogram of the gradients indicates that the Taxi images contain large featureless areas.

Fig. 16. An image of intensity gradient of Taxi frame 1 and histogram of the gradients.

The motion fields from the Taxi image sequence (frames 1 and 3) are derived with and without the PROC. Two intermediate frames are synthesized by the MCI technique employing the motion fields estimated by the proposed method with and without the PROC algorithm, as shown in Fig. 17. The consistency criteria C when r = 1, with and without the PROC algorithm, are 1.23 and 2.25, respectively. The higher performance of the frame interpolation with the PROC algorithm corresponds to the lower value of C, consistent with the analysis of the PROC algorithm in Section 3.3. Clearly, the PROC algorithm helps the nonlinear system approach a global optimal solution and seek a motion field in which each vector is consistent with its neighbors in featureless regions. The distortions of the two MCI images evaluated by the PSNR are 35.1 dB and 31.7 dB, respectively. A dirty window effect can be seen in the MCI image inferred from the motion field without the PROC algorithm, as shown in Fig. 17. The comparison of the PSNR values of the two MCI images in Fig. 17 indicates that the distortion of the image derived with the PROC algorithm is reduced dramatically.

5. Conclusion
In this paper, we have presented an adaptive framework for solving the optical flow for motion-compensated compression. Using the nonlinear DFD equations, velocity field modeling, and a least-squares model, we formulate iterative equations based on the Gauss–Newton and Levenberg–Marquardt algorithms. We also propose an algorithm for the progressive relaxation of the over-constraint on the flow field, in which each vector becomes consistent with its neighbors.

The overarching goal of optical flow estimation is to make the observed (actually tracked) motion consistent with the physical one (i.e., ground truth). However, these two motion fields are usually inconsistent, especially within featureless images. Since the displacement vector of a moving particle in featureless image scenes cannot be physically observed, measured, or determined (an ill-posed problem), the solution of this physical system is not unique. Imposing a stronger smoothness constraint and spatial/temporal convolution processing may regulate the estimated flow field to match some special test cases where the optical flow has weak spatial variability, as shown in Fig. 5. However, the smoothness constraint may not hold for all test cases. The penalties for using the stronger smoothness constraint and processing have been demonstrated through the simulation image sequences for complicated rotational and deformational motion fields in Fig. 6. Higher-frequency spatial variability of the flow field may be filtered out by the smoothness constraints and processing. The information lost through convolution and stronger smoothness constraints cannot be adequately evaluated by test cases whose flow fields have weak spatial variability. In this regard, image sequences with simple flow fields do not provide sufficiently challenging tests of the capability of the system proposed in this paper.
Fig. 17. Two MCI images using the motion fields with and without the PROC algorithm.
For motion-compensated compression, how well an estimated optical flow matches the physical motion is important. Additional concerns are how well the synthesized motion pictures exhibit minimal distortion, eliminate artifacts and dirty window effects, and keep the coding cost of the motion vectors low. The estimated flow field in this paper is based on a single minimized target function, which leads to optimized motion-compensated predictions for motion-compensated compression applications without any penalty parameters. With almost the same number of displacement vectors for a fixed block size in both the proposed method and BMA, the proposed method can provide much more accurate performance than the BMA, which is the currently adopted standard for video coding.

The benchmark tests indicate that, for flow fields with stronger spatial variability (the simulation data), the new framework outperforms the other tested methods. In the frame interpolation tests, the interpolated frames measured by the IE for all eight Middlebury datasets using the proposed method show better results than all other methods, including the ground truth flow field. The frame interpolation tests also provide experimental evidence that the nonlinear DFD equation performs more accurately overall than the optical flow equation, especially for large-scale displacement motion. In the demonstration of motion-compensated compression, the MCP and MCI images synthesized by the proposed framework yield more accurate predictions and interpolations overall; in addition, they have no artifacts or dirty window effects, in contrast with the other three methods. In summary, the proposed framework presents great potential for obtaining a larger compression ratio with acceptable distortion for video coding.
Acknowledgements

This research work was supported by the Office of Naval Research through project WU-4279-02 at the Naval Research Laboratory.
Appendix A. Error analysis for Middlebury ground truth

To verify the accuracy of ground truth motion fields used in computer vision, such as those of the Middlebury datasets [41], we employ a simple method: we measure the distortion of an image synthesized from the ground truth flow field by motion-compensated prediction. We use the ground truth motion fields Δr to synthesize the Venus and Urban2 images (I(r, t1) ← I(r + Δr, t2)), as shown in Fig. A.1. Residues are clearly visible in the synthesized images in Fig. A.1. The distortions in the MCP images indicate that the ground truth flow fields in the Middlebury datasets introduce large and observable errors.

Fig. A.1. The residue in MCP images synthesized by the ground truth flow field indicates that the ground truth flow fields in Middlebury introduce large and observable errors.
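A minimal sketch of this check, under the assumption of bilinear resampling and with illustrative names, is:

```python
# Sketch of the Appendix A check: synthesize frame 1 from frame 2 with the
# ground-truth flow, I_syn(r) = I2(r + Δr), and inspect the residue against
# the observed frame I1. (u, v) are the components of Δr.
import numpy as np
from scipy.ndimage import map_coordinates

def mcp_residue(I1, I2, u, v):
    """Backward-warp I2 by the ground-truth flow and return the pixelwise
    residue; a nonzero residue exposes errors in the ground truth. Pixels
    with unknown or occluded flow are not masked in this simplified sketch."""
    ny, nx = I1.shape
    ys, xs = np.mgrid[0:ny, 0:nx].astype(float)
    I_syn = map_coordinates(I2, [ys + v, xs + u], order=1, mode='nearest')
    return I_syn - I1
```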
References

[1] J.L. Barron, D.J. Fleet, S.S. Beauchemin, Performance of optical flow techniques, Int. J. Comput. Vis. 12 (1) (1994) 43–77.
[2] C. Stiller, J. Konrad, Estimating motion in image sequences: a tutorial on modeling and computation of 2D motion, IEEE Signal Process. Mag. 16 (4) (1999) 70–91.
[3] B.K.P. Horn, B.G. Schunck, Determining optical flow, Artif. Intell. 17 (1981) 185–203.
[4] B.D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, Proc. DARPA Image Understanding Workshop, 1981, pp. 121–130.
[5] B.D. Lucas, Generalized image matching by the method of differences, PhD thesis, Carnegie Mellon Univ., 1984.
[6] J. Bigun, G.H. Granlund, J. Wiklund, Multidimensional orientation estimation with applications to texture analysis and optical flow, IEEE Trans. Pattern Anal. Mach. Intell. 13 (8) (1991) 775–790.
[7] L. Alvarez, J. Esclarín, M. Lefébure, J. Sánchez, A PDE model for computing the optical flow, Proc. XVI Congreso de Ecuaciones Diferenciales y Aplicaciones, Las Palmas de Gran Canaria, Spain, 1999, pp. 1349–1356.
[8] G. Aubert, R. Deriche, P. Kornprobst, Computing optical flow via variational techniques, SIAM J. Appl. Math. 60 (1) (1999) 156–182.
[9] M.J. Black, P. Anandan, The robust estimation of multiple motions: parametric and piecewise smooth flow fields, Comput. Vision Image Underst. 63 (1) (1996) 75–104.
[10] F. Heitz, P. Bouthemy, Multimodal estimation of discontinuous optical flow using Markov random fields, IEEE Trans. Pattern Anal. Mach. Intell. 15 (12) (1993) 1217–1232.
[11] A. Kumar, A.R. Tannenbaum, G.J. Balas, Optic flow: a curve evolution approach, IEEE Trans. Image Process. 5 (4) (1996) 598–610.
[12] H.H. Nagel, Constraints for the estimation of displacement vector fields from image sequences, Proc. Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, vol. 2, 1983, pp. 945–951.
[13] H.H. Nagel, W. Enkelmann, An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1986) 565–593.
[14] H.H. Nagel, Extending the 'oriented smoothness constraint' into the temporal domain and the estimation of derivatives of optical flow, in: O. Faugeras (Ed.), Computer Vision — ECCV '90, Lecture Notes in Computer Science, vol. 427, Springer, Berlin, 1990, pp. 139–148.
[15] P. Nesi, Variational approach to optical flow estimation managing discontinuities, Image Vision Comput. 11 (7) (1993) 419–439.
[16] F. Glazer, et al., Scene matching by hierarchical correlation, Proc. IEEE Computer Vision and Pattern Recognition Conf., Washington, DC, June 1983.
[17] H. Gharavi, M. Mills, Block matching motion estimation: new results, IEEE Trans. Circuits Syst. 37 (1990) 649–651.
[18] V. Seferidis, M. Ghanbari, General approach to block-matching motion estimation, Opt. Eng. 32 (July 1993) 1464–1474.
[19] J. Shi, C. Tomasi, Good features to track, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1994, pp. 593–600.
[20] J. Konrad, E. Dubois, Estimation of image motion fields: Bayesian formulation and stochastic solution, Proc. IEEE 1988 Int. Conf. on Acoustics, Speech, and Signal Processing, Apr. 1988, pp. 1072–1075.
[21] M. Proesmans, et al., Determination of optical flow and its discontinuities using non-linear diffusion, in: J.-O. Eklundh (Ed.), Computer Vision — ECCV '94, Lecture Notes in Computer Science, vol. 801, Springer, Berlin, 1994, pp. 295–304.
[22] J. Weickert, C. Schnörr, A theoretical framework for convex regularizers in PDE-based computation of image motion, Int. J. Comput. Vis. 45 (3) (2001) 245–264.
[23] D.J. Fleet, A.D. Jepson, Computation of component image velocity from local phase information, Int. J. Comput. Vis. 5 (1) (1990) 77–104.
[24] B. Galvin, B. McCane, K. Novins, D. Mason, S. Mills, Recovering motion fields: an analysis of eight optical flow algorithms, Proc. 1998 British Machine Vision Conference, Southampton, England, 1998.
[25] A. Bruhn, J. Weickert, C. Schnörr, Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods, Int. J. Comput. Vis. 61 (3) (2005) 211–231.
[26] N. Papenberg, A. Bruhn, T. Brox, S. Didas, J. Weickert, Highly accurate optic flow computation with theoretically justified warping, Int. J. Comput. Vis. 67 (2) (2006) 141–158.
[27] S. Uras, F. Girosi, A. Verri, V. Torre, A computational approach to motion perception, Biol. Cybern. 60 (1988) 79–87.
[28] W. Chen, R.P. Mied, C.Y. Shen, Near-surface ocean velocity from infrared images: global optimal solution to an inverse model, J. Geophys. Res. 113 (2008) C10003, http://dx.doi.org/10.1029/2008JC004747.
[29] W. Chen, A global optimal solution with higher order continuity for the estimation of surface velocity from infrared images, IEEE Trans. Geosci. Remote Sens. 48 (4) (2010) 1931–1939.
[30] W.B. Thompson, Exploiting discontinuities in optical flow, Int. J. Comput. Vis. 30 (3) (1998) 163–173.
[31] J.D. Robbins, A.N. Netravali, Recursive motion compensation: a review, in: T.S. Huang (Ed.), Image Sequence Processing and Dynamic Scene Analysis, Springer-Verlag, Berlin, Germany, 1983, pp. 76–103.
[32] C. Cafforio, F. Rocca, The differential method for motion estimation, in: T.S. Huang (Ed.), Image Sequence Processing and Dynamic Scene Analysis, Springer-Verlag, New York, 1983, pp. 104–124.
[33] D.R. Walker, K.R. Rao, Improved pel-recursive motion compensation, IEEE Trans. Commun. COM-32 (Oct. 1984) 1128–1134.
[34] J. Shen, W.-Y. Chan, A novel code excited pel-recursive motion compensation algorithm, IEEE Signal Process. Lett. 8 (4) (April 2001).
[35] D.J. Fleet, A.D. Jepson, Computation of component image velocity from local phase information, Int. J. Comput. Vis. 5 (1990) 77–104.
[36] Z. Wei, D.-J. Lee, B.E. Nelson, J.K. Archibald, B.B. Edwards, FPGA-based embedded motion estimation sensor, Int. J. Reconfig. Comput. 2008 (2008), Article ID 636135, 8 pp.
[37] G. Botella, A. García, M. Rodríguez-Álvarez, E. Ros, U. Meyer-Baese, M.C. Molina, Robust bioinspired architecture for optical flow computation, IEEE Trans. Very Large Scale Integr. VLSI Syst. 18 (4) (April 2010) 616–630, http://dx.doi.org/10.1109/TVLSI.2009.2013957.
[38] D. González, G. Botella, S. Mookherjee, U. Meyer-Bäse, A. Meyer-Bäse, NIOS II processor-based acceleration of motion compensation techniques, Proc. SPIE 8058 (2011) 80581C, http://dx.doi.org/10.1117/12.883684.
[39] V. Mahalingam, K. Bhattacharya, N. Ranganathan, H. Chakravarthula, R.R. Murphy, K.S. Pratt, A VLSI architecture and algorithm for Lucas–Kanade-based optical flow computation, IEEE Trans. Very Large Scale Integr. VLSI Syst. 18 (2010) 29–38.
[40] B. McCane, K. Novins, D. Crannitch, B. Galvin, On benchmarking optical flow, Comput. Vision Image Underst. 84 (2001) 126–143.
[41] S. Baker, D. Scharstein, J.P. Lewis, S. Roth, M.J. Black, R. Szeliski, A database and evaluation methodology for optical flow, Int. J. Comput. Vis. 92 (2011) 1–31, http://dx.doi.org/10.1007/s11263-010-0390-2.