Signal Processing: Image Communication 7 (1995) 503-527
Estimation and compensation of accelerated motion for temporal sequence interpolation

Michel Chahine^a,1, Janusz Konrad^a,b,*

^a Department of Electrical Engineering, McGill University, 3480 University Street, Montréal, Québec, Canada H3A 2A7
^b INRS-Télécommunications, Institut National de la Recherche Scientifique, 16 Place du Commerce, Verdun (Île-des-Soeurs), Québec, Canada H3E 1H6
Abstract

This paper makes two contributions to the area of motion-compensated processing of image sequences. The first contribution is the development of a framework for the modeling and estimation of dense 2-D motion trajectories with acceleration. To this end, Gibbs-Markov models are proposed and linked together by the maximum a posteriori probability (MAP) criterion, and the resulting objective function is minimized using multiresolution deterministic relaxation. The accuracy of the method is demonstrated by measuring the mean-squared error of estimated motion parameters for images with synthetic motion. The second contribution is the demonstration of a significant gain resulting from the use of trajectories with acceleration in motion-compensated temporal interpolation of videoconferencing/videophone images. An even higher gain is demonstrated when the accelerated motion trajectory model is augmented with occlusion and motion discontinuity models. The very good performance of the method suggests a potential application of the proposed framework in the next generation of video coding algorithms.
1. Introduction
There are two objectives in this paper. The first is to develop a theoretical framework for the modeling and computation of dense nonlinear 2-D motion trajectories from a sequence of images. The second is to demonstrate the benefits resulting from the use of nonlinear trajectories in motion-compensated temporal interpolation of videoconference/videophone images.

* Corresponding author.
¹ Present address: Northern Telecom Ltd., 185 Corkstown Road, Nepean, P.O. Box 3511, Station C, Ottawa, Ontario, Canada K1Y 4H7.
The knowledge of motion is essential for the coding of image sequences at high compression ratios. This is due to the high correlation of image pixels along motion trajectories. In the case of the ubiquitous hybrid coders (MPEG-1, MPEG-2, H.261), differential pulse code modulation (DPCM) is applied block-wise along displacement vectors. However, DPCM does not fully exploit the high correlation of motion-compensated pixels, especially at rates below 1 bit/pixel. Recently, it has been shown that alternative techniques outperform DPCM at such rates; for example, entropy-constrained code-excited linear prediction (EC-CELP) outperforms DPCM for highly correlated sources such as a sequence of motion-compensated pixels
[13]. Since EC-CELP jointly encodes intensity at several positions along a motion trajectory, motion compensation must be applied over several frames. Due to the extended temporal support, linearity of motion trajectories is often violated. Therefore, motion models incorporating higher-order effects, such as acceleration, need to be considered. Similarly, in the processing of image sequences, such as interpolation, filtering or enhancement, knowledge of motion trajectories over several frames is essential for high-quality results [9]. It has been shown recently [20] that for image sequences containing motion with acceleration, significantly better results can be obtained if acceleration is used in addition to velocity. In that work, however, the acceleration has been assumed to be known. It is clear from the above studies that a framework for the modeling and computation of motion with acceleration is needed. Without loss of generality, we consider in this paper only the quadratic motion model, i.e., the model incorporating acceleration. The approach can be easily extended to cubic trajectories that include the derivative of acceleration. An interesting frequency-domain analysis of images containing such high-order trajectories is given in [8]. It is concluded there that, unlike in the constant-velocity case where the spectral support is planar [21, 10], the spectral support for the case with acceleration and its derivative is truly 3-D. The quadratic motion trajectory model was originally proposed in [11]. Later, a similar model was proposed in [7] and extended to trigonometric polynomials. However, both models in [7] were applied uniformly throughout the image, disregarding the spatial variation of motion parameters. In this paper, based on Markov random field (MRF) theory, we develop the model from [11] in detail, we formulate a suitable cost function based on the MAP criterion and we show how to minimize it. Since occlusions play an important role for the extended temporal support, we extend the motion model by including occlusions and motion discontinuities. We test the new estimation method on images with synthetic motion and we evaluate its performance numerically. Then, we apply the proposed algorithm to motion-compensated temporal interpolation of image sequences with natural
motion, and we demonstrate the benefits of using quadratic trajectories instead of linear ones.
2. Motion trajectories

We are interested in the estimation of motion parameters from a time-varying image obtained by a camera that projects a 3-D scene onto a rectangular portion 𝒲 of the 2-D image plane. Let x = (x, y) denote the spatial coordinate in 𝒲. Since the coordinates x, y of a point from a moving object depend on time t, it is useful to consider the trajectory of such a point in a conceptual 3-D xyt space. An example of a 3-D trajectory (x(t), y(t), t) of an image point drawn in such a space is shown in Fig. 1. To define a trajectory mathematically we use the function c(τ; x, t) such that c is the spatial position at time τ of an image point which at time t was located at x [10]. c(τ; x, t) describes a 2-D trajectory in the image plane, while (c(τ; x, t), τ) = (x(τ), y(τ), τ) describes a 3-D trajectory in the xyt space. For each x at time t, the corresponding trajectory starts at time t_i(x, t) and ends at time t_f(x, t). For τ ≠ t, we can define a subset 𝒱(τ; t) of 𝒲 consisting of pixels that are visible over the entire time interval between t and τ:

$$\mathcal{V}(\tau; t) = \{x:\ t_i(x,t) < \tau < t_f(x,t)\}.$$
For τ > t, 𝒲 − 𝒱(τ; t) is the set of pixels covered or leaving the image between t and τ, while for
Fig. 1. Trajectory (x(t), y(t), t) of a projected scene point in 3-D xyt space.
τ < t, 𝒲 − 𝒱(τ; t) is the set of pixels exposed or introduced into the image between τ and t (a more detailed treatment of motion trajectories can be found in [10]). To describe the motion of a pixel in the 2-D image plane, the concept of instantaneous velocity u is often used:

$$u(x,t) = \left.\frac{\partial c(\tau; x, t)}{\partial \tau}\right|_{\tau = t}.$$
Another useful concept, especially for linear motion trajectories, is the displacement d for pixels visible between t and τ:

$$d(\tau; x, t) = \begin{cases} x - c(\tau; x, t), & \tau < t; \\ c(\tau; x, t) - x, & \tau > t; \end{cases} \qquad x \in \mathcal{V}(\tau; t). \tag{1}$$
Note that with the above definition, the displacement d always points in the positive direction of time t. d(τ; x, t) can also be calculated from the velocity by integration:

$$d(\tau; x, t) = \int_{\mathcal{S}} u(c(s; x, t), s)\, ds, \qquad x \in \mathcal{V}(\tau; t),$$
with 9’ = [z, t] for 7 < t and 9’ = [t, 71 for 7 > t. For motion with constant velocity u(c(t;x, t), 7) = u(x, t), the displacement is simply u(x, t)(7 d(t;x, t) =
-
t)
i Q, t)b - t)
if
7 <
t;
if
7 >
t;
x E V(7;
t).
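The constant-velocity case can be checked numerically. The following minimal sketch (array shapes and sample values are illustrative, not from the paper) evaluates the linear trajectory c(τ; x, t) = x + u(x, t)(τ − t) and verifies that the resulting displacement always points in the positive direction of time:

```python
import numpy as np

def linear_trajectory(x, u, t, tau):
    """Constant-velocity trajectory: c(tau; x, t) = x + u*(tau - t)."""
    return x + u * (tau - t)

def displacement(x, u, t, tau):
    """Displacement d(tau; x, t) of Eq. (1); always points forward in time."""
    c = linear_trajectory(x, u, t, tau)
    return x - c if tau < t else c - x

x = np.array([10.0, 20.0])   # pixel position at time t
u = np.array([1.5, -0.5])    # velocity in pixels per field
for tau in (-1.0, 2.0):
    d = displacement(x, u, t=0.0, tau=tau)
    print(tau, d)            # d = u*|tau - t| in both cases
```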
Thus, it follows from (1) that

$$c(\tau; x, t) = x + u(x,t)(\tau - t), \qquad x \in \mathcal{V}(\tau; t). \tag{2}$$

Consequently, the task is to find, for each pixel (x, t), the two components u_x and u_y of the velocity u(x, t). For constant-velocity motion, the corresponding components of the displacement d(x, t) are sought.

3. Estimation of trajectories for accelerated motion

The goal is to estimate segments of motion trajectories c(τ; x, t) for τ over some time interval containing t, where (x, t) is defined on a sampling lattice Λ_c ⊂ R³. Trajectories are estimated from an image sequence g defined on another lattice Λ_g. Let c_t denote all the trajectories passing at time t through the window 𝒲, and let g_t be an image at time t (also within 𝒲). We consider pixels in 𝒲 or in 𝒱(τ; t) to be sampled according to the lattice Λ_g. The task is to compute motion trajectories c_t from images g_{t+τ} such that τ belongs to the set

$$\mathcal{T}_t = \{\tau:\ g_{t+\tau} \text{ is used in estimation of } c_t\}$$

of N = Card(𝒯_t) time instants. Fig. 2 illustrates an example of a trajectory over N = 5 fields of an interlaced sequence. In this section, we assume that 𝒱(τ; t) = 𝒲 for all τ ∈ 𝒯_t, i.e., that all pixels visible in 𝒲 at time t are also visible at all time instants τ ∈ 𝒯_t. Therefore, occlusion effects are not considered for now; they will be accounted for when we extend the algorithm in Section 4.

Fig. 2. Example of a nonlinear trajectory over 5 fields of an interlaced sequence.

3.1. MAP criterion

Let 𝒢_t = {g_{t+τ}: τ ∈ 𝒯_t} be the set of images (observations) used to estimate the field of motion trajectories c_t. To find the most probable c_t given 𝒢_t, we use the maximum a posteriori probability (MAP) criterion [17] as follows:

$$\hat{c}_t = \arg\max_{c_t} p(c_t \mid \mathcal{G}_t) = \arg\max_{c_t} p(\mathcal{G}_t^n \mid c_t, g_{t_n})\, p(c_t \mid g_{t_n}), \tag{3}$$

where p is a probability density, t_n = t + τ_n is an arbitrarily chosen time instant such that τ_n ∈ 𝒯_t, and 𝒢_t^n = {g_{t+τ}: τ ∈ 𝒯_t∖{τ_n}}. Thus, ĉ_t is a motion trajectory estimate under the MAP criterion. The formulation in the second line is obtained by applying
the Bayes’ rule to p(c,lY,). p(Y:Ic,,g,J is the likelihood of a particular configuration 9: given c, and g,. that depends on the structural model, while p(c,lg,J is a probability density defined by the motion model. In the following sections we describe both models in detail.
3.2. Motion model

To make the estimation problem tractable, we model each motion trajectory c by a parametric function c_p of a vector of motion parameters p. Hence, for the linear trajectory model (2), we have p = u. For nonlinear trajectories, involving velocity as well as acceleration, we use the model proposed in [11]:

$$c_p(\tau; x, t) = x + u(x,t)(\tau - t) + a(x,t)(\tau - t)^2, \qquad x \in \mathcal{V}(\tau; t), \tag{4}$$

with p = [uᵀ aᵀ]ᵀ. Note that u(x, t) is the instantaneous velocity, and a(x, t) is one half of the constant acceleration over the length of the trajectory. Eq. (4) can be rewritten as follows:

$$c_p(\tau; x, t) = x + A_\tau\, p(x,t), \qquad x \in \mathcal{V}(\tau; t), \tag{5}$$

with

$$A_\tau = \begin{bmatrix} \tau - t & 0 & (\tau - t)^2 & 0 \\ 0 & \tau - t & 0 & (\tau - t)^2 \end{bmatrix}, \qquad p(x,t) = [u_x(x,t)\ \ u_y(x,t)\ \ a_x(x,t)\ \ a_y(x,t)]^{\mathrm T}. \tag{6}$$

Note that the equations above can be extended to higher-order terms, such as the derivative of acceleration [8], and thus the approach is applicable to general nonlinear trajectories.

Let p_t be the field of all parameter vectors at time t. Requiring that motion trajectories of neighboring pixels be similar, i.e., that p_t be spatially smooth, we model c_t by a vector MRF with realizations p_t [17]. Thus, due to the Hammersley-Clifford theorem [1], the a priori density p(c_t | g_{t_n}) can be expressed by the following Gibbs distribution:

$$p(c_t \mid g_{t_n}) = \frac{1}{Z_p}\, e^{-U_p(p_t)/\beta_p}, \tag{7}$$

where Z_p, β_p are constants [14]. U_p(p_t) is an energy function that captures the desired smoothness property of the motion field:

$$U_p(p_t) = \sum_{\{x_i, x_j\} \in \Theta_p} (p_i - p_j)^{\mathrm T}\, \Gamma\, (p_i - p_j), \tag{8}$$

where p_i = p(x_i, t) for simplicity of notation.¹ {x_i, x_j} is a 2-element spatial clique, i.e., x_i and x_j are neighbors, and Θ_p is the ensemble of all such cliques [14]. We use the first-order neighborhood, i.e., the north, east, west and south nearest neighbors. Γ is a 4×4 positive-definite weight matrix introduced to permit weighting of the velocity and acceleration components. For this purpose, we choose Γ to be a diagonal matrix of the form

$$\Gamma = \mathrm{diag}(\gamma_{u_x}, \gamma_{u_y}, \gamma_{a_x}, \gamma_{a_y}), \tag{9}$$

where the γ's are weights associated with each of the four motion parameters in p. A similar matrix for constant-velocity motion was discussed in [18]. Its off-diagonal entries, however, were chosen as functions of the observations g_{t_n} in order to allow suitable adaptation of the smoothness property to the local image structure. Here, we consider only a fixed matrix Γ for simplicity.

¹ Note that p_i denotes one parameter vector at (x_i, t), whereas p_t denotes a field of such vectors at t.

3.3. Structural model

The conditional likelihood p(𝒢_t^n | c_t, g_{t_n}) depends directly on the intensity variation along motion trajectories. As the structural model, establishing the relationship between motion and images, we use the constant intensity assumption along motion trajectories. We define a set of interpolated intensities along the trajectory through (x_i, t) as follows: g̃_i^c = {g̃(c(τ; x_i, t), τ): τ ∈ 𝒯_t}. The interpolated intensity g̃ is obtained using the bicubic spatial interpolator developed in [16]. Then, 𝒢_t^c = {g̃_i^c: x_i ∈ (Λ_c)_t}, with
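To make the parametric model concrete, the following minimal numpy sketch (function and variable names are ours, not from the paper) evaluates the quadratic trajectory of Eqs. (5)-(6) for one pixel:

```python
import numpy as np

def A_matrix(tau, t):
    """Matrix A_tau of Eq. (5): maps p = [ux, uy, ax, ay] to a 2-D offset."""
    dt = tau - t
    return np.array([[dt, 0.0, dt**2, 0.0],
                     [0.0, dt, 0.0, dt**2]])

def quadratic_trajectory(x, p, t, tau):
    """c_p(tau; x, t) = x + A_tau p  (Eqs. (4)-(6))."""
    return np.asarray(x) + A_matrix(tau, t) @ np.asarray(p)

x = np.array([37.0, 30.0])          # pixel position at time t
p = np.array([1.5, 1.5, 0.5, 1.0])  # [ux, uy, ax, ay]
for tau in (-2, -1, 0, 1, 2):       # offsets relative to t = 0 here
    print(tau, quadratic_trajectory(x, p, t=0.0, tau=tau))
```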
(Λ_c)_t being the restriction of Λ_c to time t (and to the window 𝒲), is the set of all interpolated observations along the field c_t. By modeling 𝒢_t^c with a 3-D scalar MRF, we can express the likelihood p(𝒢_t^n | c_t, g_{t_n}) as follows:

$$p(\mathcal{G}_t^n \mid c_t, g_{t_n}) = \frac{1}{Z_g}\, e^{-U_g(\mathcal{G}_t^c)/\beta_g}. \tag{10}$$

Similarly to the model used in [17], we assume that the intensity variation along c_t is independent for each distinct trajectory. Therefore, the energy U_g can be written as follows:

$$U_g(\mathcal{G}_t^c) = \sum_{i=1}^{N_p} U_g^i(\tilde{g}_i^c),$$

where each individual term U_g^i(g̃_i^c) is a Gibbs energy. This energy can be defined as a sum of potentials over 1-D temporal cliques along the motion trajectory through (x_i, t). In terms of the set 𝒯_t, however, the cliques are fully 3-D with an associated potential that is motion-adapted. The task of U_g^i(g̃_i^c) is to measure the departure of the observations from the structural model, and therefore to represent a measure of intensity matching over the N fields used in the estimation. It can be shown that for a sufficiently large neighborhood system and for suitable cliques, U_g^i can be chosen proportional to the sample variance:

$$U_g^i(\tilde{g}_i^c) = \sum_{k \in \mathcal{T}_t} [\tilde{g}(c(k; x_i, t), t + k) - \bar{g}(x_i, t)]^2,$$

which expresses the variability of intensity along the motion trajectory through (x_i, t) with respect to the sample mean ḡ(x_i, t) defined as follows:

$$\bar{g}(x_i, t) = \frac{1}{N} \sum_{k \in \mathcal{T}_t} \tilde{g}(c(k; x_i, t), t + k).$$

Using the parametric representation of c(k; x_i, t) from (5), the overall energy U_g can be expressed as follows:

$$U_g(\mathcal{G}_t^c) = \sum_{i=1}^{N_p} \sum_{k \in \mathcal{T}_t} [\tilde{g}(x_i + A_{t+k}\, p_i,\ t + k) - \bar{g}(x_i, t)]^2, \tag{11}$$

where N_p is the number of trajectories to estimate.

3.4. Cost function and optimization

Considering Z_g to be a constant and combining the probability distributions from (7) and (10), the MAP estimation (3) becomes an energy minimization problem with respect to p_t:

$$\hat{p}_t = \arg\min_{p_t} U(p_t), \qquad U(p_t) = U_g(\mathcal{G}_t^c) + \lambda_p U_p(p_t), \tag{12}$$

where λ_p is a regularization parameter weighing the importance of the a priori motion model with respect to the structural model. To find the global minimum in (12), global optimization methods, such as simulated annealing, have been used [17]. Here, we use a deterministic approach equivalent to Gauss-Newton optimization. Since U_g is not quadratic in p, using a Taylor expansion with respect to some intermediate solution p̃, we approximate g̃(x_i + A_{t+k} p_i, t + k) from (11) by the following linear form:

$$\tilde{g}(x_i + A_{t+k}\, p_i,\ t+k) \approx \tilde{g}(x_i + A_{t+k}\, \tilde{p}_i,\ t+k) + \nabla_p^{\mathrm T}\, \tilde{g}(x_i + A_{t+k}\, \tilde{p}_i,\ t+k)\,(p_i - \tilde{p}_i), \tag{13}$$

where ∇_p g̃(·) is the gradient of g̃(·) with respect to the vector p_i evaluated at p̃_i:

$$\nabla_p\, \tilde{g}(\tilde{x},\ t+k) = \left[\frac{\partial \tilde{g}}{\partial x}\, k,\ \ \frac{\partial \tilde{g}}{\partial y}\, k,\ \ \frac{\partial \tilde{g}}{\partial x}\, k^2,\ \ \frac{\partial \tilde{g}}{\partial y}\, k^2\right]^{\mathrm T},$$

with x̃ = x_i + A_{t+k} p̃_i. Substituting Eq. (13) into (11), U(p_t) can be approximated by a quadratic function of p_t as follows:

$$U(p_t) \approx \sum_{i=1}^{N_p} \sum_{k \in \mathcal{T}_t} [r_k(\tilde{p}_i) + (s_k(\tilde{p}_i))^{\mathrm T}(p_i - \tilde{p}_i)]^2 + \lambda_p \sum_{\{x_i,x_j\} \in \Theta_p} (p_i - p_j)^{\mathrm T}\, \Gamma\, (p_i - p_j), \tag{14}$$

where r_k(p̃_i) = g̃(x_i + A_{t+k} p̃_i, t + k) − ḡ(x_i, t) and s_k(p̃_i) = ∇_p g̃(x_i + A_{t+k} p̃_i, t + k).

In order to carry out the minimization (12) we need to differentiate the total energy U(p_t) from (14) with respect to each p_i. Details of this differentiation can be found in Appendix A.1. Consequently, we arrive at the following linear system:

$$A_i\, p_i = b_i, \qquad i = 1, \ldots, N_p, \tag{15}$$

where, following Eq. (A.2),

$$A_i = \sum_{k \in \mathcal{T}_t} s_k(\tilde{p}_i)(s_k(\tilde{p}_i))^{\mathrm T} + \lambda_p\, \xi_i\, \Gamma, \qquad b_i = \sum_{k \in \mathcal{T}_t} \left[ s_k(\tilde{p}_i)(s_k(\tilde{p}_i))^{\mathrm T}\, \tilde{p}_i - r_k(\tilde{p}_i)\, s_k(\tilde{p}_i) \right] + \lambda_p\, \xi_i\, \Gamma\, \bar{p}_i,$$

with ξ_i the neighborhood size and p̄_i the neighborhood average defined in Appendix A.1. The system (15) is extremely sparse due to the local nature of the neighborhood used and therefore is solved iteratively using relaxation (computation of p̂). Usually, starting with p̃ = 0, the relaxation is performed until convergence. Then, the last estimate p̂ is used as the new p̃, and the process is repeated. However, we have obtained better performance for p̃_i = p_i with an 'on-the-fly' update of p_i (Gauss-Seidel relaxation). This corresponds to one relaxation step between two updates of p̃_i [4].

To improve the computational efficiency of the algorithm as well as the likelihood of convergence to the global minimum, a multiresolution approach is used. First, a pyramid of images is generated using Gaussian filters [12]. Then, estimation is performed for the lowest-resolution images and the result is projected (via interpolation) onto the next higher resolution level. This process is repeated until the full-resolution estimate is obtained.
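A minimal sketch of one relaxation sweep is given below; it assembles and solves the 4×4 systems of Eq. (15) with on-the-fly (Gauss-Seidel style) updates. The image-gradient handling and boundary treatment are simplified assumptions of ours, and `sample` stands in for the bicubic interpolator of [16]:

```python
import numpy as np

def relaxation_sweep(P, frames, taus, lam, Gamma, sample):
    """One Gauss-Seidel sweep solving A_i p_i = b_i (Eq. (15)) at every pixel.

    P      -- motion parameter field, shape (H, W, 4): [ux, uy, ax, ay]
    frames -- dict k -> image g_{t+k} (2-D numpy arrays)
    taus   -- temporal offsets k in T_t
    sample -- sample(img, x, y) -> (value, d/dx, d/dy); stands in for the
              bicubic interpolator of [16] plus spatial gradients
    """
    H, W, _ = P.shape
    for r in range(H):
        for c in range(W):
            pt = P[r, c]
            # neighborhood average p_bar_i over available N/E/W/S neighbors
            nbrs = [P[rr, cc] for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
                    if 0 <= rr < H and 0 <= cc < W]
            xi = len(nbrs)
            p_bar = np.mean(nbrs, axis=0)
            # intensities and gradient vectors s_k along the current trajectory
            g_vals, s_vecs = [], []
            for k in taus:
                x = c + k * pt[0] + k**2 * pt[2]
                y = r + k * pt[1] + k**2 * pt[3]
                g, gx, gy = sample(frames[k], x, y)
                g_vals.append(g)
                s_vecs.append(np.array([gx * k, gy * k, gx * k**2, gy * k**2]))
            g_bar = np.mean(g_vals)                  # sample mean (Eq. (11))
            # assemble the 4x4 system of Eq. (15) and update p_i in place
            A = lam * xi * Gamma
            b = lam * xi * (Gamma @ p_bar)
            for g, s in zip(g_vals, s_vecs):
                rk = g - g_bar                       # residual r_k
                A = A + np.outer(s, s)
                b = b + np.outer(s, s) @ pt - rk * s
            P[r, c] = np.linalg.solve(A, b)          # on-the-fly update
    return P
```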
3.5. Experimental results

To evaluate the performance of the algorithm we have generated interlaced and progressive test sequences (72 × 64 pixels, 5 fields) with synthetic motion (Fig. 3). Each image consists of a rectangle (45 × 38 pixels) moving across a still background with some initial velocity and a constant acceleration. The background has been taken from one image, while the rectangle has been obtained from another image after low-pass filtering and subsampling. To simulate motion of the rectangle at quarter-pixel accuracy, its intensities at consecutive time instants are computed via 4:1 subsampling with a variable spatial offset. This approach allows more realistic testing than in the case of pixel accuracy [4].

Fig. 3. Field #2 from: (a) test sequence 1 (interlaced); (b) test sequence 2 (progressive). Note that in (a) the field is not vertically interpolated and thus the aspect ratio is incorrect.
Table 1
MSE of p̂ in region ℛ'₁ for field #2 of test image 1 with p = [1.5 1.5 0.5 1.0]ᵀ for different sets 𝒯_t

    𝒯_t                     u_x      u_y      a_x      a_y
    {−1, 1}                 0.0351   0.0547   0.2496   0.8382
    {−1, 0, 1}              0.0566   0.0688   0.0102   0.9100
    {−2, −1, 0, 1}          0.0814   0.1461   0.0256   0.0735
    {−1, 0, 1, 2}           0.0574   0.0484   0.0430   0.0550
    {−2, −1, 0, 1, 2}       0.0458   0.0288   0.0277   0.0344
    {−2, −1, 0, 1, 2, 3}    0.0401   0.0291   0.0207   0.0206
The accuracy of the motion estimates is evaluated using the mean-squared error (MSE) between the components u_x, u_y, a_x, a_y of the true motion field p and the estimated field p̂ in a region ℛ. ℛ can be either the full estimation area (ℛ₀), or the area of the moving rectangle (ℛ₁), or its complement ℛ₂ = ℛ₀∖ℛ₁. To disregard the impact of the smoothness constraint, the MSE was also calculated over regions ℛ₁ and ℛ₂ deprived of a 5-pixel band around the rectangle border, respectively named ℛ'₁ and ℛ'₂. ℛ'₁, for instance, represents the area strictly inside the moving rectangle. We have established experimentally that the cost function U(p_t) and the MSE attain a minimum for λ_p between 40 and 100 [4]. Therefore, we set λ_p to 50. We use 4 resolution levels obtained by Gaussian filtering with σ² = 2.5. We set the diagonal elements in Γ to 1 for u and to 2 for a. We constrain the acceleration more than the velocity since it strongly affects the curvature of c_t (Eq. (4)). To evaluate the impact of the set 𝒯_t on the precision of the motion trajectories, we have estimated the parameters p for various sets 𝒯_t (Table 1). Although the impact of N on the velocity is clear only for N ≥ 4, it is quite evident for the acceleration. A graphical representation of this impact is shown in Fig. 4, where the true and estimated trajectories are plotted for a randomly chosen point within the moving rectangle. Note that the match between the two trajectories is very poor for N = 2 and 3 (the estimated trajectories extend beyond the plot limits), whereas it steadily improves with increasing N. For further experiments we chose 𝒯_t = {−2, −1, 0, 1, 2} since it gives a low MSE and is symmetric. With the basic parameters of the algorithm established, we have performed a series of experiments for both test images with various velocities and accelerations.
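As an illustration of this evaluation protocol, the sketch below (our own construction; the mask geometry and the 5-pixel erosion follow the definitions of ℛ₀, ℛ₁ and ℛ'₁ above, and the rectangle placement is illustrative) computes the per-component MSE over such regions:

```python
import numpy as np

def region_masks(shape, rect, band=5):
    """Masks R0 (whole image), R1 (rectangle), R1' (rectangle shrunk by `band`)."""
    H, W = shape
    top, left, h, w = rect
    R0 = np.ones((H, W), dtype=bool)
    R1 = np.zeros((H, W), dtype=bool)
    R1[top:top + h, left:left + w] = True
    R1p = np.zeros((H, W), dtype=bool)
    R1p[top + band:top + h - band, left + band:left + w - band] = True
    return R0, R1, R1p

def mse_by_component(p_true, p_est, mask):
    """MSE of the [ux, uy, ax, ay] fields (H x W x 4) over the given region."""
    err = (p_true - p_est)[mask] ** 2
    return err.mean(axis=0)

# illustrative geometry only: 72 x 64 image, 45 x 38 rectangle
R0, R1, R1p = region_masks((64, 72), rect=(13, 13, 38, 45))
```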
Table 2
MSE of p̂ for field #2 of test image 1 for different p; lines with * show the error for estimation disregarding acceleration (linear trajectory model)

    p = [1.0 0.0 0.0 1.0]ᵀ
    Region   u_x      u_y      a_x      a_y
    ℛ₀       0.0365   0.0050   0.0142   0.0445
    ℛ₁       0.0070   0.0017   0.0016   0.0260
    ℛ'₁      0.0026   0.0010   0.0010   0.0185
    ℛ₂       0.0539   0.0070   0.0217   0.0555
    ℛ'₂      0.0032   0.0004   0.0155   0.0295

    p = [1.5 1.5 0.5 1.0]ᵀ
    Region   u_x      u_y      a_x      a_y
    ℛ₁       0.1631   0.1036   0.0574   0.0733
    ℛ'₁      0.0458   0.0288   0.0277   0.0344
    ℛ₁*      0.8119   0.5126   −        −
    ℛ'₁*     2.2306   0.9324   −        −
Table 3
MSE of p̂ for field #2 of test image 2 for different p

    p = [1.0 2.0 1.0 1.0]ᵀ
    Region   u_x      u_y      a_x      a_y
    ℛ₁       0.1265   0.2503   0.1003   0.0967
    ℛ'₁      0.0017   0.0085   0.0018   0.0029

    p = [1.75 1.5 1.0 1.5]ᵀ
    Region   u_x      u_y      a_x      a_y
    ℛ₁       0.6909   0.2791   0.3123   0.2752
    ℛ'₁      0.1067   0.0691   0.0571   0.0370
Table 2 shows the MSE for the interlaced test image. Note that the MSE is very small for the integer accuracy, especially if calculated away from the moving rectangle boundaries (ℛ'₁ and ℛ'₂). The error is somewhat increased for subpixel accuracy of the motion parameters; however, it is highly reduced in comparison with the estimation using the linear trajectory model (lines with *). The corresponding results for the progressive test sequence are shown in Table 3. More results can be found in [5].
The velocity and acceleration fields for p = [1.5 1.5 0.5 1.0]ᵀ estimated using both models are shown in Fig. 5. Note that the estimates for the quadratic model correspond well to the motion of the rectangle. The smoothing at the rectangle boundaries is due to the lack of a motion boundary model. This is confirmed by the MSE for ℛ'₁ (no boundary effects), which is much smaller than the error for ℛ₁ (Tables 2 and 3). The velocity-only estimate (Eq. (2)) for the same image is clearly suboptimal within the moving rectangle; the vectors are spatially inconsistent, which is confirmed by the large MSE in Table 2.
Fig. 4. Estimated (dotted) and true (full) trajectories at (x, t) = (37, 30, 2) of test image 1 for the following 𝒯_t: (a) {−1, 1}; (b) {−1, 0, 1}; (c) {−2, −1, 0, 1}; (d) {−1, 0, 1, 2}; (e) {−2, −1, 0, 1, 2}; (f) {−2, −1, 0, 1, 2, 3}. ○ denotes the position x_i at t = 2 while × denotes the positions c(k; x_i, t) for k ∈ 𝒯_t; trajectories are plotted over the range [−2, 3].

Fig. 5. Estimated (a) velocity for the linear trajectory model; and (b) velocity and (c) acceleration for the quadratic model, both with N = 5, for field #2 of test image 1 and p = [1.5 1.5 0.5 1.0]ᵀ.

4. Estimation of trajectories for accelerated motion in the presence of occlusions

So far we have not taken occlusion effects into account. Occlusions, however, play a very important role in the formation of images, and consequently in the process of estimating motion. We define an occlusion field o and a motion discontinuity field l, often called a line field or process [14]. The occlusion field has its samples defined on the lattice Λ_c. Every occlusion tag o(x, t) may be assigned one of several possible states, e.g., moving/stationary (visible), exposed or covered. The number of such states is finite and depends on the cardinality of the set 𝒯_t. The motion discontinuity field is defined over a union of two orthogonal cosets that define
the positions of horizontal and vertical line elements [17]. A horizontal line element l(x_i, x_j, t) is defined between vertically adjacent pixels at (x_i, t) and (x_j, t), while a vertical line element is defined between horizontally adjacent pixels.
4.1. MAP criterion
We extend the algorithm described in Section 3 by searching for the most probable triplet (c_t, o_t, l_t)
given the observations 𝒢_t. Assuming that the occlusion field o_t and the line field l_t are samples from scalar random fields O_t and L_t, respectively, the MAP estimation (3) is now expressed as follows:

$$(\hat{c}_t, \hat{o}_t, \hat{l}_t) = \arg\max_{c_t, o_t, l_t} p(\mathcal{G}_t^n \mid c_t, O_t = o_t, L_t = l_t, g_{t_n})\; p(c_t, O_t = o_t, L_t = l_t \mid g_{t_n}), \tag{16}$$

where the second p is now a mixed density/probability mass function. In the following sections, the models that allow us to specify both Gibbs distributions above are described via their energy functions. Note that the prior distribution p(c_t, O_t = o_t, L_t = l_t | g_{t_n}) is now described jointly by the motion, occlusion and motion discontinuity models.

4.2. Motion model

In Section 3.2 a globally smooth motion trajectory model was proposed. We extend this model to a piecewise-smooth model by explicitly taking into account motion discontinuities [17]. An alternative would be to consider implicit motion discontinuities derived, for example, from a segmentation map [23]. The new model can be described by the following energy function:

$$U_p(p_t, l_t) = \sum_{\{x_i, x_j\} \in \Theta_p} (p_i - p_j)^{\mathrm T}\, \Gamma\, (p_i - p_j)\,[1 - l(x_i, x_j, t)]. \tag{17}$$

This energy captures the desired smoothness property of a motion field only in the absence of motion discontinuities. Hence, a jump in motion parameters is not penalized if a motion discontinuity has been detected. Dependence on the occlusion field o_t is not necessary since the information about an occlusion state transition is assumed to have been passed by a motion discontinuity (Section 4.4).

4.3. Structural model

In Section 3.3 it was assumed that a trajectory through (x, t) extends over the whole time interval defined by 𝒯_t (Fig. 2). However, this is only true if the pixel from (x, t) is visible over all N fields defined by 𝒯_t. To relax this condition, at each position (x, t) we define a subset of 𝒯_t as follows:

$$\mathcal{T}_t^x = \{\tau \in \mathcal{T}_t:\ x \in \mathcal{V}(\tau; t)\}.$$
𝒯_t^x is called the visibility set [11] and contains the time instants from 𝒯_t at which the pixel from position x at time t is visible. This set can be directly derived from the occlusion state o(x, t) as illustrated in Table 4 for 3- and 5-image estimation. Note that only the most likely combinations are taken into account for N = 5; for example, pixels briefly appearing or disappearing are not considered. Since not all pixels visible at time t are visible for all τ ∈ 𝒯_t, we need to modify our structural model by taking occlusions into account. The visibility set 𝒯_t^x lets us incorporate occlusion information into the structural term (11) as follows:

$$U_g(\mathcal{G}_t^c, o_t) = \sum_{i=1}^{N_p} \sum_{k \in \mathcal{T}_t^i} [\tilde{g}(c(k; x_i, t), t + k) - \bar{g}(x_i, t)]^2, \tag{18}$$
where

$$\bar{g}(x_i, t) = \frac{1}{\mathrm{Card}(\mathcal{T}_t^i)} \sum_{k \in \mathcal{T}_t^i} \tilde{g}(c(k; x_i, t), t + k). \tag{19}$$
Note that now the variation of intensity along a trajectory is evaluated only for the time instants at which the considered pixel is visible. The dependence of U_g(𝒢_t^c, o_t) on the line field l_t has been omitted since the information about motion discontinuities is conveyed by the motion trajectories themselves.
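In code, the only change with respect to the occlusion-free energy of Eq. (11) is that the temporal sum and the sample mean run over the visibility set; a minimal sketch (names are ours) follows:

```python
import numpy as np

def structural_energy(samples, visible):
    """U_g^i of Eqs. (18)-(19): sample variance of the intensities observed
    along one trajectory, restricted to the visibility set T_t^i.

    samples -- dict k -> interpolated intensity g~(c(k; x_i, t), t + k)
    visible -- offsets k in T_t^i (a subset of T_t)
    """
    vals = np.array([samples[k] for k in visible])
    g_bar = vals.mean()                        # Eq. (19)
    return float(((vals - g_bar) ** 2).sum())  # Eq. (18), one trajectory
```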
4.4. Occlusion model

An occlusion model that does not require a separate random field, but infers occlusion states from segmentation and motion estimates under an ordering constraint, was proposed in [23]. In [9], an explicit field o_t was used to model occlusions for the case of estimation from 3 images (𝒯_t = {−1, 0, 1}). We adopt a similar approach here for the case of estimation from 5 images (𝒯_t = {−2, −1, 0, 1, 2}). The most likely occlusion states for both cases are shown in Table 4.
Fig. 6. Costs assigned to: (a) V_{O₁}; (b) V_{O₂} for various configurations (up to rotation and permutation) of occlusion cliques for 𝒯_t = {−2, −1, 0, 1, 2} (occlusion states: circle (M); empty square (E); empty diamond (E₋₁); filled square (C); filled diamond (C₊₁); line element states: empty rectangle ('off'); filled rectangle ('on')).
Table 4
Occlusion states and visibility sets for 𝒯_t = {−1, 0, 1} and 𝒯_t = {−2, −1, 0, 1, 2}

    𝒯_t = {−1, 0, 1}
    o(x, t)   Description               𝒯_t^x
    M         Moving/stationary         {−1, 0, 1}
    E         Exposed in (t−1, t)       {0, 1}
    C         Covered in (t, t+1)       {−1, 0}

    𝒯_t = {−2, −1, 0, 1, 2}
    o(x, t)   Description               𝒯_t^x
    M         Moving/stationary         {−2, −1, 0, 1, 2}
    E         Exposed in (t−1, t)       {0, 1, 2}
    E₋₁       Exposed in (t−2, t−1)     {−1, 0, 1, 2}
    C         Covered in (t, t+1)       {−2, −1, 0}
    C₊₁       Covered in (t+1, t+2)     {−2, −1, 0, 1}
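The mapping of Table 4 is easy to express programmatically; the following sketch (our own helper, with the 5-image case hard-coded from the table) returns the visibility set for a given occlusion state:

```python
# Visibility sets T_t^x for T_t = {-2, -1, 0, 1, 2}, transcribed from Table 4.
VISIBILITY_5 = {
    "M":   {-2, -1, 0, 1, 2},   # moving/stationary (visible throughout)
    "E":   {0, 1, 2},           # exposed in (t-1, t)
    "E-1": {-1, 0, 1, 2},       # exposed in (t-2, t-1)
    "C":   {-2, -1, 0},         # covered in (t, t+1)
    "C+1": {-2, -1, 0, 1},      # covered in (t+1, t+2)
}

def visibility_set(state, taus=frozenset({-2, -1, 0, 1, 2})):
    """Return T_t^x given the occlusion tag o(x, t)."""
    return VISIBILITY_5[state] & set(taus)
```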
We model the field of occlusions by a discrete-valued scalar MRF with the energy function

$$U_o(o_t, l_t) = \sum_{i=1}^{N_p} V_{O_1}(o(x_i, t)) + \sum_{\{x_i, x_j\} \in \Theta_o} V_{O_2}(o(x_i, t),\, o(x_j, t) \mid l(x_i, x_j, t)), \tag{20}$$
where I’01and Vo2are potential functions associated with single- and two-element cliques from the firstorder neighborhood system. It is expected that a typical occlusion field consists mostly of patches of pixels labeled as visible, and some smaller clusters of pixels labeled as covered or exposed. There-
io,11
i - LO)
fore, the potential function V,, provides a penalty associated with the introduction of a state different than M, whereas the potential function Vo2favors the creation of continuous occlusion regions near motion discontinuities only. To achieve this goal, the dependence of V,*on the line field I is used. For instance, whenever (Xi,t) and (xj, t) have the same occlusion state, Vo2is set to 0 (high probability) if the two positions are not separated by a motion discontinuity (i.e., I(Xr,Xj,t) = 0) and to a high value (low probability) if they are separated by a motion discontinuity (i.e., I(Xi,Xj,t) = 1). Costs associated with all possible configurations of single- and two-element cliques in a Sstate occlusion field are shown in Fig. 6. These costs have
Fig. 7. Costs assigned to: (a) V_l^s; (b) V_l^c for various configurations (up to rotation) of line cliques (filled rectangle: line element 'on'; empty rectangle: line element 'off'; circle: pixel position).
been chosen experimentally, and therefore are not optimal. Basically, the costs associated with the two-element cliques are chosen so as to discourage the occurrence of incompatible combinations of neighboring occlusion tags (e.g., E and C), and to favor the creation of clusters of occluded pixels near motion discontinuities.

4.5. Motion discontinuity model

Following the approach proposed in [17], we model the motion discontinuity field by a binary MRF with the following energy function:

$$U_l(l_t, e_t) = \sum_{\{x_i, x_j\}} V_l(l(x_i, x_j, t),\, e(x_i, x_j, t)) + \sum_{\theta_t^s \in \Theta_t^s} V_l^s(\theta_t^s) + \sum_{\theta_t^c \in \Theta_t^c} V_l^c(\theta_t^c), \tag{21}$$

where {x_i, x_j} denotes a single-element clique (between two pixel sites), and θ_t^s and θ_t^c are square- and cross-shaped four-element cliques, respectively. e(x_i, x_j, t) ≜ e_{ij} indicates the presence (e_{ij} = 1) or absence (e_{ij} = 0) of a horizontal or vertical intensity edge in image g_t midway between the pixels at x_i and x_j. A line element is said to be turned 'on' if l(x_i, x_j, t) = 1, i.e., a motion discontinuity is present midway between x_i and x_j; otherwise, l(x_i, x_j, t) = 0. V_l, V_l^s and V_l^c are potential functions associated with the corresponding cliques [11].

It is assumed that, in general, the introduction of a motion discontinuity coincides with an intensity edge. This is enforced by the potential function V_l, which uses single-element cliques to associate a high penalty whenever a motion discontinuity does not match an intensity edge [15]. Therefore, V_l can be formulated as follows: V_l(l, e) = 10(1.1 − e)l. Hence, the introduction of a motion discontinuity (l = 1) is penalized by 11 in the absence of an intensity edge (e = 0) and by 1 if such an edge is present (e = 1). This latter value assures a penalty associated with the introduction of a line element, since otherwise such elements could be introduced between all pairs of sites to bring the energy (17) to zero. The field of intensity edges e_t at time t is calculated a priori by the application of Canny's edge detector [3] to the image g_t. The control over straight lines, corners and intersections is achieved by the penalty functions V_l^s and V_l^c using four-element cliques. V_l^s, in particular, discourages the formation of double lines and also inhibits the generation of isolated line elements, while V_l^c discourages the creation of unended and intersecting segments. Fig. 7 shows the proposed costs (inspired by [9]) associated with the different configurations of both clique types.
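The single-element potential is simple enough to state in code; the sketch below implements V_l(l, e) = 10(1.1 − e)l from the text (the function name is ours):

```python
def v_line(l, e):
    """Single-element line potential V_l(l, e) = 10*(1.1 - e)*l.

    l -- line element state (0: 'off', 1: 'on')
    e -- intensity-edge indicator at the same site (0 or 1)
    Penalty: 11 for a discontinuity without an edge, 1 with an edge, 0 if off.
    """
    return 10.0 * (1.1 - e) * l

assert abs(v_line(1, 0) - 11.0) < 1e-9
assert abs(v_line(1, 1) - 1.0) < 1e-9
assert v_line(0, 1) == 0.0
```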
4.6. Cost function and optimization

Combining the energy functions (17), (20) and (21) via suitable weights, we can describe the prior distribution p(c_t, O_t = o_t, L_t = l_t | g_{t_n}) in (16). Consequently, by combining the two Gibbs distributions in (16) we arrive at the following optimization problem:

$$(\hat{p}_t, \hat{o}_t, \hat{l}_t) = \arg\min_{p_t, o_t, l_t} U(p_t, o_t, l_t), \tag{22}$$

where U(p_t, o_t, l_t) is the new multiple-term objective function expressed as follows:

$$U(p_t, o_t, l_t) = U_g(\mathcal{G}_t^c, o_t) + \lambda_p U_p(p_t, l_t) + \lambda_o U_o(o_t, l_t) + \lambda_l U_l(l_t, e_t). \tag{23}$$

The minimization in (22) is performed in an interleaved fashion, i.e., while one field is being updated, the others are kept constant. Hence, at iteration n, the three fields are updated consecutively by performing one iteration of their respective minimization problems:

$$\hat{p}_t = \arg\min_{p_t}\, U_g(\mathcal{G}_t^c, o_t) + \lambda_p U_p(p_t, l_t), \quad \text{with } o_t = \hat{o}_t^{(n-1)},\ l_t = \hat{l}_t^{(n-1)}, \tag{24}$$

$$\hat{o}_t = \arg\min_{o_t}\, U_g(\mathcal{G}_t^c, o_t) + \lambda_o U_o(o_t, l_t), \quad \text{with } p_t = \hat{p}_t^{(n-1)},\ l_t = \hat{l}_t^{(n-1)}, \tag{25}$$

$$\hat{l}_t = \arg\min_{l_t}\, \lambda_p U_p(p_t, l_t) + \lambda_o U_o(o_t, l_t) + \lambda_l U_l(l_t, e_t), \quad \text{with } p_t = \hat{p}_t^{(n-1)},\ o_t = \hat{o}_t^{(n-1)}. \tag{26}$$

Once each field has been updated, the process is repeated until convergence of U(p_t, o_t, l_t). The optimization (24) of the motion field p_t is carried out using the deterministic relaxation algorithm discussed in Section 3.4 with the modifications described in Appendix A.2. On the other hand, the optimization of the line and occlusion fields is carried out by solving the minimization problems in (25) and (26) using Iterated Conditional Modes [2].

Fig. 8. (a) Field #2 from test sequence 3 (progressive) and (b) the corresponding intensity edges e computed using Canny's edge detector.
Table 5
MSE of p̂ for field #2 of test image 3 using algorithms 𝒜 and ℬ

    p = [2.0 0.0 0.0 0.0]ᵀ
    Algorithm   Region   u_x      u_y      a_x      a_y
    𝒜           ℛ₀       0.0851   0.0217   0.0292   0.0111
    𝒜           ℛ₁       0.0248   0.0022   0.0013   0.0021
    ℬ           ℛ₀       0.0266   0.0238   0.0275   0.0145
    ℬ           ℛ₁       0.0058   0.0025   0.0007   0.0046

    p = [4.0 4.0 1.0 1.0]ᵀ
    Algorithm   Region   u_x      u_y      a_x      a_y
    𝒜           ℛ₀       0.9650   1.1402   0.2439   0.1973
    𝒜           ℛ₁       0.6165   0.8582   0.1296   0.0970
    ℬ           ℛ₀       0.2330   0.2115   0.0963   0.0726
    ℬ           ℛ₁       0.0199   0.0248   0.0034   0.0042
Fig. 9. Estimated: (a) velocity and (b) acceleration using algorithm 𝒜 at field #2 of test image 3 for p = [2.0 0.0 0.0 0.0]ᵀ.
4.7. Experimental results

For the sake of compactness, from now on we will denote by 𝒜 the algorithm described in Section 3, i.e., without the processing of occlusion areas and motion discontinuities, whereas we will use ℬ for the extended algorithm described in this section. Again, to evaluate the accuracy of the estimates we have generated a progressive test image with synthetic motion. Test image 3, shown in Fig. 8(a), differs from test images 1 and 2 in that it contains more detail, thus making motion estimation more challenging. Fig. 8(b) shows the corresponding intensity edges computed using Canny's edge detector [3]. In the experiments we have used (λ_p, λ_o, λ_l) = (40, 7, 5), 𝒯_t = {−2, −1, 0, 1, 2}, 4 resolution levels and the same Γ as before. In Table 5, the MSE for two synthetic motion parameter vectors is shown. It is clear from this table that algorithm ℬ outperforms algorithm 𝒜 by a large margin. An example of velocity and acceleration estimates for algorithm 𝒜 is shown in Fig. 9. The corresponding estimates for algorithm ℬ, together with the estimated occlusion and motion discontinuity fields, are shown in Fig. 10. Note that around the rectangle boundary the estimates obtained by algorithm ℬ are more accurate than those obtained by algorithm 𝒜. This is due to the motion discontinuities, which are helpful in disabling motion smoothness at the rectangle border. This corresponds better to the underlying motion and leads to a substantial decrease in the MSE of the estimates in the overall estimation area ℛ₀ and in the moving rectangle area ℛ₁. Also, note that the detected occlusions (Fig. 10(c)) are consistent with the horizontal motion (from left to right) of the rectangle moving at a constant velocity of 2 pixels per field. The dark area represents the region that is going to be covered within the next 2 time intervals, whereas the bright area represents the region that has been exposed within the previous 2 time intervals. To verify the impact of the set 𝒯_t on the occlusion/discontinuity estimation, Fig. 11 shows ô_t and l̂_t for 𝒯_t = {−1, 0, 1} (N = 3) obtained by the algorithm originally developed in [9]. A comparison with the case of N = 5 (Fig. 10) shows that processing over the extended temporal support helps in a reliable identification of occluded regions and of motion discontinuities. For more results the reader is referred to [6].
5. Motion-compensated temporal interpolation for accelerated motion

In the previous two sections we have developed a theoretical framework for the estimation of general nonlinear motion trajectories from time-varying images. We have shown that the approach leads to precise velocity and acceleration estimates when applied to images with synthetic motion. In this section, we study the performance of the proposed algorithm on images with natural motion in the context of motion-compensated temporal interpolation.
Fig. 10. Estimated: (a) velocity; (b) acceleration; (c) occlusions and (d) motion discontinuities using algorithm ℬ with 𝒯_t = {−2, −1, 0, 1, 2} at field #2 of test image 3 for p = [2.0 0.0 0.0 0.0]ᵀ (grey = M, dark = C, very dark = C₊₁, bright = E, very bright = E₋₁).
To achieve additional data compression, some video coding schemes employ temporal subsampling of image sequences. For example, in very low bit rate video coding the original QCIF sequences at a temporal frequency of 30 Hz are often temporally subsampled by 3-4 to achieve 7.5-10 Hz. Such sequences are then encoded and transmitted at a reduced temporal rate. In the receiver they are decoded and up-converted (to reconstruct the omitted images) by means of motion-compensated temporal interpolation. An example diagram of such a system, based on the transmission of motion parameters jointly with the temporally subsampled intensities, is shown in Fig. 12. In the following, we study experimentally the motion-compensated interpolation block whose function is shown schematically under the diagram. An alternative approach is to
estimate the motion parameters at the receiver instead of transmitting them. However, the estimation framework developed in Sections 3 and 4 is applicable to both approaches. We define another set that contains the time instants of images available at the receiver and used to reconstruct the omitted images:

$$\mathcal{J}_t = \{\tau:\ g_{t+\tau} \text{ is used in motion-compensated interpolation at time } t\}.$$

Typically 𝒥_t ⊂ 𝒯_t, i.e., a subset of the images used in motion estimation is transmitted. We also define the set 𝒦_t = 𝒯_t∖𝒥_t; the images at τ ∈ 𝒦_t are dropped. In the receiver, based on the transmitted motion parameters {p̂_{t+τ}, τ ∈ 𝒦_t} and images
{g_{t+τ}, τ ∈ 𝒥_t}, the missing images are reconstructed as follows:

$$\hat{g}(x_i,\, t+\tau) = \sum_{r \,\in\, \mathcal{J}_{t+\tau} \cap\, \mathcal{T}^i_{t+\tau}} \alpha_r\, \tilde{g}(c(r;\, x_i,\, t+\tau),\ t+\tau+r), \qquad t = 4n,\ \ \tau \in \mathcal{K}_t,\ \ x_i \in (\Lambda_g)_{t+\tau},\ \ i = 1, \ldots, N_p, \tag{27}$$

where n is an integer (the images are assumed to be spaced by 1 in time) and the weights must be such that

$$\sum_{r \,\in\, \mathcal{J}_{t+\tau} \cap\, \mathcal{T}^i_{t+\tau}} \alpha_r = 1.$$

The weights α as well as the sets 𝒯_t and 𝒥_t for the case of 4:1 subsampling are shown in Table 6. The set 𝒥_{t+τ} ∩ 𝒯^i_{t+τ} denotes the time instants of those transmitted images in which the pixel from (x_i, t+τ) is visible. This introduces occlusion information into the temporal interpolation but requires a renormalization of the coefficients α for each x_i. Since for each τ ∈ 𝒦_t we know the original image g_{t+τ}, we can calculate the reconstruction error

$$\varepsilon_{t+\tau} = g_{t+\tau} - \hat{g}_{t+\tau}, \qquad \tau \in \mathcal{K}_t,$$

which we will use to evaluate the quality of the motion estimates p̂_{t+τ}. Note that in order that ε_{t+τ} be small, the trajectories described by p̂_{t+τ} must be close to the true motion trajectories in the image. To describe quantitatively the quality of the reconstruction, we measure the reconstruction error ε in terms of the peak signal-to-noise ratio (PSNR).
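A minimal sketch of this reconstruction step and of the PSNR figure of merit is given below (our own helper names; the inverse-distance weights follow Table 6, and 8-bit intensities are assumed for the 255 peak):

```python
import numpy as np

def interpolate_missing(frames, traj, offsets, weights):
    """Reconstruct one omitted image as in Eq. (27).

    frames  -- dict r -> transmitted image g_{t+tau+r}
    traj    -- traj(r) -> (ys, xs): integer trajectory positions
               c(r; x_i, t+tau) for every pixel (rounded to the grid here)
    offsets -- transmitted offsets r in J intersected with the visibility set
    weights -- dict r -> alpha_r, renormalized so they sum to 1
    """
    out = None
    for r in offsets:
        ys, xs = traj(r)
        contrib = weights[r] * frames[r][ys, xs]
        out = contrib if out is None else out + contrib
    return out

def psnr(orig, recon, peak=255.0):
    """PSNR (dB) of the reconstruction error eps = orig - recon."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```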
Fig. 11. Estimated: (a) occlusions and (b) motion discontinuities using algorithm ℬ with 𝒯_t = {−1, 0, 1} at field #2 of test image 3 for p = [2.0 0.0 0.0 0.0]ᵀ.
Fig. 12. Motion-compensated temporal interpolation for 4:1 subsampling. The vertical dashed and dotted lines denote the omitted and reconstructed images, respectively, whereas the dotted curves denote the motion trajectories under estimation.
Table 6
Configurations of the sets 𝒯_t, 𝒥_t and the weights α for 4:1 subsampling starting at t = 0 (n integer)

                 N = 5                                N = 2
    t            𝒯_t                   𝒥_t           𝒯_t = 𝒥_t      α
    1 + 4n       {−1, 0, 1, 2, 3}      {−1, 3}       {−1, 3}        0.75, 0.25
    2 + 4n       {−2, −1, 0, 1, 2}     {−2, 2}       {−2, 2}        0.50, 0.50
    3 + 4n       {−3, −2, −1, 0, 1}    {−3, 1}       {−3, 1}        0.25, 0.75
Fig. 13. (a) Field #30 of interlaced test image 4 (224 × 80 pixels, 120 fields); and (b) field #14 of test image 5 (120 × 120 pixels, 30 images). The field in (a) is vertically interpolated to maintain the proper aspect ratio.
5.1. Experimental results for algorithm 𝒜
We have tested the interpolation scheme described above on two sequences. The interlaced sequence (test image 4), shown in Fig. 13(a), contains complex motion, primarily of the hand and of the arm. The progressive sequence (test image 5), shown in Fig. 13(b), is extracted from typical videoconference material at CIF resolution ('Miss America').

We have applied algorithm 𝒜 to 33 fields of test image 4 with λ_p = 40, and the sets 𝒯_t and 𝒥_t defined as in Table 6. As the reference, we have used estimation based on the linear motion model (Eq. (2)) for two different values of N. In the first case, we have used N = 5 with the sets 𝒯_t and 𝒥_t for t = 1 + 4n, 2 + 4n and 3 + 4n (n integer) shown in Table 6. This corresponds to linear motion trajectories over 5 images. In the second case, we have used N = 2, for which the sets 𝒯_t and 𝒥_t are identical (Table 6). This corresponds to the estimation of linear motion from two transmitted images, a special case being block matching, where all displacements in a block are identical. Since block matching is always outperformed (with respect to the reconstruction or prediction error; no rate consideration) by pixel-based methods, it is not accounted for in the subsequent comparison.

The PSNR curves for all processed fields are plotted in Fig. 14, while the average PSNRs are given in Table 7. The use of the quadratic trajectory model resulted in an average gain of 1.89 dB with respect to the linear model with N = 5, and 3.27 dB with respect to the linear model with N = 2. These gains are due to better tracking of the real motion trajectories, which causes the structural term U_g in (11) to decrease substantially. The error images for the linear and quadratic trajectory models with N = 5 at field #105 are shown in Fig. 15, while the corresponding velocity and acceleration estimates are shown in Fig. 16. Field #105 was chosen due to the significant acceleration present in this part of the sequence, although it falls outside of the range used in Fig. 14. It is evident that in the case of the linear trajectory model the errors are more concentrated in the regions that undergo acceleration (i.e., the hand, arm and head). This explains the degradation in the PSNR of the reconstructed fields when the linear trajectory model is used.

Fig. 14. PSNR for reconstructed fields from test image 4 using linear (dashed line) and quadratic (full line) trajectory models with N = 5, and the linear model with N = 2 (dotted line).

Table 7
Average PSNR (dB) evaluated over 24 fields of the test sequences with natural motion using the linear and quadratic trajectory models

    Model              Test image 4   Test image 5
    Quadratic (N = 5)  41.06          38.59
    Linear (N = 5)     39.17          34.20
    Linear (N = 2)     37.79          32.87

We have applied the same algorithm over 33 fields of test image 5. The PSNR curves for all processed fields are plotted in Fig. 17, while the average PSNRs are given in Table 7. Again, the use of the quadratic trajectory model resulted in a substantial gain: 4.39 dB over the linear model with N = 5, and 5.72 dB over the linear model with N = 2. The higher gains for test image 5 are perhaps due to motion that is closer to the quadratic trajectory model than to the linear model; in general, the movements of the mouth and eyes of a speaker exhibit substantial acceleration. The reconstructed and error images for the linear and quadratic trajectory models with N = 5 at field #22 are shown in Fig. 18. The estimated velocity and acceleration fields are shown in Fig. 19. Note that most of the significant velocity and acceleration vectors can be seen in the mouth area. Throughout the sequence, significant motion parameters are concentrated around the mouth and eyes, closely reflecting their opening and closing. These are the regions where the interpolation errors are the largest if acceleration is not considered. This is evident in the interpolated images (Fig. 18); the mouth in Fig. 18(a) is less open than in Fig. 18(b), producing a large reconstruction error. Additional results can be found in [4].
5.2. Experimental results for algorithm ℬ

We have applied algorithm ℬ to test images 4 and 5 with (λ_p, λ_o, λ_l) = (40, 1, 2), but only for the specific case of t = 2 + 4n from Table 6, i.e., midway between the transmitted images. We did not study the other cases; however, this could be done by defining new occlusion states for the nonsymmetric sets 𝒥_t (Table 4).
Fig. 15. Reconstruction error for field #105 of test image 4 using: (a) the linear model (PSNR = 37.68 dB) and (b) the quadratic model (PSNR = 41.80 dB), both with N = 5.
Since test image 4 exhibits stronger acceleration towards the end of the sequence, we have applied both algorithms every 4 fields (from field #2 to #110) with the coefficients α and the sets 𝒯_t, 𝒥_t defined for t = 2 + 4n (Table 6). Fig. 20 shows the PSNR of the 28 reconstructed fields obtained using both algorithms. On average, algorithm ℬ has improved the PSNR by 1.27 dB.² In Fig. 21, error images for the linear trajectory model (N = 5) and for algorithms 𝒜 and ℬ, as well as the occlusion field for algorithm ℬ, are shown at field #30. Note the significant reduction of the reconstruction error due to the use of the quadratic trajectory model (Figs. 21(a) and (b)), followed by a further reduction due to the occlusion estimation (Figs. 21(b) and (c)). The reconstruction error in the area of the fist, which is going to be covered over the next few frames, is remarkably reduced.

² Note that due to the different temporal range used, only the first 8 fields from the graph in Fig. 20 coincide with the graph in Fig. 14.
Fig. 16. Estimated (a) velocity for the linear model; and (b) velocity and (c) acceleration (scaled by 4) for the quadratic model, both with N = 5, for field #105 of test image 4.
Similar experiments have been repeated for test image 5. The same range of images has been used as in Fig. 17; however, 𝒯_t, 𝒥_t and α for t = 2 + 4n only (Table 6) have been applied to all images. The resulting PSNR curves are shown in Fig. 22, with an average increase of 2.30 dB in the PSNR for algorithm ℬ with respect to 𝒜. The error images and the occlusion field are shown in Fig. 23. Note that most of the occluded regions are concentrated in the area of the eyes and the mouth. Again, the processing of occlusions and motion discontinuities has further reduced the already small interpolation error obtained for the quadratic motion trajectory model (algorithm 𝒜). For more experimental results see [4].

Fig. 23. Error images at field #14 of test image 5 for (a) the linear trajectory model (PSNR = 30.71 dB); (b) algorithm 𝒜 (PSNR = 37.28 dB); (c) algorithm ℬ (PSNR = 41.72 dB) and (d) the occlusion field.
Fig. 17. PSNR for reconstructed fields from test image 5 using linear (dashed line) and quadratic (full line) trajectory models with N = 5, and the linear model with N = 2 (dotted line).
Fig. 18. Reconstructed field #22 of test image 5 using: (a) the linear model (PSNR = 30.95 dB); (b) the quadratic model (PSNR = 39.42 dB), both with N = 5, and the respective error images in (c) and (d).
6. Summary and conclusions

We have presented a framework for the modeling and estimation of dense motion trajectories with acceleration. Using Markov random fields we have developed models for motion-compensated intensities, motion parameters, occlusions and motion discontinuities, and we have linked them using the MAP criterion. To minimize the resulting energy function we have used multiresolution deterministic relaxation. We have demonstrated the good performance of the proposed algorithm on images with
Fig. 19. Estimated (a) velocity (scaled by 2) for the linear model; and (b) velocity and (c) acceleration (scaled by 4) for the quadratic model, both with N = 5, for field #22 of test image 5.
Fig. 20. PSNR for reconstructed fields from test image 4 using algorithm 𝒜 (full line) and algorithm ℬ (dashed line).
synthetic motion. To assure the practical importance of the algorithm, we have applied it to videoconferencing images in the context of motion-compensated temporal interpolation. Such interpolation is often used to achieve high compression ratios, e.g., in coding for videoconferencing or videophony. Experimental results have shown a 2-4 dB gain in the PSNR of the reconstruction error in favor of the quadratic model over the linear model, both with 5-image temporal support, and a 3-6 dB
Fig. 21. Error images at field #30 of test image 4 for (a) the linear trajectory model (PSNR = 37.07 dB); (b) algorithm 𝒜 (PSNR = 39.64 dB); (c) algorithm ℬ (PSNR = 41.82 dB) and (d) the occlusion field.
gain over the linear model with 2-image support. The extended algorithm, incorporating occlusions and motion discontinuities, has further improved the PSNR by 1-2 dB. The improvement was particularly visible in rapidly moving regions such as the mouth, eyes or hands, which often undergo acceleration and also amplify the occlusion effects. This work has concentrated on the estimation of motion trajectories in the transmitter using all available images. This requires efficient transmission of both the velocity and acceleration fields. Two interesting examples of a compact representation of displacement fields are 2-D CELP coding [19] and polynomial approximation [22]. Nevertheless, further research in this direction is needed. It remains an open question how the method proposed here would perform if applied at the receiver to the decoded images. Two issues are important. Firstly, the gains due to the estimation of acceleration will depend on the quality of the decoded images. At
higher bit rates and good quality of the decoded images, this dependence should not be strong. However, in very low bit rate coding, where the reduction in image quality is substantial, the gains demonstrated in this paper would have to be reexamined. Secondly, the effective temporal support of the processing would be much greater than in the case studied here, due to the temporal subsampling prior to transmission. Thus, the equivalent sets 𝒯_t would be similar to {−10, −5, 0, 5, 10}. Such a large and sparse temporal support may affect the performance of motion estimation. More work in this direction is needed to answer the above as well as other questions related to the importance of accelerated motion in image sequence coding and processing. As shown here, acceleration brings an appreciable gain in motion-compensated interpolation and thus should be considered in the next generation of video coding algorithms.
Fig. 22. PSNR for reconstructed fields from test image 5 using algorithm 𝒜 (full line) and algorithm ℬ (dashed line).
Appendix A

Below, the necessary conditions for the optimality of U(p_t) are derived for algorithms 𝒜 and ℬ.

A.1. Estimation of p_t (algorithm 𝒜)

To carry out the minimization (12), where U(p_t) is defined in (14), we establish the necessary conditions for optimality as follows:

$$\nabla_{p_i} U(p_t) = 0, \qquad i = 1, \ldots, N_p. \tag{A.1}$$

Taking the above gradient at each pixel site x_i ∈ (Λ_c)_t, we have

$$\nabla_{p_i} U(p_t) = \nabla_{p_i} \sum_{k \in \mathcal{T}_t} [r_k(\tilde{p}_i) + (s_k(\tilde{p}_i))^{\mathrm T}(p_i - \tilde{p}_i)]^2 + \lambda_p \nabla_{p_i} \sum_{\{x_i, x_j\} \in \Theta_p} (p_i - p_j)^{\mathrm T}\, \Gamma\, (p_i - p_j).$$

Note that above we omit the summation over i in the first term, since the structural energy for the trajectory through x_i is independent of the neighboring trajectories. The second term, however, is with respect to all cliques in the motion field, and thus Eq. (A.1) becomes

$$\sum_{k \in \mathcal{T}_t} \left[ r_k(\tilde{p}_i)\, s_k(\tilde{p}_i) + s_k(\tilde{p}_i)(s_k(\tilde{p}_i))^{\mathrm T} (p_i - \tilde{p}_i) \right] + \lambda_p\, \xi_i\, \Gamma\, (p_i - \bar{p}_i) = 0 \tag{A.2}$$

for all i = 1, ..., N_p, where p̄_i is the average parameter vector at x_i,

$$\bar{p}_i = \frac{1}{\xi_i} \sum_{x_j \in \eta_c(x_i)} p_j.$$

η_c(x_i) is the first-order neighborhood of pixel x_i and ξ_i = |η_c(x_i)| is its size. Thus, ξ_i = 4 everywhere except at the image boundaries (3) and at the corners (2).

A.2. Estimation of (p_t, o_t, l_t) (algorithm ℬ)

To minimize the total energy from Eq. (23) with respect to p_t, only the energy U_g(𝒢_t^c, o_t) + λ_p U_p(p_t, l_t) needs to be considered. Note that the new energies U_p and U_g (Eqs. (17) and (18)) are only slightly redefined compared to the energies described by (8) and (11). Thus, Eq. (A.2) above applies with minor modifications. The summations with respect to k ∈ 𝒯_t are now replaced by summations with respect to k ∈ 𝒯_t^i, and the sample average is computed as in Eq. (19). Also, due to the introduction of motion discontinuities, the average vector p̄_i and the neighborhood size ξ_i become

$$\bar{p}_i = \frac{1}{\xi_i} \sum_{x_j \in \eta_c(x_i)} p_j\, [1 - l(x_i, x_j, t)], \qquad \xi_i = \sum_{x_j \in \eta_c(x_i)} [1 - l(x_i, x_j, t)].$$

Note that at the image boundaries (window 𝒲), the line elements are assumed to be 'on'.
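A minimal sketch of the discontinuity-gated neighborhood average of Appendix A.2 follows (our own helper; `line` returns the state of the line element between two sites, with elements at the window boundary assumed 'on'):

```python
import numpy as np

def gated_average(P, r, c, line):
    """p_bar_i and xi_i of Appendix A.2: average over the N/E/W/S neighbors
    not separated from (r, c) by an 'on' line element.

    line(a, b) -> 1 if a motion discontinuity lies between sites a and b
    (by convention 1 at the window boundary), else 0.
    """
    H, W = P.shape[:2]
    acc, xi = np.zeros(P.shape[-1]), 0
    for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nb[0] < H and 0 <= nb[1] < W and line((r, c), nb) == 0:
            acc += P[nb]        # neighbor on the same motion patch
            xi += 1
    return (acc / xi if xi > 0 else P[r, c]), xi
```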
Acknowledgements

This research was supported by a grant from the Canadian Institute for Telecommunications Research under the NCE program of the Government of Canada, and by the Natural Sciences and Engineering Research Council of Canada under Operating Grant OGP0121619. The authors would like to thank Dr. Christoph Stiller for reading the final version of this paper and for his stimulating comments about its content.

References

[1] J. Besag, "Spatial interaction and the statistical analysis of lattice systems", J. Roy. Statist. Soc., Vol. B 36, 1974, pp. 192-236.
[2] J. Besag, "On the statistical analysis of dirty pictures", J. Roy. Statist. Soc., Vol. B 48, 1986, pp. 259-279.
[3] J. Canny, "A computational approach to edge detection", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-8, November 1986, pp. 679-698.
[4] M. Chahine, Estimation of accelerated motion and occlusions from time-varying images, Master's thesis, Dept. Electr. Eng., McGill University, July 1994.
[5] M. Chahine and J. Konrad, "Estimation of trajectories for accelerated motion from time-varying imagery", in: Proc. IEEE Internat. Conf. Image Processing, November 1994, pp. II.800-II.804.
[6] M. Chahine and J. Konrad, "Motion-compensated interpolation using trajectories with acceleration", in: IS&T/SPIE Symp. Electronic Imaging Science and Technology, Digital Video Compression: Algorithms and Technologies 1995, Vol. 2419, February 1995, pp. 152-163.
[7] W.-G. Chen, G.B. Giannakis and N. Nandhakumar, "Spatio-temporal approach for time-varying image motion estimation", in: Proc. IEEE Internat. Conf. Image Processing, November 1994, pp. III.232-III.236.
[8] G. Cortelazzo and M. Balanza, "Frequency domain analysis of translations with piecewise cubic trajectories", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-15, April 1993, pp. 411-416.
[9] R. Depommier and E. Dubois, "Motion estimation with detection of occlusion areas", in: Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., March 1992, pp. III.269-III.272.
[10] E. Dubois, "Motion-compensated filtering of time-varying images", Multidimens. Syst. Signal Process., Vol. 3, 1992, pp. 211-239.
[11] E. Dubois and J. Konrad, "Estimation of 2-D motion fields from image sequences with application to motion-compensated processing", in: M. Sezan and R. Lagendijk, eds., Motion Analysis and Image Sequence Processing, Chapter 3, pp. 53-87.
[12] W. Enkelmann, "Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences", Comput. Vis. Graph. Image Process., Vol. 43, 1988, pp. 150-177.
[13] M. Foodeei and E. Dubois, "Coding image sequence intensities along motion trajectories using EC-CELP quantization", in: Proc. IEEE Internat. Conf. Image Processing, November 1994, pp. I.720-I.724.
[14] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-6, November 1984, pp. 721-741.
[15] J. Hutchinson, C. Koch, J. Luo and C. Mead, "Computing motion using analog and binary resistive networks", Computer, Vol. 21, March 1988, pp. 52-63.
[16] R. Keys, "Cubic convolution interpolation for digital image processing", IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-29, December 1981, pp. 1153-1160.
[17] J. Konrad and E. Dubois, "Bayesian estimation of motion vector fields", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-14, September 1992, pp. 910-927.
[18] H.-H. Nagel and W. Enkelmann, "An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-8, September 1986, pp. 565-593.
[19] H. Nguyen and E. Dubois, "Representation of motion fields for image coding", in: Proc. Picture Coding Symposium, March 1990, pp. 8.4.1-8.4.5.
[20] A.J. Patti, M.I. Sezan and A.M. Tekalp, "Digital video standards conversion in the presence of accelerated motion", Signal Processing: Image Communication, Vol. 6, No. 3, June 1994, pp. 213-227.
[21] T. Reuter, "Standards conversion using motion compensation", Signal Processing, Vol. 16, No. 1, January 1989, pp. 73-82.
[22] C. Stiller, "Object oriented video coding employing dense motion fields", in: Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., April 1994, pp. V.273-V.276.
[23] C. Stiller and B. Hürtgen, "Combined displacement estimation and segmentation in image sequences", in: Internat. Symp. on Fibre Optic Networks and Video Comm., EUROPTO, Vol. 1977, April 1993, pp. 276-287.