SIGNAL PROCESSING: IMAGE COMMUNICATION
Signal Processing: Image Communication 7 (1995) 187-199

Multistage motion estimation for image interpolation

P. Migliorati (a,*), S. Tubaro (b)
(a) CEFRIEL, via Emanueli 15, 20126 Milano, Italy
(b) Politecnico di Milano, Dip. di Elettronica e Informazione, P.zza Leonardo da Vinci 32, 20133 Milano, Italy

Received 17 June 1994
Abstract

In this paper we describe an algorithm for accurate motion estimation that takes into account both the motion of the camera (global motion) and the motion of the imaged objects (local motion) through the use of a multistage structure. First, global motion parameters are accurately estimated and the images are compensated for the camera motion. The local motion field is then estimated on the basis of these 'compensated' images. To improve the performance of this estimation, both temporal and spatial congruence constraints have been introduced. To test the performance of the proposed algorithm, a motion compensated image interpolation that uses the estimated motion field has been carried out. The experimental results show the effectiveness of the proposed multistage motion estimation procedure.

Keywords: Motion estimation; Image interpolation; Camera motion; Pan-zoom estimation
1. Introduction

Motion compensated image interpolation algorithms are widely used in state-of-the-art video coding algorithms. For example, in MPEG-1 and MPEG-2 coding schemes a time domain subsampled version of the incoming sequence is first coded. Missing frames are interpolated using the available frames and motion information. The quality of the interpolated images can be improved through the coding of the residual error with respect to the original image frames. The temporal subsampling factors normally used in these algorithms are equal to 1:3 or 1:4 [5]. This means that for each inter/intra coded image, the subsequent two or three frames are coded using
motion compensated interpolation techniques. Generally a full-search Block Matching Algorithm (BMA) is employed for the motion estimation. Considering, in one image, a pixel block (the reference block), normally of dimensions 16 by 16, the algorithm detects, in the other image, a block very 'similar' to the reference one (the correspondent block); as the two blocks have the same dimensions, only translational displacements are estimated. To reduce the computational load only a limited search area is considered in the second image. The best match is found by minimizing a suitable objective function, such as the sum of the absolute differences between homologous pixels of the two considered blocks [5]. The displacements estimated by these algorithms are very useful in generating a prediction error in a temporal DPCM loop, but they are inadequate
for image interpolation. In fact, for image interpolation it is important to have a good estimation of the 'real motion' of each 'object' present in the scene. The differences between the real motion field and that estimated by BMA tend to increase as the maximum allowed displacement increases. Moreover, when a focal length change occurs (zooming) the image becomes warped, and this effect cannot be interpreted on the basis of a translational motion field. Therefore, for interpolation purposes, some improvements must be made to the standard BMA algorithm to increase the quality of the estimated motion information. Let us consider the development of the improved algorithms:
- normally a motion vector field associated to consecutive known images is always available (it is used for the coding of the known images themselves);
- it is useful to separate the motion field due to camera motion (global motion field) from that due to the displacement of objects present in the scene (local motion field) [2, 7];
- considering a frame rate of 25-30 frames/s, the global motion field between images I_n and I_{n-k} (with k = 3, 4) can be effectively expressed by two pan factors (p_x, p_y) and by a zoom factor (a) [2, 7];
- current hardware implementations of motion estimators for the MPEG-1/2 [4] coders are oriented towards full-search block matching algorithms;
- the introduction, in the motion estimator, of some constraints regarding the relationship between motion vectors associated to adjacent blocks can significantly increase the quality of the estimated motion field, and also reduces the computational effort [1].
On the basis of these considerations we propose, in this paper, a multistage motion estimator, that we have tested, with good results, for motion compensated interpolation. Consider two reference images (I_n, I_{n-k}): between them there are k - 1 frames that must be interpolated.
A first displacement field is estimated using a standard BMA; it is possible to use this information to code I_n on the basis of I_{n-k}. An estimation of the pan and zoom factors is then carried out by analysing this motion field, and I_{n-k} is compensated for these factors, thus obtaining I'_{n-k}. At this point I'_{n-k} and I_n can be considered as acquired by a fixed camera. The estimation of a local motion field is carried out considering I'_{n-k} and I_n. Also in this case a block matching algorithm is used. To improve its performance both temporal and spatial congruence constraints have been introduced. The interpolation of the intermediate frames is performed using both local and global motion information [2].
In Section 2 the procedure for global motion estimation is described. Section 3 is devoted to the problem of local motion estimation, whereas in Section 4 the interpolation algorithm is described. Simulation results are reported in Section 5 and some conclusions are drawn in Section 6.
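As a concrete illustration of the first stage, the full-search BMA described above can be sketched as follows (a minimal pure-Python sketch; frames are assumed to be 2D arrays of luminance values, and all function and parameter names are ours, not the paper's):

```python
def sad(cur, prev, bx, by, cx, cy, bs):
    """Sum of absolute differences between the reference block of 'cur'
    at (bx, by) and the candidate block of 'prev' at (cx, cy)."""
    return sum(abs(cur[by + j][bx + i] - prev[cy + j][cx + i])
               for j in range(bs) for i in range(bs))

def full_search_bma(cur, prev, bx, by, bs=16, search=7):
    """Full-search block matching: exhaustively test every integer
    displacement (dx, dy) inside the search window and return the one
    that minimizes the SAD; only translations are modelled."""
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = bx + dx, by + dy
            if 0 <= cx <= w - bs and 0 <= cy <= h - bs:
                cost = sad(cur, prev, bx, by, cx, cy, bs)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best
```

This exhaustive scan is exactly what makes the FS estimator expensive, and what the spiral scan of Section 3.1 is designed to shortcut.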
2. Global motion estimation

To describe the relations between the 3D imaged scene and the corresponding image points we use a perspective projection. In fact, a pin-hole camera model describes a real image acquisition system very well [8]. When we allow the intrinsic and extrinsic parameters of the camera to change during sequence acquisition, the motion of the image points depends not only on the 3D displacements of the imaged objects but also on the parameter changes. In particular, camera pan and zoom are the most common parameters that change in video conference and broadcasting sequences. Assuming that the position of the camera (see Fig. 1) is fixed (only a rotation of the camera around the three coordinate axes and changes in the focal length of the lenses are allowed), the instantaneous velocity (u, v) of an image point (x, y) due to these parameter changes can be described by [8]

  u = (λ/f)·x + Ω_y·f·(1 + x²/f²) − Ω_x·x·y/f − Ω_z·y,   (1)
  v = (λ/f)·y − Ω_x·f·(1 + y²/f²) + Ω_y·x·y/f + Ω_z·x,   (2)

where f is the focal length at time t, λ = df/dt, while Ω_x, Ω_y and Ω_z are the rotation velocities around the X, Y and Z axes.
Assuming a null rotation around the Z axis (as usually occurs), small rotation velocities around the X and Y axes, and considering the real dimensions of the CCD sensor (with standard CCDs normally x, y << f), the velocity field reduces to

  u = (λ/f)·x + Ω_y·f,
  v = (λ/f)·y − Ω_x·f.   (3)

Considering two images (I_n, I_{n−1}), taken at time t and (t − Δt), when Δt is small, acceleration effects can be neglected; therefore Eq. (3) can also be used to compute image point displacements. In this case we assume

  Δx(x, y) = a·x + p_x,
  Δy(x, y) = a·y + p_y,   (4)

where (Δx, Δy) represents the optical flow field, while a = λ·Δt/f_{t−Δt} (with λ·Δt = f_t − f_{t−Δt}), p_x = Ω_y·Δt·f_t and p_y = −Ω_x·Δt·f_t.
Eq. (4) indicates that small panning causes the entire frame to be displaced uniformly by the same vector, while zooming introduces a 'stretching' motion of the image points. Using a tele-lens this displacement model adequately describes the real optical flow field even in the case of large pan angles (between time t and t − Δt), while with wide-angle lenses the pan must be small; in any case Eq. (4) is well verified in many situations.
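The displacement model of Eq. (4) can be coded directly (a hypothetical helper of ours, assuming image coordinates centred on the optical axis):

```python
def global_displacement(x, y, a, px, py):
    """Pan-zoom displacement of Eq. (4): a uniform pan (px, py) plus a
    zoom term that stretches the field proportionally to the (centred)
    image coordinates."""
    return a * x + px, a * y + py
```

Pure pan (a = 0) displaces every point by the same vector, while pure zoom (p_x = p_y = 0) produces displacements that grow with the distance from the image centre.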
2.1. Pan and zoom estimation

Knowing the displacement field between two generic images (I_n, I_{n−k}) and taking into account Eq. (4), it is possible to estimate the pan and zoom factors (p_x, p_y, a). Only the displacements relative to image points representing projections of stationary parts of the imaged scene need be taken into account for the inversion of Eq. (4). As indicated in the introduction, we have assumed that a motion field, obtained with standard block matching techniques, is available between I_n and I_{n−k}. When zooming occurs, this motion field is generally very noisy, due to the fact that with BMA only translational motion fields can be correctly estimated. To search for the unknown camera parameters, a least squares technique [2, 7] can be used. We look for those parameters which minimize a certain error function, chosen to be

  E(p_x, p_y, a) = Σ_{i=1}^{N} [(Δx_i − u_i)² + (Δy_i − v_i)²],   (5)

where N represents the number of considered image blocks, while (u_i, v_i) is the displacement estimated for each block with a block matching algorithm. Δx_i, Δy_i, calculated using Eq. (4), are the displacements of the center point of each of the considered blocks. It is also possible to recover the camera parameters using a Maximum A Posteriori (MAP) estimation, as indicated for example in [3], but in our case it is very difficult to make statistical assumptions on the camera parameter values and on the noise superimposed on the available input data (motion vectors). In any case, a preliminary selection of the 'reliable' input vectors for the global motion estimation can be very useful. For estimating the noise superimposed on the available motion vectors, two important considerations must be taken into account. On the one hand, as previously indicated, the effect of a focal length change results in an image warping that cannot be completely described by a simple translational displacement of the considered image blocks. On the other hand, motion vectors associated to independently moving objects present in the imaged scene can be seen as very 'noisy' input data as regards the estimation of the global motion parameters. These motion vectors should therefore be detected and discarded as outliers. For example, in [3] a nonlinear outlier removal filter has been proposed to smooth the displacement field before the estimation of the camera rotations around the
coordinate axes. An alternative procedure is proposed in [7], where an estimation procedure with two iterations is considered. A first estimation of the global parameters is carried out using all the available motion vectors, on the basis of a least squares technique. The displacements that do not match the recovered global motion field are then discarded and the estimation procedure runs another time. In the selection phase, the displacements whose components differ by a value greater than a threshold with respect to those calculated using the estimated global parameters are discarded. Using these techniques we have obtained the best simulation results considering a threshold equal to 1 pixel.
In this paper we propose a mixed approach. At first the more reliable vectors are selected, and an estimation with two iterations is carried out. For the preliminary selection, the idea is to select a set of motion vectors congruent with a pan and zoom global motion field. It can be seen (from Eq. (4)) that for background points Δx has to be constant in the y direction and Δy in the x direction. This suggests discarding the motion vectors with x (y) components not in agreement with those of the other blocks with the same x (y) coordinate. In practice, the mean value M_x (M_y) and the standard deviation σ_x (σ_y) of the x (y) displacement components for each column (row) of the block estimated motion field are calculated. The vectors with at least one component outside of the range [M_x − h·σ_x, M_x + h·σ_x] (resp. [M_y − h·σ_y, M_y + h·σ_y]) are discarded. Good results have been obtained using h = 2. Normally after this selection only a limited number of vectors (300-400 out of 1620, see Section 5.1) survive. A first least mean square estimation is carried out using this limited number of displacements. A new selection is then carried out considering all the available motion vectors and discarding those that do not match the recovered global motion field, as previously described.
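The preliminary selection and the least squares inversion can be sketched as follows (a simplified sketch under our own naming; the closed-form solution follows from setting the derivatives of Eq. (5) with respect to p_x, p_y and a to zero):

```python
import statistics

def preselect(field, h=2.0):
    """Preliminary selection: for a pure pan-zoom field the x-component is
    constant along each column of blocks and the y-component along each
    row (Eq. (4)). Vectors with a component outside mean +/- h*std of its
    column (row) are discarded. 'field' maps (col, row) -> (u, v)."""
    cols, rows = {}, {}
    for (cx, cy), (u, v) in field.items():
        cols.setdefault(cx, []).append(u)
        rows.setdefault(cy, []).append(v)
    cstat = {c: (statistics.fmean(s), statistics.pstdev(s)) for c, s in cols.items()}
    rstat = {r: (statistics.fmean(s), statistics.pstdev(s)) for r, s in rows.items()}
    return {(cx, cy): (u, v) for (cx, cy), (u, v) in field.items()
            if abs(u - cstat[cx][0]) <= h * cstat[cx][1]
            and abs(v - rstat[cy][0]) <= h * rstat[cy][1]}

def estimate_pan_zoom(points, vectors):
    """Closed-form least squares minimization of Eq. (5): block centres
    (xi, yi) with measured displacements (ui, vi) should satisfy
    ui = a*xi + px and vi = a*yi + py. Returns (px, py, a)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    mu = sum(u for u, _ in vectors) / n
    mv = sum(v for _, v in vectors) / n
    num = sum(x * u + y * v for (x, y), (u, v) in zip(points, vectors)) \
        - n * (mx * mu + my * mv)
    den = sum(x * x + y * y for x, y in points) - n * (mx * mx + my * my)
    a = num / den
    return mu - a * mx, mv - a * my, a
```

The two-iteration refinement of [7] then amounts to fitting once, discarding the vectors whose residual against the fitted field exceeds the 1-pixel threshold, and fitting again on the survivors.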
Then, with the obtained set of motion vectors, a second estimation is performed. Using this procedure very good results are obtained (see Section 5), without increasing the computational complexity in a significant way. In any case, if the zoom factor between I_{n−k} and I_n is very large, the motion field used for the estimation is very noisy and only a rough estimation of the pan and zoom factors can be obtained. In this case it is useful to perform a refinement of the parameter estimation (two step procedure). A first compensation of the camera motion is carried out on frame I_{n−k}, obtaining I'_{n−k}. This compensated image is evaluated using the following relation:

  I'_{n−k}(x, y) = I_{n−k}((x − p_x)/(1 + a), (y − p_y)/(1 + a)).   (6)
A further motion estimation is then carried out between I'_{n−k} and I_n in order to evaluate the residual camera parameter changes (p_x^1, p_y^1, a^1). The total parameters (p_x^t, p_y^t, a^t) are obtained combining the two global parameter sets:

  p_x^t = p_x^1 + p_x·(1 + a^1),
  p_y^t = p_y^1 + p_y·(1 + a^1),
  a^t = a^1 + a·(1 + a^1).   (7)

After this second step, the image I'_{n−k} can be recalculated from I_{n−k} using the total pan and zoom parameters (Eq. (6)). The computational load of this further motion estimation is very small because it is carried out considering a small search area (see Section 5.1). In cases of significant zooming this procedure results in a further improvement in the quality of the interpolated images, as reported in Section 5.
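The combination rule of Eq. (7) is simply the composition of two pan-zoom mappings x → (1+a)·x + p_x; a small sketch (our own naming):

```python
def compose_pan_zoom(first, residual):
    """Combine the (px, py, a) parameters of the first estimation with the
    residual (px1, py1, a1) estimated on the compensated image, as in
    Eq. (7): applying the residual mapping after the first one gives
    scale (1+a1)(1+a) and offset p1 + p(1+a1)."""
    px, py, a = first
    px1, py1, a1 = residual
    return (px1 + px * (1 + a1),
            py1 + py * (1 + a1),
            a1 + a * (1 + a1))
```

A quick sanity check is that a zero residual leaves the first parameter set unchanged, and that the composed parameters move a sample point exactly as the two mappings applied in sequence.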
3. Local motion estimation

The estimation of the local motion field is carried out considering I'_{n−k} and I_n. Also in this case a block matching algorithm is used. To improve its performance both temporal and spatial congruence constraints have been introduced.
3.1. Temporal constraints

Temporal congruencies are exploited using the intermediate frames between I_n and I'_{n−k} (Multiple Comparison Algorithm, MCA) [2]. First the pan and zoom compensation is carried out on the intermediate frames, considering a linear variation of the camera parameters. This is followed by selecting each block belonging to I_n and searching for it in image I'_{n−k}. For each tested displacement the position of the current block on the compensated intermediate frames is calculated, assuming that the displacement of the image points between I_n and I'_{n−k} has occurred at constant velocity. A global distortion is evaluated, and this indicates how well the considered block has been matched in all the images (I'_{n−1}, ..., I'_{n−k+1}). The displacement with minimum global distortion is selected as the real displacement of the current block. In this way each part of the moving object is tracked, also in the skipped frames, and the problems of multiple correspondences are greatly reduced. On the other hand, this algorithm requires more computations than the classical method that considers only frames n and n − k; specifically it requires at least k times more computations than requested by a full search algorithm. To reduce the computational effort a suitable procedure for the scanning of the search area has been implemented, thus encouraging the detection of small displacement vectors. In this procedure the reference block is moved along a spiral path in the search area, beginning at the center [2]. The estimation process is stopped when, for a specific search position, the sum of the absolute differences between the reference block and the candidate lies below a threshold (T) adapted to the noise superimposed on the images. The displacement corresponding to this iteration is assigned to the current block. For sequences compensated for global motion it has in fact emerged that the magnitude of the displacement vectors has a probability density function with the maximum centered around zero. For the current image (I_n) the threshold T is calculated as the average value of the Absolute Motion Compensated Luminance Differences (AMCLDs) associated to M blocks of the image I_{n−k}. The considered motion field is obviously the one estimated between I_{n−k} and I'_{n−2k}. The M blocks considered are those characterized by the lowest AMCLD mean values.
In our simulation we have considered M = 20 but the results show that this is not a critical parameter.
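The spiral scan with early termination can be sketched as follows (a simplified square-ring spiral; the names and the cost callback are our own illustration, not the paper's implementation):

```python
def spiral_offsets(radius):
    """Yield candidate displacements ring by ring, starting from (0, 0),
    so that small displacements are tested first."""
    yield (0, 0)
    for r in range(1, radius + 1):
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                if max(abs(dx), abs(dy)) == r:
                    yield (dx, dy)

def spiral_search(cost, radius, threshold):
    """Scan the search area outward from the centre; stop as soon as the
    best matching cost falls below the adaptive threshold T. 'cost' maps
    a candidate displacement (dx, dy) to its matching distortion (SAD)."""
    best, best_cost = None, float("inf")
    for d in spiral_offsets(radius):
        c = cost(d)
        if c < best_cost:
            best, best_cost = d, c
        if best_cost <= threshold:
            return best, best_cost  # early stop: good enough match found
    return best, best_cost
```

Because the field is centred around zero after global compensation, the early stop fires frequently near the origin, which is where the computational saving comes from.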
3.2. Spatial constrain’ts In order to further improve the performance of the local motion estimation we took into account the
Fig. 2. Blocks considered for the congruence constraint.
spatial congruencies between each reference block and its neighbors [1]. In practice, to avoid false correspondences, the spatial correlation among motion vectors associated to adjacent blocks has been exploited. The major hypothesis of this procedure is that the current block and one or more of the neighboring ones belong to the same object. The current block is named (X, Y) and we considered four adjacent blocks (Fig. 2). Note that the motion vectors relative to blocks (X − 1, Y − 1) and (X + 1, Y − 1) are relative to the current frame, whereas those of (X − 1, Y + 1) and (X + 1, Y + 1) are taken from the previously considered one, after pan and zoom compensation. For each reference block, L displacement vectors, relative to the most important minima of the distortion function, were recorded. They were then compared with the motion vectors assigned to the four selected adjacent blocks. The one most similar to one of the neighboring vectors was assumed as the displacement of the current block. Experimental results show that the best performance is obtained with L = 5, but it can be seen that this choice is not so critical.
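The selection among the L candidate minima can be sketched as follows (our own minimal formulation, using squared Euclidean distance as the similarity measure):

```python
def apply_spatial_constraint(candidates, neighbours):
    """Among the L candidate displacements (the most important minima of
    the distortion function), pick the one most similar to any of the
    motion vectors already assigned to the neighbouring blocks."""
    def dist(v, w):
        return (v[0] - w[0]) ** 2 + (v[1] - w[1]) ** 2
    return min(candidates,
               key=lambda c: min(dist(c, nb) for nb in neighbours))
```

For instance, a spurious distortion minimum far from every neighbouring vector is rejected in favour of a slightly worse but spatially coherent candidate.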
4. Interpolation algorithm

As previously indicated, when effective motion information is available it is possible to reconstruct, with great accuracy, intermediate frames between the current I_n and the previously available I_{n−k}. Therefore an estimated motion field can be used in a motion compensated interpolator to test its accuracy.

Fig. 3. Interpolation procedure.

Fig. 3 shows the used interpolation procedure. The reconstruction of a missing image due to a time subsampling with k = 2 is considered. Frames I_n and I_{n−2} are known and I_{n−1} must be interpolated. For the sake of simplicity a pure zoom-out is considered. Using the parameter vector (p_x, p_y, a)_{n,n−2} describing the global motion from I_{n−2} to I_n, the previous frame I_{n−2} is compensated, obtaining I^c_{n−2}. The local motion field is then employed to obtain I^{lc}_{n−1}. This image is then reconstructed in its correct dimensions using the global motion parameters (p_x, p_y, a)_{n−1,n}, obtaining I^r_{n−1}. Near the boundaries there are some undefined regions that can be extrapolated from frames I_n and I_{n−2} by taking into account the appropriate global compensation. In particular, by using the parameters describing the global motion between I_n and I_{n−2} and the luminance values of I_n, we have a first interpolation of the intermediate frame I^f_{n−1}. Then, starting from I_{n−2}, the process is repeated to obtain I^b_{n−1}. Finally the undefined regions of I^r_{n−1} are reconstructed considering I^f_{n−1} and I^b_{n−1} [2]. The interpolation approach discussed above results in interpolated images of a much higher visual quality as compared to images interpolated using standard motion information. Residual problems appear in image regions corresponding to new scene content and uncovered areas, and in the case of wrong vector estimation. Some improvements are obtained using a simple Median Filter (MF) [6] in the interpolation process
that takes into account, for each image point, both linear and motion compensated interpolations. Median filtering is performed considering three possible values: the values of images I_n and I^c_{n−2} compensated for the local motion vector, and the value obtained by linear interpolation. If the two motion compensated values are similar, the MF selects this value; otherwise an intermediate value is selected. The artifacts resulting from the MF tend to be less visible than those obtained from the average of the two motion compensated values.
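The median selection for a single image point can be sketched as (the pixel values in the example are hypothetical):

```python
def mf_interpolate(fwd, bwd, lin):
    """Median of the value motion-compensated from I_n (fwd), the value
    motion-compensated from the previous reference frame (bwd), and the
    linearly interpolated value (lin). When fwd and bwd agree the median
    follows them; when they disagree, an intermediate value is chosen,
    which masks wrongly estimated motion vectors."""
    return sorted((fwd, bwd, lin))[1]
```

When the two compensated values agree (e.g. 100 and 102) the median follows the motion compensation; when they diverge (e.g. 100 and 200) the linear value in between wins, which is the artifact-masking behaviour described above.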
5. Simulation results

The performance of the proposed algorithms has been evaluated considering several standard sequences in CIF format. Only the luminance component has been considered. The sequences were subsampled by a factor k = 3 in the time dimension, and the missing images were reconstructed and compared with the original ones. The dimension of the blocks considered in the motion estimation is 8 by 8 pixels, which corresponds to 16 by 16 on images in the CCIR-601 format. In particular, the following aspects have been considered:
- effectiveness of different methods for global motion estimation;
- effectiveness of different algorithms for local motion estimation;
- comparison of different interpolation algorithms;
- interpolation with and without median filtering.

In Table 1 some results relative to the global motion estimation are presented. The motion fields associated to two couples of images (30:27, 48:45) of the sequence 'Table Tennis' were considered. In one case a large zoom is occurring, while in the other a small camera zoom is in action; in both cases the pan factors are negligible. In Table 1 the estimated zoom parameter obtained by three different algorithms is shown. The considered methods are the 'two iterations algorithm' (2-IT) [7], the one that also uses the proposed preliminary selection (2-IT-SEL) and the 'two step algorithm' (2-STEP) (in both steps the preliminary selection is used). The parameters 'manually' estimated are also indicated (MAN). These reference global parameters have been obtained minimizing the absolute differences between I_n and I'_{n−k} (I_{n−k} compensated for the global motion) on the 'manually' selected background part of the imaged scene.

Table 1
Zoom parameter estimated using different algorithms

Images   2-IT    2-IT-SEL   2-STEP   MAN
27-30    0.068   0.085      0.080    0.081
45-48    0.042   0.061      0.061    0.060

These results show that the proposed preliminary selection greatly improves the precision of the estimation, and indicate how the two step procedure (first global motion estimation, global motion compensation and refinement estimation) is very useful to recover the correct values of the camera parameters in case of very large zoom factors. Moreover, considering the 2-STEP algorithm, the threshold values used for vector selection become less critical. In fact, in case of a rough estimation in the first step, the refinement introduced by the second step anyway guarantees good results. Fig. 4 shows the quality of the images obtained by compensating the global motion of the camera. The global motion was estimated considering the three algorithms previously described (2-IT, 2-IT-SEL, 2-STEP). As we can see, the use of the preliminary selection of the motion vectors (2-IT-SEL) results in a marked improvement in the quality of the interpolated images. The 2-STEP procedure gives a further improvement, more relevant in those
18’3
15
*
18
I.
21
24
I
27
I.
30
193
33
I.
38
39
I,,
42
45
48
,J
51
Fig. 4. ‘Table’ sequence: SNR of the images obtained by compensating the global motion of the camera.
Fig. 5. SNR of the interpolated images obtained considering different local motion estimation algorithms (‘Table’ sequence).
parts of the sequence characterized by significant zooming of the camera. Fig. 5 shows the performance obtained by considering a Motion Compensating Global-Local interpolator (MCGL) that uses three different algorithms for local motion estimation, namely: Full Search block matching (FS), the Multiple Comparison Algorithm (MCA) and the proposed Multiple Comparison Algorithm with Spatial Congruencies (MCA+SC). The global motion was estimated in every case considering the 2-STEP procedure. The MCGL-MCA+SC gives the best results, with an average gain of more than 0.5 dB with respect to the case of MCGL-MCA and more than 3 dB with respect to MCGL-FS. The improvement obtained by MCA+SC with respect to MCA is relevant not only in terms of SNR (0.5 dB) but also in the subjective quality of the interpolated images. This is due to the reduction of visually annoying artifacts which can result from inaccurately estimated motion vectors. Figs. 6-8 show, respectively, the motion field obtained considering the traditional full-search BMA, the global motion field, and the local motion field obtained considering the MCA+SC algorithm.
In Fig. 9 the performance of the MCGL-MCA+SC interpolator (two step procedure for global parameter estimation and MCA with Spatial Constraints for local motion estimation), a Motion Compensating interpolator that uses displacement vectors obtained by applying a Full Search algorithm (MCFS), and a Linear interpolation algorithm (LIN) are compared.
Fig. 6. Motion field estimated by full search BMA algorithm.
Fig. 7. Global motion field.
Fig. 9. SNR of the images interpolated using different interpolation algorithms (‘Table’ sequence).
The MCGL-MCA+SC gives the best results, with an average gain of more than 7 dB with respect to the case of MCFS and 10 dB with respect to linear interpolation. Figs. 10-13 show the original image and the corresponding ones obtained by applying the different interpolation algorithms. Also the subjective evaluation of the experimental results indicates that a global-local motion estimation, in which regularization techniques are employed, is very useful to obtain the 'physical' motion field necessary for effective image interpolation. Figs. 13 and 14 show the images obtained considering an MCGL that does or does not use Median Filtering (MF) in the interpolation process. As we can see, the MF gives a better image quality, with fewer artifacts than the other method.

Fig. 8. Local motion field estimated by the MCA+SC algorithm.

5.1. Computational complexity

As regards the computational complexity, it is useful to make some comparisons between the proposed global-local motion estimation algorithm and a standard Full Search (FS) algorithm. In both cases a bilinear interpolation is used to calculate luminance values relative to non-integer image points. In the FS algorithm we have used a search window of ±30 pixels in both horizontal and vertical directions, in order to track the large motion present in the sequence when a subsampling factor k = 3 is considered. The proposed algorithm (MCGL-MCA+SC), as previously described, can be subdivided into the following steps:
(1) first Full Search motion estimation;
(2) estimation of the global parameters;
(3) global motion compensation on image I_{n−k} (I'_{n−k} is obtained);
(4) motion estimation between I_n and I'_{n−k};
(5) refinement of the global parameters;
Fig. 10. Original image.

Fig. 11. Linear interpolated image.

Fig. 12. MCFS interpolated image.

Fig. 13. MCGL interpolated image.
Fig. 14. MCGL interpolated image using Median Filtering.
(6) global motion compensation considering the refined parameters (I'_{n−k} is obtained);
(7) local motion estimation.
The most significant steps, from the computational point of view, are (1), (4) and (7). In (4) only a small search window is used (±10 pixels in both directions); in fact, a large amount of the global pan and zoom has already been detected and compensated. The computational load of step (4) is therefore about 1/9 of that required at step (1). Also in (7) a reduced search area is considered (±20 pixels, i.e. about 4/9 of the area used in step (1)), since only local motion must be detected. Also the effect of the spiral search with threshold is significant. From our simulations we have obtained that the computational load of this step is about 1.5 times that of a plain full search over this reduced area. Obviously this result is strongly dependent on the amount of background present in the imaged scene. In any case, independently moving objects normally cover only a limited part of the images. In total, from these considerations and from our simulations, we have obtained that the proposed MCGL-MCA+SC algorithm has a computational load that is about 1.8 (= 1 + 1/9 + 1.5 × 4/9) times that of the FS algorithm.
6. Conclusions In this paper we have proposed an algorithm for multistage motion estimation that takes into account both the motion of the camera and the motion of the imaged objects. The estimation of the global motion parameters is performed considering a preliminary selection of
the motion vectors and a two step procedure that improves the accuracy of the estimation. To improve the performance of the local motion estimation, both temporal and spatial congruence constraints are introduced. We tested the performance of the proposed algorithm by using the estimated motion field to interpolate images. The good results of the tests indicate that the use of a global pan and zoom estimation is very important in the estimation of the 'physical' motion field, especially in the case of natural scenes where changes in camera position, orientation and focal length continuously occur. Moreover, the introduction of simple constraints in the estimation of local motion greatly increases the quality of the estimation. The accuracy of the estimation of the global parameters is also pointed out by the fact that the quality of the interpolated images is about the same in both the cases of significant panning/zooming and of a stationary camera. For interpolated images there is an average gain of 7 dB with respect to the case of standard motion compensated interpolation. The use of median filtering in the interpolation procedure reduces in a significant way the effects of residual errors in the motion fields. Current research is oriented toward the precise detection of the covered and uncovered areas of the scene, in order to use, for these areas, a different interpolation scheme. Moreover, it is our intention to investigate pyramidal motion estimation techniques in order to increase the resolution of the estimated motion fields that can be applied to image interpolation.
Acknowledgements The authors would like to thank Dr. L. Sorcinelli for the helpful suggestions and the reviewers of this paper for their useful input. Their suggestions certainly helped to improve the clarity of presentation of the paper.
References

[1] G. de Haan, P.W.A.C. Biezen, H. Huijgen and O.A. Ojo, "True-motion estimation with 3-D recursive search block matching", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 3, No. 5, October 1993.
[2] P. Formenti, P. Migliorati, L. Sorcinelli and S. Tubaro, "Global-local motion estimation in multilayer video coding", Proc. SPIE 1818 Visual Communications and Image Processing '92, Boston, USA, 18-21 November 1992, pp. 573-584.
[3] C.S. Fuh and P. Maragos, "Affine models for motion and shape recovery", Proc. SPIE 1818 Visual Communications and Image Processing '92, Boston, USA, 18-21 November 1992, pp. 120-134.
[4] ISO-IEC/JTC1/SC29/WG11, Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262, ISO/IEC 13818-2, November 1993.
[5] L.W. Lee, J.F. Wang, J.Y. Lee and C.C. Chen, "Motion oriented picture interpolation with the consideration of human perception", Proc. Internat. Conf. Acoust. Speech Signal Process. '93, Minneapolis, MN, USA, 27-30 April 1993, pp. V-425-428.
[6] W.K. Pratt, Digital Image Processing, Wiley, New York, 1991.
[7] Y.T. Tse and R. Baker, "Global zoom-pan estimation and compensation for video compression", Proc. Internat. Conf. Acoust. Speech Signal Process. '91, Toronto, Canada, 14-17 May 1991, pp. 2725-2728.
[8] S. Tubaro and F. Rocca, "Motion field estimators and their application to image interpolation", in: M.I. Sezan and R.L. Lagendijk, eds., Motion Analysis and Image Sequence Processing, Kluwer Academic Publishers, Dordrecht, 1993.