Signal Processing: Image Communication 4 (1991) 93-99, Elsevier
Short Communication
Piecewise linear motion-adaptive interpolation

Wen Jun Zhang
Philips Kommunikations Industrie AG, Thurn-und-Taxis-Strasse 10, W-8500 Nürnberg 10, Germany
Yu Song Yu
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China

Received 2 May 1990
Revised 18 December 1990
Abstract. A new motion-adaptive interpolation algorithm is presented in this paper. Based upon the characteristics of typical videoconferencing scenes, a simple yet effective technique called background subtraction is proposed to refine the segmentation of the frame for interpolation. In this scheme, the outlines of missing frames are transmitted if necessary. The adaptivity depends on the magnitude of the representative vector, which is determined by the distribution of motion vectors on the outlines. At the cost of a small amount of overhead data, the algorithm improves the edge integrity and the quality of reconstruction of the interpolated frame.

Keywords. Motion-adaptive interpolation, background subtraction, motion estimation, coding for videoconferencing.
1. Introduction

For emerging ISDN services, new coding algorithms that introduce scene-analysis-based techniques [4-6, 9-11] aim to further reduce the bit-rate or dramatically improve coding performance. Among these algorithms, the frame skipping method (skipping frames at the transmitter and interpolating the skipped frames at the receiver) appears to be one of the promising techniques for transmitting videoconferencing images at low bit-rates (≤ 384 kbit/s). However, this method runs into the following difficulties if there is motion in the scene:
- moving objects become blurred by normal linear interpolation;
- spatial discontinuity occurs at the boundaries of the moving objects.
To overcome these difficulties, a robust motion-adaptive frame interpolation scheme with reliable displacement estimation, as well as refined techniques for segmenting the frame to be interpolated, are necessary. The motivation of the scheme presented here is to meet both of these requirements. In this paper, the advantages and disadvantages of two representative frame interpolation methods, proposed by Furukawa et al. [3] and Bergmann [1], respectively, are first pointed out. A new segmentation scheme is then presented, taking into consideration the characteristics of typical videoconferencing scenes. A region matching algorithm (RMA) [8], based on the segmentation, is employed to improve the displacement estimation procedure. Next, we describe the new 'piecewise linear motion-adaptive interpolation' method, which adaptively transmits the outlines of
© 1991 Elsevier Science Publishers B.V. All rights reserved
the frame to be interpolated. This transmission is controlled by the amplitude of the motion vectors between two transmitted frames. Finally, results of simulation experiments compare the proposed method with the methods in [1, 3].
2. Study of frame interpolation techniques

Furukawa et al. [3] discussed the problem of erroneous displacement estimation in motion-adaptive interpolation. Each missing frame is reproduced at the decoder using a representative motion vector of a moving object, based upon a rigid-body motion assumption. This involves processing to provide more reliable displacement vectors for the spatial continuity of the reconstructed moving object. However, besides the global motion of head and shoulders, typical videoconferencing scenes often contain many local motions of the eyes and mouth, which cannot be represented by a single representative motion vector. Unlike Furukawa et al., Bergmann [1] discussed the problem of detecting and segmenting the areas with different motion characteristics in a frame. The segmenter he designed (see Fig. 1) indicates for a transmitted frame the following areas:
- b1: moving area,
- b2: area uncovered in the present frame,
- b3: area going to be covered in the next frame,
- b4: stationary background.

Fig. 1. Block diagram of Bergmann's segmenter (frame differences are thresholded and logically combined into the areas b1, b2, b3, b4).

This segmentation is surely required for correct reconstruction of the skipped frames. Unfortunately, the segmenter in Fig. 1 only gives the segmentation for the transmitted frames (frames Sk-1 and Sk+1 are skipped frames). For simplification, Bergmann assumes that the same displacement holds between the skipped frame and the transmitted frame; only the size of the displacement vector is divided by a factor of two. The assumption works well when the motion speed is small (less than ±5 pels/line). In an actual case, however, to meet the requirement of low bit-rate transmission, large displacements (often greater than 8 pels between two transmitted frames) may occur due to the skipping of several intermediate frames. This leads to a wrong segmentation, which results in degraded quality of the reproduced frames. Meanwhile, due to the high correlation of adjacent picture elements, the output of the binary threshold in Fig. 1 may be zero even though the element should be assigned to the moving area (b1), i.e., the segmenter may be unable to obtain the correct segmentation directly from the procedure depicted in Fig. 1. In fact, a region-growing algorithm [2] is necessary for post-processing, as the four areas segmented by Bergmann's segmenter are mixed and disorderly. If the information of the skipped frame is required at the receiver, its transmission will cost a lot of bits. Recently, Thoma [7] presented an improved version of Bergmann's scheme. Based on the segmenter mask, Thoma classifies b2 and b3 in the missing frame stage by stage to improve the accuracy of segmentation. But he also assumes that the outlines of the moving area in transmitted frames are the same as those in missing frames. This, however, does not hold, due to the deformation of objects.

3. Background subtraction
Compared to television scenes, typical videoconferencing scenes have two distinct characteristics:
- The motions in the scene are mainly caused by the participants, and contain only moderate global motions of head, shoulders and hands as well as fast local motions of eyes and mouth.
- The background is fixed, i.e., the background arrangement, the positions of the cameras, and the lighting conditions of the conference room are all fixed in advance and remain unchanged during the meeting.

In view of these characteristics, especially the second one, we have strong support for the following scenario. An initial scene of the background for videoconferencing, but with no participants in it, is transmitted to the receiver. In every receiver, we assume a special frame memory for storing the received background image. Using the digital subtraction technique often found in medical image processing applications, we subtract the background image stored in the memory from the input videoconferencing scenes and thus obtain the outlines of the moving objects (participants) after some necessary processing (see Fig. 2). To account for possible fluctuations of the illumination during the meeting, the variation of the grey level (the average value) within a specific window, opened at the top left corner of the input sequences, is detected in real time to get a gain factor for the background subtraction. The background subtraction procedure is simple yet very effective. It significantly improves the performance of the segmenter proposed below.
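As an illustration, the background subtraction with gain compensation can be sketched as follows (a minimal sketch under our own assumptions: the function names, the window size and the fixed difference threshold are not specified by the paper):

```python
import numpy as np

def gain_factor(frame, background, win=16):
    # Estimate an illumination gain from the average grey level inside a
    # window at the top-left corner, which is assumed to show only background.
    f = frame[:win, :win].mean()
    b = background[:win, :win].mean()
    return f / b if b > 0 else 1.0

def object_mask(frame, background, threshold=12):
    # Subtract the gain-corrected stored background from the input frame;
    # pels whose grey level differs strongly are assigned to moving objects.
    g = gain_factor(frame, background)
    diff = np.abs(frame.astype(np.int32) - (g * background).astype(np.int32))
    return (diff > threshold).astype(np.uint8)
```

Single noise points in the resulting mask would then be removed by the post-processing block of Fig. 2, e.g. with a small median filter.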
4. Design of segmenter

With the outlines obtained from Fig. 2, a binary segmenter mask, which distinguishes between moving objects (denoted as '1') and stationary background (denoted as '0'), is easily constructed by use of an inner-filling algorithm. This segmentation assumes that non-zero displacement vectors are only connected with the area '1'. Through a logical combination of the masks of three successive frames, just as in Fig. 1, the four different areas are precisely classified, i.e., the proposed segmenter is the same as Bergmann's except that the input is now a binary mask, not the original image. The outlines of the moving objects can easily be coded by an oct-directional chain code. These codes are regarded as overhead information and transmitted to the receiver.
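The logical combination of three successive binary masks can be sketched as follows (our reading of the segmenter; the exact boolean rules are an assumption based on the area definitions b1-b4 given in Section 2):

```python
import numpy as np

def classify_areas(mask_prev, mask_cur, mask_next):
    # mask_* are binary segmenter masks ('1' = moving object) of three
    # successive frames.  Returns a label image for the current frame:
    # 1 = moving area (b1), 2 = area uncovered in the present frame (b2),
    # 3 = area going to be covered in the next frame (b3), 4 = background (b4).
    labels = np.full(mask_cur.shape, 4, dtype=np.uint8)
    labels[(mask_prev == 1) & (mask_cur == 0)] = 2   # just uncovered
    labels[(mask_cur == 0) & (mask_next == 1)] = 3   # about to be covered
    labels[mask_cur == 1] = 1                        # moving object itself
    return labels
```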
Fig. 2. Block diagram of background subtraction (the stored background image, scaled by a gain factor, is subtracted from the input sequences; a post-processing block removes single noise points and yields the outlines of the output sequences).
5. Region matching algorithm

Based upon the correct segmentation, a displacement estimation algorithm called RMA [8] is employed to find motion vectors on a block-by-block basis. The key of RMA is that a block which contains the outline of a moving area is divided into several parts according to the segmenter results, and then these parts, not the whole block (as done in the block matching algorithm, BMA [5]), participate in the matching procedure to find the motion vectors. As an example, Fig. 3 shows the difference between RMA and BMA. A plate is moving left from frame k-1 to frame k in Fig. 3. According to the segmenter results, block bAB in frame k is divided into two parts, A1 and B1. In the same way, block bCD is divided into C1 and D1. With RMA, A1 belongs to the stationary area and finds the same position A0 in frame k-1. B1 belongs to the moving area and finds the corresponding position B0 in frame k-1 with a non-zero motion vector. On the other side, with BMA, A1 would find a wrong matching position, because the whole block must share one vector. To make things even worse, D1 belongs to the uncovered background and has no matching position in frame k-1 for BMA. In this case, with RMA, only C1 is employed to estimate the motion vector and D1 is reproduced from the background image. The advantages of RMA are:
- the accuracy of the motion vectors is increased, especially at the outlines of moving objects;
Fig. 3. The difference between RMA and BMA (a plate moving left from frame k-1 to frame k).
- due to the consideration of the correlation between adjacent blocks, the adjacent motion vectors of a specific object are homogeneous in direction and amplitude;
- with the help of the segmentation, the algorithm is suited to estimating large displacements.

These attributes are very helpful for a robust motion-adaptive interpolation.
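A minimal sketch of the region matching idea (the function name and the exhaustive search strategy are our assumptions; the paper does not specify the search method): only the pels of one segmented part of a block enter the error measure, so pels on the other side of an outline cannot drag the vector away.

```python
import numpy as np

def match_part(cur, prev, ys, xs, part_mask, search=4):
    # Region matching for one part of a block: part_mask is a boolean mask
    # of block size taken from the segmenter results.  Exhaustive search
    # over +/- search pels; returns the displacement (dy, dx) minimising
    # the mean absolute difference over the masked pels only.
    h, w = part_mask.shape
    blk = cur[ys:ys + h, xs:xs + w].astype(np.int32)
    best_err, best_v = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = ys + dy, xs + dx
            if y0 < 0 or x0 < 0 or y0 + h > prev.shape[0] or x0 + w > prev.shape[1]:
                continue
            ref = prev[y0:y0 + h, x0:x0 + w].astype(np.int32)
            err = np.abs(blk - ref)[part_mask].mean()
            if best_err is None or err < best_err:
                best_err, best_v = err, (dy, dx)
    return best_v
```

In the full algorithm, a part classified as uncovered background (like D1 above) would be excluded from matching altogether and filled from the stored background image instead.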
6. Piecewise linear motion-adaptive interpolation

Some motion-adaptive interpolation techniques do not encode any information for the frames to be interpolated. Based on a linear translational approximation model, the missing frames are interpolated between two transmitted frames. When a large frame dropping rate, often greater than 2:1, is employed, this motion model does not hold, due to the deformation of moving objects, the change of pel values (e.g., as a result of illumination variation), and so on. Meanwhile, spatial discontinuity occurs on the boundaries of moving objects owing to erroneous motion estimation. The discontinuity is very annoying. As stated above, Bergmann and Furukawa et al. have pointed out the problem and have tried to resolve it. In our opinion, the important information of the frame to be interpolated ought to be transmitted to the receiver when large motion vectors exist between two transmitted frames. This is the basic idea of our piecewise linear motion-adaptive interpolation. The important information, in our method, is the outlines of the moving objects, which are easily encoded with a chain code and take a small number of bits. With the help of the outlines, the four different areas b1, b2, b3, b4 in each missing frame are precisely classified, no matter how large the motion vectors are. Then the contents surrounded by every outline (moving area b1) in the missing frames are interpolated from the corresponding areas (determined by the motion vectors) in the two transmitted frames. The mathematical
Table 2
Performance comparison among the three methods

Figure  Method       Overhead data  (A, T1, T2)  NMSE (%)
4(a)    Furukawa's   0              -            0.1632
5(a)    Bergmann's   0              -            0.1565
6(a)    ours         2063 bit       (5, 2, 8)    0.1536
4(b)    Furukawa's   0              -            0.0624
5(b)    Bergmann's   0              -            0.0637
6(b)    ours         2527 bit       (10, 2, 8)   0.0601
representation of the interpolation filter is listed in Table 1 (the same as that in [1]). When the motion is small, the transmission of the outlines of each missing frame is wasteful, and the resulting artifacts are often even more annoying than those of linear interpolation. An adaptive controller is therefore needed to decide whether or not to transmit the outlines of a skipped frame. The following procedure is suited to a frame dropping rate of 4:1 (field rate 30 Hz) and can be extended to higher dropping rates. Between two transmitted frames (denoted as S1 and S5), the transmission of the outline information of the skipped frames (denoted as S2, S3 and S4) depends upon the magnitude of a representative motion vector (denoted by Drv). Drv is determined by the motion vector field obtained from S1 and S5. Unlike Furukawa's method, Drv lies on an outline of a moving object. If there are several outlines, Drv is the largest one. Given two threshold values T1 and T2 (controlled by the overall bit-rate requirement), T2 > T1 > 0, let

    Drv = (drvx, drvy),    d = max{drvx, drvy}.
If d <= T1, the outlines of the three skipped frames are not transmitted. Since the motion is small, these three frames can simply be reconstructed by Furukawa's method. If T1 < d <= T2, the outlines of frame S3 are transmitted. At the receiver, S3 is interpolated with Table 1. Then, considering the reconstructed S3 as a transmitted frame, S2 and S4 are reproduced by Furukawa's method.
If d > T2, the outlines of the three skipped frames are all transmitted and every skipped frame is reconstructed with Table 1. Note that Furukawa's method adopted here is a variation of [3]; the areas inside the moving objects are interpolated with the local motion vectors, not with the unified representative vectors. Because of the adaptive transmission of outlines, the necessary interpolation and extrapolation of the skipped frames is based on the outlines of the moving objects; this is the main difference between our method and Bergmann's.
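The three-way decision of the adaptive controller can be summarised as follows (a sketch; the list-of-frames return value is our own convention):

```python
def outline_decision(drv, t1, t2):
    # drv = (drvx, drvy): representative motion vector, taken as the largest
    # vector on the outlines between transmitted frames S1 and S5.
    # t2 > t1 > 0 are thresholds set by the overall bit-rate requirement.
    d = max(abs(drv[0]), abs(drv[1]))
    if d <= t1:
        return []                   # small motion: no outlines, Furukawa's method
    if d <= t2:
        return ['S3']               # medium motion: transmit outlines of S3 only
    return ['S2', 'S3', 'S4']       # large motion: transmit all three outlines
```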
7. Experimental results

Simulation experiments have been performed using the typical videoconferencing scenes 'Checked Jacket' and 'Miss America' provided by the Specialist Group for Visual Telephony in CCITT SG XV. Image sizes are 352 pels x 288 lines for Y and 176 pels x 144 lines for Cr and Cb. Each pel is represented by 8 bits. Since an a priori background scene is not available, two synthetic background images were created from these videoconferencing sequences (the moving objects are manually dug out of the original sequence and the remains are averaged or extrapolated to form a synthetic background image). The 78th frame of 'Miss America' and the 65th frame of 'Checked Jacket' are selected for our experiments. The frame dropping rate is 4:1. The block size used in RMA is 8 x 8 pels. The frames reconstructed by Furukawa's method, Bergmann's method and ours are shown in Figs. 4, 5 and 6, respectively. The detailed results are given in Table 2. Because of the small motions in 'Miss America', the improvement of our method is not very noticeable. For the 'Checked Jacket' scene, one can see in Fig. 4(b) and Fig. 5(b) the obviously blurred palm and the edge degradation in the shadow of the palm of the center person. In Fig. 6(b), however, the edge degradation is removed and the blurring is reduced. As a cost, about 2.5 kbit of overhead data, which are
Fig. 4. Reconstructed frames with Furukawa's method.
Fig. 5. Reconstructed frames with Bergmann's method.
Fig. 6. Reconstructed frames with our method.
W.J. Zhang, Y.S. Yu / Piecewise linear motion-adaptive &terpolatt~.n
Table 1
Coefficients and output of the interpolation filter

Area  Coefficients                          Interpolated output S_(m+n)(x, y)
b1    a_m = (N-n)/N, a_(m+N) = n/N          a_m S_m(x - dx/2, y - dy/2) + a_(m+N) S_(m+N)(x + dx/2, y + dy/2)
b2    a_m = 0, a_(m+N) = 1                  a_(m+N) S_(m+N)(x, y)
b3    a_m = 1, a_(m+N) = 0                  a_m S_m(x, y)
b4    a_m = (N-n)/N, a_(m+N) = n/N          a_m S_m(x, y) + a_(m+N) S_(m+N)(x, y)

(n = 1, 2, ..., N-1)
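A per-pel sketch of the filter of Table 1 (assuming the usual linear weights a_m = (N-n)/N and a_(m+N) = n/N, the half-displacement convention of the table, and S_m, S_mN as numpy arrays holding the two transmitted frames; these conventions are our reading, not the paper's exact notation):

```python
import numpy as np

def interpolate_pel(S_m, S_mN, area, x, y, dx, dy, n, N):
    # Interpolate pel (x, y) of skipped frame S_(m+n) from the transmitted
    # frames S_m and S_(m+N), following the per-area rules of Table 1.
    a_m, a_mN = (N - n) / N, n / N
    if area == 1:   # b1: moving area, motion-compensated from both frames
        return (a_m * S_m[y - dy // 2, x - dx // 2]
                + a_mN * S_mN[y + dy // 2, x + dx // 2])
    if area == 2:   # b2: uncovered area, visible only in the later frame
        return float(S_mN[y, x])
    if area == 3:   # b3: to-be-covered area, visible only in the earlier frame
        return float(S_m[y, x])
    return a_m * S_m[y, x] + a_mN * S_mN[y, x]   # b4: stationary background
```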
the outlines coded by the oct-directional chain code, must be transmitted. Because of the automatic gain control of the camera used in our laboratory, the average brightness of the background of the input sequence differs from that of the pre-photographed background image. With the window opened at the top left corner (see Fig. 2), however, this difference is detected and compensated successfully. The window size used here is 16 x 16 pels.
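An oct-directional chain code of an outline might be sketched as follows (the direction numbering is our assumption; each link then costs 3 bits, which is why the outline overhead in Table 2 stays in the kilobit range):

```python
# Eight neighbour offsets (dx, dy); the index of an offset is its 3-bit code.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(outline):
    # outline: list of (x, y) pels along the object boundary, with adjacent
    # entries 8-connected.  Returns one 3-bit symbol per boundary link.
    return [DIRS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(outline, outline[1:])]
```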
8. Conclusion

Based on the background subtraction technique, a piecewise linear motion-adaptive interpolation algorithm has been described. Unlike Furukawa's and Bergmann's methods, the outline information of a skipped frame is transmitted, if necessary, so as to obtain a correct segmentation of the different moving areas in the frame. Simulation results show that our scheme performs better than the other schemes when a large frame dropping rate is required.
References

[1] H.C. Bergmann, "Motion-adaptive frame interpolation", Proc. Internat. Zürich Seminar on Digital Communication, Zürich, March 1984, pp. 57-61.
[2] R. Lenz and A. Gerhard, "Image sequence coding using scene analysis and spatiotemporal interpolation", in: T.S. Huang, ed., Image Sequence Processing and Dynamic Scene Analysis, Springer, Berlin, 1983, pp. 264-274.
[3] A. Furukawa, T. Koga and K. Iinuma, "Motion-adaptive interpolation for videoconference pictures", Proc. Internat. Conf. Communication, Amsterdam, The Netherlands, 14-17 May 1984, pp. 707-710.
[4] N. Mukawa and H. Kuroda, "Uncovered background prediction in interframe coding", IEEE Trans. Comm., Vol. COM-33, November 1985, pp. 1227-1231.
[5] H. Musmann, P. Pirsch and H.-J. Grallert, "Advances in picture coding", Proc. IEEE, Vol. 73, No. 4, April 1985, pp. 523-548.
[6] A. Puri, "Conditional motion-compensated interpolation and coding", Internat. Workshop on 64 kbit/s Coding of Motion Video, Hannover, Germany, September 1989, pp. 1-5.
[7] R. Thoma, "A segmentation algorithm for motion compensating field interpolation", Proc. Picture Coding Symposium 87, Stockholm, Sweden, June 1987, pp. 81-82.
[8] Wen Jun Zhang, "High compression ratio coding for moving sequences and stationary images", Doctoral Dissertation, Shanghai Jiao Tong University, July 1989 (in Chinese).
[9] International Workshop on 64 kbit/s Coding of Moving Video, Hannover, F.R. Germany, June 1988.
[10] Picture Coding Symposium 87, Stockholm, Sweden, June 1987.
[11] Special Issue on Video Image at Low Bit Rate, IEEE J. Sel. Areas Commun., Vol. SAC-5, No. 7, August 1987.