Signal Processing 18 (1989) 259-267 Elsevier Science Publishers B.V.
MOTION VECTOR CODING WITH CONDITIONAL TRANSMISSION
Woo Young CHOI and Rae-Hong PARK
Department of Electronic Engineering, Sogang University, C.P.O. Box 1142, Seoul 100-611, Korea
Received 3 October 1988
Revised 1 February 1989 and 2 June 1989
Abstract. This paper describes methods for coding the motion vector in the block matching algorithm (BMA). Six entropy coding methods are applied either to the original motion vector or to the smoothed motion vector, which is obtained by considering neighboring motion vectors. To increase the coding efficiency of the motion vector, we propose conditional transmission of the smoothed motion vector, in which the corresponding prediction error is taken into account. In the proposed coding method, we transmit the image by using entropy coding of the smoothed motion vector and DCT coding of the prediction error for various threshold values. Computer simulation shows that, by using the proposed conditional transmission of the smoothed motion vector, we can compress the motion vector efficiently with only a slight degradation in the reconstructed image.
Keywords. Block matching algorithm, motion vector, prediction error, average code length, entropy, PSNR.
1. Introduction

In general, it is well understood that interframe prediction is very effective for TV signal coding. To improve the coding efficiency further, motion compensation has been successfully applied to interframe prediction. The block matching algorithm (BMA), in which motion vector detection is carried out on a block-by-block basis, shows very high efficiency when applied to video teleconference sequences. In motion compensation based on the block matching method, the motion vector is transmitted along with the prediction error. It is therefore very important to represent the motion vector and the prediction error with as little information as possible by using efficient coding schemes. The information rate for the motion vector varies to a great extent, and it sometimes accounts for as much as 40% of the transmitted information for codecs at a bit rate of 384 kbit/s [4]. Thus efficient coding of the motion vector is important. However, most research has been devoted to compressing the prediction error.

In this paper we describe various compression methods for the motion vector for coding systems at around 384 kbit/s. First, based on the efficient coding methods of the motion vector proposed by the Specialists Group on Coding for Visual Telephony (CCITT SG XV) [2], we describe six entropy coding methods of the motion vector in Section 2. To increase the compression ratio of the motion vector further, we propose two schemes. In the first, compression efficiency is obtained by smoothing the motion vector. In the second, compression efficiency is obtained by transmitting the motion vector conditionally. The two schemes are presented in Sections 3 and 4, respectively. The simulation results shown and discussed in Section 5 indicate that the highest compression ratio is obtained by transmitting the smoothed motion vector conditionally. It is expected that, if this method is combined with buffer control techniques for transmitting the prediction error, a better performance can be obtained. The conclusion and summary are given in Section 6.
2. Description of investigated motion vector coding algorithms

In this section, we first consider the efficient coding methods for transmitting the motion vector proposed by CCITT SG XV. The proposed methods are classified into two categories: fixed-length coding methods and variable-length coding methods [2]. In the latter, either the magnitude of the motion vector or the difference of the motion vector should be coded. Iinuma et al. [4] have applied entropy coding to motion vector coding by using the property that the distribution of differential motion vectors is concentrated on the value of zero. To encode the motion vector, Koga and Ohta [6] have used the following three methods:
(1) Two-dimensional coding. A motion vector is represented by a single codeword from one code set.
(2) One-dimensional coding 1. A motion vector is represented by two codewords. Horizontal and vertical components of the motion vector are coded with different code sets.
(3) One-dimensional coding 2. A motion vector is represented by two codewords. Horizontal and vertical components of the motion vector are coded with the same code set.

According to their simulation results, two-dimensional coding is slightly more efficient than one-dimensional coding 1, and the latter is slightly more efficient than one-dimensional coding 2.

In this paper we use the three-step search method [5] to obtain the motion vector. It is assumed that each frame is divided into 8 × 8 blocks and that the motion compensation range of each block is ±7 pixels/frame and ±7 lines/frame, i.e., there are 15 × 15 = 225 possible motion vectors within this range. To simulate the entropy coding of the motion vector, we adopt six methods, including the three methods mentioned above, and we use Huffman coding [3, 7] to encode the motion vector. The six methods are as follows:
Method 1. Two-dimensional coding: The motion vectors are coded with the same code set

(-7, -7), (-7, -6), ..., (-7, 6), (-7, 7),
...
(7, -7), (7, -6), ..., (7, 6), (7, 7).
Method 2. Two-dimensional coding of the differential motion vector: The differential motion vectors between two sequential frames are coded with the same code set

(-14, -14), (-14, -13), ..., (-14, 13), (-14, 14),
...
(14, -14), (14, -13), ..., (14, 13), (14, 14).
Method 3. One-dimensional coding 1: Horizontal and vertical components of the motion vector are coded with two different code sets

Horizontal component: -7, -6, ..., 6, 7.
Vertical component: -7, -6, ..., 6, 7.

Method 4. One-dimensional coding 2: Horizontal and vertical components of the motion vector are coded with the same code set

Horizontal and vertical components: -7, -6, ..., 6, 7.

Method 5. One-dimensional coding of the differential motion vector 1: Horizontal and vertical differential components of the motion vector are coded with two different code sets

Horizontal differential component: -14, -13, ..., 13, 14.
Vertical differential component: -14, -13, ..., 13, 14.

Method 6. One-dimensional coding of the differential motion vector 2: Horizontal and vertical differential components of the motion vector are coded with the same code set

Horizontal and vertical differential components: -14, -13, ..., 13, 14.

3. Motion vector smoothing algorithm

The motion vectors obtained by block matching algorithms are determined simply by the value of the error function. Accordingly, a motion vector may differ from the real motion in the image, and each motion vector may have little correlation with its neighboring motion vectors. If we smooth the motion vector, we obtain the smoothed motion vector and increase the correlation between motion vectors. Therefore we can improve the compression ratio of the motion vector. In contrast, the prediction error also increases, because smoothing introduces noise into the encoded motion vector. If the increase in prediction error does not exceed a fixed threshold, we can use this method for encoding the motion vector. The proposed method for smoothing the motion vector is described as follows:

Step 1. Motion vectors are obtained by applying BMA.
Step 2. In the 3 × 3 window of the motion vector domain, detect the motion vector that occurs most often and its number of occurrences.
Step 3. If MAX > T1 then MV' = MVmax, else MV' = (0, 0), where MV' is the smoothed motion vector, MAX is the maximum number of occurrences, and MVmax is the motion vector which has the maximum number of occurrences. In this step, we smooth the non-zero motion vectors detected by BMA in stationary areas due to noise or a small luminance gradient.
Step 4. If the mean absolute value of the block difference between the previous and current frames is greater than the threshold value T2, this block retains the motion vector obtained in Step 3. Otherwise, this block has the zero vector.

The results obtained by applying the above method to a frame of the "Miss America" sequence are shown in Fig. 1. The motion vectors obtained by BMA are shown in Fig. 1(a) and the smoothed motion vectors are shown in Fig. 1(b). The three-dimensional representations of the distributions of the motion vectors in Figs. 1(a) and 1(b) are displayed in Figs. 1(c) and 1(d), respectively. In these figures, most of the distribution of the motion vector is concentrated on the zero vector. Therefore we can improve the compression efficiency by applying entropy coding to the smoothed motion vectors.
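Steps 2 to 4 of the smoothing procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `smooth_motion_vectors`, the list-of-lists layout of the motion vector field, the clipping of the 3 × 3 window at the frame border, and the tie-breaking rule (the first most frequent vector encountered wins) are all hypothetical choices that the paper does not specify. The default thresholds T1 = 2 and T2 = 3 are the values quoted for Fig. 1.

```python
from collections import Counter


def smooth_motion_vectors(mv_field, mean_abs_diff, t1=2, t2=3):
    """Smooth a block-wise motion vector field (sketch of Steps 2-4).

    mv_field      -- 2-D list of (dx, dy) tuples, one per block (from BMA, Step 1)
    mean_abs_diff -- 2-D list with the mean absolute frame difference per block
    t1, t2        -- thresholds T1 and T2
    """
    rows, cols = len(mv_field), len(mv_field[0])
    smoothed = [[(0, 0)] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Step 2: most frequent vector in the 3x3 window around the block.
            # Assumption: the window is clipped at the border of the field.
            window = [mv_field[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            mv_max, max_count = Counter(window).most_common(1)[0]
            # Step 3: keep the dominant vector only if it occurs often enough.
            mv = mv_max if max_count > t1 else (0, 0)
            # Step 4: blocks that barely changed between frames get the zero vector.
            if mean_abs_diff[r][c] <= t2:
                mv = (0, 0)
            smoothed[r][c] = mv
    return smoothed
```

With this sketch, an isolated outlier vector surrounded by identical neighbors is replaced by the neighbors' vector, and any block whose frame difference does not exceed T2 is forced to the zero vector.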
Fig. 1. Two- and three-dimensional representation of the distribution of the original motion vector (shown in (a)) and the smoothed motion vector (shown in (b)) with threshold values T1 = 2 and T2 = 3.
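To illustrate why a motion vector distribution concentrated on the zero vector benefits entropy coding, the following sketch computes the empirical entropy and the average Huffman code length of a list of motion vectors. It is only an illustration: the function names are hypothetical, the Huffman construction is the textbook one built on Python's `heapq`, and the paper does not describe its own code tables.

```python
import heapq
import math
from collections import Counter
from itertools import count


def entropy_bits(symbols):
    """Empirical entropy in bits per symbol."""
    total = len(symbols)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(symbols).values())


def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single symbol still needs one bit per codeword.
        return {next(iter(counts)): 1}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {s: 0}) for s, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees makes every codeword in them one bit longer.
        merged = {s: length + 1 for s, length in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def average_code_length(symbols):
    """Average Huffman code length in bits per symbol."""
    lengths = huffman_code_lengths(symbols)
    counts = Counter(symbols)
    return sum(counts[s] * lengths[s] for s in counts) / len(symbols)
```

Calling `average_code_length` on a list of `(dx, dy)` tuples corresponds to two-dimensional coding (one codeword per vector). Because a Huffman code spends at least one bit per codeword, any one-dimensional scheme that sends two codewords per vector cannot fall below 2 bits/block, however strongly the vectors are concentrated on zero.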
4. Conditional transmission coding of the smoothed motion vector

As explained in the previous section, we can compress the information to be transmitted by smoothing the motion vector. However, the prediction error also increases. To solve this problem, it is desirable to keep the increase in prediction error as small as possible while increasing the compression ratio of the smoothed motion vector. To implement this idea, we use conditional transmission of the motion vector. In this section, we use the mean squared prediction error as the criterion for determining whether the motion vector should be transmitted or not. The decision procedure is given below.

Step 1. Calculate the mean squared difference between the block of the current frame and its corresponding block of the previous frame and denote this value by MSE.
Step 2. Calculate the mean squared difference between the block of the current frame and the block of the previous frame which is displaced by the motion vector and denote this value by MSE'.
Step 3.

\[ \mathrm{MV} = \begin{cases} \text{transmitted}, & \text{if } \mathrm{MSE} - \mathrm{MSE}' > T_r, \\ \text{not transmitted}, & \text{otherwise}, \end{cases} \]

where MV is a motion vector and Tr is a rejection threshold value.

For motion vectors which are determined to be transmitted, two messages, the codeword and the mc-bit, are transmitted. The mc-bit indicates whether the motion vector is transmitted or not. For motion vectors which are determined not to be transmitted, only the mc-bit is transmitted. If the mc-bit corresponds to the case of no transmission, the receiver simply replaces that block with the block of the previous frame. This means that such an mc-bit signifies the zero motion vector. Therefore, when we encode the motion vector by entropy coding, we assign the codeword of the zero motion vector to every motion vector which is determined not to be transmitted, and we do not need to send the mc-bit.

5. Simulation results

We apply the motion vector encoding methods explained in Section 2 to 30 frames each of the "Claire" and "Miss America" image sequences in CIF format. We use the three-step search method proposed by Koga et al. [5] to obtain the motion vectors, and we take account of the error feedback of the prediction error which occurs in any interframe coder. The prediction errors are transmitted by DCT coding with a Laplacian uniform quantizer, with the DCT coding bit rate of the prediction error fixed at 0.2, 0.3, or 0.4 bits/pixel. The bit allocation map for the DCT coefficients of 8 × 8 subblocks is calculated by using

\[ b_i = \frac{M}{n} + \frac{1}{2} \left[ \log_2 \sigma_i^2 - \frac{1}{n} \sum_{j=1}^{n} \log_2 \sigma_j^2 \right], \]

where $b_i$ is the number of bits allocated to the $i$-th of a set of $n$ variables (in the present case, the DCT coefficients of a subblock) with known variances $\sigma_i^2$, and $M$ is the total number of bits [1].

First, we apply the six Huffman coding methods to the original motion vector. The average code length of the motion vector over the whole sequence for each method is shown in Table 1 for the three fixed DCT coding bit rates of the prediction error. According to Table 1, we get the highest compression ratio by using Method 1 (two-dimensional coding).

Table 1
Sequence average code length of original motion vectors (bits/block)

                         "Claire" sequence            "Miss America" sequence
Bit rate (bits/pixel)    0.2      0.3      0.4        0.2      0.3      0.4
Method 1                 4.34519  4.11827  3.88585    5.94087  5.85165  5.81525
Method 2                 5.32028  5.13281  4.89064    6.92839  6.85369  6.81476
Method 3                 4.65549  4.45917  4.25362    6.28384  6.19800  6.15905
Method 4                 4.68693  4.49042  4.28269    6.38063  6.29738  6.25745
Method 5                 5.63428  5.43203  5.17641    7.32297  7.26129  7.21574
Method 6                 5.67200  5.47348  5.21612    7.36202  7.31067  7.27304

The same six coding methods have been applied to the smoothed motion vectors obtained with the algorithm of Section 3. When we fix the DCT coding bit rate of the prediction error at 0.3 bits/pixel with error feedback and vary the threshold values T1 and T2 from 2 to 3, the sequence average code length of the smoothed motion vector is as shown in Table 2. In this simulation, we again obtain the best result by using Method 1 (two-dimensional coding). In the case of Method 1, for example, the average bit rate of the smoothed motion vector is 1.0634 bits/block for the "Claire" sequence and 1.0794 bits/block for the "Miss America" sequence with threshold values T1 = 2 and T2 = 3.

Table 2
Sequence average code length of smoothed motion vectors (bits/block). (DCT coding bit rate of prediction error = 0.3 bits/pixel)

                      "Claire" sequence                 "Miss America" sequence
Threshold (T1, T2)    (3, 2)  (3, 3)  (2, 2)  (2, 3)    (3, 2)  (3, 3)  (2, 2)  (2, 3)
Method 1              1.0299  1.0287  1.0763  1.0634    1.0570  1.0333  1.1411  1.0794
Method 2              1.0549  1.0529  1.1144  1.1013    1.1016  1.0611  1.2408  1.1309
Method 3              2.0195  2.0191  2.0547  2.0474    2.0347  2.0193  2.0987  2.0567
Method 4              2.0262  2.0259  2.0667  2.0574    2.0501  2.0299  2.1159  2.0704
Method 5              2.0398  2.0380  2.0785  2.0689    2.0658  2.0409  2.1629  2.0911
Method 6              2.0428  2.0412  2.0821  2.0734    2.0747  2.0469  2.1759  2.1005

However, due to the smoothing of the motion vector, the entropy of the prediction error increases. Since the increase in entropy is 0.42 bits/pixel for the "Claire" sequence and 0.39 bits/pixel for the "Miss America" sequence, the total amount of information for both motion vector and prediction error increases. To solve this problem we simulate the conditional transmission method. We vary the rejection threshold value Tr for the mean squared error of each block to determine whether the motion vector of the block should be transmitted or not. The average number of motion vectors determined not to be transmitted for various rejection thresholds Tr, normalized to the total number of blocks ((288 × 360)/(8 × 8) = 1620), is shown in Fig. 2. From this result, we see that for a codec with error feedback, the number of motion vectors determined not to be transmitted does not grow in linear proportion to the rejection threshold Tr; that is, the greater the rejection threshold value is, the more slowly the number of motion vectors determined not to be transmitted increases. As explained before, a motion vector which is not transmitted can be regarded as the zero vector when encoded by entropy coding. By assigning such motion vectors to the zero vector, we get an average code length which likewise does not vary in linear proportion to the rejection threshold Tr.

Fig. 2. The average percentage of motion vectors not to be transmitted for various rejection threshold values Tr over the whole sequence, for image sequences: (a) "Claire"; (b) "Miss America".

When the six entropy coding methods are applied to the motion vector, the average code length of the motion vector over the whole image sequence is shown in Fig. 3 for a DCT coding bit rate of the prediction error equal to 0.3 bits/pixel. Similar results are obtained when the DCT coding bit rate of the prediction error is equal to 0.2 and 0.4 bits/pixel. As the rejection threshold Tr increases, the average code length of the motion vector decreases. In contrast, the prediction error increases. Although the interframe prediction errors increase, the increase in error magnitude does not necessarily imply more bits in coding (say, using DCT). For example, the bit rate produced by a quantizer in DCT coding may not be affected if a small increase in the error magnitude does not exceed a quantization step. To test this idea, in simulation we fix the DCT coding bit rate of the prediction error to 0.2, 0.3, and 0.4 bits/pixel and consider the bit rates of the motion vector for various rejection thresholds Tr. Since we fix the DCT coding bit rate of the prediction error, the total amount of information to be transmitted varies according to the amount of motion vector information for the various threshold values. However, the quality of the image will be degraded.

Fig. 3. Sequence average motion vector code length for various rejection thresholds Tr for: (a) "Claire"; (b) "Miss America".

To measure the picture quality, we calculate the PSNR by using

\[ \mathrm{PSNR} = -10 \log_{10} \left( \frac{\sum_{j=1}^{J} \sum_{k=1}^{K} \bigl( G(j,k) - \hat{G}(j,k) \bigr)^2}{A^2 \, JK} \right), \]

where $G(j,k)$ is the original image and $\hat{G}(j,k)$ (size: $J \times K$) is the reconstructed image. $A$ represents the maximum value of $G(j,k)$, that is, 255 if $G(j,k)$ is represented by 8 bits. The average PSNR over all reconstructed images, which shows the degree of degradation of the reconstructed image for various rejection thresholds Tr, is shown in Fig. 4.

Fig. 4. Sequence average PSNR of reconstructed images for various rejection thresholds Tr for: (a) "Claire"; (b) "Miss America".

When the prediction error is transmitted with a DCT coding bit rate of 0.3 bits/pixel and the rejection threshold Tr is varied from 0 to 9, in comparison with the entropy coding schemes applied to the original motion vector, the average bit rate for motion vector coding can be reduced by 62.16-71.38% for the "Claire" sequence and 27.49-71.28% for the "Miss America" sequence. The PSNR of the reconstructed image is reduced by 0.058-0.196 dB for "Claire" and 0.141-0.739 dB for "Miss America". Therefore, by using the conditional transmission method of the smoothed motion vector, we can compress the motion vector greatly with only a slight degradation in the reconstructed image.

As a subjective measure of the quality of the reconstructed pictures, the last reconstructed frames of the "Miss America" sequence for various rejection threshold values Tr with 0.3 bits/pixel of prediction error are shown in Fig. 5. The sub-picture "frame by original MV & PE" in Fig. 5 is a picture reconstructed from the original motion vector and the corresponding prediction error. We can hardly see any noticeable degradation in the reconstructed images even with large rejection thresholds Tr.

Fig. 5. The last reconstructed frames of "Miss America" sequence for various rejection thresholds Tr. (DCT coding bit rate of prediction error = 0.3 bits/pixel.)
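The decision rule of Section 4 (transmit the motion vector only if MSE - MSE' exceeds the rejection threshold Tr) and the PSNR measure used in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the representation of frames as lists of rows, the convention that a motion vector (dx, dy) points from the current block to its match in the previous frame, and the absence of boundary handling (the displaced block is assumed to lie inside the frame) are all assumptions made here.

```python
import math


def mse(block_a, block_b):
    """Mean squared difference between two equally sized 2-D blocks."""
    n = len(block_a) * len(block_a[0])
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b)) / n


def get_block(frame, top, left, size=8):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def should_transmit(cur, prev, top, left, mv, t_r, size=8):
    """Steps 1-3 of Section 4: transmit the vector only if MSE - MSE' > Tr."""
    dx, dy = mv  # assumed convention: dx in pixels, dy in lines
    cur_block = get_block(cur, top, left, size)
    # Step 1: MSE against the co-located block of the previous frame.
    mse_still = mse(cur_block, get_block(prev, top, left, size))
    # Step 2: MSE' against the block displaced by the motion vector.
    mse_comp = mse(cur_block, get_block(prev, top + dy, left + dx, size))
    # Step 3: the vector is worth sending only if it helps by more than Tr.
    return (mse_still - mse_comp) > t_r


def psnr(original, reconstructed, peak=255):
    """PSNR in dB, with the squared error normalized by peak**2."""
    err = mse(original, reconstructed) / peak ** 2
    return float("inf") if err == 0 else -10 * math.log10(err)
```

In this sketch a block whose vector is rejected would simply be coded with the zero vector, as described in Section 4, so no separate mc-bit is modelled.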
6. Conclusion

In this paper, through computer simulation, various encoding schemes for the motion vector are compared from the viewpoint of the encoding efficiency of the motion vector. According to the simulation results of the entropy coding of the motion vector based on its statistics or correlations, two-dimensional coding of the motion vector is superior to the other methods. In the case of entropy coding of the smoothed motion vector, only the amount of motion vector information is reduced. However, because the prediction error increases and the reconstructed image is degraded, the total information to be transmitted also increases. To solve this problem we propose the conditional transmission method of the smoothed motion vector, which takes into account the relation between the information increase of the prediction error and the information decrease of the motion vector. In this method, if the motion vector is the zero vector or becomes the zero vector by smoothing, the motion vector does not need to be transmitted. As a result of computer simulations, we can compress the motion vector greatly with a slight degradation in the reconstructed images. If we transmit the prediction error through buffer control and the motion vector by varying the threshold value, we may further compress the total information efficiently with good quality of the reconstructed image.
References

[1] R.J. Clarke, Transform Coding of Images, Academic Press, New York, 1985, pp. 163-183.
[2] CCITT SG XV, Meeting Report 5, Specialists Group on Coding for Visual Telephony, Tokyo, Japan, March 1986.
[3] R.W. Hamming, Coding and Information Theory, 2nd ed., Prentice-Hall, New Jersey, 1986.
[4] K. Iinuma et al., "A motion-compensated interframe codec", SPIE Vol. 594, Image Coding, December 4-6, 1985, pp. 194-201.
[5] T. Koga et al., "Motion-compensated interframe coding for video conferencing", Proc. Nat. Telecommun. Conf., G5: 3.1-3.5, 1981.
[6] T. Koga and M. Ohta, "Coding of motion vector information", Picture Coding Symp. (PCS '87), 1987, pp. 138-139.
[7] T. Koga and M. Ohta, "Entropy coding for a hybrid scheme with motion compensation in subprimary rate video transmission", IEEE J. Selected Areas Commun., Vol. SAC-5, August 1987, pp. 1166-1174.
Vol. 18, No. 3, November 1989