Moving picture coding system for digital storage media using hybrid coding

Signal Processing: Image Communication 2 (1990) 109-116
Elsevier

Atsushi NAGATA, Ikuo INOUE, Akiyoshi TANAKA and Nobuyasu TAKEGUCHI
Image Technology Research Laboratory, Matsushita Electric Industrial Co., Ltd., 1006, Kadoma, Kadoma-shi, Osaka, 571 Japan

Received 20 January 1990 Revised 5 March 1990

Abstract. A coding system using a hybrid coding method that combines motion compensated interframe DPCM, the Discrete Cosine Transform and frame interpolation was examined as a moving picture coding system for digital storage media such as CD-ROM. The encoder reduces the frame frequency of the input picture by half and performs hybrid coding, and the decoder performs frame interpolation to give the playback picture. It was verified that, by applying this technique before coding, an S/N about 1.5 dB better than that of direct coding could be obtained in some frames. In frame interpolation, overwriting while changing the motion compensated block size has solved the problem of areas left unwritten. Local distortion, which is a problem in frame interpolation, is reduced by detecting the distortion and coding it in the encoder. The information quantity necessary for this is about 10% of the overall quantity. In intraframe coding, control of the quantization step size by activity has improved the picture quality in parts with small luminance amplitude.

Keywords. Video coding for multimedia retrieval systems, hybrid coding, frame interpolation.

1. Introduction

In recent years, moving picture coding systems operating at about 1 Mbps have been actively researched for digital storage media. Further, international standardization of such a coding system is under way in ISO/IEC JTC1/SC2/WG8. In communication systems, on the other hand, a coding system based on hybrid coding has been proposed and examined as a reference model (RM8) [3]. It combines motion compensated interframe prediction with an orthogonal transform, an effective method for efficient coding of moving pictures [1, 2]. Unlike communication systems, digital storage media require the following special playback functions:
(1) Random access,
(2) High-speed search (forward and reverse),
(3) Reverse playback.

0923-5965/90/$03.50 © 1990 Elsevier Science Publishers B.V.

This paper reports an examination of a coding system for digital storage media that is based on RM8, which is under examination for communication systems, and that uses cyclic intraframe coding to realize the above special playback functions and frame interpolation to improve picture quality. An outline of the coding system is given in Section 2. The coding algorithm, featuring cyclic intraframe coding and frame interpolation, is described in Section 3. Simulation results are shown in Section 4.

2. Overview of coding system

Figure 1 shows a block diagram of the examined moving picture coding system for digital storage media. The input picture of the encoder has 352 x 240 pixels and is decimated from a digital TV signal in CCIR601 format by a decimation filter. The frame frequency is 30 frame/sec. The encoder calculates, from the input picture, the motion vectors used for interframe prediction and frame interpolation. The input sequence, with every other frame dropped so that the frame rate is halved, is coded by motion compensated interframe predicted DCT. However, coding all of the frames by motion compensated interframe predicted DCT makes the special playback functions shown above difficult, so cyclic intraframe coding is performed. Hereafter, a frame coded by interframe predicted DCT is called an 'Interframe', and a frame coded within the frame an 'Intraframe'.

The decoder decodes the coded frames and composes the interpolation frames by motion compensated frame interpolation. The motion vectors calculated in the encoder are applied to frame interpolation. Distortion in an interpolation frame is then corrected using the distortion information for that frame coded by the encoder. Finally, the output picture signal is obtained by conversion to CCIR601 format using an interpolation filter.

Fig. 1. Block diagram: (1) encoder, (2) decoder. FI: frame interpolator; FM: frame memory; DD: error detector; LP: loop filter; the remaining block is the motion vector detector.

A. Nagata et al. / Moving picture coding system using hybrid coding

3. Coding algorithm

3.1. Group of frame

Motion compensated interframe predictive coding is an effective method for moving pictures. However, when all the frames are coded by interframe prediction, random access and high-speed search are hard to achieve. Cyclic intraframe coding is effective in achieving such special playback. The shorter the cycle, the more advantageous it is for random access and high-speed search. In random access, the nearest intraframe preceding the target frame is decoded first, and then the interframes are decoded in order until the target frame is reached. Therefore, the shorter the gap between intraframes, the shorter the access time (the maximum time required from the start of the access operation to the reproduction of the target picture). However, intraframe coding requires 3 to 5 times the information quantity of interframe coding to obtain a given picture quality. Taking these two factors into account, the cycle of intraframe coding is set to once per 10 frames of the input picture, i.e., three times per second. Figure 2(1) shows the construction of a group of frames.
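The random-access procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `decode_chain` and its parameters are hypothetical, frames are numbered as in Fig. 2, and `coded_step=2` is an assumption that models the 15 frame/sec structure, in which only even-numbered frames are coded.

```python
def decode_chain(target, intra_period=10, coded_step=2):
    """Frames that must be decoded, in order, to reach coded frame `target`.

    Assumptions (illustration only): an intraframe sits at every multiple
    of `intra_period`, and only frames whose number is a multiple of
    `coded_step` are coded (the others are interpolation frames).
    """
    if target % coded_step != 0:
        raise ValueError("target is an interpolation frame, not a coded frame")
    # Nearest intraframe at or before the target frame.
    start = (target // intra_period) * intra_period
    # Decode the intraframe, then each interframe in order up to the target.
    return list(range(start, target + 1, coded_step))


print(decode_chain(18))                     # [10, 12, 14, 16, 18]
print(len(decode_chain(19, coded_step=1)))  # 10 frames when every frame is coded
```

The length of the returned list grows with the distance from the preceding intraframe, which is why a shorter intraframe cycle shortens the worst-case access time.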

Frame No.:        n0  n1  n2  n3  n4  n5  n6  n7  n8  n9  n10 n11 n12 n13 n14
(1) Frame types:  A   C   B   C   B   C   B   C   B   C   A   C   B   C   B
(2) Frame types:  A   B   B   B   B   B   B   B   B   B   A   B   B   B   B

A: intraframe, B: interframe, C: interpolation. The frame interval is 1/30 sec.

Fig. 2. Group of frame.

3.2. Intraframe coding

In intraframe coding, DCT is performed on every block of 8 x 8 pixels, and the coefficients are quantized and variable-length coded. The same variable-length code as in RM8 is used. High-speed search playback is realized by reading only intraframes out of the digital storage media. The information quantity read out of the digital storage media per unit time is constant, the same as in normal playback. Therefore, the information quantity of each intraframe must be kept below a specified value in encoding.

If the information quantity is controlled in the same way as in RM8, where the quantization step size is controlled by the residual output buffer quantity, problems may occur: coded picture quality differs between blocks with similar images within a frame. This is because the step sizes are determined only by the residual buffer quantity. For example, when a block whose luminance amplitude is far greater than the average appears, the residual buffer quantity increases, and as a result the quantization step size in subsequent blocks becomes large. Blocks with large luminance amplitude have large AC energy, so their high-frequency DCT coefficients are large and the coded information quantity becomes large.

In this coding system, the activity of each block is examined, and the quantization step size is set large in blocks with large activity and small in blocks with small activity. That is, the quantization step size is reduced in blocks with small activity, where the luminance amplitude is estimated to be small. First, Base_Q_Step is specified as a reference. Base_Q_Step is constant within a frame, and the output information quantity of each frame is controlled by controlling Base_Q_Step. Then the activity (Act) of each block is calculated, and the quantization class Q_Class is determined from it. The relation between Act and Q_Class is shown in Fig. 3.

Range of Act.          Q_Class
th3 < Act.             1
th2 < Act. <= th3      2
th1 < Act. <= th2      3
Act. <= th1            4

Fig. 3. Relationship between Act and Q_Class.

When Base_Q_Step and Q_Class are determined, the quantization step size Q_Step in each block is calculated by

$$\text{Q\_Step} = \text{Base\_Q\_Step} / \text{Q\_Class}. \tag{1}$$

With this, the quantization step is made small where the luminance amplitude is small and large where the luminance amplitude is large. This gives a good decoded picture in the parts where the luminance amplitude is small. Although picture quality deteriorates in the parts with large amplitude, giving precedence to picture quality in the parts with small amplitude gave better results in subjective evaluation. Further, blocks with similar images in the same frame have the same Q_Class, so their quantization step sizes, and hence their coded picture qualities, are equal.
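The mapping from activity to quantization step can be sketched as below. This is only an illustration under stated assumptions: the paper gives neither the definition of Act nor the values of th1, th2 and th3, so the thresholds and activities used here are hypothetical placeholders, and the function names are not from the source.

```python
def q_class(act, th1, th2, th3):
    """Map block activity to the quantization class listed in Fig. 3."""
    if act <= th1:
        return 4
    if act <= th2:
        return 3
    if act <= th3:
        return 2
    return 1


def q_step(base_q_step, act, th1, th2, th3):
    """Eq. (1): Q_Step = Base_Q_Step / Q_Class."""
    return base_q_step / q_class(act, th1, th2, th3)


# Hypothetical thresholds and activity values, chosen only for illustration.
TH1, TH2, TH3 = 10, 20, 30
for act in (5, 15, 25, 35):
    print(act, q_class(act, TH1, TH2, TH3), q_step(32, act, TH1, TH2, TH3))
```

Because Q_Step depends only on Base_Q_Step and the block's own activity, two blocks with similar content in the same frame receive the same step size regardless of the buffer state.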

3.3. Interframe coding

In an Interframe, the motion compensated block size is 16 x 16 and the DCT block size is 8 x 8. The coding differs from RM8 in how the motion vectors used for motion compensation are obtained. In RM8, motion vectors are obtained between the local decoder output picture and the input picture, from the viewpoint of minimizing the prediction error, while in this coding system motion vectors are obtained between input pictures. This is because the motion vector is also used in the frame interpolation described in Section 3.4, which requires a motion vector that accurately represents the actual motion.

The motion vector obtained using the local decoder output picture minimizes the prediction error, but the local decoder output is a picture that includes coding distortion, so the vector does not always represent the actual motion. In particular, when the compression rate is high, namely when the information quantity per frame is small, the local decoder output picture has large distortion. In this case, the motion vector obtained from the local decoder output picture differs from the actual motion and is unsuitable for frame interpolation.

Vol. 2, No. 2, August 1990

3.4. Frame interpolation

Interpolation frames are composed using the frame pictures before and after them and the motion vectors. Figure 4 shows the principle of motion compensated frame interpolation. n1 indicates the frame where frame interpolation is performed (the interpolation frame), n0 is the previous frame, and n2 is the next frame. When the motion vector of block B2 on frame n2 is denoted by Vp, and block B2 moved along vector Vp onto frame n0 is denoted by B0, block B1 on interpolation frame n1 is calculated by

$$B_1(i - u/2,\; j - v/2) = 0.5 \times B_0(i - u,\; j - v) + 0.5 \times B_2(i, j), \tag{2}$$

where (u, v) are the components of vector Vp.

Fig. 4. Motion compensated frame interpolation.

The interpolation frame n1 thus obtained leaves some parts not interpolated (gaps). Figure 5 shows the gaps made in frame interpolation.

Fig. 5. Not interpolated area.

Pixels in the same position in frame n0 or n2 may be used to simply fill the gaps, but then a part with smooth movement appears as distortion. In this coding system, hierarchical frame interpolation is performed by the procedure shown below.

STEP 1. Frame n1 is filled with the average of the corresponding pixels in frames n0 and n2.

STEP 2. Frame n1 is divided into four areas, the average motion vector in each area is obtained, and motion compensated frame interpolation is performed in each area using the average motion vector.

STEP 3. Motion compensated frame interpolation is performed in each block (16 x 16 pixels).

Frame interpolation through Steps 1 to 3 is shown in Fig. 6. Interpolation frame n1 is overwritten in each of the above three steps. No motion compensation is done in Step 1, so no gap is generated in interpolation frame n1; however, resolution deteriorates in parts with motion. In Step 2, motion compensated frame interpolation that compensates for panning and zooming is performed, which allows the background to be interpolated with no deterioration of resolution. Some parts at the periphery of the frame are not interpolated in this step, but they have already been interpolated in Step 1. In Step 3, frame interpolation is performed for each block of 16 x 16 pixels.

Fig. 6b. Frame interpolation (Step 2).

Fig. 6c. Frame interpolation (Step 3).
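The three overwriting steps can be sketched in pure Python as follows. This is a simplified illustration rather than the authors' implementation. The names `mc_overwrite`, `interpolate` and `vectors` are hypothetical. It is assumed that frames are lists of pixel rows, that each vector (u, v) is given per block of frame n2 with u along rows and v along columns, that vector components are even so that u/2 and v/2 are whole pixels, and that the four areas of Step 2 are the four quarters of the frame.

```python
def mc_overwrite(f1, f0, f2, top, left, height, width, u, v):
    """Overwrite f1 using Eq. (2) for the region of frame n2 at (top, left)."""
    rows, cols = len(f1), len(f1[0])
    for i in range(top, min(top + height, rows)):
        for j in range(left, min(left + width, cols)):
            i0, j0 = i - u, j - v              # matching pixel in frame n0
            i1, j1 = i - u // 2, j - v // 2    # target pixel in frame n1
            if (0 <= i0 < rows and 0 <= j0 < cols
                    and 0 <= i1 < rows and 0 <= j1 < cols):
                f1[i1][j1] = 0.5 * f0[i0][j0] + 0.5 * f2[i][j]


def interpolate(f0, f2, vectors, block=16):
    """Hierarchical interpolation of frame n1 from frames n0 and n2.

    `vectors` maps (block_row, block_col) of frame n2 to an even (u, v).
    """
    rows, cols = len(f0), len(f0[0])
    # Step 1: plain average of co-located pixels, which leaves no gaps.
    f1 = [[0.5 * (a + b) for a, b in zip(r0, r2)] for r0, r2 in zip(f0, f2)]
    # Step 2: one averaged vector (rounded to even) per quarter of the frame.
    half_r, half_c = rows // 2, cols // 2
    for top in (0, half_r):
        for left in (0, half_c):
            inside = [vec for (br, bc), vec in vectors.items()
                      if top <= br * block < top + half_r
                      and left <= bc * block < left + half_c]
            if inside:
                u = 2 * round(sum(vec[0] for vec in inside) / (2 * len(inside)))
                v = 2 * round(sum(vec[1] for vec in inside) / (2 * len(inside)))
                mc_overwrite(f1, f0, f2, top, left, half_r, half_c, u, v)
    # Step 3: per-block vectors overwrite the coarser result.
    for (br, bc), (u, v) in vectors.items():
        mc_overwrite(f1, f0, f2, br * block, bc * block, block, block, u, v)
    return f1


# Toy example: one bright pixel moves two columns to the right between n0 and n2.
f0 = [[0] * 8 for _ in range(4)]
f2 = [[0] * 8 for _ in range(4)]
f0[1][1] = 100
f2[1][3] = 100
f1 = interpolate(f0, f2, {(0, 0): (0, 2), (0, 1): (0, 0)}, block=4)
print(f1[1])
```

In this toy case the moving pixel lands midway, at column 2 of the interpolated row, while column 3 keeps its Step 1 average because no later step writes to it. That leftover is the kind of local residue the correction of Section 3.5 is meant to address.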

Fig. 6a. Frame interpolation (Step 1): Pn1(a, b) = 0.5 * (Pn0(a, b) + Pn2(a, b)).

3.5. Interpolation frame coding

Frame interpolation described in Section 3.4 provides a good interpolation frame for most pictures, but distortion occurs in the following cases:
(1) Objects within a motion compensated block (16 x 16 pixels) move in different directions.
(2) The shape of a moving object changes.
(3) Background appears from behind a moving object, or a moving object covers the background.
(4) Motion includes rotation.

Since no correct motion vector exists in such cases, frame interpolation is performed with a wrong motion vector, resulting in distortion in some interpolation frames. Distortion due to the above causes rarely appears over the whole screen; it usually appears as large local distortion. In this system, blocks with large distortion are detected beforehand by the encoder, and the distortion in those blocks is coded and sent to the decoder, thereby correcting the distortion in the interpolation frame.

The area requiring distortion correction has been examined by simulation. Frame interpolation is performed on a test picture, distortion in the interpolation frame is evaluated in units of blocks (8 x 8), and the distortion in N blocks is corrected, starting from the block with the largest distortion. Figure 7 shows the S/N of the block with the worst S/N. Because the normal frame S/N evaluation method does not allow local S/N evaluation, S/N is evaluated by

$$\mathrm{snrmin} = \min_{x=1}^{44} \, \min_{y=1}^{30} \, \mathrm{snr}(x, y), \tag{3}$$

where snr(x, y) is the S/N of the block at position (x, y) in a frame, evaluated by

$$\mathrm{snr}(x, y) = 20 \log\left(255 \Big/ \sqrt{\sum e^2 / 64}\right). \tag{4}$$

Fig. 7. Worst block S/N characteristics (horizontal axis: frame number).

snrmin is thus the S/N of the block with the worst S/N in a frame. According to this evaluation, correcting the distortion in 128 blocks (approx. 9.7% of the frame area) provides an S/N of approximately 30 dB. In this test picture, correcting 64 blocks removes most visible distortion. In this coding system, the positions of the blocks to be corrected are coded with a run length code, and the distortion in the interpolation frame is coded by DCT. The quantity of this information is 6000 bit/frame.
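The worst-block measure of Eqs. (3) and (4) can be sketched as below. This is a minimal sketch under stated assumptions: e is taken to be the per-pixel difference between the original and the interpolated frame, the logarithm is taken as base 10 (as is usual for dB), and the function names are hypothetical. A 352 x 240 frame gives the 44 x 30 grid of 8 x 8 blocks that appears in Eq. (3).

```python
import math


def block_snr(orig, recon, top, left, size=8):
    """Eq. (4) for one block: 20*log10(255 / sqrt(sum(e^2) / size^2))."""
    sq_err = sum((orig[i][j] - recon[i][j]) ** 2
                 for i in range(top, top + size)
                 for j in range(left, left + size))
    if sq_err == 0:
        return math.inf  # error-free block: S/N is unbounded
    return 20 * math.log10(255 / math.sqrt(sq_err / (size * size)))


def snr_min(orig, recon, size=8):
    """Eq. (3): the worst block S/N over all whole blocks of the frame."""
    rows, cols = len(orig), len(orig[0])
    return min(block_snr(orig, recon, top, left, size)
               for top in range(0, rows - size + 1, size)
               for left in range(0, cols - size + 1, size))
```

A whole-frame S/N averages the error over every pixel, so a single badly interpolated block barely changes it; taking the minimum over blocks makes that block the reported value.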

4. Simulation

Halving the frame frequency before coding reduces the quantity of coded output information. In other words, for an equal quantity of output information, halving the frame frequency reduces the distortion due to coding. However, when motion compensated interframe predictive coding is used, factors such as the decrease in correlation between frames and the increase in motion vector information tend to increase the output information quantity. Further, the deterioration of picture quality in the interpolation frames should be considered as well. To verify these points, the following two cases are compared by simulation:
(1) This coding system, where an input picture thinned out to 15 frame/sec is coded and frame interpolation is performed by the decoder.
(2) An input picture of 30 frame/sec is coded directly.

Two sequences, (a) 'TABLE TENNIS' and (b) 'FLOWER GARDEN', are used. In (a), the background is stationary and only one person is moving, while in (b), the scenery (a flower garden) is panned. Figure 2(1) shows the frame structure for coding at 15 frame/sec, and Fig. 2(2) shows that for coding at 30 frame/sec. The output information quantity is controlled to a specified value independently for intraframes, interframes and interpolation frames. The total is approximately 900 Kbit/sec in both cases, broken down as follows:

(1) Coding in 15 frame/sec + frame interpolation
Intraframe: 135,000 bit/frame x 3 frames,
Interframe: 33,750 bit/frame x 12 frames,
Interpolation frame: 6,000 bit/frame x 15 frames,
Total sum: 900,000 bit/sec.

(2) Coding in 30 frame/sec
Intraframe: 135,000 bit/frame x 3 frames,
Interframe: 18,333 bit/frame x 27 frames,
Total sum: 900,000 bit/sec.

Figure 8 shows a comparison of S/N. In intraframe, coding in 15 frame/sec provides a better S/N. The difference is especially significant in the interframe directly before the intraframe, amounting to 1.0-1.5 dB. The S/N of an interpolation frame is lower than that of the frames before and after it. This is chiefly because of the decrease of resolution in the interpolation frame. Figure 9 shows the S/N averaged over 30 frame periods (1 sec). Coding in 15 frame/sec provides an S/N 0.2-1.6 dB better.

Fig. 8. Frame S/N characteristics for the TABLE TENNIS and FLOWER GARDEN sequences, comparing coding in 15 frame/sec with frame interpolation against coding in 30 frame/sec (horizontal axis: frame number).

Coding rate      TABLE TENNIS    FLOWER GARDEN
15 frame/sec     37.09 dB        27.01 dB
30 frame/sec     36.86 dB        25.39 dB

Fig. 9. Averaged S/N in 30 frame period.

5. Conclusion

A moving picture coding system for digital storage media using motion compensated interframe predicted DCT and frame interpolation has been examined. Periodic intraframe coding allows easier random access and high-speed search. Also, reducing the frame rate by half and performing frame interpolation in decoding have been found to be effective. Further, it has been confirmed that local distortion, which is a problem in frame interpolation, can be reduced by detecting and coding it in the encoder and correcting it in the decoder. A method of controlling the information quantity for each frame will be examined in future.

Acknowledgment


The authors wish to thank Dr Takahashi of Matsushita Image Technology Research Laboratory for fruitful discussions and for suggesting the frame interpolation method.

References

[1] A. N. Netravali et al., "Motion-compensated television coding", Bell Syst. Tech. J., Vol. 58, No. 3, March 1979, pp. 631-670.
[2] T. Koga et al., "Motion compensated interframe coding for video conferencing", NTC '81, G5.3.1-5, December 1981.
[3] CCITT SGXV Working Party XV/4, Specialists Group on Coding for Visual Telephony, Document #525, 1989.