Signal Processing: Image Communication 2 (1990) 155-169 Elsevier
A CCITT COMPATIBLE CODING ALGORITHM FOR DIGITAL RECORDING OF MOVING IMAGES
F. Pereira, Instituto Superior Técnico, Lisbon, Portugal
L. Contin and M. Quaglia, Centro Studi e Laboratori Telecomunicazioni, Via Guglielmo Reiss Romoli 274, I-10148 Torino, Italy
P. Delicati, Università La Sapienza, Roma, Italy
Received 22 January 1990
Revised 19 April 1990
Abstract. This paper describes the work carried out at CSELT with the aim of providing a sensible solution to the problem of recording moving images on digital storage media. The starting point for the proposed algorithm has been the CCITT H.261 Reference Model, the future standard for synchronous transmission at p × 64 kbit/s (p = 1, ..., 30). The algorithm provides all the facilities required for recording purposes.
Keywords. Digital storage media, recording facilities, CCITT H.261 algorithm.
1. Introduction
Real video images and sound are powerful tools for providing a more natural interface when presenting many kinds of information in applications such as education, training and maintenance, entertainment and advertising. Publishers and advertisers are ready to prepare multimedia documents that exploit all the possibilities of the multimedia concept. Since late 1987, when the David Sarnoff Research Center presented its implementation of a universal all-digital medium, the expectation of low-cost, compatible, high-performance systems has grown steadily. The obstacle to meeting the third requirement (high performance) is still moving video: the CD is certainly one of the most important physical media that experts have in mind as a widespread support for interactive multimedia documents, and it allows a throughput of only about 1.2 Mbit/s.
At such a bitrate the known algorithms for moving image coding do not yet provide fully satisfactory image quality in all shooting conditions. Moreover, the coding algorithm must be able to accommodate features such as reverse playback and random access, already present in normal analogue recording devices, and possibly some new ones. As for the cost of the equipment, low prices become possible once mass production is reached and different applications can use the same integrated components available on the market. To fulfil these requirements, and given the appealing prospect of compatibility, ISO has started work in the MPEG subgroup on the definition of the standard coding algorithm to be used for recording moving images on digital storage media.
0923-5965/90/$03.50 © 1990 Elsevier Science Publishers B.V.
F.. Pereira et al. / A C C I T T compatible coding algorithm
This paper describes the work carried out at CSELT with the aim of providing a sensible solution to this problem. The starting point for the proposed coding algorithm has been the Reference Model (release RM8) defined in CCITT by the 'Specialists Group on Coding for Visual Telephony' [4]. There are many reasons for this choice, but two are of basic importance: maximum commonality between the equipment used in the telecommunications and consumer environments and, even more conclusively, the fact that nobody has so far presented a different algorithm that performs better. The original contributions described in the paper are:
- the modification of the RM8 scheme, whose main effect is to make the temporal coding nearly symmetric, allowing good motion rendition in reverse playback;
- a new method for the fast convergence of still images.
The other important feature included in the coding algorithm, since it provides improved performance in several video sequences, is global motion compensation, implemented according to the method described in [1]. In the following, the proposed coding algorithm derived from RM8 is described, with particular emphasis on the features suited to future developments, and a few simulation results are presented.
2. The CCITT Reference Model
The Reference Model represents the convergence point reached by the CCITT 'Specialists Group on Coding for Visual Telephony' after a four-year effort to find, via computer simulation, the best algorithm for coding moving pictures at fixed bitrates, with the typical telecommunications requirement of a relatively low delay introduced by the coding-decoding process. The coding technique is based on a hybrid DPCM/transform coder that allows bitrates in the range p × 64 kbit/s (p = 1, 2, ..., 30) and accepts video sources in the Common Intermediate Format (CIF). CIF is characterized by a spatial resolution of 360 pixels per line and 288 lines per image (non-interlaced format) and a picture rate of 29.97 Hz.
2.1. The layered image structure
In the RM, redundancy reduction is achieved by means of a 'block coding' technique. Using a 'bottom-up' description, the structure of an image is the following:
1. Block. A block is formed by partitioning the image into square, non-overlapping matrices of luminance or chrominance pixels. The blocks are 8 × 8.
2. Macro Block (MB). Four contiguous luminance blocks (in a 2 × 2 arrangement), together with the two spatially corresponding chrominance blocks, form a Macro Block (MB).
3. Group Of Blocks (GOB). To make coding-parameter control easier, macro blocks are arranged into rectangular matrices of 3 (vertical) by 11 (horizontal).
4. Image. As already mentioned, the spatial format of the images is CIF, i.e., an orthogonal pattern of 360 pixels by 288 lines for the luminance (Y) and 180 pixels by 144 lines for the two colour-difference components (CB and CR). Since 360 is not a multiple of 16 (the MB dimension), four columns on the left-hand side and as many on the right are discarded, giving a total width of 352 pixels for Y and 176 pixels for CB and CR. This spatial format contains 6 (vertical) × 2 (horizontal) GOBs.
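The counts implied by this layered structure can be checked with a little arithmetic. The following Python sketch derives the number of MBs and GOBs for the cropped 352 × 288 luminance frame and rejects the uncropped 360-pixel width. The function and constant names are illustrative only, not part of the RM.

```python
BLOCK = 8                         # DCT block size (8 x 8 pixels)
MB = 16                           # an MB covers 16 x 16 luminance pixels (2 x 2 blocks)
GOB_MB_ROWS, GOB_MB_COLS = 3, 11  # a GOB is 3 (vertical) x 11 (horizontal) MBs


def layered_counts(width=352, height=288):
    """Return MB/GOB counts for a luminance frame of width x height pixels."""
    if width % MB or height % MB:
        raise ValueError("frame size must be a multiple of the 16-pixel MB size")
    mbs_x, mbs_y = width // MB, height // MB
    return {
        "macroblocks": mbs_x * mbs_y,
        # GOB grid as (vertical, horizontal)
        "gobs": (mbs_y // GOB_MB_ROWS, mbs_x // GOB_MB_COLS),
        # four luminance blocks plus one CB and one CR block
        "blocks_per_mb": (MB // BLOCK) ** 2 + 2,
        # colour-difference components are halved in both directions
        "chroma_size": (width // 2, height // 2),
    }


print(layered_counts())
# {'macroblocks': 396, 'gobs': (6, 2), 'blocks_per_mb': 6, 'chroma_size': (176, 144)}

try:
    layered_counts(360, 288)      # the uncropped CIF width
except ValueError as err:
    print("360 pixels:", err)
```

Raising an error for the 360-pixel width mirrors the reason given in the text for discarding four columns on each side.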
2.2. The coding algorithm
The generic architecture of the coding algorithm is known as a hybrid DPCM/transform coder. Prediction is performed in the temporal direction, i.e., a coded image is used to predict the following one. The prediction errors, which still have some spatial statistical dependence within each image, are decorrelated using a two-dimensional transform. The key elements of the coding process are the following:
1. Motion Compensation (MC). MC is applied only to MBs that have changed significantly. Displacement estimation is achieved by a block matching technique using a search window of ±7 pixels in the previous coded frame.
2. Coding loop. A simple DPCM loop working in the temporal dimension (interframe), followed by a Discrete Cosine Transform (DCT), can be identified as the kernel of the configuration. The DCT coefficients are uniformly quantised using one of a predefined set of quantiser steps.
3. Variable Length Coding (VLC). VLC is applied after quantisation of the DCT coefficients. A zig-zag path is used to scan the DCT coefficient matrices.
4. Buffer control. A buffer is used to smooth the inherent variability of the bit production and to control it. Depending on its fullness, the quantiser step size is adjusted, allowing fast recovery from a bit-production surplus.
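Two of the key elements listed above can be sketched in a few lines of Python: full-search block matching over a ±7-pixel window and the zig-zag scan order. This is a minimal illustration, not the RM implementation. It assumes frames are plain lists of lists of luminance values and uses the sum of absolute differences (SAD) as the matching criterion; the text does not state the criterion, so SAD is our assumption. All function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n=16):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(n)
        for x in range(n)
    )


def block_match(cur, ref, bx, by, n=16, search=7):
    """Full search over the +/-search window; returns ((dx, dy), cost).
    Candidate blocks falling outside `ref` are skipped; on ties the zero
    vector (or the first candidate found) is kept."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= bx + dx <= w - n and 0 <= by + dy <= h - n:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost


def zigzag_order(n=8):
    """(row, col) pairs of an n x n coefficient matrix in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals run bottom-left -> top-right, odd ones top-right -> bottom-left
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


# Synthetic check: `cur` is `ref` shifted so that the best match is (dx, dy) = (3, 2).
ref = [[64 * x + y for x in range(48)] for y in range(48)]
cur = [[ref[y + 2][x + 3] if y + 2 < 48 and x + 3 < 48 else 0 for x in range(48)]
       for y in range(48)]
print(block_match(cur, ref, 16, 16))   # ((3, 2), 0)
print(zigzag_order()[:6])              # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

An exhaustive search costs (2 × 7 + 1)² = 225 SAD evaluations per MB. That cost is one reason MC is applied only to MBs that have changed significantly.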
3. The coding algorithm for digital recording of moving images
The definition of a standard for coding moving image information on digital storage media is an important target of the International Organisation for Standardisation (ISO) MPEG group. This coding algorithm must provide not only Normal Video Playback but also Reverse Video Playback, Fast Forward Video Playback, Fast Reverse Video Playback, Random Access and a High Quality Still Mode. Among the known algorithms that could be considered suitable candidates, the RM8 described above appears to be the best one for the reasons given in the introduction. It is now being converted into a CCITT Recommendation (H.261). In the following we describe the changes and additions made to the RM8 coding algorithm so that it can provide all the facilities required for recording moving images on digital storage media.
3.1. The codec structure
The codec structure is presented in Figs. 1 and 2. It combines intra/interframe coding with global and local motion compensation and transform coding. To improve the performance of the H.261 algorithm, some new coding tools have been introduced.
Vol. 2, No. 2, August 1990
[Fig. 1. The coder. Block diagram with: video in, resolution, transform, weighting and quantisation, variable length coding, multiplexer, buffer, storage medium, inverse weighting/quantisation, inverse transform, frame prediction, global/local displacement estimation, intra/inter control, side information.]
[Fig. 2. The decoder. Block diagram with: storage medium, demultiplexer, inverse weighting/quantisation, inverse transform, side information.]
1. Global motion compensation. Motion compensation is improved by using local and global displacement estimation simultaneously. Global motion compensation acts over the whole frame and improves the prediction for interframe coding. Two special cases of motion are considered for the moment: panning and zoom. For panning compensation, a panning vector is transmitted that indicates the displacement to be applied to the previous decoded frame before normal interframe coding. For zoom compensation, a zoom factor is transmitted that specifies the expansion or compression to be applied to the previous decoded frame before normal interframe coding. The panning vector and the zoom factor are included in the Picture Header. When there is no panning vector or zoom factor, motion compensation is identical to that of the H.261 algorithm.
2. Weighting of transform coefficients. Transform coefficients are weighted before quantisation [1], and the decoder performs the inverse process. This weighting improves the final subjective quality of the coded images.
3. Noise filtering. Noise filtering is applied to all macroblock classes except the intracoded one. It gives a noticeable subjective improvement since it is in fact a dithering operation.
4. Resolution control. With the present state of the art, there is no real chance of representing the video signal at CCIR 601 resolution with the target quality at bitrates around 1.2 Mbit/s, so spatial and temporal subsampling appear unavoidable. The chosen solution is spatial CIF resolution, i.e., 352 × 288 pels for luminance and 176 × 144 pels for chrominance, at 25 Hz. The selected algorithm must therefore achieve a reduction factor of about 20 to reach the target bitrates. One of the additional features required of the coding algorithm is the possibility of increasing the quality of particular (still) pictures, selected during the coding phase, by using an amount of information that is not read during normal play. This is called the High Quality Still Mode (HQSM). In the HQSM the resolution may be increased up to CCIR 601 resolution (704 × 288 × 2 for luminance); the strategy for increasing the spatial resolution is explained in the description of this facility.
5. Frame prediction. The accuracy of frame prediction is of fundamental importance for the efficiency of the video coding algorithm. In addition to global displacement estimation, local displacement estimation has been separated from the filtering. This implies four new macroblock classes and the corresponding changes to the VLC tables.
6. Coding control: quantisation step control. In the Reference Model, the quantisation step is determined by the buffer fullness in order to guarantee a fixed output bitrate. For recording purposes the bitrate constraints are less stringent (there is a peak bitrate limitation). This allows the available bitrate to be distributed according to image activity, absorbing at least short-term variations. The quantisation step is computed for each GOB from the value of the parameter 'Excess'. Excess is defined as the number of bits produced since the beginning of the transmission minus the number of bits that could have been transmitted during the same time on a fixed channel working at the bitrate initially agreed [3].
3.2. The temporal hierarchical coding
Since the Reference Model implements pure interframe coding, it cannot provide all the recording facilities listed above. This justifies breaking the temporal coding correlation by introducing recovery points. Introducing these recovery points is very delicate. Quality breaks at these moments must be avoided while, at the same time, adequate fast playback modes must be provided without prohibitive hardware and computational costs. These arguments justify a more complex and dynamic temporal coding structure called the 'three-level temporal hierarchical structure' (Fig. 3).
[Fig. 3. The temporal hierarchical coding. Frame sequence diagram marking N1 and N2. Legend: A - pure intraframe coding; B - interframe coding without MC and intrablock coding; C - identical to B (the same frame is also intraframe coded); D - interframe coding.]
This temporal structure considers four frame coding modes:
- Mode A. Pure intraframe coding of the whole frame;
- Mode B. Normal Reference Model coding, excluding global and local motion compensation and intrablock coding;
- Mode C. Identical to B; it is used for the frames that are coded with Mode A (these frames are coded twice);
- Mode D. Normal Reference Model coding with global motion compensation.
The temporal structure has two defining parameters:
- N1, the number of Mode B coded frames between two Mode A coded frames;
- N2, the number of Mode D coded frames between two Mode B/A coded frames.
By changing these parameters it is possible to obtain different temporal structures, ranging from pure periodic intraframe coding or the pure CCITT H.261 scheme to more complex schemes. The coding mode of a specific frame inside the periodic temporal structure can also be changed, since the frame class is clearly indicated in the Picture Header of the H.261 bitstream. With an adequate criterion, this allows the temporal coding structure to be adapted dynamically to the image activity.
The different coding modes have different compression factors, so noticeable quality variations in time must be avoided. This is done by granting each coding mode a quantisation step privilege:
- Mode A. The quantisation step of each GOB is identical to that of the corresponding GOB in the previous Mode C coded frame;
- Mode B. The quantisation step computed by the quantisation step control is decreased by 2;
- Mode C. The quantisation step computed by the quantisation step control is decreased by an amount depending on the average quantisation step over the previous 10 frames;
- Mode D. No privilege (normal quantisation step control).
These quantisation privileges smooth the picture quality, avoiding noticeable quality breaks at the recovery points. The simulation results showed that the coding of Mode A and Mode B frames is decisive for the overall picture quality.
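The following Python sketch illustrates the three-level structure and the quantisation step privileges. It assumes frames are numbered from 0, with a Mode A frame at frame 0. Several points are assumptions. The QS range 1-31 is taken from H.261. The Mode C reduction depends on the last 10 frames in a way the text does not give, so it is passed in by the caller. `excess` simply restates the definition of the 'Excess' parameter from item 6 of Section 3.1; the mapping from Excess to a step is not given and is not modelled. All names are hypothetical.

```python
QS_MIN, QS_MAX = 1, 31   # assumption: H.261-style quantiser range


def frame_mode(i, n1, n2):
    """Coding mode of frame i (0-based). Mode A frames are also coded as
    Mode C, so they are labelled 'A+C'."""
    if i % ((n1 + 1) * (n2 + 1)) == 0:
        return "A+C"
    if i % (n2 + 1) == 0:
        return "B"
    return "D"


def excess(bits_produced, elapsed_s, agreed_bps):
    """'Excess': bits produced so far minus the bits a fixed-rate channel
    at the agreed bitrate could have carried in the same time."""
    return bits_produced - agreed_bps * elapsed_s


def privileged_qs(mode, qs_control, qs_prev_c=None, c_reduction=0):
    """Apply the per-mode privilege to the step computed by the
    quantisation step control for one GOB."""
    if mode == "A":
        if qs_prev_c is None:
            raise ValueError("Mode A needs the QS of the GOB in the previous Mode C frame")
        qs = qs_prev_c                    # reuse the Mode C step of this GOB
    elif mode == "B":
        qs = qs_control - 2               # fixed privilege of 2
    elif mode == "C":
        qs = qs_control - c_reduction     # hypothetical: reduction supplied by caller
    else:                                 # Mode D: no privilege
        qs = qs_control
    return max(QS_MIN, min(QS_MAX, qs))


print("".join(frame_mode(i, 3, 5)[0] for i in range(25)))
# ADDDDDBDDDDDBDDDDDBDDDDDA
print(privileged_qs("B", 10), privileged_qs("D", 10))   # 8 10
```

With N1 = 0 the pattern reduces to periodic Mode A frames separated by N2 Mode D frames. As N2 grows, the structure approaches pure H.261 interframe coding.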
3.3. The recording facilities
The designed coding algorithm must provide not only Normal Video Playback but also the additional functions required for recording purposes. These facilities are briefly analysed in the following.
3.3.1. Normal Video Playback (NVP)
This feature is implemented by normal sequential decoding of the bitstream. Noticeable quality variations caused by the introduced recovery points must be avoided.
3.3.2. Normal Reverse Playback (NRP)
The dynamic temporal hierarchical structure presented here allows several ways of implementing this facility, depending on the required refresh frequency.
Low refresh frequency. The main characteristic of this option is that no supplementary memory is needed. The feature is implemented by decoding only the Mode A frames and the Mode B frames (if N1 is greater than zero) and repeating each one N2 + 1 times. If N1 is zero only Mode A (pure intracoded) frames are used; for N1 greater than zero Mode B frames can also be used. Pure reverse processing is possible because of the Mode C frames, which maintain the temporal chain. In pure reverse processing, a Mode B frame is obtained by subtracting the corresponding differences from the previously decoded Mode B or Mode A frame. The process is not ideal because the image resulting from Mode C is not equal to that resulting from Mode A; however, the simulations have shown that reasonable results may be expected. Whether this option is used for the NRP implementation will depend on the value of N2; note that it introduces no delay.
Normal refresh frequency. This option requires supplementary memory. The feature is implemented by decoding all the frames between two Mode B frames (or between a Mode B and a Mode A frame). If reverse playback quality identical to normal playback quality is required, all these frames must be buffered and displayed in reverse order. If a lower refresh frequency is acceptable, it is possible to buffer one frame out of 2 or 3, saving memory, and to repeat the display of each frame a suitable number of times. The delay depends on the value of N2.
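The display schedule of the low refresh frequency option can be sketched as follows. This is a minimal illustration assuming frame 0 is a Mode A frame, so the Mode A/B anchor frames are exactly the multiples of N2 + 1. Only the schedule is computed; the decoding itself (subtracting coded differences, bridged across Mode A frames by their Mode C codings) is not modelled. The function names are hypothetical.

```python
def anchor_mode(i, n1, n2):
    """Mode of an anchor frame (a multiple of N2 + 1): 'A' or 'B'."""
    return "A" if i % ((n1 + 1) * (n2 + 1)) == 0 else "B"


def low_refresh_reverse(last_frame, n1, n2):
    """Display schedule for Normal Reverse Playback, low refresh option.

    Starting from the anchor at or before `last_frame`, only Mode A/B
    frames are visited, going backwards, and each is displayed N2 + 1
    times, so the playback duration matches normal play.
    Returns a list of (frame_index, mode) display slots.
    """
    step = n2 + 1
    start = last_frame - last_frame % step
    schedule = []
    for anchor in range(start, -1, -step):
        schedule.extend([(anchor, anchor_mode(anchor, n1, n2))] * step)
    return schedule


slots = low_refresh_reverse(13, n1=1, n2=2)
print(slots[::3])   # one entry per anchor
# [(12, 'A'), (9, 'B'), (6, 'A'), (3, 'B'), (0, 'A')]
```

With N1 = 0 every anchor is a Mode A frame, matching the text's remark that only intracoded frames are used in that case.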
3.3.3. Fast Forward Playback (FFP)
For this dynamic temporal hierarchical structure, Fast Forward Playback can be implemented in many ways, depending on the values of N1 and N2. This feature uses the recovery frames (Mode A frames), which are the frames with the lowest compression factor. Since the limits on the medium data rate must be respected, the FFP implementation therefore depends on the maximum number of bits per frame. The ISO requirements specify a speed-up factor between 8 and 10. Assume an average bitrate of 900 kbit/s, a burstiness factor (frame level) of 3 and a maximum medium data rate of 1.2 Mbit/s. Then at most 1.11 peak frames per original second can be read for a speed-up factor of 10, and 1.39 peak frames per original second for a speed-up factor of 8 (average values). The solution results from combining these values with the choice of N1 and N2, which will probably depend on the image activity. For example, N1 = 3 and N2 = 5 give a speed-up factor of 8 if each Mode A frame is repeated 3 times at 25 Hz or 6 times at 50 Hz. In this situation 1.04 peak frames per original second must be read, which is below the limit given above.
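The FFP figures above follow from simple arithmetic, reproduced in the sketch below. It assumes 25 Hz original material and that a peak frame costs BF times the average frame size (36 kbit × 3 = 108 kbit here). The function names are illustrative only.

```python
def peak_frames_per_original_second(avg_bps, fps, burstiness, medium_bps, speedup):
    """Peak (Mode A) frames per second of original material that the medium
    can deliver at the given speed-up factor."""
    peak_frame_bits = avg_bps / fps * burstiness   # bits of one peak frame
    bits_per_original_second = medium_bps / speedup
    return bits_per_original_second / peak_frame_bits


def mode_a_frames_per_original_second(n1, n2, fps):
    # one Mode A frame every (N1 + 1)(N2 + 1) frames
    return fps / ((n1 + 1) * (n2 + 1))


def ffp_speedup(n1, n2, repeats, fps, display_hz):
    """Speed-up obtained when each Mode A frame is displayed `repeats` times."""
    return (n1 + 1) * (n2 + 1) * display_hz / (repeats * fps)


for s in (10, 8):
    print(s, round(peak_frames_per_original_second(900e3, 25, 3, 1.2e6, s), 2))
# 10 1.11
# 8 1.39
print(round(mode_a_frames_per_original_second(3, 5, 25), 2))       # 1.04
print(ffp_speedup(3, 5, 3, 25, 25), ffp_speedup(3, 5, 6, 25, 50))  # 8.0 8.0
```

The example choice N1 = 3, N2 = 5 places one Mode A frame every 24 frames. That gives 25/24 ≈ 1.04 Mode A frames per original second, below the 1.39 limit for a speed-up of 8.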
3.3.4. Fast Reverse Playback (FRP)
All the comments made for Fast Forward Playback remain valid for Fast Reverse Playback. The same solution can be used to implement this facility, reading the data in reverse order.
3.3.5. Random Access (RA)
Random Access is the feature that allows direct access to a target frame. It uses the address table to reach the recovery point (Mode A frame) nearest to the target frame and proceeds from that point until the requested frame is displayed. The maximum Random Access time depends on the burstiness factor (frame level), on the temporal frequency and, for hierarchical coding, on the values of N1 and N2. The hierarchical temporal coding has a maximum Random Access time (MRAT) of about
$$\mathrm{MRAT} = \left\{ \left[ \operatorname{int}\!\left( \frac{N_1 + 1}{2} \right) + 2 \right] \times BF + (N_2 + 1) \right\} \times \frac{1}{f_q} \quad \text{(s)},$$
where BF is the burstiness factor (frame level) and fq is the temporal frequency. A pessimistic case is considered here, in which all the Mode A, B and C frames are peak frames.
3.3.6. Compatibility with the CCITT H.261 algorithm
Since the presented algorithm is based on the CCITT H.261 Reference Model, the codec structure is fully compatible. However, because of the new coding tools and the temporal hierarchical scheme, some problems arise when a hierarchically coded signal is decoded with an H.261 codec. The situation can be summarized as follows:
- H.261 coded signals can always be decoded (100%), with all frames treated as Mode D frames;
- hierarchically coded signals can only be decoded by an H.261 codec if Picture Header analysis for detecting the frame coding mode is added and the new coding tools are disabled.
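The MRAT expression of Section 3.3.5 is straightforward to evaluate for the (N1, N2) combinations used in the Results section. The sketch below is a direct transcription of that formula. The inputs BF = 3 (the value used in the FFP discussion) and fq = 25 Hz are illustrative assumptions, and the printed times are not results reported in the paper.

```python
def mrat(n1, n2, bf, fq):
    """Maximum Random Access time in seconds (pessimistic case:
    every Mode A, B and C frame is a peak frame)."""
    peak_term = (n1 + 1) // 2 + 2          # int((N1 + 1) / 2) + 2 for N1 >= 0
    return (peak_term * bf + (n2 + 1)) / fq


for n1, n2 in [(0, 11), (3, 5), (4, 4)]:
    print(n1, n2, mrat(n1, n2, bf=3, fq=25))
# 0 11 0.72
# 3 5 0.72
# 4 4 0.68
```

Integer division reproduces the `int(...)` truncation of the formula for non-negative N1.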
3.3.7. High Quality Still Mode (HQSM)
One of the additional features required of the coding algorithm is the possibility of increasing the quality of particularly interesting (still) pictures. This uses a certain amount of information that is recorded on the medium but not read during normal play. Depending on the amount of information used, this High Quality Still Mode (HQSM) may simply produce an image with the same spatial resolution but better rendition of detail, or even a picture with a higher resolution (e.g., according to CCIR 601). The feature has been implemented flexibly: the quality-increase strategy can be chosen by defining
- the successive levels of resolution, choosing between
  level 1: CIF at 25 Hz,
  level 2: CIF with double horizontal resolution at 25 Hz,
  level 3: level 2 at 50 Hz;
- the maximum number of frame codings in each intermediate resolution level.
During High Quality Still Mode coding, a quantisation step convergence process that optimizes coding efficiency is implemented. All Still Mode frames are Mode D coded.
The convergence process
The convergence process described here is a particular case of a more general convergence process [2] for optimizing bitrate and quality when transmitting fixed pictures with a hybrid scheme. The general process admits not only artificially fixed pictures (e.g., databases) but also pictures from a fixed camera, affected by camera noise. The method automatically detects the fixed image and codes it optimally, taking into account that the sequence may become a normal moving sequence again. In the specific case considered here automatic detection is not necessary, since the pictures to be submitted to the High Quality Still Mode are specifically selected during coding.
The quantisation step strategy
A special quantisation step strategy is needed for still image coding because coding progresses inefficiently with the quantisation steps produced by the normal quantisation step control. These steps result essentially from the need to respect an agreed average bitrate. They are normally very close, or even equal, for the same GOB in two consecutive frame codings, so the coder repeatedly codes the accumulated arithmetic and quantisation errors without improving image quality. The purpose of the convergence process is to distribute the available bits optimally, bearing in mind that the agreed average bitrate must always be respected. In the convergence process two quantisation steps act on each GOB:
- Real Quantisation Step (RQS): at each moment, the minimum quantisation step already used for coding the GOB. The process finishes when the RQS equals the minimum QS for all the GOBs.
- Coding Quantisation Step (CQS): the value actually used for coding during the convergence process. If the available bitrate does not allow an RQS decrease for the present GOB, the CQS takes a 'political' value that overcomes the accumulated errors and maintains full compatibility without wasting bits. In this case the choice of the political QS value is less critical than when coding fixed pictures embedded in moving sequences, where the coder must be prepared to code a scene cut or movement while also overcoming the errors and the camera noise. Simulations have shown that, if the average bitrate control allows an RQS decrease for a GOB, the optimal quantisation step evolution is to make the CQS equal to half the RQS, with the RQS updated accordingly. To guarantee uniform image quality, the RQS of a GOB may only be decreased when the difference between its current RQS and the maximum RQS in the frame is below a selected threshold.
The convergence process is thus based on a continuous interaction between RQS and CQS, which drives the coding. Coding stops when the agreed number of bits for the HQSM is reached.
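The RQS/CQS interaction described above can be sketched as one function per still-mode frame coding. This is a minimal illustration, not the authors' implementation. The budget check is reduced to a per-GOB boolean supplied by the caller, and the 'political' value is a free parameter because the text does not specify it. Halving uses integer steps with a floor of 1, which is our assumption. The bit accounting that stops the real process is not modelled. Names are hypothetical.

```python
QS_MIN = 1   # assumption: finest quantisation step


def convergence_pass(rqs, allowed, threshold, political_qs):
    """One still-mode frame coding.

    rqs          -- current Real Quantisation Step of each GOB
    allowed      -- per GOB, whether the average-bitrate control allows an RQS decrease
    threshold    -- a GOB may only decrease if (max RQS - its RQS) < threshold
    political_qs -- fallback CQS (its value is not specified in the text)
    Returns (cqs, new_rqs).
    """
    max_rqs = max(rqs)
    cqs, new_rqs = [], []
    for r, ok in zip(rqs, allowed):
        if ok and max_rqs - r < threshold:
            c = max(QS_MIN, r // 2)       # CQS = half the RQS
        else:
            c = political_qs
        cqs.append(c)
        new_rqs.append(min(r, c))         # RQS = smallest step used so far
    return cqs, new_rqs


def converged(rqs):
    return all(r == QS_MIN for r in rqs)


rqs = [16, 12, 8, 16]
cqs, rqs = convergence_pass(rqs, [True] * 4, threshold=6, political_qs=20)
print(cqs, rqs)   # [8, 6, 20, 8] [8, 6, 8, 8]

passes = 1
while not converged(rqs):
    _, rqs = convergence_pass(rqs, [True] * 4, threshold=6, political_qs=20)
    passes += 1
print(passes, rqs)   # 4 [1, 1, 1, 1]
```

In the first pass the GOB whose RQS is already well below the frame maximum is held back. This illustrates how the threshold keeps image quality uniform across GOBs.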
The convergence process with resolution improvements
As explained above, the resolution can be increased in one or two steps. At each resolution change the picture is interpolated and filtered to obtain the first prediction for coding at the new resolution level. The default level is 352 × 288 pixels for luminance and 176 × 144 pixels for chrominance. From there we may first double the horizontal resolution of luminance and chrominance and then the temporal resolution, or perform both increases simultaneously. The resolution increases follow these rules:
- the initial resolution level is always the default resolution level (level 1);
- the transition to a higher resolution level takes place when the average RQS (frame level) reaches a threshold value or when the agreed maximum number of frames for an intermediate resolution level is reached;
- the resolution level changes may be implemented with or without memory of the quantisation step convergence process (note that the two options do not have the same subjective impact):
  with memory: the quantisation step convergence process acts as described above throughout the High Quality Still Mode coding. At each resolution level change, each GOB is assigned the RQS of the corresponding GOB in the previous resolution level. This option is more efficient in the long term. However, it may have a negative subjective impact if the bits available for the HQSM are not enough to code each GOB at least once in the final resolution level;
  without memory: within each resolution level, the quantisation step convergence process acts as described above. In the first coding of each new resolution level, the RQS and CQS values are made equal for each GOB and are determined only by the quantisation step control, subject to an upper threshold; the quality improvement is very smooth.
The described convergence process guarantees an image quality improvement that is smooth in terms of bitrate, signal-to-noise ratio and subjective impact.
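The level-transition rule and the two ways of initialising the RQS at a new level can be sketched as follows. The sketch reads "the average RQS reaches a threshold" as falling to or below it. The text does not say how GOBs correspond between levels, so the proportional GOB mapping used for the "with memory" case is a hypothetical choice. Function names are illustrative.

```python
def should_advance(avg_rqs, frames_in_level, rqs_threshold, max_frames):
    """Move to the next resolution level when the frame-average RQS has come
    down to the threshold or the agreed number of frames has been used."""
    return avg_rqs <= rqs_threshold or frames_in_level >= max_frames


def init_level_rqs(prev_rqs, n_new_gobs, with_memory, control_qs, upper_qs):
    """Initial RQS per GOB at a new resolution level.

    with memory:    each new GOB inherits the RQS of the corresponding GOB of the
                    previous level (hypothetical mapping: GOBs in proportional order).
    without memory: RQS (= CQS) comes from the quantisation step control,
                    capped by an upper threshold.
    """
    if with_memory:
        n_old = len(prev_rqs)
        return [prev_rqs[g * n_old // n_new_gobs] for g in range(n_new_gobs)]
    return [min(control_qs, upper_qs)] * n_new_gobs


print(should_advance(3.5, 2, 4, 10), should_advance(6.0, 2, 4, 10),
      should_advance(6.0, 10, 4, 10))          # True False True
print(init_level_rqs([4, 2], 4, True, 12, 10))   # [4, 4, 2, 2]
print(init_level_rqs([4, 2], 4, False, 12, 10))  # [10, 10, 10, 10]
```

Inheriting the fine steps of the previous level is what makes the "with memory" option more efficient in the long term. It is also why GOBs not yet coded at the new level can look noticeably worse than their neighbours.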
4. Results
1. Normal coding
To analyse the performance of the temporal hierarchical coding algorithm presented in Section 3.2, we coded the sequences 'Table Tennis' and 'Diva' at 900 kbit/s, using some of the more interesting combinations of N1 and N2. Figs. 4-9 show the luminance signal-to-noise ratio (dB) for the cases studied.
[Fig. 4. Reference Model. Luminance signal-to-noise ratio (dB) versus frame, 'Table Tennis', 900 kbit/s.]
[Fig. 5. Reference Model. Luminance signal-to-noise ratio (dB) versus frame, 'Diva', 900 kbit/s.]
[Fig. 6. N1 = 0; N2 = 11. Luminance signal-to-noise ratio (dB) versus frame, 'Table Tennis', 900 kbit/s.]
[Fig. 7. N1 = 0; N2 = 11. Luminance signal-to-noise ratio (dB) versus frame, 'Diva', 900 kbit/s.]
[Fig. 8. N1 = 4; N2 = 4. Luminance signal-to-noise ratio (dB) versus frame, 'Table Tennis', 900 kbit/s.]
[Fig. 9. N1 = 4; N2 = 4. Luminance signal-to-noise ratio (dB) versus frame, 'Diva', 900 kbit/s.]
The presented results suggest the following comments:
- The differences in luminance signal-to-noise ratio between the Reference Model and the temporal hierarchical coding described here are essentially due to the variable bitrate approach and the global motion compensation. This is particularly evident during the clear zoom and the panning in the sequence 'Table Tennis'.
- The picture quality of the presented algorithm depends on the image activity, since the implemented variable bitrate coding only absorbs short-term variations; this can be seen by comparing the 'Table Tennis' and 'Diva' results.
- The picture quality is not the same for all frame coding modes: modes A and B introduce breaking points not only in temporal correlation but also in picture quality. The quantisation step privileges result from a search, for these frames, for the optimum tradeoff between bits spent and picture quality. The picture quality of Mode A frames is critical, since it seriously affects the quality of all the remaining frames.
- The ideal combination of N1 and N2 depends on the image activity and also on the performance required for the recording facilities, essentially the fast modes and random access.
2. High Quality Still Mode

The performance of the HQSM has been tested using the sequence 'Still Flower' coded at 900 kbit/s; this sequence has 25 moving frames (coded with N1 = 0 and N2 = 11), the last of which is coded with the HQSM. The results are presented in Figs. 10-12 and suggest the following remarks:
- The convergence process improves the coding efficiency (Fig. 10; the LSNR is measured over the CIF matrices). The initial differences are due to a different QS for the first image.
- The convergence with memory requires all the GOBs to be coded in the final resolution level, in order to avoid the negative subjective impact of a frame whose GOBs have different qualities, while the convergence without memory guarantees a very smooth quality improvement and reaches a globally uniform quality more rapidly after the resolution level changes.
- The convergence process with resolution improvements and memory seems more efficient in the long term than the convergence process with resolution improvements and without memory. Fig. 11 presents the LSNR evolution for these two possibilities (resolution level 3) using 1.45 Mbits, since this is the value that corresponds to the situation where all the GOBs are coded once in the final resolution level. Fig. 12 presents the LSNR evolution until the quantisation step is 1 for all the GOBs (resolution level 3); note that the coding evolution is not the same: the coding without memory expends 3.8 Mbits and reaches an LSNR of 45.4 dB, while the coding with memory expends 4.4 Mbits but reaches an LSNR of 48.25 dB (in Figs. 11 and 12 the LSNR is measured over the CCIR matrices).

[Fig. 10. With/without convergence, resolution level 1. Luminance signal-to-noise ratio (dB) versus frame, 'Still Flower', 900 kbit/s + 270 kbits; curves: without convergence, with convergence.]
[Fig. 11. Resolution level 3, with/without memory (1.45 Mbits). Luminance signal-to-noise ratio (dB) versus frame, 'Still Flower', 900 kbit/s + 1.45 Mbits; curves: without memory, with memory.]
[Fig. 12. Resolution level 3, convergence until QS = 1 for all GOBs. Luminance signal-to-noise ratio (dB) versus frame, 'Still Flower', 900 kbit/s; curves: with memory, without memory.]
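The idea behind the convergence process, coding a still picture repeatedly against its own previous reconstruction with an ever finer quantisation step until QS = 1, can be pictured with the small Python sketch below. It is a toy under stated assumptions, not the HQSM itself: it works directly on pixel residuals instead of H.261's DCT coefficients, uses a uniform quantiser whose step equals QS, ignores GOBs, resolution levels and the with/without memory distinction, and assumes the LSNR is the usual peak signal-to-noise ratio with a peak of 255. The names `lsnr`, `quantise` and `converge` and the QS schedule are hypothetical. Because each pass can only shrink the per-pixel error, the LSNR never decreases and becomes infinite (lossless) once QS reaches 1.

```python
import math


def lsnr(original, decoded, peak=255):
    """Luminance SNR in dB, assuming the usual peak-signal definition."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # lossless reconstruction
    return 10 * math.log10(peak * peak / mse)


def quantise(value, qs):
    """Uniform quantiser with step qs (toy pixel-domain stand-in for DCT quantisation)."""
    return qs * round(value / qs)


def converge(original, qs_schedule):
    """Refine a still picture pass by pass, each pass using a finer step."""
    decoded = [0] * len(original)  # the first pass codes the whole picture
    history = []
    for qs in qs_schedule:
        # Predict from the previous reconstruction and code only the residual;
        # clip to the 8-bit range as a decoder would.
        decoded = [min(255, max(0, d + quantise(o - d, qs)))
                   for o, d in zip(original, decoded)]
        history.append((qs, lsnr(original, decoded)))
    return decoded, history


if __name__ == "__main__":
    # Synthetic 16x16 luminance block standing in for a real frame.
    frame = [(i * 37 + j * 11) % 256 for i in range(16) for j in range(16)]
    _, history = converge(frame, [31, 16, 8, 4, 2, 1])
    for qs, db in history:
        print(f"QS = {qs:2d}: LSNR = {db:.2f} dB")
```

For scale, the measured trade-off reported above for Fig. 12 is (48.25 - 45.4) dB for (4.4 - 3.8) Mbits, i.e. 2.85 dB for 0.6 Mbits extra, about 4.75 dB per additional Mbit.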
5. Conclusions
The basic architecture of the presented algorithm is that of the CCITT H.261 algorithm; this choice is motivated by the performance of this architecture, which is so far the most promising, and by the need to increase the compatibility between all video applications. The improvements introduced in the coding scheme try to cope with the new requirements imposed by image sequences whose nature differs from that of the well-known videotelephone images. Another motivation was the need to provide all the required recording facilities, among them the reverse modes and random access.
In order to fulfil all the requirements, many solutions have been examined, from the simplest one (the introduction of a periodic intracoded frame) to more sophisticated ones. Special attention has been dedicated to solving the reverse processing problem in order to obtain an almost symmetric (in time) hybrid coding. The modifications motivated by this feature decrease the coding efficiency; however, this decrease is acceptable, as may be observed in the results presented for the luminance signal-to-noise ratio. Another consequence is the need, for normal playback, to duplicate the memory for the decoded image. The other extension introduced in the algorithm is the high-quality coding of still pictures. The results show better performance when the presented convergence process is used; since this high-quality scheme affects only the coding control and not the H.261 compatibility, it seems worth considering its introduction in the final recommended algorithm.
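The simplest solution mentioned in the conclusions, a periodic intracoded frame, makes the cost of random access and reverse playback easy to quantify. The Python sketch below assumes a plain forward-predicted H.261-style sequence with an intra frame every P frames; it does not model the authors' almost-symmetric Temporal Hierarchical Coding, and the function names and the period P = 12 are hypothetical. It illustrates the trade-off between decoding work and decoded-picture memory that makes reverse playback the hard case for a purely forward-predicted scheme.

```python
def frames_to_decode(target, intra_period):
    """Frames decoded to show frame `target` (0-based) after a random access.

    Assumes every `intra_period`-th frame is intracoded and all other frames
    are predicted only from the previous frame.
    """
    last_intra = (target // intra_period) * intra_period
    return target - last_intra + 1


def reverse_gop_cost(intra_period, store_whole_group):
    """Frames decoded to play one group of `intra_period` frames backwards."""
    if store_whole_group:
        # Decode the group once, keep every picture, display them in reverse.
        return intra_period
    # No pictures kept between displayed frames: restart from the intra frame
    # for every frame shown.
    return sum(frames_to_decode(k, intra_period) for k in range(intra_period))


if __name__ == "__main__":
    period = 12  # hypothetical intra period
    print("random access to frame 25:", frames_to_decode(25, period), "frames")
    print("reverse, no pictures kept:", reverse_gop_cost(period, False), "frames")
    print("reverse, whole group kept:", reverse_gop_cost(period, True), "frames")
```

For P = 12 this gives 2 decoded frames for a random access to frame 25, 78 decoded frames to play one group backwards without keeping pictures, and 12 when all the pictures of the group are kept; in general the first reverse figure grows as P(P + 1)/2, while the second only costs memory.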
References
[1] C. Herpel, D. Hepper and D. Westerkamp, "Adaptation and improvement of CCITT Reference Model 8 video coding for digital storage media applications", Image Communication, Vol. 2, No. 2, August 1990, pp. 171-185.
[2] F. Pereira and L. Masera, "High Quality Still Picture Mode embedded into a hybrid coding scheme", Picture Coding Symposium, Cambridge, U.S.A., March 1990.
[3] F. Pereira and M. Quaglia, "Extension of the CCITT visual communication coding algorithm for operation in ATM networks", Image Communication, 1990.
[4] CCITT SG XV, Draft Recommendation H.261, Tokyo Meeting, October 1989.