Signal Processing: Image Communication 7 (1995) 173-182

Concealment techniques for data-reduced HDTV recording

Maka Kharatichvili*, Peter Kauff

Heinrich-Hertz-Institut für Nachrichtentechnik, Einsteinufer 37, 10587 Berlin, Germany

Received 11 April 1994
Abstract

Most data-reduced HD-VCRs use powerful error protection schemes in order to prevent error propagation. Under normal conditions these schemes are able to correct all errors from the magnetic tape channel. However, under bad conditions some uncorrected errors may remain. Therefore, it must be possible to conceal erroneous DCT blocks in order to avoid visible degradations in this worst-case situation. In this context four different concealment techniques (spatial, temporal, motion-adaptive and motion-compensated concealment) are discussed and compared with respect to their efficiency. The results of subjective tests and the discussion of some system aspects show that the motion-adaptive approach is the most suitable one for the underlying application.

Keywords: Magnetic recording; Concealment techniques; DCT coding
*Corresponding author.

0923-5965/95/$9.50 © 1995 Elsevier Science B.V. All rights reserved
SSDI 0923-5965(95)00014-3

1. Introduction

Digital video cassette recorders (VCRs) are among the key components of the HDTV studio. However, one crucial disadvantage of digital HD-VCRs is the rather limited playing time, which is imposed by the restricted amount of tape material in video cassettes. For example, a conventional D2 cassette is not sufficient for recording long HDTV productions like movies of more than about 90 min, even if improved tape material with an increased recording density is employed [8, 9]. Therefore some state-of-the-art HDTV recorders still use open reel transport systems instead of cassette drives [18].

Because of these restrictions, several solutions based on digital source coding techniques have been proposed in the past [5, 10-12, 17]. The source coding reduces the bit rate by compressing the HDTV signal. As a consequence, the compressed HDTV signal can be recorded for more than 90 min by using either conventional TV recorders [5, 11] or modified HDTV recorders with reduced scanner complexity and reduced tape velocity [10]. In all proposals the coding schemes are based on intraframe or even intrafield coding, because professional recorders have to support frame-by-frame editing [2]. Furthermore, the compression ratio is limited to rather low values in a range from 1:2 up to 1:5. This is because a professional HD-VCR has to guarantee an excellent picture quality without any visible degradations, even after multiple copy operations and special postproduction such as chromakey [2, 13].
Another very important requirement of professional applications is a low sensitivity to magnetic tape errors. Therefore most digital HD-VCRs use extremely strong error protection in order to prevent error propagation. Nevertheless, an overload of the error protection cannot be avoided completely and, hence, suitable concealment strategies are needed for this worst-case situation. In this context, this paper presents concealment techniques which have been developed for the coding scheme proposed in [12]. This scheme, the data structure on tape and the corresponding properties of tape errors are explained in Section 2, while different approaches to concealment and their efficiency are discussed in Section 3.
2. The VCR codec

2.1. The encoder

As shown in Fig. 1, the VCR encoder for HDTV is based on four identical and independent DCT subsystems working in parallel, since the data-reduced HD-VCR uses four parallel recording channels. Because of this parallel structure, each HDTV frame is decomposed into four subframes. This is done by two special preprocessing steps, called ‘macroblock generation’ and ‘intersector distribution’. With this decomposition scheme, the pictorial input information is distributed over the four DCT subsystems without any loss of coding efficiency.

Fig. 1. General structure of VCR encoder.

At the output of each subsystem, there are two different kinds of data, namely FLC (fixed length code) and VLC (variable length code) data. This separation into FLC and VLC data is imposed by special shuttle requirements, which are explained in more detail in Section 2.3. The FLC data contain not only the DC value, but also the low-frequency AC coefficients (LF-AC) with an accuracy reduced from 12 to 4 bit by means of non-linear quantization (NL), as shown in Fig. 2(b). Additionally, the quantization error between original and reconstructed LF-AC values is calculated by using the inverse non-linear quantizer. These quantization errors (ΔLF-AC) and the high-frequency AC coefficients (HF-AC) are then adaptively quantized and recorded as VLC data. Both FLC and VLC data are then multiplexed at the FEC (forward error correction) encoder, which adds error protection bits to the bit stream. Finally, after error protection the multiplexed data streams are recorded on tape via one of the four heads.

Fig. 2. (a) The distribution of DC, LF-AC and HF-AC coefficients. (b) Separation of FLC and VLC.
[Figure labels: DCT coefficient matrix; low-frequency AC coefficient (partly FLC and partly VLC); high-frequency AC coefficient (VLC only).]

The encoder does not use any kind of interframe or interfield coding. This facilitates frame-by-frame editing, which is absolutely necessary for professional use [2]. Instead, image blocks with fast motion are coded in an intrafield mode, whereas non-moving or almost non-moving blocks are coded in an intraframe mode. The adaptive switching between intrafield and intraframe coding is controlled by an activity criterion and by a resulting mode decision flag, which is added to the bit stream for decoding purposes.
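The FLC/VLC split of a low-frequency coefficient described in Section 2.1 can be sketched as follows. This is a minimal illustration, not the codec's actual quantizer: the square-root companding law and all function names are assumptions, since the text only states that a 12-bit value is reduced to 4 bit by a non-linear quantizer and that the residual is obtained via the inverse quantizer. The residual is kept exact here, whereas the codec quantizes it adaptively before VLC coding.

```python
import math
from typing import Tuple

MAX_INPUT = 2 ** (12 - 1) - 1   # largest magnitude of a signed 12-bit LF-AC value
MAX_INDEX = 2 ** (4 - 1) - 1    # largest magnitude of the signed 4-bit FLC index


def nl_quantize(coeff: int) -> int:
    """Non-linear quantizer (hypothetical square-root law): 12 bit -> 4 bit."""
    magnitude = min(abs(coeff), MAX_INPUT)
    index = int(round(MAX_INDEX * math.sqrt(magnitude / MAX_INPUT)))
    return index if coeff >= 0 else -index


def nl_dequantize(index: int) -> int:
    """Inverse of the hypothetical quantizer above: 4 bit -> coarse 12-bit value."""
    magnitude = int(round(MAX_INPUT * (abs(index) / MAX_INDEX) ** 2))
    return magnitude if index >= 0 else -magnitude


def split_lf_ac(coeff: int) -> Tuple[int, int]:
    """Return (FLC part, residual for the VLC part) of one LF-AC coefficient."""
    flc = nl_quantize(coeff)
    residual = coeff - nl_dequantize(flc)   # quantization error, recorded as VLC data
    return flc, residual


def merge_lf_ac(flc: int, residual: int) -> int:
    """Decoder side: coarse FLC reconstruction plus the VLC residual."""
    return nl_dequantize(flc) + residual
```

If the VLC part is lost, `nl_dequantize(flc)` alone still yields a coarse approximation of the coefficient, which is the property the shuttle mode and the concealment rely on.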
2.2. The decoder

The VCR decoder performs exactly the inverse operations of the VCR encoder, as shown in Fig. 3. However, one special property of the decoder is the handling of error flags. For this purpose the error correction checks whether the bit stream, which has been read from tape, contains errors or not. Under normal conditions, the decoder will be able to correct all these errors [9, 10]. However, if the concentration of errors is too high, the error protection scheme may be overloaded and in this case the contaminated bytes are marked with an error flag.
Fig. 3. General structure of VCR decoder.
These error flags are then transferred to the IDCT subsystem, where special concealment techniques [16] can be activated.
2.3. Significance of FLC and VLC data

Before explaining the concealment algorithms, it is useful to discuss why the VCR codec uses two kinds of data, namely FLC and VLC data. The reason is that a VCR codec has to produce an acceptable picture quality during shuttle. For quick search purposes, it must be possible to shuttle the recorded sequence with variable tape speed in both the forward and backward directions. In this special operation mode the playback heads do not follow the tracks in the normal way, but cross them diagonally. Consequently, the heads can only read small data packets, so-called ‘synchronization blocks’, from each track. During shuttle these synchronization blocks are read in the wrong order and many blocks are lost. Nevertheless, as each synchronization block has its own identification, all data which are read successfully during shuttle can be re-ordered and are then used to refresh an output frame store, which displays the shuttled image information.

However, in the context of data-reduced recording, this procedure is only successful for FLC data, whereas the synchronization is lost for VLC data. This is because only FLC data are organized in a special data structure on tape. This structure consists of the synchronization blocks, each of which is headed by a special synchronization word as well as an identification word. Thus, all FLC data within such a synchronization block can be identified correctly. The VLC data are placed asynchronously in between the FLC data. Due to this asynchronous recording, most of the VLC data cannot be decoded during shuttle. As a consequence, the basic information which is needed to recognize the picture content has to be recorded as FLC data. This is achieved by the special quantization scheme from Fig. 2 and the subsequent separation into FLC and VLC data.

Fig. 4. Different effects of FLC and VLC errors. (a) Effect of erroneous FLC data: randomly distributed block errors. (b) Effect of erroneous VLC data: vertical stripe errors (due to error propagation).
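The frame-store refresh during shuttle can be illustrated with a short sketch. It assumes, hypothetically, that each successfully read synchronization block is available as a pair of identification number and FLC payload and that the identification number directly addresses a slot of the frame store; the real on-tape layout is not specified here.

```python
from typing import Iterable, List, Optional, Tuple

# (identification word, FLC payload); this layout is an assumption for illustration
SyncBlock = Tuple[int, bytes]


def refresh_frame_store(frame_store: List[Optional[bytes]],
                        blocks_read: Iterable[SyncBlock]) -> int:
    """Overwrite the slots whose blocks were read; keep old content elsewhere.

    The blocks may arrive in any order, because the identification word,
    not the arrival order, decides where a payload is written.
    Returns the number of refreshed slots.
    """
    refreshed = 0
    for block_id, payload in blocks_read:
        if 0 <= block_id < len(frame_store):   # ignore implausible identifications
            frame_store[block_id] = payload
            refreshed += 1
    return refreshed
```

Slots that are not refreshed simply keep the data of an earlier pass, so the displayed picture is a mixture of blocks from different frames, which is acceptable for search purposes.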
2.4. Different effects of FLC and VLC errors

Due to the separation between FLC and VLC data, an error in a magnetic tape channel will cause two different kinds of disturbances: FLC block errors and VLC stripe errors. As shown in Fig. 4(a), contaminated FLC data appear on the screen as randomly distributed block errors. This is achieved by shuffling the FLC data of one frame in order to spread burst errors from the tape over the whole output picture. In contrast to this random character of FLC errors, VLC errors propagate in the horizontal direction up to the right border of the frame, where inserted synchronization words stop the error propagation by resynchronizing the VLC decoding. Therefore VLC errors appear as so-called stripe errors, where a stripe error occurs only in every fourth macroblock of a horizontal block line due to the initial intersector distribution (Fig. 4(b)).

As shown in Fig. 4, FLC and VLC errors have two further properties. The first is that erroneous blocks are always surrounded by faultless blocks. This is because of the special kind of intersector distribution which is used for decomposing the entire HDTV picture into four subframes. The second is that FLC and VLC errors will most likely not appear in the same DCT blocks, due to the asynchronous recording of VLC and FLC data. In particular, blocks which are contaminated by VLC errors are not lost completely, because they still contain the basic information of the FLC data.
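How shuffling turns a burst error on tape into isolated block errors on the screen can be shown with a toy example. The shuffling pattern of the codec is not given in the text; the stride-based permutation, the 8 x 8 block grid and all names below are assumptions chosen only to make the effect visible.

```python
from itertools import combinations
from typing import List, Tuple

GRID = 8                   # hypothetical picture of GRID x GRID DCT blocks
N_BLOCKS = GRID * GRID
STRIDE = 27                # hypothetical stride; must be coprime with N_BLOCKS


def tape_to_block(tape_pos: int) -> Tuple[int, int]:
    """Map a tape position to the (row, column) of the block recorded there."""
    index = (tape_pos * STRIDE) % N_BLOCKS
    return divmod(index, GRID)


def burst_to_blocks(start: int, length: int) -> List[Tuple[int, int]]:
    """Blocks hit by a burst error covering `length` consecutive tape positions."""
    return [tape_to_block(pos) for pos in range(start, start + length)]


def touching(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two blocks are identical or direct (including diagonal) neighbours."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 1


burst = burst_to_blocks(start=10, length=4)
isolated = not any(touching(a, b) for a, b in combinations(burst, 2))
```

For this burst of four consecutive tape positions the damaged blocks end up far apart, so each of them keeps error-free neighbours that a concealment step can draw on.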
3. Concealment

3.1. Spatial concealment

As erroneous DCT blocks are always surrounded by faultless blocks from other recording channels, spatial interpolation of missing data seems to be a promising concealment strategy in the VCR codec under study. However, spatial concealment has to be performed in the frequency domain and not in the spatial domain itself. One restriction of this approach is that spatial interpolation can only be applied to the DC and LF-AC coefficients. In the case of HF-AC coefficients spatial interpolation must fail in principle, because there is no correlation between HF-AC coefficients of spatially adjacent DCT blocks. Therefore erroneous HF-AC coefficients have to be set to zero, although this may reduce the sharpness of the output blocks after decoding.

In the case of erroneous FLC data the DC coefficients are reconstructed by averaging the DC values from horizontally and vertically adjacent blocks. Furthermore, the missing FLC part of the low-frequency coefficients is reconstructed by weighting DC and LF-AC values from adjacent blocks. For this reconstruction we have used the method proposed in [4, 6], but we have adapted it to our application. The method proposed in [4, 6] uses the coefficients from all eight adjacent blocks, whereas in our case the DC and LF-AC coefficients can only be used from four adjacent blocks: in shuttle operation mode the playback heads do not follow the tracks in the normal way, but cross them diagonally, so that two adjacent blocks in the diagonal direction are missing. Furthermore, the DC coefficients were given different weighting.

In the case of erroneous VLC data, the missing HF-AC coefficients are always set to zero. Blocking effects, which may occur in this case due to the absence of detail information, can be reduced by using special smoothing algorithms [7]. The whole algorithm works nearly perfectly for still pictures, because even remaining defects, like wrongly reconstructed FLC data or artefacts from block smoothing, are often masked by the picture content.
However, computer simulations have shown that the artefacts become clearly visible if the above algorithm is used in real video sequences. The problem is that a concealed block appears only in a single frame, while the same block remains undisturbed in previous and subsequent frames. Consequently, this concealed block flashes up for a short moment. This temporal effect can be detected easily by the human observer and therefore leads to an annoying impairment, even if only a few blocks are disturbed.
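The frequency-domain interpolation of Section 3.1 can be sketched for one lost block as follows. This is a simplified stand-in, not the adapted method of [4, 6]: the weights of that method are not given in the text, so plain averaging over the available horizontal and vertical neighbours is used instead, and the boundary between LF-AC and HF-AC coefficients (`LF_LIMIT`) is a hypothetical choice.

```python
from typing import List

Block = List[List[float]]   # 8 x 8 DCT coefficients, block[0][0] is the DC value

N = 8
LF_LIMIT = 2   # hypothetical: coefficients with u + v <= LF_LIMIT are DC or LF-AC


def conceal_spatial(neighbours: List[Block]) -> Block:
    """Estimate a lost block from its error-free left/right/upper/lower neighbours.

    DC and LF-AC coefficients are averaged over the neighbours (assumption:
    equal weights); HF-AC coefficients are set to zero, as in the text.
    """
    if not neighbours:
        raise ValueError("at least one error-free neighbour is required")
    concealed = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            if u + v <= LF_LIMIT:
                total = sum(block[u][v] for block in neighbours)
                concealed[u][v] = total / len(neighbours)
    return concealed
```

Because all high-frequency coefficients are zero, the concealed block is smooth, which explains both the loss of sharpness and the need for the subsequent block smoothing mentioned above.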
3.2. Temporal concealment

Another interesting concealment approach is the temporal repetition of missing data. A crucial benefit of this approach is that it can be applied to the DC value as well as to the low- and high-frequency AC coefficients. In this case, the contaminated coefficients are replaced by the corresponding undisturbed data of the previous frame. For example, in the case of contaminated VLC data, the ΔLF-AC values and HF-AC coefficients are taken from the previous frame; the concealed data (ΔLF-AC and HF-AC) and the faultless data (DC and LF-AC) are then merged before the inverse DCT. This rather simple method of temporal concealment works well, not only for sequences without motion, but also for sequences with moderate motion.

In order to demonstrate this advantage of temporal concealment over spatial concealment, subjective tests have been carried out on the basis of different sequences with quite different motion content [14]. These tests have been organized according to CCIR Recommendation 500-4 using the impairment scale and an undisturbed reference picture (EBU method) [1]. The simulation of the HDTV codec has been downscaled to TV. The subjective tests involved 12 non-expert observers (three in each session); the viewing distance was four times the picture height (4H). The disturbed sequences were contaminated with 22 FLC block errors and 11 VLC stripe errors in one frame out of 20. This error event represents the typical overload situation of the error protection under study.

Fig. 5. Subjective quality of temporal concealment versus spatial concealment.
[Plot: impairment rating (imperceptible; perceptible, but not annoying; slightly annoying; annoying) for the sequences fade, mobile, flower, kiel, wheel and tennis, ordered by increasing complexity of motion.]

The results of the subjective tests (Fig. 5) show that the artefacts generated by temporal concealment for the sequences with moderate motion, ‘Fade’ and ‘Mobile and Calendar’, have been rated ‘perceptible, but not annoying’, because FLC and VLC disturbances do not coincide (as explained before). Therefore, at least a part of the DCT information can always be reconstructed. Consequently, for pictures with moderate motion there is only a slight deviation between low- and high-frequency information. It causes a short flaring of edges, which is hardly visible and which is considerably less annoying than the ‘flashing’ effects of spatial concealment. Even in sequences with some more motion, like ‘Flower Garden’ or ‘Kiel Harbour’, the artefacts of temporal concealment have only been rated as ‘slightly annoying’ and are still less annoying than those of spatial concealment.

However, for complex motion like the rotation in ‘Wheel’ or the fast zoom in ‘Table Tennis’, the artefacts of temporal concealment are unacceptably annoying, because low- and high-frequency information no longer correspond. In this situation the artefacts of spatial concealment are less annoying, because human observers can hardly recognize the ‘flashing’ effect of spatial concealment in image regions with complex or very fast motion.
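The merging step of temporal concealment can be sketched as follows: only the part of a block that is flagged as erroneous (FLC or VLC) is taken from the previous frame, and the faultless part of the current frame is kept. The `CodedBlock` container and its field names are hypothetical and merely mirror the four coefficient groups named in the text.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CodedBlock:
    """Coefficient groups of one DCT block (hypothetical container)."""
    dc: int                         # FLC data
    lf_ac: Tuple[int, ...]          # FLC data (coarse 4-bit part)
    delta_lf_ac: Tuple[int, ...]    # VLC data (quantization error of LF-AC)
    hf_ac: Tuple[int, ...]          # VLC data


def conceal_temporal(current: CodedBlock, previous: CodedBlock,
                     flc_error: bool, vlc_error: bool) -> CodedBlock:
    """Replace only the contaminated groups by those of the previous frame."""
    merged = current
    if flc_error:
        merged = replace(merged, dc=previous.dc, lf_ac=previous.lf_ac)
    if vlc_error:
        merged = replace(merged, delta_lf_ac=previous.delta_lf_ac,
                         hf_ac=previous.hf_ac)
    return merged
```

The merged block is then passed to the inverse DCT. With strong motion the retained and the repeated coefficient groups describe different picture content, which is the mismatch between low- and high-frequency information described above.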
3.3. Motion-adaptive concealment

The previously discussed results suggest the implementation of a motion-adaptive concealment which switches between spatial and temporal concealment, as shown in Fig. 6. The adaptive switching is controlled by the mode decision flag for intrafield/intraframe coding, which is already available in the bit stream of coded data. In the case of blocks with moderate motion, which are labelled by the intraframe mode, the concealment uses temporal repetition as described in Section 3.2. Only in the case of fast motion, which is indicated by the intrafield mode, does the concealment algorithm switch over to spatial interpolation in order to avoid the strong degradation of temporal concealment. Finally, after the inverse DCT a block smoothing algorithm [7] reduces blocking effects, which might occur in the case of spatial concealment.

Fig. 6. Block diagrams for adaptive concealment with block smoothing.

This adaptive algorithm has also been evaluated by subjective tests [15]. The chosen assessment method was again the EBU method. The results are presented as a function of the error rate in Fig. 7. At low error rates (22 FLC block errors and 11 VLC stripe errors) the adaptive concealment works just as well as temporal concealment for sequences with moderate motion (‘Mobile and Calendar’) and even better for sequences with some more motion (‘Kiel Harbour’). This can easily be seen by comparing the results from Figs. 5 and 7. The second result is that for sequences with moderate motion, the adaptive algorithm guarantees better concealment results than simple frame repetition at an error rate up to 350 FLC blocks and 44 VLC stripes. However, in the case of sequences with fast motion, the adaptive algorithm is already worse than frame repetition at a rather low error rate.

Fig. 7. Subjective quality of motion-adaptive concealment versus frame repetition.
[Plot: impairment rating versus FLC block errors per frame (22, 88, 352, 792) for adaptive concealment in ‘Mobile & Calendar’ and in a fast-motion sequence, compared with frame repetition.]

Following these results, a further improvement could be obtained by introducing frame repetition as a fallback mode. For this purpose the algorithm counts the erroneous blocks coded in intrafield and intraframe mode, respectively. If the number of errors in blocks with fast motion (intrafield mode) exceeds a first threshold (e.g. 22 FLC block and 11 VLC stripe errors) or the number of contaminated blocks with moderate motion (intraframe mode) exceeds a second threshold (e.g. 352 FLC blocks and 44 VLC stripe errors), the system switches over to the fallback mode. Thus, the degradations caused by tape errors can be limited to slightly annoying artefacts in any case.

Taking into account that the high error rates from Fig. 7 will only occur in catastrophic situations with very low probability (e.g. extremely long burst errors caused by drop-outs or loss of complete tracks) and that even the lowest error rate (22 FLC block errors and 11 stripe errors) will only occur under bad operational conditions, this adaptive concealment approach can be considered sufficient for the envisaged application. The statement about the error rate has been derived from a technical report on magnetic data recording giving error statistics of a D1 channel [19].
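The decision logic of the motion-adaptive concealment with fallback can be summarized in a few lines. The thresholds are the example values from the text; how the FLC and VLC counts are combined is not stated precisely, so treating each count as a separate limit is an assumption, as are all names used below.

```python
from enum import Enum
from typing import Iterable, Tuple


class Concealment(Enum):
    TEMPORAL = "temporal"
    SPATIAL = "spatial"
    FRAME_REPETITION = "frame repetition"


# Example thresholds from the text, per frame
FAST_LIMITS = {"FLC": 22, "VLC": 11}        # blocks coded in intrafield mode
MODERATE_LIMITS = {"FLC": 352, "VLC": 44}   # blocks coded in intraframe mode


def needs_fallback(errors: Iterable[Tuple[bool, str]]) -> bool:
    """errors: one (is_intrafield, "FLC" or "VLC") entry per erroneous block.

    Assumption: the fallback is triggered as soon as any single count
    exceeds its limit.
    """
    fast = {"FLC": 0, "VLC": 0}
    moderate = {"FLC": 0, "VLC": 0}
    for is_intrafield, kind in errors:
        (fast if is_intrafield else moderate)[kind] += 1
    return (any(fast[k] > FAST_LIMITS[k] for k in fast)
            or any(moderate[k] > MODERATE_LIMITS[k] for k in moderate))


def choose_concealment(is_intrafield: bool, fallback: bool) -> Concealment:
    """Select the concealment for one erroneous block."""
    if fallback:
        return Concealment.FRAME_REPETITION
    # intrafield mode signals fast motion, intraframe mode moderate motion
    return Concealment.SPATIAL if is_intrafield else Concealment.TEMPORAL
```

No motion estimation is needed at the decoder, because the intrafield/intraframe flag already present in the bit stream serves as the motion indicator.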
3.4. Motion-compensated concealment

Although the above adaptive approach is sufficient for the system under study, we have additionally investigated whether the performance of concealment at high error rates can be improved by introducing motion compensation. Fig. 8 shows the principle of such a motion-compensated approach. For an erroneous DCT block, the corresponding block is taken from the previous frame by using motion compensation. Afterwards, the motion-compensated blocks are transformed into the DCT domain in order to split the spectral coefficients again into FLC and VLC data. This is done because only those data which are really disturbed by tape errors (either FLC or VLC data) have to be replaced by concealed data.

Fig. 8. Principle of motion-compensated concealment.

One possibility for implementing such a system is to estimate the vectors before recording (e.g. by using the algorithm proposed in [3]) and to record them on special data channels which might be available in professional HD-VCRs [9]. In this case the performance of motion-compensated concealment depends considerably on the number of motion vectors to be transmitted. Promising results could be demonstrated by using one vector per macroblock. However, this leads to a total number of about 1000 vectors per HDTV frame and generates an overhead rate which does not fit into the available data channels. Therefore, it was investigated whether it might be sufficient to record global vectors only. For this purpose, the input field is decomposed into large block units of 160 pels by 80 lines. One global vector is estimated for each block. In this way 12 motion vectors are recorded per HDTV frame. As each vector is coded with 16 bit, the additional overhead for recording the global motion vectors is in the range of 4.8 kbit/s. This fits into an additional data channel, which is available in the digital studio recorder under study. However, in this case we observed that the quality obtained by motion-compensated concealment was hardly better than that of the motion-adaptive approach from Section 3.3. We therefore did not organize subjective assessments for this technique.

A second possibility is to estimate the local vectors at the receiver side. This can be done by first estimating the local vectors for all error-free blocks. In a second step, the vectors of erroneous blocks can be interpolated from vectors of neighbouring error-free blocks. We observed that this approach works well at low error rates, but at high error rates the interpolation of missing vectors is not accurate enough. However, at low error rates the motion-adaptive approach from Section 3.3 also shows good results and, in fact, motion-compensated concealment was intended to yield better quality at high error rates and not at low error rates. Therefore, in this case also, we did not organize subjective tests of this vector interpolation technique. It can be stated that the high complexity of motion-compensated concealment is not justified by the minor improvements which can be obtained by this technique.
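The overhead figures for recorded motion vectors in Section 3.4 can be checked with a short calculation. The frame rate is not stated in the text; 25 frames per second is assumed here because it reproduces the quoted 4.8 kbit/s for 12 vectors of 16 bit each. Applying the same 16 bit to the roughly 1000 macroblock vectors is likewise an assumption.

```python
BITS_PER_VECTOR = 16   # from the text
FRAME_RATE_HZ = 25     # assumption, not stated in the text


def vector_overhead_bit_per_s(vectors_per_frame: int,
                              bits_per_vector: int = BITS_PER_VECTOR,
                              frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    """Data rate needed to record the motion vectors of every frame."""
    return vectors_per_frame * bits_per_vector * frame_rate_hz


global_rate = vector_overhead_bit_per_s(12)     # global vectors: 4800 bit/s
local_rate = vector_overhead_bit_per_s(1000)    # one vector per macroblock: 400000 bit/s
```

Under these assumptions the macroblock-based variant needs roughly 80 times the rate of the global-vector variant, which illustrates why only the latter fits into the auxiliary data channel.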
4. Conclusions

Four different approaches to concealment for data-reduced HD-VCRs, namely spatial, temporal, motion-adaptive and motion-compensated concealment, have been investigated with respect to their performance limits. These investigations have shown that temporal concealment is in principle more suitable for sequences with moderate motion, whereas spatial concealment gives slightly better results for fast motion. Therefore, a motion-adaptive algorithm has been proposed. This algorithm combines spatial and temporal concealment according to the motion content and additionally provides frame repetition as a fallback mode. In the HHI codec, the switching between spatial and temporal concealment is controlled by a mode decision flag for intrafield/intraframe DCT coding, which indicates the motion content of the transmitted DCT blocks and which is available within the recorded bit stream. Additionally, the algorithm counts the erroneous blocks with slow and fast motion and switches over to the fallback mode whenever one of these counts exceeds its threshold value.

This motion-adaptive concealment has also been compared to motion-compensated approaches. However, considering the restricted storage capacity for recording additional motion vectors on the tape in the HHI codec, this comparison has shown that motion-compensated concealment provides only slight improvements which do not justify the high complexity of such an approach. Therefore it can be concluded that the motion-adaptive approach is sufficient for the envisaged application. Even for the worst case of error events, which will occur with extremely low probability, the degradations can be limited to slightly annoying artefacts. At more moderate error events (e.g. bad operational conditions, interchange problems), the artefacts may become visible, but in this case they are not really annoying.
References

[1] CCIR Recommendation 500-4, Method for the subjective assessment of the quality of television pictures, Document 11/BL/51-E, 26 May 1992.
[2] EBU, Acceptance criteria for bit-rate reduction in professional digital VTR, EBU Document MAGNUM/GEN/2.1992, Torremolinos, February 1992.
[3] M. Ernst, “Motion compensated interpolation for advanced standards conversion and noise reduction”, Proc. 4th Internat. Workshop on HDTV and Beyond, Torino, Italy, September 1991.
[4] C.A. Gonzales, L. Allman, T. McCarthy, P. Wendt and A.N. Akansu, “DCT coding for motion video storage using adaptive arithmetic coding”, Signal Processing: Image Communication, Vol. 2, No. 2, August 1990, pp. 145-154.
[5] P. Guillotel, “A fixed data rate coding scheme for digital HDTV recording”, Proc. 5th Internat. Workshop on HDTV, Tokyo, November 1992.
[6] W. Hartnack, “Concealment techniques for block encoded TV signals”, Picture Coding Symposium, Tokyo, 1986.
[7] W. Hartnack, Electronical display (in German), German Patent Office, Publication Number DE 3906712 A1, 1990.
[8] M. Hausdörfer, “On digital HDTV recording” (in German), Fernseh und Kinotechnik, Vol. 43, No. 7, 1989, pp. 364-367.
[9] R. Hedtke, “An experimental 1.2 Gbit/s digital cassette recorder for HDTV”, SPIE Symp. on Electrical Imaging, San Jose, 1991, pp. 661-665.
[10] R. Hedtke, P. Kauff, R. Schäfer and P. Stammnitz, “An experimental digital VCR for data-reduced recording of HDTV signals” (in German), Proc. 15th Ann. Conf. of FKTG, Berlin, June 1992.
[11] T. Kato, T. Kuge, K. Majima, T. Kurioka, H. Okuda and H. Oshima, “A study on the bit rate reduction for a broadcast-use HDTV-VTR”, Proc. Internat. Workshop on HDTV, Ottawa, 1993.
[12] P. Kauff, S. Rauthenberg, R. Ritter, M. Charatishvili and M. Hahn, “An improved coding scheme for studio recording of interlaced and progressive HDTV signals”, Proc. 5th Internat. Workshop on HDTV, Tokyo, November 1992.
[13] P. Kauff and F. Fechter, “Performance limits of data-reduced HDTV-recording” (in German), Fernseh und Kinotechnik, Vol. 47, No. 12, December 1993, pp. 749-756.
[14] P. Kauff and M. Kharatichvili, Preliminary results of subjective tests on the efficiency of temporal concealment at VCR coding, EUREKA 95 Document DRV 29, October 1991.
[15] P. Kauff and M. Kharatichvili, Subjective comparison between adaptive concealment and frame repetition, EUREKA 95 Document DRV 35, October 1992.
[16] M. Kharatichvili and P. Kauff, “Concealment techniques of a DCT codec for HDTV studio recording”, Proc. Internat. Symp. on Signals, Systems and Electronics, International Union of Radio Science, Paris, September 1992.
[17] P. Stammnitz, K. Böttcher, K. Grüneberg, U. Halker and H. Klein, “Hardware codec for digital HDTV recording”, Proc. EUROPTO Conf. on Fiber Optic and Video Comm., Berlin, April 1993.
[18] M. Umemoto, Y. Eto, K. Katayama and N. Ohwada, “1.2 Gbit/s HDTV digital VTR”, Signal Processing: Image Communication, Vol. 2, No. 3, October 1990, pp. 343-348.
[19] J.H. Wilkinson, “The SMPTE type D1 digital TV recorder-error control”, SMPTE J., Vol. 95, November 1986, pp. 1144-1149.