SIGNAL PROCESSING:
COMMUNICATION Signal Processing:
ELSEVIER
Image Communication 12 (1998) 209-229
A comparison of motion-compensated interlace-to-progressive conversion methods Graham Thomas * BBC Research & Development Department, Kingswood Warren, Tadworth. Surrey KT20 6NP, UK Received
30 April 1996
Abstract
The conversion of interlaced video to progressive format may be carried out in a number of ways, including vertical interpolation, fixed or motion-compensated vertical-temporal interpolation, and approaches based on motion compensation using non-uniform sampling theory. Motion-compensated methods potentially offer the best performance, although the accuracy and reliability of the vectors has a significant impact on their performance. For applications such as display conversion using vectors recovered from an MPEG-2 bitstream, the vectors available may not be of the highest quality. This paper considers several motion-compensated interlace-to-progressive conversion methods, and assesses their performance with both accurate and inaccurate vectors using objective and subjective techniques. The method found to work best uses a motion-compensated three-field vertical-temporal filter, in which the vertical component of the vector is rounded to the nearest even number of picture lines per field period. This method reduced the RMS error of converted picture sequences by almost 20% compared to simple vertical filtering when using vectors from an MPEG-2 bitstream. Conversely, some other motion-compensated methods were found to increase the error by a similar amount. 0 1998 Elsevier Science B.V. All rights reserved. Keywords:
Motion compensation;
Interlace-to-progressive
1. Introduction
The conversion of an interlaced image signal to a progressively scanned one is a key step in any standards conversion process which operates in either the vertical or the temporal domains. The advent of flat-panel displays based on technology such as liquid crystals or micromirrors, which are inherently progressively scanned devices, will create the need to perform this conversion in many video displays. * E-mail:
[email protected], 0923.5965/98/$19.00 Copyright PII SO923-5965(97)00037-4
0
1998 Elsevier
conversion; Deinterlacing;
MPEG-2
Much has been written in the literature on interlace-to-progressive conversion methods. Motionadaptive switching between vertical and temporal interpolation is one approach [l]; similar performance can be achieved without the need for switching by using fixed spatio-temporal interpolation filters [9]. Other adaptive methods include the use of vertical-temporal median filters [4], and techniques based on identifying the spatial slope of lines [IS]. Potentially the best performance may be achieved using motion-compensated methods [S, 91, which offer the possibility of maintaining high vertical resolution with minimal aliasing for
Science B.V. All rights reserved
G. Thomas f Signal Processing:
210
Image Communication
12 (1998) 209-229
2. Description of the algorithms considered
the majority of vertical motion speeds, except those very close to speeds of an odd number of picture lines per field period. In the future, many displays are likely to be fed by signals decoded from an MPEG-2 bitstream. This potentially gives the display device access to the motion vectors from the MPEG-2 coding process, which could allow motion-compensated interlace-to-progressive conversion to be carried out without the need to incorporate a motion estimator in the display converter. However, the vectors recovered from the bitstream are unlikely to be of the highest quality: the MPEG-2 standard divides the vector field into ‘macroblocks’ of 16 x 16 samples, and quantises vertical displacements to either half a picture line or half a field line, depending on the prediction mode. Furthermore, vectors generally refer to the average motion over several field periods, rather than the motion between adjacent fields. This paper compares a number of interlace-toprogressive conversion methods, and considers the impact of inaccuracies and errors in the vector field on those methods using motion compensation. Section 2 describes the conversion techniques considered. Section 3 considers their performance with ‘perfect’ vector information, to show what level of performance is theoretically possible. Section 4 considers the effect of small inaccuracies in the vectors, and Section 5 considers the effect of noise in the image. Section 6 considers how the motioncompensated algorithms may be modified to reduce the effect of large vector errors. Section 7 presents the results obtained when the algorithms were used to process natural image sequences, using vectors recovered from an MPEG-2 bitstream. The main conclusions of the work are summarised in Section 8.
The interlace-to-progressive assessed were as follows:
interpolation
methods
2.1. I/2, l/2 vertical jilter A simple intra-field vertical filter.
2.2. Fixed vertical-temporal jilter A fixed vertical-temporal filter, with an aperture of 3 fields and 9 picture lines. The coefficients in the outer fields summed to zero, guaranteeing full temporal bandwidth for low vertical frequencies and thereby avoiding motion blur. The coefficient values, taken from [lo], are shown in Table 1.
2.3. Vertical-temporal filter with even motion compensqiion The vertical-temporal filter described above was steered to follow vertical motion in steps of 2 picture lines per field. That is, for vertical motion speeds below 1 picture line per field it was unchanged from the fixed filter; for speeds between 1 and 3 picture lines per field the coefficients in the outer fields were moved up and down by two picture lines in the direction of motion, and so on. Note that it is not possible to compensate a three-field filter such as this for fractional motion speeds without reducing the vertical resolution and the degree of alias suppression, because the temporal cut-off frequency cannot be reduced. This is a direct consequence of
Table 1 Coefficients
of the fixed vertical-temporal
filter (the central
‘1’ acting
Vertical Field -1 0 1
-4 0.03 1 0.0 0.031
-3
-2
0.0 - 0.026 0.0
-0.116 0.0 -0.116
-1 0.0 0.526 0.0
to pass existing
position,
picture
lines unchanged)
lines
\
0
1
2
0.170 1.0 0.170
0.0 0.526 0.0
-0.116 0.0 -0.116
4
13 0.0 -0.026 0.0
0.031 0.0 0.031
~
G. Thomas / Signal Processing:
Image Communication
using only three fields, and is discussed in more detail below. The filter was also compensated for horizontal motion to a resolution of 0.25 pixels per field period, using a four-tap horizontal filter to perform sub-pixel interpolation,
2.4. Motion-compensated low-pass jilter
12 (1998) 209-229
211
reject the alias spectrum centred on 25 Hz and 288 cycles per active picture height (c/aph). The passband of the temporal filter should be progressively reduced as the motion speed rises towards the ‘strobe’ speed of one picture line per field, since the baseband and alias spectra approach each other as they are skewed by the motion. This implies a bandwidth ( -6 dB) of 12.5 Hz (with a null at 25 Hz) when the image is stationary, reducing to 6.25 Hz for a motion speed of half a picture line per field and 3.125 Hz for a motion speed of 0.75 picture lines per field. Theoretically, such a filter allows the full vertical resolution to be maintained, without aliasing, for all motion speeds except those very close to the strobe speed. As the strobe speed is approached, the large number of temporal taps in the filter required to achieve a null at a suitably low frequency will lead to practical problems. Fig. 1 shows a filter with a bandwidth of 6.25Hz and
6.25 Hz temporal
Interlace-to-progressive conversion may be achieved using a motion-compensated low-pass temporal interpolation filter to remove the alias spectra caused by interlaced sampling. A spatial interpolator must be used to allow the filter to be skewed for motions of fractions of a pixel or picture line per field. The pass-band of the filter must be skewed so that it is centred on the wanted baseband spectrum, and it must be sufficiently narrow to
vertical frequency, cycles per active picture height
:y, Hz
’
pass-band of filter
alias due to interlace
first null in response
baseband spectrum
/ Fig. 1. Response
of a temporal
low-pass
filter, motion-compensated
for a vertical
velocity
of 0.5 picture
lines per field period.
G. Thomas / Signal Processing:
212
Image Communication
a null at 12.5 Hz, compensated for a motion of 0.5 picture lines per field period. It can be seen that the wanted baseband spectrum lies within the passband of the filter whilst the alias spectra from interlaced sampling lie outside the passband. The null at 12.5Hz ensures complete rejection of the centres of the alias spectra, where most of the unwanted energy lies. For the experiments reported here, a 7-field temporal low-pass filter with a cut-off frequency of 6.25Hz was used for all motion speeds up to 0.5 picture lines per field period. This avoided the need for a very large number of temporal taps, and the need to change the filter pass-band for different motion speeds. The filter was compensated for vertical motion to the nearest 0.25 picture line per field, using an eight-tap vertical interpolator to perform vertical interpolation within each frame (although in practice four of the eight contributions always came from lines of inserted zeroes). The temporal filter had nulls at 25, 18.75 and 12.5 Hz to ensure rejection of the alias at (288 c/aph, 25 Hz) for each compensation velocity. The filter coefficients (multiplied by 16) were (2 -J2),
2, (2 +J2),
4, (2 +J2),
2, (2 -J2).
To avoid motion speeds near the strobe speed, the value of the vertical vector used was modified as follows: for rounded motion speeds of 0.75 and 1.0 picture lines per field, a vertical vector of 0.5 picture lines per field was used; for a rounded speed of 1.25, a value of 1.5 was used. This pattern was repeated for all vertical motion speeds, in a ‘modulo 2’ manner. Avoiding the speed of 0.75 picture lines per field period obviated the need for a temporal filter with a cut-off of 3.125Hz, which would have required around 15 temporal taps for adequate performance. Ideally, a simple vertical filter should have been used for speeds around the strobe speed; the only reason that the temporal filter was used at this speed was to simplify software implementation. The filter was also compensated for horizontal motion to a resolution of 0.25 pixels per field period, in the same way as the vertical-temporal filter. Strictly speaking, this filter was a verticaltemporal filter rather than a purely temporal filter, since vertical interpolation was necessary to deal
12 (1998) 209-229
with fractional vertical speeds. However, the vertical element of the filter was (ideally) of an all-pass nature; the predominant filtering action was lowpass temporal. Conversely, theJixed v-t and u-t with euen mc filters were designed to attenuate both vertical and temporal frequency components. Thus it was felt to be justified to distinguish these filters by the terms temporal and vertical-temporal.
2.5. General sampling theorem 2-jield jilter An alternative to an approach based on temporal filtering is to treat the samples in two successive fields as unevenly spaced samples of the same signal, and use the so-called general sampling theorem to interpolate samples in the required position [8]. The method assumes that the original signal was band-limited vertically to the Nyquist limit for a frame (N/2 c/aph where there are N lines in a frame). The vertical spectrum of each field consists of the wanted baseband signal, plus an unwanted alias signal whose amplitude and phase depend on the relative offset of the lines of the two fields. If the vertical motion between the fields is known, the effective offset may be derived, and the amplitude and phase of the aliases calculated. Appropriate filters can then be designed to recover the wanted baseband spectrum and completely reject the aliases. This allows full vertical resolution to be obtained for all speeds except those very close to the strobe speed. As the strobe speed is approached, the coefficients of the filter become large, leading to noise amplification. A family of filters was designed according to this method; the coefficients are listed in Table 2. Filters were designed for motion speeds of multiples of 0.25 picture lines per field, and a ‘l/2, l/2’ vertical filter was used for odd numbers of picture lines per field. Contributions were taken from the present and the preceding field, corresponding to the ‘backward only’ mode described in [S]. From the table it can be seen that the coefficients tend to get larger as the strobe speed is approached. The table lists coefficients for speeds between 0 and 1 picture lines per field period; filters for speeds outside this range may be derived by shifting coefficients in the preceding field in steps of one field line, or reflecting
G. Thomas / Signal Processing: Image Communication I2 (1998) 209-229 Table 2 Coefficients
of the general sampling theorem 2-field filter Vertical Y Pas
Preceding
Current
213
field:
field:
picture
lines per field
0.25
0.5
0.75
-0.003 0.007 0.075 1.052 -0.116 -0.016 0.003
- 0.007 0.010 0.145 1.257 -0.352 - 0.053 0.005
-0.015 0.01 I 0.283 1.998 -1.100 -0.172 0.008
0.000
0 2 4 6
0.000 0.000 0.000 1.oOO 0.000 0.000 0.000
1 3 7
0.000 0.000 0.000 0.000 0.000
-0.037 -0.243 0.243 0.037 -0.002
-0.091 -0.589 0.589 0.091 - 0.005
-0.222 - 1.444 1.444 0.222 -0.013
0.000 0.500 0.500 0.000 0.000
-6 -4 -2
-3 -1
0
velocity,
1.0
0.000 0.000 0.000 0.000 0.000 0.000
them about the vertical and temporal axes for corresponding negative speeds. The filter was also compensated for horizontal motion to a resolution of 0.25 pixels per field period, as for the previous filters.
2.6. General sampling theorem 3-Jield filter This approach was very similar to the 2-field version described above, but contributions were taken from both the preceding and following fields, by averaging the coefficients for a ‘forward-only’ and ‘backward-only’ version of the filter. This made a time-symmetrical filter.
3. Performance with ‘perfect’ vectors To assess the performance of these interlace-toprogressive conversion methods in the presence of vertical motion and to examine their tolerance to inaccurate vectors, some experiments were conducted using a sequence containing synthetic movement. A portion of the test picture ‘Formal Pond (Fig. 2) was selected, which contained a number of sharp straight lines at a range of angles. (It was important to choose a test sequence with
Fig. 2. The portion of the test image ‘Formal Pond’ (size 360 x 288) that was moved synthetically to generate test sequences.
a significant amount of high vertical detail, so as to allow the performance of different conversion methods to be clearly distinguished.) This was moved using a very high-order interpolator to produce a progressively-scanned sequence that was stationary for 8 fields, then accelerated vertically at a rate of 0.04 picture lines per field period until reaching a speed of 2 picture lines per field period, 58 fields after the start of the sequence. The speed was held constant at 2 picture lines per field period for the remaining 10 fields of the sequence. The
214
G. Thomas / Signal Processing:
Image Communication
effect of camera integration was not modelled, so the sequence corresponded to that from a camera with a very short shutter time. An interlaced sequence was derived from the progressive version by discarding alternate lines. Since the vertical resolution of the test picture was judged to be not atypical of that from an interlaced source, no pre-filtering was applied before interlacing. Thus the vertical aliasing present within a field resembled that typically found in real interlaced video signals; it is the task of an interlace-to-progressive converter to interpret this aliased part correctly. The sequence thus allowed an objective measurement of the performance of interlace-to-progressive converters to be assessed, by comparing the processed sequence to the progressive original. The subjective quality of the converted sequences was also assessed, using a 50 Hz progressive monitor. Fig. 3 shows the root mean square (RMS) error that each interpolator produced using ‘perfect’ vectors. The simple l/2, l/2 vertical interpolator gave a significant interpolation error of around 7-8 grey
0
I2 (1998) 209-229
levels RMS; it fluctuated as the image moved at fractional motion speeds, since the alias. signal changed phase rapidly. Poor vertical resolution and aliasing were visible in the converted image, most notably on diagonally sloping edges. The aliasing became particularly visible as the image moved, since the aliases moved differently to the picture material. The fixed vertical-temporal interpolator performed better than the simple vertical interpolator for very low speeds, but aliasing quickly became apparent as the speed increased. For motion speeds over 1 picture line per field, the fixed vertical-temporal interpolator performed worse than the simple vertical interpolator, since the contributions from adjacent fields were mispositioned significantly. The application of motion compensation (quantised vertically to the nearest even number of picture lines per field) to the vertical-temporal filter significantly improved its performance for velocities around 2 picture lines per field. Indeed, for all velocities of an even number of picture lines per
0.25 0.5 0.75 1.0 1.25 1.5 1.75 Motion speed, picture lines per field
Fig. 3. RMS error for various interlace-to-progressive vertically with known motion (no camera integration).
conversion
methods,
for
an
2.0
artificially-generated
sequence
moving
G. Thomas / Signal Processing:
Image Communication
field, its performance returns to that of the fixed filter acting on a stationary image. However, vertical aliasing was still clearly visible for intermediate speeds. The three interpolators that used motion compensation to fractional vertical accuracy all performed very well, and all gave broadly similar results. The filters based on the ‘general sampling theorem’ produced slightly lower RMS errors than the motion-compensated temporal interpolator at the motion speeds for which they were designed (multiples of 0.25 picture lines per field). The 2-field filter was particularly sensitive to differences between the design and actual velocity, showing clear peaks in error at velocities mid-way between the design values; the 3-field filter was much better in this respect. The temporal filter tended to give slightly higher errors overall, but was less sensitive to differences between the design and actual velocity. The temporal filter gave higher errors than all other filters tested for speeds around 1 picture line per field, since no switch was made to a pure
12 (1998) 209-229
215
vertical filter for this speed, as explained earlier. Subjectively, all three interpolators that used motion compensation to fractional vertical accuracy performed very similarly, yielding much better picture quality than the other methods considered for all speeds except those close to 1 picture line per field, for which they performed similarly. To assess the impact of temporal camera integration, the experiments were repeated using a similar artificially generated sequence, the generation of which included modelling the effect of a camera with a rectangular temporal aperture giving an integration time of one field period (other effects such as beam discharge lag or spatial aperture were not modelled). The results are shown in Fig. 4. Camera integration provides a pre-filtering action which attenuates high vertical frequencies in the presence of vertical movement, providing some degree of anti-aliasing pre-filtering. As expected, the RMS errors for all methods were reduced at the higher motion speeds, compared to the results with no integration. For example, at 2 picture lines per
.*.
i
0
0.25 0.5 0.75 1.0 1.25 1.5 1.75 Motion speed, picture lines per field
Fig. 4. RMS error for various interlace-to-progressive known motion, and simulated camera integration
conversion methods, of one field period.
for an artificially
generated
2.0 sequence moving
vertically
with
216
G. Thomas / Signal Processing:
Image Communication
field, the error from the vertical and the fixed vertical-temporal filters were both reduced by approximately 40%, and that from the verticaltemporal filter with even motion compensation was reduced by approximately 65%. The performance of the other motion-compensated systems, which gave low errors even in the absence of camera integration, were essentially unchanged. Thus the effect of camera integration was to reduce the differences in performance between the various methods studied. Although there was still a significant advantage to be gained by the use of motion compensation, the performance of the simplest motion-compensated method (v-t with even mc) started to approach that of the more sophisticated methods as the motion speed rose above 1 picture line per field. Note that these results correspond to a camera integrating for a full field period. CCD cameras (which now make up the majority of cameras used for broadcasting) can of course be operated in a ‘shuttered’ mode, where the integration time is reduced. However, the use of short shutter times is generally confined to outside broadcasts, particularly sports. Studio-originated programmes commonly use the cameras ‘un-shuttered’ in order to maintain maximum sensitivity. Thus although a significant amount of temporal camera integration may be present in the majority of broadcast material, its presence cannot be relied upon.
4. Assessment
of interlace-to-progressive conversion methods for vectors with small errors
Having confirmed that significant gains in picture quality of interlace-to-progressive conversion can be made given accurate vectors, an experiment was carried out to see how tolerant the various algorithms were to small errors in the motion vector. The test sequence from the first experiment (without camera integration) was again used, but the motion vector used to compensate the filters was fixed at 0.5 picture lines per field (this speed was chosen as it is mid-way between the ‘special case’ speeds of 0 and 1 picture line per field period). Fig. 5 shows the resulting RMS errors. Note that the filter u-t with even mc is not shown in this figure,
12 (1998) 209-229
since its performance is identical to the Jixed v-t filter for this vector value, as it is below 1 picture line per field. The plots for the fixed filters are, of course, the same as those in Fig. 3, and have been included for comparison. It can be seen that the motion-compensated filters performed better than the fixed filters for velocities close to the correct motion velocity, but that their performance degraded significantly as the motion vector error increased, giving around twice the RMS error when the difference between assumed and true velocities was greater than about 1 picture line per field. The two-field ‘general sampling theorem’ filter was the least tolerant to vector errors; for example the fixed vertical-temporal filter gave better results than this filter for vector estimation errors of 0.25 picture lines per field or more. The 3-field version of this filter was significantly more tolerant to vector errors, and performed better than theJixed v-t filter for actual motion speeds within about f0.5 picture lines per field of the assumed speed. The motion-compensated temporal low-pass filter showed about the same overall tolerance to vector error as the 3-field ‘general sampling theorem’ filter, being slightly worse than this filter for actual speeds below the assumed speed but slightly better for actual speeds above. From these results it appears that the vertical component of the motion vector must be known to an accuracy of about 0.25 picture lines per field in order to achieve a worthwhile gain over a simpler fixed vertical-temporal filter. The additional tolerance of the 3-field ‘general sampling theorem’ filter over the 2-field version is likely to be of benefit, particularly when motion vector errors regularly exceed 0.2 picture lines per field. The tolerance of the various methods was also assessed for errors in the horizontal motion vector, using a test sequence with horizontal artificial movement but otherwise similar to that used for the vertical motion experiments. The motion speed varied from 0 to 2 pixels per field period. Each motion-compensated process was given a vector value fixed at 0.5 pixels per field period. Fig. 6 shows the results of this experiment. Comparing Figs. 6 and 5 (which have deliberately been drawn with the same vertical scale) it is clear that errors in the horizontal component of the
G. Thomas 1 Signal Processing:
Image Communication
12 (1998) 209-229
217
25
0
0.25 0.5 0.75 1.0 1.25 1.5 1.75 Motion speed, picture lines per field
Fig. 5. RMS error for various interlace-to-progressive conversion methods, but assuming a fixed vertical motion vector of 0.5 picture lines per field.
25
,
I
I l/2.1/2
I
vetiical
’ 1 fixed
20 LG P u $15 b 8 !jj 10
L-t with evetj .&2~&..~te~.Jpf
v-t
I
k---
for an artificially
generated
2.0 sequence accelerating
vertically,
y-
t_+____
mc +=. {::e::::._... ____ (
igst 2$jeld F*-.-. j gst 3Qleld T-.-
+ j .._..._...__
L ____..............:
i .._....__..
5
0
0
0.25
0.5 0.75 1.0 1.25 1.5 1.75 Motion speed, pixels per field
Fig. 6. RMS errors for various interlace-to-progressive conversion methods, for an artificially horizontal motion speed, but assuming a horizontal motion vector fixed at 0.5 pixels per field.
2.0
generated
sequence
with increasing
218
G. Thomas / Signal Processing: Image Communication I2 (1998) 209-229
motion vector had a much smaller effect than errors in the vertical component. One reason for this is that uncompensated vertical motion can give rise to aliasing, whereas uncompensated horizontal motion merely reduces the resolution of the interpolated lines. The relative tolerance of the motioncompensated methods to horizontal vector errors was similar to that found earlier for vertical errors. It is interesting to note that the v-t with even mc filter is very tolerant to the presence of uncompensated motion, since it takes a relatively small amount of energy from adjacent fields, and furthermore has a small and symmetrical temporal aperture. However, its overall performance was poorer than the other motion-compensated filters. These results suggest that horizontal motion vector accuracy is not critical, and that the vector resolution of a quarter of a pixel per field period (the resolution likely to be available from vectors recovered from an MPEG-2 bitstream) is adequate for even the least tolerant method examined. Lower resolution, to the nearest half or integer pixel, would be sufficient for the u-t with euen mc filter, and would introduce only small errors for the gst 3-field filter.
5. The effect of noise in the image signal Another factor that needs to be considered is the tolerance to noise of the interlace-to-progressive conversion methods. This is of particular relevance when converting the output of an MPEG-2 decoder to progressive format, due to the presence of coding noise caused by the quantisation of the discrete cosine transform (DCT) coefficients in the coder. In particular, filters based on the ‘general sampling theorem’ would be expected to have a poor noise performance for motion speeds around odd numbers of picture lines per field period, since the filters used at such speeds tend to have gains much greater than unity for certain frequencies. Conversely, an approach based on temporal low-pass filters might not be expected to be so sensitive, since such filters only ever attenuate components of the signal spectrum. Indeed, temporal filters may help to reduce the coding noise. To investigate this, a test sequence was made by adding white noise corresponding to a peak signal-
to-noise ratio of 30 dB to the interlaced test sequence with artificial vertical movement (without camera integration) used for the earlier experiments. An alternative to using white noise would have been to code the sequence using MPEG-2 and use the noise introduced by the coding process. This approach was not adopted because the level of noise introduced would have varied as a function of motion speed, buffer occupancy, position within the group of pictures (GOP) structure and so on, making the interpretation of results difficult. The noisy sequence was converted to progressive using the methods described earlier, and the results compared to the original clean progressive sequence. Fig. 7 shows the RMS errors. As expected, the filters based on the ‘general sampling theorem’ were much more sensitive to noise than the temporal filter. The two-field filter amplified the noise significantly; the RMS error for 0.75 picture lines per field being significantly higher than that from the simple vertical interpolator. The three-field filter was less sensitive to noise, but still produced RMS errors which rose to about double those from the temporal filter at a speed of 0.75 picture lines per field. Subjectively, the sudden appearance of noise at certain speeds was annoying, particularly in the case of the gst 2-field filter, reducing the overall picture quality to a similar level to that from the simple vertical or vertical-temporal filter. The temporal filter produced the best subjective results, but would of course be more expensive to implement, as the filter aperture was 7 fields, compared to 2 or 3.
6. Ensuring tolerance to vectors with large errors The tolerance of the various interlace-to-progressive conversion algorithms has already been discussed with respect to small vector errors. However, algorithms also need to be tolerant to significant errors in the vector field, since perfect vectors will never be available 100% of the time. This is particularly important when considering display conversion using vectors from the MPEG-2 bitstream, since the block-based nature of the vectors makes it impossible to convey a perfect vector field.
G. Thomas 1 Signal Processing:
0
Image Communication
12 (1998) 209-229
219
0.25 0.5 0.75 1.0 1.25 1.5 1.75 2.0
Motion speed, picture lines per field Fig.7. RMSerror for various moving
with known
vertical
interlace-to-urogressive _ motion.
conversion
One approach to the problem is to assess how well the vector signal matches the motion for each small region or pixel in the image, and switch to a non-motion-compensated method when the vectors become unreliable. However, the derivation of a reliable control signal is not a trivial matter, particularly in the presence of coding noise. We therefore considered the philosophy that has been shown to work successfully for fixed verticaltemporal filters [9], namely ensuring that all lowfrequency vertical information is derived from the present field. With this approach, incorrect vectors can never lead to degradation of low vertical frequencies, and significant blurring or combing is prevented. There is a price to pay in terms of vertical resolution and aliasing, since high vertical frequencies in the source signal can never be properly recovered, and will always manifest themselves as ‘twitter’ or vertical aliasing. However, real signal sources rarely generate much energy at such frequencies, and progressive displays cannot in any case represent high vertical frequencies accurately
methods,
for an artificially
generated
sequence
with 30dB
PSNR,
due to the absence of a proper vertical post-filter at the display, giving rise to the ‘Kell’ factor. Subjectively, errors in the interpolation of low vertical frequency components are much more visible than errors in high frequencies (since the response of the eye is highest for low spatial frequencies) and therefore it is this kind of error which is most important to prevent (examples of this will be shown in Section 7.1). A low-pass intra-field vertical filter was designed with a cut-off frequency of 36 c/aph (a quarter of the vertical resolution supportable in one field). The filter also passed frequencies above 252 ( =288 36)c/aph, since such frequencies cannot be distinguished from low vertical frequencies by a filter operating within a single field. Signals within these pass-bands were derived from a simple ‘l/2, l/2’ intra-field vertical filter, with the remainder being derived by a motion-compensated interlace-toprogressive converter. This was achieved using the arrangement shown in Fig. 8 (compensating delays are not shown). The motion-compensated
220
G. Thomas / Signal Processing:
Image Communication
I2 (1998) 209-229
Fig. 8. Use of a low-pass filter to ensure all low vertical frequencies are derived from an intra-field interlace. -to-p1.ogressive converter.
converter acts on a signal from which the low frequencies have been removed; it is therefore important that it has ‘linear’ behaviour. All the methods studied here are of this type, since their output is a linear combination of input samples, under the control of vectors from an external source. If a motion-compensated conversion method including some non-linear elements was used (such as edge detection, median filtering or motion estimation using the actual input signal), then an alternative arrangement in which the highpass filter was applied to the output of the filter should be used. However, for filters with linear behaviour, the arrangement of Fig. 8 is preferable, since low vertical frequencies in the input signal can potentially produce spurious high vertical frequencies in the output signal if there are discontinuities in the vector field. Removing the low frequencies before the converter prevents this from happening. The cut-off frequency of the filter was deliberately chosen to be very low so as to allow most of the vertical spectrum to be derived from the motioncompensated filter. The figure of 36 c/aph theoretically allowed vertical resolution up to 87% of the Nyquist limit for a frame (252c/aph) to be obtained. This is above the likely usable resolution of most progressive displays, so the filter was unlikely to reduce the perceived vertical resolution. The filter had a vertical aperture of 7 field lines. The choice of the cut-off frequency also depends on how reliable the motion vector signal is considered to be, and on the subjective effect of vector errors on the particular motion-compensated filter being used, since only frequencies below the cut-off are protected from the effect of vector errors. As the results presented below will show, the value chosen here was probably slightly too low, since some artefacts from incorrect vectors were still visible, but no significant
loss of vertical resolution was apparent (compared to the unprotected filter) when the vectors were correct. The cut-off frequency of 36 c/aph can be compared to the vertical response of the vertical-temporal filter, which was around 100 c/aph at 25 Hz and 188 ( =288 -lOO)c/aph at 0 Hz. Thus the verticaltemporal filter took much more information from the current field, making it more robust to uncompensated motion, but it provided poorer vertical resolution in stationary (or correctly motioncompensated) areas. The motion-compensated filter itself can be modified to include the action of the vertical filters in Fig. 8, rather than implementing the filters separately. For example, when this ‘protection’ strategy was applied to the motion-compensated temporal low-pass filter, its response for a motion speed of 0.5 picture lines per field became that shown in Fig. 10, which can be compared to the response of the ‘unprotected’ filter, shown in Fig. 9. The contour lines correspond to responses of -1, -3, -6 and -2OdB. The ‘protected’ filter had unity gain for all temporal frequencies on the zero vertical frequency axis, and zero gain for all temporal frequencies at 288c/aph. The low-level response in the upper left and lower right quadrants was due to the relatively slow roll-off of the vertical filter used when skewing the temporal filter by fractional amounts of a picture line. It can be thought of as the residual response of this filter above 288c/aph, aliased down into the baseband; this can be most clearly appreciated from Fig. 9. For comparison, the response of the fixed verticaltemporal filter (which takes a larger proportion of low vertical frequencies from the current field) is shown in Fig. 11. Note that the vertical-temporal filter with even motion compensation can be
G. Thomas / Signal Processing:
Image Communication
I
12 (I 998) 209-229
,
1
I
221
,
I
’
I
1
I 200 r 20
100
______---6 5 z E 73
0
----__a__
----------______
I I
.fj -100 >
-200
,
11
-20 temporal frequency, Hz Fig. 11 Response
‘;O&Jg$i
_____ _________
__-c_e
--___.._______--------_--_--------___---_____ ---------*
--
_-
,’
____-_ NW---__----_ ----__________.
,
_______a------____
___--__
c_-= = z-=-z
~
=. =‘,‘
~~~~-:
-200
o-
1 I’
I
I
-20
-10
I 0
I
14 0
11
IO
I
t
20
temporal frequency, Hz
Fig. 9. Response of the temporal low-pass filter, motion-compensated for a velocity of 0.5 picture lines per field period.
200
1
-10
I
I
10
20
,I
temporal frequency, Hz Fig. 10. Response of the temporal low-pass filter, motion-compensated for a velocity of 0.5 picture lines per field period, modified to take low vertical frequencies from the current field.
thought of as a ‘l/4, l/2, l/4’ low-pass temporal filter (with even motion compensation), combined with a low-pass ‘protection’ filter with a bandwidth of approximately 100 c/aph.
of the 3-field fixed vertical-temporal
filter.
The plots in Fig. 12 show the slight reduction in performance that occurred in situations in which the vectors were correct, when using the low-pass intra-field filter arrangement described above with the motion-compensated temporal low-pass filter (similar results were obtained when applying this protection to the filters based on the ‘general sampling theorem’). For comparison, the performance of the ‘l/2, l/2’ fixed vertical filter and three-field vertical-temporal filter with motion compensation (to the nearest even number of picture lines per field period) is also shown. The reduction in performance of the motion-compensated temporal filter was typically around 1 grey level RMS. Subjectively, this manifested itself as a small degree of 25 Hz flickering in detailed areas, and more obvious vertical aliasing as the image moved. However, there was still a worthwhile improvement over the vertical-temporal filter with motion compensation, which is an example of a simpler filter that is robust against uncompensated motion. An experiment was also conducted using the test sequence that included camera integration; it was found that for speeds over 1 picture line per field the performance was reduced by a much lower figure (less than 0.5 grey levels). This was to be expected, as at these
222
G. Thomas / Signal Processing:
I
0
I
Image Communication
1
I
1
I2 (1998) 209-229
I
1
0.25 0.5 0.75 1.0 1.25 1.5 1.75 Motion speed, picture lines per field
2.0
Fig. 12. RMS error after interlace-to-progressive conversion using the motion-compensated temporal filter, with and without a filter to take low vertical frequencies from an intra-field low-pass filter operating on the current field. The responses of two of the other filters studied earlier, which are also robust in the presence of non-compensated motion, are shown for comparison.
speeds the vertical resolution of the sequence is reduced, leaving little energy from high vertical frequencies for the ‘protection’ system to misinterpret as high temporal frequencies. An experiment was conducted to test the advantage of this protection method in the presence of incorrect vectors. The sequence used earlier, containing accelerating horizontal motion in the range O-2 pixels per field period, was processed assuming a fixed motion speed of 0.5 pixels per field period using the motion-compensated temporal filter with ‘protection’. The performance of this filter, together with that of the ‘unprotected’ version and some of the other filters studied, is shown in Fig. 13. The protection method gave an improvement when the vector was in error by more than about 0.25 pixels per field period, and succeeded in keeping the error below that from a simple vertical interpolator even for image velocities up to 1.5 pixels per field period above the design speed of the filter (the maximum speed that occurs in this test sequence). The price
paid, as shown previously, was a reduction performance when the vector was correct.
in
7. Tests with real sequences and vectors recovered from an MPEG-2
signal
7.1. Objective assessment
Following the experiments with synthetically moving sequences, the performance of the algorithms was examined using real image sequences. In order to allow an objective measure of performance to be derived, experiments were conducted using three progressive test sequences, football, bbcdisc and renata91. football shows a zoom into a group of football players, with a highly detailed background including the stand, advertising hoardings and brickwork. bbcdisc shows a rotating disc with various pictures and lines on it, with a rotating bicycle wheel overlapping a part of the disc.
G. Thomas / Signal Processing: Image Communication 12 (1998) 209-229
223
16
0
0.25
Fig. 13. RMS errors for the motion-compensated vertical filter, for an image accelerating horizontally for comparison).
0.5
0.75 1.0 1.25 1.5 1.75 Motion speed, pixels per field
2.0
temporal low-pass filter with and without ‘protection’ using a low-pass intra-field but assuming a fixed vector value of 0.5 pixels per field period (other methods shown
renata91 shows a lady walking in front of a detailed background, with the camera panning to follow her. The first two of these were shot on 50 Hz film, the camera having a temporal integration time of approximately half a field period (‘50% shuttered’). Conversely, renata91 was shot with a tube camera, which integrated for a full tield period and additionally exhibited a significant degree of lag. These sequences were converted to interlaced form after application of the well-known ‘HHI’ vertical prefilter, as used for example in [8]. The coefficients of this filter (multiplied by 1000) are -4, 8, 25, -123, 230, 728, 230, -123, 25, 8, -4. Vectors were derived by the phase correlation method, followed by a tracing and refinement process [7]; this approach has been shown to yield vectors that correspond closely with the true motion in the scene. These vectors were then used for coding the sequences into MPEG-2. For the experiments described here, vectors were transmit-
ted for all blocks, including those coded intraframe, in the form of ‘concealment’ vectors in the MPEG-2 bitstream. This introduced a small overhead of around l-2% in overall bitrate, due almost entirely to the vectors sent in I-frames. Had concealment vectors not been transmitted, vectors would have to have been interpolated from spatially or temporally adjacent blocks. Inter-field pixel-rate vectors were derived from the vectors used for coding, by dividing the displacement vectors by the number of field periods between the reference field and the field being predicted. Vectors in the corresponding blocks in the adjacent fields were examined to see whether the temporal distance of either of these vectors was lower, and if so, that vector was used instead. This helped to reduce errors in the recovered vectors due to acceleration or the revealing or obscuring of picture material between the reference and predicted image. This was particularly effective for ‘P’ frames, where the time period between reference
G. Thomas / Signal Processing:
224
Image Communication
and predicted images was the longest (three frames). A four-point bilinear interpolator was used to generate a vector for every pixel from the vectors of the four nearest macroblocks. Whilst this produced a smooth vector field and worked well with the inherently uniform vector fields derived by phase correlation and tracing [7], artefacts could be produced if the original vector field is highly nonuniform (for example, if an exhaustive-search block match was used in the MPEG-2 encoder). Alternative approaches such as ‘block erosion’ [2] may offer better subjective performance in such situations. These vectors were then used to control the various interlace-to-progressive converters, and the converted sequences were compared to the prefiltered progressive original in order to compute the RMS error. The gst 2-jeld filter was not tested, since the three-field version was expected to give a better indication of the performance obtainable from this class of filter. Note that the original (rather than the decoded) interlaced sequences were processed, so that artefacts from the interlace-toprogressive conversion could be separated from those due to coding.
12 (1998) 209-229
Fig. 14 shows the RMS errors calculated for the three sequences used, football, bbcdisc and renata91. Examples of portions (128 pixels by 128 lines) of processed images from bbcdisc from some of the algorithms are shown in Figs. 15-20. The performance of the fixed vertical-temporal filter (Fig. 16) was relatively poor on these sequences (in terms of vertical resolution and aliasing), since virtually all parts of the sequences were moving. The two ‘fully’ motion-compensated algorithms tested (‘general sampling theorem’ 3-field, ‘gst3’, Fig. 18 and the 7-field 6.25 Hz temporal lowpass filter, Fig. 19) both performed even worse than these fixed algorithms, due to inaccuracies and errors in the recovered motion vectors. Some of these errors were present in the original pixel-rate vector field before conversion to GOP-based macroblock form, but others came from the block-based nature of the MPEG-2 vectors, and the vertical vector quantisation. Revealed and obscured background further contributed to the errors. The temporal filter tended to produce low-amplitude blurring where the vectors were wrong, whereas the gst 3-field filter produced higher-amplitude errors but over a smaller area, reflecting the shorter temporal extent of the filter. There were a few areas in both
8 u)
5
>7 2
football 1
$ m5
bbcdisk $4 k “3
renata91 j
$2 LT
Fig. 14. RMS errors
from various
interlace-to-progressive
conversion
methods,
using vectors
recovered
from an MPEG-2
bitstream.
G. Thomas / Signal Processing:
Image Communication
12 (1998) 209-229
225
Fig. 15. Portion of progressive image polated using simple vertical filter.
from
‘bbcdisc’
inter-
Fig. 18. Portion of progressive image from ‘bbcdisc’ interpolated using ‘general sampling theorem’ 3-field mot camp filter.
Fig. 16. Portion of progressive polated using fixed 0-t filter.
from
‘bbcdisc’
inter-
Fig. 19. Portion of progressive polated using motion-compensated
image
image from ‘bbcdisc’ temporal filter.
inter-
Fig. 17. Portion of progressive image from ‘bbcdisc’ interpolated using v-t filter with motion camp to nearest even number of pit. lines.
Fig. 20. Portion polated using protection.
of progressive image from ‘bbcdisc’ intermotion-compensated temporal filter with
test sequences where the ‘fully’ motion-compensated algorithms gave improved vertical resolution or reduced aliasing compared to the fixed filters, but the effects were small compared to the level of
artefacts. The improvements were not as widespread as might have been expected, suggesting that the accuracy of the recovered vectors, particularly in the vertical direction, was insufficient in many areas.
226
G. Thomas / Signal Processing: Image Communication 12 (1998) 209-229
The use of the ‘protection’ system discussed previously significantly improved the performance of the two ‘fully’ motion-compensated algorithms, both subjectively and objectively. Almost all motion artefacts were significantly reduced, with remaining errors taking the form of low-amplitude combing. An example of some artefacts in the bbcdisc sequence can be seen in Fig. 19, generated using the 7-field temporal interlace-to-progressive filter without protection. The same portion processed with the ‘protected’ form of the filter is shown in Fig. 20. In the absence of protection, artefacts from incorrect vectors in the region where the spokes of the bicycle wheel (moving up and to the left) cross over the book cover mounted on the rotating disc (moving down and to the right) are clearly visible. The curved nature of the artefacts in the spokes is due to the effect of the bilinear interpolator smoothing the vectors between macroblocks. The protected version of the interpolator shows artefacts at a much lower level. The best performance came from the verticaltemporal filter with compensation to an even number of picture lines per field period (Fig. 17). This filter was inherently more robust to vector errors than the other motion-compensated filters with protection, since it had a vertical passband for high temporal frequencies about three times that of the 36c/aph filter used in the protection system. Picture resolution was much improved in many moving areas compared to both the simple vertical and the fixed vertical-temporal filters (see, for example, the spoke at the bottom of the picture, just left of centre).
7.2. Subjective assessment In order to quantify the subjective improvement obtained by using the vertical-temporal filter with even motion compensation using vectors derived from an MPEG-2 bitstream, a small-scale subjective test was conducted using the three test sequences used for the objective measurements. In addition to this filter, the ‘l/2, l/2’ vertical filter and the fixed vertical-temporal filter were tested for comparison. Also, a three-point vertical-temporal median filter was tested, since this algorithm is used
in at least one consumer TV set [3]. This filter interpolated samples on the missing lines by computing the median of the sample above, the sample below, and the sample at the same vertical position in the following field. The double-stimulus continuous quality assessment method was used [SJ, using the progressive source pictures (filtered with the ‘HHI’ pre-filter) as the reference. Eleven observers took part, most of whom were ‘expert’ viewers. The test methodology calls for at least 15 observers that should ideally be non-expert, so the results should only be taken as a rough indication of the subjective quality, rather than as a rigorous assessment. The observers sat at a distance of four times picture height. All the tests were conducted with monochrome image sequences. The difference between the quality of the reference and test pictures was computed, and averaged across all observers for each sequence. The results were expressed in terms of grades, one grade being the spacing between adjacent points on the ITU-R five-point quality scale. The means and standard deviations of the impairments are shown in Fig. 21. Comparing Figs. 21 and 14, it can be seen that the subjective measurements tell broadly the same story as the objective measurements. The fixed vertical-temporal filter performed worse than the simple vertical filter on these sequences, since they consist almost entirely of moving areas. (If test sequences with slower movement had been included, it is very likely that the vertical-temporal filter would have performed better than the simple vertical filter, since it is known from [9] that vertical-temporal filters are generally preferable to vertical filters for typical television). The addition of motion compensation to the vertical-temporal filter consistently improved its subjective quality, by up to half a grade. For renata91, the quality of the up-converted sequence using this method was only about 0.3 grades below that of the progressive reference. However, on the other two sequences, its performance was still poor in absolute terms. The average impairment measured over all three sequences was only marginally below that for the simple vertical filter. This was probably because football and bbcdisc contained significant amounts of moving vertical detail close to the strobe speed,
G. Thomas / Signal Processing: Image Communication 12 (1998) 209-229
221
Cl l/2,
l/2
vertical
W v-t with even mc
Fig. 21. Means and standard deviations of impairments measured in the small-scale subjective test comparing the quality of some interlace-to-progressive conversion methods.
which is difficult for any conversion method to handle well. The median filter generally provided the lowest picture quality, although this too would have been expected to work better for stationary or slowly moving sequences. It would have been interesting to carry out further tests using a wider range of source material. However, for this test methodology, a progressive source signal is required (for the reference), and this severely limited the availability of suitable material. Nevertheless, several interlaced source sequences were processed and assessed informally, and the findings confirmed those discussed above. An example of a portion of one of these sequences (basketball) is shown in Figs. 22-24. The poor vertical resolution and aliasing produced by simple vertical interpolation can be seen in Fig. 22, particularly in the lines on the floor and the player’s shorts. The fixed vertical-temporal filter (Fig. 23) produced significantly better results for the lines on the floor, which are moving slowly, but performed poorly on the player, who is moving more rapidly. Indeed, on the upper part of the player’s vest, it appeared slightly worse than the simple vertical filter. The vertical-temporal filter with even
motion compensation (Fig. 24) produced good results on both the player and the floor, without introducing any artefacts, even in the revealed and obscured background areas around the player’s rapidly moving arms and feet (despite the absence of any explicit detection or special processing for such areas).
Fig. 22. Field interpolated using ‘l/Z, l/2’ vertical filter.
228
G. Thomas / Signal Processing: Image Communication I2 (1998) 209-229
Fig. 23. Field interpolated using fixed vertical-temporal filter.
Fig. 24. Field interpolated using vertical-temporal even motion compensation.
filter with
8. Conclusion This paper has considered various methods for performing interlace-to-progressive conversion, with the objective of identifying the best method to use when the only motion vector signal available is one recovered from an MPEG-2 bitstream, which is likely to have limited resolution, both spatially and in terms of vector component magnitude. It was found that by using the vectors recovered from the MPEG-2 bitstream to motion-compen-
sate a vertical-temporal filter in horizontal steps of a quarter of a pixel and vertical steps of an even number of picture lines per field period (termed vertical-temporal jilter with even motion compensation), a worthwhile improvement in picture quality over a simple ‘l/2, l/2’ vertical filter and a fixed vertical-temporal filter could be obtained. This was verified using both objective and subjective measures. Using the vectors to steer more sophisticated filters (6.25 Hz temporal low-pass jilter, general sampling theorem 2- and 3-jeldjilters), which attempted to maintain good vertical resolution for all vertical speeds except those close to the strobe speed, gave worse results than those obtained from the steering the simpler vertical-temporal filter. This was because the filters generated objectionable artefacts in the presence of grossly incorrect vectors, since they placed much more reliance on information derived from adjacent fields. Even when the vectors were approximately correct, the accuracy of the recovered vertical vector component was not always good enough to allow high levels of resolution to be obtained. The vertical-temporal jilter with even motion compensation is also simpler to implement, as it requires fewer coefficients than the more sophisticated filters; also the coefficient values remain fixed rather than varying as a function of vertical motion speed. In order to minimise the appearance of artefacts caused by grossly incorrect vectors being used by the more sophisticated filters, a ‘protection’ scheme was proposed. This scheme ensured that low vertical frequencies, subjectively the most important, always came from the current field and were therefore not influenced by the motion vectors at all. It is possible that the use of such a scheme, coupled with a means of improving the accuracy of the vertical vector component in situations where it was slightly inaccurate, might significantly improve the performance of the more sophisticated filters in this application. For example, a small-range blockmatcher might be used to ‘refine’ the recovered vectors vertically. This would be an interesting subject for future work. There is also scope for optimising the response of the low-pass vertical filter used in the ‘protection’ scheme. The cut-off frequency of 36 c/aph used here was too low, given the relatively poor quality of the vectors retrieved from the
G. Thomas / Signal Processing: Image Communication I2 (1998) 209-229
MPEG-2 bitstream. This value may be more appropriate for better quality vectors that have not had to pass through the ‘bottleneck’ of MPEG-2. There is also scope for optimising the response of the vertical-temporal filter, which was originally designed assuming no motion compensation. Its vertical response at 25 Hz could probably be reduced, and consequently increased at 0 Hz, to give better vertical resolution at the expense of placing greater reliance on information from adjacent fields. Of the more sophisticated filters examined, the general sampling theorem 2-jieldjlter was found to be the least tolerant to small inaccuracies in the estimated vectors, and the least tolerant to noise in the image. The three-field version of this filter performed much better in both these respects. The motion-compensated
6.25 Hz temporal low-pass jilter
proved to be the most tolerant to noise. All three of these filters proved to be capable of providing a worthwhile improvement in image quality, compared to the other simpler filters tested, when accurate vectors were available. Acknowledgements
The author would like to thank the BBC for permission to publish this paper. The work described here was supported by the European Commission as a part of the RACE ‘COUGAR’ project.
229
References Bock, Motion-adaptive standards conversion Cl1 A.M. between formats of similar field rates, Signal Processing: Image Communication 6 (3) (1994) 2755280. 121G. De Haan, H. Huijgen, Motion estimator for TV-picture enhancement, in: Proceedings of the Fourth International Workshop on HDTV and beyond (Vol. l), Turin, Italy, 446 September 1991. architecture for c31 G. De Haan et al., An evolutionary motion-compensated 100Hz television, IEEE Trans. Circuits Systems 5 (3) (1995). Median filtering of television M T. Doyle, P. Frencken, images, in: ICCE Digest of Technical Papers, 1986, pp. 186-187. ITU-R BT.500-7. Methodology for I51 Recommendation the subjective assessment of the quality of television pictures. for scannC61 J. Salonen, S. Kalli, Edge adaptive interpolation ing rate conversions, in: E. Dubois, L. Chiariglione (Eds.), Signal Processing of HDTV, IV, Elsevier, Amsterdam. 1993, pp. 7577764. c71 G.A. Thomas, S.J. Dancer, Improved motion estimation for MPEG coding within the RACE ‘COUGAR’ project, IBC’95, IEE Conference Publication No. 413, pp. 238-243. L. Cuvelier, B. Maison, P. Queluz, PI L. Vandendorpe, P. Delogne, Motion-compensated conversion from interlaced to progressive formats, Signal Processing: Image Communication 6 (3) (1994) 193-211. and motion compensated c91 M. Weston, Fixed, adaptive interpolation of interlaced TV pictures, in: Proceedings of the Second International Workshop on Signal Processing for HDTV, L’aquila, Italy, 29 February-2 March 1988. lines of video signals. European Cl01 M. Weston, Interpolating Patent No. 0266079.