Signal Processing: Image Communication 17 (2002) 497–507
Content-based video transcoding in compressed domain$ TaeYong Kim*, Jong Soo Choi Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, HukSuk-dong 17, DongJak-gu, Seoul 156-756, Republic of Korea Received 31 January 2001; received in revised form 15 February 2002; accepted 9 April 2002
Abstract

In this paper, we propose a content-based Moving Picture Experts Group (MPEG) transcoding method using a discontinuity feature in the discrete cosine transform (DCT) domain. A DCT block is transcoded differently depending on the height of the dominant discontinuity within the block. In the experiments, we show that, at the same bandwidth, the video quality of content-based transcoding is better than that of a constant cut-off method, and that the processing time of the adaptive method is much shorter than that of pixel-domain methods. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Transcoding; DCT manipulation; Discontinuity detection; Adaptive filtering
$ This work was supported by the BK21 program from the Ministry of Education and the NRL project from the Ministry of Science and Technology of Korea.
*Corresponding author. Tel.: +82-2-820-5412; fax: +82-2814-5404. E-mail address: [email protected] (TaeYong Kim).
0923-5965/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0923-5965(02)00025-5

1. Introduction

Various multimedia services, such as teleconferencing, video on demand, and distance learning, have been emerging. However, because heterogeneous networks such as ATM, TCP/IP, wireless and PSTN are interconnected, a network does not guarantee the bandwidth required by those services. Therefore, the bit rate of the incoming video stream must be adapted dynamically to match the available bandwidth of the outgoing network [3,11]. Dynamic bit-rate adaptation can be achieved using the scalable coding schemes provided in current video coding standards [5]. However, scalable coding can only provide up to three levels of discrete video quality because of the limit on the number of enhancement layers [7]. In many networked multimedia applications, a much finer scaling capability is desirable. Converting a previously compressed video stream to a lower bit-rate stream through transcoding can provide finer and more dynamic adjustment of the bit rate.

Recent publications on video transcoding have mainly focused on low-cost architectures or bit allocation problems. In [1,2,12], various low-complexity transcoding schemes were described with simple bit scaling methods for rate allocation. Werner [16] studied requantization for Moving Picture Experts Group (MPEG) intra-frames. In [8,9], picture complexities based on the quantization scale or on the bits allocated by an encoder were used for target bit-rate estimation. None of these schemes has considered the spatial information embedded in the compressed bitstream to reduce artifacts or to enhance visual perception. To achieve dynamic bit-rate adjustment without much degrading the visual quality, content-based transcoding is essential.

In this paper, we propose a dynamic content-based MPEG transcoding method that uses a discontinuity feature in the discrete cosine transform (DCT) domain. We first detect a representative discontinuity height in a DCT block, and then transcode the block adaptively according to that height. We modify and simplify the previous height evaluation technique [15] to adjust the bit rate of a compressed video stream for real-time content-based transcoding. At the same bandwidth, this scheme gives better visual quality than a constant cut-off method, and its short processing time makes it suitable for real-time applications.

The remainder of this paper is organized as follows. Section 2 reviews MPEG and the conventional transcoding methods. Section 3 provides the evaluation technique of a discontinuity height. A dynamic content-based MPEG transcoding method is presented in Section 4. Experimental results are presented in Section 5, and Section 6 summarizes this paper and outlines our future plans.
2. MPEG transcoding

In this section, we briefly introduce the concept of the MPEG compression technique and the transcoding methods.
2.1. MPEG compression

Video sequences usually contain statistical redundancies in both temporal and spatial directions. The basic statistical property upon which MPEG compression techniques rely is inter-pixel correlation, including the assumption of simple translational motion between consecutive frames. Thus, it is assumed that the magnitude of a particular image pixel can be predicted from nearby pixels within the same frame or from pixels of a nearby frame. The MPEG compression algorithms employ DCT coding techniques on image blocks of 8×8 pixels to exploit spatial correlations between nearby pixels within the same image. However, if the correlation between pixels in nearby frames is high, it is desirable to use inter-frame coding techniques employing temporal prediction. In MPEG video coding schemes, an adaptive combination of temporal motion-compensated prediction followed by transform coding of the remaining spatial information is used to achieve high data compression.

An important feature supported by the MPEG-1 encoding algorithms is the possibility to tailor the bit rate to the requirements of specific applications by adjusting the quantizer step size used to quantize the DCT-coefficients. Coarse quantization of the DCT-coefficients achieves a high compression ratio in the storage or transmission of video; however, it may result in significant coding artifacts. The MPEG-1 standard allows the encoder to select different quantizer values for each coded macroblock; this gives a high degree of flexibility to allocate bits where they are needed to improve image quality.

The standardized MPEG-2 scalability supports spatial scalability, SNR scalability, and temporal scalability. The intention of scalable coding is to provide interoperability between different services and to flexibly support receivers with different display capabilities. Receivers that are unable to reconstruct the full-resolution video can decode subsets of the layered bitstream to display video at lower spatial or temporal resolution or with lower quality. However, to support the scalabilities, both encoder and decoder must have scalability functions, which are rare in current hardware systems.
2.2. Transcoding schemes

Besides the schemes that adjust the bit rate in the encoding process, a video server also has functions that dynamically control the bit rate for a network that does not maintain the required bandwidth. The process of converting between different compression formats and/or further reducing the bit rate of a previously compressed signal is known as transcoding. When the resource of the outgoing network is limited, the overall transmission performance will be considerably degraded.

In temporal transcoding, a transcoder has knowledge of the frame types and drops frames according to their importance. It is used to reduce the data rate of a stream in a sensible way by discarding a number of frames, transmitting the remaining frames at a slower rate, and maintaining the end-to-end delay requirement.

Another way to achieve a high transcoding ratio is spatial transcoding. Spatial transcoding operates in the frequency domain on the values of the DCT-coefficients, which involves entropy decoding and encoding. It reduces the spatial resolution using methods such as low-pass filtering or requantization. In low-pass filtering, the higher-frequency DCT-coefficients are discarded on recoding, leaving only the DC coefficient and a number of low-frequency components. The requantization filter dequantizes the DCT-coefficients and requantizes them using a larger quantizer step. Although spatial transcoding reduces the required bandwidth without affecting the frame rate, it degrades the quality of an image, so it can be combined with temporal transcoding in applications. However, since the previous transcoding schemes have not considered the contents of blocks and filter all blocks with uniform parameters, the visual quality of the transcoded blocks is degraded regardless of their importance [4,17].

3. DCT domain processing

Conventionally, there have been many methods to detect features in the spatial domain. However, to obtain spatial features, the compressed video frames have to be decoded, processed and encoded again. Alternatively, we can manipulate a compressed video directly in the DCT domain without decoding. Algorithms in the DCT domain show computational speedups of 50 or more over the corresponding processing of the uncompressed data [13,14].

In this section, we briefly review our previous work [15], which detects a representative discontinuity height in a DCT block, and suggest a modification for real-time applications.
3.1. Discontinuity position alignment

The compression process in JPEG or MPEG is done on an 8×8-block basis. The following equations are the mathematical definition of the 8×8 FDCT and IDCT:

\[ F(u,v) = \frac{1}{4} C(u) C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{\pi u (2i+1)}{16} \cos\frac{\pi v (2j+1)}{16}, \]

\[ f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u,v) \cos\frac{\pi u (2i+1)}{16} \cos\frac{\pi v (2j+1)}{16}, \tag{1} \]

where $C(u), C(v) = 1/\sqrt{2}$ for $u, v = 0$, or 1 otherwise. In an ideal step discontinuity model, intensity levels in an 8×8 block are separated by a local discontinuity between $j = 3$ and $j = 4$ in Eq. (1), which is formulated by the intensity function $f(i,j) = a$ at $j = 0, 1, 2, 3$ or $f(i,j) = b$ at $j = 4, 5, 6, 7$, where $-127 \le a, b \le 127$. Alignment is a process to shift an arbitrary discontinuity position $k$ to "4", where $1 \le k \le 7$. To achieve the position alignment, we compensate the values of a given set of DCT-coefficients, which is the same operation as shifting the position of a discontinuity in the spatial domain. The compensation frequency values are derived by $c_k(h,v) = F_4(0,v) - F_k(0,v)$, where $F_k$ are the DCT-coefficients with a discontinuity at position $k$. In the case of $k = 3$, the derivation is

\[ c_3(h,v) = F_4(0,v) - F_3(0,v) = \frac{2}{\sqrt{2}}\, h \cos\frac{7\pi v}{16}, \tag{2} \]

and compensation frequencies for the rest of the positions can also be formulated by changing the frequencies. The compensated (shifted) DCT-coefficients $\hat{F}_k$ are obtained by $\hat{F}_k(0,v) = F(0,v) + c_k(h,v)$, where $1 \le v \le 7$, $h$ is the discontinuity height and $k$ in $\hat{F}_k(0,v)$ represents the position of a discontinuity before alignment.
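As a numerical check of Eq. (2), the following Python sketch evaluates the first row of DCT-coefficients of Eq. (1) for step blocks with the discontinuity at positions 4 and 3 and compares their difference with the closed form. The helper names (`dct_row0`, `compensation_k3`), the sample levels and the sign convention h = a − b are assumptions made for illustration only; the paper does not state the sign of h.

```python
import math

def dct_row0(step_pos, a, b):
    """F(0, v) for v = 0..7 of an 8x8 block per Eq. (1).

    Columns j < step_pos hold level a, the remaining columns hold level b
    (assumed layout of a step discontinuity at position step_pos).
    """
    cu = 1 / math.sqrt(2)  # C(u) for u = 0
    row = []
    for v in range(8):
        cv = 1 / math.sqrt(2) if v == 0 else 1.0
        total = 0.0
        for i in range(8):          # the cosine in u is 1 because u = 0
            for j in range(8):
                level = a if j < step_pos else b
                total += level * math.cos(math.pi * v * (2 * j + 1) / 16)
        row.append(0.25 * cu * cv * total)
    return row

def compensation_k3(h, v):
    """Closed form of Eq. (2)."""
    return 2 / math.sqrt(2) * h * math.cos(7 * math.pi * v / 16)

a, b = 40.0, -25.0      # illustrative intensity levels
h = a - b               # assumed sign convention for the height
f4 = dct_row0(4, a, b)
f3 = dct_row0(3, a, b)
max_err = max(abs((f4[v] - f3[v]) - compensation_k3(h, v)) for v in range(1, 8))
print("largest deviation from Eq. (2):", max_err)
```

The two blocks differ only in column j = 3, so the difference of their coefficients reduces to a single cosine term, which is what Eq. (2) expresses.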
3.2. Alignment verification and height evaluation

For the position verification, we use the symmetry, which is defined as flipping an image about its middle vertical axis and inverting signs in the spatial domain. For an input block $f(i,j)$ and an output block $g(i,j)$, the symmetry can be expressed as $g(i,j) = -f(i, 7-j)$, where $0 \le i, j \le 7$. In the compressed domain, the output block can be directly computed from the input block, i.e., $G(u,v) = -\cos(\pi v) F(u,v)$. If the block has a step discontinuity at $k = 4$, then $G(u,v) = -\cos(\pi v) F(u,v) = F(u,v)$. So, $\hat{F}(0,2)$, $\hat{F}(0,4)$ and $\hat{F}(0,6)$ must be zero in the aligned coefficients. Thus, the verification measure of the discontinuity position is defined as follows:

\[ D_k = \hat{F}_k(0,2)^2 + \hat{F}_k(0,4)^2 + \hat{F}_k(0,6)^2, \tag{3} \]

where $k = 1, 2, \ldots, 7$. Each position $k$ is verified by $D_k$ with a fixed value of height $h$. Because $D_k$ has the smallest value at the aligned position regardless of the height, the position $k$ of a dominant discontinuity is detected as the one whose $D_k$ in Eq. (3) has a minimum value. $D_k$ can be simplified by substituting the fixed (aligned) position for the variable $k$. Since the cosine function is even and periodic, $D_k$ can be further expanded by substituting the known position $k$ and the height variable $h$:

\[ \begin{aligned} D_k(h) &= \hat{F}_k(0,2)^2 + \hat{F}_k(0,4)^2 + \hat{F}_k(0,6)^2 \\ &= \left(-c_k(h^*,2) + c_k(h,2)\right)^2 + \left(-c_k(h^*,4) + c_k(h,4)\right)^2 + \left(-c_k(h^*,6) + c_k(h,6)\right)^2 \\ &= \left(h^* a_{k2} - h a_{k2}\right)^2 + \left(h^* a_{k4} - h a_{k4}\right)^2 + \left(h^* a_{k6} - h a_{k6}\right)^2 \\ &= (h^* - h)^2 \left(a_{k2}^2 + a_{k4}^2 + a_{k6}^2\right), \end{aligned} \tag{4} \]

where $h^*$ is the real height, $c_k(h,v)$ is the compensation at $k$, and $a_{k2}$, $a_{k4}$ and $a_{k6}$ are constants for frequencies at $k$. Using the approximated gradient direction described in [15], we can estimate the direction of a discontinuity and rotate the discontinuity in the DCT domain. After rearranging the DCT-coefficients by rotation, since the $D_k$'s with various heights follow a hyperbolic curve and have a global minimum value, we use the method of gradient descent [6] to enhance the performance and to reduce the noisy discontinuities.

4. Content-based MPEG transcoding

In this section, we suggest an adaptive transcoding method by low-pass filtering whose cut-off value is adjusted dynamically according to the block height.

4.1. Transcoding by low-pass filtering

A sample transmission server that performs transcoding is depicted in Fig. 1. The server first reads an MPEG stream from storage, and then decodes the stream through variable length decoding (VLD), dequantization ($Q^{-1}$) and inverse DCT. On the decoded stream, the server applies a filter to adjust the outgoing bit rate. Finally, the server recompresses the stream through forward DCT, requantization ($Q$) and variable length coding. In our approach, the DCT and IDCT operations are removed by directly handling the DCT-coefficients.

Fig. 1. Decoding, transcoding and encoding in a dynamic transmission server. [Block diagram: MPEG stream, decoding (VLD, IQ, IDCT), transcoding (low-pass filtering), encoding (DCT, Q, VLC), network, bit rate checker.]

High frequencies of a DCT block, which correspond to noise or non-dominant discontinuities, can be removed by a transmission server to reduce bandwidth without much degrading the overall video quality. High-frequency components of a DCT block are removed by a low-pass filter with a cut-off value $C_c$ [4,17], and the filtering is formulated as follows:

\[ AC_i = \begin{cases} AC_i & \text{if } i < C_c, \\ 0 & \text{if } i \ge C_c, \end{cases} \tag{5} \]

where $0 < i \le 63$. If the cut-off value $C_c$ in Eq. (5) is a constant for each frame, the transcoding scheme cannot reflect the contents of a DCT block. Thus, the visual quality of transcoded blocks is degraded equally for all DCT blocks. To prevent degradation in a visually important block, we suggest a dynamic cut-off function, $C(h)$, using the representative height derived in the previous section. Since a highly contrasted discontinuity with a large height appeals strongly to human visual recognition, a block whose discontinuity height is large needs many ACs to be reconstructed accurately. Otherwise, ACs can be removed without much sacrificing human perception. The dynamic function $C(h)$ changes linearly with the height in the interval $[0, \alpha]$, where $\alpha$ is the saturation value that represents the largest height to have maximum AC frequencies. Another parameter, $\beta$, restricts the maximum number of ACs for the purpose of bandwidth reduction. Using the parameters $\alpha$ and $\beta$, the dynamic cut-off function is formulated as follows:

\[ C(h) = \begin{cases} \dfrac{\beta - 1}{\alpha}\, h + 1 & \text{if } 0 \le h \le \alpha \text{ and } \alpha \ne 0, \\ \beta & \text{if } h > \alpha \text{ or } \alpha = 0. \end{cases} \tag{6} \]

By adjusting $\alpha$ we can control the slope of the function, which quantizes the filtering step roughly or tightly. Although the function can be formulated non-linearly with the height values, we devise the filter to be as simple as possible for easy evaluation, as depicted in Fig. 2, and Eq. (5) is modified as follows:

\[ AC_i = \begin{cases} AC_i & \text{if } i < C(h), \\ 0 & \text{if } i \ge C(h). \end{cases} \tag{7} \]

If the height in a block is zero, the cut-off value is 1 and only the DC value can pass the filter. If the height is greater than $\alpha$, the cut-off value is $\beta$, which also degrades the image quality. However, because we usually set $\beta$ to be 2–4 times larger than the constant $C_c$, the image quality produced by the dynamic transcoder is better than that of $C_c$ at the same bandwidth.

Fig. 2. Cut-off values with $\alpha$ and $\beta$. [Plot: cut-off value (vertical axis, minimum 1, saturating at $\beta$) versus height (horizontal axis, 0 to 250, saturating at $\alpha$).]
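The cut-off function of Eq. (6) and the filter of Eq. (7) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the height is assumed to be non-negative, and the 64 coefficients are assumed to be stored in scan order with index 0 holding the DC value (the paper only states 0 < i ≤ 63 for the ACs).

```python
def dynamic_cutoff(h, alpha, beta):
    """Cut-off value C(h) of Eq. (6); h is assumed to be a non-negative height."""
    if alpha == 0 or h > alpha:
        return beta
    return (beta - 1) * h / alpha + 1

def lowpass_block(coeffs, cutoff):
    """Eq. (7): keep AC_i for i < cutoff and zero the rest.

    coeffs[0] is the DC value (always kept); coeffs[1..63] are the ACs,
    assumed to be in scan order.
    """
    return [c if (i == 0 or i < cutoff) else 0 for i, c in enumerate(coeffs)]

# Illustrative block with 64 non-zero coefficients.
block = [64 - i for i in range(64)]
alpha, beta = 100, 16

flat = lowpass_block(block, dynamic_cutoff(0, alpha, beta))     # height 0
mid = lowpass_block(block, dynamic_cutoff(50, alpha, beta))     # height 50
sharp = lowpass_block(block, dynamic_cutoff(200, alpha, beta))  # height above alpha

kept = [sum(1 for c in b if c != 0) for b in (flat, mid, sharp)]
print(kept)
```

With α = 0 the function returns β for every height, so the same code also covers the constant cut-off filter of Eq. (5) with C_c = β.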
4.2. Bandwidth adjustment of pre-encoded stream

The procedure of the adaptive transcoding in the DCT domain is summarized as follows:
1. Obtain the DCT-coefficients of the luminance component of a block from the MPEG video.
2. Estimate the direction of a dominant discontinuity. If necessary, rotate the coefficients by inverting the coefficient signs and/or changing the coefficient positions (in the experiments, we use four directions in multiples of 90°).
3. Align the coefficients and find the discontinuity position using the criterion in Eq. (3).
4. Evaluate the discontinuity height from the aligned position using Eq. (4).
5. Transcode the block according to the height of the discontinuity within a target bit rate.
6. Repeat from step 1 until all blocks in a frame are transcoded.
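Step 4 of the procedure can be illustrated on the first row of coefficients of an ideal step block. The sketch below is a simplified stand-in and not the authors' method: it assumes that the position k is already known, uses the sign convention h = a − b, and replaces the gradient descent of [6] with an exhaustive search over integer heights. All function names are hypothetical. For k = 4 the even coefficients of an ideal step vanish, so this measure carries no height information at that position and the search is not meaningful there.

```python
import math

EVEN = (2, 4, 6)  # frequencies used by the verification measure of Eq. (3)

def step_row(k, h):
    """F(0, v), v = 0..7, of a step with levels +h/2 (columns j < k) and -h/2."""
    row = []
    for v in range(8):
        cv = 1 / math.sqrt(2) if v == 0 else 1.0
        s = sum((h / 2 if j < k else -h / 2) *
                math.cos(math.pi * v * (2 * j + 1) / 16) for j in range(8))
        # 8 identical rows i, C(0) = 1/sqrt(2) for u = 0, factor 1/4 from Eq. (1)
        row.append(0.25 * (1 / math.sqrt(2)) * cv * 8 * s)
    return row

def compensation(k, h):
    """c_k(h, v) = F_4(0, v) - F_k(0, v); the AC terms depend only on h."""
    return [x - y for x, y in zip(step_row(4, h), step_row(k, h))]

def verification_measure(row, k, h):
    """D_k(h) of Eq. (3) for observed coefficients row = F(0, .)."""
    c = compensation(k, h)
    return sum((row[v] + c[v]) ** 2 for v in EVEN)

def estimate_height(row, k, heights=range(-255, 256)):
    """Exhaustive search for the height that minimises D_k(h)."""
    return min(heights, key=lambda h: verification_measure(row, k, h))

true_k, true_h = 2, 90
observed = step_row(true_k, true_h)
estimated = estimate_height(observed, true_k)

# Eq. (4): D_k(h) = (h* - h)^2 (a_k2^2 + a_k4^2 + a_k6^2), with a_kv = -c_k(1, v)
a_sq = sum(compensation(true_k, 1)[v] ** 2 for v in EVEN)
lhs = verification_measure(observed, true_k, 50)
rhs = (true_h - 50) ** 2 * a_sq
print(estimated, lhs, rhs)
```

Because the measure is quadratic in h for a fixed position, a closed-form minimiser or a few gradient steps would serve equally well; the exhaustive search is used here only to keep the sketch short.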
In our transcoding scheme, the incoming compressed video bitstream is partially decoded. The DCT-coefficients are obtained through VLD and inverse quantization. The transcoder performs low-pass filtering by $C(h)$ to reduce the bit rate. The header and motion information are kept unchanged. The scheme preserves the original motion information and can be combined with the compensation mechanism [1] to prevent drift errors. It has the advantages of low cost and fast processing.

The target bandwidth ($B_{out}$) can be estimated by the reduction rate between the bit rate of the input and that of the output, $R(\alpha,\beta) = B_{out}/B_{in} \times 100$, which is shown in Fig. 3. Fig. 3(a) shows the average reduction rate with various $\alpha$'s, which is obtained by averaging 100 transcoded samples of videos. As $\alpha$ increases, the average bandwidth of a video stream decreases monotonically. Fig. 3(b) shows the average percentage of reduction rate with various $\beta$'s in the constant cut-off method (dotted line) and in the content-based method (solid line). When $\alpha$ is zero, the transcoder acts as the constant cut-off filter, which abruptly changes the target bandwidth between 80% and 20% of the input bandwidth within the interval $20 \ge \beta \ge 1$. On the contrary, we can smoothly adjust $B_{out} = B_{in} R(\alpha,\beta)/100$ by combining the scales of $0 \le \alpha \le 255$ and $64 \ge \beta \ge 1$ in the content-based transcoding. Thus, the parameters $\alpha$ and $\beta$ for a desired output can be chosen using the average reduction rate, $R(\alpha,\beta) = B_{out}/B_{in} \times 100$, in Fig. 3. For example, when the reduction rate is 60%, the parameters are obtained as $\alpha = 100$ and $\beta = 64$. We can set $\alpha = 255$ and $\beta = 35$ for $R(\alpha,\beta) = 40\%$, which is represented by a * on the dotted line in Fig. 3(b).

Fig. 3. Average percentage of reduction rate with various $\alpha$'s and $\beta$'s: (a) reduction rate with various $\alpha$'s ($\beta = 64$) in $C(h)$ and (b) reduction rate with various $\beta$'s ($\alpha = 255$) in $C(h)$ and $C_c$. [Plots: reduction rate (0 to 100) versus $\alpha$ (0 to 250) in (a) and versus $\beta$ (60 down to 0) in (b); legend in (b): $C(h)$ with $\alpha = 255$, $C_c$ with $\alpha = 0$.]

The overall bit rate regularization can be achieved by adding to, or subtracting from, the bandwidth of the desired output the number of bits under- or over-used so far.
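The bit rate regularization described above amounts to carrying the running surplus or deficit of bits into the next target. A minimal Python sketch follows, assuming a per-frame budget; the function name, its parameters and the sample frame sizes are hypothetical and serve only to illustrate the bookkeeping.

```python
def next_target_bits(desired_bits_per_frame, bits_spent_so_far, frames_sent_so_far):
    """Target for the next frame: the desired budget plus the bits saved
    so far, or minus the bits overspent so far (never below zero)."""
    surplus = desired_bits_per_frame * frames_sent_so_far - bits_spent_so_far
    return max(0, desired_bits_per_frame + surplus)

desired = 64000                      # illustrative budget in bits per frame
frame_sizes = [66959, 63420, 62606]  # illustrative sizes of transcoded frames

targets = []
spent = 0
for n, size in enumerate(frame_sizes, start=1):
    spent += size
    targets.append(next_target_bits(desired, spent, n))
print(targets)
```

The resulting target can then be mapped to a pair (α, β) through the average reduction rate of Fig. 3; that mapping is not sketched here because the paper gives it only graphically.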
5. Experiments

In the experiments, our dynamic transcoding method is applied to two videos obtained from broadcasting, which are compressed by an MPEG-1 standard encoder with the frame structure {I B B P B B P B B P B B P B B}. Each video consists of 1500 frames, including 100 I-frames, and lasts 50 s. We used two rotations (90° and 180°), which cover four directions of a discontinuity. The FDCT and IDCT implementation is based on an algorithm described in [10]. To verify the visual quality, we use RMSE and PSNR, which are formulated as follows:

\[ \mathrm{RMSE} = \frac{\left[ \sum_{i,j} \{ f(i,j) - f_c(i,j) \}^2 \right]^{1/2}}{N}, \tag{8} \]

\[ \mathrm{PSNR} = 20 \log_{10} \frac{255}{\mathrm{RMSE}}, \tag{9} \]

which are estimates of the quality of a reconstructed image ($f_c$) compared with an original image ($f$). A higher PSNR and a lower RMSE indicate a better reconstructed image.

To compare the constant filtering of $C_c = 4$ in Eq. (5) with the adaptive $C(h; \alpha, \beta)$ in Eq. (7), we present sample spatial images for a music video and a sports video in Figs. 4 and 5, respectively. Each original MPEG image is shown in Figs. 4(a) and 5(a), and the transcoded images by the constant filtering are presented in Figs. 4(b) and 5(b). Since these images were filtered by a constant value for all blocks in each image, some blocks that have sharp contrast (a discontinuity with large height) are blurred. We manually select the parameters of the adaptive filtering as $\alpha = 30$ or 70, and $\beta = 8$, to preserve a bandwidth similar to that of the constant filtering. Figs. 4(c) and 5(c) show the height values of DCT blocks, which are evaluated by the DCT alignment described in Section 3. The heights are represented by brightness, and the resultant height maps present a reasonable description except for the blocks having slanted discontinuities. Since the content-based method filters each block by considering its height, it prevents much degradation of the image quality, as shown in Figs. 4(d) and 5(d).

Fig. 4. Transcoded images of a music video: (a) an original MPEG image, (b) transcoded image by a constant filtering ($C_c = 4$), (c) discontinuity heights of DCT blocks (darker color represents lower height) and (d) transcoded image by adaptive filtering ($\alpha = 30$ and $\beta = 8$).

Fig. 5. Transcoded images of a sports video: (a) an original MPEG image, (b) transcoded image by a constant filtering ($C_c = 4$), (c) discontinuity heights of DCT blocks (darker color represents lower height) and (d) transcoded image by adaptive filtering ($\alpha = 70$ and $\beta = 8$).

The measures that reflect the visual quality for the video in Fig. 4 are listed in Table 1. The top row in each half of the table shows the numeric result transcoded by the constant filtering of $\beta = 4$ or 8, and the other rows represent the results of similar bandwidth with various $\alpha$'s and $\beta$'s for the content-based filtering. All the RMSE results of the adaptive filtering decrease by about 10% compared with the constant filtering. Several pairs of parameters produce a bandwidth similar to that of the constant filtering, such as ($\alpha = 30$, $\beta = 8$), ($\alpha = 70$, $\beta = 12$) or ($\alpha = 100$, $\beta = 16$). Since $\beta$ clips the number of ACs, if we select a large value for $\beta$, the blocks having large heights are described by many ACs, and the other blocks have to be much degraded. On the other hand, if we choose a small value for $\beta$, the blocks having large heights are described by a few ACs, and the other blocks are not much degraded. Although the above consideration is not reflected in either PSNR or RMSE, the visual quality is perceived differently by humans.

Table 1. Average bandwidth (bits per frame), PSNR (dB) and RMSE of 100 I-frames with various $\alpha$'s and $\beta$'s

  α    β   Bandwidth   PSNR    RMSE        α    β   Bandwidth   PSNR    RMSE
  0    4   64401       78.22   0.0406      0    8   89398       81.22   0.0311
  10   8   66959       79.38   0.0347      5    16  89036       82.39   0.0268
  20   8   65972       79.20   0.0352      10   16  88193       82.30   0.0270
  30   8   63420       78.82   0.0364      20   16  87034       82.08   0.0275
  50   12  68388       79.47   0.0343      40   24  91435       82.55   0.0265
  60   12  66310       79.20   0.0352      50   24  86794       81.81   0.0281
  70   12  62606       78.66   0.0370      60   24  84269       81.45   0.0289
  80   16  68728       79.42   0.0345      60   32  92000       82.51   0.0266
  90   16  64999       78.89   0.0362      70   32  88451       81.89   0.0279
  100  16  64436       78.77   0.0366      80   32  85870       81.52   0.0288

The above consideration and the relationship between metrics and parameters are shown in Fig. 6, whose data are obtained from the sports video shown in Fig. 5. Fig. 6(a) denotes the bandwidths (bits per frame) with various $\alpha$'s and $\beta$'s, and the horizontal dotted line represents a value of the constant filtering. In the figure, we select $\beta$ to be 2–4 times larger than $C_c$ to enhance the visual quality by maintaining many ACs for blocks having large heights. For each value of $\beta$, $\alpha$ can be adjusted to match the required bandwidth, and this scheme regulates the output bandwidth more precisely than the constant filtering. Fig. 6(b) shows the PSNRs with various $\alpha$'s and $\beta$'s, and the horizontal dotted line represents a value of the constant filtering. From the graphs in Fig. 6(a) and (b), we find that the adaptive filtering retains a higher PSNR than the constant filtering at the same bandwidth for each value of $\beta$. We also find that the bandwidth of the adaptive filtering is lower than that of the constant filtering at the same PSNR.

Fig. 6. Average bandwidths and PSNRs with various $\alpha$ and $\beta$ (horizontal line represents a value of the constant filtering): (a) bandwidth (bits per frame) and (b) peak signal-to-reconstructed image measure. [Plots: curves for $\beta = 16$, $\beta = 12$, $\beta = 8$ and $C_c = 4$ versus $\alpha$ (0 to 250).]

The processing for the content-based adaptive filtering is not very expensive, as shown in Table 2. The elapsed time is measured on a workstation for 1500 frames of each video. The read and depacketize time is denoted as $T_r$, and the processing time for DCT and IDCT transforms is represented as $T_s$. The filtering times for the constant filtering, the requantization, and the content-based low-pass filtering are denoted as $T_c$, $T_q$ and $T_a$, respectively. It takes 5.43 ms to read a frame, and the filtering time is 85.65 ms using the adaptive filtering for a 352×240 frame. The filtering time is slightly more expensive than the constant filtering (37.95 ms) or the requantization (53.28 ms). However, the processing time for the adaptive filtering is much cheaper than that of DCT conversions, and since there are usually two I-frames per second in common video sequences, the processing time is short enough for real-time applications.

Table 2. Elapsed times of the transcoding processing (ms): $T_r$ is the processing time per frame to read and depacketize the original MPEG stream of 1500 frames, and $T_c$, $T_q$, $T_a$ and $T_s$ denote the average processing time per I-frame (100 frames) of the constant low-pass filtering, the requantization, the content-based low-pass filtering, and IDCT plus DCT conversion, respectively

  Video type         T_r    T_c     T_q     T_a     T_s
  Music (352×240)    5.43   37.95   53.28   85.65   367.41
  Sports (320×240)   5.24   32.80   47.75   79.52   342.39
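The quality measures of Eqs. (8) and (9) can be computed as in the following Python sketch. It follows Eq. (8) as printed, where the root of the summed squared error is divided by N; this differs from the usual root-mean-square definition, and the paper does not define N, so taking N as the number of pixels is an assumption. The function names and the sample images are hypothetical.

```python
import math

def rmse(original, reconstructed):
    """Eq. (8) as printed: sqrt(sum of squared errors) / N.

    N is assumed here to be the number of pixels in the image.
    """
    n = sum(len(row) for row in original)
    squared = sum((p - q) ** 2
                  for row_o, row_r in zip(original, reconstructed)
                  for p, q in zip(row_o, row_r))
    return math.sqrt(squared) / n

def psnr(rmse_value):
    """Eq. (9) for 8-bit images; identical images give infinity."""
    if rmse_value == 0:
        return math.inf
    return 20 * math.log10(255 / rmse_value)

original = [[10, 20], [30, 40]]       # tiny illustrative images
reconstructed = [[12, 20], [30, 37]]

error = rmse(original, reconstructed)
quality = psnr(error)
print(error, quality)
```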
6. Conclusion

Although there have been many schemes for transcoding, few techniques have considered the spatial information embedded in the compressed bitstream. In this paper, we have proposed a content-based transcoding technique using the representative height in a DCT block. As shown in the experiments, the technique is efficient in processing time and maintains higher visual quality by precisely adjusting the bandwidth. This technique can serve as a basis for different architectures and rate-control methods that evaluate the complexity of DCT blocks or pursue real-time processing in the DCT domain. More research is required to formulate the relationship between the transmission bandwidth and the transcoding parameters.
References

[1] P.A.A. Assuncao, M. Ghanbari, A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bit stream, IEEE Trans. Circuits Systems Video Technol. 8 (8) (December 1998) 953–967.
[2] N. Bjork, C. Christopoulos, Transcoder architectures for video coding, IEEE Trans. Consumer Electron. 44 (1) (1998) 88–98.
[3] N. Chaddha, A software only scalable video delivery system for multimedia applications over heterogeneous networks, in: Proceedings of the International Conference on Image Processing, Washington, DC, October 1995.
[4] F. Garcia, D. Hutchison, A. Mauthe, N. Yeadon, QoS support for distributed multimedia communications, in: Proceedings of IFIP/IEEE International Conference on Distributed Platforms, Dresden, Germany, February 1996.
[5] M. Ghanbari, Two-layer coding of video signals for VBR networks, IEEE J. Select. Areas Commun. 7 (1989) 771–781.
[6] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Addison-Wesley, Reading, MA, 1992, pp. 605–606.
[7] ISO/IEC 13818-2, Information technology - generic coding of moving pictures and associated audio information - Part 2: Video, 1995.
[8] Jun Xin, Ming-Ting Sun, Kangwook Chun, Bit-allocation for transcoding of pre-encoded video streams, Visual Commun. Image Process. 4671 (2002) 164–171.
[9] Ligang Lu, Shu Xiao, J.L. Kouloheris, C.A. Gonzales, Efficient and low cost video transcoding, Visual Commun. Image Process. 4671 (2002) 154–163.
[10] C. Loeffler, A. Ligtenberg, G. Moschytz, Practical fast 1-D DCT algorithms with 11 multiplications, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 988–991.
[11] J. Moura, R. Jasinschi, H. Shiojiri-H, C. Lin, Scalable video coding over heterogeneous networks, Proc. SPIE 2602 (1996) 294–306.
[12] Y. Nakajima, H. Hori, T. Kanoh, Rate conversion of MPEG coded video by re-quantization process, in: International Conference on Image Processing, 1995.
[13] B. Shen, I.K. Sethi, Inner-block operations on compressed images, in: ACM Multimedia '95, 1995, pp. 489–498.
[14] B.C. Smith, L.A. Rowe, Algorithms for manipulating compressed images, IEEE Comput. Graphics Appl. 13 (5) (1993) 34–42.
[15] TaeYong Kim, Joon Hee Han, Model-based discontinuity evaluation in the DCT domain, Signal Processing 81 (4) (2001) 871–882.
[16] O.H. Werner, Requantization for transcoding of MPEG-2 intraframes, IEEE Trans. Image Process. 8 (2) (1999) 179–191.
[17] N. Yeadon, F. Garcia, D. Hutchison, D. Shepherd, Continuous media filters for heterogeneous internetworking, in: Proceedings of SPIE-Multimedia Computing and Networking (MMCN'96), 1996.