Content-based video transcoding in compressed domain


Signal Processing: Image Communication 17 (2002) 497–507

Content-based video transcoding in compressed domain☆

TaeYong Kim*, Jong Soo Choi

Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, HukSuk-dong 17, DongJak-gu, Seoul 156-756, Republic of Korea

Received 31 January 2001; received in revised form 15 February 2002; accepted 9 April 2002

☆ This work was supported by the BK21 program from the Ministry of Education and the NRL project from the Ministry of Science and Technology of Korea.
* Corresponding author. Tel.: +82-2-820-5412; fax: +82-2814-5404. E-mail address: [email protected] (TaeYong Kim).
0923-5965/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0923-5965(02)00025-5

Abstract

In this paper, we propose a content-based Moving Picture Experts Group (MPEG) transcoding method that uses a discontinuity feature in the discrete cosine transform (DCT) domain. Each DCT block is transcoded differently depending on the height of the dominant discontinuity within the block. Experiments show that, at the same bandwidth, the video quality of content-based transcoding is better than that of a constant cut-off method, and that the processing time of the adaptive method is much shorter than that of pixel-domain methods. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Transcoding; DCT manipulation; Discontinuity detection; Adaptive filtering

1. Introduction

Various multimedia services, such as teleconferencing, video on demand, and distance learning, have been emerging. However, because heterogeneous networks such as ATM, TCP/IP, wireless and PSTN are interconnected, a network does not guarantee the bandwidth required by those services. The bit rate of an incoming video stream therefore has to be adapted dynamically to match the available bandwidth of the outgoing network [3,11]. Dynamic bit-rate adaptation can be achieved using the scalable coding schemes provided in current video coding standards [5]. However, scalable coding can only provide up to three levels of discrete video quality because of the limit on the number of enhancement layers [7]. In many networked multimedia applications, a much finer scaling capability is desirable. Converting a previously compressed video stream to a lower bit-rate stream through transcoding can provide finer and more dynamic adjustment of the bit rate.

Recent publications on video transcoding have mainly focused on low-cost architectures or bit allocation problems. In [1,2,12], various low-complexity transcoding schemes were described with simple bit scaling methods for rate allocation. Werner [16] studied requantization for MPEG intra-frames. In [8,9], picture complexities based on the quantization scale or on the bits allocated by an encoder were used to estimate the target bit rate. None of these schemes considers the spatial information embedded in the compressed bitstream to reduce artifacts or to enhance visual perception. To achieve dynamic bit-rate adjustment without much degrading the visual quality, content-based transcoding is essential.

In this paper, we propose a dynamic content-based MPEG transcoding method based on a discontinuity feature in the DCT domain. We first detect a representative discontinuity height in a DCT block, and then transcode the block adaptively according to that height. We modify and simplify the previous height evaluation technique [15] to adjust the bit rate of a compressed video stream for real-time content-based transcoding. Compared with a constant cut-off method at the same bandwidth, this scheme enhances the visual quality, and its short processing time makes it suitable for real-time applications.

The remainder of this paper is organized as follows. Section 2 reviews MPEG and the conventional transcoding methods. Section 3 provides the evaluation technique for a discontinuity height. A dynamic content-based MPEG transcoding method is presented in Section 4. Experimental results are presented in Section 5, and Section 6 summarizes the paper and outlines our future plans.

2. MPEG transcoding

In this section, we briefly introduce the MPEG compression technique and the transcoding methods.

2.1. MPEG compression

Video sequences usually contain statistical redundancies in both temporal and spatial directions. The basic statistical property upon which MPEG compression techniques rely is inter-pixel correlation, including the assumption of simple translational motion between consecutive frames. Thus, it is assumed that the magnitude of a particular image pixel can be predicted from nearby pixels within the same frame or from pixels of a nearby frame. The MPEG compression algorithms employ DCT coding techniques on image blocks of 8 × 8 pixels to exploit spatial correlations between nearby pixels within the same image. However, if the correlation between pixels in nearby frames is high, it is desirable to use inter-frame coding techniques employing temporal prediction. MPEG video coding schemes combine temporal motion-compensated prediction with transform coding of the remaining spatial information to achieve high data compression.

An important feature supported by the MPEG-1 encoding algorithms is the possibility to tailor the bit rate to the requirements of specific applications by adjusting the quantizer step size used to quantize the DCT-coefficients. Coarse quantization of the DCT-coefficients achieves a high compression ratio in the storage or transmission of video; however, it may result in significant coding artifacts. The MPEG-1 standard allows the encoder to select different quantizer values for each coded macroblock, which gives a high degree of flexibility to allocate bits where they are needed to improve image quality.

The standardized MPEG-2 scalability supports spatial scalability, SNR scalability, and temporal scalability. The intention of scalable coding is to provide interoperability between different services and to flexibly support receivers with different display capabilities. Receivers that are unable to reconstruct the full-resolution video can decode subsets of the layered bitstream to display video at lower spatial or temporal resolution or with lower quality. However, to support scalability, both the encoder and the decoder must have scalability functions, which are rare in current hardware systems.

2.2. Transcoding schemes

Besides the schemes that adjust the bit rate in the encoding process, a video server also has functions that dynamically control the bit rate for a network that does not maintain the required bandwidth. The process of converting between different compression formats and/or further reducing the bit rate of a previously compressed signal is known as transcoding. When the resources of the outgoing network are limited, the overall transmission performance will be considerably degraded.

In temporal transcoding, a transcoder has knowledge of the frame types and drops frames according to their importance. It is used to reduce the data rate of a stream in a sensible way by discarding a number of frames, transmitting the remaining frames at a slower rate, and maintaining the end-to-end delay requirement.

Another way to achieve a high transcoding ratio is spatial transcoding. Spatial transcoding performs operations in the frequency domain on the values of the DCT-coefficients, which involves entropy decoding/encoding. It reduces the spatial resolution using methods such as low-pass filtering or requantization. In low-pass filtering, the higher-frequency DCT-coefficients are discarded on recoding, leaving only the DC coefficient and a number of low-frequency components. The requantization filter dequantizes the DCT-coefficients and requantizes them using a larger quantizer step. Although spatial transcoding reduces the required bandwidth without affecting the frame rate, it degrades the quality of an image, so it can be combined with temporal transcoding in applications. However, since the previous transcoding schemes do not consider the contents of blocks and filter the blocks with uniform parameters, the visual quality of the transcoded blocks is degraded regardless of their importance [4,17].

3. DCT domain processing

Conventionally, there have been many methods to detect features in the spatial domain. However, to obtain spatial features, the compressed video frames have to be decoded, processed and encoded again. Alternatively, we can manipulate a compressed video directly in the DCT domain without decoding. Algorithms in the DCT domain show computational speedups of 50 or more over the corresponding processing of the uncompressed data [13,14].

In this section, we briefly review our previous work [15], which detects a representative discontinuity height in a DCT block, and suggest a modification for real-time applications.

3.1. Discontinuity position alignment

The compression process in JPEG or MPEG is done on an 8 × 8-block basis. The 8 × 8 FDCT and IDCT are defined mathematically as

$$F(u,v) = \frac{1}{4} C(u) C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{\pi u (2i+1)}{16} \cos\frac{\pi v (2j+1)}{16},$$

$$f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u,v) \cos\frac{\pi u (2i+1)}{16} \cos\frac{\pi v (2j+1)}{16}, \qquad (1)$$

where $C(u), C(v) = 1/\sqrt{2}$ for $u, v = 0$, and 1 otherwise.

In an ideal step discontinuity model, the intensity levels in an 8 × 8 block are separated by a local discontinuity between $j = 3$ and $j = 4$ in Eq. (1), which is formulated by the intensity function $f(i,j) = a$ at $j = 0, 1, 2, 3$ and $f(i,j) = b$ at $j = 4, 5, 6, 7$, where $-127 \le a, b \le 127$. Alignment is the process of shifting an arbitrary discontinuity position $k$ to "4", where $1 \le k \le 7$. To achieve the position alignment, we compensate the values of a given set of DCT-coefficients, which has the same effect as shifting the position of the discontinuity in the spatial domain. The compensation frequency values are derived by $c_k(h,v) = F_4(0,v) - F_k(0,v)$, where $F_k$ are the DCT-coefficients of a block with a discontinuity at position $k$. In the case of $k = 3$, the derivation is

$$c_3(h,v) = F_4(0,v) - F_3(0,v) = \frac{2}{\sqrt{2}}\, h \cos\frac{7\pi v}{16}, \qquad (2)$$

and the compensation frequencies for the rest of the positions can be formulated likewise by changing the frequencies. The compensated (shifted) DCT-coefficients $\hat{F}_k$ are obtained by $\hat{F}_k(0,v) = F(0,v) + c_k(h,v)$, where $1 \le v \le 7$, $h$ is the discontinuity height, and $k$ in $\hat{F}_k(0,v)$ represents the position of the discontinuity before alignment.
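As a numerical sanity check of Eq. (2), the following Python sketch builds ideal step blocks, applies the FDCT of Eq. (1) for u = 0, and compares F_4(0,v) − F_3(0,v) with the closed form. The helper names (`fdct_row0`, `step_block`, `c3`) and the sample intensities are illustrative only. The sign convention h = a − b is an assumption: the text does not state it, but it is the convention under which the closed form matches the definition.

```python
import math

N = 8  # block size

def c(u):
    """Normalisation factor C(u) from Eq. (1)."""
    return 1 / math.sqrt(2) if u == 0 else 1.0

def fdct_row0(block):
    """Return F(0, v) for v = 0..7 using the FDCT of Eq. (1) with u = 0."""
    row = []
    for v in range(N):
        s = sum(block[i][j] * math.cos(math.pi * v * (2 * j + 1) / 16)
                for i in range(N) for j in range(N))
        row.append(0.25 * c(0) * c(v) * s)
    return row

def step_block(a, b, k):
    """Ideal vertical step: intensity a in columns j < k, b in columns j >= k."""
    return [[a if j < k else b for j in range(N)] for _ in range(N)]

def c3(h, v):
    """Closed-form compensation of Eq. (2) for a discontinuity at k = 3."""
    return 2 / math.sqrt(2) * h * math.cos(7 * math.pi * v / 16)

a, b = 40.0, -20.0          # sample intensities (hypothetical values)
h = a - b                   # assumed sign convention for the height
f4 = fdct_row0(step_block(a, b, 4))
f3 = fdct_row0(step_block(a, b, 3))

# Largest deviation between the definition F4 - F3 and the closed form, v = 1..7.
max_err = max(abs((f4[v] - f3[v]) - c3(h, v)) for v in range(1, N))

# Aligning the k = 3 block must reproduce the k = 4 block, whose even AC
# coefficients vanish.
aligned = {v: f3[v] + c3(h, v) for v in range(1, N)}
even_ac = [aligned[v] for v in (2, 4, 6)]
```

With the assumed convention, `max_err` is at the level of floating-point rounding, and the even-indexed aligned coefficients are numerically zero, which is the property the verification step relies on.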


3.2. Alignment verification and height evaluation

For the position verification, we use a symmetry, which is defined as flipping an image about its middle vertical axis and inverting signs in the spatial domain. For an input block $f(i,j)$ and an output block $g(i,j)$, the symmetry can be expressed as $g(i,j) = -f(i, 7-j)$, where $0 \le i, j \le 7$. In the compressed domain, the output block can be computed directly from the input block, i.e., $G(u,v) = -\cos(\pi v) F(u,v)$. If the block has a step discontinuity at $k = 4$, then $G(u,v) = -\cos(\pi v) F(u,v) = F(u,v)$. For even $v$, $\cos(\pi v) = 1$, so this identity can only hold if the coefficient vanishes. So, $\hat{F}(0,2)$, $\hat{F}(0,4)$ and $\hat{F}(0,6)$ must be zero in the aligned coefficients. Thus, the verification measure of the discontinuity position is defined as follows:

$$D_k = \hat{F}_k(0,2)^2 + \hat{F}_k(0,4)^2 + \hat{F}_k(0,6)^2, \qquad (3)$$

where $k = 1, 2, \ldots, 7$. Each position $k$ is verified by $D_k$ with a fixed value of the height $h$. Because $D_k$ has its smallest value at the aligned position regardless of the height, the position $k$ of a dominant discontinuity is detected as the one whose $D_k$ in Eq. (3) is minimal. $D_k$ can be simplified by substituting the fixed (aligned) position for the variable $k$. Since the cosine is even and periodic, $D_k$ can be further expanded by substituting the known position $k$ and the height variable $h$:

$$\begin{aligned}
D_k(h) &= \hat{F}_k(0,2)^2 + \hat{F}_k(0,4)^2 + \hat{F}_k(0,6)^2 \\
&= \left(-c_k(h^*,2) + c_k(h,2)\right)^2 + \left(-c_k(h^*,4) + c_k(h,4)\right)^2 + \left(-c_k(h^*,6) + c_k(h,6)\right)^2 \\
&= \left(-h^* a_{k2} + h a_{k2}\right)^2 + \left(-h^* a_{k4} + h a_{k4}\right)^2 + \left(-h^* a_{k6} + h a_{k6}\right)^2 \\
&= (h^* - h)^2 \left(a_{k2}^2 + a_{k4}^2 + a_{k6}^2\right),
\end{aligned} \qquad (4)$$

where $h^*$ is the real height, $c_k(h,v)$ is the compensation at $k$, and $a_{k2}$, $a_{k4}$ and $a_{k6}$ are constants for the frequencies at $k$. Using the approximated gradient direction described in [15], we can estimate the direction of a discontinuity and rotate the discontinuity in the DCT domain. After rearranging the DCT-coefficients by rotation, since the $D_k$'s for various heights follow a parabolic curve (Eq. (4) is quadratic in $h$) and have a global minimum, we use the method of gradient descent [6] to enhance the performance and to reduce the noisy discontinuities.

4. Content-based MPEG transcoding

In this section, we suggest an adaptive transcoding method based on low-pass filtering whose cut-off value is adjusted dynamically according to the block height.

4.1. Transcoding by low-pass filtering

A sample transmission server that performs transcoding is depicted in Fig. 1. The server first reads an MPEG stream from storage, and then decodes the stream through the sequence of variable length decoding (VLD), dequantization ($Q^{-1}$) and inverse DCT. On the decoded stream, the server applies a filter to adjust the outgoing bit rate. Finally, the server recompresses the stream through the sequence of forward DCT, requantization ($Q$) and variable length coding. In our approach, the DCT and IDCT operations are removed by directly handling the DCT-coefficients.

Fig. 1. Decoding, transcoding and encoding in a dynamic transmission server. (Block diagram: an MPEG stream enters the transcoding server, which consists of a decoding stage (VLD, IQ, IDCT), a transcoding stage (low-pass filtering) and an encoding stage (DCT, Q, VLC), and is sent to the network; the diagram also includes a bit rate checker.)

High-frequency components of a DCT block that stem from noise or non-dominant discontinuities can be removed by a transmission server to reduce bandwidth without much degrading the overall video quality. They are removed by a low-pass filter with a cut-off value $C_c$ [4,17], and the filtering is formulated as follows:

$$AC_i = \begin{cases} AC_i & \text{if } i < C_c, \\ 0 & \text{if } i \ge C_c, \end{cases} \qquad (5)$$

where $0 < i \le 63$. If the cut-off value $C_c$ in Eq. (5) is constant for each frame, the transcoding scheme cannot reflect the contents of a DCT block. Thus, the visual quality of the transcoded blocks is degraded equally for all DCT blocks. To prevent degradation in a visually important block, we suggest a dynamic cut-off function, $C(h)$, using the representative height derived in the previous section. Since a highly contrasted discontinuity with a large height appeals strongly to human visual recognition, a block whose discontinuity height is large needs many ACs to be reconstructed accurately. Otherwise, ACs can be removed without sacrificing much perceived quality. The dynamic function $C(h)$ changes linearly with the height in the interval $[0, \alpha]$, where $\alpha$ is the saturation value that represents the largest height to have the maximum number of AC frequencies. Another parameter, $\beta$, restricts the maximum number of ACs for the purpose of bandwidth reduction. Using the parameters $\alpha$ and $\beta$, the dynamic cut-off function is formulated as follows:

$$C(h) = \begin{cases} \dfrac{\beta - 1}{\alpha}\, h + 1 & \text{if } 0 \le h \le \alpha \text{ and } \alpha \ne 0, \\ \beta & \text{if } h > \alpha \text{ or } \alpha = 0. \end{cases} \qquad (6)$$

By adjusting $\alpha$ we can control the slope of the function, which quantizes the filtering step coarsely or finely. Although the function could be formulated non-linearly in the height values, we devise the filter to be as simple as possible in the height for easy evaluation, as depicted in Fig. 2, and Eq. (5) is modified as follows:

$$AC_i = \begin{cases} AC_i & \text{if } i < C(h), \\ 0 & \text{if } i \ge C(h). \end{cases} \qquad (7)$$

If the height in a block is zero, the cut-off value is 1 and only the DC value can pass the filter. If the height is greater than $\alpha$, $\beta$ is the cut-off value for filtering, which also degrades the image quality. However, because we usually set $\beta$ 2–4 times larger than the constant $C_c$, the image quality produced by the dynamic transcoder is better than that of $C_c$ while preserving the same bandwidth.

Fig. 2. Cut-off values with α and β. (Plot of the cut-off value against the height, from 0 to 250: the cut-off rises linearly from its minimum of 1 at height 0 to β at height α and stays at β for larger heights.)
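The dynamic cut-off of Eq. (6) and the filter of Eq. (7) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the height is assumed to be a non-negative magnitude, and the 64 coefficients of a block are assumed to be indexed in scan order with index 0 holding the DC value (the text only states that the index i runs up to 63).

```python
def cutoff(h, alpha, beta):
    """Dynamic cut-off C(h) of Eq. (6); h is assumed to be non-negative."""
    if alpha == 0 or h > alpha:
        return beta
    # Same value as ((beta - 1) / alpha) * h + 1, written to limit rounding.
    return 1 + (beta - 1) * h / alpha

def lowpass(coeffs, cut):
    """Eq. (7): keep coefficient i when i < cut, set it to zero otherwise.

    coeffs: 64 coefficients of one block, index 0 being DC (assumed ordering).
    """
    return [x if i < cut else 0 for i, x in enumerate(coeffs)]

# Dummy block with 64 non-zero coefficients (hypothetical data).
block = list(range(64, 0, -1))

flat = lowpass(block, cutoff(0, alpha=100, beta=16))    # C(0) = 1: DC only
edge = lowpass(block, cutoff(50, alpha=100, beta=16))   # C(50) = 8.5: i = 0..8
sharp = lowpass(block, cutoff(180, alpha=100, beta=16)) # h > alpha: C = beta
```

Setting `alpha=0` makes `cutoff` return `beta` for every block, which corresponds to the constant cut-off filter of Eq. (5) with C_c = β.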

4.2. Bandwidth adjustment of pre-encoded stream

The procedure of the adaptive transcoding in the DCT domain is summarized as follows:

1. Obtain the DCT-coefficients of the luminance component of a block from the MPEG video.
2. Estimate the direction of a dominant discontinuity. If necessary, rotate the coefficients by inverting the coefficient signs and/or changing the coefficient positions (in the experiments, we use four directions that are multiples of 90°).
3. Align the coefficients and find the discontinuity position using the criterion in Eq. (3).
4. Evaluate the discontinuity height from the aligned position using Eq. (4).
5. Transcode the block according to the height of the discontinuity within a target bit rate.
6. Repeat from step 1 until all blocks in a frame are transcoded.
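Step 2 relies on the fact that rotating a block by a multiple of 90° needs no inverse transform. The sketch below illustrates this with two standard DCT symmetry identities derived from Eq. (1): a clockwise quarter turn g(i,j) = f(7−j, i) gives G(u,v) = (−1)^v F(v,u), and a half turn g(i,j) = f(7−i, 7−j) gives G(u,v) = (−1)^(u+v) F(u,v). The function names and the test block are hypothetical, and the code is meant as a check of the identities rather than as the rotation routine used in the experiments.

```python
import math

N = 8  # block size

def c(u):
    """Normalisation factor C(u) from Eq. (1)."""
    return 1 / math.sqrt(2) if u == 0 else 1.0

def fdct(block):
    """Direct (slow) 8x8 forward DCT of Eq. (1); returns F[u][v]."""
    return [[0.25 * c(u) * c(v) * sum(
                block[i][j]
                * math.cos(math.pi * u * (2 * i + 1) / 16)
                * math.cos(math.pi * v * (2 * j + 1) / 16)
                for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]

def rotate90_dct(F):
    """DCT of g(i,j) = f(7-j, i): swap positions, invert sign for odd v."""
    return [[(-1) ** v * F[v][u] for v in range(N)] for u in range(N)]

def rotate180_dct(F):
    """DCT of g(i,j) = f(7-i, 7-j): invert sign where u + v is odd."""
    return [[(-1) ** (u + v) * F[u][v] for v in range(N)] for u in range(N)]

def max_abs_diff(A, B):
    """Largest element-wise absolute difference between two 8x8 arrays."""
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

# Arbitrary test block (hypothetical data) and its spatially rotated versions.
f = [[(3 * i * i + 5 * j + i * j) % 17 - 8 for j in range(N)] for i in range(N)]
g90 = [[f[N - 1 - j][i] for j in range(N)] for i in range(N)]
g180 = [[f[N - 1 - i][N - 1 - j] for j in range(N)] for i in range(N)]

F = fdct(f)
err90 = max_abs_diff(fdct(g90), rotate90_dct(F))
err180 = max_abs_diff(fdct(g180), rotate180_dct(F))
```

Both errors are at the level of floating-point rounding, so a rotation in the DCT domain costs only 64 sign changes and index swaps per block.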


In our transcoding scheme, the incoming compressed video bitstream is partially decoded: the DCT-coefficients are obtained through VLD and inverse quantization. The transcoder performs low-pass filtering by $C(h)$ to reduce the bit rate. The header and motion information are kept unchanged. The scheme preserves the original motion information and can be combined with the compensation mechanism of [1] to prevent drift errors. It is therefore both low-cost and fast.

The target bandwidth ($B_{out}$) can be estimated from the reduction rate between the bit rate of the input and that of the output, $R(\alpha, \beta) = B_{out}/B_{in} \times 100$, which is shown in Fig. 3. Fig. 3(a) shows the average reduction rate for various α's, obtained by averaging 100 transcoded samples of videos. As α increases, the average bandwidth of a video stream decreases monotonically. Fig. 3(b) shows the average percentage of the reduction rate for various β's in the constant cut-off method (dotted line) and in the content-based method (solid line). When α is zero, the transcoder acts as the constant cut-off filter, which abruptly changes the target bandwidth between 80% and 20% of the input bandwidth within the interval $20 \ge \beta \ge 1$. In contrast, in content-based transcoding we can smoothly adjust $B_{out} = B_{in} \times R(\alpha, \beta)/100$ by combining the scales $0 \le \alpha \le 255$ and $64 \ge \beta \ge 1$. Thus, the parameters α and β for a desired output can be chosen using the average reduction rate, $R(\alpha, \beta) = B_{out}/B_{in} \times 100$, in Fig. 3. For example, when the reduction rate is 60%, the parameters are obtained as α = 100 and β = 64. We can set α = 255 and β = 35 for $R(\alpha, \beta)$ = 40%, which is represented by a * on the dotted line in Fig. 3(b). The overall bit rate regularization can be achieved by adding or subtracting the number of bits under- or over-used so far to the bandwidth of the desired output.

Fig. 3. Average percentage of reduction rate with various α's and β's: (a) reduction rate with various α's (β = 64) in C(h) and (b) reduction rate with various β's (α = 255) in C(h) and C_c. (Plots: reduction rate from 0 to 100 against α from 0 to 250 in (a) and against β from 60 down to 0 in (b).)
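The bit rate regularization described above amounts to carrying the surplus or deficit of bits forward from frame to frame. The following Python sketch shows one way to read that sentence; the function name, the per-frame granularity and the sample numbers are assumptions made for illustration, not details given in the paper.

```python
def frame_budgets(target_bits, used_bits):
    """Return the bit budget offered to each frame.

    target_bits: desired output bits per frame (B_out per frame).
    used_bits:   bits actually produced for each frame, in order.
    The bits under- or over-used so far are added to (or subtracted from)
    the budget of the next frame.
    """
    budgets = []
    carry = 0  # positive: bits saved so far, negative: bits overspent so far
    for used in used_bits:
        budget = target_bits + carry
        budgets.append(budget)
        carry = budget - used
    return budgets

# Hypothetical example: the first frame saves 100 bits, the second frame
# overspends its enlarged budget by 100 bits.
budgets = frame_budgets(1000, [900, 1200, 1000])
```

In this example the three frames are offered 1000, 1100 and 900 bits, so the total stays at the intended 3000 bits even though individual frames deviate from the target.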

5. Experiments

In the experiments, our dynamic transcoding method is applied to two videos obtained from broadcasting, which are compressed by an MPEG-1 standard encoder with the frame structure {I B B P B B P B B P B B P B B}. Each video consists of 1500 frames, including 100 I-frames, and lasts 50 s. We used two rotations (90° and 180°), which cover four directions of a discontinuity. The FDCT and IDCT implementation is based on the algorithm described in [10]. To verify the visual quality, we use RMSE and PSNR, which are formulated as follows:

$$\mathrm{RMSE} = \left[ \frac{\sum_{i,j} \{ f(i,j) - f_c(i,j) \}^2}{N} \right]^{1/2}, \qquad (8)$$

$$\mathrm{PSNR} = 20 \log_{10} \frac{255}{\mathrm{RMSE}}, \qquad (9)$$

which are estimates of the quality of a reconstructed image ($f_c$) compared with an original image ($f$). A reconstructed image is judged better when its PSNR is higher or its RMSE is lower.

To compare the constant filtering with $C_c = 4$ in Eq. (5) with the adaptive $C(h; \alpha, \beta)$ in Eq. (7), we present sample spatial images for a music video and a sports video in Figs. 4 and 5, respectively. The original MPEG images are shown in Figs. 4(a) and 5(a), and the images transcoded by the constant filtering are presented in Figs. 4(b) and 5(b). Since these images were filtered with a constant value for all blocks in each image, some blocks that have sharp contrast (a discontinuity with a large height) are blurred. We manually select the parameters of the adaptive filtering as α = 30 or 70, and β = 8, to preserve a bandwidth similar to that of the constant filtering. Figs. 4(c) and 5(c) show the height values of the DCT blocks, which are evaluated by the DCT alignment described in Section 3. The heights are represented by brightness, and the resultant height maps give a reasonable description except for the blocks having slanted discontinuities. Since the content-based method filters each block by considering its height, it prevents much degradation of the image quality, as shown in Figs. 4(d) and 5(d).

Fig. 4. Transcoded images of a music video: (a) an original MPEG image, (b) transcoded image by a constant filtering (C_c = 4), (c) discontinuity heights of DCT blocks (darker color represents lower height) and (d) transcoded image by adaptive filtering (α = 30 and β = 8).

Fig. 5. Transcoded images of a sports video: (a) an original MPEG image, (b) transcoded image by a constant filtering (C_c = 4), (c) discontinuity heights of DCT blocks (darker color represents lower height) and (d) transcoded image by adaptive filtering (α = 70 and β = 8).

The measures that reflect the visual quality for the video in Fig. 4 are listed in Table 1. The top row in each column shows the numeric result of transcoding by the constant filtering with β = 4 or 8, and the other rows give the results at similar bandwidth for various α's and β's of the content-based filtering. All the RMSE results of the adaptive filtering are about 10% lower than those of the constant filtering. Several pairs of parameters produce a bandwidth similar to that of the constant filtering, such as (α = 30, β = 8), (α = 70, β = 12) or (α = 100, β = 16). Since β clips the number of ACs, if we select a large value for β, the blocks having large heights are described by many ACs, and the other blocks have to be degraded considerably. On the other hand, if we choose a small value for β, the blocks having large heights are described by a few ACs, and the other blocks are not degraded much. Although this consideration is reflected in neither PSNR nor RMSE, the visual quality is perceived differently by humans.

Table 1
Average bandwidth (bits per frame), PSNR (dB) and RMSE of 100 I-frames with various α's and β's

    α     β   Bandwidth    PSNR     RMSE
    0     4       64401   78.22   0.0406
   10     8       66959   79.38   0.0347
   20     8       65972   79.20   0.0352
   30     8       63420   78.82   0.0364
   50    12       68388   79.47   0.0343
   60    12       66310   79.20   0.0352
   70    12       62606   78.66   0.0370
   80    16       68728   79.42   0.0345
   90    16       64999   78.89   0.0362
  100    16       64436   78.77   0.0366

    0     8       89398   81.22   0.0311
    5    16       89036   82.39   0.0268
   10    16       88193   82.30   0.0270
   20    16       87034   82.08   0.0275
   40    24       91435   82.55   0.0265
   50    24       86794   81.81   0.0281
   60    24       84269   81.45   0.0289
   60    32       92000   82.51   0.0266
   70    32       88451   81.89   0.0279
   80    32       85870   81.52   0.0288

This consideration and the relationship between the metrics and the parameters are shown in Fig. 6, whose data are obtained from the sports video shown in Fig. 5. Fig. 6(a) shows the bandwidths (bits per frame) for various α's and β's, and the horizontal dotted line represents the value for the constant filtering. In the figure, we select β 2–4 times larger than $C_c$ to enhance the visual quality by maintaining many ACs for blocks having large heights. For each value of β, α can be adjusted to match the required bandwidth, and this scheme regulates the output bandwidth more precisely than the constant filtering. Fig. 6(b) shows the PSNRs for various α's and β's, and the horizontal dotted line again represents the value for the constant filtering. From the graphs in Fig. 6(a) and (b), we find that, for each value of β, the adaptive filtering retains a higher PSNR than the constant filtering at the same bandwidth. Likewise, the bandwidth of the adaptive filtering is lower than that of the constant filtering at the same PSNR.

Fig. 6. Average bandwidths and PSNRs with various α and β (horizontal line represents a value of the constant filtering): (a) bandwidth (bits per frame) and (b) peak signal-to-reconstructed image measure. (Plots against α from 0 to 250, with curves for β = 16, β = 12 and β = 8 and a horizontal line for C_c = 4.)

The processing for the content-based adaptive filtering is not very expensive, as shown in Table 2. The elapsed time is measured on a workstation for the 1500 frames of each video. The read and depacketize time is denoted as $T_r$, and the processing time for the DCT and IDCT transforms is denoted as $T_s$. The filtering times for the constant filtering, the requantization, and the content-based low-pass filtering are denoted as $T_c$, $T_q$ and $T_a$, respectively. It takes 5.43 ms to read a frame, and the filtering time is 85.65 ms when using the adaptive filtering on a 352 × 240 frame. This filtering time is somewhat higher than that of the constant filtering (37.95 ms) or the requantization (53.28 ms). However, the processing time of the adaptive filtering is much lower than that of the DCT conversions, and since there are usually two I-frames per second in common video sequences, the processing time is short enough for real-time applications.

Table 2
Elapsed times of the transcoding processing (ms): Tr is the processing time per frame to read and depacketize the original MPEG stream of 1500 frames, and Tc, Tq, Ta and Ts denote the average processing time per I-frame (100 frames) of the constant low-pass filtering, the requantization, the content-based low-pass filtering, and the IDCT plus DCT conversion, respectively

  Video type             Tr      Tc      Tq      Ta       Ts
  Music (352 × 240)    5.43   37.95   53.28   85.65   367.41
  Sports (320 × 240)   5.24   32.80   47.75   79.52   342.39
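The two quality measures of Eqs. (8) and (9) are straightforward to compute. The Python sketch below is a minimal illustration under two assumptions that the text does not spell out: N is taken to be the number of pixels in the image, and images are given as equally sized lists of rows of 8-bit intensities. The function names and the tiny sample images are hypothetical.

```python
import math

def rmse(f, fc):
    """Root mean square error of Eq. (8); N is assumed to be the pixel count."""
    n = sum(len(row) for row in f)
    squared_error = sum((x - y) ** 2
                        for row_f, row_c in zip(f, fc)
                        for x, y in zip(row_f, row_c))
    return math.sqrt(squared_error / n)

def psnr(f, fc):
    """PSNR of Eq. (9) in dB for a peak value of 255; infinite if images match."""
    e = rmse(f, fc)
    return math.inf if e == 0 else 20 * math.log10(255 / e)

# Tiny hypothetical example: a 2x2 original and its reconstruction.
original = [[10, 20], [30, 40]]
reconstructed = [[12, 18], [30, 44]]

error = rmse(original, reconstructed)    # sqrt((4 + 4 + 0 + 16) / 4) = sqrt(6)
quality = psnr(original, reconstructed)  # 20 * log10(255 / sqrt(6))
```

A lower `error` and a higher `quality` both indicate a reconstruction closer to the original, which is how the two columns of Table 1 are read.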

6. Conclusion

Although there have been many schemes for transcoding, few techniques have considered the spatial information embedded in the compressed bitstream. In this paper, we have proposed a content-based transcoding technique using the representative height in a DCT block. As shown in the experiments, the technique is efficient in processing time and maintains higher visual quality while precisely adjusting the bandwidth. It can serve as a basis for different architectures and rate-control methods that evaluate the complexity of DCT blocks or pursue real-time processing in the DCT domain. More research is required to formulate the relationship between the transmission bandwidth and the transcoding parameters.

References

[1] P.A.A. Assuncao, M. Ghanbari, A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bit stream, IEEE Trans. Circuits Systems Video Technol. 8 (8) (December 1998) 953–967.
[2] N. Bjork, C. Christopoulos, Transcoder architectures for video coding, IEEE Trans. Consumer Electron. 44 (1) (1998) 88–98.
[3] N. Chaddha, A software only scalable video delivery system for multimedia applications over heterogeneous networks, in: Proceedings of the International Conference on Image Processing, Washington, DC, October 1995.
[4] F. Garcia, D. Hutchison, A. Mauthe, N. Yeadon, QoS support for distributed multimedia communications, in: Proceedings of IFIP/IEEE International Conference on Distributed Platforms, Dresden, Germany, February 1996.
[5] M. Ghanbari, Two-layer coding of video signals for VBR networks, IEEE J. Select. Areas Commun. 7 (1989) 771–781.
[6] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Addison-Wesley, Reading, MA, 1992, pp. 605–606.
[7] ISO/IEC 13818-2, Information technology – Generic coding of moving pictures and associated audio information – Part 2: Video, 1995.
[8] Jun Xin, Ming-Ting Sun, Kangwook Chun, Bit-allocation for transcoding of pre-encoded video streams, Visual Commun. Image Process. 4671 (2002) 164–171.
[9] Ligang Lu, Shu Xiao, J.L. Kouloheris, C.A. Gonzales, Efficient and low cost video transcoding, Visual Commun. Image Process. 4671 (2002) 154–163.
[10] C. Loeffler, A. Ligtenberg, G. Moschytz, Practical fast 1-D DCT algorithms with 11 multiplications, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 988–991.
[11] J. Moura, R. Jasinschi, H. Shiojiri-H, C. Lin, Scalable video coding over heterogeneous networks, Proc. SPIE 2602 (1996) 294–306.
[12] Y. Nakajima, H. Hori, T. Kanoh, Rate conversion of MPEG coded video by re-quantization process, in: International Conference on Image Processing, 1995.
[13] B. Shen, I.K. Sethi, Inner-block operations on compressed images, in: ACM Multimedia '95, 1995, pp. 489–498.
[14] B.C. Smith, L.A. Rowe, Algorithms for manipulating compressed images, IEEE Comput. Graphics Appl. 13 (5) (1993) 34–42.
[15] TaeYong Kim, Joon Hee Han, Model-based discontinuity evaluation in the DCT domain, Signal Processing 81 (4) (2001) 871–882.
[16] O.H. Werner, Requantization for transcoding of MPEG-2 intraframes, IEEE Trans. Image Process. 8 (2) (1999) 179–191.
[17] N. Yeadon, F. Garcia, D. Hutchison, D. Shepherd, Continuous media filters for heterogeneous internetworking, in: Proceedings of SPIE-Multimedia Computing and Networking (MMCN'96), 1996.
