Video transcoding architecture with minimum buffer requirement for compressed MPEG-2 bitstream

Signal Processing 67 (1998) 223–235

Kou-Sou Kan (a, b), Kuo-Chin Fan (a, *)
(a) Institute of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan, ROC
(b) Chung-Hwa Telecom Labs., P.O. Box 71, Chung-Li, Taiwan, ROC
Received 13 December 1996; received in revised form 23 January 1998

Abstract
In this paper, an extremely efficient MPEG-2 video transcoding method is proposed that has a low buffer requirement and therefore low delay. Most importantly, the proposed method does not need computationally intensive motion estimation. Simulation results demonstrate that the proposed approach yields very consistent video quality and maintains the buffer level effectively. Compared with the direct decode/encode approach, the proposed approach suffers only slight quality degradation. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Video transcoding; MPEG; Motion estimation

* Corresponding author. Address: Institute of Computer Science and Information Engineering, National Central University, Chung-Li 32054, Taiwan; tel.: 886-3-4227151 ext. 4453; fax: 886-3-4222681; e-mail: [email protected].
0165-1684/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII S0165-1684(98)00039-5

K.-S. Kan, K.-C. Fan / Signal Processing 67 (1998) 223–235

1. Introduction

MPEG [3–5] is an efficient video coding standard and has been adopted in many video communication applications such as digital libraries, video broadcasting, video on demand and news on demand. In many non-real-time applications, a video source is compressed at a predetermined quality or bit rate, and the compressed video is stored for retrieval or transmission at a later time. To adapt the compressed bit rate to the available channel rate, the compressed bit stream must be decoded and re-encoded, or transcoded, to meet the desired channel rate. Without transcoding, the source material would have to be pre-compressed at a number of possible channel rates, which is not efficient.

The video transcoding work in [10] presents several main concepts of MPEG-2 bit rate scaling. However, system complexity, such as the buffer size requirement, is not addressed in detail. Moreover, the test sequence used there contains only simple panning motion and a limited number of frames, so the coding performance on practical video content with violent motion and scene changes requires further study.

In this paper, the proposed video transcoding strategy is intended for constant bit rate (CBR) video. To transmit video over a fixed-rate channel, a buffering mechanism is required to smooth out the fluctuations in the generated bit rate, and buffer overflow and underflow must be avoided to maintain proper decoding operation. In MPEG coding, buffer control for video coding based on network feedback has been widely discussed [2,6–9]. With a simplified rate-shaping mechanism such as direct scaling of the macroblock quantizer [10], the quality of the transcoded video degrades significantly. The proposed algorithm, in contrast, suffers only slight quality degradation while avoiding high system complexity.

The rest of this paper is organized as follows. The MPEG coding standard is briefly reviewed in Section 2. The proposed video transcoding method is discussed and analyzed in Section 3. Section 4 demonstrates the simulation results. Finally, concluding remarks are given in Section 5.

2. Overview of MPEG-2 coding standard

In MPEG-2 video coding, a picture may be treated as a progressive or interlaced source. Each frame or field is coded in one of the I, P and B modes: an I-picture is intra-coded; a P-picture is forward-predictive coded using information from the previous P- or I-picture; and a B-picture is bidirectionally predictive coded using information from the neighboring previous and future P- or I-pictures. Usually, the encoding order of the pictures differs from the display order indicated in Fig. 1: since a B-picture refers to both its previous and its future pictures, it cannot be encoded until its reference pictures have been coded.

Since an I-picture does not take advantage of interframe correlation, its coding efficiency is relatively low. Furthermore, the I-picture serves as the anchor picture for the subsequent pictures in each group of pictures (GOP), so high-quality coding is required for I-pictures to ensure the overall quality of the GOP. Consequently, an I-picture generally produces a much higher bit count than P- and B-pictures. P-pictures can be coded more efficiently by exploiting temporal redundancy with respect to the previous I- or P-picture. Since a P-picture is often referenced by the next P-picture or by neighboring B-pictures, it must be coded as precisely as possible to keep the quality of the following pictures consistent. B-pictures exploit interframe correlation based on both the previously coded and the future pictures and therefore achieve the most efficient coding. On the other hand, they require both forward and backward motion estimation, which leads to the highest complexity among all coding types. A B-picture may employ motion compensation from the previously coded I- or P-picture and from the next coded I- or P-picture; prediction is called forward if the reference picture is a previous picture and backward if it is a future picture. Therefore, a B-picture has twice the complexity of a P-picture and introduces more delay.
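The reordering described above can be illustrated with a short Python sketch. It is not taken from the paper; it simply assumes a display-order GOP written as a string of picture types (for example the pattern with two B-pictures between anchors assumed in Section 2.1) and holds each B-picture back until the next anchor picture has been emitted.

```python
def coding_order(display):
    """Reorder a display-order GOP such as 'IBBPBBP' into coding order.

    Returns (picture_type, display_index) pairs. Each B-picture is held
    back until the next anchor (I or P) in display order has been emitted,
    because it is predicted from that future anchor.
    """
    out, pending_b = [], []
    for idx, ptype in enumerate(display):
        if ptype == "B":
            pending_b.append((ptype, idx))
        else:  # I or P: an anchor picture
            out.append((ptype, idx))
            out.extend(pending_b)
            pending_b = []
    # B-pictures after the last anchor would need the next GOP's I-picture;
    # this sketch simply emits them last.
    out.extend(pending_b)
    return out
```

For `coding_order("IBBPBBP")` the anchors P3 and P6 are moved ahead of the B-pictures that depend on them, giving the coding order I0 P3 B1 B2 P6 B4 B5.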
After motion compensation, either the prediction error (for inter-coded macroblocks) or the original spatial-domain data (for intra-coded macroblocks) is subjected to further processing, including the discrete cosine transform (DCT), quantization and variable length coding (VLC). The VLC encoding and the different coding decisions result in variable-rate data. To adapt the variable-rate bit stream to a fixed-rate channel, an elastic buffer with proper rate control is required.

Fig. 1. MPEG group coding structure in display order.

2.1. System complexity of direct transcoding based on an MPEG-2 decode/encode pair

In this section, the system complexity of an MPEG-2 codec pair is analyzed; details can be found in [2]. Here, we assume that there are two B-pictures between any two neighboring P- or I-pictures.

2.1.1. Video frame buffer requirement
In decoding, at least three frames of video memory are required to temporarily store the decoded I-, P- and one B-picture. In encoding, four frames of video memory are required to temporarily store the input I-, B-, B- and P-pictures, and an additional two frames of video memory are required to temporarily store the decoded (reconstructed) I- or P-pictures.

2.1.2. Complexity of decoding process
In intra-coded mode, the major processing steps are inverse quantization and inverse DCT. In non-intra-coded mode, the primary processing steps are motion compensation, inverse quantization and inverse DCT. The detailed operation steps can be found in [1,4,5].

2.1.3. Complexity of encoding process
For an I-picture, the major operations are DCT, inverse DCT, quantization, inverse quantization and VLC coding. P- and B-pictures additionally require motion estimation/compensation, which is the most time-consuming operation. A buffer control and feedback mechanism is required to regulate the variable-length-coded bitstream for transmission over a fixed-rate channel. In general, the size of this buffer is proportional to the coding rate, and a larger buffer can absorb more fluctuation. The detailed processing steps can also be found in [1,4,5].

In a direct decode/encode transcoding algorithm, the overall system complexity is the sum of the decoder complexity and the encoder complexity, and the total frame memory required is the sum of the frame memories of the decoder and the encoder. The cost of such a transcoder would be very high. Consequently, a simple but effective transcoding technique is proposed.
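As a rough illustration of the frame memory counted in Section 2.1.1, the sketch below totals the frame stores of a direct decode/encode transcoder and converts them to bytes. The 720 × 576 picture size (625-line CCIR 601) and 8-bit 4:2:0 sampling are assumptions made for the example, not figures given in this section.

```python
def frame_bytes(width, height):
    """Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two
    quarter-size chroma planes, i.e. 1.5 bytes per pixel."""
    return width * height * 3 // 2


# Frame stores of a direct decode/encode transcoder (Section 2.1.1)
DECODER_FRAMES = 3        # decoded I, P and one B
ENCODER_INPUT_FRAMES = 4  # input I, B, B, P
ENCODER_RECON_FRAMES = 2  # reconstructed I/P references

direct_frames = DECODER_FRAMES + ENCODER_INPUT_FRAMES + ENCODER_RECON_FRAMES
# Assumed 625-line CCIR 601 picture size
direct_bytes = direct_frames * frame_bytes(720, 576)
```

Under these assumptions one frame occupies 622,080 bytes, so the nine frame stores of the direct transcoder need 5,598,720 bytes.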

3. Architecture of the low complexity and low delay video transcoding

3.1. Primary architecture of video transcoding

The aim of this research work is to scale a compressed video stream to match a wide range of channel bandwidths while maintaining picture quality. To allow the scaled bit stream to be decoded by a standard MPEG-2 decoder, the scaled bit stream must be compatible with the MPEG-2 syntax. Furthermore, the end-to-end delay of the scaling operation should be as low as possible, since it adds to the overall delay. As discussed earlier, the complexity of direct decode/encode transcoding is very high, and a low complexity system can reduce the cost. In the proposed primary transcoding architecture, the decode/encode process is designed to eliminate time-consuming motion estimation. To achieve low delay, a small buffer with sufficient bit rate control is employed. The detailed algorithm is described as follows.

The block diagram of the proposed video transcoder is illustrated in Fig. 2. In the proposed scheme, the bit stream is first decoded by the regular MPEG-2 decoding process. The reconstructed pictures as well as the coding parameters are stored in the video memory and registers as references for subsequent motion compensation. In encoding, the process refers to this decoded information in order to avoid time-consuming operations as much as possible: the motion vectors, the macroblock type, the quantization step size of each slice and macroblock, the bit counts of each macroblock and slice, and the scaled target bit rate can all be obtained from the decoding operation without further computation. Note that motion estimation is omitted from the proposed architecture to save its massive computational cost; the decoded motion vectors are reused to represent the motion of the objects. Reusing the decoded motion vectors does, however, create extra prediction error in a high-quality to low-quality conversion, so a further macroblock decision based on the newly reconstructed picture is necessary to correct the coding type. In the proposed strategy, the additional prediction error and the quantization-related overhead bits are combined with the decoded macroblock type to determine the transcoded macroblock type. The encoding and decoding units share the same physical video memory, which keeps the system very compact and greatly simplifies the coding architecture. Table 1 outlines the hardware complexity reduction of the proposed architecture relative to the traditional MPEG-2 decode/encode pair.

Fig. 2. Transcoding system architecture.

Table 1
Comparison of transcoding complexity between the proposed approach and the direct decode/encode approach

Decoding section
  Minimum buffer video transcoder: IVLC, IDCT, IQuant., MC, 3-frame memory (I, P, 1 B)
  Direct decode/encode transcoder: IVLC, IDCT, IQuant., MC, 3-frame memory (I, P, 1 B)

Encoding section
  Minimum buffer video transcoder: DCT/IDCT, MC, Quant./IQuant., simple MB coding decision, VLC, small buffer for regulating the bits created by I, P, B VLC, 2-frame memory (for coded I and P)
  Direct decode/encode transcoder: DCT/IDCT, MC/ME, Quant./IQuant., full MB coding decision, VLC, large buffer for regulating the bits created by I, P, B VLC, 4-frame memory for input source (I, P, 2 B), 2-frame memory (for coded I and P)

Total frame memory required
  Minimum buffer video transcoder: 5 frames
  Direct decode/encode transcoder: 9 frames

The primary transcoding architecture described above provides only a fundamental rate adjustment, equivalent to direct scaling of the quantizer. Consequently, the extra prediction error may greatly increase the bit count differences among the picture coding types [10], and irregular quality of service cannot be avoided. Hence, a buffer control mechanism is required to regulate the bit stream and reduce visual distortion before transmission over the attached fixed-rate channel. As indicated previously, a standard MPEG-2 encoder uses a large buffer to balance the data rate difference between the channel and the variable length coder. This difference is caused by the different picture coding types and the varying scene content. The buffering delay varies with the buffer size and scene variation, and a larger buffer can produce a longer end-to-end delay. In practice, a program center may serve many end offices, and each end office serves numerous clients with different program streams. Therefore, the processing delay must be constrained within a limited time period; otherwise it is difficult to maintain a given level of quality of service at the client site. Reducing the irregular quality of service caused by different buffer sizes is one of the major objectives of the proposed strategy. In the proposed transcoding strategy, the time delay is reduced remarkably through a more efficient strategy and arrangement. The knowledge-based buffer control strategy is described in detail as follows.
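The link between buffer size and delay can be made concrete with a back-of-the-envelope calculation: a full constant-rate buffer needs roughly its size divided by the channel rate to drain. The Python sketch below assumes 1 kbyte = 1024 bytes and a 3 Mbps channel (one of the target rates used in Section 4); it is an approximation, not the delay model of the paper.

```python
def drain_time_s(buffer_kbytes, channel_bps):
    """Seconds needed to empty a full buffer at a constant channel rate.

    Assumes 1 kbyte = 1024 bytes; channel_bps is in bits per second.
    """
    return buffer_kbytes * 1024 * 8 / channel_bps


# Buffer sizes from Section 4, drained by an assumed 3 Mbps channel
delays_ms = {size: 1000 * drain_time_s(size, 3e6) for size in (16, 32, 64, 224)}
```

With these assumptions a 16 kbyte buffer drains in about 43.7 ms, whereas the regular 224 kbyte MPEG-2 buffer needs about 611.7 ms.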

3.2. Knowledge-based transcoding buffer control strategy

The proposed knowledge-based buffer control strategy is based on dynamic measurement and evaluation of the macroblock and slice bit counts. An initial approximation of the macroblock-level quantizer is established first. The macroblock quantizer is then further adjusted according to the picture coding type and the information content of the different pictures. The fundamental concept of this strategy is to adjust the macroblock quantizer based on the measured bit count variation within and between macroblock rows (slices). Moreover, continuous monitoring of the buffer status avoids buffer overflow and underflow. The objective is to preserve consistent reconstructed picture quality while satisfying the allocated buffer constraint. Unlike a direct MPEG-2 decode/encode pair, the proposed strategy reuses the existing decoded motion information for motion compensation. The program statements for the macroblock quantizer approximation are as follows.

$$
\begin{aligned}
&mbquant_{s1} = mbquant_{decoded} \times \frac{input\_rate}{output\_rate} \\
&\text{if } slice\_bitcount_{decoded} > slice\_bitcount_{scale}: \\
&\qquad mbquant_{scale} = \begin{cases} mbquant_{s1} \times \dfrac{MB\_bitcount_{decoded}}{MB\_bitcount_{scale}} \times 2, & picture\_type \neq I\_TYPE \\ mbquant_{s1}, & \text{otherwise} \end{cases}
\end{aligned}
\tag{1}
$$

where

$$
\begin{aligned}
slice\_bitcount_{scale} &= \frac{output\_rate / frame\_rate}{macroblock\_height} \\
MB\_bitcount_{scale} &= \frac{slice\_bitcount_{scale}}{macroblock\_width} \\
macroblock\_height &= \frac{picture\_vertical\_size}{macroblock\_vertical\_size} \\
macroblock\_width &= \frac{picture\_horizontal\_size}{macroblock\_horizontal\_size}
\end{aligned}
\tag{2}
$$

Here macroblock_width is the number of macroblocks in a slice and macroblock_height is the number of slices (macroblock rows) in a picture.

As shown in program statement (1), mbquant_decoded is extracted during decoding. The ratio between input_rate and output_rate is first used to scale the decoded macroblock quantizer step size, giving the approximation mbquant_s1. The coded macroblock and slice bit counts, MB_bitcount_decoded and slice_bitcount_decoded, are also obtained from the decoding process, from which the bit count variation trend of each macroblock and slice can be deduced approximately. If the slice bit count slice_bitcount_decoded obtained from the decoding process exceeds the scaled target slice_bitcount_scale, then, for P- and B-pictures, mbquant_s1 is further adjusted by the ratio of the input high-quality decoded macroblock bit count MB_bitcount_decoded to the output low-quality scaled macroblock bit count MB_bitcount_scale, and becomes mbquant_scale. The goal of these operations is to establish an initial approximation of the macroblock quantizer. By combining it with the buffer dynamics of the macroblock and its neighboring slices, an additional adjustment of the macroblock quantizer step size with global/local resource distribution is devised, as described below. For every input target macroblock, the final transcoding quantizer step size mbquant_transcode is adjusted according to the picture coding type, and is defined as follows.

The same rule is used for I-, P- and B-pictures, with constants c1–c4 that depend on the picture coding type:

$$
mbquant_{transcode}(j) = \frac{1}{rateratio}\bigl(slice\_quant + mbquant_{scale}(j)\bigr)\, buf\_state(j-1) + c \cdot \frac{slice\_bitcount(i-1)}{slice\_bitcount_{scale}}
$$

$$
c = \begin{cases}
c_1, & buf\_state(j-1) > rateratio \text{ and } A \\
c_2, & buf\_state(j-1) > rateratio \text{ and not } A \\
c_3, & buf\_state(j-1) \le rateratio \text{ and } A \\
c_4, & buf\_state(j-1) \le rateratio \text{ and not } A
\end{cases}
$$

where A denotes the condition $slice\_bitcount(i-1) > slice\_bitcount_{scale}$ or $MB\_bitcount_{accumu\_slice}(j-1) > MB\_bitcount_{scale\_slice}(j-1)$. The quantizer and the buffer are then updated as

$$
\begin{aligned}
mbquant_{transcode}(j) &\leftarrow mbquant_{transcode}(j) \times temp\_rate\_ratio(k-1) \\
buffer\_content(j) &= buffer\_content(j-1) + MB\_bitcount(j) - data\_rate
\end{aligned}
\qquad i \in [1, mb\_height],\; j \in [1, mb\_width]
\tag{3}
$$

$$
\begin{aligned}
k &= frame\_sequence\_no \bmod 30 \\
MB\_bitcount_{accumu\_slice}(j) &= \sum_{l=1}^{j} MB\_bitcount(l) \\
MB\_bitcount_{scale\_slice}(j) &= \sum_{l=1}^{j} MB\_bitcount_{scale} \\
slice\_bitcount(i) &= \sum_{l=1}^{mb\_width} mb\_count(l+i) \\
buf\_state(j) &= buffer\_content(j) / buffer\_size \\
data\_rate &= output\_rate / (macroblock\_width \times macroblock\_height) \\
slice\_quant &= \frac{1}{macroblock\_width} \sum_{l=1}^{macroblock\_width} mbquant_{scale}(l) \\
rateratio &= output\_rate / input\_rate \\
temp\_rate\_ratio(k) &= \frac{\sum_{l=1}^{k} frame\_bitcount_{coded}(l)}{(output\_rate / frame\_rate) \times k}
\end{aligned}
\tag{4}
$$
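The initial approximation of program statements (1) and (2) can be sketched in Python as follows. This is a hypothetical transcription: the class RateSetup, the function approx_mbquant_scale and the "I"/"P"/"B" picture-type strings are illustrative names, and a slice that stays within its budget (a case statement (1) leaves unspecified) is assumed to keep mbquant_s1. No clipping to the legal MPEG-2 quantizer range is applied.

```python
from dataclasses import dataclass


@dataclass
class RateSetup:
    input_rate: float    # bit/s of the pre-coded stream
    output_rate: float   # target bit/s after transcoding
    frame_rate: float    # pictures/s
    mb_rows: int         # macroblock_height: slices (MB rows) per picture
    mb_cols: int         # macroblock_width: macroblocks per slice

    @property
    def slice_bitcount_scale(self):
        # Eq. (2): per-slice share of the per-picture output budget
        return (self.output_rate / self.frame_rate) / self.mb_rows

    @property
    def mb_bitcount_scale(self):
        # Eq. (2): per-macroblock share of the slice budget
        return self.slice_bitcount_scale / self.mb_cols


def approx_mbquant_scale(setup, mbquant_decoded, mb_bitcount_decoded,
                         slice_bitcount_decoded, picture_type):
    """Eq. (1): initial quantizer approximation for one macroblock."""
    mbquant_s1 = mbquant_decoded * (setup.input_rate / setup.output_rate)
    if (slice_bitcount_decoded > setup.slice_bitcount_scale
            and picture_type != "I"):
        return mbquant_s1 * (mb_bitcount_decoded / setup.mb_bitcount_scale) * 2
    # I-pictures keep mbquant_s1; slices within budget are assumed to as well.
    return mbquant_s1
```

For example, `RateSetup(8e6, 4e6, 25, 36, 45)` describes an assumed 720 × 576 picture (45 × 36 macroblocks) scaled from 8 to 4 Mbps, so a decoded quantizer of 3 first becomes 6.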

As shown in program statements (3), the buffer fullness buf_state is examined at the macroblock level. The buffer content varies according to (i) the bits remaining in the buffer after the previous macroblock, buffer_content(j−1), (ii) the coded bit count of the target macroblock, MB_bitcount(j), and (iii) the channel data rate per macroblock, data_rate, which is a fixed number. The target macroblock quantizer step size is adjusted according to three quantities: (i) the approximated mbquant_scale obtained from input_rate and output_rate as expressed in Eq. (1), (ii) the bit count of the previous slice, slice_bitcount(i−1), and (iii) the accumulated coded bit count of the present slice, MB_bitcount_accumu_slice. All picture coding types share the same evaluation strategy with different adjusting parameters. Basically, coarser quantization is activated in the encoding process when either (i) the coded bit count of the previous slice, slice_bitcount(i−1), is larger than the default slice_bitcount_scale, or (ii) the accumulated coded macroblock bit count of the present slice, MB_bitcount_accumu_slice, is larger than the accumulated default value MB_bitcount_scale_slice. Conversely, the target macroblock acquires more bits for finer quantization from the corresponding slice and/or the preceding neighboring slice when the bit utilization of these slices is below the allocated number. Note that no extra computation of DCT coefficient statistics is required here. The bit rate scaling is additionally restricted by adapting the macroblock quantizer step size mbquant_transcode to the buffer dynamics buf_state: mbquant_transcode is replaced by a coarser quantization when buf_state approaches a certain buffer fullness. Here, slice_quant is the average macroblock quantizer step size within the individual slice. To keep the scaled rate within the desired value, the macroblock quantizer is further adjusted according to the temporal bit rate variation within each time period before coding, as shown in the quantizer update at the end of (3). The parameters c1–c4 are selected empirically in order to demonstrate the performance of the proposed transcoding strategy.
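A Python sketch of the macroblock-level rule in (3), together with the helper quantities of (4), is given below. It is a hypothetical transcription rather than the authors' code: the function names are illustrative, the constants are the c1–c4 values listed in Table 2, and an empty rate window (no frame coded yet in the current 30-frame period) is assumed to give temp_rate_ratio = 1.

```python
from itertools import accumulate

# c1..c4 per picture coding type, transcribed from Table 2.
C_TABLE = {
    "I": (2.0, 2.5, 2.5, 3.0),
    "P": (6.0, 6.5, 8.0, 10.0),
    "B": (10.0, 12.0, 12.0, 14.0),
}


def accumulated_mb_bits(mb_bitcounts):
    """MB_bitcount_accumu_slice(j) for j = 1..len: running sum of coded MB bits."""
    return list(accumulate(mb_bitcounts))


def slice_quant(mbquant_scale_row):
    """Average approximated quantizer step size over one slice."""
    return sum(mbquant_scale_row) / len(mbquant_scale_row)


def temp_rate_ratio(frame_bitcounts, output_rate, frame_rate):
    """Coded bits of the frames in the current window over their bit budget."""
    k = len(frame_bitcounts)
    if k == 0:
        return 1.0  # assumption: no correction before any frame is coded
    return sum(frame_bitcounts) / ((output_rate / frame_rate) * k)


def mbquant_transcode(picture_type, rateratio, slice_q, mbquant_scale_j,
                      buf_state_prev, slice_bits_prev, slice_bitcount_scale,
                      accumu_prev, scale_slice_prev, temp_ratio_prev):
    """Quantizer for macroblock j from the state after macroblock j-1."""
    c1, c2, c3, c4 = C_TABLE[picture_type]
    # Condition A: previous slice, or current slice so far, exceeded its budget.
    over_budget = (slice_bits_prev > slice_bitcount_scale
                   or accumu_prev > scale_slice_prev)
    if buf_state_prev > rateratio:
        c = c1 if over_budget else c2
    else:
        c = c3 if over_budget else c4
    q = ((1.0 / rateratio) * (slice_q + mbquant_scale_j) * buf_state_prev
         + c * (slice_bits_prev / slice_bitcount_scale))
    # Temporal correction by the rate drift of the preceding frames.
    return q * temp_ratio_prev


def update_buffer(buffer_content, mb_bitcount, data_rate, buffer_size):
    """New buffer content and fullness buf_state; not clamped, as in (3)."""
    content = buffer_content + mb_bitcount - data_rate
    return content, content / buffer_size
```

Keeping the state after macroblock j−1 as explicit arguments mirrors the causal structure of (3): every input is already known when macroblock j is about to be coded.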

4. Simulation results

The test video sequences are generated by a commercially available real-time MPEG-2 encoder.

Simulation results are reported in Figs. 3–5, and the parameters used in the simulation are listed in Table 2. Four test sequences are chosen: test sequence-1, the movie 'Back to the Future 3'; test sequence-2, the movie 'Basic Instinct', part 1; test sequence-3, the movie 'Basic Instinct', part 2; and test sequence-4, the movie 'Top Gun'. The input source video is digitized at CCIR 601 resolution in 4:2:0 format, and 450 consecutive frames are used to evaluate the proposed strategy. The original compressed bitstream is coded at 8 Mbps using the Optivision real-time MPEG-2 encoder Vstor-40. Target rates of 6, 5, 4 and 3 Mbps are produced with the proposed minimum buffer transcoding strategy. Four different buffer sizes are used in the simulation: 16 kbytes, 32 kbytes, 64 kbytes and the regular MPEG-2 size of 224 kbytes (112 × 16 kbits).
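The buffer sizes and rate ratios used in the simulations follow from simple arithmetic, shown here in Python. The sketch assumes that the MPEG-2 vbv_buffer_size unit of 16 kbits means 16 × 1024 bits, which reproduces the 224 kbytes quoted above; the rateratio values are output_rate/input_rate from (4) for the 8 Mbps source.

```python
KBIT = 1024  # assumed: MPEG-2 counts vbv_buffer_size in units of 16 * 1024 bits

vbv_bits = 112 * 16 * KBIT          # regular MPEG-2 buffer: 112 units
vbv_kbytes = vbv_bits // 8 // 1024  # bits -> bytes -> kbytes

input_rate = 8e6  # bit/s of the original stream
# rateratio = output_rate / input_rate for each target rate (Mbps)
rateratio = {target: target * 1e6 / input_rate for target in (6, 5, 4, 3)}
# input_rate / output_rate, the factor applied to the decoded quantizer in (1)
quant_scale_factor = {t: 1 / r for t, r in rateratio.items()}
```

For the 3 Mbps case this gives rateratio = 0.375, so the decoded quantizer is initially multiplied by 8/3, roughly 2.67.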

Figs. 3–5 show the experimental results for scaling to 3 Mbps with a 16 kbyte buffer: the maximum buffer content of each frame, the actual scaled rate within each time period (one second), and the PSNR of each frame. Here the buffer status buf_state is examined at the macroblock level, and the maximum buffer content is defined as the maximum value of buf_state within each frame. In Fig. 3, the maximum buffer content is well restricted to the specified size of 16 kbytes. High buffer occupancy is usually caused by I-pictures and P-pictures, since a common MPEG-2 encoder assigns finer quantization to I- and P-pictures for better reconstructed picture quality. Both I- and P-pictures serve as references for the interframe coding of subsequent P- and B-pictures, and hence they hold a large number of bits within each time unit.

Fig. 3. Maximum buffer content (bits) for scaling from 8 Mbps to 3 Mbps with a 16 kbyte buffer.

Fig. 4. SNR (dB) of consecutive coded frames for scaling from 8 Mbps to 3 Mbps with a 16 kbyte buffer.

The proposed transcoding strategy evaluates only the information decoded from the present frame; information from the surrounding frames is not used, so the coded rate is determined solely by the decoded frame. As described by (3) and (4), the available bit resources are efficiently distributed by the proposed knowledge-based buffer control strategy: the coding resources of macroblocks with low prediction error can be allocated to macroblocks with high prediction error within the same slice or the preceding neighboring slice. Hence, the available coding resources go to the macroblocks with high prediction error for finer quantization. The total bit count can be held within the desired amount while still maintaining excellent visual quality, and the potential buffer expansion caused by additional prediction error and content transitions is appropriately overcome. In addition, the entire operation is much simpler than direct decode/encode processing.

Except under intense scene transitions, consistent reconstructed picture quality is achieved by the proposed knowledge-based adaptive rate control mechanism without motion estimation. As shown in Figs. 3–5, the maximum buffer content varies uniformly and is kept within the desired level for most of the reconstructed pictures, the PSNR is maintained at a high level, and the resulting coded rate is stable. For further comparison, a direct decode/encode transcoder based on MPEG-2 TM5 is used: the four 8 Mbps test sequence streams are decoded into consecutive separate pictures, and the TM5-based encoding process is subsequently employed to generate 3, 4, 5 and 6 Mbps compressed streams. The average PSNR values of the proposed strategy and the direct decode/encode strategy are compared in Table 3. The objective average PSNR of the proposed algorithm is considered comparable to that of the direct decode/encode method.

Fig. 5. Actual coded bit rate (Mbps) for scaling from 8 Mbps to 3 Mbps with a 16 kbyte buffer.

On average, the PSNR difference is below 4 dB, even in the worst case. Since the direct decode/encode process uses full motion estimation and compensation together with adaptive rate control to evaluate each picture, it can achieve the optimal objective PSNR for the compressed stream. However, it is computationally heavy and can usually serve only a very limited number of users. A common program center serves several end office sites, and each end office site provides links to several thousand users. Hence, direct decode/encode is the most expensive option in common digital video delivery services. The proposed strategy achieves comparable quality with a much more efficient strategy and a simplified architecture.

In the proposed strategy, reducing the quality irregularity caused by different buffer sizes is the most important objective. As shown in Table 3, the different buffer sizes yield similar objective PSNR for a fixed rate conversion. In a high-quality to very-low-quality down-conversion, additional prediction error is created around neighboring scene transitions. If the sequence contains more than one scene transition, a large buffer can temporarily produce higher picture quality for the target frame and its neighboring frames. However, extra bits are also created, and coarse quantization must then be activated in subsequent coding to achieve the target rate. This is a trade-off between reconstructed quality and channel rate: the average PSNR does not increase noticeably, and the quality of service may become inconsistent. Besides, most consecutive pictures exhibit only gradual transitions, for which a small buffer is enough to regulate the data rate difference; therefore, the quality improvement from a large buffer is minor. Furthermore, MPEG-2 separates consecutive pictures into independent groups, so the improvement in average PSNR is confined within each group, and the PSNR of the entire sequence is not noticeably increased. Unlike MPEG-2 TM5, the proposed strategy does not consider the resource distribution among neighboring frames or the statistics of each transformed macroblock. Therefore, the available data rate and the resource evaluation of each picture are restricted to the decoded information. Unless a motion search is activated to obtain new motion vectors and reduce the prediction error for compensation, the quality improvement from a large buffer is very limited in the proposed strategy. Nevertheless, consistent and impressive reconstructed picture quality is achieved by the proposed transcoding without any motion search. Since transcoding is not a regular service but is designed to overcome unexpected network congestion, sustaining service, reducing end-to-end processing delay and preserving good and uniform quality of service are the highest objectives of transcoding. As soon as the network congestion is over, the admitted quality of service should be restored.

Table 2
List of parameters
1) Original high-quality coded bit rate: 8 Mbps
2) Target transcoded bit rate: 3 Mbps, 4 Mbps, 6 Mbps
3) Buffer size: 8 kbytes, 16 kbytes
4) Constants c1–c4:
   for I-picture: c1 = 2, c2 = 2.5, c3 = 2.5, c4 = 3
   for P-picture: c1 = 6, c2 = 6.5, c3 = 8, c4 = 10
   for B-picture: c1 = 10, c2 = 12, c3 = 12, c4 = 14

Table 3
Average PSNR (dB) comparison with different buffer sizes and direct transcoding

                          Scaled rate (Mbps)
Buffer size (kbytes)      3        4        5        6

(a) Test sequence-1
16                        41.06    42.29    43.39    44.59
32                        41.07    42.34    43.49    44.85
64                        41.19    42.45    43.64    45.06
224                       41.2     42.54    43.72    45.28
Direct transcode          44.59    46.54    47.17    48.58

(b) Test sequence-2
16                        45.26    45.59    47.54    48.77
32                        45.46    46.73    47.71    48.99
64                        45.47    46.8     47.75    49.05
224                       45.48    46.92    47.81    49.1
Direct transcode          49.21    50.41    50.85    51.51

(c) Test sequence-3
16                        45.71    46.81    47.78    48.5
32                        45.74    46.95    47.94    48.7
64                        45.78    47.09    48.03    48.71
224                       45.79    47.19    48.1     48.76
Direct transcode          49.16    50.37    50.92    51.6

(d) Test sequence-4
16                        40.63    42.16    43.40    44.73
32                        40.64    42.17    43.44    44.76
64                        40.65    42.18    43.51    44.78
224                       40.71    42.23    43.59    44.84
Direct transcode          40.87    42.73    43.68    45.38
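For reference, the PSNR figures in Table 3 and Fig. 4 follow the usual definition 10 log10(peak^2 / MSE). The Python sketch below assumes 8-bit samples (peak = 255) and takes the sequence average as the mean of the per-frame PSNR values; the paper does not state which picture component or averaging convention was used.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


def average_psnr(frame_psnrs):
    """Assumed convention: arithmetic mean of the per-frame PSNR values."""
    return sum(frame_psnrs) / len(frame_psnrs)
```

Averaging per-frame PSNR values weights every frame equally; averaging the MSE first and converting once would give a somewhat different figure, which is why the convention is flagged as an assumption.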

5. Summary and conclusions

In this paper, we derived a unique MPEG-2 video transcoding architecture that exploits a direct and simple knowledge-based bit rate control algorithm to accurately scale a pre-coded MPEG-2 bitstream to the desired data rate. In addition, the entire system complexity, including the transcoding buffer size and the coding operations, is kept to a minimum. This strategy strictly conforms to the MPEG-2 Main Profile at Main Level standard, and no extra user data is required to support the processing. In general, the processing time of each group of pictures is about one tenth of that of the regular direct decode/encode architecture. The scaled bitstream is fully compliant with commercially available real-time MPEG-2 decoders. The details of the proposed video transcoding architecture with minimum buffer requirement for compressed MPEG-2 streams are described in this paper. The experimental results indicate that the knowledge-based bit rate control architecture proposed in this paper provides the scaled video sequence with a stable output rate and consistent perceptual quality. The coding results are considered comparable to the optimal solution, the direct decode/encode bitstream scaling method. In the regular direct decode/encode bitstream scaling method, time-consuming motion estimation must be used to generate the required bitstream; the optimal data rate and perceptual quality can then be achieved, but at double the system complexity. Hence, it is not suitable for public service applications that must provide extensive access over different transmission facilities. A simple alternative is to scale the transform-coded DCT coefficients proportionally; this reaches the required data rate but causes unacceptable perceptual quality. The proposed knowledge-based bit rate control strategy successfully overcomes the buffer overflow and underflow problems under very small buffer sizes. Since the coding resources are evaluated intelligently and distributed efficiently, the possible perceptual distortion can be reduced or even avoided. Therefore, the visual quality of successive frames remains consistent, and the scaled rate is well controlled.

References
[1] C.T. Chen, Error detection and concealment with an unsupervised MPEG2 video decoder, Journal of Visual Communication and Image Representation 6 (3) (September 1995) 265–279.
[2] A.W. Chen, A self-governing rate buffer control strategy for pseudoconstant bit video coding, IEEE Trans. Image Process. 2 (1) (January 1993) 50–59.
[3] ISO/MPEG, Committee draft for MPEG video coding standard, CD 11172-2, November 1993.
[4] ISO-IEC/JTC1/SC29/WG11, MPEG II, Test Model 5, April 1993.
[5] ISO-IEC/JTC1/SC29/WG11, Information technology: Generic coding of moving pictures and associated audio information, Part 2: Video, Draft International Standard ISO/IEC DIS 13818-2, October 1996.
[6] ISO/IEC JTC1/SC29/WG11, Information technology: Coding of moving pictures and associated audio: Digital storage media command and control, Committee Draft, June 1995.
[7] M. Kawashima, C.T. Chen, F.C. Jeng, S. Singhal, Adaptation of the MPEG video-coding algorithm to network application, IEEE Trans. Circuit Systems for Video Tech. 3 (4) (August 1993) 261–269.
[8] MPEG video SM Editorial Group, ISO-IEC/JTC1/SC2/WG11/MPEG90/041, MPEG video simulation model Three, July 1990.
[9] A. Puri, R. Aravind, B.G. Haskell, R. Leonardi, Video coding with motion-compensated interpolation for CD-ROM applications, Signal Processing: Image Communication 2 (August 1990) 127–144.
[10] W.K. Sun, J.W. Zdepski, Architectures for MPEG compressed bitstream scaling, IEEE Trans. Circuit Systems Video Tech. 6 (2) (August 1996) 191–199.