Accepted Manuscript

Review

Complexity-based Intra Frame Rate Control by Jointing Inter-Frame Correlation for High Efficiency Video Coding

Mingliang Zhou, Yongfei Zhang, Bo Li, Hai-Miao Hu

PII: S1047-3203(16)30239-5
DOI: http://dx.doi.org/10.1016/j.jvcir.2016.11.013
Reference: YJVCI 1898
To appear in: J. Vis. Commun. Image R.
Received Date: 3 June 2016
Revised Date: 8 November 2016
Accepted Date: 16 November 2016
Please cite this article as: M. Zhou, Y. Zhang, B. Li, H-M. Hu, Complexity-based Intra Frame Rate Control by Jointing Inter-Frame Correlation for High Efficiency Video Coding, J. Vis. Commun. Image R. (2016), doi: http://dx.doi.org/10.1016/j.jvcir.2016.11.013
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Complexity-based Intra Frame Rate Control by Jointing Inter-Frame Correlation for High Efficiency Video Coding*

Mingliang Zhou^a, Yongfei Zhang^{a,b,1}, Bo Li^{a,b}, Hai-Miao Hu^{a,b}

a Beijing Key Laboratory of Digital Media, School of Computer Science and Engineering, Beihang University, Beijing, China, 100191
b State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China, 100191
Abstract. Rate control is of great significance for High Efficiency Video Coding (HEVC). Owing to its high efficiency and low complexity, the R-lambda model has been adopted in HEVC as the default rate control algorithm. However, the video content complexity, which can help improve the coding efficiency and rate control performance, is not fully considered in the R-lambda model. To address this problem, this paper develops an intra-frame rate control algorithm that aims to provide improved and smooth video quality. The algorithm jointly considers the frame-level content complexity between the encoded intra frames and the encoded inter frames, as well as the CTU-level complexity among CTUs in regions of different texture within an intra frame. Firstly, to improve the rate control efficiency, a new prediction measure of content complexity for the CTUs of an intra frame is introduced, which jointly considers the correlation between the intra frame being encoded and the previously encoded inter frames and the correlation between the intra frame being encoded and the previously encoded intra frame. Secondly, a frame-level complexity-based bit-allocation-balancing method, which considers the inter-frame correlation between the intra frame and the previously encoded inter frames, is proposed so that the smoothness of the visual quality between adjacent inter and intra frames can be improved. Thirdly, a new region-division and complexity-based CTU-level bit allocation method is developed to improve the objective quality and to reduce the PSNR fluctuation among CTUs in an intra frame. Finally, the related model parameters are updated during the encoding process to increase the rate control accuracy. Extensive experimental results show that, compared with the state-of-the-art schemes, the video quality can be significantly improved. More specifically, a BD-Rate reduction of up to 10.5% and of 5.2% on average is achieved compared with HM16.0, and a BD-Rate reduction of up to 2.7% and of 2.0% on average is achieved compared with the state-of-the-art algorithm. Besides, the proposed algorithm achieves superior quality smoothness and outperforms the state-of-the-art algorithms in terms of flicker measurement, frame-wise and CTU-wise PSNR, and buffer fullness.

Keywords: HEVC, Complexity, Intra-frame, R-lambda model, Region-based, Quality smoothness.
1 Corresponding Author: E-mail: [email protected], [email protected], Tel: + (0086)010-82314108.
*This work was partially supported by the National Key Research and Development Plan (Grant No.2016YFC0801001) and the National Natural Science Foundation of China (No. 61272502).

1 Introduction

High Efficiency Video Coding (HEVC) is the latest video coding standard developed by the JCT-VC (Joint Collaborative Team on Video Coding). Compared with former video coding standards, HEVC dramatically improves the coding efficiency. Owing to the various advanced coding tools employed in HEVC, about half of the bit rate can be saved compared with H.264 at the same perceptual video quality. Rate control (RC) is of great significance for the transmission of high-quality video data through a communication channel; it aims to achieve the best video quality under constraints such as bandwidth, decoding delay, buffer capacity and content complexity.

Intra-frame coding means that the lossless and lossy compression techniques are applied only to the information contained in the current frame, without reference to any other frame in the video sequence. Owing to the significant improvement of motion compensated prediction (MCP) in HEVC, the compression efficiency of inter frames is higher than that of intra frames. In this way, the intra frame bits of HEVC occupy a larger share of the channel bandwidth in a compressed video stream than those of H.264/AVC. Intra frame rate control is therefore of greater importance in HEVC than in H.264/AVC, since inaccurate intra frame rate control may cause buffer overflow and frame skipping. Thus, intra frame rate control for HEVC is both an active research topic and a key problem of practical value.

The majority of studies on H.264 rate control focus on P-frames, and fewer studies address intra-frame rate control [2-16]. Nevertheless, some deficiencies exist in those schemes: (1) Inaccurate estimation of QP: the intra-frame QP is estimated based on bits per pixel (bpp) only. (2) Insufficient buffer control: buffer overflow occurs at times and frame skipping is likely to be caused, especially at low bit rates. (3) Inexact frame-level rate control: a very large deviation exists between the target bits of a frame and the bits actually generated when encoding it. At least two reasons cause the above problems. One is that those algorithms do not give full consideration to the complexity of frames. The other is that a unified model is adopted when allocating bits to macroblocks (MBs) with diverse content characteristics.

The rate models in HEVC mainly fall into three categories, namely the quadratic model (URQ) [17], the ρ-domain model [18] and the R-lambda model [19-20]. In [17], a rate control scheme based on the quadratic rate-quantization model was applied to HEVC, in which the estimated MAD was used as the complexity measure. Nevertheless, the rate control accuracy of this scheme is restricted by its inaccurate rate-quantization model. In [18], a rate control scheme based on the ρ-domain is proposed, which uses the percentage of zero-quantized coefficients.
Despite the higher accuracy of the ρ-domain rate model, mapping ρ to the quantization step size is difficult. The λ-domain RC, which has been integrated into the HEVC reference software, was proposed by Li et al. [19][20] for more accurate rate estimation. In comparison with the quadratic and ρ-domain models, the R-lambda model considers the overall bit rate, which includes not only the transform coefficient bits but also the overhead bits. It should be pointed out that the R-λ model only considers the target bits and ignores the complexity characteristics. More specifically, the frame-level QP of the current intra frame is directly assigned to each CTU [19]. Thus, content complexity has been introduced to improve the R-λ model, and the most representative methods [21]-[30] are analyzed and summarized below.

Wang and Karczewicz [21] proposed to use the sum of absolute transformed differences (SATD) to measure the complexity of an intra frame, and modified the intra frame rate control by enabling bit allocation and QP computation at the CTU level. However, if the intra period is excessively long, a reduction of coding efficiency cannot be avoided. Yimin Zhou and Ling Tian [25] proposed a rate control scheme for HEVC based on a novel R-D model and a PID buffer feedback controller. A frame- and content-based rate control method was proposed for the intra frame coding of HEVC in [26]. However, these two methods are designed for the all intra (AI) coding structure and are not suitable for other coding structures. S. Li and M. Xu [29] proposed a novel weight-based R-λ rate control scheme to improve the perceived visual quality of compressed conversational video, based on the weights of face regions and facial features learned from eye-tracking data. However, it does not take the content complexity into account in frame-level rate control, which usually causes bit fluctuation.
Wang and Ngan [27] presented a novel rate control framework based on the Lagrange multiplier for HEVC. This method achieves good results for inter frames and is especially suitable for inter-frame applications. However, it does not consider intra frames, which restricts its range of application. Moreover, the method does not discuss the differences among the frame types in HEVC and cannot maintain a stable quality across different types of frames. M. Zhou et al. [30] proposed a content-adaptive model coefficient estimation scheme for multi-dimensional rate control.

In short, the existing HEVC rate control methods consider the content complexity of intra frames to a certain extent, but not adequately. Meanwhile, the relationship between complexity and the different types of frames has not been discussed. Further analysis is carried out in the next section. To address this problem, a complexity-based rate control scheme for HEVC is developed for the first time in this paper, which is particularly appropriate for the intra frames under the low delay main (LD) and random access (RA) coding structures. To the best of our knowledge, there has been no comparable rate control scheme for HEVC.

The rest of this paper is organized as follows. Section 2 briefly reviews the R-lambda model-based RC algorithm in the current HEVC. Section 3 presents the inter-frame correlation-based complexity estimation model. Section 4 elaborates the proposed content-based RC algorithm, and Section 5 presents the performance evaluation. Finally, the conclusion of this paper is provided.
2 Related Works and Observations

In this section, the deficiencies of the mainstream rate control algorithms of HEVC are analyzed and the solution is briefly outlined. Owing to its high efficiency and low complexity, the R-lambda model [19] has been adopted by JCT-VC and widely applied in academic research. In the R-lambda model, λ is the slope of the rate-distortion (RD) curve, which is expressed as

$$\lambda = -\frac{\partial D}{\partial R} = \alpha R^{\beta} \tag{1}$$
Eq. (1) shows that the value of λ is determined by the bit rate R, where ∂ denotes a partial derivative, and α and β are parameters related to the RD characteristics of the sequence [19]. The algorithm consists of two major parts: bit allocation and QP determination. It works at the group of pictures (GOP), frame and CTU levels.

GOP level: A GOP is a coding group that consists of several successive frames; its default size is four in the Low Delay (LD) configuration and eight in the random access (RA) configuration. More details can be found in [1]. The average number of bits of each GOP is calculated from the target bit rate, the GOP size, the frame rate and the virtual buffer size.

Frame level: The target bit rate of each frame is calculated from the average bits allocated to each frame and the hierarchical level of the frame. Afterwards, the λ value corresponding to the target bit rate is calculated with the R-lambda model.

CTU level: For inter frames, the target bits of each CTU are calculated from the bit budget allocated to the current frame. The weight of each CTU is computed as the MAD between the current CTU and the collocated CTU in the previously coded frame at the same hierarchical level.

To achieve good rate control performance with low computational complexity at both the picture and CTU levels, the QP is determined by the following equation:

$$QP = 4.2005 \ln \lambda + 13.7122 \tag{2}$$
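As a rough illustration of how the R-lambda model maps a bit budget to a QP, the following Python sketch evaluates Eq. (1) with R expressed in bits per pixel and then applies Eq. (2). The function names and the example values of α and β are illustrative assumptions and are not taken from this paper.

```python
import math


def lambda_from_bpp(bpp, alpha, beta):
    """Eq. (1) with R given in bits per pixel: lambda = alpha * bpp ** beta."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return alpha * bpp ** beta


def qp_from_lambda(lam):
    """Eq. (2): QP = 4.2005 * ln(lambda) + 13.7122, rounded to an integer."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return int(round(4.2005 * math.log(lam) + 13.7122))


# Illustrative parameters only (assumed for this sketch).
example_lambda = lambda_from_bpp(bpp=0.05, alpha=3.2003, beta=-1.367)
example_qp = qp_from_lambda(example_lambda)
```

With a negative β, a smaller bit budget per pixel yields a larger λ and therefore a larger QP, which is the behaviour the model is meant to capture.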
For inter-coded frames, the actual number of encoded bits and the actual λ values are used to update the model parameters at the GOP, frame and CTU levels. For an intra frame, however, the frame-level QP is directly assigned to each CTU. In [21], to better control the bit rate of an intra-coded frame, a complexity measure (the sum of absolute transformed differences) of the current CTU is additionally taken into consideration in the R-lambda model. However, this consideration is not adequate. Although some new methods have been proposed in recent years, they still leave the problems discussed in Section 1 unsolved. Two of these problems are described below.

Firstly, in frame-level bit allocation for intra frames, the bits allocated to an intra frame are based on the bits per pixel (bpp) only. The intra-frame quality can be ensured, but bit starvation of the subsequent inter frames is caused. As a result, the quality of the subsequent inter frames inevitably degrades, which is even more severe for the last several frames of a sequence. This phenomenon is illustrated in Fig.1 for sequences with a length of 200 frames; the details of the experimental setting are given in Section 5. Because the frame-level rate control of intra frames only takes bpp into consideration instead of the complexity, a large number of bits is spent on the frames in the earlier part of each intra period, whereas the bit rate decreases sharply over the last several frames of the intra period. These defects may lead to the degradation of the average PSNR and to unfavorable PSNR fluctuation, particularly when the sequences are complex and the target bit rates are low.

Secondly, for CTU-level rate control in intra frames, if the top-down sequential process is adopted, it may result in an overdraft of the bit budget by the CTUs at the beginning of a frame and bit starvation for the CTUs later in the frame.
This might cause a quality fluctuation among the CTUs within an intra frame, which further affects the video quality of the subsequent inter frames through motion estimation and compensation. The quality fluctuation usually has obvious regional characteristics; in other words, the PSNR varies significantly across regions. Moreover, human eyes have different levels of sensitivity to quality variation in different regions, so the bit allocation should vary from region to region. The reason is illustrated in Fig.2. Because one uniform model is adopted for the CTUs of an entire frame, the quality of an intra frame is not consistent across its regions, which might influence the inter frames.
Meanwhile, the unreasonable bit distribution can lead to a quality difference between intra and inter frames. In Fig.2 (a), the quality of the intra frame is better than that of the inter frame, whereas in Fig.2 (b) the quality of the inter frame is better than that of the intra frame. To the best of our knowledge, the traditional region-based rate control methods [31]-[36] are mainly concerned with how to divide regions for inter frames, but not for intra frames. In general, in the traditional region-based rate control schemes, the coding complexity of each CTU in the next inter frame is predicted from that of the current inter frame, and motion information is usually taken into account in the redistribution operation. Those methods are not suitable for intra frames. Hence, a region-based bit allocation is proposed for intra frames by considering the different characteristics of different regions.

A new rate control scheme for HEVC is proposed in this paper to deal with all the above-mentioned deficiencies. The frame content complexity is studied and merged into the R-lambda model. The major characteristics can be summarized as follows: 1) A bit-allocation-balancing technique based on content complexity estimation, which considers the inter-frame correlation between the intra frame and the previously encoded inter frames, is proposed to achieve smooth visual quality between inter and intra frames. It effectively solves the first problem mentioned above. 2) A region-based intra-frame rate control scheme is proposed, together with a new prediction measure of content complexity for the CTUs of an intra frame. The measure is based on the inter-frame correlation between the intra frame and the previously encoded inter frames and on the correlation between the intra frame and the previously encoded intra frame, so that the subjective quality is improved and the PSNR fluctuation among CTUs is reduced. It effectively solves the second problem mentioned above. In addition, the related model parameters are updated during the encoding process to increase the rate control accuracy.
Fig.1 The PSNR/bits of each frame in HM16.0 (Intra period=12). (a) The PSNR of each frame. (b) The bits of each frame. Panels in both (a) and (b): BlowingBubbles (416×240, 200 kbps), BQMall (832×480, 1000 kbps), FourPeople (720p, 2000 kbps), PartyScene (1080p, 12000 kbps).
3 The Proposed Inter-frame Correlation-based Complexity Estimation Model

In this section, we analyze the shortcomings of the existing complexity estimation model for intra frames, and then introduce our proposed complexity estimation model in detail. Different from the R-λ model shown in (1), a complexity measure C of the current frame (CTU) is additionally taken into consideration in the R-λ model of [21] as follows:

$$\lambda = \alpha \left( \frac{C}{R} \right)^{\beta} \tag{3}$$
The value of C may vary over a large range. Problems remain despite the related works, and our method is designed to solve them, as described in detail below. In HM, the content complexity of each CTU is used to allocate bits; for HEVC inter-frame coding, the content complexity is measured by the MAD (Mean Absolute Difference) of the CTU at the same location in the reference frame [19-20]. For intra frames, however, the QP of each CTU is directly set to the frame-level value in JCT-VC K0103 [19]. To reduce the bit estimation error of intra frames, JCT-VC M0257 [21] additionally takes a complexity measure of the CTU (based on the Hadamard transform) into consideration in the R-λ model. The complexity measure of each LCU is estimated from the complexity measure of the collocated LCU in the previously coded intra frame belonging to the same level. Compared with previous approaches, JCT-VC M0257 makes great progress and achieves good results, especially for the all intra (AI) coding structure. However, this scheme also has a problem: for RA and LD, it is inappropriate to predict the current intra frame from the previous intra frame if the intra period is too long and the correlation between the two is relatively weak. This phenomenon is illustrated in Fig.3: the correlation between two frames decreases as the interval between them increases. So far, there is no effective way to solve this problem. The correlation between intra-coded frames is smaller than the correlation between inter-coded frames, since intra frames are placed at a larger temporal distance from each other. Therefore, taking into account the influence of both the inter frames and the intra frame, an inter-frame correlation-based model for CTU complexity estimation is established from the former intra frame and the former inter frames as follows.
Fig.2 Reconstructed successive frames when HM16.0 is adopted. (a) PartyScene (832×480, 800 kbps): 47-th (inter frame) and 48-th (intra frame). (b) BasketballPass (416×240, 150 kbps): 79-th (inter frame) and 80-th (intra frame).
$$C_p = v \cdot SATD_p^{I} + (1 - v) \cdot \frac{1}{N} \sum_{i=1}^{N} SATD_p^{i} \tag{4}$$
where $p$ refers to the $p$-th CTU, $C_p$ is the complexity of the $p$-th CTU, and $N$ is the length of the intra period. SATD is the sum of absolute values of the coefficients obtained after applying the Hadamard transform, which is taken from JCT-VC M0257 [21]. $SATD_p^{I}$ represents the average of the absolute values of the coefficients of the $p$-th CTU at the same location in the former intra frame, and $\frac{1}{N}\sum_{i=1}^{N} SATD_p^{i}$ represents the average of the absolute values of the coefficients at the same location in the inter frames of the last intra period. $v$ is a weighting factor determined by the interrelationship between the former intra frame and the preceding inter frames in the last intra period:

$$v = \max\left(0, \min\left(\frac{1}{N}, 1\right)\right) \tag{5}$$
As can be seen from the above equation, the longer the intra period, the smaller the value of v, and vice versa, so that the former intra frame receives less weight when it is temporally far from the current intra frame. We thus propose to use the inter-frame correlation-based model for the CTU complexity estimation of intra frames because: 1) the proposed method effectively avoids the weak correlation between the current intra frame and the previous intra frame when the intra period is long; 2) it matches the prediction characteristics of HEVC.
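The complexity estimate of Eqs. (4) and (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: Eq. (5) is read as v = max(0, min(1/N, 1)), the co-located SATD values are assumed to be already available as plain numbers, and the function names are hypothetical.

```python
def intra_weight(intra_period):
    """Eq. (5), read as v = max(0, min(1/N, 1)), with N the intra period length."""
    if intra_period < 1:
        raise ValueError("intra period must be at least 1")
    return max(0.0, min(1.0 / intra_period, 1.0))


def ctu_complexity(satd_prev_intra, satd_prev_inter, intra_period):
    """Eq. (4): blend the co-located SATD of the former intra frame with the
    mean co-located SATD of the inter frames of the last intra period.

    satd_prev_inter is a list with one SATD value per inter frame. The mean is
    taken over the values supplied (an assumption of this sketch), and the
    function falls back to the intra SATD alone when no inter frame exists.
    """
    if not satd_prev_inter:
        return float(satd_prev_intra)
    v = intra_weight(intra_period)
    inter_mean = sum(satd_prev_inter) / len(satd_prev_inter)
    return v * satd_prev_intra + (1.0 - v) * inter_mean
```

For an intra period of 4, the former intra frame contributes a quarter of the estimate and the inter frames of the last period contribute the rest.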
Fig.3 The average similarity between different sequence frames at different intervals.
4 Proposed RC Algorithm

Fig.4 depicts the proposed content-complexity-based rate control scheme, which includes frame-level bit allocation, region division and bit allocation, CTU-level bit allocation and model parameter updating. Firstly, it is determined whether the current frame is an intra frame; if so, target bits are allocated to the current intra frame. Secondly, the frame is divided into regions and target bits are allocated to each region. Thirdly, the bits are distributed to each CTU and the QP of each CTU is determined with the R-lambda model. Fourthly, the parameters of the frame and the CTUs are updated.
Fig.4 The flowchart of the proposed rate control scheme.
4.1 The Proposed Inter-frame Correlation-based Region Division Algorithm

Human eyes have different visual sensitivity to distortions in different regions. It is therefore straightforward to improve the video coding efficiency by distinguishing moving regions from non-moving regions based on the characteristics of the human visual system (HVS). Because the HVS is more sensitive to moving regions, it is reasonable to attach more importance to these regions. Furthermore, the fitted R-λ curves of different regions within an intra frame are different, and the curves can be categorized by region into types with different R-λ characteristics. Fig.5 shows two examples: different regions have different R-λ curves, and the R-λ curves of the same region tend to cluster together. In Fig.5, the R-λ curves of the non-moving region descend more rapidly, which indicates that, compared with the moving region, the non-moving region needs only a small number of bits. Furthermore, as shown by the direction of the arrows in Fig.5, the fitted R-λ curves of the non-moving region separate into two groups, which means that the non-moving region can be further divided into two different parts.
Fig.5 R-λ curve fitting for all the CTUs. (a) Keiba (832×480, 180-th, intra frame). (b) ParkScene (1080p, 60-th, intra frame).
We categorize moving and non-moving regions according to the difference in luminance between two adjacent frames. Let $I_k(i,j)$ be the original luminance value of pixel $(i,j)$ in the $k$-th frame. To avoid the influence of high-frequency noise, $I_k(i,j)$ is passed through a low-pass filter (a 3×3 mean filter), and $I'_k(i,j)$ denotes the filtered value of pixel $(i,j)$. The difference between the $p$-th CTU in the $k$-th frame and the collocated CTU in the former inter frame is calculated by the following equation:

$$Diff_k(p) = \frac{1}{M_L N_L} \sum_{(i,j) \in p} \left| I'_k(i,j) - I'_{k-1}(i,j) \right| \tag{6}$$
$M_L$ and $N_L$ denote the numbers of rows and columns in a CTU, respectively, and $k$ represents the frame number. Based on the above difference, the CTUs in a frame are divided into moving and non-moving regions as follows:

$$MR_k(p) = \begin{cases} 1, & Diff_k(p) > \dfrac{1}{N_{CTU}} \sum_{x=1}^{N_{CTU}} Diff_k(x) \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where $N_{CTU}$ is the total number of CTUs in a frame. When $MR_k(p)$ equals 1, the CTU belongs to the moving region; otherwise, it belongs to the non-moving region. The CTUs in the non-moving region are then further divided into a smooth region and a complex region based on the complexity calculated by Eq. (4). For a CTU in the non-moving region, if its content complexity $C_p$ is smaller than the average complexity of the CTUs of the frame being coded, the CTU is considered to belong to the smooth region; otherwise, it belongs to the complex region. In all, based on the video content complexity and characteristics, the CTUs in a frame are categorized into a moving region, a complex region and a smooth region, which are treated unequally in bit allocation to improve the video quality. An example of region division is shown in Fig.6. With the proposed method, the regions can be extracted well under both still and dynamic backgrounds. Besides, the complex region and the moving region can also be effectively extracted with our method.
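The region division described in this subsection can be sketched as follows. This is a simplified pure-Python illustration rather than the encoder implementation: frames are assumed to be 2-D lists of luma samples, border pixels are filtered over the neighbours that exist, partial CTUs at the frame border are normalized by the pixels they actually cover, a strict comparison against the mean difference is assumed in Eq. (7), and all function names are hypothetical.

```python
def mean_filter_3x3(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def ctu_differences(cur, prev, ctu_size):
    """Eq. (6): mean absolute difference of filtered luma per CTU (raster order)."""
    fc, fp = mean_filter_3x3(cur), mean_filter_3x3(prev)
    h, w = len(cur), len(cur[0])
    diffs = []
    for y0 in range(0, h, ctu_size):
        for x0 in range(0, w, ctu_size):
            ys = range(y0, min(h, y0 + ctu_size))
            xs = range(x0, min(w, x0 + ctu_size))
            total = sum(abs(fc[y][x] - fp[y][x]) for y in ys for x in xs)
            diffs.append(total / (len(ys) * len(xs)))
    return diffs


def classify_ctus(diffs, complexities):
    """Eq. (7) followed by the complexity split of the non-moving CTUs."""
    mean_diff = sum(diffs) / len(diffs)
    mean_cplx = sum(complexities) / len(complexities)
    labels = []
    for d, c in zip(diffs, complexities):
        if d > mean_diff:
            labels.append("moving")    # MR_k(p) = 1
        elif c < mean_cplx:
            labels.append("smooth")    # non-moving, below-average complexity
        else:
            labels.append("complex")   # non-moving, at or above average complexity
    return labels
```

The complexities passed to `classify_ctus` would be the per-CTU values of Eq. (4); here they are simply taken as given.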
4.2 The Proposed Frame Level Rate Control

4.2.1 Inter-frame Correlation-based Frame Level Bit Allocation

The coding complexity (due to motion, shapes and textures) should be acquired first so that the QP of a frame can be determined. It has been shown that the bits of inter frames are well allocated in [19-20]; hence we only discuss the frame-level and CTU-level bit allocation for intra frames. The content complexity of the current intra frame has to be estimated, because it can hardly be obtained during the coding process. The estimation is based on the coded frames, and the current buffer capacity needs to be taken into consideration. We propose to forecast the complexity of the current frame with the harmonic mean of the actual coding content complexity statistics acquired from previously encoded frames. The current buffer fullness is also considered when allocating the target bits so that buffer overflow can be prevented. The number of target bits allocated to the $n$-th intra frame ($n > 1$), denoted by $R_n^I$, is calculated as follows:

$$R_n^I = R_{IntraAvg} \cdot W_{n-1} \cdot (1 - BR_n) \tag{8}$$
where $R_{IntraAvg}$ is the average number of bits per intra period. We use the method presented in [2] to obtain the initial QP. $BR_n$, which represents the fullness ratio of the current buffer, is calculated by the following equation:

$$BR_n = Buffer / BufferSize \tag{9}$$
$Buffer$ refers to the fullness of the current buffer, while $BufferSize$ is the buffer size. To regulate the buffer fullness, the proportion of pre-allocated bits should be decreased when the current buffer is nearly full, and vice versa. $W_{n-1}$, the proportion of coded bits (the proportion of the estimated complexity) [13] between the intra and inter frames in the previous intra period, is computed by the following formula:

$$W_{n-1} = \frac{QP_{n-1}^{I\_act} \cdot R_{n-1}^{I\_act}}{\sum_{i=2}^{N} \left( QP_{n-1}^{NI}(i) \cdot R_{n-1}^{NI}(i) \right) + QP_{n-1}^{I\_act} \cdot R_{n-1}^{I\_act}} \tag{10}$$
where $QP_{n-1}^{I\_act}$ and $R_{n-1}^{I\_act}$ are the QP and the actual bits of the intra frame coded in the $(n-1)$-th intra period, and $QP_{n-1}^{NI}(i)$ and $R_{n-1}^{NI}(i)$ are the QP and the actual bits of the $i$-th inter frame coded in the $(n-1)$-th intra period, respectively.
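The frame-level allocation of Eqs. (8) to (10) can be illustrated with the short Python sketch below. It assumes that the denominator of Eq. (10) is the sum of the QP-weighted bits of the inter frames plus those of the intra frame, and it clamps the buffer fullness ratio to [0, 1], which is a safeguard added here and not stated in the text. The function names are hypothetical.

```python
def intra_bit_proportion(intra_qp, intra_bits, inter_qps, inter_bits):
    """Eq. (10): share of QP-weighted bits spent on the intra frame in the
    previous intra period (assumed reading of the denominator: inter + intra)."""
    intra_term = intra_qp * intra_bits
    inter_term = sum(q * r for q, r in zip(inter_qps, inter_bits))
    return intra_term / (intra_term + inter_term)


def intra_target_bits(avg_bits_per_period, proportion, buffer_level, buffer_size):
    """Eqs. (8)-(9): R_n^I = R_IntraAvg * W_{n-1} * (1 - Buffer / BufferSize)."""
    # Clamp is an added safeguard of this sketch, not part of Eq. (9).
    fullness = min(max(buffer_level / buffer_size, 0.0), 1.0)
    return avg_bits_per_period * proportion * (1.0 - fullness)
```

A fuller buffer therefore shrinks the intra budget, while a previous intra period in which the intra frame consumed a large share of the QP-weighted bits enlarges it.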
Fig.6 Results of the proposed region division method, showing the moving region, complex region and smooth region (listed frames are all intra frames). (a) BlowingBubbles (416×240, 72th). (b) BQMall (832×480, 60th). (c) Keiba (832×480, 180th). (d) FourPeople (720p, 60th).
4.2.2 Model Parameter Updating

To obtain the overall bit rate accurately, it is important that the parameters are updated at the frame level during the encoding process. After encoding an intra frame, we use the actual bpp, the actual λ value and the frame complexity ($W$) to update α and β of the model by (12) and (13). Different from the default model, the complexity is considered in model parameter updating so as to obtain more accurate model parameters.

$$\lambda_{comp}^{I}(n) = \alpha_{old}^{I}(n-1) \cdot bpp_{real}^{\,\beta_{old}^{I}(n-1)} \tag{11}$$

$$\alpha_{new}^{I}(n) = \alpha_{old}^{I}(n-1) + \delta_{\alpha}^{I} \left( \ln \lambda_{real}^{I}(n-1) - \ln \lambda_{comp}^{I}(n-1) \right) \alpha_{old}^{I}(n-1) \tag{12}$$

$$\beta_{new}^{I}(n) = \beta_{old}^{I}(n-1) + \delta_{\beta}^{I} \left( \ln \lambda_{real}^{I}(n-1) - \ln \lambda_{comp}^{I}(n-1) \right) \ln\left( \frac{bpp_{real}}{W_{n-1}} \right) \tag{13}$$

where $bpp_{real}$ is the bits per pixel of the intra frame, $\lambda_{real}^{I}(n)$ refers to the actual λ of a CTU that has been encoded in the $n$-th intra frame ($n > 1$), and $\lambda_{comp}^{I}(n)$ is the target λ of a CTU that has been encoded in the $n$-th intra frame ($n > 1$). $\alpha_{old}^{I}(n-1)$ and $\beta_{old}^{I}(n-1)$ denote the old parameters of the CTUs of the $(n-1)$-th intra frame ($n > 1$). $\alpha_{new}^{I}(n)$ and $\beta_{new}^{I}(n)$ denote the updated parameters of the CTUs of the $n$-th intra frame ($n > 1$). In our experiments, $\delta_{\alpha}^{I}$ and $\delta_{\beta}^{I}$ are set to 0.1 and 0.15, respectively. The proof that α and β can be updated by (12) and (13) is provided in Appendix A. Fig.7 shows the λ values per frame for the proposed method and for the HM16.0 algorithm in HEVC. According to the results, a smaller fluctuation of λ is achieved with our method, which implies a smaller fluctuation of QP, in comparison with HM16.0. The figure shows that the quality between frames is more stable with the proposed algorithm, especially between intra and inter frames.
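A minimal Python sketch of the parameter update is given below. It assumes that Eq. (11) has the form λ_comp = α_old · bpp_real^β_old and that the complexity term W enters Eq. (13) through ln(bpp_real / W); the step sizes 0.1 and 0.15 are the values quoted in the text, while the names `delta_alpha` and `delta_beta` and the function name are hypothetical.

```python
import math


def update_intra_model(alpha_old, beta_old, bpp_real, lambda_real, w_prev,
                       delta_alpha=0.1, delta_beta=0.15):
    """One update step of the intra-frame R-lambda parameters.

    Assumed reading of Eqs. (11)-(13):
      lambda_comp = alpha_old * bpp_real ** beta_old
      alpha_new   = alpha_old + delta_alpha * err * alpha_old
      beta_new    = beta_old  + delta_beta  * err * ln(bpp_real / w_prev)
    with err = ln(lambda_real) - ln(lambda_comp).
    """
    lambda_comp = alpha_old * bpp_real ** beta_old
    err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha_old + delta_alpha * err * alpha_old
    beta_new = beta_old + delta_beta * err * math.log(bpp_real / w_prev)
    return alpha_new, beta_new
```

When the actual λ equals the λ predicted by the model, the error term vanishes and both parameters are left unchanged, which is the fixed point one would expect from such an update rule.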
Fig.7 Comparison between the updated λ of the HM16.0 and the proposed method. (a) BlowingBubbles (240p, 200 kbps, LP-Main). (b) Keiba (480p, 1000 kbps, RA). (c) ParkScene (1080p, 8000 kbps, LB-Main).
4.2.3 Complexity Estimation Model-based Regions Bit Allocation
Once the bit budget of the intra frame is known, it can be distributed among the regions:

$$R_n^I = R_1 + R_2 + R_3 = N_1 R_{avg1} + N_2 R_{avg2} + N_3 R_{avg3}$$ (14)

$$R_{avg1} = \theta_{1/2} R_{avg2} = \theta_{1/3} R_{avg3}$$ (15)

where $R_n^I$, $R_1$, $R_2$, $R_3$ are the target bit counts allocated to the current intra-frame, the moving region, the complex region and the smooth region, respectively. $K$ refers to the $K$-th region and $N_K$ is the number of CTUs in the $K$-th region, so $N_1$, $N_2$, $N_3$ denote the numbers of CTUs in the moving, complex and smooth regions. $R_{avg1}$, $R_{avg2}$, $R_{avg3}$ denote the optimal target bits per CTU for the moving, complex and smooth regions, respectively.

In the moving region, more bits should be allocated to the intra-frame, because it serves as the reference of the following inter-frames and thereby guarantees their quality. Because the prediction residues of the inter-frames are large, relatively fewer bits should be allocated to the smooth region, so that more of the remaining bits are available for the subsequent inter-frames, which leads to higher encoding quality. It is therefore necessary to consider the proportional relation between intra-frames and inter-frames in the current video sequence.

$\theta_{1/2}$ and $\theta_{1/3}$ control the rate allocation among the regions: the greater $\theta_{1/2}$ is, the fewer bits are allocated to the complex region, and vice versa; the greater $\theta_{1/3}$ is, the fewer bits are allocated to the smooth region, and vice versa. Both values should be moderate, since values that are too large or too small cause quality fluctuation among regions. Traditional region-based methods rely on empirical values [31] [32]; in contrast, we use the complexity acquired in the previous section to calculate $\theta_{1/2}$ and $\theta_{1/3}$ dynamically:

$$\theta_{1/2} = \max\left(\frac{\sum_{K=1}^{2}\sum_{p=1}^{N_K} C_{K,p}}{\sum_{p=1}^{N_2} C_{2,p}},\ 1\right)$$ (16)

$$\theta_{1/3} = \max\left(\frac{\sum_{K=1}^{3}\sum_{p=1}^{N_K} C_{K,p}}{\sum_{p=1}^{N_3} C_{3,p}},\ 1\right)$$ (17)

where $N_K$ represents the number of CTUs in the $K$-th region and $C_{K,p}$ stands for the complexity of the $p$-th CTU of the $K$-th region.
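The region-level allocation can be illustrated with a minimal Python sketch. It assumes one particular reading of (14)-(17): the per-CTU averages of the three regions are tied by two ratios (called `theta_12` and `theta_13` here, both hypothetical names), each ratio is a complexity sum divided by the complexity of the denominator region and floored at 1, and the frame budget is the CTU-count-weighted sum of the averages. The function names and the sample numbers are illustrative only and are not taken from the paper.

```python
def region_ratios(region_complexities):
    """Return (theta_12, theta_13) from per-CTU complexities.

    region_complexities maps region index 1 (moving), 2 (complex),
    3 (smooth) to a list of CTU complexities. Assumed reading of (16)-(17).
    """
    c1, c2, c3 = (sum(region_complexities[k]) for k in (1, 2, 3))
    theta_12 = max((c1 + c2) / c2, 1.0) if c2 > 0 else 1.0
    theta_13 = max((c1 + c2 + c3) / c3, 1.0) if c3 > 0 else 1.0
    return theta_12, theta_13


def allocate_region_bits(frame_bits, ctu_counts, theta_12, theta_13):
    """Split frame_bits into (R1, R2, R3).

    Solves N1*Ravg1 + N2*Ravg2 + N3*Ravg3 = frame_bits under the assumed
    constraint Ravg1 = theta_12 * Ravg2 = theta_13 * Ravg3.
    """
    n1, n2, n3 = ctu_counts
    r_avg1 = frame_bits / (n1 + n2 / theta_12 + n3 / theta_13)
    r_avg2 = r_avg1 / theta_12
    r_avg3 = r_avg1 / theta_13
    return n1 * r_avg1, n2 * r_avg2, n3 * r_avg3


# Illustrative numbers only: two CTUs per region.
complexities = {1: [4.0, 4.0], 2: [2.0, 2.0], 3: [1.0, 1.0]}
t12, t13 = region_ratios(complexities)
r1, r2, r3 = allocate_region_bits(1000.0, (2, 2, 2), t12, t13)
```

Under this reading the three region budgets always sum to the frame budget, and larger ratios shift bits away from the complex and smooth regions toward the moving region.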
4.3 The Proposed CTU Level Rate Control
4.3.1 Complexity Estimation Model-based CTU Level Bit Allocation
In the R-lambda model, applying a uniform model to all CTUs may cause uneven quality among CTUs of different regions. The content complexity of each CTU should be taken into account in CTU level bit allocation, and the allocation must be adjusted dynamically according to the bits already spent on coded CTUs. In other words, the bits are decreased when the buffer is about to become full. To reduce the bit estimation error, we use the content complexity of each CTU acquired in the previous section and allocate the bits for each CTU as follows:

$$R_{K,i} = R_{K,RemBits} \cdot w_{K,i} \cdot \left(1 - \frac{\sum_{p=1}^{i-1}\left(R_{K,actp} - R_{K,p}\right)}{R_K}\right)$$ (18)

where $R_{K,i}$ stands for the bits allocated to the current CTU of the $K$-th region and $i$ refers to the current CTU ($i \ge 1$). $R_{K,RemBits}$ denotes the bits that are left for encoding the remaining CTUs in the $K$-th region, $R_K$ refers to the target bits of the $K$-th region, $R_{K,actp}$ represents the real bits of the $p$-th CTU in the $K$-th region, and $R_{K,p}$ represents the target bits of the $p$-th CTU in the $K$-th region. The factor $\left(1 - \sum_{p=1}^{i-1}(R_{K,actp} - R_{K,p}) / R_K\right)$ is an adjustment term based on the allocation accuracy of the CTUs already encoded in the $K$-th region. The weight $w_{K,i}$ of the current CTU is given by

$$w_{K,i} = \frac{C_{K,i}}{\sum_{p=i}^{N_K} C_{K,p}}$$ (19)
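A minimal Python sketch of the CTU-level allocation in (18) and (19) follows. It rests on two assumptions that are not stated in the text: the adjustment term subtracts the accumulated overshoot (real minus target bits) of the coded CTUs, and the remaining bits of a region equal the region target minus the real bits already spent. The function and variable names are hypothetical.

```python
def ctu_weight(complexities, i):
    """Weight of CTU i (0-based): its complexity over the complexity
    of all CTUs of the region that are not coded yet, as in (19)."""
    remaining = sum(complexities[i:])
    if remaining <= 0:
        # Degenerate case: share equally among the uncoded CTUs.
        return 1.0 / (len(complexities) - i)
    return complexities[i] / remaining


def ctu_target_bits(region_target, complexities, actual_bits, target_bits):
    """Target bits of the next CTU in a region, following (18).

    actual_bits / target_bits hold the real and target bits of the CTUs
    of this region that have already been coded.
    """
    i = len(actual_bits)                          # index of the current CTU
    remaining_bits = region_target - sum(actual_bits)   # assumed R_{K,RemBits}
    overshoot = sum(a - t for a, t in zip(actual_bits, target_bits))
    adjust = 1.0 - overshoot / region_target      # assumed sign convention
    return remaining_bits * ctu_weight(complexities, i) * adjust


# Illustrative numbers only: a region of three CTUs with a 1000-bit target.
first = ctu_target_bits(1000.0, [1.0, 1.0, 2.0], [], [])
second = ctu_target_bits(1000.0, [1.0, 1.0, 2.0], [300.0], [250.0])
```

In this example the first CTU overspends its target, so the second CTU receives slightly less than its pure complexity share of the remaining bits.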
Once the bits of each CTU are obtained, the R-lambda model is employed to derive the QP. CTUs of the same region are encoded sequentially, yet some of them may not be spatially adjacent. Therefore, if the current $QP_{CTU}$ is clipped only according to the QP of the CTUs that have already been coded, some CTUs may be encoded at very low or very high quality, especially within the same region. To avoid this, the proposed RC algorithm clips $QP_{CTU}$ at the CTU level to a narrower range, which is determined by the average QP of the region:

$$\overline{QP}_K - \Delta \le QP_{CTU} \le \overline{QP}_K + \Delta$$ (20)

where $\overline{QP}_K$ is the average QP of the coded CTUs in the $K$-th region and $\Delta$ is an adjustment coefficient. Similar to [19], $\Delta$ is set to 2 here.

4.3.2 Model Parameter Updating
To meet the overall bit rate accurately, the model parameters should be updated at the CTU level while each region is being encoded. Thus, the real bit count $R_{CTU}$ and the real $\lambda_{CTU}$ of every CTU are used to update the model parameters at the CTU level. In particular, $\alpha$ and $\beta$ are updated as the geometric mean of the $\alpha$ and $\beta$ values of the CTUs that have already been coded in the same region. After every CTU is encoded, the parameters are updated as follows:

$$\lambda_{K,new} = \alpha_{K,old}\,(bpp_{K,act})^{\beta_{K,old}}$$ (21)

$$\alpha_{K,new} = \alpha_{K,old} + \delta_{\alpha}\,(\ln\lambda_{K,real} - \ln\lambda_{K,comp})\,\alpha_{K,old}$$ (22)

$$\beta_{K,new} = \beta_{K,old} + \delta_{\beta}\,(\ln\lambda_{K,real} - \ln\lambda_{K,comp})\,\ln\left(\frac{bpp_{K,act}}{C_{K,i-1}}\right)$$ (23)

where $bpp_{K,act}$ is the bits per pixel of an encoded CTU of the $K$-th region, $\lambda_{K,real}$ is the real $\lambda$ of a CTU that has been encoded in the $K$-th region, and $\lambda_{K,comp}$ is the target $\lambda$ of a CTU that has been encoded in the $K$-th region. $\alpha_{K,old}$ and $\beta_{K,old}$ denote the old parameters of the CTUs of the $K$-th region; $\lambda_{K,new}$, $\alpha_{K,new}$ and $\beta_{K,new}$ denote the updated parameters. In our experiments, $\delta_{\alpha}$ and $\delta_{\beta}$ are set to 0.1 and 0.02, respectively. The proof that $\alpha$ and $\beta$ can be updated by (22) and (23) is provided in Appendix B. Updating the model parameters per region reduces their estimation error; details are given in Appendix B.

The actual model parameters ($\alpha$ and $\beta$) and their predicted values were also examined. We selected several frames at random to demonstrate the effectiveness of the scheme. Fig.8 shows the variation of the model parameters α and β with the CTU number for the 141-th frame of the “Keiba” and the 180-th frame of the “FourPeople” test sequences. The model parameter values do vary along the CTUs, and most fall within the estimated range. Our scheme therefore estimates the complexity effectively in an actual coding process, with relatively little error between the estimated and the true model parameters.
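The update rules (22) and (23) can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: it assumes the logarithms are natural logarithms, that the bits per pixel are normalized by the complexity of the previously coded CTU inside the logarithm of (23), and it uses the step sizes 0.1 and 0.02 quoted in the text. All function and parameter names, and the sample parameter values, are hypothetical.

```python
import math


def model_lambda(alpha, beta, bpp):
    """R-lambda model: lambda = alpha * bpp ** beta."""
    return alpha * bpp ** beta


def update_region_params(alpha_old, beta_old, lambda_real, lambda_comp,
                         bpp_act, complexity_prev,
                         delta_alpha=0.1, delta_beta=0.02):
    """One update step after a CTU of the region has been coded.

    lambda_real / lambda_comp are the real and target lambda of that CTU,
    bpp_act its real bits per pixel, complexity_prev the complexity of the
    previously coded CTU (assumed normalizer in the beta update).
    """
    err = math.log(lambda_real) - math.log(lambda_comp)
    alpha_new = alpha_old + delta_alpha * err * alpha_old
    beta_new = beta_old + delta_beta * err * math.log(bpp_act / complexity_prev)
    return alpha_new, beta_new


# Illustrative numbers only.
alpha0, beta0 = 3.2, -1.367
lam = model_lambda(alpha0, beta0, 0.05)
unchanged = update_region_params(alpha0, beta0, lam, lam, 0.05, 1.0)
raised = update_region_params(alpha0, beta0, 2.0 * lam, lam, 0.05, 1.0)
```

When the real and target lambda coincide the parameters are left unchanged; when the real lambda is larger than the target, alpha is increased in proportion to the log error.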
Fig.8 Accuracy of Alpha and Beta (listed frames are all intra frames). Panels: (a) Alpha (α); (b) Beta (β); sequences: Keiba (144-th), FourPeople (180-th).
4.4 Summary of the Proposed Algorithm
In our algorithm, improving the quality of an intra-frame is vital for the coding efficiency of the following inter frames, whose predictions are motion compensated from the intra frame directly or indirectly. The proposed rate control algorithm consists of eight major steps, summarized below. Note that some steps (namely GOP level bit allocation and QP computation) are the same as in the R-lambda model described in [19-20]. The symbols used in the proposed method are tabulated in Table 1.
1) Frame level bit allocation according to (8), followed by adjustment of the frame level $\lambda_{curF}$: if $\lambda_{curF}$ is smaller than that of the previous frame, $\lambda_{preF}$, it is clipped by $\max(\lambda_{preF}\, e^{-3/4}, \lambda_{curF})$; otherwise, it is clipped by $\min(\lambda_{curF}, \lambda_{preF}\, e^{3/4})$;
2) Region division according to (6) (7);
3) Region based bit allocation by (14) (15) (16) (17);
4) Bit allocation for each CTU according to (18) (19);
5) Calculation of the QP for every CTU by (2);
6) Encoding and recording of the relevant encoding parameters;
7) Update of the parameters as per (12) (13) (22) (23);
8) Return to Step 1 until the end of encoding.

Table 1 Summary of notations
Symbol | Description
$\lambda$, $\alpha$, $\beta$ | The parameters relevant to the sequence’s RD features
$I_k(i,j)$, $I'_k(i,j)$ | The original and filtered luminance value of pixel $(i,j)$ in the $k$-th frame
$M_L$, $N_L$ | The numbers of rows and columns in a CTU
$W_{n-1}$ | The proportion of coded bits (proportion of the estimated complexity) of the $(n-1)$-th intra frame
$BR_n$ | The ratio of the fullness of the $n$-th intra frame buffer
$Diff_k(p)$ | The difference of the $p$-th CTU in the $k$-th frame and the former inter-frame
$N_{CTU}$ | The total number of CTUs in a frame
$QP^{I}_{n-1,act}$ | The QP coded for the $(n-1)$-th intra frame in the former intra period
$R^{I}_{n-1,act}$ | The real bits coded for the $(n-1)$-th intra frame in the former intra period
$QP^{NI}_{n-1}(i)$ | The QP coded for the $i$-th inter-frame in the $(n-1)$-th intra period
$R^{NI}_{n-1}(i)$ | The real bits coded for the $i$-th inter-frame in the former intra period
$R_n^I$ | The target bit count allocated to the $n$-th intra-frame
$R_1$, $R_2$, $R_3$ | Target bits allocated to the moving region, complex region and smooth region
$N_K$ | The number of CTUs in the $K$-th region
$R_{avg1}$, $R_{avg2}$, $R_{avg3}$ | Optimal target bits of a CTU for the moving region, complex region and smooth region
$R_K$ | Target bits of the $K$-th region
$R_{K,p}$, $R_{K,actp}$ | The target bits and real bits of the $p$-th CTU in the $K$-th region
$\alpha_{K,old}$, $\beta_{K,old}$, $\lambda_{K,comp}$ | The known model parameters in the $K$-th region
$\alpha_{K,new}$, $\beta_{K,new}$, $\lambda_{K,real}$ | The new model parameters in the $K$-th region
$R_{K,RemBits}$ | The bits that are left for encoding the remaining CTUs in the $K$-th region
$w_{K,i}$ | The current CTU’s weight in the $K$-th region
$C_{K,i}$ | The complexity of the current CTU in the $K$-th region
It should be noted that several complexity-based rate control schemes have been proposed recently [4]-[7] [16] [25] [30] [33]-[37]. Nevertheless, the proposed complexity-based rate control differs from traditional complexity-based rate control in six aspects.

Firstly, in this paper each intra frame is divided into three regions according to their importance in the coding procedure, and different bit rates are applied to different regions. This improves the quality of the intra frame both objectively and subjectively. More importantly, improving the quality of the intra frame also improves the quality of the inter frames, unlike former algorithms, which focus only on inter frames. In addition, since the CTU quality of HEVC is not stable and shows regional differences, dividing the intra frame into regions yields better quality.

Secondly, although several algorithms measure intra frame complexity [4]-[7] [25] [33] [34], previous methods generally require a large amount of calculation to predict the complexity of intra and inter frames. For instance, [6] uses the gradient to measure the complexity of the intra frame and the entropy to measure the complexity of the inter frame, which adds considerable extra complexity and is not practical for real-time video applications. By contrast, our algorithm is efficient, simple and reasonable, which makes it suitable for real-time transmission.

Thirdly, traditional methods consider only the content complexity of the current frame in intra frame rate control, whereas our method takes into account the content complexity of the former intra frame and of all inter frames in the former intra period when estimating the complexity of the current intra frame. We also consider the former inter frame during region division. Our algorithm is therefore appropriate for intra frames under the low delay main (LD) and random access (RA) coding structures.

Fourthly, in our previous work [30], we proposed a content-adaptive model coefficient estimation scheme for multi-dimensional rate control. Although both that article and this paper take content complexity into consideration, their emphases are quite different. The previous article focuses on multi-dimensional rate control, that is, on obtaining the best quality at a given rate by jointly considering the frame rate and QP, and it addresses the Q-R-TQ problem [30]. This paper, in contrast, focuses on QP selection for intra frames in order to obtain a more stable quality by selecting an appropriate QP at the frame level and the CTU level. The definitions of content complexity also differ: the previous article uses entropy and frame difference to define complexity, whereas this paper only adjusts QP and defines complexity through the relationship between the current frame and former frames.

Fifthly, our algorithm is especially suitable for applications with high requirements on intra-frame quality and for low bit rate applications, because it considers the complexity of both the encoded intra frame and the encoded inter frames in intra frame rate control. When the default rate control of HEVC is used at low bit rates, the rate fluctuation between intra frames and inter frames is particularly evident and leads to quality fluctuation. Our algorithm aims to provide improved and smooth video quality by jointly considering the frame-level content complexity of the encoded intra frames and the encoded inter frames, so it is especially suitable for low bit rate applications.

Finally, although former algorithms for HEVC have considered intra frame rate control [25] [26], they were proposed for the all intra (AI) coding structure and are not suitable for the LD and RA structures. This paper is the first to propose a rate control scheme for HEVC that addresses this problem. Moreover, in [25] [26] the model parameters are updated over the whole frame, while in our algorithm they are updated per region.
5 Experimental Evaluations
For the experiments, our intra-frame rate control scheme is implemented on HM16.0 by replacing its intra-frame rate control scheme. Three similar schemes, namely the rate control scheme used by HM16.0 [38] (labeled “HM”), JCTVC-M0257 [21] (labeled “M0257”), and the latest intra-frame rate control scheme proposed by Wang and Karczewicz [26] (labeled “[26]”), are compared with the proposed scheme under the all intra coding structure. Another paper by Wang and Karczewicz [27] (labeled “[27]”) is mainly aimed at inter frames. Therefore, we compare against M0257, HM16.0 and [27] to verify the effectiveness of our algorithm under the LD and RA coding structures. The performance of the proposed intra-frame rate control scheme is assessed from five aspects, including the R-D performance, the quality evaluation of the frame level performance, the quality evaluation of the CTU level performance and the computational complexity analysis.

For the experiments, the CTU size is set to 64×64 (32×32 CTUs are used for the 416×240 sequences). We compare the rate control schemes under three coding configurations: LP-Main, LB-Main and RA. As in the HM default configuration, the length of the intra period should be an integer multiple of the GOP size. Three intra period lengths are adopted to validate the algorithm: 12 under the LP-Main coding structure (GOP size 4, as in the default configuration), 20 under LB-Main (GOP size 4, as in the default configuration), and 32 under the RA coding structure (GOP size 8, as in the default configuration). The rest of the encoder settings follow the default HM configuration [1]. More details are given in Table 2.

Table 2 Experimental Conditions
Parameter | Value
Coding structure | AI, LP-Main, LB-Main, RA
CTU size | 64×64 (32×32 for 416×240)
Intra period | 12 (LP-Main), 20 (LB-Main), 32 (RA)
Total number | 200
SAO | On
RDO | On
Other | the same as default HM configuration
5.1 R-D Performance
The R-D performance, in terms of BD-PSNR and BD-Rate, is compared with JCTVC-M0257, HM16.0, Wang [26] and Wang [27] in Tables 3-7. All recommended test sequences of Classes A-F in the HEVC common test conditions are used for testing [1]. To further verify the effectiveness of the algorithm, experimental results on a 4K sequence and an 8K sequence (HD) are also added [39]. Only the average value of each class is listed when calculating the BD-PSNR values. As indicated by the positive BD-PSNR and the negative BD-Rate, our algorithm achieves better R-D performance. More specifically, up to 10.5% and on average 5.2% BD-Rate reduction is achieved compared to HM16.0, and up to 2.7% and on average 2.0% BD-Rate reduction is achieved compared to the state-of-the-art algorithm. Compared to the Wang method [26] [27], which yields the best results among the other three comparable schemes, the proposed scheme gains 0.12 dB (AI), 0.28 dB (LP-Main, intra period 12), 0.27 dB (LB-Main, intra period 20) and 0.253 dB (RA, intra period 32) in average PSNR, as can be seen from Tables 3-7. Wang [26] and Wang [27] bring some improvements and corrections over HM16.0. Our algorithm outperforms Wang [27] because it gives full consideration to the complexity relationship between intra frames and inter frames in bit allocation. The comparison also shows that our method performs favorably under the all intra coding structure (AI). To demonstrate the advantage of our algorithm in both rate-quality performance and bit control accuracy, four R-D curves are shown in Fig.9 for different coding configurations. The proposed algorithm has much better R-D performance than the other three algorithms at both high and low bit rates.
Fig.9 Comparison of RD performances between M0257, HM16.0, Wang [27] and the proposed method. Panels: (a) BlowingBubbles (416×240), LB-Main; (b) RaceHorses (832×480), LP-Main; (c) PartyScene (1080p), RA; (d) BasketPass (416×240), RA.
5.2 Quality Evaluation of the Proposed Frame Level Rate Control
This section discusses the effectiveness of the frame level rate control in two aspects: the quality fluctuation and the fluctuation of the buffer status. The variations of PSNR during encoding are also investigated. Because strong fluctuation of the PSNR values may be perceptually annoying to viewers, PSNR fluctuation is one of the important factors for rate control. The following measure is adopted to assess the quality smoothness:

$$PV = \frac{1}{N_T - 1}\sum_{k=2}^{N_T}\left|PSNR_k - PSNR_{k-1}\right|$$ (24)

where $PSNR_k$ stands for the peak SNR (PSNR) of the $k$-th frame and $N_T$ represents the number of coded frames in a video sequence. Generally speaking, the quality becomes smoother as the PV value decreases. Tables 3-7 show that the proposed algorithm achieves smoother quality than the other algorithms. Meanwhile, the standard deviation of the PSNR in the proposed rate control scheme is smaller than in the other three methods. The PSNR comparison for the sequences "Blowing Bubbles (416×240, 200 kbps)", "Keiba (832×480, 1000 kbps)" and "PartyScene (1080p, 8000 kbps)" is shown in Fig.10. The figure shows that our algorithm achieves a much smaller PSNR variation, which improves the objective video quality. The experiment also shows that the quality of our algorithm stays relatively stable between frames, especially between intra frames and inter frames. In terms of quality stability, our method therefore outperforms the other state-of-the-art schemes. To compare the flickering artifacts of the proposed rate control algorithm with those of the other three rate control algorithms in HEVC, the metric proposed in [40], which takes the tracking features of the human eye into consideration, is also adopted. A large value of the flicker measurement indicates a serious fluctuation of the perceptual quality and thus serious video flicker.
$$FM = \underset{k,\,p\,:\,SSD(O(k-1,p),\,O(k,p)) < \varepsilon}{AVG}\left\{ SSD\big(P(k-1,p) - P(k,p),\ O(k-1,p) - O(k,p)\big) \right\}$$ (25)

where SSD denotes the sum of squared differences, $O(k,p)$ denotes the original pixel values of the $p$-th CTU in the $k$-th frame, and $P(k,p)$ denotes the reconstructed pixel values of the $p$-th CTU in the $k$-th frame. $\varepsilon$ is a threshold; more details can be found in [40]. Fig.11 presents the frame-by-frame flicker measurement comparison. The experimental results show that our algorithm performs significantly better than the other three algorithms.
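The quality smoothness measure of (24) can be computed with a short Python sketch. It assumes PV is the mean absolute PSNR difference between consecutive frames; the function name and the sample values are illustrative only.

```python
def psnr_variation(psnr_per_frame):
    """Mean absolute PSNR difference between consecutive frames (PV).

    psnr_per_frame is a list of per-frame PSNR values in dB, in coding
    order. A smaller result means smoother quality.
    """
    n = len(psnr_per_frame)
    if n < 2:
        return 0.0  # PV is undefined for fewer than two frames; return 0.
    diffs = (abs(cur - prev)
             for prev, cur in zip(psnr_per_frame, psnr_per_frame[1:]))
    return sum(diffs) / (n - 1)


# Illustrative values only, not measured data.
pv = psnr_variation([30.0, 32.0, 31.0])
```

For the three sample frames the absolute differences are 2 dB and 1 dB, so the resulting PV is 1.5 dB.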
Fig.10 PSNR comparison of the related RC algorithms. Panels: (a) Blowing Bubbles (416×240, 200 kbps, LP-Main); (b) Keiba (832×480, 1000 kbps, RA); (c) PartyScene (1080p, 8000 kbps, LB-Main).
Fig.12 shows the buffer status per frame for the proposed rate control scheme and the other rate control schemes, for the FourPeople sequence (720p) with QP set to 32, the Blowing Bubbles sequence (416×240) at a target bit rate of 200 kbps, the Keiba sequence (832×480) at a target bit rate of 1000 kbps and the PartyScene sequence (1080p) at a target bit rate of 8000 kbps. Compared with the other schemes, our scheme has a far smaller variation of buffer fullness. The encoder buffer therefore operates more safely with our method than with the other methods, that is, neither underflow nor overflow occurs. In this respect, the proposed rate control scheme performs better than the other state-of-the-art methods.
Fig.11 Flicker Measurement for each frame (Blowing Bubbles, 416×240, 200 kbps; the flicker measurement is calculated by (25), and a greater value means greater flicker). Panels: (a) LP-Main; (b) LB-Main; (c) RA; each panel compares the proposed method vs. M0257, vs. HM16.0 and vs. [27].
Panels of Fig.12: (a) FourPeople (720p, QP: 32, AI); (b) Blowing Bubbles (416×240, 200 kbps, LP-Main); (c) Keiba (832×480, 1000 kbps, RA); (d) PartyScene (1080p, 8000 kbps, LB-Main).
Fig.12 Buffer fullness comparison versus frame number.

Table 3 Performance comparison of the proposed scheme with state-of-the-art rate control schemes (AI, QP=32)
Class | vs. M0257: BD-Rate (%) | BD-PSNR (dB) | PV | vs. HM16.0: BD-Rate (%) | BD-PSNR (dB) | PV | vs. Wang [26]: BD-Rate (%) | BD-PSNR (dB) | PV
Class A | -2.6 | 0.29 | -0.22 | -2.5 | 0.28 | -0.20 | -1.6 | 0.14 | -0.05
Class B | -2.3 | 0.26 | -0.11 | -2.2 | 0.25 | -0.10 | -1.5 | 0.13 | -0.06
Class C | -1.8 | 0.16 | -0.14 | -1.7 | 0.15 | -0.13 | -0.9 | 0.08 | -0.05
Class D | -1.6 | 0.14 | -0.46 | -1.5 | 0.13 | -0.45 | -0.2 | 0.02 | -0.16
Class E | -2.5 | 0.28 | -0.16 | -2.4 | 0.27 | -0.15 | -1.5 | 0.14 | -0.07
Class F | -2.8 | 0.34 | -0.18 | -2.7 | 0.33 | -0.17 | -1.7 | 0.15 | -0.08
4k | -2.9 | 0.36 | -0.23 | -2.8 | 0.34 | -0.21 | -1.8 | 0.16 | -0.07
8k | -3.1 | 0.38 | -0.25 | -3.0 | 0.37 | -0.24 | -2.0 | 0.19 | -0.06
Avg | -2.45 | 0.276 | -0.219 | -2.35 | 0.265 | -0.206 | -1.40 | 0.126 | -0.075
Table 4 Performance comparison of the proposed scheme with state-of-the-art rate control schemes (AI, QP=44)
Class | vs. M0257: BD-Rate (%) | BD-PSNR (dB) | PV | vs. HM16.0: BD-Rate (%) | BD-PSNR (dB) | PV | vs. Wang [26]: BD-Rate (%) | BD-PSNR (dB) | PV
Class A | -2.5 | 0.28 | -0.21 | -2.4 | 0.27 | -0.20 | -1.6 | 0.14 | -0.06
Class B | -2.3 | 0.26 | -0.09 | -2.1 | 0.24 | -0.08 | -1.4 | 0.13 | -0.07
Class C | -1.6 | 0.15 | -0.13 | -1.5 | 0.13 | -0.11 | -0.8 | 0.07 | -0.05
Class D | -1.5 | 0.14 | -0.47 | -1.4 | 0.11 | -0.46 | -0.2 | 0.02 | -0.17
Class E | -2.5 | 0.28 | -0.17 | -2.4 | 0.27 | -0.16 | -1.5 | 0.14 | -0.08
Class F | -2.7 | 0.33 | -0.19 | -2.6 | 0.31 | -0.18 | -1.5 | 0.14 | -0.08
4k | -2.8 | 0.35 | -0.22 | -2.7 | 0.34 | -0.21 | -1.8 | 0.16 | -0.07
8k | -3.0 | 0.37 | -0.20 | -2.9 | 0.36 | -0.19 | -1.9 | 0.17 | -0.05
Avg | -2.36 | 0.270 | -0.210 | -2.25 | 0.254 | -0.199 | -1.34 | 0.121 | -0.079
Table 5 Performance comparison of the proposed scheme with state-of-the-art rate control schemes (LP-Main, intra period=12)
Class | vs. M0257: BD-Rate (%) | BD-PSNR (dB) | PV | vs. HM16.0: BD-Rate (%) | BD-PSNR (dB) | PV | vs. Wang [27]: BD-Rate (%) | BD-PSNR (dB) | PV
Class A | -10.3 | 0.55 | -0.501 | -9.9 | 0.51 | -0.498 | -2.5 | 0.29 | -0.241
Class B | -9.0 | 0.52 | -0.486 | -8.8 | 0.50 | -0.474 | -2.4 | 0.27 | -0.221
Class C | -4.6 | 0.42 | -0.405 | -4.4 | 0.38 | -0.398 | -2.0 | 0.20 | -0.206
Class D | -2.6 | 0.33 | -0.294 | -2.5 | 0.31 | -0.287 | -1.9 | 0.18 | -0.237
Class E | -6.8 | 0.47 | -0.456 | -6.6 | 0.45 | -0.452 | -2.7 | 0.33 | -0.246
Class F | -5.3 | 0.46 | -0.393 | -5.1 | 0.44 | -0.388 | -2.6 | 0.32 | -0.251
4k | -10.5 | 0.57 | -0.497 | -10.4 | 0.56 | -0.486 | -2.6 | 0.32 | -0.246
8k | -10.7 | 0.59 | -0.504 | -10.5 | 0.57 | -0.491 | -2.7 | 0.33 | -0.255
Avg | -7.48 | 0.489 | -0.442 | -7.28 | 0.465 | -0.434 | -2.43 | 0.280 | -0.238
Table 6 Performance comparison of the proposed scheme with state-of-the-art rate control schemes (LB-Main, intra period=20)
Class | vs. M0257: BD-Rate (%) | BD-PSNR (dB) | PV | vs. HM16.0: BD-Rate (%) | BD-PSNR (dB) | PV | vs. Wang [27]: BD-Rate (%) | BD-PSNR (dB) | PV
Class A | -10.2 | 0.53 | -0.492 | -9.7 | 0.49 | -0.487 | -2.4 | 0.28 | -0.237
Class B | -8.8 | 0.51 | -0.477 | -8.6 | 0.48 | -0.445 | -2.3 | 0.26 | -0.216
Class C | -4.4 | 0.41 | -0.375 | -4.3 | 0.37 | -0.373 | -1.9 | 0.18 | -0.196
Class D | -2.5 | 0.31 | -0.271 | -2.3 | 0.28 | -0.269 | -1.8 | 0.16 | -0.223
Class E | -6.6 | 0.45 | -0.403 | -6.5 | 0.43 | -0.386 | -2.6 | 0.32 | -0.236
Class F | -5.1 | 0.44 | -0.375 | -4.9 | 0.43 | -0.373 | -2.5 | 0.31 | -0.243
4k | -10.4 | 0.56 | -0.482 | -10.3 | 0.55 | -0.473 | -2.5 | 0.31 | -0.237
8k | -10.6 | 0.58 | -0.492 | -10.4 | 0.56 | -0.481 | -2.6 | 0.32 | -0.247
Avg | -7.33 | 0.474 | -0.421 | -7.13 | 0.449 | -0.411 | -2.33 | 0.270 | -0.230
Table 7 Performance comparison of the proposed scheme with state-of-the-art rate control schemes (RA, intra period=32)
Class | vs. M0257: BD-Rate (%) | BD-PSNR (dB) | PV | vs. HM16.0: BD-Rate (%) | BD-PSNR (dB) | PV | vs. Wang [27]: BD-Rate (%) | BD-PSNR (dB) | PV
Class A | -9.7 | 0.51 | -0.487 | -9.4 | 0.47 | -0.478 | -2.3 | 0.26 | -0.230
Class B | -8.6 | 0.49 | -0.469 | -8.5 | 0.46 | -0.455 | -2.2 | 0.25 | -0.208
Class C | -4.2 | 0.38 | -0.365 | -4.1 | 0.36 | -0.363 | -1.8 | 0.16 | -0.191
Class D | -2.2 | 0.27 | -0.264 | -2.0 | 0.26 | -0.261 | -1.7 | 0.15 | -0.217
Class E | -6.5 | 0.44 | -0.396 | -6.5 | 0.41 | -0.377 | -2.5 | 0.31 | -0.229
Class F | -4.9 | 0.41 | -0.372 | -4.8 | 0.40 | -0.362 | -2.4 | 0.28 | -0.236
4k | -10.2 | 0.54 | -0.466 | -10.1 | 0.53 | -0.441 | -2.4 | 0.30 | -0.221
8k | -10.5 | 0.57 | -0.502 | -10.3 | 0.55 | -0.472 | -2.5 | 0.31 | -0.233
Avg | -7.10 | 0.45 | -0.415 | -6.96 | 0.430 | -0.401 | -2.23 | 0.253 | -0.221
5.3 Quality Evaluation of the Proposed CTU Level Rate Control
This section discusses the effectiveness of the CTU level rate control in two aspects: the subjective visual effect and the variation of the CTU PSNR.
Several quality comparisons are presented in Fig.13-Fig.15. Fig.13 shows the original pictures. Because the quality varies along the video sequence after HEVC coding, and because different methods give different results on a given frame, we choose representative frames from both the front part and the rear part of the video sequence to compare the outcome of each method. In Fig.14-Fig.15, the pictures in the first column are inter frames and the pictures in the second column are intra frames. The experimental results show that Wang [27] is better than JCT-VC M0257 and HM16.0 in both the front part and the rear part of the video sequence, and that our method is better than Wang [27]. Although the visual quality of the wall is acceptable in Wang [27], the visual quality of the child’s face is quite unsatisfactory. In contrast, the pictures reconstructed by our algorithm have far better visual quality. The proposed rate control scheme is particularly effective for test sequences with diverse types of regions. It can also be seen that our method effectively maintains the smoothness between an intra frame and its neighboring frame.
Fig.13 The initial pictures. Panels: 59th, 60th, 179th and 180th frames.
Fig.14 Subjective Quality (at the front of Blowing Bubbles, 416×240, Intra period=12, 160kbps). Panels, each showing the 59th (I frame) and the 60th (P frame): (a) M0257; (b) HM16.0; (c) Wang [27]; (d) Our.
Fig.15 Subjective Quality (at the back of Blowing Bubbles, 416×240, Intra period=12, 160kbps). Panels, each showing the 179th (I frame) and the 180th (P frame): (a) M0257; (b) HM16.0; (c) Wang [27]; (d) Our.
To verify the effectiveness of the CTU level rate control, we select two representative frames and report the PSNR of the CTUs in each frame. Fig.16 shows the variation of PSNR with the CTU number for “PartyScene (832×480)” and “Basketball Pass (416×240)”. The HM16.0 rate control algorithm produces high PSNR at the beginning of a frame and poor PSNR at the end of the frame, whereas the proposed algorithm produces slightly higher PSNR. Fig.16 also shows that, compared with the other methods, our method obtains not only better quality but also more stable quality. According to the experimental results, although Wang [27] performs better at the CTU level, it is not effective at the frame level. Our method achieves excellent results at both the frame level and the CTU level.
Fig.16 The variation of CTU PSNR within different frames. Panels: (a) BasketballPass (416×240, 200kbps), 79-th (inter frame) and 80-th (intra frame); (b) PartyScene (832×480, 800kbps), 47-th (inter frame) and 48-th (intra frame).
5.4 Computational Complexity Analysis
Fig.17 compares the computational complexity of the proposed rate control scheme with that of the other three algorithms in HEVC, where $\Delta T$ is calculated as

$$\Delta T = \frac{T_{pro} - T_{org}}{T_{org}} \times 100\%$$ (26)

where $T_{org}$ and $T_{pro}$ represent the cumulative coding time of the compared algorithm and of the proposed rate control scheme, respectively. According to Fig.17, the proposed algorithm is slightly more complex than JCT-VC M0257, HM16.0, Wang [26] and Wang [27]. In our method, the bits of the intra frame are allocated based on the complexity as well as the current buffer capacity. In addition, the intra frame is divided into regions according to the importance of each region, and the related model parameters are updated. However, the complexity overhead of our algorithm is negligible.
6 Conclusions and Discussions
In this paper, an intra-frame rate control algorithm that aims to provide improved and smooth video quality is developed by jointly considering the frame-level content complexity of the encoded intra and inter frames, as well as the CTU-level content complexity among CTUs in regions of different texture within the intra-frame. Firstly, a frame-level content-complexity-based bit-allocation-balancing technique, which jointly considers the inter-frame correlation between the intra frame and the previously encoded inter frame, is proposed to achieve smooth visual quality between inter- and intra-frames. Secondly, a region-based intra-frame rate control scheme is proposed, together with a new prediction measure of complexity for the CTUs of intra-frames that jointly considers the inter-frame correlation, so that the bit estimation error and the PSNR fluctuation among CTUs are reduced. In addition, rate control accuracy is improved by updating the related model parameters during encoding. The experimental results demonstrate that the proposed scheme achieves higher coding performance than the state-of-the-art schemes for HEVC: the video quality is improved and the quality is smoother. Note that the proposed scheme is designed to maintain stable quality among intra-frames and cannot be directly applied to rate control for inter-frames, which will be one of our future works.
Fig.17 Encoding computation comparison of the proposed scheme with the other three algorithms. Panels: (a) AI (QP: 32); (b) LP-Main (Intra period=12); (c) LB-Main (Intra period=20); (d) RA (Intra period=32).
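Appendix A
$\alpha_{old}^{I}$, $\beta_{old}^{I}$, $bpp_{real}$ and $\lambda_{real}^{I}$ are known variables. According to (3), (10) and [19], after taking the logarithm,

$$\ln\lambda_{comp}^{I} = \ln\alpha^{I} + \beta^{I}\ln\left(\frac{bpp_{real}}{W}\right) = \alpha^{I\prime} + \beta^{I}\ln\left(\frac{bpp_{real}}{W}\right)$$ (A.1)

where $\alpha^{I\prime} = \ln\alpha^{I}$. The squared error between the calculated $\lambda$ and the real $\lambda$ is

$$e^2 = \left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)^2$$ (A.2)

$$\frac{\partial e^2}{\partial \alpha^{I\prime}} = \frac{\partial e^2}{\partial \ln\lambda_{comp}^{I}} \cdot \frac{\partial \ln\lambda_{comp}^{I}}{\partial \alpha^{I\prime}} = -2\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)$$ (A.3)

$$\frac{\partial e^2}{\partial \beta^{I}} = \frac{\partial e^2}{\partial \ln\lambda_{comp}^{I}} \cdot \frac{\partial \ln\lambda_{comp}^{I}}{\partial \beta^{I}} = -2\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)\ln\left(\frac{bpp_{real}}{W}\right)$$ (A.4)

According to the adaptive Least Mean Square (LMS) method with a single iteration of step size $\mu$ and (A.3),

$$\alpha_{new}^{I\prime} = \alpha_{old}^{I\prime} + \mu \cdot 2\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right) = \alpha_{old}^{I\prime} + 2\mu\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right) = \alpha_{old}^{I\prime} + \delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)$$ (A.5)

According to (A.1), we have

$$\ln\alpha_{new}^{I} = \ln\alpha_{old}^{I} + \delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)$$ (A.6)

Thus,

$$\alpha_{new}^{I} = e^{\ln\alpha_{old}^{I} + \delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)}$$ (A.7)

$$\alpha_{new}^{I} = e^{\delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)} \cdot e^{\ln\alpha_{old}^{I}}$$ (A.8)

After Taylor’s expansion and ignoring the high-order terms,

$$\alpha_{new}^{I} = \alpha_{old}^{I}\left(1 + \delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)\right) = \alpha_{old}^{I} + \delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)\alpha_{old}^{I}$$ (A.9)

The updating of β is similar. According to the LMS method and (A.4),

$$\beta_{new}^{I} = \beta_{old}^{I} + \mu \cdot 2\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)\ln\left(\frac{bpp_{real}}{W}\right) = \beta_{old}^{I} + 2\mu\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)\ln\left(\frac{bpp_{real}}{W}\right) = \beta_{old}^{I} + \delta_{\beta}^{I}\left(\ln\lambda_{real}^{I} - \ln\lambda_{comp}^{I}\right)\ln\left(\frac{bpp_{real}}{W}\right)$$ (A.10)

Let $n$ represent the $n$-th intra-frame ($n > 1$); we have

$$\alpha_{new}^{I}(n) = \alpha_{old}^{I}(n-1) + \delta_{\alpha}^{I}\left(\ln\lambda_{real}^{I}(n-1) - \ln\lambda_{comp}^{I}(n-1)\right)\alpha_{old}^{I}(n-1)$$ (A.11)

$$\beta_{new}^{I}(n) = \beta_{old}^{I}(n-1) + \delta_{\beta}^{I}\left(\ln\lambda_{real}^{I}(n-1) - \ln\lambda_{comp}^{I}(n-1)\right)\ln\left(\frac{bpp_{real}}{W_{n-1}}\right)$$ (A.12)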
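Appendix B
According to (3) and [19], we have

$$\ln\alpha_{new} = \ln\alpha_{old} + \delta_{\alpha}\left(\ln\lambda_{real} - \ln\lambda_{comp}\right)$$ (B.1)

Thus,

$$\alpha_{new} = e^{\delta_{\alpha}\left(\ln\lambda_{real} - \ln\lambda_{comp}\right)} \cdot e^{\ln\alpha_{old}}$$ (B.2)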
Different from appendix A, after Taylor’s expansion. ( Inreal Incomp )
new e
' ' ' ' new old ( new old )
e2 '
Inopt
(B.3)
' ' ' ' ( new old ) 2 2 (e 2 ) ( new old ) n n (e 2 ) ' ' ... o(( new old )n ) 2 n! ( ' )2 ( ' ) n
(Error term)
E(Estimated Item) ' ' ' old ( new old )
e
e2 '
(B.4)
On the basis of these theories, the Lagrange mean value theorem and the finite increment theorem, it can be draw the conclusion ' ' ' ' old ) →0, which mean that the new as follow. When the ( new → old ,
' ' old ) ,by minimizing tend to 0. ( new
the estimation error of model parameters errors , we can be aware from Fig.5 that the model parameters of the same region are more easily to gathered together. So we update model parameters in region can reduce the estimation error of model parameters. Let bppK ,act refers to the bits per pixel of an encoded CTU of K th region , K , real refers to real of a CTU that has been
encoded in K th region , K ,comp denotes target of a CTU that has been encoded in K th region . K ,old , K ,old denote the old parameters of CTU of K th region . K ,new , K ,new and K ,new denote the updated parameters of CTU of K th region . It is similar to the Appendix A, we have
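$$\alpha_{K,new} = \alpha_{K,old} + \delta_{\alpha}\left(\ln \lambda_{K,real} - \ln \lambda_{K,comp}\right)\alpha_{K,old} \tag{B.5}$$

$$\beta_{K,new} = \beta_{K,old} + \delta_{\beta}\left(\ln \lambda_{K,real} - \ln \lambda_{K,comp}\right)\ln\left(bpp_{real}\, C_{K,i-1}\right) \tag{B.6}$$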
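The region-wise updates (B.5) and (B.6) amount to keeping one (α, β) pair per region and touching only the pair of the region that the just-encoded CTU belongs to. A minimal sketch follows. The dictionary layout, the step sizes, the numeric values, and the name `update_region_params` are hypothetical. The logarithmic factor of (B.6) is passed in as `ln_term` so that the sketch does not commit to a particular way of computing it.

```python
import math


def update_region_params(params, region, ln_lambda_real, ln_lambda_comp, ln_term,
                         delta_alpha=0.1, delta_beta=0.05):
    """Update the (alpha, beta) pair of one region in place, following (B.5) and (B.6).

    params  : dict mapping a region index K to a tuple (alpha_K, beta_K)
    ln_term : the logarithmic factor that multiplies the error in (B.6)
    """
    alpha_old, beta_old = params[region]
    err = ln_lambda_real - ln_lambda_comp
    alpha_new = alpha_old + delta_alpha * err * alpha_old   # (B.5)
    beta_new = beta_old + delta_beta * err * ln_term        # (B.6)
    params[region] = (alpha_new, beta_new)
    return params[region]


# Hypothetical per-region parameters, for illustration only.
region_params = {0: (5.0, -1.2), 1: (8.0, -1.5)}

# A CTU of region 0 was encoded; only the parameters of region 0 change.
update_region_params(region_params, 0,
                     ln_lambda_real=math.log(60.0),
                     ln_lambda_comp=math.log(50.0),
                     ln_term=-3.0)
print(region_params)
```

Keeping the parameters separate per region means that a CTU from a region with very different texture does not disturb the parameters used by the other regions.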
References
[1] G.J. Sullivan, J. Ohm, W. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technology, 22(12) (2012) 1649-1668.
[2] M.-C. Chien, R.-J. Wang, C.-H. Chiu, P.-C. Chang, Quality Driven Frame Rate Optimization for Rate Constrained Video Encoding, IEEE Transactions on Broadcasting, 58(2) (2012) 200-208.
[3] Chenggang Yan, Yongdong Zhang, Xu Jizheng, Feng Dai, Jun Zhang, Qionghai Dai, Wu Feng, Efficient parallel framework for HEVC motion estimation on many-core processors, IEEE Trans. Circ. Syst. Video Technology, 24(12) (2014) 2077-2089.
[4] B. Yan, M. Wang, Adaptive distortion-based intra-rate estimation for H.264/AVC rate control, IEEE Signal Process. Lett., 16(3) (2009) 145-148.
[5] Yanwei Liu, Qingming Huang, Siwei Ma, Debin Zhao, Wen Gao, A Novel Rate Control Technique for Multiview Video plus Depth based 3D Video Coding, IEEE Transactions on Broadcasting, 57(2) (2011) 562-571.
[6] X. Jing, L.-P. Chau, W.-C. Siu, Frame complexity-based rate quantization model for H.264/AVC intra frame rate control, IEEE Signal Process. Lett., 15 (2008) 373-376.
[7] Siwei Ma, Wen Gao, Feng Wu, Yan Lu, Rate Control for JVT Video Coding Scheme with HRD Considerations, The 2003 IEEE International Conference on Image Processing (ICIP) (2003) 793-796.
[8] Hongkai Xiong, J. Sun, S. Yu, J. Zhou, C. Chen, Rate Control for Real-Time Video Network Transmission on End-To-End Rate-Distortion and Application-Oriented QoS, IEEE Trans. Broadcasting, 51(1) (2005) 122-132.
[9] F. Bossen, Common test conditions and software reference configurations, Joint Collaborative Team on Video Coding, Document: JCTVC-G1200, Nov. 2011.
[10] M. Wang, B. Yan, Lagrangian multiplier based joint three-layer rate control for H.264/AVC, IEEE Signal Process. Lett., 16(8) (2009) 679-682.
[11] Weiyao Lin, K. Panusopone, D. Baylon, M.-T. Sun, Z. Chen, H. Li, A fast sub-pixel motion estimation algorithm for H.264/AVC video coding, IEEE Trans. Circuits and Systems for Video Technology, 21(2) (2011) 237-242.
[12] J. Si, S. Ma, X. Zhang, W. Gao, Adaptive rate control for High Efficiency Video Coding, IEEE Visual Communications & Image Processing, 42(4) (2012) 1-6.
[13] Sudeng Hu, Hanli Wang, Sam Kwong, Adaptive Quantization Parameter Clip Scheme for Smooth Quality in H.264/AVC, IEEE Trans. on Image Processing, 21(4) (2011) 1911-1919.
[14] Z.G. Li, W. Gao, F. Pan, S.W. Ma, K.P. Lim, G.N. Feng, X. Lin, S. Rahardja, H.Q. Lu, Y. Lu, Adaptive rate control for H.264, J. Visual Comm. Image Representation, 17(2) (2006) 376-406.
[15] Weiyao Lin, Ming-Ting Sun, Radha Poovendran, Zhenyou Zhang, Activity recognition using a combination of category components and local models for video surveillance, IEEE Trans. Circuits and Systems for Video Technology, 18(8) (2008) 1128-1139.
[16] W. Lin, K. Panusopone, D. Baylon, M.-T. Sun, A computation control motion estimation method for complexity scalable video coding, IEEE Trans. Circuits and Systems for Video Technology, 20(11) (2010) 1533-1543.
[17] H. Choi, J. Yoo, J. Nam, D. Sim, I. Bajic, Pixel-wise unified rate-quantization model for multi-level rate control, IEEE J. Sel. Top. Signal Process., 7(6) (2013) 1112-1123.
[18] X. Liang, Q. Wang, Y. Zhou, B. Luo, A. Men, A novel RQ model based rate control scheme in HEVC, Vis. Commun. Image Process. (2013) 1-6.
[19] B. Li, H. Li, L. Li, J. Zhang, Rate control by R-lambda model for HEVC, Document: JCTVC-K0103, Joint Collaborative Team on Video Coding.
[20] B. Li, H. Li, L. Li, J. Zhang, λ domain based rate control for high efficiency video coding, IEEE Trans. Image Process., 23(9) (2014) 3841-3854.
[21] X. Wang, M. Karczewicz, Intra frame rate control based on SATD, Document: JCTVC-M0257, Joint Collaborative Team on Video Coding.
[22] B. Hosking, et al., An adaptive resolution rate control method for intra coding in HEVC, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016) 1486-1490.
[23] B. Li, H. Li, L. Li, Adaptive bit allocation for R-λ model rate control in HM, Document: JCTVC-M0036, Joint Collaborative Team on Video Coding.
[24] B. Lee, M. Kim, Q. Truong, A Frame-Level Rate Control Scheme Based on Texture and Nontexture Rate Models for High Efficiency Video Coding, IEEE Trans. Circuits Syst. Video Technology, 24(3) (2014) 465-479.
[25] Y. Zhou, L. Tian, X. Ning, Intra frame constant rate control scheme for high efficiency video coding, in IEEE Int. Conf. Computing, Networking and Communications (ICNC) (2013) 648-652.
[26] M. Wang, K.N. Ngan, H. Li, An Efficient Frame-content based Intra Frame Rate Control for High Efficiency Video Coding, IEEE Signal Processing Letters, 22(7) (2015) 896-900.
[27] M. Wang, K.N. Ngan, H. Li, Low-delay Rate Control for Consistent Quality Using Distortion-based Lagrange Multiplier, IEEE Transactions on Image Processing, 25(7) (2016) 2943-2955.
[28] M. Wang, K.N. Ngan, H. Li, H. Zeng, Improved Block Level Adaptive Quantization for High Efficiency Video Coding, IEEE International Symposium on Circuits and Systems (ISCAS) (2015) 509-512.
[29] S. Li, M. Xu, X. Deng, Z. Wang, Weight-based R-λ rate control for perceptual HEVC coding on conversational videos, Signal Processing: Image Communication, 38(C) (2015) 127-140.
[30] M. Zhou, B. Li, Y. Zhang, Content-adaptive Parameters Estimation for Multi-dimensional Rate Control, Journal of Visual Communication and Image Representation, 34(C) (2015) 204-218.
[31] H.M. Hu, B. Li, W. Lin, W. Li, M.T. Sun, Region-Based Rate Control for H.264/AVC for Low Bit-rate Applications, IEEE Transactions on Circuits and Systems for Video Technology, 22(11) (2012) 1564-1576.
[32] Y. Liu, Z.G. Li, Y.C. Soh, Region-of-interest based resource allocation for conversational video communication of H.264/AVC, IEEE Trans. Circuits Syst. Video Technology, 18(1) (2008) 134-139.
[33] J. Liu, Y. Cho, Z. Guo, C.C.J. Kuo, Bit allocation for spatial scalability coding of H.264/SVC with dependent rate-distortion analysis, IEEE Trans. Circuits Syst. Video Technology, 20(7) (2010) 967-981.
[34] X. Jing, L.P. Chau, W.C. Siu, Frame complexity based rate-quantization model for H.264/AVC intra frame rate control, IEEE Signal Processing Letters, 15 (2008) 373-376.
[35] W. Lin, M.-T. Sun, H. Li, Z. Chen, W. Li, B. Zhou, Macroblock classification for video applications involving motions, IEEE Trans. Broadcasting, 58(1) (2012) 34-46.
[36] W. Lin, Y. Mi, W. Wang, J. Wu, J. Wang, T. Mei, A diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes, IEEE Trans. Image Processing, 25(4) (2016) 1674-1687.
[37] H.M. Hu, B. Li, W. Lin, A rate-control algorithm using inter-layer information for H.264/SVC for low-delay applications, J. Visual Comm. Image Representation, 22(6) (2011) 504-515.
[38] HM Reference Software 16.0 [Online], Available: http://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware.
[39] F. Bossen, Common HM Test Conditions and Software Reference Configurations (JCTVC-L1100), Document: JCT-VC, 2013.
[40] Xiaopeng Fan, Wen Gao, Yan Lu, Debin Zhao, Flicking reduction in all intra frame coding, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG 5th Meeting, Document: JVT-E070, Geneva, Switzerland, Oct. 2002.
Highlights
1. An intra-frame rate control algorithm that jointly exploits inter-frame correlation is developed.
2. A new prediction measure of content complexity for the CTUs of an intra frame is proposed.
3. A frame-level complexity-based bit-allocation-balancing method is proposed.
4. A new region-division and complexity-based CTU-level bit allocation method is developed.
5. Experimental results demonstrate that the proposed algorithms perform well.