J. Vis. Commun. Image R. 58 (2019) 462–476
Fast prediction for quality scalability of High Efficiency Video Coding Scalable Extension

Chih-Hsuan Yeh a, Jie-Ru Lin a, Mei-Juan Chen a, Chia-Hung Yeh b,c,*, Cheng-An Lee a, Kuang-Han Tai a

a Department of Electrical Engineering, National Dong Hwa University, Taiwan, ROC
b Department of Electrical Engineering, National Taiwan Normal University, Taiwan, ROC
c Department of Electrical Engineering, National Sun Yat-sen University, Taiwan, ROC
* Corresponding author.
Article info

Article history: Received 23 October 2017; Revised 16 August 2018; Accepted 8 December 2018; Available online 10 December 2018

Keywords: High Efficiency Video Coding Scalable Extension; SHVC; Quality scalability; Fast decision; Inter-layer prediction
Abstract

In response to the increased demand for high-resolution video, the new generation of video standards, High Efficiency Video Coding (HEVC) and its scalable extension (SHVC), have been finalized. HEVC/SHVC improves compression efficiency considerably and supports ultra-high-definition (UHD) video, but its coding complexity is much higher than that of previous standards. The framework of SHVC is based on HEVC and supports several types of scalability. An SHVC bitstream can be decoded into various video resolutions, frame rates and qualities and only needs to be encoded once, but at a higher complexity than HEVC. How to reduce the coding complexity of SHVC is therefore the purpose of this paper. Our proposed algorithm accelerates enhancement layer (EL) prediction for quality scalability of SHVC by utilizing the encoded Coding Unit (CU) sizes, prediction modes, motion vectors and Rate-Distortion Costs (RD-Costs) of the base layer (BL) and the encoded CU sizes of the enhancement layer. Experimental results show that the proposed algorithm saves a large amount of coding time while maintaining good video quality, and its performance is better than that of previous works.

© 2018 Elsevier Inc. All rights reserved.
1. Introduction

With the increased demand for high-resolution video, a new-generation video standard known as High Efficiency Video Coding (HEVC) [1,2] was finalized in 2013. The compression efficiency of HEVC is nearly double that of H.264/AVC (Advanced Video Coding) [3,4], and HEVC supports 4K×2K and up to 8K×4K ultra-high-definition (UHD) resolution. The coding complexity of HEVC is therefore considerably higher than those of previous standards. In addition, the High Efficiency Video Coding Scalable Extension (SHVC) [5,6] was completed in 2014. SHVC, an extension built on HEVC, makes it possible for a video to be encoded once into a multi-layer bitstream offering various resolutions, frame rates or qualities. SHVC includes one base layer (BL) and several enhancement layers (ELs) and provides spatial, temporal and quality scalabilities by exploiting the BL and the ELs [7]. Depending on the bandwidth status or the decoder equipment, the bitstream
layers can be transmitted selectively over IP networks, as illustrated in Fig. 1. The advanced coding tools improve coding efficiency, but at the cost of additional computational complexity. Many fast coding strategies have been developed for HEVC. In Ref. [8], the importance of thirteen neighboring coding tree units (CTUs) is evaluated and classified into three cases by the K-means method to estimate the best Coding Unit (CU) depth level; in addition, a fast Prediction Unit (PU) decision is proposed to skip PU modes that are seldom used. Ref. [9] uses the rough mode cost (RMC) to speed up the CU and Transform Unit (TU) partition depth decisions; moreover, PU modes with higher RMC are removed from the candidate list, which further reduces the coding time of the HEVC encoder under the all-intra configuration. Several studies have also been presented to reduce the coding complexity and coding time of SHVC. Ref. [10] utilizes the motion vector correlation between the BL and the EL to reduce the unnecessary search range. Ref. [11] uses the maximum CTU depth of the co-located BL as a threshold; the encoding process is terminated early if the current CU depth is greater than the threshold. Ref. [12] uses the information of the Skip mode to omit unlikely prediction modes. Ref. [13] combines machine learning and a Bayesian approach to reduce the complexity of SHVC.
Fig. 1. The illustration of selective multi-layer transmission in SHVC.
In Ref. [14], three ways to speed up the encoding process are proposed. The first directly copies the CU split flags of the BL to the EL, so the Rate-Distortion Optimization (RDO) of the EL has to be performed at only one depth, which reduces the EL encoding time significantly. In the second method, intra prediction is omitted in the EL because the inter-layer reference mode of SHVC is concluded to be better than intra prediction. In the third method, the PU of the EL is not split in the orthogonal direction when the co-located BL is split horizontally or vertically. Ref. [15] separates the CTU partition patterns into 18 kinds of labels. Using previously coded CTU information and Bayes' rule, CTUs whose structures are similar to the current CTU are identified, and the current CTU is coded with a similar partition.
Fig. 3. The illustration of inter-layer motion prediction.
Fig. 2. The illustration of inter-layer texture prediction (ILTP).
Ref. [16] explores the correlations of the CU depth and intra modes between the BL and the EL to accelerate the mode decision procedure in the EL for all-intra spatial scalability in SHVC. By acquiring the RD-Cost and CU size information of the BL, Ref. [17] defines inter-layer reference prediction (ILRP) as a new mode to reduce the coding complexity of intra coding in the EL. Ref. [18] proposes a hybrid complexity reduction strategy in the EL for spatial scalability in SHVC by combining quad-tree-based and layer-information-based prediction. Ref. [19] builds a probabilistic model for all the available modes in the EL; an online-learning-based fast mode assigning (FMA) method is then utilized to predict the mode in the EL. Using sample-based weighted prediction for EL coding (SELC), Ref. [20] presents a low-complexity EL compression scheme for lossless SHVC. Ref. [21] proposes a content-adaptive early termination scheme that uses inter-layer information for quality scalable HEVC. Ref. [22] determines the best CU depth and prediction mode in the EL early for SHVC intra coding by considering the spatial and inter-layer RD relationships. However, most of the works discussed above only consider the correlation between the BL and the EL and neglect the information from the neighboring CTUs of the CTU being encoded in the EL. In this paper, we propose a fast algorithm for quality scalability of SHVC. Our fast algorithm consists of four parts.
Fig. 4. The implementation of PUs in the encoder of SHVC reference software (SHM).
Table 1. Probability distributions (%) of CU depth between BL and EL.

DepthBL \ DepthEL |   0   |   1   |   2   |   3
0                 | 27.95 | 11.90 |  3.61 |  0.81
1                 |  2.28 | 16.02 |  8.45 |  2.62
2                 |  0.39 |  3.32 | 10.44 |  4.58
3                 |  0.06 |  0.76 |  2.36 |  4.46
In the first part, our proposed algorithm uses CU depths from the EL and the BL to determine the CU depth range of the current CTU. In the second part, we propose an adaptive search range algorithm, which employs the motion vector difference (MVD) of the BL to obtain a new search range for the EL. In the third part, our proposed algorithm utilizes the Coded Block Flag (CBF) and the prediction modes of the co-located CTU in the BL to skip unlikely prediction modes in the current CTU. In the fourth part, our proposed algorithm calculates a threshold based on the RD-Cost of the inter-layer reference mode; the encoding process is terminated early if the RD-Cost of the current CU is less than the threshold. The rest of this paper is organized as follows. In Section 2, an overview of SHVC is presented. A detailed description of the
proposed fast algorithm is given in Section 3. The experimental results of our proposed algorithm are demonstrated in Section 4. We conclude this paper in Section 5.
2. Overview of High Efficiency Video Coding Scalable Extension (SHVC)

The framework of SHVC is built on the fundamentals of HEVC, and it also uses the CU, PU and TU in the coding process. The CU is based on a quad-tree structure, with CU depths varying from depth 0 (64×64, the CTU) to depth 3 (8×8). The PU provides numerous partition types from which the most suitable one is chosen.
Fig. 5. The CTUs of the enhancement layer.
Fig. 6. Illustration of the spatial and temporal CTUs in BL.
Table 2. Accumulated probability (%) obtained from CTU correlation.

Sequence        | 1st   | 2nd   | 3rd   | 4th   | 5th   | 6th
Traffic         | 65.42 | 84.09 | 92.56 | 95.56 | 97.04 | 97.66
Kimono1         | 53.71 | 75.94 | 85.80 | 90.65 | 93.46 | 95.03
BasketballDrill | 64.92 | 85.90 | 93.36 | 96.19 | 97.54 | 98.08
BasketballPass  | 73.58 | 89.34 | 96.98 | 99.10 | 99.34 | 99.55
Vidyo1          | 69.87 | 86.96 | 95.04 | 97.27 | 98.29 | 98.60
In the BL, the architecture of SHVC is identical to that of HEVC. In the EL, SHVC can utilize additional reference information from the BL to predict the current frame. The inter-layer reference (ILR) is the main new coding tool designed for SHVC; it includes two main techniques, namely inter-layer texture prediction and inter-layer motion prediction.
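As a quick illustration of the quad-tree depth/size relationship described above, the following minimal Python sketch maps a CU depth to its block size (the helper name is ours, not part of SHM):

```python
def cu_size_from_depth(depth: int, ctu_size: int = 64) -> int:
    """Return the CU width/height for a given quad-tree depth.

    Depth 0 is the full CTU (64x64 by default); each additional depth level
    halves the block side, down to depth 3 (8x8).
    """
    if not 0 <= depth <= 3:
        raise ValueError("SHVC/HEVC CU depth ranges from 0 to 3")
    return ctu_size >> depth

print([cu_size_from_depth(d) for d in range(4)])  # [64, 32, 16, 8]
```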
2.1. Inter-layer texture prediction

Inter-layer texture prediction [23] performs inter prediction in the EL by using the BL picture as the reference frame, with the motion vector (MV) set to zero. Thus, the EL can directly copy the blocks from the BL, as shown in Fig. 2. We denote inter-layer texture prediction as ILTP in the rest of this paper.
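For quality (SNR) scalability the two layers share the same resolution, so ILTP amounts to predicting an EL block directly from the co-located BL reconstruction with a zero motion vector. A minimal NumPy sketch of this idea (function and variable names are ours, not from SHM):

```python
import numpy as np

def iltp_predict(bl_reconstruction: np.ndarray, x: int, y: int,
                 width: int, height: int) -> np.ndarray:
    """Inter-layer texture prediction for SNR scalability:
    the EL prediction block is the co-located BL block (zero MV)."""
    return bl_reconstruction[y:y + height, x:x + width].copy()

# toy example: a 64x64 "reconstructed" BL picture, predict a 16x16 EL block
bl = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
pred = iltp_predict(bl, x=16, y=32, width=16, height=16)
print(pred.shape)  # (16, 16)
```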
2.2. Inter-layer motion prediction

Advanced motion vector prediction (AMVP) [24] finds the best motion vector in the reference frames as a starting search point for motion estimation. In order to reduce the required data, only the reference index, the motion vector difference (MVD) and the residual are transmitted for AMVP. By adding the decoded frame from the BL to the reference frames, the EL can choose a motion vector from the BL during the AMVP process. Fig. 3 shows the motion vectors in the EL that are reused from the BL; this method is called inter-layer motion prediction.
Fig. 7. The correlation between BD-BR, coding time and search range.
Fig. 8. The flowchart of the proposed adaptive search range algorithm.
2.3. Prediction units in scalable High Efficiency Video Coding

Fig. 4 shows the implementation of the PU modes in the encoder of the SHVC reference software (SHM). The PU coding process first goes through the prediction modes Merge/Skip 2N×2N, Inter 2N×2N and the Inter Symmetric Partitions (SMP: Inter N×2N and Inter 2N×N). These are followed by the Inter Asymmetric Partitions (ASP: Inter 2N×nU, Inter 2N×nD, Inter nL×2N and Inter nR×2N), which are performed selectively according to the best mode of the previous phase. Intra 2N×2N and ILTP are executed last. Inter N×N and Intra N×N are implemented only in the smallest coding unit (SCU), and the asymmetric modes are not used for the SCU. During the prediction of Merge 2N×2N, the motion vectors of the coded CTUs in the spatial and temporal domains are acquired.
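The checking order described above can be summarized as a simple ordered list. The sketch below is our own simplified reading of Fig. 4 (mode names abbreviated by us; the exact position of the N×N modes within the loop is simplified here):

```python
# PU mode checking order in the SHM encoder as described above
# (asymmetric partitions are tried selectively, NxN modes only at the SCU).
BASIC_MODES = ["Merge/Skip 2Nx2N", "Inter 2Nx2N", "Inter Nx2N", "Inter 2NxN"]
ASYMMETRIC_MODES = ["Inter 2NxnU", "Inter 2NxnD", "Inter nLx2N", "Inter nRx2N"]
FINAL_MODES = ["Intra 2Nx2N", "ILTP"]
SCU_ONLY_MODES = ["Inter NxN", "Intra NxN"]

def candidate_modes(is_scu: bool, try_asymmetric: bool) -> list:
    """Return the PU modes evaluated for one CU, in checking order."""
    modes = list(BASIC_MODES)
    if try_asymmetric and not is_scu:      # asymmetric modes are not for the SCU
        modes += ASYMMETRIC_MODES
    if is_scu:
        modes += SCU_ONLY_MODES            # NxN modes only at the smallest CU
    modes += FINAL_MODES
    return modes

print(candidate_modes(is_scu=False, try_asymmetric=True))
```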
3. Proposed fast algorithm for SHVC

In this paper, we provide two modes, Quality Mode and Speed Mode. Quality Mode focuses on maintaining the video quality, while Speed Mode emphasizes accelerating the encoding process. Users can choose between the two modes according to their needs.
Table 3. Mode distribution of BL.

Mode        | Probability
Skip        | 67.18%
Merge 2N×2N | 4.63%
Inter 2N×2N | 7.37%
Inter 2N×N  | 3.43%
Inter N×2N  | 3.58%
Inter N×N   | 0.80%
Inter 2N×nU | 1.37%
Inter 2N×nD | 1.16%
Inter nL×2N | 1.30%
Inter nR×2N | 1.08%
Intra 2N×2N | 7.35%
Intra N×N   | 0.76%

3.1. Fast depth prediction algorithm

3.1.1. Depth information from BL

In SHVC, the video content has a high correlation between the BL and the EL, so the CU depth distributions of the two layers are very similar. We have explored the depth relationship
between the best CU depth of a CTU in the EL (DepthEL) and the best co-located CU depth in the BL (DepthBL), as shown in Table 1, which gives the probability distributions of DepthEL and DepthBL. The statistical analysis in Table 1 was conducted by encoding the PeopleOnStreet, Kimono1, Cactus and BasketballDrive sequences with the SHVC reference software (SHM6.1) under the random-access configuration for QP(BL,EL) set to (26,20), (30,24), (34,28) and (38,32). It can be observed that the probability of DepthEL = DepthBL is the highest, with the maximum reaching 27.95% when both DepthEL and DepthBL are 0. The next most likely cases are DepthEL = DepthBL + 1 and DepthEL = DepthBL + 2. This observation indicates the high dependence between DepthEL and DepthBL. Therefore, the proposed algorithm chooses DepthBL + 2 as the maximum depth in Quality Mode and DepthBL as the maximum depth in Speed Mode, and DepthBL as the minimum depth in both modes. This depth range is denoted as the depth range of the base layer (BLD) in Section 3.1.3.

3.1.2. Depth information from EL

We access the spatial and temporal reference CTUs of the EL as shown in Fig. 5.
Fig. 9. The flowchart of the proposed fast mode decision algorithm.
Fig. 10. The fitted curves obtained by linear regression of the RD-Cost ratio between the best mode in the EL and ILTP at each depth: (a) Depth 0, (b) Depth 1, (c) Depth 2, (d) Depth 3.
To determine whether the spatial and temporal depth information of the EL CTUs should additionally be adopted in the proposed algorithm, we employ Pearson's correlation coefficient [25], denoted by $\rho_{X,Y}$ as shown in Eq. (1), to sort the correlations of the spatial and temporal CTUs in the BL. Pearson's correlation coefficient is a statistical measure of the linear correlation between two sets of variables, X and Y. The spatial and temporal correlations are calculated as in Eqs. (2) and (3), respectively.

$$\rho_{X,Y} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}\,\sqrt{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}} \qquad (1)$$

where X and Y are the RD-Cost values of the co-located CTUs in the BL and of the corresponding spatial or temporal candidate CTUs, respectively; n is the number of elements in X and Y; and $x_i$ and $y_i$ are the elements of the two sets X and Y.

$$X_S = \{RDcost_B, RDcost_{B0}, RDcost_{B1}\},\quad Y_{Sa} = \{RDcost_{Ba}, RDcost_{Ba0}, RDcost_{Ba1} \mid a \in \{L, U, UL, UR\}\} \qquad (2)$$

$$X_T = \{RDcost_B, RDcost_{BL}, RDcost_{BU}, RDcost_{BUL}\},\quad Y_{Tb} = \{RDcost_{Bb}, RDcost_{BLb}, RDcost_{BUb}, RDcost_{BULb} \mid b \in \{0, 1\}\} \qquad (3)$$

where $X_S$ and $Y_{Sa}$ are the sets utilized to calculate the spatial correlation, and $X_T$ and $Y_{Tb}$ are the sets utilized to calculate the temporal correlation, as indicated in Fig. 6. Table 2 tabulates the accumulated probability hit for the sorting rule determined by Eqs. (1), (2) and (3), with the sequences encoded by SHM6.1 under the random-access configuration. As can be seen, the accumulated probability hit of the two most relevant CTUs exceeds 80% for most sequences, and that of the four most relevant CTUs exceeds 90%; the accumulated probability becomes higher as we continue to increase the number of reference CTUs. The proposed algorithm accesses the reference CTUs in the EL presented in Fig. 5 and obtains the minimum and maximum depth values from some of these reference CTUs as the depth range, which is denoted as the depth range of the enhancement layer (ELD) in Section 3.1.3. In Quality Mode, we take the top four CTUs as the reference CTUs to maintain quality. In Speed Mode, we take only the top two CTUs as the main references to further accelerate the coding.

3.1.3. Combined depth information from BL and EL

To speed up the CU processing, our proposed algorithm combines the depth information from the BL and the EL. The proposed algorithm applies the union of the depth information from the BL and the EL as a rough depth prediction (RPD), and the intersection of the depth information from the BL and the EL as an accurate depth prediction (APD), as shown in Eqs. (4) and (5).

$$RPD = BLD \cup ELD \qquad (4)$$

$$APD = BLD \cap ELD \qquad (5)$$

BLD and ELD represent the depth ranges obtained from the BL and from the reference CTUs of the EL, respectively, as described in Sections 3.1.1 and 3.1.2. There is a high possibility that the best CU depth lies within the range of the APD. The proposed algorithm therefore encodes the CTU within the range of the APD more carefully, and the proposed fast mode decision algorithm is applied within the range of the APD. The possibility that the best CU depth lies in the rest of the RPD range is relatively lower; thus, only Merge/Skip 2N×2N, Inter 2N×2N and ILTP are tested in the range of the RPD.
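A minimal Python sketch of how the BL and EL depth ranges can be combined into the RPD and APD of Eqs. (4) and (5). Function and variable names are ours, and the selection of reference CTUs is simplified to whatever depth values the caller supplies:

```python
def bl_depth_range(depth_bl: int, quality_mode: bool) -> set:
    """BLD: [DepthBL, DepthBL + 2] in Quality Mode, {DepthBL} in Speed Mode."""
    max_depth = min(depth_bl + 2, 3) if quality_mode else depth_bl
    return set(range(depth_bl, max_depth + 1))

def el_depth_range(reference_depths: list) -> set:
    """ELD: depth range spanned by the selected reference CTUs in the EL."""
    return set(range(min(reference_depths), max(reference_depths) + 1))

def combined_ranges(depth_bl: int, reference_depths: list, quality_mode: bool):
    bld = bl_depth_range(depth_bl, quality_mode)
    eld = el_depth_range(reference_depths)
    rpd = bld | eld          # rough depth prediction: union, Eq. (4)
    apd = bld & eld          # accurate depth prediction: intersection, Eq. (5)
    return rpd, apd

rpd, apd = combined_ranges(depth_bl=0, reference_depths=[1, 2], quality_mode=True)
print(sorted(rpd), sorted(apd))  # [0, 1, 2] [1, 2]
```

In this toy case, only the depths in the APD ({1, 2}) would be coded with the full fast mode decision, while depth 0 (RPD only) would test just Merge/Skip 2N×2N, Inter 2N×2N and ILTP.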
Fig. 11. The flowchart of the whole proposed algorithm.
Table 4. The configuration of the experimental environment.

Configuration                 | Setting
Codec version                 | SHM 6.1
Coding structure              | Random-access [14,15], Low-delay [14]
BL QP (compared with [14,15]) | 26, 30, 34, 38
EL1 QP (compared with [14])   | 20, 24, 28, 32
EL1 QP (compared with [15])   | 22, 26, 30, 34
EL2 QP (compared with [15])   | 18, 22, 26, 30
Search range                  | 64
ME algorithm                  | TZ Search
Table 5. Test sequences in our experiments.

Sequence        | Resolution | Frame rate | Number of frames
Traffic         | 2560×1600  | 30         | 150
PeopleOnStreet  | 2560×1600  | 30         | 150
Kimono          | 1920×1080  | 24         | 240
ParkScene       | 1920×1080  | 24         | 240
Cactus          | 1920×1080  | 50         | 500
BasketballDrive | 1920×1080  | 50         | 500
BQTerrace       | 1920×1080  | 60         | 600
BasketballDrill | 832×480    | 50         | 500
BQMall          | 832×480    | 60         | 600
PartyScene      | 832×480    | 50         | 600
RaceHorses      | 832×480    | 30         | 300
BasketballPass  | 416×240    | 50         | 500
BQSquare        | 416×240    | 60         | 600
BlowingBubbles  | 416×240    | 50         | 500
RaceHorses      | 416×240    | 30         | 300
Vidyo1          | 1280×720   | 60         | 600
Vidyo3          | 1280×720   | 60         | 600
Vidyo4          | 1280×720   | 60         | 600
FourPeople      | 1280×720   | 60         | 600
Johnny          | 1280×720   | 60         | 600
KristenAndSara  | 1280×720   | 60         | 600

Table 6. Parametric arrangement of the proposed Quality Mode and Speed Mode.

Parameter                                                   | Proposed Quality Mode | Proposed Speed Mode
Depth range of the base layer (BLD), Section 3.1.1          | Minimum: DepthBL; Maximum: DepthBL + 2 | Minimum: DepthBL; Maximum: DepthBL
Depth range of the enhancement layer (ELD), Section 3.1.2   | Depth range among the first four reference CTUs with the highest correlations | Depth range among the first two reference CTUs with the highest correlations
γ in Section 3.3                                            | 90%                   | 80%
α in Section 3.4                                            | 0.8                   | 0.9
3.2. Adaptive search range

Motion estimation involves much computation in encoding the PU. In SHVC, the search range is generally fixed at 64. However, owing to inter-layer motion prediction in SHVC, the AMVP result deviates little from the best motion vector (MV). Therefore, a fixed search range of 64 is too large and causes unnecessary computation in most cases. As shown in Fig. 7, we have analyzed the correlation between the Bjontegaard delta bitrate (BD-BR), the coding time and the search range for five sequences encoded by SHM6.1 under the random-access configuration, one from each class: Class A (Traffic), Class B (Kimono1), Class C (BasketballDrill), Class D (BasketballPass) and Class E (Vidyo1). It can be observed that the BD-BR is close to zero when the search range is 4 or larger, which means that the best matching results of motion estimation are similar once the search range is at least 4. Furthermore, the BD-BR increases greatly when the search range is smaller than 2, which means that a search range of less than 2 might not obtain the best matching results. Therefore, we set the maximum and minimum values of the search range according to the average motion vector difference (MVD) of the BL, as shown in Eq. (6), where $BLMVD_{avg}$ is the average MVD of the co-located CTU in the BL.

$$SR_{new} = \begin{cases} 2, & \text{if } BLMVD_{avg} = 0 \\ 4, & \text{if } BLMVD_{avg} > 0 \end{cases} \qquad (6)$$

To prevent quality degradation in areas with high motion caused by the proposed adaptive search range algorithm, the proposed algorithm utilizes the average MVD of the co-located CTU in the BL to judge whether to shrink the search range or not. If the average MVD of the co-located CTU in the BL is larger than half of the original search range (i.e., 32, since the original search range in SHVC is generally 64), the proposed adaptive search range algorithm conducts motion estimation with the original search range. Fig. 8 shows the flowchart of the proposed adaptive search range algorithm.
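A minimal sketch of the adaptive search range decision of Eq. (6) together with the large-motion safeguard described above. Names are ours; the original search range of 64 and the half-range threshold of 32 follow the text:

```python
def adaptive_search_range(bl_mvd_avg: float, original_sr: int = 64) -> int:
    """Choose the EL motion-estimation search range from the BL average MVD.

    If the co-located BL CTU shows large motion (average MVD larger than
    half the original search range), fall back to the original range;
    otherwise use the small range of Eq. (6).
    """
    if bl_mvd_avg > original_sr / 2:      # high-motion area: keep the full range
        return original_sr
    return 2 if bl_mvd_avg == 0 else 4    # Eq. (6)

print(adaptive_search_range(0.0), adaptive_search_range(5.3), adaptive_search_range(40.0))
# 2 4 64
```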
3.3. Fast mode decision algorithm

Merge 2N×2N [26] is a new mode in HEVC designed to reduce the required data. Merge 2N×2N searches for the best motion vector among the spatially and temporally neighboring CTUs, and only the location index of the best motion vector and the residual are conveyed. Moreover, if the transformed residual is zero, Merge 2N×2N is denoted as Skip mode and the Coded Block Flag (CBF) is set to zero; in that case it is unnecessary to transmit the residual. According to the works of [27] and [28], the CBF can be a factor for judging whether a prediction mode should be ignored or not. To maintain optimal video quality, we utilize the CBF and the mode distribution of the BL as the criterion for bypassing prediction modes. From the mode distribution of the BL, analyzed from the Traffic, PeopleOnStreet, Kimono1, ParkScene, Cactus and BasketballDrive sequences encoded by SHM6.1 under the random-access configuration and shown in Table 3, it can be observed that the probability of Skip mode is the highest, followed by Inter 2N×2N, Intra 2N×2N and Merge 2N×2N. In addition, Skip mode, Merge 2N×2N and Inter 2N×2N are usually selected when the variation of the video content is relatively small. Consequently, we define Skip mode, Merge 2N×2N and Inter 2N×2N as homogeneous modes; in other words, the content coded with these modes tends to have similar motion variation.
Table 7. Performance comparison between [14], the proposed Quality Mode and the proposed Speed Mode under the random-access structure. For each method, the columns are BD-BR (%) / BD-PSNR (dB) / TS (EL1) (%) / TS (Total) (%).

Class | Sequence | Size | Bailleul et al. [14] | Proposed Quality Mode | Proposed Speed Mode
A | Traffic | 2560×1600 | 7.59 / 0.208 / 81.86 / 45.53 | 0.90 / 0.025 / 65.95 / 36.55 | 2.11 / 0.059 / 78.18 / 43.45
A | PeopleOnStreet | 2560×1600 | 7.61 / 0.312 / 77.64 / 42.73 | 0.67 / 0.028 / 52.85 / 28.92 | 2.02 / 0.084 / 67.86 / 37.23
A | Average | | 7.60 / 0.260 / 79.75 / 44.13 | 0.79 / 0.027 / 59.40 / 32.74 | 2.07 / 0.072 / 73.02 / 40.34
B | Kimono | 1920×1080 | 4.69 / 0.123 / 81.76 / 45.35 | 0.70 / 0.019 / 60.00 / 33.23 | 1.84 / 0.049 / 76.77 / 42.58
B | ParkScene | 1920×1080 | 6.39 / 0.188 / 80.72 / 45.11 | 0.85 / 0.026 / 62.90 / 34.99 | 2.16 / 0.065 / 75.49 / 42.03
B | Cactus | 1920×1080 | 8.49 / 0.134 / 80.81 / 45.63 | 0.70 / 0.011 / 59.67 / 33.48 | 1.86 / 0.031 / 73.24 / 41.20
B | BasketballDrive | 1920×1080 | 7.02 / 0.118 / 80.39 / 44.36 | 0.61 / 0.010 / 60.05 / 33.08 | 2.15 / 0.037 / 74.77 / 41.22
B | BQTerrace | 1920×1080 | 6.42 / 0.114 / 81.66 / 47.15 | 1.02 / 0.018 / 60.00 / 34.25 | 1.77 / 0.031 / 72.75 / 41.69
B | Average | | 6.60 / 0.135 / 81.07 / 45.52 | 0.78 / 0.017 / 60.52 / 33.81 | 1.96 / 0.043 / 74.60 / 41.74
C | BasketballDrill | 832×480 | 8.06 / 0.313 / 79.71 / 44.39 | 1.03 / 0.041 / 60.64 / 33.67 | 2.90 / 0.113 / 73.41 / 40.81
C | BQMall | 832×480 | 7.38 / 0.267 / 79.21 / 43.97 | 1.21 / 0.045 / 56.62 / 31.30 | 3.34 / 0.122 / 70.76 / 39.20
C | PartyScene | 832×480 | 5.32 / 0.256 / 79.01 / 44.71 | 0.97 / 0.047 / 58.11 / 32.77 | 2.63 / 0.127 / 70.33 / 39.74
C | RaceHorses | 832×480 | 6.26 / 0.250 / 78.44 / 44.33 | 0.85 / 0.034 / 50.30 / 28.31 | 2.56 / 0.103 / 65.68 / 37.03
C | Average | | 6.76 / 0.272 / 79.09 / 44.35 | 1.02 / 0.042 / 56.42 / 31.51 | 2.86 / 0.116 / 70.05 / 39.20
D | BasketballPass | 416×240 | 6.94 / 0.352 / 77.27 / 42.54 | 0.96 / 0.049 / 59.13 / 32.49 | 2.77 / 0.141 / 70.61 / 38.88
D | BQSquare | 416×240 | 5.07 / 0.196 / 78.86 / 44.94 | 0.66 / 0.027 / 57.35 / 32.51 | 2.18 / 0.086 / 69.99 / 39.77
D | BlowingBubbles | 416×240 | 6.26 / 0.251 / 77.54 / 43.86 | 1.08 / 0.044 / 54.84 / 30.93 | 3.26 / 0.132 / 68.53 / 38.72
D | RaceHorses | 416×240 | 8.37 / 0.433 / 75.14 / 42.07 | 1.03 / 0.054 / 49.85 / 27.97 | 3.83 / 0.199 / 63.38 / 35.55
D | Average | | 6.66 / 0.308 / 77.20 / 43.35 | 0.93 / 0.044 / 55.29 / 30.98 | 3.01 / 0.140 / 68.13 / 38.23
E | Vidyo1 | 1280×720 | 6.39 / 0.143 / 83.04 / 44.74 | 0.56 / 0.011 / 74.01 / 39.67 | 1.84 / 0.040 / 83.31 / 44.80
E | Vidyo3 | 1280×720 | 7.70 / 0.178 / 82.88 / 44.77 | 0.68 / 0.015 / 73.10 / 39.29 | 2.15 / 0.050 / 82.79 / 44.71
E | Vidyo4 | 1280×720 | 7.51 / 0.163 / 82.70 / 45.03 | 0.80 / 0.016 / 71.68 / 38.79 | 2.20 / 0.047 / 82.46 / 44.75
E | FourPeople | 1280×720 | 6.37 / 0.160 / 82.69 / 44.85 | 0.63 / 0.015 / 72.75 / 39.22 | 1.91 / 0.046 / 81.99 / 44.27
E | Johnny | 1280×720 | 6.45 / 0.105 / 83.07 / 44.95 | 1.09 / 0.018 / 73.71 / 39.64 | 2.92 / 0.047 / 83.06 / 44.79
E | KristenAndSara | 1280×720 | 5.81 / 0.132 / 82.89 / 44.79 | 0.85 / 0.019 / 72.76 / 39.07 | 2.46 / 0.057 / 82.43 / 44.42
E | Average | | 6.71 / 0.147 / 82.88 / 44.86 | 0.77 / 0.016 / 73.00 / 39.28 | 2.25 / 0.048 / 82.67 / 44.62
All | Average | | 6.77 / 0.209 / 80.35 / 44.56 | 0.85 / 0.027 / 62.20 / 34.29 | 2.42 / 0.079 / 74.66 / 41.28
The proposed algorithm defines the proportion of homogeneous area (HAP) to determine whether the prediction modes in the EL should be ignored or not. The proportion of homogeneous area (HAP) is the percentage of the area of the co-located CTU in the BL that is covered by the homogeneous modes. If the HAP of the co-located CTU in the BL is higher than γ and the CBF of the current prediction mode is equal to zero, the rest of the prediction modes are omitted. We performed numerous rigorous experiments in advance to determine γ. The larger the value of γ, the better the video quality that is maintained, but the smaller the coding time reduction. γ is set to 90% in Quality Mode and 80% in Speed Mode empirically. Fig. 9 shows the flowchart of the proposed fast mode decision algorithm.
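A minimal sketch of the skip criterion described above (a hedged reading of the procedure; function and variable names are ours). HAP is the fraction of the co-located BL CTU area covered by the homogeneous modes, and the remaining modes are bypassed when HAP exceeds γ and the CBF of the current prediction mode is zero:

```python
HOMOGENEOUS_MODES = {"Skip", "Merge 2Nx2N", "Inter 2Nx2N"}

def homogeneous_area_proportion(bl_cu_list) -> float:
    """HAP: area share of homogeneous modes in the co-located BL CTU.

    bl_cu_list is a list of (mode_name, area_in_pixels) tuples describing
    the CUs that make up the co-located 64x64 BL CTU."""
    total = sum(area for _, area in bl_cu_list)
    homog = sum(area for mode, area in bl_cu_list if mode in HOMOGENEOUS_MODES)
    return homog / total if total else 0.0

def skip_remaining_modes(bl_cu_list, current_cbf: int, gamma: float) -> bool:
    """True if the rest of the EL prediction modes can be bypassed."""
    return homogeneous_area_proportion(bl_cu_list) > gamma and current_cbf == 0

bl_ctu = [("Skip", 56 * 64), ("Inter 2NxN", 8 * 64)]   # toy co-located BL CTU, HAP = 0.875
print(skip_remaining_modes(bl_ctu, current_cbf=0, gamma=0.9))  # Quality Mode: False
print(skip_remaining_modes(bl_ctu, current_cbf=0, gamma=0.8))  # Speed Mode: True
```

The toy example also illustrates the effect of the two settings: the same CTU passes the looser Speed Mode threshold (80%) but not the stricter Quality Mode threshold (90%).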
3.4. Early termination by RD-Cost

The texture of the co-located CU in the BL can be replicated directly by ILTP. Therefore, the RD-Cost of ILTP is likely to have a large correlation with the RD-Cost of the best mode in the EL. Thus, we analyze the RD-Cost ratio between the best mode in the EL and ILTP for QP(BL,EL) set to (18,12), (22,16), (26,20), (30,24), (34,28), (38,32) and (42,36). The analysis in the EL covers the sequences of Class A (Traffic and PeopleOnStreet), Class B (Kimono1, ParkScene, Cactus and BasketballDrive), Class C (BasketballDrill, BQMall and RaceHorses), Class D (BasketballPass, BQSquare and BlowingBubbles) and Class E (Vidyo1, Vidyo3 and Vidyo4), encoded on SHM6.1 under the random-access configuration. The statistical results are shown in Fig. 10. The RD-Cost ratio between the best mode in the EL and ILTP is defined as β in Eq. (7), where $RD_{Best}$ and $RD_{ILTP}$ are the RD-Costs of the best mode in the EL and of ILTP, respectively. Furthermore, we determine the correlation between QP and β for each depth by linear regression, as shown in Eqs. (8), (9), (10) and (11); the regression lines and the R²-values in Fig. 10 validate the steady linear variation of the RD-Cost ratio β with QP.

$$\beta(QP_{EL}) = RD_{Best}/RD_{ILTP} \qquad (7)$$

$$\beta_{Depth0}(QP_{EL}) = 0.0004\,QP^2 - 0.0305\,QP + 1.1725 \qquad (8)$$

$$\beta_{Depth1}(QP_{EL}) = 0.0004\,QP^2 - 0.0293\,QP + 1.1804 \qquad (9)$$

$$\beta_{Depth2}(QP_{EL}) = -0.0157\,QP + 1.0591 \qquad (10)$$

$$\beta_{Depth3}(QP_{EL}) = -0.0161\,QP + 1.0641 \qquad (11)$$

$$estimated\ RD = \alpha \cdot \beta_{Depthx}(QP_{EL}) \cdot RD_{ILTP} \qquad (12)$$

In other words, we are able to evaluate β at each depth by Eqs. (8), (9), (10) and (11) during the encoding process. After the prediction of ILTP, we can calculate the estimated RD-Cost of the best mode in the EL by multiplying β and the RD-Cost of ILTP. According to the analysis and description above, the estimated RD-Cost of the best mode in the EL at each depth is calculated by Eq. (12), where α is a controllable parameter. As a result, we acquire the RD-Cost of ILTP after checking the ILTP mode; then, if the RD-Cost of the current best mode in the EL is smaller than or equal to the estimated RD-Cost of the best mode after the processing of the proposed fast mode decision algorithm, the CU procedure is terminated early. Fig. 11 shows the whole flowchart of the proposed fast algorithm, in which the controllable parameter α is adjustable for balancing video quality and coding time reduction. We performed numerous experiments in advance to define suitable values of α for both Quality Mode and Speed Mode. If α is set to a smaller value, early termination is triggered less often, so the encoder preserves more delicate video quality at the cost of a smaller acceleration.
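A minimal sketch of the early-termination test built from Eqs. (7)-(12). This is a hedged reading of the procedure: names are ours and the polynomial coefficients follow Eqs. (8)-(11) as reconstructed above:

```python
# Fitted RD-Cost ratio beta(QP_EL) for each depth, Eqs. (8)-(11)
BETA_MODELS = {
    0: lambda qp: 0.0004 * qp**2 - 0.0305 * qp + 1.1725,
    1: lambda qp: 0.0004 * qp**2 - 0.0293 * qp + 1.1804,
    2: lambda qp: -0.0157 * qp + 1.0591,
    3: lambda qp: -0.0161 * qp + 1.0641,
}

def estimated_best_rd(depth: int, qp_el: int, rd_iltp: float, alpha: float) -> float:
    """Estimated RD-Cost of the best EL mode, Eq. (12)."""
    return alpha * BETA_MODELS[depth](qp_el) * rd_iltp

def early_terminate(rd_current_best: float, depth: int, qp_el: int,
                    rd_iltp: float, alpha: float) -> bool:
    """Terminate the CU processing once the current best RD-Cost falls
    at or below the estimate derived from the ILTP RD-Cost."""
    return rd_current_best <= estimated_best_rd(depth, qp_el, rd_iltp, alpha)

# toy numbers: Quality Mode (alpha = 0.8), depth 2, QP_EL = 28
print(early_terminate(rd_current_best=4000.0, depth=2, qp_el=28,
                      rd_iltp=9000.0, alpha=0.8))  # True
```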
Table 8. Performance comparison between [14], the proposed Quality Mode and the proposed Speed Mode under the low-delay structure. For each method, the columns are BD-BR (%) / BD-PSNR (dB) / TS (EL1) (%) / TS (Total) (%).

Class | Sequence | Size | Bailleul et al. [14] | Proposed Quality Mode | Proposed Speed Mode
A | Traffic | 2560×1600 | 8.66 / 0.249 / 80.69 / 44.92 | 1.51 / 0.045 / 62.49 / 34.59 | 4.41 / 0.129 / 76.34 / 42.32
A | PeopleOnStreet | 2560×1600 | 6.86 / 0.304 / 76.79 / 41.81 | 0.76 / 0.034 / 54.26 / 29.50 | 2.18 / 0.098 / 68.82 / 37.22
A | Average | | 7.76 / 0.277 / 78.74 / 43.37 | 1.14 / 0.040 / 58.38 / 32.05 | 3.30 / 0.114 / 72.58 / 39.77
B | Kimono | 1920×1080 | 4.54 / 0.136 / 81.28 / 45.45 | 1.00 / 0.030 / 60.72 / 34.07 | 2.34 / 0.071 / 77.88 / 43.60
B | ParkScene | 1920×1080 | 7.77 / 0.236 / 78.83 / 44.88 | 1.51 / 0.047 / 59.01 / 33.49 | 3.72 / 0.115 / 73.01 / 41.52
B | Cactus | 1920×1080 | 7.76 / 0.145 / 80.00 / 45.38 | 1.18 / 0.023 / 57.12 / 32.21 | 3.27 / 0.061 / 72.73 / 41.10
B | BasketballDrive | 1920×1080 | 5.89 / 0.096 / 80.05 / 44.61 | 0.93 / 0.015 / 60.84 / 33.86 | 2.83 / 0.046 / 76.49 / 42.55
B | BQTerrace | 1920×1080 | 5.81 / 0.128 / 80.77 / 46.30 | 0.98 / 0.022 / 60.26 / 34.23 | 2.33 / 0.053 / 74.39 / 42.38
B | Average | | 6.35 / 0.148 / 80.19 / 45.32 | 1.12 / 0.027 / 59.59 / 33.57 | 2.90 / 0.069 / 74.90 / 42.23
C | BasketballDrill | 832×480 | 8.23 / 0.325 / 79.04 / 43.91 | 1.13 / 0.045 / 58.44 / 32.36 | 3.84 / 0.152 / 72.45 / 40.10
C | BQMall | 832×480 | 6.68 / 0.265 / 78.57 / 43.50 | 1.64 / 0.066 / 56.28 / 30.99 | 4.54 / 0.181 / 70.27 / 38.74
C | PartyScene | 832×480 | 5.29 / 0.279 / 77.59 / 43.84 | 1.24 / 0.065 / 53.49 / 30.18 | 3.32 / 0.174 / 67.22 / 37.92
C | RaceHorses | 832×480 | 5.48 / 0.251 / 77.23 / 43.26 | 0.92 / 0.043 / 51.39 / 28.68 | 2.32 / 0.107 / 66.38 / 36.91
C | Average | | 6.42 / 0.280 / 78.11 / 43.63 | 1.23 / 0.055 / 54.90 / 30.55 | 3.51 / 0.154 / 69.08 / 38.42
D | BasketballPass | 416×240 | 6.54 / 0.346 / 77.54 / 42.39 | 1.58 / 0.085 / 61.17 / 33.45 | 4.54 / 0.239 / 70.71 / 38.59
D | BQSquare | 416×240 | 5.99 / 0.261 / 77.16 / 43.56 | 1.16 / 0.052 / 52.55 / 29.59 | 3.34 / 0.147 / 66.14 / 37.25
D | BlowingBubbles | 416×240 | 6.85 / 0.300 / 75.45 / 42.64 | 1.21 / 0.054 / 48.22 / 27.31 | 3.64 / 0.160 / 63.08 / 35.66
D | RaceHorses | 416×240 | 7.63 / 0.442 / 74.09 / 41.30 | 1.31 / 0.076 / 50.39 / 28.17 | 3.92 / 0.226 / 63.31 / 35.32
D | Average | | 6.75 / 0.337 / 76.06 / 42.47 | 1.32 / 0.067 / 53.08 / 29.63 | 3.86 / 0.193 / 65.81 / 36.71
E | Vidyo1 | 1280×720 | 6.94 / 0.168 / 82.11 / 44.27 | 1.34 / 0.032 / 71.90 / 38.57 | 4.93 / 0.121 / 82.89 / 44.42
E | Vidyo3 | 1280×720 | 8.40 / 0.228 / 81.47 / 44.00 | 1.71 / 0.048 / 69.80 / 37.47 | 4.37 / 0.122 / 80.90 / 43.40
E | Vidyo4 | 1280×720 | 7.01 / 0.143 / 82.03 / 45.01 | 1.64 / 0.035 / 70.38 / 38.37 | 4.64 / 0.097 / 82.66 / 45.03
E | FourPeople | 1280×720 | 7.02 / 0.177 / 82.04 / 44.52 | 1.41 / 0.035 / 72.09 / 38.90 | 4.82 / 0.121 / 82.49 / 44.42
E | Johnny | 1280×720 | 8.19 / 0.143 / 82.15 / 44.68 | 2.50 / 0.044 / 71.64 / 38.71 | 6.52 / 0.111 / 83.09 / 44.93
E | KristenAndSara | 1280×720 | 6.68 / 0.155 / 82.10 / 44.49 | 2.26 / 0.051 / 72.05 / 38.88 | 7.58 / 0.173 / 83.35 / 44.94
E | Average | | 7.37 / 0.169 / 81.98 / 44.50 | 1.81 / 0.041 / 71.31 / 38.48 | 5.48 / 0.124 / 82.56 / 44.52
All | Average | | 6.87 / 0.227 / 79.38 / 44.03 | 1.38 / 0.045 / 60.69 / 33.50 | 3.97 / 0.129 / 74.03 / 40.87
Table 9. Performance comparison between Ref. [15], the proposed Quality Mode and the proposed Speed Mode under the low-delay structure. For each method, the EL1 and EL2 rows give BD-BR (%) / BD-PSNR (dB) / TS (%); the Total rows give the total time-saving TS (%) only.

Class | Sequence | Size | Layer | Tohidypour et al. [15] | Proposed Quality Mode | Proposed Speed Mode
A | Traffic | 2560×1600 | EL1 | 1.30 / 0.038 / 69.69 | 0.12 / 0.022 / 55.24 | 0.16 / 0.028 / 55.21
A | Traffic | 2560×1600 | EL2 | 1.92 / 0.055 / 54.97 | 0.25 / 0.030 / 45.43 | 0.26 / 0.031 / 45.28
A | Traffic | 2560×1600 | Total | 43.71 | 31.52 | 31.32
A | PeopleOnStreet | 2560×1600 | EL1 | 0.97 / 0.060 / 56.89 | 0.15 / 0.006 / 51.24 | 0.18 / 0.007 / 53.58
A | PeopleOnStreet | 2560×1600 | EL2 | 1.09 / 0.050 / 46.83 | 0.42 / 0.018 / 41.26 | 0.83 / 0.035 / 44.01
A | PeopleOnStreet | 2560×1600 | Total | 36.57 | 29.88 | 31.74
B | Kimono | 1920×1080 | EL1 | 0.77 / 0.021 / 62.75 | 0.31 / 0.009 / 59.53 | 0.42 / 0.012 / 63.53
B | Kimono | 1920×1080 | EL2 | 2.33 / 0.050 / 52.35 | 0.72 / 0.015 / 42.62 | 1.47 / 0.030 / 54.17
B | Kimono | 1920×1080 | Total | 40.61 | 32.71 | 38.52
B | BasketballDrive | 1920×1080 | EL1 | 1.35 / 0.026 / 66.64 | 0.10 / 0.002 / 57.57 | 0.22 / 0.004 / 61.50
B | BasketballDrive | 1920×1080 | EL2 | 1.52 / 0.032 / 50.94 | 0.31 / 0.007 / 36.34 | 0.62 / 0.013 / 46.82
B | BasketballDrive | 1920×1080 | Total | 41.23 | 29.58 | 35.51
B | BQTerrace | 1920×1080 | EL1 | 0.68 / 0.010 / 61.11 | 0.49 / 0.010 / 54.18 | 0.88 / 0.011 / 58.54
B | BQTerrace | 1920×1080 | EL2 | 1.53 / 0.031 / 55.76 | 0.78 / 0.010 / 41.49 | 1.00 / 0.020 / 51.52
B | BQTerrace | 1920×1080 | Total | 42.79 | 32.34 | 38.10
C | BasketballDrill | 832×480 | EL1 | 1.16 / 0.080 / 57.12 | 0.35 / 0.013 / 54.19 | 0.35 / 0.013 / 54.29
C | BasketballDrill | 832×480 | EL2 | 1.55 / 0.090 / 51.49 | 2.49 / 0.102 / 54.81 | 3.49 / 0.159 / 56.62
C | BasketballDrill | 832×480 | Total | 38.69 | 36.19 | 37.00
D | BasketballPass | 416×240 | EL1 | 0.95 / 0.068 / 52.48 | 0.03 / 0.003 / 48.87 | 0.00 / 0.003 / 48.59
D | BasketballPass | 416×240 | EL2 | 1.35 / 0.110 / 44.86 | 1.15 / 0.066 / 44.61 | 0.02 / 0.107 / 45.10
D | BasketballPass | 416×240 | Total | 34.30 | 30.11 | 30.25
D | BQSquare | 416×240 | EL1 | 0.77 / 0.090 / 56.88 | 0.24 / 0.012 / 50.19 | 0.28 / 0.012 / 55.52
D | BQSquare | 416×240 | EL2 | 1.55 / 0.120 / 49.88 | 0.95 / 0.043 / 59.89 | 2.76 / 0.122 / 64.96
D | BQSquare | 416×240 | Total | 39.03 | 37.07 | 40.74
D | BlowingBubbles | 416×240 | EL1 | 0.67 / 0.074 / 52.86 | 0.14 / 0.006 / 44.41 | 0.14 / 0.006 / 50.10
D | BlowingBubbles | 416×240 | EL2 | 1.59 / 0.137 / 48.54 | 2.99 / 0.133 / 53.31 | 4.88 / 0.214 / 58.12
D | BlowingBubbles | 416×240 | Total | 36.42 | 33.02 | 41.09
Average | | | EL1 | 0.96 / 0.052 / 59.60 | 0.21 / 0.009 / 52.82 | 0.29 / 0.011 / 55.65
Average | | | EL2 | 1.60 / 0.075 / 50.62 | 1.12 / 0.047 / 46.64 | 1.70 / 0.081 / 51.84
Average | | | Total | 39.26 | 32.49 | 36.03
Fig. 12. RD-curve comparison of the PeopleOnStreet (2560×1600) sequence for the random-access configuration: (a) RD-curve; (b) RD-curve partially enlarged. The curves compare SHM6.1, the proposed Quality Mode, the proposed Speed Mode and the reference method [14].
On the other hand, if α is set to a larger value, the encoder reduces more coding time while maintaining acceptable video quality. As a result, the appropriate values of α for the proposed Quality Mode and Speed Mode are empirically set to 0.8 and 0.9, respectively.
4. Experimental results

The proposed algorithm with Quality Mode and Speed Mode is implemented on the SHVC reference software version SHM 6.1 [29] and compared with [14] and [15]. We summarize the experimental parameter settings in Table 4 and list the specifications of the test sequences in Table 5. The testing platform of our experiments is a PC with a 64-bit Windows 7 operating system, an Intel(R) Core(TM) i5-4570 CPU and 8 GB of memory. The Bjontegaard delta bitrate (BD-BR) and Bjontegaard delta PSNR (BD-PSNR) [30,31] are used as the criteria for evaluating the efficiency of the proposed method. Time-saving is denoted as TS and calculated by Eq. (13). In order to assess how the different methods perform at the expense of coding efficiency, we adopt the evaluation approach of Ref. [32] and compute the ratio of BD-BR to TS by Eq. (14), which represents the RD efficiency loss (BD-BR increase) per unit of time-saving (TS).
$$TS = \frac{Time(reference) - Time(proposed)}{Time(reference)} \qquad (13)$$
Fig. 13. RD-curve comparison of the FourPeople (1280×720) sequence for the low-delay configuration: (a) RD-curve; (b) RD-curve partially enlarged. The curves compare SHM6.1, the proposed Quality Mode, the proposed Speed Mode and the reference method [14].
Fig. 14. Subjective quality comparison of the RaceHorses (416×240) sequence (the 103rd frame, BL QP = 38, EL QP = 32) for the random-access configuration: (a) SHM6.1, PSNR = 31.38 dB; (b) Reference [14], PSNR = 30.67 dB; (c) Quality Mode, PSNR = 31.28 dB; (d) Speed Mode, PSNR = 31.21 dB.
$$BD\text{-}BR/TS = \frac{BD\text{-}BR}{TS(EL)} \qquad (14)$$
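A small sketch of how the two evaluation measures of Eqs. (13) and (14) can be computed (function names are ours):

```python
def time_saving(time_reference: float, time_proposed: float) -> float:
    """TS of Eq. (13), expressed in percent."""
    return (time_reference - time_proposed) / time_reference * 100.0

def efficiency_cost(bd_br: float, ts_el: float) -> float:
    """BD-BR/TS of Eq. (14): RD efficiency loss per unit of EL time-saving."""
    return bd_br / ts_el

ts = time_saving(time_reference=1000.0, time_proposed=378.0)      # 62.2 %
print(round(ts, 2), round(efficiency_cost(bd_br=0.85, ts_el=ts), 3))
```

With the Quality Mode averages under the random-access configuration (0.85% BD-BR and 62.20% EL1 time-saving, Table 7), this reproduces the 0.014 BD-BR/TS ratio reported in Table 10.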
To clearly illustrate the difference between the proposed Quality Mode and Speed Mode, we tabulate the parameters of the proposed algorithm and list their numerical settings in Table 6. When Quality Mode is enabled, we aim to preserve more video quality while still providing considerable encoding time-saving. As a result, the parameter settings of Quality Mode are stricter: early termination of CU depth splitting and early determination of PU modes must pass stringent checks, imposed by the strict parameter settings, to guarantee high video quality. On the other hand, Speed Mode focuses on reducing the coding time significantly while maintaining acceptable video quality. Accordingly, the parameter settings of Speed Mode are more flexible, to maximize the effectiveness of the proposed fast algorithm and encode the videos as fast as possible.

4.1. Performance comparison

We compare our method with the previous works [14,15] in terms of BD-BR, BD-PSNR and TS. Tables 7 and 8 compare the experimental results of Ref. [14], the proposed Quality Mode and the proposed Speed Mode under the random-access and low-delay configurations, respectively. Table 9 compares the experimental results of Ref. [15], the proposed Quality Mode and the proposed Speed Mode under the low-delay configuration. In Quality Mode, we carefully constrain the parameters, such as DepthBL, the number of reference CTUs, γ and α, and focus on the video quality during the speed-up process. In Speed Mode, we adjust the parameters to emphasize the time-saving performance of the coding process. For the proposed Quality Mode under the random-access configuration, as shown in Table 7, the average time-saving of EL1 is 62.20% and the average total time-saving is 34.29%. Although the average time-saving of EL1 is 80.35% and the average total time-saving is 44.56% in Ref. [14], the BD-BR and BD-PSNR of the proposed algorithm are better than those of [14]: there is only a 0.85% BD-BR increase and a 0.027 dB BD-PSNR decrease, which greatly outperforms the 6.77% BD-BR increase and 0.209 dB BD-PSNR decrease of Ref. [14]. The average time-saving of EL1 in Class E of our proposed method reaches 73.00%, which is only slightly lower than the 82.88% of Ref. [14], while the BD-BR increases by only 0.77%, significantly lower than the 6.71% of Ref. [14].
Fig. 15. Subjective quality comparison of the BQMall (832×480) sequence (the 27th frame, BL QP = 38, EL QP = 32) for the low-delay configuration: (a) SHM6.1, PSNR = 32.49 dB; (b) Reference [14], PSNR = 32.19 dB; (c) Quality Mode, PSNR = 32.40 dB; (d) Speed Mode, PSNR = 32.25 dB.
Table 10. Efficiency cost comparison of the proposed method and Ref. [14]. For each configuration, the columns are the BD-BR/TS ratios of Bailleul et al. [14], the proposed Quality Mode and the proposed Speed Mode.

Class | Sequence | Size | Random-access: [14] / Quality / Speed | Low-delay: [14] / Quality / Speed
A | Traffic | 2560×1600 | 0.093 / 0.014 / 0.027 | 0.107 / 0.024 / 0.058
A | PeopleOnStreet | 2560×1600 | 0.098 / 0.013 / 0.030 | 0.089 / 0.014 / 0.032
A | Average | | 0.096 / 0.014 / 0.029 | 0.098 / 0.019 / 0.045
B | Kimono | 1920×1080 | 0.057 / 0.012 / 0.024 | 0.056 / 0.016 / 0.030
B | ParkScene | 1920×1080 | 0.079 / 0.014 / 0.029 | 0.099 / 0.026 / 0.051
B | Cactus | 1920×1080 | 0.105 / 0.012 / 0.025 | 0.097 / 0.021 / 0.045
B | BasketballDrive | 1920×1080 | 0.087 / 0.010 / 0.029 | 0.074 / 0.015 / 0.037
B | BQTerrace | 1920×1080 | 0.079 / 0.017 / 0.024 | 0.072 / 0.016 / 0.031
B | Average | | 0.081 / 0.013 / 0.026 | 0.080 / 0.019 / 0.039
C | BasketballDrill | 832×480 | 0.101 / 0.017 / 0.040 | 0.104 / 0.019 / 0.053
C | BQMall | 832×480 | 0.093 / 0.021 / 0.047 | 0.085 / 0.029 / 0.065
C | PartyScene | 832×480 | 0.067 / 0.017 / 0.037 | 0.068 / 0.023 / 0.049
C | RaceHorses | 832×480 | 0.080 / 0.017 / 0.039 | 0.071 / 0.018 / 0.035
C | Average | | 0.085 / 0.018 / 0.041 | 0.082 / 0.022 / 0.051
D | BasketballPass | 416×240 | 0.090 / 0.016 / 0.039 | 0.084 / 0.026 / 0.064
D | BQSquare | 416×240 | 0.064 / 0.012 / 0.031 | 0.078 / 0.022 / 0.050
D | BlowingBubbles | 416×240 | 0.081 / 0.020 / 0.048 | 0.091 / 0.025 / 0.058
D | RaceHorses | 416×240 | 0.111 / 0.021 / 0.060 | 0.103 / 0.026 / 0.062
D | Average | | 0.087 / 0.017 / 0.045 | 0.089 / 0.025 / 0.059
E | Vidyo1 | 1280×720 | 0.077 / 0.008 / 0.022 | 0.085 / 0.019 / 0.059
E | Vidyo3 | 1280×720 | 0.093 / 0.009 / 0.026 | 0.103 / 0.024 / 0.054
E | Vidyo4 | 1280×720 | 0.091 / 0.011 / 0.027 | 0.085 / 0.023 / 0.056
E | FourPeople | 1280×720 | 0.077 / 0.009 / 0.023 | 0.086 / 0.020 / 0.058
E | Johnny | 1280×720 | 0.078 / 0.015 / 0.035 | 0.100 / 0.035 / 0.078
E | KristenAndSara | 1280×720 | 0.070 / 0.012 / 0.030 | 0.081 / 0.031 / 0.091
E | Average | | 0.081 / 0.011 / 0.027 | 0.090 / 0.025 / 0.066
All | Average | | 0.084 / 0.014 / 0.033 | 0.087 / 0.022 / 0.053
Because the Class C and Class D sequences have low resolution and their CUs tend to be split into smaller sizes rather than being terminated early at larger sizes, the time-savings of Class C and Class D are limited. In addition, the Class E sequences contain relatively static content, so the PUs are likely to choose Skip mode as the best prediction mode and the CUs are more likely to be terminated early at larger sizes, which provides much more time-saving for EL1. For the proposed Speed Mode under the random-access configuration, also shown in Table 7, the average total time-savings of Class A and Class B of our method are 40.34% and 41.74%, which are similar to the 44.13% and 45.52% of Ref. [14]. However, the average BD-BR values of Class A and Class B of our method are only 2.07% and 1.96%, which greatly outperform the 7.60% and 6.60% of Ref. [14]. In the middle- and low-resolution sequences, such as Class C, Class D and Class E, the CUs tend to split into small sizes. For the Vidyo1 sequence, the time-saving for EL1 of our method is 83.31%, which is better than the 83.04% of Ref. [14]; also, the 1.84% BD-BR increase of our method is greatly superior to the 6.39% of Ref. [14].
Table 11. Performance evaluation of each individual part of the proposed Quality Mode under the random-access structure. For each part, the columns are BD-BR (%) / TS (EL) (%).

Class | Sequence | Fast depth prediction | Fast mode decision | RD early termination | Adaptive search range
A | Traffic | 0.62 / 45.89 | 0.22 / 39.87 | 0.03 / 26.38 | 0.01 / 6.41
A | PeopleOnStreet | 0.54 / 36.31 | 0.21 / 13.27 | 0.03 / 7.46 | 0.05 / 14.04
A | Average | 0.58 / 41.10 | 0.22 / 26.57 | 0.03 / 16.92 | 0.03 / 10.23
B | Kimono1 | 0.34 / 43.91 | 0.11 / 22.49 | 0.00 / 9.39 | 0.00 / 17.40
B | ParkScene | 0.40 / 42.30 | 0.26 / 35.70 | 0.12 / 22.38 | 0.11 / 7.59
B | Cactus | 0.45 / 42.78 | 0.20 / 29.29 | 0.02 / 8.38 | 0.04 / 10.93
B | BasketballDrive | 0.40 / 42.05 | 0.18 / 29.35 | 0.03 / 8.88 | 0.14 / 18.87
B | BQTerrace | 0.42 / 45.35 | 0.24 / 34.28 | 0.00 / 13.52 | 0.15 / 8.36
B | Average | 0.40 / 43.28 | 0.20 / 30.22 | 0.03 / 12.51 | 0.09 / 12.63
C | BasketballDrill | 0.34 / 37.75 | 0.16 / 22.74 | 0.33 / 26.82 | 0.06 / 12.54
C | BQMall | 0.63 / 38.62 | 0.16 / 24.47 | 0.11 / 23.19 | 0.15 / 8.28
C | PartyScene | 0.41 / 37.41 | 0.07 / 17.34 | 0.31 / 22.15 | 0.00 / 8.84
C | RaceHorses | 0.42 / 31.72 | 0.15 / 8.59 | 0.06 / 6.37 | 0.13 / 17.80
C | Average | 0.45 / 36.38 | 0.14 / 18.29 | 0.20 / 19.63 | 0.09 / 11.87
D | BasketballPass | 0.36 / 36.60 | 0.03 / 23.85 | 0.28 / 38.95 | 0.06 / 6.80
D | BQSquare | 0.34 / 36.53 | 0.03 / 25.04 | 0.07 / 35.31 | 0.02 / 4.16
D | BlowingBubbles | 0.54 / 31.72 | 0.15 / 15.97 | 0.38 / 17.90 | 0.02 / 7.31
D | RaceHorses | 0.42 / 31.24 | 0.20 / 3.78 | 0.21 / 8.97 | 0.17 / 14.57
D | Average | 0.42 / 34.02 | 0.10 / 17.16 | 0.24 / 25.28 | 0.07 / 8.21
E | Vidyo1 | 0.43 / 50.33 | 0.15 / 49.26 | 0.03 / 34.36 | 0.13 / 6.43
E | Vidyo3 | 0.54 / 48.79 | 0.30 / 45.18 | 0.00 / 32.43 | 0.02 / 6.75
E | Vidyo4 | 0.75 / 50.27 | 0.25 / 47.13 | 0.17 / 31.54 | 0.32 / 7.94
E | FourPeople | 0.44 / 51.75 | 0.02 / 49.72 | 0.11 / 37.19 | 0.06 / 5.77
E | Johnny | 0.45 / 52.30 | 0.48 / 50.84 | 0.28 / 29.70 | 0.33 / 6.26
E | KristenAndSara | 0.74 / 50.53 | 0.30 / 49.86 | 0.26 / 33.21 | 0.06 / 6.72
E | Average | 0.56 / 50.66 | 0.25 / 48.67 | 0.14 / 33.07 | 0.15 / 6.65
All | Total average | 0.48 / 42.10 | 0.18 / 30.38 | 0.13 / 22.59 | 0.10 / 9.70
Even though there is still room for our method to improve the BD-BR in these classes, the proposed approach still retains a low BD-BR increase, whereas Ref. [14] has a much higher BD-BR. In the high-resolution sequences, such as Class A, Class B and Class E, more CUs can be terminated early at large CU sizes, and the proposed method is able to accelerate these CUs while maintaining good BD-BR and BD-PSNR performance. For the proposed Quality Mode under the low-delay configuration, as shown in Table 8, although the average time-saving of our method is again slightly lower than that of Ref. [14], the average BD-BR increase and BD-PSNR decrease are only around one-fifth of those of Ref. [14]. Table 8 also compares the experimental results of the proposed Speed Mode with [14] under the low-delay configuration. In terms of the average results of Speed Mode in Table 8, the average BD-BR is 3.97%. Although the results are inferior to those under the random-access configuration, the BD-BR performance is still better than the 6.87% of Ref. [14] with similar time-saving. We also compare the experimental results of the proposed Quality Mode with Ref. [15] under the low-delay configuration in Table 9. For the high-resolution sequences, such as Class A and Class B, our method keeps the BD-BR increase below a maximum of 0.78% (EL2 of the BQTerrace sequence), which is superior to the highly variable BD-BR performance of Ref. [15]. In terms of the average results, the total time-saving of our proposed algorithm is 32.49%, which is slightly lower than the 39.26% of Ref. [15]; however, our method achieves better BD-BR and BD-PSNR for both EL1 and EL2. Table 9 also compares the experimental results of the proposed Speed Mode and Ref. [15] under the low-delay configuration. Although the BD-BR of EL2 is slightly higher than that of Ref. [15], our method achieves a lower EL1 BD-BR than Ref. [15] while retaining similar time-saving.
4.2. RD curve and subjective comparison

We evaluate the proposed method not only in terms of coding time and quality but also in terms of RD-curves and subjective quality. Figs. 12 and 13 show the RD-curve comparisons under the random-access and low-delay configurations, respectively. It can be observed that the RD-curve of SHM 6.1 is followed, in order, by those of the proposed Quality Mode, the proposed Speed Mode and the reference method [14]. In Fig. 12, the RD-curves of the proposed Quality Mode and Speed Mode almost overlap the RD-curve of SHM 6.1. Likewise, in Fig. 13, the RD-curve of the proposed Quality Mode is closest to the RD-curve of SHM 6.1, followed by that of the proposed Speed Mode. Under both the random-access and low-delay configurations, the RD-curve of the reference method [14] is the farthest from the original RD-curve of SHM 6.1. We highlight the coding errors with red circles in Fig. 14 to compare the subjective quality under the random-access configuration; our proposed method has better image quality and generates quality as natural as that of SHM6.1. In Fig. 15, our method generates more natural image quality than the reference method [14] under the low-delay configuration. In summary, the proposed Quality Mode and Speed Mode can both provide significant time-saving under different demands on video quality or coding acceleration, and they also achieve better RD performance than Ref. [14].
4.3. Efficiency cost

Table 10 compares the BD-BR/TS ratios of the proposed method and Ref. [14] under different coding configurations according to Eq. (14). The BD-BR/TS ratio is evaluated by dividing the BD-BR by the time-saving; in other words, the lower the BD-BR/TS ratio, the better the algorithm performs.
For the random-access configuration, the average BD-BR/TS ratios of the proposed Quality Mode and Speed Mode are 0.014 and 0.033, respectively, while that of Ref. [14] is 0.084. For the low-delay configuration, the total average BD-BR/TS ratios of the proposed Quality Mode and Speed Mode are 0.022 and 0.053, respectively, while that of Ref. [14] is 0.087. From the discussions in the previous paragraphs, we can see that the Quality Mode of the proposed algorithm accelerates the coding with excellent BD-BR and BD-PSNR at only about one-sixth of the BD-BR/TS ratio of Ref. [14]. The Speed Mode of the proposed algorithm provides time-saving similar to that of Ref. [14]; moreover, our method preserves the BD-BR and BD-PSNR with a low BD-BR/TS ratio compared to Ref. [14]. The outstanding BD-BR/TS ratio of our proposed method demonstrates its robustness and its ability to satisfy either a quality or a speed requirement.

4.4. Performance evaluation of individual parts of the proposed algorithm

Table 11 shows the performance evaluation of each individual part of the proposed Quality Mode under the random-access structure. The fast depth prediction part of the proposed method reduces the EL coding time by about 42.10% in Quality Mode, which contributes the most among the four parts of our method because the recursive quad-tree partitioning is the most time-consuming step of the coding process. The time-saving contributions of the proposed schemes are, in order, fast depth prediction, fast mode decision, RD early termination and adaptive search range. The statistical data also verify that each individual part of our method contributes, and that their combination provides solid performance with negligible BD-BR degradation.

5. Conclusion

In this paper, we present a computation-efficient algorithm for quality scalability of SHVC. The proposed algorithm utilizes the information of both the BL and the EL in the encoding process. Moreover, we provide two modes, Quality Mode and Speed Mode, from which users can choose to meet their needs. For the case with one EL under the random-access configuration, our experimental results show that the proposed algorithm can achieve 62.20% EL1 time-saving with a 0.85% BD-BR increase in Quality Mode, and 74.66% EL1 time-saving with a 2.42% BD-BR increase in Speed Mode. For the case with two ELs under the low-delay configuration, our experimental results show that the proposed algorithm can achieve 52.82% EL1 time-saving and 46.64% EL2 time-saving with 0.21% and 1.12% BD-BR increases in Quality Mode, and 55.65% EL1 time-saving and 51.84% EL2 time-saving with 0.30% and 1.91% BD-BR increases in Speed Mode. Our proposed algorithm provides better performance than previous works.

Acknowledgement

This work was supported by the Ministry of Science and Technology, Taiwan, under grants MOST 102-2221-E-259-022-MY3, MOST 105-2221-E-259-016-MY3 and MOST 106-2221-E-110-083-MY2.

References

[1] T. Wiegand, J.R. Ohm, G.J. Sullivan, W.J. Han, R. Joshi, T.K. Tan, K. Ugur, Special section on the joint call for proposals on High Efficiency Video Coding (HEVC) standardization, IEEE Trans. Circuits Syst. Video Technol. 20 (12) (2010) 1661-1666.
[2] G.J. Sullivan, J.R. Ohm, W.J. Han, T. Wiegand, Overview of the High Efficiency Video Coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1649-1668.
[3] T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC Video Coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 560-576.
[4] H. Schwarz, D. Marpe, T. Wiegand, Overview of the Scalable Video Coding Extension of the H.264/AVC Standard, IEEE Trans. Circuits Syst. Video Technol. 17 (9) (2007) 1103-1120.
[5] Y. Ye, P. Andrivon, The scalable extensions of HEVC for ultra-high-definition video delivery, IEEE Trans. Multimedia 21 (3) (2014) 58-64.
[6] J.M. Boyce, Y. Ye, J. Chen, A.K. Ramasubramonian, Overview of SHVC: scalable extensions of the high efficiency video coding standard, IEEE Trans. Circuits Syst. Video Technol. 26 (1) (2016) 20-34.
[7] C. Chen, J. Boyce, Y. Ye, M.M. Hannuksela, Scalable HEVC (SHVC) Test Model 6 (SHM 6), JCTVC-Q1007, Valencia, ES, April 2014.
[8] Z. Liu, T.L. Lin, C.C. Chou, Efficient prediction of CU depth and PU mode for fast HEVC encoding using statistical analysis, J. Vis. Commun. Image Represent. 38 (2016) 474-486.
[9] Z.Y. Chen, P.C. Chang, Rough mode cost based fast intra coding for high efficiency video coding, J. Vis. Commun. Image Represent. 43 (2017) 77-88.
[10] H.R. Tohidypour, M.T. Pourazad, P. Nasiopoulos, Adaptive search range method for spatial scalable HEVC, in: Proceedings of IEEE International Conference on Consumer Electronics (ICCE), January 2014.
[11] Q. Ge, D. Hu, Fast encoding method using CU depth for quality scalable HEVC, in: Proceedings of IEEE Workshop on Advanced Research and Technology in Industry Applications (WARTIA), September 2014.
[12] N. Shi, R. Ma, P. Li, P. An, Q. Zhang, Efficient mode decision algorithm for scalable high efficiency video coding, in: Proceedings of SPIE Optoelectronic Imaging and Multimedia Technology III, vol. 9723, no. 37, November 2014.
[13] H.R. Tohidypour, H. Bashashati, M.T. Pourazad, P. Nasiopoulos, Fast mode assignment for quality scalable extension of the High Efficiency Video Coding (HEVC) standard: a Bayesian approach, in: Proceedings of the 6th Balkan Conference in Informatics, September 2013.
[14] R. Bailleul, J.D. Cock, R.V. Walle, Fast mode decision for SNR scalability in SHVC, in: Proceedings of 2014 IEEE International Conference on Consumer Electronics (ICCE), January 2014.
[15] H.R. Tohidypour, M.T. Pourazad, P. Nasiopoulos, Probabilistic approach for predicting the size of coding units in the quad-tree structure of the quality and spatial scalable HEVC, IEEE Trans. Multimedia 18 (2) (2016) 182-195.
[16] X. Zuo, L. Yu, Fast mode decision method for all intra spatial scalability in SHVC, in: Proceedings of 2014 IEEE Visual Communications and Image Processing Conference, December 2014.
[17] T. Katayama, W. Shi, T. Song, T. Shimamoto, Low-complexity intra coding algorithm in enhancement layer for SHVC, in: Proceedings of 2016 IEEE International Conference on Consumer Electronics (ICCE), January 2016.
[18] H.R. Tohidypour, M.T. Pourazad, P. Nasiopoulos, An encoder complexity reduction scheme for quality/fidelity scalable HEVC, IEEE Trans. Broadcast. 62 (3) (2016) 666-674.
[19] H.R. Tohidypour, H. Bashashati, M.T. Pourazad, P. Nasiopoulos, Online-learning-based mode prediction method for quality scalable extension of the High Efficiency Video Coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. 27 (10) (2017) 2204-2215.
[20] A. Heindel, E. Wige, A. Kaup, Low-complexity enhancement layer compression for scalable lossless video coding based on HEVC, IEEE Trans. Circuits Syst. Video Technol. 27 (8) (2018) 1749-1760.
[21] H.R. Tohidypour, M.T. Pourazad, P. Nasiopoulos, Content adaptive complexity reduction scheme for quality/fidelity scalable HEVC, in: Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), May 2013.
[22] T. Katayama, W. Shi, T. Song, T. Shimamoto, Early depth determination algorithm for enhancement layer intra coding of SHVC, in: Proceedings of 2016 IEEE Region 10 Conference (TENCON), November 2016.
[23] P. Yin, X. Xiu, Y. Ye, Inter-Layer Reference Picture Placement, JCTVC-L0174, Geneva, CH, January 2013.
[24] J.L. Lin, Y.W. Chen, Y.W. Huang, S.M. Lei, Motion vector coding in the HEVC standard, IEEE J. Sel. Top. Signal Process. 7 (6) (2013) 957-968.
[25] P.C. Wang, G.L. Li, S.F. Huang, M.J. Chen, S.C. Lin, Efficient mode decision algorithm based on spatial, temporal, and inter-layer rate-distortion correlation coefficients for scalable video coding, ETRI J. 32 (4) (2010) 577-587.
[26] P. Helle, S. Oudin, B. Bross, D. Marpe, M.O. Bici, K. Ugur, J. Jung, G. Clare, T. Wiegand, Block merging for quadtree-based partitioning in HEVC, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1720-1731.
[27] R.H. Gweon, Y.L. Lee, J.Y. Lim, Early Termination of CU Encoding to Reduce HEVC Complexity, JCTVC-F045, Torino, IT, July 2011.
[28] H.M. Yoo, J.W. Suh, Fast coding unit decision based on skipping of inter and intra prediction units, Electron. Lett. 50 (10) (2014) 750-752.
[29] SHVC Reference Software Version SHM 6.1, available online at https://hevc.hhi.fraunhofer.de/svn/svn_SHVCSoftware/tags/SHM-6.1.
[30] G. Bjontegaard, Calculation of Average PSNR Differences between RD Curves, ITU-T SG16/Q6 Document VCEG-M33, Austin, April 2001.
[31] G. Bjontegaard, Improvements of the BD-PSNR Model, ITU-T SG16/Q6 Document VCEG-AI11, Berlin, July 2008.
[32] G. Correa, P.A. Assuncao, L.V. Agostini, L.A.D.S. Cruz, Fast HEVC encoding decisions using data mining, IEEE Trans. Circuits Syst. Video Technol. 25 (4) (2015) 660-673.