Signal Processing: Image Communication 78 (2019) 171–179
Multiple classifier-based fast coding unit partition for intra coding in future video coding✩

Zongju Peng∗, Chao Huang, Fen Chen, Gangyi Jiang, Xin Cui, Mei Yu
Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
Keywords: Quad-tree plus binary tree; Multiple classifier; Support vector machine; Block partitioning; Intra coding; Future video coding
ABSTRACT

Future Video Coding (FVC) significantly improves compression efficiency over the preceding High Efficiency Video Coding (HEVC) standard, but at the cost of greatly increased computational complexity. The flexible quad-tree plus binary tree (QTBT) block partitioning structure in FVC is largely responsible for this complexity. To address the issue, we propose a multiple classifier-based fast QTBT partitioning algorithm for FVC intra coding. The proposed algorithm contains three stages: a horizontal binary-tree decision model (HBTDM), a vertical binary-tree decision model (VBTDM), and a quad-tree decision model (QTDM). The computational complexity of FVC intra coding can thus be drastically reduced by replacing the brute-force search with the HBTDM, VBTDM, and QTDM to decide the optimal QTBT partitioning. To achieve high prediction accuracy, descriptive features of the intra QTBT partitioning decision are extracted to train the corresponding classifiers, and an efficient optimal parameter selection method for training the classifiers is introduced to balance computational complexity and rate distortion (RD) performance. Experimental results show that, compared with the original FVC reference software, the proposed overall algorithm reduces intra coding time by 64.54% with negligible degradation of RD performance. Meanwhile, the proposed algorithm outperforms four state-of-the-art algorithms in terms of computational complexity reduction.
1. Introduction

With the ever-increasing demand for High Definition and Ultra-High Definition video services, the amount of video data is growing enormously, which requires more transmission bandwidth. To relieve the burden of video transmission, a new video coding technology superior to the High Efficiency Video Coding (HEVC) standard needs to be developed [1–4]. Therefore, the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) jointly established the Joint Video Exploration Team (JVET), which aims to develop the Future Video Coding (FVC) standard [5]. The JVET developed the Joint Exploration Test Model (JEM) as the common test platform for FVC. Compared with HEVC, the latest FVC reference software achieves around 28.5% bitrate reduction while maintaining the same video quality [6–8]. FVC improves coding efficiency by employing a series of advanced technologies, such as the multi-level recursive quad-tree plus binary tree (QTBT) block partitioning structure and 67 intra modes [9,10]. The flexible QTBT block partitioning structure significantly improves coding efficiency, but it also greatly increases the computational complexity: the complexity of FVC is 12 times that of HEVC [6]. This huge computational complexity severely limits the practical application of FVC, especially for real-time applications and power-constrained devices. Therefore, it is necessary to address the computational complexity of the QTBT block partitioning process in FVC.

In previous work, many fast block partitioning algorithms have been proposed for the quad-tree (QT) structure in HEVC [11–30]. In general, these algorithms can be divided into two categories: statistical analysis-based [11–19] and machine learning-based algorithms [20–30]. Algorithms of the first category are developed from statistical information of the Coding Unit (CU); they early skip (ES) unnecessary partitioning modes or early terminate (ET) the QT partition process. Chen et al. proposed a fast inter coding algorithm based on motion and texture features, in which the statistical properties of CUs and temporal correlation are utilized to predict the optimal QT depth range and inter partitioning mode [11]. Li et al. proposed a prediction mode and QT depth decision scheme where the Rate Distortion (RD) costs of the Skip/Merge inter mode and the parent CU mode are utilized to speed up the QT partitioning process [12].
✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.image.2019.06.014.
∗ Corresponding author. E-mail address: [email protected] (Z. Peng).
https://doi.org/10.1016/j.image.2019.06.014
Received 8 December 2018; Received in revised form 12 June 2019; Accepted 28 June 2019; Available online 2 July 2019
0923-5965/© 2019 Published by Elsevier B.V.
Li et al. predicted the optimal QT depth range by CU temporal correlation, and unnecessary modes were then effectively skipped [13]. Tai et al. proposed a fast QT partition method based on the RD costs of special modes: the RD optimization (RDO) of the current depth level is early skipped, and the RD costs of the Skip/Merge modes are utilized to early terminate the QT partition process [14]. Tan et al. proposed a QT depth decision method based on statistical analysis of residual map information [15]. By analyzing the depth information of encoded CUs, Gao et al. established a QT depth probability model to guide the current CU partition [16]. Zupancic et al. presented a novel QT depth decision algorithm in which the CU is adaptively checked in reverse, from bottom to top, and the coding information of high depth levels is used to skip the RDO process of low depth levels [17]. Mallikarachchi et al. proposed a content-adaptive fast inter QT depth decision algorithm for HEVC, in which the RDO process is replaced by two CU partitioning likelihood models [18]. In [19], Zhang et al. proposed a fast CU partition method for intra coding in HEVC based on statistical analysis, in which the spatial CU depth correlation is utilized to predict the optimal QT depth range and a simple RDO model early terminates the CU partitioning process.

The second category of fast QT partition algorithms is based on machine learning: the CU decision in the block partitioning structure can be transformed into a typical classification problem. Many learning methods, such as the Support Vector Machine (SVM) [20–23], Bayesian approaches [24–26], Decision Trees [27], and Convolutional Neural Networks (CNN) [28,29], have been utilized to reduce the coding time of CU QT partitioning. Zhu et al. proposed an SVM-based fast CU decision method to early terminate the QT partition process and skip unnecessary modes [20]. Liu et al. established an SVM-based fast CU QT depth decision model by jointly utilizing the texture and direction features of the current CU [21]. Zhang et al. proposed a fast intra QT depth decision algorithm based on an offline ES model and an online ET model [22]. Zhang et al. utilized a texture gradient feature to reduce the number of intra prediction modes and established an SVM-based CU QT depth early decision model [23]. Kim and Park converted the CU QT partition process into a binary classification problem and built a QT early termination model based on a Bayesian algorithm [24]. Zhang et al. proposed a fast inter CU decision algorithm for the QT structure based on the Bayesian rule and conditional random fields [25]. Goswami et al. proposed a fast HEVC coding algorithm based on the Bayesian rule, in which a Markov Chain Monte Carlo model computes the prior and conditional probabilities [26]. Correa et al. used data mining tools to generate decision tree models that predict the CU QT depth [27]. In [28], a CNN-based fast CU QT depth decision method was proposed to speed up intra coding. Liu et al. implemented a hardwired intra encoder in which a CNN is used for the CU QT depth decision [29].

The above fast block partitioning methods for the QT structure in HEVC cannot be directly used to address the high computational complexity of FVC, due to the new QTBT block partition structure and the extension to 67 intra modes. To further reduce the computational complexity of FVC, the unique characteristics of QTBT must be considered. There have been a few initial attempts on this problem [31–35]. Wang et al. proposed an inter QTBT decision algorithm based on statistical analysis of motion and texture features, where a confidence interval-based QTBT block partitioning model replaces the CU recursive partition process [31]. In [32], Lin et al. utilized the RD costs of parent and sub-CUs to skip the QTBT partitioning process of the second child sub-CU of a binary tree (BT) partitioning. Huang et al. proposed a fast QTBT decision method in which the block partitioning process is early terminated when the sub-CUs satisfy specific constraints [33]. In [34], Jin et al. proposed a fast intra QTBT decision algorithm in which a CNN predicts the CU depth range. In [35], Wang et al. proposed a fast intra QTBT partition decision algorithm based on the HEVC reference software HM13.0, in which a decision tree model early terminates the QTBT partitioning process; however, this algorithm does not work well with JEM due to the temporal sampling coding mechanism of JEM.

In general, the aforementioned algorithms for intra coding in FVC save limited encoding time and cannot achieve a good trade-off between encoding complexity and RD performance. The reasons are: (1) with the increased number of QTBT partitioning modes in FVC, it is difficult for conventional methods to achieve a good trade-off between computational complexity and RD performance; (2) the QTBT structure has not been fully optimized in the above fast intra coding algorithms; and (3) the training parameters of learning-based fast QTBT partition algorithms have not been optimized.

In this paper, a multiple classifier-based fast QTBT partition algorithm for intra coding in FVC is proposed. The main contributions of this paper are summarized as follows:
(1) The recursive QTBT search is replaced by three multiple classifier-based QTBT partitioning decision models: the horizontal binary-tree decision model (HBTDM), the vertical binary-tree decision model (VBTDM), and the quad-tree decision model (QTDM).
(2) Descriptive features of the intra QTBT partition decision are extracted to train the corresponding classifiers.
(3) An efficient optimal parameter selection method for training the classifiers is introduced to balance computational complexity and RD performance.

The rest of this paper is organized as follows. Section 2 introduces the motivation and the analysis of the QTBT block partitioning structure. Section 3 describes the proposed multiple classifier-based fast QTBT partition for FVC intra coding. Section 4 gives the experimental results and discussions. Finally, we conclude this paper in Section 5.

2. Motivation and statistical analyses

2.1. QTBT block partitioning structure

Different from the QT block partitioning structure in HEVC, FVC adopts the recursive QTBT structure to improve the coding performance.

Fig. 1. QTBT block partitioning structure in FVC.
In the main profile of FVC, each video frame is divided into 128 × 128 coding blocks, called Coding Tree Units (CTUs). Fig. 1 shows the QTBT block partitioning structure in FVC. A CTU is first partitioned into multiple square coding blocks according to the QT partitioning rule. The QT nodes range from 128 × 128 to 8 × 8, and the Quad-Tree Depth (QTD) ranges from 0 to 4. In addition, the leaf nodes of the QT can be further partitioned by a binary tree (BT) structure. Therefore, the leaf node of the QT is the root node of the BT, and its Binary Tree Depth (BTD) is 0. Two types of block partitioning, symmetric horizontal partitioning and symmetric vertical partitioning, are used in a BT partitioning structure. In FVC intra coding, the maximal size of the root node of a BT is 32 × 32 (QTD = 2), and the range of BTD is from 0 to 3.

The optimal partitioning and prediction mode for the QTBT structure are determined by the RDO process. In FVC intra coding, the RDO process can be described as

{s*, p*} = arg min_{s∈S} (arg min_{p∈P} J(s, p))   (1)
S = {s_NonSplit, s_HorBT, s_VerBT, s_QuadT}
P = {p_Intra1, p_Intra2, …, p_Intra67}

where {s*, p*} is the combination of the optimal partitioning and prediction mode, S is the set of the four partitioning modes in FVC, P is the set of 67 candidate intra prediction modes, and J(s, p) is the RD cost function under the combination {s, p} [7]. Fig. 2 shows the process of optimal QTBT partitioning selection in FVC.

Fig. 2. Flowchart of the optimal QTBT partitioning selection in FVC.

Actually, the main novelty of QTBT lies in the adoption of the binary tree (BT) structure. Specifically, the BT structure in QTBT allows recursive binary partitioning, so that CUs can be flexibly partitioned into diverse shapes to adapt to the video content with fine granularity. Compared with the QT structure in HEVC, the QTBT structure also needs to calculate the additional RD costs of the BT structure. The QTBT is a more efficient block partitioning structure than the QT structure, but its computational complexity increases significantly due to the introduction of the BT structure.

2.2. Statistical analyses of QTBT structure

To facilitate the analysis of the distribution of partitioning modes, we define the partitioning rate (PR) of each partitioning mode as

PR(s) = Num_P(s) / Num_OP(s),  s ∈ {S_HorBT, S_VerBT, S_QuadT}   (2)

where s is the partitioning mode, Num_P(s) is the number of CUs whose optimal partitioning mode is s, Num_OP(s) is the number of CUs which satisfy the condition of partitioning mode s, and S_HorBT, S_VerBT, and S_QuadT are the HBT, VBT, and QT partitioning modes, respectively. The non-partition rate (NPR) of s is calculated as NPR(s) = 1 − PR(s).

We statistically analyzed S_HorBT, S_VerBT, and S_QuadT for five video sequences with different contents and resolutions under four QPs (22, 27, 32, and 37) on the JEM-7.0 test platform, using the all-intra encoding configuration. Table 1 shows the distribution of each partitioning mode.

Table 1
PR of each partitioning mode [%]. Seq1, Seq2, Seq3, Seq4, and Seq5 are RaceHorses, BasketballDrill, Johnny, ParkScene, and TrafficFlow, respectively.

Mode      QP    Seq1    Seq2    Seq3    Seq4    Seq5    Average
S_HorBT   22    31.17   35.1    13.2    35.7    19.9    27.0
          27    29.32   26.1    12.0    28.0    13.5    21.8
          32    25.50   19.2    10.5    21.8    10.3    17.4
          37    20.26   14.2    7.52    16.2    7.09    13.0
          All                                           20.37
S_VerBT   22    43.06   37.1    30.9    25.6    20.3    31.4
          27    37.75   27.2    24.2    22.6    14.5    25.2
          32    32.55   19.5    18.8    18.8    10.7    20.1
          37    25.78   14.3    14.0    14.3    7.38    15.1
          All                                           23.63
S_QuadT   22    67.25   76.1    38.6    49.5    43.7    55.0
          27    60.14   67.7    35.2    46.4    42.1    50.3
          32    54.95   47.7    36.2    42.2    39.5    44.1
          37    42.27   42.3    32.2    37.2    38.6    38.5
          All                                           47.31

We have three important observations.

(1) The PR is related to the QP. When the QP is 22, the PRs of S_HorBT, S_VerBT, and S_QuadT are 27.04%, 31.41%, and 55.05%, respectively; when the QP is 37, they are 13.07%, 15.19%, and 38.59%, respectively. The PR of each partitioning mode decreases significantly as the QP increases.

(2) The content of the test sequence is also an important factor affecting the partitioning mode decision. For sequences which contain many complex texture regions, such as "RaceHorses" and "ParkScene", the PR is significantly larger. However, for sequences with many homogeneous regions, such as "Johnny" and "TrafficFlow", CUs tend to choose the non-partition mode.

(3) The PR of each partitioning mode is less than 50%, and the average PRs of S_HorBT, S_VerBT, and S_QuadT are 20.37%, 23.63%, and 47.31%, respectively. In other words, the theoretical complexity redundancies of S_HorBT, S_VerBT, and S_QuadT are 79.63%, 76.37%, and 52.69%, respectively.

If the non-partition CUs can be accurately predicted, the RDO process of these CUs can be early terminated, which greatly speeds up the QTBT partitioning process without any RD performance degradation.

3. Multiple classifier-based fast QTBT partition for FVC intra coding

3.1. Framework of the proposed algorithm

As the number of partitioning modes in the QTBT structure increases, the correlation of the partitioning mode between CUs becomes weaker, and it is difficult for conventional methods to balance RD performance and encoding complexity. Therefore, we propose a multiple classifier-based fast QTBT partitioning algorithm for FVC intra coding, which includes three stages of classification. Three SVM-based block partitioning decision models, HBTDM, VBTDM, and QTDM, are trained to replace the RDO processes of the HBT, VBT, and QT partitioning modes, respectively. Fig. 3 shows the flowchart of the proposed algorithm, and the detailed steps are as follows:

Step 1: Check all intra prediction modes without any partitioning, store the RD cost of the optimal mode as J_NonSplit, and go to Step 2.
Step 2: Collect the features for the three CU classification decision models, then go to Step 3.
Step 3: If the output of the HBTDM is RDO, calculate the sum of the RD costs of the two symmetric horizontal sub-CUs and store it as J_HorBT; otherwise, skip this mode. Then go to Step 4.
Step 4: If the output of the VBTDM is RDO, calculate the sum of the RD costs of the two symmetric vertical sub-CUs and store it as J_VerBT; otherwise, skip this mode. Then go to Step 5.
Step 5: If the output of the QTDM is RDO, calculate the RD cost of the QT partitioning mode of the current CU as the sum of the RD costs of the four sub-CUs and store it as J_QuadT; otherwise, skip this mode. Then go to Step 6.
Step 6: Compare J_NonSplit, J_HorBT, J_VerBT, and J_QuadT, and select the partitioning mode with the minimal RD cost as the optimal QTBT partitioning mode.

Fig. 3. Flowchart of the proposed algorithm.

3.2. Feature extraction for classifier

Features play an important role in the prediction accuracy (PA) of the classifiers, because they distinguish the different categories. The computational complexity of feature extraction and prediction was taken into consideration when selecting features for the different classifiers. Inspired by the analysis of QTBT in Section 2, the selected features are listed in Table 2, where the mark "√" represents a selected feature. The first category of features is related to the video content, e.g., texture complexity, direction complexity, and texture divergence. The second category concerns the coding information of the current CU.

Texture complexity and direction complexity are widely used to measure the complexity of video content [21,22]. Therefore, texture complexity (denoted as x_TC) and direction complexity (denoted as x_DC) are selected as features for training the classifiers. The texture complexity is defined as

TC(Ω) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} |p(i, j) − p̄(i, j)|   (3)

where Ω is the pixel block of the current CU, W and H are the width and height of the current CU, p(i, j) is the luminance component of the pixel at (i, j), and p̄(i, j) is the mean luminance component of the eight adjacent pixels of the pixel at (i, j). The direction complexity is expressed as

DC = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} (|G_Hor(i, j)| + |G_Ver(i, j)| + |G_45(i, j)| + |G_135(i, j)|)   (4)

where G_n(i, j) is the Sobel gradient, calculated by

G_n(i, j) = S_n ∗ F,  n ∈ {Hor, Ver, 45°, 135°}   (5)

where S_n represents the four angular Sobel operators for the pixel at (i, j), and F is the 3 × 3 pixel matrix centered at (i, j).

Besides the texture and direction complexity, three types of texture divergence are selected as features: horizontal texture divergence, vertical texture divergence, and quad-tree texture divergence (denoted as x_Horizon, x_Vertical, and x_Quarter, respectively). The texture divergence is obtained by

TD_s = (1 / N) Σ_{n=1}^{N} (TC_s^n − T̄C_s)²,  s ∈ {S_HorBT, S_VerBT, S_QuadT}   (6)

where TD_s is the texture divergence of the sub-CUs with respect to partitioning mode s, N is the number of sub-CUs, TC_s^n is the texture complexity of the nth sub-CU with respect to partitioning mode s, and T̄C_s is the mean texture complexity of the sub-CUs with respect to partitioning mode s.

Moreover, the coding information of the current CU depth level is directly related to the partitioning mode decision. Thus, the RD cost and bit consumption are selected as features, denoted as x_RDCost and x_Bits, respectively. In addition, the QP is also selected, since it directly affects the QTBT block partitioning structure.

3.3. Sample selection for classifier

Sample selection is also important for training the classifiers. Compared with online training, offline training has the advantages of diverse training samples and no extra time consumption during the encoding process. Therefore, offline training is widely employed in fast coding algorithms [21,28,29,34,35]. In this paper, the three stages of classification models are obtained by offline training. We select three sequences, Tango (4096 × 2160), Cactus (1920 × 1080), and BQMall (832 × 480), with different video contents and resolutions to extract training samples. They are encoded with the original JEM-7.0 under the all-intra configuration with four QPs (22, 27, 32, and 37). The extracted training samples contain the features and the labels (non-partitioning or partitioning). The computational complexity of the proposed SVM-based classification model depends highly on the number of support vectors, which increases remarkably with the number of training samples [22]. Thus, in the sample selection stage, three frames of each sequence are encoded with the four QPs to achieve a trade-off between sample diversity and the computational complexity of the classification models.

In the process of FVC intra coding, the QTBT structure is restricted by the CU size. The BT partitioning structure works when the CU satisfies the condition QTD ≥ 2, while QT partitioning is active when BTD is 0. To facilitate the description, we define the combination of QTD and BTD as the quad-tree plus binary tree depth (QBD). For example, QBD = 10 represents QTD = 1 and BTD = 0. For training the BT partitioning decision models (HBTDM and VBTDM), we collected CUs with eight kinds of QBDs (20, 21, 22, 30, 31, 32, 40, 41); for training the QTDM, the samples contain three kinds of QBDs (10, 20, 30). We control the number of samples of each QBD, because an uneven distribution of CU QBDs in the training samples would affect the PA of the classifiers. In addition, balancing the numbers of partitioning and non-partitioning samples helps improve the PA of the classifiers. Thus, we collect the same number of partitioning and non-partitioning samples.
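To make the feature definitions of Eqs. (3)–(6) concrete, they can be sketched as below. This is an illustrative sketch rather than the authors' implementation: border pixels are clamped when forming the 8-neighbour mean and the 3 × 3 patch F, and the four angular Sobel kernels are assumed to be the standard horizontal, vertical, and ±45° variants, which the paper does not spell out.

```python
# Illustrative computation of the Section 3.2 content features on a
# CU given as a 2-D list of luma samples (assumed layout: cu[i][j]).

SOBEL = {  # standard angular Sobel kernels (assumed; not listed in the paper)
    "Hor": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "Ver": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "45":  [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "135": [[2, 1, 0], [1, 0, -1], [0, -1, -2]],
}

def _patch(cu, i, j):
    """3x3 neighbourhood centred at (i, j), clamped at the CU border."""
    h, w = len(cu), len(cu[0])
    return [[cu[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
             for dj in (-1, 0, 1)] for di in (-1, 0, 1)]

def texture_complexity(cu):
    """Eq. (3): mean |p(i,j) - mean of its eight neighbours|."""
    h, w = len(cu), len(cu[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            p = _patch(cu, i, j)
            neigh_mean = (sum(map(sum, p)) - cu[i][j]) / 8.0
            total += abs(cu[i][j] - neigh_mean)
    return total / (w * h)

def direction_complexity(cu):
    """Eq. (4): mean summed magnitude of the four Sobel gradients of Eq. (5)."""
    h, w = len(cu), len(cu[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            p = _patch(cu, i, j)
            for k in SOBEL.values():
                total += abs(sum(k[a][b] * p[a][b]
                                 for a in range(3) for b in range(3)))
    return total / (w * h)

def texture_divergence(sub_cus):
    """Eq. (6): variance of the sub-CU texture complexities for one split."""
    tcs = [texture_complexity(s) for s in sub_cus]
    mean = sum(tcs) / len(tcs)
    return sum((t - mean) ** 2 for t in tcs) / len(tcs)
```

For x_Horizon the CU would be split into its top and bottom halves before calling texture_divergence, and analogously for x_Vertical (left/right halves) and x_Quarter (four quadrants).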
Table 2
Selected features for classifiers.

ID   Feature      Feature description                                        HBTDM   VBTDM   QTDM
1    x_TC         Texture complexity of current CU                           √       √       √
2    x_DC         Direction complexity of current CU                         √       √       √
3    x_Horizon    Texture divergence between horizontal partition sub-CUs    √
4    x_Vertical   Texture divergence between vertical partition sub-CUs              √
5    x_Quarter    Texture divergence between quad-tree partition sub-CUs     √       √       √
6    x_QP         Quantization Parameter                                     √       √       √
7    x_Bits       Coding bits of current CU depth level                      √               √
8    x_RDCost     Optimal RD cost of current CU depth level                  √               √
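The three-stage decision cascade of Section 3.1 (Steps 1–6) can be sketched as a hypothetical skeleton. Here rd_cost(mode) stands in for the encoder's RDO evaluation of one partitioning mode, and the three model callbacks stand in for the trained HBTDM/VBTDM/QTDM classifiers, each returning "ET" or "RDO"; Step 2 (feature collection) is assumed done by the caller.

```python
# Hypothetical sketch of the Step 1-6 cascade; rd_cost and the three
# classifier callbacks are placeholders for the real encoder and SVMs.

def qtbt_mode_decision(rd_cost, hbtdm, vbtdm, qtdm, features):
    costs = {"NonSplit": rd_cost("NonSplit")}   # Step 1: no-split RD cost
    if hbtdm(features) == "RDO":                # Step 3: HBT, unless early terminated
        costs["HorBT"] = rd_cost("HorBT")
    if vbtdm(features) == "RDO":                # Step 4: VBT
        costs["VerBT"] = rd_cost("VerBT")
    if qtdm(features) == "RDO":                 # Step 5: QT
        costs["QuadT"] = rd_cost("QuadT")
    return min(costs, key=costs.get)            # Step 6: minimal RD cost wins
```

With stub models that early-terminate every split mode, only "NonSplit" is evaluated, which is exactly how the cascade saves RDO time.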
3.4. SVM-based partitioning mode decision model

SVM is widely applied in fast coding algorithms for HEVC [20–23] because of its excellent performance in classification problems. Therefore, we build three SVM-based block partitioning decision models (HBTDM, VBTDM, and QTDM). The models classify CUs into two types, partitioning and non-partitioning, denoted by Class 0 and Class 1, with sample labels −1 and 1, respectively. The goal of SVM is to derive a hyperplane which maximizes the margin between the two classes of a separable classification problem [23]. Given a training sample set TS = {(x_1, y_1), (x_2, y_2), …, (x_K, y_K)}, where (x_i, y_i) is the ith sample, x_i is the feature vector, and y_i ∈ {−1, 1} is the label, the hyperplane is

f(x) = w^T φ(x) + b   (7)

where φ(·) is a nonlinear mapping function, w is the trained model, and b is an offset. The optimal hyperplane is obtained by minimizing

J(w) = (1/2) w^T w = (1/2) ||w||²,  s.t. y_i (w^T φ(x_i) + b) ≥ 1   (8)

where the constraint indicates that all training samples are correctly classified. However, some samples with different labels y_i have similar feature vectors x_i. To reduce the prediction error, a weighted SVM is used in this paper. The hyperplane is then obtained from

J(w, b, ξ) = (1/2) ||w||² + C (W_0 Σ_{i=1}^{K_0} ξ_i + W_1 Σ_{i=K_0+1}^{K_0+K_1} ξ_i),
s.t. y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, K   (9)

where C is a parameter that balances the importance of the two terms, ξ_i is the relaxation factor of the ith sample, W_0 and W_1 are the weights of Class 0 and Class 1, which regulate the hyperplane and the PA, and K_0 and K_1 are the numbers of samples labeled −1 and 1, respectively, with K = K_0 + K_1. Because of the excellent performance of the Radial Basis Function (RBF) in nonlinear classification problems, we adopt the RBF as the kernel function.

In the proposed algorithm, the ET strategy is applied to CUs classified as Class 0, while the original RDO process is applied to CUs classified as Class 1. Therefore, the output of the CU partitioning decision model is

D_i(q) = { ET,   O_i(q) = −1
         { RDO,  O_i(q) = 1,     i ∈ {HBTDM, VBTDM, QTDM}   (10)

where D_i(q) and O_i(q) are the outputs of the decision model and the SVM classifier for the ith partitioning mode, respectively, and q is the QBD of the CU. ET means that the RDO process will be early terminated, and RDO means that the RDO process is unchanged.

3.5. Optimal parameter determination

Since Class 0 and Class 1 usually overlap in the feature space, it is difficult to find a hyperplane which perfectly separates the two categories and avoids misclassification. There are two kinds of misclassification: (1) CUs labeled Class 0 are wrongly classified as Class 1, and (2) CUs labeled Class 1 are wrongly classified as Class 0. In the first case, the RDO process of the CU should be early terminated by our algorithm but is not, so the time saving is reduced; notably, this case does not deteriorate the RD performance, since the RDO process is the same as in the original JEM. In the second case, the RDO process of the misclassified CU is early terminated by the ET strategy, which degrades the RD performance. Based on these analyses, correct classification of Class 0 improves the speedup, while correct classification of Class 1 maintains the RD performance. To train the partitioning mode decision models, we introduce the prediction accuracies of Class 0 and Class 1 (denoted as PA_0 and PA_1, respectively):

PA_0 = K_00 / (K_01 + K_00),  PA_1 = K_11 / (K_11 + K_10)   (11)

where K_00 is the number of CUs correctly classified as Class 0, K_01 is the number of CUs with the first kind of misclassification, K_10 is the number of CUs with the second kind of misclassification, and K_11 is the number of CUs correctly classified as Class 1. They satisfy K_0 = K_01 + K_00 and K_1 = K_11 + K_10.

To achieve a better trade-off between RD performance and coding complexity, we regulate the hyperplane to control the prediction accuracies of Class 0 and Class 1. For the weighted SVM, the parameters W_0 and W_1 control the migration of the trained hyperplane, so the trade-off between RD performance and complexity becomes an optimization problem over W_0 and W_1. Fig. 4 shows a classification example, in which the black circles and red triangles represent samples labeled Class 0 and Class 1, respectively; the solid line is the hyperplane when W_0 = W_1, and the dotted line is the migrated hyperplane.

Fig. 4. Classification example for (a) W_0 < W_1 and (b) W_0 > W_1.

In Fig. 4(a), many CUs suffer the second kind of misclassification, which causes RD degradation. When W_0/W_1 is increased, the hyperplane shifts toward Class 0; more CUs are then classified as Class 1, and PA_1 increases. In Fig. 4(b), many CUs suffer the first kind of misclassification, which reduces the time saving of the proposed algorithm. Thus, it is necessary to increase W_1/W_0 to shift the hyperplane toward Class 1; more CUs are then classified as Class 0, and PA_0 increases.

To optimize the parameters of the three models, we quantify the influence of the parameters (W_0 and W_1) on the proposed method at each block partitioning mode. Two validation video sequences with different resolutions and contents are selected: CatRobot (3840 × 2160) and BasketballDrive (1920 × 1080). In the process of model training, the ratio W_1/W_0 ranges from 1:5 to 5:1 with a step of 1, i.e., {1:5, 1:4, 1:3, 1:2, 1:1, 2:1, 3:1, 4:1, 5:1}. The relationships between the training parameters and the prediction accuracies of the models at different QBDs are shown in Fig. 5.

Fig. 5. The relationship between training parameters and prediction accuracy of classifiers: (a)–(c) prediction accuracy of HBTDM, VBTDM, and QTDM for Class 0; (d)–(f) prediction accuracy of HBTDM, VBTDM, and QTDM for Class 1.

As W_1/W_0 increases, PA_0 of each model decreases and PA_1 increases, which is consistent with the above analysis. Usually, when the ratio of CUs with the second kind of classification error exceeds 10%, a significant RD loss is caused. Moreover, a higher PA_0 saves more coding time in the proposed algorithm. To achieve a significant coding complexity reduction while maintaining the RD performance, we guarantee that PA_1 is over 90%. Therefore, the parameter selection rule is defined as

(W_0*, W_1*) = arg max_{(W_0, W_1)} PA_0,  s.t. PA_1 ≥ T   (12)

where the threshold T is set to 90%.
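The accuracy measures of Eq. (11) and the selection rule of Eq. (12) amount to a small grid search over the candidate weight ratios. A sketch, with made-up validation numbers (the real search uses the accuracies measured on the two validation sequences):

```python
# Eq. (11): class-wise prediction accuracies from the four confusion counts.
def prediction_accuracies(k00, k01, k10, k11):
    return k00 / (k00 + k01), k11 / (k11 + k10)

# Eq. (12): among candidate W1/W0 ratios, maximize PA0 subject to PA1 >= T.
def select_weights(candidates, threshold=0.90):
    """candidates: ratio -> (PA0, PA1) measured on the validation set."""
    feasible = {r: pa for r, pa in candidates.items() if pa[1] >= threshold}
    if not feasible:
        return None
    return max(feasible, key=lambda r: feasible[r][0])
```

For example, select_weights({"1:2": (0.88, 0.85), "1:1": (0.80, 0.93), "2:1": (0.70, 0.95)}) returns "1:1": the 1:2 ratio violates the PA_1 constraint, and 1:1 has the larger PA_0 of the remaining candidates.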
QBD
HBTDM
VBTDM
QTDM
𝑊1 /𝑊0
PA
𝑊1 /𝑊0
PA
𝑊1 /𝑊0
PA
10 20 21 22 30 31 32 40 41
– 1:1 1:1 1:1 1:1 2:1 5:1 2:1 3:1
– 90.5% 90.1% 90.2% 90.0% 90.4% 90.1% 92.2% 91.0%
– 1:1 3:1 4:1 1:1 3:1 3:1 5:1 2:1
– 90.5% 90.7% 90.2% 92.4% 90.9% 90.2% 91.1% 90.6%
1:1 1:2 – – 2:1 – – – –
94.1% 90.5% – – 90.4% – – – –
Average
–
90.5%
–
90.8%
–
91.7%
(12)
(𝑊0 ,𝑊1 )
The PA and optimized parameters of each model are listed in Table 3. The prediction accuracies of HBTDM for CUs with different QBDs range from 90.0% to 92.2%, with an average PA of 90.5%. The PA of VBTDM ranges from 90.2% to 92.4%, with an average of 90.8%. The PA of QTDM ranges from 90.4% to 94.1% for different QBDs, with an average of 91.7%.

4. Experimental results and analyses

In this section, we conducted extensive experiments to verify the effectiveness of the proposed algorithm. All experiments were conducted under the common test conditions (CTC) of JVET [36]. In the experiments, the all-intra main configuration was used, and the QPs were 22, 27, 32, and 37. Firstly, the RD performance and complexity reduction of HBTDM, VBTDM, QTDM and the overall algorithm were analyzed on JEM-7.0. Then, we compared the proposed multiple classifier-based fast QTBT partitioning algorithm with four state-of-the-art algorithms (Lin [32], Huang [33], Jin [34], and Wang [35]) on JEM-3.1 to further demonstrate the advantages of the proposed algorithm. The Bjøntegaard Delta Bit Rate (BDBR) [37] was used to evaluate the RD performance of the different algorithms. Moreover, the encoding time saving was employed to evaluate the complexity reduction of each algorithm, which is defined as

$$\Delta ETime_j = \frac{1}{4}\sum_{i=1}^{4}\frac{ETime_{JEM}^{QP_i} - ETime_j^{QP_i}}{ETime_{JEM}^{QP_i}} \times 100\% \tag{13}$$

where $QP_i \in \{22, 27, 32, 37\}$, $j \in \{Lin, Huang, Jin, Wang, HBTDM, VBTDM, QTDM, Overall\}$, $ETime_{JEM}^{QP_i}$ is the encoding time of the original JEM encoded with $QP_i$, and $ETime_j^{QP_i}$ is the encoding time of the $j$-th algorithm encoded with $QP_i$.
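The metric of Eq. (13) simply averages the per-QP relative time savings; a small worked example with made-up encoding times:

```python
# Worked example of the encoding-time-saving metric in Eq. (13), averaged
# over the four QPs of the common test conditions. Times are hypothetical.
def delta_etime(t_jem, t_fast):
    """Percent encoding-time saving of a fast algorithm vs. the JEM anchor,
    given per-QP encoding times for QP in {22, 27, 32, 37}."""
    assert len(t_jem) == len(t_fast) == 4
    return 100.0 * sum((a - b) / a for a, b in zip(t_jem, t_fast)) / 4

t_jem  = [400.0, 350.0, 300.0, 250.0]   # hypothetical JEM times (s) per QP
t_fast = [160.0, 140.0, 120.0, 100.0]   # hypothetical fast-encoder times
print(round(delta_etime(t_jem, t_fast), 2))  # → 60.0
```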
Table 4. Performance of proposed individual algorithms compared with JEM [%].

| Class | Sequence | HBTDM BDBR | HBTDM ΔETime | VBTDM BDBR | VBTDM ΔETime | QTDM BDBR | QTDM ΔETime | Overall BDBR | Overall ΔETime |
|---|---|---|---|---|---|---|---|---|---|
| A1 | Drums100 | 0.50 | 32.02 | 0.42 | 32.09 | 0.00 | 3.11 | 1.75 | 61.83 |
| A1 | Campfire | 0.74 | 33.97 | 0.90 | 34.22 | 0.20 | 8.59 | 2.91 | 59.74 |
| A1 | ToddlerFountain | 0.69 | 36.06 | 0.68 | 35.84 | 0.13 | 8.77 | 2.98 | 66.96 |
| A2 | CatRobot | 1.18 | 33.66 | 1.10 | 33.16 | 0.11 | 9.37 | 4.07 | 60.75 |
| A2 | DaylightRoad | 1.65 | 36.01 | 1.50 | 35.67 | 0.04 | 11.76 | 5.15 | 65.54 |
| A2 | Rollercoaster | 0.57 | 27.75 | 0.45 | 28.46 | 0.02 | 2.82 | 1.90 | 56.76 |
| B | Kimono | 0.24 | 29.82 | 0.17 | 30.77 | −0.03 | 2.02 | 0.82 | 59.90 |
| B | ParkScene | 0.86 | 36.58 | 0.75 | 35.65 | 0.14 | 8.98 | 3.54 | 63.98 |
| B | BasketballDrive | 2.39 | 33.57 | 0.90 | 31.99 | 0.11 | 8.36 | 4.88 | 61.79 |
| B | BQTerrace | 1.14 | 40.45 | 1.01 | 38.86 | 0.54 | 7.46 | 4.08 | 65.10 |
| C | PartyScene | 1.01 | 42.07 | 1.03 | 41.72 | 1.29 | 4.33 | 4.27 | 63.33 |
| C | RaceHorses | 1.07 | 35.56 | 0.80 | 34.64 | 0.28 | 11.37 | 3.18 | 62.49 |
| D | BasketballPass | 2.10 | 31.61 | 1.40 | 32.60 | 0.95 | 9.56 | 4.66 | 57.98 |
| D | BlowingBubbles | 1.19 | 36.80 | 1.19 | 38.22 | 0.68 | 9.66 | 3.57 | 61.74 |
| D | RaceHorses | 0.96 | 35.27 | 0.94 | 36.07 | 0.71 | 8.40 | 3.77 | 61.02 |
| E | FourPeople | 1.85 | 31.89 | 2.14 | 31.94 | 0.17 | 10.05 | 4.70 | 59.93 |
| E | Johnny | 1.13 | 31.15 | 2.32 | 29.64 | 0.03 | 7.37 | 4.21 | 58.39 |
| E | KristenAndSara | 1.39 | 31.01 | 2.14 | 31.18 | 0.58 | 7.68 | 4.28 | 58.02 |
|  | Average | 1.15 | 34.18 | 1.10 | 34.04 | 0.33 | 7.76 | 3.60 | 61.40 |
4.1. Performance of the proposed individual algorithm

The RD performance and encoding time saving of HBTDM, VBTDM, QTDM and the overall algorithm are listed in Table 4. For the six classes of test sequences, $\Delta ETime_{HBTDM}$ is 34.02%, 32.47%, 35.11%, 38.82%, 34.56% and 31.35%, respectively, with an average of 34.18%. $\Delta ETime_{VBTDM}$ for the six classes is 34.05%, 32.43%, 34.32%, 38.18%, 35.63% and 30.92%, respectively, and 34.04% on average. $\Delta ETime_{QTDM}$ is 6.82%, 7.98%, 6.71%, 7.85%, 9.21% and 8.37%, respectively, with an average of 7.76%. The proposed overall algorithm is the combination of HBTDM, VBTDM, and QTDM, and it outperforms each of the three individual models in terms of encoding time saving: $\Delta ETime_{Overall}$ for the different classes is 62.84%, 61.02%, 62.69%, 62.91%, 60.25% and 58.78%, respectively, and 61.40% on average. The statistics show that $\Delta ETime_{HBTDM}$ and $\Delta ETime_{VBTDM}$ are obviously larger than $\Delta ETime_{QTDM}$. The underlying reason is that the number of CUs with QT as the best partition mode is small; in other words, more CUs can be early terminated by HBTDM and VBTDM than by QTDM. This analysis shows that the encoding time saving of the proposed algorithm is mainly derived from the optimization of the BT structure in QTBT. It also reflects the limitation of directly applying the QT optimization of HEVC to FVC, which further underlines the significance of this work. In terms of RD performance, if only HBTDM is used, the average BDBR is 1.15%. When only VBTDM is activated, the BDBR is 1.10%. If only QTDM is utilized, the average BDBR is 0.33%. Moreover, when all the techniques in this paper are used, the average BDBR is 3.60%. It can be observed from Table 4 that our algorithm has better RD performance for high resolution video sequences (Class A1).

Actually, high resolution video sequences contain a large number of homogeneous regions, while low resolution video sequences contain many complex texture regions. Consequently, the QTBT structure of high resolution video sequences is simpler than that of low resolution video sequences. Since the probability of misclassification is higher for complex QTBT structures, misclassification often happens in complex texture regions. In addition, the RD performance degradation caused by misclassification in homogeneous regions is weaker than that in complex regions. Fig. 6 shows the QTBT structure comparison between the proposed algorithm and the FVC platform JEM. It is clear that the block partitioning results of the proposed algorithm are nearly consistent with those of the JEM encoder.
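As a rough illustration of how the three decision models replace the brute-force search, the sketch below gates which partition modes are RD-checked for a CU. The function names, the boolean prediction interface, and the variance feature are illustrative assumptions only; the paper's actual models are trained SVM classifiers over multiple intra-coding features.

```python
# Minimal sketch of the three-stage early-termination idea: each model
# predicts whether its split type is worth RD-checking for the current CU.
def decide_partitions(features, hbtdm, vbtdm, qtdm):
    """Return the set of partition modes that must still be RD-checked.
    Each model returns True ("try this split") or False ("skip it")."""
    modes = {"no_split"}            # the unsplit CU is always evaluated
    if hbtdm(features):
        modes.add("horizontal_bt")  # horizontal binary-tree split
    if vbtdm(features):
        modes.add("vertical_bt")    # vertical binary-tree split
    if qtdm(features):
        modes.add("quad_tree")      # quad-tree split
    return modes

# Toy stand-in classifiers: split only when CU texture variance is high.
modes = decide_partitions(
    {"var": 12.0},
    hbtdm=lambda f: f["var"] > 10,
    vbtdm=lambda f: f["var"] > 10,
    qtdm=lambda f: f["var"] > 50,
)
print(sorted(modes))  # → ['horizontal_bt', 'no_split', 'vertical_bt']
```

Modes pruned by the classifiers never enter the RD search, which is where the encoding time saving comes from.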
4.2. Comparison with state-of-the-art algorithms

To further verify the advantages of the proposed algorithm, we compare it with four state-of-the-art algorithms. Table 5 lists the BDBR and ΔETime of the different algorithms. Lin [32] achieves 13.08% encoding time saving with a BDBR of 0.07%; Huang [33] reduces encoding time by only 11.62% on average with a BDBR of 0.04%; Jin [34] attains an average encoding time saving of 42.89% with a BDBR of 0.71%; and Wang [35] reduces encoding time by 40.37% with a BDBR of 2.35%. In contrast, the proposed algorithm reaches 64.54% encoding time saving on average with a BDBR of 2.99%. Compared with the four state-of-the-art algorithms, the proposed algorithm significantly improves the speedup performance. As the flexibility of the partitioning structure in JEM increases, the correlation of partitioning modes between different CUs becomes weak, which makes it difficult for traditional statistical-analysis-based algorithms to achieve good results. Lin [32] and Huang [33] are based on statistical analysis of RD cost under strict conditions, so their encoding time savings are extremely limited. Jin [34] utilizes a CNN to classify CUs and early skips/terminates the QTBT partitioning process to achieve a trade-off between computational complexity and RD performance; however, the QTBT is not fully optimized from the perspective of complexity reduction. Wang [35] relies on the temporal and spatial correlation between CUs; because this correlation varies with the characteristics of the sequence, its encoding time saving is unstable across test sequences. By comparison, the proposed algorithm uses machine learning tools to fuse multiple features that are highly correlated with the QTBT structure, and then early terminates the QTBT partitioning process. Compared with the related works, the number of CUs that meet the early termination (ET) condition in the proposed algorithm is larger, so it can save more encoding time.
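For readers unfamiliar with the BDBR metric [37] used throughout the comparison, the sketch below implements the standard Bjøntegaard calculation: fit a cubic through the (PSNR, log-rate) points of each codec and integrate the gap between the curves over the overlapping PSNR range. The RD points are hypothetical, and this is a simplified illustration rather than the exact reference tool.

```python
import numpy as np

def bdbr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec vs. the anchor
    over the overlapping quality range (Bjontegaard delta bit rate)."""
    fit_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)  # log-rate(PSNR)
    fit_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR range
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(fit_a), [lo, hi])
    int_t = np.polyval(np.polyint(fit_t), [lo, hi])
    avg_log_diff = ((int_t[1] - int_t[0]) - (int_a[1] - int_a[0])) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0

rates = [1000.0, 1800.0, 3200.0, 6000.0]   # hypothetical anchor kbps per QP
psnrs = [34.0, 36.5, 38.8, 41.0]           # hypothetical PSNR (dB) per QP
# A codec needing uniformly 10% more bitrate at the same quality:
print(round(bdbr(rates, psnrs, [r * 1.10 for r in rates], psnrs), 2))  # → 10.0
```

A positive BDBR therefore means the tested encoder needs more bitrate than the anchor for the same quality, which is why fast algorithms aim to keep it near zero.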
In terms of RD performance, since low resolution sequences with complex texture are sensitive to misclassification, our algorithm achieves significant encoding time reduction but some RD loss is inevitable. In terms of the ratio between time saving and BDBR, the proposed method (21.59) is superior to Wang [35] (17.18) but worse than Jin [34] (60.41). Jin [34] utilizes a CNN to improve the prediction accuracy and obtain better RD performance, while the proposed method aims to maximize the time saving. In addition, it is clear that our algorithm maintains high consistency in encoding time saving across different video sequences. All three QTBT partitioning modes of CUs with different QBDs are optimized in the proposed algorithm, which eliminates the fluctuation of the encoding time saving caused by the uneven distribution of QBDs and QTBT partitioning modes.
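The time-saving-to-BDBR ratios quoted above follow directly from the Table 5 averages:

```python
# Trade-off figure used in the comparison: ratio of average encoding-time
# saving to average BDBR loss. Values are the column averages of Table 5.
results = {
    "Jin [34]":  (42.89, 0.71),   # (avg ΔETime %, avg BDBR %)
    "Wang [35]": (40.37, 2.35),
    "Proposed":  (64.54, 2.99),
}
for name, (etime, bdbr_loss) in results.items():
    print(name, round(etime / bdbr_loss, 2))
```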
Fig. 6. QTBT structure comparison between JEM-7.0 reference software and the proposed algorithm. (a) BasketballPass (JEM). (b) BasketballPass (proposed). (c) RaceHorses (JEM). (d) RaceHorses (proposed). (e) BlowingBubbles (JEM). (f) BlowingBubbles (proposed). (g) PartyScene (JEM). (h) PartyScene (proposed).
Table 5. Overall performance comparison of the proposed algorithm with four state-of-the-art algorithms [%].

| Sequence | Lin [32] BDBR | Lin [32] ΔETime | Huang [33] BDBR | Huang [33] ΔETime | Jin [34] BDBR | Jin [34] ΔETime | Wang [35] BDBR | Wang [35] ΔETime | Proposed BDBR | Proposed ΔETime |
|---|---|---|---|---|---|---|---|---|---|---|
| Drums100 | 0.04 | 16.45 | 0.01 | 12.26 | 0.44 | 40.39 | 0.49 | 34.96 | 1.76 | 63.98 |
| Campfire | 0.06 | 11.80 | 0.07 | 10.56 | 1.26 | 42.67 | 3.46 | 37.16 | 2.92 | 61.96 |
| ToddlerFountain | 0.08 | 11.76 | 0.01 | 9.46 | 0.64 | 40.10 | 1.30 | 32.18 | 2.99 | 69.50 |
| Kimono | 0.02 | 17.93 | 0.04 | 12.30 | 0.50 | 51.38 | 0.49 | 55.43 | 0.88 | 63.19 |
| ParkScene | 0.07 | 14.12 | 0.03 | 12.76 | 0.74 | 40.82 | 1.60 | 30.76 | 3.55 | 68.20 |
| RaceHorses | 0.04 | 10.44 | 0.02 | 12.04 | 0.41 | 42.26 | 3.01 | 32.16 | 3.21 | 65.88 |
| BlowingBubbles | 0.08 | 8.10 | 0.04 | 11.17 | 0.65 | 44.31 | 1.20 | 31.87 | 3.62 | 65.13 |
| Vidyo1 | 0.13 | 14.03 | 0.06 | 10.92 | 0.81 | 40.56 | 4.83 | 55.74 | 4.12 | 61.70 |
| KristenAndSara | 0.12 | 13.12 | 0.07 | 13.14 | 0.97 | 43.55 | 4.78 | 53.07 | 3.88 | 61.31 |
| Average | 0.07 | 13.08 | 0.04 | 11.62 | 0.71 | 42.89 | 2.35 | 40.37 | 2.99 | 64.54 |
5. Conclusion

With the increase of QTBT partitioning modes in FVC, it is difficult for conventional methods to achieve a good trade-off between computational complexity and RD performance. In this paper, a multiple classifier-based fast QTBT partitioning algorithm is proposed to reduce the computational complexity of FVC intra coding. Firstly, three multiple classifier-based QTBT partitioning mode decision models (HBTDM, VBTDM and QTDM) are designed for the three different partitioning modes of the QTBT structure to speed up the QTBT partitioning process. Then, in order to balance the RD performance and computational complexity, extensive experiments are conducted to select the optimal training parameters. The experimental results show that, in comparison with the original FVC reference software, the proposed overall algorithm reduces intra coding time by 64.54% with negligible degradation of RD performance, and it outperforms four state-of-the-art algorithms in terms of computational complexity reduction. In addition, the proposed algorithm maintains high consistency in encoding time saving across different classes of test sequences. Finally, the experimental results further indicate that the encoding time saving of the proposed algorithm is mainly contributed by the optimization of BT in QTBT. In future work, the multiple classifier-based CU partition model will be applied to develop fast encoding methods for FVC inter coding.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61771269, 61620106012, and 61671258, the Natural Science Foundation of Zhejiang Province, China under Grant Nos. LY16F010002 and LY17F010005, and the K. C. Wong Magna Fund in Ningbo University.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] G. Sullivan, J. Ohm, W. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1649–1668.
[2] Z. Chen, Y. Li, Y. Zhang, Recent advances in omnidirectional video coding for virtual reality: Projection and evaluation, Signal Process. 146 (2018) 66–78.
[3] Z. He, L. Yu, Possibility distribution based lossless coding and its optimization, Signal Process. 150 (2018) 122–134.
[4] X. Wang, S. Kwong, H. Yuan, Y. Zhang, Z. Pan, Possibility distribution based lossless coding and its optimization, Signal Process. 112 (2015) 189–198.
[5] J. Chen, X. Li, F. Zou, M. Karczewicz, W. Chien, T. Hsieh, Performance evaluation of JEM 1 tools by Qualcomm, Joint Video Exploration Team (JVET), doc. JVET-B0045, Feb. 2016.
[6] X. Li, Report of AHG3 on JEM software development, Joint Video Exploration Team (JVET), doc. JVET-H0003, Oct. 2017.
[7] Q. Jiang, F. Shao, W. Lin, K. Gu, G. Jiang, H. Sun, Optimizing multi-stage discriminative dictionaries for blind image quality assessment, IEEE Trans. Multimed. 20 (8) (2018) 2035–2048.
[8] Q. Jiang, F. Shao, W. Lin, G. Jiang, Learning a referenceless stereopair quality engine with deep nonnegativity constrained sparse autoencoder, Pattern Recognit. 76 (2018) 242–255.
[9] J. An, H. Huang, K. Zhang, Y. Huang, S. Lei, Quadtree plus binary tree structure integration with JEM tools, Joint Video Exploration Team (JVET), doc. JVET-B0023, Feb. 2016.
[10] E. Alshina, A. Alshin, K. Choi, M. Park, Performance of JEM 1 tools analysis, Joint Video Exploration Team (JVET), doc. JVET-B0022, Feb. 2016.
[11] F. Chen, P. Li, Z. Peng, G. Jiang, M. Yu, F. Shao, A fast inter coding algorithm for HEVC based on texture and motion quad-tree models, Signal Process. Image Commun. 47 (2016) 271–279.
[12] Y. Li, G. Yang, Y. Zhu, X. Ding, X. Sun, Unimodal stopping model-based early SKIP mode decision for high-efficiency video coding, IEEE Trans. Multimed. 19 (7) (2017) 1431–1441.
[13] Y. Li, G. Yang, Y. Zhu, X. Ding, X. Sun, Adaptive inter CU depth decision for HEVC using optimal selection model and encoding parameters, IEEE Trans. Broadcast. 63 (3) (2017) 535–546.
[14] K. Tai, M. Hsieh, M. Chen, C. Chen, C. Yeh, A fast HEVC encoding method using depth information of collocated CUs and RD cost characteristics of PU modes, IEEE Trans. Broadcast. 63 (4) (2017) 680–692.
[15] H. Tan, C. Ko, S. Rahardja, Fast coding quad-tree decisions using prediction residuals statistics for high efficiency video coding (HEVC), IEEE Trans. Broadcast. 62 (1) (2016) 128–133.
[16] Y. Gao, P. Liu, Y. Wu, K. Jia, Quadtree degeneration for HEVC, IEEE Trans. Multimed. 18 (12) (2016) 2321–2330.
[17] I. Zupancic, S. Blasi, E. Peixoto, E. Izquierdo, Inter-prediction optimizations for video coding using adaptive coding unit visiting order, IEEE Trans. Multimed. 18 (9) (2016) 1677–1690.
[18] T. Mallikarachchi, D. Talagala, H. Arachchi, A. Fernando, Content-adaptive feature-based CU size prediction for fast low-delay video encoding in HEVC, IEEE Trans. Circuits Syst. Video Technol. 28 (3) (2018) 693–705.
[19] Y. Zhang, S. Kwong, G. Zhang, Z. Pan, H. Yuan, G. Jiang, Low complexity HEVC INTRA coding for high-quality mobile video communication, IEEE Trans. Ind. Inf. 11 (6) (2015) 1492–1504.
[20] L. Zhu, Y. Zhang, Z. Pan, R. Wang, S. Kwong, Z. Peng, Binary and multi-class learning based low complexity optimization for HEVC encoding, IEEE Trans. Broadcast. 63 (3) (2017) 547–561.
[21] X. Liu, Y. Li, D. Liu, P. Wang, L. Yang, An adaptive CU size decision algorithm for HEVC intra prediction based on complexity classification using machine learning, IEEE Trans. Circuits Syst. Video Technol. 29 (1) (2019) 144–155.
[22] Y. Zhang, Z. Pan, N. Li, X. Wang, G. Jiang, S. Kwong, Effective data driven coding unit size decision approaches for HEVC INTRA coding, IEEE Trans. Circuits Syst. Video Technol. 28 (11) (2018) 3208–3222.
[23] T. Zhang, M. Sun, D. Zhao, W. Gao, Fast intra-mode and CU size decision for HEVC, IEEE Trans. Circuits Syst. Video Technol. 27 (8) (2017) 1714–1726.
[24] H. Kim, R. Park, Fast CU partitioning algorithm for HEVC using an online-learning-based Bayesian decision rule, IEEE Trans. Circuits Syst. Video Technol. 26 (1) (2016) 130–138.
[25] J. Zhang, S. Kwong, X. Wang, Two-stage fast inter CU decision for HEVC based on Bayesian method and conditional random fields, IEEE Trans. Circuits Syst. Video Technol. 28 (11) (2018) 3223–3235.
[26] K. Goswami, B. Kim, A design of fast high efficiency video coding (HEVC) scheme based on Markov chain Monte Carlo model and Bayesian classifier, IEEE Trans. Ind. Electron. 65 (11) (2018) 8861–8871.
[27] G. Correa, P. Assuncao, L. Agostini, L. da Silva Cruz, Fast HEVC encoding decisions using data mining, IEEE Trans. Circuits Syst. Video Technol. 25 (4) (2015) 660–673.
[28] T. Li, M. Xu, X. Deng, A deep convolutional neural network approach for complexity reduction on intra-mode HEVC, in: 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 1255–1260.
[29] Z. Liu, X. Yu, Y. Gao, S. Chen, X. Ji, D. Wang, CU partition mode decision for HEVC hardwired intra encoder using convolution neural network, IEEE Trans. Image Process. 25 (11) (2016) 5088–5103.
[30] C. Huang, Z. Peng, F. Chen, Q. Jiang, G. Jiang, Q. Hu, Efficient CU and PU decision based on neural network and gray level co-occurrence matrix for intra prediction of screen content coding, IEEE Access 6 (2018) 46643–46655.
[31] Z. Wang, S. Wang, J. Zhang, S. Wang, S. Ma, Probabilistic decision based block partitioning for future video coding, IEEE Trans. Image Process. 27 (3) (2018) 1475–1486.
[32] P. Lin, C. Lin, Y. Jen, AHG5: Enhanced fast algorithm of JVET-E0078, Joint Video Exploration Team (JVET), doc. JVET-F0063, Mar. 2017.
[33] H. Huang, S. Liu, Y. Huang, AHG5: Speed-up for JEM-3.1, Joint Video Exploration Team (JVET), doc. JVET-D0077, Oct. 2016.
[34] Z. Jin, P. An, L. Shen, C. Yang, CNN oriented fast QTBT partition algorithm for JVET intra coding, in: 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, 2017, pp. 1–4.
[35] Z. Wang, S. Wang, J. Zhang, S. Wang, S. Ma, Effective quadtree plus binary tree block partition decision for future video coding, in: 2017 Data Compression Conference (DCC), Snowbird, UT, 2017, pp. 23–32.
[36] C. Chang, C. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 1–27.
[37] G. Bjøntegaard, Calculation of average PSNR differences between RD-curves, ITU-T Video Coding Experts Group (VCEG), doc. VCEG-M33, Apr. 2001.