Journal of Visual Communication and Image Representation 50 (2018) 83–92
Advanced texture and depth coding in 3D-HEVC
Jian-Liang Lin, Yi-Wen Chen, Yu-Lin Chang, Jicheng An, Kai Zhang, Yu-Wen Huang, Shawmin Lei
MediaTek, Taiwan
Keywords: 3D-HEVC; inter-view motion prediction; disparity derivation; depth coding

Abstract
The 3D extension of High Efficiency Video Coding (3D-HEVC) is an international video coding standard developed by the Joint Collaborative Team on 3D Video Coding Extensions (JCT-3V) to support the coding of multiple views and their associated depth data. 3D-HEVC improves the coding efficiency of 3D and multi-view video by introducing new coding tools that utilize the correlations between views and between the texture and depth components. In this paper, an inter-view motion prediction (inter-view merge candidate) and an inter-component motion prediction (texture merge candidate) are proposed to exploit the inter-view and inter-component redundancies of the texture and depth components, respectively. Moreover, a new coding mode termed the single depth mode, which reconstructs a coding block with a single depth value through a block merging scheme under the HEVC quad-tree block partitioning, is also introduced. All of the proposed schemes have been adopted into 3D-HEVC. Experimental results evaluated under the common test conditions (CTC) for developing 3D-HEVC show that the proposed inter-view merge candidate, texture merge candidate, and single depth mode together achieve significant BD-rate reductions of 19.5% for the dependent texture views and 8.3% for the synthesized texture views.
1. Introduction

Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology for 3D TV applications among others. Traditional video is a two-dimensional (2D) medium that only provides viewers a single view of a scene from the perspective of the camera. Multi-view video, in contrast, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers the sensation of realism. Driven by the requirements of coding multiple views, larger picture resolutions, and better quality, various techniques have been proposed to improve the coding efficiency of 3D and multi-view video [1,2]. As an extension of HEVC and a next-generation 3D video coding standard, 3D-HEVC was formally launched by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) in July 2012 and was finalized after the 11th JCT-3V meeting held in February 2015. To support auto-stereoscopic multi-view displays more practically, the multi-view video plus depth (MVD) format was introduced as a new 3D video format for 3D-HEVC [3]. The MVD format consists of a texture picture and its
associated depth map. Unlike a texture picture, which represents the luminance and chrominance information of a scene, a depth map is an image containing information about the distance of objects from the camera plane, and it is generally employed as non-visual information for virtual view rendering. Since all cameras capture the same scene from different viewpoints, a multi-view video contains a large amount of inter-view redundancy. In 3D-HEVC, to share the previously encoded motion information of adjacent views, the motion information of a current block can be predicted from the motion information of one or more corresponding blocks in the inter-view pictures, which are located by a disparity vector (DV) [4,5]. The disparity vector for locating inter-view corresponding blocks can be derived either from a coded disparity motion vector or from the depth information of a corresponding block [6]. To fully utilize the motion information of the inter-view pictures, a sub-PU inter-view motion prediction (SPIVMP) method is applied to obtain the motion predictor at a fine granularity [7]. Since a texture picture and its associated depth map are projections of the same scene from the same viewpoint at the same time instant, their motion characteristics should be similar. To enable efficient encoding of the depth map data, the motion information of the depth map can also be predicted or inherited
Fig. 1. (a) The texture picture (POC = 3 @ view 1) of the testing sequence "Balloons"; (b) the associated depth map; (c) zoom-in of block A; (d) zoom-in of block B.
from the corresponding video signal [8].

To explore the inter-view and inter-component redundancies for texture and depth coding, in this paper we introduce the inter-view motion vector prediction and the inter-component motion vector prediction, which inherit motion information from the neighboring views or from the associated texture picture for 3D video coding. Moreover, as illustrated by the example in Fig. 1(a) and (b), the depth map exhibits different signal characteristics from natural video data. The most intuitive observation is that a depth map contains large smooth areas with similar pixel values. In most cases, the pixels within a smooth area even share one identical pixel value. The existing intra prediction modes (e.g., DC, planar, and the 33 angular predictions) in [9] cannot efficiently signal that the pixels within the current block share the same single depth value, which is also identical to one of the neighboring pixels.

In 3D-HEVC, pictures are usually decomposed into blocks such that each block is associated with a particular set of model or coding parameters. Each block is either spatially or temporally predicted, and the resulting prediction residual is represented using transform coding. For the purpose of partitioning, quad-tree structured schemes are suitable for image coding, as they can be optimized in the rate-distortion (R-D) sense by simple algorithms. However, it has been pointed out that quad-tree structured partitioning may result in suboptimal R-D performance when dependencies between leaf nodes of different parents are not exploited [10]. In this paper, we further propose the single depth intra mode to efficiently code the smooth areas within a depth picture. The concept of the single depth mode is to simply reconstruct the current coding unit (CU) as a smooth area with a single depth sample value. With the help of this new coding mode, the smooth areas within a depth map can be coded more efficiently by incorporating a leaf merging of the pixel values. The proposed single depth mode has been adopted into 3D-HEVC and evaluated under the common test conditions [11].

To provide a satisfactory introduction to motion vector (MV) prediction, MV coding, and depth intra prediction in 3D-HEVC, a general overview of the basic coding structure, MV coding techniques, and depth intra prediction techniques is first presented in Section 2. Sections 3-5 describe our proposed inter-view motion prediction, inter-component motion prediction, and single depth mode, which were adopted into the 3D-HEVC standard. Experimental results and conclusions are given in Sections 6 and 7, respectively.
2. Overview of the basic coding structure, MV coding and depth intra coding tools in 3D-HEVC

2.1. Quad-tree partitioning structure

As shown in Fig. 2, 3D-HEVC encodes and decodes multi-view video sequences. One of the views, referred to as the base view or the independent view, is coded independently of the other views using a conventional HEVC video coder. The other views are usually termed dependent views since they may be coded depending on the data of the other views. The dependent texture views and all depth views apply 3D-HEVC coding, which is also based on a hybrid block-based motion-compensated transform coding architecture [12]. The basic unit for compression is termed the coding tree unit (CTU). Each CTU may contain one coding unit (CU) or be recursively split into four smaller CUs until the predefined minimum CU size is reached. Each CU (also named a leaf CU) contains one or multiple prediction units (PUs) and a tree of transform units (TUs). For each PU, prediction parameters such as the intra prediction mode or motion information are signaled, depending on whether the current CU is intra or inter coded.
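A minimal sketch of this recursive quad-tree traversal is given below. It is illustrative only: the split-decision callback stands in for the encoder's R-D decision (or the split flags a decoder parses), and the 8-sample minimum CU size is an assumption for the example, not a property of the HTM software.

```python
MIN_CU_SIZE = 8  # assumed minimum CU size for this example

def traverse_ctu(x, y, size, should_split):
    """Yield (x, y, size) for every leaf CU inside a CTU.

    should_split(x, y, size) stands in for the encoder's R-D split
    decision (or the split flags parsed by a decoder).
    """
    if size > MIN_CU_SIZE and should_split(x, y, size):
        half = size // 2
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            yield from traverse_ctu(x + dx, y + dy, half, should_split)
    else:
        yield (x, y, size)  # a leaf CU: carries its PUs and a TU tree

# Example: split a 64 x 64 CTU uniformly down to 16 x 16 CUs.
leaves = list(traverse_ctu(0, 0, 64, lambda x, y, s: s > 16))
print(len(leaves))  # -> 16 leaf CUs
```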
Fig. 2. An example of prediction structure for 3D-HEVC.
2.2. Motion vector coding

For MV coding in 3D-HEVC, a motion vector competition (MVC) based scheme is applied to select one motion vector predictor (MVP) from a given candidate set. There are three inter-prediction modes in 3D-HEVC: Inter, Skip, and Merge. To simplify the design, the Inter mode in 3D-HEVC reuses the advanced motion vector prediction (AMVP) technique of HEVC to select a predictor from an AMVP candidate set containing two spatial MV predictors and one temporal MV predictor. For the Merge and Skip modes, the merge scheme is used to select a motion vector predictor from a merge candidate set. Based on a rate-distortion optimization (RDO) decision, the encoder selects one final MVP within the given candidate set and transmits the index of the selected MVP to the decoder. The selected MVP may be linearly scaled according to temporal distances or view distances.

In 3D-HEVC [3], the merge mode of the independent texture view is coded using the original HEVC merge candidates. To improve the coding efficiency and exploit the inter-view correlation in 3D-HEVC, extra 3D merge candidates are added to the 3D-HEVC merge candidate set for the dependent texture views and the depth maps. The extra 3D merge candidates include the inter-view motion predictor (IVMP) candidates [5,6], the disparity vector (DV) candidates, the inter-component motion prediction candidate (texture merge candidate) [8], the disparity derived depth (DDD) candidate [13], and the view synthesis prediction (VSP) candidate [14].

During the 3D-HEVC standardization, we proposed several motion vector coding techniques that were adopted into the standard, including the inter-view motion vector prediction (inter-view merge candidate) and the inter-component motion vector prediction (texture merge candidate), which exploit the inter-view and inter-component redundancy, respectively. The proposed inter-view motion vector prediction and inter-component motion vector prediction are described in detail in Sections 3 and 4, respectively.
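The linear MVP scaling mentioned above can be illustrated with a short sketch. HEVC specifies an equivalent fixed-point formula; the floating-point version below is a simplification for clarity, and the function name and arguments are ours.

```python
def scale_mvp(mv, tb, td):
    """Scale an MVP (mvx, mvy) by the distance ratio tb/td.

    tb: POC (or view-order) distance between the current picture and its
        reference; td: the same distance for the neighboring block whose
        motion vector is being reused.
    """
    if td == 0 or tb == td:
        return mv  # nothing to scale
    return tuple(int(round(c * tb / td)) for c in mv)

# A neighbor's MV spanning two pictures, reused across a one-picture gap:
print(scale_mvp((8, -4), tb=1, td=2))  # -> (4, -2)
```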
2.3. Intra coding tools for depth map coding

In 3D-HEVC, an intra coded depth CU can be predicted by one of the conventional intra modes (the 35 intra prediction modes of HEVC) or by the depth modeling mode (DMM) [15]. DMM was developed for a better representation of edges in depth maps. A depth block is approximated by a model that partitions the area of the block into two non-rectangular regions, where each region is represented by a constant value. The constant value of each region is then predicted by an average of the adjacent samples of the neighboring left and top blocks. The residual signal is also signaled for each DMM coded CU.

Fig. 1(a) and (b) depict one picture of the testing sequence "Balloons" and its associated depth map, respectively. Fig. 1(c) and (d) show zoomed-in views of the depth blocks A and B highlighted in Fig. 1(b). As can be seen, the content within block A is very smooth while block B is divided into two segments by a sharp edge. As a result, block A can be well predicted by the intra DC prediction and block B can be well predicted by the DMM mode. It is noted that, in 3D-HEVC, the depth residual can be selectively coded using the conventional residual quad-tree structure of HEVC or using segment-wise DC coding (SDC), which does not code quantized transform coefficients but only one delta DC value for each segment within the CU as the residual signal [16]. To code the delta DC for the segments (one or two) within an intra coded PU, a residual flag is first transmitted to indicate whether any delta DC signal is present. If any segment within the intra PU has a delta DC, a sign flag is transmitted along with a syntax element indicating the absolute value of the delta DC for each segment. The decoder reconstructs the delta DC value after receiving the sign flag and the magnitude (absolute value) of the delta DC.

Unlike typical camera-captured videos, a depth map is usually composed of few depth values. As shown in Fig. 1(b), many consecutive blocks contain only a single depth value and can be well predicted from their neighboring reconstructed pixels. We therefore introduce a new prediction mode for depth coding to represent a CU/PU with a single depth value efficiently. The detailed description of the proposed single depth mode is given in Section 5.
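The SDC delta-DC signaling just described can be summarized by the parsing sketch below. The BitReader and its read_flag()/read_uvlc() methods are toy stand-ins for the entropy decoder, and the exact binarization and flag order of the real 3D-HEVC syntax differ.

```python
class BitReader:
    """Toy symbol source standing in for the entropy decoder."""
    def __init__(self, symbols):
        self.symbols = iter(symbols)
    def read_flag(self):
        return next(self.symbols)
    def read_uvlc(self):
        # Stand-in: magnitudes are supplied directly in the toy stream.
        return next(self.symbols)

def parse_delta_dc(reader, num_segments):
    """Return one reconstructed delta-DC value per segment (1 or 2)."""
    deltas = [0] * num_segments
    if not reader.read_flag():          # residual flag: any delta DC at all?
        return deltas
    for s in range(num_segments):
        magnitude = reader.read_uvlc()  # absolute value of the delta DC
        if magnitude:
            sign = -1 if reader.read_flag() else 1
            deltas[s] = sign * magnitude
    return deltas

print(parse_delta_dc(BitReader([1, 5, 1, 0]), 2))  # -> [-5, 0]
```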
3. Inter-view motion vector prediction

The basic concept of the proposed inter-view prediction of motion parameters is illustrated in Fig. 3. To derive the motion parameters of the inter-view motion prediction (IVMP) candidate for a current PU in a dependent view, a disparity vector (DV) is first derived for the current prediction unit (PU). By adding the derived DV to the center position of the current PU, a reference sample location is obtained. The prediction block that covers this sample location in the already coded picture of the reference view is used as the reference block. If this reference block is coded using motion compensated prediction (MCP), its associated motion parameters are used as the IVMP for the current PU.

Four coding tools are proposed to improve the inter-view motion vector prediction. The depth oriented neighboring block disparity vector (DoNBDV) improves the accuracy of the derived disparity vector. Sub-PU inter-view motion vector prediction obtains finer-grained motion information. The second inter-view motion vector prediction provides an additional merge candidate. A pruning process preserves only meaningful candidates in the merge candidate list.
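A minimal sketch of this derivation is given below, under the assumption that the reference view's motion field is stored on a 4 x 4 grid and modeled as a dictionary (an illustration, not the HTM data structure; derive_ivmp is our name).

```python
def derive_ivmp(pu_x, pu_y, pu_w, pu_h, dv, ref_motion_field):
    """Return the motion of the reference-view block, or None if unusable."""
    # Reference sample location: the PU center shifted by the disparity vector.
    cx = pu_x + pu_w // 2 + dv[0]
    cy = pu_y + pu_h // 2 + dv[1]
    motion = ref_motion_field.get((cx >> 2, cy >> 2))  # 4x4 motion grid
    # Usable only if the covering block is motion-compensated (MCP),
    # i.e. neither intra coded nor disparity compensated.
    return motion if motion and motion["type"] == "MCP" else None

ref_field = {(9, 4): {"type": "MCP", "mv": (3, -1), "ref_idx": 0}}
print(derive_ivmp(16, 8, 32, 16, dv=(6, 0), ref_motion_field=ref_field))
```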
Fig. 3. The derivation of inter-view motion prediction.
3.1. Depth oriented neighboring block disparity vector (DoNBDV)

In the dependent views, disparity motion vectors become available for motion prediction. Considering spatial and temporal consistency, disparity motion vectors should be similar within an object. The neighboring block disparity vector (NBDV) exploits this characteristic [17]: it is derived by checking the availability of disparity motion vectors in the spatial and temporal domains. However, NBDV is still a derived disparity vector that might not be representative enough of the real disparity of an object. The depth oriented neighboring block disparity vector (DoNBDV) is proposed to further utilize the coded depth information from another view. DoNBDV uses NBDV to identify a depth block in an already coded depth view and performs backward warping to improve the accuracy of the derived disparity vector. The derived disparity vector can be used in inter-view motion prediction, advanced residual prediction, illumination compensation, and view synthesis prediction.

An example is depicted in Fig. 4. For the current block, a corresponding depth block in the coded depth of the reference view is located by the disparity vector estimated by NBDV. The depth in this corresponding depth block is taken as the "virtual depth block" of the current block in the dependent view. The maximum depth value of the four corner samples of the virtual depth block is then retrieved and converted to a disparity using the camera parameters. The newly derived disparity vector, DoNBDV, replaces the NBDV to provide more accurate disparity information.

Fig. 4. The proposed DoNBDV process.
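The conversion step can be sketched as follows, using the usual MVD-format depth-to-disparity mapping with focal length f, baseline b, and near/far clipping planes z_near/z_far. The test model uses an equivalent fixed-point lookup, so this floating-point version is only illustrative.

```python
def depth_to_disparity(d, f, b, z_near, z_far, bit_depth=8):
    """Convert a depth-map sample d to a horizontal disparity in pixels."""
    max_d = (1 << bit_depth) - 1
    z = 1.0 / (d / max_d * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    return f * b / z

def do_nbdv(virtual_depth_block, f, b, z_near, z_far):
    """Refine NBDV from the four corner samples of the virtual depth block."""
    h, w = len(virtual_depth_block), len(virtual_depth_block[0])
    corners = [virtual_depth_block[y][x] for y in (0, h - 1) for x in (0, w - 1)]
    return depth_to_disparity(max(corners), f, b, z_near, z_far)

block = [[100, 90], [120, 110]]
print(round(do_nbdv(block, f=1000, b=0.1, z_near=1.0, z_far=100.0), 2))  # ~47.59
```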
3.2. Sub-PU inter-view motion vector prediction

To further improve motion prediction accuracy, sub-PU inter-view motion prediction is proposed to allow each sub-PU to inherit its own corresponding motion information. In this scheme, the current PU is first divided into multiple smaller sub-PUs, as shown in Fig. 5. For each sub-PU, the corresponding sub-PU in the inter-view picture is located by the derived DV, and the motion information of that corresponding sub-PU is used as the motion vector predictor of the current sub-PU.

Fig. 5. The derivation of sub-PU inter-view motion prediction.
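Building on the derive_ivmp() sketch in Section 3, sub-PU inter-view motion prediction simply repeats that derivation per sub-PU; the 8 x 8 sub-PU size below is an assumption for the example.

```python
def derive_sub_pu_ivmp(pu_x, pu_y, pu_w, pu_h, dv, ref_motion_field, sub_size=8):
    """Return {(sub_x, sub_y): motion} for every sub-PU of the current PU."""
    motions = {}
    for sy in range(pu_y, pu_y + pu_h, sub_size):
        for sx in range(pu_x, pu_x + pu_w, sub_size):
            # Each sub-PU fetches motion from its own corresponding sub-PU
            # in the inter-view picture, located by the shared derived DV.
            motions[(sx, sy)] = derive_ivmp(sx, sy, sub_size, sub_size,
                                            dv, ref_motion_field)
    return motions
```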
3.3. Second inter-view motion vector prediction

A second inter-view merge candidate is proposed for inclusion in the merge candidate list of 3D-HEVC to further improve the coding efficiency [18]. The second inter-view candidate is derived from a corresponding block located at the H position shifted by a disparity vector, as shown in Fig. 6. The derived disparity vector is the same as the one used in the derivation of the first inter-view candidate in the merge candidate list. To reduce complexity, the sub-PU derivation process is applied only to the first inter-view merge candidate, not to the second one.

Fig. 6. The derivation of the first (I) and second (I2) inter-view candidates.
3.4. Pruning process for redundancy removal

A parallelizable pruning process is proposed to remove redundancy in the merge candidate list for the Merge and Skip modes. The derived motion vectors of the spatial and inter-view merge candidates are likely to be identical, since the spatial neighbor and the inter-view neighbor are most probably associated with the same object. In the proposed pruning process for 3D-HEVC merge candidate list construction, shown in Fig. 7, the first two spatial merge candidates are compared with the first inter-view merge candidate for redundancy removal. The second inter-view merge candidate is also compared with the first inter-view merge candidate. As in the HEVC pruning process, the temporal merge candidate is exempted from pruning in order to reduce the latency of the merge candidate list construction.

Fig. 7. The proposed pruning process in 3D-HEVC. The arrows represent the comparisons between candidates.
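A simplified sketch of this pruning is shown below: candidates are modeled as comparable (mv, ref_idx) tuples, every comparison is against the already fixed first inter-view candidate so the checks can run in parallel, and the DV, VSP, and combined candidates of the full list construction are omitted.

```python
def prune_merge_list(ivmp, spatial, second_ivmp, temporal):
    """Assemble a simplified merge list with the proposed pruning."""
    out = [ivmp] if ivmp is not None else []
    # The first two spatial candidates are compared only with the first
    # inter-view candidate; later spatial candidates are kept as-is here.
    for i, cand in enumerate(spatial):
        if cand is None:
            continue
        if i < 2 and cand == ivmp:
            continue  # pruned as redundant
        out.append(cand)
    # The second inter-view candidate is compared with the first one.
    if second_ivmp is not None and second_ivmp != ivmp:
        out.append(second_ivmp)
    # The temporal candidate is exempt from pruning (latency reduction).
    if temporal is not None:
        out.append(temporal)
    return out

a = ((3, -1), 0)
print(prune_merge_list(a, [a, ((0, 2), 1)], a, ((1, 1), 0)))  # duplicates of a pruned
```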
4. Inter-component motion vector prediction

To exploit the inter-component motion correlation between texture and depth map, a texture merge candidate is proposed for depth coding to share the coded motion information of the associated texture picture. The proposed texture merge candidate is added to the merge candidate set for depth Merge/Skip mode coding. In texture coding, the motion vectors (MVs) and reference index of the corresponding block in the inter-view picture are reused as an inter-view merge candidate. Similar to the concept of the inter-view merge candidate, the proposed texture merge candidate directly reuses the MVs and reference index of the corresponding texture block as a merge candidate in depth coding. As shown in Fig. 8, the corresponding texture block is selected as the 4 × 4 block located to the bottom-right of the center of the current PU in the corresponding texture picture. With this scheme, the merge operations for texture and depth are also unified, as shown in Table 1, which reduces the overhead in terms of software or hardware design. Note that, since the texture merge candidate directly reuses the motion parameters of the corresponding texture block, no MV scaling is required.

Similar to sub-PU IVMP, to fully utilize the motion information of the inter-view pictures, a sub-PU texture merge candidate is applied to obtain the motion predictor at a fine granularity [19]. The merge candidate lists for dependent texture pictures and depth maps in 3D-HEVC are summarized in Table 1. As listed in Table 1, the inter-view merge candidate (IVMP) is applied to both the dependent texture views and the depth maps, while the texture merge candidate is applied only to the depth maps.

Fig. 8. The derivation of the corresponding texture block.
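A minimal sketch of this derivation, with the texture motion field again modeled as a dictionary keyed by 4 x 4 block index (an illustration, not the HTM data structure):

```python
def derive_texture_merge_candidate(pu_x, pu_y, pu_w, pu_h, texture_motion_field):
    """Return the texture motion reused by the co-located depth PU, or None."""
    # Sample just to the bottom-right of the PU center.
    cx = pu_x + pu_w // 2
    cy = pu_y + pu_h // 2
    # Reused as-is: same MVs and reference index, no temporal or view scaling.
    return texture_motion_field.get((cx >> 2, cy >> 2))

tex_field = {(6, 2): {"mv": (-2, 1), "ref_idx": 1}}
print(derive_texture_merge_candidate(16, 0, 16, 16, tex_field))
```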
5. Single depth intra mode

Although the existing 3D-HEVC intra prediction schemes exploit a localized prediction process to predict the pixel values of the current block from the neighboring reconstructed pixels, they do not provide an efficient way to indicate that the pixels within the current block share a single depth value with the pixels of the neighboring blocks. This may still result in redundant sets of prediction parameters being transmitted. For example, if a given CU is divided into four sub-CUs, all sub-CUs are typically coded separately even if two or three of them share the same pixel value. The conventional intra prediction scheme may still need to signal the prediction mode (intra/inter), the intra prediction mode (DC, planar, or an angular prediction), and the residual signal. In this paper, we thus propose the single depth intra mode to efficiently code the smooth areas within a depth map and to provide a leaf-merging functionality in the pixel value domain.

5.1. Signaling of the proposed single depth intra mode

A slice header flag is transmitted to specify whether the single depth intra mode can be used for coding the CUs of each slice. This allows the encoder to adaptively enable the single depth mode at a higher level, increasing coding efficiency for sequences with complex depth maps. When the slice-level flag is enabled, one CU-level flag is further signaled to indicate whether the single depth mode is used for each CU. When the single depth mode is applied to the current CU, a sample candidate list is first constructed using the neighboring pixels. A candidate index is then transmitted to the decoder to indicate which sample candidate is selected to represent the current CU, using an efficient merging mechanism similar to the motion merge scheme in HEVC [9]. The construction of the sample candidate list is described in the next section. Besides the single depth enabling flag and the sample candidate index, no further side information is transmitted for the single depth mode. Since no residual is signaled, the reconstruction process of a single depth mode coded CU is simple.
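A decoder-side sketch of this signaling, reusing the toy BitReader from the SDC sketch in Section 2.3. The flag and index order follows the description above, but the names are illustrative rather than actual 3D-HEVC syntax elements.

```python
def decode_cu_single_depth(reader, slice_enabled, cu_size, candidates):
    """Return a reconstructed cu_size x cu_size block, or None if mode is off."""
    if not slice_enabled or not reader.read_flag():  # CU-level enabling flag
        return None  # fall back to the normal intra/inter decoding path
    idx = reader.read_flag()  # the list size is fixed to 2, so one bin suffices
    value = candidates[idx]
    # No residual is parsed: the whole CU is the selected depth value.
    return [[value] * cu_size for _ in range(cu_size)]
```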
5.2. Candidate list construction

To reconstruct a CU coded in single depth mode, a sample candidate list is first constructed by inserting the neighboring depth samples of the current CU into the candidate list in a predefined order. As shown in Fig. 9 (each circle represents a sample), the spatial neighboring samples are two reconstructed samples around the current CU, inserted into the candidate list in the order (A_N/2, B_N/2). In the proposed scheme, the size of the sample candidate list is fixed to 2. If no sample candidate is available, a default candidate with value equal to 1 ≪ (bitDepth − 1), e.g. 128 for an 8-bit depth map, is used to fill the empty entries. The encoder signals an index to indicate which sample candidate in the list is selected to represent the depth value of the current CU, according to the rate-distortion (R-D) optimization results.

To achieve higher coding efficiency, the concept of the proposed single depth intra mode is further extended to incorporate the conventional intra horizontal and vertical predictions as two additional prediction modes; the extended design is termed the depth intra skip (DIS) mode [20]. In summary, DIS has four prediction modes: single depth mode from the left or above samples, and conventional intra horizontal or vertical prediction. The generation of the prediction signal for the intra horizontal and vertical modes is the same as for the HEVC angular intra prediction modes with horizontal or vertical direction. However, the intra predicted signal is directly used as the reconstructed signal without signaling any residue. The DIS mode therefore requires no residual signal after the prediction signal is generated and is thus simple to reconstruct.

Fig. 9. The spatial sample candidates used to construct the sample candidate list.
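A sketch of the list construction under these rules (two entries, middle neighboring samples, mid-level default fill):

```python
def build_sample_candidates(left_col, top_row, bit_depth=8):
    """Construct the 2-entry sample candidate list.

    left_col / top_row: lists of reconstructed neighboring samples along the
    left column and top row of the CU, or None when unavailable.
    """
    default = 1 << (bit_depth - 1)  # e.g. 128 for an 8-bit depth map
    candidates = []
    for neigh in (left_col, top_row):
        if neigh:  # available: take the middle sample (A_N/2 or B_N/2)
            candidates.append(neigh[len(neigh) // 2])
    while len(candidates) < 2:  # the list size is fixed to 2
        candidates.append(default)
    return candidates

print(build_sample_candidates([77, 77, 78, 80], None))  # -> [78, 128]
```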
6. Experimental results
The coding efficiency of the proposed algorithms is evaluated based on HTM-16.0 [21], and several experiments are conducted against the anchor generated by HTM-16.0 under the common test conditions used for the 3D-HEVC standardization activities. The hierarchical B prediction structure is utilized in the common test conditions [22], and the testing sequences are listed in Table 2. The coding performance is measured by the Bjøntegaard delta rate (BD-rate) saving [23], which calculates the average bit-rate reduction under equal PSNR conditions. The simulations are carried out on a 64-bit Linux platform with Xeon 5160 3.0 GHz CPUs.
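For reference, the BD-rate metric [23] is commonly re-implemented along the following lines: fit a third-order polynomial to log-rate as a function of PSNR for each curve, then average the gap between the two fits over the overlapping PSNR range. The NumPy sketch below follows that common recipe; it is not the VCEG reference implementation.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test curve against the anchor."""
    # Third-order polynomial fits of log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100  # negative => bitrate saving

# Four rate points per curve, as in the CTC (numbers are made up):
anchor_r, anchor_p = [1000, 1800, 3200, 6000], [36.0, 38.0, 40.0, 42.0]
test_r, test_p = [900, 1600, 2900, 5500], [36.1, 38.1, 40.0, 42.1]
print(round(bd_rate(anchor_r, anchor_p, test_r, test_p), 1))
```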
Table 1. The merge candidate lists for dependent texture pictures and depth maps (candidates in list order).

Dependent texture view: IVMP candidate; spatial candidates A1, B1, B0; DV candidate; VSP candidate; spatial candidates A0, B2; 2nd IVMP candidate; 2nd DV candidate; temporal candidate.
Depth map: texture merge candidate; DDD candidate; IVMP candidate; spatial candidates A1, B1, B0; spatial candidates A0, B2; temporal candidate.

Table 2. Testing sequences and corresponding resolutions.

Test sequence   Resolution    Number of frames
Balloons        1024 × 768    300
Kendo           1024 × 768    300
Newspapercc     1024 × 768    300
GhostTownFly    1920 × 1088   250
PoznanHall2     1920 × 1088   200
PoznanStreet    1920 × 1088   250
UndoDancer      1920 × 1088   250
Shark           1920 × 1088   300

Table 3. BD-rate performance of the proposed inter-view motion prediction compared to HTM-16.0.

Sequence        Video 1(a)  Video 2(a)  Video PSNR/      Video PSNR/      Synth PSNR/
                                        video bitrate(b) total bitrate(c) total bitrate(d)
Balloons        23.0%       24.0%        9.0%             9.2%             7.9%
Kendo           19.7%       20.0%        8.0%             8.8%             7.4%
Newspapercc     14.4%       12.5%        5.1%             5.8%             5.3%
GhostTownFly    19.7%       21.0%        5.1%             4.6%             3.6%
PoznanHall2     28.2%       29.9%       11.5%            10.9%             9.5%
PoznanStreet    11.4%       11.1%        3.5%             3.7%             3.4%
UndoDancer      16.0%       16.2%        4.7%             4.6%             4.0%
Shark           21.1%       21.0%        4.7%             4.6%             4.2%
1024 × 768      19.1%       18.9%        7.4%             7.9%             6.9%
1920 × 1088     19.3%       19.8%        5.9%             5.7%             4.9%
Average         19.2%       19.5%        6.4%             6.5%             5.7%

(a) Video 1 & 2: BD-rate considering the Y-PSNR of view 1 and view 2 (the dependent views).
(b) Video PSNR/video bitrate: BD-rate considering the Y-PSNR of the coded texture views over the bitrates of the texture data.
(c) Video PSNR/total bitrate: BD-rate considering the Y-PSNR of the coded texture views over the bitrates of the texture data and depth data.
(d) Synth PSNR/total bitrate: BD-rate considering the Y-PSNR of the synthesized texture views over the bitrates of the texture data and depth data.
Table 4. BD-rate performance of the proposed inter-component motion prediction compared to HTM-16.0.

Sequence        Video 1     Video 2     Video PSNR/      Video PSNR/      Synth PSNR/
                                        video bitrate    total bitrate    total bitrate
Balloons         0.0%        0.1%        0.0%             1.1%             2.0%
Kendo            0.0%       −0.2%        0.0%             2.0%             3.4%
Newspapercc      0.0%        0.1%        0.0%             1.1%             2.0%
GhostTownFly     0.6%        0.6%        0.1%             1.0%             1.7%
PoznanHall2     −0.1%       −0.3%       −0.1%             1.6%             3.6%
PoznanStreet     0.1%        0.0%        0.0%             0.9%             1.4%
UndoDancer       0.1%       −0.2%        0.0%             1.2%             1.7%
Shark            0.1%        0.1%        0.0%             1.7%             2.2%
1024 × 768       0.0%        0.0%        0.0%             1.4%             2.5%
1920 × 1088      0.2%        0.1%        0.0%             1.3%             2.1%
Average          0.1%        0.0%        0.0%             1.3%             2.3%
Table 5. BD-rate performance of the proposed inter-view and inter-component motion prediction compared to HTM-16.0.

Sequence        Video 1     Video 2     Video PSNR/      Video PSNR/      Synth PSNR/
                                        video bitrate    total bitrate    total bitrate
Balloons        23.1%       24.2%        9.1%            10.3%             9.7%
Kendo           19.7%       20.0%        8.0%             9.9%             9.7%
Newspapercc     14.4%       12.5%        5.1%             6.2%             6.5%
GhostTownFly    21.7%       22.9%        5.5%             6.2%             5.8%
PoznanHall2     28.2%       29.9%       11.5%            13.5%            14.2%
PoznanStreet    11.4%       11.1%        3.5%             4.6%             4.7%
UndoDancer      15.8%       16.2%        4.6%             6.1%             6.2%
Shark           21.6%       21.5%        4.9%             6.5%             6.4%
1024 × 768      19.1%       18.9%        7.4%             8.8%             8.6%
1920 × 1088     19.7%       20.3%        6.0%             7.4%             7.5%
Average         19.5%       19.8%        6.5%             7.9%             7.9%
6.1. Results of the inter-view motion prediction

The BD-rate performance of the proposed inter-view motion prediction described in Section 3 is given in Table 3. As shown in Table 3, the proposed inter-view motion prediction achieves significant BD-rate reductions of 19.3% on average for the dependent views 1 and 2, and 5.7% for the synthesized texture views.

6.2. Results of the inter-component motion prediction

The BD-rate performance of the proposed inter-component motion prediction is listed in Table 4. Since the texture merge candidate is only applied to depth map coding, it has a negligible impact on the BD-rate performance of the texture data. Because the depth map is generally employed for rendering virtual views, the coding performance of the depth map mainly affects the quality of the synthesized views. As shown in the results, the overall bitrates are reduced by 2.3% in terms of the synthesized PSNRs.
The BD-rate performance of the combination of the proposed inter-view motion prediction and inter-component motion prediction is also listed in Table 5. The inter-view merge candidate and the texture merge candidate together bring an average BD-rate reduction of 19.6% for the dependent views and 7.9% for the synthesized texture views.
Fig. 10. The R-D curves for coding depth maps using HTM-12.0 excluding the single depth mode and HTM-12.0 (which includes the single depth mode): (a) the center-view depth sequence of "Kendo" and (b) the center-view depth sequence of "PoznanHall2".
Table 6. BD-rate performance of the proposed single depth mode compared to HTM-16.0.

Sequence        Video 1     Video 2     Video PSNR/      Video PSNR/      Synth PSNR/
                                        video bitrate    total bitrate    total bitrate
Balloons        −0.1%        0.0%        0.0%             0.2%             0.2%
Kendo            0.2%       −0.1%        0.0%             0.7%             0.5%
Newspapercc     −0.1%        0.0%        0.0%             0.4%             0.3%
GhostTownFly    −0.5%       −0.1%       −0.1%             0.4%             0.9%
PoznanHall2      0.0%       −0.1%        0.0%             0.9%             1.0%
PoznanStreet     0.0%        0.1%        0.0%             0.2%             0.2%
UndoDancer      −0.5%       −0.4%       −0.1%             0.1%             0.0%
Shark           −0.6%       −0.7%       −0.1%             0.1%            −0.1%
1024 × 768       0.0%        0.0%        0.0%             0.4%             0.4%
1920 × 1088     −0.3%       −0.2%       −0.1%             0.3%             0.4%
Average         −0.2%       −0.2%        0.0%             0.4%             0.4%

Table 7. BD-rate performance of the proposed inter-view motion prediction, inter-component motion prediction and single depth mode compared to HTM-16.0.

Sequence        Video 1     Video 2     Video PSNR/      Video PSNR/      Synth PSNR/
                                        video bitrate    total bitrate    total bitrate
Balloons        23.2%       24.2%        9.1%            10.5%             9.9%
Kendo           19.9%       20.0%        8.0%            10.6%            10.1%
Newspapercc     14.5%       12.6%        5.2%             6.7%             6.8%
GhostTownFly    21.3%       22.6%        5.4%             6.7%             6.9%
PoznanHall2     28.1%       29.8%       11.5%            14.6%            15.3%
PoznanStreet    11.1%       11.0%        3.4%             4.7%             4.9%
UndoDancer      15.0%       16.0%        4.5%             6.2%             6.3%
Shark           21.2%       21.1%        4.8%             6.6%             6.4%
1024 × 768      19.2%       18.9%        7.4%             9.3%             8.9%
1920 × 1088     19.3%       20.1%        5.9%             7.8%             8.0%
Average         19.3%       19.7%        6.5%             8.3%             8.3%
6.3. Results of the single depth mode

Some rate-distortion (R-D) curves of depth map coding are plotted in Fig. 10. The R-D curves with and without the proposed single depth mode are visually distinguishable, as shown in Fig. 10(a) and (b). In these two experiments, the single depth mode improves depth coding by 4.9% and 5.5% BD-rate reduction, respectively. It is noted that, when encoding the depth maps, the current HTM encoder optimizes the distortion of the synthesized views instead of the distortion of the depth map for R-D optimization. It is believed that the coding gain of the proposed single depth mode could be further increased if the R-D optimization were based on the distortion of the depth map itself.

The overall BD-rate performance of the proposed single depth intra mode is given in Table 6. Since the single depth mode is only applied to depth map coding, it has a negligible impact on the BD-rate performance considering the PSNR of the coded texture views over the bitrates of the texture data. The coding performance of the depth map mainly affects the quality of the synthesized views because the depth map is generally employed for rendering virtual views. As shown in the results, the overall BD-rate reduction is around 0.4% in terms of the synthesized PSNRs over the total bit-rate. Because the bit-rate of the depth map usually accounts for only 10-15% of the total bit-rate, the overall benefit of the proposed single depth mode is diluted when evaluating the BD-rate performance considering the PSNR of the synthesized texture views over the total bit-rates (the bit-rates of the texture data and depth maps). Since no transform is performed for the single depth mode and only one additional rate-distortion calculation is required, the additional encoding time caused by the single depth mode is negligible. Moreover, the proposed single depth mode reduces the decoding time to 96%, because the reconstruction of a single depth mode coded CU is simple relative to the other coding modes.

One result of the selection of the single depth mode for the tested sequence "Balloons" is shown in Fig. 11. The blocks predicted by the single depth mode are highlighted as square blocks. Since there are large areas with smooth depth values, some of which contain only a single depth value, the single depth mode is enabled for many blocks over different regions of the depth map.

The BD-rate performance of the combination of the proposed inter-view motion prediction, inter-component motion prediction, and single depth mode is listed in Table 7. The rate-distortion (R-D) curves of the proposed methods are plotted in Figs. 12 and 13. The inter-view merge candidate, the texture merge candidate, and the single depth intra mode together bring an average BD-rate reduction of 19.5% for the dependent views and 8.3% for the synthesized texture views.
Fig. 11. Examples of the single depth mode used for coding the depth map (POC = 3 @ view 1) of sequence "Balloons".
Fig. 12. The R-D curves of sequence "Kendo" using (a) HTM-12.0; (b) HTM-12.0 excluding inter-view motion prediction; (c) HTM-12.0 excluding inter-view and inter-component motion prediction; and (d) HTM-12.0 excluding inter-view and inter-component motion prediction and single depth mode.
Fig. 13. The R-D curves of sequence "PoznanHall2" using (a) HTM-12.0; (b) HTM-12.0 excluding inter-view motion prediction; (c) HTM-12.0 excluding inter-view and inter-component motion prediction; and (d) HTM-12.0 excluding inter-view and inter-component motion prediction and single depth mode.
7. Conclusion
To exploit the inter-view and inter-component motion redundancies, in this paper we have introduced the inter-view motion vector prediction and the inter-component motion vector prediction, which inherit motion information from the neighboring views or from the associated texture picture for 3D video coding. To further improve the coding efficiency, we proposed a single depth intra mode that provides a pixel-domain merging scheme: a single depth coded depth CU is simply merged to a selected sample candidate derived from the neighboring pixels, to better utilize the information of the available neighboring depth values. Significant BD-rate reductions of 19.5% on average for the dependent views and 8.3% for the synthesized texture views are achieved by the proposed inter-view motion prediction, inter-component motion prediction, and single depth mode. All of the proposed schemes introduced in this paper were adopted into the 3D-HEVC video coding standard.

References

[1] ISO/IEC JTC1/SC29/WG11, Text of ISO/IEC 14496-10:200X/FDAM 1 Multiview Video Coding, Doc. N9978, 2008.
[2] M.M. Hannuksela, Y. Chen, T. Suzuki, J.-R. Ohm, G. Sullivan, 3D-AVC Draft Text 8, Doc. JCT3V-F1002, 2013.
[3] G. Tech, K. Wegner, Y. Chen, S. Yea, 3D-HEVC Draft Text 6, Doc. JCT3V-J1001, 2014.
[4] L. Zhang, Y. Chen, V. Thirumalai, J.-L. Lin, Y.-W. Chen, et al., Inter-view motion prediction in 3D-HEVC, in: Proc. ICASSP, 2014, pp. 17-20.
[5] J. An, Y.-W. Chen, J.-L. Lin, Y.-W. Huang, S. Lei, 3D-CE5.h related: Inter-view motion prediction for HEVC-based 3D video coding, Doc. JCT3V-A0049, 2012.
[6] Y.-L. Chang, Y.-W. Chen, J.-L. Lin, N. Zhang, J. An, Y.-W. Huang, S. Lei, 3D-CE2.h related: Simplified DV derivation for DoNBDV and BVSP, Doc. JCT3V-D0138, Incheon, 2013.
[7] J. An, K. Zhang, J.-L. Lin, S. Lei, 3D-CE3: Sub-PU level inter-view motion prediction, Doc. JCT3V-F0110, 2013.
[8] Y.-W. Chen, J.-L. Lin, Y.-W. Huang, S. Lei, 3D-CE3.h results on removal of parsing dependency and picture buffers for motion parameter inheritance, Doc. JCT3V-C0137, 2013.
[9] J. Lainema, F. Bossen, W.-J. Han, J. Min, K. Ugur, Intra coding of the HEVC standard, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1792-1801.
[10] P. Helle, K. Ugar, Block merging for quadtree-based partitioning in HEVC, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1720-1731.
[11] Y.-W. Chen, J.-L. Lin, Y.-W. Huang, S. Lei, 3D-CE2: Single depth intra mode for 3D-HEVC, Doc. JCT3V-I0095, 2014.
[12] I.-K. Kim, J. Min, T. Lee, W.-J. Han, J. Park, Block partitioning structure in the HEVC standard, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1697-1706.
[13] K. Zhang, J. An, J.-L. Lin, Y.-L. Chang, S. Lei, 3D-CE2: Results on additional merging candidates for depth coding, Doc. JCT3V-G0063, 2014.
[14] D. Tian, F. Zou, A. Vetro, CE1.h: Backward view synthesis prediction using neighbouring blocks, Doc. JCT3V-C0152, 2013.
[15] N. Stefanoski, P. Espinosa, O. Wang, et al., Description of 3D video coding technology proposal by Disney Research Zurich and Fraunhofer HHI, Doc. m22668, ISO/IEC JTC1/SC29/WG11, 2011.
[16] F. Jäger, 3D-CE6.h related: Model-based intra coding for depth maps using a depth lookup table, Doc. JCT3V-A0010, 2012.
[17] L. Zhang, Y. Chen, M. Karczewicz, Disparity vector based advanced inter-view prediction in 3D-HEVC, in: IEEE International Symposium on Circuits and Systems (ISCAS), Beijing, 2013, pp. 1632-1635.
[18] J.-L. Lin, Y.-W. Chen, Y.-W. Huang, S. Lei, 3D-CE5.h related: Additional inter-view merging candidate, Doc. JCT3V-D0109, 2013.
[19] Y. Chen, H. Liu, L. Zhang, CE2: Sub-PU based MPI, Doc. JCT3V-G0119, 2014.
[20] J.-Y. Lee, M.-W. Park, C. Kim, 3D-CE1: Depth intra skip (DIS) mode, Doc. JCT3V-K0033, 2015.
[21] G. Tech, H. Liu, Y.-W. Chen, 3D-HEVC Software Draft 4, Doc. JCT3V-N1012, 2016.
[22] D. Rusanovskyy, K. Müller, A. Vetro, Common test conditions of 3DV core experiments, Doc. JCT3V-D1100, Incheon, 2013.
[23] G. Bjøntegaard, Improvements of the BD-PSNR model, Doc. VCEG-AI11, ITU-T SG16, 2008.