Signal Processing: Image Communication 77 (2019) 40–48
Depth map upsampling with a confidence-based joint guided filter

Yoonmo Yang, Hean Sung Lee, Byung Tae Oh
Korea Aerospace University, Republic of Korea
ARTICLE INFO

Keywords: Upsampling, Super-resolution, Depth map, Confidence map, Guided filter
ABSTRACT

Depth maps are often acquired together with color images, but their resolution is much lower than that of the corresponding color images. This paper proposes a confidence-based joint guided filter for depth map upsampling that exploits the corresponding color information. Many previous studies have incorporated color images into the estimation of upsampling filter coefficients. The scheme proposed in this paper instead focuses on the filtering order used to refine unreliable pixels; this order is computed from a confidence map derived from the shape of the unreliable regions, the depth map, and the color pixel values. In terms of both quantitative and qualitative measurements, the proposed method demonstrates superior performance over current state-of-the-art algorithms.
1. Introduction

In recent years, 3-dimensional (3D) technology has grown in popularity due to rapid technological development. As a result, 3D image analysis has become important in the fields of computer vision and image processing [1,2]. It is used in various applications including robotics, automotive driver assistance systems, and 3D reconstruction [3–5].

One of the key pieces of information in a 3D image is its depth map. A depth map is image-formatted data that represents the range information of a scene. It is essential for recognizing and understanding the scene structure in 3D image analysis. Traditionally, the depth map is acquired in one of two ways: passively or actively [6–12]. A passive method obtains depth information indirectly. An example of a passive method is stereo matching, which estimates depth information from two or more views; it determines pixel similarity or correlation between images to estimate the depth map [6–9]. With this method, finding the corresponding pixels between views is challenging, especially in textureless or occluded regions, because many-to-one matching (textureless) or missing correspondences (occluded) occur. An active method measures depth information using dedicated devices such as laser range scanners; examples include time-of-flight and structured light methods [10–12]. Recently, low-cost devices for acquiring depth maps, such as Microsoft Kinect and SoftKinect, have been released [11,12], and the number of applications using these devices is rapidly increasing. However, the quality of the acquired depth maps is still unsatisfactory due to the limitations of current hardware technology. For example, in the case of Kinect-v2, the obtained depth map has many holes, especially near the boundaries of objects.
Furthermore, the resolution of depth maps lags far behind that of color images: Kinect-v2 only provides a 512 × 424 depth map, while its color image is 1920 × 1080 (HD). For this reason, many studies have proposed enhancing the resolution of depth maps through depth-oriented post-processing.

In this paper, we propose a filter-based depth upsampling scheme called confidence-based joint guided filtering. The main contribution of this scheme is the introduction of a confidence map, which is derived from the similarities between the color and depth map intensities and edge directions. Based on the confidence map, the filtering order is recalculated, and the kernel weights are re-estimated for depth upsampling.

The remainder of this paper is organized as follows. Section 2 briefly reviews related works. The proposed algorithm is described in detail in Section 3. In Section 4, the performance evaluation of the proposed algorithm is presented and discussed. Finally, Section 5 presents conclusions and describes directions for future work.

2. Related works

Generally, depth map upsampling methods fall into two categories. The first upsamples the depth map on its own. This concept is similar to single-image super-resolution, i.e., only the given depth map is used for its upsampling, and prior information about depth maps is usually used to predict unknown pixels. The conventional bilinear and bicubic methods fall into this category, as do many more recent image upsampling methods [13].
In practical scenarios, however, depth maps are acquired together with other visual data, such as RGB color images. As mentioned in Section 1, depth maps have a lower resolution than color images, but the two are highly correlated. Therefore, information in the higher-resolution color image, such as the shape of edges at object boundaries, can help increase the resolution of the depth map. In [14], Diebel and Thrun proposed an upsampling method using a Markov random field (MRF) formulation, where the data term is determined by the given depth map and the smoothness term is determined by high-resolution depth samples estimated from the high-resolution color image. Park et al. improved the MRF formulation with an additional term called non-local mean regularization, implemented by an anisotropic structure-aware filter [15]. It allows similar but distant pixels to contribute to the estimation during filtering. Kim and Yoon also used the MRF formulation for depth map upsampling [16], where the objective function is designed to minimize edge noise, a limitation of depth range sensors in depth-discontinuous regions. Here, the smoothness term is newly defined by color similarity, gradient information, and region segmentation information. Lu et al. also proposed a method using the MRF formulation [17]; however, unlike previous methods, a novel data term, which measures the truncated absolute difference between the estimated and input depth values, is used for depth map upsampling. In [18], Yang et al. used a cost volume filtering method. This method builds the cost volume from a pre-upsampled depth map using iterative joint bilateral filtering and finds the label with minimum cost; preserving the edge structure of the depth map in this way gives better results than filtering the depth map directly. Liu and Gong approached the depth upsampling problem with a heat diffusion framework [19], in which known pixels of the depth map are set as heat sources and diffused to unknown pixels according to color similarity.

Although the abovementioned methods demonstrate good upsampling performance, optimization-based approaches are somewhat impractical for real-world applications due to the large amount of computation required by iterative optimization. For this reason, filtering-based depth upsampling methods have also been rigorously studied. Most such methods use the joint bilateral filter and/or its extensions to incorporate the corresponding color image during filtering. Kopf et al. first suggested the joint bilateral filter for various applications including image fusion and depth map upsampling [20]. As an extension of the joint bilateral filter, Kim et al. proposed a trilateral filtering method [21]. This method focuses on the blurring artifacts that frequently occur around edges due to misalignment of the depth and color images; it reduces these artifacts using an additional confidence term that measures the degree of alignment. Jung proposed the adaptive joint trilateral filter, which enhances both color images and depth maps [22]. It categorizes all patches in the depth map and filters each patch with different parameter values according to its category. Lo et al. also proposed an extension of the joint bilateral filter to avoid texture copying artifacts, which occur when two adjacent pixels have similar colors but different depth values [23]. This problem is alleviated by integrating the local gradient of the depth map during filtering. In [24], Chan et al. proposed a noise-aware filter for depth upsampling. The filter restores pixel values by a linear combination of two bilateral filters, and the combination ratio is determined by local characteristics. Min et al. proposed a filtering-based method called the weighted mode filter [25]. Unlike previous bilateral methods, it calculates filter coefficients based on local statistical information generated from a histogram.

Along with the bilateral filter, the guided filter proposed in [26] has received much attention due to its simplicity and effectiveness. The basic structures of the bilateral and guided filters are similar; they are both kernel-based filters. Accordingly, numerous schemes have been proposed as extensions of the guided filter for depth map upsampling. Konno et al. used a residual interpolation method for depth map upsampling [27]. This method performs upsampling in the residual domain, where the residual is defined as the difference between a tentatively estimated high-resolution depth map and the low-resolution depth map. Hua et al. proposed an extension of the guided filter called the extended guided filter (EGF) [28,29]. The additional term considers the local 2nd-order gradients of the depth map. The distinct feature of the EGF is its use of onion peel-like filtering, which greatly improves filtering performance. Our proposed scheme is motivated by and based on the EGF; it is presented in the next section.
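Many of the filtering-based methods above build on the joint bilateral filter. As a point of reference, the following minimal per-pixel sketch shows how such a kernel combines spatial and color-range weights; the function name, window radius, and parameter values are ours and are not taken from the cited works.

```python
import numpy as np

def joint_bilateral_pixel(depth, color, x, y, radius=4, sigma_s=2.0, sigma_r=0.1):
    """Refine one depth pixel with a joint bilateral kernel.

    Spatial weights come from pixel distance; range weights come from the
    high-resolution color (guidance) image rather than from the depth map.
    """
    h, w = depth.shape
    num, den = 0.0, 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                diff = color[y, x].astype(np.float64) - color[yy, xx]
                w_r = np.exp(-np.dot(diff, diff) / (2.0 * sigma_r ** 2))
                num += w_s * w_r * depth[yy, xx]
                den += w_s * w_r
    return num / den if den > 0 else depth[y, x]
```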
3. Proposed algorithm

3.1. Overall framework and motivation

In essence, the proposed method is an enhancement of the abovementioned EGF proposed by Hua et al. [28]. The overall block diagram of the EGF is illustrated in Fig. 1(a). As shown in the figure, it first upsamples the low-resolution depth map as an initialization and then determines the unreliable pixels in the upsampled depth map, which lie mostly around object boundaries. The determination of unreliable pixels is relatively simple: for each pixel of the initially interpolated image, the pixel is considered unreliable if the difference between the maximum and minimum values within a reference window is larger than a predetermined threshold. The unreliable pixels are then iteratively refined with onion peel filtering, where at each iteration the unreliable pixels adjacent to reliable pixels are processed, and the refined pixels are subsequently treated as reliable. Further details are given in Section 3.6. The contributions of the EGF are thus twofold: the use of a reliability map for the refinement of unreliable pixels, and onion peel filtering with order determination.

Despite the EGF's good performance at low complexity, we found that there is still much room for improvement. First, the generation of the reliability map is not carefully considered. The EGF only refines unreliable pixels, while reliable pixels are left unchanged; it is therefore of great importance to find the unreliable pixels properly in preprocessing. Secondly, and more importantly, the order of the onion peel filtering is critical for the final result. However, the EGF simply determines the order from the set of near-reliable pixels step by step, so the filtering order depends entirely on the reliability map. Instead, it is necessary to determine the onion peel filtering order carefully. The proposed scheme resolves these two problems through the addition (shaded box in Fig. 1(b)) and modification (dotted boxes in Fig. 1(b)) of functional blocks, which results in a dramatic performance improvement. In the following subsections, the details of each functional block and the modifications are described.

3.2. Initialization of upsampled image

Most image upsampling methods prefer the bicubic scheme to obtain an initial upsampled image due to its sufficient performance and simplicity. The EGF also uses the bicubic method for initialization, and the reliability map and all following processes are computed from the bicubic-upsampled image. However, we found that the cubic-spline method often causes over- or undershooting around image edges, which can be a critical issue during onion peel filtering. As will be explained later, the proposed filtering is applied in a step-by-step manner, where the neighboring reliable pixels play an important role in determining the unreliable pixels. Therefore, it is important to keep the reliable pixels near boundaries as noise-free as possible. In the proposed scheme, we simply replace the initialization with bilinearly upsampled data. The bilinear method is simpler and generally yields slightly poorer results, but it does not produce any over- or undershooting, which makes it more suitable as the initialization for the proposed guided filtering.
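As a rough illustration of these two preparatory steps (bilinear initialization and window-based unreliable-pixel detection), a minimal sketch is given below. The window size and threshold are arbitrary placeholders rather than values prescribed by the paper, and OpenCV is assumed only for resizing and the max/min (morphological) filtering.

```python
import numpy as np
import cv2  # assumed available; any bilinear resizer / max-min filter works

def initialize_and_detect(depth_lr, scale, win=5, thresh=10.0):
    """Bilinear upsampling followed by unreliable-pixel detection.

    A pixel is marked unreliable when the max-min range of the initially
    upsampled depth inside its reference window exceeds a threshold.
    """
    h, w = depth_lr.shape
    # Bilinear (not bicubic) initialization to avoid over/undershoot near edges
    depth_up = cv2.resize(depth_lr.astype(np.float32), (w * scale, h * scale),
                          interpolation=cv2.INTER_LINEAR)

    # Local max and min via grayscale dilation / erosion over the window
    kernel = np.ones((win, win), np.uint8)
    local_max = cv2.dilate(depth_up, kernel)
    local_min = cv2.erode(depth_up, kernel)

    unreliable = (local_max - local_min) > thresh
    return depth_up, unreliable
```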
Fig. 1. Overall block diagram of (a) the EGF [28] and (b) the proposed method, where the shaded box is newly inserted, and dotted boxes are modified.
3.3. Generation of a reliability map

As explained in Section 3.1, the proposed filtering is applied only to unreliable pixels. Therefore, it is important to classify reliable and unreliable pixels correctly in the initial upsampled depth map. The previous method, however, simply marks a pixel as unreliable whenever the difference between the maximum and minimum neighboring pixels is sufficiently large. This often causes unexpected errors in areas with fine details or where two objects are close together, as shown in Fig. 2. In the example in the figure, the neighboring reference pixels are completely different from the target pixels, so the unknown fine details cannot be restored correctly by filtering; such pixels would better be kept as reliable instead. To remedy this problem, we propose a simple 1st-order filtering method, because it can deal with the directional patterns of image contents. In other words, as shown in Fig. 3, a 1st-order derivative edge filter can distinguish the fine details via zero-crossings, and these regions then remain reliable. This method is especially effective when the scaling factor is large, because fine details are then highly unpredictable.

3.4. Generation of confidence map

The major contribution of the EGF is the use of onion peel filtering, i.e., instead of refining all unreliable pixels at once, filtering is performed in a specific order. This means that pixels refined in previous steps participate in the filtering of the next step. It is therefore extremely important to decide the filtering order, because it greatly affects the overall performance. The filtering order in the EGF is determined in a simple manner: at the n-th iteration, all unreliable pixels adjacent to reliable pixels are found, their order is set to n, and they are marked as reliable for subsequent iterations. After a sufficient number of iterations, the orders of all unreliable pixels are determined. This step is similar to the distance transform [30]. That is, unreliable pixels are refined based on their neighboring reliable pixels at each step, and the newly updated pixel values are propagated toward the center of the unreliable regions. Consequently, the center line or curve between two boundaries of the initial reliability map is always treated as the image boundary, as shown in Fig. 4(a). However, the ground-truth object edges are often not located at the centers of the unreliable regions. When they are misclassified, some pixels are estimated by filtering with information from the wrong side only. The EGF performs guided filtering that considers the color similarity and the 2nd-order gradient of the depth map, which alleviates this problem, but it is not a fundamental solution.

The proposed method suggests a solution that uses a confidence map. Instead of determining the filtering order based on geometric proximity alone, we consider the similarities of the color and depth map intensities and of the edge directions. In this paper, we propose a vector-valued confidence map for order determination. Formally, the confidence at pixel position $\mathbf{x} = (x_1, x_2)$ is defined as

$$\mathbf{C}(\mathbf{x}) = \begin{bmatrix} C(\mathbf{x}; +\nabla T(\mathbf{x})) \\ C(\mathbf{x}; -\nabla T(\mathbf{x})) \end{bmatrix}$$

where $T$ is the distance map and $\nabla$ is the gradient operator; $\nabla T$ therefore represents the propagation direction of the filtering. The confidence map thus contains confidence values for both propagation directions. The confidence values of reliable pixels are initialized as $[1, 0]^{T}$. The confidence values of unreliable pixels for the two propagation directions are then obtained independently by

$$C(\mathbf{x}; \nabla_{\mathbf{x}}) = \frac{\alpha^{n}}{K_C} \sum_{\mathbf{y} \in W(\mathbf{x})} B(\mathbf{x}, \mathbf{y})\, G(\mathbf{x}, \mathbf{y}; \nabla_{\mathbf{x}}, \nabla_{\mathbf{y}})\, C(\mathbf{y}; \nabla_{\mathbf{y}}) \tag{1}$$
where $W(\mathbf{x})$ returns the neighboring reliable pixels around $\mathbf{x}$, and $K_C$ is the normalization factor. $\nabla_{\mathbf{x}}$ and $\nabla_{\mathbf{y}}$ denote $\pm\nabla T(\mathbf{x})$ and $\pm\nabla T(\mathbf{y})$, respectively. $B$ and $G$ are defined as

$$B(\mathbf{x}, \mathbf{y}) = \exp\left(-\gamma_I \|I(\mathbf{x}) - I(\mathbf{y})\|^2\right) \exp\left(-\gamma_D \|D(\mathbf{x}) - D(\mathbf{y})\|^2\right) \tag{2}$$

$$G(\mathbf{x}, \mathbf{y}; \nabla_{\mathbf{x}}, \nabla_{\mathbf{y}}) = \max\left( \frac{\nabla_{\mathbf{x}}^{T} \nabla_{\mathbf{y}}}{|\nabla_{\mathbf{x}}|\,|\nabla_{\mathbf{y}}|},\ \delta \right) \tag{3}$$
where $\delta \ge 0$, and $I$ and $D$ return the color and depth map values, respectively. Here, $B$ measures the consistency of the depth map and its corresponding color intensities, where $\gamma_D$ and $\gamma_I$ are adjusting parameters; the confidence values are weakened when there is a large change in depth or color values during propagation. $G$ measures the consistency of the normal vectors of the boundary, which allows confidence to propagate only when both positions have similar directions. If the two normal vectors have opposite or orthogonal directions, the confidence value is set to $\delta$ by (3). Finally, $\alpha$ is a decaying factor ($0 < \alpha \le 1$), i.e., the confidence value decreases as the number of propagation steps, denoted by $n$, increases. Note that the proposed method generalizes the previous EGF: by setting $\gamma_I = \gamma_D = 0$ and $\delta = 1$, it becomes identical to the EGF. In the proposed scheme, we set $\alpha = 0.8$, $\gamma_I = 0.1^{-2}$, $\gamma_D = 0.01^{-2}$, and $\delta = 0$.

Table 1
Performance comparisons with the scaling factor of 4 in terms of MSE/MAE/SSIM.
To summarize, the confidence map-based order determination has two advantages. (1) It estimates the unreliable pixels from a combination of multiple candidates: in the original method, some pixel values were affected by the values from one side only, whereas the proposed method always returns a soft confidence value, which eventually acts as the weight for combining multiple candidates. (2) It reduces the influence of the reliable-pixel classification: the original distance transform depends critically on the shape of the reliability map, while the proposed scheme uses the confidence of each side, which yields a more reliable ordering.
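To make Eqs. (1)-(3) concrete, the sketch below evaluates one propagation step for a single unreliable pixel. It is only our reading of the equations: the data layout and helper names are assumptions, K_C is taken as the neighbor count (the paper only states that it is a normalization factor), and the parameter defaults follow the values quoted above.

```python
import numpy as np

def propagate_confidence(x, grad_x, reliable_nbrs, I, D, C, grads, n,
                         alpha=0.8, gamma_i=0.1 ** -2, gamma_d=0.01 ** -2,
                         delta=0.0):
    """One evaluation of Eq. (1) for pixel x and propagation direction grad_x.

    reliable_nbrs: coordinates of the reliable neighbors W(x)
    I, D:          color and depth images, indexed by (row, col) tuples
    C, grads:      per-pixel confidence C(y; grad_y) and the direction grad_y
                   stored for each already-processed (reliable) pixel
    n:             number of propagation steps reached so far
    """
    acc, count = 0.0, 0
    for y in reliable_nbrs:
        # Eq. (2): photometric and depth consistency between x and y
        b = np.exp(-gamma_i * np.sum((I[x].astype(float) - I[y]) ** 2.0)) * \
            np.exp(-gamma_d * (float(D[x]) - D[y]) ** 2.0)
        # Eq. (3): agreement of the propagation directions, floored at delta
        g = max(float(np.dot(grad_x, grads[y])) /
                (np.linalg.norm(grad_x) * np.linalg.norm(grads[y]) + 1e-8),
                delta)
        acc += b * g * C[y]
        count += 1
    # K_C taken as |W(x)| here; alpha**n decays confidence with distance
    return (alpha ** n) * acc / count if count > 0 else 0.0
```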
3.5. Order determination

Once the confidence map is prepared, the order of the onion peel filtering is determined from the confidence values. At each pixel $\mathbf{x}$, whichever of $\nabla T(\mathbf{x})$ and $-\nabla T(\mathbf{x})$ has the larger confidence value is chosen as the propagation direction. The idea behind this step is to assign earlier orders to larger confidence values. For example, the order determination in the EGF simply assigns the order with a step size of one, as in Fig. 4(a), whereas the proposed method selects the propagation direction with the larger confidence value (blue-lined region) and assigns the order based on the soft confidence value, as in Fig. 4(b). As can be seen in Fig. 4, with the original method the orders from both sides always meet at the center of the unreliable region, while in the proposed method the order is adjusted by taking the color consistency and edge direction into account. Furthermore, the soft confidence value enables the estimate to be a linear combination of the two sides.
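The direction selection itself reduces to a per-pixel comparison of the two confidence channels. A minimal sketch is shown below, assuming the two channels are available as dense maps; processing unreliable pixels in decreasing order of confidence is our reading of the soft ordering described above.

```python
import numpy as np

def choose_direction_and_order(C_pos, C_neg, unreliable):
    """Pick the propagation direction with the larger confidence per pixel
    and rank the unreliable pixels so that high-confidence ones come first.

    C_pos, C_neg: confidence maps for the +grad(T) and -grad(T) directions
    unreliable:   boolean mask of the pixels to be refined
    """
    direction = np.where(C_pos >= C_neg, 1, -1)   # +1: +grad(T), -1: -grad(T)
    confidence = np.maximum(C_pos, C_neg)

    ys, xs = np.nonzero(unreliable)
    # Unreliable pixels are refined in decreasing order of confidence
    ranked = sorted(zip(ys.tolist(), xs.tolist()),
                    key=lambda p: -confidence[p])
    return direction, ranked
```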
3.6. Onion peel filtering

Based on the obtained ordering map, the following guided filter is applied, as in the EGF. The guided filter can be written as a conventional filter with the kernel

$$K(\mathbf{x}, \mathbf{y}) = \frac{1}{|B|^{2}} \sum_{\mathbf{k} : (\mathbf{x},\mathbf{y}) \in B(\mathbf{k})} \left( 1 + \frac{(I(\mathbf{x}) - \mu_{\mathbf{k}})(I(\mathbf{y}) - \mu_{\mathbf{k}})}{\sigma_{\mathbf{k}}^{2} + \varepsilon} \right) \tag{4}$$

Finally, the refinement by guided filtering to obtain the upsampled depth map $D_u$ is written as

$$D_u(\mathbf{x}) = \frac{1}{K_D} \sum_{\mathbf{y} \in W(\mathbf{x})} K(\mathbf{x}, \mathbf{y})\, D(\mathbf{y}) \tag{5}$$
Table 2 Performance comparisons with the scaling factor of 8 in terms of MSE/MAE/SSIM.
Fig. 2. Example of overly assigned unreliable pixels in fine details. The selected regions are classified into reliable (gray) and unreliable (color) pixels, where unreliable pixels are determined when the difference between the maximum and minimum neighboring pixels is sufficiently large. Unreliable pixels noted in green regions are mostly not recovered by filtering.
Fig. 3. Classifying the fine details by 1st order filtering with zero-crossing. From left to right: low-resolution depth map, upsampled depth map, and 1st order filtered depth map.
Fig. 4. Filtering order determination schemes.
where the guided filter is applied only to unreliable pixels, and only the reliable neighbors returned by $W(\mathbf{x})$, as defined in (1), contribute to the sum. $K_D$ denotes the corresponding normalization factor.
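A direct, unoptimized sketch of Eqs. (4) and (5) for a single unreliable pixel is given below. A grayscale guidance image, a box window of radius r for B(k), and the value of epsilon are our simplifying assumptions; in practice the guided-filter kernel is normally evaluated with box filters rather than explicit loops.

```python
import numpy as np

def guided_kernel(x, y, I, r=4, eps=1e-3):
    """Eq. (4): sum over every box window B(k) containing both x and y."""
    h, w = I.shape
    size = (2 * r + 1) ** 2
    total = 0.0
    k_lo = (max(x[0], y[0]) - r, max(x[1], y[1]) - r)
    k_hi = (min(x[0], y[0]) + r, min(x[1], y[1]) + r)
    for ky in range(max(k_lo[0], r), min(k_hi[0], h - r - 1) + 1):
        for kx in range(max(k_lo[1], r), min(k_hi[1], w - r - 1) + 1):
            patch = I[ky - r:ky + r + 1, kx - r:kx + r + 1]
            mu, var = patch.mean(), patch.var()
            total += 1.0 + (I[x] - mu) * (I[y] - mu) / (var + eps)
    return total / (size ** 2)

def refine_pixel(x, I, D, reliable_nbrs, r=4, eps=1e-3):
    """Eq. (5): kernel-weighted average over the reliable neighbors W(x)."""
    weights = [guided_kernel(x, y, I, r, eps) for y in reliable_nbrs]
    k_d = sum(weights)                      # normalization factor K_D
    if k_d == 0.0:
        return D[x]
    return sum(w_ * D[y] for w_, y in zip(weights, reliable_nbrs)) / k_d
```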
4. Experimental results and discussion

4.1. Test setup

To experimentally evaluate the proposed algorithm, we used the Middlebury stereo dataset [31] with 1380 × 1100 images and the Sintel dataset [32] with 1024 × 436 images, where the first frames of the sequences were tested. Each test image consists of a color texture and a depth map, and the two are perfectly aligned. We downsampled the depth maps by factors of four and eight. The performance of the proposed scheme was then quantitatively and qualitatively measured and compared to seven state-of-the-art image upsampling algorithms: the joint bilateral filter (JBF) [33], joint bilateral upsampling filter [20], anisotropic diffusion filter [19], noise-aware filter [24], weighted mode filter (WMF) [25], edge-weighted non-local means regularization [15], and EGF [28]. Note that some of these algorithms were designed for depth map upsampling, while others were designed for general purposes. All schemes use the corresponding color information during processing. For fair comparison, all free parameters were tuned to achieve the best performance. The following subsections present the comparison results and their analyses.
Fig. 5. Subjective comparisons for the upsampled Art, Reindeer, and Market5 images by various methods (with scale factor 8).
Fig. 6. Subjective comparisons for the upsampled Art, Reindeer, and Market5 images by various methods (with scale factor 4).
4.2. Quantitative comparisons

For the quantitative comparisons, we measured the mean squared error (MSE), mean absolute error (MAE), and structural similarity (SSIM) [34] between the ground-truth and upsampled depth maps. First, Table 1 shows the upsampling results with a scale factor of four in terms of MSE, MAE, and SSIM. The lowest MSE and MAE (or highest SSIM) values are in bold, and the second lowest (or highest) are underlined. As shown in the results, the proposed method strongly outperforms the other state-of-the-art methods for all quantitative metrics. The proposed method was always ranked first in terms of SSIM and mostly had the lowest (although sometimes the second or third lowest) MSE and MAE. Even when the proposed scheme yields
distorted results, the distortions are mostly small. On the other hand, the EGF mostly performed well, but it sometimes yielded large errors, which were caused by the misclassification of unreliable pixels. It is worth noting that the proposed scheme improved upon the EGF in all cases. The other schemes performed well for some specific images, but they did not provide stable results. The quantitative results for a scaling factor of eight are shown in Table 2. These results are very similar to those for the scaling factor of four, i.e., the proposed scheme outperformed all others in terms of MSE and SSIM, and it ranked second overall for MAE. However, MAE does not always correspond to the actual visual distortion, which is discussed in the next subsection. Tables 1 and 2 also provide, in the last column, the intermediate results obtained when only the early part of the proposed scheme, i.e., the modified generation of the reliability map, is applied. As the results show, errors were reduced in most cases. Depending on the depth map content, the reduction can be small or large; the modification of the reliability map strongly affects the overall performance when the depth map contains large displacements. The subsequent refinement stage with the confidence map, on the other hand, reduces errors consistently.
Table 3
Complexity comparisons with the scaling factor of 4.

Algorithm                                               Average running time (s)
Joint bilateral filter (JBF) [33]                       39.1
Joint bilateral upsampling filter [20]                  26.5
Anisotropic diffusion filter [19]                       35.6
Noise-aware filter [24]                                 46.4
Weighted mode filter (WMF) [25]                         289.3
Edge-weighted non-local means regularization [15]       109.5
EGF [28]                                                6.7
Proposed                                                9.2
4.3. Qualitative comparisons

Some examples of the upsampled images are given in Figs. 5 and 6. Due to space limitations, we selected three images: Art and Reindeer for fine details, and Market5 for large depth discontinuities. For a more convenient visual comparison, we enlarged selected regions; for scaling factor four, we adjusted some zooming ratios where the visual difference is subtle. As can be seen in the figures, the images upsampled by the proposed scheme have far fewer visual artifacts than those of the other methods. According to the quantitative results in Tables 1 and 2, JBF and WMF often yielded the lowest MAE. However, the visual results show that the proposed method exhibits no bleeding or texture copying artifacts and preserves the object structure well, whereas JBF and WMF caused unpleasant bleeding artifacts and often distorted the edge shapes. The EGF also generated relatively robust results, but it sometimes produced critical artifacts that did not preserve the fine details, which greatly increased its MSE. Compared to the EGF, the proposed method clearly preserved fine objects, which yielded a large MSE and MAE improvement. However, both the EGF and the proposed method tend to simplify the shape of object boundaries during the refinement process, which often removes fine details.

By comparing Tables 1 and 2 with the related figures, we see that MSE is good at capturing bleeding artifacts, because it heavily penalizes object-background misclassifications. MAE, on the other hand, rewards exact pixel reconstruction; therefore, the mode estimation by histogram in WMF works particularly well for MAE. However, its side effects, including bleeding artifacts, are visually unpleasant, especially at higher scaling factors. SSIM focuses on the overall structure of objects, which is why the proposed scheme always yields the highest SSIM scores. To summarize, the proposed scheme minimized the visually unpleasant distortions and mostly yielded the lowest average MSE and MAE and the highest average SSIM.

4.4. Complexity comparisons

Finally, we measured the computational complexity of all methods; for simplicity, only the average running time is reported. As shown in Table 3, the EGF has the lowest complexity by a large margin, and the proposed method has the second lowest complexity among all methods. This is because the EGF and the proposed method only refine the unreliable pixels, and the determination of reliability is relatively simple. The proposed method additionally computes the confidence map, which increases the complexity by approximately 38%.

5. Conclusion

In this paper, an improvement of the extended guided filter for depth map upsampling was proposed. The proposed scheme focuses on order-based onion peel filtering and includes a novel order assignment algorithm inspired by inpainting schemes. The confidence is carefully computed, and information is propagated from reliable pixels to unreliable pixels; pixels with high confidence values are then filled first. Furthermore, the generation of the reliability map and the initial interpolated depth map are also modified to fully support the confidence-based onion peel filtering. As a result, the proposed scheme mostly outperformed the EGF and other state-of-the-art depth map upsampling schemes in terms of MSE, MAE, and SSIM. The subjective results also support the superiority of the proposed scheme. However, the proposed method can only be applied to well-aligned texture and depth maps; in future work, we will therefore study a more general approach for depth maps obtained by active or passive methods that are not perfectly aligned with their color images.
Acknowledgments

This research was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B03930917), and by the GRRC program of Gyeonggi Province, Korea [2017-B02, Study on image processing and UI platform for mobile media devices].

References

[1] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, A. Fitzgibbon, KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera, in: ACM Symposium on User Interface Software and Technology, 2013, pp. 559–568.
[2] C.B. Choy, D. Xu, J. Gwak, K. Chen, S. Savarese, 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction, in: Springer European Conference on Computer Vision, 2016, pp. 628–644.
[3] A. Geiger, P. Lenz, C. Stiller, R. Urtasun, Vision meets robotics: The KITTI dataset, Int. J. Robot. Res. 32 (11) (2011) 1231–1237.
[4] G.P. Stein, Y. Gdalyahu, A. Shashua, Stereo-assist: Top-down stereo for driver assistance systems, in: IEEE Intelligent Vehicles Symposium, 2010, pp. 723–730.
[5] A. Geiger, J. Ziegler, C. Stiller, Stereoscan: Dense 3D reconstruction in real-time, in: IEEE Intelligent Vehicles Symposium, 2011, pp. 963–968.
[6] A. Klaus, M. Sormann, K. Karner, Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure, Int. Conf. Pattern Recognit. 3 (2006) 15–18.
[7] Q. Yang, L. Wang, R. Yang, H. Stewénius, D. Nistér, Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling, IEEE Trans. Pattern Anal. Mach. Intell. 31 (3) (2009) 492–504.
[8] T. Kanade, M. Okutomi, A stereo matching algorithm with an adaptive window: Theory and experiment, IEEE Trans. Pattern Anal. Mach. Intell. 16 (9) (1994) 1088–1095.
[9] J. Sun, N.N. Zheng, H.Y. Shum, Stereo matching using belief propagation, IEEE Trans. Pattern Anal. Mach. Intell. 25 (7) (2003) 787–800.
[10] D. Scharstein, R. Szeliski, High-accuracy stereo depth maps using structured light, IEEE Comput. Vision Pattern Recognit. (2003) 195–202.
[11] L. Yang, L. Zhang, H. Dong, A. Alelaiwi, A.E. Saddik, Evaluating and improving the depth accuracy of Kinect for Windows v2, IEEE Sens. J. 15 (8) (2015) 4275–4285.
[12] Z. Zhang, Microsoft Kinect sensor and its effect, IEEE Multimedia 19 (2) (2012) 4–10.
[13] S.C. Park, M.K. Park, M.G. Kang, Super-resolution image reconstruction: a technical overview, IEEE Signal Process. Mag. 20 (3) (2003) 21–36.
[14] J. Diebel, S. Thrun, An application of Markov random fields to range sensing, Adv. Neural Inf. Process. Syst. (2006) 291–298.
[15] J. Park, H. Kim, Y. Tai, M.S. Brown, I. Kweon, High quality depth map upsampling for 3D-TOF cameras, in: IEEE International Conference on Computer Vision, 2011, pp. 1623–1630.
[16] D. Kim, K. Yoon, High-quality depth map up-sampling robust to edge noise of range sensors, in: IEEE International Conference on Image Processing, 2012, pp. 553–556.
[17] J. Lu, D. Min, R.S. Pahwa, M.N. Do, A revisit to MRF-based depth map super-resolution and enhancement, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2011, pp. 985–988.
[18] Q. Yang, R. Yang, J. Davis, D. Nistér, Spatial-depth super resolution for range images, IEEE Comput. Vision Pattern Recognit. (2003) 1–8.
[19] J. Liu, X. Gong, Guided depth enhancement via anisotropic diffusion, in: Springer Pacific-Rim Conference on Multimedia, 2013, pp. 408–417.
[20] J. Kopf, M.F. Cohen, D. Lischinski, M. Uyttendaele, Joint bilateral upsampling, ACM Trans. Graph. 26 (3) (2007) 96.
[21] J. Kim, J. Lee, S. Han, D. Kim, J. Min, C. Kim, Trilateral filter construction for depth map upsampling, IEEE Image Video Multidimens. Signal Process. Workshop (2013) 1–4.
[22] S.W. Jung, Enhancement of image and depth map using adaptive joint trilateral filter, IEEE Trans. Circuits Syst. Video Technol. 23 (2) (2013) 258–269.
[23] K. Lo, Y.F. Wang, K. Hua, Joint trilateral filtering for depth map super-resolution, IEEE Vis. Commun. Image Process. (2013) 1–6.
[24] D. Chan, H. Buisman, C. Theobalt, S. Thrun, A noise-aware filter for real-time depth upsampling, in: Workshop on Multi-Camera and Multi-Modal Sensor Fusion Algorithms and Applications, 2008.
[25] D. Min, J. Lu, M.N. Do, Depth video enhancement based on weighted mode filtering, IEEE Trans. Image Process. 21 (3) (2012) 1176–1190.
[26] K. He, J. Sun, X. Tang, Guided image filtering, IEEE Trans. Pattern Anal. Mach. Intell. 35 (6) (2013) 1397–1409.
[27] Y. Konno, Y. Monno, D. Kiku, M. Tanaka, M. Okutomi, Intensity guided depth upsampling by residual interpolation, in: The Abstracts of the International Conference on Advanced Mechatronics: Toward Evolutionary Fusion of IT and Mechatronics, 2015, pp. 1–2.
[28] L. Hua, K. Lo, Y.F.F. Wang, Extended guided filtering for depth map upsampling, IEEE Multimedia 23 (2) (2016) 72–83.
[29] K. Lo, Y.F.F. Wang, K. Hua, Edge-preserving depth map upsampling by joint trilateral filter, IEEE Trans. Cybern. 48 (1) (2018) 371–384.
[30] R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing using MATLAB, Pearson Prentice Hall, 2004.
[31] H. Hirschmüller, D. Scharstein, Evaluation of cost functions for stereo matching, IEEE Comput. Vision Pattern Recognit. (2007) 1–8.
[32] D.J. Butler, J. Wulff, G.B. Stanley, M.J. Black, A naturalistic open source movie for optical flow evaluation, in: Springer European Conference on Computer Vision, 2012, pp. 611–625.
[33] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, K. Toyama, Digital photography with flash and no-flash image pairs, ACM Trans. Graph. 23 (3) (2004) 664–672.
[34] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.