Pattern Recognition Letters 29 (2008) 1230–1235 www.elsevier.com/locate/patrec
Local stereo matching with adaptive support-weight, rank transform and disparity calibration

Zheng Gu, Xianyu Su *, Yuankun Liu, Qican Zhang
Department of Optoelectronics, Sichuan University, Chengdu 610064, China

Received 9 March 2007; received in revised form 29 December 2007
Available online 21 February 2008
Communicated by P. Bhattacharya
Abstract

In this paper, a new window-based method for stereo matching is proposed. Differing from existing local approaches, our algorithm divides the matching process into two steps: initial matching and disparity calibration. An initial disparity is first approximated with an adaptive support-weight and a rank transform method, and a compact disparity calibration approach is then designed to refine the initial disparity, so that an accurate result can be acquired. The experimental results, evaluated on the Middlebury dataset, show that our method is better than other local methods on standard stereo benchmarks.
© 2008 Elsevier B.V. All rights reserved.

Keywords: Stereo matching; Window-based; Disparity calibration
1. Introduction

Stereo matching is the process of finding corresponding points in two or more images. It is one of the most important and challenging subjects in computer vision, and has therefore been a focus of the field for a long time. A comprehensive overview of stereo matching can be found in (Scharstein and Szeliski, 2002). In general, matching algorithms can be classified into global and local methods. Global approaches (Klaus et al., 2006; Lei et al., 2006; Sun et al., 2005; Yang et al., 2006) incorporate explicit smoothness assumptions and determine all disparities simultaneously by applying energy minimization techniques. These methods usually have high matching accuracy. However, most of them are computationally expensive and sometimes need many parameters that are hard to determine. Local approaches (Kanade and Okutomi, 1994; Veksler, 2001, 2003; Wang, 2004; Yoon and Kweon, 2006) use the color or intensity values within a finite window to determine the disparity for each pixel.
* Corresponding author. Tel.: +86 28 6638 5559. E-mail addresses: [email protected], [email protected] (X. Su).
0167-8655/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2008.01.032
Compared with global methods, local approaches have a simpler structure and higher efficiency. Nevertheless, for pixels in homogeneous regions it may be difficult to find their correspondences in the other image reliably with window-based methods. Local methods therefore depend on two major factors: the local matching window and the correlation measure. How to select an appropriate matching window for each pixel is a focus of recent area-based matching research. Adaptive-window (Veksler, 2001, 2003) and adaptive-weight (Yoon and Kweon, 2006; Xu et al., 2002) are two familiar approaches. Adaptive-window methods try to find an optimal support window for each pixel, while adaptive-weight methods try to assign appropriate support weights to the pixels in a support window whose size and shape are usually fixed. The correlation measure is another important component of area-based methods. The sum of absolute differences (SAD) and the sum of squared differences (SSD) are still the two main correlation measures. In addition, to reduce the disturbance caused by noise, truncated AD, the rank transform, etc., have been proposed. Although these effective methods have been applied in local stereo matching, the precision of the area-based
methods is still not satisfying. In our view, the main reason is that traditional local methods are always one-step: their matching accuracy depends excessively on the accuracy of the matching window selection and the correlation measure, and it is difficult to obtain a perfect matching result before a perfect window selection method and correlation measure have been found. In this paper, we therefore divide the matching process into two steps: initial matching and disparity calibration. By using adaptive support-weight and rank transform to obtain an initial disparity, and adding a disparity calibration process to refine it, a precise result can be produced. In detail, our algorithm works as follows. First, an adaptive support-weight (Yoon and Kweon, 2006) and a rank transform method (Wang, 2004) are employed together to acquire an initial disparity map. Then an appropriate calibration window is selected for each pixel using color similarity and geometric proximity, and with these calibration windows a local disparity calibration method is designed to acquire an accurate result disparity. We give a detailed explanation of each part in Sections 2 and 3, and show experimental results in Section 4.
2. Initial matching

2.1. Matching window selection

Support window selection is a crux of correspondence search. To acquire accurate results not only in homogeneous regions but also at depth discontinuities, an appropriate window should be selected adaptively for each pixel. According to the results of Kanade and Okutomi (1994) and Veksler (2001), reliable windows should be large enough to include enough intensity variation for reliable matching, yet small enough to avoid crossing depth discontinuities. To satisfy this requirement, adaptive-window methods change the size and shape of the support window adaptively. In fact, however, finding the optimal support window with an arbitrary shape and size is very difficult and generally known to be an NP-hard problem. Differing from adaptive-window methods, adaptive-weight approaches use a generic support window, a rectangular window, and assign adaptive weights by some compact operations; they are usually more precise and have lower computational cost than adaptive-window methods. In our implementation we therefore use the adaptive support-weight approach proposed recently by Yoon and Kweon (2006), which does not depend on a possibly erroneous initial disparity estimation. It assigns an adaptive support-weight to each pixel in a fixed support window based on the color similarity and the spatial proximity to the pixel at the center of the support window. In Yoon and Kweon's work, the color proximity between two pixels within the support window is measured in the CIELab color space. To simplify the computation, we measure the color similarity in the RGB color space instead. Due to our disparity calibration process, this change does not prevent us from achieving accurate results.

In the reference image, let p be the pixel under consideration and q a point in p's support window, and let ΔCpq denote the Euclidean distance between the two colors Cp = [Rp, Gp, Bp] and Cq = [Rq, Gq, Bq] in the RGB color space. The color similarity of p and q is defined as

fs(ΔCpq) = exp(−ΔCpq/γc)    (1)

where γc is an experimentally determined constant. Similarly, the spatial proximity of p and q is expressed as

fp(Δgpq) = exp(−Δgpq/γp)    (2)

where Δgpq is the Euclidean distance between p and q in the image domain and γp is determined by the size of the support window (γp ∝ window size). The support-weight of a pixel, w(p, q), can then be written as

w(p, q) = exp(−(ΔCpq/γc + Δgpq/γp)).    (3)
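To make Eqs. (1)-(3) concrete, the support-weight computation can be sketched in Python with NumPy. The function name and array layout here are our own illustration, not part of the paper; the default constants are the values reported in Section 4 (γc = 25, γp = 10.5).

```python
import numpy as np

def support_weights(window, center_rgb, gamma_c=25.0, gamma_p=10.5):
    """Adaptive support-weights w(p, q) of Eq. (3) for one square
    window of RGB values centred on pixel p.

    window     : (H, W, 3) array of RGB values around p
    center_rgb : length-3 array, the RGB value of p itself
    """
    h, w, _ = window.shape
    # Eq. (1): colour term from the Euclidean distance in RGB space.
    dc = np.linalg.norm(window - center_rgb, axis=2)
    # Eq. (2): spatial term from the Euclidean distance in the image plane.
    ys, xs = np.mgrid[0:h, 0:w]
    dg = np.hypot(ys - h // 2, xs - w // 2)
    # Eq. (3): combined support-weight.
    return np.exp(-(dc / gamma_c + dg / gamma_p))
```

The weight is 1 at the center pixel and decays with both distances, so pixels that are far away, or close but differently colored, receive reduced support.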
2.2. Rank transform

The correlation measure is another important part of local stereo matching; an appropriate correlation measure yields a more reliable initial disparity. In our work we select the rank transform (Wang, 2004), whose reliability has been demonstrated in (Banks and Bennamoun, 2001; Wang, 2004). Compared with other correlation measures, the rank transform is less sensitive to image noise and to brightness differences between the stereo images. In general, the method can be described as follows. First, calculate the intensity difference Diff between the center pixel p and every other pixel in p's support window. Then, according to the intensity difference, assign each pixel one of five grades – smallest, smaller, equal, bigger, biggest – by the following rule:

Rank = −2 when Diff < −v (smallest)
Rank = −1 when −v ≤ Diff < −u (smaller)
Rank = 0 when −u ≤ Diff ≤ u (equal)
Rank = 1 when u < Diff ≤ v (bigger)
Rank = 2 when Diff > v (biggest)    (4)

For a pixel p(i, j) in the reference image, a rank matrix LeftRank can be acquired using Eq. (4). Correspondingly, a rank matrix RightRank can be acquired in the same way for p's candidate corresponding point pd(i, j + d) in the target image. We then define a matrix F of the same size as the support window: for each point in LeftRank, we assign 1 to the corresponding element of F if the point at the same position in RightRank has the same rank, and 0 otherwise. The similarity Corrd of p(i, j) and pd(i, j + d) can then be expressed as
Corrd(i, j) = [ Σx Σy w((i, j), (i + x, j + y)) · F(x + (Wx + 1)/2, y + (Wy + 1)/2) ] / [ Σx Σy w((i, j), (i + x, j + y)) ],
with x = −(Wx − 1)/2, …, (Wx − 1)/2 and y = −(Wy − 1)/2, …, (Wy − 1)/2    (5)
where Wx and Wy represent the length and width of the matching window, respectively. We know that if pd(i, j + d) is the corresponding point of p, points in a finite neighboring region X around p(i, j) and pd(i, j + d) also correspond. So the initial disparity Di can be expressed as

Di(i, j) = arg max_{d ∈ Rd} Σx Σy Corrd(i + x, j + y),
with x = −(Xx − 1)/2, …, (Xx − 1)/2 and y = −(Xy − 1)/2, …, (Xy − 1)/2    (6)

where Rd = {dmin, …, dmax} is the set of all possible disparities, and Xx and Xy represent the length and width of X, respectively. In addition, for each pixel under matching, an appropriate X can be acquired with the edge-based approach of Wang (2004), which selects the window according to intensity variation.

3. Disparity calibration

For local stereo algorithms, there are always some correctable errors in the initial matching result, which decrease its precision. If these errors can be corrected efficiently, the accuracy of local methods becomes comparable to that of global methods, so we introduce a disparity calibration method to overcome this problem. Disparity calibration is based on the assumption that, within a finite region, points with similar color and a short spatial distance should have similar depth. We can therefore calibrate a point p's initial disparity based on the distribution of the initial disparities of all points in a certain window centered on p. Concretely, the calibration algorithm works as follows. For a pixel p, we first choose a calibration window; a reliable window satisfies the condition that all pixels in it have similar color and a short distance to p. Then we analyze the distribution of the initial disparities of all pixels in this window, i.e. we count the occurrences of each disparity from dmin to dmax. The disparity with the highest frequency of occurrence is assigned to p as its final disparity. Our calibration algorithm thus includes two important parts: calibration window selection and disparity calibration.

3.1. Calibration window selection

Like the matching window selection, calibration window selection also assigns an adaptive support-weight wi(p, q) to
each pixel in a fixed support window based on the color similarity and the spatial proximity to the pixel under calibration:

wi(p, q) = exp(−(ΔCpq/γi + Δgpq/γp))    (7)

where γi is also an experimentally determined constant. Note that Eq. (7) differs from Eq. (3) only in γi. Since the precondition of disparity calibration is that all points in the calibration window have similar color, the color-similarity constraint must be stricter here. Therefore, compared with w(p, q) in Eq. (3), wi(p, q) is more sensitive to color changes, i.e. γi should be smaller than γc.

3.2. Calibration

After initial matching and calibration window selection, disparity calibration is executed as follows:

1. For a certain disparity d in the predefined search range [dmin, dmax], define a zero matrix Cd of the same size as the reference image.
2. Traverse the initial disparity map Di; for every pixel whose initial disparity is d, assign 1 to the corresponding element of Cd.
3. Repeat the same operation for the other disparities in [dmin, dmax], yielding a series of matrices corresponding to the set of all possible disparities.
4. Finally, the result disparity D is expressed as
D(i, j) = arg max_{d ∈ Rd} Σx Σy wi((i, j), (i + x, j + y)) · Cd(i + x, j + y),
with x = −(Wix − 1)/2, …, (Wix − 1)/2 and y = −(Wiy − 1)/2, …, (Wiy − 1)/2    (8)
where Wix and Wiy represent the length and width of the calibration window, respectively. Note that, compared with iterative algorithms (Prazdny, 1985; Darrel, 1998; Yang et al., 2006), our disparity calibration method has a simpler structure and no global reasoning, so it can produce an accurate result at a lower computational cost.

4. Experimental results

The proposed method has been evaluated on the Middlebury dataset, which is often used for the performance comparison of stereo methods (Scharstein and Szeliski, 2003). It is run with a fixed parameter set across all four images: the matching window and the calibration window are both 21 × 21, u = 2, v = 9, γc = 25, γi = 15 and γp = 10.5 (the radius of the support window). As shown in Fig. 1, the proposed method can produce accurate piecewise smooth disparity maps.

Fig. 1. Dense disparity maps for the "Tsukuba", "Venus", "Teddy" and "Cones" images.

Table 1
Performance of our method (percentage of bad pixels; per dataset: nonocc / all / disc)

Algorithm      Avg. rank  Tsukuba             Venus               Teddy               Cones
                          nonocc  all   disc  nonocc  all   disc  nonocc  all   disc  nonocc  all   disc
AdaptingBP     1.9        1.11    1.37  5.79  0.10    0.21  1.44  4.22    7.06  11.8  2.48    7.92  7.32
DoubleBP       2.5        0.88    1.29  4.76  0.14    0.60  2.00  3.55    8.71  9.70  2.90    9.24  7.80
SymBP + occ    5.9        0.97    1.75  5.09  0.16    0.33  2.19  6.47    10.7  17.0  4.79    10.7  10.9
Our method     6.4        1.19    1.42  6.15  0.23    0.34  2.50  7.80    13.6  17.3  3.62    9.33  9.72
Segm + visib   6.7        1.30    1.57  6.92  0.79    1.06  6.76  5.00    6.54  12.3  3.72    8.62  10.2
C-SemiGlob     7.3        2.61    3.29  9.89  0.25    0.57  3.24  5.14    11.8  13.0  2.77    8.35  8.20
RegionTreeDP   8.3        1.39    1.64  6.85  0.22    0.57  1.93  7.42    11.9  16.8  6.31    11.9  11.8
EnhancedBP     8.6        0.94    1.74  5.05  0.35    0.86  4.34  8.11    13.3  18.5  5.09    11.1  11.0
AdaptWeight    9.3        1.38    1.85  6.90  0.71    1.19  6.13  7.88    13.3  18.6  3.97    9.79  8.26
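As an illustration, the two-step procedure evaluated above can be sketched in Python with NumPy. This is a simplified sketch under our own assumptions, not the authors' code: it works on grayscale images (so the colour distance of Eq. (1) reduces to an absolute intensity difference), fixes the aggregation region X of Eq. (6) to a single pixel, and uses the paper's parameter values u = 2, v = 9, γc = 25, γi = 15. All function names are hypothetical.

```python
import numpy as np

def rank_of(win, u=2, v=9):
    """Eq. (4): quantise intensity differences to the centre pixel
    into the five ranks -2 (smallest) .. 2 (biggest)."""
    d = win - win[win.shape[0] // 2, win.shape[1] // 2]
    r = np.zeros(win.shape, dtype=int)
    r[d < -v] = -2
    r[(d >= -v) & (d < -u)] = -1
    r[(d > u) & (d <= v)] = 1
    r[d > v] = 2
    return r  # d in [-u, u] keeps rank 0 ("equal")

def weights(win, gamma_c=25.0, gamma_p=10.5):
    """Eqs. (1)-(3) for a grayscale window: the colour term is an
    absolute intensity difference in this simplified sketch."""
    h, w = win.shape
    dc = np.abs(win - win[h // 2, w // 2])
    ys, xs = np.mgrid[0:h, 0:w]
    dg = np.hypot(ys - h // 2, xs - w // 2)
    return np.exp(-(dc / gamma_c + dg / gamma_p))

def match_and_calibrate(left, right, d_max, half=3, gamma_i=15.0):
    """Step 1: initial WTA disparity from the weighted rank
    correlation of Eq. (5).  Step 2: calibration vote of Eq. (8).
    Follows the paper's convention p(i, j) <-> pd(i, j + d)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    # --- Step 1: initial matching -----------------------------------
    for i in range(half, h - half):
        for j in range(half, w - half - d_max):
            lw = left[i - half:i + half + 1, j - half:j + half + 1]
            wgt = weights(lw)
            lrank = rank_of(lw)
            best, best_d = -1.0, 0
            for d in range(d_max + 1):
                rw = right[i - half:i + half + 1,
                           j + d - half:j + d + half + 1]
                F = (lrank == rank_of(rw))            # rank agreement
                corr = np.sum(wgt * F) / np.sum(wgt)  # Eq. (5)
                if corr > best:
                    best, best_d = corr, d
            disp[i, j] = best_d
    # --- Step 2: disparity calibration ------------------------------
    out = disp.copy()
    for i in range(half, h - half):
        for j in range(half, w - half - d_max):
            lw = left[i - half:i + half + 1, j - half:j + half + 1]
            wi = weights(lw, gamma_c=gamma_i)  # stricter colour term, Eq. (7)
            dwin = disp[i - half:i + half + 1, j - half:j + half + 1]
            votes = [np.sum(wi[dwin == d]) for d in range(d_max + 1)]
            out[i, j] = int(np.argmax(votes))  # Eq. (8)
    return out
```

On a synthetic pair in which the right image is the left image shifted by a constant disparity, this sketch recovers that disparity in the image interior; on real images the full method additionally aggregates Corrd over the region X and selects X per pixel as in Section 2.2.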
The performance of the proposed method on the testbed images is summarized in Table 1 for comparison with other algorithms. The numbers in Table 1 represent the percentage of bad pixels (pixels whose absolute disparity error is greater than 1) for all pixels ("all"), pixels in non-occluded regions ("nonocc") and pixels near depth discontinuities ("disc"). The ninth method in Table 1, "AdaptWeight", was demonstrated in (Yoon and Kweon, 2006) to be the best among the state-of-the-art area-based local methods. Owing to our disparity calibration process, our method exceeds it on almost all test items, the exceptions being "all" on Teddy and "disc" on Cones. Some global algorithms, such as Segm + visib (Bleyer and Gelautz, 2004), C-SemiGlob (Hirschmueller, 2006) and RegionTreeDP (Lei et al., 2006), which AdaptWeight cannot reach, are surpassed by our method. Moreover, the top three methods, AdaptingBP (Klaus et al., 2006), DoubleBP (Yang et al., 2006) and SymBP + occ (Sun et al., 2005), which rank above our algorithm, are all global methods. The proposed method is therefore the best among the area-based methods on standard stereo benchmarks. Of course, it is somewhat more expensive than other local methods in computational cost because of the added disparity calibration step; compared with the obvious increase in matching precision, this small additional cost is negligible.

It is worth noting that the result on the "Teddy" dataset is worse than those of Segm + visib, C-SemiGlob and RegionTreeDP, even though these methods are surpassed by the proposed method overall. This is because the "Teddy" dataset contains many repetitive textures and textureless regions, such as the area around the mauve bear, so the amount of aggregated support for each pixel in these areas may be deficient. The resulting decrease in the discriminative power of our method produces some false matches. More generally, the proposed method may produce inaccurate results in large textureless regions because its finite support window is occasionally insufficient there.

More results for real images are shown in Figs. 2 and 3. Both image pairs, Sawtooth and Map, are also often used to test stereo matching algorithms. As shown in Figs. 2 and 3, the proposed method yields accurate results in both homogeneous regions and regions of depth discontinuity for the test images.

Fig. 2. Dense disparity map for "Sawtooth" image. (a) Left image, (b) ground truth, (c) AdaptWeight and (d) our method.

Fig. 3. Dense disparity map for "Map" image. (a) Left image, (b) ground truth, (c) AdaptWeight and (d) our method.

5. Conclusions

In this paper, a novel area-based stereo matching algorithm based on adaptive support-weight, rank transform and disparity calibration has been proposed. In our method, the matching process is divided into two steps: initial matching and disparity calibration. Adaptive support-weight and rank transform are combined to obtain a reliable initial disparity map, and a concise calibration method refines the initial result efficiently, so that an accurate matching result can be produced. As demonstrated on the Middlebury stereo evaluation testbed, the proposed method achieves a high correct-matching rate and outperforms other local methods.

Acknowledgement

The work was supported by the National Natural Science Foundation of China (Grant No. 60527001).
References

Banks, J., Bennamoun, M., 2001. Reliability analysis of the rank transform for stereo matching. IEEE Trans. Systems Man Cybernet. 31, 870–880.
Bleyer, M., Gelautz, M., 2004. A layered stereo algorithm using image segmentation and global visibility constraints. IEEE Conf. Image Process. 5, 2997–3000.
Darrel, T., 1998. A radial cumulative similarity transform for robust image correspondence. Proc. IEEE Conf. Comput. Vision Pattern Recognition, 656–662.
Hirschmueller, H., 2006. Stereo vision in structured environments by consistent semi-global matching. CVPR 2, 2386–2393.
Kanade, T., Okutomi, M., 1994. A stereo matching algorithm with an adaptive window: Theory and experiments. IEEE Trans. Pattern Anal. Machine Intell. 16 (9), 920–932.
Klaus, A., Sormann, M., Karner, K., 2006. Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. ICPR 3, 15–18.
Lei, C., Selzer, J., Yang, Y., 2006. Region-tree based stereo using dynamic programming optimization. CVPR 2, 2378–2385.
Prazdny, K., 1985. Detection of binocular disparities. Biological Cybernet. 52, 93–99.
Scharstein, D., Szeliski, R., 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Internat. J. Comput. Vision 47 (1/2/3), 7–42.
Scharstein, D., Szeliski, R., 2003. High-accuracy stereo depth maps using structured light. CVPR 1, 195–202.
Sun, J., Li, Y., Kang, S.B., Shum, H.Y., 2005. Symmetric stereo matching for occlusion handling. CVPR 2, 399–406.
Veksler, O., 2001. Stereo matching by compact windows via minimum ratio cycle. ICCV 1, 540–547.
Veksler, O., 2003. Fast variable window for stereo correspondence using integral images. CVPR 1, 556–561.
Wang, K., 2004. Adaptive stereo matching algorithm based on edge detection. ICIP 2, 1345–1348.
Xu, Y., Wang, D., Feng, T., Shum, H.Y., 2002. Stereo computation using radial adaptive windows. ICPR 3, 595–598.
Yang, Q., Wang, L., Yang, R., Stewénius, H., Nistér, D., 2006. Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling. CVPR 2, 2347–2354.
Yoon, K.J., Kweon, I.S., 2006. Adaptive support-weight approach for correspondence search. IEEE Trans. Pattern Anal. Machine Intell. 28 (4), 650–656.