Photo Hull regularized stereo

Shufei Fan *, Frank P. Ferrie

Center for Intelligent Machines, McGill University, 3480 University, Montreal, Canada H3A 2A7

Image and Vision Computing 28 (2010) 724–730. doi:10.1016/j.imavis.2008.10.008

Originally submitted for consideration in the Canadian Robotic Vision 2005–2006 Special Issue, published in issue IMAVIS 27/1–2.
* Corresponding author. Tel.: +1 514 3982185. E-mail addresses: [email protected] (S. Fan), [email protected] (F.P. Ferrie).

Article history: Received 17 December 2006; Received in revised form 17 June 2008; Accepted 7 October 2008

Keywords: Stereo; Photo Hull; Regularization

Abstract

A regularization-based approach to 3D reconstruction from multiple images is proposed. As one of the most widely used multiple-view 3D reconstruction algorithms, Space Carving can produce a Photo Hull of a scene, which is at best a coarse volumetric model. The two-view stereo algorithm, on the other hand, can generate a more accurate reconstruction of the surfaces, provided that a given surface is visible to both views. The proposed method is essentially a data fusion approach to 3D reconstruction, combining the above two algorithms by means of regularization. The process is divided into two steps: (1) computing the Photo Hull from multiple calibrated images and (2) selecting two of the images as input and solving the two-view stereo problem by global optimization, using the Photo Hull as the regularizer. Our dynamic programming implementation of this regularization-based stereo approach potentially provides an efficient and robust way of reconstructing 3D surfaces. The results of an implementation of this theory are presented on real data sets and compared with peer algorithms.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Inferring a three-dimensional (3D) model of a scene from two or more images is a central problem in Computer Vision. The modeling process leads to reconstructed 3D surfaces of a scene. Two different ways of using these images have resulted in two categories of 3D reconstruction: the stereo matching based approach and the photo-consistency based approach.

The stereo matching based approach reconstructs 3D surfaces by using correspondence information. It first identifies the corresponding pixels of an image pair. The surface points can then be estimated by geometrically triangulating the optical rays that produced these two corresponding pixels, based on information from the cameras that took the images. In contrast, photo-consistency based reconstruction does not explicitly match points between images. It infers a 3D volume, called the Photo Hull, that looks consistent with the input images. This approach exploits an important constraint of images – color. Assuming Lambertian surfaces and constant illumination, a valid point on the scene surface appears with the same color in all images in which it is visible. This constraint is known as photo-consistency. As output of the algorithm, the Photo Hull is a collection of surface voxels in 3D space that are believed to lie on the surfaces of the object.

The stereo matching based approach deals with correspondence, which is itself a difficult problem, while the photo-consistency based approach avoids correspondence and is concerned with the volume hypothesis.


Although the former performs badly for occluded regions, it has the advantage of reconstructing at a relatively fine resolution. In contrast, the latter attacks the occlusion problem by surrounding the object with cameras, but it can only produce an approximate model of the scene.

In this paper, we show that a refined reconstruction can be achieved using both photo-consistency and stereo matching. The fusion is conducted through a regularization-based approach. Working under the stereo matching based paradigm, we incorporate knowledge from the Photo Hull as an additional soft constraint on our surface reconstruction. A photo-consistency cost, measuring the deviation of the assigned disparity from that predicted by the Photo Hull, is introduced into the stereo matching algorithm. A scanline based dynamic programming (DP) algorithm is used for global minimization.

This paper is organized as follows. Section 2 discusses related work. Sections 3 and 4 describe our proposed method. Experimental results are reported in Section 5 and we conclude the paper in Section 6.

2. Related work

2.1. Stereo

As outlined in a survey of the state-of-the-art [13], stereo vision is categorized into the following four steps: matching cost computation, cost aggregation, disparity computation, and disparity refinement. Most methods differ in the way they compute disparity. Global methods [2,5,4] are shown to produce more accurate


disparities [13]. They are cast as energy-minimization problems and most of them are computationally expensive. Among the global methods, dynamic programming [4,9,13] computes the global minimum for independent scanlines in polynomial time and is found to be efficient in finding dense scanline matches. For each pair of scanlines, a square matrix of correspondence costs is constructed. With the x and y coordinates of the matrix representing the left and right scanlines respectively, each element (x, y) of this matrix represents the cost of matching pixel x of the left scanline to pixel y of the right scanline. By computing a minimum-cost path through this matching cost matrix, the optimal disparity is identified.

Under uniqueness and ordering constraints, recent DP approaches [4,6] have modeled partial occlusions explicitly. Whenever a partial occlusion occurs in one of the images, a group of pixels in one image is assigned to a single pixel in the other, and at the same time an occlusion cost is incurred. With the ordering constraint, the relative ordering of pixels on a scanline is assumed to remain the same between the two views. This constraint holds for most scenes, except when there is a thin foreground object. Pre-calculated ground control points (GCPs) are introduced to eliminate mismatches [4]. These GCPs consist of highly confident matches extracted from feature correspondences.

Another line of DP work ignores occlusion completely and includes only smoothness constraints. Scanline Optimization (SO) [13] considers all candidates at the previous pixel. Large disparity jumps are allowed, but are penalized with a discontinuity cost. Reliability-based Dynamic Programming (RDP) [9,10] progressively recovers disparity via multiple dynamic programming passes. After each pass, a reliability value can be calculated at each pixel; it measures how much better the chosen disparity is than other possible disparity assignments. Disparities with high enough reliability (above a certain threshold) are considered confident and used as GCPs for the next DP pass.

A rather different approach to the occlusion problem is to model occlusion implicitly. Photo-consistency based reconstruction [12,16] (detailed in Section 2.2) belongs to this category. Recently, Drouin et al. [8] investigated inconsistencies between depth maps and were able to obtain sharp and well-located depth discontinuities.

2.2. Photo-consistency based reconstruction

Seitz et al. [16] first proposed the idea of reconstructing scene models based on photo-consistency. Kutulakos [12] proposed an algorithm called Space Carving, which yields a unique reconstruction, the Photo Hull. Essentially it uses the color information of the images as constraints and builds a volumetric model that is photo-consistent with all the input images. The process starts from an initialized 3D volume that contains the scene as an unknown sub-volume. The volume is usually a large cube and is divided into a set of small cubes, called voxels. Each voxel in the 3D space serves as an approximation of whether or not the scene occupies that position. The algorithm proceeds by iteratively testing the photo-consistency of each of the surface voxels. Each voxel is projected onto the images to which it is visible and these projections are compared. If the projections are consistent in color, the voxel is identified as a true surface voxel and kept in the Photo Hull; otherwise, the voxel is declared not occupied by the object and carved out of the Photo Hull. The algorithm ends when all surface voxels are consistent with the input images. To solve the visibility problem, Space Carving has to check voxels in a certain order. Alternatively, Generalized Voxel Coloring (GVC) [7] keeps a list of the surface voxels and updates the exact visibility. More recently, improved results have been achieved by dealing with camera calibration errors and partial emptiness of surface voxels [1]. A detailed survey of photo-consistency based reconstruction is available in [17], and a quantitative evaluation of a broad range of multi-view reconstruction algorithms, including photo-consistency based reconstruction, was reported by Seitz et al. [15].

Our regularized stereo algorithm uses the Photo Hull as part of our input, in the form of regularization. So, in a preliminary step, the Photo Hull is generated by the algorithm in [1]. For one pair of images among the image set, we can produce an approximate disparity (we call it the predicted disparity, denoted by d_pred) using the Photo Hull. Due to the coarse and inaccurate nature of the Photo Hull, d_pred is usually within a small distance of the true disparity but may contain wildly erroneous values at some locations.
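The paper does not spell out how d_pred is computed from the Photo Hull. The following Python sketch shows one plausible realization under assumed inputs (voxel centers as an N x 3 array and 3 x 4 camera matrices P0, P1 for the rectified pair are our assumptions, not the authors' code; occlusion marking is omitted):

import numpy as np

def predicted_disparity(voxels, P0, P1, shape):
    # Sketch: project Photo Hull surface voxels into the rectified pair
    # and keep, per reference pixel, the disparity of the front-most
    # voxel (a simple z-buffer). NaN marks pixels with no prediction.
    h, w = shape
    d_pred = np.full((h, w), np.nan)
    depth = np.full((h, w), np.inf)
    X = np.hstack([voxels, np.ones((len(voxels), 1))])  # homogeneous coords
    x0 = X @ P0.T   # projections into the reference view
    x1 = X @ P1.T   # projections into the matching view
    u0 = x0[:, 0] / x0[:, 2]
    v0 = x0[:, 1] / x0[:, 2]
    u1 = x1[:, 0] / x1[:, 2]
    for u, v, um, z in zip(u0, v0, u1, x0[:, 2]):
        r, c = int(round(v)), int(round(u))
        if 0 <= r < h and 0 <= c < w and z < depth[r, c]:
            depth[r, c] = z
            d_pred[r, c] = um - u   # horizontal disparity, rectified views
    return d_pred

Disparity sign conventions vary with the rectification; the subtraction above assumes the convention of Section 3.1, where pixel (x, y) of I_0 matches (x + d, y) of I_1.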

2.3. Overview of our work

Using the Photo Hull generated by Space Carving as a prior constraint, we compute a globally optimal disparity map. Using d_pred as a regularization term, we perform global minimization with dynamic programming. We run dynamic programming in two passes. In the first, vertical pass, only matching points with sufficient confidence are retained. They are then used as GCPs for the subsequent horizontal dynamic programming pass.

3. The method

3.1. Notation

We are given a pair of rectified stereo images. Let I_0 denote the reference (left) image and I_1 the matching (right) image. We want to find an optimal disparity d(p) for every pixel p(x, y) of I_0 that best describes the shape of the surfaces in the scene. Hence, d(p) is a function defined on the image domain of I_0. For a horizontally rectified image pair, pixel p_0(x, y) of I_0 corresponds to pixel p_1(x + d, y) of I_1. Every pair (p, d(p)) (abbreviated as (p, d)) is called a match, which can accurately identify a 3D point by triangulation. A match cost function C(p, d) states the cost of declaring pixel p as having disparity d.
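For concreteness, this notation maps directly onto a dense cost volume; a minimal sketch with absolute-difference costs (array names and sizes are ours, not from the paper):

import numpy as np

# I0, I1 stand in for the rectified pair; D is the disparity search range.
H, W, D = 480, 640, 64
I0 = np.random.rand(H, W)
I1 = np.random.rand(H, W)

# C[y, x, d] holds the match cost C(p, d) for p = (x, y); matches that
# would fall outside I1 keep an infinite cost.
C = np.full((H, W, D), np.inf)
for d in range(D):
    C[:, :W - d, d] = np.abs(I0[:, :W - d] - I1[:, d:])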

3.2. Energy-minimization framework

Under global optimization methods, the disparities of all pixels are assigned so that the overall matching cost, or the energy of the disparity configuration,

E(d) = \sum_{p} C(p, d),

is minimum.

3.2.1. Data matching cost

In dense stereo, the pixel matching cost is based on intensity or color differences of the corresponding pixels: C_data(p, d) = f(I_0(x, y), I_1(x + d, y)). This cost function can be the sum of squared differences (SSD) or the normalized correlation. We use the Birchfield–Tomasi matching cost, which is insensitive to image sampling [3]. Thus, the pixel-matching cost across the image is

C_{data}(d) = \sum_{p} C_{data}(p, d).    (1)
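As an illustration, the Birchfield–Tomasi dissimilarity [3] compares a pixel against the linearly interpolated half-pixel neighborhood of its counterpart, symmetrized over both scanlines. A sketch of one scanline term (our own transcription, not the authors' code):

import numpy as np

def bt_dissimilarity(a, b, xa, xb):
    # Symmetric, sampling-insensitive dissimilarity between pixel xa of
    # scanline a and pixel xb of scanline b.
    def one_sided(a, b, xa, xb):
        # Distance from a[xa] to the interval spanned by b linearly
        # interpolated over [xb - 1/2, xb + 1/2].
        lo = 0.5 * (b[xb] + b[max(xb - 1, 0)])
        hi = 0.5 * (b[xb] + b[min(xb + 1, len(b) - 1)])
        bmin = min(lo, hi, b[xb])
        bmax = max(lo, hi, b[xb])
        return max(0.0, a[xa] - bmax, bmin - a[xa])
    return min(one_sided(a, b, xa, xb), one_sided(b, a, xb, xa))

With this, C_data(p, d) for p = (x, y) would be bt_dissimilarity(I0[y], I1[y], x, x + d).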

3.2.2. Smoothness cost

To generate a realistic disparity map, the surface height is also assumed to vary smoothly. Differences in neighboring pixels' disparities are penalized with smoothness costs


C_{smoothness}(d) = \sum_{p} \sum_{p' \in N_p} \rho(d_{p'} - d_p),    (2)

where N_p is the four-pixel neighborhood of pixel p and ρ(·) is usually a monotonically increasing function. To preserve discontinuities at object boundaries, this work adopts the robust regularization framework outlined in [13] and chooses ρ(·) to be a truncated quadratic function.

3.2.3. Energy-minimization

Global optimization [13] integrates the above constraints in an energy-minimization framework. The energy functional to be minimized is generalized as follows:

E(d) = C_{data}(d) + \lambda_s C_{smoothness}(d).    (3)

The data term C_data measures how well the disparity function d(p) agrees with the input image pair, while the smoothness term C_smoothness encodes the surface smoothness assumption.
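A short sketch of the smoothness term of eq. (2) with a truncated quadratic ρ; the truncation value is illustrative, not a value from the paper:

import numpy as np

def rho(t, trunc=4.0):
    # Truncated quadratic: quadratic near zero, constant beyond the
    # truncation point, so object-boundary jumps are not over-penalized.
    return np.minimum(t.astype(float) ** 2, trunc)

def smoothness_cost(d):
    # Eq. (2) over an H x W disparity map; each 4-neighbor pair is
    # counted once here (the paper's double-sided sum differs only by
    # a constant factor).
    return rho(np.diff(d, axis=0)).sum() + rho(np.diff(d, axis=1)).sum()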

3.3. Photo Hull regularization

Under the above-mentioned global energy-minimization framework, we integrate the Photo Hull by introducing an additional data term, called the photo-consistency cost C_photo(d):

C_{photo}(d) = \sum_{p} r(d_p, d_{pred}).    (4)

The photo-consistency cost C_photo(d) carries information from the Photo Hull and acts as one more important constraint for disparity computation. The function r(·) is designed to be a monotonically increasing function of the difference between the assigned disparity d and the predicted disparity d_pred. A typical choice of r would be a quadratic function. Disparities within a certain distance of the predicted disparity are considered acceptable; hence, very small costs are introduced. In practice, the Photo Hull contains stray voxels in empty space and missing voxels on the object surface, which cause outliers in the predicted disparity. To obtain a regularization term robust to noise, r is chosen to be a truncated quadratic function. For pixels occluded in the matching image, no predicted disparities are available; in this case, a medium cost value is applied, leaving those pixels unbiased by the Photo Hull. The global energy functional is obtained by adding together the data matching cost, the smoothness cost and the photo-consistency cost as follows:

E(d) = C_{data}(d) + \lambda_s C_{smoothness}(d) + \lambda_p C_{photo}(d).    (5)
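Putting eqs. (4) and (5) together, a sketch of the regularizer and the total energy. It reuses smoothness_cost from the sketch after eq. (3); the truncation and occlusion-cost values are illustrative assumptions, while λ_s = 20 and λ_p = 2 are the values the paper reports using in the experiments:

import numpy as np

def photo_cost(d, d_pred, trunc=9.0, occluded_cost=2.0):
    # Eq. (4) with a truncated-quadratic r. Pixels with no prediction
    # (occluded in the matching view, NaN in d_pred) receive a fixed
    # medium cost so the Photo Hull does not bias them.
    r = np.minimum((d - np.nan_to_num(d_pred)) ** 2, trunc)
    return np.where(np.isnan(d_pred), occluded_cost, r).sum()

def total_energy(d, d_pred, data_cost, lam_s=20.0, lam_p=2.0):
    # Eq. (5); `data_cost` is assumed to return C_data(d) for the
    # disparity map d, and smoothness_cost is the function sketched
    # after eq. (3).
    return (data_cost(d)
            + lam_s * smoothness_cost(d)
            + lam_p * photo_cost(d, d_pred))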

4. Dynamic programming implementation

We propose a dynamic programming algorithm that searches for the optimal disparity over an object's surface in two stages. The first stage identifies the disparity of the most confident surface points. The second stage computes the disparity of the rest of the surface.

4.1. Dynamic programming over foreground

4.1.1. Dynamic programming

During cumulative cost aggregation, we search for the best path and cumulative cost C_cum(x, d) that leads to (x, d). An array T stores the best path leading to the current point. The conventional method is to impose uniqueness and ordering constraints on the path search. However, this usually blurs the disparity map and causes a loss of detail. Similar to [9], we ignore the

ordering constraint at the first stage and allow the path to disparity d of pixel p to originate from any disparity of pixel p − 1. This idea is illustrated in Fig. 1. The corresponding algorithm for searching for the optimal path is shown in Algorithm 1. Once the final pixel X is reached, the path to the optimal match can be traced back from the array T.

4.1.2. Matching over foreground

Unlike most stereo algorithms, we ran our experiments on image pairs that have changes in the scene – namely the background. Adapted to this situation, our dynamic programming stereo algorithm only matches pixels in or near the foreground area of the images. A foreground segmentation process makes it possible to search over a reduced range. Projecting the Photo Hull onto the target image, its footprint delineates the object's approximate foreground. The outermost region of this footprint is defined as the boundary of the search range. Due to the characteristics of the Photo Hull, although this foreground is not an accurate segmentation, it is guaranteed to contain the true foreground.
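A sketch of this foreground restriction, assuming the footprint is read off the predicted-disparity map of the earlier sketch (the `occluded` mask and the dilation margin are illustrative assumptions):

import numpy as np
from scipy import ndimage

def foreground_mask(d_pred, occluded, margin=5):
    # Footprint of the Photo Hull in the target image: pixels with a
    # predicted disparity plus pixels the hull marks as occluded.
    footprint = ~np.isnan(d_pred) | occluded
    # Dilate so the search range safely contains the true foreground.
    return ndimage.binary_dilation(footprint, iterations=margin)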

Algorithm 1. Optimal path search. In the pseudo-code, C_cum(x, d) is the best cumulative matching cost up to pixel x for disparity assignment d; c_s(x, d) is the smoothness cost at (x, d); c(x, d) is the matching cost of pixel x for disparity d; T(x, d) stores the trace of the best path, i.e. the best disparity at the previous pixel (x − 1) if (x, d) were on the best path.

for x = 1 to X {
    for d = 1 to D {
        C1 = min_{d' = 1..D} C_cum(x − 1, d') + c_s(x, d) + c(x, d)
        C2 = C_cum(x − 1, d) + c(x, d)
        C_cum(x, d) = min(C1, C2)
        if (C_cum(x, d) == C1)
            T(x, d) = argmin_{d'} C_cum(x − 1, d')
        else
            T(x, d) = d
    }
}
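For reference, a direct Python transcription of Algorithm 1 (a sketch, not the authors' implementation); c and cs are X x D arrays of matching and smoothness costs:

import numpy as np

def dp_scanline(c, cs):
    # c[x, d]: matching cost; cs[x, d]: smoothness cost of a disparity
    # change at (x, d). Returns the minimum-cost disparity path.
    X, D = c.shape
    C = np.full((X, D), np.inf)     # cumulative cost C_cum
    T = np.zeros((X, D), dtype=int) # back-pointers
    C[0] = c[0]
    for x in range(1, X):
        best = int(np.argmin(C[x - 1]))   # argmin over all d'
        for d in range(D):
            C1 = C[x - 1, best] + cs[x, d] + c[x, d]  # jump from any d'
            C2 = C[x - 1, d] + c[x, d]                # keep disparity d
            if C1 < C2:
                C[x, d], T[x, d] = C1, best
            else:
                C[x, d], T[x, d] = C2, d
    # Trace the optimal path back from the final pixel using T.
    d = int(np.argmin(C[-1]))
    path = [d]
    for x in range(X - 1, 0, -1):
        d = T[x, d]
        path.append(d)
    return path[::-1]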

4.2. Confident matches

In computing confident matches, two measures are used to minimize outliers: a reliability measure [9] and pixel similarity.



Fig. 1. Cumulative cost aggregation. The cumulative cost C_cum(x + 1, d) of pixel x + 1 having disparity d is updated from the previously calculated C_cum, according to the dynamic programming updating rule. Dashed squares indicate which directions the path might originate from. Note that in (a) (our approach), all disparities at pixel x are possible, whereas in (b) (the conventional approach), only transitions from three disparities are considered.


4.2.1. Reliability

The reliability R(p, d) of match (p, d) is defined as the cost difference between the best path that does not pass through (p, d) and the best path that does pass through (p, d) [9]. From the cumulative costs of the scanline computed in two directions from both ends, the reliability can be calculated efficiently [10]. By definition, R(p, d) is always positive; the higher R(p, d), the more likely the matched pixel has disparity d.

4.2.2. Pixel similarity

The similarity S(p, d) of match (p, d) is a measure of the single-site matching cost. It is defined as the matching cost produced by the image pair plus the regularization term

S(p, d) = C_{data}(p, d) + \lambda_p C_{photo}(p, d).    (6)

Note that it is based on both image dissimilarity and photo-consistency. S(p, d) will put large penalties on the background areas, because they change between images. So, even if they are matched due to inaccuracy in the Photo Hull, they will not be considered confident matches, due to pixel dissimilarity. Only those pixel matches that pass both the reliability and similarity thresholds are kept as confident matches.

4.3. Complete matching

Once confident matches are found in the first pass, a second dynamic programming pass is conducted using the confident matches as GCPs. Since no ordering constraint is enforced, this dynamic programming algorithm is not limited to the scanline direction, which usually produces unwanted streaks in the disparity map. The second dynamic programming pass is oriented in the direction perpendicular to the first one. In this paper, the first pass is along the vertical direction and the second along the horizontal.

5. Experiments

We used the test bed proposed by Scharstein and Szeliski [13] that is available online [14] and compared our method against algorithms not using Photo Hull regularization, including Dynamic Programming with Occlusion Modeling, Scanline Optimization, Graph Cut, and Simulated Annealing. For Graph Cut, we used the algorithm and implementation by Kolmogorov et al. [11]. The plan of this section is as follows. After describing the data set we used, we show our comparative advantage in terms of the accuracy of the disparity map. We then show two key components of our approach: the ground control points obtained after the first DP pass and the effect of the regularization term. Finally, we compare the run time performance of our method against the methods mentioned above.

5.1. Data set

Full testing requires the generation of test sequences in our laboratory with full camera calibration and inter-calibration detail. We evaluated our method on two 3D objects, the Pipe and the Munk. In each case, the object sat on a turntable at the center and a sequence of 12 images was taken. To prevent the background from introducing false photo-consistency, we changed the background color between neighboring images. The images were calibrated manually using the online toolbox [18]. We computed the Photo Hull using the algorithm described in [1]. An image pair from the Pipe sequence and the corresponding Photo Hull are shown in Fig. 2.

Fig. 2. Pipe image pair and the Photo Hull from corresponding views.

Encoding coarse correspondence information, the Photo Hull can readily be used to produce the predicted disparity d_pred. d_pred is produced with respect to the chosen image pair and is used in subsequent stereo matching as the prior disparity. Since the Photo Hull can contain occluded voxels, d_pred will explicitly mark a pixel in the reference image as occluded if its voxel is not visible to the matching image. The predicted disparity of the Pipe is shown in Fig. 3.

Fig. 3. Predicted disparity (Pipe). Black denotes the background; white denotes occluded parts; and gray-scale levels denote the integer predicted disparity values.

5.2. Disparity map analysis

Since consistent improvements are observed on both the Pipe and the Munk, we focus much of our comparative analysis on results obtained on the Pipe (Section 5.2.1) and give the results on the Munk in Section 5.2.2.

5.2.1. Improvement introduced by Photo Hull regularization

Working on the Pipe, we show in Fig. 4 a comparison of disparity maps generated by three methods: Graph Cut optimization, Scanline Optimization, and the proposed method. The proposed method used the Photo Hull as regularizer, while the other two did not. We found by experiment that the approaches under comparison achieve their best results when we used a smoothness cost of 20. For our approach, the regularization coefficient λ_p was set to 2 and the reliability threshold to 30.


Fig. 4. Disparity maps comparison (Pipe).

In addition to good segmentation, a noticeable disparity improvement on the surface was achieved with our approach (Fig. 4(c)). First, the proposed method achieved the most visually detailed disparity map. Taken at a short distance, the input images of the Pipe are more widely separated than narrow baseline stereo images. In addition, the object surfaces are neither highly textured nor Lambertian. Hence, for this data set, accurate matching is a more challenging task. The disparity map produced by Graph Cut is correct only at a coarse scale (see Fig. 4(a)), failing to recover fine depth variations in most parts of the images (such as the head, fingers, and pipe). In comparison, the disparity map produced by our method is much more refined.

Secondly, compared to traditional dynamic programming methods, fewer streaks are found. This is due to two factors: regularization and the two perpendicular dynamic programming passes. The regularization term encodes consistency between scanlines, and this property is inherited by our approach. Running the first dynamic programming pass vertically produces vertically consistent GCPs. These GCPs enforce neighboring scanlines to be consistent in the second, horizontal pass.

Thirdly, fewer unwanted holes and protrusions appear with our method. Holes and protrusions in the disparity map are due to local minima of the optimization. Regularization with the Photo Hull steers the optimization towards the global minimum. With our approach, we can see that the disparity map is more consistent with the object surface. One common shortcoming for all approaches is the specular area on the man's belly. No disparity is found there due to sharp color changes incurred by reflection, and improving this result is a focus of future research.

Lastly, our algorithm outperforms traditional stereo on occluded regions. Occlusions are very difficult for stereo algorithms to recover. In contrast to a typical stereo data set, a large portion of our image pair is occluded, which makes matching more difficult. Areas near the left edge of the foreground are occluded, but they are assigned more realistic disparities by our method. We believe the occlusions encoded in the regularization term improve the resulting disparity map.

5.2.2. Disparity map on Munk

We also tested the algorithm on the Munk image set. The image pair and corresponding Photo Hull are shown in Fig. 5 and the predicted disparity is shown in Fig. 6. Disparity maps obtained by the same three competing methods are shown in Fig. 7. Similar to the Pipe, stereo matching on the Munk is made difficult by several factors: the Munk surfaces are not highly textured, lighting changes occur on many parts of the Munk, and both sides of the Munk are occluded. All of the methods have difficulty with the occluded parts. On the visible part, Graph Cut produced blobs of surface with constant disparities, and neighboring blobs often have sudden jumps in depth.

Fig. 5. Munk image pair and the Photo Hull from corresponding views.

Fig. 6. Predicted disparity (Munk). Black denotes the background; white denotes occluded parts; and gray-scale levels denote the integer predicted disparity values.

This is a compromise of energies coming from the pixel dissimilarity penalty and the disparity smoothness constraint: neighboring pixels are given the same disparity to minimize the smoothness cost in homogeneous regions, and a sudden disparity jump occurs when the pixel dissimilarity cost calls for the adjustment. In contrast, the proposed approach was able to recover continuously varying disparity thanks to the energy introduced by the Photo Hull regularization. At the same time, our disparity map also has the advantages of fewer streaks and fewer unwanted holes and protrusions compared to the other two.


Fig. 7. Disparity maps comparison (Munk).

5.3. Ground control points

The two-pass dynamic programming implementation provides robustness and inter-scanline consistency. Taking the Pipe as an example, we show in Fig. 8 the GCPs obtained from the first DP pass. Notice that the GCPs span across horizontal scanlines, which enforces inter-scanline consistency.
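As an illustration of the selection step of Section 4.2, a short sketch; the threshold values are illustrative (the paper reports a reliability threshold of 30):

import numpy as np

def select_gcps(R, S, tau_r=30.0, tau_s=1.0):
    # Keep as GCPs only matches that are both reliable (R high) and
    # similar (S low); R and S are per-pixel maps evaluated at the
    # disparity chosen by the first DP pass.
    return (R >= tau_r) & (S <= tau_s)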

Fig. 8. Ground control points obtained after the first dynamic programming pass.

5.4. Effect of regularization coefficient

Results (on the Pipe) from varying the regularization coefficient λ_p are presented in Fig. 9. With λ_p larger than 2, the disparity map degrades as λ_p increases; in the extreme, it can be expected to degrade to the predicted disparity. From this perspective, the stereo images post-process the Photo Hull by regularization: outliers in the Photo Hull are remedied by stereo matching.

Fig. 9. Effect of Photo Hull regularization. Disparity maps generated with varying regularization weight λ_p.

5.5. Run time performance

Table 1 summarizes the run time performance of our algorithm on the Pipe in comparison with the other approaches. Test results were obtained on a Pentium IV 3.2 GHz PC. Note that the relatively long running times are due to the large image size (640 × 480) and the maximum disparity setting (255). For regularized stereo, the Photo Hull calculation is not part of the stereo algorithm, so its running time is not included. Dynamic programming with occlusion modeling is the most efficient global minimization algorithm and completes in 140 s. Regularized stereo runs two passes of dynamic programming in 143 s. Its efficiency is comparable to the state of the art (excluding algorithms using graphics hardware).

Table 1. Run time performance (640 × 480 RGB image, max disparity = 255).

Algorithms                      Time (s)
Regularized Stereo              143.4
DP with Occlusion Modeling      140.4
Graph Cut                       4948
Scanline Optimization           301.3
Simulated Annealing             162,300


6. Conclusions

In conclusion, we have presented an efficient method for stereo that uses the Photo Hull as an additional matching constraint. As expected, the resulting disparity map satisfactorily fuses the advantages of stereo matching and the Photo Hull. It captures more detailed depth variation than that obtained by either Space Carving or state-of-the-art stereo algorithms. Global stereo algorithms perform well on piecewise planar scenes, but they often fail to recover fine depth variations (as shown in Figs. 4 and 6). As pointed out in Section 5.2, the Pipe data set contains some spots that are difficult to reconstruct, such as those corresponding to shadows, the specular belly, and large areas of occlusion. An improved result could be achieved with some post-processing, for example disparity interpolation over the specular regions. Furthermore, we could also improve the disparity map by incorporating more



information from the Photo Hull, in which information such as occlusion is explicitly represented. We are investigating these topics as part of our current research, as well as performing more detailed quantitative analyses of the improvements afforded by this approach.

A final point of discussion concerns the practicality of the Photo Hull regularization approach in the context of general purpose stereo algorithms. Indeed, the logistical burden of a multi-camera setup, not to mention the computational overhead of a space carving algorithm, would appear to come at a high cost relative to the improvements gained. However, with the increasing use of multi-camera setups, particularly in surveillance applications, volumetric stereo is an attractive sensing modality. In addition, computational considerations are increasingly offset with each new generation of hardware (e.g. GPGPU). The fusion of volumetric and surface-based approaches would seem to be a logical progression with clear advantages.

Acknowledgements

The authors would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada under Grants RGPIN 36560-06 and STPGP-270194-03, and the GEOIDE Network of Centers of Excellence under Grant TDMDFM35. We also wish to thank the anonymous reviewers for their helpful comments in improving this paper.

References

[1] Z. Anwar, Towards robust voxel-coloring: handling camera calibration errors, partial emptiness of surface voxels, Master's thesis, Department of Electrical and Computer Engineering, McGill University, August 2005.
[2] S.T. Barnard, A stochastic approach to stereo vision, Readings in Computer Vision: Issues, Problems, Principles, and Paradigms (1987) 21–25.

[3] S. Birchfield, C. Tomasi, A pixel dissimilarity measure that is insensitive to image sampling, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (4) (1998) 401–406.
[4] A.F. Bobick, S.S. Intille, Large occlusion stereo, International Journal of Computer Vision 33 (3) (1999) 181–200.
[5] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (11) (2001) 1222–1239.
[6] I.J. Cox, S.L. Hingorani, S.B. Rao, A maximum likelihood stereo algorithm, Computer Vision and Image Understanding 63 (3) (1996) 542–567.
[7] W.B. Culbertson, T. Malzbender, G.G. Slabaugh, Generalized voxel coloring, in: Proceedings of the ICCV Workshop on Vision Algorithms: Theory and Practice, 1999, pp. 100–115.
[8] M.-A. Drouin, M. Trudeau, S. Roy, Geo-consistency for wide multi-camera stereo, in: Proceedings of Computer Vision and Pattern Recognition, 2005, pp. 351–358.
[9] M. Gong, Y.-H. Yang, Fast stereo matching using reliability-based dynamic programming and consistency constraints, in: International Journal of Computer Vision, 2003, pp. 610–617.
[10] M. Gong, Y.-H. Yang, Fast unambiguous stereo matching using reliability-based dynamic programming, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (6) (2005) 998–1003.
[11] V. Kolmogorov, R. Zabih, Multi-camera scene reconstruction via graph cuts, in: Proceedings of the European Conference on Computer Vision, 2002.
[12] K.N. Kutulakos, S.M. Seitz, A theory of shape by space carving, International Journal of Computer Vision (1999) 307–314.
[13] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision 47 (1–3) (2002) 7–42.
[14] D. Scharstein, R. Szeliski, Middlebury stereo vision page, available from http://vision.middlebury.edu/stereo/, Retrieved 2006.
[15] S.M. Seitz, B. Curless, J. Diebel, D. Scharstein, R. Szeliski, A comparison and evaluation of multi-view stereo reconstruction algorithms, in: Proceedings of Computer Vision and Pattern Recognition, 2006, pp. 519–528.
[16] S.M. Seitz, C.R. Dyer, Photorealistic scene reconstruction by voxel coloring, in: Proceedings of Computer Vision and Pattern Recognition, 1997, pp. 1067–1073.
[17] G.G. Slabaugh, W.B. Culbertson, T. Malzbender, R.W. Schafer, A survey of methods for volumetric scene reconstruction from photographs, Volume Graphics, 2001.
[18] K. Strobl, W. Sepp, Camera Calibration Toolbox for Matlab, available from http://www.vision.caltech.edu/bouguetj/calib_doc/, Retrieved 2006.