Optics and Lasers in Engineering 72 (2015) 26–38
One-shot 3D gradient field scanning
J. Matías Di Martino*, Alicia Fernández, José A. Ferrari
Facultad de Ingeniería – Universidad de la República, J. Herrera y Reissig 565, 11300 Montevideo, Uruguay
Article history: Received 13 February 2015; Received in revised form 10 March 2015; Accepted 1 April 2015; Available online 17 April 2015.

Keywords: Differential 3D; Depth retrieval; Fringe projection

Abstract

Three-dimensional (3D) shape profiling is an important and practical problem that is widely studied in academia and applied in industrial fields. In the present work we propose a 3D scanning technique based on the projection of combined orthogonal fringes. It allows us to compute depth gradient maps in a fast and efficient manner by measuring the local bending of the projected fringes. We extend the ideas presented in our previous work (Di Martino et al. (2014) [1]), obtaining a more general analytical description and evaluating in depth the different parameters and steps involved in the proposed approach. In addition, extensive validation experiments are presented which show the potential and limitations of the proposed framework.
1. Introduction

Scene understanding is one of the most challenging and popular problems in the field of computer vision. Robots and computers need to extract three-dimensional (3D) information of a scene in order to understand the environment and be able to perform tasks that for humans may be trivial. To make this possible, a wide variety of methods for 3D scene reconstruction have been proposed. In a first rough classification, 3D scanning techniques can be divided into contact and non-contact methods. Contact-based methods generally provide very accurate results; nevertheless, as they involve mechanical movement of a probe device from one point to another, they share some drawbacks: data acquisition can be very time consuming, they are limited to static and rigid scenes, they are sensitive to vibrations and temperature variations, and they need to touch the objects. One example of such methods is the Coordinate Measuring Machine (CMM) [2]. On the other hand, non-contact methods have become popular in the last decades and they are nowadays the most widely used methods for 3D shape reconstruction. Many commercial devices are in use [3,4]; for example, in the automobile industry, the ATOS scanner of Gesellschaft für Optische Messtechnik [5]. There are different kinds of non-contact methods; we will focus on optical methods, which can be divided into three categories: time-delay based, image-cue based and triangulation based. The first class of methods operates using some kind of transceiver device [6] that emits a signal which is received after being reflected by the test surface;
* Corresponding author. Tel.: +598 2711 5444. E-mail addresses: matiasdm@fing.edu.uy (J.M. Di Martino), alicia@fing.edu.uy (A. Fernández), jferrari@fing.edu.uy (J.A. Ferrari). http://dx.doi.org/10.1016/j.optlaseng.2015.04.001
then, the delay or any distortion of the reflected signal is measured and related to the depth of the scene [7]. The second class of methods uses cues in the images which, under certain hypotheses, allow the extraction of three-dimensional information; examples of such methods are depth from focus and defocus (DFF/DFD) methods [8–10]. Finally, the third class of methods uses – at least – two elements in order to obtain geometrical information of the scene by means of triangulation [4,5,11,12]. Some methods use two or more cameras (multi-view stereo [13]), while others use one camera and controlled illumination sources, e.g. structured light [14,15,1], photometric stereo [16], phase shifting [17,18,11,19,20] and laser scanning [4]. Methods that control or set the illumination conditions are called active, in contrast with passive methods, in which just the relative position between the camera(s) and the objects is considered. Despite the differences between active and passive approaches, they are based on the same geometrical principles [21,22]. Numerous techniques for surface imaging by structured light are currently available. We can classify existing methods into two categories, single-shot and multi-shot, as illustrated in Fig. 1 (see for example [23–25] and references therein). When the target object is static and the application does not impose restrictions on the acquisition time, multiple-shot approaches may be used. However, if the target is a moving scene, single-shot methods must be applied. These methods also have an important advantage: they do not require any synchronization between the projector and the camera(s), which leads to a simpler and cheaper implementation. The present work extends our previous work [1], where we presented a novel 3D scanning technique which can be classified as a single-shot, active approach. In our previous work, we introduced the fundamental aspects and theory of the novel method under certain hypotheses.
Fig. 1. Classification of the most popular structured light frameworks.
Fig. 2. Illustration of the acquisition setup.
In the present work a more general analytical description is presented, which relies on fewer hypotheses and removes some of the previous restrictions. A geometrical interpretation of the proposed technique is also presented, which allows us to understand the Differential 3D (D3D) approach in a simpler and more intuitive way. In addition, we significantly extend the experimental validation and explore the effect of the integration step, different noise sources, the light power of the projected pattern, and the fringes' width. The rest of this work is organized as follows: Section 2 extends and explains the proposed framework, Section 3 presents the experimental analysis, and finally, Section 4 concludes the work.
2. Description of the proposed approach

In this section we will introduce a method for 3D surface retrieval that can be identified as a single-shot, structured light approach. In contrast with most single-shot techniques [23] (e.g. color coded stripes, gray scale coded stripes, De Bruijn sequences, pseudo random binary dots, mini-patterns and color coded grids), the proposed method does not require finding correspondences or matching steps. As was demonstrated in [1], the pattern deformation D and the surface height h are related by

h(x, y) \approx \frac{H^2}{B f} D(x, y)    (1)

where f is the focal length, H is the distance to the reference plane and B is the distance between the camera's and the projector's centers. We present a differential 3D shape profiling method that is based on the combination of orthogonal fringe projection. It allows us to compute depth gradient maps in a fast and efficient manner. The procedure we propose consists in measuring the partial derivatives of the disparity D (i.e. the pattern shift), and then integrating them with the purpose of retrieving h(x, y) using Eq. (1).

Let us assume that we are projecting a rectangular fringe pattern of period p in the x and y directions over a test surface, where I(x, y) is the pattern acquired by the camera, and θ is the angle of the direction in which the projector was translated with respect to the camera (as illustrated in Fig. 2). Different strategies can be followed to combine vertical and horizontal fringes in a unique pattern. One can, for example, code vertical fringes in the blue channel of the projector and horizontal fringes in the red one. Of course, this would limit us to working with a color camera–projector pair. Another option is to combine both fringes in a unique – gray – pattern by summing or multiplying them [26]. In that case, if D(x, y) has a narrow spectral band, i.e. D_x, D_y ≪ 1 (where D_x and D_y denote the partial derivatives with respect to x and y respectively), one can recover the vertical and horizontal components by filtering in the Fourier domain. Therefore, it does not matter if the projected fringes are binary, sinusoidal¹ or coded in different colors; without loss of generality, by a filtering process one obtains

I_h(x, y) = I_0(x, y) \cos\left[ \frac{2\pi}{p} \left( y - D(x, y) \sin\theta \right) \right]    (2)

and

I_v(x, y) = I_0(x, y) \cos\left[ \frac{2\pi}{p} \left( x - D(x, y) \cos\theta \right) \right]    (3)

where I_0(x, y) is a function of the reflectance of the test surface, and I_h/I_v denote the images with horizontally/vertically deformed fringes respectively.
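The separation of the two components requires only a few FFT operations. Below is a minimal sketch for the summed-fringes case (for multiplied binary fringes the carrier peaks appear at additional locations because of the cross terms); the rectangular masks and the half band-width bw are illustrative choices, not the authors' implementation.

import numpy as np

def split_fringes(I, p):
    """Separate the horizontal and vertical fringe components of a combined
    pattern by masking the carrier peaks in the Fourier domain.
    I : 2-D image containing summed orthogonal fringes.
    p : fringe period in pixels (carrier frequency 1/p)."""
    F = np.fft.fftshift(np.fft.fft2(I))
    rows, cols = I.shape
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]   # frequencies along y (rows)
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]   # frequencies along x (columns)
    f0, bw = 1.0 / p, 0.5 / p                             # carrier and half band-width (illustrative)
    # horizontal fringes: energy concentrated near (fx ~ 0, |fy| ~ f0)
    mask_h = (np.abs(np.abs(fy) - f0) < bw) & (np.abs(fx) < bw)
    # vertical fringes: energy concentrated near (|fx| ~ f0, fy ~ 0)
    mask_v = (np.abs(np.abs(fx) - f0) < bw) & (np.abs(fy) < bw)
    Ih = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask_h)))
    Iv = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask_v)))
    return Ih, Iv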
As usual, we are assuming² that I_0(x, y) is a low-frequency function in comparison with the frequency of the spatial carrier, i.e. ∂I_0/∂i ≪ 2π/p, with i = x, y. Then, by taking partial derivatives with respect to the x and y coordinates, from Eqs. (2) and (3) one obtains

I_{hi}(x, y) \approx -\frac{2\pi}{p} I_0(x, y) \sin\left[ \frac{2\pi}{p} \left( y - D(x, y) \sin\theta \right) \right] \left( y_i - D_i(x, y) \sin\theta \right)    (4)
¹ If we intend to project a sinusoidal pattern we may first follow some calibration steps [27] or some novel projection techniques [20,18,17] which overcome the camera and projector nonlinear response. ² If this hypothesis does not hold, we can remove it by acquiring an additional picture of the scene with just the ambient illumination and using it for normalization.
Fig. 3. Example of simulated data. (b)–(d) Illustrate different ways of combining horizontal and vertical fringes in a single pattern: in (b) vertical and horizontal fringes are multiplied, in (c) fringes are summed, and in (d) fringes are coded in the blue and red channels. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
I_{vi}(x, y) \approx -\frac{2\pi}{p} I_0(x, y) \sin\left[ \frac{2\pi}{p} \left( x - D(x, y) \cos\theta \right) \right] \left( x_i - D_i(x, y) \cos\theta \right)    (5)

where x_i = 1 and y_i = 0 for i = x, and x_i = 0, y_i = 1 for i = y. Hence, it is easy to demonstrate that

\frac{D_x \sin\theta}{1 - D_y \sin\theta} = \frac{I_{hx}(x, y)}{I_{hy}(x, y)}    (6)

and

\frac{D_y \cos\theta}{1 - D_x \cos\theta} = \frac{I_{vy}(x, y)}{I_{vx}(x, y)}.

Finally, one obtains

D_x = \frac{ \frac{I_{hx}}{I_{hy}} \left( 1 + \tan\theta \, \frac{I_{vy}}{I_{vx}} \right) }{ \sin\theta \left( 1 - \frac{I_{hx}}{I_{hy}} \frac{I_{vy}}{I_{vx}} \right) }    (7)

and

D_y = \frac{ \frac{I_{vy}}{I_{vx}} \left( 1 + \frac{1}{\tan\theta} \, \frac{I_{hx}}{I_{hy}} \right) }{ \cos\theta \left( 1 - \frac{I_{hx}}{I_{hy}} \frac{I_{vy}}{I_{vx}} \right) }.    (8)
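Eqs. (7) and (8) translate directly into a few array operations. The sketch below is a transcription of the expressions as reconstructed above; the derivative operator (np.gradient) and the small constant eps that guards the divisions are our own illustrative choices, not part of the original method.

import numpy as np

def disparity_gradient(Ih, Iv, theta, eps=1e-6):
    """Evaluate Eqs. (7) and (8): the disparity gradient (Dx, Dy) from the
    partial derivatives of the horizontal- and vertical-fringe images."""
    Ihy, Ihx = np.gradient(Ih)          # np.gradient returns (d/dy, d/dx) for row/column axes
    Ivy, Ivx = np.gradient(Iv)
    a = Ihx / (Ihy + eps)               # I_hx / I_hy
    b = Ivy / (Ivx + eps)               # I_vy / I_vx
    den = 1.0 - a * b
    Dx = a * (1.0 + np.tan(theta) * b) / (np.sin(theta) * den + eps)
    Dy = b * (1.0 + a / np.tan(theta)) / (np.cos(theta) * den + eps)
    return Dx, Dy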
We conclude that the gradient of the disparity D can be calculated in a simple manner as a function of the derivatives of the I_h and I_v components of the image I(x, y) acquired by the camera. After obtaining the partial derivatives ∂D/∂x and ∂D/∂y, we must integrate them to obtain D(x, y) (≈ h B f / H²). For this we can follow any of the integration approaches described in the previous literature [28,29]; we summarize some of them in Appendix A.

2.1. Geometrical interpretation

Fig. 4. Geometrical interpretation.

Complementing the analytical description performed previously, we will now address a geometrical deduction of Eq. (8). This intuitive interpretation of the method helps the understanding of the proposed approach, and it allows us to detect some aspects that we may take into account to improve the method's performance. Let us consider a synthetic example; Fig. 3(a) shows a test surface with height z(x, y) that contains a ramp and some Gaussian peaks. If we project the patterns shown in Fig. 3(b)–(d) over the test surface, we would obtain deformed patterns as simulated in Fig. 3(e)–(g). In this set of simulations, we supposed that the projector was translated the same distance along the x and y coordinates (i.e. θ = π/4), and the constant K = Bf/H² (Eq. (1)) was arbitrarily set as K = 1/5. In a practical problem the value of K may be estimated, e.g. by obtaining the disparity map of a surface with known dimensions. As we can see, the projected patterns are deformed when they are projected over a surface, according to its height. For instance, let us
assume that we are analyzing just the deformation of the horizontal fringes. As has been demonstrated [1], each pixel of the original pattern will be shifted a distance proportional to the surface height. Fig. 4 illustrates the shift of some point (x_1, y_1) compared to its neighbor (x_2, y_2). We can relate the image gradient and the disparity partial derivative by

\tan\alpha = \frac{I_{hx}}{I_{hy}} = \frac{\left( D(x_2, y_2) - D(x_1, y_1) \right) \sin\theta}{x_2 - x_1}.    (9)

If we define the vector \vec{u} = (x_2 - x_1, y_2 - y_1), the previous expression can be rewritten as (where we divide and multiply by |\vec{u}|):

\tan\alpha = \frac{I_{hx}}{I_{hy}} = \left. \frac{D(\vec{x} + \vec{u}) - D(\vec{x})}{|\vec{u}|} \right|_{\vec{x} = (x_1, y_1)} \frac{|\vec{u}|}{x_2 - x_1} \, \sin\theta.    (10)

By taking the limit as |\vec{u}| \to 0, we obtain

\tan\alpha = \frac{I_{hx}}{I_{hy}} = \frac{\partial D(x_1, y_1)}{\partial u} \, \frac{1}{\cos\alpha} \, \sin\theta    (11)
which, using the properties of directional derivatives, can be transformed into

\tan\alpha = \frac{I_{hx}}{I_{hy}} = \left( D_x + D_y \tan\alpha \right) \sin\theta    (12)
leading to

\tan\alpha = \frac{I_{hx}}{I_{hy}} = \frac{D_x \sin\theta}{1 - D_y \sin\theta}.    (13)
Notice that this expression matches the one obtained in (6). Analogously, we can relate the deformation of the vertical fringes with the partial derivatives of D, recovering

\frac{I_{vy}}{I_{vx}} = \frac{D_y \cos\theta}{1 - D_x \cos\theta}.    (14)
Fig. 5. (a) Simulated image with horizontal fringes (obtained from the red channel of the color image illustrated in Fig. 3(g)). (b) A zoomed region of the image, where we overlap in red the vectors (D_x sin θ, 1 − D_y sin θ) (as illustrated in Fig. 4) and in blue the gradient of the image ∇I. (c) Absolute value of the angular error (measured in degrees) between the angle α obtained from the image gradients and the value obtained from the surface depth variations. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
Fig. 6. (a) Smoothed version of Fig. 5(a), obtained by convolving the binary fringes with a Gaussian kernel with standard deviation σ = 0.8 px. (b) A zoomed region of the image, where we overlap in red the vectors (D_x sin θ, 1 − D_y sin θ) (as illustrated in Fig. 4) and in blue the gradient of the image ∇I. (c) Absolute value of the angular error (measured in degrees) between the value of the angle α obtained from the image gradients and the one obtained from the surface depth variations. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
Fig. 7. As in Figs. 5 and 6, we illustrate the simulated image with horizontal fringes, the image gradients compared with the orientation of the depth level sets, and finally the angular errors. In this image we zoom into two areas where errors may be expected: in the first row, an area where a discontinuity in the depth map occurs; in the second row, a region with a narrow peak.
Fig. 8. (a) Original test surface. (b) Ground truth x-partial derivative. (c) Ground truth y-partial derivative. (d) x-partial derivative estimated following the D3D method. (e) y-partial derivative estimated following the D3D method. (f) Surface obtained by integrating the gradient field illustrated in (d)–(e) following the Least Squares (LS) approach (see Appendix A). (g) Surface obtained following the Frankot–Chellappa (FC) integration approach. (h) Surface obtained following the M-estimator (M) integration approach. (i) Surface obtained following the LS integration approach including binary weights. (j) Surface obtained following the Anisotropic Diffusion (AD) integration approach. (k) Surface obtained following the L1-norm minimization (L1) integration approach.
Table 1
Comparison of some integration techniques for the Ramp-Peaks experiment. The root mean squared error is displayed as a percentage of the surface amplitude, and the execution time of the algorithms is normalized with respect to the fastest one.

Method       LS     FC     M      WLS    AD     L1
RMSE (%)     19.1   19.8   19.1   4.3    19.0   20.8
Rel. time    2.3    1.0    16.5   2.6    65.8   50.8
Expressions (13) and (14) allow us to directly relate the variations of the surface depth to the local bending of the fringes. This geometrical understanding of the method is also very useful to intuitively interpret some implementation details and properties of the presented technique. Since the depth partial derivatives are computed from the gradient's direction, large errors may occur in areas with very weak intensity variations (illustrated with green dotted ellipses in Fig. 5(b)). In such areas, as the gradient is very weak, its direction becomes noisy and the estimation of the surface depth variations inaccurate. To mitigate this problem we may smooth the fringe transitions; by doing so, we increase the number of points that provide a reliable gradient estimation. Fig. 6 shows the smoothed input image, its gradients and the new errors. The second main source of error in the gradient estimation is discontinuities of the test surface, as illustrated in the first example of Fig. 7. When there is a discontinuity in the surface, the projected pattern suffers a discontinuous deformation in which the image gradients are no longer orthogonal to the depth partial derivatives. This kind of error may also be present in continuous but very abrupt variations of the test surface, as illustrated in the second example shown in Fig. 7. There, a narrow peak produces the merging of two different fringes, causing errors in the gradient estimation. Finally, we summarize other common sources of error that may be present and give some suggestions to minimize their effects.
– Low gradients in the interior regions of the fringes. We can slightly smooth the acquired images; in addition, we can also use the modulus of the image gradient to estimate a quality map for the estimated depth gradients (a minimal sketch is given right after this list).
– Surface discontinuities or extreme height variations. This is the most challenging source of error. If we can estimate the position of these points, we can integrate the gradient field through the smooth regions, avoiding discontinuities.
– Surface reflectance and ambient illumination. If the surface reflectance varies slowly (compared with the fringe variations) this should not be a problem; if not, one can take an additional image (with just the ambient illumination) and compensate both the surface reflectance and the ambient illumination.
– Projector or camera deformation. We are assuming that the projector and the camera do not introduce significant geometrical distortion. If they do, we can calibrate [30,31] both camera and projector and then process the rectified images.
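As an illustration of the first suggestion, the sketch below slightly smooths the fringe images and derives a binary quality mask from the gradient modulus; the values of sigma and q_thresh are hypothetical, chosen only for illustration, and are not the settings used in the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_quality_map(Ih, Iv, sigma=0.8, q_thresh=0.05):
    """Smooth the fringe images and flag the pixels whose gradient modulus is
    large enough to give a reliable orientation estimate."""
    Ih_s = gaussian_filter(Ih, sigma)
    Iv_s = gaussian_filter(Iv, sigma)
    gh = np.hypot(*np.gradient(Ih_s))                  # |grad Ih|
    gv = np.hypot(*np.gradient(Iv_s))                  # |grad Iv|
    quality = np.minimum(gh / (gh.max() + 1e-12), gv / (gv.max() + 1e-12))
    return Ih_s, Iv_s, quality > q_thresh              # smoothed images and binary quality mask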
3. Experimental results

In this section we will experimentally analyze the main features of the proposed technique (which from now on we will call D3D, an acronym for Differential 3D). In particular we will address the following questions: How can we handle the different sources of error described previously? When we want to retrieve a test surface from its gradients, what is the impact of the integration step? In addition, we will compare the accuracy of the proposed technique with some popular methods for 3D shape retrieval, e.g. the projection of binary coded stripes (from now on BCS), Takeda's approach [21] (from now on Tak) and a three-step phase shifting approach (from now on PS). After this, we will explore the performance of the proposed technique when we acquire under uncontrolled ambient illumination and when we vary the fringes' width. Finally, we will test the proposed method by scanning a person's face while it is moving, which will show the potential of the proposed technique to analyze dynamic and complex scenes.
Fig. 9. Root mean squared error (relative to the surface amplitude) for some fast algorithms (LS, WLS, FC), varying the width of the median filter used to preprocess the input gradient field. The graphic shows the mean value and the individual RMSE obtained for each of 30 faces of the Texas Database.
Table 2
Comparison of some integration techniques for one face of the Texas Database. The root mean squared error is displayed as a percentage of the surface amplitude; the execution time corresponds to an experiment performed in MATLAB on a machine with an Intel Core i5 M 430 CPU at 2.27 GHz × 4; the size of the discrete input gradients (p and q) was 750 × 500 pixels.

Method       LS     FC     M      WLS    AD            L1
RMSE (%)     3.7    8.5    3.4    2.4    3.2           0.81
Ex. time     12 s   2 s    24 s   6 s    5 min 32 s    5 min 23 s
Fig. 10. Example of a face surface of the Texas Database reconstructed by different integration approaches. The retrieved gradient field was preprocessed with a 4 × 4 median filter.
3.1. Proposed method analysis

As was mentioned previously, due to empirical errors and surface discontinuities, after measuring the fringes' deformation one obtains a noisy
– in general non-integrable³ – gradient field. In order to retrieve the target surface, we must follow some robust integration approach. [Agrawal's thesis [32] is an excellent survey of robust integration approaches; in Appendix A we briefly describe the integration methods used in the present work.] In a first experiment, we analyze the Ramp-Peaks example. Fig. 8(a) shows the ground truth surface, and sub-figures (b) and (c) show the true x and y partial derivatives of the test surface respectively. Sub-figures (d) and (e) show the x and y partial derivatives estimated by following the proposed approach. As we can see, sparse errors (due to low gradients) and errors due to surface discontinuities are present in the retrieved gradient field. The first kind of error does not represent a hard problem as they are sparse; we found that a nonlinear median-like filter is enough to remove them and achieve accurate reconstructions. On the other hand, the errors due to surface discontinuities do represent a more challenging problem, as they lead to large and connected regions of outliers. Because of that, even robust integration approaches tend to miss surface discontinuities; see e.g. the results illustrated in Fig. 8, where we show the surfaces obtained by integrating with (see Appendix A):
³ I.e., the irrotational constraint does not hold.
(f) the Least Squares (LS) approach, (g) Frankot–Chellappa (FC), (h) the M-estimator (M) integration approach, (j) Anisotropic Diffusion (AD) integration and (k) L1-norm minimization (L1). To handle errors due to surface discontinuities we propose to detect these points and assign zero weight to them during the integration process. For example, in the present experiment we estimate a discontinuity map by comparing, on the estimated gradient field, each pixel with its neighborhood; if the difference is above a certain threshold, we set the integration weight to zero. [We refer to this method as WLS, an acronym for Weighted Least Squares.] In addition, we found that the discontinuity map can be improved if we apply dilation and erosion morphological operations (as discontinuity areas tend to be connected segments of several pixels). Fig. 8(i) shows the result obtained following the previously described approach. As we can see, if we consider the proposed weights we can improve the reconstruction precision, especially for surfaces that have large discontinuities. Table 1 summarizes some numerical results.
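A possible implementation of this preprocessing and of the discontinuity weights is sketched below; the filter sizes, the threshold and the number of morphological iterations are illustrative guesses, not the values used in our experiments.

import numpy as np
from scipy.ndimage import median_filter, binary_dilation, binary_erosion

def integration_weights(Dx, Dy, jump_thresh=1.0, med_size=4):
    """Median-filter the estimated gradient field (removing sparse outliers)
    and build the binary WLS weights: pixels whose gradient deviates strongly
    from its neighbourhood are treated as discontinuities (zero weight)."""
    Dx_f = median_filter(Dx, size=med_size)
    Dy_f = median_filter(Dy, size=med_size)
    jump = np.maximum(np.abs(Dx_f - median_filter(Dx_f, size=3)),
                      np.abs(Dy_f - median_filter(Dy_f, size=3)))
    disc = jump > jump_thresh
    # dilation followed by erosion keeps discontinuity areas as connected segments
    disc = binary_erosion(binary_dilation(disc, iterations=2), iterations=1)
    w = np.where(disc, 0.0, 1.0)
    return Dx_f, Dy_f, w

The filtered gradients and the binary map w can then be passed to a weighted integrator such as the sparse least-squares solver sketched in Appendix A.1.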
We can now move a step forward and, in a second series of experiments, analyze more complex and realistic data such as human faces. To that end we considered some examples of the Texas 3D Face Recognition Database [33,34]. Fig. 9 illustrates the relative root mean squared error for different preprocessing filters over thirty human faces. As we can see, a nonlinear preprocessing step (e.g. a median-like filter of size 4 × 4) improves the accuracy of the surface reconstruction while fine details of the faces are kept, as can be seen in Fig. 10. Table 2 summarizes some of the numerical results obtained for the test face displayed in Fig. 10. As can be seen, the proposed 3D gradient approach is capable of retrieving most face details, even if the surface presents discontinuities and noise.

3.2. Comparison with other 3D retrieval techniques

Fig. 11. Results for the Ramp-Peaks example, together with the numerical error and the number of projected patterns each technique requires.
Fig. 12. Illustration of the input images for a test face of the Texas Database.
In this second series of experiments, we repeat some of the previous examples including other 3D scanning techniques such as binary coded stripes (BCS), phase shifting (PS) and Takeda's approach (Tak). Fig. 11 shows the Ramp-Peaks surfaces reconstructed by (a) the proposed approach (D3D), (b) Takeda's approach (Tak), (c) a three-step phase shifting technique (PS) and (d) an 8-bit binary coded stripes approach (BCS). The RMSE is normalized with respect to the surface amplitude, and the number of projected patterns that each method requires is also specified below each image. In a second set of experiments, real data from the Texas Database is considered. Fig. 12 shows the simulated images for each scanning technique after projecting each pattern over a test surface, and Fig. 13 illustrates the results obtained for some arbitrary test faces. While Takeda's and PS approaches perform extremely well for simple examples, in more complex scenarios the unwrapping steps become more challenging and can lead to errors (see e.g. the third and fourth columns of Fig. 13). On the other hand, binary coded stripes methods are very robust and, as the measurements are absolute (each pixel's depth is determined without considering the other ones), this technique is very useful when we have to deal with heavy discontinuities or isolated regions. Of course, the main drawback of the PS and BCS techniques is that they require multiple images to achieve one single reconstruction; hence they are in practice more complicated to implement. For example, synchronization between the camera and the projector is needed, and the scene must be rigid (or quasi-static). The code of our implementation of each algorithm, as well as some implementation concerns (discussed in the comments of the code), can be found at the authors' web page.

3.3. On-field parameter evaluation

Until now, we have identified the most important sources of error and characteristics of the proposed technique. In addition, we have
Fig. 13. Results over some faces of Texas Database.
Fig. 14. Left: some of the acquired images when we projected summed (first row) and multiplied (second row) vertical and horizontal fringes. Right: the RMSE obtained over the gradient fields for different fringe widths.
Fig. 15. (a) Ground truth gradient field, (b) estimated gradient field, (c) error between true and estimated gradient field and (d) surface reconstructed from the estimated gradient field.
Fig. 16. Left: some of the acquired images when we projected summed (first row) and multiplied (second row) vertical and horizontal fringes. Right: the RMSE obtained over the gradient fields for different ratios between the background light and the pattern's light power (the background light intensity was approximately 0.9 μW/mm²).
Fig. 17. Some frames of the input video.
Fig. 18. Raw frames of estimated gradient field.
explored and evaluated the impact of the integration step and compared D3D with other popular 3D scanning techniques. In this new set of experiments we propose to evaluate the impact of varying the fringes' period and the power of the projected light. To that end, we manufactured a hemispherical surface (with a radius of 40 mm) using a lathe, with which we could achieve a precision of the order of 0.1 mm. In a first experiment, we projected several patterns over our test surface varying the fringes' period, as illustrated in Fig. 14. We explored two different strategies for coding the vertical and horizontal fringes, multiplying and summing them. Fig. 15 shows an example of (a) the ground truth gradient field, (b) the estimated gradient field, (c) the error between the true and estimated gradient fields, and (d) the surface reconstructed from the estimated gradient field. For the quantitative analysis we excluded the borders of the sphere, because this region presents discontinuities in the x and y partial derivatives and we have already investigated the impact of that kind of error. In the present set of experiments we want to identify and quantify the effect of new sources of error.
In a second set of experiments, we introduced some arbitrary ambient illumination and varied the power of the projected pattern. The idea was to investigate the impact of uncontrolled light sources that may affect the perceived deformation of the projected pattern. We measured the intensity of the incident ambient illumination, which was approximately 0.9 μW/mm², and we projected patterns with intensities ranging from 0.01 μW/mm² up to 3.1 μW/mm². Fig. 16 shows some of the acquired images when we projected summed (first row) and multiplied (second row) vertical and horizontal fringes while varying the pattern intensity; the right plot shows the RMSE obtained over the gradient fields. From the first experiment, we can see that fringe widths up to 18 px lead⁴ to an error below 10%; for larger fringe widths the error increases significantly. On the other hand, concerning the pattern intensity,
⁴ 18 pixels measured on the camera sensor correspond in this setup to 4.5 mm on the reference plane and 6 pixels on the projector sensor.
Fig. 19. Frames of the retrieved surface.
the experimental results suggest that a pattern with an intensity of the order of 20% of the ambient illumination intensity is enough to achieve a root mean squared error below 10%. This relatively low amount of required intensity may be due to the fact that we are looking at the high spatial frequencies of the acquired image, while ambient illumination tends to concentrate its power in the low spatial frequencies.

3.4. Last experiment: scanning a moving subject

In this last experiment we present the results of projecting the proposed rectangular pattern over a person's face while the subject was speaking and moving. As the proposed technique requires the projection of a unique binary pattern, in order to reconstruct a dynamic scene we just need to acquire a standard video sequence of the test surface while the proposed (binary and static) pattern is being projected. Fig. 17 shows some frames of the acquired video, while Fig. 18 shows the raw gradient field obtained by measuring the fringes' bending as explained previously. Fig. 19 shows the reconstruction results after integrating the estimated gradient field (following a weighted least squares approach). As we can see, the proposed technique is capable of capturing fine details of the 3D shape of the observed surface, even for a moving and complex subject, while keeping very low hardware and computational requirements.
4. Conclusions

In this work a novel gradient approach was extended, which allows us to retrieve the depth gradient map in a very straightforward manner. An analytical description and a geometrical interpretation of the proposed technique were provided, which allow us to identify some practical error sources and limitations of the presented approach. As we have shown, the two main error sources are (a) smooth regions of the image (where the fringes present a local maximum or minimum) and (b) surface discontinuities.
Experiments suggest that the first kind of error can be overcome by preprocessing the input gradient field with a nonlinear filter (e.g. a simple median-like filter). On the other hand, we showed that discontinuities can be handled by estimating a discontinuity map which guides the integration process. In order to obtain the depth map from the retrieved gradient field, it is necessary to integrate it. To this end we can follow different integration approaches, each of them with its own advantages and drawbacks. In the present work we briefly described (in the Appendix) and used some popular integration techniques. Experiments were presented with simple and complex synthetic data, as well as real dynamic data. The presented approach was tested and compared to other common scanning techniques such as binary coded stripes, phase shifting and Takeda's technique. The procedure for 3D shape retrieval described in the present work has various advantages with respect to other fringe projection methods presented in the literature. Firstly, image partial derivatives can be computed in an extremely fast and efficient manner. Secondly, as high frequency fringes are projected, the proposed method is robust under smooth variations of the ambient illumination and/or the surface reflectivity. Thirdly, instead of the disparity itself, we are computing partial derivatives, which are extremely useful to extract 3D features of the scene. This is particularly important in those applications [35,36] in which depth gradients are required, e.g. to obtain invariant SIFT descriptors [37], perform 3D face recognition [38,39,36], gesture detection [40] or object segmentation [41]. Finally, as only one binary pattern is used, no complicated calibration steps are required (such as color calibration or gamma correction), nor synchronization between the projector device and the camera.
Acknowledgments The authors thank PEDECIBA (Uruguay), the Comisión Sectorial de Investigación Científica (CSIC, UdelaR, Uruguay) and Agencia Nacional de Investigación e Innovación (ANII, Uruguay) for their financial support.
Appendix A. Summary of integration approaches

A.1. Least squares approach

Suppose that Z : Ω ⊂ R² → R is the surface obtained by integrating a given gradient field (p, q) : Ω → R² using some algorithm. One can always write

(Z_x, Z_y) = (p, q) + (\epsilon_1, \epsilon_2)    (15)
where the subindices x and y represent the x and y partial derivatives respectively, and (\epsilon_1, \epsilon_2) represents a residual gradient field which is added to the given non-integrable field to make it integrable. The least squares solution is obtained by minimizing the energy functional

E_{LS}[u] = \iint_\Omega \left[ (u_x - p)^2 + (u_y - q)^2 \right] dx\, dy = \iint_\Omega \left( \epsilon_1^2 + \epsilon_2^2 \right) dx\, dy    (16)
and hence Z_{LS} = \arg\min\{E_{LS}[u]\} is the solution which minimizes the squared error. This solution can be seen as the orthogonal projection of the gradient field (p, q) onto the subspace of integrable gradient fields. The Euler–Lagrange equation of (16) gives the Poisson equation

\nabla^2 Z = \mathrm{div}(p, q), \qquad (Z_x - p)\,\nu_1 = 0 \ \text{on } \partial\Omega, \qquad (Z_y - q)\,\nu_2 = 0 \ \text{on } \partial\Omega    (17)

where \nu_i is the i-th component of the outward unit normal on \partial\Omega. We can find a numerical solution of Eq. (17) using finite differences; vectorizing the 2D matrices in lexicographical order, Eq. (16) can be transformed into a linear and sparse equation system A x = b.
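A minimal sketch of this sparse formulation is given below; the optional per-equation weights are our own addition so that the same routine also covers the weighted (WLS) variant used in Section 3, and the explicit loops, kept for clarity, make it too slow for large images.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def integrate_wls(p, q, wx=None, wy=None):
    """(Weighted) least-squares integration of a gradient field. Each
    forward-difference equation Z[i, j+1] - Z[i, j] = p[i, j] (respectively
    Z[i+1, j] - Z[i, j] = q[i, j]) is scaled by the square root of its weight;
    with unit weights this is the plain LS solution of Eq. (16)."""
    H, W = p.shape
    wx = np.ones_like(p) if wx is None else wx
    wy = np.ones_like(q) if wy is None else wy
    idx = np.arange(H * W).reshape(H, W)
    rows, cols, data, b = [], [], [], []
    eq = 0
    for i in range(H):
        for j in range(W):
            if j + 1 < W and wx[i, j] > 0:          # x (column) derivative equation
                s = np.sqrt(wx[i, j])
                rows += [eq, eq]; cols += [idx[i, j + 1], idx[i, j]]
                data += [s, -s]; b.append(s * p[i, j]); eq += 1
            if i + 1 < H and wy[i, j] > 0:          # y (row) derivative equation
                s = np.sqrt(wy[i, j])
                rows += [eq, eq]; cols += [idx[i + 1, j], idx[i, j]]
                data += [s, -s]; b.append(s * q[i, j]); eq += 1
    A = sparse.csr_matrix((data, (rows, cols)), shape=(eq, H * W))
    z = lsqr(A, np.asarray(b))[0]
    return z.reshape(H, W) - z.min()                # the depth is defined up to a constant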
A.2. Frankot–Chellappa approach

Frankot and Chellappa (FC) proposed an integration method that enforces integrability by projecting the input gradient field (p, q) onto the subspace of integrable gradient fields. This method is mathematically equivalent to the minimization procedure described in the previous section, but in this case the minimization is achieved by a projection in the Fourier domain, which can be done in a very efficient manner using the Fast Fourier Transform algorithm and the properties of the Fourier basis. Using the Fourier basis, a surface Z : Ω → R can be represented by

Z(x, y) = \int \mathcal{F}\{Z\}(\omega_x, \omega_y)\, e^{\,j(\omega_x, \omega_y)\cdot(x, y)}\, d\omega_x\, d\omega_y,    (18)

where \mathcal{F}\{Z\} is the Fourier transform of Z, defined as

\mathcal{F}\{Z\}(\omega_x, \omega_y) = \frac{1}{2\pi} \int Z(x, y)\, e^{-j(\omega_x, \omega_y)\cdot(x, y)}\, dx\, dy.    (19)
The derivatives of the Fourier basis functions possess the following useful properties:
\mathcal{F}\left\{ \frac{\partial Z}{\partial i} \right\} = j\,\omega_i\, \mathcal{F}\{Z\}, \qquad i = x, y.    (20)

Thus it is straightforward that the solution of the Poisson equation which minimizes (16) can be computed as

Z = \mathcal{F}^{-1}\left\{ \frac{-j\,\omega_x\, \mathcal{F}\{p\} - j\,\omega_y\, \mathcal{F}\{q\}}{\omega_x^2 + \omega_y^2} \right\}.    (21)
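Eq. (21) translates almost literally into a few FFT calls; the sketch below assumes periodic boundary conditions and leaves the mean height (the DC term) at zero, which are our simplifications.

import numpy as np

def integrate_frankot_chellappa(p, q):
    """Frankot-Chellappa integration: project the gradient field onto the
    integrable subspace in the Fourier domain (Eq. (21))."""
    H, W = p.shape
    wx = 2.0 * np.pi * np.fft.fftfreq(W)[None, :]   # angular frequencies along x (columns)
    wy = 2.0 * np.pi * np.fft.fftfreq(H)[:, None]   # angular frequencies along y (rows)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = wx ** 2 + wy ** 2
    denom[0, 0] = 1.0                               # avoid dividing by zero at the DC term
    Z = (-1j * wx * P - 1j * wy * Q) / denom
    Z[0, 0] = 0.0                                   # the mean height is arbitrary
    return np.real(np.fft.ifft2(Z))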
A.3. M-estimator approach

As mentioned previously, the L2-norm is very sensitive to the presence of outliers; we can reduce their effect by introducing weights in the error functional, i.e.

E_M^k[u] = \iint_\Omega \left[ w(\epsilon_1^{k-1}) (u_x - p)^2 + w(\epsilon_2^{k-1}) (u_y - q)^2 \right] dx\, dy.    (22)
The weights (w_x = w(\epsilon_1^{k-1}), w_y = w(\epsilon_2^{k-1})) at step k depend on the residual error at step k-1 through some function w(\cdot) that is symmetric, positive-definite and chosen to grow more slowly than the square. In this iterative method, we solve at each step the weighted Poisson equation

\mathrm{div}(w_x Z_x,\, w_y Z_y) = \mathrm{div}(w_x\, p,\, w_y\, q).    (23)

With the solution of Eq. (23) we obtain the new weights by evaluating the residuals \epsilon_1, \epsilon_2 at each point, and we repeat this procedure until convergence.
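The iteration can be sketched by reusing the weighted solver integrate_wls of A.1 above; the Huber-type weight function and the fixed number of iterations are our own hypothetical choices, since the text does not prescribe a specific w(·).

import numpy as np

def huber_weight(residual, k=0.1):
    """One possible symmetric weight function that grows more slowly than the
    square (Huber type); the paper does not fix a particular choice."""
    a = np.abs(residual)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def integrate_m_estimator(p, q, n_iter=10):
    """Iteratively reweighted least squares for Eqs. (22)-(23), built on the
    weighted sparse solver integrate_wls sketched in A.1."""
    wx, wy = np.ones_like(p), np.ones_like(q)
    for _ in range(n_iter):
        Z = integrate_wls(p, q, wx, wy)
        zy, zx = np.gradient(Z)                     # derivatives of the current surface
        wx = huber_weight(zx - p)                   # new weights from the x-residuals
        wy = huber_weight(zy - q)                   # new weights from the y-residuals
    return Z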
A.4. Anisotropic diffusion approach

Another alternative to handle outliers and errors, while keeping the L2-norm to penalize them, is to calculate a weight map from the information available in the input gradient field (p, q). The technique we describe is based on the Anisotropic Diffusion (AD) ideas proposed in the context of image restoration [42] and multi-scale image analysis [43]. We will focus on the ideas proposed by Weickert [44,45] and adapt the general framework to the integration problem. Instead of solving the Poisson equation given by Eq. (17), the differential equation

\mathrm{div}_{xy}(D\, \nabla u) = \mathrm{div}_{xy}\big(D\, (p, q)\big)    (24)
is considered, where D represents an affine transformation (a 2 × 2 symmetric matrix for each (x, y)). The above equation can be thought of as the Euler–Lagrange equation of the following energy functional:

E[u] = \iint_\Omega \left[ d_{11} (u_x - p)^2 + (d_{12} + d_{21}) (u_x - p)(u_y - q) + d_{22} (u_y - q)^2 \right] dx\, dy    (25)
where the d_{ij}(x, y) can be interpreted as weights on the gradient field. Intuitively, the main idea is to identify features such as borders and exploit local coherence to guide the integration through smooth and coherent areas, avoiding large discontinuities and noisy points. To this end, the affine transformation D is obtained considering p and q and calculating (for each (x, y)) the 2 × 2 matrix

H(x, y) = (p \ \ q)^T (p \ \ q).    (26)
The resulting matrix H has an orthonormal basis of eigenvectors v_1 and v_2, where v_1 is parallel to the local gradient and v_2 is perpendicular to it. H can be rewritten as

H = [v_1 \ v_2] \begin{bmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}.    (27)

In order to penalize large gradients, the eigenvalues of H are modified following [44]:

\lambda_2 = 1, \qquad \lambda_1 = \beta + \begin{cases} 1 & \text{if } \mu_1 = 0 \\ 1 - \exp\left( -3.315/\mu_1^4 \right) & \text{if } \mu_1 > 0 \end{cases}    (28)

and finally the tensor is obtained preserving the orientation of H but with the modified eigenvalues:

D = [v_1 \ v_2] \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}.    (29)

The parameter \beta in Eq. (28) was introduced in [32] to ensure the positive-definiteness of D, but was not mentioned in [43,45]. We set \beta = 0.02 in order to maintain consistency with [32].
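Because H = (p q)^T (p q) has eigenvalues μ1 = p² + q² and μ2 = 0, with v1 parallel to (p, q), the tensor of Eqs. (26)–(29) can be evaluated in closed form. In the sketch below the small-gradient fallback and the numerical clamps are our own safeguards.

import numpy as np

def diffusion_tensor(p, q, beta=0.02):
    """Per-pixel 2x2 diffusion tensor D of Eqs. (26)-(29): the eigenvectors of
    H are kept and the eigenvalue along the local gradient is attenuated, so
    that the integration is guided along coherent structures."""
    mu1 = p ** 2 + q ** 2                           # the only nonzero eigenvalue of H
    norm = np.sqrt(mu1)
    small = norm < 1e-8                             # (nearly) zero gradient: direction undefined
    v1x = np.where(small, 1.0, p / np.where(small, 1.0, norm))
    v1y = np.where(small, 0.0, q / np.where(small, 1.0, norm))
    v2x, v2y = -v1y, v1x                            # v2 is perpendicular to v1
    lam1 = beta + np.where(mu1 > 0,
                           1.0 - np.exp(-3.315 / np.maximum(mu1, 1e-12) ** 4),
                           1.0)                     # Eq. (28)
    lam2 = np.ones_like(mu1)
    d11 = lam1 * v1x ** 2 + lam2 * v2x ** 2         # D = lam1 v1 v1^T + lam2 v2 v2^T
    d12 = lam1 * v1x * v1y + lam2 * v2x * v2y
    d22 = lam1 * v1y ** 2 + lam2 * v2y ** 2
    return d11, d12, d22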
A.5. L1-norm approach

Methods that minimize the L2-norm of the error, e.g. LS and FC, are very sensitive to the presence of outliers because of the quadratic growth of the penalty function. To solve this problem we can either use a different error norm, so that outliers have less influence on the global error, or keep the L2-norm but introduce weights to handle outliers and corrupt measurements. In this section we present an L1-norm method (weighted techniques were explored in the previous sections). Instead of minimizing the energy functional defined in Eq. (16), Du et al. [46] define the energy functional

E_{L1}[u] = \iint_\Omega \left( |u_x - p| + |u_y - q| \right) dx\, dy = \iint_\Omega \left( |\epsilon_1| + |\epsilon_2| \right) dx\, dy.    (30)
The main challenge of minimizing the L1 error is that it is not differentiable, and hence more sophisticated optimization techniques must be considered. Du et al. redefine the minimization of the functional (30) as

\{u, a, b\} = \arg\min \iint_\Omega \big( a(x, y) + b(x, y) \big)\, dx\, dy \quad \text{s.t.} \quad u_x - p \le a, \ \ u_x - p \ge -a, \ \ u_y - q \le b, \ \ u_y - q \ge -b, \ \ a \ge 0, \ \ b \ge 0.    (31)

Using finite differences we can transform the previous minimization problem into a constrained linear program (LP), where the target function and the constraints are both linear in the variables u, a and b.

References

[1] di Martino JM, Ayubi G, Fernández A, Ferrari JA. Differential 3D shape retrieval. Opt Lasers Eng 2014;58C:114–8.
[2] Desai S, Bidanda B. Reverse engineering: a review & evaluation of contact based systems. In: Kamrani A, Nasr E, editors. Rapid prototyping. Manufacturing systems engineering series, vol. 6. US: Springer; 2006. p. 107–31.
[3] Lowery ALS. Services 〈http://lowerylandsurveys.com/surveying-services/3d-laser-scanning/〉, accessed: 2015-01-20.
[4] DAVID 3D scanner 〈http://www.david-3d.com/en〉, accessed: 2015-01-20.
[5] Measurement by optical methods 〈http://www.gom.com〉, accessed: 2013-09-20.
[6] Marks M. System and devices for time delay 3D. US patent 5,151,821; 1992.
[7] Marshall GF, Stutz GE. Handbook of optical and laser scanning. second ed. Miami, USA: CRC Press; 2011.
[8] Lertrusdachakul I. A novel 3D recovery method by dynamic (de)focused projection [Ph.D. thesis]. Ecole Doctorale Environnement – Santé; 2011.
[9] Ens J, Lawrence P. An investigation of methods for determining depth from focus. IEEE Trans Pattern Anal Mach Intell 1993;15(2):97–108. http://dx.doi.org/10.1109/34.192482.
[10] Nayar SK, Nakagawa Y. Shape from focus. IEEE Trans Pattern Anal Mach Intell 1994;16(8):824–31.
[11] Zhang S. Recent progresses on real-time 3D shape measurement using digital fringe projection techniques. Opt Lasers Eng 2010;48(2):149–58.
[12] Zhang S. Handbook of 3D machine vision: optical metrology and imaging. Series in optics and optoelectronics. Hoboken, NJ: CRC Press; 2013.
[13] Seitz S, Curless B, Diebel J, Scharstein D, Szeliski R. A comparison and evaluation of multi-view stereo reconstruction algorithms. Comput Vis Pattern Recognit 2006:519–28.
[14] Boyer KL, Kak AC. Color-encoded structured light for rapid active ranging. IEEE Trans Pattern Anal Mach Intell 1987;1:14–28.
[15] Li B, Wang Y, Dai J, Lohry W, Zhang S. Some recent advances on superfast 3D shape measurement with digital binary defocusing techniques. Opt Lasers Eng 2014;54:236–46. http://dx.doi.org/10.1016/j.optlaseng.2013.07.010.
[16] Basri R, Jacobs D, Kemelmacher I. Photometric stereo with general, unknown lighting. Int J Comput Vis 2007;72(3):239–57. http://dx.doi.org/10.1007/s11263-006-8815-7.
[17] Ayubi GA, Ayubi JA, di Martino JM, Ferrari JA. Pulse-width modulation in defocused three-dimensional fringe projection. Opt Lett 2010;35(21):3682–4.
[18] Ayubi GA, Martino JMD, Alonso JR, Fernández A, Perciante CD, Ferrari JA. Three-dimensional profiling with binary fringes using phase-shifting interferometry algorithms. Appl Opt 2011;50(2):147–54. http://dx.doi.org/10.1364/AO.50.000147.
[19] Dai J, Zhang S. Phase-optimized dithering technique for high-quality 3D shape measurement. Opt Lasers Eng 2013;51(6):790–5. http://dx.doi.org/10.1016/j.optlaseng.2013.02.003.
[20] Dai J, Li B, Zhang S. High-quality fringe pattern generation using binary pattern optimization through symmetry and periodicity. Opt Laser Eng 2014;52:195–200. http://dx.doi.org/10.1016/j.optlaseng.2013.06.010 〈http:// www.sciencedirect.com/science/article/pii/S0143816613001966〉. [21] Takeda M, Mutoh K. Fourier transform profilometry for the automatic measurement of 3-d object shapes. Appl Opt 1983;22(24):3977–82. http://dx.doi.org/ 10.1364/AO.22.003977 〈http://ao.osa.org/abstract.cfm?URI=ao-22-24-3977〉. [22] Hartley R, Zisserman A. Multiple view geometry in computer vision. second ed. Cambridge, UK: Cambridge University Press; 2004 ISBN: 0521540518. [23] Geng J. Structured-light 3d surface imaging: a tutorial. Adv Opt Photon 2011;3 (2):128–60. http://dx.doi.org/10.1364/AOP.3.000128 〈http://aop.osa.org/abstract.cfm?URI=aop-3-2-128〉. [24] Zhang Z. Review of single-shot 3D shape measurement by phase calculationbased fringe projection techniques. Opt Laser Eng 2012;50(8):1097–106. http: //dx.doi.org/10.1016/j.optlaseng.2012.01.007 〈http://linkinghub.elsevier.com/ retrieve/pii/S0143816612 000085〉. [25] García-Isáis C, Alcalá Ochoa N. One shot profilometry using a composite fringe pattern. Opt Laser Eng 2014;53:25–30. http://dx.doi.org/10.1016/j.optlaseng.2013.08.006 〈http://linkinghub.elsevier.com/retrieve/pii/S0143816613002492〉. [26] Flores JL, Bravo-Medina B, Ferrari JA. One-frame two-dimensional deflectometry for phase retrieval by addition of orthogonal fringe patterns. Appl Opt 2013;52(26):6537–42. http://dx.doi.org/10.1364/AO.52.006537 〈http://ao.osa. org/abstract.cfm?URI=ao-52-26-6537〉. [27] Guo H, He H, Chen M. Gamma correction for digital fringe projection profilometry. Appl Opt 2004;43(14):2906–14. http://dx.doi.org/10.1364/ AO.43.002906 〈http://ao.osa.org/abstract.cfm?URI=ao-43-14-2906〉. [28] Agrawal A, Chellappa R, Raskar R. An algebraic approach to surface reconstruction from gradient fields. In: Tenth IEEE international conference on computer vision (ICCV'05), vol. 1, 2005. p. 174–81. [29] Agrawal A, Raskar R, Chellappa R. What is the range of surface reconstructions from a gradient field?. In: Computer vision—ECCV 2006. Berlin, Heidelberg: Springer; 2006. p. 578–91. [30] Huang L, Chua PS, Asundi A. Least-squares calibration method for fringe projection profilometry considering camera lens distortion. Appl Opt 2010;49(9):1539–48. [31] Zhang Z. Flexible camera calibration by viewing a plane from unknown orientations. In: ICCV99, 1999. p. 666–73. [32] Agrawal A. Scene analysis under variable illumination using gradient domain methods [Ph.D. thesis]. University of Maryland; 2006. [33] Gupta S, Castleman K, Markey M, Bovik A. Texas 3d face recognition database. In: 2010 IEEE Southwest Symposium on image analysis interpretation (SSIAI), 2010. p. 97–100. http://dx.doi.org/10.1109/SSIAI.2010.5483908. [34] Gupta S, Markey M, Bovik A. Anthropometric 3d face recognition. Int J Comput Vis 2010;90(3):331–49. http://dx.doi.org/10.1007/s11263-010-0360-8 〈http://dx.doi. org/10.1007/s11263-010-0360-8http://dx.doi.org/10.1007/s11263-010-0360-8〉. [35] Smeets D, Claes P, Vandermeulen D, Clement JG. Objective 3D face recognition: evolution approaches and challenges. Forensic Sci Int 2010;201(1– 3):125–32. http://dx.doi.org/10.1016/j.forsciint.2010.03.023 〈http://www.ncbi. nlm.nih.gov/pubmed/20395086〉. [36] Lei Y, Bennamoun M, El-Sallam Aa. An efficient 3D face recognition approach based on the fusion of novel local low-level features. Pattern Recognit 2013;46 (1):24–37. 
http://dx.doi.org/10.1016/j.patcog.2012.06.023. [37] Jiang C, Geng Z-x, Wei X-f, Shen C. Sift implementation based on gpu. Proc SPIE 2013;8913 891304–891304-7. [38] Bronstein AM, Bronstein MM, Kimmel R. Three-dimensional face recognition. Int J Comput Vis 2005;64(1):5–30. [39] Zhang Y-N, Guo Z, Xia Y, Lin Z-G, Feng DD. 2D representation of facial surfaces for multi-pose 3D face recognition. Pattern Recognit Lett 2012;33(5):530–6. [40] Tsalakanidou F, Malassiotis S. Real-time 2D þ 3D facial action and expression recognition. Pattern Recognit 2010;43(5):1763–75. [41] Besl PJ, Jain RC. Invariant surface characteristics for 3d object recognition in range images. Comput Vis Graph Image Process 1986;33(1):33–80. http://dx. doi.org/10.1016/0734-189X(86)90220-3. [42] Perona P, Malik J. Scale-space and edge detection using anisotropic diffusion. IEEE Trans Pattern Anal Mach Intell 1990;12:629–39. [43] Aubert G, Kornprobst P. Mathematical problems in image processing: partial differential equations and the calculus of variations, second ed. Applied mathematical sciences, vol. 147. New York, USA: Springer-Verlag; 2006. [44] Weickert J. Anisotropic diffusion in image processing. Stuttgart, New York: Teubner GB; 2008. [45] Weickert J. Anisotropic diffusion in image processing [Ph.D. thesis]. Germany: University of Kaiserslautern; 1996. [46] Du Z, Robles-Kelly A, Lu F. Robust surface reconstruction from gradient field using the l1 norm. In: 9th Biennial conference of the Australian pattern recognition society on digital image computing techniques and applications. Glenelg, Australia: IEEE; 2007. p. 203–9.