Image and Vision Computing 21 (2003) 941–954 www.elsevier.com/locate/imavis
An improved method of photometric stereo using local shape from shading
Ufuk Sakarya, İsmet Erkmen*
TÜBİTAK BİLTEN, Ankara, Turkey
Department of Electrical and Electronics Engineering, Middle East Technical University
Received 18 April 2002; received in revised form 20 March 2003; accepted 9 May 2003
Abstract
This paper presents an improved photometric stereo (PS) method that integrates PS with a local shape from shading (SFS) algorithm. PS produces the initial depth estimate, which provides global accuracy, and also recovers the albedo; SFS supplies more detailed information within each homogeneous area. The quality of the depth obtained by integrating PS and SFS is compared with the real depth using an absolute depth error function, and improvements ranging from 2.3 to 14% over PS are obtained.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Lambert surface model; Albedo; Photometric stereo; Shape from shading
1. Introduction
Shape recovery is a classic problem in computer vision, where the aim is to obtain 3D scene information from one or more 2D images. Techniques for recovering shape are called shape-from-X techniques. Among them, shape from shading (SFS) recovers shape from the gradual variation of shading in an image, while photometric stereo (PS) [16] differs from SFS in the number of input images: PS recovers shape from multiple intensity images of the same scene taken from a fixed viewing direction under different light source directions, whereas SFS estimates shape from a single intensity image. SFS techniques can be divided into global and local approaches [9]. Global approaches can be further subdivided into global minimization and global propagation approaches. Global minimization approaches estimate shape by minimizing an energy function. Global propagation approaches propagate shape information from known surface points to the whole image, thereby extending the estimate to the entire surface. Local approaches derive shape only from the intensity information of surface points in a small neighborhood.

* Corresponding author. Tel.: +90-312-2102307; fax: +90-312-2101261.
E-mail addresses: [email protected] (İ. Erkmen), ufuk.sakarya@bilten.metu.edu.tr (U. Sakarya).
0262-8856/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0262-8856(03)00096-9
Pentland [6] introduced a local SFS approach that uses the intensity together with its first and second derivatives. It is very sensitive to noise because of the second derivatives. Lee and Rosenfeld [2] computed the slant and tilt of the surface in the light source coordinate system from the first derivatives of the intensity, under the assumption that the surface is locally spherical; because of this assumption the method is unsuitable for non-spherical surfaces. Pentland [11] introduced a new approach that linearizes the reflectance function in terms of the surface gradient and applies a Fourier transform to this linear function to obtain a closed-form solution for the depth at each point. This algorithm gives good results on surfaces that vary linearly, but it fails when the surface varies non-linearly. Tsai and Shah [15] used a discrete approximation of the gradient and recovered the depth iteratively from a linear approximation of the reflectance function in terms of depth. It is a very fast algorithm; however, it is very sensitive to intensity noise. In real-life applications in many areas, PS has been found to give better results and to be more suitable. In PS, however, shape can only be recovered in areas that are illuminated in all input images, and the quality of the recovery increases with the number of images. Constructing depth from surface gradients is also a major problem, and several methods have been introduced to address it [3,5,8]. In particular, Frankot and Chellappa [8] offered an elegant method for enforcing integrability in SFS algorithms.
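The Tsai–Shah idea of linearizing the reflectance function in the depth itself can be sketched as a per-pixel Newton iteration. The NumPy code below is a minimal illustration under stated assumptions, not Tsai and Shah's or the authors' code: it assumes a Lambertian surface with normal proportional to (-p, -q, 1), uses backward differences for p and q, and replaces the original Kalman-like gain with a simple damped step f'/(f'^2 + w). The `albedo` argument is a hypothetical hook for a per-pixel albedo map.

```python
import numpy as np


def reflectance(p, q, light):
    """Lambertian reflectance for a unit light (lx, ly, lz) and normal ~ (-p, -q, 1)."""
    lx, ly, lz = light
    return (lz - p * lx - q * ly) / np.sqrt(1.0 + p**2 + q**2)


def tsai_shah(E, light, iterations=50, w=1e-4, albedo=1.0):
    """Recover depth Z from one image E by iterating a damped Z <- Z - f / f'.

    f(Z) = E - albedo * R(p, q), with p, q taken as backward differences of Z.
    w is a small damping term (an assumption of this sketch) that keeps the
    step finite where df/dZ is close to zero.
    """
    light = np.asarray(light, dtype=float)
    lx, ly, lz = light / np.linalg.norm(light)
    Z = np.zeros_like(E, dtype=float)
    for _ in range(iterations):
        # Discrete gradient: p = Z(x, y) - Z(x-1, y), q = Z(x, y) - Z(x, y-1)
        p = np.zeros_like(Z)
        q = np.zeros_like(Z)
        p[:, 1:] = Z[:, 1:] - Z[:, :-1]
        q[1:, :] = Z[1:, :] - Z[:-1, :]
        f = E - albedo * reflectance(p, q, (lx, ly, lz))
        # Derivatives of R; in the interior, p and q both grow by 1 when Z(x, y) does
        norm = np.sqrt(1.0 + p**2 + q**2)
        dot = lz - p * lx - q * ly
        dR_dp = -lx / norm - dot * p / norm**3
        dR_dq = -ly / norm - dot * q / norm**3
        df_dZ = -albedo * (dR_dp + dR_dq)
        # Linearize f around the current Z and take the damped Newton step
        Z = Z - f * df_dZ / (df_dZ**2 + w)
    return Z
```

With a frontal light (0, 0, 1) and the flat starting surface, df/dZ vanishes and this sketch leaves Z unchanged; an oblique light such as (1, 0, 1) avoids that degenerate start.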
SFS has also been combined with other shape-from-X modules to improve shape recovery [10,14,17]. Cryer, Tsai and Shah [10] integrated stereo and SFS modules: their method recovers depth by keeping the low-frequency information from stereo and adding the high-frequency information from SFS. In this paper, to improve the performance of shape recovery, a local SFS algorithm is integrated with PS as a new approach. PS produces the initial depth estimate, which establishes global accuracy, and also recovers the albedo; SFS, on the other hand, supplies more detailed information within each homogeneous area. The algorithm thus uses the most effective parts of PS and SFS. The varying-albedo problem of SFS methods is overcome by using three images illuminated from different directions, as in PS. Even though some details of the surface may be lost because of the least-squares approach in PS, its global accuracy is in most cases better than that of SFS methods. On the other hand, in most cases local SFS algorithms capture the details of the surface better than PS methods.
2. An improved method of photometric stereo using local shape from shading
To obtain a better estimate of the real depth map, we apply not only PS but also local shape from shading (SFS) algorithms to the input images. However, the order in which these methods are combined is also important. PS is more robust than SFS because it recovers the albedo and is less sensitive to noise. On the other hand, some SFS algorithms give a more detailed estimate in some local areas. In general, global SFS methods are very complex and slow. Local methods are simple and fast and give accurate local detail within each homogeneous area, but they are not accurate enough globally. In addition, SFS methods assume a constant albedo, which is generally not the case in real life. The main idea of integrating PS and SFS is to improve the performance of PS. PS produces the initial depth estimate to obtain global accuracy and also recovers the albedo; SFS supplies more detailed information within each homogeneous area. In this way the algorithm uses the most effective parts of PS and SFS. The varying-albedo problem of SFS methods is avoided by using three images illuminated from different directions, as in PS. Even though some details of the surface may be lost because of the least-squares approach in PS, its global accuracy is in most cases better than that of SFS. On the other hand, in many cases local SFS algorithms capture the details of the surface better than PS. The method consists of two modules, PS and SFS, as shown in Fig. 1.
Fig. 1. The integration method of photometric stereo and local shape from shading algorithms.

Fig. 2. Mozart, sphere, penny, sombrero and vase images, respectively, under the illumination directions (0,0,1), (1,0,1) and (5,5,7).

2.1. PS module
To avoid the non-linearity of determining the surface orientation by conventional PS, and also to recover the albedo, at least three images with different illumination directions are required. In general, for a system of $N$ ($N \ge 3$) light sources, the irradiance equation is

$$\mathbf{I} = \rho\, S\, \mathbf{n} \tag{1}$$

where $\mathbf{I}$ is the column vector of irradiance values recorded at a point $(x, y)$, $\rho$ is the albedo, $S$ is the $N \times 3$ matrix of light source illumination directions and $\mathbf{n}$ is the unit surface normal. This equation can be solved if and only if at least three of the light direction vectors in $S$ do not lie in a plane. For $N = 3$ it is solved directly as

$$\mathbf{n} = \frac{1}{\rho}\, S^{-1}\mathbf{I} \tag{2}$$

and

$$\rho = \left| S^{-1}\mathbf{I} \right| \tag{3}$$

However, if $N > 3$ it is solved by the least-squares method:

$$\mathbf{n} = \frac{1}{\rho}\, S^{+}\mathbf{I} \tag{4}$$

$$\rho = \left| S^{+}\mathbf{I} \right| \tag{5}$$

where $S^{+}$ is the pseudoinverse of $S$:

$$S^{+} = (S^{T} S)^{-1} S^{T} \tag{6}$$
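Eqs. (1)–(6) map onto a few lines of linear algebra. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: images are stacked as an (N, H, W) array with one light direction per row of S, `np.linalg.pinv` stands in for Eq. (6) (it equals S^{-1} when N = 3 and S is invertible), and the normals are turned into Z_initial with a Frankot–Chellappa projection [8] that assumes the normal convention n ∝ (-p, -q, 1) and periodic image boundaries. The helper name `ps_initial_depth` is hypothetical.

```python
import numpy as np


def photometric_stereo(images, S):
    """Albedo and unit normals from N >= 3 images, Eqs. (1)-(6).

    images : array (N, H, W), one image per light source
    S      : array (N, 3), one light direction per row
    """
    N, H, W = images.shape
    I = images.reshape(N, -1)                # one irradiance column per pixel
    S_pinv = np.linalg.pinv(S)               # S^+ = (S^T S)^-1 S^T, Eq. (6)
    G = S_pinv @ I                           # G = rho * n for every pixel
    rho = np.linalg.norm(G, axis=0)          # Eq. (5): rho = |S^+ I|
    n = G / np.maximum(rho, 1e-12)           # Eq. (4): n = (1/rho) S^+ I
    return rho.reshape(H, W), n.T.reshape(H, W, 3)


def frankot_chellappa(p, q):
    """Nearest integrable surface to the gradient field (p, q) = (dZ/dx, dZ/dy)."""
    H, W = p.shape
    wx = 2.0 * np.pi * np.fft.fftfreq(W)     # angular frequencies along x (columns)
    wy = 2.0 * np.pi * np.fft.fftfreq(H)     # angular frequencies along y (rows)
    WX, WY = np.meshgrid(wx, wy)             # both of shape (H, W)
    denom = WX**2 + WY**2
    denom[0, 0] = 1.0                        # avoid 0/0 at the DC term
    Zf = (-1j * WX * np.fft.fft2(p) - 1j * WY * np.fft.fft2(q)) / denom
    Zf[0, 0] = 0.0                           # depth is only known up to a constant
    return np.real(np.fft.ifft2(Zf))


def ps_initial_depth(images, S):
    """PS module: albedo map and Z_initial, assuming n ~ (-p, -q, 1)."""
    rho, n = photometric_stereo(images, S)
    nz = np.maximum(n[..., 2], 1e-12)        # guard against grazing normals
    p, q = -n[..., 0] / nz, -n[..., 1] / nz
    return rho, frankot_chellappa(p, q)
```

Because the Fourier projection treats the image as periodic, non-periodic depth maps are often padded or mirrored before integration in practice; that refinement is omitted here.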
The surface normals are integrated to obtain the initial depth of the object, $Z_{initial}$. To obtain a feasible (integrable) solution, the Frankot–Chellappa method for enforcing integrability in SFS algorithms is used [8]. In addition, through Eq. (5) the PS module produces the recovered albedo map $\rho_{corrected}$, which is used by the local SFS algorithm.

2.2. SFS module
In developing the integrated method we first used the local SFS approach of Lee–Rosenfeld [2] to improve the performance of PS. Then, for comparison, we tested the integration using the local SFS approach of Tsai–Shah [15] instead of Lee–Rosenfeld. In both integrated methods, the Lee–Rosenfeld and Tsai–Shah algorithms are modified to include the albedo correction supplied by the PS module. These two SFS algorithms were chosen specifically because they were developed for local rather than global analysis of images. In most cases the global accuracy of PS is better than that of global SFS algorithms; our intention is therefore to use local SFS algorithms to improve the result within each homogeneous area. The integration is realized as follows: for an $N$-image system, given image $I_k$, the new depth estimate $Z_k$ is obtained as

$$Z_k = Z_{k-1}\, W_{PS} + LROS_k\, W_{SFS} \tag{7}$$

where $LROS_k$ is the depth estimated by the local SFS algorithm from image $I_k$, and $W_{PS}$ and $W_{SFS}$ are weight constants with $0 \le W_{PS} \le 1$ and $0 \le W_{SFS} \le 1$ that satisfy

$$W_{PS} + W_{SFS} = 1 \tag{8}$$
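The recursion of Eqs. (7) and (8) is a running weighted average that starts from the PS depth. The sketch below assumes the local SFS method is supplied as a callable `local_sfs(image) -> depth` (a hypothetical interface standing in for the modified Lee–Rosenfeld or Tsai–Shah algorithms). Its default W_SFS = 1/13.75 follows from the ratio W_PS = 12.75 W_SFS reported in this paper, combined with Eq. (8).

```python
import numpy as np


def integrate_ps_sfs(z_ps, local_sfs, images, w_sfs=1.0 / 13.75):
    """Weighted cascade of Eq. (7): Z_k = Z_{k-1} * W_PS + LROS_k * W_SFS.

    z_ps      : initial depth Z_0 = Z_initial from the PS module
    local_sfs : hypothetical callable(image) -> depth map from a local SFS method
    images    : sequence of input images I_1 .. I_N
    w_sfs     : W_SFS; W_PS is derived from Eq. (8)
    """
    if not 0.0 <= w_sfs <= 1.0:
        raise ValueError("W_SFS must lie in [0, 1]")
    w_ps = 1.0 - w_sfs                       # Eq. (8): W_PS + W_SFS = 1
    z = np.asarray(z_ps, dtype=float)
    for image in images:                     # k = 1 .. N
        z = w_ps * z + w_sfs * np.asarray(local_sfs(image), dtype=float)
    return z
```

With `w_sfs=0.0` the loop returns the PS depth unchanged; with `w_sfs=1.0` it returns the local SFS estimate of the last image.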
For $k = 1$, Eq. (7) becomes

$$Z_1 = Z_0\, W_{PS} + LROS_1\, W_{SFS} \tag{9}$$
Fig. 3. Location of light sources and camera coordinate system.
where $Z_0$ is the depth estimated by the PS module, i.e. $Z_{initial}$. The weights $W_{PS}$ and $W_{SFS}$ are therefore central to the integrated approach: if $W_{SFS} = 0$ the algorithm reduces to plain PS, and if $W_{PS} = 0$ it reduces to a cascaded local SFS algorithm. The relative share of information taken from the PS and SFS modules is the prime factor in the success of the overall integrated approach. Our experiments led us to set the weights of the two information sources as

$$W_{PS} = 12.75\, W_{SFS} \tag{10}$$

This ratio was obtained by optimization over the sphere and penny data sets, with the ADE (see Section 3.2) as the cost function. Combined with Eq. (8), it gives $W_{SFS} = 1/13.75 \approx 0.073$ and $W_{PS} \approx 0.927$.

3. Results
3.1. Experimental images
The experimental images were chosen so that the performance of our algorithm can be compared with that of well-known PS and SFS algorithms. Synthetic and real images were used to test our method.
3.1.1. Synthetic images
The synthetic images were generated from true depth maps using the Lambertian reflectance model. The true depth values were obtained by anonymous ftp from 132.170.108.42 [9]. Mozart, sombrero, vase, sphere and penny images were generated with light source directions (0,0,1), (1,0,1) and (5,5,7). These images are shown in Fig. 2 and have been used to test most of the PS and SFS algorithms in the literature [1,4,7,9,10,12,13,15].
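A minimal way to generate such synthetic images from a true depth map is sketched below. It is an illustration under assumptions, not the procedure used to produce the data of [9]: gradients are taken with `np.gradient` (central differences), the normal is taken proportional to (-p, -q, 1), the albedo is constant, and negative shading is clipped to zero as attached shadow.

```python
import numpy as np


def render_lambertian(Z, light, albedo=1.0):
    """Render I = albedo * max(0, s . n) from a depth map Z and light direction s."""
    q, p = np.gradient(np.asarray(Z, dtype=float))   # axis 0 -> y (q), axis 1 -> x (p)
    s = np.asarray(light, dtype=float)
    s = s / np.linalg.norm(s)
    # Cosine between the unit normal (-p, -q, 1)/|.| and the unit light s
    shading = (-p * s[0] - q * s[1] + s[2]) / np.sqrt(1.0 + p**2 + q**2)
    return albedo * np.clip(shading, 0.0, None)      # negative values: attached shadow


# Light source directions used for the synthetic data sets
LIGHTS = [(0, 0, 1), (1, 0, 1), (5, 5, 7)]
```

Stacking `[render_lambertian(Z, s) for s in LIGHTS]` gives three images that can be fed, with the normalized rows of `LIGHTS` as S, to the N = 3 solution of Eqs. (2) and (3).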
Fig. 4. The corresponding eight 2D images taken under different illumination directions.
Fig. 5. (a) Atatürk face; (b) leaves; (c) cartridge-case.
Table 1
Absolute depth error (ADE)

Methods                            Sphere   Vase    Mozart   Penny   Sombrero
Photometric stereo                 16.03    11.51   23.61    5.07    5.64
Lee–Rosenfeld                       6.83    14.78   12.85    6.58    7.96
Shah                                6.92    26.09    9.57    8.83    5.65
Integration (PS–Lee–Rosenfeld)     14.42    10.41   21.64    4.95    4.84
Integration (PS–Shah)              15.34    11.53   22.61    5.09    5.57
Table 2
Mean depth error (MDE)

Methods                            Sphere   Vase    Mozart   Penny   Sombrero
Photometric stereo                  4.07     9.86   10.50    3.50    5.10
Lee–Rosenfeld                       4.95    14.00    5.11    5.23    7.19
Shah                                6.92    26.08    9.27    8.51    3.85
Integration (PS–Lee–Rosenfeld)      4.97     8.59   10.28    3.30    4.07
Integration (PS–Shah)               3.74     9.85    9.99    3.52    5.01
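The ADE improvements quoted in Section 3.3.1 are relative reductions with respect to PS, (ADE_PS - ADE_int) / ADE_PS, and can be recomputed from Table 1. The short script below only re-derives those percentages from the tabulated values; the dictionary names are illustrative.

```python
# ADE values from Table 1: photometric stereo vs. integration (PS–Lee–Rosenfeld)
ADE_PS = {"Sphere": 16.03, "Vase": 11.51, "Mozart": 23.61, "Penny": 5.07, "Sombrero": 5.64}
ADE_INT_LR = {"Sphere": 14.42, "Vase": 10.41, "Mozart": 21.64, "Penny": 4.95, "Sombrero": 4.84}


def relative_improvement(baseline, improved):
    """Percentage reduction of the error with respect to the baseline method."""
    return {k: 100.0 * (baseline[k] - improved[k]) / baseline[k] for k in baseline}


for name, pct in relative_improvement(ADE_PS, ADE_INT_LR).items():
    print(f"{name}: {pct:.1f}%")
```

This prints about 10.0, 9.6, 8.3, 2.4 and 14.2% for sphere, vase, Mozart, penny and sombrero; the quoted 9.5 and 2.3% therefore appear to be truncated rather than rounded.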
3.1.2. Real images
To capture real images under different illumination directions, a special camera system is used. Eight light sources (red LEDs, chosen because they are nearly point light sources) are mounted in an enclosure that is shielded from external illumination. The light sources are placed at equal heights and at equal distances from each other (Fig. 3).
Fig. 6. The estimated Mozart depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method (Lee–Rosenfeld), (d) integration method (Shah), (e) real depth.
Using this setup, a total of eight 2D images are taken with a centered camera, activating only one light source, of known illumination direction, at a time (Fig. 4). For image index $i$, the slant angle is 19.2° and the tilt angle is $i \times 45°$, where $0 \le i < 8$. The images are of non-Lambertian (metallic) objects: the Atatürk face (from a coin), leaves (from a coin) and a cartridge case, shown in Fig. 5a–c, respectively.
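The eight illumination directions can be assembled into the N × 3 matrix S of Eq. (1). This sketch assumes the usual slant/tilt parameterization, with slant measured from the optical (z) axis and tilt measured in the image plane, and assumes the slant of 19.2 is in degrees; the function name is hypothetical. The resulting 8 × 3 matrix would then be used with the least-squares solution of Eqs. (4)–(6).

```python
import numpy as np


def led_light_matrix(slant_deg=19.2, n_lights=8, tilt_step_deg=45.0):
    """Unit light directions, one row per LED, for slant sigma and tilt tau_i = i * step.

    Each row is (sin(sigma) cos(tau_i), sin(sigma) sin(tau_i), cos(sigma)).
    """
    sigma = np.deg2rad(slant_deg)
    tau = np.deg2rad(tilt_step_deg * np.arange(n_lights))
    return np.column_stack([
        np.sin(sigma) * np.cos(tau),
        np.sin(sigma) * np.sin(tau),
        np.full(n_lights, np.cos(sigma)),
    ])
```

Because all eight directions lie on a cone around the optical axis, any three of them that are not collinear in tilt are non-coplanar, so S has rank 3 and Eq. (1) is solvable.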
Fig. 7. The estimated sphere depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method (Lee–Rosenfeld), (d) integration method (Shah), (e) real depth.
3.2. Error analysis
The proposed algorithm is evaluated and compared with the PS and SFS algorithms using three error measures: the absolute depth error, the mean depth error and the difference depth image.
• Absolute depth error (ADE): the estimated depth value is normalized according to the real depth value. For an $M \times N$ depth map,

$$ADE = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| Z_{real}(i,j) - Z_{est}(i,j) \right|$$
Fig. 8. The estimated penny depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method (Lee–Rosenfeld), (d) integration method (Shah), (e) real depth.
where $Z_{est}$ is the estimated depth value and $Z_{real}$ is the real depth value.

Fig. 9. The estimated sombrero depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method (Lee–Rosenfeld), (d) integration method (Shah), (e) real depth.

• Mean depth error (MDE): the estimated depth value is normalized according to the real depth value. For an $M \times N$ depth map,

$$M_{real} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} Z_{real}(i,j), \qquad M_{est} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} Z_{est}(i,j)$$

$$MDE = \left| M_{real} - M_{est} \right|$$

where $Z_{est}$ and $Z_{real}$ are as defined above.
• Difference depth image (DDI): the estimated and real depth values are normalized to the 0–255 range, and the absolute difference between them is displayed as an image. Dark areas represent small errors.
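The three error measures are straightforward to compute. In this sketch the normalization of the estimated depth "according to the real depth" is left to the caller, since its exact form is not specified here, and the 0–255 scaling used for the DDI is assumed to be a min–max stretch of each map.

```python
import numpy as np


def ade(z_real, z_est):
    """Absolute depth error: mean of |Z_real - Z_est| over the M x N map."""
    return float(np.mean(np.abs(np.asarray(z_real, float) - np.asarray(z_est, float))))


def mde(z_real, z_est):
    """Mean depth error: |M_real - M_est|, where M is the mean depth of a map."""
    return float(abs(np.mean(z_real) - np.mean(z_est)))


def to_0_255(z):
    """Min-max stretch of a depth map to the 0-255 range (assumed scaling)."""
    z = np.asarray(z, dtype=float)
    span = z.max() - z.min()
    return np.zeros_like(z) if span == 0 else 255.0 * (z - z.min()) / span


def ddi(z_real, z_est):
    """Difference depth image as 8-bit gray levels: dark pixels mark small errors."""
    diff = np.abs(to_0_255(z_real) - to_0_255(z_est))
    return np.rint(diff).astype(np.uint8)
```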
Fig. 10. The estimated vase depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method (Lee–Rosenfeld), (d) integration method (Shah), (e) real depth.
Fig. 11. Difference depth images for Mozart, sphere, penny, sombrero and vase: (a) photometric stereo, (b) Lee–Rosenfeld cascaded three images, (c) integration.

3.3. Experimental results
3.3.1. Synthetic images
In this part three methods are compared with each other:
• the photometric stereo method;
• the mean value of the results of the Lee–Rosenfeld method on the applied images;
• the proposed integration method discussed in Section 2, for two SFS methods.
The experimental results show that the integration of PS and Lee–Rosenfeld gives the best results among them. According to ADE, it gives better results than PS for all image samples (see Table 1), with improvements of 10, 9.5, 8.3, 2.3 and 14% over PS for the sphere, vase, Mozart, penny and sombrero samples, respectively. The integration of PS and Shah also gives better results than PS for all image samples except vase and penny (see Table 1). According to MDE, the integration of PS and Lee–Rosenfeld gives better results than PS for all image samples except sphere (see Table 2), and the integration of PS and Shah gives better results than PS for all image samples except penny (see Table 2). These experiments show that the Lee–Rosenfeld method is more suitable for this integration scheme. Figs. 6–10 illustrate the recovered depth maps of Mozart, sphere, penny, sombrero and vase; the proposed method provides more detail in local regions than the PS method. Fig. 11 shows the difference depth images (DDI) of Mozart, sphere, penny, sombrero and vase, respectively.
The dark areas represent smaller errors and the bright areas larger errors.

3.3.2. Real images
In this part the three methods mentioned above are compared with each other. Figs. 12–14 show the recovered depth images of the Atatürk face, leaves and cartridge case, respectively. The integration method recovers more detail in local regions than the PS method. The Lee–Rosenfeld method is not suitable for real images.

Fig. 12. The estimated Atatürk depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method.

Fig. 13. The estimated leaves depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method.

Fig. 14. The estimated cartridge-case depth images: (a) Lee–Rosenfeld method cascaded three images, (b) photometric stereo, (c) integration method.

4. Conclusion
We introduced a new approach for improving the performance of the PS method by integrating it with a 'local' SFS method. The results are promising in most of the cases we examined. The shape-recovery performance of the algorithm was superior to that of PS and SFS individually when the initial estimate was obtained by PS and then locally refined by SFS within each homogeneous area. Improvements in ADE ranging from 2.3 to 14% over PS were demonstrated. PS and SFS techniques, on the other hand, have limitations arising from the amount of information they require. Without increasing the number of images in the sequence, it is fairly difficult to improve shape recovery with PS and SFS methods, especially in real-life applications. PS and SFS have another limitation in real-life applications: they can only be applied successfully in very special cases, where the environment is isolated from any uncontrolled light sources. Further investigation will be carried out to eliminate the environment problem and to improve the performance of the PS method on samples with varying albedo.

Acknowledgements
This work was supported by TÜBİTAK-BİLTEN (The Scientific and Technical Research Council of Turkey, Information Technologies and Electronics Research Institute).

References
[1] J.R.A. Torreao, A Green's function approach to shape from shading, Pattern Recognition 34 (12) (2001) 2367–2382.
[2] C.H. Lee, A. Rosenfeld, Improved methods of estimating shape from shading using the light source coordinate system, Artificial Intelligence 26 (1985) 125–143.
[3] G. Healey, R. Jain, Depth recovery from surface normals, CICPR 84, Montreal, Canada, July 30–August 2 (1984) 894–896.
[4] K.M. Lee, C.J. Kuo, Surface reconstruction from photometric stereo images, Journal of the Optical Society of America A 10 (5) (1993) 855–868.
[5] Z. Wu, L. Li, A line-integration based method for depth recovery from surface normals, CVGIP 43 (1988) 53–66.
[6] A.P. Pentland, Local shading analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 170–187.
[7] B.H. Kim, R.H. Park, Multi-image photometric stereo using surface approximation by Legendre polynomials, Pattern Recognition 31 (8) (1998) 1033–1047.
[8] R.T. Frankot, R. Chellappa, A method for enforcing integrability in shape from shading algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (1988) 439–451.
[9] R. Zhang, P.S. Tsai, J.E. Cryer, M. Shah, Analysis of shape from shading techniques, IEEE CVPR, June (1994) 377–384.
[10] P.S. Tsai, J.E. Cryer, M. Shah, Integration of shape from shading and stereo, Pattern Recognition 28 (1995) 1033–1043.
[11] A. Pentland, Shape information from shading: a theory about human perception, in: Proceedings of International Conference on Computer Vision, 1988, pp. 404–413.
[12] R. Zhang, P.S. Tsai, J.E. Cryer, M. Shah, Shape from shading: a survey, IEEE PAMI 21 (1999) 690–706.
[13] R. Zhang, M. Shah, Iterative shape recovery from multiple images, Image and Vision Computing 15 (1997) 801–814.
[14] S. Pankanti, A. Jain, Integrating vision modules: stereo, shading, grouping and line labeling, IEEE PAMI 17 (8) (1995) 831–842.
[15] P.S. Tsai, M. Shah, A simple shape from shading algorithm, in: Proceedings of Computer Vision and Pattern Recognition, 1992, pp. 734–736.
[16] R.J. Woodham, Photometric method for determining surface orientation from multiple images, Optical Engineering 19 (1) (1980) 139–144.
[17] H. Bülthoff, H. Mallot, Integration of depth modules: stereo and shading, Journal of the Optical Society of America A 5 (1988) 1749–1758.