Procedia Computer Science 104 (2017) 452–459
ICTE 2016, December 2016, Riga, Latvia
Creation of a Depth Map from Stereo Images of Faces for 3D Model Reconstruction

Olga Krutikova*, Aleksandrs Sisojevs, Mihails Kovalovs
Riga Technical University, Daugavgrivas str. 2, Riga, LV-1048, Latvia
Abstract

Today the 3D reconstruction of faces is a relevant task. It is used in various fields, for example, in scientific research, in recognition, in video games and in the movie industry. One of the existing methods of reconstructing 3D models of faces uses stereo cameras. The reconstruction process usually consists of several steps: calibration of the cameras, acquisition of the depth map (disparity map) and creation of the 3D model. In this paper a method of acquiring a depth map is proposed that can later be used for the reconstruction of a 3D model of a face. The proposed method was tested in a virtual environment: the 3D editor "Autodesk 3ds Max" was used to create a virtual scene containing stereo cameras and a human head. The proposed method was also tested using two "VISAR" cameras and an "Arduino Micro" microcontroller; the "Arduino" software ensures the synchronization of the cameras when the "Arduino Micro" microcontroller is used. The images were captured from the cameras using the "FlyCap" program. Since the initial images contain distortions, the first step of the algorithm is the calibration of the cameras. For calibration, corresponding points are found on both stereo images. These points are then used to calculate the degree of distortion, and the images are rectified accordingly. The rectified images are used to calculate the depth map, which is created from the frontal grayscale images of faces.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the international conference ICTE 2016.

Keywords: Depth map; face 3D reconstruction; stereo camera
1. Introduction

Reconstruction of a 3D model of a geometrically complex object, for example a face, is today one of the most
* Corresponding author. Tel.: +37129824626. E-mail address: [email protected]
doi: 10.1016/j.procs.2017.01.159
difficult and relevant tasks in computer vision1. Reconstruction is especially relevant in face recognition tasks, where the possibility of recognizing a face depends on the precision of the 3D model. In general, at least two images are required to reconstruct an object; on these images it is necessary to find corresponding points and construct a depth map. Many factors can affect the quality of the acquired images: lighting, the texture of the object, the camera angle and the focal length of the cameras. Several methods are described in this paper: methods of acquiring stereo images, and reconstruction of a 3D object in a virtual and a real environment with camera calibration. The main objective of this study was to check how accurately 3D models can be reconstructed when using low resolution cameras. The following sections show the results of the developed algorithm.

2. The proposed algorithm of reconstructing models from two stereo images

This paper describes a method of reconstructing a 3D model of a head from two-dimensional coordinates that were acquired from two images. The reconstruction of a 3D model consists of the following steps:

• Camera calibration (determining the internal and external camera parameters)
• Locating the corresponding points on two images that were acquired from different cameras
• Constructing the three-dimensional model of an object

All these steps of reconstructing an object are now described in detail.

2.1. Camera calibration

Camera calibration is one of the steps of 3D model reconstruction, in which it is necessary to determine the internal and external parameters that affect the error in determining the location of an object. Calibration is important, since it links the camera parameters with the object parameters in the real 3D world. The relationship between the camera measurements (pixels) and real world units (meters) is also very significant when restoring the structure of a three-dimensional scene. The relationship between a point in the world coordinate system (see Fig. 1) $Q = (X, Y, Z)^T$ and its projection on an image $q = (u, v)^T$, in homogeneous coordinates ($\tilde{q} = (u, v, 1)^T$, $\tilde{Q} = (X, Y, Z, 1)^T$), is described by the expression:
$\tilde{q} = A \, [R \ \ t] \, \tilde{Q}$,  (1)
where A contains the internal parameters of the optical system of the camera:

$A = \begin{pmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}$,  (2)
where $f_x, f_y$ are the focal lengths of the lens, $u_0, v_0$ are the coordinates of the principal point in the camera's coordinate system, and $\gamma$ is the skew coefficient. There are two types of camera calibration: internal and external. Internal calibration is performed once, because the internal geometric parameters, the optical characteristics of the camera lens and the display device settings do not change while shooting. The internal calibrated parameters of the camera are: the focal lengths, the central point of the images, and the distortion coefficients.
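As an illustration of the projection model in Eqs. (1)-(2), the following is a minimal NumPy sketch; all numeric values (focal lengths, principal point, pose, test point) are assumptions chosen for demonstration, not calibrated parameters.

```python
import numpy as np

# Intrinsic matrix A of Eq. (2); the values below are illustrative assumptions.
fx, fy = 800.0, 800.0          # focal lengths in pixels
u0, v0 = 400.0, 300.0          # principal point
gamma = 0.0                    # skew coefficient
A = np.array([[fx, gamma, u0],
              [0.0,  fy,  v0],
              [0.0, 0.0, 1.0]])

# Extrinsic parameters [R t]: camera aligned with the world, shifted 1 m along the optical axis.
R = np.eye(3)
t = np.array([[0.0], [0.0], [1.0]])

# Homogeneous world point Q~ = (X, Y, Z, 1)^T and its projection by Eq. (1).
Q = np.array([[0.1], [0.05], [2.0], [1.0]])
q = A @ np.hstack([R, t]) @ Q
q /= q[2]                      # divide out the homogeneous scale
print(q[:2].ravel())           # pixel coordinates (u, v)
```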
Fig. 1. A point Q = (X, Y, Z) is projected onto the image plane by the ray passing through the center of projection, and the resulting point on the image is q = (x, y); the image plane is really just the projection screen “pushed” in front of the pinhole.
The distorted coordinates of a point (x, y), relative to the center of the image, are calculated as follows:

$x = x'(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x' y' + p_2 (r^2 + 2 x'^2)$,
$y = y'(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y'$,  (3)
where $x', y'$ are the projection coordinates of the point, $r^2 = x'^2 + y'^2$, and $k_1, k_2, k_3, p_1, p_2$ are the distortion coefficients.

The aim of external calibration is to determine the orientation and position of the camera in the scene's space coordinates. Because in each new scene the camera is positioned differently, the external calibration should be repeated if the scene or the position of the camera has changed. In general, the external parameters of the camera are the rotation matrix R (4) and the translation vector t (5):

$R = R_\alpha R_\beta R_\gamma$,  (4)

$t = (t_x, t_y, t_z)^T$,  (5)

where $R_\alpha, R_\beta, R_\gamma$ are the rotation matrices about the axes $O_x, O_y, O_z$ by the angles $\alpha, \beta, \gamma$, and $t_x, t_y, t_z$ are the translation values along the axes $O_x, O_y, O_z$.
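The distortion model of Eq. (3) is simple enough to state directly in code. The sketch below is a direct NumPy transcription; the coefficient values in the example call are assumptions chosen only to exercise the function, not values from a real calibration.

```python
import numpy as np

def distort(xp, yp, k1, k2, k3, p1, p2):
    """Apply the radial and tangential distortion model of Eq. (3)
    to normalized projection coordinates (x', y')."""
    r2 = xp ** 2 + yp ** 2                        # r^2 = x'^2 + y'^2
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x = xp * radial + 2 * p1 * xp * yp + p2 * (r2 + 2 * xp ** 2)
    y = yp * radial + p1 * (r2 + 2 * yp ** 2) + 2 * p2 * xp * yp
    return x, y

# Illustrative (assumed) coefficients:
print(distort(0.3, 0.2, k1=-0.2, k2=0.05, k3=0.0, p1=1e-3, p2=1e-3))
```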
Currently, there are many methods of camera calibration in a stereoscopic system, including the use of vanishing points2,3, neural networks4, calibration of stereo cameras from two perpendicular planes5, an approach based on Faugéras and Toscani's calibration method6, and calibration using a chessboard7,8,9. The proposed algorithm uses a calibration method based on a chessboard (a board consisting of black and white squares), as it is the simplest and most accessible; moreover, in comparison with the other methods, the user does not have to manually select the control points
or planes. For more accurate calibration, the chessboard needs to be turned at different angles, as shown in Fig. 2. The required number of pairs (left and right) of chessboard images is approximately 8–20.
Fig. 2. Camera calibration using a chessboard.
A large number of chessboard images is required in order to calculate the translation, rotation and the internal camera parameters. Each new image gives us eight equations at the cost of six new extrinsic unknowns, so given enough images we should be able to compute any number of intrinsic unknowns10. Zhang's method7 was used to calculate the focal lengths and offsets. The distortion parameters were calculated with a different approach, based on the method proposed by Brown11.

The stereo camera calibration algorithm consists of the following steps (a code sketch of the whole pipeline is given after this list):

• Acquiring the pairs of images of a chessboard from the cameras
• If the acquired images are in color, converting them to grayscale
• Searching for characteristic points (corners) on each acquired image
• Checking that the required number of points (9×6) is found; not only must all the corners be found, they must also be ordered into rows and columns as expected
• Calculating the perspective distortion of the found points; the point Q of the model and its projection q are linked through the homography matrix H:
$\tilde{q} = s \, H \, \tilde{Q}$,  (6)

$H = s \, A \, [R_\alpha \ \ R_\beta \ \ t]$,  (7)
where s is a scaling factor.

• Calculating the internal parameters $v_0, \lambda, f_x, f_y, \gamma, u_0$ of the camera:

$v_0 = (B_{12} B_{13} - B_{11} B_{23}) / (B_{11} B_{22} - B_{12}^2)$,  (8)

$\lambda = B_{33} - [B_{13}^2 + v_0 (B_{12} B_{13} - B_{11} B_{23})] / B_{11}$,  (9)

$f_x = \sqrt{\lambda / B_{11}}$,  (10)
$f_y = \sqrt{\lambda B_{11} / (B_{11} B_{22} - B_{12}^2)}$,  (11)

$\gamma = -B_{12} f_x^2 f_y / \lambda$,  (12)

$u_0 = \gamma v_0 / f_y - B_{13} f_x^2 / \lambda$,  (13)

• Calculating the rotation and translation between the cameras: the rotation and translation parameters of the chessboard views are first solved for each camera separately, and then a single rotation matrix and translation vector that relate the right camera to the left camera are sought. Because of image noise and rounding errors, each chessboard pair results in slightly different values for R (14) and T (15):
$R = R_r (R_l)^T$,  (14)

$T = T_r - R \, T_l$,  (15)
After applying the rotation matrix, the right camera will be in the same plane as the left camera. This makes the two image planes coplanar but not row-aligned.

• Aligning the stereo images (image rectification), after which the epipolar lines are arranged horizontally and the scan lines are the same in both images. This process aligns the optical axes of both cameras in parallel, so that they intersect at infinity.
• Calculating the alignment map, separately for each camera.
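Since the paper uses the chessboard method and cites Learning OpenCV10, the pipeline above maps naturally onto OpenCV's calibration API. The following is a sketch of that sequence, not the authors' exact code; the image file names, the 9×6 pattern size and the flag choices are assumptions.

```python
import glob
import cv2
import numpy as np

# Inner-corner grid of the chessboard (9x6, as in the text) and the
# corresponding object points in the board's own coordinate system.
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, left_pts, right_pts = [], [], []
# "left_*.png" / "right_*.png" are assumed file names; at least one
# valid pair is assumed to be present.
for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    gl = cv2.cvtColor(cv2.imread(lf), cv2.COLOR_BGR2GRAY)   # convert to grayscale
    gr = cv2.cvtColor(cv2.imread(rf), cv2.COLOR_BGR2GRAY)
    okl, cl = cv2.findChessboardCorners(gl, pattern)        # ordered corner search
    okr, cr = cv2.findChessboardCorners(gr, pattern)
    if okl and okr:                                         # keep only full detections
        obj_pts.append(objp); left_pts.append(cl); right_pts.append(cr)

size = gl.shape[::-1]  # (width, height)
# Intrinsics A and distortion coefficients per camera, then the pair's R and T:
_, A1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, A2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
_, A1, d1, A2, d2, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, A1, d1, A2, d2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
# Rectification transforms and the per-camera alignment (undistortion) maps:
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(A1, d1, A2, d2, size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(A1, d1, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(A2, d2, R2, P2, size, cv2.CV_32FC1)
```

Applying `cv2.remap` with the two map pairs then produces the rectified, row-aligned images used in the next step.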
2.2. Stereo correspondence matching

The next step is the creation of a disparity map that is then used to calculate the depth map. In order to construct a disparity map, it is necessary to look for similarities in the pair of images. One of the conditions for the similarity search is that the fields of view of the cameras intersect. The search for similarities between two images acquired from different cameras (as proposed by Kurt Konolige12) consists of the following steps (a block-matching sketch is given after Fig. 3):
• Image pre-processing: normalization of the input images in order to reduce differences in brightness and improve quality. A sliding window is passed over the image, with possible dimensions 5×5, 7×7, ..., 21×21:

$I' = \min(\max(I_c - \bar{I}, -I_{cap}), I_{cap})$,  (16)

where $\bar{I}$ is the average value in the window, $I_c$ is the central pixel of the window, and $I_{cap}$ is a positive numeric limit whose default value is 30.
• The search for matching points along the epipolar lines in both images using a sliding window, which operates on the principle of the sum of absolute differences (SAD). After image rectification, every line of the image becomes an epipolar line, so the matching point for a pixel (x0, y0) in the left image must be located on the same line in the right image; it is assumed that this pixel in the right image has the coordinates (x0 − d, y0), where d is the disparity value. The search for matching pixels is performed by calculating the maximum of a response function, which may be, for example, the correlation of the surrounding pixels.
• Image post-processing: elimination of incorrect matches using a sliding window.
• Calculation of the depth of the image. The depth values are inversely proportional to the pixel offset (disparity) values, and the relationship between the disparity and the depth can be expressed as follows (using the notation from the left half of Fig. 3):

$\dfrac{T - d}{Z - f} = \dfrac{T}{Z} \;\Rightarrow\; Z = \dfrac{f T}{d}$,  (17)

where T is the stereo base, Z is the distance (depth), f is the focal length and d is the disparity.
Fig. 3. Calculating the depth map.
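OpenCV's block matcher implements the Konolige-style SAD scheme described above, so the matching and depth steps can be sketched as follows. The window size, disparity range, focal length and stereo base below are assumed example values, and the file names are hypothetical.

```python
import cv2
import numpy as np

# Rectified grayscale inputs are assumed (e.g. produced by the remap step above).
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

# Block matcher with a 21x21 SAD window, the largest size mentioned in the text.
bm = cv2.StereoBM_create(numDisparities=64, blockSize=21)
bm.setPreFilterCap(30)   # I_cap from Eq. (16), default value 30
disparity = bm.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Eq. (17): Z = f*T / d. f (pixels) and T (stereo base, meters) are assumed values.
f, T = 800.0, 0.15
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * T / disparity[valid]

# Normalize the depth map to 8-bit grayscale for display or export.
out = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("depth_map.png", out)
```

A grayscale depth map exported this way is the kind of image that can be placed on a plane as a texture for the displace modifier described in Section 2.3.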
2.3. Constructing the three-dimensional model of an object

The graphics editor "3ds Max" was used to display the resulting 3D model of the object that was acquired using the two cameras. The depth map was placed on a plane as a texture, and a displace modifier was then applied to the plane. This made it possible to quickly visualize the results.

3. Experiments

The experiments consisted of two parts: a) a real environment with two cameras, reading their data, camera calibration and processing of the acquired results; b) a virtual environment with a simulated 3D scene with two stereo cameras, reading their data, camera calibration and processing of the acquired results. The main objective of the experiments was to see whether it was possible to create a 3D model from the data acquired from low-resolution cameras, and also to check which resolution provides the data necessary for facial recognition, by generating images of different resolutions in a virtual environment.

Part a). The first experiment used two monochrome cameras (Point Grey Chameleon) with a frame rate of 18 frames/s, 1.3 MP, and a frame resolution of 1296×964. The experiment also used an "Arduino Micro" microcontroller for the synchronous transmission of images from the two cameras. The "FlyCap" software was used to capture the images from the cameras. In order to speed up the algorithm, the original images were reduced in resolution to 800×600 pixels. The calibration procedure was performed on the cameras, and they were used to acquire two two-dimensional images of a person. The cameras were placed in parallel, so their optical axes did not intersect (see Fig. 4). The matching points were searched for on all images and a depth map was created. From the results it is clear that the selected stereo base (15 cm) was too large: since the cameras were placed at a short distance from the object (1 m) and their optical axes were parallel, the necessary matches were not found on the images, and the resulting depth map does not allow the depth of the object to be calculated.
Fig. 4. (a) left input image; (b) right input image; (c) disparity map; (d) 3D model.
To improve the results (see Fig. 5), the two images were combined horizontally (along the x axis) and aligned vertically (along the y axis), as in the sketch after Fig. 5. The matching points were then found on all images and a depth map was created. For both experiments the 3D model was created in a virtual environment, "Autodesk 3ds Max".
Fig. 5. (a) combined images; (b) depth map; (c) 3D model of the depth map.
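The paper does not state how the vertical alignment was found. As one possible approach, the sketch below searches a small range of integer vertical shifts for the offset that minimizes the SAD between the two views, then builds a side-by-side composite like the one in Fig. 5a. The file names and the search range are assumptions.

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Search a small range of integer vertical shifts and keep the one that
# minimizes the sum of absolute differences between the two views.
best_dy, best_sad = 0, np.inf
for dy in range(-20, 21):
    shifted = np.roll(right, dy, axis=0)   # note: np.roll wraps rows around
    sad = np.abs(left - shifted).sum()
    if sad < best_sad:
        best_dy, best_sad = dy, sad

right_aligned = np.roll(right, best_dy, axis=0)

# Side-by-side composite along the x axis for visual checking (cf. Fig. 5a).
combined = np.hstack([left, right_aligned]).astype(np.uint8)
cv2.imwrite("combined.png", combined)
```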
Part b). For the experiment in a virtual environment, a scene was created that consisted of two stereo cameras and a 3D model of a head that was randomly generated in the "Face Gen Modeller" program and transferred into "Autodesk 3ds Max". This time the cameras were not placed in parallel; their optical axes intersected on the object. In order to check how accurately the algorithm can reconstruct the face and its features for the task of facial reconstruction, the images were generated in several resolutions (see Fig. 6). The matching points were found on all images and a depth map was created.
Fig. 6. Virtual environment experiment results.
4. Conclusion

Two experiments were performed. The first one was in a virtual environment, with a manually created scene containing a 3D model of a head and two cameras that were calibrated and used to reconstruct the virtual 3D head. The second experiment was performed in a real environment, using two cameras that were calibrated and used to reconstruct a 3D model of a real object. Unfortunately, it was not possible to reconstruct the 3D model from the images that were acquired from the real cameras, which demonstrates the need for high resolution cameras in facial recognition tasks. Another downside is the need for very accurate installation and calibration of the cameras, where the smallest inaccuracy can lead to distortions and tears in the model mesh.

Acknowledgements

Special thanks to the company "Apply", which provided the cameras for the experiments.

References

1. Malik AS, Choi TS, Nisar H. Depth Map and 3D Imaging Applications: Algorithms and Technologies. IGI Global; 2012. p. 648.
2. Caprile B, Torre V. Using vanishing points for camera calibration. International Journal of Computer Vision. Vol. 4(2); 1990. p. 127–139.
3. Cipolla R, Drummond T, Robertson DP. Camera calibration from vanishing points in images of architectural scenes. In: Proc. of British Machine Vision Conference. Vol. 2. Nottingham, UK; 1999. p. 382–391.
4. Chen B, Wang W, Qin Q. Stereo vision calibration based on GMDH neural network. Applied Optics. Vol. 51(7); 2012. p. 841–845.
5. Hu Z, Tan Z. Calibration of stereo cameras from two perpendicular planes. Applied Optics. Vol. 44(24); 2005. p. 5086–5090.
6. Ali-Bey M, Moughamir S, Manamanni N. Calibration method for multiview camera with coplanar and decentered image sensors. Journal of Electronic Imaging. Vol. 22(2); 2013. Art. No. 023021.
7. Zhang Z. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 22(11); 2000. p. 1330–1334.
8. Zhang Z. Flexible camera calibration by viewing a plane from unknown orientations. In: Proceedings of the 7th International Conference on Computer Vision. Corfu; 1999. p. 666–673.
9. Sturm PF, Maybank SJ. On plane-based camera calibration: A general algorithm, singularities, applications. In: IEEE Conference on Computer Vision and Pattern Recognition; 1999.
10. Bradski G, Kaehler A. Learning OpenCV. O'Reilly Media, Inc., USA; 2008. p. 552.
11. Brown DC. Close-range camera calibration. Photogrammetric Engineering. Vol. 37(8); 1971. p. 855–866.
12. Konolige K. Small vision system: Hardware and implementation. In: Proceedings of the International Symposium on Robotics Research. Hayama, Japan; 1997. p. 111–116.
Olga Krutikova is a 4th year PhD student at Riga Technical University. In 2012, she received the degree of Master of Engineering Sciences in computer control and computer networks. Her research interests include computer vision, image processing, face recognition, scene analysis and computer graphics. Contact her at [email protected].