Virtual Glasses Try-on Based on Large Pose Estimation


Procedia Computer Science 131 (2018) 226–233. doi: 10.1016/j.procs.2018.04.207

8th International Congress of Information and Communication Technology (ICICT-2018)

Zhuming Feng, Fei Jiang, Ruimin Shen*

Shanghai Jiao Tong University, Dongchuan Rd., 200240, China

* Corresponding author. Tel.: +86 185-0585-5998. E-mail address: [email protected]

Abstract

Virtual glasses try-on displays the front and side effects of glasses to the user by loading a virtual glasses model onto the user's photos or videos. Current virtual glasses try-on methods are either inaccurate or dependent on sophisticated equipment, and cannot be applied to large-scale rotation situations. In this paper, we design and implement a novel virtual glasses try-on method based on large-scale head pose estimation. It uses 3D face reconstruction and pose estimation techniques to estimate the continuous pose angle of a face, and then rotates the glasses model synchronously in order to realize accurate matching of the glasses and the face. Compared with other methods, the proposed method does not depend on sophisticated equipment and can also be applied to large-scale rotation situations.

© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and peer-review under responsibility of the scientific committee of the 8th International Congress of Information and Communication Technology (ICICT-2018).

Keywords: virtual glasses try-on; pose estimation; augmented reality; face reconstruction; face alignment

1. Introduction

With the popularity of online shopping, how to enhance the user experience of online shopping has become an attractive topic in computer application research. Currently, the most common way to display items for online shopping is still photos and text descriptions. Accessories, however, raise higher demands for aesthetics and fit, which the display methods of traditional e-commerce sites cannot meet. Glasses, being both an accessory and a medical device, are costly to exchange: they need to be appropriate in size and to look good. Compared with other accessories, purchasing glasses online therefore demands a better shopping experience, and customers prefer to try glasses on as they would in a physical store. Virtual glasses try-on loads a glasses model onto the user's photos or videos to show how the user looks when wearing glasses.




Moreover, virtual glasses try-on is an application of augmented reality in the e-commerce field.

Existing virtual glasses try-on methods can be broadly divided into two categories: methods based on facial feature points and affine transformation, such as [1] and [2], and methods based on head pose estimation, such as [3] and [4]. The former are simple, but the matching accuracy between the face and the glasses is low. The latter conform better to the physical relationship between glasses and face. However, the existing methods either lack accuracy or need sophisticated equipment such as a depth camera or several aligned cameras, which hinders practical adoption. Moreover, almost none of the existing methods apply to large rotation situations.

To address these problems, this paper designs a novel glasses try-on method based on large-scale head pose estimation. The proposed method can correctly match the glasses model with the face across large angles and does not need complicated equipment. The main contributions of this paper are:

• A new glasses try-on method is designed, which exploits 3D face reconstruction and head pose estimation to estimate the spatial rotation angle of a face. A glasses model pre-aligned with a generic face model is then rotated synchronously to realize the matching of the virtual glasses and the face.
• The proposed method does not use depth cameras, multiple cameras, 3D scanners or other special hardware to implement the virtual try-on effect, which makes it more universal and applicable to a variety of practical scenarios.
• The algorithm is applicable to large side-face poses, giving a multi-directional try-on effect that better matches the glasses try-on scenario and enhances the user experience.

2. Related Work

The key point for virtual glasses try-on is to match the glasses and the face in the image. Existing methods can be generally divided into two categories according to the matching method: methods based on facial feature points and affine transformation, such as [1] and [2], and methods based on head pose estimation and eye location, such as [3] and [4].

2.1. Methods Based on Facial Feature Points and Affine Transformation

An affine transformation maps one vector space to another through a linear transformation plus a translation. In [1], three feature points around the eyes are selected and matched through an affine transformation. This method is easy to implement, but the result for side faces is poor. In [2], the user's face is scanned and six feature points are selected and matched. This method requires a 3D scanning instrument, and its application scenario is aimed at a specific user. The input of [5] is a video of the user's face, which makes the result more realistic than a still picture, but the system needs two special cameras and only supports frontal try-on. In [6], 3D face reconstruction [7] is used to reconstruct the face, and a head tracking system is called to track the head motion.
An affine transformation is then applied to match four points at the corners of the eyes and ears. That paper uses a 3D head model to show the result, which is more three-dimensional than most existing methods. However, compared with augmented reality methods, this type of display lacks facial authenticity and depends on face reconstruction, texture extraction and rendering.

By estimating the affine transformation parameters, the glasses model can be matched to the image through the transformation. This is the key technique used in several methods [1][2][6], with some differences in hardware platform, input and output. On the whole, however, the accuracy and stereoscopic quality of the resulting try-on effect are poor. The main reason is that, in reality, glasses rest on the bridge of the nose and the ears: the frames are parallel to the eye orbits with a gap between them. When the head rotates, the correspondences between the facial feature points and the points pre-annotated on the glasses model are no longer the same, as shown in Fig. 1.


Fig. 1. The matching method based on 3 feature points and affine transformation. In (4) it can be seen that the originally corresponding blue point moves outside the eye corner, which illustrates that the correspondence changes when the face rotates.

In addition, when matching the glasses model with the image through an affine transformation, only two corresponding planes are made to coincide, so the frames appear to stick closely to the face rather than sitting above it as in the real situation.
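To make this baseline concrete, the sketch below estimates an affine transformation from three point correspondences and warps a glasses image onto a face image with OpenCV. This is a minimal illustration of the feature-point approach discussed above, not the exact system of [1]; the images and point coordinates are synthetic stand-ins.

```python
# Minimal sketch of affine-transformation glasses matching (the baseline
# approach discussed above); images and coordinates are synthetic.
import cv2
import numpy as np

# Synthetic stand-ins: a plain "face" image and a glasses image on white.
face = np.full((480, 480, 3), 180, dtype=np.uint8)
glasses = np.full((160, 300, 3), 255, dtype=np.uint8)
cv2.circle(glasses, (75, 80), 50, (0, 0, 0), 4)        # left lens
cv2.circle(glasses, (225, 80), 50, (0, 0, 0), 4)       # right lens
cv2.line(glasses, (125, 80), (175, 80), (0, 0, 0), 4)  # bridge

# Three annotated points on the glasses (lens centres, nose-pad point)
# and the hypothetical face landmarks they should map to.
src = np.float32([[75, 80], [225, 80], [150, 130]])
dst = np.float32([[180, 230], [300, 225], [242, 280]])

M = cv2.getAffineTransform(src, dst)                   # 2x3 affine matrix
warped = cv2.warpAffine(glasses, M, (face.shape[1], face.shape[0]),
                        borderValue=(255, 255, 255))

# Composite: treat near-white pixels of the warped glasses as transparent.
mask = np.all(warped > 240, axis=2)
result = np.where(mask[..., None], face, warped)
```

As Fig. 1 illustrates, this works for near-frontal faces but degrades once the head rotates, because a planar transform cannot model the depth gap between the frames and the face.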

2.2. Methods Based on Head Pose Estimation

Head pose estimation is the task of estimating the attitude parameters of the head given a face image as input. The attitude angle is a three-dimensional vector whose values represent the rotation angles around the three axes (z, y, x), called roll, yaw and pitch, respectively.

Fig. 2. The face attitude angle

There are numerous algorithms for head pose estimation, which can be divided into three groups: model-based algorithms [8], appearance-based algorithms, and classification-based algorithms. The pose estimation algorithm used in [4], based on the symmetry prior of human eyes, is an appearance-based method. Classification-based methods learn a classifier to distinguish different attitude angles of the face. The results of these two kinds of methods are usually discrete, which is not suitable for a real virtual glasses try-on application. Model-based pose estimation algorithms require a three-dimensional standard model. The key idea is to rotate the 3D standard model so that the projection of the 3D feature points on the model coincides as closely as possible with the feature points in the input image. Most papers model this procedure as a nonlinear least squares problem:

$$\min_{\alpha,\beta,\gamma,\,t}\ \sum_{i=1}^{h}\big\|\,p_i-\big(R(\alpha,\beta,\gamma)\,P_i+t\big)\big\|^2$$

where $\alpha,\beta,\gamma$ are the three components of the face attitude angle, $h$ is the number of feature points, $p_i$ are the feature points on the face to be measured, $P_i$ are the corresponding landmark points on the 3D reference face model, $R$ is the rotation matrix and $t$ is the translation vector.
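As a concrete illustration, the sketch below fits the attitude angles, a 2D translation and a scale factor (added here for weak perspective; the formula above has no scale) to synthetic landmark correspondences with scipy.optimize.least_squares. The landmark coordinates are made up for the example; the rotation convention follows the R = R_x R_y R_z factorization given in Section 3.2.

```python
# A minimal sketch of model-based pose estimation as nonlinear least
# squares under weak perspective; all landmark data are synthetic.
import numpy as np
from scipy.optimize import least_squares

def rotation_zyx(angles):
    """R = Rx(pitch) @ Ry(yaw) @ Rz(roll), the paper's z-y-x convention."""
    a, b, g = angles
    Rx = np.array([[1, 0, 0], [0, np.cos(a), np.sin(a)], [0, -np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, -np.sin(b)], [0, 1, 0], [np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(g), np.sin(g), 0], [-np.sin(g), np.cos(g), 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def residuals(params, P3d, p2d):
    """Image-plane residuals p_i - (f * R(a,b,g) * P_i + t)."""
    angles, t, f = params[:3], params[3:5], params[5]
    proj = f * (rotation_zyx(angles) @ P3d.T)[:2].T + t
    return (proj - p2d).ravel()

# Five synthetic 3D reference landmarks (eye corners, nose tip, mouth corners).
P3d = np.array([[-30.0, 30.0, 20.0], [30.0, 30.0, 20.0], [0.0, 0.0, 40.0],
                [-20.0, -30.0, 10.0], [20.0, -30.0, 10.0]])
# Their 2D detections, generated with a known pose (0.4 rad yaw).
p2d = 1.5 * (rotation_zyx([0.0, 0.4, 0.0]) @ P3d.T)[:2].T + np.array([100.0, 120.0])

x0 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])       # initial guess, f = 1
fit = least_squares(residuals, x0, args=(P3d, p2d))
print("pitch/yaw/roll (rad):", fit.x[:3])            # yaw should be near 0.4
```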

[3] uses Kinect and an AAM-based [10] facial tracking algorithm to track the face movement and to deal with the occlusion caused by the head covering part of the glasses. It transforms the glasses matching problem into a robust face tracking procedure [3]; in essence, it estimates the attitude angle of the whole head. Compared with methods that use only feature points around the eyes and an affine transformation, AAM considers the shape and appearance of the entire face to estimate the head pose, which makes the glasses matching results more stable and accurate. However, the system requires special hardware, namely the Kinect depth camera.

3. Proposed Method

The overall framework of the proposed virtual glasses try-on method is shown in Fig. 3. For a facial image input by the user, a face detection algorithm is first applied to detect and label the face region. Then the 3D dense face alignment algorithm (3DDFA) [11] is applied to estimate the 3D reconstruction parameters of the face in the image, including three-dimensional pose parameters, two-dimensional position parameters, a one-dimensional scale ratio, 199-dimensional shape parameters and 28-dimensional expression parameters. With the pose, position and scale parameters, we can rotate and translate the glasses model to match the glasses with the face. In addition, the reconstructed 3D face model can be used to handle the occlusion problem accurately.

Fig. 3. The overall procedure of virtual glasses try-on

3.1. Large-Scale 3D Face Alignment

Blanz et al. [12] proposed the three-dimensional morphable model (3DMM) to represent human faces in three-dimensional space:


$$S=\bar{S}+A_{id}\,\alpha_{id}+A_{exp}\,\alpha_{exp}$$

where $S$ is a 3D face, $\bar{S}$ is the mean face shape, $A_{id}$ are the shape principal axes trained under the neutral expression, $A_{exp}$ are the expression principal axes trained under the mean shape, and $\alpha_{id}$ and $\alpha_{exp}$ are the shape and expression parameters.

3D face alignment fits a 3D morphable face model to a face image and extracts the semantic meaning of facial points [9]. [11] put forward the 3D dense face alignment algorithm (3DDFA), which fits the Basel Face Model (BFM) [13] to a face image through a convolutional neural network (CNN) with a specially designed feature so as to solve the fitting problem. The input of 3DDFA is a face image, and the final output is a collection of model parameters

$$P=[f,\,pitch,\,yaw,\,roll,\,t_{2d},\,\alpha_{id},\,\alpha_{exp}]$$

In the $k$-th iteration, the CNN takes the intermediate result $P^k$ and the specially designed projected normalized coordinate code (PNCC) as input to predict the parameter update $\Delta P^k$. The PNCC is constructed by rendering the projected 3D face with the Z-buffer, using the normalized coordinate code (NCC) as its colormap. The NCC is a normalized coordinate that also serves as a texture: it normalizes the 3D mean face into $[0,1]$ so that each normalized coordinate is also its RGB color. The NCC and PNCC are computed as

$$NCC_d=\frac{\bar{S}_d-\min\bar{S}_d}{\max\bar{S}_d-\min\bar{S}_d},\quad d\in\{x,y,z\}$$

$$PNCC=Z\text{-}Buffer\big(V_{3d}(P),\,NCC\big)$$

Considering both effectiveness and efficiency, three iterations are used in 3DDFA [11]. After the fitting procedure, the 3D face can be projected onto the face image through weak perspective projection:

$$V(P)=f*Pr*R*\big(\bar{S}+A_{id}\,\alpha_{id}+A_{exp}\,\alpha_{exp}\big)+t_{2d}$$

where $V(P)$ is the model projection, $f$ is the scaling ratio, $Pr$ is the orthographic projection matrix, $R$ is the rotation matrix, and $t_{2d}$ is the translation vector.
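The following sketch shows the 3DMM shape construction and the weak-perspective projection V(P) in numpy. The basis matrices are random stand-ins (the real A_id and A_exp come from the Basel Face Model, which must be obtained separately), and the vertex count is reduced to keep the example light; only the parameter dimensions follow the paper.

```python
# Sketch of the 3DMM shape model and weak-perspective projection V(P).
# Random matrices stand in for the Basel Face Model bases; only the
# parameter dimensions (199 shape, 28 expression) follow the paper.
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 1000                                    # toy count, not BFM's
S_mean = rng.standard_normal(3 * n_vertices)         # mean shape (x,y,z stacked)
A_id = rng.standard_normal((3 * n_vertices, 199))    # shape principal axes
A_exp = rng.standard_normal((3 * n_vertices, 28))    # expression principal axes

def build_shape(alpha_id, alpha_exp):
    """S = S_mean + A_id @ alpha_id + A_exp @ alpha_exp, as (N, 3) vertices."""
    return (S_mean + A_id @ alpha_id + A_exp @ alpha_exp).reshape(-1, 3)

def weak_perspective(S, f, R, t2d):
    """V(P) = f * Pr * R * S + t2d, with Pr the orthographic projection."""
    Pr = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
    return f * (Pr @ R @ S.T).T + t2d                # (N, 2) image points

# Example: a random face, identity rotation, centred in a 450x450 image.
S = build_shape(rng.standard_normal(199), rng.standard_normal(28))
pts2d = weak_perspective(S, f=1.2, R=np.eye(3), t2d=np.array([225.0, 225.0]))
```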

3.2. Glasses Matching

From the result of large-angle 3D face alignment, we extract the attitude angle and position of the face in the image. The glasses model is pre-aligned to a suitable spatial position and angle so that it fits the 3D generic face model in the frontal pose. The matching problem between the glasses and the face in the image is thus transformed into a fitting problem between the 3D generic face model and the image: we rotate and then translate the glasses model synchronously to obtain the matching effect.

There are many representations for the rotation of an object in three-dimensional space, such as the rotation matrix, the rotation vector, Euler angles and quaternions. Euler angles are the most intuitive: a three-dimensional vector whose values represent the angles the object rotates around the three coordinate axes. The result of an Euler-angle rotation depends on the order of the axes, so there are 12 possible Euler-angle conventions. The attitude angle used in this paper is an Euler angle with rotation order z-y-x. For coordinate transformations, however, the rotation matrix is more convenient, and it is the representation used in both OpenGL and DirectX. Given the attitude angles from 3D face alignment, with pitch, yaw and roll denoted $\alpha$, $\beta$, $\gamma$ respectively, the rotation matrix is

$$R(\alpha,\beta,\gamma)=R_x(\alpha)\,R_y(\beta)\,R_z(\gamma)$$

where

$$R_x(\alpha)=\begin{bmatrix}1&0&0\\0&\cos\alpha&\sin\alpha\\0&-\sin\alpha&\cos\alpha\end{bmatrix},\quad R_y(\beta)=\begin{bmatrix}\cos\beta&0&-\sin\beta\\0&1&0\\\sin\beta&0&\cos\beta\end{bmatrix},\quad R_z(\gamma)=\begin{bmatrix}\cos\gamma&\sin\gamma&0\\-\sin\gamma&\cos\gamma&0\\0&0&1\end{bmatrix}$$

The glasses model is fitted and projected onto the image through the same weak-perspective projection as the 3D face model:

$$V_g(f,\alpha,\beta,\gamma,t_{2d})=f*Pr*R(\alpha,\beta,\gamma)*S_g+t_{2d}$$

where $S_g$ are the coordinates of the glasses model in three-dimensional space.
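Putting these pieces together, the sketch below rotates a pre-aligned glasses model by the estimated attitude angles and projects it with the same weak-perspective parameters as the face. The glasses vertices and pose values are placeholder data, not output from a real alignment.

```python
# Sketch of glasses matching: rotate a pre-aligned glasses model by the
# estimated attitude angles and project it like the face model.
import numpy as np

def rotation_zyx(pitch, yaw, roll):
    """R = Rx(pitch) @ Ry(yaw) @ Rz(roll), matching the paper's convention."""
    ca, sa = np.cos(pitch), np.sin(pitch)
    cb, sb = np.cos(yaw), np.sin(yaw)
    cg, sg = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, ca, sa], [0, -sa, ca]])
    Ry = np.array([[cb, 0, -sb], [0, 1, 0], [sb, 0, cb]])
    Rz = np.array([[cg, sg, 0], [-sg, cg, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def project_glasses(S_g, f, pitch, yaw, roll, t2d):
    """V_g = f * Pr * R * S_g + t2d for an (N, 3) glasses vertex array."""
    R = rotation_zyx(pitch, yaw, roll)
    return f * (R @ S_g.T)[:2].T + t2d               # keep x, y (orthographic)

# Placeholder glasses vertices pre-aligned to the generic frontal face,
# and pose parameters as they would come from 3D face alignment.
S_g = np.array([[-60.0, 0.0, 20.0], [60.0, 0.0, 20.0], [0.0, 0.0, 35.0]])
pts = project_glasses(S_g, f=1.2, pitch=0.05, yaw=0.5, roll=0.0,
                      t2d=np.array([225.0, 230.0]))
```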

3.3. Occlusion Handling

We use the depth buffer (Z-buffer) algorithm together with the reconstructed 3D face model to handle occlusion, without using a depth camera as in [3]. Both OpenGL and DirectX provide efficient built-in Z-buffer functionality: when the depth test is on, the renderer automatically compares the depth of the currently rendered fragment with the recorded depth so that only the topmost object is visible. Since virtual glasses try-on is generally an augmented reality application that displays virtual glasses on real face images, the reconstructed 3D face model is rendered as transparent. In our experiments we use MATLAB to implement simple Z-buffer functions: the first takes the reconstructed 3D face model as input and records the depth map over the face region of the image; the second takes the glasses model together with this depth map and projects the glasses onto the original face image with the occluded parts removed. The functions can be represented as

$$depth=ZBuffer_d(vertex_{face},\,tri_{face},\,img)$$

$$result=ZBuffer(vertex_{glasses},\,tri_{glasses},\,texture_{glasses},\,depth,\,img)$$

where $vertex_{face}$ and $tri_{face}$ are the vertex coordinates and triangles of the face model, $img$ is the original image, and $vertex_{glasses}$, $tri_{glasses}$ and $texture_{glasses}$ are the vertex coordinates, triangles and texture of the glasses model.
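The following is a much-simplified, point-splatting version of this occlusion test (the paper rasterizes triangles, which is omitted here for brevity): the face depth map is recorded first, and a glasses point is drawn only where it is nearer to the camera than the face surface. All model data are hypothetical stand-ins.

```python
# Simplified Z-buffer occlusion: splat face depths, then draw glasses
# pixels only where they are nearer than the recorded face depth.
import numpy as np

def zbuffer_depth(verts2d, depth, H, W):
    """Record the nearest depth per pixel for projected face vertices."""
    zbuf = np.full((H, W), np.inf)
    for (x, y), z in zip(np.round(verts2d).astype(int), depth):
        if 0 <= x < W and 0 <= y < H:
            zbuf[y, x] = min(zbuf[y, x], z)
    return zbuf

def draw_glasses(img, verts2d, depth, color, face_zbuf):
    """Draw glasses points that pass the depth test against the face."""
    out = img.copy()
    H, W = img.shape[:2]
    for (x, y), z in zip(np.round(verts2d).astype(int), depth):
        if 0 <= x < W and 0 <= y < H and z < face_zbuf[y, x]:
            out[y, x] = color
    return out

# Hypothetical data: one face point at depth 10; of the two glasses points,
# only the nearer one (depth 5) passes the depth test and is drawn.
img = np.zeros((450, 450, 3), dtype=np.uint8)
face_pts, face_z = np.array([[200.0, 200.0]]), np.array([10.0])
glasses_pts = np.array([[200.0, 200.0], [210.0, 200.0]])
glasses_z = np.array([20.0, 5.0])
zbuf = zbuffer_depth(face_pts, face_z, 450, 450)
out = draw_glasses(img, glasses_pts, glasses_z, (0, 255, 0), zbuf)
```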

3.4. Effectiveness

The effectiveness of the proposed virtual glasses try-on method can be seen from several aspects. First, pose estimation is used as the core of glasses matching, which accords with the physical relationship of wearing glasses and avoids the defects described in Section 2.2. Second, the pose estimation algorithm used in this paper adopts features constructed from the whole face, which makes the result more accurate and stable. In addition, the algorithm borrows from the field of face reconstruction, and the reconstructed face is applied to handle occlusion precisely. Finally, the method adapts to large-scale face rotation, which better fits the glasses try-on scenario.

4. Experimental Results

Table 1 compares the method proposed in this paper with other related methods. Methods based on facial feature points and affine transformation make the glasses appear closely attached to the face and lack stereoscopy, because of the inherent defect described in Section 2.2. Although the effect of [3] is quite good, it needs the Kinect depth camera, which is not available to everyone. The pose estimation result of [4] is discrete, as shown in Fig. 4(6), so its effect in practical applications is even worse. Fig. 4(1)-(4) shows the try-on effect of this paper; the accuracy and three-dimensionality of the glasses matching can be seen even under large-scale rotation. (3) and (4) also reflect precise occlusion handling; in particular, in (4) a small part of a lens is occluded by the user's nose bridge. (5) is the effect of [1], in which the frames stick closely to the face: the glasses frames and the image lie in the same plane, while the face in the image clearly rotates slightly around the y axis. (6) is the result of [4], which can only show discrete results in yaw.


Table 1. Comparison with other methods.

Method       Effect                               Special Equipment   Occlusion Handling   Large Pose
[1]          Not accurate or stereoscopic enough  None                Not supported        Not supported
[3]          Good                                 Depth camera        Supported            Not supported
[2]          Not stereoscopic enough              3D scanner          Supported            Not supported
[4]          Discrete, not accurate enough        None                Not supported        Not supported
This paper   Good                                 None                Supported            Supported

Fig. 4. Effect and comparison. (1)-(4) show the effect of this paper; in (4) a small part of the lens is occluded by the nose due to the user's large pose. (5) is the result of [1], generated by the method based on feature points and affine transformation. (6) is the result of [4], which uses an appearance-based pose estimation method. The proposed method outperforms [1] and [4] in both accuracy and stereoscopy.

5. Conclusion

This paper proposes a virtual glasses try-on method based on large pose estimation, focusing mainly on the matching procedure. The proposed method ensures accuracy and stereoscopy, fits large-scale face rotation, and does not need sophisticated equipment such as a depth camera. Future work includes integrating the whole procedure into a complete software product, rendering better glasses effects with Unity or other engines, and obtaining and testing more glasses models.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61671290), the National Key Research and Development Program of China (No. 2016YFE0129500) and the Science and Technology Commission Program of Shanghai (No. 17511101903).

References

1. Huang W Y, Hsieh C H, Yeh J S. Vision-based virtual eyeglasses fitting system. In: 2013 IEEE 17th International Symposium on Consumer Electronics (ISCE). IEEE, 2013: 45-46.
2. Huang S H, Yang Y I, Chu C H. Human-centric design personalization of 3D glasses frame in markerless augmented reality. Advanced Engineering Informatics, 2012, 26(1): 35-45.
3. Tang D, Zhang J, Tang K, et al. Making 3D eyeglasses try-on practical. In: 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW). IEEE, 2014: 1-6.
4. Lu Y, Wang S G, Zhao W T, et al. Technology of virtual eyeglasses try-on system based on face pose estimation. Chinese Optics, 2015, 8(4): 582-588.
5. Déniz O, Castrillón M, Lorenzo J, et al. Computer vision based eyewear selector. Journal of Zhejiang University SCIENCE C, 2010, 11(2): 79-91.
6. Niswar A, Khan I R, Farbiz F. Virtual try-on of eyeglasses using 3D model of the head. In: Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry. ACM, 2011: 435-438.
7. Nguyen H T, Ong E P, Niswar A, et al. Automatic and real-time 3D face synthesis. In: Proceedings of the 8th International Conference on Virtual Reality Continuum and Its Applications in Industry. ACM, 2009: 103-106.
8. Hartley R, Zisserman A. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
9. Liu F, Zeng D, Zhao Q, et al. Joint face alignment and 3D face reconstruction. In: European Conference on Computer Vision. Springer International Publishing, 2016: 545-560.
10. Tirkaz C, Albayrak S. Face recognition using Active Appearance Model. In: European Conference on Computer Vision. Springer Berlin Heidelberg, 1998: 581-595.
11. Zhu X, Lei Z, Liu X, et al. Face alignment across large poses: a 3D solution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016: 146-155.
12. Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9): 1063-1074.
13. Paysan P, Knothe R, Amberg B, et al. A 3D face model for pose and illumination invariant face recognition. In: IEEE International Conference on Advanced Video and Signal Based Surveillance. 2009: 296-301.