Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 110 (2017) 434–439
International Workshop on Big Data and Networks Technologies (BDNT 2017)
Novel approach to pose invariant face recognition
Brahim AKSASSE*, Hamid OUANAN and Mohammed OUANAN
Department of Computer Sciences, ASIA Team, Moulay Ismail University, Faculty of Science and Techniques, BP 509 Boutalamine 52000 Errachidia, Morocco
Abstract
Face verification in the wild remains a challenging problem. This paper makes two contributions. First, to improve face recognition in the wild, at least with respect to pose variations, we propose a method for aligning faces that uses a single 3D face model, produced by FaceGen Modeller, as reference. Second, we develop a novel face descriptor based on Gabor filters. The proposed descriptor combines Gabor magnitude and Gabor phase information in a unified framework and outperforms standard representations on the most popular benchmark, “Labeled Faces in the Wild” (LFW). This compact descriptor reaches an accuracy of 97.29% on the LFW dataset.
© 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs.
Keywords: Face Recognition, Gabor Filter, Pose Correction, Support Vector Machine, RBF Kernel, LFW
1. Introduction
In face verification, images are presented in pairs and the task is to verify whether they belong to the same person or to different persons. Face verification has recently gained a lot of popularity owing to the availability of a few public benchmark datasets. Fig. 1 shows an example of such a task. Applications of this task are in search and authentication domains
* Corresponding author. Tel.: +212672345351; fax: +212535574485. E-mail address: [email protected]
1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs. 10.1016/j.procs.2017.06.108
such as entertainment, human-machine interaction, homeland security, video surveillance, and access control, up to user authentication schemes in e-commerce, e-health, and e-government services. These applications face many challenges, such as variations in illumination, scale, location, orientation and pose. Furthermore, facial expressions, facial decorations, partial occlusion and lighting conditions change the overall appearance of a face, making it harder to recognize. Fig. 2 shows some examples of these challenges. Face recognition is really a series of several related problems: face detection, face normalization, feature extraction, and feature matching. The human brain does all of these tasks automatically and instantly. Computers are not yet capable of this sort of high-level generalization, so we have to teach them how to do each task in this process separately. To do so, we need to build a pipeline in which we solve each step of face recognition separately and pass the result of the current step to the next step (see Fig. 3).
Fig. 1: Examples of similar and dissimilar pairs
Fig. 2: Examples of visual challenges
Fig. 3: Pipeline of a typical face recognition system
The main objective of this work is to propose a reliable framework that is insensitive to the challenges listed above and, in particular, able to “identify faces from a side view” as well as when the person is directly facing the camera in the
picture, in order to approach human-level performance in this domain. In summary, the novelty of this paper comes from: (i) an effective 3D face alignment module; (ii) an effective representation for describing faces using Gabor filters, together with an investigation of various approaches to reduce its dimension while further improving its performance; and (iii) extensive performance evaluation studies.
2. Related Works
The conventional pipeline of a typical face verification system comprises the following steps: face detection, facial landmark detection, alignment, representation and classification. Most papers, however, focus on only a few of these steps in order to improve overall system performance. In this work, we focus on both the alignment and the representation steps. In this section, we briefly review some recent related work on face alignment and face representation in the context of face verification.
State-of-the-art face alignment: Aligning faces under in-the-wild conditions is still a very difficult problem that has to account for many factors, such as non-rigid facial expressions and pose. Recently, several techniques able to compensate for these difficulties have been proposed. They can be roughly divided into two main categories: (i) part-based methods, which represent the face using a set of local image patches extracted around predefined landmark points, and (ii) holistic methods, which use the whole texture of the face as the representation. The best-known techniques that have produced good results are, in the first category, Active Shape Models (ASMs) [2] and Constrained Local Models (CLMs) [2] and, in the second category, Active Appearance Models (AAMs) [3] and 3D Deformable Models (3DMs) [4]. However, no complete solution currently exists in the context of face recognition in the wild, because the accuracy of these landmark detection and localization algorithms degrades as the yaw or pitch angle of the face increases.
Brief review of recent face verification approaches: Representing face images has long been an important topic in computer vision and image processing, and the diversity of feature extraction methods is striking. In this section, we look at some methods that have produced good performance on large-scale databases such as LFW and FERET.
The authors of [5] proposed a facial image representation that gives better results on the FERET database [6]. This method relies on Gabor filters (GFs) and Zernike moments (ZMs), where GFs are used for texture feature extraction and ZMs extract shape features. In addition, a simple Genetic Algorithm (GA) is applied to select the moment features that best discriminate human faces under several pose and illumination conditions. Next, the augmented extracted feature vectors are projected onto a low-dimensional subspace using the Random Projection (RP) method [7]. The authors of [8] proposed a regularization framework to learn similarity metrics for face verification in the wild. This method achieves good results on the LFW database [1]. In [9], the authors proposed a joint Bayesian approach based on the classical Bayesian face recognition approach proposed by Baback Moghaddam et al. [10]. This approach achieved 92.4% accuracy on the LFW dataset. Another interesting approach, Fisher vector encoding, also performs well on LFW. However, the accuracy of these algorithms degrades on extreme face poses such as profile views. This shows the need for techniques able to compensate for large pose variations.
3. The proposed framework
3.1. Pose Correction
In this section, we briefly describe our method for 3D pose correction. We use the same textured 3D face model as reference to align all query images. This 3D face model is produced by FaceGen Modeller [11]. We begin by rendering this reference model in a fixed, frontal view. We refer to this as the reference frontal view $I_R$ (see equation 1), which serves as our reference coordinate system. 68 facial landmarks $p_i = (x_i, y_i)^T$ are detected in this image using the method of [12], selected for its accuracy on real-world face photos. With each detected point we associate its 3D coordinates $P_i = (X_i, Y_i, Z_i)^T$. A query image $I_Q$ is processed by first running the Viola-Jones detector [13]. We again use [14] to detect the same 68 landmarks in $I_Q$, giving us points $p'_i = (x'_i, y'_i)^T$. Using these, we form correspondences $(p_i, P_i)$ from 2D pixels in the query photo to 3D points on the model. We then compute a specific 3x4 camera matrix $C_M$ by selecting suitable intrinsic and extrinsic camera parameters, using a standard calibration method.
$$p' \cong C_M P \qquad (1)$$
$$C_M = A_M [R_M \; t_M] \qquad (2)$$
$$C_Q = A_Q [R_Q \; t_Q] \qquad (3)$$
where $A_M$ is the intrinsic matrix, $R_M$ is the rotation matrix, and $t_M$ is the translation vector.
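As an illustration of equations (1) and (2), the following minimal Python sketch builds a 3x4 camera matrix from an intrinsic matrix, a rotation and a translation, and projects a 3D model point to pixel coordinates. The helper names (`mat_mul`, `camera_matrix`, `project`) and the numeric values (focal length 200, principal point (64, 64), translation of 100 along the optical axis) are assumptions chosen for the example and are not taken from the paper.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def camera_matrix(A, R, t):
    """Build C = A [R | t] from a 3x3 intrinsic matrix A,
    a 3x3 rotation matrix R and a translation vector t of length 3."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # 3x4 extrinsic block [R | t]
    return mat_mul(A, Rt)


def project(C, P):
    """Project a 3D point P = (X, Y, Z) with the 3x4 matrix C.
    The homogeneous result is divided by its last component."""
    X, Y, Z = P
    homogeneous = (X, Y, Z, 1.0)
    u, v, w = (sum(C[i][j] * homogeneous[j] for j in range(4))
               for i in range(3))
    return (u / w, v / w)


# Hypothetical frontal reference camera (illustrative values only).
A_M = [[200.0, 0.0, 64.0],
       [0.0, 200.0, 64.0],
       [0.0, 0.0, 1.0]]
R_M = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
t_M = [0.0, 0.0, 100.0]

C_M = camera_matrix(A_M, R_M, t_M)
print(project(C_M, (10.0, 0.0, 0.0)))
```

The same `project` function applies unchanged to a query camera matrix $C_Q$ of equation (3); only the parameters differ.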
Input: Query image $I_Q$, textured 3D face model, rendered frontal view of this model ($I_R$).
Output: Frontalized face.
Step 1: Facial feature points $p_i = (x_i, y_i)^T$ are detected in the query image $I_Q$.
Step 2: The same facial feature points $p'_i = (x'_i, y'_i)^T$ are detected in $I_R$, together with their corresponding points $P_i = (X_i, Y_i, Z_i)^T$ on the surface of the model.
Step 3: Seek 2D-3D correspondences between the points $(p_i, P_i)$.
Step 4: Estimate the 3x4 query camera matrix $C_Q$ used to capture the query image $I_Q$.
Step 5: Back-project the query intensities to $I_R$ (equation 3).
Step 6: Estimate visibility due to non-frontal poses by symmetry.
Step 7: Face patch extraction and classification.
Step 8: Final frontalized crop in canonical view.
Fig. 4: An overview of the proposed face alignment method
3.2. Face Representation
The frequency and orientation representations of Gabor filters are similar to those of the human visual system, and they have been found to be particularly appropriate for texture representation. Gabor filters have been widely used in pattern analysis applications. Their most important advantage is their invariance to illumination, rotation, scale, and translation. Furthermore, they can withstand photometric disturbances, such as illumination changes and image noise. A 2D Gabor function $g(x, y)$ and its Fourier transform $G(u, v)$ are as follows:
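$$g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j \omega x\right] \qquad (4)$$
$$G(u, v) = \exp\left[-\frac{1}{2}\left(\frac{(u - \omega)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right)\right] \qquad (5)$$
where $\sigma_u = \frac{1}{2\pi\sigma_x}$ and $\sigma_v = \frac{1}{2\pi\sigma_y}$.
$$g_{mn}(x, y) = a^{-m} G(x', y') \qquad (6)$$
where $a \geq 1$, $x' = a^{-m}(x\cos\theta + y\sin\theta)$ and $y' = a^{-m}(y\cos\theta - x\sin\theta)$, for $m = 0, 1, \ldots, M-1$ and $n = 0, 1, \ldots, N-1$; $M$ is the number of resolutions and $N$ is the number of orientations.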
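To make the construction concrete, here is a minimal Python sketch that samples the Gabor function of equation (4) and the scaled, rotated filters of equation (6) on a small pixel grid. Several choices are assumptions made only for this example: equation (6) is read as evaluating the spatial function of equation (4) at the rotated coordinates, the orientation is taken as $\theta = n\pi/N$ (the text does not state it), and the values of `a`, `sigma_x`, `sigma_y`, `omega` and the kernel size are illustrative.

```python
import cmath
import math


def gabor(x, y, sigma_x, sigma_y, omega):
    """Complex 2D Gabor function g(x, y) as written in equation (4)."""
    envelope = -0.5 * (x ** 2 / sigma_x ** 2 + y ** 2 / sigma_y ** 2)
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    return norm * cmath.exp(envelope + 2j * math.pi * omega * x)


def gabor_mn(x, y, m, theta, a, sigma_x, sigma_y, omega):
    """Scaled and rotated filter following equation (6)."""
    scale = a ** (-m)
    xr = scale * (x * math.cos(theta) + y * math.sin(theta))
    yr = scale * (y * math.cos(theta) - x * math.sin(theta))
    return scale * gabor(xr, yr, sigma_x, sigma_y, omega)


def gabor_bank(size, M, N, a=2.0, sigma_x=2.0, sigma_y=2.0, omega=0.25):
    """Sample M x N kernels on a size x size grid centred on the origin.
    theta = n * pi / N is an assumed orientation spacing."""
    half = size // 2
    coords = range(-half, half + 1)
    bank = {}
    for m in range(M):
        for n in range(N):
            theta = n * math.pi / N
            bank[(m, n)] = [
                [gabor_mn(x, y, m, theta, a, sigma_x, sigma_y, omega)
                 for x in coords]
                for y in coords
            ]
    return bank


# 5 resolutions x 8 orientations gives a bank of 40 kernels.
bank = gabor_bank(size=9, M=5, N=8)
print(len(bank))
```

Each entry of `bank` is a list of rows of complex numbers, indexed by the pair `(m, n)`.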
The feature extraction procedure can then be defined as a convolution of the face image $I(x, y)$ with the Gabor filter $G_{u,v}(x, y)$. The result of this operation is a complex image defined by the amplitude and the phase at each pixel of the image:
$$\psi_{u,v}(x, y) = I(x, y) * G_{u,v}(x, y) \qquad (7)$$
Based on this equation, the magnitude and the phase responses of the convolution operation can be computed as follows:
$$A_{u,v}(x, y) = \sqrt{\mathrm{Re}(\psi_{u,v}(x, y))^2 + \mathrm{Im}(\psi_{u,v}(x, y))^2} \qquad (8)$$
$$\varphi_{u,v}(x, y) = \arctan\left(\frac{\mathrm{Im}(\psi_{u,v}(x, y))}{\mathrm{Re}(\psi_{u,v}(x, y))}\right) \qquad (9)$$
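Equations (7) to (9) can be sketched in a few lines of Python. The code below computes the complex response at a single pixel by direct convolution and then derives its magnitude and phase. It is only a sketch under stated assumptions: pixels outside the image are treated as zero, and the phase uses `math.atan2`, a quadrant-aware variant of the arctangent in equation (9) that avoids division by zero and differs from it when the real part is negative. The function names are hypothetical.

```python
import math


def convolve_at(image, kernel, row, col):
    """Complex response of `kernel` at pixel (row, col), equation (7).
    Pixels outside the image are treated as zero (an assumption)."""
    kh, kw = len(kernel), len(kernel[0])
    acc = 0j
    for i in range(kh):
        for j in range(kw):
            r = row - (i - kh // 2)  # convolution flips the kernel offsets
            c = col - (j - kw // 2)
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                acc += image[r][c] * kernel[i][j]
    return acc


def magnitude(psi):
    """Magnitude response, equation (8)."""
    return math.hypot(psi.real, psi.imag)


def phase(psi):
    """Phase response; atan2 is used in place of arctan(Im / Re)."""
    return math.atan2(psi.imag, psi.real)


# A unit impulse reproduces the kernel's centre value at the centre pixel.
image = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
kernel = [[1 + 0j, 2 + 0j, 1 + 0j],
          [0j, 3 + 4j, 0j],
          [1 + 0j, 2 + 0j, 1 + 0j]]
psi = convolve_at(image, kernel, 1, 1)
print(magnitude(psi), phase(psi))
```

For full 128 × 128 images an FFT-based convolution would be the usual choice; the direct loop is kept here for readability.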
The great majority of Gabor-based face recognition approaches found in the literature rely solely on the magnitude information when constructing the Gabor face representation and discard the phase information of the convolution output image. The magnitude face representation vector is computed by taking the following steps:
Input: Face image (128 × 128 pixels).
Output: The validity of the identity claim.
Step 1: Face frontalization.
Step 2: Construction of a Gabor filter bank of 40 filters.
Step 3: Gabor features derived from the Gabor filter magnitude response (GMFRs; similarly, GPFRs from the Gabor filter phase) are computed for all frequencies (u = 4) and orientations (v = 8).
Step 4: The computed GMFRs (similarly GPFRs) are downsampled by a factor $\rho = 64$.
Step 5: The downsampled GMFRs are normalized using an appropriate normalization procedure.
Step 6: The downsampled and normalized GMFRs (similarly GPFRs), in vector form, are concatenated to form the augmented GMFR vector.
Step 7: The augmented feature vectors are projected into a subspace using Kernel Fisher Analysis (KFA).
Fig. 5: An overview of the Magnitude Gabor Descriptor
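Steps 4 to 6 above can be sketched as follows. The text does not specify how the downsampling factor is applied or which normalization is used, so two assumptions are made for this example only: a factor $\rho = 64$ is interpreted as keeping one pixel out of 64, that is, every 8th pixel in each direction, and the normalization is zero mean and unit variance. The function names are hypothetical.

```python
import math


def downsample(response, rho=64):
    """Keep every k-th pixel in both directions, with k = sqrt(rho).
    Interpreting rho as an area factor is an assumption."""
    step = math.isqrt(rho)
    return [row[::step] for row in response[::step]]


def normalize(values):
    """Zero-mean, unit-variance normalization (assumed procedure)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance) or 1.0  # guard against constant input
    return [(v - mean) / std for v in values]


def augmented_vector(responses, rho=64):
    """Downsample, normalize and concatenate a list of response maps."""
    vector = []
    for response in responses:
        small = downsample(response, rho)
        flat = [v for row in small for v in row]
        vector.extend(normalize(flat))
    return vector


# 40 synthetic 128 x 128 magnitude maps stand in for real filter responses.
responses = [[[float((r * c + k) % 7) for c in range(128)]
              for r in range(128)]
             for k in range(40)]
features = augmented_vector(responses)
print(len(features))
```

Under these assumptions each 128 × 128 map is reduced to 16 × 16 = 256 values, so 40 maps give a vector of 10,240 entries, which is why a further projection such as KFA (Step 7) is applied.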
Despite the downsampling procedure, the resulting descriptors still reside in a very high-dimensional space. In this paper, the KFA technique [15] is applied to the augmented Gabor magnitude and augmented Gabor phase face representation vectors to obtain a compact Gabor-magnitude-based representation (GM+KFA) and a compact Gabor-phase-based representation (GP+KFA). Then, we use an SVM classifier [16] with an RBF kernel to classify the GM+KFA and GP+KFA features extracted from face images. Finally, we combine the two at the matching-score level. The accuracy $\delta$ of the proposed method is computed using the following expression: $\delta = (1 - \gamma)\delta_{GM} + \gamma\delta_{GP}$, where $\delta_{GM}$ denotes the accuracy obtained from Gabor magnitude features, $\delta_{GP}$ denotes the accuracy obtained from Gabor phase features and $\gamma \in\, ]0, 1[$ denotes the fusion parameter.
4. Experiments and tests
The performance of the proposed approach is assessed by conducting experiments on the well-known LFW dataset, which contains 13,233 images of 5,749 people downloaded from the Web. This database covers large variations, including different subjects, poses, illumination, occlusion, etc. For evaluation, we use the standard protocol, which defines 3,000 positive pairs and 3,000 negative pairs in total and further splits them into 10 disjoint subsets for cross-validation. Each subset contains 300 positive and 300 negative pairs, portraying different people. We compare the mean accuracy of the proposed approach with that of some state-of-the-art methods and commercial systems. The results are summarized in Table 1.
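The evaluation protocol and the fusion rule $\delta = (1 - \gamma)\delta_{GM} + \gamma\delta_{GP}$ from Section 3.2 can be sketched together in Python. This is a simplified illustration, not the authors' evaluation code: the fusion is applied here to per-pair matching scores, which is one reading of the text; a single fixed decision threshold is used, whereas a full protocol would typically select it on the training folds; and the scores in the example are made up.

```python
from statistics import mean


def fuse(delta_gm, delta_gp, gamma):
    """Weighted fusion (1 - gamma) * delta_GM + gamma * delta_GP,
    with the fusion parameter gamma strictly between 0 and 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return (1.0 - gamma) * delta_gm + gamma * delta_gp


def fold_accuracy(scores, labels, threshold):
    """Fraction of pairs whose thresholded score matches the label
    (1 for same person, 0 for different persons)."""
    correct = sum((s >= threshold) == bool(l)
                  for s, l in zip(scores, labels))
    return correct / len(labels)


def mean_accuracy(folds, threshold):
    """Mean accuracy over the folds; each fold is (scores, labels)."""
    return mean(fold_accuracy(s, l, threshold) for s, l in folds)


# Toy example with two tiny folds (real folds hold 600 pairs each).
gm_scores = [[0.9, 0.2, 0.7, 0.3], [0.8, 0.1]]
gp_scores = [[0.7, 0.4, 0.5, 0.1], [0.6, 0.3]]
labels = [[1, 0, 1, 0], [1, 0]]
gamma = 0.25  # illustrative fusion parameter

folds = [([fuse(a, b, gamma) for a, b in zip(gm, gp)], lab)
         for gm, gp, lab in zip(gm_scores, gp_scores, labels)]
print(mean_accuracy(folds, threshold=0.5))
```

In the protocol described above there would be 10 such folds, each with 300 positive and 300 negative pairs.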
Table 1: Accuracy of different methods on the LFW dataset.
Method                   Metric                      Mean Accuracy
DeepFace [17]            unrestricted, SVM           97.35%
DeepID2 [18]             unrestricted, Joint-Bayes   95.43%
Yi et al. [19]           Cosine                      96.13%
Wang et al. [20]         Cosine                      96.95%
Human [21]               -                           97.53%
Our proposed approach    unrestricted, SVM           97.29%

The results show that our proposed method achieves good results on the LFW dataset, which contains faces with full pose, illumination, and other difficult conditions. It is robust, especially in the presence of large head pose variations. Table 1 shows that our approach performs comparably to other methods and commercial systems.
5. Conclusion
In this paper, we have presented a new face verification approach and evaluated it on the LFW dataset. Experimental results show that the proposed approach outperforms several state-of-the-art methods and commercial systems. The gap between our proposed approach and human performance on the LFW benchmark is less than 1%. In the future, we will apply our approach to video processing, where we believe it will demonstrate competitive performance.
References
1. L. A. Jeni, J. F. Cohn, and T. Kanade. Dense 3D face alignment from 2D video for real-time use. Image and Vision Computing, 2016.
2. T. Cootes, G. Edwards, and C. Taylor. Active appearance models. TPAMI, 23(6):681–685, 2001.
3. H. Chen, M. Gao, and B. Fang. An improved active shape model method for facial landmarking based on relative position feature. International Journal of Wavelets, Multiresolution and Information Processing, 2017.
4. K. Kim, T. Baltrušaitis, A. Zadeh, L.-P. Morency, and G. Medioni. Holistically constrained local model: Going beyond frontal poses for facial landmark detection. In Proc. British Mach. Vision Conf., 2016.
5. H. Ouanan, M. Ouanan, and B. Aksasse. Gabor-Zernike features based face recognition scheme. International Journal of Imaging and Robotics, 16(2):118–131, 2015.
6. P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090–1104, 2000.
7. A. K. Menon. Random projections and applications to dimensionality reduction. PhD thesis, School of Information Technologies, The University of Sydney, Australia, 2007.
8. Q. Cao, Y. Ying, and P. Li. Similarity metric learning for face recognition. In Proc. Int. Conf. Comput. Vision, pp. 2408–2415. IEEE, 2013.
9. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. University of Massachusetts, Amherst, TR 07-49, 2007.
10. D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint formulation. In Proc. ECCV, pp. 566–579, 2012.
11. B. Moghaddam, T. Jebara, and A. Pentland. Bayesian face recognition. Pattern Recognition, 33:1771–1782, 2000.
12. FaceGen Modeller. https://facegen.com/modeller.htm
13. X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Proc. Conf. on Computer Vision and Pattern Recognition (Providence, RI, USA), pp. 2879–2886, 2012.
14. P. Viola and M. Jones. Robust real-time face detection. Int. J. Comput. Vision, 57(2):137–154, 2004.
15. S. Khellat-Kihel, R. Abrishambaf, J. L. Monteiro, and M. Benyettou. Multimodal fusion of the finger vein, fingerprint and the finger-knuckle-print using Kernel Fisher analysis. Applied Soft Computing, 42:439–447, 2016.
16. C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
17. Y. Taigman, M. Yang, M. A. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708, 2014.
18. Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. arXiv preprint arXiv:1412.1265, 2014.
19. D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. arXiv preprint arXiv:1411.7923, 2014.
20. D. Wang, C. Otto, and A. K. Jain. Face search at scale: 80 million gallery. arXiv preprint arXiv:1507.07242, 2015.
21. M. Guillaumin, J. Verbeek, and C. Schmid. Is that you? Metric learning approaches for face identification. In Proceedings of the International Conference on Computer Vision, 2009.