On Detectors and Descriptors based Techniques for Face Recognition


Procedia Computer Science 132 (2018) 908–917

International Conference on Computational Intelligence and Data Science (ICCIDS 2018)

Vinay A, Nishant Aklecha, Meghana, K.N Balasubramanya Murthy, S Natarajan

Center for Pattern Recognition and Machine Intelligence, PES University, Bengaluru, India

Abstract

Of all forms of biometrics, Face Recognition (FR) is among the most prominent. Apart from offering applications for business and law-enforcement purposes, it has opened numerous research avenues in domains such as security, surveillance and social networks. One of the many factors critical to an efficient face recognition system is a suitable combination of feature descriptor and feature detector. A feature detector makes local decisions regarding the presence or absence of image features of a given type. A feature descriptor, on the other hand, simplifies the image by extracting useful information and discarding irrelevant information. Our research examines the quality of various feature descriptor-detector combinations. We do this by carrying out feature matching using various combinations of detectors and descriptors. The experiment incorporates dimensionality reduction on the images using the Hypercomplex Fourier Transform (HFT), and RANSAC for noise reduction. Of the options available, we chose to test LGHD, PCEHD and EHD as feature descriptors; as feature detectors, we chose the Harris-Stephens algorithm, Minimum Eigenvalue and SURF. A series of experiments on the publicly available datasets Faces94 and Grimace yielded an accuracy of 90.67% for the former and 71.3% for the latter for Minimum Eigenvalue paired with LGHD, which is higher than for all other combinations tested. We therefore conclude that the Minimum Eigenvalue detector paired with the LGHD descriptor outperforms the other combinations, making it the best choice among those tested for face recognition.

© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/). Peer-review under responsibility of the scientific committee of the International Conference on Computational Intelligence and Data Science (ICCIDS 2018).

Keywords: Face Recognition; descriptors; detectors; features; HFT; Harris-Stephens; Minimum Eigen; SURF; feature extraction; LGHD; PCEHD; EHD; RANSAC

Corresponding author: [email protected]

ISSN 1877-0509. DOI: 10.1016/j.procs.2018.05.106


1. Introduction

The human face is central to a person's identity and therefore plays an indispensable role in identification. Recognizing individuals by their facial features is an intrinsic mechanism that happens subconsciously in the brain, and replicating this ability in computers has long attracted attention. Face recognition is regarded as one of the most important forms of biometrics and finds application in both law-enforcement and non-law-enforcement scenarios. It stands out among biometric modalities for applications such as surveillance because it can be carried out without the cooperation or conscious presence of the test subject: an efficiently designed face recognition system can identify people in a crowd without their noticing. Face recognition is required above all in security [9], both social and computer security. Despite the prominence of fingerprint scanning in this domain, face recognition is expected to outperform it given effective algorithms. Deep learning algorithms are being employed for fraud detection, reducing the need for passwords. These algorithms work well in controlled conditions (e.g., proper lighting and viewing angle). In uncontrolled conditions, however, they encounter a number of challenges that cause their accuracy to decline sharply. For example, face images may vary in background, expression and rotation, or reflect physical changes such as aging or a change of hairstyle.

Face recognition proceeds through a series of steps, the most important of which are feature detection and description. The extraction of significant data from an image influences the recognition rate of the system.
A set of extracted features can be characterized as significant only if the features belong to classes that are separated by large distances. Popular feature detectors include the Harris-Stephens algorithm [10][11], the Minimum Eigenvalue algorithm [2], FAST [7] and SURF [3]. Feature descriptors are algorithms that describe the rudimentary features of an image, such as shape, texture, color and motion. Prominent descriptors include LGHD [5], PCEHD [4], EHD [4][13] and SURF [3]. The efficiency of a face recognition system depends heavily on the choice of its feature descriptor-detector combination.

Feature extraction can yield a large number of features, not all of which are significant. Some are redundant and consume computation to no benefit. Dimensionality reduction, which concerns the removal of redundant key-points, is therefore key to accelerating face recognition. In our experiment, we adopted the Hypercomplex Fourier Transform (HFT) [6] for dimensionality reduction. Noise (outliers) is removed using RANSAC (Random Sample Consensus) [8], which has shown considerable capability in identifying and discarding mismatched point pairs.

2. Adopted Approach:

To begin with, a descriptor-detector combination is selected. A pair of images is then selected at random from the dataset. Dimensionality reduction is carried out using HFT, followed by feature detection and description using the selected combination. Noise reduction is done using RANSAC. The confusion matrix is constructed, and the corresponding values of accuracy, recall and precision are computed for a number of sample cases. At the end of the process, we have the accuracy, recall and precision values for the nine combinations we decided to test. The descriptor-detector combination with the highest accuracy, accompanied by appreciable recall and precision, is deemed the best descriptor-detector combination for face recognition.


This section highlights the sequence of steps taken:

Fig 1: Sequence of steps taken

2.1. Feature Detectors and Feature Descriptors:

The Harris and Stephens corner detector [10][11] is an improved version of Moravec's corner detector. In Moravec's detector, the image is tested pixel by pixel for a corner by comparing a patch surrounding the pixel in focus with nearby overlapping patches, using the sum of squared differences. Instead of using shifted patches, the Harris-Stephens corner detector directly considers the differential of the corner score with respect to direction. In a flat region, the score does not change in any direction. Along an edge, there is no change along the edge direction. At a corner, however, there is a significant change in all directions. Hence, a pixel is a corner if the gradient magnitudes along both directions are large. Let I be an image. Compute the x and y derivatives of the image, Eq (1), followed by the products of the derivatives, Eq (2).

$$ I_x = \frac{\partial I}{\partial x}, \qquad I_y = \frac{\partial I}{\partial y} \tag{1} $$

$$ I_x^2 = I_x \cdot I_x, \qquad I_y^2 = I_y \cdot I_y, \qquad I_{xy} = I_x \cdot I_y \tag{2} $$

Compute the covariance matrix, C:

$$ C = \sum_{x,y} w(x,y) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} \tag{3} $$


The matrix C arises from the weighted sum of squared differences S(u, v), computed with the window function w(x, y):

$$ S(u,v) = \sum_{x} \sum_{y} w(x,y)\,\bigl[ I(x+u,\, y+v) - I(x,y) \bigr]^2 \tag{4} $$

Eq (4) can be simplified to:

$$ S(u,v) \approx \begin{pmatrix} u & v \end{pmatrix} C \begin{pmatrix} u \\ v \end{pmatrix} \tag{5} $$
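The step from Eq (4) to Eq (5) is not spelled out in the text. A brief sketch, assuming small shifts (u, v) and a first-order Taylor expansion of I, with C as defined in Eq (3), would be:

```latex
\begin{aligned}
I(x+u,\, y+v) &\approx I(x,y) + u\,I_x + v\,I_y \\
S(u,v) &\approx \sum_{x,y} w(x,y)\,\bigl(u\,I_x + v\,I_y\bigr)^2 \\
       &= \sum_{x,y} w(x,y)\,\bigl(u^2 I_x^2 + 2uv\,I_x I_y + v^2 I_y^2\bigr) \\
       &= \begin{pmatrix} u & v \end{pmatrix} C \begin{pmatrix} u \\ v \end{pmatrix}
\end{aligned}
```

The quadratic form makes clear why the eigenvalues of C describe how strongly S changes along the two principal directions.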

From Eq (5) we obtain C, the second-moment matrix. Compute its eigenvalues λ1 and λ2: if an eigenvalue tends to 0, the pixel cannot be considered a corner, so both eigenvalues need to be large. The response function R is then computed:

$$ R = \det(C) - k\,\bigl(\operatorname{trace}(C)\bigr)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2 \tag{6} $$

If this value is greater than the chosen threshold value, the pixel is characterized as a corner:

Table 1. Conclusions based on R value for pixel (x, y)

R               Conclusion
|R| is small    no features of interest
R < 0           edge
R > 0           corner
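As an illustration of Eq (1) to Eq (6) and Table 1, the following is a minimal pure-Python sketch, not the implementation used in the experiments. It assumes central-difference gradients, a uniform 3x3 window for w(x, y), and k = 0.04 (a commonly used value; the paper does not state k). The function name `harris_response` and the synthetic image are hypothetical.

```python
def harris_response(image, x, y, k=0.04, radius=1):
    """Harris response R at pixel (x, y), using a uniform (box) window."""
    height, width = len(image), len(image[0])

    def grad(px, py):
        # Central differences; gradients are taken as zero on the image border.
        if px <= 0 or py <= 0 or px >= width - 1 or py >= height - 1:
            return 0.0, 0.0
        ix = (image[py][px + 1] - image[py][px - 1]) / 2.0
        iy = (image[py + 1][px] - image[py - 1][px]) / 2.0
        return ix, iy

    # Accumulate the entries of the matrix C from Eq (3) over the window.
    sxx = syy = sxy = 0.0
    for py in range(y - radius, y + radius + 1):
        for px in range(x - radius, x + radius + 1):
            if 0 <= px < width and 0 <= py < height:
                ix, iy = grad(px, py)
                sxx += ix * ix  # I_x^2
                syy += iy * iy  # I_y^2
                sxy += ix * iy  # I_x I_y

    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace  # Eq (6)


# Synthetic 9x9 image: a bright square occupying the lower-right quadrant.
image = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(9)] for y in range(9)]

r_corner = harris_response(image, 4, 4)  # corner of the square
r_edge = harris_response(image, 4, 6)    # on the vertical edge of the square
r_flat = harris_response(image, 1, 1)    # uniform background
print(r_corner, r_edge, r_flat)
```

On this synthetic image the signs of R follow Table 1: positive at the corner, negative on the edge, and zero in the flat region.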

The Minimum Eigenvalue algorithm [2] was developed by Jianbo Shi and Carlo Tomasi and is hence also called the Shi-Tomasi corner detector. This method of feature detection stands out because it uses feature dissimilarity to monitor the quality of image features: the RMS residue between the first and the current frame quantifies the change in appearance of a feature. It emphasizes that a feature is a good one if it makes the tracker work best. Recall the Harris scoring function given in Eq (6):

$$ R = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2 $$

Instead of the above equation, the min-eigen corner detector computes R directly as the minimum of the two eigenvalues since, under certain assumptions, the resulting corners are more stable for tracking.

$$ R = \min(\lambda_1, \lambda_2) \tag{7} $$
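To make the difference between Eq (6) and Eq (7) concrete, here is a small sketch that scores the same 2x2 second-moment matrix both ways. It assumes the matrix is given by its entries [[a, b], [b, c]] and uses k = 0.04 as an illustrative value; the function names are hypothetical.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + spread, mean - spread


def min_eigen_score(a, b, c):
    """Shi-Tomasi score, Eq (7): the smaller of the two eigenvalues."""
    lam1, lam2 = eigenvalues_2x2(a, b, c)
    return min(lam1, lam2)


def harris_score(a, b, c, k=0.04):
    """Harris score, Eq (6), written in terms of the eigenvalues."""
    lam1, lam2 = eigenvalues_2x2(a, b, c)
    return lam1 * lam2 - k * (lam1 + lam2) ** 2


# Corner-like matrix: both eigenvalues are clearly non-zero (3 and 1).
corner_min = min_eigen_score(2.0, 1.0, 2.0)
corner_harris = harris_score(2.0, 1.0, 2.0)

# Edge-like matrix: one eigenvalue is zero (4 and 0).
edge_min = min_eigen_score(4.0, 0.0, 0.0)
edge_harris = harris_score(4.0, 0.0, 0.0)
print(corner_min, corner_harris, edge_min, edge_harris)
```

The minimum-eigenvalue score needs no tuning constant k: a point is accepted only when the smaller eigenvalue exceeds a threshold, which directly encodes the requirement that both eigenvalues be large.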

Speeded Up Robust Features (SURF) [3] is a prominent detector and descriptor that is scale- and rotation-invariant and offers a good trade-off between robustness to common deformations and feature complexity. For accuracy, the detector relies on the Hessian matrix: 'blob-like' structures are detected at locations where the determinant of the Hessian is maximal. Let x = (x, y) be a point in an image I; then the Hessian matrix in x at scale σ is

$$ H(\mathbf{x}, \sigma) = \begin{pmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{pmatrix} \tag{8} $$

where $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the Gaussian second-order derivative $\frac{\partial^2}{\partial x^2} g(\sigma)$ with the image I at point x, and similarly for $L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$.

What stands out about this detector is its speed: it offers nearly real-time computation without dedicated optimizations or any compromise on performance.

The Edge Histogram Descriptor (EHD) [4][13] is a prominent method for shape detection. The descriptor draws, from the edge map, an N×N pixel region surrounding the point of interest. This region is divided into 4×4 (16) sub-regions, and a local edge histogram is computed for each sub-region. Each histogram represents the relative frequency of occurrence of 5 types of edges (horizontal, vertical, 45° diagonal, 135° diagonal and isotropic) in that local area. The resulting histogram vector is then normalized and contains 80 bins (4×4×5) that represent the spatial distribution of the edges in the region.

The Log-Gabor Histogram Descriptor (LGHD) [5] is a robust descriptor known for handling image pairs that come from a variety of modalities and spectra, with intensities that do not always vary linearly. It describes local patches in a way analogous to EHD, the difference being that it uses multi-oriented and multi-scale Log-Gabor filters instead of multi-oriented Sobel filters. It is therefore applicable to matching even in the presence of non-linear intensity variations, for example between multi-spectral and multi-modal images.

The Phase Congruency and Edge Histogram Descriptor (PCEHD) [4] combines a frequency-domain analysis, phase congruency, with the spatial distribution of contours in a window neighboring the extracted points. A set of stable features is extracted using Log-Gabor filters at different orientations and frequencies. Feature descriptors are then computed from the coefficients of the Log-Gabor filters and histograms of edge orientations. Feature matching is based on the cosine similarity function.

2.2. RANSAC

We compare the feature descriptor-detector combinations after applying RANSAC to each of them. RANSAC [8] is an iterative model-fitting technique applied to an observed data set that contains outliers, in order to estimate the parameters of a mathematical model. It is a probabilistic algorithm, that is, the result obtained is reasonable only with a certain probability. We begin by setting a threshold and then remove the points that fall beyond it (outliers). The key idea behind this technique is not that there are more inliers than outliers, but that the outliers are different from a chosen set. RANSAC is seen to work remarkably well even when more than half of the given data points are outliers. The greater the number of iterations, the higher the probability of a reasonable result. Consider a distribution X for a given mathematical model. In order to obtain the required probability value α, an appropriate threshold t is selected. The threshold can be calculated by assuming that the measurement error follows a Gaussian distribution with mean 0 and variance σ². The squared distance d² of each point from the assumed model then follows a chi-square (χ²) distribution with m degrees of freedom.

$$ F_m(k^2) = \int_0^{k^2} \chi_m^2(\xi)\, d\xi \tag{9} $$

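$$ t^2 = F_m^{-1}(\alpha)\,\sigma^2 \tag{10} $$

The value of the threshold t is determined by the set value of α.

Inliers: d² < t²
Outliers: d² ≥ t²

The number of iterations I follows from:

$$ 1 - p = (1 - u^m)^I \tag{11} $$

$$ I = \frac{\log(1 - p)}{\log\bigl(1 - (1 - v)^m\bigr)} \tag{12} $$

In Eq (11) and Eq (12), p is the desired probability of drawing at least one outlier-free sample, u is the probability that a selected point is an inlier, v = 1 − u is the probability that it is an outlier, and m here is the number of points drawn per sample.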
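The procedure above can be sketched in a few lines of Python. This is a simplified illustration on a 2D line-fitting problem, not the matching pipeline used in the experiments: it assumes a line model y = a·x + b, uses the squared vertical residual as d², and fixes the random seed for repeatability. The names `required_iterations` and `ransac_line`, and the sample points, are hypothetical.

```python
import math
import random


def required_iterations(p, v, m):
    """Eq (12): iterations needed so that, with probability p, at least one
    sample of m points is outlier-free when the outlier ratio is v."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - (1.0 - v) ** m))


def ransac_line(points, t, iterations, seed=0):
    """Fit y = a*x + b. A point is an inlier if its squared vertical
    residual d^2 is below t^2 (a simplification of the distance test)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair: cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if (y - (a * x + b)) ** 2 < t ** 2]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1 plus five gross outliers (illustrative data).
inlier_points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outlier_points = [(1.0, 20.0), (3.0, -7.0), (5.0, 30.0), (7.0, -12.0), (9.0, 40.0)]
points = inlier_points + outlier_points

n_iter = required_iterations(p=0.99, v=len(outlier_points) / len(points), m=2)
model, inliers = ransac_line(points, t=0.5, iterations=100)
print(n_iter, model, len(inliers))
```

Eq (12) gives only the minimum number of iterations for the chosen p; the sketch runs more iterations than that to keep the toy example robust to the random draws.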

The values of I and t determine how effective the algorithm is on the data set. Further description of RANSAC is provided in [8].

2.3. Hypercomplex Fourier Transform:

Fig 2: (a) Feature matching with HFT enabled. (b) Feature matching without HFT.

HFT [6][12] is a saliency detection technique that describes the spectral content of color images. Hypercomplex numbers, specifically quaternions [12], are used to define a Fourier transform. A quaternion comprises a real part and an imaginary part, the latter having three components and hence also being referred to as the vector part. The quaternion Fourier transform, which represents color pixels as geometrical vector quantities, is well suited to describing the spectral content of color images. It enables various forms of image filtering in the spectral domain. Given a hypercomplex matrix consisting of quaternions as shown in Eq (13), the discrete version of the HFT [6][12] can be computed as in Eq (14), where μ is a unit pure quaternion with μ² = −1; μ represents the direction of the vector given by the imaginary part.

$$ f(n,m) = a + b\,i + c\,j + d\,k \tag{13} $$

$$ F_H[u,v] = \frac{1}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} e^{-\mu 2\pi \left( \frac{mv}{M} + \frac{nu}{N} \right)} f(n,m) \tag{14} $$
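A direct, unoptimized evaluation of Eq (14) can be sketched as follows. The sketch assumes quaternions stored as 4-tuples (a, b, c, d), the exponential placed to the left of f(n, m), the array indexed as f[m][n], and μ = (i + j + k)/√3 as an illustrative choice of unit pure quaternion; none of these choices is stated in the paper. The function names are hypothetical.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qexp_neg_mu(mu, theta):
    """exp(-mu*theta) = cos(theta) - mu*sin(theta) for a unit pure quaternion mu."""
    s = math.sin(theta)
    return (math.cos(theta), -mu[1] * s, -mu[2] * s, -mu[3] * s)


def hft(f, mu):
    """Discrete quaternion Fourier transform of f (indexed f[m][n]); returns F[v][u]."""
    M, N = len(f), len(f[0])
    scale = 1.0 / math.sqrt(M * N)
    F = [[None] * N for _ in range(M)]
    for v in range(M):
        for u in range(N):
            total = (0.0, 0.0, 0.0, 0.0)
            for m in range(M):
                for n in range(N):
                    theta = 2.0 * math.pi * (m * v / M + n * u / N)
                    term = qmul(qexp_neg_mu(mu, theta), f[m][n])
                    total = tuple(t + x for t, x in zip(total, term))
            F[v][u] = tuple(scale * t for t in total)
    return F


# Assumed transform axis: mu = (i + j + k) / sqrt(3), so that mu * mu = -1.
r = 1.0 / math.sqrt(3.0)
mu = (0.0, r, r, r)

# A constant 2x2 color image, each pixel a pure quaternion (0, R, G, B).
pixel = (0.0, 0.2, 0.4, 0.6)
image = [[pixel, pixel], [pixel, pixel]]
spectrum = hft(image, mu)
print(spectrum[0][0])
```

For a constant image all the energy sits in the zero-frequency coefficient, which is a convenient sanity check; practical implementations would typically decompose the transform into standard complex FFTs rather than use the quadruple loop shown here.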


3. Experimental Results:

3.1. Faces 94 Dataset:

Fig 3: Sample set from the dataset - Faces 94

The Faces94 dataset consists of images of 180x200 pixels resolution in portrait format. The images show variation in expression and minor changes in head turn, slant and tilt. To ensure uniformity, the subjects were asked to maintain a frontal posture, and all pictures were taken against a green background. Since the images were taken in a single session, there is no individual hairstyle variation.

3.2. Grimace Dataset:

Fig 4: Sample set from the dataset – Grimace

The Grimace dataset consists of pictures of 18 people (male and female), each with a sequence of 20 images (180x200 pixels). It shows considerable variation in head turn, tilt and slant, and very little variation in lighting. Major changes in hairstyle cannot be observed since the images were taken in a single session. The dataset also shows appreciable variation in expression.

3.3. Result:

The formulae used to find accuracy, precision and recall are as follows:

$$ \text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad \text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn} \tag{15} $$

tp - True Positive, fp - False Positive, tn - True Negative, fn - False Negative
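Eq (15) translates directly into code. The sketch below assumes the four confusion-matrix counts are already available and that the denominators are non-zero; the function name and the counts in the usage example are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall from confusion-matrix counts, Eq (15).

    Assumes tp + fp > 0 and tp + fn > 0 so that no division by zero occurs.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Illustrative counts only.
accuracy, precision, recall = classification_metrics(tp=3, tn=4, fp=1, fn=2)
print(accuracy, precision, recall)
```

Because precision ignores false negatives and recall ignores false positives, a matcher can show high precision together with low recall, which is why all three values are reported.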

Table 2. Percentage accuracy values for detector-descriptor combinations for FACES-94 dataset

Detector                    LGHD     PCEHD    EHD
Harris-Stephens Features    66.67    66.67    66.67
SURF                        70       70.66    72.66
Min-Eigen Features          90.67    80       81.33

Table 3. Percentage accuracy values for detector-descriptor combinations for GRIMACE dataset

Detector                    LGHD     PCEHD    EHD
Harris-Stephens Features    62.67    64.66    64.66
SURF                        63.33    62.66    63.33
Min-Eigen Features          71.3     65.33    70.66

[Bar charts (a) and (b): x-axis Feature Descriptors (LGHD, PCEHD, EHD); y-axis Accuracy (%); series Min-Eigen Features, SURF, Harris-Stephens Features]

Fig 5: Bar Graph Illustrating Accuracy for Dataset – a) Faces 94 b) Grimace

The results of our experiment are given above. In the tables, each cell gives the accuracy of the respective combination. The accuracy test was carried out a number of times, each time with a different set of images from the dataset. The accuracy values for each of the nine descriptor-detector combinations were noted, and finally the mean of the values was taken. Table 4 lists the precision, recall and accuracy values for the best feature descriptor-detector combination (LGHD with Minimum Eigen). Fig 5 gives a visual representation of the accuracy for all nine descriptor-detector combinations, where the x-axis represents the descriptor and the y-axis represents accuracy. Throughout the trials, the Min-Eigen with LGHD combination stands out, giving an appreciably high accuracy each time, which indicates that it is the best of the combinations tested. This can be confirmed from the accuracy values in Table 2 (for Faces 94), which are shown pictorially in Fig 5(a). In relative terms, Min-Eigen with LGHD shows an increase of 13.3% and 11.5% over Min-Eigen with PCEHD and EHD respectively; increases of 29.52%, 28.32% and 24.8% over SURF paired with LGHD, PCEHD and EHD respectively; and a 36% increase over Harris features paired with each of LGHD, PCEHD and EHD.
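The percentage increases quoted for Faces 94 can be reproduced from Table 2 if they are read as relative increases, (best − other) / other × 100. The sketch below makes that assumption explicit; the dictionary layout and function name are illustrative.

```python
# Accuracy values (%) from Table 2 (Faces 94), keyed by (detector, descriptor).
faces94 = {
    ("Harris-Stephens", "LGHD"): 66.67,
    ("Harris-Stephens", "PCEHD"): 66.67,
    ("Harris-Stephens", "EHD"): 66.67,
    ("SURF", "LGHD"): 70.0,
    ("SURF", "PCEHD"): 70.66,
    ("SURF", "EHD"): 72.66,
    ("Min-Eigen", "LGHD"): 90.67,
    ("Min-Eigen", "PCEHD"): 80.0,
    ("Min-Eigen", "EHD"): 81.33,
}


def relative_increase(best, other):
    """Relative increase of `best` over `other`, in percent."""
    return (best - other) / other * 100.0


best_combo = max(faces94, key=faces94.get)
best_accuracy = faces94[best_combo]

gains = {
    combo: relative_increase(best_accuracy, acc)
    for combo, acc in faces94.items()
    if combo != best_combo
}

print("best:", best_combo, best_accuracy)
for combo, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(combo, round(gain, 2))
```

The computed values agree with the figures in the text to within rounding.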


From Table 4, we see that the recall for Faces 94 is an acceptable 67.5%. Grimace, however, gives a low recall. Since Grimace is a tougher dataset with high variation in expression, we can say that, despite its high accuracy, this combination on its own cannot reliably recognize the same face across large differences in expression.

Table 4. Precision, Recall and Accuracy values (%) for LGHD - Minimum Eigen combination for the used datasets

             Faces 94    Grimace
Precision    97.22       86.67
Recall       67.5        21.36
Accuracy     90.67       71.3

Fig 6: Feature matching with a pair of images of (a) different people (b) same person (before applying HFT)

Fig 7: Matching features for (a) True Negative (b) True Positive (c) False Negative (d) False Positive

4. Conclusion

In this paper, we determined which feature descriptor-detector combination works best for face recognition. We evaluated nine combinations formed from the feature detectors Harris-Stephens corner detector, Shi-Tomasi (Minimum Eigenvalue) corner detector and SURF, and the feature descriptors LGHD, EHD and PCEHD. These combinations were tested on images from the publicly available datasets Faces 94 and Grimace. The degree of efficiency can be approximated by computing accuracy, precision and recall for the combinations, and each of these parameters is used to compare the nine combinations. After extensive testing, we find that Min-Eigen outperforms the other detectors when paired with each of the three descriptors. We further observe, and hence conclude, that Minimum Eigen features paired with LGHD outperform all other feature descriptor-detector combinations tested. This combination has the highest accuracy, along with acceptable values for precision and recall, before and after applying HFT for dimensionality reduction, making it the most suitable of the nine selected combinations for face recognition.

References

[1] C. Harris and M. Stephens (1988). "A combined corner and edge detector". Proceedings of the 4th Alvey Vision Conference, pp. 147–151.
[2] J. Shi and C. Tomasi (June 1994). "Good Features to Track". 9th IEEE Conference on Computer Vision and Pattern Recognition. Springer.
[3] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. "Speeded Up Robust Features". ETH Zurich, Katholieke Universiteit Leuven.
[4] T. Mouats and N. Aouf (2013). "Multimodal stereo correspondence based on phase congruency and edge histogram descriptor". International Conference on Information Fusion, pp. 1981-1987.
[5] C.A. Aguilera, A. D. Sappa, and R. Toledo (2015). "LGHD: A feature descriptor for matching across non-linear intensity variations". Image Processing (ICIP), 2015 IEEE International Conference, pp. 178-181.
[6] Jian Li, Martin D. Levine, Xiangjing An and Hangen He (2013). "Visual Saliency Based on Scale-Space Analysis in the Frequency Domain". Pattern Analysis and Machine Intelligence, IEEE Transactions on 35.4: 996-1010.
[7] E. Rosten and T. Drummond (October 2005). "Fusing Points and Lines for High Performance Tracking". Proceedings of the IEEE International Conference on Computer Vision, Vol. 2, pp. 1508–1511.
[8] Martin A. Fischler and Robert C. Bolles (1981). "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography". Communications of the ACM 24, no. 6: 381-395.
[9] Andrew W. Senior and Ruud M. Bolle (July 2002). "Face recognition and its applications". Chapter 4 in Biometric Solutions for Authentication in an E-World, D. Zhang ed. Kluwer Academic Publishers. ISBN 1402071426.
[10] Nilanjan Dey, Pradipti Nandi, Nilanjana Barman, Debolina Das, Subhabrata Chakraborty (2012). "Comparative Study between Moravec and Harris Corner Detection of Noisy Images Using Adaptive Wavelet Thresholding Technique". International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 1, Jan-Feb 2012, pp. 599-606.
[11] Parvathy Ram, Dr. S. Padmavathi (2016). "Analysis of Harris Corner Detection for Color Images". International conference on Signal Processing, Communication, Power and Embedded System (SCOPES)-2016.
[12] Todd A. Ell, Stephen J. Sangwine (January 2007). "Hypercomplex Fourier Transforms of Color Images". IEEE Transactions on Image Processing, Vol. 16, No. 1.
[13] Neetesh Prajapati, Amit Kumar Nandanwar, G.S. Prajapati (2016). "Edge Histogram Descriptor, Geometric Moment and Sobel Edge Detector Combined Features Based Object Recognition and Retrieval System". (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 7 (1), 407-412.