3D human face description: landmarks measures and geometrical features

Enrico Vezzetti, Federica Marcolin

Dipartimento di Ingegneria Gestionale e della Produzione, Politecnico di Torino, Italy

Article history: Received 12 April 2011; Revised 7 December 2011; Accepted 2 February 2012

Keywords: 3D models; 3D scanners; Face morphology; Soft tissue landmarks

Abstract

Distance measures and geometrical features are widely used to describe faces. Generally, they are extracted punctually from landmarks, namely anthropometric reference points. The aims are various: face recognition, facial expression recognition, face detection, and the study of changes in facial morphology due to growth or dysmorphologies. Most of the time, landmarks are extracted with the help of an algorithm or manually located on the faces; measures are then computed, or geometrical features extracted, to fulfil the scope of the study. This paper is intended as a survey collecting and explaining all these features, in order to provide a structured database of the potential parameters and their characteristics. Firstly, facial soft-tissue landmarks are defined and contextualized; then the various measures are introduced and some results are given; lastly, the most important measures are compared to identify the best one for face recognition applications.

1. Introduction

Face study has been carried out over recent decades for many applications: maxillofacial surgery, crime investigation, authentication, historical research, telecommunications, and even games. Face recognition is surely the largest branch of this diversified field, embracing subfields such as citizen identification, recognition of suspects, and corporate uses in access control and online banking. Before a new trend emerged of measuring and evaluating 3D facial models, three-dimensional facial data were for decades obtained mostly by direct anthropometric measurements. Anatomical landmarks have been used for over a century by anthropometrists interested in quantifying cranial variations. A great body of work in craniofacial anthropometry is that of Leslie Farkas [35,37], who created a database of anthropometric norms by measuring and comparing more than 100 dimensions (linear, angular and surface contours) and proportions in hundreds of people over a period of many years. These measurements include 47 landmark points to describe the face [14]. Nowadays the information in which researchers are interested is more complete and dynamic. The interest is in using facial landmarks as fiducial points of the subjects and extracting geometrical features from them, in order to retain information about the shape of the examined face. Their uses are various and may depend on the research area. The attention to facial landmarks is due to the fact that they are points which all faces share and which have a particular biological meaning. Hard-tissue landmarks lie on the skeleton and may be identified only through lateral cephalometric radiographs; soft-tissue landmarks lie on the skin and can be identified on images or on the 3D point clouds generated by scanning. This study only deals with soft-tissue landmarks; the most famous ones are shown in Fig. 1. Actually, the set of facial landmarks is much larger than this: there are approximately 60 identifiable soft-tissue points on the human face, but they may change depending on the application they are used for. Previous work on landmark extraction is reported in Section 2, where the applications of landmarking are exposed and contextualized. Section 3 presents the measures and features that may connect and involve facial landmarks. After the definition of the measures and an analysis of previous work, a cross-tab table for every measure containing applications, methods, data dimension, and results obtained by other authors is presented; where no particular comment is reported, rates in these tables correspond to recognition rates (RR). In Section 4 results are commented and explained, and a comparison between the most important measures, namely Euclidean and geodesic distances, is reported from the literature. Section 5 concludes the work.

2. Previous work

One of the most important applications that deals with facial landmarks is face recognition, whose main applications include: citizenship identification at borders and in passports, I.D. documents and visas; criminal identification in database screening, surveillance, alerts, mob control and anti-terrorism; corporate use in access control and time attendance in luxury buildings, sensitive offices, airports and pharmaceutical factories; laptop, desktop, web and airport/sensitive console log-on and file encryption; online banking; gaming in casinos and watch-lists; hospitality industries such as hotel and resort CRMs; important sites like power plants and military installations. The purposes are various, but belong to two big branches: face verification, or authentication, to guarantee secure access, and face identification, or recognition of suspects, dangerous individuals and public enemies by the police, the FBI and other safety organizations [51].


Fig. 1. Anthropometric soft-tissue landmarks: g-glabella, n-nasion, en-endocanthion, ex-exocanthion, or-orbital, prn-pronasal, sn-subnasal, al-alae, ch-cheilion, pg-pogonion, gn-gnathion, go-gonion, me-menton [13].

2.1. Face recognition

Much research has been carried out on landmarking for face recognition. In their various publications, Rohr et al. addressed the problem of choosing an optimal neighborhood for a region-of-interest (ROI) around face landmarks, proposing multi-step differential processes for subvoxel localization of three-dimensional landmarks [40,41]. They used deformable models to introduce a 3D landmarking approach: to model the surface in proximity of a landmark, they used an approach based on quadric surfaces combined with global deformations [2,42]. Then they fitted 3D parametric intensity models to 3D images, proposing an analytic intensity model based on the Gaussian error function, applied to 3D rigid transformations and deformations, to efficiently model anatomical surfaces [108]. Finally, they introduced a new multi-step approach to improve three-dimensional anatomical landmark detection in tomographic images [43]. Romero et al. compared several approaches based on graph matching and cascade filtering for localizing landmarks in 3D faces. In the first method, they applied the structural graph matching algorithm relaxation-by-elimination, using a distance-to-local-plane node property and a Euclidean-distance arc property. After unlikely candidates have been eliminated by the graph matching process, the most likely triplet is selected through the calculation of the minimum Mahalanobis distance over a six-dimensional space given by three node variables and three arc variables. The second method localizes the nose tip by using state-of-the-art pose-invariant feature descriptors embedded into a cascade filter; after that, the inner eye corners are localized by applying local graph matching [82,83]. Then they introduced and evaluated their pose-invariant point-pair descriptors, used to encode three-dimensional shape between pairs of 3D points. Two variants of descriptor are described: the point-pair Spin Image, related to the classical Spin Image, and one derived from an implicit radial basis function (RBF) model of the face surface. These descriptors encode edges in graph-based representations of 3D shapes.


The authors showed how the descriptors are used to identify the nose tip and the eye corners of a human face simultaneously in six systems for landmark localization [82,83]. Ruiz et al. [84] presented an algorithm for automatic localization of landmarks on 3D faces based on an Active Shape Model (ASM). The ASM is used as a statistical joint location model for configurations of facial features and is adapted to individual faces via a guided search whereby landmark-specific Shape Index models are matched to local surface patches. Similarly, Sang-Jun et al. [88] extracted the positions of the eyes, the nose and the mouth by applying Active Shape Models. Salah et al. dealt with landmark localization by proposing a coarse-to-fine method in which different Gabor filter channels are employed to model landmark features for the localization [85,86]. Since 3D information can improve landmarking either by removing the illumination effects from the 2D image or by supplying robust depth- and curvature-based features, they investigated both sides of this topic and showed that directly using three-dimensional features is more promising than illumination correction with the help of 3D [85,86]. They also inspected shortcomings of the landmark detection approaches known in the literature, for both 2D and 3D face recognition, and compared several methods for performing automatic landmark localization on 'almost frontal' faces at different scales. They employed two new methods for analyzing facial features at coarse and fine scales successively: in the first method, a mixture of factor analyzers is used to learn Gabor filter outputs on a coarse scale; the second method is a template matching of Discrete Cosine Transform (DCT) features [87]. Later, they assessed a fully automatic 3D facial landmark localization algorithm based on an accurate statistical modelling of the facial traits [23]. Finally, several 3D face recognizers were evaluated on the Bosphorus database, which was gathered for studies on facial expression. The authors provided identification results for three 3D face recognition algorithms, i.e., a generic face template-based ICP approach, a one-to-all ICP approach, and depth image-based Principal Component Analysis (PCA); in addition, they incorporated two-dimensional texture classifiers in a fusion setting [3]. D'Hose et al. [22] presented a three-dimensional landmarking method based on Gabor wavelets: the curvature of the 3D faces was extracted and then used to perform a coarse landmark detection. Takács et al. [98] described a new dynamic and multi-resolution scheme to localize candidate landmark neighborhoods for facial feature recognition purposes. The suggested low-level data-driven attention model employs a nonlinear sampling lattice of oriented Gaussian filters, and uses small oscillatory movements to extract local image characteristics. Beumer et al. [6,7] studied the effects of the number of landmarks on the mean localization error and on the recognition performance. Two landmark localization methods were explored and compared for that purpose: the Most Likely-Landmark Locator (MLLL), based on maximizing the likelihood ratio, and Viola-Jones detection; both use the locations of facial features (eyes, nose, mouth) as landmarks. Further, they introduced a landmark-correction method (BILBO) that relies on projection into a subspace.
2.2. Face detection

A connected but quite different field is face detection, which consists of identifying one or more faces in an image in which many other objects may be present. Most of the literature on this subject investigates face detection in two-dimensional (2D) images. Colombo et al. [18] proposed a 3D face detection method based on the fusion of a feature-based approach with a holistic one. Faces are identified by three candidate facial points, namely the nose tip and the eyes, which are detected through the study of surface curvature; each triplet is processed by a PCA-based classifier trained to discriminate between faces and non-faces. Nair et al. [75] presented a framework based on the fitting of a face model for detecting faces, localizing landmarks and achieving fine registration of face meshes.


Face detection is performed by classifying the transformations between model points and candidate vertices based on the upper bound of the deviation of the parameters from the mean model; landmark localization is performed on the segmented face by finding the transformation that minimizes the deviation of the model from the mean shape. Jesorsky et al. [52] presented an approach to face detection that is robust to changes in illumination and background: it relies on edge-based shape comparison and works on greyscale images. Takács et al. [97] described a general approach for the detection of faces and landmarks based on a biologically-inspired image representation. Employing an enhanced Self-Organizing Feature Map (SOFM) approach, the optimal set of face, nose, eye and mouth models is found by using cross-validation and corrective training. Yow et al. [111] used active contour models to detect the face boundary and verify facial candidate points, improving a feature-based face detection approach. Rodrigues et al. [81] proved that multi-scale keypoint representations, i.e. retinotopic keypoint maps tuned to different spatial frequencies, can be employed for object detection. In particular, they showed that combinations of hierarchically-structured saliency maps over scales, in conjunction with spatial symmetries, can lead to face detection through grouping operators that deal with fiducial points at the eyes, nose and mouth.

2.3. Facial expression recognition

A similar application is facial expression recognition, a branch of recognition which deals with identifying different facial expressions. "Sometimes the facial expression analysis has been confused with emotion analysis in the computer vision domain. For emotion analysis, higher level knowledge is required. For example, although facial expressions can convey emotion, they can also express intention, cognitive processes, physical effort, or other intra- or interpersonal meanings. Interpretation is aided by context, body gesture, voice, individual differences, and cultural factors as well as by facial configuration and timing. Computer facial expression analysis systems need to analyze the facial actions regardless of context, culture, gender, and so on" [101, page 247]. Several researchers have worked on this topic. In their various papers, Tang et al. [99,100] used the properties of the segments connecting certain facial points to perform person and gender facial expression recognition. An automatic feature selection method, based on maximizing the average relative entropy of marginalized class-conditional feature distributions, was proposed and applied to a complete pool of candidate features composed of normalized Euclidean distances between 83 facial feature points in 3D space. Soyel et al. [93–95] described a pose-invariant three-dimensional facial expression recognition method using distance vectors, retrieved from 3D distributions of facial feature points, to classify universal facial expressions. Their works are based on the theories of Paul Ekman, a psychologist who was a pioneer in the study of emotions and their relation to facial expressions. His theory is that the expressions associated with some emotions are basic, or biologically universal to all humans; he devised a list of six basic emotions from cross-cultural research: anger, disgust, fear, happiness, sadness and surprise [30,31].
For his precious and unique work, Ekman has been considered one of the 100 most eminent psychologists of the twentieth century. Nowadays, many authors involved in studies of facial expressions use his theory to concentrate their research on the expressions referring to the emotions considered basic. Cohen and Sebe also dealt with facial expression recognition. Cohen et al. [15–17] built a system for classifying facial expressions from video input using three different approaches: the first is based on Bayesian network classifiers, with a focus on changes in distribution assumptions and feature dependency structures; the second relies on temporal cues; the third is a new architecture of hidden Markov models (HMMs), which performs both segmentation and recognition.

Then, Sebe et al. [89] created a facial expression database in which the subjects show natural facial expressions based upon their emotional state. They then evaluated several promising machine learning algorithms for emotion detection, including Bayesian networks, Support Vector Machines (SVMs), and decision trees. In their works, the face tracker uses a model-based approach in which an explicit 3D wire-frame model of the face is constructed; in the first frame of the image sequence, landmark facial features such as the eye corners and mouth corners are selected interactively. Wang et al. [105] extracted facial expressions by using an expression classifier learned by boosting Haar-feature-based Look-Up-Table weak classifiers. The system consists of three modules: face detection, facial feature landmark extraction, and facial expression recognition. It can automatically recognize seven expressions in real time, namely Ekman's six plus the neutral one; the authors were thus able to integrate the expression recognition module with an automatic face detection and landmarking system. Hong et al. [49] used personalized galleries to elaborate an on-line facial expression recognition system, built on the framework of the PersonSpotter system for tracking and detecting faces in a live video sequence. Firstly, the person concerned is identified using the recognition method of Elastic Graph Matching; then, the "personalized" gallery of this person, namely the collection of images of the same person with different expressions, is used to recognize the facial expression. Node weighting and weighted voting, in addition to Elastic Graph Matching, are applied to help this identification, and the same system is also employed to find landmarks. Zheng et al. [114] addressed the problem of facial expression recognition by applying kernel canonical correlation analysis (KCCA): 34 landmark points were hand-located on facial images and then converted into a labelled graph (LG) vector using a Gabor-wavelet-transformation-based method to represent the face features. Michel et al. [68] dealt with facial expression in live video, presenting a real-time approach to emotion recognition. An automatic facial feature tracker was applied to perform face detection and feature extraction, and the displacements of facial features in the video sequences were employed as input to a Support Vector Machine (SVM) classifier; the authors evaluated their method in terms of recognition accuracy for a variety of interaction and classification scenarios. Wang et al. [102] addressed the issue of facial expression representation and recognition from a 3D shape point of view: they proposed an approach for extracting primitive 3D facial expression features and then classified the prototypic facial expressions by applying the feature distribution. Abboud et al. [1] used an active appearance model (computed with three PCAs or only one) for automatic facial expression recognition and synthesis, considering the six universal emotional categories of Ekman. They then proposed a new method that allows, from a single photo, the facial expression to be cancelled and a new expression to be artificially added to the same face; in this framework, two facial expression modelling approaches were proposed. They manually labelled each training face with 53 landmarks, in order to get a "ground truth" training set of shape vectors.
This issue may also be considered as belonging to the face correction field. Wang et al. [103] presented a topographic modelling approach to recognize and analyze facial expressions from single static images. They developed this approach on the basis of the Topographic Context (TC), a new facial expression descriptor for representing and recognizing facial expressions: the topographic analysis treats the image as a 3D surface and labels each pixel by its terrain features. A face model with 64 control points was defined to locate the expressive regions, and the Active Appearance Model (AAM) was used to find facial features and shapes. Mpiperis et al. explored bilinear models for jointly addressing three-dimensional face and facial expression recognition. They proposed an elastically deformable model algorithm that establishes correspondence among a set of faces, and then constructed bilinear models that decouple the identity and facial expression factors.


Firstly, the model is trained using hand-located landmarks; then it is automatically fitted to any new surface using directed distances from and to the surface. The bilinear model allows the impact of identity and expression on face appearance to be decoupled, encoding their contributions in separate control parameters. Faces are represented as parametric surface models described by a fixed-length parameter vector. In order to find implicitly corresponding facial landmarks between the model and the given face, the authors fitted a generic face model to each face with a new technique based on geodesic distances, enabling face recognition invariant to facial expressions, as well as facial expression recognition with unknown identity [71–73]. They also presented a new approach for 3D facial expression recognition that relies on advances in ant colony and particle swarm optimization (ACO and PSO respectively) in the data mining framework. A generic 3D face model, deformed elastically to match the facial surfaces, established anatomical correspondence between faces, and an ACO/PSO-based rule algorithm was employed to discover a set of rules for classifying the faces. The authors associated each facial scan with a set of feature points on the eyes, the eyebrows, the nose, the mouth, and the face boundary, which were manually localized and used as landmarks during the establishment of point correspondence [74]. Gizatdinova et al. [44] implemented a landmarking method for facial images that relies on extracting oriented edges and constructing edge maps at two resolution levels; landmark candidates consist of edge regions with a characteristic edge pattern. The method ensures invariance to expressions in eye detection, while nose and mouth localization was deteriorated by happiness and disgust.

2.4. Study on face morphology

Another field in which facial landmarks are applied is the study of facial morphology. The purposes are various, such as the analysis of facial abnormalities, dysmorphologies or growth changes, and may be aesthetic or purely theoretical. The discipline that deals with this kind of study is anthropometry, which is directly connected to maxillofacial surgery, namely aesthetic, plastic and corrective. Facial landmarks do not only appear in the applications of this discipline, but actually belong to it: in fact, they were defined by surgeons in order to have a common name for every specific part of the face. A pioneer in anthropometry is surely Leslie G. Farkas, who used anatomical landmarks to provide an essential update on the best methods for the measurement of the surfaces of the head and neck [35]. He gathered a set of anthropometric measurements of the face in different ethnic groups [39], then examined the effects on faces of syndromes such as Treacher Collins [58,59] and Apert [58,59], as well as cleft lips, nasal deformity [57] and children's cleft palate [33]. He studied the changes of the head and face during growth [34] and also researched facial beauty and the neoclassical canons of face proportions [21,36,38,60].

2.5. Face correction

There are two further, quite different applications for which facial landmarks are used. The first one is face correction, which consists of detecting and correcting imperfections in group photos, such as closed eyes and inappropriate, unflattering or goofy faces. Dufresne [27] designed a method for diagnosing and correcting these issues.
Face and facial landmarks are detected by an implementation of Bayesian Tangent Shape Model search, and an SVM classifier is trained to identify unflattering faces; bad faces are then warped to match the nearest-neighbour faces from the good-face set.

2.6. Performance evaluation of technical equipment

The other application is the performance evaluation of technical equipment: if the examined equipment is able to identify facial landmarks correctly, its performance is considered effective.


Enciso et al. [32] investigated methods for generating 3D facial images, such as laser scanning, stereophotogrammetry, infrared imaging and CT, and focused on the validation of indirect three-dimensional landmark location and measurement of facial soft tissue with light-based techniques; they also evaluated the precision, repeatability and validity of a light-based imaging system. Aung et al. [4] analysed the development of laser scanning techniques enabling the capture of 3D images, especially for surface measurements of the face. They used a laser optical surface scanner to take 83 facial anthropometric measurements, using 41 identifiable landmarks on the scanned image, and demonstrated that the laser scanner can be a useful tool for rapid facial measurement in selected anatomical parts of the face; accurate location of landmarks and operator skill are important factors in achieving reliable results. Once landmarks are extracted from faces, manually or automatically, they become useful if it is possible to extract the precious information their particular positions give them. Gupta et al. [47,48] indeed investigated the effect of the choice of facial fiducial points on the performance of their proposed 3D face recognition algorithm: they repeated the same steps with distances between arbitrary face points instead of the anthropometric fiducial points. These points were located in the form of a 5 × 5 rectangular grid positioned over the primary facial features of each face, chosen so as to measure distances between the significant facial regions, including the eyes, nose and mouth, without requiring localization of specific fiducial points. They showed that when anthropometric distances are replaced by distances between arbitrary, regularly spaced facial points, the performance of their algorithms decreases substantially. As a matter of fact, landmarks have both a geometrical and a biological meaning on the human face, and for this reason the extraction of measures and features from their links becomes necessary for providing a complete face description. The next section addresses this task.

3. Feature types: classification

Facial landmarks lie in zones of the face which have peculiar geometric and anthropometric features. These features have been extracted from faces by various authors in many different ways, depending on the usages they were assigned to. The scope is to extract accurate geometric information from the examined face and allow comparison with other faces from which the same corresponding information was previously extracted. For face recognition applications, the computation of Euclidean or geodesic distances between landmarks is a widely used method. These are considered measures, rather than real features; particularly in anthropometry applications, these measures are called morphometric. They generally are distances or angles, and their defining property is that one measure involves more than one landmark: both Euclidean and geodesic distances refer to two points, while angles involve three landmarks. But the information obtained at these reference points may be more geometric in nature, capturing for instance specific curvature or shape data.

3.1. Euclidean distance

The Euclidean distance, or Euclidean metric, is the distance "as the crow flies" between two points [104]. It is shown on a face in Fig. 2.
The Euclidean distance between landmarks is used by most authors as a morphometric measure. Once landmarks are obtained from a facial image or a three-dimensional face, some significant landmark pairs are selected and the corresponding Euclidean distances are computed. These distances are then used to compare faces for face recognition purposes or to perform studies on face morphometry, as noted above. The Euclidean-distance-based morphometric measures are chosen depending on the application.
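To make this concrete, the sketch below computes such a distance-based feature vector; the landmark coordinates and the chosen landmark pairs are hypothetical stand-ins, not values from any of the surveyed works.

```python
import numpy as np

# Hypothetical soft-tissue landmark coordinates (e.g. in mm, from a 3D scan);
# the abbreviations follow the Farkas nomenclature of Fig. 1.
landmarks = {
    "prn":  np.array([0.0, 0.0, 85.0]),    # pronasal (nose tip)
    "ex_r": np.array([45.0, 30.0, 40.0]),  # right exocanthion
    "ex_l": np.array([-45.0, 30.0, 40.0]), # left exocanthion
    "ch_r": np.array([28.0, -35.0, 45.0]), # right cheilion
}

def euclidean(a, b):
    """Straight-line ("as the crow flies") distance between two landmarks."""
    return float(np.linalg.norm(landmarks[a] - landmarks[b]))

# A morphometric feature vector: Euclidean distances for selected landmark
# pairs, comparable across faces once the same landmarks have been located.
pairs = [("prn", "ex_r"), ("prn", "ex_l"), ("ex_r", "ex_l"), ("prn", "ch_r")]
feature_vector = np.array([euclidean(a, b) for a, b in pairs])
print(feature_vector)
```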


3.1.1. Face recognition

There is wide previous work on this topic. Gupta et al. [47,48] presented three-dimensional face recognition algorithms employing Euclidean distances between anthropometric fiducial points as features, along with linear discriminant analysis classifiers. Prabhu et al. [79] addressed the problem of automatically locating the facial landmarks of a single person across the frames of a video sequence; the mean of the Euclidean distances between the coordinates of each of the 79 landmarks fitted by the tracking method and those manually annotated was computed. Similarly, Zhao et al. [113] formed a vector from 11 Euclidean distances between facial-expression-sensitive landmarks. Moreno et al. [69] performed an HK segmentation, i.e. one based on the signs of the mean (H) and Gaussian (K) curvatures, to isolate regions of pronounced curvature on 420 three-dimensional facial meshes; after the segmentation task, the authors performed feature extraction, computing, among other features, Euclidean distances between some fiducial points. Gordon [45] presented a face recognition system which uses features extracted from range and curvature data for face representation. She extracted high-level features which mark salient events on the face surface in terms of points, lines and regions. Since the most basic set of scalar features describing the face corresponds to measurements of the face, she firstly computed the Euclidean distances of: left eye width, right eye width, eye separation, total width (span) of eyes, nose height, nose width, nose depth and head width. Likewise, Lee et al. [61] calculated relative lengths of extracted facial feature points with the Euclidean distance. Efraty et al. [28,29] studied the silhouette of the face profile and introduced a new method for face recognition that improves robustness to rotation. They achieved this by exploring the feature space of profiles under various rotations with the aid of a three-dimensional face model: based on the fiducial points of the profile silhouette, they extracted a set of rotation-, translation- and scale-invariant features which were used to design and train a hierarchical pose-identity classifier; the Euclidean distance was chosen as one type of measurement between landmarks. Daniyal et al. [20] represented the face geometry with inter-landmark distances within selected regions of interest to achieve robustness to expression variations: the proposed recognition algorithm first represents the geometry of the face by a set of Euclidean Inter-Landmark Distances (ILDs) between the selected landmarks; these distances are then compressed using Principal Component Analysis (PCA) and projected onto the classification space using Linear Discriminant Analysis (LDA).
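A minimal sketch of this kind of ILD pipeline (pairwise distances, PCA compression, LDA classification) follows; the random landmark data and the scikit-learn classes stand in for the actual datasets and implementations of the works cited above.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def inter_landmark_distances(face):
    """face: (n_landmarks, 3) array -> vector of all pairwise Euclidean ILDs."""
    return np.array([np.linalg.norm(face[i] - face[j])
                     for i, j in combinations(range(len(face)), 2)])

# Stand-in training data: 10 subjects, 6 scans each, 15 landmarks per scan.
rng = np.random.default_rng(0)
X = np.stack([inter_landmark_distances(rng.normal(size=(15, 3)))
              for _ in range(60)])
y = np.repeat(np.arange(10), 6)

pca = PCA(n_components=20).fit(X)                            # compress the ILD vectors
lda = LinearDiscriminantAnalysis().fit(pca.transform(X), y)  # classification space

probe = inter_landmark_distances(rng.normal(size=(15, 3)))
print(lda.predict(pca.transform(probe[None, :])))            # predicted identity
```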

3.1.2. Facial expression recognition

Soyel et al. [93,94] used six different Euclidean distances between feature points to form a distance vector for facial expression recognition: openness of eyes, height of eyebrows, openness of mouth, width of mouth, stretching of lips and openness of jaw. During the recognition experiments, a distance vector is derived for every 3D model and the whole procedure is repeated numerous times.

3.1.3. Study on face morphology

Ras et al. [80] introduced stereophotogrammetry as a three-dimensional registration method for quantifying facial morphology and detecting changes in facial morphology during growth and development. They used six sets of automatically extracted 3D landmark coordinates to calculate the Euclidean distances between exocanthion and cheilion, cheilion and pronasale, and exocanthion and pronasale for both sides of the face; changes in facial morphology due to growth and development were analysed with an analysis of variance of these distances.

Fig. 2. Euclidean distance between pronasal and right exocanthion.

3.1.4. Performance evaluation of technical equipment

The last field in which Euclidean distances between landmarks have been applied is the performance evaluation of technical equipment. Enciso et al. [32] used a digitizer to obtain landmarks and then directly measured the Euclidean distances between them; these distances were compared with the indirect homologous distances measured on the scans with computer tools. Aung et al. [4] firstly carried out direct Euclidean-distance-based anthropometric measurements of the face using standard anthropometric landmarks as defined by Farkas. The subject was then laser-scanned with the optical surface scanner, and the laser-scan measurements were taken using selected landmarks identifiable on the scanned image; the same number of corresponding sets of measurements from the direct and indirect methods were then compared, in order to evaluate the laser scanner's performance. Table 1 shows Euclidean distance performances depending on applications and methods.

3.2. Geodesic distance and arc-length distance

A geodesic is the shortest path between two points on a surface [70]. It is shown on a face in Fig. 3. Some authors have used the geodesic distance between facial landmarks. First of all, Bronstein et al. [8–12] proposed to model facial expressions as isometries of the facial surface. The facial surface is described as a smooth compact connected two-dimensional Riemannian manifold (surface), denoted by $S$. The minimal geodesics between $s_1, s_2 \in S$ are curves of minimum length on $S$ connecting $s_1$ and $s_2$, denoted by $C_S^*(s_1, s_2)$. The geodesic distances are the lengths of the minimal geodesics,

$$d_S(s_1, s_2) = \mathrm{length}\left(C_S^*(s_1, s_2)\right).$$

A transformation $\psi : S \to Q$ is called an isometry if

$$d_S(s_1, s_2) = d_Q(\psi(s_1), \psi(s_2)) \quad \text{for all } s_1, s_2 \in S.$$

"In other words, an isometry preserves the intrinsic metric structure of the surface. The isometric model, assuming facial expressions to be isometries of some neutral facial expression, is based on the intuitive observation that the facial skin stretches only slightly. All expressions of a face are assumed to be "intrinsically" equivalent (i.e. have the same metric structure), and "extrinsically" different.

Table 1
Euclidean distance: applications, relative methods, data dimension, and performances obtained by the authors in their previous works.

| Measure | Application | Method | Dimension | Reliability | References |
|---|---|---|---|---|---|
| Euclidean distance | Face recognition | Principal component analysis (PCA) | 3D | 88,8% | Gupta et al., 2007 [48] |
| | Face recognition | Linear discriminant analysis (LDA) | 3D | 97,60% | Gupta et al., 2007 [49] |
| | Face recognition | New algorithm (face profile recognition) | 2D | 90% | Liposcak et al., 1999 |
| | Face recognition | PCA + LDA | 3D | 84,5%–96,5% depending on probe and training sets | Daniyal et al., 2009 |
| | Face recognition | Invariant method (new method proposed by the authors) | 3D | Accuracy: 87,7% | Cadoni et al., 2009 |
| | Face recognition | Iterative closest point (ICP) | 3D | Accuracy: 90,6% | Cadoni et al., 2009 |
| | Face detection + landmark localization | Active shape models (ASM) with Kalman filtering of landmark positions | 2D | Fit error mean: 6,73 pixels | Prabhu et al. |
| | Face detection + landmark localization | Active shape models (ASM) with Kalman filtering of PCA shape coefficients | 2D | Fit error mean: 6,84 pixels | Prabhu et al. |
| | Face expression recognition | Probabilistic neural network (PNN) | 3D | 88% | Soyel et al., 2008 |
| | Face expression recognition | LDA + Gabor wavelets | 2D | 74,1% | |
| | Face expression recognition | LDA + topographic context method | 2D | 79,2% | |
| | Face registration for quantifying facial morphology | Stereophotogrammetry | 3D | Reproducibility significance: P < 0,05 | Ras et al., 1996 |
| | Validation of landmark localization and measurement of facial soft-tissue | Light-based techniques | 3D | Mean error: 1,6 ± 1,85%; reproducibility significance: 0,05 | Enciso et al., 2004 |
| | Evaluation of laser scanner measuring performance | Laser optical surface scanning | 3D | Reliability of laser scanner measurement: 12,5%–50% depending on face regions | Aung et al., 1995 |

Fig. 3. Geodesic distance between pronasal and right exocanthion.

Broadly speaking, the intrinsic geometry of the facial surface can be attributed to the subject's identity, while the extrinsic geometry is attributed to the facial expression. The isometric model
tacitly assumes that the expressions preserve the topology of the surface. This assumption is valid for most regions of the face except the mouth. Opening the mouth changes the topology of the surface by virtually creating a hole" [12, page 3]. Based on this model, expression-invariant signatures of the face were constructed by means of approximate isometric embedding into flat spaces. They applied a new method for measuring isometry-invariant similarity between faces by embedding one facial surface into another. Promising face recognition results were obtained in numerical experiments even when the facial surfaces were severely occluded. Gupta et al. [46–48] worked on the same assumption, namely that different facial expressions can be regarded as isometric deformations of the face surface. "These deformations preserve intrinsic properties of the surface, one of which is the geodesic distance between a pair of points on the surface" [46, page 2]. Based on these ideas, they presented a preliminary study aimed at investigating the effectiveness of using geodesic distances between all pairs of 25 fiducial points on the surface as features for face recognition. Instead of choosing a random set of points on the face surface, they considered facial landmarks relevant to measuring the anthropometric facial proportions employed widely in facial plastic surgery and art. They calculated geodesics using Dijkstra's shortest-path algorithm, defining 8-connected nearest neighbours about each point. Twenty-five fiducial points were labelled by hand on each face, and three face recognition algorithms were implemented: the first employed the 300 geodesic distances between all pairs of fiducial points, computed with the fast marching algorithm for front propagation, as features for recognition; the second employed the 300 Euclidean distances between all the pairs. The normalized L1 norm, namely the ratio between each dimension and its respective variance, was used as the metric for matching faces with both the Euclidean-distance and the geodesic-distance features; the third algorithm relied on PCA.
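As a rough sketch of how geodesic features of this kind can be computed on a triangulated scan, the fragment below runs Dijkstra's shortest-path algorithm on the edge graph of a mesh (SciPy assumed); note that edge-graph paths only approximate true surface geodesics from above, which is one reason fast marching is also used in the works cited here.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distances(vertices, edges, source_idx):
    """Approximate geodesic distances from one vertex to all others by running
    Dijkstra on the mesh edge graph, each edge weighted by its Euclidean length."""
    i, j = edges[:, 0], edges[:, 1]
    w = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    graph = coo_matrix((w, (i, j)), shape=(len(vertices),) * 2)
    return dijkstra(graph, directed=False, indices=source_idx)

# Toy four-vertex surface patch (arbitrary coordinates, not a real face mesh).
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.5]])
edges = np.array([[0, 1], [0, 2], [1, 2], [1, 3], [2, 3]])
print(geodesic_distances(verts, edges, source_idx=0))  # path lengths from vertex 0
```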

The arc-length is the length of an irregular arc segment, and its computation is also called "rectification of a curve"; by definition, it is strictly connected to geodesics. Efraty et al. [28,29], interested in profile-based face recognition, defined five types of measurements based on the properties of the profile between two landmarks, one of which was exactly the arc-length between landmarks. Nevertheless, Aung et al. [4], who used facial landmarks to evaluate the performance of a laser scanner, argued that tangential or arc measurements were slightly more complex and needed careful positioning of the image before accurate measurements could be made. Table 2 shows the performances of the geodesic distance, and of the Euclidean distance combined with the geodesic distance, depending on the method.

Table 2
Geodesic distance: applications (in this case only face recognition), relative methods, data dimension, and performances obtained by the authors in their previous works.

| Measure | Application | Method | Dimension | Reliability | References |
|---|---|---|---|---|---|
| Geodesic distance | Face recognition | Principal component analysis (PCA) | 3D | 89,9% | Gupta et al., 2007 [48] |
| | Face recognition | Linear discriminant analysis (LDA) | 3D | 94,70% | Gupta et al., 2007 [49] |
| | Face recognition | New algorithm | 3D | 3,1% (EER for mildly occluded faces); 5,5% (EER for severely occluded faces) | Bronstein et al., 2006 |
| | Face recognition | PCA | 3D | 70%–75% for identical twins | Bronstein et al., 2005 |
| | Face recognition | Rigid surface matching | 3D | 82%–100% for identical twins | Bronstein et al., 2005 |
| Euclidean distance + geodesic distance | Face recognition | Canonical form matching (new method proposed by the authors) | 3D | 100% for identical twins | Bronstein et al., 2005 |
| | Face recognition | PCA | 3D | 56,6% | Gupta et al., 2010 |
| | Face recognition | LDA | 3D | 92,6% | Gupta et al., 2010 |
| | Face recognition | ICA | 3D | 87,9% | Gupta et al., 2010 |
| | Face recognition | Anthroface 3D (new method proposed by the authors) on landmarks | 3D | 97,9% | Gupta et al., 2010 |
| | Face recognition | Anthroface 3D on arbitrary points | 3D | 87,5% | Gupta et al., 2010 |

3.3. Ratios of distances

"The ratios of face features play a crucial role in the classification of faces. In the past 20 years, researchers and practitioners in anthropology and aesthetic surgery have analyzed faces from a different
perspective. They use a set of canonical points on a human face that are critical for face reconstruction. These points, and distances between them are used to represent a face. In fact, artists developed a set of neoclassical canons (ratios of distances) to represent faces as far back as the Renaissance period. All these observations motivate researchers to explore the role of ratios of the distances between face landmarks in face recognition" [90, page 122]. Furthermore, ratios of distances are normalizations invariant to affine transformations. Particularly, "an affine transformation is any transformation that preserves collinearity (i.e., all points lying on a line initially still lie on a line after transformation) and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after transformation). Geometric contraction, expansion, dilatation, reflection, rotation, and translation are all affine transformations, as are their combinations. In general, an affine transformation is a composition of rotations, translations and dilatations" [62, page 4135]. Generally, in face study, ratios are defined on the Euclidean or geodesic distances between landmarks. These ratios are often normalized distances, obtained by dividing a distance between points by the face width.


Table 3
Ratios: applications, relative methods, data dimension, and performances obtained by the authors in their previous works.

| Measure | Application | Method | Dimension | Reliability | References |
|---|---|---|---|---|---|
| Ratios | Face recognition | PCA | 2D | 75%–98% depending on the similarity measure | Shi et al., 2006 |
| | Face expression recognition | LDA | 3D | 83,6% | Tang et al., 2008 |
| | Face expression recognition | Support vector machine (SVM) | 3D | 87,1% | Tang et al., 2008 |

3.3.1. Face recognition

Shi et al. [90] investigated how well normalized Euclidean distances (special ratios) can be exploited for face recognition. Exploiting symmetry and using principal component analysis, they reduced the number of ratios to 20; these ratios are free from translation, scaling and 2D rotation of the face images. The normalized distances for a face are defined as

$$r(l_i, l_j) = \frac{d(l_i, l_j)}{d(l_a, l_b)}, \quad \forall l_i, l_j \in \{l_1, \dots, l_N\},$$

where $\{l_1, \dots, l_N\}$ are the landmarks, $N$ is their cardinality, and $l_a$ and $l_b$ are two landmarks whose Euclidean distance is defined as a benchmark distance. Together with Euclidean and geodesic distances, Gupta et al. [46,47] used ratios. They presented an anthropometric three-dimensional (Anthroface 3D) face recognition algorithm, based on a systematically selected set of discriminatory structural characteristics of the human face derived from the existing scientific literature on facial anthropometry. Anthropometric cranio-facial proportions are ratios of pairs of straight-line and/or along-the-surface distances between specific cranial and facial fiducial points; for example, the most commonly used nasal index N1 is the ratio of the horizontal nose width to the vertical nose height. Lee et al. [61] used relative ratios between feature points to perform face recognition.

3.3.2. Facial expression recognition

Tang et al. [99,100] dealt with facial expression recognition. They devised a set of features based on properties of the line segments connecting certain facial feature points on a 3D face model; among them, normalized distances were extracted.

3.3.3. Study on face morphology

Mao et al. [67] studied facial change due to growth. They formulated a new inverse flatness metric, the ratio of the geodesic distance to the Euclidean distance between landmarks, to study 3D facial surface shape. With this ratio, they were able to analyze curvature asymmetry, which cannot be detected by studying the Euclidean distance alone. They also attempted to combine it with the conventional symmetry method based on Euclidean inter-landmark distances, expressing facial symmetry in terms of both surface flatness and the geometric symmetry of landmark positions (captured by the Euclidean distances), so as to give a better overall description of three-dimensional facial symmetry. If $GD_{i,j}$ is the geodesic distance between points $i$ and $j$, and $ED_{i,j}$ is the Euclidean distance, then the ratio of the geodesic to the Euclidean distance,

$$R = \frac{GD_{i,j}}{ED_{i,j}},$$

is employed in their work to analyze surface flatness, since it reflects the inverse flatness of the geodesic curve that samples the surface on which the two end points $(i, j)$ lie; the ratio is therefore capable of capturing obvious differences in facial curvature. Table 3 shows ratio performances depending on applications and methods, where provided by the authors.
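The sketch below illustrates both kinds of ratio, benchmark-normalized distances and the inverse flatness metric; the landmark names and the exocanthion-to-exocanthion benchmark are illustrative assumptions, not prescriptions from the cited works.

```python
import numpy as np

def normalized_distances(landmarks, benchmark=("ex_l", "ex_r")):
    """Ratios r(l_i, l_j) = d(l_i, l_j) / d(l_a, l_b): each inter-landmark
    Euclidean distance divided by a benchmark distance (here the face width,
    approximated by the exocanthion-to-exocanthion span)."""
    names = sorted(landmarks)
    d = lambda a, b: float(np.linalg.norm(landmarks[a] - landmarks[b]))
    scale = d(*benchmark)
    return {(a, b): d(a, b) / scale
            for k, a in enumerate(names) for b in names[k + 1:]}

def inverse_flatness(geodesic_d, euclidean_d):
    """Mao et al.-style ratio R = GD_ij / ED_ij: R equals 1 on a flat patch
    and grows with the curvature of the surface between the two landmarks."""
    return geodesic_d / euclidean_d
```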

3.4. Curvature and shape

Punctual values of curvature and shape are precious information about facial surface behaviour. Despite their valuable contribution, they are not used as often as distances, because they are not as easily tractable and extractable from faces. The need to condense and formalize their values becomes basic in this field, where surfaces are generally not real but described by point clouds or meshes. Several techniques for estimating curvature information have been developed over the last two decades; from the mathematical viewpoint, curvature information can be retrieved from the first and second partial derivatives of the local surface, the local surface normal, and tensor voting [107]. An interesting curvature representation was proposed by Koenderink et al. [56]. It is based on the parametrization of the structure in two feature maps, namely the Shape Index S and the Curvedness Index C. The formal definition of the Shape Index is as follows:

$$S = -\frac{2}{\pi} \arctan\frac{k_1 + k_2}{k_1 - k_2}, \quad S \in [-1, 1],\ k_1 \ge k_2,$$

where $k_1$ and $k_2$ are the principal curvatures. The Shape Index describes the shape of the surface; Koenderink et al. proposed a partition of the range $[-1, 1]$ into nine categories, corresponding to nine different surface types. Dorai et al. [24–26], however, employed a modified definition to identify the shape category to which each surface point of an object belongs. With their definition, all shapes can be mapped onto the interval $S \in [0, 1]$, conveniently allowing aggregation of surface patches based on their shapes:

$$S = \frac{1}{2} - \frac{1}{\pi} \arctan\frac{k_1 + k_2}{k_1 - k_2}.$$

Dorai et al. addressed the problem of representing and recognizing free-form (sculpted) surfaces which vary in shape and complexity. They proposed a new and general surface representation scheme for recognizing arbitrarily curved 3D rigid objects from range data. No restrictive assumptions are made about the types of surfaces on the object [24–26].


S does not give an indication of the scale of curvature present in the shapes. For this reason, an additional feature is introduced, the Curvedness Index of a surface:

$$C = \sqrt{\frac{k_1^2 + k_2^2}{2}}.$$
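Both indexes are straightforward to evaluate once the principal curvatures are known; a minimal sketch follows (arctan2 is used here to avoid division by zero at umbilical points, where k1 = k2).

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink Shape Index, S in [-1, 1], assuming k1 >= k2."""
    return -(2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

def shape_index_dorai(k1, k2):
    """Dorai et al.'s modified definition, mapping all shapes onto [0, 1]."""
    return 0.5 - (1.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

def curvedness(k1, k2):
    """Curvedness Index: overall magnitude of curvature at the point."""
    return np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)

# Umbilical convex point (spherical cap), k1 = k2 = 0.2:
print(shape_index(0.2, 0.2), curvedness(0.2, 0.2))  # -> -1.0, 0.2
```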

The Curvedness Index is a measure of how highly or gently curved a point is, and is defined as the distance from the origin in the $(k_1, k_2)$-plane. Whereas the Shape Index scale is quite independent of the choice of a unit of length, the curvedness scale is not: curvedness has the dimension of reciprocal length, and in practice one has to fix some fiducial sphere as the unit sphere to set the curvedness scale. Since the principal curvatures may be computed punctually, both S and C may be too; this allows shape and curvedness information to be extracted at landmarks or fiducial points, guaranteeing a formalization for these features. A few authors have used the Shape and Curvedness Indexes for recognition. Worthington et al. [107] used shape-from-shading to investigate whether regions of uniform surface topography could be extracted from intensity images and subsequently used for three-dimensional object recognition, drawing on the constant-Shape-Index maximal patch representation of Dorai et al. Song et al. [92] defined and extracted facial Shape Indexes based on facial curvature characteristics and performed dynamic programming for face recognition. Shin et al. [91] used distinctive facial features to describe a pose-invariant three-dimensional face recognition method, calculating the Shape Index on the area of each facial feature point to represent the curvature characteristics of facial components. Calignano [13] used the Shape and Curvedness Indexes in a morphological analysis methodology for the automatic extraction of soft-tissue landmarks. Nair et al. [20,75] dealt with face recognition, face detection and landmark localization: in the phase involving the isolation of candidate vertices, they computed two feature maps, namely the Shape Index and the Curvedness Index, in order to properly characterize the curvature of each vertex of the face mesh. The low-level feature maps were computed after a Laplacian smoothing that reduced the outliers arising from the scaling process. Zhao et al. [113] analysed facial expressions: in order to describe local surface properties, they computed the Shape Index of all points on the local grids and concatenated the values into a vector SI. They chose the Shape Index because it has proven to be an efficient feature for describing local curvature information and is independent of the coordinate system. In their various works, Jain et al. employed the Shape Index in the framework of face verification. Firstly, they outlined methods to detect key anchor points in 3D face scanner data, which are used to estimate the pose and then match the test image to a three-dimensional face model. They presented two algorithms for facial anchor point detection, one for frontal images and one for arbitrary pose; the Shape Index was used to locate prominent anchor points within the face regardless of the orientation of the face in the scan [18]. Then, the authors proposed a multimodal scheme integrating the three-dimensional range and the two-dimensional intensity information of the face in order to extract fiducial points: the Shape Index response derived from the range map and the cornerness response (similar to the Curvedness Index) derived from the intensity map are combined to determine the positions of the corners of the eyes and the mouth [64,65]. Later, they developed a face recognition system that utilizes three-dimensional shape information to make the system robust to arbitrary pose and lighting.
For each subject, a 3D face model is constructed by integrating several 2.5D face scans captured from different views; 2.5D is a simplified 3D (x, y, z) surface representation that contains at most one depth value (z direction) for every point in the (x, y) plane.

Using a combination of the pose-invariant Shape Index and the 3D coordinates, they developed heuristics to locate a set of candidate feature points [66]. Other well-known 3D surface and curvature representations are Spin Images, introduced by Johnson [54]. Oriented points, i.e. 3D points with associated directions, are used to create Spin Images. An oriented point at a surface mesh vertex is defined using the 3D position of the vertex and the surface normal at the vertex; the surface normal at a vertex is computed by fitting a plane to the points connected to the vertex by edges in the surface mesh. An oriented point defines a partial, object-centred coordinate system, with respect to which two cylindrical coordinates can be defined: the radial coordinate α, defined as the perpendicular distance to the line through the surface normal, and the elevation coordinate β, defined as the signed perpendicular distance to the tangent plane defined by the vertex normal and position. A Spin Image is created for an oriented point at a vertex in the surface mesh as follows. A 2D accumulator indexed by α and β is created; next, the coordinates (α, β) are computed for each vertex in the surface mesh that is within the support of the Spin Image. The bin indexed by (α, β) in the accumulator is then incremented, with bilinear interpolation used to smooth the contribution of the vertex. This procedure is repeated for all vertices within the support of the Spin Image. The resulting accumulator can be thought of as an image, whose dark areas correspond to bins containing many projected points. As long as the size of the bins in the accumulator is greater than the median distance between vertices in the mesh (the definition of mesh resolution), the positions of individual vertices are averaged out during Spin Image generation. Spin Images generated from two different surfaces representing the same object will be similar because they are based on the shape of the object, but they will not be exactly the same, owing to variations in surface sampling and noise. However, if the surfaces are uniformly sampled, then the Spin Images from corresponding points on the different surfaces will be linearly related; a standard method for comparing linearly related data sets is the linear correlation coefficient, so the correlation coefficient between two Spin Images is used to measure their similarity [54]. Passalis et al. employed both the Shape Index and Spin Images. Firstly, they proposed a unified method to perform face recognition across scans of faces in different positions, addressing the problem of partial matching: both frontal and left or right side facial scans are handled in a way that allows inter-pose retrieval operations, with the Shape Index used for 3D landmark detection [77]. Then, the authors introduced the first method for three-dimensional facial landmark detection able to work on datasets with pose rotations of up to 80° around the y-axis. After calculating Shape Index values on a 3D facial dataset, a mapping to 2D space is performed to create a Shape Index map, on which local maxima and minima are identified; local maxima are candidate landmarks for nose tips and chin corners. The located maxima and minima are sorted in descending order of significance according to their corresponding Shape Index values, and the most significant subset of points for each group, namely caps and cups, is retained.
However, experimentation showed that the Shape Index alone is not robust enough for detecting anatomical landmarks in facial datasets across a variety of poses. Thus, candidate landmarks estimated from Shape Index values are classified and filtered according to their relevance to corresponding Spin Image templates. A Spin Image encodes the coordinates of points on the surface of a 3D object with respect to a so-called "oriented point" (p, n), where n is the normal vector at point p of the object's surface; a Spin Image at an oriented point (p, n) is a 2D grid accumulator of 3D points, built as the grid is rotated around n by 360°. A Spin Image is thus a descriptor of the global or local shape of the object, invariant under rigid transformations [76,78]. They also designed the computational tools and a hardware prototype for 3D face recognition, with scalability in both time and space achieved by converting three-dimensional face scans into compact metadata.


They also designed the computational tools and a hardware prototype for 3D face recognition. Scalability in both time and space is achieved by converting three-dimensional face scans into compact metadata. Particularly, the presented multistage alignment method robustly and accurately aligns the scans, even in the presence of different facial expressions. They used Spin Images to establish a correspondence between model and data when arbitrary rotations and translations are expected [55]. As said previously, Romero et al. also used Spin Images as descriptors to locate the nose tip and the eye corners [82,83].

Local Scale-Invariant Features are also used as representations for images. They were proposed by Lowe [63], who presented a new method for image feature generation called the Scale Invariant Feature Transform (SIFT). This approach transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling and rotation, and partially invariant to illumination changes and affine or 3D projection. Previous approaches to local feature generation lacked invariance to scale and were more sensitive to projective distortion and illumination change. The SIFT features share a number of properties in common with the responses of neurons in the inferior temporal (IT) cortex in primate vision. The scale-invariant features are efficiently identified by using a staged filtering approach. The first stage identifies key locations in scale space by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each point is used to generate a feature vector that describes the local image region sampled relative to its scale-space coordinate frame. The features achieve partial invariance to local variations, such as affine or 3D projections, by blurring image gradient locations. This approach is based on a model of the behaviour of complex cells in the cerebral cortex of mammalian vision. The resulting feature vectors are called SIFT keys. The SIFT keys derived from an image are used in a nearest-neighbour indexing approach to identify candidate object models. Collections of keys that agree on a potential model pose are first identified through a Hough transform hash table, and then through a least-squares fit to a final estimate of the model parameters. When at least three keys agree on the model parameters with low residual, there is strong evidence for the presence of the object. Since there may be dozens of SIFT keys in the image of a typical object, it is possible to have substantial levels of occlusion in the image and yet retain high levels of reliability [63].

3D SIFT features have also been employed as face descriptors. Zhang et al. [112] focused on local feature based 3D face recognition and proposed a Faceprint method. They extracted SIFT features from texture and range images and matched them. The number of matching key points, together with geodesic distance ratios between models, is used as a matching score, while likelihood ratio based score level fusion is conducted to calculate the final matching score. Thanks to the robustness of SIFT, Shape Index and geodesic distance against various changes of geometric transformation, illumination, pose and expression, the Faceprint method is inherently insensitive to these variations. Experimental results proved that this method achieves consistently high performance compared with commonly used SIFT on texture images. Ji et al. [53] extracted human-focused foreground actions from videos containing global motions or concurrent actions by an attention shift model; subsequently, a spatiotemporal vocabulary is built based on 3D SIFT features extracted from these human-focused action regions.
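The first, key-location stage of this pipeline can be sketched as follows; the scale ladder and the use of SciPy's Gaussian filter are illustrative assumptions, not Lowe's exact settings. A pixel is retained when it is an extremum among its 26 neighbours in scale space.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_key_locations(image, sigmas=(1.0, 1.6, 2.6, 4.2)):
    """Return (row, col, level) triples of difference-of-Gaussian extrema."""
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
    keys = []
    for k in range(1, len(dogs) - 1):
        d = dogs[k]
        for i in range(1, d.shape[0] - 1):
            for j in range(1, d.shape[1] - 1):
                # 3x3x3 scale-space neighbourhood around the candidate pixel
                cube = np.stack([lvl[i - 1:i + 2, j - 1:j + 2]
                                 for lvl in (dogs[k - 1], d, dogs[k + 1])])
                if d[i, j] == cube.max() or d[i, j] == cube.min():
                    keys.append((i, j, k))
    return keys
```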
Action recognition is related to the face detection field. Other parameters and methodologies were used to extract shape and curvature information from facial landmarks or fiducial points. Moreno et al. [69] performed face recognition using 3D surface-extracted descriptors: averages and variances of the mean and Gaussian curvatures, evaluated at points belonging to the various regions into which the face surface was divided, were extracted. Gordon [45] defined a set of features which describe the nose ridge and are based on the measurement of curvature: the maximum Gaussian curvature on the ridge line, the average minimum curvature on the ridge above the tip of the nose, the Gaussian curvature at the bridge and the Gaussian curvature at the base. The maximum Gaussian curvature occurs approximately at the tip of the nose and provides
some description of local shape at that point. The average minimum curvature between the bridge and the nose tip is meant to provide a measure of the curvature along the nose ridge. Xu et al. developed an automatic 3D face recognition method combining global geometric features with local shape variation information. The scattered three-dimensional point cloud is first represented by a regular mesh; then the local shape information is extracted to characterize both individual and global geometric features. First, they described the 3D shape of the principal areas with a 1D vector by defining a metric, then analyzed the shape variation using Gaussian-Hermite moments [109]. Later, they focused on three-dimensional face data and proposed a method to solve a specific problem, i.e., locating the tip of the nose, by a hierarchical filtering scheme combining local features. After identifying the nose tip position through a study of facial shape, they characterized the nose ridge using the Included Angle Curve (IAC), described in the following. “For the detected nose tip, P, they placed two spheres with radius R and r (R > r), centred at P. The intersection of the two spheres and the facial surface generates a torus in 3D space. They defined a reference plane, G, by the normal vector, N, of the nose tip and a reference vector, C, orthogonal to N. A plane, Q, rotates along the axis, N, beginning from G with a fixed step, Δθ. Thus, the rotated plane, Q, divides the torus into 360°/Δθ partitions. They obtained the barycentre, Bi, of the i-th partition by averaging the points within it and then calculated the included angle between Bi − P and N. They used one curve to describe the variation of these included angles, which is called the Included Angle Curve (IAC). Since P is the local highest point according to the authors' hierarchical filtering scheme, all its included angles are larger than 90°. The area near the nose ridge has the smallest included angle, which can be used to determine the approximate direction of the nose ridge. This algorithm is similar to point signatures; the main difference is that their method does not require a plane fit to the local points, which largely reduces the computational cost” [110, page 1491].
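A minimal sketch of this construction follows; the choice of the reference vector C (any direction orthogonal to N works) and the 10° angular step are illustrative assumptions rather than values mandated by the original method.

```python
import numpy as np

def included_angle_curve(points, P, N, R, r, step_deg=10.0):
    """Included Angle Curve around a detected nose tip P with unit normal N.

    points : (M, 3) facial point cloud; R > r select the annular band ("torus")
    Returns one included angle (degrees) per angular partition, NaN if empty.
    """
    d = points - P
    dist = np.linalg.norm(d, axis=1)
    band = points[(dist >= r) & (dist <= R)]

    # build a reference frame: C orthogonal to N, D completing the tangent basis
    seed = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(seed, N)) > 0.9:
        seed = np.array([0.0, 1.0, 0.0])
    C = np.cross(N, seed)
    C /= np.linalg.norm(C)
    D = np.cross(N, C)

    v = band - P
    theta = np.degrees(np.arctan2(v @ D, v @ C)) % 360.0   # partition angle of each point
    n_parts = int(round(360.0 / step_deg))
    curve = np.full(n_parts, np.nan)
    for k in range(n_parts):
        sector = band[(theta >= k * step_deg) & (theta < (k + 1) * step_deg)]
        if len(sector):
            B = sector.mean(axis=0)                        # barycentre of the partition
            u = B - P
            cos_ang = np.dot(u, N) / np.linalg.norm(u)
            curve[k] = np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0)))
    return curve  # the minimum indicates the approximate nose-ridge direction
```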
Efraty et al. [28,29] computed, for each pair of landmarks, the mean curvature of the region between the landmarks and the L2-norm of the curvature along the contour between them (proportional to the bending energy). Wang et al. [102] dealt with facial expression recognition and proposed an approach to extract primitive 3D facial expression features from the triangle meshes of faces. They performed principal curvature analysis, which produces a set of attributes describing the geometric surface properties at each vertex; among them are the principal curvatures, representing the maximum and minimum degrees of bending of a surface, and the steepness. Using these geometric attributes, they were able to classify every vertex into a category. Table 4 shows shape and curvedness performances depending on applications and methods.

3.5. Other features

Other geometrical features were extracted from face landmarks. Ras et al. [80] studied facial morphology and computed angles between fiducial points. Particularly, the angles exocanthion-chelion-pronasal, exocanthion-pronasal-exocanthion and pronasal-exocanthion-chelion, as well as the angle between the two planes formed by the exocanthion, chelion and pronasal of both sides, were calculated. Changes in facial morphology due to growth and development were then analyzed with an analysis of variance on these angles. Lee et al. [61] performed face recognition by calculating relative angles among facial feature points. Moreno et al. [69] computed angles, regions, areas of regions and centroids of regions. Zhao et al. [113], for face recognition purposes, used the multi-scale LBP operator, a powerful texture measure widely used in 2D face analysis; it extracts information which is invariant to local grayscale variations of the image with low computational complexity. They also computed a landmark displacement vector. The displacement of a landmark is used to capture the change of the landmark location when an expression appears on a neutral face; it is a scientific tool for quantifying the difference between an expressive face and the neutral one. Similarly, Sun et al. [96] derived the displacement vector between each individual frame and the initial frame (neutral expression).
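Both families of features just mentioned, inter-landmark angles and landmark displacement vectors, reduce to a few lines of linear algebra. The sketch below is illustrative only and assumes that corresponding landmarks are already expressed in a common reference frame.

```python
import numpy as np

def angle_at(vertex, a, b):
    """Angle (degrees) at `vertex` subtended by landmarks `a` and `b`,
    e.g. the exocanthion-pronasal-exocanthion angle of Ras et al. [80]."""
    u, v = a - vertex, b - vertex
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def displacement_vectors(neutral, expressive):
    """Per-landmark displacement of an expressive face from its neutral scan.

    neutral, expressive : (L, 3) arrays of corresponding landmark coordinates
    """
    return np.asarray(expressive) - np.asarray(neutral)
```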



Table 4
Curvedness and shape information: applications (i.e. face recognition and landmark detection), relative methods, data dimension, and performances obtained by the authors in their previous works.

Measure | Application | Method | Dimension | Reliability | References
--- | --- | --- | --- | --- | ---
Shape and curvedness info | Face recognition | Hierarchical graph matching (HGM) | 2D + 3D | 96.8% | Hüsken et al., 2005
Shape and curvedness info | Face recognition | 2D HGM + 3D PCA | 2D + 3D | 96.8% | Hüsken et al., 2005
Shape and curvedness info | Face recognition | PCA + nearest neighbour (NN) | 3D | 76.4% (correct classification rate, CCR, for global geometric features, GGF); 82% (CCR for GGF + shape variation information, SVI) | Xu et al., 2004
Shape and curvedness info | Face recognition | Gabor filter + PCA | 2D + 3D | 70%-90% depending on features and classifying theories | Wang et al., 2002
Shape and curvedness info | Face recognition | Annotated face model (AFM) | 3D | Cumulative match characteristic (CMC): 41%-87% depending on landmarking methods and datasets | Perakis et al., 2009 [77]
Shape and curvedness info | Face recognition | ICP | 3D | Matching error: 1 for neutral expressions; 37 for smiling expressions | Lu et al., 2005
Shape and curvedness info | Face recognition | ICP + LDA | 3D | Matching error: 0 for neutral expressions; 27 for smiling expressions | Lu et al., 2005
Shape and curvedness info | Face recognition | ICP | 3D | 88%-98% depending on rank and number of feature candidate sets | Lu et al., 2006 [65]
Shape and curvedness info | Face recognition | Surface matching | 2.5D scans and 3D models | 68%-98% depending on categories of face scans | Lu et al., 2006 [66]
Shape and curvedness info | Face recognition | Constrained LDA | 2.5D scans and 3D models | 69%-86% depending on categories of face scans | Lu et al., 2006 [66]
Shape and curvedness info | Face recognition | Surface matching + constrained LDA | 2.5D scans and 3D models | 77%-99% depending on categories of face scans | Lu et al., 2006 [66]
Shape and curvedness info | Landmark detection for face verification | ICP | 3D | 99% for frontal images; 86% for face images with different poses and facial expressions | Colbry et al., 2005
Shape index + spin images | Face recognition | AFM | 3D | 79.4%-89.1% depending on datasets | Passalis et al., 2011
Shape index + spin images | Landmark identification for facial region retrieval | Active landmark model (ALM) | 3D | Correct pose estimation: 96.25%-100%; mean distance error: 7.74-10.59 mm, depending on datasets | Perakis et al., 2009 [78]
Spin images | Face recognition | AFM | 3D | 94.1%-99% depending on similarity measures and datasets | Kakadiaris et al., 2007
Scale invariant feature transform (SIFT) features | Face recognition | Faceprint method (new method proposed by the authors) | 3D | 76.5% for comparison neutral vs. non-neutral expressions; 93.6% for comparison neutral vs. neutral | Zhang et al., 2009
Dufresne [27] utilized the vectors between selected facial points as features for 2D face correction. He showed that simply measuring the width and height of the mouth does not indicate what pose the mouth is in, i.e. smiling, scowling or smirking. Vectors were selected for being particularly expressive; that is, a human could understand the expression if only these vectors were presented to them. Tang et al. [99,100] extracted the slopes of the line segments connecting a subset of the 83 facial feature points for facial expression recognition purposes. Daniyal et al. [19] analyzed the performance of different landmark combinations (signatures) to determine a signature that is robust to expressions for the purpose of face recognition; the selected signature is then used to train a Point Distribution Model for the automatic localization of the landmarks. As a validation, Jesorsky et al. [52] used the relative error for face detection; the relative error is based on the distances between the expected and the estimated eye positions, so it must not be considered as a normalized distance. Other authors extracted depth and texture features from landmarks or face zones. A texture coding provides information about facial regions with little geometric structure, such as the hair, forehead and eyebrows, while a depth coding provides information about regions where there is little texture, such as the chin, jawline and cheeks. Particularly, Wang et al. [106] extracted shape and texture features from defined feature points for face recognition purposes. BenAbdelkader et al. [5] worked on face coding for recognition and identification; they designed a pattern classifier for three different inputs: depth map, texture map, and both depth and texture maps. Hüsken et al. [50] included both texture and shape as typical 2D and 3D representations of faces. Table 5 shows the performances of some of these other features depending on methods, when they are provided by the authors. Table 6 shows the performances of some of the previous features when they are joined together; this combination should reinforce the reliability of the extracted information and improve the performance.
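As a rough illustration of the combined depth-and-texture input discussed above, the sketch below concatenates z-scored depth and texture maps into a single feature vector and identifies a probe by nearest neighbour; the normalization and the matcher are illustrative assumptions, not the classifiers actually used by BenAbdelkader et al. [5].

```python
import numpy as np

def fused_features(depth_map, texture_map):
    """Concatenate depth and texture cues into one z-scored feature vector."""
    d = (depth_map.ravel() - depth_map.mean()) / (depth_map.std() + 1e-9)
    t = (texture_map.ravel() - texture_map.mean()) / (texture_map.std() + 1e-9)
    return np.concatenate([d, t])

def identify(probe, gallery):
    """Return the index of the gallery feature vector closest to the probe."""
    return int(np.argmin([np.linalg.norm(probe - g) for g in gallery]))
```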

4. Results

Depending on the application field, many of the previous measures were judged by researchers as valid, effective and suitable for face description. The previous tables showed the numerical performance values of these measures in their different methods and applications. Since the fields in which all these geometrical features are used are so various, it is beyond the scope of this paper to provide a detailed analysis of the results these measures give in their applications. However, it is possible to briefly comment on the reliability values. Regarding face recognition, several results were provided.

Table 5
Other features (depth and texture): applications (in this case only face recognition), relative methods, data dimension, and performances obtained by the authors in their previous works.

Measure | Application | Method | Dimension | Reliability | References
--- | --- | --- | --- | --- | ---
Depth info | Face recognition | FaceIt technology | 3D | 98.73% ± 0.20 | BenAbdelkader et al., 2005
Depth info | Face recognition | LDA | 3D | 96.09% ± 0.37 | BenAbdelkader et al., 2005
Depth info + texture info | Face recognition | FaceIt technology | 3D | 100% | BenAbdelkader et al., 2005
Depth info + texture info | Face recognition | LDA | 3D | 98.58% ± 0.23 | BenAbdelkader et al., 2005
Texture info | Face recognition | FaceIt technology | 3D | 99.76% ± 0.08 | BenAbdelkader et al., 2005
Texture info | Face recognition | LDA | 3D | 97.22% ± 0.30 | BenAbdelkader et al., 2005

Excellent performances were obtained by Bronstein et al. [10,11] for geodesic distance (recognition rate, RR: 100%), by Gordon [45] for the combination of Euclidean distance and Gaussian curvature (maximum RR: 100%) using a clustering algorithm, and by Efraty et al. [29] for the fusion of Euclidean distance, arc-length, curvature and angle contributions applied to a new face profile algorithm (maximum RR: 100%). Analyzing in detail every measure applied to face recognition, it is also possible to identify the methods which led to the highest rates. The best result for Euclidean distance (RR: 97.6%) was achieved by Gupta et al. [46,47] with the use of linear discriminant analysis (LDA). As said previously, Bronstein et al. [10,11] obtained 100% accuracy with their new “canonical form matching” algorithm using geodesic distance. Another promising result (97.9%) was reached by Gupta et al. [48] with the newly proposed “anthroface 3D” algorithm applied to the combination of Euclidean and geodesic distances. Shi et al. [90] reached a 98% RR using ratios and principal component analysis (PCA), while BenAbdelkader et al. [5] obtained recognition rates between 96% and almost 100% using both FaceIt technology and LDA applied to depth and texture information.

Table 6
Combination of measures: applications (in this case only face recognition), relative methods, data dimension, and performances obtained by the authors in their previous works.

Combination of measures | Application | Method | Dimension | Reliability | References
--- | --- | --- | --- | --- | ---
Euclidean distance, area, angle, curvature | Face recognition | New algorithm | 3D | 78% (when the best match is selected); 92% (when the five best matches are selected) | Moreno et al.
Euclidean distance, Gaussian curvature | Face recognition | Clustering | 3D | 80%-100% depending on the feature set | Gordon, 1992
Euclidean distance, ratio, angle, curvature | Face recognition | Depth-based dynamic programming (DP) and feature-based support vector machine (SVM) | 3D | 95% | Lee et al., 2005
Euclidean distance, arc length, curvature, angle | Face recognition | New algorithm (face profile recognition) | 3D | 60%-100% depending on datasets | Efraty et al., 2010
Euclidean distance, texture, shape index | Face recognition | Statistical facial feature model (SFAM) + PCA | 3D | 74.6%-94.23% depending on feature sets | Zhao et al., 2010



First-rate results were obtained for shape and curvedness by Lu et al. in their various works. They achieved a matching error of 0 for neutral expressions when using a fusion of iterative closest point (ICP) and LDA [64], and a 99% RR with a fusion of surface matching and constrained LDA [66]. Another noteworthy result is Kakadiaris et al.'s 99% RR, obtained with Spin Images and the annotated face model (AFM) algorithm. Facial expression recognition did not lead to similar results, which in any case are not as numerous as those for face recognition. Fairly good rates were reached by Soyel et al. [94], who used a probabilistic neural network (PNN) with Euclidean distances and obtained 88% accuracy. An analogous result (87.1% RR) was obtained by Tang et al. [99,100] when a support vector machine (SVM) was applied to ratios. Landmark detection studies and applications achieved excellent results, especially when shape and curvedness information is involved. 99% accuracy for frontal images was obtained by Colbry et al. [18], who used the Shape Index in an ICP algorithm. A 100% correct pose estimation was reached by Perakis et al. [78], who used an active landmark model (ALM) and a combination of Shape Index and Spin Images. Fairly good results were achieved for Euclidean distance by Ras et al. [80] and Enciso et al. [32], who obtained a reproducibility significance less than or equal to 0.05 using stereophotogrammetry and light-based techniques, respectively. A significance equal to 0.05 means that the results show no significant variance due to scan or landmark marking [32]. We conclude with an overview of how effective the most important features, namely Euclidean and geodesic distances, are in recognition, i.e. the main application field. These evaluations are given by those authors who employed both measures and compared the obtained results. Bronstein et al. [8–11], who used geodesic distances, obtained promising face recognition results on a small database of 30 subjects even when the facial surfaces were severely occluded. They also demonstrated that the approach has several significant advantages, one of which is the ability to handle partially missing data. This is exactly the contrary of what was found by Gupta et al. in their first study [46], which tested both Euclidean and geodesic distances. The two algorithms based on Euclidean or geodesic distances between anthropometric facial landmarks performed substantially better than the baseline PCA algorithm. “The algorithms based on geodesic distance features performed on a par with the algorithm based on Euclidean distance features. Both were effective, to a degree, at recognizing 3D faces. In this study the performance of the proposed algorithm based on geodesic distances between anthropometric facial landmarks decreased when probes with arbitrary facial expressions were matched against a gallery of neutral expression 3D faces. This suggests that geodesic distances between pairs of landmarks on a face may not be preserved when the facial expression changes. This was contradictory to Bronstein et al.'s assumption regarding facial expressions being isometric deformations of facial surfaces. In conclusion, geodesic distances between anthropometric landmarks were observed to be effective features for recognizing 3D faces, however they were not more effective than Euclidean distances between the same landmarks. The 3D face recognition algorithm based on geodesic distance features was affected by changes in facial expression” [46, page 3]. Later, Gupta et al. [48] obtained further results.
They found that, for expressive faces, the recognition rates of the algorithm based on both the Euclidean and geodesic facial anthropometric distances were generally higher than those of the algorithm based on Euclidean distances alone. “This suggests that facial geodesic distances may be useful for expression invariant 3D face recognition and further strengthens Bronstein et al.'s proposition that different facial expressions may be modelled as isometric deformations of the facial surface” [48, page 345].
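The contrast debated in these studies is between straight-line and along-the-surface distances computed between the same landmark pairs. A minimal sketch follows, approximating the geodesic by the shortest path along mesh edges (Dijkstra); dedicated solvers such as the fast marching scheme used by Bronstein et al. are more accurate, so this is illustrative only.

```python
import heapq
import numpy as np

def euclidean(p, q):
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def geodesic(vertices, edges, src, dst):
    """Shortest-path (edge-walk) approximation of the geodesic distance.

    vertices : (N, 3) array of mesh vertex coordinates
    edges    : iterable of (i, j) vertex-index pairs
    src, dst : landmark vertex indices
    """
    adj = {}
    for i, j in edges:
        w = euclidean(vertices[i], vertices[j])
        adj.setdefault(i, []).append((j, w))
        adj.setdefault(j, []).append((i, w))
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, np.inf):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < best.get(v, np.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float('inf')
```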

5. Conclusion

An exhaustive set of morphometric measures and geometrical features extractable from facial landmarks has been presented and explained here. The most popular ones are certainly the Euclidean and geodesic distances, which were used by many authors, also as benchmark elements of comparison. The application which involves them the most is recognition, with its various subfields, such as face recognition, facial expression recognition and face detection. Landmarks are the starting point for this study, since they are precisely the points from which the information is extracted. This is due to the fact that, as said in Section 2, various evaluations [47,48] showed it necessary to use the anthropometric landmarks as fiducial points. As a matter of fact, most of the work concerning 3D facial morphometry refers exactly to landmarks.

References

[1] B. Abboud, F. Davoine, M. Dang, Facial expression recognition and synthesis based on an appearance model, Signal Processing: Image Communication, Vol. 19, issue 8, Elsevier, 2004, pp. 723–740.
[2] M. Alker, S. Frantz, K. Rohr, H.S. Stiehl, Improving the Robustness in Extracting 3D Point Landmarks from 3D Medical Images Using Parametric Deformable Models, Lecture Notes in Computer Science, Vol. 2191, Springer-Verlag, 2001, pp. 582–590.
[3] N. Alyüz, B. Gökberk, H. Dibeklioğlu, A. Savran, A.A. Salah, L. Akarun, B. Sankur, 3D Face Recognition Benchmarks on the Bosphorus Database with Focus on Facial Expressions, Lecture Notes in Computer Science, Vol. 5372, Springer-Verlag, 2008, pp. 57–66.
[4] S.C. Aung, R.C.K. Ngim, S.T. Lee, Evaluation of the laser scanner as a surface measuring tool and its accuracy compared with direct facial anthropometric measurements, Br. J. Plast. Surg. 48 (1995) 551–558.
[5] C. BenAbdelkader, P.A. Griffin, Comparing and combining depth and texture cues for face recognition, Image and Vision Computing, 23, Elsevier, 2005, pp. 339–352.
[6] G.M. Beumer, Q. Tao, A.M. Bazen, R.N.J. Veldhuis, Comparing landmarking methods for face recognition, http://www.utwente.nl 2005.
[7] G.M. Beumer, Q. Tao, A.M. Bazen, R.N.J. Veldhuis, A landmark paper in face recognition, 7th International Conference on Automatic Face and Gesture Recognition, IEEE, 2006 (6 pp. - 78).
[8] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Expression-Invariant 3D Face Recognition, Lecture Notes in Computer Science, Vol. 2688, Springer, 2003, pp. 62–70.
[9] A.M. Bronstein, M.M. Bronstein, A. Spira, R. Kimmel, Face Recognition from Facial Surface Metric, Lecture Notes in Computer Science, Vol. 3022, Springer, 2004, pp. 225–237.
[10] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Three-Dimensional Face Recognition, International Journal of Computer Vision, 64 (1), Springer, 2005, pp. 5–30.
[11] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Expression-Invariant Face Recognition via Spherical Embedding, IEEE International Conference on Image Processing, III: 756–759, 2005.
[12] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Robust expression-invariant face recognition from partially missing data, Lecture Notes in Computer Science, Vol. 3953, Springer, 2006, pp. 396–408.
[13] F. Calignano, Morphometric methodologies for bio-engineering applications, PhD Thesis, Politecnico di Torino, Department of Production Systems and Business Economics, 2009.
[14] J. Čarnický, D. Chorvát Jr., Three-dimensional measurement of human face with structured-light illumination, Meas. Sci. Rev. Volume 6, Section 2 (1) (2006) 1–4.
[15] I. Cohen, N. Sebe, A. Garg, L.S. Chen, T.S. Huang, Facial expression recognition from video sequences: temporal and static modelling, Computer Vision and Image Understanding, 91, Elsevier, 2003, pp. 160–187.
[16] I. Cohen, N. Sebe, F.G. Cozman, T.S. Huang, Semi-Supervised Learning for Facial Expression Recognition, Proceedings of the 5th ACM SIGMM international workshop on Multimedia Information retrieval, 2003.
[17] I. Cohen, N. Sebe, Y. Sun, M.S. Lew, T.S. Huang, Evaluation of Expression Recognition Techniques, Lecture Notes in Computer Science, Vol. 2728, Springer, 2003, pp. 49–54.
[18] D. Colbry, G. Stockman, A. Jain, Detection of Anchor Points for 3D Face Verification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, p. 118.
[19] A. Colombo, C. Cusano, R. Schettini, 3D face detection using curvature analysis, Pattern Recognition, 39, Elsevier, 2006, pp. 444–455.
[20] F. Daniyal, P. Nair, A. Cavallaro, Compact signatures for 3D face recognition under varying expressions, Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, 2009, pp. 302–307.
[21] W. Dawei, Q. Guozheng, Z. Mingli, L.G. Farkas, Differences in Horizontal, Neoclassical Facial Canons in Chinese (Han) and North American Caucasian Populations, Aesthetic Plastic Surgery, 21, Springer-Verlag, 1997, pp. 265–269.
[22] J. D'Hose, J. Colineau, C. Bichon, B. Dorizzi, Precise Localization of Landmarks on 3D Faces using Gabor Wavelets, First IEEE International Conference on Biometrics: Theory, Applications, and Systems, 2007, pp. 1–6.
[23] H. Dibeklioğlu, A.A. Salah, L. Akarun, 3D Facial Landmarking under Expression, Pose, and Occlusion Variations, 2nd IEEE International Conference on Biometrics: Theory, Applications and Systems, 2008, pp. 1–6.

[24] C. Dorai, A.K. Jain, COSMOS - A Representation Scheme for Free-Form Surfaces, Fifth International Conference on Computer Vision – IEEE Conferences, 1995, pp. 1024–1029.
[25] C. Dorai, A.K. Jain, Recognition of 3D free-form objects, Proceedings of the 13th International Conference on Pattern Recognition - IEEE Conferences, Vol. 1, 1996, pp. 697–701.
[26] C. Dorai, A.K. Jain, COSMOS - A Representation Scheme for 3D Free-Form Objects, IEEE Trans. Pattern Anal. Mach. Intell. 19 (10) (1997) 1115–1130.
[27] D. Dufresne, “Towards an Automatic System for Unflattering Face Correction”, www.brown.edu.
[28] B.A. Efraty, D. Chu, E. Ismalov, S. Shah, I.A. Kakadiaris, 3D-Aided Profile-Based Face Recognition, 16th International Conference on Image Processing (ICIP) – IEEE Conferences, 2009, pp. 4125–4128.
[29] B.A. Efraty, E. Ismalov, S. Shah, I.A. Kakadiaris, Profile-Based 3D-Aided Face Recognition, http://www.cs.uh.edu 2010.
[30] P. Ekman, Are There Basic Emotions? Psychol. Rev. 99 (3) (1992) 550–553.
[31] P. Ekman, Handbook of Cognition and Emotions, John Wiley & Sons Ltd., Sussex, UK, 1999.
[32] R. Enciso, E.S. Alexandroni, K. Benyamein, R.A. Keim, J. Mah, Precision, Repeatability and Validation of Indirect 3D Anthropometric Measurements with Light-Based Imaging Techniques, IEEE International Symposium on Biomedical Imaging: Nano to Macro, Vol. 2, 2004, pp. 1119–1122.
[33] L.G. Farkas, W.K. Linsday, Morphology of Adult Face After Repair of Isolated Cleft Palate in Childhood, Cleft Palate J. 9 (1972) 132–142.
[34] L.G. Farkas, J.C. Posnick, T.M. Hreczko, Anthropometric Growth Study of the Head, Cleft Palate-Craniofac. J. 29 (4) (1992) 303–308.
[35] L.G. Farkas, Anthropometry of the Head and Face, Raven Press, New York, 1994.
[36] L.G. Farkas, Facial Morphometry of Television Actresses Compared with Normal Women, J. Oral Maxillofac. Surg. 53 (1995) 1014–1015.
[37] L.G. Farkas, Accuracy of Anthropometric Measurements: Past, Present and Future, Cleft Palate-Craniofac. J. 33 (1) (1996) 10–22.
[38] L.G. Farkas, C.R. Forrest, L. Litsas, Revision of Neoclassical Facial Canons in Young Adult Afro-Americans, Aesthetic Plastic Surgery, 24, Springer-Verlag, 2000, pp. 179–184.
[39] L.G. Farkas, M.J. Katic, C.R. Forrest, International Anthropometric Study of Facial Morphology in Various Ethnic Groups/Races, J. Craniofac. Surg. 16 (4) (2005) 615–646.
[40] S. Frantz, K. Rohr, H.S. Stiehl, Multi-Step Procedures for the Localization of 2D and 3D Point Landmarks and Automatic ROI Size Selection, Lecture Notes in Computer Science, Vol. 1406, Springer-Verlag, 1998, pp. 687–703.
[41] S. Frantz, K. Rohr, H.S. Stiehl, Improving the Detection Performance in Semiautomatic Landmark Extraction, Lecture Notes in Computer Science, Vol. 1679, Springer-Verlag, 1999, pp. 253–263.
[42] S. Frantz, K. Rohr, H.S. Stiehl, Localization of 3D Anatomical Point Landmarks in 3D Tomographic Images Using Deformable Models, Lecture Notes in Computer Science, Vol. 1935, Springer-Verlag, 2000, pp. 492–501.
[43] S. Frantz, K. Rohr, H.S. Stiehl, Development and validation of a multi-step approach to improved detection of 3D point landmarks in tomographic images, Image and Vision Computing, Science Direct 23 (11) (2005) 956–971.
[44] Y. Gizatdinova, V. Surakka, Feature-Based Detection of Facial Landmarks from Neutral and Expressive Facial Images, IEEE Trans. Pattern Anal. Mach. Intell. 28 (1) (2006) 135–139.
[45] G.G. Gordon, Face Recognition Based on Depth and Curvature Features, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992, pp. 808–810.
[46] S. Gupta, M.K. Markey, J.K. Aggarwal, A.C. Bovik, 3D Face Recognition based on Geodesic Distances, http://www.psu.edu 2007.
[47] S. Gupta, J.K. Aggarwal, M.K. Markey, A.C. Bovik, 3D Face Recognition Founded on the Structural Diversity of Human Faces, IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–7.
[48] S. Gupta, M.K. Markey, A.C. Bovik, Anthropometric 3D Face Recognition, Int. J. Comput. Vis. 90 (3) (2010) 331–349.
[49] H. Hong, H. Neven, C. von der Malsburg, Online Facial Expression Recognition Based on Personalized Gallery, 3rd IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 354–359.
[50] M. Hüsken, M. Brauckmann, S. Gehlen, C. von der Malsburg, Strategies and Benefits of Fusion of 2D and 3D Face Recognition, Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
[51] A.K. Jain, S.Z. Li, Handbook of Face Recognition, Springer, New York, 2005.
[52] O. Jesorsky, K.J. Kirchberg, R.W. Frischholz, Robust Face Detection Using the Hausdorff Distance, Lecture Notes in Computer Science, Vol. 2091, Springer, 2001, pp. 90–95.
[53] R. Ji, H. Yao, X. Sun, Actor-independent action search using spatiotemporal vocabulary with appearance hashing, Pattern Recognition, 44, Elsevier, 2011, pp. 624–638.
[54] A.E. Johnson, M. Hebert, Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes, IEEE Trans. Pattern Anal. Mach. Intell. 21 (5) (1999) 433–449.
[55] I.A. Kakadiaris, G. Passalis, G. Toderici, M.N. Murtuza, Y. Lu, N. Karampatziakis, T. Theoharis, Three-Dimensional Face Recognition in the Presence of Facial Expressions: An Annotated Deformable Model Approach, IEEE Trans. Pattern Anal. Mach. Intell. 9 (4) (2007) 640–649.
[56] J.J. Koenderink, A.J. van Doorn, Surface shape and curvature scales, Image Vis. Comput. 10 (8) (1992) 557–564.
[57] M.P. Kohout, L.M. Aljaro, L.G. Farkas, J.B. Mulliken, Photogrammetric Comparison of Two Methods for Synchronous Repair of Bilateral Cleft Lip and Nasal Deformity, Plast. Reconstr. Surg. 102 (5) (1998) 1339–1349.


[58] J.C. Kolar, L.G. Farkas, I.R. Munro, Craniofacial Disproportions in Apert's Syndrome: an Anthropometric Study, Cleft Palate J. 22 (4) (1985) 253–265.
[59] J.C. Kolar, L.G. Farkas, I.R. Munro, Surface Morphology in Treacher Collins Syndrome: an Anthropometric Study, Cleft Palate J. 22 (4) (1985) 266–274.
[60] T.T. Le, L.G. Farkas, C.K. Ngim, L.S. Levin, C.R. Forrest, Proportionality in Asian and North American Caucasian Faces Using Neoclassical Facial Canons as Criteria, Aesthetic Plastic Surgery, 26, Springer-Verlag, 2002, pp. 64–69.
[61] Y. Lee, H. Song, U. Yang, H. Shin, K. Sohn, Local Feature Based 3D Face Recognition, Lecture Notes in Computer Science, Vol. 3546, Springer, 2005, pp. 909–918.
[62] M.Z.W. Liao, J. Wang, W. Chen, Y.Y. Tang, An Affine Invariant Euclidean Distance Between Images, Proceedings of the 5th International Conference on Machine Learning and Cybernetics, Dalian, 2006, pp. 4133–4137 (13-16 August).
[63] D.G. Lowe, Object Recognition from Local Scale-Invariant Features, The Proceedings of the 7th IEEE International Conference on Computer Vision, Vol. 2, 1999, pp. 1150–1157.
[64] X. Lu, A.K. Jain, Multimodal Facial Feature Extraction for Automatic 3D Face Recognition, Michigan State University, 2005, www.msu.edu.
[65] X. Lu, A.K. Jain, Automatic Feature Extraction for Multiview 3D Face Recognition, 7th International Conference on Automatic Face and Gesture Recognition, IEEE, 2006, pp. 585–590.
[66] X. Lu, A.K. Jain, D. Colbry, Matching 2.5D Face Scans to 3D Models, IEEE Trans. Pattern Anal. Mach. Intell. Vol. 8 (1) (2006) 31–43.
[67] Z.L. Mao, J.P. Siebert, W.P. Cockshott, A.F. Ayoub, “A coordinate-free method for the analysis of 3D facial change”, www.gla.ac.uk.
[68] P. Michel, R. El Kaliouby, Real Time Facial Expression Recognition in Video using Support Vector Machines, Proceedings of the 5th international conference on Multimodal interfaces, 2003, http://www.psu.edu.
[69] A.B. Moreno, A. Sánchez, J. Vélez, J. Díaz, “Face recognition using 3D surface-extracted descriptors”, http://www.psu.edu.
[70] I. Mpiperis, S. Malassiotis, M.G. Strintzis, 3-D Face Recognition With the Geodesic Polar Representation, IEEE Trans. Inf. Forensic Secur. 2 (3) (2007) 537–547.
[71] I. Mpiperis, S. Malassiotis, M.G. Strintzis, Bilinear Models for 3-D Face and Facial Expression Recognition, IEEE Trans. Inf. Forensic Secur. Vol. 3 (3) (2008) 498–511.
[72] I. Mpiperis, S. Malassiotis, M.G. Strintzis, Bilinear Elastically Deformable Models with Application to 3D Face and Facial Expression Recognition, 8th IEEE International Conference on Automatic Face & Gesture Recognition, 2008, pp. 1–8.
[73] I. Mpiperis, S. Malassiotis, M.G. Strintzis, Expression-Compensated 3D Face Recognition with Geodesically Aligned Bilinear Models, 2nd International Conference on Biometrics: Theory, Applications and Systems, 2008, pp. 1–6.
[74] I. Mpiperis, S. Malassiotis, V. Petridis, M.G. Strintzis, 3D facial expression recognition using swarm intelligence, IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 2133–2136.
[75] P. Nair, A. Cavallaro, 3-D Face Detection, Landmark Localization, and Registration Using a Point Distribution Model, IEEE Trans. Multimed. 11 (4) (2009) 611–623.
[76] G. Passalis, P. Perakis, T. Theoharis, A. Kakadiaris, Using Facial Symmetry to Handle Pose Variations in Real-World 3D Face Recognition, IEEE Trans. Pattern Anal. Mach. Intell. (99) (2011) 1–14.
[77] P. Perakis, G. Passalis, T. Theoharis, G. Toderici, I.A. Kakadiaris, Partial Matching of Interpose 3D Facial Data for Face Recognition, IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, 2009, pp. 1–8.
[78] P. Perakis, T. Theoharis, G. Passalis, I.A. Kakadiaris, Automatic 3D Facial Region Retrieval from Multi-pose Facial Datasets, Eurographics Workshop on 3D Object Retrieval, The Eurographics Association, Munich, Germany, 2009, pp. 37–44 (Mar. 30 – Apr. 3).
[79] U. Prabhu, K. Seshadri, M. Savvides, “Automatic Facial Landmark Tracking in Video Sequences using Kalman Filter Assisted Active Shape Models”, http://humanmotion.rutgers.edu.
[80] F. Ras, L.L.M.H. Habets, F.C. van Ginkel, B. Prahl-Andersen, Quantification of facial morphology using stereophotogrammetry – demonstration of a new concept, Journal of Dentistry, Vol. 24, No. 5, Elsevier, 1996, pp. 369–374.
[81] J. Rodrigues, J.M. Hans du Buf, Multi-scale keypoints in V1 and face detection, Lecture Notes in Computer Science, Vol. 3704, Springer, 2005, pp. 205–214.
[82] M. Romero, N. Pears, Landmark Localisation in 3D Face Data, 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2009, pp. 73–78.
[83] M. Romero, N. Pears, Point-pair descriptors for 3D facial landmark localisation, IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, 2009, pp. 1–6.
[84] M.C. Ruiz, J. Illingworth, Automatic landmarking of faces in 3D - ALF3D, 5th International Conference on Visual Information Engineering - IEEE Conferences, 2008, pp. 41–46.
[85] A.A. Salah, L. Akarun, Gabor Factor Analysis for 2D+3D Facial Landmark Localization, IEEE 14th Signal Processing and Communications Applications, 2006, pp. 1–4.
[86] A.A. Salah, L. Akarun, 3D Facial Feature Localization for Registration, Lect. Notes Comput. Sci. Vol. 4105 (2006) 338–345.
[87] A.A. Salah, H. Çinar, L. Akarun, B. Sankur, Robust Facial Landmarking for Registration, Annals of Telecommunications 62 (1-2) (2007) 1608–1633, http://www.psu.edu.
[88] P. Sang-Jun, S. Dong-Won, 3D face recognition based on feature detection using active shape models, International Conference on Control, Automation and Systems - IEEE Conferences, 2008, pp. 1881–1886.
[89] N. Sebe, M.S. Lew, Y. Sun, I. Cohen, T. Gevers, T.S. Huang, Authentic facial expression analysis, Image Vis. Comput. 25 (2007) 1856–1863.
[90] J. Shi, A. Samal, D. Marx, How effective are landmarks and their geometry for face recognition, Computer Vision and Image Understanding, 102, Elsevier, 2006, pp. 117–133.



[91] H. Shin, K. Sohn, 3D Face Recognition with Geometrically Localized Surface Shape Indexes, 9th International Conference on Control, Automation, Robotics and Vision - IEEE Conferences, 2006, pp. 1–6.
[92] H. Song, U. Yang, S. Lee, K. Sohn, 3D Face Recognition Based on Facial Shape Indexes with Dynamic Programming, Lecture Notes in Computer Science, 3832, Springer-Verlag, 2005, pp. 99–105.
[93] H. Soyel, H. Demirel, Facial Expression Recognition Using 3D Facial Feature Distances, Lecture Notes in Computer Science, Vol. 4633, Springer, 2007, pp. 831–838.
[94] H. Soyel, H. Demirel, 3D Facial Expression Recognition with Geometrically Localized Facial Features, 23rd International Symposium on Computer and Information Sciences, 2008, pp. 1–4.
[95] H. Soyel, H. Demirel, Optimal Feature Selection for 3D Facial Expression Recognition with Geometrically Localized Facial Features, Fifth International Conference on Soft Computing, Computing With Words and Perceptions in System Analysis, Decision and Control, 2009, pp. 1–4.
[96] Y. Sun, M. Reale, L. Yin, Recognizing Partial Facial Action Units Based on 3D Dynamic Range Data for Facial Expression Recognition, 8th IEEE International Conference on Automatic Face & Gesture Recognition, 2008, pp. 1–8.
[97] B. Takács, H. Wechsler, Detection of Faces and Facial Landmarks Using Iconic Filter Banks, Pattern Recognition, Vol. 30, No. 10, Elsevier, 1997, pp. 1623–1636.
[98] B. Takács, H. Wechsler, A Dynamic and Multiresolution Model of Visual Attention and Its Application to Facial Landmark Detection, Computer Vision and Image Understanding, Vol. 80, issue 1, Elsevier, 1998, pp. 63–73.
[99] H. Tang, T.S. Huang, 3D Facial Expression Recognition Based on Properties of Line Segments Connecting Facial Feature Points, 8th IEEE International Conference on Automatic Face & Gesture Recognition, 2008, pp. 1–6.
[100] H. Tang, T.S. Huang, 3D Facial Expression Recognition Based on Automatically Selected Features, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, 2008, pp. 1–8.
[101] Y. Tian, T. Kanade, J.F. Cohn, Facial Expression Analysis, chapter of Handbook of Face Recognition, Springer, 2005, pp. 247–275.
[102] J. Wang, L. Yin, X. Wei, Y. Sun, 3D Facial Expression Recognition Based on Primitive Surface Feature Distribution, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 1399–1406.

[103] J. Wang, L. Yin, Static topographic modeling for facial expression recognition and analysis, Comput. Vis. Image Underst. 108 (2007) 19–34.
[104] L. Wang, Y. Zhang, J. Feng, On the Euclidean Distance of Images, IEEE Trans. Pattern Anal. Mach. Intell. 27 (8) (2005) 1–6.
[105] Y. Wang, H. Ai, B. Wu, C. Huang, Real Time Facial Expression Recognition with Adaboost, Proceedings of the 17th International Conference on Pattern Recognition, Vol. 3, 2004, pp. 926–929.
[106] Y. Wang, C.S. Chua, Y.K. Ho, Facial feature detection and face recognition from 2D and 3D images, Pattern Recognit. Lett. (2002) 1191–1202.
[107] P.L. Worthington, E.R. Hancock, Structural Object Recognition using Shape-from-Shading, 15th International Conference on Pattern Recognition - IEEE Conferences, Vol. 1, 2000, pp. 738–741.
[108] S. Wörz, K. Rohr, Localization of anatomical point landmarks in 3D medical images by fitting 3D parametric intensity models, Medical Image Analysis, Science Direct 10 (1) (2006) 41–58.
[109] C. Xu, Y. Wang, T. Tan, L. Quan, Automatic 3D Face Recognition Combining Global Geometric Features with Local Shape Variation Information, http://www.psu.edu 2004.
[110] C. Xu, T. Tan, Y. Wang, L. Quan, Combining local features for robust nose location in 3D facial data, Pattern Recognition Letters, 27, Elsevier, 2006, pp. 1487–1494.
[111] K.C. Yow, R. Cipolla, Enhancing Human Face Detection using Motion and Active Contours, Lecture Notes in Computer Science, Vol. 1351, Springer, 1997, pp. 515–522.
[112] G. Zhang, Y. Wang, Faceprint: Fusion of Local Features for 3D Face Recognition, Lect. Notes Comput. Sci. (2009) 394–403.
[113] X. Zhao, E. Dellandréa, L. Chen, D. Samaras, AU Recognition on 3D Faces Based on an Extended Statistical Facial Feature Model, Fourth IEEE International Conference on Biometrics: Theory Applications and Systems (BTAS), 2010, pp. 1–6.
[114] W. Zheng, X. Zhou, C. Zou, L. Zhao, Facial expression recognition using kernel canonical correlation analysis (KCCA), IEEE Trans. Neural Netw. 17 (1) (2006) 233–238.