Surface Geodesic Pattern for 3D Deformable Texture Matching

Author's Accepted Manuscript
Farshid Hajati, Ali Cheraghian, Soheila Gheisari, Yongsheng Gao, Ajmal S. Mian
www.elsevier.com/locate/pr

PII: S0031-3203(16)30232-1
DOI: http://dx.doi.org/10.1016/j.patcog.2016.08.019
Reference: PR5848

To appear in: Pattern Recognition
Received date: 28 February 2016
Revised date: 11 July 2016
Accepted date: 18 August 2016

Cite this article as: Farshid Hajati, Ali Cheraghian, Soheila Gheisari, Yongsheng Gao and Ajmal S. Mian, Surface Geodesic Pattern for 3D Deformable Texture Matching, Pattern Recognition, http://dx.doi.org/10.1016/j.patcog.2016.08.019

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Surface Geodesic Pattern for 3D Deformable Texture Matching

Farshid Hajati (a,b,*), Ali Cheraghian (c), Soheila Gheisari (d), Yongsheng Gao (b), Ajmal S. Mian (e)

(a) Electrical Engineering Department, Tafresh University, Tafresh, Iran
(b) School of Engineering, Griffith University, QLD 4111, Australia
(c) Electrical Engineering Department, Amirkabir University of Technology, Tehran, Iran
(d) School of Software, University of Technology Sydney, NSW 2007, Australia
(e) School of Computer Science and Software Engineering, The University of Western Australia, WA 6009, Australia

Abstract

This paper presents a Surface Geodesic Pattern (SGP) representation for matching textured 3D deformable surfaces. SGP encodes the local variations of the surface texture derivatives to extract local information from distinctive textural relationships contained in a geodesic neighbourhood. Thus, SGP derives its strength from the fusion of surface texture and shape information at the data level in a way that is invariant to non-rigid deformations. We also propose the Gabor Topography Wavelet (GTW) for direct feature extraction from the range data. Both features are combined using a multi-view sparse representation to achieve higher discrimination capability while matching non-rigid 3D surfaces. The performance of the proposed method is evaluated extensively on the Bosphorus face database, the FRGC v2 face database, and the PolyU contact-free hand database, and the results are compared to state-of-the-art methods. Experimental results show the effectiveness and superiority of the proposed method in recognizing objects under non-rigid surface deformations.

Keywords: 3D non-rigid object recognition, geodesic derivatives, deformable surface matching.

* Corresponding author. Email addresses: [email protected], [email protected] (Farshid Hajati), ali [email protected] (Ali Cheraghian), [email protected] (Soheila Gheisari), [email protected] (Yongsheng Gao), [email protected] (Ajmal S. Mian)

1. Introduction

Deformable 3D surface matching has many applications in computer vision and computer graphics, including non-rigid object registration [1], surface classification [2], statistical shape analysis [3], expression-invariant face recognition, and shape blending [4]. Deformation is a change in the shape of a surface due to an applied force. Deformable surface matching is a challenging problem because non-rigid deformations change the intrinsic geometry as well as the appearance of the surface texture. Algorithms that rely only on the appearance of surface texture are unable to match deformable surfaces. The use of 3D shape information can improve the performance of non-rigid surface matching. This is especially useful for non-elastic deformations, where the geodesic distances between points on the surface remain invariant.

Nowadays, the use of 3D information in object recognition algorithms has increased due to the improvement in 3D scanners. 3D images used in surface matching can be divided into two categories. The first category consists of single-view range images captured by commercial 3D scanners. A single-view range image has the location of the surface points in 3D space defined as a single range value (z direction) on a uniform x-y grid [5]. It can also be represented as an xyz point cloud of the partial surface.

Table 1: Categorization of the existing 3D surface matching methods

                      Shape-Based Methods                                  Multimodal Methods
Rigid Methods         Ocegueda et al. [6], Passalis et al. [7],            Mian et al. [14], Hajati et al. [15],
                      Efraty et al. [8], Smeets et al. [9],                Zhang et al. [16], Li et al. [17],
                      Faltemier et al. [10], Frome et al. [11],            Kanhangad et al. [18]
                      Huang et al. [12], Li et al. [13], Yu et al. [55]
Non-Rigid Methods     Elad et al. [19], Bronstein et al. [20],             N/A
                      Berretti et al. [21], Bronstein et al. [22]

The second category consists of 3D models obtained by merging several point clouds captured from different viewing angles. Static 3D sensors can practically capture only a partial view of a surface in a single scan.

In non-rigid 3D surface matching, a deformation-invariant representation is of particular importance because rigid surface matching systems are sensitive to surface deformations. Different approaches have been proposed for 3D deformation-invariant surface matching. However, these methods rely solely on shape information without considering the distinctive textural pattern on the 3D shape (see Table 1). How to effectively develop multimodal techniques for recognizing non-rigid 3D surfaces remains an open research problem.

In this paper, we present a novel Surface Geodesic Pattern (SGP) representation that uses both texture and shape data for deformable 3D surface matching. The proposed geodesic-path-regulated texture coding fuses texture and shape information at the data level to tackle the problems of surface deformations. Instead of using the complete 3D images, SGP utilizes micro-patterns of the surface, which are more robust under different surface variations.

Recently, multi-view based methods have gained much interest in machine vision applications [58]. These views may be obtained either from different sources or from different feature descriptors. Since the information from a particular view cannot describe objects completely, developing multi-view learning algorithms has become popular.


Xu et al. [59] extended the theory of the information bottleneck and presented an algorithm to learn objects using multi-view features. Their idea was to model a multi-view learning problem as a multiple-sender communication system. They further strengthened the discrimination of the encoder through a margin-maximization algorithm and multi-view features. Xu et al. [60] proposed a learning algorithm to recover intact representations for multi-view examples. An Iteratively Reweight Residuals (IRR) algorithm with fast convergence was presented to solve the optimization problem. They proposed a new definition of multi-view stability and showed that the complementarity between various views is useful for stability and generalization. Yu et al. [56] proposed a multimodal sparse coding method based on hypergraph learning to predict clicks on images. A hypergraph was used to build a group of manifolds which explored the complementary characteristics of different features through a group of weights. An alternating optimization procedure was applied to obtain the weights of the different modalities and the sparse codes. Yu et al. [57] proposed a High-order Distance-based Multi-view Stochastic Learning (HD-MSL) method for image classification, built on a probabilistic framework that combines different views with labeling information. The algorithm adopted a distance obtained from the hypergraph to estimate the probability distribution.

In this paper, following the multi-view strategy, the Gabor Topography Wavelet (GTW) is proposed to extract distinctive shape features directly from range data; these features are subsequently integrated with the Surface Geodesic Patterns (SGP) using a multi-view sparse fusion approach. The main contributions of the proposed method are:


• The proposed method uses local geodesic micro-patterns instead of the complete 3D surface. The proposed coding scheme fuses 2D texture and 3D shape information at the data level and is invariant to surface deformations.

• A multi-view feature-level sparse fusion method is proposed to combine texture and surface information into a robust feature that is invariant to deformations of non-rigid surfaces.

2. Related Work

Existing 3D surface recognition techniques can be broadly classified into two categories according to the type of data they use: purely shape-based and multimodal methods. Shape-based methods extract discriminative features directly from surface shape [11], [23]. Hilaga et al. [24] explored inter-geodesic distances between points on a surface to match non-rigid surfaces. They defined a scalar function on the surface, namely a distribution function, computed by integrating the geodesic distances from a given point to the rest of the points on the surface. The skeletal structure of the scalar function was constructed and analysed via Multidimensional Reeb Graphs (MRG) [25]. The distance between MRGs was used as the dissimilarity metric. Sundar et al. [26] compared 3D surfaces by encoding the geometric and topological information in a skeletal graph and used graph matching techniques to match the skeletons. Osada et al. [27] proposed a method for computing shape signatures for 3D polygonal models. They represented the signature of an object's surface as a shape distribution sampled from a shape function measuring global geometric properties of the object's surface. Frome et al. [11] proposed two regional shape descriptors, 3D shape contexts and harmonic shape contexts, and evaluated their performance on recognizing vehicles in range images of scenes using a database of 56 cars.

Yu et al. [55] proposed the Sparse Patch Alignment Framework (SPAF) algorithm to learn manifolds. They used an optimization strategy to construct local patches of the manifolds based on the patch alignment framework and adopted a sparse representation method to select a few neighbors of each data point spanning a low-dimensional affine subspace around that point. Finally, the complete alignment strategy was used to construct the manifold.

Multimodal methods use both texture and shape information of the 3D surface for matching and hence have higher performance compared to methods which use shape data alone [28]. Mian et al. [14] used a fully automatic multimodal face recognition algorithm to perform hybrid matching. They used a 3D Spherical Face Representation (SFR) and the Scale-Invariant Feature Transform (SIFT) [29] descriptor to form a rejection classifier that was combined with a region-based ICP algorithm. The results of all matching engines were fused at the metric level to achieve higher accuracy. They reported 99.7% and 98.3% verification rates at 0.001 FAR (False Acceptance Rate), and 99% and 95.4% identification rates, for neutral-versus-neutral and non-neutral-versus-neutral expression experiments respectively, on the FRGC v2 database [30]. Hajati et al. [15] proposed the Patch Geodesic Distance (PGD) for expression- and pose-invariant 3D face recognition. In PGD, 3D face images were partitioned into equal-sized square patches in a non-overlapping manner. Local geodesic paths within patches and their global geodesic paths were combined to encode the shape-adjusted textures into feature descriptors. They achieved rank-1 recognition rates of 84.8% and 69.1% on the BU-3DFE [31] and the Bosphorus [32] databases, respectively.


Zhang et al. [16] proposed a multimodal method for palmprint verification. They extracted surface curvature features from the shape, while Gabor features were used for texture representation. An Equal Error Rate (EER) of 0.0022% was achieved on a database of 108 subjects. Li et al. [17] extracted principal line features and palm shape features to align the palmprints. They achieved an EER of 0.025% on the PolyU 3D palmprint database, which contains 8000 samples. Kanhangad et al. [18] introduced two representations for contactless hand verification: finger surface curvature and the unit normal vector. They also introduced a new feature representation for the 3D palmprint called SurfaceCode, which resulted in a significant reduction in the template size. They achieved an Equal Error Rate (EER) of 0.56% on the PolyU contact-free hand database [18].

From the deformability viewpoint, existing 3D surface matching algorithms can be divided into rigid and non-rigid algorithms. Rigid algorithms utilize rigid transformations for feature extraction [9], [11], [13], [23], [33], [34], [35], [36]. Kazhdan [23] used rotation-invariant spherical harmonics as shape descriptors in 3D object recognition. Vranic [33] proposed depth-buffer-based, silhouette-based, and ray-based methods for similarity measurement between 3D objects. Laga et al. [34] applied spherical parameterization and geometry images for 3D object matching, invariant to rotation and scaling. They compensated for rotations in 3D images using the principal axis of the object. Allen et al. [35] proposed a method for fitting high-resolution template meshes to detailed human body range scans with sparse 3D markers. They formulated an optimization problem in which the degrees of freedom were an affine transformation at each template vertex.


The objective function was a weighted combination of three measures: proximity of transformed vertices to the range data, similarity between neighboring transformations, and proximity of sparse markers at corresponding locations on the template and target surfaces. Faltemier et al. [10] introduced a 3D face recognition method that fuses scores from 28 independently matched small regions. They reported a rank-1 recognition rate of 97.2% and a verification rate of 93.2% at 0.1% FAR on the FRGC v2 database [30]. Ocegueda et al. [6] presented a Markov Random Field (MRF) model to analyse 3D meshes and determine both discriminative and non-discriminative vertices. Efraty et al. [8] proposed a silhouetted face profile framework for 3D face recognition. 3D data of subjects were used to create profiles under different rotations and to train a classifier using the extracted features. Smeets et al. [9] proposed meshSIFT features for expression-invariant face recognition. In meshSIFT, salient points on the 3D facial surface are detected as mean curvature extrema in scale space and orientations are assigned to each. Then, the neighborhood of each salient point is described as a feature vector consisting of concatenated histograms of shape indices and slant angles. Finally, the feature vectors of two 3D facial surfaces are matched by comparing the angles in the feature space. They achieved rank-1 recognition rates of 93.7% and 89.6% on the Bosphorus [32] and the FRGC v2 [30] databases, respectively. They also demonstrated that symmetric meshSIFT descriptors had acceptable performance in matching partial 3D data. Passalis et al. [7] proposed a pose-invariant 3D face recognition approach using facial symmetry and automatic landmark detection. They fitted an annotated face model to the input face scan to overcome the challenge of missing data. An average rank-1 recognition rate of 83.7% was reported on datasets of the University of Notre Dame and the University of Houston containing pose variations.


Huang et al. [37] proposed the Multi-Scale Local Binary Pattern (MS-LBP) for 3D facial surface representation. They achieved a rank-1 recognition rate of 96.1% on the FRGC v2 database [30]. In [12], a geometric representation for 3D range images based on Multi-Scale Extended Local Binary Patterns (MS-ELBP) achieved a rank-1 recognition rate of 97.2% and a verification rate of 98.4% at 0.001 FAR on the FRGC v2 database [30]. Li et al. [13] proposed a mesh-based approach for 3D face recognition using a local shape descriptor and a SIFT-like matching process. They employed both the maximum and the minimum curvatures estimated in the 3D Gaussian scale space to detect salient points. To comprehensively characterize 3D facial surfaces and their variations, they used weighted statistical distributions of multiple-order surface differential quantities, including the histogram of mesh gradient (HoG), the histogram of shape index (HoS), and the histogram of gradient of shape index (HoGS), within a local neighborhood of each salient point. The above rigid object recognition algorithms, however, are inherently sensitive to surface deformations.

Since most of the objects observed in the real world are non-rigid, non-rigid shape representation has gained increasing attention in computer vision [19], [20], [21], [38]. Elad and Kimmel [19] presented a method to construct a bending-invariant signature for surfaces based on the geodesic distances between points. They measured the inter-geodesic distances between uniformly distributed points on the surface. Then, a Multi-Dimensional Scaling (MDS) technique was applied to extract coordinates in a finite-dimensional Euclidean space in which geodesic distances are replaced by Euclidean ones for matching. Bronstein et al. [20] proposed an expression-invariant representation of faces called the Canonical Image Representation, which models the deformations resulting from facial expressions. The canonical images were built by calculating the geodesic distances between the points of the facial surface. Their experimental results demonstrated that a smaller embedding error leads to better recognition. Berretti et al. [21] encoded the geometric information of the 3D face into a compact representation in the form of a graph whose nodes are equal-width iso-geodesic stripes. They achieved recognition rates of 99% and 94.1% on the SHREC'08 and the FRGC v2 databases, respectively. Litman et al. [39] proposed a diffusion-geometric framework for volumetric stable component detection and description in deformable shapes. They evaluated their method on the SHREC'11 [40] database and the SCAPE [41] human body database and demonstrated the high discriminability of the proposed volumetric features. Bronstein et al. [22] proposed a method for computing the similarity of non-rigid shapes based on the distributions of diffusion distances, defined with explicit invariance properties.


Table 2: SGP's important notations and their descriptions

Notation             Description
I^{(n)}_{r,q}(Z)     nth-order GID at point Z using the rth geodesic ring and qth neighboring point
r                    rth geodesic ring
q                    qth neighboring point
W_p                  Patch size
a and b              Patch labels
x and y              Patch coordinates
F(.)                 Spiral derivative function
Z_0(x_0, y_0)        Geodesic ring's center point
SGP(Z_0)             Surface Geodesic Pattern at point Z_0

3. Surface Geodesic Pattern (SGP)

In this section, we propose a new Surface Geodesic Pattern (SGP) representation that directly encodes deformation-invariant texture patterns of a surface in 3D space. The important notations used in this section are summarized in Table 2. For a textured surface I(Z(x, y)) in 3D space, Z_0(x_0, y_0) is the centre point of a set of geodesic rings, and Z_{r,q}(x, y), r = 1, ..., R; q = 0, ..., N-1, is the qth neighboring point around Z_0(x_0, y_0) on the rth ring (see Figure 1).

Figure 1: Example geodesic rings (R = 3 and N = 4) around Z0 on a 3D facial surface.

3D texture information is carried by the changes in magnitude of I(Z) among the points Z_0 and Z_{r,q}, which are spatially distributed on the geodesic rings of the surface. We represent the texture variation of the surface using a novel Geodesic Inner-Derivative (GID) function. The GID can be computed at different orders, encoding the relative changes of texture variations within the neighborhood. The GID measures the concavity and convexity of the texture variations using the product of the texture derivatives along a spiral path within the neighborhood.

The first-order Geodesic Inner-Derivative (GID) at Z = Z_0, I'_{r,q}(Z_0), is defined as

    I'_{r,q}(Z_0) = \big( I_{r,q}(Z_0) - I_{r-1,(q-1) \bmod N}(Z_0) \big) \cdot \big( I_{r+1,(q+1) \bmod N}(Z_0) - I_{r,q \bmod N}(Z_0) \big),
        r = 1, \dots, R, \quad q = 0, \dots, N-1                                                        (1)

where I_{r,q}(Z_0) denotes the texture value of the neighboring point Z_{r,q} around the center point Z_0, and mod is the remainder operator.

The second-order Geodesic Inner-Derivative (GID) at Z = Z_0, I''_{r,q}(Z_0), is defined as

    I''_{r,q}(Z_0) = \big( I'_{r,q}(Z_0) - I'_{r-1,(q-1) \bmod N}(Z_0) \big) \cdot \big( I'_{r+1,(q+1) \bmod N}(Z_0) - I'_{r,q \bmod N}(Z_0) \big),
        r = 1, \dots, R, \quad q = 0, \dots, N-1                                                        (2)

where I'_{r,q}(Z_0) is the first-order Geodesic Inner-Derivative (GID) computed using Equation (1). In general, the nth-order Geodesic Inner-Derivative (GID) at Z = Z_0 can be defined as

    I^{(n)}_{r,q}(Z_0) = F\big( I^{(n-1)}_{r,q}(Z_0) \big) \cdot F\big( I^{(n-1)}_{r+1,q+1}(Z_0) \big),
        r = 1, \dots, R, \quad q = 0, \dots, N-1                                                        (3)

where F(.) is a spiral derivative function defined as

    F\big( I^{(m)}_{r,q}(Z_0) \big) = I^{(m)}_{r,q}(Z_0) - I^{(m)}_{r-1,(q-1) \bmod N}(Z_0)              (4)

After computing the nth-order Geodesic Inner-Derivative (GID) at Z = Z_0, we encode the computed inner-derivatives of the point Z = Z_0 into a decimal value called the Surface Geodesic Pattern (SGP) using the unit step function. Essentially, we only encode the direction of the inner-derivative, which is more robust to illumination changes than the derivative value itself.



Figure 2: Surface Geodesic Pattern (SGP). (a) Original 3D image, (b) first-order SGP, (c) second-order SGP, (d) third-order SGP.

We represent the surface texture value at Z = Z_0 with an nth-order Surface Geodesic Pattern (SGP) as

    SGP_r^{(n)}(Z_0) = \sum_{k=0}^{N-1} 2^k \, u\big( I_{r,k}^{(n)}(Z_0) \big), \quad r = 1, \dots, R    (5)

where I_{r,k}^{(n)}(Z_0) is the nth-order Geodesic Inner-Derivative (GID) at Z = Z_0 computed from the neighboring texture values located on the geodesic rings, and u(.) denotes the unit step function. The result of the SGP operator for an example 3D face is illustrated in Figure 2. It can be observed from Figure 2 that as the operator order increases, finer details are captured on the textured surface.

The SGP operator is used for surface texture representation. The proposed method applies the SGP operator at each point to extract discriminative features from its neighborhood. We model the distribution of Surface Geodesic Patterns (SGPs) by local histograms, due to their inherent robustness against variations in pose and illumination. For this purpose, we partition the SGP images into non-overlapping square patches and compute the histogram of each patch independently. For an M x M SGP image and a patch size of W_p, the number of patches is (M/W_p)^2. We label the patch points with indexes a and b defined as

    a = \lfloor x / W_p \rfloor + 1, \quad 0 \le x < M    (6)
    b = \lfloor y / W_p \rfloor + 1, \quad 0 \le y < M    (7)

where a and b are integers ranging from 1 to M/W_p, and \lfloor \cdot \rfloor denotes the floor function. Using the a and b indexes, the ab-th patch of the Surface Geodesic Pattern, SGP_{r,ab}^{(n)}(Z), is represented as

    SGP_{r,ab}^{(n)}(Z(x, y)) = SGP_r^{(n)}\big( Z(W_p(a-1) + x_{ab}, \, W_p(b-1) + y_{ab}) \big)    (8)

where x_{ab} and y_{ab} are the coordinates within the ab-th patch. We compute the spatial histogram of each patch and concatenate the histograms to obtain a feature vector:

    HSGP_r^{(n)}(Z) = \{ H[SGP_{r,ab}^{(n)}(Z)] \mid a = 1, \dots, M/W_p, \; b = 1, \dots, M/W_p \}    (9)

where H[.] denotes the spatial histogram operator. Since the proposed SGP is a geodesic-path-regulated texture pattern, it is intrinsically robust to surface deformations. This is also evidenced by the consistently greater discriminative capability of SGP compared to the original intensity data under various deformations. Figure 3 compares the SGP and intensity differences of an expressive face against faces with neutral expression in the Bosphorus database [32]. As can be seen, the difference between the average inter-class distance and the intra-class distance for SGP is nearly double that of the original intensity data, showing the strong discriminability of SGP in recognizing 3D objects under surface deformations.


Figure 3: Robustness to surface deformations on the Bosphorus database [32]. (a) The gallery set, (b) first column: images of the same probe with different facial expressions; second column: the intensity distance between the probe and the 105 gallery identities; third column: the SGP distance between the probe and the gallery identities. Ground truth is indicated by arrows. Discriminability is the distance between the average interclass distance and the intra-class distance.
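To make the construction concrete, the following is a minimal NumPy sketch of Eqs. (1)-(9). It assumes the texture values have already been sampled on the geodesic rings around a centre point (the geodesic sampling itself is a surface-marching step the paper does not spell out, so it is not shown). All function names are ours, u(0) is taken as 1, and extra outer rings are kept as padding so the spiral differences stay in range:

```python
import numpy as np

def geodesic_inner_derivative(rings, order=1):
    # rings: (R + 2*order, N) array; row r holds the texture values of the
    # r-th geodesic ring around the centre point, column q the q-th
    # neighbour on that ring.  The extra outer rings act as padding so the
    # spiral differences of Eq. (4) stay inside the array.
    D = np.asarray(rings, dtype=float)
    for _ in range(order):
        # F(J)_{r,q} = J_{r,q} - J_{r-1,(q-1) mod N}                 (Eq. 4)
        F = D[1:] - np.roll(D[:-1], 1, axis=1)
        # I^(n)_{r,q} = F(I^(n-1))_{r,q} * F(I^(n-1))_{r+1,(q+1)}    (Eq. 3)
        D = F[:-1] * np.roll(F[1:], -1, axis=1)
    return D  # rows now correspond to rings 1..R

def sgp_codes(rings, order=1):
    # Eq. (5): threshold the GID with the unit step (u(x) = 1 for x >= 0
    # here) and pack the N sign bits of each ring into one decimal code.
    gid = geodesic_inner_derivative(rings, order)
    weights = 2 ** np.arange(gid.shape[1])
    return ((gid >= 0) * weights).sum(axis=1)

def patch_histograms(sgp_image, Wp=10, n_bins=256):
    # Eqs. (6)-(9): concatenated histograms of non-overlapping Wp x Wp
    # patches of an SGP code image.  M is assumed divisible by Wp and
    # n_bins should be 2**N (256 for N = 8 neighbours).
    M = sgp_image.shape[0]
    return np.concatenate(
        [np.bincount(sgp_image[a:a + Wp, b:b + Wp].ravel().astype(int),
                     minlength=n_bins)
         for a in range(0, M, Wp) for b in range(0, M, Wp)])

# toy run: R = 3 rings with N = 8 neighbours each, first-order SGP
rng = np.random.default_rng(0)
print(sgp_codes(rng.random((3 + 2 * 1, 8)), order=1))  # one code per ring
```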


Table 3: GTW's notations and their definitions

Notation     Definition
L_h          Level curve with height index h
h            Height index
w_z          Step size of the level curve
z_max        Maximum height in the surface
omega        Radial center frequency of a sinusoidal plane wave
theta        Anti-clockwise orientation of the sinusoidal Gaussian envelope
sigma        Spatial width of the Gaussian envelope along the x and y axes
lambda       Frequency scaling factor
m            Number of frequencies
n            Number of orientations
D            Dictionary matrix
d            Vector element (atom) of the dictionary matrix
gamma        Balancing factor
V_sp         Sparse representation vector
<.,.>        Inner product operator
R_V          Residual vector over approximating V
TM           Topography map
GT           Gabor topography map
phi          Gabor kernel
AAD          Average Absolute Deviation
GTW          Gabor Topography Wavelet

4. Gabor Topography Wavelet and Sparse Fusion

In addition to the SGP features, we also extract pure shape features directly from the topography maps of the surface. These features are then fused with the SGP features using a sparse multi-view representation. The distinctive characteristic of a topography map is that the shape of the surface is shown by level curves (contour lines). Level curves are lines that join points of equal height on the surface above or below a reference point (usually the highest point of the surface). The important notations used in this section are summarized in Table 3. Given a surface Z(x, y), the level curve with height index h, L_h, is defined as

    L_h = \{ Z(x, y) \mid z_{max} - h w_z < Z(x, y) < z_{max} - (h-1) w_z \}, \quad h = 1, 2, \dots, \lfloor z_{max}/w_z \rfloor + 1    (10)

where z_max is the maximum height (reference) point of the surface, w_z is the step size of the level curve, h is the height index of the level curve, and \lfloor \cdot \rfloor denotes the floor function.


Figure 4: Gabor topography map. (a) original 3D image, (b) topography map, (c) Gabor topography map.

Using the level curves L_h, h = 1, 2, ..., \lfloor z_{max}/w_z \rfloor + 1, the topography map TM(Z(x, y)) can be defined as

    TM(Z(x, y)) = \begin{cases} 0, & \text{if } Z(x, y) \in L_h \\ 1, & \text{if } Z(x, y) \notin L_h \end{cases}    (11)
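As a rough illustration of Eqs. (10)-(11), the sketch below bins a range image into level sets of width w_z and marks pixels where the bin index changes, which is one way to realise the binary contour-line picture the paper describes. The function name and the boundary-marking convention are our assumptions, since the extracted Eq. (11) leaves the exact 0/1 assignment ambiguous:

```python
import numpy as np

def topography_map(Z, wz=1.3):
    # Z: (M, M) range image (heights); wz: level-curve step size.
    # h indexes the level set of Eq. (10) each pixel falls into.
    h = np.floor((Z.max() - Z) / wz).astype(int)
    # Mark pixels whose height bin differs from a 4-neighbour's bin:
    # these lie on the boundary between two level sets (a contour line).
    edge = np.zeros_like(h, dtype=bool)
    edge[:-1, :] |= h[:-1, :] != h[1:, :]   # bin change downwards
    edge[:, :-1] |= h[:, :-1] != h[:, 1:]   # bin change rightwards
    return edge.astype(float)               # binary map per Eq. (11)
```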

After extracting the topography map of the given surface, Gabor wavelets [42] are applied to create the Gabor topography map, GT_{\omega,\theta}(Z(x, y)), by convolving the topography map with the Gabor kernel, \varphi_{\omega,\theta}(Z(x, y)):

    GT_{\omega,\theta}(Z(x, y)) = TM(Z(x, y)) * \varphi_{\omega,\theta}(Z(x, y))    (12)

    \varphi_{\omega,\theta}(Z(x, y)) = \frac{\omega^2}{4 \pi^3 \sigma^2} \, e^{-\frac{\omega^2 \left[ (x \cos\theta + y \sin\theta)^2 + (-x \sin\theta + y \cos\theta)^2 \right]}{4 \pi^2 \sigma^2}} \cdot e^{j \omega (x \cos\theta + y \sin\theta)}    (13)

where \omega is the radial center frequency of a sinusoidal plane wave, \theta is the anti-clockwise orientation of the Gaussian envelope of the sinusoid, and \sigma is the spatial width of the Gaussian envelope along the x and y axes. Figure 4 illustrates an example Gabor topography map. The Gabor topography maps are obtained with a filter bank of Gabor functions consisting of a number of scales and orientations.

To select the appropriate discrete frequencies that define the scales, we use the following exponential sampling [43]:

    \omega_k = \frac{\omega}{\lambda^k}, \quad k = 0, \dots, m-1    (14)

where \omega_k is the k-th frequency, \lambda > 1 is the frequency scaling factor, and m is the number of frequencies. The rotation angles \theta are uniformly spaced as

    \theta_l = \frac{2 \pi l}{n}, \quad l = 0, \dots, n-1    (15)
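A minimal sketch of the filter bank and the convolution of Eq. (12) is given below. The kernel follows Eq. (13) as reconstructed above (the original was garbled in extraction, so the constants should be read as our best interpretation); the kernel size and the example sigma are arbitrary choices, and a small FFT helper is used instead of a SciPy dependency:

```python
import numpy as np

def gabor_kernel(omega, theta, sigma, size=31):
    # Gabor kernel of Eq. (13), sampled on a size x size grid.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)     # along the wave
    v = -x * np.sin(theta) + y * np.cos(theta)    # across the wave
    amp = omega**2 / (4 * np.pi**3 * sigma**2)
    env = np.exp(-omega**2 * (u**2 + v**2) / (4 * np.pi**2 * sigma**2))
    return amp * env * np.exp(1j * omega * u)

def gabor_bank(omega0, lam, m, n):
    # Eq. (14): omega_k = omega0 / lam**k;  Eq. (15): theta_l = 2*pi*l/n.
    return [(omega0 / lam**k, 2 * np.pi * l / n)
            for k in range(m) for l in range(n)]

def conv2_same(img, ker):
    # FFT-based 2-D convolution with 'same' output size.
    s0 = img.shape[0] + ker.shape[0] - 1
    s1 = img.shape[1] + ker.shape[1] - 1
    out = np.fft.ifft2(np.fft.fft2(img, (s0, s1)) * np.fft.fft2(ker, (s0, s1)))
    r0, r1 = (ker.shape[0] - 1) // 2, (ker.shape[1] - 1) // 2
    return out[r0:r0 + img.shape[0], r1:r1 + img.shape[1]]

# Gabor topography maps (Eq. 12): one complex response per (omega, theta);
# tm = topography_map(Z)
# gt_maps = [np.abs(conv2_same(tm, gabor_kernel(w, t, sigma=2.0)))
#            for w, t in gabor_bank(np.pi / 2, 2.0, m=3, n=4)]
```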

where n is the number of orientations. The Gabor topography maps are partitioned into equal-sized W_GTW x W_GTW patches as

    GT^{ab}_{\omega_k,\theta_l}(Z(x_{ab}, y_{ab})) = GT_{\omega_k,\theta_l}\big( Z(W_{GTW}(a-1) + x_{ab}, \, W_{GTW}(b-1) + y_{ab}) \big)    (16)

where x_{ab} and y_{ab} are the coordinates within the ab-th patch, and W_GTW is the patch size. After partitioning, we calculate the Average Absolute Deviation (AAD) from the mean of the patch's gray levels for all patches. For the ab-th patch of the Gabor topography map, the Average Absolute Deviation (AAD) from the mean of the gray levels is defined as


Figure 5: The Gabor topography maps and the Average Absolute Deviations for a sample topography map.

    AAD^{ab}_{\omega_k,\theta_l}(Z) = \frac{1}{W_{GTW}^2} \sum_{x_{ab}, y_{ab} = 0}^{W_{GTW}} \left| GT^{ab}_{\omega_k,\theta_l}(Z) - \overline{GT}^{ab}_{\omega_k,\theta_l}(Z) \right|    (17)

where GT^{ab}_{\omega_k,\theta_l}(Z) is the ab-th patch of the Gabor topography map defined in (16), W_{GTW}^2 is the number of pixels in the ab-th patch, and \overline{GT}^{ab}_{\omega_k,\theta_l}(Z) is the mean of the ab-th patch's gray levels. By concatenating the AADs, we build a feature vector called the Gabor Topography Wavelet (GTW) for the given subject:

    GTW(Z) = \{ AAD^{ab}_{\omega_k,\theta_l}(Z) \mid a = 1, \dots, M/W_{GTW}, \; b = 1, \dots, M/W_{GTW}, \; k = 0, \dots, m-1, \; l = 0, \dots, n-1 \}    (18)

where W_GTW is the patch size. Figure 5 illustrates the Gabor topography maps and the Average Absolute Deviations for a sample topography map. Once the GTW feature vectors are computed, we fuse them with the SGP feature vectors using a sparse multi-view representation.
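The AAD pooling of Eqs. (16)-(18) reduces each Gabor topography map to one scalar per patch; a small sketch, with names of our choosing and assuming magnitude maps and M divisible by W:

```python
import numpy as np

def gtw_features(gt_maps, W=10):
    # gt_maps: list of (M, M) Gabor-topography magnitude maps, one per
    # (omega_k, theta_l) pair.  Returns the concatenated GTW vector:
    # one Average Absolute Deviation (Eq. 17) per patch, per map (Eq. 18).
    feats = []
    for g in gt_maps:
        M = g.shape[0]
        for a in range(0, M, W):
            for b in range(0, M, W):
                patch = g[a:a + W, b:b + W]
                feats.append(np.abs(patch - patch.mean()).mean())  # AAD
    return np.asarray(feats)
```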

For this purpose, we define a dictionary matrix which models input 3D images using a training set. Suppose there are S sample images per subject in the training set. To form the columns of the dictionary matrix, we concatenate the SGP and the GTW vectors of the training samples using a balancing factor. The column of the dictionary matrix for the s-th sample of the t-th subject, d_{s,t}, is defined as

    d_{s,t} = \left[ \, HSGP_r^{(n)}(Z) \;\; \gamma \cdot GTW(Z) \, \right]^T, \quad s = 1, \dots, S, \; t = 1, \dots, T    (19)

where \gamma is a scalar balancing factor which is determined experimentally. By repeating the above process for all subjects in the training set, we obtain the desired dictionary matrix for the sparse fusion. The dictionary matrix thus contains S x T columns, called atoms:

    D = \left[ \, d_{1,1} \; d_{1,2} \; \cdots \; d_{1,T} \; \cdots \; d_{S,T} \, \right]    (20)

Once the dictionary matrix D is obtained, we compute the sparse representation of each subject in the gallery and probe sets as a sparse linear combination of the dictionary atoms:

    V = D V_{sp}    (21)

where V is the concatenation of the SGP and GTW feature vectors of the subject, and V_{sp} is the sparse representation vector. Sparsity implies that only a few components of V_{sp} are non-zero and the rest are zero. This suggests that V can be decomposed as a linear combination of only a few column vectors in D.
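Assembling the dictionary of Eqs. (19)-(20) is a simple concatenate-and-stack operation; a sketch under our own naming, where the unit-norm column normalization is our addition (common practice for matching pursuit, not stated in the paper):

```python
import numpy as np

def fused_vector(hsgp, gtw, gamma=1.0):
    # Eq. (19): concatenate the SGP histogram vector and the GTW vector,
    # balanced by the scalar gamma.
    return np.concatenate([hsgp, gamma * gtw])

def build_dictionary(train_hsgp, train_gtw, gamma=1.0):
    # Eq. (20): one column (atom) per training sample, S*T columns in all.
    # train_hsgp, train_gtw: lists of per-sample feature vectors, ordered
    # so that corresponding entries belong to the same training scan.
    atoms = [fused_vector(h, g, gamma) for h, g in zip(train_hsgp, train_gtw)]
    D = np.stack(atoms, axis=1)  # columns are atoms
    return D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
```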

Such vectors form a basis for V. The dictionary D is over-complete (there are fewer rows than columns). Given a feature vector V and the dictionary D, the sparse decomposition problem aims to find the representation of V as a linear combination of a few dictionary elements (columns). The sparse decomposition problem is defined as

    \min_{V_{sp}} \| V_{sp} \|_0 \quad \text{s.t.} \quad V = D V_{sp}    (22)

where \| V_{sp} \|_0 is the pseudo-norm (l_0 norm), which counts the number of non-zero components of V_{sp}. Because this problem is NP-hard [61], we use a convex relaxation obtained by replacing the l_0 norm with the l_1 norm:

    \| V_{sp} \|_1 = \sum_i \left| V_{sp,i} \right|    (23)

(24)

where the symbol ·, · denotes the inner product operator and RV is the residual vector over approximating V in the direction of d0 . Since d0 is orthogonal to RV , we rewrite Equation (22) as ||V||2 = | V, d0 |2 + ||RV ||2

21

(25)

In order to minimize ||RV ||, we choose d0 such that | V, d0 | is the maximum. The BMP algorithm decomposes the residue RV iteratively by projecting RV on a column vector of D with the best match of RV . The procedure is repeated until the residue becomes smaller than a predefined threshold. 5. Experimental Results To evaluate the effectiveness of the proposed approach, experiments are conducted on the Bosphorus [32] and FRGC v2 [30] 3D face databases for face recognition under facial expressions. The PolyU contact-free hand database [18] is used for 3D hand identification and verification. Since the palmprints have been captured in a contact free fashion, there are some deformations of the hand in the database. The Bosphorus database [32] contains face texture (RGB) and range (depth) data for 105 persons, including 4666 pairs of faces with different expressions, poses, and occlusions. The range images have a resolution of 0.3mm, 0.3mm, and 0.4mm in x, y, and z (depth) dimensions, respectively, and the color texture maps have a resolution of 1600×1200 pixels. To allow for a more direct comparison between the faces, they are normalized and aligned using 24 landmarks provided in the database. Spikes in the range maps are removed and holes are filled as in [14]. Then, faces are resampled on a uniform square grid at 1 mm resolution and cropped to the size of 120×120 in the way that the distance of the nose tip is 60 mm from the sides and 70 and 50 mm from the top and the bottom, respectively. The FRGC v2 database [30] contains 4950 face texture and range images. The database is divided into three sets, namely Spring2003 (943 scans 22

of 277 individuals), Fall2003 and Spring2004 (4,007 scans of 466 individuals). The range and texture images are both 480×640 with one-to-one pixel correspondence. Similar to the Bosphorus database, the FRGC v2 face images are normalized and cropped to the size of 120×120. The PolyU contact-free hand database [18] includes 3,540 hand data from 177 subjects aged 18-50 years, with multiple ethnic backgrounds. Each subject has contributed 5 pairs of hand range and texture images acquired simultaneously in the first session, followed by another 5 in the second session. During the image acquisition no constraints have been employed to confine the position and deformation of the hand. The resolution of the images (both range and texture) is 640×480 pixels. All hand scans have been cropped to 120×120 regions of interest. 5.1. Parameters In this section, we examine the parameters involved in the proposed methods and empirically show that SGP and GTW are robust to a wide range of parameter values. To determine the SGP’s parameters, the radius (r), the patch size (Wp ), and the order, an experimental investigation on the SGP’s recognition accuracy is conducted with different values of the SGP’s order ∈ {1, 2, 3}, the SGP’s radius (from 2 to 10 mm) and the SGP’s patch size (from 3 to 120) using a training set selected from the Bosphorus database [32]. To build the training set, one frontal face per subject is considered as the gallery and one of the remaining faces (rotated/expressive) is selected randomly for each subject as the probe. The results are displayed in Figure 6. The recognition accuracy increases rapidly when SGP’s patch size decreases from 120 to 15. It remains flat at high level till 6, then drops gracefully. The accuracy decreases when the SGP’s order increases from 1 to 3. The best performance is achieved when the SGP’s radius, the SGP’s patch size, and 23

the SGP’s order are 6, 10, and 1, respectively. 99.1%

96.2% 90

90 100

80

80

70

60

60

Recognition Rate (%)

Recognition Rate (%)

100

50

40

40 20 30 0 120 60 30 24 15 12 10 6 4 3 Patch Size

2

3

5

4

6

7

8

9

10

80 70

80

60 60

50

40

40 30

20

20

0 120 60 30 24 15 12 10 6 4 3 Patch Size

20

SGP Radius

(a)

2

3

4

5

6

7

8

9

10

10

SGP Radius

(c)

(b)

Figure 6: Sensitivity of Surface Geodesic Pattern (SGP) to different radii, patch sizes, and the orders. (a) first-order SGP, (b) second-order SGP, (c) third-order SGP. The best performance in each SGP’s order is marked on the surface of recognition rate. 100 Patch Size: 3 Recognition Rate (%)

80

Patch Size: 4 Patch Size: 6 Patch Size: 10

60

Patch Size: 12 Patch Size: 15 40

Patch Size: 24

20

Patch Size: 60

Patch Size: 30 Patch Size: 120 0 30

15

12

6

4

2.2 2.8 GTW’s Step Size

1.9

1.6

1.3

1

Figure 7: Sensitivity of Gabor Topography Wavelet (GTW) to different step sizes and patch sizes.

Figure 7 illustrates the sensitivity of the GTW to the step size (w_z) and the patch size (W_GTW). The recognition rate increases as w_z decreases from 30 to 4 and then remains flat down to 1. The recognition rate also increases as W_GTW is reduced from 120 to 10, and then decreases gradually as W_GTW is reduced from 10 to 3. In the remaining experiments, we set w_z = 1.3, W_GTW = 10, r = 6, W_p = 10, and order = 2.

5.2. Face identification under varying pose and expressions in the Bosphorus database

The Bosphorus database [32] has a rich repertoire of facial expressions, i.e., up to 35 expressions per subject.

To evaluate the performance of the proposed method under expression and pose variations, our algorithm is tested against all faces in the Bosphorus database [32] excluding the profile and occluded ones. Pose variations of up to 45 degrees are included in our test data. One neutral-expression frontal face per subject is used as the gallery, while the rest are used as probe faces. The test data contains faces with emotions (happy, fear, disgust, anger, sadness, surprise), lower and upper face actions, combined face actions, and pose variations (Left 45 deg, Right 10 deg, Right 20 deg, Right 30 deg, Right 45 deg, Right Down, Right Up, Slightly Up, Strong Up, Slightly Down, Strong Down) (see Figure 8).

Figure 8: The images of a subject in the Bosphorus database [32] with different expressions and poses.

Table 4: Rank-1 recognition rates (%) of the proposed method and the benchmarks on the Bosphorus database [32]. Gallery contains one image per identity in frontal pose in all cases.

Method                     Rank-1 Recog. Rate (%)   Gallery Identities   No. of Probes
Baseline ICP [45]*         72.4                     47                   1508 (Expressions)
Baseline PCA [45]*         70.6                     47                   1508 (Expressions)
Alyuz et al. [46]*         95.3                     47                   1508 (Expressions)
Dibeklioglu et al. [47]*   89.2                     47                   1527 (Frontals)
Dibeklioglu et al. [47]*   62.6                     47                   423 (Rotations)
Hajati et al. [15]*        69.1                     105                  1155 (Rotations)
Proposed Method            95.2                     105 (Frontals)       3696 (Exp. + Rot.)

* Results obtained from published papers.

The rank-1 recognition rates of the proposed and the benchmark methods on the Bosphorus database [32] are listed in Table 4. As can be seen, the rank-1 recognition rate of the proposed method with 105 gallery identities and 3696 probes, including scans with expression and rotation changes, is 95.2%. In contrast, the Baseline ICP [45] and Baseline PCA [45] methods achieve recognition rates of 72.4% and 70.6% respectively, using only 47 gallery identities and 1508 expressive probes.

Alyuz et al. [46], Dibeklioglu et al. [47], and Dibeklioglu et al. [47] reported rank-1 recognition rates of 95.3%, 89.2%, and 62.6% respectively on the Bosphorus database [32] using 47 gallery images. This experiment shows that the proposed algorithm outperforms the benchmarks on the Bosphorus face database [32]; Alyuz et al. [46] achieved performance similar to our technique, but using less than half of the database.

5.3. Face identification and verification under varying expressions in the FRGC v2 database

We compare the proposed method with recently published methods that use FRGC v2 [30] as the test database. The comparison covers both identification and verification tasks. In line with the FRGC v2 test protocol, we use the Fall2003 and Spring2004 sets containing 4007 scans of 466 subjects for the identification experiment. The gallery set of 466 faces is formed using the first neutral scan of each subject, or the first stored scan for the few subjects that have no neutral scan. Two probe sets are generated: the first consists of 1944 neutral (expressionless) faces and the second of 1597 expressive (non-neutral) faces. Figure 9 shows sample face images from the FRGC v2 database [30].

Figure 9: Example images of a person with different expressions in the FRGC v2 database [30].


Table 5: Rank-1 recognition rates (%) of the proposed method and the benchmarks in the face identification task on the FRGC v2 database [30].

Method                    Neutral vs. Neutral   Non-Neutral vs. Neutral   All vs. Neutral
Haar et al. [48]*         -                     -                         97
Berretti et al. [21]*     96.1                  91                        94.1
Faltemier et al. [10]*    -                     -                         97.2
Kakadiaris et al. [49]*   -                     -                         97
Drira et al. [50]*        99.2                  96.8                      97.7
Mian et al. [14]*         99                    95.4                      97
Ocegueda et al. [6]*      -                     -                         96.6
Smeets et al. [9]*        -                     -                         89.6
Proposed Method           99.4                  95.7                      97.7

* Results obtained from published papers.


Figure 10: CMC curves of the proposed method on the FRGC v2 database [30].

The identification accuracies of the proposed method and the benchmark methods on the FRGC v2 database [30] are shown in Table 5 for three scenarios: neutral versus neutral (466 gallery scans, 1944 probes), non-neutral versus neutral (466 gallery scans, 1597 probes), and all versus neutral (466 gallery scans, 3541 probes). In neutral versus neutral, the rank-1 recognition rate of the proposed method is 99.4%, while the best accuracy among the benchmarks is 99.2%. In non-neutral versus neutral, the proposed method achieves 95.7%, the second-best accuracy. In all versus neutral, the proposed method has the best performance (97.7%) among the compared methods. The Cumulative Match Characteristic (CMC) curves for the three scenarios are illustrated in Figure 10. The curves show the recognition rates (vertical axis) of the proposed method for identifying a ranked list of M (here M = 1, ..., 10) subjects (horizontal axis). The results show that the proposed method maintains a high level of accuracy even on faces with expressions, due to its robustness to surface deformations.


Table 6: True Acceptance Rate (%) at 0.001 False Acceptance Rate of the proposed method and the benchmarks in the face verification task on the FRGC v2 database [30].

Method                   Neutral vs. Neutral   Non-Neutral vs. Neutral   All vs. Neutral
Mian et al. [14]*        99.7                  98.3                      99.3
Wang et al. [51]*        99.2                  97.7                      98.6
Queirolo et al. [52]*    99.5                  94.8                      -
Berretti et al. [21]*    97.7                  91.4                      95.5
Proposed Method          99.7                  96.3                      98.5

* Results obtained from published papers.

Figure 11: ROC curves of the proposed method obtained from the FRGC v2 database. (a) ROC curves for neutral versus neutral, non-neutral versus neutral, and all versus neutral, (b) ROC II and ROC III curves.

In the verification experiment, we evaluate the proposed method using the FRGC v2 test protocols. First, we compute the True Acceptance Rate (TAR) at 0.001 False Acceptance Rate (FAR) in the neutral versus neutral, non-neutral versus neutral, and all versus neutral scenarios. The results are tabulated in Table 6, and the Receiver Operating Characteristic (ROC) curves of the proposed method are illustrated in Figure 11(a). Then, we evaluate the proposed method under the ROC II and ROC III test protocols [10], [30]. ROC II considers both the target and the query from the collection of one academic year (two semesters), whereas ROC III takes the target from one semester and the query from another semester. The True Acceptance Rates (TAR) and Equal Error Rates (EER) following the ROC II and ROC III protocols of FRGC v2 are shown in Table 7.


Table 7: Equal Error Rate (EER) and True Acceptance Rate (TAR) at 0.001 False Acceptance Rate (FAR) using the ROC II and ROC III protocols defined in the FRGC v2 database [30].

                          ROC II                 ROC III
Method                    TAR(%)    EER(%)       TAR(%)    EER(%)
Mian et al. [14]*         86.6      5            -         -
Faltemier et al. [10]*    93.2      3.2          94.8      -
Wang et al. [51]*         98.0      -            98.0      -
Kakadiaris et al. [49]*   97.2      -            97.0      -
Queirolo et al. [52]*     96.5      -            96.5      -
Berretti et al. [21]*     81.2      -            -         -
Spreeuwers [53]*          94.6      -            94.6      -
Smeets et al. [9]*        79        3.6          77.2      3.8
Drira et al. [50]*        93.9      -            97.1      -
Proposed Method           97.3      1.8          97.1      1.8

* Results obtained from published papers.

In ROC II, the proposed method achieves a 97.3% verification rate and a 1.8% EER at 0.001 FAR, better than most of the benchmarks. In ROC III, the True Acceptance Rate (TAR) and Equal Error Rate (EER) of the proposed method are 97.1% and 1.8%, respectively. The ROC II and ROC III curves of the proposed method on the FRGC v2 database are illustrated in Figure 11(b).

5.4. 3D hand identification and verification in the PolyU contact-free database

Another challenging surface matching task is contact-free hand identification and verification. A major motivation for contact-free hand recognition is that human hands are not rigid, which is a problem in contactless hand imaging. To evaluate the proposed method, we use the PolyU contact-free hand database [18]. Figure 12 illustrates 10 hand scans from five different subjects in the database.

Figure 12: Samples from the PolyU contact-free hand database [18]. The top row shows scans from the first session and the bottom row scans from the second session.


For each subject, one scan is from the first session and the other from the second session. As in the face recognition experiments, we test both identification and verification scenarios. For the identification experiment, the first hand scan of each subject is selected to form the gallery set and all remaining scans are used as probes. Table 8 reports the rank-1 recognition rate, the Equal Error Rate (EER), and the True Acceptance Rate (TAR) at 0.001 False Acceptance Rate (FAR) of the proposed method and the benchmarks for the hand identification and verification tasks. The rank-1 recognition rates are 94.2% and 99.2% for the benchmark and the proposed method, respectively. The True Acceptance Rate at 0.001 False Acceptance Rate is 98.6%, which is much higher than that of the benchmark approaches (87.5%, 96.2%, 80%, and 88%), while the Equal Error Rate (EER) of the proposed method is close to the best benchmark result.

Table 8: Rank-1 recognition rate, Equal Error Rate (EER), and True Acceptance Rate (TAR) at 0.001 False Acceptance Rate (FAR) of the proposed method and the benchmarks.

                                           Identification             Verification
Method                                     Rank-1 Recog. Rate (%)     EER (%)   TAR (%)
Kanhangad et al. [18] (Shape)              94.2                       3.16      87.5
Kanhangad et al. [18] (Shape + Texture)    -                          0.56      96.2
Kanhangad et al. [54] (Shape)              -                          3.8       80.0
Kanhangad et al. [54] (Shape + Texture)    -                          2.6       88.0
Proposed Method                            99.2                       0.63      98.6

5.5. Computational complexity analysis

The computational complexity of the proposed method depends on the radius of the Surface Geodesic Pattern (SGP) and the step size of the Gabor Topography Wavelet (GTW). Table 9 compares the computation times of the proposed method for different values of these parameters, using a Matlab implementation on an Intel Core i5 2.5 GHz machine with 6 GB of RAM.

Table 9: Average identification time in seconds using the Bosphorus database [32].

              Step size (w_z)
Radius (r)    w_z = 1.3   w_z = 6   w_z = 30
2             14.57       14.49     14.47
4             15.01       14.93     14.90
6             22.05       21.97     21.95
8             30.60       30.54     30.51

Table 9 shows that the computation time is least sensitive to the step size (w_z). Larger variations in computation time occur with changes in the radius (r): the time roughly doubles when the radius increases by a factor of four.

6. Conclusion

Textured 3D object recognition is appealing in real-world applications because most 3D sensors capture shape and texture simultaneously. However, surface deformations are inevitable in 3D data and significantly reduce the accuracy of the extracted features. This makes rigid 3D object representation and recognition techniques unsuitable for deformable objects. In this paper, we presented the Surface Geodesic Pattern (SGP) representation, which is robust to non-rigid surface deformations. We also proposed Gabor Topography Wavelet (GTW) features that are extracted directly from the topography map of the 3D shape alone. Finally, we proposed a multi-view algorithm to fuse the GTW and SGP features using sparse representation. Experiments on three standard datasets and comparisons with benchmarks show the efficacy of the proposed technique and its robustness to non-rigid deformations.

References

[1] B. Amberg, S. Romdhani, and T. Vetter, Optimal step non-rigid ICP algorithms for surface registration, in: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1-8.


[2] S. Wang, Y. Wang, M. Jin, X. Gu, and D. Samaras, 3D surface matching and recognition using conformal geometry, in: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 2453-2460.
[3] V. Blanz and T. Vetter, Face recognition based on fitting a 3D morphable model, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (9) (2003) 1063-1074.
[4] V. Kraevoy and A. Sheffer, Cross-parameterization and compatible remeshing of 3D models, ACM Transactions on Graphics 23 (3) (2004) 861-869.
[5] X. Lu, A.K. Jain, and D. Colbry, Matching 2.5D face scans to 3D models, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (1) (2006) 31-43.
[6] O. Ocegueda, T. Fang, S.K. Shah, and I.A. Kakadiaris, 3D face discriminant analysis using Gauss-Markov posterior marginals, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (3) (2013) 728-739.
[7] G. Passalis, P. Perakis, T. Theoharis, and I.A. Kakadiaris, Using facial symmetry to handle pose variations in real-world 3D face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (10) (2011) 1938-1951.
[8] B. Efraty, E. Bilgazyev, S. Shah, and I.A. Kakadiaris, Profile-based 3D-aided face recognition, Pattern Recognition 45 (1) (2012) 43-53.
[9] D. Smeets, J. Keustermans, D. Vandermeulen, and P. Suetens, MeshSIFT: local surface features for 3D face recognition under expression variations and partial data, Computer Vision and Image Understanding 117 (2) (2013) 158-169.
[10] T.C. Faltemier, K.W. Bowyer, and P.J. Flynn, A region ensemble for 3D face recognition, IEEE Transactions on Information Forensics and Security 3 (1) (2008) 62-73.
[11] A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik, Recognizing objects in range data using regional point descriptors, in: Proceedings European Conference on Computer Vision, 2004, pp. 224-237.
[12] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, A novel geometric facial representation based on multi-scale extended local binary patterns, in: Proceedings IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, 2011, pp. 1-7.
[13] H. Li, D. Huang, P. Lemaire, J.-M. Morvan, and L. Chen, Expression robust 3D face recognition via mesh-based histograms of multiple order surface differential quantities, in: Proceedings IEEE International Conference on Image Processing, 2011, pp. 3053-3056.
[14] A.S. Mian, M. Bennamoun, and R. Owens, An efficient multimodal 2D-3D hybrid approach to automatic face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (11) (2007) 1927-1943.
[15] F. Hajati, A.A. Raie, and Y. Gao, 2.5D face recognition using patched geodesic moments, Pattern Recognition 45 (3) (2012) 969-982.
[16] D. Zhang, V. Kanhangad, N. Luo, and A. Kumar, Robust palmprint verification using 2D and 3D features, Pattern Recognition 43 (1) (2010) 358-368.


[17] W. Li, L. Zhang, D. Zhang, G. Lu, and J. Yan, Efficient joint 2D and 3D palmprint matching with alignment refinement, in: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 795-801.
[18] V. Kanhangad, A. Kumar, and D. Zhang, A unified framework for contactless hand verification, IEEE Transactions on Information Forensics and Security 6 (3) (2011) 1014-1027.
[19] A. Elad and R. Kimmel, On bending invariant signatures for surfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1285-1295.
[20] A.M. Bronstein, M.M. Bronstein, and R. Kimmel, Expression-invariant representations of faces, IEEE Transactions on Image Processing 16 (1) (2007) 188-197.
[21] S. Berretti, A. Del Bimbo, and P. Pala, 3D face recognition using isogeodesic stripes, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (12) (2010) 2162-2177.
[22] M.M. Bronstein and A.M. Bronstein, Shape recognition with spectral distances, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5) (2011) 1065-1071.
[23] M. Kazhdan, Shape representations and algorithms for 3D model retrieval, PhD Dissertation, Princeton University, June 2004.
[24] M. Hilaga, Y. Shinagawa, T. Kohmura, and T.L. Kunii, Topology matching for fully automatic similarity estimation of 3D shapes, in: Proceedings 28th Annual Conference on Computer Graphics and Interactive Techniques, 2001, pp. 203-212.
[25] G. Reeb, On the singular points of a completely integrable Pfaff form or of a numerical function, Comptes Rendus Academie Sciences Paris 222 (1) (1946) 847-849.
[26] H. Sundar, D. Silver, N. Gagvani, and S. Dickinson, Skeleton based shape matching and retrieval, in: Proceedings Shape Modeling International, 2003, pp. 130-139.
[27] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, Matching 3D models with shape distributions, in: Proceedings International Conference on Shape Modeling and Applications, 2001, pp. 154-166.
[28] K.I. Chang, K.W. Bowyer, and P.J. Flynn, Face recognition using 2D and 3D facial data, in: Proceedings ACM Workshop on Multimodal User Authentication, 2003, pp. 25-32.
[29] D.G. Lowe, Object recognition from local scale-invariant features, in: Proceedings International Conference on Computer Vision, 1999, pp. 1150-1157.
[30] P.J. Phillips, P.J. Flynn, T. Scruggs, K.W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, Overview of the face recognition grand challenge, in: Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 947-954.
[31] L. Yin, X. Wei, Y. Sun, J. Wang, and M.J. Rosato, A 3D facial expression database for facial behavior research, in: Proceedings International Conference on Automatic Face and Gesture Recognition, 2006, pp. 211-216.


[32] A. Savran, N. Alyuz, H. Dibeklioglu, O. Celiktutan, B. Gokberk, B. Sankur, and L. Akarun, Bosphorus database for 3D face analysis, in: Proceedings First COST 2101 Workshop on Biometrics and Identity Management, 2008, pp. 47-56.
[33] D.V. Vranic, 3D model retrieval, PhD Dissertation, Leipzig University, 2004.
[34] H. Laga, H. Takahashi, and M. Nakajima, Geometry image matching for similarity estimation of 3D shapes, in: Proceedings Computer Graphics International, 2004, pp. 490-496.
[35] B. Allen, B. Curless, and Z. Popovic, The space of human body shapes: reconstruction and parameterization from range scans, in: Proceedings ACM SIGGRAPH 22 (3) (2003) 587-594.
[36] G. Barequet and M. Sharir, Partial surface and volume matching in three dimensions, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (9) (1997) 929-948.
[37] D. Huang, G. Zhang, M. Ardabilian, Y. Wang, and L. Chen, 3D face recognition using distinctiveness enhanced facial representations and local feature hybrid matching, in: Proceedings IEEE International Conference on Biometrics: Theory Applications and Systems, 2010, pp. 1-7.
[38] A.M. Bronstein, M.M. Bronstein, R. Kimmel, and A. Spira, Face recognition from facial surface metric, in: Proceedings European Conference on Computer Vision, 2004, pp. 225-237.
[39] R. Litman, A.M. Bronstein, and M.M. Bronstein, SMI 2012: short stable volumetric features in deformable shapes, Computers and Graphics 36 (5) (2012) 569-576.
[40] E. Boyer, A.M. Bronstein, M.M. Bronstein, B. Bustos, T. Darom, R. Horaud, I. Hotz, Y. Keller, J. Keustermans, A. Kovnatsky, R. Litman, J. Reininghaus, I. Sipiran, D. Smeets, P. Suetens, D. Vandermeulen, A. Zaharescu, and V. Zobel, SHREC 2011: robust feature detection and description benchmark, in: Proceedings Eurographics Conference on 3D Object Retrieval, 2011, pp. 71-78.
[41] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis, SCAPE: shape completion and animation of people, ACM Transactions on Graphics 24 (3) (2005) 408-416.
[42] J. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, Journal of the Optical Society of America 2 (7) (1985) 1160-1169.
[43] J. Daugman, Complete discrete 2D Gabor transformations by neural networks for image analysis and compression, IEEE Transactions on Acoustics, Speech, and Signal Processing 36 (7) (1988) 1169-1179.
[44] S.S. Chen, D.L. Donoho, and M.A. Saunders, Atomic decomposition by basis pursuit, SIAM Review 43 (1) (2001) 129-159.
[45] N. Alyuz, B. Gokberk, H. Dibeklioglu, A. Savran, A.A. Salah, L. Akarun, and B. Sankur, 3D face recognition benchmarks on the Bosphorus database with focus on facial expressions, in: Biometrics and Identity Management, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 57-66.

[46] N. Alyuz, B. Gokberk, and L. Akarun, A 3D face recognition system for expression and occlusion invariance, in: Proceedings IEEE Conference on Biometrics: Theory, Applications, and Systems, 2008, pp. 1-7.
[47] H. Dibeklioglu, B. Gokberk, and L. Akarun, Nasal region-based 3D face recognition under pose and expression variations, in: Proceedings International Conference on Advances in Biometrics, 2009, pp. 309-318.
[48] F.B. ter Haar and R.C. Veltkamp, Expression modeling for expression-invariant face recognition, Computers and Graphics 34 (3) (2010) 231-241.
[49] I.A. Kakadiaris, G. Passalis, G. Toderici, M.N. Murtuza, Y. Lu, N. Karampatziakis, and T. Theoharis, Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (4) (2007) 640-649.
[50] H. Drira, B. Ben Amor, A. Srivastava, M. Daoudi, and R. Slama, 3D face recognition under expressions, occlusions and pose variations, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9) (2013) 2270-2283.
[51] Y. Wang, J. Liu, and X. Tang, Robust 3D face recognition by local shape difference boosting, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (10) (2010) 1858-1870.
[52] C.C. Queirolo, L. Silva, O.R.P. Bellon, and M.P. Segundo, 3D face recognition using simulated annealing and the surface interpenetration measure, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2) (2010) 206-219.
[53] L. Spreeuwers, Fast and accurate 3D face recognition, International Journal of Computer Vision 93 (3) (2011) 389-414.
[54] V. Kanhangad, A. Kumar, and D. Zhang, Combining 2D and 3D hand geometry features for biometric verification, in: Proceedings IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2009, pp. 39-44.
[55] J. Yu, R. Hong, M. Wang, and J. You, Image clustering based on sparse patch alignment framework, Pattern Recognition 47 (11) (2014) 3512-3519.
[56] J. Yu, Y. Rui, and D. Tao, Click prediction for web image reranking using multimodal sparse coding, IEEE Transactions on Image Processing 23 (5) (2014) 2019-2032.
[57] J. Yu, Y. Rui, Y.Y. Tang, and D. Tao, High-order distance-based multi-view stochastic learning in image classification, IEEE Transactions on Cybernetics 44 (12) (2014) 2431-2442.
[58] C. Xu, D. Tao, and C. Xu, A survey on multi-view learning, arXiv preprint arXiv:1304.5634, 2013.
[59] C. Xu, D. Tao, and C. Xu, Large-margin multi-view information bottleneck, IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (8) (2014) 1559-1572.
[60] C. Xu, D. Tao, and C. Xu, Multi-view intact space learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (12) (2015) 2531-2544.
[61] M. Sipser, Introduction to the theory of computation, Third Edition, Cengage Learning, Boston, 2013.

Farshid Hajati received the B.Sc. degree in electronic engineering from K. N. Toosi University of Technology, Iran, in 2003 and the M.Sc. and Ph.D. degrees in electronic engineering from Amirkabir University of Technology, Iran, in 2006 and 2011, respectively. From 2009 to 2010 he was a visiting scholar with the School of Engineering, Griffith University, Australia. Since 2011, he has been a faculty member of the Electrical Engineering Department, Tafresh University, Iran. He is also an adjunct researcher at the Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Australia. His research interests include digital image processing, pattern recognition, computer vision, neural networks, and face detection and recognition.

Ali Cheraghian received the B.Sc. degree in electronic engineering from Shiraz University of Technology, Iran, in 2008 and the M.Sc. degree in electronic engineering from Amirkabir University of Technology, Iran, in 2011. He is currently with the Machine Vision and Image Processing (MVIP) lab at Tafresh University, Iran. His research interests include face recognition, biometrics, pattern recognition, computer vision, and multispectral image analysis.

Soheila Gheisari received the B.Sc. degree in control engineering from K. N. Toosi University of Technology, Iran, in 2004 and the M.Sc. degree in electronic engineering from Central Tehran Branch, Azad University, Iran, in 2013. Since 2015, she has been a Ph.D. student at the School of Software, University of Technology Sydney, Australia. Her research interests include deep learning, pattern recognition, and computer vision.

Yongsheng Gao received the B.Sc. and M.Sc. degrees in electronic engineering from Zhejiang University, Hangzhou, China, in 1985 and 1988, respectively, and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore. Currently, he is a Professor with the School of Engineering, Griffith University, Australia. His current research interests include face recognition, biometrics, biosecurity, image retrieval, computer vision, pattern recognition, environmental informatics, and medical imaging.

Ajmal S. Mian received the B.E. degree in avionics from the College of Aeronautical Engineering, Nadirshaw Edulji Dinshaw University, Karachi, Pakistan, in 1993, the M.S. degree in information security from the National University of Sciences and Technology, Islamabad, Pakistan, in 2003, and the Ph.D. degree in computer science with distinction from The University of Western Australia, Crawley, Australia, in 2006. He received the Australasian Distinguished Doctoral Dissertation Award from Computing Research and Education Association of Australia in 2007. He received the prestigious Australian Post-Doctoral Fellowship in 2008 and the Australian Research Fellowship in 2011. He was named the West Australian Early Career Scientist of the Year 2012. He has secured four national competitive research grants and is currently a Research Professor with the School of Computer Science and Software Engineering, The University of Western Australia. His research interests include computer vision, pattern recognition, machine learning, multimodal biometrics, and hyperspectral image analysis.
