Pattern Recognition 45 (2012) 2817–2831

Isometric deformation invariant 3D shape recognition

Dirk Smeets, Jeroen Hermans, Dirk Vandermeulen, Paul Suetens

K.U.Leuven, Faculty of Engineering, ESAT/PSI, IBBT-K.U.Leuven Future Health Department, Medical Imaging Research Center, University Hospitals Gasthuisberg, Herestraat 49 - bus 7003, B-3000 Leuven, Belgium

Article history: Received 30 November 2010; received in revised form 7 November 2011; accepted 20 January 2012; available online 2 February 2012.

Abstract

Intra-shape deformations complicate 3D shape recognition and therefore need proper modeling. Thereto, an isometric deformation model is used in this paper. The proposed method does not need explicit point correspondences for the comparison of 3D shapes. The geodesic distance matrix is used as an isometry-invariant shape representation. Two approaches are described to arrive at a sampling order invariant shape descriptor: the histogram of geodesic distance matrix values and the set of largest singular values of the geodesic distance matrix. Shape comparison is performed by comparing the shape descriptors using the χ²-distance as dissimilarity measure. For object recognition, the results obtained demonstrate that the singular value approach outperforms the histogram-based approach, as well as the state-of-the-art multidimensional scaling technique, the ICP baseline algorithm and other isometric deformation modeling methods found in the literature. Using the TOSCA database, a rank-1 recognition rate of 100% is obtained for the identification scenario, while the verification experiments are characterized by a 1.58% equal error rate. External validation demonstrates that the singular value approach outperforms all other participants in the non-rigid object retrieval contests of SHREC 2010 as well as SHREC 2011. For 3D face recognition, the rank-1 recognition rate is 61.9% and the equal error rate is 11.8% on the BU-3DFE database. This decreased performance is attributed to the fact that the isometric deformation assumption only holds to a limited extent for facial expressions, which is also demonstrated in this paper.

© 2012 Elsevier Ltd. All rights reserved.

Keywords: Object recognition; Spectral decomposition; Isometric deformation; Face recognition; Three-dimensional face; Facial expression; Expression variation; SHREC

1. Introduction

During the last decades, developments in 3D modeling and 3D capturing techniques have caused an increased interest in the use of 3D objects for a number of applications, such as CAD/CAM, architecture, computer games, archaeology, medical applications and biometrics. In many of these fields, an important research problem is 3D shape retrieval, in which an object needs to be recognized from a large database of objects. One sign of the growing interest in 3D object retrieval is the yearly SHape REtrieval Contest (SHREC) [1], organized with the objective of evaluating the effectiveness of 3D shape retrieval algorithms. The challenge of 3D shape recognition becomes even harder when intra-shape deformations are present in the database, as is often the case for articulating objects. Fig. 1, for example, shows some deformed shapes from the TOSCA database [2]. Three-dimensional face recognition is a domain of special interest within the 3D shape recognition research community.


As a matter of fact, since the last decade of the 20th century, a shift can be observed from 2D to 3D or 2D+3D face representations [3] for automated face recognition tasks. Three-dimensional face representations have the intrinsic advantage of not being affected by pose variations or varying lighting conditions. However, intra-subject deformations still require special attention in automatic 3D face recognition. It has indeed been shown that, for instance, deformations due to cosmetic surgery negatively affect the performance of some state-of-the-art 3D face recognition methods [4]. However, most intra-subject deformations are due to natural intra-subject changes in facial shape, mainly changes in facial expression. These expression variations are reported as one of the main challenges of 3D face recognition research [5]. In this paper, a 3D shape recognition method that is invariant to isometric deformations is proposed, while not requiring explicit point correspondences for shape comparison. The 3D shapes are represented by an isometric deformation invariant matrix, i.e. a geodesic distance matrix (GDM). The latter contains the geodesic distance between each pair of points on the surface. This distance is the length of the shortest path on the object surface between two points on the object. Isometric deformations, by definition, leave these geodesic distances unchanged. The proposed method thus requires the intra-subject deformations to


Fig. 1. Intra-shape deformations are present in the TOSCA database [2].

be near isometric, a hypothesis we need to verify as well. Using the geodesic distance matrix, two different shape descriptors are presented that are both invariant to the sampling order of points on the surfaces represented. As such, object recognition reduces to direct comparison of the shape descriptors without the need to establish explicit point correspondences. The first approach uses a simple histogram of the GDM values. The second shape descriptor is defined by the set of largest singular values of the GDM. It will be shown that this modal representation is an excellent shape descriptor.

This paper is organized as follows: after a discussion of related work in the remainder of this section, the 3D shape recognition method using our isometric deformation model is presented and compared with two other methods using an isometric deformation model in Section 2. The methods are validated for object recognition in Section 3.1 as well as for 3D face recognition in Section 3.2. The validity of the assumption that important intra-shape deformations are near isometric is investigated in Section 4. The recognition results are discussed and compared with other isometric deformation invariant 3D shape recognition methods in Section 5. Finally, Section 6 summarizes the main conclusions drawn from this research.

1.1. Related work

In the literature, geodesic distance matrices have already been used to tackle 3D recognition problems involving non-rigid objects. Probably the best known of these contributions is the algorithm of Elad and Kimmel [6]. Here, the GDM is computed using the fast marching on triangulated domains (FMTD) method. Subsequently, the GDM is processed using a multidimensional scaling (MDS) approach, converting non-rigid objects into their rigid invariant signature surfaces. These can be compared using standard algorithms for rigid matching. We will use an implementation of this method for comparison with our approaches. The Geodesic Object Representation of Hamza and Krim [7] is another 3D object recognition method relying on geodesic distance matrices. In [7], GDMs are used to determine global geodesic shape functions. This global shape descriptor is defined in each point of the surface and measures the normalized accumulated squared geodesic distances to every other point on the surface. Using kernel density estimation (KDE), the global geodesic shape functions of a particular object are transformed into a geodesic shape distribution. For the actual recognition, these KDEs are compared using the Jensen–Shannon divergence. A method similar to our modal representation approach is described in the work of Jain and Zhang [8]. This method measures the inter-object distance by taking the χ²-distance between the 20 largest eigenvalues of a weighted GDM of the object. In this paper, it will be shown that the weighting of the GDM has an adverse effect on the shape recognition performance. Also in the field of expression-invariant 3D face recognition, the idea of using an isometric deformation model is not novel.

Indeed, in 2003, the method of Elad and Kimmel [6] (described above) was applied to 3D faces by Bronstein et al. [9]. The use of this method was justified by demonstrating that deformations due to facial expression variations are near isometric. An extension of this method is described in [10]. Instead of embedding the facial image into a Euclidean space, it is parameterized on a 2D sphere S². In order to obtain a translation-, rotation- and scale-invariant shape descriptor, the nose tip is chosen to reside on the north pole of the sphere. The remaining degrees of freedom are eliminated by a spherical harmonic transform. The Euclidean norm between the transform coefficients is used as dissimilarity measure. Another extension is the partial embedding of one surface into another surface using generalized MDS (GMDS) [11,12]. GMDS maps the probe image onto the model by minimizing the generalized stress, i.e. the weighted sum of differences between corresponding geodesic distances. The three-point geodesic distance approximation is developed for calculating the geodesic distance between points originally not on the model surface. After the publication of the MDS method in [9], many researchers exploited the near invariance of geodesic distances during facial expression variations for expression invariant face recognition purposes. Berretti et al. [13] developed a method based on iso-geodesic stripes, i.e. sets of points with an equal normalized geodesic distance to a reference point located at the nose tip. The spatial relationship between intra-subject stripes is used to compare faces. Iso-geodesic curves also form the basis of the methods of Feng et al. [14] and Jahanbin et al. [15]. Similarly, Mpiperis et al. [16] use a geodesic polar representation, in which each point on the face is characterized by the geodesic distance r from the pole (nose tip) and the polar angle φ, relative to a reference path through the sellion. Ouji et al. [17] directly compare the geodesic distance from the nose tip to corresponding points. Correspondence is achieved with the iterative closest point (ICP) algorithm. Li and Zhang [18] and Gupta et al. [19] use pairwise geodesic distances between corresponding landmarks. Most of these methods require point correspondences at some stage during shape comparison. However, establishing these point correspondences is not a trivial task; it is often performed iteratively, introducing the risk of obtaining sub-optimal solutions. In contrast, the approaches proposed in this paper do not require explicit point correspondences at all.

2. Isometric deformation model In mathematics, an isometry is a distance-preserving isomorphism between metric spaces. An isometric deformation of a submanifold M in a Riemannian space V is therefore a deformation which preserves the length and energies of curves in M [20]. For surfaces this means that isometries will bend the surface without stretching it. Isometric deformations, which reside in the field of differential geometry, have already been studied by C.F. Gauss at the beginning of the 19th century [21].


For recognition purposes, it is interesting to study geometric invariants under isometric deformations. According to Gauss's Theorema Egregium, the Gaussian curvature K of a surface, i.e. the product of the principal curvatures, is invariant under local isometry. However, as the computation of curvatures is sensitive to noise, the practical use of this first invariant is limited. The first fundamental form of M is a more interesting geometric invariant for isometric deformation invariant 3D shape recognition, because its application is less sensitive to noise. In order to formalize this invariant, consider the regular patch (local surface) $\phi: Q \rightarrow \mathbb{R}^3$ mapping all $q = (u,v) \in Q$ to a point $x = \phi(q)$ on the 3D surface. The first fundamental form of M corresponds to the inner product of the tangent vectors and is explicitly given by the Riemannian metric [22]

$ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2$,   (1)

with

$E = \left\| \partial\phi / \partial u \right\|^2$,   (2)

$F = (\partial\phi / \partial u) \cdot (\partial\phi / \partial v)$,   (3)

$G = \left\| \partial\phi / \partial v \right\|^2$.   (4)

Integrating ds over a path on the surface determines the arc length of that curve. The shortest path between two points on the surface is called a minimal geodesic and its length is called the geodesic distance. The geodesic distance $T_{x_1}(x)$ between the 3D surface points $x = \phi(q)$ and $x_1 = \phi(q_1)$ can be obtained by solving the Eikonal equation

$\| \nabla T_{x_1}(\phi(q)) \| = 1$,   (5)

with boundary condition $T_{x_1}(x_1) = 0$. In practice, geodesic distances are calculated by a fast marching algorithm for triangulated meshes [23,24].

2.1. Geodesic distance matrix as shape representation

Geodesic distance matrices (GDMs) are shape representations that are invariant to isometric object deformations. Hence, assuming isometric intra-shape deformations, GDMs are an appropriate basis for shape descriptors. A matrix $G = [g_{ij}]$ is regarded as the GDM of a particular shape if its elements $g_{ij}$ correspond to the geodesic distances between the points i and j on the object's surface. Using triangulated meshes as surface representations, GDMs can be computed with a fast marching algorithm [24]. As such, a GDM is obtained with a computational cost of $O(n^2)$, with n the number of points on the surface. Fig. 2 contains an example 3D object and its associated GDM.

By definition, GDMs are symmetric and uniquely defined up to a random simultaneous permutation of their rows and columns, due to the arbitrary sampling order of the surface points. Hence, $G'$ and G are equivalent GDMs if they are related by a permutation matrix P, according to

$G' = P G P^T$.   (6)

Provided that two instances of the same object are represented by surface meshes containing an equal number of sufficiently densely sampled surface points, a one-to-one correspondence map (bijective map) can be assumed to exist between both surface representations. Hence, point correspondences are mathematically characterized by a permutation matrix, and the geodesic distance matrices of these surface meshes are approximately related by Eq. (6). Shape comparison thus reduces to verifying the extent to which Eq. (6) holds. However, in practice the point correspondences between the compared objects are generally not known. Since establishing explicit point correspondences between surfaces is far from trivial, this work proposes the use of shape descriptors extracted from GDMs that are invariant to simultaneous permutations of their rows and columns. Thereto, two different methods are described in Sections 2.3 and 2.4. These algorithms will be compared with multidimensional scaling, a state-of-the-art technique for isometric deformation invariant recognition, described in Section 2.2. The shape descriptors proposed in the following sections are not restricted to plain geodesic distance matrices ($G_1 = [g_{ij}]$). They are therefore also applied to other affinity matrices, such as the squared GDM ($G_2 = [g_{ij}^2]$), the Gaussian weighted GDM ($G_3 = [\exp(-g_{ij}^2 / (2\sigma^2))]$) and the increasing weighting function GDM ($G_4 = [(1 + (1/\sigma) g_{ij})^{-1}]$), as defined in [25].

2.2. Multidimensional scaling

Multidimensional scaling (MDS) is a technique that allows visualization of the proximity between points with respect to some kind of dissimilarity (distance) measure. Thereto, the original point configuration is usually embedded in a lower dimensional space by an (approximately) distance preserving mapping. In the output space, Euclidean distances are generally used. In the method described in [9], MDS is applied to the GDM in order to obtain a configuration of points whose point-wise Euclidean distances are approximately equal to the original point-wise geodesic distances. As an example, Fig. 3 shows the 2D and 3D embeddings obtained after applying classical MDS to the

Fig. 2. 3D mesh of an object (a) and its geodesic distance matrix representation (b).


Fig. 3. 2D (a) and 3D (b) canonical form of the same object as shown in Fig. 2(a).

Fig. 4. Histograms of all (a) and point-wise averaged (PWA) (b) geodesic distances of the same object as shown in Fig. 2.

object shown in Fig. 2(a). These point configurations are commonly referred to as canonical forms. Classical MDS includes an eigendecomposition of the double-centered distance matrix

$-\frac{1}{2} J G J = V \Lambda V^T$,   (7)

with $J = I - (1/n)\mathbf{1}\mathbf{1}^T$ the centering matrix (I is the identity matrix, n the number of points on the surface and $\mathbf{1}$ an $n \times 1$ vector with all entries equal to 1) and G the distance matrix containing the squared geodesic distances. The coordinates of the canonical form are then given by $X = \tilde{V}\tilde{\Lambda}^{1/2}$, with $\tilde{\Lambda}$ a $2 \times 2$ or $3 \times 3$ diagonal matrix containing the largest eigenvalues, in order to obtain the 2D or 3D canonical form respectively. Because the GDM is invariant with respect to isometric transformations, the canonical forms of isometrically deformed objects have the same configuration in the embedding space. Therefore, objects can be compared by rigidly aligning their canonical forms and analyzing the registration error. Note that explicit point correspondences are required to obtain this rigid registration. Since establishing explicit point correspondences is far from trivial, the following sections propose shape descriptors which allow shape comparison without establishing point correspondences. As discussed in Section 2.1 and formalized in Eq. (6), these shape descriptors should therefore be invariant for simultaneous row and column permutations of the GDM.
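For illustration, a minimal sketch of this classical MDS embedding, assuming gdm is an (n, n) NumPy array of geodesic distances (the function name is ours):

```python
# Classical MDS canonical form following Eq. (7): double-center the squared
# GDM and embed with the top eigenvectors.
import numpy as np

def canonical_form(gdm, dim=3):
    n = gdm.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix J = I - (1/n)11^T
    B = -0.5 * J @ (gdm ** 2) @ J               # double-centered squared GDM
    evals, evecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # re-sort non-increasing
    # X = V~ L~^(1/2), keeping the dim largest (clipped to non-negative) eigenvalues.
    return evecs[:, :dim] * np.sqrt(np.maximum(evals[:dim], 0.0))
```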

2.3. Histogram representation

By transforming geodesic distance matrices into histograms of geodesic distances, a very simple shape descriptor is obtained that is invariant to simultaneous permutations of the rows and columns of the GDMs. Many different possibilities can be thought of to construct these histograms; in this paper, only two of them are considered. The first histogram-based shape descriptor is obtained by arranging all upper triangular values of the GDM into a histogram. The second one describes shapes using histograms of their mean geodesic distances per point (point-wise averaged (PWA) geodesic distances). As an example, both histogram representations for the object in Fig. 2 are shown in Fig. 4. A sketch of both descriptors is given below.
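A minimal sketch of both descriptors, assuming gdm is an (n, n) NumPy array; the default bin counts mirror the optimal Z values reported in Table 2, and the normalization of the histograms is our assumption rather than a detail stated in the paper:

```python
import numpy as np

def histogram_all_values(gdm, bins=1000):
    # Arrange all upper triangular GDM values into a normalized histogram.
    upper = gdm[np.triu_indices_from(gdm, k=1)]
    hist, _ = np.histogram(upper, bins=bins, range=(0.0, upper.max()))
    return hist / hist.sum()

def histogram_pwa(gdm, bins=5):
    # Histogram of the point-wise averaged (PWA) geodesic distances.
    pwa = gdm.mean(axis=1)
    hist, _ = np.histogram(pwa, bins=bins, range=(0.0, pwa.max()))
    return hist / hist.sum()
```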

2.4. Modal representation

The second proposed shape descriptor uses a singular value decomposition (SVD) of the GDM. This elementary matrix operation transforms a GDM into a permutation-variant matrix of singular vectors and a permutation-invariant diagonal matrix of singular values. This is formalized below.

Proof. Let P be a random permutation matrix, such that $G' = PGP^T$ is a GDM with rows and columns permuted, and let $G = U\Sigma U^T$ be a singular value decomposition of G. Then

$G' = PGP^T = (PU)\Sigma(PU)^T$.   (8)

Because PU remains a unitary matrix and $\Sigma$ is still a diagonal matrix with non-negative real numbers on the diagonal, the right hand side of (8) is a valid singular value decomposition of $G'$. A common convention is to sort the singular values in non-increasing order; in this case, the diagonal matrix $\Sigma$ is uniquely determined by $G'$. Therefore $\Sigma = \Sigma'$, with $\Sigma'$ the singular value matrix of $G'$ according to $G' = U'\Sigma' U'^T$. □


Conceptually, the singular value decomposition separates the GDM into a diagonal matrix containing intrinsic shape information and a permutation-variant matrix containing rich feature vectors. While the former is used for shape comparison purposes, the latter could be used to establish point correspondences. For numerical reasons, only the k largest singular values are computed and retained as shape descriptor. As such, the computational complexity is limited to $O(k \cdot n^2)$, with n the dimension of the GDM. MATLAB 2009a is used for the calculation of the eigenvalues.
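A minimal sketch of the modal descriptor, assuming gdm is an (n, n) NumPy array; the paper uses MATLAB, and SciPy's sparse solver stands in here to return only the k largest singular values (k must be smaller than n):

```python
import numpy as np
from scipy.sparse.linalg import svds

def modal_descriptor(gdm, k=128):
    # Compute only the k largest singular values of the GDM.
    s = svds(gdm.astype(float), k=k, return_singular_vectors=False)
    return np.sort(s)[::-1]  # sorted non-increasing, as in the paper
```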

2.5. Dissimilarity measures for shape comparison

In order to compare the shapes in a database, an appropriate dissimilarity measure has to be chosen to compare the associated shape descriptors of the two proposed approaches. In this work, the dissimilarity measures listed in Table 1 are examined. In this table, $H(\cdot)$ refers to the Shannon entropy. Moreover, $S_j = [s_{j1} \ldots s_{ji} \ldots s_{jM}]^T$ is an M-dimensional shape descriptor, with $j = 1, \ldots, N$ and N the number of objects.

Table 1
Dissimilarity measures.

Mean normalized Manhattan distance: $D_1 = \sum_{i=1}^{M} \frac{2\,|S_{ki} - S_{li}|}{S_{ki} + S_{li}}$

Mean normalized maximum norm: $D_2 = \max_i \frac{2\,|S_{ki} - S_{li}|}{S_{ki} + S_{li}}$

$\chi^2$-distance: $D_3 = \sum_{i=1}^{M} \frac{2\,|\sqrt{S_{ki}} - \sqrt{S_{li}}|}{\sqrt{S_{ki}} + \sqrt{S_{li}}}$

Jensen–Shannon divergence: $D_4 = H(\frac{1}{2} S_k + \frac{1}{2} S_l) - (\frac{1}{2} H(S_k) + \frac{1}{2} H(S_l))$

Correlation: $D_5 = 1 - \frac{S_k \cdot S_l}{\|S_k\| \, \|S_l\|}$

Euclidean distance: $D_6 = \sqrt{\sum_{i=1}^{M} (S_{ki} - S_{li})^2}$

Normalized Euclidean distance: $D_7 = \sqrt{\sum_{i=1}^{M} (S_{ki} - S_{li})^2 / \sigma_i^2}$

Mahalanobis distance: $D_8 = \sqrt{(S_k - S_l)^T \mathrm{cov}(S)^{-1} (S_k - S_l)}$
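As an illustration of how two descriptors are compared, here is a minimal sketch of the χ²-distance D3 of Table 1 (as reconstructed above); the eps guard is our addition to avoid division by zero for zero-valued descriptor entries:

```python
import numpy as np

def chi_square_distance(Sk, Sl, eps=1e-12):
    # D3 = sum_i 2 |sqrt(Sk_i) - sqrt(Sl_i)| / (sqrt(Sk_i) + sqrt(Sl_i))
    num = 2.0 * np.abs(np.sqrt(Sk) - np.sqrt(Sl))
    den = np.sqrt(Sk) + np.sqrt(Sl) + eps
    return np.sum(num / den)
```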

Table 2
Results of the different approaches with optimal design parameters.

Proposed approach          R1 RR (%)   EER (%)   Di   Gi   Z
Histogram of PWA values    68.03       13.89     D6   G2   5
Histogram of all values    71.31       13.96     D6   G2   1000
Modal representation       100         1.58      D3   G1   128

Table 3
Results of the different MDS implementations.

Method         R1 RR (%)   EER (%)
CMDS + MM      35.25       46.18
CMDS + ICP     39.34       29.49
SMACOF + MM    40.98       43.66
SMACOF + ICP   45.90       35.51

Table 4
Results of different isometric deformation model methods on the TOSCA database.

Experiment                 R1 RR (%)   EER (%)
MDS                        39.34       29.49
Histogram of PWA values    68.03       13.89
Histogram of all values    71.31       13.96
Modal representation       100.0       1.58
ICP                        35.29       40.07

Fig. 5. Results of standard validation experiments on the TOSCA database with CMC (a) and ROC (b). Object recognition with a baseline ICP algorithm (thin solid line) is compared to object recognition using MDS (dash-dot line), histogram comparison of PWA (dotted line) and all values (dashed line), and the modal representation (thick solid line).


Fig. 6. The R1 RR (a) and EER (b) plotted against the number of eigenvalues used in the shape descriptor, illustrating the stability of the proposed algorithm w.r.t. the number of eigenvalues used.

3. Recognition experiments and results

In this section, the approaches described in Section 2 are validated for general object recognition (Section 3.1) and for 3D face recognition (Section 3.2). For both, two types of recognition experiments are considered: identification and verification. These scenarios are described shortly in the following paragraphs, along with their commonly used performance measures.

The identification scenario involves a one-to-many comparison of objects. Specifically, a probe object is compared to many objects in a gallery. In this work it is assumed that each gallery contains exactly one example object per object class considered and that each probe object belongs to exactly one object class included in the gallery. Hence, each probe object matches exactly one gallery object. Object identification involves retrieval of the matching gallery object for a given probe. Hence, using a shape dissimilarity measure, the gallery objects are ranked according to increasing shape dissimilarity with the probe object. Provided that an appropriate shape dissimilarity measure is used, the matching gallery object should reside at one of the first places in this shape dissimilarity ranking. The performance of a shape descriptor in the identification case can be characterized by measuring the rank N recognition rate (RN RR). This is the percentage of probe objects for which the matching gallery entry belongs to the N most similar gallery objects according to the shape dissimilarity measure used. Plotting the rank N recognition rates for increasing N results in the cumulative matching curve (CMC).

In the verification or authentication scenario, a one-to-one object comparison is performed to decide whether two objects belong to the same or a different object class. Thereto, a threshold on the shape dissimilarity measure is introduced. When the shape dissimilarity value is below or above this threshold, the objects are considered to belong to the same (genuine) or different (imposter) object classes respectively. The performance of a shape dissimilarity measure in the verification scenario is quantified by the receiver operating characteristic (ROC) curve. This curve plots the false rejection rate (FRR) versus the false acceptance rate (FAR) for varying values of the threshold used. The equal error rate (EER) is the point on the ROC for which the FAR is equal to the FRR and can therefore be seen as an important characteristic of the verification performance. A code sketch of both performance measures is given below.

In the subsequent validation experiments, the proposed methods are compared with the state-of-the-art MDS technique as well as with a baseline recognition algorithm. The latter corresponds to the standard iterative closest point (ICP) algorithm [26]. This is a well-known and extensively used rigid shape registration algorithm that minimizes the sum of squared Euclidean distances between closest points. After rigid registration, the RMS registration error can be used as shape dissimilarity measure. As opposed to the algorithms of Section 2, the baseline ICP algorithm performs rigid shape comparison only.

3.1. Object recognition

To validate the different shape matching approaches for general 3D objects, we use the "TOSCA non-rigid world" database [2]. The TOSCA database is a clean dataset and is used to determine the influences of the different parameters. The best performing method on this database is then externally validated. Therefore, we participated in the "SHREC 2010—Shape Retrieval Contest of Non-rigid 3D Models" challenge in 2010 [27] and the "SHREC 2011—Shape Retrieval Contest of Non-rigid 3D Watertight Meshes" in 2011 [28].
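As referenced above, a minimal sketch of the two performance measures, assuming a dissimilarity matrix D of shape (n_probes, n_gallery) and NumPy arrays of class labels; the function names are ours:

```python
import numpy as np

def rank_n_rate(D, probe_labels, gallery_labels, n=1):
    # Percentage of probes whose matching gallery entry is among the n
    # most similar gallery objects (a point on the CMC).
    order = np.argsort(D, axis=1)  # most similar gallery entries first
    hits = sum(probe_labels[i] in gallery_labels[row[:n]]
               for i, row in enumerate(order))
    return 100.0 * hits / len(probe_labels)

def equal_error_rate(scores, genuine):
    # scores: dissimilarity per comparison; genuine: True for same-class pairs.
    thresholds = np.sort(scores)
    far = np.array([np.mean(scores[~genuine] <= t) for t in thresholds])
    frr = np.array([np.mean(scores[genuine] > t) for t in thresholds])
    i = np.argmin(np.abs(far - frr))  # threshold where FAR ~ FRR
    return 100.0 * (far[i] + frr[i]) / 2.0
```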

3.1.1. TOSCA non-rigid world

Data and preprocessing: The TOSCA database is intended as a benchmark dataset for the objective comparison of 3D object recognition methods and point matching algorithms involving non-rigid objects [2]. Thereto, it consists of various 3D non-rigid shapes in a variety of poses. In our experiments, we use 133 objects, including 9 cats, 11 dogs, 3 wolves, 17 horses, 21 gorillas, 1 shark, 6 centaurs, 6 seahorses, 24 female figures, and two different male figures with 15 and 20 poses. Each object contains approximately 3000 vertices. As the objects considered have closed surfaces, no region of interest needs to be determined. Moreover, because all objects are represented with approximately the same number of points, there is no need for re-sampling. As mentioned in Section 3, both identification and verification experiments are conducted. For the identification experiment, the gallery consists of the first object of each class. This results in 122 probe objects to be compared with the gallery. Regarding the verification experiments, all objects are compared with each other, resulting in 133 × 134 / 2 = 8911 different shape comparisons. Before comparing the 3D object recognition methods with each other and with the baseline ICP method, the best set of influencing design parameters is determined for each method.

Parameter and configuration tuning: For the histogram approach of Section 2.3 the optimal number of histogram bins must be selected, while for the modal approach of Section 2.4 the optimal number of singular values must be determined. For both algorithms the best GDM weighting Gi (Section 2.1) must be chosen, as well as the dissimilarity measure Di used for shape comparison (Section 2.5). This parameter tuning is performed by conducting the aforementioned identification and verification experiments for all possible combinations. The optimal values¹, along with the identification and verification results, are tabulated in Table 2. The meaning of the parameter Z depends on the method considered: for the histogram approach it corresponds to the number of bins, while for the modal representation Z represents the number of singular values. As previously mentioned, the proposed GDM-based object recognition methods are compared with the state-of-the-art MDS approach. In order to obtain a fair comparison, both classical MDS (CMDS) and the least-squares MDS using the SMACOF algorithm as implemented by Bronstein [2] are considered. After

¹ The optimality criterion is R1 RR − EER.


determination of the 3D canonical shapes, these need to be aligned to enable shape comparison. Thereto, both the iterative closest point (ICP) algorithm [26] and moment matching (MM) are considered. The actual shape dissimilarity is expressed in terms of the RMS registration error or the Euclidean distance between higher order moments, when ICP or MM is used for object alignment respectively. The results of these MDS combinations are given in Table 3.

Results: After tuning each of these object recognition algorithms, their performance is compared for both experiments considered. Fig. 5(a) shows the CMC of the identification experiment conducted, while Fig. 5(b) displays the ROC for the verification experiment. Besides the GDM-based object recognition results, the recognition performance of the baseline ICP algorithm is included. As the rank-1 recognition rate (R1 RR) and the equal error rate (EER) are characteristic points on the CMC and ROC respectively, these values are listed in Table 4 for the different methods. Both Fig. 5 and Table 4 clearly show that the baseline rigid ICP algorithm performs considerably worse than any isometric shape comparison approach considered. Moreover, the modal representation outperforms all other methods described. To illustrate the stability of its performance, Fig. 6 plots the R1 RR (a) and the EER (b) against the number of singular values used in the shape descriptor. Both identification and verification results remain stable when the number of singular values used is between 50 and 200. Fig. 17 shows some qualitative results that illustrate the consistent correct recognition obtained with the modal representation.

Correspondences: As suggested in Section 2.4, point correspondences can be extracted from the eigenvector matrices by finding the permutation from $U' = PU$. However, since the eigenvectors are defined up to a sign, sign flips occur. Moreover, due to numerical issues, eigenvectors can change position; this is called eigenvector flip. To account for these sign and eigenvector flips, Mateus et al. [29] propose a histogram approach together with an EM-based matching framework. To demonstrate the proof of concept, we implemented the histogram approach, which matches eigenvectors or their sign-flipped versions based on their histograms. After correction for sign and eigenvector flips, the permutation is found by matching the rows in U and $U'$. An example of correspondences found by this approach is shown in Fig. 7, and a sketch of the matching step is given below.
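As referenced above, here is a sketch in the spirit of this histogram-based correction; it is an illustration rather than the implementation used in the paper. It matches the columns of two singular vector matrices U and U2 (each n × k, columns ordered by singular value) while accounting for sign flips:

```python
import numpy as np

def hist_signature(v, bins=32):
    # Histogram of vector entries; unit-norm vectors lie within [-1, 1].
    h, _ = np.histogram(v, bins=bins, range=(-1.0, 1.0))
    return h / len(v)

def match_eigenvectors(U, U2, bins=32):
    # For each column of U, pick the column of U2 (possibly sign-flipped)
    # whose histogram signature is closest.
    pairs = []
    for i in range(U.shape[1]):
        h = hist_signature(U[:, i], bins)
        best = None
        for j in range(U2.shape[1]):
            for sign in (1.0, -1.0):  # account for sign flips
                d = np.abs(h - hist_signature(sign * U2[:, j], bins)).sum()
                if best is None or d < best[0]:
                    best = (d, j, sign)
        pairs.append((i, best[1], best[2]))
    return pairs  # (column in U, matched column in U2, sign)
```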

Table 5
Official shape retrieval results on the SHREC 2010 dataset [27].

Authors               Method        NN (%)   FT (%)   ST (%)   E (%)   DCG (%)
Smeets et al.         Modal repr.   99.5     78.8     94.4     68.0    96.1
Ohbuchi and Furuya    BF-DSIFT-E    98.0     76.6     89.2     64.5    94.1
Wuhrer and Shu        CF            98.0     63.5     78.0     55.3    87.8

Table 6
Official shape retrieval results on the SHREC 2011 dataset [28].

Authors               Method        NN (%)   FT (%)   ST (%)   E (%)   DCG (%)
Smeets et al.         Modal repr.   100.0    96.2     98.4     73.1    99.4
Lian and Godil        MDS-CM-BOF    99.5     91.3     96.9     71.7    98.2
Smeets et al.         meshSIFT      99.5     88.4     96.2     70.8    98.0
Reuter                shapeDNA      99.7     89.0     95.2     69.6    97.5
Van Nguyen            BOGH          99.3     81.1     88.4     64.7    94.9
Kawamura et al.       FOG           96.8     81.7     90.3     66.0    94.4
Ohkita et al.         LSF           99.5     79.9     86.3     63.3    94.3
Lavoué                BOW-LSD       95.5     67.2     80.3     57.9    89.7
Tabia and Daoudi      patchBOF      74.8     64.2     83.3     58.8    83.7
Sipiran and Bustos    HKS           83.7     40.6     49.7     35.3    73.0

Fig. 7. An example of correspondences found by matching the eigenvector matrices after correction for sign and eigenvector flip.

Fig. 8. Validation of the proposed method for the SHREC 2010 dataset. The CMC in (a) summarizes the results of the identification experiment, while the ROC in (b) shows the verification results.


3.1.2. Shape retrieval contest 2010

Data and preprocessing: Since the modal representation outperformed the other methods considered on the TOSCA database, this algorithm is also validated by participating in the "SHREC 2010—Shape Retrieval Contest of Non-rigid 3D Models" [27]. The dataset used is a subset of the McGill 3D Shape Benchmark [30], containing 200 non-rigid objects: 20 ants, crabs, hands, humans, octopuses, pliers, snakes, spectacles, spiders and teddies each. The objective of this 3D shape retrieval contest is to evaluate the effectiveness of 3D shape retrieval algorithms for non-rigid 3D objects [27]. Unlike the object instances of the TOSCA non-rigid world database, the different instances of the same object do have some small intrinsic shape and scale variations in the SHREC dataset. Therefore, all meshes are re-sampled, keeping 2500 points. In order to account for scale changes, each object's GDM is normalized by the sum of all its elements.

Parameter and configuration tuning: As SHREC is intended as an external validation contest, no ground truth information was available to the competitors. Hence, an extensive parameter and configuration tuning of the method applied was not possible. Therefore, the modal shape matching approach used corresponds to the configuration that performed optimally on the TOSCA database. However, in order to account for the intrinsic shape

differences within an object class, the number of singular values used is reduced to 19. This value was determined empirically.

Results: Fig. 8 gives the results of the validation and indicates an EER of 8.86% and a R1 RR of 88.95%. This is a lower performance compared to the TOSCA database, which can be explained by the small intrinsic shape variations present between objects of the same class in this dataset. In order to enable standardized external validation, SHREC prescribes the use of the same predefined performance measures. For reasons of completeness, these are shown in Table 5. The nearest neighbor (NN) measure is defined as the percentage of closest matches that belong to the same class as the probe. The first tier (FT) and second tier (ST) are the percentages of models belonging to the probe's class that appear within the top k ranked models, where $k = n_c - 1$ for the first tier and $k = 2(n_c - 1)$ for the second tier, with $n_c$ the size of the probe's class. The E-measure (E) is a composite measure of precision and recall. The discounted cumulative gain (DCG) is a statistic that measures the usefulness of a model based on its position in the ranked list. For further details regarding the meaning of these measures, the reader is referred to [1,31].

3.1.3. Shape retrieval contest 2011

Data and preprocessing: The modal representation is again validated by participating in the "SHREC 2011—Shape Retrieval Contest of Non-rigid 3D Watertight Meshes" [28]. This dataset is larger than the SHREC 2010 dataset, containing 600 non-rigid objects, recreated and modified from several publicly available databases such as the McGill database [30], the TOSCA shapes [2] and the Princeton Shape Benchmark [31]. All meshes are downsampled, keeping 3000 points. To compensate for scale differences, each object's GDM is normalized by the square root of the total surface area of the mesh (instead of by the sum of all its elements, as in SHREC 2010). Both normalizations are sketched below.

Parameter and configuration tuning: Also for SHREC 2011, no ground truth information was available to the competitors.
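As referenced above, a minimal sketch of both scale normalizations, assuming gdm is an (n, n) NumPy array and, for the SHREC 2011 variant, a triangulated mesh given as vertex and triangle index arrays; the function names are ours:

```python
import numpy as np

def normalize_gdm_shrec2010(gdm):
    # SHREC 2010: divide the GDM by the sum of all its elements.
    return gdm / gdm.sum()

def normalize_gdm_shrec2011(gdm, vertices, triangles):
    # SHREC 2011: divide by the square root of the total mesh surface area.
    a = vertices[triangles[:, 1]] - vertices[triangles[:, 0]]
    b = vertices[triangles[:, 2]] - vertices[triangles[:, 0]]
    area = 0.5 * np.linalg.norm(np.cross(a, b), axis=1).sum()
    return gdm / np.sqrt(area)
```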

Table 7
Results of the method based on the modal representation on the BU-3DFE database, compared with the baseline ICP algorithm.

Experiment              R1 RR (%)   EER (%)
Modal representation    61.87       11.84
ICP                     71.63       13.23

Fig. 9. The surface of the left face (a) is cropped by only keeping vertices within a geodesic distance of 80 mm from the nose tip (b).

Fig. 10. Results of standard validation experiments on the BU-3DFE database with CMC (a) and ROC (b). Face recognition with a baseline algorithm (thin solid line) is compared to face recognition using the modal representation (thick solid line).

Fig. 11. The R1 RR (a) and EER (b) plotted against the number of eigenvalues used in the shape descriptor, for subsets of increasing expression strength (all, expr 1–4). This illustrates the stability of the proposed algorithm w.r.t. the number of eigenvalues used.

Therefore, the modal shape matching approach used corresponds to the configuration that performed optimally on the TOSCA database. However, in order to account for the intrinsic shape differences within an object class, the number of singular values used is chosen to be 40.

Results: Table 6 shows the results of the SHREC 2011 contest, which had a strong field of participants. The modal representation provides the best results on all evaluation criteria.

3.2. Face recognition

3.2.1. BU-3DFE database

Data and preprocessing: In order to validate the performance of the modal shape matching method for face recognition in the presence of expression variations, it is applied to a subset of the Binghamton University 3D Facial Expression (BU-3DFE) database [32]. In this experiment, 900 different facial surfaces are considered, which originate from 100 subjects. While only 11% of these facial surfaces correspond to neutral expressions, the other 89% have either an angry or a sad expression, subdivided into four different levels of facial expression. Most facial surfaces had a closed mouth. Because the modal shape matching approach assumes that each surface is constituted of corresponding points, the surfaces considered must represent the same facial area. As this prerequisite is not met for all faces in the dataset, some preprocessing steps are required. This involves detection of the nose tip and removal of all surface vertices with a geodesic distance above a certain threshold from the nose. The geodesic cropping is shown in Fig. 9(b) for a cutoff threshold of 80 mm. In this paper, a very simple nose detection method is used: as for most faces in the dataset the gaze direction is approximately aligned with the z-axis, the nose tip mostly corresponds to the vertex with the largest z-value. Only in 2% of the cases was a manual correction needed. Although falling outside the scope of this work, nose detection can be robustly automated with [33,34]. Finally, the surfaces are re-sampled so as to be constituted of an equal number of points. For these experiments, 2000 facial vertices are randomly selected, as a trade-off between computational complexity and accuracy of the representation. The result is a normalized 3D face region that contains the same face area for every 3D surface, represented by an equal number of points. To apply the proposed modal shape matching approach, a fast marching algorithm for meshes is used to compute the GDM for each face in the dataset.

Parameter and configuration tuning: Again, the best dissimilarity measure and the optimal number of eigenvalues are found by an

Fig. 12. The variation of geodesic distance between four finger tips in three situations is compared with variation of Euclidean distances.

extensive tuning. Using the Jensen–Shannon divergence (D4) with 189 eigenvalues gives the best results for the identification and verification scenarios.²

Results: Fig. 10 shows the results of this approach for the identification and verification scenarios with the optimal configuration, together with the results of the baseline ICP algorithm. The equal error rate (EER) and rank-1 recognition rate (R1 RR) are characteristic points on the ROC and CMC respectively and are listed in Table 7. Both Fig. 10 and Table 7 show similar performance for the baseline algorithm and the modal representation. To illustrate the stability of the performance of the modal representation, Fig. 11 plots the R1 RR (a) and the EER (b) against the number of singular values used in the shape descriptor. This experiment is repeated four times for subsets containing the neutral scan and all scans with a certain, manually annotated expression strength. The recognition performance clearly drops when emotions are expressed more strongly (higher level of expression strength).

² The optimality criterion is R1 RR − EER.

4. Validity of the isometric deformation assumption

The proposed methods require the intra-subject deformations to be near isometric. In this section, the validity of the isometric deformation assumption is examined for 3D articulating objects and for 3D faces with expression variations.


Fig. 13. The geodesic distance is calculated from the nose tip to 21 other facial landmarks on the Bosphorus database [35] to examine its variation. The first six faces have different expressions, the last two have a neutral expression.

Fig. 14. The histogram of relative (a) and absolute (b) variation of geodesic and Euclidean distances between the nose tip and 21 other facial landmarks, calculated based on 303 expression scans of 80 subjects.

4.1. 3D articulating objects

To examine the deviation from the isometric deformation assumption in a realistic situation when dealing with 3D articulating objects, we looked at the change in geodesic distance between four finger tips in three situations with different configurations of the hand (an articulating object), as shown in Fig. 12. This results in a mean coefficient of variation (CV) of 5.3% for the geodesic distances, while the CV for the Euclidean distances is equal to 27.6%. This clearly indicates the benefit of using an isometric deformation model for recognition. The computation of this statistic is sketched below.
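As referenced above, a minimal sketch of this statistic, assuming dists is an (n_poses, n_pairs) NumPy array holding, per pose, the distance (geodesic or Euclidean) between each landmark pair:

```python
import numpy as np

def mean_coefficient_of_variation(dists):
    # CV per landmark pair: standard deviation across poses divided by the
    # mean across poses; averaged over all pairs, in percent.
    cv = dists.std(axis=0) / dists.mean(axis=0)
    return 100.0 * cv.mean()
```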

4.2. 3D faces

Fig. 15. The figure shows the setup of the experiment to validate the isometric deformation assumption for faces.

To validate the isometric deformation assumption for faces during expression variations, two experiments are set up.

Fig. 16. The histogram of relative (a) and absolute (b) variation of geodesic and Euclidean distances between all pairs of the 47 facial fiducials, using the average of the neutral scans as reference.

The first experiment examines the variation of the geodesic distance from the nose tip to 21 anatomical landmarks, as shown in Fig. 13, and compares it with the variation in Euclidean distance. For this, the Bosphorus database, which includes manually indicated landmarks, is used [35]. Most landmarks are located on the more rigid parts of the human face. The results of the experiment are given in Fig. 14. The histogram of relative (Fig. 14(a)) and absolute (Fig. 14(b)) variation of geodesic and Euclidean distances between the nose tip and 21 other facial landmarks is calculated based on 303 expression scans of 80 subjects. Almost no difference between the distributions of geodesic and Euclidean distances can be noticed.

The second experiment considers the variation of geodesic distances between each pair of landmarks on the face, again for the Bosphorus database [35] with landmarks mainly on the rigid part of the face, as well as for our own database with landmarks mainly on the non-rigid part of the face. For landmarks on the non-rigid part, a larger difference between the variation of geodesic and Euclidean distances is expected. On 171 expression scans of the Bosphorus database corresponding to 4 subjects, the mean coefficient of variation (CV) is 5.98% for the geodesic distances between all pairs of 22 landmarks, while the mean CV for the Euclidean distances is 6.16%. To examine the variation for landmarks located on the less rigid parts of the face, we set up an experiment in which 47 fiducials are manually indicated on a subject's face, as shown in Fig. 15. Then, 18 3D scans are taken, 4 neutral and the others with different (strong) expressions, using a structured light scanner. To compute the geodesic distances appropriately, the mouths need to be cut open. This is done by manually indicating eight points on the lips and computing the shortest geodesic path between the points in the correct order. The mesh is then cut along this path. The results of this last experiment are shown as a normalized histogram of relative (Fig. 16(a)) and absolute (Fig. 16(b)) variation of geodesic and Euclidean distances, using the distances on the average of all neutral scans as reference and considering the distances between each pair of the 47 facial fiducials. The standard deviation of the relative variation of geodesic and Euclidean distances is 11.30% and 11.21% respectively. The variation, expressed as mean coefficient of variation (CV), is 8.25% for the geodesic distances and 8.14% for the Euclidean distances. The CV is higher than for the Bosphorus database because here more landmarks are chosen on the flexible part of the face.

4.3. Conclusion

The experiments validating the isometric deformation assumption clearly show that the assumption holds to a much smaller extent for 3D faces with expression variations than for 3D articulating objects. Moreover, the stronger the expression, the more variation of the geodesic distances can be noticed.

5. Discussion

The main advantage of the methods proposed in this paper is that, except for the preprocessing step in the case of 3D face recognition, the algorithms do not determine any explicit point correspondences between the different surfaces. The spectral decomposition captures the intrinsic shape information in the diagonal matrix, independent of the sampling order, and histograms are by definition sampling order independent. On the other hand, one could argue that it is hard to satisfy the condition that each point needs to have a corresponding point on each other facial surface of the same subject. However, when enough points on the same facial area are considered, there will always be a point close to the (virtual) corresponding point. The most important disadvantage of the proposed modal representation, however, is its sensitivity to topological changes. This occurs, for example, in non-rigid objects when two hands merge, or in 3D faces when the mouth opens.

Object recognition experiments on the TOSCA shape database have demonstrated that the modal representation approach outperforms the histogram approaches, the state-of-the-art multidimensional scaling technique and the baseline ICP algorithm, with a rank-1 recognition rate of 100% and an equal error rate of 1.58% using the 128 largest eigenvalues. This result proves the effectiveness of describing shapes that are represented by closed surfaces by the largest eigenvalues of an affinity matrix. For rigid objects a Euclidean distance matrix could suffice. However, when isometric deformations are present in the data, the geodesic distance matrix representation is preferable. Although the algorithms are extensively compared with the state-of-the-art multidimensional scaling method and the ICP baseline algorithm, it is still useful to compare the results with other methods using an isometric deformation model, validated on the same database. Bronstein et al. [36] report an EER of 16.32% for the multidimensional scaling (MDS) algorithm and an EER of 15.49% for the generalized multidimensional scaling (GMDS)


Fig. 17. Some visual results showing the consistent correct recognition. The first column represents the probes, the second to fourth columns the three closest matches with their dissimilarity to the probe, and the last column shows the correct match.

algorithm, which is explained in Section 1.1 and in more detail in [11]. Although these methods also use the GDM representation, their performance is clearly lower. By applying MDS to this GDM, a canonical form in dimension m is created. The canonical forms are compared using rigid registration. If the MDS is done by classical scaling, an eigenvalue decomposition is performed on the squared GDM after double-centering (see Eq. (7)). Due to the double-centering, the invariance of the diagonal matrix (with the eigenvalues) no longer holds when the points on the surface are permuted. If m = 3, only the three largest eigenvalues are taken and some intrinsic shape information is discarded, which might be another explanation for the lower performance.

Validation in the "SHREC 2010—Shape Retrieval Contest of Non-rigid 3D Models" [27] demonstrates an excellent performance of the modal representation approach on more noisy data. It outperforms the method of Ohbuchi and Furuya, which renders


range images from the 3D models; recognition is done using a bag-of-features approach with SIFT features extracted from the range images. The singular value approach also outperforms the method of Wuhrer and Shu, which strongly resembles the least-squares MDS approach that is also validated here on the TOSCA database. This excellent performance is confirmed in the "SHREC 2011—Shape Retrieval Contest of Non-rigid 3D Watertight Meshes" [28]. On all evaluation criteria, the modal representation performs better than all other participants. The second best performing approach (based on most evaluation criteria), that of Lian and Godil, constructs a canonical shape using MDS, after which the approach of Ohbuchi and Furuya [27] is applied. The shapeDNA approach of Reuter [37] is similar to the modal representation of the GDM, since it decomposes the Laplace–Beltrami operator. This matrix, however, is sparse and does not contain the large distances that are present in the GDM. Therefore, it is more sensitive to sampling.

For 3D face recognition, the performances are clearly lower than for object recognition, with a rank-1 recognition rate of 61.9% and an equal error rate of 11.8%. The main reason for this discrepancy is probably the difference in validity of the isometric deformation assumption between articulating objects and 3D faces in the presence of expression variations. While the variation of Euclidean distances is much larger than the variation of geodesic distances for the articulating hand, no clear difference can be noticed for 3D faces with expressions. This last finding is in contradiction with the measurements by Bronstein et al. [38], where a clear discrepancy is noticed between the variation of geodesic and Euclidean distances. However, also in [38] a standard deviation of 15.85% on the relative variation is reported (we found 11.30%). Following [16], the standard deviation of the relative change in geodesic distance was found to be about 15%. Therefore, the isometric deformation model is an approximation. Moreover, without additional processing to disconnect the upper and lower lip in the surface representation before determination of the geodesic distances, it is not valid for faces with an open mouth. Occlusions also result in a miscalculation of geodesic distances. These aspects come down to the assumption that a bijective map that preserves distances on the surface exists between the surfaces. They can be seen as the general disadvantages of methods using an isometric deformation model. However, other strategies for expression-invariant methods, not relying on the isometric assumption, also have disadvantages that are not present in methods using the isometric deformation model. Methods using statistical models to model expression variation always need a training stage to construct the model. Therefore, if no representative training data are used, the recognition performance will decrease. The region-based methods, which perform face recognition on the more rigid parts of the face, do not use all available information, throwing away those parts that are affected by expressions. This leads to a loss of information that could be discriminative. Due to the approximation of the isometry assumption, the facial area extracted by the geodesic cropping (described in Section 3.2.1) is only approximately the same for faces of the same subject. These differences in facial area clearly influence the global shape descriptor (modal representation) and can therefore be more decisive than subtle shape differences. Moreover, occlusions and missing data violate the assumption of the existence of a bijective map between the two surfaces and therefore also deteriorate the modal representation. Again, it is interesting to compare the proposed modal representation approach with other expression invariant face recognition methods using an isometric deformation model and validated on the same database. Mpiperis et al. [16] developed a


face recognition method based on an isometric deformation model using the geodesic polar representation. Instead of calculating pairwise geodesic distances, geodesic distances from the nose tip to all other points are calculated to construct a geodesic polar parameterization. In [16], this method is compared with another method using an isometric deformation model, developed by Bronstein et al. [38]. Thereto, again a subset of the BU-3DFE database [32] of 1600 images is used. Mpiperis et al. reported an EER of 9.8% for their method and 15.4% for the method of [38]. A partial explanation for the better performance is the use of color information and a special method for handling the open mouth problem in [16]. It is again particularly interesting to compare the method with the related method described in [39] (which is the basis of [38]). Here, MDS is applied to the GDM representation of the face, generating a 3D canonical form. Thus, only the three largest eigenvalues of the squared double-centered GDM are taken and some intrinsic shape information is discarded. This might be an explanation for the higher EER of 15.4% in [16] compared to the EER of 11.8% of the method described here.

6. Conclusions

In this work, we proposed two methods for isometric deformation invariant 3D shape recognition. Both are based on the geodesic distance matrix (GDM) as an isometric deformation invariant shape representation. Because this matrix is only defined up to an arbitrary simultaneous permutation of its rows and columns, two permutation-invariant approaches are proposed: the first computes a histogram of the GDM values, while the second computes its largest eigenvalues (singular values). The methods are validated for object recognition and 3D face recognition in both an identification and a verification scenario. For object recognition, the modal representation performs better than the histogram method, the state-of-the-art MDS method and the baseline ICP algorithm: a rank-1 recognition rate of 100% and an equal error rate of 1.58% are obtained with the 128 largest singular values. The method also outperforms other isometric deformation modeling methods found in the literature. For 3D face recognition, the performance is lower, probably because the isometric deformation assumption holds only to a much smaller extent. The main advantage of the proposed methods is their independence of correspondence information: the singular value decomposition captures intrinsic shape information in the diagonal matrix of singular values, independently of the sampling order, while the histogram is sampling-order independent by definition.

As future work, we propose to further exploit the modal decomposition in order to obtain correspondences between different objects, using the eigenvectors or singular vectors in the spirit of the method of Shapiro and Brady [40]. To improve the performance for 3D face recognition on more realistic databases, such as FRGC v2 [41] or SHREC [42], mesh preprocessing algorithms need to be implemented to remove spikes and holes. Handling outliers and missing data requires intervention in the eigendecomposition of the distance matrix, by giving more weight to inliers and overlapping regions, respectively.
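For reference, the two descriptors and the dissimilarity measure can be summarized in a few lines. The sketch below assumes a precomputed GDM; the number of histogram bins is a placeholder, the 128 singular values follow the conclusions above, and the exact normalization of the χ²-distance may differ from the one used in the experiments.

```python
import numpy as np

def histogram_descriptor(G, bins=100):
    """Histogram of GDM values; sampling-order invariant by definition."""
    vals = G[np.triu_indices_from(G, k=1)]          # off-diagonal distances
    hist, _ = np.histogram(vals, bins=bins, range=(0.0, vals.max()))
    return hist / hist.sum()                        # normalize to unit mass

def modal_descriptor(G, k=128):
    """The k largest singular values of the GDM (modal representation)."""
    return np.linalg.svd(G, compute_uv=False)[:k]   # descending order

def chi2_distance(p, q, eps=1e-12):
    """Chi-squared dissimilarity between two descriptors p and q."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))
```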

Acknowledgments

This work is supported by the Flemish Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT Vlaanderen), the Research Program of the Fund for Scientific Research—Flanders (Belgium) (FWO) and the Research Fund K.U.Leuven.

References

[1] AIM@SHAPE, SHREC—3D shape retrieval contest. http://www.aimatshape.net/event/SHREC.
[2] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Numerical Geometry of Non-Rigid Shapes, Springer, 2008.
[3] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Computing Surveys 35 (December) (2003) 399–458.
[4] W.D. Jones, Plastic surgery 1, face recognition 0, IEEE Spectrum 46 (September) (2009) 17.
[5] K.W. Bowyer, K.I. Chang, P.J. Flynn, A survey of approaches to three-dimensional face recognition, in: ICPR '04: Proceedings of the 17th International Conference on Pattern Recognition, vol. 1, IEEE Computer Society, Washington, DC, USA, 2004, pp. 358–361.
[6] A. Elad, R. Kimmel, On bending invariant signatures for surfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1285–1295.
[7] A.B. Hamza, H. Krim, Geodesic object representation and recognition, in: DGCI '03, Lecture Notes in Computer Science, vol. 2886, Springer, 2003, pp. 378–387.
[8] V. Jain, H. Zhang, A spectral approach to shape-based retrieval of articulated 3D models, Computer-Aided Design 39 (5) (2007) 398–407.
[9] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Expression-invariant 3D face recognition, in: J. Kittler, M. Nixon (Eds.), AVBPA '03: Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, Lecture Notes in Computer Science, vol. 2688, Springer, 2003, pp. 62–69.
[10] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Expression-invariant face recognition via spherical embedding, in: ICIP '05: Proceedings of the IEEE International Conference on Image Processing, vol. 3, IEEE Computer Society, Genoa, Italy, September 2005, pp. 756–757.
[11] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching, Proceedings of the National Academy of Sciences of the United States of America 103 (January) (2006) 1168–1172.
[12] M.M. Bronstein, A.M. Bronstein, R. Kimmel, Robust expression-invariant face recognition from partially missing data, in: ECCV '06: Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, May 2006, pp. 396–408.
[13] S. Berretti, A.D. Bimbo, P. Pala, Description and retrieval of 3D face models using iso-geodesic stripes, in: MIR '06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, ACM, New York, NY, USA, 2006, pp. 13–22.
[14] S. Feng, H. Krim, I.A. Kogan, 3D face recognition using Euclidean integral invariants signature, in: SSP '07: IEEE/SP 14th Workshop on Statistical Signal Processing, Madison, WI, USA, August 2007, pp. 156–160.
[15] S. Jahanbin, H. Choi, Y. Liu, A.C. Bovik, Three dimensional face recognition using iso-geodesic and iso-depth curves, in: BTAS '08: Proceedings of the IEEE Second International Conference on Biometrics: Theory, Applications and Systems, Arlington, Virginia, USA, September 2008.
[16] I. Mpiperis, S. Malasiotis, M.G. Strintzis, 3-D face recognition with the geodesic polar representation, IEEE Transactions on Information Forensics and Security 2 (September) (2007) 537–547.
[17] K. Ouji, B.B. Amor, M. Ardabilian, F. Ghorbel, L. Chen, 3D face recognition using ICP and geodesic computation coupled approach, in: SITIS '06: Proceedings of the International Conference on Signal-Image Technology and Internet-Based Systems, Hammamet, Tunisia, December 2006, pp. 435–444.
[18] X. Li, H. Zhang, Adapting geometric attributes for expression-invariant 3D face recognition, in: SMI '07: Proceedings of the IEEE International Conference on Shape Modeling and Applications, IEEE Computer Society, Washington, DC, USA, 2007, pp. 21–32.
[19] S. Gupta, J.K. Aggarwal, M.K. Markey, A.C. Bovik, 3D face recognition founded on the structural diversity of human faces, in: CVPR '07: Proceedings of the International Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Minneapolis, Minnesota, USA, June 2007.
[20] J. Jost, Riemannian Geometry and Geometric Analysis, Universitext, Springer, 2008.
[21] C.F. Gauss, Disquisitiones Generales Circa Superficies Curvas, vol. 6, October 1827.
[22] E.W. Weisstein, First fundamental form, from MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/FirstFundamentalForm.html (last visited: 10.08.09).
[23] R. Kimmel, J.A. Sethian, Computing geodesic paths on manifolds, Proceedings of the National Academy of Sciences of the United States of America 95 (1998) 8431–8435.
[24] G. Peyré, L.D. Cohen, Heuristically driven front propagation for fast geodesic extraction, International Journal for Computational Vision and Biomechanics 1 (January) (2008) 55–67.
[25] M. Carcassoni, E.R. Hancock, Spectral correspondence for point pattern matching, Pattern Recognition 36 (2003) 193–204.
[26] P.J. Besl, H.D. McKay, A method for registration of 3-D shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (2) (1992) 239–256.
[27] Z. Lian, A. Godil, T. Fabry, T. Furuya, J. Hermans, R. Ohbuchi, C. Shu, D. Smeets, P. Suetens, D. Vandermeulen, S. Wuhrer, SHREC'10 track: non-rigid 3D shape retrieval, in: 3DOR '10: Proceedings of the Eurographics 2010 Workshop on 3D Object Retrieval, Eurographics Association, Norrköping, Sweden, May 2010, pp. 101–108.
[28] Z. Lian, A. Godil, B. Bustos, M. Daoudi, J. Hermans, S. Kawamura, Y. Kurita, G. Lavoué, H. Nguyen, R. Ohbuchi, Y. Ohkita, Y. Ohishi, F. Porikli, M. Reuter, I. Sipiran, D. Smeets, P. Suetens, H. Tabia, D. Vandermeulen, SHREC'11 track: shape retrieval on non-rigid 3D watertight meshes, in: 3DOR '11: Proceedings of the Eurographics 2011 Workshop on 3D Object Retrieval, Eurographics Association, Llandudno, UK, April 2011, pp. 79–88.
[29] D. Mateus, R. Horaud, D. Knossow, F. Cuzzolin, E. Boyer, Articulated shape matching using Laplacian eigenfunctions and unsupervised point registration, in: CVPR '08: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[30] K. Siddiqi, J. Zhang, D. Macrini, A. Shokoufandeh, S. Bouix, S. Dickinson, Retrieving articulated 3-D models using medial surfaces, Machine Vision and Applications 19 (May) (2008) 261–275.
[31] P. Shilane, P. Min, M. Kazhdan, T. Funkhouser, The Princeton shape benchmark, in: SMI '04: Proceedings of Shape Modeling International, June 2004.
[32] L. Yin, X. Wei, Y. Sun, J. Wang, M.J. Rosato, A 3D facial expression database for facial behavior research, in: FG '06: Proceedings of the 7th IEEE International Conference on Automatic Face and Gesture Recognition, Southampton, UK, April 2006, pp. 211–216.
[33] C. Xu, T. Tan, Y. Wang, L. Quan, Combining local features for robust nose location in 3D facial data, Pattern Recognition Letters 27 (13) (2006) 1487–1494.
[34] W.J. Chew, K.P. Seng, L.-M. Ang, Nose tip detection on a three-dimensional face range image invariant to head pose, in: IMECS '09: Proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, Hong Kong, March 2009.
[35] A. Savran, N. Alyüz, H. Dibeklioğlu, O. Çeliktutan, B. Gökberk, B. Sankur, L. Akarun, Bosphorus database for 3D face analysis, in: Proceedings of the First COST 2101 Workshop on Biometrics and Identity Management (BIOID), Denmark, May 2008.
[36] A.M. Bronstein, M.M. Bronstein, R. Kimmel, M. Mahmoudi, G. Sapiro, A Gromov–Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching, International Journal of Computer Vision, 2009, published online.
[37] M. Reuter, F.-E. Wolter, N. Peinecke, Laplace–Beltrami spectra as 'Shape-DNA' of surfaces and solids, Computer-Aided Design 38 (4) (2006) 342–366.
[38] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Expression-invariant representations of faces, IEEE Transactions on Image Processing 16 (January) (2007) 188–197.
[39] A.M. Bronstein, M.M. Bronstein, R. Kimmel, Three-dimensional face recognition, International Journal of Computer Vision 64 (1) (2005) 5–30.
[40] L.S. Shapiro, J.M. Brady, Feature-based correspondence: an eigenvector approach, Image and Vision Computing 10 (5) (1992) 283–288.
[41] P.J. Phillips, P.J. Flynn, T. Scruggs, K.W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, W. Worek, Overview of the face recognition grand challenge, in: CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, IEEE Computer Society, Washington, DC, USA, 2005, pp. 947–954.
[42] M. Daoudi, F. ter Haar, R.C. Veltkamp, SHREC 2008—shape retrieval contest of 3D face scans. http://give-lab.cs.uu.nl/SHREC/shrec2008/, 2008.

Dirk Smeets graduated from the K.U.Leuven as a Biomedical Engineer (magna cum laude) in June 2008. For his master's thesis he conducted research on semi-automatic liver tumor segmentation in CT images. Currently, Dirk works in the Medical Image Computing research group (ESAT/PSI) at the K.U.Leuven, where he is pursuing a Ph.D. on intra-subject deformation modeling for forensic biometric authentication under the promotership of Prof. Dr. Ir. Dirk Vandermeulen. He is the author of 12 peer-reviewed papers in international journals and conference proceedings.

Jeroen Hermans received his degree in Electrical Engineering from the K.U.Leuven in June 2003. In October 2003, he joined the Medical Image Computing group of the K.U.Leuven. After conducting research on patient positioning for radiotherapy treatment of prostate cancer, he started working on his Ph.D. thesis, entitled Automated Registration of 3D Knee Implant Components to Single-Plane X-Ray Images. He obtained the Ph.D. degree in Electrical Engineering in June 2009 and has continued working in the same group, focusing on a variety of medical and forensic image processing applications.

Paul Suetens is professor of Medical Imaging and Image Processing and chairman of the Medical Imaging Research Center at the University Hospital Leuven. He is also Head of the Division for Image and Speech Processing at the Department of Electrical Engineering of K.U.Leuven. He is the author of more than 400 peer-reviewed papers in international journals and conference proceedings and author of the book ‘‘Fundamentals of Medical Imaging’’ (two editions). He is involved in several graduate (Biomedical Engineering, Biomedical Sciences) and postgraduate programs (Advanced Medical Imaging, Biomedical Engineering). He is associate editor of IEEE Transactions on Medical Imaging, and has been a member of several research councils and commissions (POC, Research Council, IOF, FWO). He was chairman of the Board of Management of MEDICIM, a spin-off of the K.U.Leuven (Medicim was acquired by Nobel Biocare in 2008).

Dirk Vandermeulen obtained a Master's degree in Computer Science in 1983 and a Ph.D. in Electrical Engineering in 1991, both at the Katholieke Universiteit Leuven, Belgium. His initial research concentrated on image processing for computer-assisted stereotactic neurosurgery. Some of this work was implemented in a commercially available stereotactic planning system. His work then moved to medical image analysis, still with a strong focus on neurosurgical and neurological applications. He became assistant professor in 1998 and associate professor in 2004 at the Division for Image and Speech Processing at the Department of Electrical Engineering, K.U.Leuven, where he supervises research on model-based image analysis, with a strong emphasis on image registration based approaches. Since 2002, he has been involved in applying computer vision and medical image analysis techniques to forensic imaging, in particular craniofacial reconstruction and biometric authentication. He has published more than 70 journal papers and over a hundred conference articles.