Pattern Recognition 44 (2011) 2871–2886
Gender discriminating models from facial surface normals

Jing Wu, William A.P. Smith, Edwin R. Hancock
Department of Computer Science, The University of York, Heslington, York YO10 5DD, UK
Abstract
Article history: Received 31 March 2010; Received in revised form 4 January 2011; Accepted 22 April 2011; Available online 5 May 2011.
In this paper, we show how to use facial shape information to construct discriminating models for gender classification. We represent facial shapes using 2.5D fields of facial surface normals, and investigate three different methods to improve the gender discriminating capacity of the model constructed using the standard eigenspace method. The three methods are novel variants of principal geodesic analysis (PGA), namely (a) weighted PGA, (b) supervised weighted PGA, and (c) supervised PGA. Our starting point is to define a weight map over the facial surface that indicates the importance of different locations in discriminating gender. We show how to compute the relevant weights and how to incorporate the weights into the 2.5D model construction. We evaluate the performance of the alternative methods using facial surface normals extracted from 3D range images or recovered from brightness images. Experimental results demonstrate the effectiveness of our methods. Moreover, the classification accuracy, which is as high as 97%, demonstrates the effectiveness of using facial shape information for gender classification.
Keywords: Gender classification; Facial surface normals; Statistical model; Feature extraction; Principal geodesic analysis
1. Introduction

Gender classification plays a significant role during our social interactions, and it has attracted significant attention in the psychology literature [1–4]. Gender determination is also important in human–computer interaction, since it plays an important role in controlling the style of dialogue adopted. It is also a key issue in security and biometrics. Despite its importance, gender classification has not attracted significant attention in the computer vision literature. Most existing gender determination methods are based on 2D intensity images. However, the gender differences apparent from intensity appearance are subtle, and are easily affected by changes in lighting conditions or the application of facial makeup. Compared to the 2D facial appearance conveyed by brightness images, 3D facial shape provides a more reliable information source for surveillance purposes, since it cannot be easily modified and is not affected by changes in lighting. Moreover, psychologists have shown that gender is revealed by both the 2D texture and the 3D structure of the human face [1]. In fact, it has been shown that gender classification is more effective when 3D structure is used than when image brightness information alone is used [3]. However, relatively little effort has been expended in determining the role of 3D facial shape in machine gender classification [5–7]. This is partly attributable to the more complex computations required in
3D facial shape analysis, and the limited effectiveness and high cost of the 3D sensors currently available. An alternative is to use a 2.5D representation based on facial surface normals, or facial needle-maps. These reveal facial shape information from a fixed viewpoint, and can be recovered from 2D brightness images using techniques such as shape-from-shading (SFS). Therefore, in this paper, we pursue the development of a gender classification method based on facial needle-maps rather than on intensity images. Having established the relative usefulness of both 2D intensity and 3D shape representations, our main concern in this paper is the development of improved statistical techniques for gender classification. For gender classification, and for face recognition in general, there are two steps, namely (a) feature extraction and (b) classification. This two-step structure has been widely accepted for some time. In fact, even the earliest gender classification system, 'SEXNET' [8], adhered to it, employing a two-stage network: the first stage compressed a 900-unit image to just 40 representational units, and the second stage classified each face using the 40-unit representation. Cottrell et al. [9] reduced the image dimensionality from 4096 to 40 via an autoencoder, which is similar in function to principal component analysis (PCA). In fact, there are several gender classification approaches using PCA-based features [6,10,11]. These include active appearance models (AAM) [12,13]. Recently, there has been a trend to emphasize the role of the classifier, based on the successful application of support vector machines (SVMs) and AdaBoost to gender classification [14,15]. In [14,15], feature extraction was simplified to the use of 'thumbnail' images. However, although using these
classifiers achieved a higher accuracy than using PCA-based features alone, it is non-trivial to determine the optimal parameters of a complex classifier system. Here we take a different view, and place the emphasis on the role of feature extraction in gender classification. This is based on observations concerning the human perceptual system, which utilizes just a very few of the most important attributes [16]. Well selected gender discriminating features can save considerable effort in the classification stage. Moreover, they may indicate the inherent differences between the genders. Therefore, in this paper, we focus on how to construct discriminating models for gender feature extraction. We aim to improve the discriminating power of PCA-based features, and to demonstrate the effectiveness of using these discriminating features in conjunction with simple classifiers.

1.1. Related literature

Based on the type of features used, there are two main approaches to gender classification. The first uses geometry-based features. A popular approach is that of Brunelli and Poggio [17], who defined 16 geometrical features as input to train two competing HyperBF networks that were then used to classify gender. A correct classification rate of 79% was reported for 'unseen faces'. In Burton et al.'s work [2] from psychology, 73 points from full-face views and 34 points from profile views were extracted, and an accuracy approaching human performance (94%) was reported. More pertinent to our work is the second, appearance-based approach to gender classification. This aims to make use of the full facial image without extracting any geometric features. As mentioned above, many such approaches emphasize the role of the classifier, and use 'thumbnail' images as input. Gutta and Wechsler [18] used hybrid classifiers consisting of an ensemble of RBF networks together with inductive decision trees. On average, an accuracy of 96% was reported for gender classification using this technique. Moghaddam and Yang [14] demonstrated that non-linear SVMs were superior to traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, nearest neighbor), RBF classifiers, and large ensemble-RBF networks. An accuracy of better than 96% was reported using an SVM with Gaussian RBF kernels. Recently, Kim et al. [19] performed gender classification using Gaussian process classifiers, which are a class of Bayesian kernel classifiers. This technique overcame the difficulty encountered by SVMs in determining the hyperparameters of the kernels. Another popular approach to gender classification is to use AdaBoost learning methods. This type of classifier is much faster than SVMs, and represents a better choice for real-time applications. Shakhnarovich et al. [20] applied a thresholded weak classifier variant of AdaBoost to detected face images, and achieved an accuracy of 78% for gender classification. Also using a weak classifier AdaBoost approach, Wu et al. [21] used a look-up table to learn gender classifiers, and achieved an accuracy of 88%. More recently, Baluja and Rowley [15] used AdaBoost with pixel comparisons, and achieved 93% accuracy on 20×20 pixel images. Moreover, the method was 50 times faster than SVM-based classifiers. Combinations of classifiers have also been used in gender classification. Khan et al.
[22] used genetic programming (GP) to combine four traditional classifiers (k-means, k-nearest neighbors, linear discriminant analysis, and Mahalanobis distance based classifiers) and evolved an optimum combined classifier (OCC). They demonstrated that the OCC not only improves gender classification performance, but also needs fewer features than the original classifiers. They also applied GP to combine SVM classifiers [23], and demonstrated the improved performance of the resulting OCC. Moreover, the GP combination
scheme automatically incorporated the optimal kernel function and model selection in SVMs to achieve a high-performance classification model. As noted above, in this paper we take the view that feature extraction is also an important issue in gender classification. Besides the above mentioned SEXNET system [8] and the work of Cottrell et al. [9], there are many other methods that make use of feature extraction for gender classification. Sun et al. [10] applied genetic algorithms to the extracted PCA feature vectors to select a gender discriminating feature subset, and compared the classification results obtained using four different classifiers. The lowest error rate of 4.7% was achieved using an SVM classifier. Buchala et al. [11] also explored which were the most gender discriminating PCA feature components using linear discriminant analysis (LDA). A gender classification accuracy of 86.43% was reported based on the selected feature subset. Recently, active appearance models (AAM) [24], which describe the statistical variations of both gray scale values and shape, have been used as a feature extraction mechanism for gender classification. In [12], the AAM was compared with independent component analysis (ICA) for feature extraction using four classifiers, namely the nearest neighbor, RBF, multilayer perceptron (MLP), and generalized learning vector quantization classifiers. The best accuracy, of over 90%, was obtained using AAM features and the MLP classifier. Saatci and Town [13] utilized AAM features and SVMs for both gender and expression recognition. They also investigated the interdependency of gender recognition upon expression. In addition to these PCA-based feature extraction methods, alternative feature extraction (or dimensionality reduction) methods, such as ICA [25], locally linear embedding (LLE) [6] and curvilinear component analysis [26], have also been applied to gender classification. The above approaches are all based on 2D intensity images. Although 3D facial shape has its advantages for gender classification, little work has utilized this information. Graf and Wichmann [6] applied PCA and LLE to 3D range images of human heads, and used an SVM as the classifier. An accuracy of 93.4% was reported using PCA and SVM in conjunction. Lu and Chen [5] exploited range information for gender classification, and proposed an integration scheme which combined the registered range and intensity images. Their experiments demonstrated that integrating the 3D range information provided better classification accuracy than using the 2D intensity alone. The accuracy of the combined method was 91%. Hu et al. [7] explored the gender significance of different facial regions in terms of 3D facial shape, and proposed a fusion method to combine the classification results of different regions. An accuracy as high as 94.3% was reported. In our previous work [27,28], we explored the use of principal geodesic analysis (PGA) [29] and a model-based shape-from-shading technique [30,31] to recover fields of facial surface normals from brightness images. Using the PGA parameters of the fitted model and a simple Bayes classifier, gender classification accuracies of 95% for ground-truth facial needle-maps and 90% for needle-maps recovered using shape-from-shading were achieved.

1.2. Paper overview

In this paper, we focus on the task of constructing statistical models for extracting gender discriminating features from 2.5D facial needle-maps.
The feature extraction techniques for gender classification described above are mostly based on PCA, which captures the projections that maximize the variance of the data. However, the projections that maximize the variance are usually not those that best separate the data into distinct clusters. As a result, the leading PCA features do not reliably reveal gender differences. In our previous work [32–34], we proposed the idea of weighting the facial needle-maps to take into account the
relevance of different locations in determining gender. The aim in doing this is to improve the gender discriminating capacity of the leading features extracted using the standard eigenspace method. In this paper, we (a) describe our previous ideas in more detail and more clearly, (b) advance our previous work by proposing a learning strategy to find the optimal weight map, and (c) compare the results obtained using these weighting strategies on both range data and recovered facial needle-maps. The results also demonstrate the feasibility of gender classification using the facial shape information contained within the 2.5D facial needle-map. In our work, the shape of a face is represented by a field of facial surface normals which reside on a spherical manifold rather than in a Euclidean space. Linear data analysis techniques such as PCA are not suitable for the analysis of directional data of this type. We therefore commence by showing how to construct an eigenspace model for 2.5D facial surface normals using principal geodesic analysis. We also review the principal geodesic SFS method [31], which is used in our experiments to recover the facial needle-maps from intensity images. Turning our attention to the extraction of gender discriminating features, there are a number of ways to enhance the discriminating power of the standard PGA model. In our previous work [32,34], we presented the most straightforward strategy, referred to as weighted PGA. The idea is to incorporate a pre-computed weight map into the PGA model construction process. Research in the psychology literature [4,35] has confirmed that information concerning gender is not uniformly distributed over the face. Work in computer vision has also explored the role of facial regions in gender recognition [36]. The pre-computed weight map quantifies the importance of different facial regions for gender classification. It enables us to control the structure of the data variance so that the variance associated with gender discriminating regions is larger than that for non-discriminating regions, and is captured by the leading principal eigenmodes. The weight map could be obtained through psychological experiments on human subjects [35]. For simplicity, however, we construct the weight map using the angular difference between the mean faces of the two genders. The main contribution of this paper is a supervised version of the above method, termed supervised weighted PGA. Here the weight map is learned from the labeled data by minimizing an error function. The weight map is applied during the feature extraction process rather than during model construction. It is therefore specific to the training data, and enhances the discriminating power of the leading features without affecting the efficiency of the model construction process. Experimental results show that by sacrificing some of the universality of the weight map, the supervised weighted PGA method achieves better classification accuracy than the weighted PGA method. We also describe a third strategy (the idea was proposed in our previous work [33,34]) to control the gender discriminating power of the constructed model. The method, referred to as supervised PGA, is an extension of the supervised PCA technique [37] from Euclidean data to non-Euclidean data residing on a Riemannian manifold. According to this approach, PGA is viewed as locating the projection that maximizes the sum of pairwise distances between the projected data.
Supervised PGA incorporates a weight for each pair of data that indicates their dissimilarity. This weight map encodes the pairwise gender relationships residing in the data. By making use of this weight map, supervised PGA constructs a gender discriminating model that emphasizes the inter-cluster separation. Experimental results show that supervised PGA slightly outperforms weighted PGA. However, it is relatively specific to the training data used. We compare the gender discriminating power of the extracted features and the classification accuracies obtained using the three
different statistical models. We also compare the results with those obtained using the standard PGA model. The outline of this paper is as follows. Section 2 gives a brief review of the standard PGA method and the principal geodesic SFS method, which form the theoretical background of this paper. In Sections 3–5, we respectively describe the three different strategies (weighted PGA, supervised weighted PGA, and supervised PGA) used to enhance the gender discriminating power of the extracted features. Section 6 reviews the nearest neighbor classifier applied to the features. Experimental results and their discussion are given in Section 7. Finally, Section 8 concludes the paper and offers directions for future investigation.
2. Theoretical background

In this section, we review the principal geodesic analysis technique and the principal geodesic SFS method. PGA is the theoretical foundation of the three novel variants of PGA proposed in this paper. The principal geodesic SFS method is used in our experiments to recover facial shapes from intensity images.

2.1. Principal geodesic analysis

PGA is a generalization of PCA from Euclidean data to data residing on a Riemannian manifold. It makes use of exponential/log maps and intrinsic means [29,38] to project data onto a plane and analyze the data in the projected space. In our application, a surface normal $n$ can be represented as a point residing on a spherical manifold. The exponential/log maps for a spherical manifold are illustrated in Fig. 1. The exponential map, denoted by $\mathrm{Exp}_n$, maps $u$ to the point, denoted by $\mathrm{Exp}_n(u)$, on the geodesic in the direction of $u$ at distance $\|u\|$ from $n$. The log map, denoted by $\mathrm{Log}_n$, is the inverse of the exponential map. Intrinsic means minimize the sum-of-squared geodesic distances on a manifold. For a spherical manifold, the geodesic distance is the arc length between two points. The intrinsic mean $\mu$ of a set of surface normals $n_1, \ldots, n_K$ can be calculated using the gradient descent method proposed by Pennec [38]:

$$\mu^{(t+1)} = \mathrm{Exp}_{\mu^{(t)}}\left(\frac{1}{K}\sum_{k=1}^{K}\mathrm{Log}_{\mu^{(t)}}(n_k)\right). \qquad (1)$$

It has been explained in detail in our previous work [31,28] how to find the principal geodesics from a set of facial needle-maps. Fig. 2 illustrates the process. Suppose there are $K$ facial needle-maps, each having $N$ pixel locations. The surface normal at pixel location $l$ of the $k$th needle-map is $n_{kl}$. The left panel of Fig. 2 shows the distribution of surface normals at the pixel location $l$ ($n_{kl}$, $k = 1, \ldots, K$) with the mean $\mu_l$ shown as a star. The right panel of Fig. 2 shows the log mapped positions of these normals on the tangent plane passing through $\mu_l$. We denote the log mapped position of $n_{kl}$ as $u_{kl}$. By concatenating the $x,y$-coordinates of $u_{kl}$ at the $N$ pixel locations, we get the $2N$ dimensional log mapped long vector $u_k = [u_{k1x}, u_{k1y}, \ldots, u_{kNx}, u_{kNy}]^T$.
Fig. 1. The exponential map.
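To make these operations concrete, the following is a minimal Python sketch (illustrative code, not the authors' implementation; all function names are ours) of the spherical log/exponential maps and the intrinsic mean iteration of Eq. (1):

```python
import numpy as np

def log_map(mu, n):
    """Log map of unit normals n (rows of a (K, 3) array) at unit vector mu."""
    cos_theta = np.clip(n @ mu, -1.0, 1.0)
    theta = np.arccos(cos_theta)                    # geodesic (arc-length) distance
    perp = n - np.outer(cos_theta, mu)              # component orthogonal to mu
    norm = np.linalg.norm(perp, axis=1, keepdims=True)
    scale = np.where(norm > 1e-12, theta[:, None] / norm, 0.0)
    return perp * scale                             # tangent vectors of length theta

def exp_map(mu, u):
    """Exponential map of a tangent vector u at mu back onto the sphere."""
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return mu
    return np.cos(theta) * mu + np.sin(theta) * (u / theta)

def intrinsic_mean(normals, n_iter=50):
    """Fixed-point iteration of Eq. (1) for the intrinsic mean of unit normals."""
    mu = normals.mean(axis=0)
    mu /= np.linalg.norm(mu)                        # initialize with Euclidean mean
    for _ in range(n_iter):
        mu = exp_map(mu, log_map(mu, normals).mean(axis=0))
    return mu
```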
Fig. 2. Projection of surface normals on the unit sphere to points on the tangent plane at the mean.
Fig. 3. The steps of principal geodesic SFS.
The $K$ long vectors form the column-wise data matrix $U = [u_1 | \cdots | u_K]$, and the covariance matrix is $\Sigma = (1/K) U U^T$. Because $N$, the dimensionality of the facial needle-maps, is usually too large to make the manipulation of $\Sigma$ feasible, the numerically efficient snap-shot method of Sirovich [39] is used to compute the eigenvectors of $\Sigma$. The obtained eigenvector matrix (projection matrix) is denoted $\Phi = (e_1 | e_2 | \cdots | e_{K-1})$, where $e_i$, $i = 1, \ldots, K-1$, are the eigenvectors sorted by descending eigenvalue. Given a facial needle-map, the log mapped long vector $u = [u_{1x}, u_{1y}, \ldots, u_{Nx}, u_{Ny}]^T$ is computed, and the corresponding PGA parameter vector is $b = \Phi^T u$.

2.2. Principal geodesic shape-from-shading

A new iterative shape-from-shading method [31], referred to as principal geodesic SFS, is used in our experiments to recover the facial needle-maps from intensity images. The steps of this SFS method during each iteration are illustrated in Fig. 3. The estimated surface normals are first projected into a space spanned by a statistical model to satisfy a strict global shape constraint. Then the image irradiance equation is imposed as a hard local brightness constraint [40]. Upon convergence, there are two types of recovered needle-maps. One is an instance of the statistical model (referred to as the best fit needle-map); the other is the one satisfying data-closeness. Since the statistical model is constructed by applying PGA to a set of ground-truth facial needle-maps, it captures the distribution of surface normals of real faces. As a result, the projection into the model space guarantees that the recovered needle-maps represent valid human faces. A detailed description and an algorithm for this SFS method can be found in [31,28].
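As an illustration of the snap-shot construction, the following Python sketch (our own illustrative fragment, assuming the log mapped long vectors are already stacked as the columns of U) obtains the eigenvectors of $\Sigma$ from the small $K \times K$ Gram matrix:

```python
import numpy as np

def pga_model(U):
    """Snap-shot PGA: U is the (2N, K) matrix of log-mapped long vectors."""
    K = U.shape[1]
    gram = (U.T @ U) / K                      # K x K snap-shot matrix
    lam, V = np.linalg.eigh(gram)             # eigenvalues in ascending order
    lam, V = lam[::-1][:K - 1], V[:, ::-1][:, :K - 1]   # K-1 leading modes
    Phi = U @ V                               # lift to the long-vector space
    Phi /= np.maximum(np.linalg.norm(Phi, axis=0), 1e-12)  # unit columns
    return Phi, lam

# PGA parameter vector of a face with log-mapped long vector u: b = Phi.T @ u
```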
3. Weighted principal geodesic analysis

In this section, we describe the weighted PGA method, which is the most straightforward strategy to increase the gender discriminating capacity of the standard PGA model. A pre-computed weight map, which emphasizes the importance of the gender discriminating regions, is incorporated into the PGA model
construction. By component-wise multiplication of the weight map with the face data, the variance of the data in the gender discriminating regions is increased. As a result, the leading eigenmodes, which capture the largest variance, will encode more gender discriminating information than the non-leading eigenmodes. There are two points that must be addressed when implementing the weighted PGA method. The first is the construction of the weight map. The second is how to use the weight map for gender classification.

3.1. Construction of the weight map

The weight map is a representation of the distribution of gender discriminating information over the facial surface. It assigns a weight to each location on the facial needle-map. The locations of the gender discriminating regions, such as the eyebrows, nose, etc., are assigned high weights ($w_{high}$), while the remaining locations are assigned low weights ($w_{low}$). In this way, we control the data variance structure so that the surface normals in the gender discriminating regions have a variance that is a factor $w_{high}/w_{low}$ larger than that of the surface normals in the non-discriminating regions. An optimal weight map would ideally give a gender discrimination that is consistent with that of a human subject, and would be independent of the training data. However, this optimal quantification is difficult. In this paper, for simplicity, we construct the weight map from the angular difference between the intrinsic means of the female and male facial needle-maps. At the pixel location indexed $l$, the weight is

$$w_l = 1 - \exp\left(-\frac{1}{\sigma^2}\left[\arccos(\bar{n}^f_l \cdot \bar{n}^m_l)\right]^2\right), \qquad (2)$$

where $\bar{n}^m_l$ is the mean unit surface normal for males at the image location $l$, and $\bar{n}^f_l$ is the corresponding mean unit surface normal for females. The weight map is $W = [w_1, \ldots, w_N]^T$. Since the angular difference at each pixel location between the two gender means is small, this construction is consistent with the small angle behavior of the von Mises–Fisher distribution [41]. By making use of the intrinsic means, the constructed weight map is less influenced by the differences in facial shape between subjects of the same gender. To construct the weight map, we need to determine the optimal value of $\sigma$ in Eq. (2). Our optimality criterion is to select the value of $\sigma$ that gives the weight map with which the leading $d$ eigenvectors possess the largest gender discriminating capacity. The discriminating capacity is computed using the criterion function introduced in [16]:

$$J(Y) = \mathrm{tr}(S_w^{-1} S_b) = \sum_{i=1}^{d} \lambda_i, \qquad (3)$$
where $\lambda_i$, $i = 1, \ldots, d$ are the eigenvalues of the matrix $S_w^{-1} S_b$. Here, $S_w$ and $S_b$ are the within-class and between-class scatter matrices in the feature space, i.e.

$$S_w = \sum_{c} \sum_{k=1}^{K_c} (b_k - \hat{b}_c)(b_k - \hat{b}_c)^T, \qquad (4)$$

$$S_b = \sum_{c} K_c (\hat{b}_c - \hat{b})(\hat{b}_c - \hat{b})^T, \qquad (5)$$
where $c \in \{f, m\}$ denotes the set of class labels for the two genders, $K_c$ is the number of labeled samples in class $c$, and $b_k$, $\hat{b}_c$, and $\hat{b}$ are respectively the feature vectors of the $k$th sample, the mean for class $c$, and the global mean for the entire training set.

3.2. Applying the weight map to needle-maps

An $N$ dimensional needle-map is a point residing on the manifold $S^2(N) = \prod_{i=1}^{N} S^2$. Suppose there are $K$ such facial needle-maps. We make use of the log map to project each needle-map onto the tangent plane passing through the intrinsic mean, and represent them by the matrix of log mapped long vectors $U = [u_1 | \cdots | u_K]$. The weight map is multiplied component-wise with each long vector, and we obtain the set of weighted data $WU = [W \circ u_1 | \cdots | W \circ u_K]$, where $\circ$ denotes component-wise multiplication. Because each long vector has two components for each pixel location, $u = [u_{1x}, u_{1y}, \ldots, u_{Nx}, u_{Ny}]^T$, the weight map is applied separately to the two components at each location, i.e. $W \circ u = [w_1 u_{1x}, w_1 u_{1y}, \ldots, w_N u_{Nx}, w_N u_{Ny}]^T$. The covariance matrix for the set of $K$ needle-maps is constructed from the weighted data as follows:
$$\Sigma^W = \frac{1}{K}(WU)(WU)^T. \qquad (6)$$
We use the numerically efficient snap-shot method of Sirovich [39] to compute the eigenvectors of $\Sigma^W$. The $K-1$ leading eigenvectors $e_1, e_2, \ldots, e_{K-1}$ of $\Sigma^W$ form the projection matrix $\Phi^W = [e_1 | e_2 | \cdots | e_{K-1}]$. The projection matrix, together with the corresponding eigenvalues $\Lambda^W = [\lambda_1, \ldots, \lambda_{K-1}]$ and the intrinsic mean $\mu^W$, constitutes the parameters of the weighted PGA model. Given a facial needle-map, we first compute its long vector $u$ via the intrinsic mean and log map; the weighted PGA feature vector is then given by

$$b^W = (\Phi^W)^T (W \circ u), \qquad (7)$$

where $\circ$ denotes component-wise multiplication.
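A sketch of this pipeline in Python (illustrative only; `pga_model` is the hypothetical snap-shot routine sketched in Section 2.1, and `sigma2` is the bandwidth $\sigma^2$ selected as in Section 3.1):

```python
import numpy as np

def weight_map(mean_f, mean_m, sigma2):
    """Eq. (2): mean_f, mean_m are (N, 3) arrays of mean unit surface normals."""
    dots = np.clip(np.sum(mean_f * mean_m, axis=1), -1.0, 1.0)
    ang = np.arccos(dots)                     # per-pixel angular difference
    w = 1.0 - np.exp(-ang ** 2 / sigma2)
    return np.repeat(w, 2)                    # one weight per x/y component

def weighted_pga(U, w):
    """Eq. (6): apply the weights component-wise, then run snap-shot PGA."""
    return pga_model(U * w[:, None])

# Eq. (7): weighted PGA features of a new log-mapped long vector u:
# Phi_w, lam_w = weighted_pga(U, w);  b_w = Phi_w.T @ (w * u)
```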
4. Supervised weighted principal geodesic analysis

In this section, we describe the supervised weighted PGA method. It draws on the same framework as the weighted PGA method, but differs in two important and novel respects. First, the weight map in the weighted PGA method was constructed from the mean faces of the two genders. As a result, it does not make full use of the information provided by the labeled data. The supervised weighted PGA method enhances the weighted PGA method by using an iterative learning process to construct the weight map. The second novel ingredient is that in weighted PGA, the weight map is applied during model construction. In supervised weighted PGA, on the other hand, the weight map is applied during feature extraction (which is performed after the construction of the model). We have two main objectives in developing the supervised weighted PGA method. The first is to learn the gender relevant weight map from labeled facial needle-maps. The second is to extract gender discriminating features by making use of the weight map.
4.1. Feature extraction

Suppose there are $K$ facial needle-maps, each having $N$ pixel locations. Through principal geodesic analysis, the projection matrix $\Phi = [e_1 | \cdots | e_{K-1}]$ and the corresponding eigenvalues $\Lambda = [\lambda_1, \ldots, \lambda_{K-1}]$ are obtained, together with the intrinsic mean $\mu$. These constitute the standard PGA model. Given a facial needle-map $n$, through the log map $\mathrm{Log}_\mu(n)$, the $2N$ dimensional long vector $u = [u_{1x}, u_{1y}, \ldots, u_{Nx}, u_{Ny}]^T$ is obtained. The standard PGA feature vector is $b = \Phi^T u$, which can be expressed component-wise as

$$b_i = \sum_{l=1}^{2N} \Phi^T_{il} u_l, \qquad (8)$$
where $\Phi_i$ denotes the $i$th eigenvector, and $\Phi_{il}$ is its value at the location $l$. Supervised weighted PGA incorporates a gender relevant weight map to improve the gender discriminating capacity of the standard PGA model. It extends the component-wise feature extraction of Eq. (8) by applying the weight map in the following way:

$$b^{SW}_i = \sum_{l=1}^{2N} \Phi^T_{il} w_l u_l, \qquad (9)$$
where $w_l$ is the weight at the location $l$. Because the weight map has a large absolute value in gender discriminating regions, supervised weighted PGA increases the influence of the gender discriminating regions over the extracted features and decreases that of the non-discriminating regions. In this way, we increase the discriminating capacity of the standard PGA model. Eq. (9) can be rewritten as

$$b^{SW} = \Phi^T (W \circ u), \qquad (10)$$
where $\circ$ denotes component-wise multiplication, and $W = [w_1, w_2, \ldots, w_{2N-1}, w_{2N}]^T$. This directly relates the supervised weighted PGA method to the weighted PGA method. Both methods multiply the weight map component-wise with the facial data. However, in weighted PGA, this multiplication takes place before the construction of the model. As a result, the model is modified by the weight map. In supervised weighted PGA, on the other hand, this multiplication takes place during feature extraction. The intrinsic mean and projection matrix are the same as those of the standard PGA model. Moreover, the weight map in supervised weighted PGA is optimized to the training data through a learning process.

4.2. Optimization of the weight map

Suppose there are $K$ labeled data, where the data with indices 1 to $n_f$ are labeled female, and the data with indices $n_f + 1$ to $K$ are labeled male. Using the labeled data, the weight map is optimized to minimize a total error function
$$\xi = \sum_{k=1}^{n_f} \frac{\mathrm{dist}_W(b_k, \hat{b}_f)^2}{\mathrm{dist}_W(b_k, \hat{b}_m)^2} + \sum_{k=n_f+1}^{K} \frac{\mathrm{dist}_W(b_k, \hat{b}_m)^2}{\mathrm{dist}_W(b_k, \hat{b}_f)^2}, \qquad (11)$$
where $b_k$, $\hat{b}_f$, and $\hat{b}_m$ are respectively the $d$ dimensional supervised weighted PGA feature vectors of the $k$th facial needle-map, the intrinsic mean for the females, and the intrinsic mean for the males, and $\mathrm{dist}_W$ is the weighted Euclidean distance between two feature vectors, defined as

$$\mathrm{dist}_W(b_k, \hat{b}_c) = \sqrt{\sum_{i=1}^{d} \left(\frac{\lambda_i}{\sum_{j=1}^{K-1} \lambda_j}\right) (b_{ki} - \hat{b}_{ci})^2}, \qquad (12)$$
where $c \in \{f, m\}$, $\lambda_i$ is the eigenvalue corresponding to the $i$th eigenvector of the projection matrix $\Phi$, and $b_{ki}$ and $\hat{b}_{ci}$ are respectively the $i$th components of $b_k$ and $\hat{b}_c$.
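For concreteness, a Python sketch of this objective (illustrative only; features follow Eq. (10), and we take the class means $\hat{b}_f$, $\hat{b}_m$ as the feature-space means of the labeled samples):

```python
import numpy as np

def features_sw(Phi, W, U, d):
    """Eq. (10): leading-d supervised weighted PGA features of the columns of U."""
    return Phi[:, :d].T @ (U * W[:, None])    # (d, K)

def total_error(Phi, lam, W, U, n_f, d):
    """Eq. (11) with the weighted distance of Eq. (12)."""
    B = features_sw(Phi, W, U, d)
    scale = (lam[:d] / lam.sum())[:, None]    # eigenvalue weighting of Eq. (12)
    b_f = B[:, :n_f].mean(axis=1, keepdims=True)   # female class mean
    b_m = B[:, n_f:].mean(axis=1, keepdims=True)   # male class mean
    d2_f = (scale * (B - b_f) ** 2).sum(axis=0)    # squared weighted distances
    d2_m = (scale * (B - b_m) ** 2).sum(axis=0)
    return (d2_f[:n_f] / d2_m[:n_f]).sum() + (d2_m[n_f:] / d2_f[n_f:]).sum()

# Gradient-descent update of Eq. (15): W <- W - eta * grad(total_error)(W),
# with the analytic gradient given by Eqs. (16)-(18).
```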
We construct this total error function based on the assumption that, when measured in terms of Euclidean distance, faces are closest to the mean face of the same gender. The minimum value of this error function is zero, and this is achieved when $\mathrm{dist}(b_k, \hat{b}_f) = 0$ for all female faces and $\mathrm{dist}(b_k, \hat{b}_m) = 0$ for all male faces. When this is the case, the data are clustered at the mean for their appropriate gender. As a result, the optimization process aims to concentrate the faces around the appropriate mean face in the feature space. We use the weighted Euclidean distance instead of the standard Euclidean distance in order to emphasize the influence of the leading eigen-features on the calculation of the distance. Substituting Eq. (9) into Eqs. (11) and (12), the total error function can be written as

$$\xi = \sum_{k=1}^{n_f} \frac{q_{kf}}{q_{km}} + \sum_{k=n_f+1}^{K} \frac{q_{km}}{q_{kf}}, \qquad (13)$$

where

$$q_{kc} = \sum_{i=1}^{d} \left(\frac{\lambda_i}{\sum_{j=1}^{K-1} \lambda_j}\right) \left\{\sum_{l=1}^{2N} \Phi^T_{il} W_l (u_{kl} - \hat{u}_{cl})\right\}^2, \qquad (14)$$

where $c \in \{f, m\}$, $u_k$ and $\hat{u}_c$ are respectively the $2N$ dimensional long vectors of the $k$th needle-map and the intrinsic mean of class $c$, and $u_{kl}$ and $\hat{u}_{cl}$ are respectively the values of $u_k$ and $\hat{u}_c$ at the location $l$. Since the projection matrix $\Phi$, the matrix of eigenvalues $\Lambda$, and the labeled data $[u_1, \ldots, u_K]$ are all fixed, the total error function varies only with the weight map $W$. We use gradient descent to optimize the weight map, that is

$$W^{(t+1)} = W^{(t)} - \eta \nabla\xi(W^{(t)}), \qquad (15)$$

where $\eta$ is the update step size, which is chosen to have a value in the interval (0,1). Suppose that $W = [w_1, \ldots, w_{2N}]^T$, where $2N$ is the dimensionality of the log mapped long vectors of the facial needle-maps; then

$$\nabla\xi(W) = \left[\frac{\partial \xi(W)}{\partial w_1}, \ldots, \frac{\partial \xi(W)}{\partial w_{2N}}\right]^T. \qquad (16)$$

The partial derivatives of the error function are

$$\frac{\partial \xi(W)}{\partial w_x} = \sum_{k=1}^{n_f} \frac{\frac{\partial q_{kf}(W)}{\partial w_x} q_{km}(W) - q_{kf}(W) \frac{\partial q_{km}(W)}{\partial w_x}}{q_{km}(W)^2} + \sum_{k=n_f+1}^{K} \frac{\frac{\partial q_{km}(W)}{\partial w_x} q_{kf}(W) - q_{km}(W) \frac{\partial q_{kf}(W)}{\partial w_x}}{q_{kf}(W)^2}, \qquad (17)$$

where

$$\frac{\partial q_{kc}(W)}{\partial w_x} = 2 \sum_{i=1}^{d} \left(\frac{\lambda_i}{\sum_{j=1}^{K-1} \lambda_j}\right) (u_{kx} - \hat{u}_{cx}) \Phi^T_{ix} \left\{\sum_{l=1}^{2N} \Phi^T_{il} W_l (u_{kl} - \hat{u}_{cl})\right\} \qquad (18)$$

and $c \in \{f, m\}$. Since the optimization is performed using all of the available labeled data, the weight map is specific to the training data. This is distinct from the case where the weight map is constructed using only the information provided by the mean needle-maps for the two genders.

4.3. Relationship with LDA

Linear discriminant analysis (LDA) finds the projection $\omega$ that maximizes the ratio of the between-class scatter to the within-class scatter

$$\xi' = \frac{|\omega^T S_B \omega|}{|\omega^T S_W \omega|}, \qquad (19)$$

where $|\cdot|$ is the determinant of a matrix, and $S_B$, $S_W$ are respectively the between-class and within-class scatter matrices in the original data space. In our application, suppose there are $K$ labeled data, where the data with indices 1 to $n_f$ are labeled female and the data with indices $n_f + 1$ to $K$ are labeled male; then

$$S_B = n_f \hat{u}_f \hat{u}_f^T + (K - n_f) \hat{u}_m \hat{u}_m^T, \qquad (20)$$

$$S_W = \sum_{k=1}^{n_f} (u_k - \hat{u}_f)(u_k - \hat{u}_f)^T + \sum_{k=n_f+1}^{K} (u_k - \hat{u}_m)(u_k - \hat{u}_m)^T, \qquad (21)$$

where $u_k$, $\hat{u}_f$, and $\hat{u}_m$ are respectively the $2N$ dimensional long vectors of the $k$th needle-map, the female intrinsic mean, and the male intrinsic mean. Substituting Eqs. (20) and (21) into Eq. (19),

$$\xi' = \frac{|n_f (\omega^T \hat{u}_f)(\omega^T \hat{u}_f)^T + (K - n_f)(\omega^T \hat{u}_m)(\omega^T \hat{u}_m)^T|}{\left|\sum_{k=1}^{n_f} (\omega^T u_k - \omega^T \hat{u}_f)(\omega^T u_k - \omega^T \hat{u}_f)^T + \sum_{k=n_f+1}^{K} (\omega^T u_k - \omega^T \hat{u}_m)(\omega^T u_k - \omega^T \hat{u}_m)^T\right|} = \frac{|n_f \hat{b}_f \hat{b}_f^T + (K - n_f) \hat{b}_m \hat{b}_m^T|}{\left|\sum_{k=1}^{n_f} (b_k - \hat{b}_f)(b_k - \hat{b}_f)^T + \sum_{k=n_f+1}^{K} (b_k - \hat{b}_m)(b_k - \hat{b}_m)^T\right|}, \qquad (22)$$

where $b_k$, $\hat{b}_f$, and $\hat{b}_m$ are respectively the LDA feature vectors of the $k$th datum, the female mean, and the male mean. Since for gender classification there is only one eigenvector in $\omega$ with non-zero eigenvalue, $b_k$, $\hat{b}_f$, and $\hat{b}_m$ are all scalar values. As a result, the $|\cdot|$ operation in Eq. (22) can be ignored, and Eq. (22) simplifies to

$$\xi' = \frac{n_f\, \mathrm{dist}_E(\hat{b}_f, \hat{b}) + (K - n_f)\, \mathrm{dist}_E(\hat{b}_m, \hat{b})}{\sum_{k=1}^{n_f} \mathrm{dist}_E(b_k, \hat{b}_f) + \sum_{k=n_f+1}^{K} \mathrm{dist}_E(b_k, \hat{b}_m)}, \qquad (23)$$

where $\mathrm{dist}_E(\cdot,\cdot)$ is the Euclidean distance, and in our application the mean of the data $\hat{b} = 0$. LDA locates the projection that maximizes $\xi'$. This is equivalent to finding the projection that minimizes

$$\xi'' = \frac{1}{\xi'} = \frac{\sum_{k=1}^{n_f} \mathrm{dist}_E(b_k, \hat{b}_f) + \sum_{k=n_f+1}^{K} \mathrm{dist}_E(b_k, \hat{b}_m)}{n_f\, \mathrm{dist}_E(\hat{b}_f, \hat{b}) + (K - n_f)\, \mathrm{dist}_E(\hat{b}_m, \hat{b})}. \qquad (24)$$

This relates LDA to supervised weighted PGA. Both methods aim to locate the projection that minimizes an error function in the projected feature space. However, in gender classification, the features in the LDA error function (Eq. (24)) are scalar, while the features in the supervised weighted PGA error function (Eq. (11)) are vectors.

5. Supervised principal geodesic analysis

In this section, we consider a third gender classification strategy which makes use of pairwise information to improve the discriminating capacity of the constructed model using labeled data. It is an extension of supervised principal component analysis (supervised PCA) [37] to data residing on a Riemannian manifold. We refer to this method as supervised principal geodesic analysis (supervised PGA).

5.1. Supervised PCA

Koren and Carmel analyzed the relationship between PCA and multidimensional scaling [37], and concluded that PCA computes
the projection that maximizes

$$\xi = \sum_{k_1 < k_2} \left(\mathrm{dist}_E(b_{k_1}, b_{k_2})\right)^2, \qquad (25)$$

where

$$\mathrm{dist}_E(b_{k_1}, b_{k_2}) = \sqrt{\sum_{i=1}^{d} (b_{k_1 i} - b_{k_2 i})^2} \qquad (26)$$

is the Euclidean distance between the data indexed $k_1$ and $k_2$ in the projected feature space. Based on this observation, they generalized PCA by incorporating a symmetric and non-negative pairwise weight matrix $\{w_{k_1 k_2}\}_{k_1, k_2 = 1}^{K}$, where $w_{k_1 k_2}$ is a dissimilarity measure that gauges the importance of placing the data points $k_1$ and $k_2$ further apart in the projected space. This generalized PCA method seeks the projection that maximizes the sum of weighted squared pairwise distances

$$\xi = \sum_{k_1 < k_2} w_{k_1 k_2}\, \mathrm{dist}_E(b_{k_1}, b_{k_2})^2. \qquad (27)$$
It has been proved by Koren and Carmel [37] that the projection maximizing Eq. (27) is obtained by taking the principal eigenvectors of the matrix $X^T L X$, where $L$ is the Laplacian associated with the dissimilarities, i.e.

$$L_{k_1 k_2} = \begin{cases} \sum_{k=1}^{K} w_{k k_2}, & k_1 = k_2, \\ -w_{k_1 k_2}, & k_1 \neq k_2, \end{cases} \qquad (28)$$

and $X$ is the row-wise data co-ordinate matrix. For labeled data, this weighted PCA method turns out to be equivalent to supervised PCA, which underweights the intra-cluster pairwise data dissimilarities:

$$w'_{k_1 k_2} = \begin{cases} t\, w_{k_1 k_2}, & k_1 \text{ and } k_2 \text{ have the same label}, \\ w_{k_1 k_2}, & \text{otherwise}, \end{cases} \qquad (29)$$

where $0 \le t < 1$ is a decay factor. In this way, supervised PCA constructs the projection that emphasizes the inter-cluster separation of the data.

5.2. Supervised PGA for facial needle-maps

As mentioned in [29,31,28], the principal geodesics in PGA can be approximated by applying standard PCA in the tangent plane passing through the intrinsic mean. A straightforward generalization is to replace the standard PCA in the tangent plane by the supervised method described above. This has the effect of enhancing the inter-cluster separability of the constructed PGA model. Making use of the log map, we first obtain the column-wise long vectors of the $K$ labeled facial needle-maps, $U = [u_1 | \cdots | u_K]$. For gender classification, we construct the dissimilarity matrix

$$w_{k_1 k_2} = \begin{cases} 0, & k_1 \text{ and } k_2 \text{ are of the same gender}, \\ 1, & \text{otherwise}, \end{cases} \qquad (30)$$

and the corresponding Laplacian matrix $L$. According to [37], the $K-1$ leading eigenvectors with the highest eigenvalues of the matrix $U L U^T$ form the projection in the long vector space: $\Phi^S = [e_1 | e_2 | \cdots | e_{K-1}]$. Together with the eigenvalues and intrinsic mean, we obtain the supervised PGA model. Given a facial needle-map, its long vector $u$ is first obtained by log mapping it onto the tangent plane passing through the mean. The corresponding supervised PGA features are calculated as

$$b^S = (\Phi^S)^T u. \qquad (31)$$

Since the Laplacian $L$ is a $K \times K$ symmetric positive semi-definite matrix, its eigenvalues are non-negative. Through eigen-decomposition, $L$ can be rewritten as

$$L = \Phi_L \Lambda_L \Phi_L^T = (\Phi_L \sqrt{\Lambda_L})(\Phi_L \sqrt{\Lambda_L})^T, \qquad (32)$$

where $\Phi_L$ and $\Lambda_L$ are the eigenvectors and eigenvalues of $L$. Accordingly,

$$U L U^T = U(\Phi_L \sqrt{\Lambda_L})(\Phi_L \sqrt{\Lambda_L})^T U^T = (U \Phi_L \sqrt{\Lambda_L})(U \Phi_L \sqrt{\Lambda_L})^T. \qquad (33)$$

Represented in this way, we can again use the numerically efficient snap-shot method of Sirovich [39] to compute the eigenvectors and eigenvalues of (33). Moreover, representing $U L U^T$ in this way, supervised PGA can be viewed as standard PGA on weighted long vectors. The weight map for each datum is the component-wise division of $(U \Phi_L \sqrt{\Lambda_L})$ by $U$, i.e.

$$[W_1, \ldots, W_K] = [u_1 \Phi_L \sqrt{\Lambda_L}\, ./\, u_1, \ldots, u_K \Phi_L \sqrt{\Lambda_L}\, ./\, u_K], \qquad (34)$$

where $W_k$, $k = 1, \ldots, K$ is the weight map for the $k$th datum, $u_k$, $k = 1, \ldots, K$ is the log mapped long vector of the $k$th needle-map, and $./$ denotes component-wise division. The weight maps are specific to each datum. This is different from the above two methods, which make use of a common weight map for the entire data set.
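A Python sketch of this construction (illustrative only; `labels` is a length-$K$ 0/1 gender vector, and all names are ours):

```python
import numpy as np

def supervised_pga(U, labels):
    """Build the Laplacian of Eqs. (28)-(30) and diagonalize ULU^T via Eq. (33)."""
    K = U.shape[1]
    w = (labels[:, None] != labels[None, :]).astype(float)   # Eq. (30)
    L = np.diag(w.sum(axis=1)) - w                           # Eq. (28)
    lamL, PhiL = np.linalg.eigh(L)                           # L is PSD
    A = U @ (PhiL * np.sqrt(np.clip(lamL, 0.0, None)))       # U Phi_L sqrt(Lambda_L)
    s, V = np.linalg.eigh(A.T @ A)                           # snap-shot step
    s, V = s[::-1][:K - 1], V[:, ::-1][:, :K - 1]
    Phi_s = A @ V
    Phi_s /= np.maximum(np.linalg.norm(Phi_s, axis=0), 1e-12)
    return Phi_s, s

# Eq. (31): supervised PGA features of a log-mapped long vector u: b_s = Phi_s.T @ u
```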
6. Classifier

With the extracted feature vectors of the training and test facial needle-maps at hand, we employ the nearest neighbor classifier to classify the test faces on the basis of gender. We use the Euclidean distance between the feature vector of the test face and those of the two mean faces. The mean faces of the two genders are obtained from the training data using intrinsic means. The test face is assigned to the gender with the closer mean in the feature space.
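A minimal sketch of this decision rule (illustrative Python, assuming the feature-space gender means have already been computed from the training data):

```python
import numpy as np

def classify_gender(b_test, b_mean_f, b_mean_m):
    """Assign the test feature vector to the gender with the closer mean."""
    d_f = np.linalg.norm(b_test - b_mean_f)
    d_m = np.linalg.norm(b_test - b_mean_m)
    return 'female' if d_f < d_m else 'male'
```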
7. Experiments and discussion

In this section, we present experiments using the three variants of the PGA method, and compare them to standard PGA. There are four aspects to this study. First, we explore the construction of the weight maps, and examine the gender discriminating capacity of the statistical models constructed from range data. Secondly, we apply the three statistical models to range data for feature extraction, and compare the gender classification performance. Here, we also compare our methods with Fisher faces, support faces, and three AdaBoost methods. Thirdly, we apply the constructed models to recovered needle-maps, and compare the results with human observers. Finally, we examine the effectiveness of our methods on the University of Notre Dame (UND) biometric database [42,43], and compare the results with those reported for existing methods.

7.1. Model construction

The surface normal data used to construct the gender discriminating statistical models are taken from the Max-Planck Face Database [44,45]. The database comprises 200 laser scanned (Cyberware) human heads without hair (100 females and 100 males). The ground-truth facial needle-maps are obtained by first orthographically projecting the facial range data onto a frontal view plane, and then cropping the plane to 142-by-124 pixels to retain only the inner part of the face. Finally, we compute the ground-truth surface normal at each pixel position using the height gradients of the processed range image. Using the ground-truth facial needle-maps, we first construct the weight map used in the weighted PGA method.
Fig. 4. Determination of the σ value.
Fig. 5. From left to right are the intrinsic mean of the females, intrinsic mean of the males, and the weight map.
Fig. 6. Error function values during the optimization process.
We need to determine the value of $\sigma$ in Eq. (2). As described in Section 3, the value of $\sigma$ is chosen as the one for which the constructed weight map gives the largest gender discriminating capacity for the leading $d$ weighted PGA eigenvectors. In our experiments we choose $d = 5$, because it has been shown in [46] that five gender discriminating features are sufficient for gender classification. We construct the weight map using 10 different $\sigma$ values, and obtain 10 different weighted PGA models. The discriminating capacity of the leading five eigenvectors for each of the 10 models is shown in Fig. 4. From the figure, we select $\sigma$ from the shoulder of the curve, which occurs when $\sigma^2 = 0.012$. The mean faces for the two genders and the constructed weight map are shown in Fig. 5. It is clear that the mean faces reveal gender information and eliminate any identity cues. The weight map reveals that there is greater gender difference in the regions around the eyebrows, nose and mouth. These are known to be gender discriminating regions from the psychology literature [4,35]. In supervised weighted PGA, the weight map is learned through an optimization process. The complete sample (100 females and 100 males) is used as labeled data. We set the step size $\eta = 0.5$ in the gradient descent method, and perform 6000 iterations. A plot of the error function (Eq. (11)) as a function of iteration number is shown in Fig. 6. The error decreases rapidly in the first 1000 iterations, and convergence is achieved after about 3000 iterations. The weight maps learned during the optimization process are shown in Fig. 7. The weight map is applied to the log mapped long vector, which has two components for each pixel location. The weight map is therefore displayed in two separate parts corresponding to the two components. From Fig. 7, it is clear that the weight map becomes more detailed as the process iterates. This means that it becomes more specific to the training data as the number of iterations increases. The regions emphasized are still the eyebrows and the areas around the nose and the mouth. As mentioned in Section 5, the weighted covariance matrix $U L U^T$ in supervised PGA can be represented as the matrix product $(U \Phi_L \sqrt{\Lambda_L})(U \Phi_L \sqrt{\Lambda_L})^T$. In this way, supervised PGA can be viewed as standard PGA on weighted long vectors. The weight map for the $k$th ($k = 1, \ldots, K$) datum is $W_k = u_k \Phi_L \sqrt{\Lambda_L}\, ./\, u_k$, where $u_k$ is the log mapped long vector of the $k$th needle-map, and $./$ denotes component-wise division. Some examples of the two parts of the weight maps are shown in Fig. 8. The weight maps in the supervised PGA model are specific to each datum, while the weight map in the supervised weighted PGA model is common to all the data. From the figure, it is clear that the calculated weight maps also give more importance to the areas which are used for gender discrimination by human observers. After obtaining the weight maps, we construct (a) the weighted PGA model, (b) the supervised weighted PGA model, and (c) the supervised PGA model using the ground-truth facial needle-maps. The standard PGA model is also constructed for comparison.
Fig. 7. Weight maps during the optimization process.
Fig. 8. Examples of the computed weight maps (each comprises two components) in supervised PGA model. The first two columns are for females and the last two columns are for males.
Fig. 9. Discriminating capacity of the eigenmodes (panels: weighted PGA, supervised weighted PGA, supervised PGA, standard PGA).
Each ground-truth facial needle-map is represented by the feature vectors extracted using the corresponding models. Using these feature vectors, we compute the discriminating capacity defined by Eq. (3) for each eigenmode of the four PGA models. The results for the leading 50 eigenmodes are shown in Fig. 9, from which it is clear that, compared to the standard PGA model, the weighted PGA model and the supervised PGA model transfer more discriminating power into the first eigenvector. However, the supervised weighted PGA model outperforms the alternative three models by significantly improving the gender discriminating power in each of the eigenmodes (note the larger scale on the y-axis). This confirms our assumption that by making use of a gender relevant weight map, the gender discriminating capacity
of the leading eigenmodes is improved. The improvement of the supervised weighted PGA model over the alternative two novel PGA models is attributable to the fact that the weight map used in the supervised weighted PGA model is learned from the training data. This makes it considerably more specific to the data than that used in the weighted PGA model. Although the supervised PGA model is also specific to the labeled data, it only emphasizes the inter-class separation. By minimizing the error function in Eq. (11), the supervised weighted PGA model clusters the data around the means of the two genders. It therefore not only increases the inter-class separation, but also reduces the intra-class separation through the weight optimization process.
Fig. 10. Visualization of leading two features.
This reduction of intra-class separation is clear in Fig. 10, which visualizes the 200 data using the leading two eigen-features. From the figure, the features extracted using the supervised weighted PGA model are better separated by gender, and are more concentrated within the same gender, than those extracted using the remaining three models. Compared to the supervised weighted PGA model, the improvement offered by the weighted PGA model and the supervised PGA model is not obvious.

7.2. Gender classification on needle-maps extracted from range images

In this section, we compare the gender classification performance obtained by applying the three gender discriminating models and the standard PGA model to the 200 facial needle-maps extracted from range images (from the Max-Planck database). A total of 160 needle-maps are randomly selected for use as training data, and the remaining 40 as test data. From the training needle-maps, we learn the weight maps and construct the gender discriminating models and the PGA model. When constructing the weight map for the weighted PGA model, we set $\sigma^2 = 0.012$, the value obtained in the model construction section above. Both the training and test data are represented by the feature vectors extracted using the PGA model and the three gender discriminating models. Gender classification is performed by applying the nearest neighbor classifier to the leading $m$ ($m = 1, 2, 5, 10, 20, 30, 50$) features. The average error rates are estimated with 10-fold cross validation (CV) for each value of $m$. The results are shown in Fig. 11.

Fig. 11. Classification error rates.

From the figure, there are some interesting effects that deserve comment. First, the supervised weighted PGA model significantly outperforms the alternative three models, no matter how many leading features are used. This is consistent with its significantly better performance in gender discrimination described above. Second, when $m \le 5$, the supervised PGA model and the weighted PGA model both achieve better gender classification results than the standard PGA model. This gives further confirmation of our assumption that incorporating gender relevant weights improves the discriminating capacity of the leading eigenmodes. Finally, irrespective of the PGA model used, the gender classification error rate reaches a minimum, and then either increases or remains constant with increasing $m$. This is consistent with human classification performance, which is based on just a very few of the most important attributes [16]. The redundant or irrelevant information contained in the higher dimensions causes a deterioration of the classification accuracy. In Table 1, we summarize the best classification accuracy obtained, together with the optimal number of leading features used, for the three gender discriminating models and the standard PGA model. In each column the accuracy obtained is greater than 91%, which indicates the feasibility of accurate gender classification based on facial shape information contained in facial needle-maps. Using the leading 10 features from supervised weighted PGA, we achieve the highest average classification accuracy of 97%.
Table 1
Gender classification accuracy.

                             Weighted PGA   Supervised weighted PGA   Supervised PGA   Standard PGA
Best accuracy (%)            91.75          97.00                     91.25            91.25
Dimension of features used   m = 30         m = 10                    m = 10           m = 20

Fig. 12. Comparison to Fisher faces.

Fig. 13. Gender classification using SVM with RBF kernels.
Accuracy (%)
Real AdaBoost
Gentle AdaBoost
Modest AdaBoost
96.25
95.75
94.50
The results of using SVM and AdaBoost confirm the important role of feature extraction in gender classification. Using only a few well selected features, we can achieve gender classification performance using simple classifiers which is as good as that obtained using complicated classifiers. 7.3. Fitting discriminating models to recovered needle-maps In this section, we first illustrate the performance obtained using principal geodesic SFS, and then apply the gender discriminating models to facial needle-maps recovered from intensity images. The intensity images comprise 13 female and 30 male subjects with neutral expression and no glasses. Images are captured under a single known light source direction using a Nikon D200 camera. Brightness normalization is required for these images. We use only the red channel of the color images. The intensity contrast is linearly stretched to normalize the ambient lighting variations. Finally, we use the method proposed in [51] to apply photometric correction and specularity subtraction to the intensity stretched images. Although histogram equalization is widely used for image brightness normalization, we do not use it because it modifies the distribution of intensity. This will affect the shape recovered using SFS since it distorts the physical reflectance model. After brightness normalization, the images are geometrically aligned using the centers of the eyes, the tip of nose, and the middle of the mouth. Finally, we crop the images to maintain only the inner facial region. We use the principal geodesic SFS method described in Section 2 to recover the facial needle-maps. The statistical model used in the SFS method is constructed from the 200 range images in the MaxPlanck database. Because of the nature of the principal geodesic SFS method, there are two sets of recovered needle-maps. The first satisfies the data-closeness constraint and satisfies Lambert’s law as a hard constraint. The second needle-map best fits the statistical shape model. Ten examples (five females and five males) of the recovered needle-maps and the corresponding integrated height surfaces are shown in Fig. 14. The needle-maps extracted from range images and their corresponding height surfaces are also shown for
Fig. 14. Examples of the results of principal geodesic SFS. From left to right: the input images, the recovered needle-maps satisfying data-closeness, the recovered needle-maps best fitting the model, and the ground-truth needle-maps; the fifth to seventh columns show the corresponding recovered surfaces. From top to bottom, the first five rows are females and the last five rows are males.
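A minimal sketch of the red-channel contrast stretch described above, in Python with NumPy. The percentile clipping bounds are our own assumption, added to make the stretch robust to outliers; the photometric correction and specularity subtraction of [51] are separate steps not shown here.

import numpy as np

def stretch_red_channel(image, low_pct=1.0, high_pct=99.0):
    """Take the red channel of an RGB image (H, W, 3) and linearly stretch
    its contrast so that the chosen percentiles map to 0 and 1."""
    red = image[..., 0].astype(np.float64)
    lo, hi = np.percentile(red, [low_pct, high_pct])
    stretched = (red - lo) / max(hi - lo, 1e-12)   # linear map [lo, hi] -> [0, 1]
    return np.clip(stretched, 0.0, 1.0)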
The needle-maps extracted from range images and their corresponding height surfaces are also shown for comparison. Since the needle-maps satisfying data-closeness appear identical to the images when rendered with a frontal light source, we show the needle-maps re-illuminated with a light source moved by 45° from the viewing direction along the positive x-axis. From the figure, it is clear that both the recovered needle-maps and the surfaces yield realistic shapes, overcoming the well-known local convexity–concavity instability problem [52]. Moreover, they are similar to their ground-truth counterparts, especially around the nose, mouth, and cheek regions, where some of the gender discriminating information is encoded.
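The re-illumination used for display can be reproduced with a few lines of Lambertian rendering. The following sketch is our own, with NumPy, taking the viewing direction as the z-axis and rotating the light 45° towards the positive x-axis.

import numpy as np

def render_lambertian(normals, angle_deg=45.0):
    """Re-illuminate a needle-map (H, W, 3) of unit normals under Lambert's
    law, I = max(0, n . s), with the light s rotated angle_deg from the
    viewing direction (z-axis) towards the positive x-axis."""
    angle = np.deg2rad(angle_deg)
    light = np.array([np.sin(angle), 0.0, np.cos(angle)])
    return np.maximum(normals @ light, 0.0)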
After shape recovery, we apply the three gender discriminating models, together with the standard PGA model, to the recovered needle-maps for feature extraction. The required models are constructed using the above 200 needle-maps extracted from the range images in the Max-Planck database; as a result, they are models of facial shape free of facial texture. According to [30], the differences between the best fit needle-map and the needle-map satisfying data-closeness are almost solely due to the variation in albedo at the eyes, eyebrows, and lips. The best fit needle-map may therefore provide a more accurate estimate of the underlying facial shape, so we use the best fit needle-maps in these experiments to ensure that the exhibited discriminating capacity of the models is not biased by facial texture.

Fig. 15 visualizes the feature extraction results using the leading two eigen-features. From the figure, it is clear that, irrespective of the model used, the extracted features are not as discriminating as those extracted from range data. There are two reasons for this. First, the gender discriminating models are constructed using the range data; lacking generality beyond their training data, the models are more suited to the range data than to the needle-maps recovered using SFS. Second, the global shape constraint imposed by the statistical model in principal geodesic SFS forces the recovered needle-maps to be concentrated around the model mean, so the differences between the faces are reduced by the shape recovery process. However, compared to the other three models, the supervised weighted PGA model still exhibits useful gender discriminating capacity on the recovered facial needle-maps.
Fig. 15. Visualization of the leading two features (1st dimension versus 2nd dimension) extracted from the recovered facial needle-maps, shown separately for the weighted PGA, supervised weighted PGA, supervised PGA, and standard PGA models.
Fig. 16. Clustering results on the recovered facial needle-maps: the two-component Gaussian mixture estimated by EM, plotted over the leading two feature dimensions.
We make use of an EM algorithm to fit a two-component Gaussian mixture model to the leading 10 supervised weighted PGA features of the recovered facial needle-maps. To initialize the EM algorithm, we set the a priori class probabilities to (13/43, 30/43) according to the female and male proportions in our data set. We randomly choose a female and a male as the initial means, and set the covariance matrices to $\Sigma_c^{(0)} = \det(\Sigma)^{1/10} I_{10}$ for $c \in \{f, m\}$, where $\Sigma$ is the overall covariance matrix of the features and $I_{10}$ is the $10 \times 10$ identity matrix. We repeat this procedure several times with random initialization, each followed by EM iterations, and select the final result to be that with the largest log-likelihood, which is shown in Fig. 16. From the figure, the extracted features for the 43 faces are still well clustered. However, there are some mis-classified faces (highlighted in Fig. 16).
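A minimal sketch of this clustering step, using scikit-learn's GaussianMixture in place of a hand-written EM loop. The restart count and the random selection of two faces as initial means are our own simplifications (the text draws one female and one male, which requires the labels); the priors and the scaled-identity covariance initialization follow the text.

import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_by_gender(X, n_female=13, n_male=30, n_restarts=10, seed=0):
    """X: (n, d) array of the leading d = 10 supervised weighted PGA
    features, one row per face. Returns cluster labels and the best model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    priors = np.array([n_female, n_male], dtype=float) / (n_female + n_male)
    # Sigma_c^(0) = det(Sigma)^(1/d) * I_d, with Sigma the overall covariance.
    _, logdet = np.linalg.slogdet(np.cov(X, rowvar=False))
    scale = np.exp(logdet / d)
    precisions = np.stack([np.eye(d) / scale] * 2)   # inverse covariances
    best, best_ll = None, -np.inf
    for _ in range(n_restarts):
        means = X[rng.choice(n, size=2, replace=False)]  # two random faces
        gm = GaussianMixture(n_components=2, covariance_type="full",
                             weights_init=priors, means_init=means,
                             precisions_init=precisions).fit(X)
        ll = gm.score(X)                  # mean per-sample log-likelihood
        if ll > best_ll:
            best, best_ll = gm, ll
    return best.predict(X), best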
The mis-classified faces are visualized in the first row of Fig. 17. The first two are mis-classified females, and the following eight are mis-classified males. Some of these images (the two females and the 1st, 3rd, 4th, and 6th males) exhibit at least some gender ambiguity under subjective human judgement; however, some (the 2nd, 5th, 7th, and 8th males) pose no such difficulty.

To investigate this more systematically, we presented the 43 images to eight unbiased human observers, who were asked to assign each image to one of the following classes: (a) definitely female, (b) possibly female, (c) not sure, (d) possibly male, and (e) definitely male. We assign male probabilities to these classes (a, 0%; b, 25%; c, 50%; d, 75%; e, 100%) and average the results over the eight observers. We then arrange the 43 images in ascending order of average male probability. The positions of the 10 mis-classified faces in this series are also shown in Fig. 17. It is clear that the two females lie in the overlapping region (with male probabilities around 50%), the 4th and 6th males are judged female by the observers, and the 1st and 3rd males are rather ambiguous. However, the remaining four males are judged male with 100% probability.

This result shows that our supervised weighted PGA model is not perfectly consistent with human performance. The main reason is that our gender discriminating model makes use of facial shape information instead of image intensities. In fact, three of the four male images judged to be definitely male have obvious facial hair, which probably played a dominant role in human judgement. If we consider only facial shape, these four male faces all have relatively 'soft' facial shapes, which tend to be consistent with feminine features. It is interesting to note that the 6th male in the first row, whose gender is ambiguous to human observers, is mis-classified as female by our supervised weighted PGA model. However, this is consistent with the observation that 'female faces are more like baby-faces than are male faces—they have smaller chins and noses and their eyes appear larger (a consequence of the lesser brow protuberance)' [4].
Fig. 17. Images of mis-clustered faces and comparison with human observations.
7.4. Gender classification on UND data
We also apply our methods to the University of Notre Dame (UND) biometric database [42,43], and compare the results with those reported for existing methods [28,5,7]. The UND biometric database has the advantage that it contains both 2D images and the corresponding range images for each individual. Moreover, the scale and sex ratio of the database (944 images of 275 individuals: 383 images of 103 females and 561 images of 172 males) make it suitable for statistical gender classification. A subset of this database containing 200 2D images and the corresponding range images for 200 subjects (100 female and 100 male) was used in our previous work [28] to perform gender classification. Here, we apply the proposed methods to the same subset.

The images (both 2D and range) are first geometrically aligned and brightness-normalized in the same way as in [28]. Then, we use the needle-maps extracted from the range images to construct the statistical model required by principal geodesic SFS, and apply the SFS method to the 2D images to obtain the recovered facial needle-maps. As shown in [28], the recovered facial needle-maps satisfying data-closeness encode both facial shape and image intensity information and improve the classification results. Therefore, in this experiment, we use the recovered facial needle-maps satisfying data-closeness (rather than those best fitting the statistical model) for gender classification. The construction of the models and the gender classification are performed in the same way as on the range data.

The classification results, estimated with 5-fold cross validation, are shown in Fig. 18 (a minimal code sketch of this evaluation protocol follows the figure caption). The results are similar to those achieved using range data. First, when m ≤ 5, weighted PGA and supervised PGA achieve better classification accuracy than standard PGA. Second, the supervised weighted PGA model outperforms the other three models, no matter how many leading features are used. The best classification accuracy is 92.5%, achieved using supervised weighted PGA and the nearest neighbor classifier.
Fig. 18. Classification error rates on a subset of UND data for standard PGA, weighted PGA, supervised PGA, and supervised weighted PGA, for m = 1, 2, 5, 10, 20, 30, 50 parameters.
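A minimal sketch of this evaluation protocol with scikit-learn: keep the leading m feature components and score a one-nearest-neighbor classifier with k-fold cross validation. The array names X (feature vectors, columns ordered by decreasing eigenvalue) and y (gender labels) are our own.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def error_rates(X, y, ms=(1, 2, 5, 10, 20, 30, 50), folds=5):
    """Return {m: error rate} using the leading m feature components."""
    rates = {}
    for m in ms:
        clf = KNeighborsClassifier(n_neighbors=1)       # nearest neighbor
        acc = cross_val_score(clf, X[:, :m], y, cv=folds).mean()
        rates[m] = 1.0 - acc                            # classification error
    return rates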
This is higher than the 88.5% accuracy reported in [28], which was achieved using PGA and linear discriminant analysis, and confirms the importance and effectiveness of finding gender discriminating features from facial needle-maps.

We also apply the supervised weighted PGA method to the full UND data set, which contains 944 images of 275 individuals. The images are geometrically aligned according to the centers of the eyes, and are brightness-normalized as before. The facial needle-maps satisfying data-closeness are recovered from the images, and the classification results on the recovered needle-maps are estimated with 5-fold cross validation; they are shown in Fig. 19. The best classification accuracy is 96.91%, achieved using the leading five feature components and the nearest neighbor classifier. In Table 3, we compare this accuracy with the best accuracies reported in [5,7], both of which were also obtained on UND data.
Fig. 19. Classification error rates (total, female, and male) on the full UND data using supervised weighted PGA, for m = 1, 2, 5, 10, 20, 30, 50 parameters.
Table 3
Comparison of gender classification accuracy.

         Fusion of modalities [5]   Fusion of regions [7]   Supervised weighted PGA
Total    91.0%                      94.3%                   96.9%
Female   83.0%                      N/A                     95.5%
Male     95.6%                      N/A                     97.9%
Lu et al. [5] made use of SVMs and a fusion of multimodal information (intensity images and range images) for gender classification. Hu et al. [7] also made use of SVMs, together with a fusion of different facial regions. From Table 3, it is clear that the supervised weighted PGA method, even using the simplest classifier, outperforms both of these methods in terms of classification accuracy.
8. Conclusions

In this paper we emphasize the importance of feature extraction for facial gender classification, and propose three strategies (weighted PGA, supervised weighted PGA, and supervised PGA) to construct gender discriminating models from 2.5D facial needle-maps. Two main conclusions can be drawn from our work. First, by incorporating gender relevant weights, the three discriminating models all improve the gender discriminating capacity of the leading eigenmodes. Moreover, the supervised weighted PGA model, whose gender relevant weight map is learned from labeled data, significantly outperforms the weighted PGA and supervised PGA models in terms of gender classification performance. Second, by using the facial shape information contained in the 2.5D facial needle-maps, we achieve 97% gender classification accuracy on range data. This matches the accuracy achieved using the support faces method, and exceeds those achieved using the Fisher faces or AdaBoost methods. We also illustrate effective shape recovery using principal geodesic SFS on brightness images, and demonstrate the feasibility of gender classification on the recovered facial surface normals. Experiments on the UND database show the superiority of the proposed supervised weighted PGA method over several existing methods in terms of gender classification accuracy.

Although the weighted PGA model is outperformed by the supervised weighted PGA model, it has the advantage that its weight map can be independent of the data. Therefore, one possible future direction is the construction of a data-independent weight map for the weighted PGA method. The bubbles technique of Gosselin and Schyns [35] is a possible solution, although we might encounter the difficulty of obtaining stimuli that reveal facial shape rather than facial texture information. In supervised weighted PGA, a possible line of future work is to explore the use of different error functions, such as the error function deduced from LDA. Another avenue for future investigation is to improve the current SFS technique. In particular, we will explore how to reduce the model dominance caused by satisfying the global shape constraint, and the bias it introduces into gender classification.

Acknowledgment

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award and the EU FET project SIMBAD (Similarity Based Pattern Recognition).

References
[1] V. Bruce, A. Burton, E. Hanna, P. Healey, O. Mason, A. Coombes, R. Fright, A. Linney, Sex discrimination: how do we tell the difference between male and female faces? Perception 22 (1993) 131-152.
[2] A. Burton, V. Bruce, N. Dench, What's the difference between men and women? Evidence from facial measurement, Perception 22 (1993) 153-176.
[3] A. O'Toole, T. Vetter, N. Troje, H. Bulthoff, Sex classification is better with three-dimensional head structure than with image intensity information, Perception 26 (1997) 75-84.
[4] V. Bruce, A. Young, In the Eye of the Beholder: The Science of Face Perception, Oxford University Press, Oxford, 1998.
[5] X. Lu, H. Chen, A. Jain, Multimodal facial gender and ethnicity identification, in: Proceedings of the International Conference on Biometrics, 2006, pp. 554-561.
[6] A. Graf, F. Wichmann, Gender classification of human faces, in: Proceedings of the International Workshop on Biologically Motivated Computer Vision, 2002, pp. 491-500.
[7] Y. Hu, J. Yan, P. Shi, A fusion-based method for 3D facial gender classification, in: Proceedings of the International Conference on Computer and Automation Engineering, 2010, pp. 369-372.
[8] B. Golomb, D. Lawrence, T. Sejnowski, SEXNET: a neural network identifies sex from human faces, in: Proceedings of Advances in Neural Information Processing Systems, 1991, pp. 572-577.
[9] G. Cottrell, J. Metcalfe, EMPATH: face, emotion, and gender recognition using holons, in: Proceedings of Advances in Neural Information Processing Systems, vol. 3, 1990, pp. 564-571.
[10] Z. Sun, G. Bebis, X. Yuan, S. Louis, Genetic feature subset selection for gender classification: a comparison study, in: Proceedings of the IEEE Workshop on Applications of Computer Vision, 2002, pp. 165-170.
[11] S. Buchala, N. Davey, T. Gale, R. Frank, Principal component analysis of gender, ethnicity, age, and identity of face images, in: Proceedings of the IEEE International Conference on Multimodal Interfaces, 2005.
[12] T. Wilhelm, H. Bohme, H. Gross, Classification of face images for gender, age, facial expression, and identity, in: Proceedings of the International Conference on Artificial Neural Networks, 2005, pp. 569-574.
[13] Y. Saatci, C. Town, Cascaded classification of gender and facial expression using active appearance models, in: Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2006, pp. 393-398.
[14] B. Moghaddam, M. Yang, Learning gender with support faces, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5) (2002) 707-711.
[15] S. Baluja, H. Rowley, Boosting sex identification performance, International Journal of Computer Vision 71 (1) (2007) 111-119.
[16] P. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, 1982.
[17] R. Brunelli, T. Poggio, HyperBF networks for gender classification, in: Proceedings of the DARPA Image Understanding Workshop, 1992, pp. 311-314.
[18] S. Gutta, H. Weschler, P. Phillips, Gender and ethnic classification of human faces using hybrid classifiers, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 194-199.
[19] H. Kim, D. Kim, Z. Ghahramani, S. Bang, Appearance-based gender classification with Gaussian processes, Pattern Recognition Letters 27 (6) (2006) 618-626.
[20] G. Shakhnarovich, P. Viola, B. Moghaddam, A unified learning framework for real time face detection and classification, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 14-21.
[21] B. Wu, H. Ai, C. Huang, LUT-based AdaBoost for gender classification, in: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 104-110.
[22] A. Khan, A. Majid, A. Mirza, Combination and optimization of classifiers in gender classification using genetic programming, International Journal of Knowledge-based and Intelligent Engineering Systems 9 (1) (2005) 1-11.
[23] A. Majid, A. Khan, A. Mirza, Combination of support vector machines using genetic programming, International Journal of Hybrid Intelligent Systems 3 (2) (2006) 109-125.
[24] T. Cootes, G. Edwards, C. Taylor, Active appearance models, in: Proceedings of the European Conference on Computer Vision, vol. 2, 1998, pp. 484-498.
[25] A. Jain, J. Huang, Integrating independent components and linear discriminant analysis for gender classification, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 159-163.
[26] S. Buchala, N. Davey, R. Frank, T. Gale, Dimensionality reduction of face images for gender classification, Technical Report 408, Department of Computer Science, University of Hertfordshire, UK, 2004.
[27] J. Wu, W. Smith, E. Hancock, Gender classification using principal geodesic analysis and Gaussian mixture models, in: Proceedings of the Iberoamerican Congress on Pattern Recognition, 2006, pp. 58-67.
[28] J. Wu, W. Smith, E. Hancock, Facial gender classification using shape-from-shading, Image and Vision Computing 28 (6) (2010) 1039-1048.
[29] P. Fletcher, S. Joshi, C. Lu, S. Pizer, Principal geodesic analysis for the study of nonlinear statistics of shape, IEEE Transactions on Medical Imaging 23 (8) (2004) 995-1005.
[30] W. Smith, E. Hancock, Recovering facial shape using a statistical model of surface normal direction, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12) (2006) 1914-1930.
[31] W. Smith, E. Hancock, Facial shape-from-shading and recognition using principal geodesic analysis and robust statistics, International Journal of Computer Vision 76 (1) (2008) 71-91.
[32] J. Wu, W. Smith, E. Hancock, Weighted principal geodesic analysis for facial gender classification, in: Proceedings of the Iberoamerican Congress on Pattern Recognition, 2007, pp. 331-339.
[33] J. Wu, W. Smith, E. Hancock, Supervised principal geodesic analysis on facial surface normals for gender classification, in: Proceedings of the Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, 2008, pp. 664-673.
[34] J. Wu, W. Smith, E. Hancock, Gender classification based on facial surface normals, in: Proceedings of the International Conference on Pattern Recognition, 2008, pp. 1-4.
[35] F. Gosselin, P. Schyns, Bubbles: a technique to reveal the use of information in recognition tasks, Vision Research 41 (2001) 2261-2271.
[36] Y. Andreu, R. Mollineda, The role of face parts in gender recognition, in: Proceedings of the International Conference on Image Analysis and Recognition, 2008, pp. 945-954.
[37] Y. Koren, L. Carmel, Robust linear dimensionality reduction, IEEE Transactions on Visualization and Computer Graphics 10 (4) (2004).
[38] X. Pennec, Probabilities and statistics on Riemannian manifolds: a geometric approach, Technical Report RR-5093, INRIA, 2004.
[39] L. Sirovich, Turbulence and the dynamics of coherent structures, Quarterly of Applied Mathematics XLV (3) (1987) 561-590.
[40] P. Worthington, E. Hancock, New constraints on data-closeness and needle map consistency for shape-from-shading, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999) 1250-1267.
[41] I. Dhillon, S. Sra, Modeling data using directional distributions, Technical Report, University of Texas at Austin, 2003.
[42] P. Flynn, K. Bowyer, P. Phillips, Assessment of time dependency in face recognition: an initial study, in: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 44-51.
[43] K. Chang, K. Bowyer, P. Flynn, Face recognition using 2D and 3D facial data, in: Proceedings of the ACM Workshop on Multimodal User Authentication, 2003, pp. 25-32.
[44] N. Troje, H. Bulthoff, Face recognition under varying poses: the role of texture and shape, Vision Research 36 (1996) 1761-1771.
[45] V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in: Proceedings of SIGGRAPH '99, 1999, pp. 187-194.
[46] J. Wu, W. Smith, E. Hancock, Learning mixture models for gender classification based on facial surface normals, in: Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Part I, 2007, pp. 39-46.
[47] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711-720.
[48] R. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions, Machine Learning 37 (3) (1999) 297-336.
[49] J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: a statistical view of boosting, The Annals of Statistics 28 (2) (2000) 337-407.
[50] A. Vezhnevets, V. Vezhnevets, Modest AdaBoost—teaching AdaBoost to generalize better, in: Proceedings of Graphicon, 2005.
[51] A. Robles-Kelly, E.R. Hancock, Estimating the surface radiance function from single images, Graphical Models 67 (6) (2005) 518-548.
[52] R. Zhang, P. Tsai, J. Cryer, M. Shah, Shape-from-shading: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8) (1999) 690-706.
Jing Wu received the B.Sc. and M.Eng. degrees in computer science and technology from Nanjing University, China, in 2002 and 2005, respectively. She received the Ph.D. degree in computer science from the University of York, UK, in January 2010. She currently works as a research associate at Cardiff University. Her research interests include face recognition, gender classification, statistical shape modelling, and shape-from-shading.
William Smith completed B.Sc. and Ph.D. degrees in computer science, both at the University of York, in 2002 and 2007, respectively. Currently, he is a lecturer in the Computer Vision and Pattern Recognition group in the Department of Computer Science at the University of York, and is supervising four research students. His research interests are related to face processing, shape-from-shading, and reflectance modelling. He has published more than 50 papers in international journals and conferences.
Edwin Hancock holds a B.Sc. degree in physics (1977), a Ph.D. degree in high-energy physics (1981), and a D.Sc. degree (2008) from the University of Durham. From 1981 to 1991, he worked as a researcher in the fields of high-energy nuclear physics and pattern recognition at the Rutherford-Appleton Laboratory (now the Central Research Laboratory of the Research Councils). During this period, he also held adjunct teaching posts at the University of Surrey and the Open University. In 1991, he moved to the University of York as a lecturer in the Department of Computer Science, where he has held a chair in Computer Vision since 1998. He leads a group of some 25 faculty, research staff, and Ph.D. students working in the areas of computer vision and pattern recognition. His main research interests are in the use of optimization and probabilistic methods for high and intermediate level vision. He is also interested in the methodology of structural and statistical pattern recognition. He is currently working on graph matching, shape-from-X, image databases, and statistical learning theory. His work has found applications in areas such as radar terrain analysis, seismic section analysis, remote sensing, and medical imaging. He has published about 135 journal papers and 500 refereed conference publications. He was awarded the Pattern Recognition Society medal in 1991 and an outstanding paper award in 1997 by the journal Pattern Recognition. He has also received best paper prizes at CAIP 2001, ACCV 2002, ICPR 2006, BMVC 2007, and ICIAP 2009. In 2009, he was awarded a Royal Society Wolfson Research Merit Award. In 1998, he became a fellow of the International Association for Pattern Recognition. He is also a fellow of the Institute of Physics, the Institute of Engineering and Technology, and the British Computer Society. He has been a member of the editorial boards of the journals IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. In 2006, he was appointed as the founding editor-in-chief of the IET Computer Vision Journal. He has been conference chair for BMVC 1994, Track Chair for ICPR 2004, and Area Chair for ECCV 2006 and CVPR 2008, and in 1997 he established the EMMCVPR workshop series.