Pattern Recognition 44 (2011) 2871–2886
Gender discriminating models from facial surface normals

Jing Wu, William A.P. Smith, Edwin R. Hancock
Department of Computer Science, The University of York, Heslington, York YO10 5DD, UK
Abstract
Article history: Received 31 March 2010; Received in revised form 4 January 2011; Accepted 22 April 2011; Available online 5 May 2011.
In this paper, we show how to use facial shape information to construct discriminating models for gender classification. We represent facial shapes using 2.5D fields of facial surface normals, and investigate three different methods to improve the gender discriminating capacity of the model constructed using the standard eigenspace method. The three methods are novel variants of principal geodesic analysis (PGA), namely (a) weighted PGA, (b) supervised weighted PGA, and (c) supervised PGA. Our starting point is to define a weight map over the facial surface that indicates the importance of different locations in discriminating gender. We show how to compute the relevant weights and how to incorporate the weights into the 2.5D model construction. We evaluate the performance of the alternative methods using facial surface normals extracted from 3D range images or recovered from brightness images. Experimental results demonstrate the effectiveness of our methods. Moreover, the classification accuracy, which is as high as 97%, demonstrates the effectiveness of using facial shape information for gender classification.
Keywords: Gender classification; Facial surface normals; Statistical model; Feature extraction; Principal geodesic analysis
1. Introduction

Gender classification plays a significant role during our social interactions, and it has attracted significant attention in the psychology literature [1–4]. Gender determination is also important in human–computer interaction, since it plays an important role in controlling the style of dialogue adopted. It is also a key issue in security and biometrics. Despite its importance, gender classification has not attracted significant attention in the computer vision literature. Most existing gender determination methods are based on 2D intensity images. However, the gender differences apparent from intensity appearance are subtle, and are easily affected by changes in lighting conditions or the application of facial makeup. Compared to the 2D facial appearance conveyed by brightness images, 3D facial shape provides a more reliable information source for surveillance purposes, since it cannot be easily modified and is not affected by changes in lighting. Moreover, psychologists have shown that gender is revealed by both the 2D texture and the 3D structure of the human face [1]. In fact, it has been shown that gender classification is more effective when 3D structure is used than when image brightness information alone is used [3]. However, relatively little effort has been expended in determining the role of 3D facial shape in machine gender classification [5–7]. This is partly attributable to the more complex computations required in
3D facial shape analysis, and the limited effectiveness and high cost of the 3D sensors currently available. An alternative is to use a 2.5D representation based on facial surface normals, or facial needle-maps. These reveal facial shape information from a fixed viewpoint, and can be recovered from 2D brightness images using techniques such as shape-from-shading (SFS). Therefore, in this paper, we pursue the development of a gender classification method based on facial needle-maps rather than on intensity images. Having established the relative usefulness of both 2D intensity and 3D shape representations, our main concern in this paper is the development of improved statistical techniques for gender classification. For gender classification, and for face recognition in general, there are two steps, namely (a) feature extraction and (b) classification. This two-step structure has been widely accepted for some time. In fact, even the earliest gender classification system, 'SEXNET' [8], adhered to it, employing a two-stage network: the first stage compressed a 900-unit image to just 40 representational units, and the second stage classified each face using the 40-unit representation. Cottrell et al. [9] reduced the image dimensionality from 4096 to 40 via an autoencoder, which is similar in function to principal component analysis (PCA). In fact, there are several gender classification approaches using PCA-based features [6,10,11]. These include active appearance models (AAM) [12,13]. Recently, there has been a trend to emphasize the role of the classifier, based on the successful application of support vector machines (SVMs) and AdaBoost to gender classification [14,15]. In [14,15], feature extraction was simplified to the use of 'thumbnail' images. However, although using these
classifiers achieved a higher accuracy than using PCA-based features alone, it is non-trivial to determine the optimal parameters of a complex classifier system. Here we take a different view, and place the emphasis on the role of feature extraction in gender classification. This is based on observations concerning the human perceptual system, which utilizes just a very few of the most important attributes [16]. Well selected gender discriminating features can save considerable effort in the classification stage. Moreover, they may indicate the inherent differences between the genders. Therefore, in this paper, we focus on how to construct discriminating models for gender feature extraction. We aim to improve the discriminating power of PCA-based features, and to demonstrate the effectiveness of using these discriminating features in conjunction with simple classifiers.

1.1. Related literature

Based on the type of features used, there are two main approaches to gender classification. The first uses geometry-based features. A popular approach is that of Brunelli and Poggio [17], who defined 16 geometrical features as input to train two competing HyperBF networks that were then used to classify gender. A correct classification rate of 79% was reported for 'unseen faces'. In Burton et al.'s work [2] from psychology, 73 points from full-face views and 34 points from profile views were extracted, and an accuracy approaching human performance (94%) was reported. More pertinent to our work is the second, appearance-based approach to gender classification. This aims to make use of the full facial image without extracting any geometric features. As mentioned above, many such approaches emphasize the role of the classifier, and use 'thumbnail' images as input. Gutta and Wechsler [18] used hybrid classifiers consisting of an ensemble of RBF networks together with inductive decision trees. On average, an accuracy of 96% was reported for gender classification using this technique. Moghaddam and Yang [14] demonstrated that non-linear SVMs were superior to traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, nearest neighbor), RBF classifiers, and large ensemble-RBF networks. An accuracy of better than 96% was reported using an SVM with Gaussian RBF kernels. Recently, Kim et al. [19] performed gender classification using Gaussian process classifiers, which are a class of Bayesian kernel classifiers. This technique overcame the difficulty encountered by SVMs in determining the hyperparameters of the kernels. Another popular approach to gender classification is to use AdaBoost learning methods. This type of classifier is much faster than SVMs, and represents a better choice for real-time applications. Shakhnarovich et al. [20] applied a thresholded weak classifier variant of AdaBoost to detected face images, and achieved an accuracy of 78% for gender classification. Also using a weak classifier AdaBoost approach, Wu et al. [21] used a look-up table to learn gender classifiers, and achieved an accuracy of 88%. More recently, Baluja and Rowley [15] used AdaBoost with pixel comparisons, and achieved 93% accuracy on 20×20 pixel images. Moreover, the method was 50 times faster than SVM-based classifiers. Combinations of classifiers have also been used in gender classification. Khan et al.
[22] used genetic programming (GP) to combine four traditional classifiers (k-means, k-nearest neighbors, linear discriminant analysis, and Mahalanobis distance based classifiers) and evolved an optimum combined classifier (OCC). They demonstrated that the OCC not only improves gender classification performance, but also needs fewer features than the original classifiers. They also applied GP to combine SVM classifiers [23], and demonstrated the improved performance of the resulting OCC. Moreover, the GP combination
scheme automatically incorporated the optimal kernel function and model selection in SVMs to achieve a high-performance classification model. As noted above, in this paper we take the view that feature extraction is also an important issue in gender classification. Besides the above mentioned SEXNET system [8] and the work of Cottrell et al. [9], there are many other methods that make use of feature extraction for gender classification. Sun et al. [10] applied genetic algorithms to the extracted PCA feature vectors to select a gender discriminating feature subset, and compared the classification results obtained using four different classifiers. The lowest error rate of 4.7% was achieved using an SVM classifier. Buchala et al. [11] also explored which were the most gender discriminating PCA feature components using linear discriminant analysis (LDA). A gender classification accuracy of 86.43% was reported based on the selected feature subset. Recently, active appearance models (AAM) [24], which describe the statistical variations of both gray scale values and shape, have been used as a feature extraction mechanism for gender classification. In [12], the AAM was compared with independent component analysis (ICA) for feature extraction using four classifiers, namely the nearest neighbor, RBF, multilayer perceptron (MLP), and generalized learning vector quantization classifiers. The best accuracy, of over 90%, was obtained using AAM features and the MLP classifier. Saatci and Town [13] utilized AAM features and SVMs for both gender and expression recognition. They also investigated the interdependency of gender recognition upon expression. In addition to these PCA-based feature extraction methods, alternative feature extraction (or dimensionality reduction) methods, such as ICA [25], locally linear embedding (LLE) [6] and curvilinear component analysis [26], have also been applied to gender classification. The above approaches are all based on 2D intensity images. Although 3D facial shape has its advantages for gender classification, little work has utilized this information. Graf and Wichmann [6] applied PCA and LLE to 3D range images of human heads, and used an SVM as the classifier. An accuracy of 93.4% was reported using PCA and SVM in conjunction. Lu and Chen [5] exploited range information for gender classification, and proposed an integration scheme which combined the registered range and intensity images. Their experiments demonstrated that integrating the 3D range information provided better classification accuracy than using the 2D intensity alone. The accuracy of the combined method was 91%. Hu et al. [7] explored the gender significance of different facial regions in terms of 3D facial shape, and proposed a fusion method to combine the classification results of different regions. An accuracy as high as 94.3% was reported. In our previous work [27,28], we explored the use of principal geodesic analysis (PGA) [29] and a model-based shape-from-shading technique [30,31] to recover fields of facial surface normals from brightness images. Using the PGA parameters of the fitted model and a simple Bayes classifier, gender classification accuracies of 95% for ground-truth facial needle-maps and 90% for needle-maps recovered using shape-from-shading were achieved.

1.2. Paper overview

In this paper, we focus on the task of constructing statistical models for extracting gender discriminating features from 2.5D facial needle-maps.
The feature extraction techniques for gender classification described above are mostly based on PCA, which captures the projections that maximize the variance of the data. However, the projections that maximize the variance are usually not those that best separate the data into distinct clusters. As a result, the leading PCA features do not reliably reveal gender differences. In our previous work [32–34], we proposed the idea of weighting the facial needle-maps to take into account the
relevance of different locations in determining gender. The aim in doing this is to improve the gender discriminating capacity of the leading features extracted using the standard eigenspace method. In this paper, we (a) describe our previous ideas in more detail and more clearly, (b) advance our previous work by proposing a learning strategy to find the optimal weight map, and (c) compare the results obtained using these weighting strategies on both range data and recovered facial needle-maps. The results also demonstrate the feasibility of gender classification using the facial shape information contained within the 2.5D facial needle-map. In our work, the shape of a face is represented by a field of facial surface normals which reside on a spherical manifold rather than in a Euclidean space. Linear data analysis techniques such as PCA are not suitable for the analysis of directional data of this type. We therefore commence by showing how to construct an eigenspace model for 2.5D facial surface normals using principal geodesic analysis. We also review the principal geodesic SFS method [31], which is used in our experiments to recover the facial needle-maps from intensity images. Turning our attention to the extraction of gender discriminating features, there are a number of ways to enhance the discriminating power of the standard PGA model. In our previous work [32,34], we presented the most straightforward strategy, referred to as weighted PGA. The idea is to incorporate a pre-computed weight map into the PGA model construction process. Research in the psychology literature [4,35] has confirmed that information concerning gender is not uniformly distributed over the face. Work in computer vision has also explored the role of facial regions in gender recognition [36]. The pre-computed weight map quantifies the importance of different facial regions for gender classification. It enables us to control the structure of the data variance so that the variance associated with gender discriminating regions is larger than that for non-discriminating regions, and is captured by the leading principal eigenmodes. The weight map could be obtained through psychological experiments on human subjects [35]. For simplicity, however, we construct the weight map using the angular difference between the mean faces of the two genders. The main contribution of this paper is a supervised version of the above method, termed supervised weighted PGA. Here the weight map is learned from the labeled data by minimizing an error function. The weight map is applied during the feature extraction process rather than during model construction. It is therefore specific to the training data, and enhances the discriminating power of the leading features without affecting the efficiency of the model construction process. Experimental results show that by sacrificing some of the universality of the weight map, the supervised weighted PGA method achieves better classification accuracy than the weighted PGA method. We also describe a third strategy (the idea was proposed in our previous work [33,34]) to control the gender discriminating power of the constructed model. The method, referred to as supervised PGA, is an extension of the supervised PCA technique [37] from Euclidean data to non-Euclidean data residing on a Riemannian manifold. According to this approach, PGA is viewed as locating the projection that maximizes the sum of pairwise distances between the projected data.
Supervised PGA incorporates a weight for each pair of data that indicates their dissimilarity. This weight map encodes the pairwise gender relationships residing in the data. By making use of this weight map, supervised PGA constructs a gender discriminating model that emphasizes the inter-cluster separation. Experimental results show that supervised PGA slightly outperforms weighted PGA. However, it is relatively specific to the training data used. We compare the gender discriminating power of the extracted features and the classification accuracies obtained using the three
different statistical models. We also compare the results with those obtained using the standard PGA model. The outline of this paper is as follows. Section 2 gives a brief review of the standard PGA method and the principal geodesic SFS method, which form the theoretical background of this paper. In Sections 3–5, we respectively describe the three different strategies (weighted PGA, supervised weighted PGA, and supervised PGA) used to enhance the gender discriminating power of the extracted features. Section 6 reviews the nearest neighbor classifier applied to the features. Experimental results and their discussion are given in Section 7. Finally, Section 8 concludes the paper and offers directions for future investigation.
2. Theoretical background

In this section, we review the principal geodesic analysis technique and the principal geodesic SFS method. PGA is the theoretical foundation of the three novel variants of PGA proposed in this paper. The principal geodesic SFS method is used in our experiments to recover facial shapes from intensity images.

2.1. Principal geodesic analysis

PGA is a generalization of PCA from Euclidean data to data residing on a Riemannian manifold. It makes use of exponential/log maps and intrinsic means [29,38] to project data onto a plane and analyze the data in the projected space. In our application, a surface normal $n$ can be represented as a point residing on a spherical manifold. The exponential/log maps for a spherical manifold are illustrated in Fig. 1. The exponential map, denoted by $\mathrm{Exp}_n$, maps $u$ to the point, denoted by $\mathrm{Exp}_n(u)$, on the geodesic in the direction of $u$ at distance $\|u\|$ from $n$. The log map, denoted by $\mathrm{Log}_n$, is the inverse of the exponential map. Intrinsic means minimize the sum-of-squared geodesic distances on a manifold. For a spherical manifold, the geodesic distance is the arc length between two points. The intrinsic mean $\mu$ of a set of surface normals $n_1, \ldots, n_K$ can be calculated using the gradient descent method proposed by Pennec [38]:

$$\mu^{(t+1)} = \mathrm{Exp}_{\mu^{(t)}}\left(\frac{1}{K}\sum_{k=1}^{K}\mathrm{Log}_{\mu^{(t)}}(n_k)\right). \qquad (1)$$

It has been explained in detail in our previous work [31,28] how to find the principal geodesics from a set of facial needle-maps. Fig. 2 illustrates the process. Suppose there are $K$ facial needle-maps, each having $N$ pixel locations. The surface normal at pixel location $l$ of the $k$th needle-map is $n_{kl}$. The left panel of Fig. 2 shows the distribution of surface normals at the pixel location $l$ ($n_{kl}$, $k = 1, \ldots, K$) with the mean $\mu_l$ shown as a star. The right panel of Fig. 2 shows the log mapped positions of these normals on the tangent plane passing through $\mu_l$. We denote the log mapped position of $n_{kl}$ as $u_{kl}$. By concatenating the $x,y$-coordinates of $u_{kl}$ at the $N$ pixel locations, we get the $2N$ dimensional log mapped long vector $u_k = [u_{k1x}, u_{k1y}, \ldots, u_{kNx}, u_{kNy}]^T$.
Fig. 1. The exponential map.
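To make these operations concrete, the following is a minimal Python sketch (illustrative code, not the authors' implementation; all function names are ours) of the spherical log/exponential maps and the intrinsic mean iteration of Eq. (1):

```python
import numpy as np

def log_map(mu, n):
    """Log map of unit normals n (rows of a (K, 3) array) at unit vector mu."""
    cos_theta = np.clip(n @ mu, -1.0, 1.0)
    theta = np.arccos(cos_theta)                    # geodesic (arc-length) distance
    perp = n - np.outer(cos_theta, mu)              # component orthogonal to mu
    norm = np.linalg.norm(perp, axis=1, keepdims=True)
    scale = np.where(norm > 1e-12, theta[:, None] / norm, 0.0)
    return perp * scale                             # tangent vectors of length theta

def exp_map(mu, u):
    """Exponential map of a tangent vector u at mu back onto the sphere."""
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return mu
    return np.cos(theta) * mu + np.sin(theta) * (u / theta)

def intrinsic_mean(normals, n_iter=50):
    """Fixed-point iteration of Eq. (1) for the intrinsic mean of unit normals."""
    mu = normals.mean(axis=0)
    mu /= np.linalg.norm(mu)                        # initialize with Euclidean mean
    for _ in range(n_iter):
        mu = exp_map(mu, log_map(mu, normals).mean(axis=0))
    return mu
```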
Fig. 2. Projection of surface normals on the unit sphere to points on the tangent plane at the mean.
Fig. 3. The steps of principal geodesic SFS.
The $K$ long vectors form the column-wise data matrix $U = [u_1 | \cdots | u_K]$, and the covariance matrix is $\Sigma = (1/K) U U^T$. Because $N$, the dimensionality of the facial needle-maps, is usually too large to make the manipulation of $\Sigma$ feasible, the numerically efficient snap-shot method of Sirovich [39] is used to compute the eigenvectors of $\Sigma$. The obtained eigenvector matrix (projection matrix) is denoted $\Phi = (e_1 | e_2 | \cdots | e_{K-1})$, where $e_i$, $i = 1, \ldots, K-1$, are the eigenvectors sorted by descending eigenvalue. Given a facial needle-map, the log mapped long vector $u = [u_{1x}, u_{1y}, \ldots, u_{Nx}, u_{Ny}]^T$ is computed, and the corresponding PGA parameter vector is $b = \Phi^T u$.

2.2. Principal geodesic shape-from-shading

A new iterative shape-from-shading method [31], referred to as principal geodesic SFS, is used in our experiments to recover the facial needle-maps from intensity images. The steps of this SFS method during each iteration are illustrated in Fig. 3. The estimated surface normals are first projected into a space spanned by a statistical model to satisfy a strict global shape constraint. Then the image irradiance equation is imposed as a hard local brightness constraint [40]. Upon convergence, there are two types of recovered needle-maps. One is an instance of the statistical model (referred to as the best fit needle-map); the other is the one satisfying data-closeness. Since the statistical model is constructed by applying PGA to a set of ground-truth facial needle-maps, it captures the distribution of surface normals of real faces. As a result, the projection into the model space guarantees that the recovered needle-maps represent valid human faces. A detailed description and an algorithm for this SFS method can be found in [31,28].
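As an illustration of the snap-shot construction, the following Python sketch (our own illustrative fragment, assuming the log mapped long vectors are already stacked as the columns of U) obtains the eigenvectors of $\Sigma$ from the small $K \times K$ Gram matrix:

```python
import numpy as np

def pga_model(U):
    """Snap-shot PGA: U is the (2N, K) matrix of log-mapped long vectors."""
    K = U.shape[1]
    gram = (U.T @ U) / K                      # K x K snap-shot matrix
    lam, V = np.linalg.eigh(gram)             # eigenvalues in ascending order
    lam, V = lam[::-1][:K - 1], V[:, ::-1][:, :K - 1]   # K-1 leading modes
    Phi = U @ V                               # lift to the long-vector space
    Phi /= np.maximum(np.linalg.norm(Phi, axis=0), 1e-12)  # unit columns
    return Phi, lam

# PGA parameter vector of a face with log-mapped long vector u: b = Phi.T @ u
```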
3. Weighted principal geodesic analysis

In this section, we describe the weighted PGA method, which is the most straightforward strategy to increase the gender discriminating capacity of the standard PGA model. A pre-computed weight map, which emphasizes the importance of the gender discriminating regions, is incorporated into the PGA model
construction. By component-wise multiplication of the weight map with the face data, the variance of the data in the gender discriminating regions is increased. As a result, the leading eigenmodes, which capture the largest variance, will encode more gender discriminating information than the non-leading eigenmodes. There are two points that must be addressed when implementing the weighted PGA method. The first is the construction of the weight map. The second is how to use the weight map for gender classification.

3.1. Construction of the weight map

The weight map is a representation of the distribution of gender discriminating information over the facial surface. It assigns a weight to each location on the facial needle-map. The locations of the gender discriminating regions, such as the eyebrows, nose, etc., are assigned high weights ($w_{high}$), while the remaining locations are assigned low weights ($w_{low}$). In this way, we control the data variance structure so that the surface normals in the gender discriminating regions have a variance that is a factor $w_{high}/w_{low}$ larger than that of the surface normals in the non-discriminating regions. An optimal weight map would ideally give a gender discrimination that is consistent with that of a human subject, and would be independent of the training data. However, this optimal quantification is difficult. In this paper, for simplicity, we construct the weight map from the angular difference between the intrinsic means of the female and male facial needle-maps. At the pixel location indexed $l$, the weight is

$$w_l = 1 - \exp\left(-\frac{1}{\sigma^2}\left[\arccos(\bar{n}^f_l \cdot \bar{n}^m_l)\right]^2\right), \qquad (2)$$

where $\bar{n}^m_l$ is the mean unit surface normal for males at the image location $l$, and $\bar{n}^f_l$ is the corresponding mean unit surface normal for females. The weight map is $W = [w_1, \ldots, w_N]^T$. Since the angular difference at each pixel location between the two gender means is small, this construction is consistent with the small angle behavior of the von Mises–Fisher distribution [41]. By making use of the intrinsic means, the constructed weight map is less influenced by the differences in facial shape between subjects of the same gender. To construct the weight map, we need to determine the optimal value of $\sigma$ in Eq. (2). Our optimality criterion is to select the value of $\sigma$ that gives the weight map with which the leading $d$ eigenvectors possess the largest gender discriminating capacity. The discriminating capacity is computed using the criterion function introduced in [16]:

$$J(Y) = \mathrm{tr}(S_w^{-1} S_b) = \sum_{i=1}^{d} \lambda_i, \qquad (3)$$
where $\lambda_i$, $i = 1, \ldots, d$ are the eigenvalues of the matrix $S_w^{-1} S_b$. Here, $S_w$ and $S_b$ are the within-class and between-class scatter matrices in the feature space, i.e.

$$S_w = \sum_{c} \sum_{k=1}^{K_c} (b_k - \hat{b}_c)(b_k - \hat{b}_c)^T, \qquad (4)$$

$$S_b = \sum_{c} K_c (\hat{b}_c - \hat{b})(\hat{b}_c - \hat{b})^T, \qquad (5)$$
where $c \in \{f, m\}$ denotes the set of class labels for the two genders, $K_c$ is the number of labeled samples in class $c$, and $b_k$, $\hat{b}_c$, and $\hat{b}$ are respectively the feature vectors of the $k$th sample, the mean for class $c$, and the global mean for the entire training set.

3.2. Applying the weight map to needle-maps

An $N$ dimensional needle-map is a point residing on the manifold $S^2(N) = \prod_{i=1}^{N} S^2$. Suppose there are $K$ such facial needle-maps. We make use of the log map to project each needle-map onto the tangent plane passing through the intrinsic mean, and represent them by the matrix of log mapped long vectors $U = [u_1 | \cdots | u_K]$. The weight map is multiplied component-wise with each long vector, and we obtain the set of weighted data $WU = [W \circ u_1 | \cdots | W \circ u_K]$, where $\circ$ denotes component-wise multiplication. Because each long vector has two components for each pixel location, $u = [u_{1x}, u_{1y}, \ldots, u_{Nx}, u_{Ny}]^T$, the weight map is applied separately to the two components at each location, i.e. $W \circ u = [w_1 u_{1x}, w_1 u_{1y}, \ldots, w_N u_{Nx}, w_N u_{Ny}]^T$. The covariance matrix for the set of $K$ needle-maps is constructed from the weighted data as follows:
$$\Sigma^W = \frac{1}{K}(WU)(WU)^T. \qquad (6)$$
We use the numerically efficient snap-shot method of Sirovich [39] to compute the eigenvectors of $\Sigma^W$. The $K-1$ leading eigenvectors $e_1, e_2, \ldots, e_{K-1}$ of $\Sigma^W$ form the projection matrix $\Phi^W = [e_1 | e_2 | \cdots | e_{K-1}]$. The projection matrix, together with the corresponding eigenvalues $\Lambda^W = [\lambda_1, \ldots, \lambda_{K-1}]$ and the intrinsic mean $\mu^W$, constitutes the parameters of the weighted PGA model. Given a facial needle-map, we first compute its long vector $u$ via the intrinsic mean and log map; the weighted PGA feature vector is then given by

$$b^W = (\Phi^W)^T (W \circ u), \qquad (7)$$

where $\circ$ denotes component-wise multiplication.
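A sketch of this pipeline in Python (illustrative only; `pga_model` is the hypothetical snap-shot routine sketched in Section 2.1, and `sigma2` is the bandwidth $\sigma^2$ selected as in Section 3.1):

```python
import numpy as np

def weight_map(mean_f, mean_m, sigma2):
    """Eq. (2): mean_f, mean_m are (N, 3) arrays of mean unit surface normals."""
    dots = np.clip(np.sum(mean_f * mean_m, axis=1), -1.0, 1.0)
    ang = np.arccos(dots)                     # per-pixel angular difference
    w = 1.0 - np.exp(-ang ** 2 / sigma2)
    return np.repeat(w, 2)                    # one weight per x/y component

def weighted_pga(U, w):
    """Eq. (6): apply the weights component-wise, then run snap-shot PGA."""
    return pga_model(U * w[:, None])

# Eq. (7): weighted PGA features of a new log-mapped long vector u:
# Phi_w, lam_w = weighted_pga(U, w);  b_w = Phi_w.T @ (w * u)
```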
4. Supervised weighted principal geodesic analysis

In this section, we describe the supervised weighted PGA method. It draws on the same framework as the weighted PGA method, but differs in two important and novel respects. First, the weight map in the weighted PGA method was constructed from the mean faces of the two genders. As a result, it does not make full use of the information provided by the labeled data. The supervised weighted PGA method enhances the weighted PGA method by using an iterative learning process to construct the weight map. The second novel ingredient is that in weighted PGA, the weight map is applied during model construction. In supervised weighted PGA, on the other hand, the weight map is applied during feature extraction (which is performed after the construction of the model). We have two main objectives in developing the supervised weighted PGA method. The first is to learn the gender relevant weight map from labeled facial needle-maps. The second is to extract gender discriminating features by making use of the weight map.
4.1. Feature extraction

Suppose there are $K$ facial needle-maps, each having $N$ pixel locations. Through principal geodesic analysis, the projection matrix $\Phi = [e_1 | \cdots | e_{K-1}]$ and the corresponding eigenvalues $\Lambda = [\lambda_1, \ldots, \lambda_{K-1}]$ are obtained, together with the intrinsic mean $\mu$. These constitute the standard PGA model. Given a facial needle-map $n$, through the log map $\mathrm{Log}_\mu(n)$, the $2N$ dimensional long vector $u = [u_{1x}, u_{1y}, \ldots, u_{Nx}, u_{Ny}]^T$ is obtained. The standard PGA feature vector is $b = \Phi^T u$, which can be expressed component-wise as

$$b_i = \sum_{l=1}^{2N} \Phi^T_{il} u_l, \qquad (8)$$
where $\Phi_i$ denotes the $i$th eigenvector, and $\Phi_{il}$ is its value at the location $l$. Supervised weighted PGA incorporates a gender relevant weight map to improve the gender discriminating capacity of the standard PGA model. It extends the component-wise feature extraction of Eq. (8) by applying the weight map in the following way:

$$b^{SW}_i = \sum_{l=1}^{2N} \Phi^T_{il} w_l u_l, \qquad (9)$$
where $w_l$ is the weight at the location $l$. Because the weight map has a large absolute value in gender discriminating regions, supervised weighted PGA increases the influence of the gender discriminating regions over the extracted features and decreases that of the non-discriminating regions. In this way, we increase the discriminating capacity of the standard PGA model. Eq. (9) can be rewritten as

$$b^{SW} = \Phi^T (W \circ u), \qquad (10)$$
where $\circ$ denotes component-wise multiplication, and $W = [w_1, w_2, \ldots, w_{2N-1}, w_{2N}]^T$. This directly relates the supervised weighted PGA method to the weighted PGA method. Both methods multiply the weight map component-wise with the facial data. However, in weighted PGA, this multiplication takes place before the construction of the model. As a result, the model is modified by the weight map. In supervised weighted PGA, on the other hand, this multiplication takes place during feature extraction. The intrinsic mean and projection matrix are the same as those of the standard PGA model. Moreover, the weight map in supervised weighted PGA is optimized to the training data through a learning process.

4.2. Optimization of the weight map

Suppose there are $K$ labeled data, where the data with indices 1 to $n_f$ are labeled female, and the data with indices $n_f + 1$ to $K$ are labeled male. Using the labeled data, the weight map is optimized to minimize a total error function
$$\xi = \sum_{k=1}^{n_f} \frac{\mathrm{dist}_W(b_k, \hat{b}_f)^2}{\mathrm{dist}_W(b_k, \hat{b}_m)^2} + \sum_{k=n_f+1}^{K} \frac{\mathrm{dist}_W(b_k, \hat{b}_m)^2}{\mathrm{dist}_W(b_k, \hat{b}_f)^2}, \qquad (11)$$
where $b_k$, $\hat{b}_f$, and $\hat{b}_m$ are respectively the $d$ dimensional supervised weighted PGA feature vectors of the $k$th facial needle-map, the intrinsic mean for the females, and the intrinsic mean for the males, and $\mathrm{dist}_W$ is the weighted Euclidean distance between two feature vectors, defined as

$$\mathrm{dist}_W(b_k, \hat{b}_c) = \sqrt{\sum_{i=1}^{d} \left(\frac{\lambda_i}{\sum_{j=1}^{K-1} \lambda_j}\right) (b_{ki} - \hat{b}_{ci})^2}, \qquad (12)$$
where $c \in \{f, m\}$, $\lambda_i$ is the eigenvalue corresponding to the $i$th eigenvector of the projection matrix $\Phi$, and $b_{ki}$ and $\hat{b}_{ci}$ are respectively the $i$th components of $b_k$ and $\hat{b}_c$.
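For concreteness, a Python sketch of this objective (illustrative only; features follow Eq. (10), and we take the class means $\hat{b}_f$, $\hat{b}_m$ as the feature-space means of the labeled samples):

```python
import numpy as np

def features_sw(Phi, W, U, d):
    """Eq. (10): leading-d supervised weighted PGA features of the columns of U."""
    return Phi[:, :d].T @ (U * W[:, None])    # (d, K)

def total_error(Phi, lam, W, U, n_f, d):
    """Eq. (11) with the weighted distance of Eq. (12)."""
    B = features_sw(Phi, W, U, d)
    scale = (lam[:d] / lam.sum())[:, None]    # eigenvalue weighting of Eq. (12)
    b_f = B[:, :n_f].mean(axis=1, keepdims=True)   # female class mean
    b_m = B[:, n_f:].mean(axis=1, keepdims=True)   # male class mean
    d2_f = (scale * (B - b_f) ** 2).sum(axis=0)    # squared weighted distances
    d2_m = (scale * (B - b_m) ** 2).sum(axis=0)
    return (d2_f[:n_f] / d2_m[:n_f]).sum() + (d2_m[n_f:] / d2_f[n_f:]).sum()

# Gradient-descent update of Eq. (15): W <- W - eta * grad(total_error)(W),
# with the analytic gradient given by Eqs. (16)-(18).
```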
We construct this total error function based on the assumption that, when measured in terms of Euclidean distance, faces are closest to the mean face of the same gender. The minimum value of this error function is zero, and this is achieved when $\mathrm{dist}(b_k, \hat{b}_f) = 0$ for all female faces and $\mathrm{dist}(b_k, \hat{b}_m) = 0$ for all male faces. When this is the case, the data are clustered at the mean for their appropriate gender. As a result, the optimization process aims to concentrate the faces around the appropriate mean face in the feature space. We use the weighted Euclidean distance instead of the standard Euclidean distance in order to emphasize the influence of the leading eigen-features on the calculation of the distance. Substituting Eq. (9) into Eqs. (11) and (12), the total error function can be written as

$$\xi = \sum_{k=1}^{n_f} \frac{q_{kf}}{q_{km}} + \sum_{k=n_f+1}^{K} \frac{q_{km}}{q_{kf}}, \qquad (13)$$

where

$$q_{kc} = \sum_{i=1}^{d} \left(\frac{\lambda_i}{\sum_{j=1}^{K-1} \lambda_j}\right) \left\{\sum_{l=1}^{2N} \Phi^T_{il} W_l (u_{kl} - \hat{u}_{cl})\right\}^2, \qquad (14)$$

where $c \in \{f, m\}$, $u_k$ and $\hat{u}_c$ are respectively the $2N$ dimensional long vectors of the $k$th needle-map and the intrinsic mean of class $c$, and $u_{kl}$ and $\hat{u}_{cl}$ are respectively the values of $u_k$ and $\hat{u}_c$ at the location $l$. Since the projection matrix $\Phi$, the matrix of eigenvalues $\Lambda$, and the labeled data $[u_1, \ldots, u_K]$ are all fixed, the total error function varies only with the weight map $W$. We use gradient descent to optimize the weight map, that is

$$W^{(t+1)} = W^{(t)} - \eta \nabla\xi(W^{(t)}), \qquad (15)$$

where $\eta$ is the update step size, which is chosen to have a value in the interval (0,1). Suppose that $W = [w_1, \ldots, w_{2N}]^T$, where $2N$ is the dimensionality of the log mapped long vectors of the facial needle-maps; then

$$\nabla\xi(W) = \left[\frac{\partial \xi(W)}{\partial w_1}, \ldots, \frac{\partial \xi(W)}{\partial w_{2N}}\right]^T. \qquad (16)$$

The partial derivatives of the error function are

$$\frac{\partial \xi(W)}{\partial w_x} = \sum_{k=1}^{n_f} \frac{\frac{\partial q_{kf}(W)}{\partial w_x} q_{km}(W) - q_{kf}(W) \frac{\partial q_{km}(W)}{\partial w_x}}{q_{km}(W)^2} + \sum_{k=n_f+1}^{K} \frac{\frac{\partial q_{km}(W)}{\partial w_x} q_{kf}(W) - q_{km}(W) \frac{\partial q_{kf}(W)}{\partial w_x}}{q_{kf}(W)^2}, \qquad (17)$$

where

$$\frac{\partial q_{kc}(W)}{\partial w_x} = 2 \sum_{i=1}^{d} \left(\frac{\lambda_i}{\sum_{j=1}^{K-1} \lambda_j}\right) (u_{kx} - \hat{u}_{cx}) \Phi^T_{ix} \left\{\sum_{l=1}^{2N} \Phi^T_{il} W_l (u_{kl} - \hat{u}_{cl})\right\} \qquad (18)$$

and $c \in \{f, m\}$. Since the optimization is performed using all of the available labeled data, the weight map is specific to the training data. This is distinct from the case where the weight map is constructed using only the information provided by the mean needle-maps for the two genders.

4.3. Relationship with LDA

Linear discriminant analysis (LDA) finds the projection $\omega$ that maximizes the ratio of the between-class scatter to the within-class scatter

$$\xi' = \frac{|\omega^T S_B \omega|}{|\omega^T S_W \omega|}, \qquad (19)$$

where $|\cdot|$ is the determinant of a matrix, and $S_B$, $S_W$ are respectively the between-class and within-class scatter matrices in the original data space. In our application, suppose there are $K$ labeled data, where the data with indices 1 to $n_f$ are labeled female and the data with indices $n_f + 1$ to $K$ are labeled male; then

$$S_B = n_f \hat{u}_f \hat{u}_f^T + (K - n_f) \hat{u}_m \hat{u}_m^T, \qquad (20)$$

$$S_W = \sum_{k=1}^{n_f} (u_k - \hat{u}_f)(u_k - \hat{u}_f)^T + \sum_{k=n_f+1}^{K} (u_k - \hat{u}_m)(u_k - \hat{u}_m)^T, \qquad (21)$$

where $u_k$, $\hat{u}_f$, and $\hat{u}_m$ are respectively the $2N$ dimensional long vectors of the $k$th needle-map, the female intrinsic mean, and the male intrinsic mean. Substituting Eqs. (20) and (21) into Eq. (19),

$$\xi' = \frac{|n_f (\omega^T \hat{u}_f)(\omega^T \hat{u}_f)^T + (K - n_f)(\omega^T \hat{u}_m)(\omega^T \hat{u}_m)^T|}{\left|\sum_{k=1}^{n_f} (\omega^T u_k - \omega^T \hat{u}_f)(\omega^T u_k - \omega^T \hat{u}_f)^T + \sum_{k=n_f+1}^{K} (\omega^T u_k - \omega^T \hat{u}_m)(\omega^T u_k - \omega^T \hat{u}_m)^T\right|} = \frac{|n_f \hat{b}_f \hat{b}_f^T + (K - n_f) \hat{b}_m \hat{b}_m^T|}{\left|\sum_{k=1}^{n_f} (b_k - \hat{b}_f)(b_k - \hat{b}_f)^T + \sum_{k=n_f+1}^{K} (b_k - \hat{b}_m)(b_k - \hat{b}_m)^T\right|}, \qquad (22)$$

where $b_k$, $\hat{b}_f$, and $\hat{b}_m$ are respectively the LDA feature vectors of the $k$th datum, the female mean, and the male mean. Since for gender classification there is only one eigenvector in $\omega$ with non-zero eigenvalue, $b_k$, $\hat{b}_f$, and $\hat{b}_m$ are all scalar values. As a result, the $|\cdot|$ operation in Eq. (22) can be ignored, and Eq. (22) simplifies to

$$\xi' = \frac{n_f\, \mathrm{dist}_E(\hat{b}_f, \hat{b}) + (K - n_f)\, \mathrm{dist}_E(\hat{b}_m, \hat{b})}{\sum_{k=1}^{n_f} \mathrm{dist}_E(b_k, \hat{b}_f) + \sum_{k=n_f+1}^{K} \mathrm{dist}_E(b_k, \hat{b}_m)}, \qquad (23)$$

where $\mathrm{dist}_E(\cdot,\cdot)$ is the Euclidean distance, and in our application the mean of the data $\hat{b} = 0$. LDA locates the projection that maximizes $\xi'$. This is equivalent to finding the projection that minimizes

$$\xi'' = \frac{1}{\xi'} = \frac{\sum_{k=1}^{n_f} \mathrm{dist}_E(b_k, \hat{b}_f) + \sum_{k=n_f+1}^{K} \mathrm{dist}_E(b_k, \hat{b}_m)}{n_f\, \mathrm{dist}_E(\hat{b}_f, \hat{b}) + (K - n_f)\, \mathrm{dist}_E(\hat{b}_m, \hat{b})}. \qquad (24)$$

This relates LDA to supervised weighted PGA. Both methods aim to locate the projection that minimizes an error function in the projected feature space. However, in gender classification, the features in the LDA error function (Eq. (24)) are scalar, while the features in the supervised weighted PGA error function (Eq. (11)) are vectors.

5. Supervised principal geodesic analysis

In this section, we consider a third gender classification strategy which makes use of pairwise information to improve the discriminating capacity of the constructed model using labeled data. It is an extension of supervised principal component analysis (supervised PCA) [37] to data residing on a Riemannian manifold. We refer to this method as supervised principal geodesic analysis (supervised PGA).

5.1. Supervised PCA

Koren and Carmel analyzed the relationship between PCA and multidimensional scaling [37], and concluded that PCA computes
the projection that maximizes

$$\xi = \sum_{k_1 < k_2} \left(\mathrm{dist}_E(b_{k_1}, b_{k_2})\right)^2, \qquad (25)$$

where

$$\mathrm{dist}_E(b_{k_1}, b_{k_2}) = \sqrt{\sum_{i=1}^{d} (b_{k_1 i} - b_{k_2 i})^2} \qquad (26)$$

is the Euclidean distance between the data indexed $k_1$ and $k_2$ in the projected feature space. Based on this observation, they generalized PCA by incorporating a symmetric and non-negative pairwise weight matrix $\{w_{k_1 k_2}\}_{k_1, k_2 = 1}^{K}$, where $w_{k_1 k_2}$ is a dissimilarity measure that gauges the importance of placing the data points $k_1$ and $k_2$ further apart in the projected space. This generalized PCA method seeks the projection that maximizes the sum of weighted squared pairwise distances

$$\xi = \sum_{k_1 < k_2} w_{k_1 k_2}\, \mathrm{dist}_E(b_{k_1}, b_{k_2})^2. \qquad (27)$$
It has been proved by Koren and Carmel [37] that the projection maximizing Eq. (27) is obtained by taking the principal eigenvectors of the matrix $X^T L X$, where $L$ is the Laplacian associated with the dissimilarities, i.e.

$$L_{k_1 k_2} = \begin{cases} \sum_{k=1}^{K} w_{k k_2}, & k_1 = k_2, \\ -w_{k_1 k_2}, & k_1 \neq k_2, \end{cases} \qquad (28)$$

and $X$ is the row-wise data co-ordinate matrix. For labeled data, this weighted PCA method turns out to be equivalent to supervised PCA, which underweights the intra-cluster pairwise data dissimilarities:

$$w'_{k_1 k_2} = \begin{cases} t\, w_{k_1 k_2}, & k_1 \text{ and } k_2 \text{ have the same label}, \\ w_{k_1 k_2}, & \text{otherwise}, \end{cases} \qquad (29)$$

where $0 \le t < 1$ is a decay factor. In this way, supervised PCA constructs the projection that emphasizes the inter-cluster separation of the data.

5.2. Supervised PGA for facial needle-maps

As mentioned in [29,31,28], the principal geodesics in PGA can be approximated by applying standard PCA in the tangent plane passing through the intrinsic mean. A straightforward generalization is to replace the standard PCA in the tangent plane by the supervised method described above. This has the effect of enhancing the inter-cluster separability of the constructed PGA model. Making use of the log map, we first obtain the column-wise long vectors of the $K$ labeled facial needle-maps, $U = [u_1 | \cdots | u_K]$. For gender classification, we construct the dissimilarity matrix

$$w_{k_1 k_2} = \begin{cases} 0, & k_1 \text{ and } k_2 \text{ are of the same gender}, \\ 1, & \text{otherwise}, \end{cases} \qquad (30)$$

and the corresponding Laplacian matrix $L$. According to [37], the $K-1$ leading eigenvectors with the highest eigenvalues of the matrix $U L U^T$ form the projection in the long vector space: $\Phi^S = [e_1 | e_2 | \cdots | e_{K-1}]$. Together with the eigenvalues and intrinsic mean, we obtain the supervised PGA model. Given a facial needle-map, its long vector $u$ is first obtained by log mapping it onto the tangent plane passing through the mean. The corresponding supervised PGA features are calculated as

$$b^S = (\Phi^S)^T u. \qquad (31)$$

Since the Laplacian $L$ is a $K \times K$ symmetric positive semi-definite matrix, its eigenvalues are non-negative. Through eigen-decomposition, $L$ can be rewritten as

$$L = \Phi_L \Lambda_L \Phi_L^T = (\Phi_L \sqrt{\Lambda_L})(\Phi_L \sqrt{\Lambda_L})^T, \qquad (32)$$

where $\Phi_L$ and $\Lambda_L$ are the eigenvectors and eigenvalues of $L$. Accordingly,

$$U L U^T = U(\Phi_L \sqrt{\Lambda_L})(\Phi_L \sqrt{\Lambda_L})^T U^T = (U \Phi_L \sqrt{\Lambda_L})(U \Phi_L \sqrt{\Lambda_L})^T. \qquad (33)$$

Represented in this way, we can again use the numerically efficient snap-shot method of Sirovich [39] to compute the eigenvectors and eigenvalues of (33). Moreover, representing $U L U^T$ in this way, supervised PGA can be viewed as standard PGA on weighted long vectors. The weight map for each datum is the component-wise division of $(U \Phi_L \sqrt{\Lambda_L})$ by $U$, i.e.

$$[W_1, \ldots, W_K] = [u_1 \Phi_L \sqrt{\Lambda_L}\, ./\, u_1, \ldots, u_K \Phi_L \sqrt{\Lambda_L}\, ./\, u_K], \qquad (34)$$

where $W_k$, $k = 1, \ldots, K$ is the weight map for the $k$th datum, $u_k$, $k = 1, \ldots, K$ is the log mapped long vector of the $k$th needle-map, and $./$ denotes component-wise division. The weight maps are specific to each datum. This is different from the above two methods, which make use of a common weight map for the entire data set.
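A Python sketch of this construction (illustrative only; `labels` is a length-$K$ 0/1 gender vector, and all names are ours):

```python
import numpy as np

def supervised_pga(U, labels):
    """Build the Laplacian of Eqs. (28)-(30) and diagonalize ULU^T via Eq. (33)."""
    K = U.shape[1]
    w = (labels[:, None] != labels[None, :]).astype(float)   # Eq. (30)
    L = np.diag(w.sum(axis=1)) - w                           # Eq. (28)
    lamL, PhiL = np.linalg.eigh(L)                           # L is PSD
    A = U @ (PhiL * np.sqrt(np.clip(lamL, 0.0, None)))       # U Phi_L sqrt(Lambda_L)
    s, V = np.linalg.eigh(A.T @ A)                           # snap-shot step
    s, V = s[::-1][:K - 1], V[:, ::-1][:, :K - 1]
    Phi_s = A @ V
    Phi_s /= np.maximum(np.linalg.norm(Phi_s, axis=0), 1e-12)
    return Phi_s, s

# Eq. (31): supervised PGA features of a log-mapped long vector u: b_s = Phi_s.T @ u
```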
6. Classifier

With the extracted feature vectors of the training and test facial needle-maps at hand, we employ the nearest neighbor classifier to classify the test faces on the basis of gender. We use the Euclidean distance between the feature vector of the test face and those of the two mean faces. The mean faces of the two genders are obtained from the training data using intrinsic means. The test face is assigned to the gender with the closer mean in the feature space.
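A minimal sketch of this decision rule (illustrative Python, assuming the feature-space gender means have already been computed from the training data):

```python
import numpy as np

def classify_gender(b_test, b_mean_f, b_mean_m):
    """Assign the test feature vector to the gender with the closer mean."""
    d_f = np.linalg.norm(b_test - b_mean_f)
    d_m = np.linalg.norm(b_test - b_mean_m)
    return 'female' if d_f < d_m else 'male'
```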
7. Experiments and discussion

In this section, we present experiments using the three variants of the PGA method, and compare them to standard PGA. There are four aspects to this study. First, we explore the construction of the weight maps, and examine the gender discriminating capacity of the statistical models constructed from range data. Secondly, we apply the three statistical models to range data for feature extraction, and compare the gender classification performance. Here, we also compare our methods with Fisher faces, support faces, and three AdaBoost methods. Thirdly, we apply the constructed models to recovered needle-maps, and compare the results with human observers. Finally, we examine the effectiveness of our methods on the University of Notre Dame (UND) biometric database [42,43], and compare the results with those reported for existing methods.

7.1. Model construction

The surface normal data used to construct the gender discriminating statistical models are taken from the Max-Planck Face Database [44,45]. The database comprises 200 laser scanned (Cyberware) human heads without hair (100 females and 100 males). The ground-truth facial needle-maps are obtained by first orthographically projecting the facial range data onto a frontal view plane, and then cropping the plane to 142-by-124 pixels to retain only the inner part of the face. Finally, we compute the ground-truth surface normal at each pixel position using the height gradients of the processed range image. Using the ground-truth facial needle-maps, we first construct the weight map used in the weighted PGA method.
Fig. 4. Determination of the σ value.
Fig. 5. From left to right are the intrinsic mean of the females, intrinsic mean of the males, and the weight map.
Fig. 6. Error function values during the optimization process.
We need to determine the value of $\sigma$ in Eq. (2). As described in Section 3, the value of $\sigma$ is chosen as the one for which the constructed weight map gives the largest gender discriminating capacity for the leading $d$ weighted PGA eigenvectors. In our experiments we choose $d = 5$, because it has been shown in [46] that five gender discriminating features are sufficient for gender classification. We construct the weight map using 10 different $\sigma$ values, and obtain 10 different weighted PGA models. The discriminating capacity of the leading five eigenvectors for each of the 10 models is shown in Fig. 4. From the figure, we select $\sigma$ from the shoulder of the curve, which occurs when $\sigma^2 = 0.012$. The mean faces for the two genders and the constructed weight map are shown in Fig. 5. It is clear that the mean faces reveal gender information and eliminate any identity cues. The weight map reveals that there is greater gender difference in the regions around the eyebrows, nose and mouth. These are known to be gender discriminating regions from the psychology literature [4,35]. In supervised weighted PGA, the weight map is learned through an optimization process. The complete sample (100 females and 100 males) is used as labeled data. We set the step size $\eta = 0.5$ in the gradient descent method, and perform 6000 iterations. A plot of the error function (Eq. (11)) as a function of iteration number is shown in Fig. 6. The error decreases rapidly in the first 1000 iterations, and convergence is achieved after about 3000 iterations. The weight maps learned during the optimization process are shown in Fig. 7. The weight map is applied to the log mapped long vector, which has two components for each pixel location. The weight map is therefore displayed in two separate parts corresponding to the two components. From Fig. 7, it is clear that the weight map becomes more detailed as the process iterates. This means that it becomes more specific to the training data as the number of iterations increases. The regions emphasized are still the eyebrows and the areas around the nose and the mouth. As mentioned in Section 5, the weighted covariance matrix $U L U^T$ in supervised PGA can be represented as the matrix product $(U \Phi_L \sqrt{\Lambda_L})(U \Phi_L \sqrt{\Lambda_L})^T$. In this way, supervised PGA can be viewed as standard PGA on weighted long vectors. The weight map for the $k$th ($k = 1, \ldots, K$) datum is $W_k = u_k \Phi_L \sqrt{\Lambda_L}\, ./\, u_k$, where $u_k$ is the log mapped long vector of the $k$th needle-map, and $./$ denotes component-wise division. Some examples of the two parts of the weight maps are shown in Fig. 8. The weight maps in the supervised PGA model are specific to each datum, while the weight map in the supervised weighted PGA model is common to all the data. From the figure, it is clear that the calculated weight maps also give more importance to the areas which are used for gender discrimination by human observers. After obtaining the weight maps, we construct (a) the weighted PGA model, (b) the supervised weighted PGA model, and (c) the supervised PGA model using the ground-truth facial needle-maps. The standard PGA model is also constructed for comparison.
Fig. 7. Weight maps during the optimization process.
Fig. 8. Examples of the computed weight maps (each comprises two components) in supervised PGA model. The first two columns are for females and the last two columns are for males.
Fig. 9. Discriminating capacity of the eigenmodes (panels: weighted PGA, supervised weighted PGA, supervised PGA, standard PGA).
Each ground-truth facial needle-map is represented by the feature vectors extracted using the corresponding models. Using these feature vectors, we compute the discriminating capacity defined by Eq. (3) for each eigenmode of the four PGA models. The results for the leading 50 eigenmodes are shown in Fig. 9, from which it is clear that, compared to the standard PGA model, the weighted PGA model and the supervised PGA model transfer more discriminating power into the first eigenvector. However, the supervised weighted PGA model outperforms the alternative three models by significantly improving the gender discriminating power in each of the eigenmodes (note the larger scale on the y-axis). This confirms our assumption that by making use of a gender relevant weight map, the gender discriminating capacity
of the leading eigenmodes is improved. The improvement of the supervised weighted PGA model over the alternative two novel PGA models is attributable to the fact that the weight map used in the supervised weighted PGA model is learned from the training data. This makes it considerably more specific to the data than that used in the weighted PGA model. Although the supervised PGA model is also specific to the labeled data, it only emphasizes the inter-class separation. By minimizing the error function in Eq. (11), the supervised weighted PGA model clusters the data around the means of the two genders. It therefore not only increases the inter-class separation, but also reduces the intra-class separation through the weight optimization process.
Fig. 10. Visualization of leading two features.
This reduction of intra-class separation is clear in Fig. 10, which visualizes the 200 data using the leading two eigen-features. From the figure, the features extracted using the supervised weighted PGA model are better separated by gender, and are more concentrated within the same gender, than those extracted using the remaining three models. Compared to the supervised weighted PGA model, the improvement offered by the weighted PGA model and the supervised PGA model is not obvious.

7.2. Gender classification on needle-maps extracted from range images

In this section, we compare the gender classification performance obtained by applying the three gender discriminating models and the standard PGA model to the 200 facial needle-maps extracted from range images (from the Max-Planck database). A total of 160 needle-maps are randomly selected for use as training data, and the remaining 40 as test data. From the training needle-maps, we learn the weight maps and construct the gender discriminating models and the PGA model. When constructing the weight map for the weighted PGA model, we set $\sigma^2 = 0.012$, the value obtained in the model construction section above. Both the training and test data are represented by the feature vectors extracted using the PGA model and the three gender discriminating models. Gender classification is performed by applying the nearest neighbor classifier to the leading $m$ ($m = 1, 2, 5, 10, 20, 30, 50$) features. The average error rates are estimated with 10-fold cross validation (CV) for each value of $m$. The results are shown in Fig. 11.

Fig. 11. Classification error rates.

From the figure, there are some interesting effects that deserve comment. First, the supervised weighted PGA model significantly outperforms the alternative three models, no matter how many leading features are used. This is consistent with its significantly better performance in gender discrimination described above. Second, when $m \le 5$, the supervised PGA model and the weighted PGA model both achieve better gender classification results than the standard PGA model. This gives further confirmation of our assumption that incorporating gender relevant weights improves the discriminating capacity of the leading eigenmodes. Finally, irrespective of the PGA model used, the gender classification error rate reaches a minimum, and then either increases or remains constant with increasing $m$. This is consistent with human classification performance, which is based on just a very few of the most important attributes [16]. The redundant or irrelevant information contained in the higher dimensions causes a deterioration of the classification accuracy. In Table 1, we summarize the best classification accuracy obtained, together with the optimal number of leading features used, for the three gender discriminating models and the standard PGA model. In each column the accuracy obtained is greater than 91%, which indicates the feasibility of accurate gender classification based on facial shape information contained in facial needle-maps. Using the leading 10 features from supervised weighted PGA, we achieve the highest average classification accuracy of 97%.
Table 1
Gender classification accuracy.

                             Weighted PGA   Supervised weighted PGA   Supervised PGA   Standard PGA
Best accuracy (%)            91.75          97.00                     91.25            91.25
Dimension of features used   m = 30         m = 10                    m = 10           m = 20

Fig. 12. Comparison to Fisher faces.

Fig. 13. Gender classification using SVM with RBF kernels.
Accuracy (%)
Real AdaBoost
Gentle AdaBoost
Modest AdaBoost
96.25
95.75
94.50
The results of using SVM and AdaBoost confirm the important role of feature extraction in gender classification. Using only a few well selected features, we can achieve gender classification performance using simple classifiers which is as good as that obtained using complicated classifiers. 7.3. Fitting discriminating models to recovered needle-maps In this section, we first illustrate the performance obtained using principal geodesic SFS, and then apply the gender discriminating models to facial needle-maps recovered from intensity images. The intensity images comprise 13 female and 30 male subjects with neutral expression and no glasses. Images are captured under a single known light source direction using a Nikon D200 camera. Brightness normalization is required for these images. We use only the red channel of the color images. The intensity contrast is linearly stretched to normalize the ambient lighting variations. Finally, we use the method proposed in [51] to apply photometric correction and specularity subtraction to the intensity stretched images. Although histogram equalization is widely used for image brightness normalization, we do not use it because it modifies the distribution of intensity. This will affect the shape recovered using SFS since it distorts the physical reflectance model. After brightness normalization, the images are geometrically aligned using the centers of the eyes, the tip of nose, and the middle of the mouth. Finally, we crop the images to maintain only the inner facial region. We use the principal geodesic SFS method described in Section 2 to recover the facial needle-maps. The statistical model used in the SFS method is constructed from the 200 range images in the MaxPlanck database. Because of the nature of the principal geodesic SFS method, there are two sets of recovered needle-maps. The first satisfies the data-closeness constraint and satisfies Lambert’s law as a hard constraint. The second needle-map best fits the statistical shape model. Ten examples (five females and five males) of the recovered needle-maps and the corresponding integrated height surfaces are shown in Fig. 14. The needle-maps extracted from range images and their corresponding height surfaces are also shown for
Fig. 14. Examples of the results of principal geodesic SFS. From left to right: the input images, the recovered needle-maps satisfying data-closeness, the recovered needle-maps best fitting the model, and the ground-truth needle-maps; the fifth to seventh columns show the corresponding recovered surfaces. From top to bottom, the first five rows are females and the last five rows are males.
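A minimal sketch of the red-channel contrast stretch described above, in Python with NumPy. The percentile clipping bounds are our own assumption, added to make the stretch robust to outliers; the photometric correction and specularity subtraction of [51] are separate steps not shown here.

import numpy as np

def stretch_red_channel(image, low_pct=1.0, high_pct=99.0):
    """Take the red channel of an RGB image (H, W, 3) and linearly stretch
    its contrast so that the chosen percentiles map to 0 and 1."""
    red = image[..., 0].astype(np.float64)
    lo, hi = np.percentile(red, [low_pct, high_pct])
    stretched = (red - lo) / max(hi - lo, 1e-12)   # linear map [lo, hi] -> [0, 1]
    return np.clip(stretched, 0.0, 1.0)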
The needle-maps extracted from range images and their corresponding height surfaces are also shown for comparison. Since the needle-maps satisfying data-closeness appear identical to the images when rendered with a frontal light source, we show the needle-maps re-illuminated with a light source moved by 45° from the viewing direction along the positive x-axis. From the figure, it is clear that both the recovered needle-maps and the surfaces yield realistic shapes, overcoming the well-known local convexity–concavity instability problem [52]. Moreover, they are similar to their ground-truth counterparts, especially around the nose, mouth, and cheek regions, where some of the gender discriminating information is encoded.
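The re-illumination used for display can be reproduced with a few lines of Lambertian rendering. The following sketch is our own, with NumPy, taking the viewing direction as the z-axis and rotating the light 45° towards the positive x-axis.

import numpy as np

def render_lambertian(normals, angle_deg=45.0):
    """Re-illuminate a needle-map (H, W, 3) of unit normals under Lambert's
    law, I = max(0, n . s), with the light s rotated angle_deg from the
    viewing direction (z-axis) towards the positive x-axis."""
    angle = np.deg2rad(angle_deg)
    light = np.array([np.sin(angle), 0.0, np.cos(angle)])
    return np.maximum(normals @ light, 0.0)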
After shape recovery, we apply the three gender discriminating models, together with the standard PGA model, to the recovered needle-maps for feature extraction. The required models are constructed using the above 200 needle-maps extracted from the range images in the Max-Planck database; as a result, they are models of facial shape free of facial texture. According to [30], the differences between the best fit needle-map and the needle-map satisfying data-closeness are almost solely due to the variation in albedo at the eyes, eyebrows, and lips. The best fit needle-map may therefore provide a more accurate estimate of the underlying facial shape, so we use the best fit needle-maps in these experiments to ensure that the exhibited discriminating capacity of the models is not biased by facial texture.

Fig. 15 visualizes the feature extraction results using the leading two eigen-features. From the figure, it is clear that, irrespective of the model used, the extracted features are not as discriminating as those extracted from range data. There are two reasons for this. First, the gender discriminating models are constructed using the range data; lacking generality beyond their training data, the models are more suited to the range data than to the needle-maps recovered using SFS. Second, the global shape constraint imposed by the statistical model in principal geodesic SFS forces the recovered needle-maps to be concentrated around the model mean, so the differences between the faces are reduced by the shape recovery process. However, compared to the other three models, the supervised weighted PGA model still exhibits useful gender discriminating capacity on the recovered facial needle-maps.
Fig. 15. Visualization of the leading two features (1st dimension versus 2nd dimension) extracted from the recovered facial needle-maps, shown separately for the weighted PGA, supervised weighted PGA, supervised PGA, and standard PGA models.
Fig. 16. Clustering results on the recovered facial needle-maps: the two-component Gaussian mixture estimated by EM, plotted over the leading two feature dimensions.
We make use of an EM algorithm to fit a two-component Gaussian mixture model to the leading 10 supervised weighted PGA features of the recovered facial needle-maps. To initialize the EM algorithm, we set the a priori class probabilities to (13/43, 30/43) according to the female and male proportions in our data set. We randomly choose a female and a male as the initial means, and set the covariance matrices to $\Sigma_c^{(0)} = \det(\Sigma)^{1/10} I_{10}$ for $c \in \{f, m\}$, where $\Sigma$ is the overall covariance matrix of the features and $I_{10}$ is the $10 \times 10$ identity matrix. We repeat this procedure several times with random initialization, each followed by EM iterations, and select the final result to be that with the largest log-likelihood, which is shown in Fig. 16. From the figure, the extracted features for the 43 faces are still well clustered. However, there are some mis-classified faces (highlighted in Fig. 16).
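A minimal sketch of this clustering step, using scikit-learn's GaussianMixture in place of a hand-written EM loop. The restart count and the random selection of two faces as initial means are our own simplifications (the text draws one female and one male, which requires the labels); the priors and the scaled-identity covariance initialization follow the text.

import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_by_gender(X, n_female=13, n_male=30, n_restarts=10, seed=0):
    """X: (n, d) array of the leading d = 10 supervised weighted PGA
    features, one row per face. Returns cluster labels and the best model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    priors = np.array([n_female, n_male], dtype=float) / (n_female + n_male)
    # Sigma_c^(0) = det(Sigma)^(1/d) * I_d, with Sigma the overall covariance.
    _, logdet = np.linalg.slogdet(np.cov(X, rowvar=False))
    scale = np.exp(logdet / d)
    precisions = np.stack([np.eye(d) / scale] * 2)   # inverse covariances
    best, best_ll = None, -np.inf
    for _ in range(n_restarts):
        means = X[rng.choice(n, size=2, replace=False)]  # two random faces
        gm = GaussianMixture(n_components=2, covariance_type="full",
                             weights_init=priors, means_init=means,
                             precisions_init=precisions).fit(X)
        ll = gm.score(X)                  # mean per-sample log-likelihood
        if ll > best_ll:
            best, best_ll = gm, ll
    return best.predict(X), best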
The mis-classified faces are visualized in the first row of Fig. 17. The first two are mis-classified females, and the following eight are mis-classified males. Some of these images (the two females and the 1st, 3rd, 4th, and 6th males) exhibit at least some gender ambiguity under subjective human judgement; however, some (the 2nd, 5th, 7th, and 8th males) pose no such difficulty.

To investigate this more systematically, we presented the 43 images to eight unbiased human observers, who were asked to assign each image to one of the following classes: (a) definitely female, (b) possibly female, (c) not sure, (d) possibly male, and (e) definitely male. We assign male probabilities to these classes (a, 0%; b, 25%; c, 50%; d, 75%; e, 100%) and average the results over the eight observers. We then arrange the 43 images in ascending order of average male probability. The positions of the 10 mis-classified faces in this series are also shown in Fig. 17. It is clear that the two females lie in the overlapping region (with male probabilities around 50%), the 4th and 6th males are judged female by the observers, and the 1st and 3rd males are rather ambiguous. However, the remaining four males are judged male with 100% probability.

This result shows that our supervised weighted PGA model is not perfectly consistent with human performance. The main reason is that our gender discriminating model makes use of facial shape information instead of image intensities. In fact, three of the four male images judged to be definitely male have obvious facial hair, which probably played a dominant role in human judgement. If we consider only facial shape, these four male faces all have relatively 'soft' facial shapes, which tend to be consistent with feminine features. It is interesting to note that the 6th male in the first row, whose gender is ambiguous to human observers, is mis-classified as female by our supervised weighted PGA model. However, this is consistent with the observation that 'female faces are more like baby-faces than are male faces—they have smaller chins and noses and their eyes appear larger (a consequence of the lesser brow protuberance)' [4].
Fig. 17. Images of mis-clustered faces and comparison with human observations.
7.4. Gender classification on UND data
We also apply our methods to the University of Notre Dame (UND) biometric database [42,43], and compare the results with those reported for existing methods [28,5,7]. The UND biometric database has the advantage that it contains both 2D images and the corresponding range images for each individual. Moreover, the scale and sex ratio of the database (944 images of 275 individuals: 383 images of 103 females and 561 images of 172 males) make it suitable for statistical gender classification. A subset of this database containing 200 2D images and the corresponding range images for 200 subjects (100 female and 100 male) was used in our previous work [28] to perform gender classification. Here, we apply the proposed methods to the same subset.

The images (both 2D and range) are first geometrically aligned and brightness-normalized in the same way as in [28]. Then, we use the needle-maps extracted from the range images to construct the statistical model required by principal geodesic SFS, and apply the SFS method to the 2D images to obtain the recovered facial needle-maps. As shown in [28], the recovered facial needle-maps satisfying data-closeness encode both facial shape and image intensity information and improve the classification results. Therefore, in this experiment, we use the recovered facial needle-maps satisfying data-closeness (rather than those best fitting the statistical model) for gender classification. The construction of the models and the gender classification are performed in the same way as on the range data.

The classification results, estimated with 5-fold cross validation, are shown in Fig. 18 (a minimal code sketch of this evaluation protocol follows the figure caption). The results are similar to those achieved using range data. First, when m ≤ 5, weighted PGA and supervised PGA achieve better classification accuracy than standard PGA. Second, the supervised weighted PGA model outperforms the other three models, no matter how many leading features are used. The best classification accuracy is 92.5%, achieved using supervised weighted PGA and the nearest neighbor classifier.
Fig. 18. Classification error rates on a subset of UND data for standard PGA, weighted PGA, supervised PGA, and supervised weighted PGA, for m = 1, 2, 5, 10, 20, 30, 50 parameters.
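A minimal sketch of this evaluation protocol with scikit-learn: keep the leading m feature components and score a one-nearest-neighbor classifier with k-fold cross validation. The array names X (feature vectors, columns ordered by decreasing eigenvalue) and y (gender labels) are our own.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def error_rates(X, y, ms=(1, 2, 5, 10, 20, 30, 50), folds=5):
    """Return {m: error rate} using the leading m feature components."""
    rates = {}
    for m in ms:
        clf = KNeighborsClassifier(n_neighbors=1)       # nearest neighbor
        acc = cross_val_score(clf, X[:, :m], y, cv=folds).mean()
        rates[m] = 1.0 - acc                            # classification error
    return rates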
This is higher than the 88.5% accuracy reported in [28], which was achieved using PGA and linear discriminant analysis, and confirms the importance and effectiveness of finding gender discriminating features from facial needle-maps.

We also apply the supervised weighted PGA method to the full UND data set, which contains 944 images of 275 individuals. The images are geometrically aligned according to the centers of the eyes, and are brightness-normalized as before. The facial needle-maps satisfying data-closeness are recovered from the images, and the classification results on the recovered needle-maps are estimated with 5-fold cross validation; they are shown in Fig. 19. The best classification accuracy is 96.91%, achieved using the leading five feature components and the nearest neighbor classifier. In Table 3, we compare this accuracy with the best accuracies reported in [5,7], both of which were also obtained on UND data.
Fig. 19. Classification error rates (total, female, and male) on the full UND data using supervised weighted PGA, for m = 1, 2, 5, 10, 20, 30, 50 parameters.
Table 3
Comparison of gender classification accuracy.

         Fusion of modalities [5]   Fusion of regions [7]   Supervised weighted PGA
Total    91.0%                      94.3%                   96.9%
Female   83.0%                      N/A                     95.5%
Male     95.6%                      N/A                     97.9%
Lu et al. [5] made use of SVMs and a fusion of multimodal information (intensity images and range images) for gender classification. Hu et al. [7] also made use of SVMs, together with a fusion of different facial regions. From Table 3, it is clear that the supervised weighted PGA method, even using the simplest classifier, outperforms both of these methods in terms of classification accuracy.
8. Conclusions

In this paper we emphasize the importance of feature extraction for facial gender classification, and propose three strategies (weighted PGA, supervised weighted PGA, and supervised PGA) to construct gender discriminating models from 2.5D facial needle-maps. Two main conclusions can be drawn from our work. First, by incorporating gender relevant weights, the three discriminating models all improve the gender discriminating capacity of the leading eigenmodes. Moreover, the supervised weighted PGA model, whose gender relevant weight map is learned from labeled data, significantly outperforms the weighted PGA and supervised PGA models in terms of gender classification performance. Second, by using the facial shape information contained in the 2.5D facial needle-maps, we achieve 97% gender classification accuracy on range data. This matches the accuracy achieved using the support faces method, and exceeds those achieved using the Fisher faces or AdaBoost methods. We also illustrate effective shape recovery using principal geodesic SFS on brightness images, and demonstrate the feasibility of gender classification on the recovered facial surface normals. Experiments on the UND database show the superiority of the proposed supervised weighted PGA method over several existing methods in terms of gender classification accuracy.

Although the weighted PGA model is outperformed by the supervised weighted PGA model, it has the advantage that its weight map can be independent of the data. Therefore, one possible future direction is the construction of a data-independent weight map for the weighted PGA method. The bubbles technique of Gosselin and Schyns [35] is a possible solution, although we might encounter the difficulty of obtaining stimuli that reveal facial shape rather than facial texture information. In supervised weighted PGA, a possible line of future work is to explore the use of different error functions, such as the error function deduced from LDA. Another avenue for future investigation is to improve the current SFS technique. In particular, we will explore how to reduce the model dominance caused by satisfying the global shape constraint, and the bias it introduces into gender classification.

Acknowledgment

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award and the EU FET project SIMBAD (Similarity Based Pattern Recognition).

References
[1] V. Bruce, A. Burton, E. Hanna, P. Healey, O. Mason, A. Coombes, R. Fright, A. Linney, Sex discrimination: how do we tell the difference between male and female faces? Perception 22 (1993) 131-152.
[2] A. Burton, V. Bruce, N. Dench, What's the difference between men and women? Evidence from facial measurement, Perception 22 (1993) 153-176.
[3] A. O'Toole, T. Vetter, N. Troje, H. Bulthoff, Sex classification is better with three-dimensional head structure than with image intensity information, Perception 26 (1997) 75-84.
[4] V. Bruce, A. Young, In the Eye of the Beholder: The Science of Face Perception, Oxford University Press, Oxford, 1998.
[5] X. Lu, H. Chen, A. Jain, Multimodal facial gender and ethnicity identification, in: Proceedings of the International Conference on Biometrics, 2006, pp. 554-561.
[6] A. Graf, F. Wichmann, Gender classification of human faces, in: Proceedings of the International Workshop on Biologically Motivated Computer Vision, 2002, pp. 491-500.
[7] Y. Hu, J. Yan, P. Shi, A fusion-based method for 3D facial gender classification, in: Proceedings of the International Conference on Computer and Automation Engineering, 2010, pp. 369-372.
[8] B. Golomb, D. Lawrence, T. Sejnowski, SEXNET: a neural network identifies sex from human faces, in: Proceedings of Advances in Neural Information Processing Systems, 1991, pp. 572-577.
[9] G. Cottrell, J. Metcalfe, EMPATH: face, emotion, and gender recognition using holons, in: Proceedings of Advances in Neural Information Processing Systems, vol. 3, 1990, pp. 564-571.
[10] Z. Sun, G. Bebis, X. Yuan, S. Louis, Genetic feature subset selection for gender classification: a comparison study, in: Proceedings of the IEEE Workshop on Applications of Computer Vision, 2002, pp. 165-170.
[11] S. Buchala, N. Davey, T. Gale, R. Frank, Principal component analysis of gender, ethnicity, age, and identity of face images, in: Proceedings of the IEEE International Conference on Multimodal Interfaces, 2005.
[12] T. Wilhelm, H. Bohme, H. Gross, Classification of face images for gender, age, facial expression, and identity, in: Proceedings of the International Conference on Artificial Neural Networks, 2005, pp. 569-574.
[13] Y. Saatci, C. Town, Cascaded classification of gender and facial expression using active appearance models, in: Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2006, pp. 393-398.
[14] B. Moghaddam, M. Yang, Learning gender with support faces, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5) (2002) 707-711.
[15] S. Baluja, H. Rowley, Boosting sex identification performance, International Journal of Computer Vision 71 (1) (2007) 111-119.
[16] P. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, 1982.
[17] R. Brunelli, T. Poggio, HyperBF networks for gender classification, in: Proceedings of the DARPA Image Understanding Workshop, 1992, pp. 311-314.
[18] S. Gutta, H. Weschler, P. Phillips, Gender and ethnic classification of human faces using hybrid classifiers, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 194-199.
[19] H. Kim, D. Kim, Z. Ghahramani, S. Bang, Appearance-based gender classification with Gaussian processes, Pattern Recognition Letters 27 (6) (2006) 618-626.
[20] G. Shakhnarovich, P. Viola, B. Moghaddam, A unified learning framework for real time face detection and classification, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 14-21.
[21] B. Wu, H. Ai, C. Huang, LUT-based AdaBoost for gender classification, in: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 104-110.
[22] A. Khan, A. Majid, A. Mirza, Combination and optimization of classifiers in gender classification using genetic programming, International Journal of Knowledge-based and Intelligent Engineering Systems 9 (1) (2005) 1-11.
[23] A. Majid, A. Khan, A. Mirza, Combination of support vector machines using genetic programming, International Journal of Hybrid Intelligent Systems 3 (2) (2006) 109-125.
[24] T. Cootes, G. Edwards, C. Taylor, Active appearance models, in: Proceedings of the European Conference on Computer Vision, vol. 2, 1998, pp. 484-498.
[25] A. Jain, J. Huang, Integrating independent components and linear discriminant analysis for gender classification, in: Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 159-163.
[26] S. Buchala, N. Davey, R. Frank, T. Gale, Dimensionality reduction of face images for gender classification, Technical Report 408, Department of Computer Science, University of Hertfordshire, UK, 2004.
[27] J. Wu, W. Smith, E. Hancock, Gender classification using principal geodesic analysis and Gaussian mixture models, in: Proceedings of the Iberoamerican Congress on Pattern Recognition, 2006, pp. 58-67.
[28] J. Wu, W. Smith, E. Hancock, Facial gender classification using shape-from-shading, Image and Vision Computing 28 (6) (2010) 1039-1048.
[29] P. Fletcher, S. Joshi, C. Lu, S. Pizer, Principal geodesic analysis for the study of nonlinear statistics of shape, IEEE Transactions on Medical Imaging 23 (8) (2004) 995-1005.
[30] W. Smith, E. Hancock, Recovering facial shape using a statistical model of surface normal direction, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12) (2006) 1914-1930.
[31] W. Smith, E. Hancock, Facial shape-from-shading and recognition using principal geodesic analysis and robust statistics, International Journal of Computer Vision 76 (1) (2008) 71-91.
[32] J. Wu, W. Smith, E. Hancock, Weighted principal geodesic analysis for facial gender classification, in: Proceedings of the Iberoamerican Congress on Pattern Recognition, 2007, pp. 331-339.
[33] J. Wu, W. Smith, E. Hancock, Supervised principal geodesic analysis on facial surface normals for gender classification, in: Proceedings of the Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, 2008, pp. 664-673.
[34] J. Wu, W. Smith, E. Hancock, Gender classification based on facial surface normals, in: Proceedings of the International Conference on Pattern Recognition, 2008, pp. 1-4.
[35] F. Gosselin, P. Schyns, Bubbles: a technique to reveal the use of information in recognition tasks, Vision Research 41 (2001) 2261-2271.
[36] Y. Andreu, R. Mollineda, The role of face parts in gender recognition, in: Proceedings of the International Conference on Image Analysis and Recognition, 2008, pp. 945-954.
[37] Y. Koren, L. Carmel, Robust linear dimensionality reduction, IEEE Transactions on Visualization and Computer Graphics 10 (4) (2004).
[38] X. Pennec, Probabilities and statistics on Riemannian manifolds: a geometric approach, Technical Report RR-5093, INRIA, 2004.
[39] L. Sirovich, Turbulence and the dynamics of coherent structures, Quarterly of Applied Mathematics XLV (3) (1987) 561-590.
[40] P. Worthington, E. Hancock, New constraints on data-closeness and needle map consistency for shape-from-shading, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999) 1250-1267.
[41] I. Dhillon, S. Sra, Modeling data using directional distributions, Technical Report, University of Texas at Austin, 2003.
[42] P. Flynn, K. Bowyer, P. Phillips, Assessment of time dependency in face recognition: an initial study, in: Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 44-51.
[43] K. Chang, K. Bowyer, P. Flynn, Face recognition using 2D and 3D facial data, in: Proceedings of the ACM Workshop on Multimodal User Authentication, 2003, pp. 25-32.
[44] N. Troje, H. Bulthoff, Face recognition under varying poses: the role of texture and shape, Vision Research 36 (1996) 1761-1771.
[45] V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in: Proceedings of SIGGRAPH '99, 1999, pp. 187-194.
[46] J. Wu, W. Smith, E. Hancock, Learning mixture models for gender classification based on facial surface normals, in: Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Part I, 2007, pp. 39-46.
[47] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711-720.
[48] R. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions, Machine Learning 37 (3) (1999) 297-336.
[49] J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: a statistical view of boosting, The Annals of Statistics 28 (2) (2000) 337-407.
[50] A. Vezhnevets, V. Vezhnevets, Modest AdaBoost—teaching AdaBoost to generalize better, in: Proceedings of Graphicon, 2005.
[51] A. Robles-Kelly, E.R. Hancock, Estimating the surface radiance function from single images, Graphical Models 67 (6) (2005) 518-548.
[52] R. Zhang, P. Tsai, J. Cryer, M. Shah, Shape-from-shading: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8) (1999) 690-706.
Jing Wu received the B.Sc. and M.Eng. degrees in computer science and technology from Nanjing University, China, in 2002 and 2005, respectively. She received the Ph.D. degree in computer science from the University of York, UK, in January 2010. She currently works as a research associate at Cardiff University. Her research interests include face recognition, gender classification, statistical shape modelling, and shape-from-shading.
William Smith completed B.Sc. and Ph.D. degrees in computer science, both at the University of York, in 2002 and 2007, respectively. Currently, he is a lecturer in the Computer Vision and Pattern Recognition group in the Department of Computer Science at the University of York, and is supervising four research students. His research interests are related to face processing, shape-from-shading, and reflectance modelling. He has published more than 50 papers in international journals and conferences.
Edwin Hancock holds a B.Sc. degree in physics (1977), a Ph.D. degree in high-energy physics (1981), and a D.Sc. degree (2008) from the University of Durham. From 1981 to 1991, he worked as a researcher in the fields of high-energy nuclear physics and pattern recognition at the Rutherford-Appleton Laboratory (now the Central Research Laboratory of the Research Councils). During this period, he also held adjunct teaching posts at the University of Surrey and the Open University. In 1991, he moved to the University of York as a lecturer in the Department of Computer Science, where he has held a chair in Computer Vision since 1998. He leads a group of some 25 faculty, research staff, and Ph.D. students working in the areas of computer vision and pattern recognition. His main research interests are in the use of optimization and probabilistic methods for high and intermediate level vision. He is also interested in the methodology of structural and statistical pattern recognition. He is currently working on graph matching, shape-from-X, image databases, and statistical learning theory. His work has found applications in areas such as radar terrain analysis, seismic section analysis, remote sensing, and medical imaging. He has published about 135 journal papers and 500 refereed conference publications. He was awarded the Pattern Recognition Society medal in 1991 and an outstanding paper award in 1997 by the journal Pattern Recognition. He has also received best paper prizes at CAIP 2001, ACCV 2002, ICPR 2006, BMVC 2007, and ICIAP 2009. In 2009, he was awarded a Royal Society Wolfson Research Merit Award. In 1998, he became a fellow of the International Association for Pattern Recognition. He is also a fellow of the Institute of Physics, the Institute of Engineering and Technology, and the British Computer Society. He has been a member of the editorial boards of the journals IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. In 2006, he was appointed as the founding editor-in-chief of the IET Computer Vision Journal. He has been conference chair for BMVC 1994, Track Chair for ICPR 2004, and Area Chair for ECCV 2006 and CVPR 2008, and in 1997 he established the EMMCVPR workshop series.