Neurocomputing 117 (2013) 1–11
Gabor feature-based face recognition on product Gamma manifold via region weighting

Yue Zhang a,b,*, Chuancai Liu a

a School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
b School of Mathematics and Physics, Anhui Polytechnic University, Wuhu 241000, China
Article history: Received 19 April 2012; received in revised form 9 December 2012; accepted 10 December 2012; available online 6 April 2013. Communicated by X. Li.

Abstract

In this paper, we put forward a novel product statistical manifold framework for face recognition based on the use of Gabor features. The collection of multi-region, multi-channel Gabor magnitude sets is characterized as a point on a product Gamma manifold by generative modeling and maximum likelihood estimation (MLE)-based product embedding. Although intrinsic analysis on statistical manifolds may seem the conventional approach, the development of computational information geometry involving intrinsic tools still lags somewhat. For this reason, we focus on extrinsic tools and introduce an immersion of product Gamma manifolds to facilitate incorporating the method of dual-space linear discriminant analysis (DLDA) into our recognition system. With the learned region-adaptive distance metrics and weights, we can integrate regional discriminative information in product magnitude-generating model matching. Experimental results on the FERET and CMU-PIE databases show that the performance of the proposed method is competitive.

© 2013 Elsevier B.V. All rights reserved.

Keywords: Face recognition; Gamma manifold; Extrinsic distance; Gabor features; Dual-space linear discriminant analysis (DLDA)
1. Introduction

Information geometry studies the differential-geometric structures of statistical manifolds, i.e., manifolds of probability density functions (PDFs). Among the existing differential structures, the Riemannian–Fisher structure is the most widely used form for parametric families in analyses and applications. As the most natural information measurement, geodesic distance plays a key role in processing PDFs on a statistical manifold, including divergence quantization, dimensionality reduction, clustering, interpolation and extrapolation [1]. These tools promote practical applications of distribution-modeled data or feature sets in terms of the embedding principle [2]. Information geometry has now been applied in many fields, such as control theory, neural networks, machine learning and molecular biology [2–6]. However, the following reasons indicate the difficulty of using the information geometry of parametric families in image object recognition. First, the absence of explicit parametric knowledge makes it hard
* Corresponding author at: School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China. Tel.: +86 18605690586. E-mail address: [email protected] (Y. Zhang).
☆ This work is supported by the National Natural Science Foundation of China (Grant nos. 6063050, 90820004, 71171003), the Ministry of Industry and Information Technology of China (Grant no. E0310/1112/JC01) and the Anhui Natural Science Foundation (Grant no. KJ2011B022).
0925-2312 © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.neucom.2012.12.053
to model raw pixel features or transformed features in a parameterized way for an arbitrary image. Second, modeling feature sets drawn from an entire image with distributions under-utilizes the spatial information in the original or transformed images. Although region partition can be viewed as a remedy, it must be supported by an efficient information integration scheme. Third, geodesic distance calculation on a parametric manifold with a complex information structure involves high computational complexity. These factors make it hard to use intrinsic analytical tools on manifolds when designing a computationally efficient algorithm. Therefore, one can resort to extrinsic rather than intrinsic tools to quantize the information divergence between models on a target manifold [7]. In fact, the computation of extrinsic statistics on a statistical manifold is simpler than that of their intrinsic counterparts. More importantly, the derived extrinsic statistics are usually adapted to Euclidean geometry, which saves a lot of trouble in generalizing Euclidean metric-based recognition or learning algorithms to a parametric statistical manifold.

Gabor filters possess desirable characteristics of spatial localization and orientation selectivity, and Gabor-filtered images are robust, to a certain extent, to factors such as illumination, scale, pose and deformation. Hence, Gabor feature-based object recognition and texture retrieval have been broadly studied, and many methods have been presented in recent years [8–13]. Among them, non-parametric and parametric model-based methods have attracted considerable attention. The advantage of modeling data or feature sets with models on a parametric
statistical manifold is that it not only gives compact representations for massive data sets, but also provides geometric analysis tools for various applications. In [11], the authors employed Gamma densities and generalized Gaussian densities to approximate the underlying Gabor magnitude distributions and Gabor phase distributions, respectively. They designed a face recognition algorithm using null space discriminant analysis (NLDA) to extract features from the set of concatenated regional model parameters obtained by MLE. Although the method achieves good performance, the invariant geometric structures of the considered models are ignored. The advantages of the non-parametric histogram descriptor lie in its robustness and its good representation of samples. Mio et al. [12] treated the histograms of wavelet responses as discretized realizations of PDFs, which are assumed to lie on a non-parametric statistical manifold. Based on multinomial geometry, their method performs recognition and segmentation tasks by matching product histogram-represented multi-spectral texture patterns of filtered images. Taking information theory as the analysis tool, Do and Vetterli [13] fitted the marginal distributions of Gabor wavelet coefficients with generalized Gaussian densities for texture retrieval. In their method, the estimated model parameters are used for texture representation, while the Kullback–Leibler (KL) divergence is used to quantize the dissimilarities between textures. However, the latter two methods neglect class label information.

In this work, we present a novel face recognition approach based on Gabor magnitude-generating model matching on an extended MLE-embedded product Gamma submanifold. The selection of Gamma distributions for magnitude-generating modeling is justified by empirical studies [11,14], as well as by the conclusion that, in a rather precise sense, every neighborhood of a random process on the real line contains a neighborhood of processes represented by Gamma distributions [15]. The discrimination of channel magnitude distributions draws on the finding that marginal distributions of wavelet responses are sufficient to characterize homogeneous texture [16]. The partition of an entire image is motivated by the fact that the features of a local image region are usually more robust than those of an entire image [17]. Different regions of an image carry unequal discriminative information, which explains the necessity of quantizing regional weights to factorize the dissimilarity measurements on the corresponding regional MLE-embedded product Gamma submanifolds in product model matching.

For each face image, our method first divides the image into several non-overlapping rectangular regions. The entire image and all its sub-images are convolved with a bank of Gabor wavelets in turn. Then, we model each channel magnitude set of a regional face as being generated by an underlying Gamma distribution. Each channel magnitude set of each regional face is embedded into the 2-dimensional Gamma manifold via the MLE-based embedding. After that, under the regional MLE-based product embedding, the collection of multi-channel magnitude sets of each regional face is linked to a point on the corresponding regional MLE-embedded product Gamma submanifold. Further, the face images of a recognition task are represented as points on the extended MLE-embedded product Gamma submanifold via the extended MLE-based product embedding.
Clearly, this kind of face representation can greatly decrease memory cost, while making it possible to use proper analytical tools on Gamma manifolds to design a practicable model-based matching algorithm. As the core technical element of our method, the construction of a region-adaptive matching distance metric and weight on each regional MLE-embedded product Gamma submanifold involves a choice between the intrinsic and extrinsic geometric structures of a product Gamma manifold. Taking computational practicability into account, we design an immersion of a product Gamma manifold in order to use extrinsic tools on the regional MLE-embedded product Gamma submanifolds. The use of extrinsic tools allows us to combine the class label information of training images and the multi-channel magnitude distribution information in a Euclidean metric-based supervised learning system. In our work, DLDA [18] serves as the learner on each regional immersed product submanifold to produce a lower-dimensional Euclidean embedding. We equip each regional MLE-embedded product Gamma submanifold with the distance metric derived from the corresponding DLDA-embedded Euclidean submanifold. With the derived regional distance metrics, regional weights are further learned based on the Fisher separation criterion [19]. Thus the weighted direct sum metric is built for face recognition by means of nearest product magnitude-generating model matching.

The rest of this paper is organized as follows. Section 2 describes the novel face representation and the MLE-based product embeddings. Section 3 details the proposed face recognition approach. The experimental results are reported in Section 4. We draw conclusions in the final section.
2. Gabor-based face representation on product Gamma manifold

In this section, we design a novel face representation with multi-region and multi-channel Gabor magnitude-generating models, which are points lying on the extended MLE-embedded product Gamma submanifold.

2.1. Product Gamma manifold and MLE-based product embedding

Let $\Theta$ be the set of Gamma distributions on $\Omega = \mathbb{R}_+$, that is,
\[
\Theta = \left\{ G(\alpha,\gamma) \,\middle|\, G(\alpha,\gamma) = \frac{\alpha^{\alpha}\, x^{\alpha-1}}{\gamma^{\alpha}\, \Gamma(\alpha)}\, e^{-(\alpha/\gamma)x},\ (\alpha,\gamma) \in \mathbb{R}_+ \times \mathbb{R}_+ \right\},
\tag{1}
\]
where $\Gamma(\cdot)$ is the Gamma function, i.e. $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt$, $z > 0$. The set $\Theta$ is called the Gamma 2-manifold, where $(\alpha,\gamma)$ plays the role of a coordinate system [6]. An $n$-fold product Gamma manifold $\Theta^n$ is a product of $n$ copies of the factor manifold $\Theta$.

Let $\Omega = \mathbb{R}_+$ be a data space, and let $\Omega^n$ be a product of $n$ copies of the factor $\Omega$. Each element $X \in \Omega^n$ is a sample realization constituted by observed data $x_1, x_2, \ldots, x_n$ drawn from a common Gamma distribution $G(\alpha,\gamma)$ with unknown parameters $\alpha$ and $\gamma$. Let $\Psi(\cdot)$ be the digamma function, i.e. $\Psi(z) = \Gamma'(z)/\Gamma(z)$. In addition, $\bar{x} = \frac{1}{n}\sum_{j=1}^n x_j$, $\hat{x} = (\prod_{j=1}^n x_j)^{1/n}$, and $y = \ln(\bar{x}/\hat{x})$. The mapping
\[
\phi : \Omega^n \to \Theta, \qquad X \mapsto (\alpha, \gamma)
\tag{2}
\]
is called the MLE-based embedding of $\Omega^n$ onto the Gamma manifold $\Theta$ if $\gamma$ and $\alpha$ are obtained by MLE [20]. In the MLE implementation, $\alpha$ is derived by an iteration process, $\alpha_0 = (1 + \sqrt{1 + 4y/3})/(4y)$ and $\alpha_j = \alpha_{j-1}(\ln(\alpha_{j-1}) - \Psi(\alpha_{j-1}))/y$, $j \geq 1$, and $\gamma = \bar{x}$.

Although most data spaces are non-Euclidean, the embedding principle indicates that the MLE-based embedding is an important approach to endowing a collection of data sets with the dissimilarity measurement derived from their probabilistic generative model space [21]. Let $\mathcal{X} = \{X_1, \ldots, X_N\}$ be a collection of data sets. Each data set $X_l$ in $\mathcal{X}$ is an element of $\Omega^n$ and a sample realization of $G(\alpha_l, \gamma_l)$, where $\alpha_l$ and $\gamma_l$ are unknown. If $\phi_l$, $l = 1, \ldots, N$ are all MLE-based embeddings defined by Eq. (2), the MLE-based product embedding is given by
\[
\prod_{l=1}^N \phi_l : \underbrace{\Omega^n \times \cdots \times \Omega^n}_{N} \to \Theta^N, \qquad \mathcal{X} \mapsto \prod_{l=1}^N \phi_l(X_l),
\tag{3}
\]
where $\prod_{l=1}^N \phi_l(X_l) = (\phi_1(X_1), \ldots, \phi_N(X_N))$ and $\phi_l(X_l) = (\alpha_l, \gamma_l)$, $l = 1, \ldots, N$. Thus $\mathcal{X}$, the collection of data sets, is embedded onto the product Gamma manifold $\Theta^N$. Namely, $\mathcal{X}$ can be viewed as a point $\prod_{l=1}^N (\alpha_l, \gamma_l) = (\alpha_1, \gamma_1, \ldots, \alpha_N, \gamma_N)$ on the product Gamma manifold $\Theta^N$ via the MLE-based product embedding $\prod_{l=1}^N \phi_l$.
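To make the MLE-based embedding of Eq. (2) concrete, the following Python sketch (our illustration, not part of the original paper) fits the $(\alpha, \gamma)$ coordinates of a Gamma model to a positive data set using the closed-form initialization and the fixed-point iteration given above; the function name is ours.

```python
import numpy as np
from scipy.special import digamma

def mle_gamma_embedding(x, n_iter=50, tol=1e-10):
    """MLE-based embedding phi of Eq. (2): map a positive data set X to
    the (alpha, gamma) coordinates of its generating Gamma model, with
    alpha_0 = (1 + sqrt(1 + 4y/3))/(4y) and the fixed-point update
    alpha_j = alpha_{j-1} (ln alpha_{j-1} - Psi(alpha_{j-1})) / y."""
    x = np.asarray(x, dtype=float)
    x_bar = x.mean()                         # arithmetic mean
    y = np.log(x_bar) - np.log(x).mean()     # y = ln(x_bar / x_hat)
    alpha = (1.0 + np.sqrt(1.0 + 4.0 * y / 3.0)) / (4.0 * y)
    for _ in range(n_iter):
        alpha_new = alpha * (np.log(alpha) - digamma(alpha)) / y
        if abs(alpha_new - alpha) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, x_bar                      # gamma = x_bar (the mean)

# Example: recover the parameters of a known Gamma model.
rng = np.random.default_rng(0)
alpha_true, gamma_true = 2.0, 10.0
# numpy's gamma sampler uses (shape, scale); here scale = gamma/alpha.
sample = rng.gamma(alpha_true, gamma_true / alpha_true, size=5000)
print(mle_gamma_embedding(sample))           # approximately (2.0, 10.0)
```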
2.2. Multi-channel Gabor magnitude sets and regional subdivision

A 2D Gabor wavelet located at position $\vec{z} = (x, y)$ can be defined as follows [22]:
\[
\psi_{u,v}(\vec{z}) = \frac{\|k_{u,v}\|^2}{s^2}\, e^{-\|k_{u,v}\|^2 \|\vec{z}\|^2 / (2s^2)} \left[ e^{i\, k_{u,v} \cdot \vec{z}} - e^{-s^2/2} \right],
\tag{4}
\]
where $u\,(= 0, 1, \ldots, U-1)$ and $v\,(= 0, 1, \ldots, V-1)$ define the orientation and scale of the Gabor wavelets, respectively, $k_{u,v} = k_v e^{i\phi_u}$, $k_v = k_{\max}/f^v$, and $\phi_u = \pi u / U$. Here, $k_{\max}$ is the maximum frequency, and $f$ is the spacing factor between wavelets in the frequency domain. The total number of orientations is $U$, and $V$ is the total number of scales. When we take a Gabor wavelet at a specific orientation and scale as a channel, a bank of Gabor wavelets $\{\psi_{u,v}(\cdot) : (u,v) \in \{0, \ldots, U-1\} \times \{0, \ldots, V-1\}\}$ forms a set of parallel and quasi-independent channels.

Let $I$ be a facial image of size $X \times Y$, and let $O_{u,v}(\vec{z})$ be the $(u,v)$-channel convolution output of image $I$ and the Gabor wavelet $\psi_{u,v}(\cdot)$ at pixel location $\vec{z}$. The $(u,v)$-channel magnitude response at pixel location $\vec{z}$ is denoted $M_{u,v}(\vec{z})$; the $(u,v)$-channel Gabor magnitude set of image $I$ is then formulated as
\[
M_{u,v}(I) = \{ M_{u,v}(\vec{z}) : \vec{z} = (x,y) \in \{1, \ldots, X\} \times \{1, \ldots, Y\} \}.
\tag{5}
\]
Consequently, the $U \times V$ channels of Gabor magnitude sets of image $I$ are $M_{u,v}(I)$, $(u,v) \in \{0, \ldots, U-1\} \times \{0, \ldots, V-1\}$.

Inspired by the fact that both holistic and local cues are crucial for face recognition, we divide an image $I$ into $K$ non-overlapping sub-regions. For the entire image of $I$, denoted $I_0$ here, we express the collection of $U \times V$ channels of Gabor magnitude sets as
\[
\mathcal{M}^0_{U,V}(I) = \{ M_{u,v}(I_0) : (u,v) \in \{0, \ldots, U-1\} \times \{0, \ldots, V-1\} \}.
\tag{6}
\]
Similarly, the collection of $U \times V$ channels of Gabor magnitude sets corresponding to the $i$th ($i = 1, \ldots, K$) sub-image $I_i$ of $I$ is denoted
\[
\mathcal{M}^i_{U,V}(I) = \{ M_{u,v}(I_i) : (u,v) \in \{0, \ldots, U-1\} \times \{0, \ldots, V-1\} \}.
\tag{7}
\]
Combining Eqs. (6) and (7), we use the form $\mathcal{M}^i_{U,V}(I)$, $i = 0, 1, \ldots, K$ to express the collections of $U \times V$ channels of Gabor magnitude sets over the $K+1$ regions of image $I$. Further, we write the collection of $U \times V \times (K+1)$ magnitude sets of image $I$, corresponding to $K+1$ regions and $U \times V$ filter channels, as
\[
\mathcal{M}^K_{U,V}(I) = \{ M_{u,v}(I_i) : (u,v) \in \{0, \ldots, U-1\} \times \{0, \ldots, V-1\},\ i = 0, 1, \ldots, K \}.
\tag{8}
\]
For the image set $E$ of a face recognition task, we use $\mathcal{M}^i_{U,V}(E)$ to denote the ensemble of collections of $U \times V$ channels of magnitude sets from the $i$th ($i = 0, 1, \ldots, K$) regions of all face images in $E$. And $\mathcal{M}^K_{U,V}(E)$ is the ensemble of collections of $(K+1)$-region, $U \times V$-channel magnitude sets of all face images in $E$.

2.3. A novel Gabor texture-based face representation

In the field of object recognition, many versions of Gabor texture representations have recently been proposed to deal with the challenge of the high dimensionality of global Gabor features [8,10–12]. In our work, we use the region-channel Gabor magnitude distributions to design a face representation that ensures modeling effectiveness and computational efficiency. It is known that the magnitude distribution of each sub-region is nearly consistent with the magnitude distribution of the entire region. Hence, for each regional face, it is reasonable to assume that each channel Gabor magnitude set is a sampling realization of an unknown Gamma model. In this way, any channel Gabor magnitude set can be characterized as a point on the Gamma 2-manifold $\Theta$ via the MLE-based embedding $\phi$. On this basis, we present two types of MLE-based product embeddings: the regional MLE-based product embedding and the extended MLE-based product embedding.

Let $I_i$ be a regional face of face image $I$ in image set $E$. The regional MLE-based product embedding is given by
\[
\Phi^i = \prod_{l=1}^{UV} \phi_l : \mathcal{M}^i_{U,V}(E) \to \Theta^{UV}, \qquad \mathcal{M}^i_{U,V}(I) \mapsto \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v}), \quad i = 0, 1, \ldots, K.
\tag{9}
\]
And the extended MLE-based product embedding is defined as
\[
\Phi = \prod_{l=1}^{UV(K+1)} \phi_l : \mathcal{M}^K_{U,V}(E) \to \Theta^{UV(K+1)}, \qquad \mathcal{M}^K_{U,V}(I) \mapsto \prod_{i=0}^{K} \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v}).
\tag{10}
\]
We call $\Phi^i(\mathcal{M}^i_{U,V}(E))$ the $i$th regional MLE-embedded product submanifold of $\Theta^{UV}$. And $\Phi(\mathcal{M}^K_{U,V}(E))$ is called the extended MLE-embedded product submanifold of $\Theta^{UV(K+1)}$. Thus a regional face $I_i$ can be represented as a point on $\Phi^i(\mathcal{M}^i_{U,V}(E))$ and a face image $I$ can be represented as a point on $\Phi(\mathcal{M}^K_{U,V}(E))$.
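As an illustration of Eqs. (4)–(8), the sketch below samples a Gabor wavelet on a discrete window and collects the region-channel magnitude sets; the grid convention, the use of FFT convolution, and the helper names are our assumptions rather than details specified by the paper (the default parameter values are those of Section 4.1).

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, v, U=8, f=np.sqrt(2), s=2*np.pi, k_max=np.pi, size=32):
    """Sample the Gabor wavelet of Eq. (4) on a size x size grid."""
    k = (k_max / f**v) * np.exp(1j * np.pi * u / U)  # k_{u,v} = k_v e^{i phi_u}
    half = size // 2
    ys, xs = np.mgrid[-half:half, -half:half]
    k2 = np.abs(k) ** 2
    z2 = xs ** 2 + ys ** 2
    envelope = (k2 / s**2) * np.exp(-k2 * z2 / (2 * s**2))
    carrier = np.exp(1j * (k.real * xs + k.imag * ys)) - np.exp(-s**2 / 2)
    return envelope * carrier

def magnitude_sets(image, U=8, V=5, rows=4, cols=4):
    """Collect the (K+1)-region, U x V-channel Gabor magnitude sets of
    Eq. (8): the entire image I_0 plus K = rows*cols sub-regions. Each
    returned set is one sample realization for the embedding of Eq. (2)."""
    sets = {}
    X, Y = image.shape
    for u in range(U):
        for v in range(V):
            mag = np.abs(fftconvolve(image, gabor_kernel(u, v), mode='same'))
            sets[(0, u, v)] = mag.ravel()            # entire region I_0
            region = 1
            for rr in np.array_split(np.arange(X), rows):
                for cc in np.array_split(np.arange(Y), cols):
                    sets[(region, u, v)] = mag[np.ix_(rr, cc)].ravel()
                    region += 1
    return sets
```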
3. Magnitude-generating model matching on product Gamma manifold

According to the designed face representation, the principle of our recognition algorithm is overall similarity matching between the generative models of the collections of multi-region, multi-channel Gabor magnitude sets on the extended MLE-embedded product Gamma submanifold.
3.1. Product immersion and extrinsic distance on product Gamma manifold

For the Gamma 2-manifold $\Theta$ with coordinate system $(\alpha, \gamma)$, we use the immersion given by
\[
\tau : \Theta \to \mathbb{R}^3, \qquad (\alpha, \gamma) \mapsto (\alpha, \beta, \eta),
\tag{11}
\]
to immerse it into the 3-dimensional Euclidean space $\mathbb{R}^3$, where $\beta = \alpha/\gamma$ and $\eta = \log(\Gamma(\beta)) - \beta \log \alpha$. The space $\mathbb{R}^3$ is called the immersion space of the Gamma manifold $\Theta$, or the ambient space of the immersed submanifold $\tau(\Theta)$. For more details of the immersion, see Appendix A. If $\tau_l$, $l = 1, \ldots, N$ are all immersions defined by Eq. (11), the mapping
\[
\prod_{l=1}^N \tau_l : \Theta^N \to \mathbb{R}^{3N}, \qquad \prod_{l=1}^N (\alpha_l, \gamma_l) \mapsto \prod_{l=1}^N (\alpha_l, \beta_l, \eta_l)
\tag{12}
\]
is a product immersion. Accordingly, any submanifold of the product Gamma manifold $\Theta^N$ can be immersed into $\mathbb{R}^{3N}$ via the product immersion $\prod_{l=1}^N \tau_l$, and $\mathbb{R}^{3N}$ is the immersion space of the product Gamma manifold $\Theta^N$. Similar to the MLE-based product embedding, we develop two product immersions: the regional product immersion and the extended product immersion. The regional product immersion is defined as
\[
\Lambda^i = \prod_{l=1}^{UV} \tau_l : \Phi^i(\mathcal{M}^i_{U,V}(E)) \to \mathbb{R}^{3UV}, \qquad \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v}) \mapsto \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \beta^i_{u,v}, \eta^i_{u,v}), \quad i = 0, 1, \ldots, K.
\tag{13}
\]
And the extended product immersion is defined as
\[
\Lambda = \prod_{l=1}^{UV(K+1)} \tau_l : \Phi(\mathcal{M}^K_{U,V}(E)) \to \mathbb{R}^{3UV(K+1)}, \qquad \prod_{i=0}^{K} \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v}) \mapsto \prod_{i=0}^{K} \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \beta^i_{u,v}, \eta^i_{u,v}).
\tag{14}
\]
We call $\Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E)))$ the $i$th regional immersed product submanifold of $\mathbb{R}^{3UV}$, and $\Lambda(\Phi(\mathcal{M}^K_{U,V}(E)))$ the extended immersed product submanifold of $\mathbb{R}^{3UV(K+1)}$.

In Fig. 1, we visualize the MLE-based embedding and the immersion. From the FERET face database, we select 14 images of two subjects, shown separately in Fig. 1(a) and (b). For the (0,0)-channel Gabor magnitude sets from the entire regions of the chosen face images, the corresponding points on the Gamma manifold are shown in Fig. 1(c), and the corresponding points in the immersion space are shown in Fig. 1(d). As illustrated in Fig. 1, the immersed points representing the sample images of the two subjects form two compact clusters that are well separated from each other. This result confirms that Gamma models can reflect the discrimination of Gabor magnitudes in distribution, and that the proposed immersion is a neighbor structure-preserving mapping for the underlying models. The visualization of the immersion corresponding to (1,2)-channel Gabor magnitude sets from 16 different sub-regions of an image is shown in Fig. 2. As shown in Fig. 2(a), a face image selected from the PIE database is divided into 16 sub-regions. Fig. 2(b) depicts the (1,2)-channel Gabor magnitude sets from the 16 sub-regions of the face image. From Fig. 2(c), we can see that the 16 immersed points corresponding to the 16 (1,2)-channel magnitude distributions scatter in the immersion space of the Gamma manifold, which shows that the regional magnitude distributions differ from each other in practice.

We define an extrinsic distance on the Gamma manifold $\Theta$ as the Euclidean distance on the immersion space $\mathbb{R}^3$. Similarly, the Euclidean distance on the immersion space $\mathbb{R}^{3N}$ is called an extrinsic distance on the product Gamma manifold $\Theta^N$. Hence, the extrinsic distance between two different points $p$ ($\prod_{j=1}^N (\alpha_j, \gamma_j)$) and $q$ ($\prod_{j=1}^N (\tilde{\alpha}_j, \tilde{\gamma}_j)$) on the product Gamma manifold $\Theta^N$ is computed by
\[
\rho(p, q) = \left\{ \sum_{j=1}^N \left[ (\alpha_j - \tilde{\alpha}_j)^2 + (\beta_j - \tilde{\beta}_j)^2 + (\eta_j - \tilde{\eta}_j)^2 \right] \right\}^{1/2},
\tag{15}
\]
where $\beta_j = \alpha_j/\gamma_j$, $\eta_j = \log \Gamma(\beta_j) - \beta_j \log \alpha_j$, and $\tilde{\beta}_j = \tilde{\alpha}_j/\tilde{\gamma}_j$, $\tilde{\eta}_j = \log \Gamma(\tilde{\beta}_j) - \tilde{\beta}_j \log \tilde{\alpha}_j$. Clearly, the calculation of the extrinsic distance $\rho$ is much simpler than the calculation of geodesic distance on product Gamma manifolds.
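The immersion of Eq. (11) and the extrinsic distance of Eq. (15) are straightforward to compute; the following sketch (our helper names, assuming NumPy and SciPy) illustrates them, using the log-Gamma function to evaluate $\eta$ stably.

```python
import numpy as np
from scipy.special import gammaln

def immerse(alpha, gamma):
    """Immersion tau of Eq. (11): (alpha, gamma) -> (alpha, beta, eta)
    with beta = alpha/gamma and eta = log Gamma(beta) - beta log alpha."""
    alpha = np.asarray(alpha, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    beta = alpha / gamma
    eta = gammaln(beta) - beta * np.log(alpha)   # gammaln = log Gamma
    return np.stack([alpha, beta, eta], axis=-1)

def extrinsic_distance(p, q):
    """Extrinsic distance rho of Eq. (15) between two points of Theta^N,
    each given as arrays of N alphas and N gammas: the Euclidean distance
    between their images in the immersion space R^{3N}."""
    return np.linalg.norm(immerse(*p) - immerse(*q))

# Example: two points on Theta^2 (N = 2 factor manifolds).
p = (np.array([0.8, 1.1]), np.array([9.5, 10.0]))    # (alphas, gammas)
q = (np.array([0.9, 1.0]), np.array([8.0, 11.5]))
print(extrinsic_distance(p, q))
```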
Fig. 1. Visualization of the MLE-based embedding and the immersion for a channel Gabor magnitude set from an entire region. (a) Seven images of Subject 1. (b) Seven images of Subject 2. (c) Embedded points of (0,0)-channel magnitude distributions of entire regions of 14 face images. (d) Immersed points corresponding to (0,0)-channel magnitude sets of entire regions of 14 face images.
Fig. 2. Visualization of the immersion corresponding to a channel Gabor magnitude distribution from a sub-region. (a) The partitioned face image. (b) The (1,2)-channel Gabor magnitude features from 16 sub-regions. (c) Immersed points corresponding to (1,2)-channel Gabor magnitude distributions from 16 sub-regions.
3.2. Sample extrinsic mean and variation on product Gamma manifold

Assume that we have a distribution $Q$ on the product Gamma manifold $\Theta^N$ with the extrinsic distance measure $\rho$. Let points $p_1, \ldots, p_n$ on $\Theta^N$ be independent identically distributed observations from $Q$, and let $\hat{Q}_n$ be the corresponding empirical distribution estimated from these observations. We define the Fréchet function [23] of $\hat{Q}_n$, $F : \Theta^N \to \mathbb{R}$, as follows:
\[
F(q) = \sum_{i=1}^n [\rho(q, p_i)]^2\, \hat{Q}_n(p_i), \qquad q \in \Theta^N.
\tag{16}
\]
The extrinsic mean set of $\hat{Q}_n$ on $\Theta^N$ is defined as the set of all $q$ for which $F(q)$ attains the minimum value of $F$. If there is only one point in the set, that point is called the sample extrinsic mean. The infimum of $F$ on $\Theta^N$ is called the sample extrinsic variation.

Suppose that $q$ has coordinates $\prod_{j=1}^N (\alpha_j, \gamma_j)$ and that $p_i$, $i = 1, \ldots, n$ have coordinates $\prod_{j=1}^N (\tilde{\alpha}^i_j, \tilde{\gamma}^i_j)$. Let $\beta_j = \alpha_j/\gamma_j$, $\eta_j = \log \Gamma(\beta_j) - \beta_j \log \alpha_j$, and $\tilde{\beta}^i_j = \tilde{\alpha}^i_j/\tilde{\gamma}^i_j$, $\tilde{\eta}^i_j = \log \Gamma(\tilde{\beta}^i_j) - \tilde{\beta}^i_j \log \tilde{\alpha}^i_j$, $i = 1, \ldots, n$, $j = 1, \ldots, N$. Then $F(q)$ can be calculated by
\[
F(q) = \sum_{i=1}^n \hat{Q}_n(p_i) \left[ \sum_{j=1}^N (\alpha_j - \tilde{\alpha}^i_j)^2 + (\beta_j - \tilde{\beta}^i_j)^2 + (\eta_j - \tilde{\eta}^i_j)^2 \right].
\tag{17}
\]
In order to obtain all $q$ for which $F(q)$ is minimal, we turn to the following optimization problems:
\[
\min_{\alpha_j} \sum_{i=1}^n \hat{Q}_n(p_i)(\alpha_j - \tilde{\alpha}^i_j)^2, \quad
\min_{\beta_j} \sum_{i=1}^n \hat{Q}_n(p_i)(\beta_j - \tilde{\beta}^i_j)^2, \quad
\min_{\eta_j} \sum_{i=1}^n \hat{Q}_n(p_i)(\eta_j - \tilde{\eta}^i_j)^2, \qquad j = 1, \ldots, N.
\tag{18}
\]
Given $j\,(= 1, \ldots, N)$, we obtain
\[
\sum_{i=1}^n \hat{Q}_n(p_i)(\alpha_j - \tilde{\alpha}^i_j)^2 = \left[ \sum_{i=1}^n \hat{Q}_n(p_i) \right] \alpha_j^2 - 2 \left[ \sum_{i=1}^n \hat{Q}_n(p_i)\tilde{\alpha}^i_j \right] \alpha_j + \sum_{i=1}^n \hat{Q}_n(p_i)(\tilde{\alpha}^i_j)^2.
\tag{19}
\]
Recognizing that $\sum_{i=1}^n \hat{Q}_n(p_i) = 1$, we obtain $\alpha_j\,(= \sum_{i=1}^n \hat{Q}_n(p_i)\tilde{\alpha}^i_j)$ as the minimizer of $\sum_{i=1}^n \hat{Q}_n(p_i)(\alpha_j - \tilde{\alpha}^i_j)^2$. Further, if $\hat{Q}_n$ is uniform, the Fréchet mean $\alpha_j\,(= \frac{1}{n}\sum_{i=1}^n \tilde{\alpha}^i_j)$ is the same as the arithmetic mean. Based on this fact, we can conclude that the sample extrinsic mean and variation on a product Gamma manifold are the same as the ordinary Euclidean mean and variation on its immersion space when the distribution $Q$ is uniform.
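Under this identification, the sample extrinsic mean and variation reduce to the ordinary mean and mean squared deviation of the immersed points. A minimal sketch, reusing the immerse helper from the snippet in Section 3.1 and assuming a uniform empirical distribution:

```python
import numpy as np
# immerse() is the map of Eq. (11), as sketched in Section 3.1.

def sample_extrinsic_mean_and_variation(alphas, gammas):
    """For uniform Q_n, the minimizer of the Fréchet function (16) is the
    arithmetic mean of the immersed points in R^{3N}, and the minimum of
    F is the sample extrinsic variation. alphas, gammas: arrays of shape
    (n, N), giving n observed points on Theta^N."""
    pts = immerse(np.asarray(alphas, float), np.asarray(gammas, float))
    center = pts.mean(axis=0)                            # shape (N, 3)
    variation = np.mean(np.sum((pts - center) ** 2, axis=(1, 2)))
    return center, variation
```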
There are always the same number of face images of each subject in a face database $E$. For a face recognition task, in general, we select the same number of training samples for each subject to build a training set $B\,(\subset E)$. Hence, under the product embedding $\Phi^i$, the images of the elements of $\mathcal{M}^i_{U,V}(B)$ are uniformly distributed on $\Phi^i(\mathcal{M}^i_{U,V}(E))$. Thus, the sample extrinsic means on $\Phi^i(\mathcal{M}^i_{U,V}(E))$ expressing class centers are just arithmetic means on $\Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E)))$.

3.3. Model matching on extended embedded product Gamma submanifold

Now, a core concern is how to endow each regional MLE-embedded product submanifold with a label information-containing distance metric that can well reflect the neighbor structures of the embedded data points. Meanwhile, a label information-containing weight should be quantized to factorize the discriminative level of the magnitude distributions of a regional image. The introduction of extrinsic tools on product Gamma manifolds makes it possible for our recognition algorithm to capture the geometric structures of the magnitude-generating models and to utilize the label information of the training samples simultaneously.

In recent decades, representative feature extraction-based face recognition approaches [18,24–30] have been proposed. Among them, linear discriminant analysis (LDA)-based methods are popular due to their use of class label information and simple computation. Compared with other existing versions of LDA, the DLDA algorithm is more stable, and it takes full advantage of all the discriminative information in both the principal and null subspaces of the within-class scatter matrix.

For face database $E$, $\Phi^i(\mathcal{M}^i_{U,V}(E))$ and $\Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E)))$ are the $i$th ($i = 0, 1, \ldots, K$) regional MLE-embedded product submanifold and the corresponding regional immersed product submanifold, respectively. Given a training set $B\,(\subset E)$, we conduct DLDA on $\Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E)))$ to obtain the optimal projection matrix $\xi^i = [\xi^i_p, \xi^i_c]$, where $\xi^i_p$ is a $(3UV) \times T^i_p$ matrix, $\xi^i_c$ is a $(3UV) \times T^i_c$ matrix, and $T^i = T^i_p + T^i_c\,(\leq 3UV)$ is the dimensionality of the projection space. Then the regional DLDA-embedding $W^i = (W^i_p, W^i_c)$ can be written as
\[
W^i_p : \Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E))) \to \mathbb{R}^{T^i_p}, \qquad \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \beta^i_{u,v}, \eta^i_{u,v}) \mapsto (x^i_1, \ldots, x^i_{T^i_p}),
\]
\[
W^i_c : \Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E))) \to \mathbb{R}^{T^i_c}, \qquad \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \beta^i_{u,v}, \eta^i_{u,v}) \mapsto (y^i_1, \ldots, y^i_{T^i_c}), \qquad i = 0, 1, \ldots, K,
\tag{20}
\]
where $(x^i_1, \ldots, x^i_{T^i_p}) = \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \beta^i_{u,v}, \eta^i_{u,v})\, \xi^i_p$ and $(y^i_1, \ldots, y^i_{T^i_c}) = \prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \beta^i_{u,v}, \eta^i_{u,v})\, \xi^i_c$. Under the regional DLDA-embedding $W^i$, $W^i(\Lambda^i(\Phi^i(\mathcal{M}^i_{U,V}(E))))$ is the $i$th regional DLDA-embedded Euclidean submanifold of $\mathbb{R}^{T^i}$.

Let $L$ be the number of face classes in face database $E$. From the $i$th ($i = 0, 1, \ldots, K$) regional faces of a training sample set of $E$, we can obtain the learned regional DLDA-embedding $W^i$ and the sample extrinsic means $p^i_l$, $l = 1, \ldots, L$, on $\Phi^i(\mathcal{M}^i_{U,V}(E))$, which express the class centers of the corresponding regional faces. Given a test image $I$, for its regional face $I_i$, $h^i$ ($\prod_{u=1}^{U} \prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v})$) is the corresponding representation on $\Phi^i(\mathcal{M}^i_{U,V}(E))$. For a training sample image $\tilde{I}$ in face database $E$, $q^i$ ($\prod_{u=1}^{U} \prod_{v=1}^{V} (\tilde{\alpha}^i_{u,v}, \tilde{\gamma}^i_{u,v})$) is the representation of its regional face $\tilde{I}_i$. Hence, the regional matching metric on $\Phi^i(\mathcal{M}^i_{U,V}(E))$ between $I_i$ and $\tilde{I}_i$ is given by
\[
\rho^i(h^i, q^i) = \left\| W^i_p \Lambda^i\!\Big(\prod_{u=1}^{U}\prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v})\Big) - W^i_p \Lambda^i\!\Big(\prod_{u=1}^{U}\prod_{v=1}^{V} (\tilde{\alpha}^i_{u,v}, \tilde{\gamma}^i_{u,v})\Big) \right\| + \frac{1}{\epsilon^i} \left\| W^i_c \Lambda^i\!\Big(\prod_{u=1}^{U}\prod_{v=1}^{V} (\alpha^i_{u,v}, \gamma^i_{u,v})\Big) - W^i_c \Lambda^i\!\Big(\prod_{u=1}^{U}\prod_{v=1}^{V} (\tilde{\alpha}^i_{u,v}, \tilde{\gamma}^i_{u,v})\Big) \right\| = \sqrt{\sum_{j=1}^{T^i_p} (x^i_j - \tilde{x}^i_j)^2} + \frac{1}{\epsilon^i} \sqrt{\sum_{j=1}^{T^i_c} (y^i_j - \tilde{y}^i_j)^2},
\tag{21}
\]
where $1/\epsilon^i$ is the average eigenvalue in the complementary subspace of the principal subspace of the within-class scatter matrix, and $\|\cdot\|$ denotes the Euclidean norm. Moreover, if the sample extrinsic mean $p^i_l$ has coordinates $\prod_{u=1}^{U} \prod_{v=1}^{V} (\tilde{\alpha}^{i,l}_{u,v}, \tilde{\beta}^{i,l}_{u,v}, \tilde{\eta}^{i,l}_{u,v})$, the regional matching metric on $\Phi^i(\mathcal{M}^i_{U,V}(E))$ between $I_i$ and the $l$th class center of the corresponding regional faces can be computed in the same way:
\[
\rho^i(h^i, p^i_l) = \sqrt{\sum_{j=1}^{T^i_p} (x^i_j - \tilde{x}^{i,l}_j)^2} + \frac{1}{\epsilon^i} \sqrt{\sum_{j=1}^{T^i_c} (y^i_j - \tilde{y}^{i,l}_j)^2},
\tag{22}
\]
where $(\tilde{x}^{i,l}_1, \ldots, \tilde{x}^{i,l}_{T^i_p})$ and $(\tilde{y}^{i,l}_1, \ldots, \tilde{y}^{i,l}_{T^i_c})$ are the $W^i_p$- and $W^i_c$-projections of the immersed class center.

Once we have learned the matching distance $\rho^i$ on the regional MLE-embedded product submanifold $\Phi^i(\mathcal{M}^i_{U,V}(E))$, the within-class mean and variance $(m^i_w, s^i_w)$ and the between-class mean and variance $(m^i_b, s^i_b)$ can be computed; their calculation is detailed in Appendix B. According to the Fisher separation criterion [19], region weights are automatically calculated by
\[
\omega^i = \frac{(m^i_w - m^i_b)^2}{(s^i_w)^2 + (s^i_b)^2}, \qquad i = 0, 1, \ldots, K.
\tag{23}
\]
Let $f^i = 1/\omega^i$ ($i = 0, 1, \ldots, K$), where $\omega^i\,(> 0)$ is computed by Eq. (23). Then we can build the matching metric
\[
\mathrm{Dist} = \sum_{i=0}^{K} f^i \rho^i
\tag{24}
\]
on the extended MLE-embedded product submanifold $\Phi(\mathcal{M}^K_{U,V}(E))$, where $\rho^i$ is the matching distance on the $i$th regional MLE-embedded product submanifold $\Phi^i(\mathcal{M}^i_{U,V}(E))$. For a test image, we match its representation to those of all training samples and class centers on the extended MLE-embedded product submanifold using the metric $\mathrm{Dist}$, and the image takes the class label that minimizes it.
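The following sketch assembles Eqs. (23) and (24) into a matching step; it assumes the regional DLDA distances $\rho^i$ of Eqs. (21) and (22) have already been computed, and the helper names are ours rather than the paper's.

```python
import numpy as np

def fisher_region_weights(m_w, s_w, m_b, s_b):
    """Region weights omega^i of Eq. (23) and factors f^i = 1/omega^i
    for Eq. (24). Inputs are length-(K+1) arrays of the within-/between-
    class means and variances of the similarity scores (Appendix B)."""
    omega = (m_w - m_b) ** 2 / (s_w ** 2 + s_b ** 2)
    return 1.0 / omega

def weighted_dist(rho, f):
    """Weighted direct-sum metric Dist of Eq. (24); rho[i] is the
    regional matching distance rho^i between the test image and one
    gallery item, for i = 0, ..., K."""
    return np.dot(f, rho)

def classify(test_rhos, labels, f):
    """Nearest-model matching: test_rhos[g] holds the K+1 regional
    distances between the test image and gallery item g (training
    samples and class centers); the test image takes the label of the
    item minimizing Dist."""
    dists = np.array([weighted_dist(r, f) for r in test_rhos])
    return labels[int(np.argmin(dists))]
```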
4. Experimental results

In order to demonstrate the performance of the proposed method for face recognition, we carry out verification experiments on two face databases, FERET (available at http://www.itl.nist.gov/iad/humanid/feret/) and CMU-PIE (available at http://www.ri.cmu.edu/projects/project_418.html). On the FERET face database, we execute a series of experiments concerning the algorithm settings, including Chi-square goodness-of-fit tests, selection of the partition and immersion schemes, and the regional weight distribution. With the empirically optimal partition and immersion schemes, we conduct comparison experiments on both databases to evaluate the recognition performance of our method.

4.1. Face databases and experimental setup

The FERET database was sponsored by the U.S. Department of Defense and is one of the standard databases used in testing and evaluating face recognition algorithms. Here, we choose a subset of the FERET database. This subset includes 1400 images of 200 subjects (seven images per subject). The seven images of each individual consist of three frontal images with varying illumination, facial expression and acquisition time (marked 'ba', 'bj', 'bk'), and four profile images with ±15° and ±25° poses (marked 'bd', 'be', 'bf', 'bg') [27]. The CMU-PIE database contains 68 subjects with 41,368 face images under 4 different expressions, 13 different poses and 43 different illumination conditions. In our comparison experiments, we select a subset (C05) consisting of 3332 images of 68 subjects. In our experiments, the facial portion of each original image is cropped to a size of 64 × 64 based on the locations of the eyes. Samples of one person in the FERET database are shown in Fig. 3. Face images of some people in the PIE database are shown in Fig. 4. For the Gabor wavelets used by the related methods in the following experiments, we select a bank of wavelets with the parameters $u \in \{0, 1, \ldots, 7\}$ ($U = 8$), $v \in \{0, 1, \ldots, 4\}$ ($V = 5$), $s = 2\pi$, $f = \sqrt{2}$, and $k_{\max} = \pi$. In addition, the size of the wavelet window is set to 32 × 32 pixels for a 64 × 64 face image.

4.2. Goodness-of-fit test
In this section, we use Chi-square goodness-of-fit tests to verify that single-channel Gabor magnitudes of entire images and sub-images are all consistent with Gamma distributions. Twelve face images are randomly selected from the FERET database, and eight sub-images are taken from the first selected face image. For each chosen image, we select one Gabor magnitude set, letting the 20 Gabor magnitude sets come from different filter channels. At the 0.05 level of significance, Chi-square goodness-of-fit tests are adopted to assess how well the estimated Gamma distributions fit the actual magnitude distributions.
Fig. 3. Images of one subject in FERET database.
Fig. 4. Images of some people from a subset C05 of PIE database.
Table 1. Chi-square goodness-of-fit tests for the underlying Gamma models.

Face region      (u,v)-Channel   Triple                  χ²_STAT    p-Value
Entire image     (0,0)           [0.25, 0.0035, 0.95]    217.30     0.2389
Entire image     (0,3)           [0.25, 0.005, 0.95]     163.61     0.1115
Entire image     (1,1)           [0.25, 0.004, 0.95]     182.910    0.3974
Entire image     (1,2)           [0.25, 0.006, 0.95]     115.390    0.5924
Entire image     (2,1)           [0.25, 0.005, 0.95]     153.950    0.2586
Entire image     (2,2)           [0.25, 0.008, 0.95]     94.138     0.3789
Entire image     (2,4)           [0.25, 0.003, 0.95]     256.190    0.1763
Entire image     (3,4)           [0.25, 0.003, 0.95]     252.280    0.2268
Entire image     (4,0)           [0.25, 0.02, 0.95]      43.943     0.2477
Entire image     (5,1)           [0.25, 0.007, 0.95]     119.040    0.1319
Entire image     (5,3)           [0.25, 0.004, 0.95]     194.890    0.1854
Entire image     (7,4)           [0.25, 0.003, 0.95]     222.840    0.7276
Sub-image        (0,2)           [0.25, 0.06, 0.8]       8.615      0.7552
Sub-image        (1,0)           [0.25, 0.04, 0.8]       15.276     0.5509
Sub-image        (2,0)           [0.25, 0.081, 0.8]      14.727     0.0885
Sub-image        (3,2)           [0.25, 0.03, 0.8]       24.394     0.3002
Sub-image        (4,3)           [0.25, 0.04, 0.8]       21.991     0.1448
Sub-image        (5,4)           [0.25, 0.03, 0.8]       28.788     0.1147
Sub-image        (6,4)           [0.25, 0.04, 0.8]       20.482     0.2141
Sub-image        (7,0)           [0.25, 0.04, 0.8]       15.499     0.5353
For a Gabor magnitude set, we use an ordered triple to determine the boundaries of the class groupings used to bin the data. In the triple, the first element is the first cumulative probability value, the last element is the last cumulative probability value, and the second element is the step between any two adjacent cumulative probability values from the first to the last. Thus, we obtain the class boundaries from the quantiles determined by these cumulative probabilities. For example, the triple [0.25, 0.05, 0.95] corresponds to class boundaries given by the 0.25-quantile, 0.3-quantile, 0.35-quantile, …, 0.9-quantile and 0.95-quantile. Table 1 lists the 20 face images, the corresponding channels and class boundary-determining triples, the Chi-square test statistic values and the p-values. From Table 1, there is no evidence, at the 5% significance level, to suggest that a Gamma distribution is not appropriate.
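A possible implementation of this quantile-binned test is sketched below (our function name, assuming SciPy); the bins are formed from the quantiles of the fitted Gamma model encoded by the triple, and two degrees of freedom are subtracted for the two estimated parameters.

```python
import numpy as np
from scipy import stats

def gamma_gof_test(x, alpha, gamma, triple=(0.25, 0.05, 0.95)):
    """Chi-square goodness-of-fit test of Section 4.2: bin the magnitude
    set x by quantiles of the fitted model G(alpha, gamma) determined by
    the triple (first prob, step, last prob), then compare observed and
    expected bin counts."""
    first, step, last = triple
    probs = np.arange(first, last + step / 2, step)
    # scipy's gamma uses (shape a, scale); here scale = gamma/alpha.
    dist = stats.gamma(a=alpha, scale=gamma / alpha)
    edges = np.concatenate(([0.0], dist.ppf(probs), [np.inf]))
    observed, _ = np.histogram(x, bins=edges)
    expected = len(x) * np.diff(np.concatenate(([0.0], probs, [1.0])))
    chi2, _ = stats.chisquare(observed, expected)
    dof = len(observed) - 1 - 2    # two Gamma parameters were estimated
    return chi2, stats.chi2.sf(chi2, dof)
```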
4.3. Partition scheme selection

In order to capture image spatial information, we divide an image into non-overlapping rectangular regions. In this section, eight partition schemes are considered: 2×2=4, 2×4=8, 2×6=12, 2×8=16, 4×4=16, 4×6=24, 4×8=32 and 8×8=64. We compare the recognition performance of the eight partition schemes under four training conditions to select the best subdivision. In each training case, we select two examples of each subject from the FERET face database to construct a training set, and the remaining images form the test set. The four training sets are built from 'ba' and 'bd', 'ba' and 'be', 'ba' and 'bj', and 'ba' and 'bk', respectively.

Fig. 5 shows how the recognition performance of our method varies with the partition scheme and the dimensionality of the regional DLDA-embedded submanifolds in the different training cases. For the DLDA step, the eleven dimension settings of the projection principal and null subspaces are (10,10), (10,20), (20,20), (20,30), (20,40), (30,40), (30,50), (30,60), (30,70), (40,70) and (40,80). The dimensionality of a regional DLDA-embedded submanifold is the sum of the dimensions of the principal subspace and the null subspace, and all regional DLDA-embedded submanifolds share the same dimensionality. As shown in Fig. 5, among the eight partition schemes, the 4×4=16 subdivision with dimensionality 110 consistently obtains the best result. The 8×8=64 and 2×2=4 subdivisions show poor recognition performance, which implies that, for distribution-based descriptors, a partition that yields too large or too small sub-regions degrades the discrimination of the spatial information in the magnitude images. Hence, for the following exploratory experiments, the principal subspace dimension and the null subspace dimension are set to 40 and 70, respectively. Moreover, we divide each image into 4 rows and 4 columns, obtaining 17 regional faces comprising the entire region and 16 sub-regions. In this way, for a face image $I$ in image set $E$, we obtain the collection of 17-region, 40-channel Gabor magnitude sets, $\mathcal{M}^{16}_{8,5}(I)$. A regional face $I_i$ corresponds to a point on the corresponding 80-dimensional regional MLE-embedded product submanifold $\Phi^i(\mathcal{M}^{16}_{8,5}(E))$, and the face image $I$ can be expressed as a point on the 1360-dimensional extended MLE-embedded product submanifold $\Phi(\mathcal{M}^{16}_{8,5}(E))$.
4.4. Immersion scheme selection

To obtain well-preserved geometric structures of the Gamma manifold in a higher-dimensional manifold, a good immersion is needed; with such an immersion, the sample extrinsic mean on product Gamma manifolds will be very close to the Fréchet mean derived from the geodesic metric. An affine immersion is presented in [31], which we denote 'immersion-D'. In this section, we empirically compare the proposed immersion with 'immersion-D' in terms of recognition accuracy (%). We take two samples per subject from the FERET database to form a training set, and the remaining subset is used as the test set. The six training sets consist of 'ba' and 'bd', 'ba' and 'be', 'ba' and 'bf', 'ba' and 'bg', 'ba' and 'bj', and 'ba' and 'bk', respectively. We adopt the partition scheme and the regional DLDA-embedded submanifold dimensionality mentioned above. Fig. 6 depicts the recognition performance comparison between the proposed immersion and 'immersion-D'. We can see that the proposed immersion always achieves better performance. In particular, for the sixth case, i.e. the training set consisting of 'ba' and 'bk' of each subject, the proposed immersion yields a recognition rate more than 15% higher than 'immersion-D'.
Fig. 5. Performance of the proposed method with variation in the partition schemes and dimensions of regional DLDA-embedded submanifolds. (a) The case of the training set consisting of 'ba' and 'bd' of each subject. (b) The case of the training set comprising 'ba' and 'be' of each subject. (c) The case of the training set containing 'ba' and 'bf' per subject. (d) The case of the training set including 'ba' and 'bj' per subject.
Fig. 6. Comparison of the proposed immersion with the affine immersion ‘immersion-D’.
4.5. Region weight distribution

Adopting the best partition scheme of 4×4=16 and the proposed immersion, we further examine the variation of the regional weights under different training conditions. In each of the four sub-figures of Fig. 7, the first two images are the original face images of the training samples of one individual, and the last two describe the weight distributions of the entire region (the third image) and of all sub-regions (the fourth image). Note that a brighter intensity implies a bigger weight, while a darker intensity indicates a smaller one. When the training sample images are acquired at different times, the biggest weight corresponds to the entire region, as shown in sub-figure (a). When two expression-varying images are selected as training samples, the entire-region weight decreases while the weights of the eye and eyebrow regions increase, as shown in sub-figure (b). Sub-figures (c) and (d) show that the eye, nose and mouth regions contribute the larger discriminative components to face recognition when two images with different poses are chosen as training samples. These results are in accordance with intuition.
4.6. Complexity analysis

Regarding the complexity of the proposed algorithm, we provide a theoretical and empirical analysis of the computational cost. Assume that M training samples are selected from a database containing L subjects and that each image is divided into K (=16) non-overlapping sub-regions. We obtain U×V-channel (U=8, V=5) Gabor magnitude sets from each of the K+1 regions. Consequently, for an X×Y image, the computational complexity of extracting the Gabor features is O(X²Y²UV), and that of finding a maximum likelihood embedding is O(XYUV). The complexity of learning all regional optimal projections is O(U²V²MK), and weighting the regions is an O(M²LK) calculation. Moreover, we randomly select two examples of each subject from the FERET database to build a training set, and the remainder forms a test set.
Fig. 7. Variations of learned region weights over different training sets. (a) Weight distribution in the case of time-varying training samples. (b) Weight distribution in the case of expression-varying training samples. (c) Weight distribution in the case of pose-varying training samples (from +15° to +25°). (d) Weight distribution in the case of pose-varying training samples (from −25° to −15°).
Table 2. The average time (in seconds) of each stage of our method on the FERET database.

Stage                                                      Time (s)
Representation on Gamma manifold for an image:
  40-channel magnitude feature extraction                  2.5938
  Extended MLE-based product embedding                     0.1250
Immersions and projection matrix learning                  1.0625
Weight learning                                            2.9688
Matching                                                   6.7188

We perform the recognition task on a laptop with a P6800 CPU and 1.86 GB RAM under the MATLAB (Version 7.01) programming environment. Table 2 shows the average run time (in seconds) of each stage of our method after 10-fold cross validation. These results indicate that the overall complexity of our approach is dominated by the Gabor feature extraction stage.

4.7. Comparisons with other Gabor-based methods

In this section, we compare our method with some popular Gabor-based methods on the FERET and CMU-PIE databases. In [8], the authors applied the Enhanced Fisher linear discriminant Model (EFM) to an augmented Gabor feature vector for face recognition; we denote this technique 'Gabor+EFM'. In [9], the method 'Gabor+KDA' was proposed, which uses kernel discriminant analysis to reduce the dimensionality of the globally concatenated Gabor feature vector. Both Gabor+EFM and Gabor+KDA use a down-sampling strategy to reduce the dimensionality of the augmented Gabor feature vectors, with the down-sampling factor set to 16. For Gabor+KDA, a Gaussian kernel is adopted and the kernel width is set to $2\sqrt{2}e^2$, as proposed by the method's designer. In [10], the multi-channel Gabor face representation (MGFR) method based on 2D Gaborface matrices was developed. The method combines MGFR with (2D)²PCA and uses the Frobenius distance as the similarity measure for face recognition; this algorithm is marked 'MGFR+(2D)²PCA'. In [11], the authors divided a face image into 8×8=64 sub-regions and represented a face image as a vector consisting of the shape and scale parameters of the distributions of Gabor magnitude and phase over 40 channels and 65 regions. They then designed a face recognition approach by applying NLDA to the Gabor magnitude and phase texture representation (GMPTR), denoted 'GMPTR+NLDA'. In [12], the authors adopted the tools of information geometry (IG), using geodesic distance in a non-parametric information manifold to quantify the divergence of multi-channel Gabor magnitude histogram representations (MGMHR); we denote this technique 'MGMHR+IG'. We run this algorithm using the root-mean-squared geodesic distance as the information metric and a bin number of 30 for each channel.
Table 3. The average recognition rate (mean ± std)% of different methods on the FERET database.

Method            Sample size 2              Sample size 3
Gabor+EFM         45.59 ± 7.74 (199)         48.26 ± 7.86 (199)
Gabor+KDA         50.15 ± 11.53 (199)        64.61 ± 15.55 (199)
MGFR+(2D)²PCA     52.14 ± 7.52 (6×12)        59.82 ± 9.55 (6×12)
GMPTR+NLDA        53.31 ± 8.18 (199)         66.24 ± 9.55 (199)
MGMHR+IG          55.78 ± 6.20               59.4 ± 9.37
Proposed          71.79 ± 7.11 (80)          82.96 ± 9.29 (80)
Table 4. The average recognition rate (mean ± std)% of different methods on the PIE database.

Method            Sample size 2              Sample size 3
Gabor+EFM         61.76 ± 11.02 (67)         77.02 ± 14.65 (67)
Gabor+KDA         70.01 ± 7.80 (67)          84.21 ± 10.21 (67)
MGFR+(2D)²PCA     59.46 ± 6.54 (6×12)        66.42 ± 6.51 (6×12)
GMPTR+NLDA        72.78 ± 7.12 (67)          83.61 ± 5.31 (67)
MGMHR+IG          48.66 ± 9.82               55.04 ± 10.40
Proposed          77.06 ± 6.25 (100)         88.35 ± 5.65 (100)
In the comparison experiments, for each recognition algorithm we adopt the nearest neighbor classifier and randomly select k (=2, 3) examples of each subject from a database for training; the remaining images of the database are used for testing. The average recognition rates and the corresponding standard deviations are obtained by 10-fold cross validation. The performance evaluation results on the FERET database and the PIE database are shown in Tables 3 and 4, respectively. From these tables, we can draw the following conclusions:
• Among these methods, both MGFR+(2D)²PCA and MGMHR+IG use no class label information. Consequently, compared with
the other methods except Gabor+EFM, both obtain poor recognition performance as more training samples per subject are provided. As can be seen from Table 3, when 2 training samples of each subject are used, MGMHR+IG achieves 2% higher accuracy than GMPTR+NLDA and 5% higher than Gabor+KDA. However, when 3 training samples of each subject are used, both methods yield average recognition rates 4% lower than Gabor+KDA and 6% lower than GMPTR+NLDA. As shown in Table 4, these two methods are inferior to the methods using discriminant analysis in both training cases.
• It should be noted that Gabor+EFM, Gabor+KDA, MGFR+(2D)²PCA and MGMHR+IG are holistic methods. By taking into account the spatial relations of sub-regions, our method shows better performance than these holistic methods. On the FERET database, among the holistic methods, MGMHR+IG is the best performer when 2 examples per subject are used for training, while Gabor+KDA gains the best performance when 3 training examples per subject are available. On average, the accuracy obtained by our approach is 15% higher than MGMHR+IG in the 2-training case, and 18% higher than Gabor+KDA in the 3-training case. According to the experimental results obtained on the PIE database, Gabor+KDA performs consistently best among the holistic methods in both training cases. When two training samples per person are available, our approach outperforms Gabor+KDA by 7%; when three training samples per person are used, Gabor+KDA (84.21%) falls below our method (88.35%).
• Benefiting from integrating the regional magnitude distribution information on the product Gamma manifold in a weighted manner, our method consistently outperforms GMPTR+NLDA in all trials. In particular, on the FERET database, our method achieves a recognition accuracy of 82.96% while GMPTR+NLDA only attains 66.24% when 3 training samples per subject are used. The relatively worse performance of GMPTR+NLDA mainly lies in learning in 'dirty laundry' [2]; that is, GMPTR+NLDA treats the concatenated regional model parameters merely as a real-valued vector for each image and entirely ignores the geometries of the considered Gamma family and generalized Gaussian family.
• Although MGMHR+IG emphasizes a model matching distance metric based on the information geometry of a non-parametric PDF family, its performance is greatly weakened in practice by the neglect of class label information and the information loss in the models' discretization. As can be observed from Tables 3 and 4, the performance of MGMHR+IG is far inferior to that of our method. For example, in the 3-training-sample case, our method gains an average 23% increase in recognition accuracy on the FERET database and a 33% increase on the PIE database.
5. Conclusion

With the generative modeling of region-channel Gabor magnitude sets and the MLE-based product embeddings, a novel face recognition approach has been developed. The essence of our approach is product model matching on the extended MLE-embedded product Gamma submanifold. Considering algorithmic practicability, we build an immersion of a product Gamma manifold to obtain an extrinsic distance. Its derivatives, such as the sample extrinsic mean and variation, are derived for carrying out DLDA to incorporate the label information of training samples into the regional MLE-embedded product submanifolds. With the Euclidean metrics on the DLDA-embedded submanifolds, we further construct the region-adaptive distance metrics and weights, thus deriving the weighted direct sum metric for product model matching. The Chi-square goodness-of-fit tests validate that it is reasonable to use Gamma distributions to model Gabor magnitude distributions in different channels and regions. The experimental results on the immersion scheme illustrate that the proposed immersion is effective in preserving the geometric structures of the magnitude-generating models. Comparative results on the PIE and FERET face databases show that the proposed method consistently achieves the best performance, demonstrating its effectiveness. In further study, we plan to combine the geometric structures of the Gamma manifold with the kernel trick and to explore wider applications to object recognition tasks.
Appendix A

Proposition 1. Let $\Theta$ defined by Eq. (1) be the set of Gamma distributions. Then the mapping $\tau$ defined by Eq. (11) is an immersion.

Proof. Each element of $\Theta$ may be parameterized using two positive real-valued variables $(\alpha, \gamma) \in \mathbb{R}_+ \times \mathbb{R}_+$. In addition, the parameterizations of $\Theta$ are $C^\infty$ (infinitely differentiable) diffeomorphic to each other, so the Gamma manifold is a 2-dimensional $C^\infty$ differentiable manifold with coordinate system $(\alpha, \gamma)$. Writing $\tau_1(\alpha, \gamma) = \alpha$, $\tau_2(\alpha, \gamma) = \alpha/\gamma$, and $\tau_3(\alpha, \gamma) = \log(\Gamma(\alpha/\gamma)) - (\alpha/\gamma)\log(\alpha)$, the Jacobian matrix of $\tau = (\tau_1, \tau_2, \tau_3)$ is
\[
\frac{\partial(\tau_1, \tau_2, \tau_3)}{\partial(\alpha, \gamma)} =
\begin{pmatrix}
1 & 0 \\[2pt]
1/\gamma & -\alpha/\gamma^2 \\[2pt]
\dfrac{\Psi(\alpha/\gamma) - \log\alpha - 1}{\gamma} & \dfrac{\alpha(\log\alpha - \Psi(\alpha/\gamma))}{\gamma^2}
\end{pmatrix}.
\tag{25}
\]
The rank of the Jacobian matrix is clearly 2, which equals the dimensionality of the Gamma manifold. Hence $\tau$ is an immersion. □

Appendix B

Given training and test sets containing $L$ distinct subjects, let $N_l$ be the number of training samples of the $l$th ($l = 1, \ldots, L$) class. The $i$th ($i = 0, 1, \ldots, K$) regional face of the $k$th sample of the $l$th class can be expressed as a point $q^i_{l,k}$ on $\Phi^i(\mathcal{M}^i_{U,V}(E))$. We can then calculate the within-class mean and variance and the between-class mean and variance by
\[
m^i_w = \frac{1}{L} \sum_{l=1}^{L} \frac{2}{N_l(N_l - 1)} \sum_{k=1}^{N_l - 1} \sum_{j=k+1}^{N_l} \frac{1}{\rho^i(q^i_{l,k}, q^i_{l,j}) + 1},
\]
\[
m^i_b = \frac{2}{L(L-1)} \sum_{l=1}^{L-1} \sum_{h=l+1}^{L} \frac{1}{N_l N_h} \sum_{k=1}^{N_l} \sum_{j=1}^{N_h} \frac{1}{\rho^i(q^i_{l,k}, q^i_{h,j}) + 1},
\]
\[
s^i_w = \left\{ \frac{2}{\sum_{l=1}^{L} N_l(N_l - 1)} \sum_{l=1}^{L} \sum_{k=1}^{N_l - 1} \sum_{j=k+1}^{N_l} \left( \frac{1}{\rho^i(q^i_{l,k}, q^i_{l,j}) + 1} - m^i_w \right)^{\!2} \right\}^{1/2},
\]
\[
s^i_b = \left\{ \left( \sum_{l=1}^{L-1} \sum_{h=l+1}^{L} N_l N_h \right)^{\!-1} \sum_{l=1}^{L-1} \sum_{h=l+1}^{L} \sum_{k=1}^{N_l} \sum_{j=1}^{N_h} \left( \frac{1}{\rho^i(q^i_{l,k}, q^i_{h,j}) + 1} - m^i_b \right)^{\!2} \right\}^{1/2}.
\tag{26}
\]
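For one region, the statistics of Eq. (26) can be sketched as follows (our helper name, assuming NumPy). Note that the sketch pools all within-class and all between-class pairs, which coincides with the class-balanced averaging of Eq. (26) when every class has the same number of training samples, as assumed in Section 3.2.

```python
import numpy as np
from itertools import combinations

def region_stats(dist, labels):
    """Within-/between-class mean and std of the similarity scores
    1/(rho^i + 1) used in Eq. (26). dist: precomputed matrix of regional
    distances rho^i between all training samples; labels: class label
    of each sample (equal class sizes assumed)."""
    sim = 1.0 / (np.asarray(dist, float) + 1.0)
    labels = np.asarray(labels)
    within, between = [], []
    for a, b in combinations(range(len(labels)), 2):
        (within if labels[a] == labels[b] else between).append(sim[a, b])
    within, between = np.array(within), np.array(between)
    return within.mean(), within.std(), between.mean(), between.std()
```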
References

[1] S. Amari, H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, American Mathematical Society, 2000.
[2] G. Lebanon, Riemannian Geometry and Statistical Machine Learning, Ph.D. Thesis, Carnegie Mellon University, 2005.
[3] S. Amari, Differential geometry of a parametric collection of invertible linear systems: Riemannian metric, dual affine connections and divergence, Math. Syst. Theory 20 (1987) 53–82.
[4] S. Amari, H. Park, T. Ozeki, Singularities affect dynamics of learning in neuromanifolds, Neural Comput. 18 (2006) 1007–1065.
[5] K.M. Carter, R. Raich, W.G. Finn, A.O. Hero, FINE: Fisher information nonparametric embedding, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 2093–2098.
[6] Y. Cai, C.T.J. Dodson, A.J. Doig, O. Wolkenhauer, Information theoretic analysis of protein sequences shows that amino acids self cluster, J. Theor. Biol. 218 (2002) 409–418.
[7] A. Bhattacharya, Nonparametric Statistics on Manifolds with Applications to Shape Spaces, Ph.D. Dissertation, 2008.
[8] C.J. Liu, H. Wechsler, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition, IEEE Trans. Image Process. 11 (2002) 467–476.
[9] L. Shen, L. Bai, Gabor feature based face recognition using kernel methods, in: Proceedings of the Sixth IEEE Conference on Face and Gesture Recognition, Korea, 2004, pp. 170–176.
[10] L. Wang, Y. Li, C. Wang, H. Zhang, 2D Gaborface representation method for face recognition with ensemble and multichannel model, Image Vision Comput. 26 (2008) 820–828.
[11] L. Yu, Z. He, Q. Cao, Gabor texture representation method for face recognition using the Gamma and generalized Gaussian models, Image Vision Comput. 28 (2010) 177–187.
[12] W. Mio, D. Badlyans, X.W. Liu, A computational approach to Fisher information geometry with applications to image analysis, in: Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), Lecture Notes in Computer Science, vol. 3757, Springer, 2005, pp. 18–33.
[13] M.N. Do, M. Vetterli, Wavelet-based texture retrieval using generalized Gaussian density and Kullback–Leibler distance, IEEE Trans. Image Process. 11 (2002) 146–158.
[14] R. Kwitt, A. Uhl, Lightweight probabilistic texture retrieval, IEEE Trans. Image Process. 19 (2010) 241–253.
[15] C.T.J. Dodson, J. Scharcanski, Information geometric similarity measurement for near-random stochastic processes, IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 33 (2003) 435–440.
[16] S.C. Zhu, X. Liu, Y.N. Wu, Exploring texture ensembles by efficient Markov chain Monte Carlo—toward a 'trichromacy' theory of texture, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 554–569.
[17] B. Heisele, T. Serre, T. Poggio, A component-based framework for face detection and identification, Int. J. Comput. Vision 74 (2007) 167–181.
[18] X. Wang, X. Tang, Dual-space linear discriminant analysis for face recognition, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 564–569.
[19] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., John Wiley and Sons, New York, 2001.
[20] H. Dang, G. Weerakkody, Bounds for the maximum likelihood estimates in two-parameter Gamma distribution, J. Math. Anal. Appl. 245 (2000) 1–6.
[21] G. Lebanon, Information geometry, the embedding principle, and document classification, in: Proceedings of the Second International Symposium on Information Geometry and its Applications, Tokyo, 2005, pp. 101–108.
[22] T.S. Lee, Image representation using 2D Gabor wavelets, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 959–971.
[23] M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, Ann. Inst. Henri Poincaré 10 (1948) 215–310.
[24] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognit. Neurosci. 3 (1991) 71–86.
[25] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.
[26] L. Chen, H. Liao, M. Ko, J. Lin, G. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33 (2000) 1713–1726.
[27] J. Yang, A.F. Frangi, J.Y. Yang, D. Zhang, Z. Jin, KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 230–244.
[28] W.K. Yang, J.G. Wang, M.W. Ren, L. Zhang, J.Y. Yang, Feature extraction using fuzzy inverse FDA, Neurocomputing 72 (2009) 3384–3390.
[29] W.K. Yang, C.Y. Sun, L. Zhang, A multi-manifold discriminant analysis method for image feature extraction, Pattern Recognition 44 (2011) 1649–1657.
[30] W.S. Chen, P.C. Yuen, X.H. Xie, Kernel machine based rank-lifting regularized discriminant analysis method for face recognition, Neurocomputing 74 (2011) 2953–2960.
[31] C.T.J. Dodson, H. Matsuzoe, An affine embedding of the gamma manifold, Appl. Sci. 5 (2003) 7–12.
Yue Zhang is a doctoral student at the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China. She received a Master's degree in Fundamental Mathematics in 1999 from the Department of Mathematics of Anhui Normal University, China. Her scientific interests are in the fields of probability, mathematical statistics and pattern recognition.

Chuancai Liu is presently a Professor at the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China, and a standing committee member of the Robotics Professional Committee of the Chinese Association of Artificial Intelligence. He received a Master's degree in Detection of Atmosphere & Atmospheric Remote Sensing in 1993 from the Nanjing Institute of Meteorology, and earned his doctorate in Shipping & Marine Engineering Fluid Dynamics in 1997 from the China Ship Research Institute. His scientific interests are in the fields of computer vision, pattern recognition, and intelligent robotics.