
Author's Accepted Manuscript

A novel discriminant criterion based on feature fusion strategy for face recognition

Wen-Sheng Chen, Xiuli Dai, Binbin Pan, Taiquan Huang

www.elsevier.com/locate/neucom

PII: S0925-2312(15)00168-X
DOI: http://dx.doi.org/10.1016/j.neucom.2015.02.019
Reference: NEUCOM15146

To appear in: Neurocomputing

Received date: 26 August 2014
Revised date: 3 February 2015
Accepted date: 9 February 2015

Cite this article as: Wen-Sheng Chen, Xiuli Dai, Binbin Pan, Taiquan Huang, A novel discriminant criterion based on feature fusion strategy for face recognition, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2015.02.019

A Novel Discriminant Criterion Based on Feature Fusion Strategy for Face Recognition

Wen-Sheng Chen (a,b,c), Xiuli Dai (a), Binbin Pan (a,c,*), Taiquan Huang (b)

(a) College of Mathematics and Computational Science, Shenzhen University, Shenzhen 518160, China
(b) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518160, China
(c) Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518160, China

Abstract

Feature extraction is an important problem in face recognition. There are two kinds of structural features, namely the Euclidean structure and the manifold structure. However, single-structural feature extraction methods cannot exploit the advantages of global and local features simultaneously, so their performance is degraded. To overcome this limitation of single-structural face recognition schemes, this paper proposes a novel discriminant criterion using a Feature Fusion Strategy (FFS), which nonlinearly combines both the Euclidean and manifold structures of the face pattern space. The proposed discriminant criterion lends itself to an iterative algorithm that automatically determines the optimal parameters and balances the tradeoff between the Euclidean structure and the manifold structure. The proposed FFS algorithm is successfully applied to face recognition. Three publicly available face databases, ORL, FERET and CMU PIE, are selected for evaluation. Compared with Linear Discriminant Analysis (LDA), Locality Preserving Projection (LPP), Unsupervised Discriminant Projection (UDP) and Semi-Supervised LDA (SSLDA), the experimental results show that the proposed method gives superior performance.

Keywords: Face Recognition, Euclidean Structure, Manifold Structure

* Corresponding author. Tel.: (+86) 755-26534582.
Email addresses: [email protected] (Wen-Sheng Chen), [email protected] (Binbin Pan)


1. Introduction

Face Recognition (FR) has become one of the most challenging research topics in pattern recognition and computer vision because of complicated facial image variations such as pose, illumination and expression. Facial images are represented as pattern vectors, which can be viewed as data points distributed in a high-dimensional pattern feature space. However, it is computationally expensive to conduct face recognition directly in such a high-dimensional feature space; this problem is usually called the curse of dimensionality. Feature extraction is an effective means of dealing with the high-dimensional problem. Under different constraint conditions, many feature extraction methods have been proposed to find optimal projections which map the facial images into a low-dimensional feature space [1]-[11].

Feature extraction approaches fall mainly into two categories. In the first category, algorithms aim to discover the global structure of the facial images from the distribution of the images in the Euclidean space; we refer to these as Euclidean structure preserved algorithms. In the second category, the facial images are assumed to reside on a low-dimensional manifold in the Euclidean space, and algorithms try to discover the local structure of this facial manifold; these are called manifold structure preserved algorithms. The representative Euclidean structure preserved algorithms are Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2], while the typical manifold structure preserved methods are Locality Preserving Projection (LPP) [3] and Unsupervised Discriminant Projection (UDP) [4]. In face recognition tasks, PCA, LDA and LPP are respectively known as the Eigenface, Fisherface and Laplacianface methods.

PCA generates a set of orthogonal projection axes which maximize the total scatter with minimal information loss. It is simple to implement and performs well in image denoising, but in face recognition it is sensitive to facial variations such as illumination, expression and pose. Furthermore, PCA is an unsupervised approach because it does not utilize the class label information. In contrast, LDA is a supervised method which seeks an optimal projection under the Fisher criterion such that the ratio of inter-class distance to intra-class distance attains its maximum. Although the accuracy of LDA is better than that of PCA in most cases [5], LDA suffers from the Small Sample Size (3S) problem, which occurs when the number of training samples is smaller than the dimensionality of the original feature space. To resolve the 3S problem, variants of LDA, such as regularization algorithms [12]-[16], maximum margin criterion methods [17]-[19], locality sensitive discriminant analysis [20]-[22] and subspace algorithms [23]-[26], have been developed. These LDA-based methods focus on modeling the Euclidean structure of the facial images and thus cannot reveal the intrinsic face manifold structure.

From the manifold point of view, He et al. [3] pioneered the LPP method for face recognition. The basic idea of LPP is to find a projection matrix under which adjacent data points remain nearby in the LPP-mapped attribute space, which means that LPP can detect the low-dimensional manifold structure embedded in the high-dimensional face data space. However, LPP is an unsupervised feature extraction approach which learns only a single manifold of the data, so its performance suffers in face classification tasks. To avoid this flaw of single-manifold learning, Yang et al. [4] presented the Unsupervised Discriminant Projection (UDP) method to extract the multi-manifold structure hidden in the high-dimensional pattern space. Because UDP employs both nonlocality and locality information, it is better suited than LPP for classification and outperforms LPP on some face databases. Nevertheless, neither LPP nor UDP is able to explore the Euclidean structure of the face images; they are single-structural feature extraction approaches as well.

It is understood that global features deal well with facial images under uniform illumination changes but are sensitive to local appearance changes such as facial expression, pose and occlusion. In contrast, local features are more robust to local appearance changes but weaker under uniform illumination changes. Therefore, single-structural feature extraction methods are incapable of exploiting the advantages of global and local features simultaneously. To address this problem, this paper proposes a new feature fusion strategy which is well suited to an efficient update algorithm. The novel discriminant criterion function is established by combining the Euclidean and manifold structures of the facial images. This criterion function contains two tradeoff parameters which balance the contributions of the Euclidean and manifold structures, and we theoretically obtain computational formulae to determine the optimal parameters. A cross-iteration algorithm is then developed to automatically tune the optimal parameters and further find the optimal projection directions under certain stopping criteria. The proposed Feature Fusion Strategy (FFS) based face recognition method is evaluated on publicly available face data sets. Compared with some single-structural feature extraction approaches, experimental results show that our FFS method has superior performance.

The rest of this paper is organized as follows. Section 2 briefly reviews some related works. Section 3 establishes a new discriminant criterion function and develops the FFS algorithm. Section 4 demonstrates the experimental results. Finally, conclusions are drawn in Section 5.

2. Related works

This section gives a brief review of some popular methods used in face recognition.

2.1. Some Notations and Definitions

Let d be the dimensionality of the original face space and c be the number of classes. The set of total training samples is X = [X_1, X_2, \ldots, X_c], where the ith class X_i = [x_1^{(i)}, x_2^{(i)}, \ldots, x_{m_i}^{(i)}] (1 \le i \le c) contains m_i samples and the total number of samples is m = \sum_{i=1}^{c} m_i. The vector x_j^{(i)} \in R^d denotes the jth sample of the ith class, \mu_i = \frac{1}{m_i} \sum_{j=1}^{m_i} x_j^{(i)} denotes the mean of the ith class, and \mu = \frac{1}{m} \sum_{i=1}^{c} \sum_{j=1}^{m_i} x_j^{(i)} denotes the mean of all samples.

The matrices used in the LDA, LPP and UDP algorithms are defined as follows. In the LDA method, the within-class scatter matrix S_W and the between-class scatter matrix S_B are defined by

S_W = \frac{1}{m} \sum_{i=1}^{c} \sum_{j=1}^{m_i} (x_j^{(i)} - \mu_i)(x_j^{(i)} - \mu_i)^T,

S_B = \frac{1}{m} \sum_{i=1}^{c} m_i (\mu_i - \mu)(\mu_i - \mu)^T.

For the LPP approach, the weighted graph matrix S = (S_{ij}) \in R^{m \times m} is calculated according to the following principle:

S_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / t), & x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise}, \end{cases}   (1)

where t > 0 and N_k(x) denotes the k nearest neighbors of a data point x. The two matrices used in LPP are defined as S_L = X L X^T and S_D = X D X^T, where D = diag(D_{11}, \ldots, D_{mm}) with D_{ii} = \sum_j S_{ij}, and L = D - S is the Laplacian matrix.

In the UDP approach, the matrix H = (H_{ij}) \in R^{m \times m} describes the relationship between two points in the facial image space, slightly modified as follows:

H_{ij} = \begin{cases} 1 - \exp(-\|x_i - x_j\|^2 / t), & x_i \notin N_k(x_j) \text{ or } x_j \notin N_k(x_i), \\ 0, & \text{otherwise}. \end{cases}

The local scatter matrix is defined as S_L = X L X^T, where L is the Laplacian matrix. The nonlocal scatter matrix S_N is defined by

S_N = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} H_{ij} (x_i - x_j)(x_i - x_j)^T = X N X^T,

where N = B - H and B = diag(B_{11}, B_{22}, \ldots, B_{mm}) with B_{ii} = \sum_j H_{ij}.
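For concreteness, the following NumPy sketch shows one way these four scatter matrices could be assembled from a data matrix. It is a minimal illustration, not the paper's code; the names (`X`, `labels`, `k`, `t`) are ours, and the nonlocal weights use the complement of the symmetric k-nearest-neighbor adjacency, the usual reading of the UDP definition.

```python
import numpy as np

def scatter_matrices(X, labels, k=5, t=1000.0):
    """Build S_W, S_B (Euclidean) and S_L, S_N (manifold) scatter matrices.

    X      : d x m data matrix, one sample per column
    labels : length-m array of class labels
    """
    d, m = X.shape
    mu = X.mean(axis=1, keepdims=True)                     # global mean
    S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Dc = Xc - mu_c
        S_W += Dc @ Dc.T                                   # within-class scatter
        S_B += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T   # between-class scatter
    S_W /= m; S_B /= m

    # Pairwise squared distances and symmetric k-NN adjacency (eq. (1) support)
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # m x m
    knn = np.argsort(sq, axis=1)[:, 1:k + 1]                  # skip self at index 0
    A = np.zeros((m, m), dtype=bool)
    A[np.repeat(np.arange(m), k), knn.ravel()] = True
    A = A | A.T                                # x_i in N_k(x_j) or x_j in N_k(x_i)

    S = np.where(A, np.exp(-sq / t), 0.0)      # LPP weight matrix, eq. (1)
    L = np.diag(S.sum(axis=1)) - S             # Laplacian L = D - S
    S_L = X @ L @ X.T                          # local scatter

    H = np.where(~A, 1.0 - np.exp(-sq / t), 0.0)   # UDP nonlocal weights
    np.fill_diagonal(H, 0.0)
    N = np.diag(H.sum(axis=1)) - H             # N = B - H
    S_N = X @ N @ X.T                          # nonlocal scatter
    return S_W, S_B, S_L, S_N
```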

2.2. LDA

The goal of LDA is to find an optimal projection matrix which maximizes the ratio of the between-class scatter to the within-class scatter, namely

W_{LDA} = \arg\max_W \frac{tr(W^T S_B W)}{tr(W^T S_W W)}.

The above problem is equivalent to solving the generalized eigen-system S_B w = \lambda S_W w, where \lambda and w are the eigenvalue and eigenvector respectively. The projection matrix W_{LDA} is formed with the eigenvectors associated with the largest c - 1 eigenvalues of the generalized eigen-system. Although LDA is theoretically sound for face recognition, it often suffers from the Small Sample Size (3S) problem, which occurs when the dimension of the feature vectors is higher than the number of training samples. Several representative methods, such as Fisherface [2], direct LDA [28] and regularized discriminant analysis [12]-[14], have been proposed to address this drawback.
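Such trace-ratio criteria are commonly relaxed to a generalized eigenproblem. As a minimal sketch (assuming S_W has been made invertible, e.g. after a PCA projection as described in Section 4):

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(S_B, S_W, n_components):
    """Solve S_B w = lambda * S_W w; keep eigenvectors of the largest eigenvalues."""
    # eigh solves the symmetric-definite generalized problem A v = lambda B v
    evals, evecs = eigh(S_B, S_W)
    order = np.argsort(evals)[::-1]           # descending eigenvalues
    return evecs[:, order[:n_components]]     # d x n_components projection matrix
```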


2.3. LPP

The basic idea of LPP is to model a projection matrix that captures the low-dimensional manifold structure of the data embedded in the high-dimensional pattern space. In detail, its criterion function forces nearby points to remain similar after projection, using the weighted graph matrix S = (S_{ij}) \in R^{m \times m}:

J(W) = \frac{1}{2} \sum_{ij} \|W^T x_i - W^T x_j\|^2 S_{ij} = tr(W^T S_L W).

The constraint W^T S_D W = I is imposed to remove an arbitrary scaling factor. Hence, the LPP problem is transformed into

W_{LPP} = \arg\min_{W^T S_D W = I} tr(W^T S_L W),

which is equivalent to solving the generalized eigen-system S_L w = \lambda S_D w. The projection matrix W_{LPP} is generated with the eigenvectors associated with the smallest eigenvalues of this eigen-system.

2.4. UDP

Motivated by the idea that LDA considers two quantities, namely between-class and within-class scatter, UDP creates a new criterion function which is the ratio of the nonlocal scatter to the local scatter:

J_{UDP}(W) = \frac{tr(W^T S_N W)}{tr(W^T S_L W)}.

The objective of UDP is to find the projection matrix W_{UDP} such that

W_{UDP} = \arg\max_W J_{UDP}(W).

The projection matrix W_{UDP} is formed with the eigenvectors associated with the largest eigenvalues of the generalized eigen-system S_N w = \lambda S_L w.
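The only mechanical difference from the LDA solver sketched above is the direction of optimization: LPP keeps the smallest generalized eigenvalues, while UDP, like LDA, keeps the largest. A minimal sketch of the LPP case:

```python
import numpy as np
from scipy.linalg import eigh

def lpp_projection(S_L, S_D, n_components):
    """Solve S_L w = lambda * S_D w; keep eigenvectors of the smallest eigenvalues."""
    evals, evecs = eigh(S_L, S_D)      # eigh returns eigenvalues in ascending order
    return evecs[:, :n_components]     # smallest-eigenvalue directions first
```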

In summary, these methods consider either the Euclidean structure or the manifold structure alone and are thus single-structural feature extraction approaches. Moreover, they often encounter the 3S problem in FR tasks.

3. The Proposed FFS Approach

In this section, we propose a novel discriminant criterion using a feature fusion strategy. This criterion is suitable for automatic parameter selection. Details are discussed as follows.

3.1. Proposed Discriminant Criterion

To fuse the features extracted from the Euclidean and manifold structures, we propose a new discriminant criterion function:

J(a, b, W) = \frac{tr(W^T (b S_B + (1 - b^2) S_N) W)}{tr(W^T (a^2 S_W + (1 - a) S_L) W)}.   (2)

By maximizing (2), we find the projection which pushes the inter-class data and non-local data apart, while pulling the intra-class data and local data close. The parameters a and b balance the tradeoff between the Euclidean structure and the manifold structure. LDA can be viewed as an instance of our method by setting a = 1 and b = 1; likewise, our approach contains UDP as a special case when a = 0 and b = 0. Values of a and b in [0, 1] adjust the relative contributions of the Euclidean and manifold structures. The optimal projection matrix is obtained by solving the following optimization problem:

W_{FFS} = \arg\max_{a, b, W} J(a, b, W).   (3)

We optimize the combination parameters a and b together with the projection matrix W. Since a and b are determined automatically, this avoids manual tuning, which is time-consuming and requires prior knowledge from the user. Another popular method for automatic parameter selection is cross-validation. Note, however, that cross-validation trains the model many times for different combinations of the parameters and different validation data, which is computationally expensive. A further problem with cross-validation is that the discretization of the parameters may lead to a sub-optimal solution.

The optimization problem (3) is solved by alternately optimizing the combination parameters and the projection matrix. When a and b are given, problem (3) can be solved by setting the derivative of (2) with respect to W to zero, leading to the generalized eigen-system

(b S_B + (1 - b^2) S_N) w = \lambda (a^2 S_W + (1 - a) S_L) w.   (4)

The projection matrix W_{FFS} is formed with the eigenvectors associated with the largest eigenvalues of this eigen-system.
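As a sketch of how (2) and (4) might be realized for a fixed pair (a, b), reusing the scatter matrices built earlier (names are ours; the eigensolver assumes the denominator matrix is positive definite, as arranged by the preprocessing of Section 4):

```python
import numpy as np
from scipy.linalg import eigh

def fused_matrices(a, b, S_W, S_B, S_L, S_N):
    """Numerator and denominator matrices of criterion (2)."""
    S1 = b * S_B + (1.0 - b ** 2) * S_N        # fused "spread apart" term
    S2 = a ** 2 * S_W + (1.0 - a) * S_L        # fused "pull together" term
    return S1, S2

def criterion(a, b, W, S_W, S_B, S_L, S_N):
    """Evaluate J(a, b, W) from equation (2)."""
    S1, S2 = fused_matrices(a, b, S_W, S_B, S_L, S_N)
    return np.trace(W.T @ S1 @ W) / np.trace(W.T @ S2 @ W)

def solve_projection(a, b, S_W, S_B, S_L, S_N, p):
    """Solve the generalized eigen-system (4) for the top-p eigenvectors."""
    S1, S2 = fused_matrices(a, b, S_W, S_B, S_L, S_N)
    evals, evecs = eigh(S1, S2)                # assumes S2 is positive definite
    return evecs[:, np.argsort(evals)[::-1][:p]]
```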

3.2. Parameter Determination Formulae

This subsection discusses how to select the optimal parameters a and b. To this end, we have the following theorem.

Theorem 1: To maximize the discriminant criterion function J(a, b, W) defined by (2) for a fixed projection matrix W, the parameters a and b must respectively satisfy

a = \frac{tr(W^T S_L W)}{2\, tr(W^T S_W W)}, \qquad b = \frac{tr(W^T S_B W)}{2\, tr(W^T S_N W)}.   (5)

Proof: For a given projection matrix W, the formulae (5) are obtained by computing the gradient of J with respect to (a, b) and setting it to zero, namely \nabla_{(a,b)} J(a, b, W) = 0. To this end, we rewrite equation (2) as

J(a, b, W) \cdot tr(W^T (a^2 S_W + (1 - a) S_L) W) = tr(W^T (b S_B + (1 - b^2) S_N) W),   (6)

and take the partial derivative of both sides of equation (6) with respect to a. This yields

\frac{\partial J}{\partial a} \cdot tr(W^T (a^2 S_W + (1 - a) S_L) W) + J(a, b, W) \cdot tr(W^T (2a S_W - S_L) W) = 0.

Letting \frac{\partial J}{\partial a} = 0, we have tr(W^T (2a S_W - S_L) W) = 0, namely

a = \frac{tr(W^T S_L W)}{2\, tr(W^T S_W W)}.

Similarly, direct computation gives

\frac{\partial J}{\partial b} = \frac{tr(W^T (S_B - 2b S_N) W)}{tr(W^T (a^2 S_W + (1 - a) S_L) W)}.

Setting \frac{\partial J}{\partial b} = 0, we get

b = \frac{tr(W^T S_B W)}{2\, tr(W^T S_N W)}.

The theorem is immediately concluded.

Based on the formulae (5), we can design a cross-iterative algorithm. For a given initial value (a, b) = (a_0, b_0), the projection matrix W_0 can be calculated from the eigen-system (4). In turn, substituting W_0 into formulae (5) gives the updated parameters a and b. Along this direction, our cross-iterative formulae are obtained as follows:

a_k = \frac{tr(W_{k-1}^T S_L W_{k-1})}{2\, tr(W_{k-1}^T S_W W_{k-1})},   (7)

b_k = \frac{tr(W_{k-1}^T S_B W_{k-1})}{2\, tr(W_{k-1}^T S_N W_{k-1})},   (8)

W_k = \arg\max_W J(a_k, b_k, W).   (9)
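The closed-form updates (7) and (8) are one-liners given W_{k-1}; a sketch consistent with the helpers above:

```python
import numpy as np

def update_parameters(W, S_W, S_B, S_L, S_N):
    """Cross-iteration updates (7) and (8) for a fixed projection W."""
    a = np.trace(W.T @ S_L @ W) / (2.0 * np.trace(W.T @ S_W @ W))
    b = np.trace(W.T @ S_B @ W) / (2.0 * np.trace(W.T @ S_N @ W))
    return a, b
```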

On the convergence of the algorithm, we have the following theorem.

Theorem 2: The criterion function J(a, b, W) is non-decreasing under the update formulae (7), (8) and (9).

Proof: By the optimality of (a_k, b_k) when W_{k-1} is fixed, i.e. \nabla J(a_k, b_k, W_{k-1}) = 0, we have

J(a_k, b_k, W_{k-1}) \ge J(a_{k-1}, b_{k-1}, W_{k-1}).   (10)

By the optimality of W_k when (a_k, b_k) is given in (9), we obtain

J(a_k, b_k, W_k) \ge J(a_k, b_k, W_{k-1}).   (11)

Combining (10) and (11) yields

J(a_k, b_k, W_k) \ge J(a_{k-1}, b_{k-1}, W_{k-1}),   (12)

which implies that J(a, b, W) is non-decreasing.

3.3. Stopping Criteria for Iteration

In this subsection, some stopping criteria are given for our cross-iterative algorithm. The iterative procedure is stopped if one of the following conditions is met:

1. Set a maximal number of iterations t_0. If the number of iterations reaches t_0, the iteration ceases.
2. Calculate and record the rank-one accuracy at each iterative stage. Let r_k be the accuracy at the kth iteration. If |r_{k+1} - r_k| < \varepsilon for a given small threshold \varepsilon, or r_{k+1} < r_k, the iteration stops.
3. If the value of parameter a or b leaves the interval [0, 1], the iteration stops.

3.4. Algorithm Design

Based on the above analysis, the proposed algorithm is described as follows.

Step 1: Construct the four scatter matrices S_W, S_B, S_L and S_N according to the definitions in subsection 2.1.

Step 2: Let \Phi_T = [x_1^{(1)} - \mu, \ldots, x_{m_1}^{(1)} - \mu \,|\, \cdots \,|\, x_1^{(c)} - \mu, \ldots, x_{m_c}^{(c)} - \mu] / \sqrt{m} \in R^{d \times m}. Perform the eigenvalue decomposition \Phi_T^T \Phi_T = U \Lambda U^T, where U \in R^{m \times m} is an orthogonal matrix and \Lambda = diag(\lambda_1, \lambda_2, \ldots, \lambda_\tau, 0, \ldots, 0) \in R^{m \times m} with \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_\tau > 0. Denote \Lambda_T = diag(\lambda_1, \lambda_2, \ldots, \lambda_\tau), U_T = U(:, 1:\tau) \in R^{m \times \tau}, and set W_T = \Phi_T U_T \Lambda_T^{-1/2} \in R^{d \times \tau}.

Step 3: Let k = 0 and initialize a_k = a_0, b_k = b_0, r_k = 0.

Step 4: Compute S_1^{(k)} = b_k S_B + (1 - b_k^2) S_N and S_2^{(k)} = a_k^2 S_W + (1 - a_k) S_L.

Step 5: Perform the eigenvalue decomposition

W_T^T S_2^{(k)} W_T = U_2 \begin{pmatrix} \Sigma_2 & 0 \\ 0 & 0 \end{pmatrix} U_2^T \in R^{\tau \times \tau},

where U_2 \in R^{\tau \times \tau} is an orthonormal matrix and \Sigma_2 = diag(\sigma_1, \ldots, \sigma_r) with \sigma_1 \ge \cdots \ge \sigma_r > 0 and r \le \tau.

Step 6: If r = \tau, set S_1^{(k)} \leftarrow W_T^T S_1^{(k)} W_T, S_2^{(k)} \leftarrow W_T^T S_2^{(k)} W_T and go to Step 7. Otherwise, update W_T according to the rule W_T \leftarrow W_T(:, 1:\tau-1), \tau \leftarrow \tau - 1 and go to Step 5.

Step 7: Solve (S_2^{(k)})^{-1} S_1^{(k)} w = \lambda w and obtain W_k, which is formed with the p eigenvectors corresponding to the largest p eigenvalues.

Step 8: Calculate the accuracy r_k using W_k. Go to Step 9 if the stopping criteria are satisfied. Otherwise, let k \leftarrow k + 1, compute a_k and b_k according to (7) and (8), and go to Step 4.

Step 9: Let the final projection matrix W_{FFS} = W_k.
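Putting the pieces together, a compact sketch of the cross-iterative loop (Steps 3-9), reusing the hypothetical `solve_projection` and `update_parameters` helpers from the earlier sketches; `evaluate_accuracy` is an assumed stand-in for the rank-one recognition test, and the rank-regularization of Steps 2, 5 and 6 is folded into the assumption that the denominator matrix is invertible:

```python
def ffs_train(S_W, S_B, S_L, S_N, p, a0=0.5, b0=0.5,
              max_iter=10, eps=1e-3, evaluate_accuracy=None):
    """Cross-iterative FFS training loop (Steps 3-9), as a rough sketch."""
    a, b = a0, b0
    W, r_prev = None, 0.0
    for _ in range(max_iter):                        # stopping criterion 1
        W = solve_projection(a, b, S_W, S_B, S_L, S_N, p)   # eq. (4) / (9)
        # Without an evaluator, fake an increasing accuracy so the loop runs on
        r = evaluate_accuracy(W) if evaluate_accuracy else r_prev + 2 * eps
        if abs(r - r_prev) < eps or r < r_prev:      # stopping criterion 2
            break
        r_prev = r
        a, b = update_parameters(W, S_W, S_B, S_L, S_N)     # eqs. (7)-(8)
        if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):       # stopping criterion 3
            break
    return W
```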

4. Experimental Results

This section reports the experimental results of our FFS method and some single-structural feature extraction methods, namely LDA [2], LPP [3] and UDP [4]. The Semi-Supervised LDA (SSLDA), which extracts both the Euclidean structure and the manifold structure, is also used for evaluation [27].

Figure 1: The images of one person from the ORL database

Figure 2: The variations of two people from the FERET database

Three publicly available face databases, namely the ORL, FERET and CMU PIE face databases, are used for evaluation.

The ORL face database contains 400 images of 40 individuals. Each person has 10 images with different facial expressions and small variations in scale and orientation. Figure 1 illustrates the image variations of one individual from the ORL database.

For the FERET database, we select 120 people, each of whom has 6 images. The six images are drawn from four different sets, namely Fa, Fb, Fc and duplicate. This subset includes variations in illumination, facial expression and pose. Fa is a set of regular frontal images; Fb images are alternative frontal images, taken seconds after the corresponding Fa images; Fc images were obtained with a different camera on the same day; and a strict subset of the duplicate images was taken at least 18 months later [29]. All the duplicates are frontal images. Images of two people are shown in Figure 2.

The CMU PIE database contains 41368 facial images of 68 people and involves complicated facial variations in pose, illumination and expression. A subset including 3808 images of the 68 individuals (56 images per individual) is selected to test the algorithms. The facial portion of each original image was automatically cropped based on the location of the eyes. Some images of one individual are shown in Figure 3.

Figure 3: Some images of one individual from the CMU PIE database

In these three databases, the resolution of each image is 112×92 with 256 gray levels per pixel. All facial images are preprocessed before training and testing: they are first aligned by the centers of the eyes and mouth, and then the resolution is reduced to 30×25 using a two-level D4 wavelet transformation (WT). The compressed facial image is represented by the LL-subband of the WT. The regularization strategy is employed in SSLDA to tackle the 3S problem. To resolve the 3S problem occurring in the other algorithms, PCA is used to reduce the dimensionality of the data so that the matrix on the right side of the generalized eigen-problem is invertible; that is, S_2 is invertible in our method, S_W in LDA, S_D in LPP and S_L in UDP. The reduced dimensionality is kept as large as possible.

The input data are randomly divided into training and testing data. We arbitrarily select the same number of samples from each class to generate the training dataset; the remaining data form the testing dataset. For the unsupervised methods, only the training data themselves are used for training, while for the supervised methods the training data together with the corresponding class labels are employed. For a fair comparison, all methods use the same training and testing data. The experiments are repeated 10 times and the average accuracies are recorded. The nearest neighbor classifier is employed in the recognition stage.
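As an illustration of this preprocessing step, the following sketch compresses an image to its low-frequency wavelet approximation band. It assumes the PyWavelets package and that the paper's "D4" wavelet maps to PyWavelets' 'db4' (under a different naming convention it could be 'db2'), so the exact output size may differ from 30×25 depending on the wavelet and boundary padding.

```python
import numpy as np
import pywt  # PyWavelets: assumed dependency, not named in the paper

def wavelet_compress(image, levels=2, wavelet='db4'):
    """Reduce a face image to its LL-subband feature vector."""
    ll = np.asarray(image, dtype=float)
    for _ in range(levels):
        ll, _details = pywt.dwt2(ll, wavelet)   # keep approximation, drop details
    return ll.ravel()                           # flatten LL-subband into a vector
```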

The parameters in our algorithm were set as follows:

• The width t in (1) was fixed to 1000.
• We chose 5 nearest neighbors to construct S_L and S_N.
• The parameters a and b were both initialized to 0.5, which implies that the contributions of the Euclidean structure and the manifold structure are the same at the beginning.
• The maximal number of iterations was set to 10.
• The tolerance ε in the stopping conditions was 10^-3.

4.1. Results on ORL database

For the ORL database of faces, n (n = 2, 3, ..., 7) images of each individual are randomly selected for training, while the remaining 10 - n images are used for testing. Under different training numbers (TN), the experimental results are tabulated in Table 1 and plotted in Figure 4. When the number of training samples is small, manifold structure preserved algorithms such as LPP and UDP achieve better performance than the Euclidean structure preserved algorithm LDA. However, when there are enough training samples, LDA outperforms LPP and UDP. The reason may be that a small number of training samples cannot reflect the "true" Euclidean structure well, which leads to the failure of algorithms using global features. In contrast, manifold structure preserved algorithms extract features from local neighbors and are thus more robust for small numbers of training samples. With more training data, the Euclidean structure of the facial images can be characterized more accurately, which leads to a rapid increase in the accuracy of LDA. The manifold structure preserved algorithms benefit less than the global structure preserved algorithms from large numbers of training samples. By combining the Euclidean and manifold structures, SSLDA and FFS give more robust performance in all cases, and the proposed FFS is more accurate than SSLDA.

Table 1: Recognition rates (%) on ORL database

Method   TN=2          TN=3          TN=4          TN=5          TN=6          TN=7
LDA      66.36±4.17    75.21±2.99    81.33±2.24    87.80±2.46    91.38±1.79    94.00±2.00
LPP      77.91±2.87    81.96±1.68    83.42±1.89    83.85±1.72    86.13±2.46    88.42±2.20
UDP      80.25±2.97    85.64±2.13    86.75±2.25    86.60±1.51    88.63±2.12    90.42±2.01
SSLDA    82.44±3.43    84.39±2.69    85.67±1.29    86.65±1.74    89.63±2.90    91.75±1.33
FFS      81.28±3.08    87.96±1.91    90.08±1.64    90.85±0.82    93.81±1.49    94.67±1.43

Figure 4: Rank 1 accuracy versus training number on the (a) ORL face database, (b) FERET face database, (c) CMU PIE face database

In order to examine the detailed performance of our method, we provide a graphical illustration, the Cumulative Match Characteristic (CMC) curve [30], which plots the recognition rate against the rank; higher curves indicate better performance. For each number of training images on the ORL database (from 2 to 7), the experimental results are tabulated in Tables 2-7 and the corresponding CMC curves are plotted in Figure 5. It can also be seen that our method has the best performance.
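For reference, a small sketch of how the rank-k accuracies plotted in a CMC curve can be computed for a nearest-neighbor matcher; the names (`gallery`, `probes`, etc.) are illustrative, not from the paper:

```python
import numpy as np

def cmc(gallery, g_labels, probes, p_labels, max_rank=5):
    """Rank-1..max_rank recognition rates for a nearest-neighbor matcher.

    gallery, probes : feature matrices with one sample per row
    """
    # Euclidean distances between every probe and every gallery sample
    dists = np.linalg.norm(probes[:, None, :] - gallery[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)           # gallery indices, nearest first
    ranked = g_labels[order]                    # labels sorted by distance
    hits = ranked == p_labels[:, None]
    # A probe counts as matched at rank k if a correct label occurs in the top k
    return [hits[:, :k].any(axis=1).mean() for k in range(1, max_rank + 1)]
```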


Table 2: Recognition rates (%) on ORL database (TN=2)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      66.34±4.17    75.13±4.32    79.88±4.02    82.63±3.56    84.84±3.38
LPP      77.91±2.87    84.75±2.49    87.38±2.88    89.31±2.66    90.56±2.49
UDP      80.25±2.97    86.22±2.52    89.53±2.36    91.16±2.31    92.34±2.54
SSLDA    82.44±2.24    86.94±2.17    89.84±1.88    91.72±2.32    92.97±2.01
FFS      81.28±3.08    87.50±2.66    90.31±2.33    91.88±2.58    92.81±2.77

Table 3: Recognition rates (%) on ORL database (TN=3)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      75.21±2.99    82.43±3.03    86.25±2.99    88.64±2.37    90.39±2.37
LPP      81.96±1.68    88.50±2.15    90.79±1.95    92.18±1.69    93.39±1.61
UDP      85.64±2.13    91.14±1.90    93.36±1.79    94.61±1.76    95.32±1.51
SSLDA    84.39±1.86    89.79±2.69    91.79±2.45    93.11±1.99    93.71±1.85
FFS      87.96±1.91    92.00±1.49    94.36±1.36    95.36±1.08    96.39±0.82

Table 4: Recognition rates (%) on ORL database (TN=4)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      81.33±2.34    87.91±3.07    90.67±3.09    92.46±2.34    93.83±2.53
LPP      83.42±1.89    89.67±1.74    92.29±1.79    93.86±1.24    95.08±1.00
UDP      86.75±2.25    92.33±1.46    94.58±1.02    96.00±1.08    96.75±0.90
SSLDA    85.67±2.44    91.33±2.38    93.42±2.52    94.75±2.48    95.75±1.94
FFS      90.08±1.64    94.04±1.70    95.79±1.31    96.58±1.09    97.21±1.02

Table 5: Recognition rates (%) on ORL database (TN=5)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      87.80±2.46    91.30±1.46    93.15±1.42    94.50±1.49    95.05±1.32
LPP      83.85±1.72    89.90±1.41    92.00±1.62    93.20±1.70    94.45±1.23
UDP      86.60±1.51    92.10±1.10    94.00±1.58    95.50±1.33    96.45±0.80
SSLDA    86.65±2.71    91.45±2.54    93.65±2.03    94.90±1.70    95.60±1.81
FFS      90.85±0.82    94.70±0.86    95.95±1.12    96.80±1.25    97.25±1.14

4.2. Results on FERET database

The FERET database has 6 images per individual. We randomly select n (n = 2, 3, 4, 5) images of each individual for training and the remaining 6 - n images for testing. The results are shown in Table 8 and plotted in Figure 4. We observe a similar situation on the FERET database.

Table 6: Recognition rates (%) on ORL database (TN=6)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      91.38±1.79    94.69±1.72    96.44±1.38    97.31±1.14    97.88±0.94
LPP      86.13±2.46    92.69±1.25    94.44±1.12    95.69±0.95    96.56±1.33
UDP      88.63±2.12    93.88±1.31    95.19±1.22    96.44±1.72    97.25±1.39
SSLDA    89.63±1.48    93.50±1.65    95.44±1.29    96.75±1.24    97.25±1.22
FFS      93.81±1.49    95.88±1.19    97.56±1.30    98.13±0.88    98.50±0.67

Table 7: Recognition rates (%) on ORL database (TN=7)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      94.00±2.00    96.42±1.11    97.83±1.48    98.33±1.04    98.83±1.05
LPP      88.42±2.20    93.25±2.37    94.50±1.93    95.50±1.97    96.08±2.04
UDP      90.42±2.01    94.75±1.52    95.92±1.59    96.92±1.57    97.58±1.39
SSLDA    91.75±3.00    94.67±2.81    96.08±1.89    96.92±1.67    97.67±1.10
FFS      94.67±1.43    96.42±1.11    97.92±1.19    98.50±0.95    98.92±0.88

Table 8: Recognition rates (%) on FERET database

Method   TN=2          TN=3          TN=4          TN=5
LDA      64.42±1.68    76.72±2.26    86.50±2.24    89.00±2.35
LPP      67.85±1.69    76.58±1.87    82.71±1.78    83.08±4.14
UDP      70.88±1.18    80.64±1.87    85.58±1.45    85.75±3.05
SSLDA    72.19±1.55    81.11±2.90    84.79±1.86    86.92±2.50
FFS      71.94±1.37    82.33±1.04    87.00±2.21    88.92±2.78

Table 9: Recognition rates (%) on FERET database (TN=2)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      64.42±1.68    69.17±1.97    72.25±1.98    74.54±1.79    76.38±1.71
LPP      67.85±1.69    73.04±1.67    75.96±1.81    77.69±1.44    79.42±1.25
UDP      70.88±1.18    75.92±1.33    78.50±1.81    80.46±1.59    82.13±1.52
SSLDA    72.19±1.47    76.56±1.29    79.45±1.25    81.37±1.83    82.83±1.81
FFS      71.94±1.37    76.48±1.42    79.29±1.40    81.31±1.34    82.48±1.01

LDA is less accurate for a small number of training samples, but improves rapidly when more training data are available. LPP and UDP are robust when few training samples are given, but perform worse than LDA when TN = 5. Our method still achieves the best performance.

Figure 5: CMC curve comparisons on the ORL database (panels: TN = 2 to 7)

Table 10: Recognition rates (%) on FERET database (TN=3)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      76.72±2.26    80.58±1.84    83.06±1.75    84.47±2.05    85.94±1.95
LPP      76.58±1.87    81.56±2.32    84.11±2.18    85.50±1.63    86.56±1.90
UDP      80.64±1.87    85.36±1.46    87.75±1.27    89.53±0.76    90.14±0.73
SSLDA    81.11±1.32    85.47±1.39    87.25±1.34    88.47±1.18    89.47±1.25
FFS      82.33±1.04    86.53±0.99    88.54±1.24    90.25±0.71    90.98±0.60

4.3. Results on CMU PIE database

In the CMU PIE database, we randomly select n (n = 2, 3, 4, 5, 6) images of each individual for training and the remaining 56 - n images for testing. The results are shown in Table 13 and plotted in Figure 4.

Table 11: Recognition rates (%) on FERET database (TN=4)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      86.50±2.24    88.75±1.88    89.96±1.89    90.92±1.83    91.54±1.77
LPP      82.71±1.78    85.25±2.04    87.29±2.12    88.33±1.83    89.17±2.01
UDP      85.58±1.45    88.42±2.00    89.58±1.68    90.58±1.45    91.36±1.16
SSLDA    84.79±1.18    89.33±1.96    90.46±2.01    91.29±1.88    91.96±1.71
FFS      87.00±2.21    89.83±2.05    91.46±1.67    91.96±1.57    92.54±1.44

Table 12: Recognition rates (%) on FERET database (TN=5)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      89.00±2.35    90.33±2.26    91.83±2.18    92.25±2.19    92.67±1.88
LPP      83.08±4.14    86.25±3.65    87.83±3.12    88.83±2.46    89.50±2.76
UDP      85.75±3.05    88.58±2.39    89.83±2.72    90.75±2.56    91.25±2.49
SSLDA    86.92±3.54    89.83±3.44    91.92±2.72    92.67±2.35    93.00±2.23
FFS      88.92±2.28    91.58±1.87    92.00±2.16    92.67±2.07    93.25±1.82

Figure 6: CMC curve comparisons on the FERET database (panels: TN = 2 to 5)

We observe that the Euclidean structure preserved algorithm is superior to the manifold structure preserved algorithms on the CMU PIE database.

Table 13: Recognition rates (%) on CMU PIE database

Method   TN=2          TN=3          TN=4          TN=5          TN=6
LDA      42.97±2.35    57.08±2.22    64.05±1.00    67.48±1.16    71.01±1.05
LPP      37.49±2.79    48.64±2.68    49.18±3.40    48.51±3.71    49.32±3.79
UDP      40.73±2.49    53.30±1.62    53.66±1.68    53.46±2.75    54.30±3.13
SSLDA    48.30±3.00    56.79±0.98    62.29±1.66    66.34±1.29    68.34±1.20
FFS      46.41±2.42    60.66±2.27    65.29±1.33    68.37±0.89    71.87±0.99

Table 14: Recognition rates (%) on CMU PIE database (TN=2)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      42.96±2.35    48.59±2.50    52.09±2.79    54.59±2.87    56.88±2.79
LPP      37.49±2.79    42.93±2.46    46.32±2.43    48.88±2.42    50.97±2.35
UDP      40.73±2.49    46.62±2.33    49.95±2.50    52.45±2.55    54.59±2.42
SSLDA    48.30±3.00    53.26±2.97    56.21±3.00    58.44±2.95    60.48±2.84
FFS      46.41±2.42    52.13±2.30    55.40±2.37    58.08±2.29    60.28±2.25

Table 15: Recognition rates (%) on CMU PIE database (TN=3)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      57.08±2.22    61.37±2.03    64.09±1.76    66.07±1.78    67.75±1.80
LPP      48.64±2.68    53.58±2.67    56.43±2.58    58.55±2.52    60.31±2.42
UDP      53.30±1.62    57.87±1.32    60.64±1.33    62.75±1.23    64.48±1.23
SSLDA    56.79±0.98    60.69±0.94    63.16±1.23    65.14±1.20    66.68±1.09
FFS      60.66±2.27    64.66±2.17    67.16±2.02    69.18±1.99    70.67±1.91

Table 16: Recognition rates (%) on CMU PIE database (TN=4)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      64.05±1.00    67.81±1.12    69.99±1.25    71.62±1.18    72.96±1.14
LPP      49.18±3.40    53.87±3.52    56.78±3.51    58.97±3.43    60.77±3.39
UDP      53.66±1.68    58.29±1.63    60.87±1.51    62.93±1.33    64.62±1.20
SSLDA    62.29±1.66    65.77±1.51    67.93±1.60    69.49±1.60    70.99±1.69
FFS      65.29±1.33    68.57±1.05    70.58±0.87    72.19±0.88    73.64±0.77

This implies that the local features extracted by LPP and UDP are not robust to the complicated facial variations. By automatically adjusting the contributions of the Euclidean structure and the manifold structure, our method gives more robust performance.

Table 17: Recognition rates (%) on CMU PIE database (TN=5)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      67.48±1.16    70.90±1.17    72.89±1.02    74.42±0.99    75.74±0.93
LPP      48.51±3.71    54.33±3.55    57.54±3.52    59.96±3.31    61.86±3.23
UDP      54.30±2.75    58.35±2.74    61.33±2.41    63.32±2.31    65.06±2.38
SSLDA    66.34±1.29    69.77±1.09    71.80±1.03    73.34±1.10    74.64±1.08
FFS      68.37±0.89    71.73±0.75    73.78±0.71    75.30±0.74    76.60±0.61

Table 18: Recognition rates (%) on CMU PIE database (TN=6)

Method   Rank 1        Rank 2        Rank 3        Rank 4        Rank 5
LDA      71.01±1.05    74.19±0.86    76.01±0.88    77.31±0.80    78.49±0.79
LPP      49.32±3.79    54.51±3.21    57.63±3.27    60.05±3.32    62.03±3.56
UDP      54.30±3.13    59.40±3.03    62.39±3.03    64.53±3.01    66.21±3.15
SSLDA    68.34±1.20    71.70±1.36    73.64±1.31    75.06±1.24    76.32±1.23
FFS      71.87±0.99    74.91±0.80    76.81±0.80    78.20±0.71    79.27±0.64

5. Conclusions

To address the drawbacks of existing single-structural feature extraction algorithms, this paper proposes a novel feature fusion strategy based discriminant criterion, which is able to make the best use of both global and local features. In particular, it lends itself to a cross-iterative algorithm for optimal parameter selection and optimal projection matrix determination. The proposed FFS algorithm is successfully applied to face recognition. Experiments are conducted on three publicly available face databases, namely the ORL, FERET and CMU PIE databases. Compared with some existing single-structural feature extraction approaches, experimental results show that our FFS approach gives superior performance.

Acknowledgements

This paper is partially supported by NSF of China Grants (61272252, 61472257) and the Science & Technology Planning Project of Shenzhen City (JCYJ20130326111024546). We would like to thank the Olivetti Research Laboratory in Cambridge, UK, for providing the ORL face database and the Army Research Laboratory for providing the FERET database of facial images.

References

[1] M.A. Turk, A.P. Pentland. Face recognition using eigenfaces, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.586-591, 1991.
[2] P.N. Belhumeur, J.P. Hespanha, D. Kriegman. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.19, No.7, pp.711-720, 1997.
[3] X.F. He, P. Niyogi. Locality preserving projections, Advances in Neural Information Processing Systems 16, MIT Press, Cambridge, MA, pp.153-160, 2003.
[4] J. Yang, D. Zhang, J.Y. Yang, B. Niu. Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.29, No.4, pp.650-664, 2007.
[5] A.M. Martinez, A.C. Kak. PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.23, No.2, pp.228-233, 2001.
[6] L. Qiao, S. Chen, X. Tan. Sparsity preserving projections with applications to face recognition, Pattern Recognition, Vol.43, No.1, pp.331-341, 2010.
[7] T. Zhang, B. Fang, Y.Y. Tang. Locality preserving nonnegative matrix factorization with application to face recognition, International Journal of Wavelets, Multiresolution and Information Processing, Vol.8, No.5, pp.835-846, 2010.
[8] F. Dornaika, A. Bosaghzadeh. Exponential local discriminant embedding and its application to face recognition, IEEE Transactions on Cybernetics, Vol.43, No.3, pp.921-934, 2013.
[9] H. Yu, J. Yang. A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, Vol.34, No.17, pp.2067-2070, 2001.
[10] A. Sharma, K.K. Paliwal. A two-stage linear discriminant analysis for face-recognition, Pattern Recognition Letters, Vol.33, No.9, pp.1157-1162, 2012.
[11] G.Y. Feng, D.W. Hu, Z.T. Zhou. A direct locality preserving projections (DLPP) algorithm for image recognition, Neural Processing Letters, Vol.27, No.3, pp.247-255, 2008.
[12] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos. Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition, Pattern Recognition Letters, Vol.26, No.2, pp.181-191, 2005.
[13] W.S. Chen, P.C. Yuen, J. Huang, B. Fang. Two-step single parameter regularization Fisher discriminant method for face recognition, International Journal of Pattern Recognition and Artificial Intelligence, Vol.20, No.2, pp.189-207, 2006.
[14] D.Q. Dai, P.C. Yuen. Face recognition by regularized discriminant analysis, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol.37, No.4, pp.1080-1085, 2007.
[15] W.S. Chen, P.C. Yuen, X. Xie. Kernel machine-based rank-lifting regularized discriminant analysis method for face recognition, Neurocomputing, Vol.74, No.17, pp.2953-2960, 2011.
[16] X. Gu, W. Gong, L. Yang. Regularized locality preserving discriminant analysis for face recognition, Neurocomputing, Vol.74, No.17, pp.3036-3042, 2011.
[17] X.R. Li, T. Jiang, K. Zhang. Efficient and robust feature extraction by maximum margin criterion, IEEE Transactions on Neural Networks, Vol.17, No.1, pp.157-165, 2006.
[18] W.H. Yang, D.Q. Dai. Two-dimensional maximum margin feature extraction for face recognition, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol.39, No.4, pp.1002-1012, 2009.
[19] Y. Cui, L. Fan. Feature extraction using fuzzy maximum margin criterion, Neurocomputing, Vol.86, pp.52-58, 2012.
[20] D. Cai, X. He, K. Zhou, J. Han, H. Bao. Locality sensitive discriminant analysis, Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp.708-713, 2007.
[21] J. Lu. Enhanced locality sensitive discriminant analysis for image recognition, Electronics Letters, Vol.46, No.3, pp.213-214, 2010.
[22] Q. Gao, J. Liu, K. Cui, H. Zhang, X. Wang. Stable locality sensitive discriminant analysis for image recognition, Neural Networks, Vol.54, pp.49-56, 2014.
[23] L.F. Chen, H.Y.M. Liao, M.T. Ko, J.C. Lin, G.J. Yu. A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, Vol.33, No.10, pp.1713-1726, 2000.
[24] W.S. Chen, J. Huang, J. Zou, F. Bin. Wavelet-face based subspace LDA method to solve small sample size problem in face recognition, International Journal of Wavelets, Multiresolution and Information Processing, Vol.7, No.2, pp.199-214, 2009.
[25] H. Huang, J. Li, H. Feng. Subspaces versus submanifolds: a comparative study in small sample size problem, International Journal of Pattern Recognition and Artificial Intelligence, Vol.23, No.3, pp.463-490, 2009.
[26] H. Mohammadzade, D. Hatzinakos. Projection into expression subspaces for face recognition from single sample per person, IEEE Transactions on Affective Computing, Vol.4, No.1, pp.69-82, 2013.
[27] Y. Song, F. Nie, C. Zhang, S. Xiang. A unified framework for semi-supervised dimensionality reduction, Pattern Recognition, Vol.41, No.9, pp.2789-2799, 2008.
[28] H. Yu, J. Yang. A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, Vol.34, pp.2067-2070, 2001.
[29] P.J. Phillips, H. Moon, P.J. Rauss, S. Rizvi. The FERET evaluation methodology for face-recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.22, No.10, pp.1090-1104, 2000.
[30] R.M. Bolle, J.H. Connell, S. Pankanti, N.K. Ratha, A.W. Senior. The relation between the ROC curve and the CMC, Fourth IEEE Workshop on Automatic Identification Advanced Technologies, pp.15-20, 2005.