Two-dimensional margin, similarity and variation embedding

Neurocomputing 86 (2012) 179–183

Letters

Quanxue Gao a,b, Haijun Zhang a,*, Jingjing Liu a

a School of Telecommunications Engineering, Xi Dian University, Xi'an, China
b State Key Laboratory of Integrated Services Networks, Xi Dian University, Xi'an, China


Abstract

Article history: Received 10 September 2011; received in revised form 26 December 2011; accepted 19 January 2012; available online 23 February 2012. Communicated by J. Kwok.

Previous works have demonstrated that manifold-based discriminant learning approaches can improve face recognition accuracy. However, they ignore the variation among nearby face images from the same class, which is important for further improving recognition accuracy and for avoiding the over-fitting problem in discriminant approaches. To address this problem, we propose a novel approach for face recognition. In our approach, we construct two adjacency graphs to model, respectively, the margin among nearby face images from different classes and the information, including similarity and variation, of face images from the same class, and then incorporate the information and the margin into the dimensionality reduction function. Experiments demonstrate the effectiveness of our approach.

Keywords: Discriminant analysis; Manifold learning; Margin; Similarity; Variation; Face recognition

1. Introduction

Many previous studies have demonstrated that face recognition performance can be improved significantly via manifold-based discriminant learning approaches [1–8], which are straightforward in preserving the intrinsic geometrical structure and the discriminant structure of face images. The most prevalent approaches are MFA (Marginal Fisher Analysis) [2], LDE (Local Discriminant Embedding) [3], LSDA (Locality Sensitive Discriminant Analysis) [5], and LFDA (Local Fisher Discriminant Analysis) [6]. All of them use one adjacency graph to model the intrinsic geometrical structure of the manifold on which the data points possibly reside, and another adjacency graph for the discriminant structure of the data points; the discriminant structure and the geometrical structure are then incorporated into the dimensionality reduction objective for data classification. In this way, these approaches discover the discriminant structure and simultaneously preserve the intrinsic geometrical structure of the face space. However, the intrinsic geometrical structure characterizes only the similarity among nearby face images and ignores the variation among face images from the same class, which characterizes the diversity of patterns and is important for image classification [9]. This indicates that none of these discriminant approaches best preserves the information of the face images,

* Corresponding author. E-mail address: [email protected] (H. Zhang).
doi:10.1016/j.neucom.2012.01.023

and the generalization capability and robustness of these approaches are not good enough. To avoid these limitations, we propose a novel and efficient approach, called two-dimensional margin and information embedding (2DMIE), for face recognition, which explicitly considers the margin and the information, including similarity and variation. Specifically, two adjacency graphs are constructed to model the local structure of the face space: one characterizes the information, i.e. similarity and variation, of the face images from the same class; the other characterizes the margin among nearby face images from different classes. A low-dimensional face subspace is learned from the joint margin and information, which avoids transforming image matrices into vectors. In this way, our 2DMIE approach preserves the information of the face images and simultaneously detects the discriminant structure, so it is robust to variations in lighting and facial expression.

2. 2DMIE

2.1. Discovering the local information of the intra-class faces

Let the data matrix be $A^T = [\,A_1^T\ A_2^T\ \cdots\ A_N^T\,]$, where $A_i \in \mathbb{R}^{m \times n}$ denotes the i-th face image and N is the number of training images. To model the intrinsic geometrical structure of the face images, we construct an adjacency graph $G_g = \{A, S, H\}$ over the training images, where S and H are weight matrices that characterize


The elements of the weight matrix F, which characterizes the inter-class margin (Section 2.2), are defined as
$$F_{ij} = \begin{cases} 1 - K(A_i, A_j), & \text{if } A_i \in N_k(A_j) \text{ or } A_j \in N_k(A_i),\ \text{and } t_i \neq t_j \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

Fig. 1. Distribution of two-dimensional data points.

the similarity and variation among nearby face images belonging to the same class, respectively. The elements $S_{ij}$ of the weight matrix S are defined as [10]
$$S_{ij} = \begin{cases} K(A_i, A_j), & \text{if } A_i \in N_k(A_j) \text{ or } A_j \in N_k(A_i),\ \text{and } t_i = t_j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $K(A_i, A_j) = \exp(-\|A_i - A_j\|_F^2 / t)$, $t > 0$ is a suitable parameter, $\|\cdot\|_F$ denotes the Frobenius norm, $t_i$ denotes the class label of $A_i$, and $N_k(A_i)$ is the set of k nearest neighbors of $A_i$.

From a statistical viewpoint, if two points $A_i$ and $A_j$ are very close to each other, i.e. $\|A_i - A_j\|_F^2$ is small, then the variation between them is also small and the similarity between them should be large. Take the points in Fig. 1 as an example: the variation among the points within the 3 nearest neighbors of the rectangle-shaped point is small, and the similarity among them is large. Conversely, if two points $A_i$ and $A_j$ are far apart, i.e. $\|A_i - A_j\|_F^2$ is large, then the variation between them is large and the corresponding similarity should be small; for example, the variation among the points within the 3 nearest neighbors of the triangle-shaped point is large and the corresponding similarity is small. Thus, the elements of the weight matrix H are defined as
$$H_{ij} = \begin{cases} 1 - K(A_i, A_j), & \text{if } A_i \in N_k(A_j) \text{ or } A_j \in N_k(A_i),\ \text{and } t_i = t_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Now consider mapping the face images to a one-dimensional space so that the information among nearby face images from the same class is preserved. Suppose $Y_i$ is the map of face image $A_i$ $(i = 1, 2, \ldots, N)$; a reasonable map optimizes the following objective functions:
$$\min \sum_{ij} \|Y_i - Y_j\|^2 S_{ij} \tag{3}$$
$$\max \sum_{ij} \|Y_i - Y_j\|^2 H_{ij} \tag{4}$$

The objective function (3) incurs a heavy penalty if two points $A_i$ and $A_j$ that are close to each other and belong to the same class are mapped far apart. Likewise, objective function (4) incurs a heavy penalty if neighboring points sharing the same label are mapped very close together, e.g. collapsed onto a single point. Therefore, minimizing (3) attempts to ensure that if $A_i$ and $A_j$ are close and share the same label, then $Y_i$ and $Y_j$ are close as well; thus the similarity of the intra-class face images is preserved. Maximizing (4) attempts to overcome the over-fitting problem of objective function (3) and simultaneously ensures that the intra-class variation is also preserved.
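As an illustration, the graph construction of Eqs. (1), (2), and (5) can be sketched in Python as follows. This is a minimal sketch: the function name `build_graphs` and the toy parameter defaults are ours, not from the paper.

```python
import numpy as np

def build_graphs(images, labels, k=3, t=1.0):
    """Weight matrices S, H (Eqs. (1)-(2)) and F (Eq. (5)) from k-NN graphs.
    `images` is a list of m x n arrays, `labels` their class labels."""
    N = len(images)
    # squared Frobenius distances ||A_i - A_j||_F^2
    d2 = np.array([[np.linalg.norm(images[i] - images[j]) ** 2
                    for j in range(N)] for i in range(N)])
    # index sets N_k(A_i): k nearest neighbours, self excluded
    knn = [set(np.argsort(d2[i])[1:k + 1]) for i in range(N)]
    S, H, F = (np.zeros((N, N)) for _ in range(3))
    for i in range(N):
        for j in range(N):
            if i == j or not (j in knn[i] or i in knn[j]):
                continue
            K = np.exp(-d2[i, j] / t)   # heat-kernel similarity
            if labels[i] == labels[j]:
                S[i, j] = K             # Eq. (1): intra-class similarity
                H[i, j] = 1.0 - K       # Eq. (2): intra-class variation
            else:
                F[i, j] = 1.0 - K       # Eq. (5): inter-class margin
    return S, H, F
```

Because the neighborhood condition uses "or", all three matrices come out symmetric, which the derivations in Section 2.3 rely on.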

2.2. Discovering the margin of the inter-class images

Let $G_d = \{A, F\}$ be an adjacency graph over the training images, where F is a weight matrix that characterizes the variation, also called the margin, among nearby images from different classes; its elements $F_{ij}$ are given by Eq. (5).

To detect the discriminant structure, we consider mapping the face images to a one-dimensional space so that the nearby face images in the adjacency graph $G_d$ stay as distant as possible. Suppose $Y_i$ is the map of face image matrix $A_i$ $(i = 1, 2, \ldots, N)$; a reasonable map optimizes the following objective function:
$$\max \sum_{ij} \|Y_i - Y_j\|^2 F_{ij} \tag{6}$$

The objective function (6) incurs a heavy penalty if neighboring points $A_i$ and $A_j$ belonging to different classes are mapped close together. So maximizing (6) attempts to ensure that if $A_i$ and $A_j$ are close but have different labels, then $Y_i$ and $Y_j$ are far apart; thus the margin among nearby face images from different classes is maximized.

2.3. Optimal linear embedding

Suppose x is a projection vector. Substituting $Y_i = A_i x$ into objective function (3), we see that
$$\sum_{ij} \|Y_i - Y_j\|^2 S_{ij} = \sum_{ij} (Y_i - Y_j)^T (Y_i - Y_j) S_{ij} = \sum_{ij} (A_i x - A_j x)^T (A_i x - A_j x) S_{ij}$$
$$= 2 x^T \Big( \sum_i A_i^T D_{ii}^s A_i - \sum_{ij} A_i^T S_{ij} A_j \Big) x = 2 x^T \big( A^T (D^s \otimes I_m) A - A^T (S \otimes I_m) A \big) x = 2 x^T A^T (L^s \otimes I_m) A x \tag{7}$$

where $D^s$ is a diagonal matrix whose diagonal entries are the column (or, since S is symmetric, row) sums of S, i.e. $D_{ii}^s = \sum_j S_{ij}$, and $L^s = D^s - S$ is the Laplacian matrix of $G_g$. Similarly, substituting $Y_i = A_i x$ into objective function (4), we see that
$$\sum_{ij} \|Y_i - Y_j\|^2 H_{ij} = 2 x^T A^T (L^v \otimes I_m) A x \tag{8}$$

where $D^v$ is a diagonal matrix with $D_{ii}^v = \sum_j H_{ij}$, and $L^v = D^v - H$. Likewise, substituting $Y_i = A_i x$ into objective function (6), we see that
$$\sum_{ij} \|Y_i - Y_j\|^2 F_{ij} = 2 x^T A^T (L^w \otimes I_m) A x \tag{9}$$

where $D^w$ is a diagonal matrix with $D_{ii}^w = \sum_j F_{ij}$, and $L^w = D^w - F$ is the Laplacian matrix of $G_d$. Substituting (7), (8), and (9) into objective functions (3), (4), and (6), respectively, and combining them by simple algebraic steps, the optimal objective function can be rewritten as
$$x^* = \arg\max_{x^T x = 1}\ x^T A^T \big( (L^w + L^v - L^s) \otimes I_m \big) A x \tag{10}$$
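The algebraic step shared by Eqs. (7)-(9) collapses a pairwise weighted sum into a quadratic form. The sketch below checks this identity numerically for an arbitrary symmetric weight matrix; the helper name `scatter` is our own, not from the paper.

```python
import numpy as np

def scatter(images, Wm):
    """n x n matrix G such that sum_ij ||A_i x - A_j x||^2 (Wm)_ij = 2 x^T G x
    for symmetric Wm -- the un-vectorised form of A^T((D - Wm) (x) I_m)A."""
    n = images[0].shape[1]
    G = np.zeros((n, n))
    D = Wm.sum(axis=1)                   # diagonal entries D_ii in Eqs. (7)-(9)
    for i, Ai in enumerate(images):
        G += D[i] * (Ai.T @ Ai)          # sum_i A_i^T D_ii A_i
        for j, Aj in enumerate(images):
            G -= Wm[i, j] * (Ai.T @ Aj)  # sum_ij A_i^T (Wm)_ij A_j
    return G

# sanity check of the identity behind Eqs. (7)-(9)
rng = np.random.default_rng(0)
imgs = [rng.standard_normal((3, 2)) for _ in range(4)]
Wm = rng.random((4, 4)); Wm = (Wm + Wm.T) / 2; np.fill_diagonal(Wm, 0)
x = rng.standard_normal(2)
direct = sum(Wm[i, j] * np.sum((imgs[i] @ x - imgs[j] @ x) ** 2)
             for i in range(4) for j in range(4))
assert abs(direct - 2 * x @ scatter(imgs, Wm) @ x) < 1e-9
```

Working with the n x n matrix G directly avoids forming the large Kronecker product $(D - W) \otimes I_m$ explicitly.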

Obviously, objective function (10) treats the local information, i.e. similarity and variation, of the intra-class images and the margin of the inter-class images as equally important in learning the optimal projection vector, which reduces the flexibility of the algorithm. The intra-class variation plays an important role in avoiding the over-fitting problem caused by objective function (3) and


in improving the generalization capability of the algorithm. Thus, we should assign a large weight to $L^v$ and a small weight to $L^w$. Finally, the optimal objective function becomes

$$x^* = \arg\max_{x^T x = 1}\ x^T A^T \big( ((\alpha L^w + (1-\alpha) L^v) - L^s) \otimes I_m \big) A x = \arg\max_{x^T x = 1}\ x^T A^T \big( (L^d - L^s) \otimes I_m \big) A x \tag{11}$$
where $L^d = \alpha L^w + (1-\alpha) L^v$. The parameter $\alpha$ controls the balance between the variation embedded in the intra-class data and the discriminating information of the data. If $\alpha = 1$, the variation embedded in the intra-class data is neglected, which impairs the generalization ability and stability of the algorithm. If $\alpha = 0.5$, the discriminating information and the intra-class variation are treated as equally important. In real-world applications, the distance among nearby images from different classes, which characterizes the discriminating information, is usually much larger than the distance among nearby images from the same class, which characterizes the intra-class variation. To emphasize the role of the intra-class variation, i.e. to improve the generalization ability and stability of the algorithm, the parameter $\alpha$ should be selected within the interval (0, 0.5). In the following experiments, $\alpha$ is set to 0.06.

The optimal projection vector x that maximizes (11) is given by the maximum-eigenvalue solution of the eigenvalue problem
$$A^T \big( (L^d - L^s) \otimes I_m \big) A x = \lambda x \tag{12}$$
Let the column vectors $x_1, x_2, \ldots, x_l$ be the solutions of Eq. (12), ordered according to their eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_l$. For an arbitrary face image $A \in \mathbb{R}^{m \times n}$, the embedding is
$$A \to Y = A W, \quad W = [x_1\ x_2\ \cdots\ x_l] \tag{13}$$
After obtaining the optimal projection matrix W, features are obtained by projecting the images onto W. Denote by $Y_j$ and $Y^n$ the projected features of $A_j$ $(j = 1, \ldots, N)$ and of a probe image $A^n$, respectively. Classification is then realized via the dissimilarity between $Y^n$ and $Y_j$, defined as
$$d(Y^n, Y_j) = \sum_{i=1}^{l} \|y_i^n - y_i^j\|^2 \tag{14}$$
where $y_i^n$ and $y_i^j$ denote the i-th columns of $Y^n$ and $Y_j$, respectively. If $d(Y^n, Y_p) = \min_j d(Y^n, Y_j)$, the classification decision is that $A^n$ and $A_p$ belong to the same class.

3. 2DMIE+2DPCA

Previous works have demonstrated that the low-dimensional representations $Y_i$ $(i = 1, \ldots, N)$ contain redundancy [11,12], which impairs the recognition accuracy. Moreover, the size of $Y_i$ is still large, which increases the storage required for these low-dimensional representations. 2DPCA (Two-Dimensional Principal Component Analysis) [12] is one of the prevalent methods for removing the redundancy embedded in features. Motivated by 2DPCA, we introduce an effective approach, called 2DMIE+2DPCA, to further reduce the dimensionality of the features. Specifically, we perform 2DPCA on the column vectors of $Y_i$. Suppose we have obtained the $n \times l$ projection matrix $W = [x_1\ x_2\ \cdots\ x_l]$ of 2DMIE and the low-dimensional representations $Y_i = A_i W$ $(i = 1, \ldots, N)$. Let $\bar{Y} = (1/N) \sum_i Y_i$ denote the mean of the low-dimensional representations. The optimal projection vectors $\varphi_1, \varphi_2, \ldots, \varphi_d$ of 2DPCA [12] are the d leading eigenvectors of the image covariance matrix $(1/N) \sum_{i=1}^{N} (Y_i - \bar{Y})(Y_i - \bar{Y})^T$.

Let $V = [\varphi_1\ \varphi_2\ \cdots\ \varphi_d]$ denote the projection matrix of 2DPCA. Projecting the training face images $A_i$ onto V and W together yields the $d \times l$ feature matrices
$$Z_i = V^T A_i W, \quad i = 1, \ldots, N \tag{15}$$
Given a test face image $A^n$, first use Eq. (15) to obtain the feature matrix $Z^n = V^T A^n W$; then a nearest-neighbor classifier is used for classification, with the distance between $Z_i$ and $Z^n$ defined by
$$d(Z_i, Z^n) = \sum_{q=1}^{d} \|Z_i^q - Z_n^q\|^2 \tag{16}$$
where $Z_n^q$ and $Z_i^q$ denote the q-th rows of $Z^n$ and $Z_i$, respectively.
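The 2DMIE training step of Eqs. (11)-(13) can be sketched as follows. This is our own minimal sketch under the paper's assumptions: `scatter` re-derives the quadratic-form matrices of Eqs. (7)-(9), and `fit_2dmie` exploits the fact that `scatter` is linear in its weight argument, so $L^d = \alpha L^w + (1-\alpha) L^v$ is handled simply by weighting F and H.

```python
import numpy as np

def scatter(images, Wm):
    # n x n matrix G with sum_ij ||A_i x - A_j x||^2 (Wm)_ij = 2 x^T G x
    n = images[0].shape[1]
    G = np.zeros((n, n))
    D = Wm.sum(axis=1)
    for i, Ai in enumerate(images):
        G += D[i] * (Ai.T @ Ai)
        for j, Aj in enumerate(images):
            G -= Wm[i, j] * (Ai.T @ Aj)
    return G

def fit_2dmie(images, S, H, F, l=2, alpha=0.06):
    """Columns of W are the top-l eigenvectors of the matrix behind
    Eq. (12), i.e. the un-vectorised form of A^T((L^d - L^s) (x) I_m)A."""
    M = scatter(images, alpha * F + (1 - alpha) * H) - scatter(images, S)
    vals, vecs = np.linalg.eigh(M)            # M is symmetric
    W = vecs[:, np.argsort(vals)[::-1][:l]]   # Eq. (13): largest eigenvalues
    return W
```

Note that the column-wise distance of Eq. (14) equals the squared Frobenius distance $\|Y^n - Y_j\|_F^2$, so classification after projection is plain nearest-neighbor matching of $A^n W$ against the training features.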

4. Experiments

In this section, we evaluate the performance of our approaches, 2DMIE and 2DMIE+2DPCA, on the Yale and AR face databases, and compare them with several prevalent two-dimensional discriminant approaches: 2DLDA [13], 2DMMC [14], 2DELDA (the two-dimensional counterpart of ELDA [15]), and 2DMFA [2]. The parameters k and t are set empirically in all experiments for 2DMFA and our approaches. The parameter $\alpha$ is set to 0.06 for our approaches unless otherwise specified.

The Yale face database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) contains 165 images of 15 individuals (11 images per person) under various facial expressions and lighting conditions. Each image is manually cropped and resized to 32 × 32 pixels [10]. The first 3, 6, and 9 images per person are selected for training, respectively, with the corresponding remaining images used for testing, giving three experiments. The recognition results of the six approaches are shown in Table 1. Fig. 2 plots recognition accuracy vs. number of projection vectors when 6 images per person are used for training.

The AR face database (http://rvll.ecn.purdue.edu/~aleix/aleix_face_DB.html) contains over 4000 color face images of 126 people (70 men and 56 women), including frontal views with different facial expressions, lighting conditions, and occlusions. The pictures of most persons were taken in two sessions separated by two weeks; each session contained 13 color images, and 120 individuals participated in both sessions. The facial portion of each image is manually cropped and normalized to 50 × 40 pixels [11]. The images from the first session without occlusions are selected for training, and the corresponding images from the second session for testing. Table 2 shows the top recognition accuracy of the six approaches and the corresponding number of features, and Fig. 3 plots the recognition accuracy of the six approaches vs. number of projection vectors.

Table 1. The top recognition accuracy (%) of the six approaches and the corresponding dimension (in parentheses) on the Yale database.

Training/Testing  2DLDA         2DELDA        2DMMC         2DMFA         2DMIE         2DMIE+2DPCA
45/120            56.67 (32×3)  54.17 (32×2)  64.17 (32×4)  52.50 (32×3)  65.00 (32×3)  72.50 (7×5)
90/75             77.33 (32×4)  73.33 (32×4)  77.33 (32×3)  74.67 (32×2)  82.67 (32×4)  81.33 (6×4)
135/30            86.67 (32×3)  90.00 (32×3)  93.33 (32×2)  93.33 (32×3)  96.67 (32×2)  100.00 (5×4)

From Tables 1 and 2 and Figs. 2 and 3, it is easy to see that, first, our 2DMIE approach significantly outperforms the 2DLDA, 2DELDA,


2DMMC, and 2DMFA approaches, although all of these approaches detect the discriminant structure of face images. The main reason may be that these four approaches ignore the variation of face images from the same class, which is important for avoiding the over-fitting problem and improving the generalization capability on testing images. Different from these four approaches, our 2DMIE approach employs an adjacency graph to model the intrinsic geometric structure, explicitly characterizing the variation and similarity of face images, and then incorporates this information, i.e. variation and similarity, together with the margin, into the dimensionality reduction objective function for face recognition. This indicates that the discriminant structure and the intrinsic structure, which characterizes the similarity and variation of face images, are both important for face recognition. Second, compared with the 2DMIE approach, our 2DMIE+2DPCA approach significantly improves the face recognition performance. This is probably because 2DMIE ignores the relationships among pixels in the column direction of images, which impairs the recognition accuracy. Moreover, the low-dimensional representations obtained by 2DMIE contain much redundancy, while 2DPCA efficiently reduces the redundancy embedded in the features and explicitly considers the relationships among pixels. Thus, the 2DMIE+2DPCA approach can obtain good recognition accuracy with a small number of features.

Fig. 2. The recognition accuracy vs. number of projection vectors (Yale database).

Table 2. The top classification accuracy (%) of the six approaches and the corresponding number of features on the AR database.

Method                2DLDA   2DELDA  2DMMC   2DMFA   2DMIE   2DMIE+2DPCA
Recognition accuracy  58.57   58.33   67.50   62.14   67.62   69.40
Dimension             50×26   50×18   50×13   50×16   50×11   17×10

Fig. 3. The recognition accuracy vs. number of projection vectors (AR database).

5. Conclusion

In this letter, we propose a novel approach, called two-dimensional margin and information embedding (2DMIE), for face recognition. Different from existing discriminant approaches, our 2DMIE approach explicitly considers the information, including the variation and similarity among the intra-class face images, as well as the margin among nearby face images from different classes, and then incorporates the information and the margin into the dimensionality reduction objective function. In this way, our 2DMIE approach detects the discriminant structure and simultaneously preserves the information of the images. Furthermore, we present the 2DMIE+2DPCA algorithm to further reduce the redundancy of the features and efficiently improve the recognition accuracy. Experimental results on the Yale and AR databases demonstrate the effectiveness of our approaches.

Acknowledgment

We would like to thank the anonymous reviewers for their constructive comments and suggestions. This work is supported by the National Science Foundation of China under Grant no. 60802075, the State Key Laboratory of Integrated Services Networks, Xi Dian University, China, and the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University, China. This work is also supported in part by the 111 Project of China (B08038) and the Fundamental Research Funds for the Central Universities of China.

References

[1] S. Yan, H. Zhang, Y. Hu, B. Zhang, Q. Cheng, Discriminant analysis on embedded manifold, in: Proceedings of the 8th European Conference on Computer Vision, 2004.
[2] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 40–51.
[3] H.T. Chen, H.W. Chang, T.L. Liu, Local discriminant embedding and its variants, in: Proceedings of the Computer Vision and Pattern Recognition, 2005.
[4] M. Wan, Z. Lai, J. Shao, Z. Jin, Two-dimensional local graph embedding discriminant analysis (2DLGEDA) with its application to face and palm biometrics, Neurocomputing 73 (2009) 193–203.
[5] D. Cai, X. He, K. Zhou, J. Han, H. Bao, Locality sensitive discriminant analysis, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
[6] M. Sugiyama, Local Fisher discriminant analysis for supervised dimensionality reduction, in: Proceedings of the International Conference on Machine Learning, 2006.
[7] Y. Xu, G. Feng, Y. Zhao, One improvement to two-dimensional locality preserving projection method for use with face recognition, Neurocomputing 73 (2009) 245–249.
[8] Z. Lai, M. Wan, Z. Jin, J. Yang, Sparse two-dimensional local discriminant projections for feature extraction, Neurocomputing 74 (2011) 629–637.
[9] Q. Gao, H. Xu, Y. Li, D. Xie, Two-dimensional supervised similarity and diversity projection, Pattern Recognition 43 (10) (2010) 3359–3363.


[10] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[11] Q. Gao, L. Zhang, D. Zhang, Sequential row-column independent component analysis for face recognition, Neurocomputing 72 (2009) 1152–1159.
[12] J. Yang, D. Zhang, A.F. Frangi, J.Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (1) (2004) 131–137.
[13] J. Ye, R. Janardan, Q. Li, Two-dimensional linear discriminant analysis, in: Advances in Neural Information Processing Systems, MIT Press, 2005.
[14] W.H. Yang, D.Q. Dai, Two-dimensional maximum margin feature extraction for face recognition, IEEE Trans. Syst. Man Cybern. Part B 39 (4) (2009) 1002–1012.
[15] C. Liu, H. Wechsler, Enhanced Fisher linear discriminant models for face recognition, in: Proceedings of the International Conference on Pattern Recognition, 1998.

Quan-xue Gao received the Ph.D. degree from Northwestern Polytechnical University, Xi'an, PR China, in 2005. From October 2006 to November 2007, he was a research associate at The Hong Kong Polytechnic University. He is currently an associate professor at Xi Dian University, Xi'an, PR China. His research interests include face recognition, machine learning, sparse representation, and statistical pattern recognition.


Hai-jun Zhang received the B.S. degree from Xi Dian University, Xi’an, China, in 2009. He is currently a Master student at Xidian University. His research interests include dimensionality reduction, manifold learning, and face recognition.

Jing-jing Liu received the B.S. degree in communication engineering from Hebei University of Science and Technology, Shijiazhuang, China, in 2010. She is currently working toward the M.E. degree at Xi Dian University, Xi'an, China. Her research interests include manifold learning, dimensionality reduction, and classification.