Face Recognition based on Fuzzy Linear Discriminant Analysis


Available online at www.sciencedirect.com

IERI Procedia 2 (2012) 873 – 879

2012 International Conference on Future Computer Supported Education

Genyuan Zhang a,b

a Wuhan University of Technology; b Zhejiang University of Media and Communications, Hangzhou, P.R. China

Email: zgy871 [email protected]

Abstract

Recently, linear discriminant analysis (LDA) has been applied to manifold learning and pattern classification. LDA, a supervised method, aims to find the optimal set of projection vectors that maximise the determinant of the between-class scatter matrix while minimising the determinant of the within-class scatter matrix. But since the dimension of the feature vectors is high and the number of observations is small (usually tens or hundreds of samples), an intrinsic limitation of traditional LDA is that it fails when the within-class scatter matrix becomes singular, which is known as the small sample size (SSS) problem. In real-world applications, the performance of face recognition is also affected by variations in illumination conditions and facial expressions. In this study, the fuzzy linear discriminant analysis (FLDA) algorithm is proposed, in which the fuzzy k-nearest neighbour (FKNN) method is implemented to reduce these outer effects and obtain correct local distribution information, in pursuit of good performance. In the proposed method, a membership degree matrix is first calculated using FKNN; the membership degrees are then incorporated into the definition of the Laplacian scatter matrix to obtain the fuzzy Laplacian scatter matrix. The optimal projections of FLDA are obtained by solving a generalised eigenfunction. Experimental results on the ORL face database show the effectiveness of the proposed method.

© 2012 Published by Elsevier B.V. Selection and peer review under responsibility of Information Engineering Research Institute.

Keywords: LDA; Face Recognition; FKNN; FLDA

doi:10.1016/j.ieri.2012.06.185


1. Introduction

Techniques for dimensionality reduction in linear and nonlinear learning tasks have attracted much attention in the areas of pattern recognition and computer vision. Linear dimensionality reduction seeks a meaningful low-dimensional subspace of a high-dimensional input space; the subspace provides a compact representation of the input data. Two of the most fundamental linear dimensionality reduction methods are principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2]. PCA aims to find a linear mapping that preserves the total variance by maximising the trace of the feature covariance matrix; the optimal projections of PCA correspond to the k largest eigenvalues of the data's total covariance matrix. However, PCA may discard much useful information and weaken recognition accuracy, since label information is not used. LDA, a supervised method, aims to find the optimal set of projection vectors that maximise the determinant of the between-class scatter matrix while minimising the determinant of the within-class scatter matrix. But since the dimension of the feature vectors is high and the number of observations is small (usually tens or hundreds of samples), an intrinsic limitation of traditional LDA is that it fails when the within-class scatter matrix becomes singular, which is known as the small sample size (SSS) problem.

Recently, many manifold-learning or locality-preserving methods have been presented. Among them, isometric feature mapping [19], locally linear embedding [20, 21], Laplacian eigenmaps (LE) [22, 23] and local tangent space alignment [24] are widely used. He and Niyogi [25] and He et al. [26] proposed locality preserving projections (LPP), a linear subspace learning method derived from LE. LPP preserves local information and can best detect the essential face manifold structure: its optimal projection axes best preserve the local structure of the underlying distribution in Euclidean space. Their analysis showed that LPP is connected with PCA and LDA; LPP can find an embedding space that preserves local information, but it is an unsupervised method. Local discriminant embedding (LDE) was developed by Chen et al. [27]; the underlying ideas of LDE are almost the same as those of LPP except for the use of label information. Focusing on manifold learning, LDE achieves good discriminating performance by integrating the neighbour and class relations between data points. LDE incorporates the class information into the construction of the embedding and derives the embedding for nearest-neighbour classification in a low-dimensional space, learning the embedding for the sub-manifold of each class by solving an optimisation problem. Yan et al. [28] proposed a general framework called graph embedding for dimensionality reduction.

In this paper, the fuzzy linear discriminant analysis (FLDA) algorithm is proposed, in which the fuzzy k-nearest neighbour (FKNN) method is implemented to obtain the local distribution information of the original samples. In the proposed method, a membership degree matrix is calculated using FKNN; the membership degrees are then incorporated into the definition of the Laplacian scatter matrix to obtain the fuzzy Laplacian scatter matrices.
Significantly, differing from existing graph-based algorithms, two novel fuzzy neighbour graphs are constructed in FLDA: it is important to maintain the original neighbour relations for neighbouring data points of the same class, and it is also crucial to keep neighbouring data points of different classes apart after the projection. Through the fuzzy neighbour graphs, the FLDA algorithm has lower sensitivity to sample variations caused by varying illumination, expression, viewing conditions and shapes, so the class of a new test point can be more reliably predicted by the nearest-neighbour criterion, owing to the locally discriminating nature of the method.

The rest of the paper is structured as follows. In Section 2 we introduce LDA and FKNN. In Section 3 we propose the idea and describe FLDA in detail. Section 4 reports experiments, and we give conclusions in Section 5.
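As context for Section 2, the classical LDA criterion described above reduces to a generalised eigenproblem on the two scatter matrices. The following is a minimal illustrative sketch, not the paper's code; the small ridge on S_w is one common workaround for the SSS problem mentioned above, and the function name and parameters are ours:

```python
import numpy as np
from scipy.linalg import eigh

def lda_projections(X, y, d, reg=1e-6):
    """Classical LDA. X: (D, N) data with one sample per column,
    y: (N,) integer class labels. Returns the top-d projection vectors.
    A small ridge `reg` keeps S_w invertible in the SSS regime."""
    D_dim = X.shape[0]
    mean_all = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((D_dim, D_dim))            # between-class scatter
    Sw = np.zeros((D_dim, D_dim))            # within-class scatter
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mean_all) @ (mc - mean_all).T
        Sw += (Xc - mc) @ (Xc - mc).T
    # Solve S_b v = lambda (S_w + reg I) v; keep the d largest eigenvalues.
    vals, vecs = eigh(Sb, Sw + reg * np.eye(D_dim))
    return vecs[:, np.argsort(vals)[::-1][:d]]
```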


2. Outline of LDA and FKNN

Fuzzy K-nearest neighbour [31]

With the FKNN algorithm, for a problem with c classes and N training patterns, the membership degrees can be computed in the following three steps:

Step 1: Compute the Euclidean distance matrix between pairs of feature vectors in the training set.

Step 2: Set the diagonal elements of this Euclidean distance matrix to infinity.

Step 3: Sort the distance matrix (treating each of its columns separately) in ascending order. Collect the class labels of the patterns located in the closest neighbourhood of the pattern under consideration (as we are concerned with K neighbours, this returns a list of K integers). Compute the membership degree to class i for the jth pattern using the expression proposed in the literature:

$$u_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/K), & \text{if } i = l_j, \\ 0.49\,(n_{ij}/K), & \text{otherwise,} \end{cases}$$

where $n_{ij}$ is the number of the K nearest neighbours of the jth pattern that belong to the ith class, and $l_j$ is the class label of the jth pattern. As usual, $u_{ij}$ satisfies two obvious properties:

$$\sum_{i=1}^{c} u_{ij} = 1, \qquad 0 < \sum_{j=1}^{N} u_{ij} < N.$$
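These three steps translate directly into code. A minimal sketch, assuming the 0.51/0.49 weighting reconstructed above (function and variable names are illustrative):

```python
import numpy as np

def fknn_membership(X, labels, K):
    """Fuzzy K-nearest-neighbour membership degrees.
    X: (N, D) training patterns as rows; labels: (N,) integers in 0..c-1.
    Returns U of shape (c, N) with U[i, j] the membership of pattern j
    in class i; each column sums to 1."""
    N = X.shape[0]
    c = int(labels.max()) + 1
    # Step 1: Euclidean distance matrix between all pairs of patterns.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 2: set the diagonal to infinity to exclude self-matches.
    np.fill_diagonal(D, np.inf)
    U = np.zeros((c, N))
    for j in range(N):
        # Step 3: class labels of the K closest neighbours of pattern j.
        nn = np.argsort(D[:, j])[:K]
        n_ij = np.bincount(labels[nn], minlength=c)  # neighbour counts per class
        U[:, j] = 0.49 * n_ij / K
        U[labels[j], j] += 0.51                      # own-class term
    return U
```

Each column of U sums to 0.49 · K/K + 0.51 = 1, which matches the first property above.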


Given a pairwise membership matrix U and an affinity matrix W over the training samples, and writing $y_i = v^T x_i$ for the one-dimensional projections, the fuzzy Laplacian scatter satisfies (for symmetric weights)

$$\frac{1}{2}\sum_{ij}(y_i - y_j)^2\, U_{ij} W_{ij} = \sum_{ij} v^T x_i\, U_{ij} W_{ij}\, x_i^T v - \sum_{ij} v^T x_i\, U_{ij} W_{ij}\, x_j^T v = v^T X D^{F} X^T v - v^T X (U \ast W) X^T v = v^T X \left(D^{F} - W^{F}\right) X^T v = v^T X L^{F} X^T v,$$

where $X = [x_1, x_2, \ldots, x_N]$ and '$\ast$' denotes element-wise matrix multiplication. The fuzzy affinity matrix $W^{F} = U \ast W$, the fuzzy diagonal matrix $D^{F}$ and the fuzzy Laplacian matrix $L^{F}$ of a graph G are defined as

$$L^{F} = D^{F} - W^{F}, \qquad D^{F}_{ii} = \sum_{j} U_{ij} W_{ij}.$$
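A direct transcription of these definitions (a sketch; U is assumed to be already in pairwise N × N form, as produced by step 2 of Section 3 below):

```python
import numpy as np

def fuzzy_laplacian(U, W):
    """Build W_F = U .* W, the diagonal D_F of its row sums, and
    L_F = D_F - W_F, for an N x N membership matrix U and affinity W."""
    W_F = U * W                        # element-wise product
    D_F = np.diag(W_F.sum(axis=1))     # D_F[ii] = sum_j U[ij] W[ij]
    return D_F - W_F                   # fuzzy Laplacian L_F
```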


3. Fuzzy linear discriminant analysis

Suppose there are c known pattern classes $w_1, w_2, \ldots, w_c$, where N is the total number of training samples and $N_i$ is the number of training samples in class i. In class i, the jth training sample is denoted by $x_j^{(i)}$, and the class label of $x_i$ is denoted $l_i$ (i = 1, ..., N). The proposed FLDA can then be realised in the following four steps.

1. Construct the fuzzy neighbourhood graphs. Let $G_{\mathrm{intra}}$ and $G_{\mathrm{inter}}$ denote two (undirected) graphs, both over all data points:

$G_{\mathrm{intra}}$: connect (i, j) if $l_i = l_j$ and ($i \in N_K(j)$ or $j \in N_K(i)$);
$G_{\mathrm{inter}}$: connect (i, j) if $l_i \neq l_j$ and ($i \in N_K(j)$ or $j \in N_K(i)$),

where $N_K(i)$ indicates the index set of the K nearest neighbours of the sample $x_i$. A sketch of this step follows.
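Under these definitions, step 1 can be sketched as follows (Euclidean K-NN with the symmetric 'or' rule; the function name and data layout are illustrative):

```python
import numpy as np

def fuzzy_neighbourhood_graphs(X, labels, K):
    """Boolean adjacency of G_intra (same-class K-NN pairs) and
    G_inter (different-class K-NN pairs). X: (N, D) samples as rows."""
    N = X.shape[0]
    # Pairwise Euclidean distances, self-distances excluded.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :K]      # indices of K nearest per sample
    A = np.zeros((N, N), dtype=bool)
    for i in range(N):
        A[i, knn[i]] = True
    A = A | A.T                             # i in N_K(j) or j in N_K(i)
    same = labels[:, None] == labels[None, :]
    return A & same, A & ~same              # G_intra, G_inter
```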

2. Construct the intraclass compactness fuzzy graph and the interclass separability fuzzy graph by weighting the edges with FKNN membership degrees:

$$G_{\mathrm{intra}}:\; U^{G}_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/K), & \text{if } l_i = l_j \text{ and } (i \in N_K(j) \text{ or } j \in N_K(i)), \\ 0, & \text{otherwise,} \end{cases}$$

$$G_{\mathrm{inter}}:\; U^{G'}_{ij} = \begin{cases} 0.49\,(n_{ij}/K), & \text{if } l_i \neq l_j \text{ and } (i \in N_K(j) \text{ or } j \in N_K(i)), \\ 0, & \text{otherwise.} \end{cases}$$
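Continuing the sketch, step 2 attaches the FKNN memberships to the graph edges. One plausible reading of the reconstructed definition, in which edge (i, j) receives the membership of sample i in the class of sample j, reproduces the 0.51 + 0.49 n/K and 0.49 n/K cases above:

```python
import numpy as np

def fuzzy_graph_weights(G_intra, G_inter, labels, U):
    """Weight graph edges with FKNN memberships.
    U: (c, N) membership matrix from fknn_membership;
    w[i, j] = U[l_j, i], the membership of sample i in the class of j."""
    N = len(labels)
    w = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            w[i, j] = U[labels[j], i]
    return w * G_intra, w * G_inter     # U_intra, U_inter
```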

3. Compute the fuzzy Laplacian scatter matrices. Using the fuzzy weights, the intraclass compactness along a projection v is characterised by

$$\frac{1}{2}\sum_{ij}(y_i - y_j)^2\, U^{G}_{ij} W_{ij} = \sum_{ij} v^T x_i\, U^{G}_{ij} W_{ij}\, x_i^T v - \sum_{ij} v^T x_i\, U^{G}_{ij} W_{ij}\, x_j^T v = v^T X D^{G}_{\mathrm{intra}} X^T v - v^T X (U^{G} \ast W) X^T v = v^T X \left(D^{G}_{\mathrm{intra}} - W^{G}_{\mathrm{intra}}\right) X^T v = v^T X L^{G}_{\mathrm{intra}} X^T v,$$

where $y_i = v^T x_i$ and the weights are assumed symmetric.

For the separability graph, the fuzzy affinity matrix $W^{G}_{\mathrm{inter}}$, the fuzzy diagonal matrix $D^{G}_{\mathrm{inter}}$ and the fuzzy Laplacian matrix $L^{G}_{\mathrm{inter}}$ of the graph $G_{\mathrm{inter}}$ can be computed in the same way.

4. Find the generalised eigenvectors $v_1, v_2, \ldots, v_d$ of the resulting eigenproblem that correspond to the d largest eigenvalues; these form the optimal FLDA projections.
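Assembling the previous sketches gives one plausible end-to-end implementation of the four steps. The affinity W is taken as the all-ones matrix for simplicity (a heat-kernel weighting would be a natural alternative), and the symmetrisation and ridge term are numerical safeguards of ours, not part of the paper's description:

```python
import numpy as np
from scipy.linalg import eigh

def flda_projections(X, labels, K, d, reg=1e-6):
    """X: (D, N) training matrix with one sample per column.
    Returns a (D, d) matrix whose columns are FLDA projection vectors."""
    N = X.shape[1]
    U = fknn_membership(X.T, labels, K)
    G_intra, G_inter = fuzzy_neighbourhood_graphs(X.T, labels, K)
    U_intra, U_inter = fuzzy_graph_weights(G_intra, G_inter, labels, U)
    # The scatter derivation assumes symmetric weights.
    U_intra = 0.5 * (U_intra + U_intra.T)
    U_inter = 0.5 * (U_inter + U_inter.T)
    W = np.ones((N, N))                               # simplest affinity choice
    S_intra = X @ fuzzy_laplacian(U_intra, W) @ X.T   # compactness scatter
    S_inter = X @ fuzzy_laplacian(U_inter, W) @ X.T   # separability scatter
    # Generalised eigenproblem: maximise separability over compactness.
    vals, vecs = eigh(S_inter, S_intra + reg * np.eye(X.shape[0]))
    return vecs[:, np.argsort(vals)[::-1][:d]]        # top-d eigenvectors
```

A test image x would then be classified by projecting it onto the returned vectors and applying the nearest-neighbour rule in the reduced space, as in the experiments below.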


4. Experiments and results

To evaluate the proposed FLDA algorithm, we compare it with the PCA, LDA and LDE algorithms on four datasets: the Wine dataset from UCI and the ORL, Yale and AR face databases. The Wine dataset, taken from UCI, is used to show the effectiveness of the proposed fuzzy neighbourhood graphs and to test the clustering performance of the proposed algorithm as a toy example. The ORL database was used to evaluate the performance of FLDA under conditions where the pose and sample size are varied. The Yale database was used to examine the system performance when both facial expressions and illumination are varied. The AR database was employed to test the performance of the system under conditions where there is variation over time, in facial expressions and in lighting conditions. After the projection matrix was computed from the training part, all the images, including the training part and the test part, were projected into the feature space. The Euclidean distance and the nearest-neighbour classifier were used in all the experiments.

The ORL database (http://www.uk.research.att.com/facedatabase.html) contains images from 40 individuals, each providing ten different images. The facial expressions and facial details (glasses or no glasses) vary. The images were taken with a tolerance for some tilting and rotation of the face of up to 20°. Moreover, there is also some variation in scale of up to about 10%. All images were normalised to a resolution of 56 × 46.

We test the recognition performance of the four methods: PCA, LDA, LDE and FLDA. In the experiments, l images (l varies from 2 to 6) are randomly selected from the image gallery of each individual to form the training sample set; the remaining 10 − l images are used for testing. In the PCA phase of LDA, LDE and FLDA, we keep 90% of the image energy. For each l, we independently run the experiment 50 times and record the top average recognition accuracy and the average CPU time of the four methods for each number of training images per person; the values in parentheses denote the number of eigenvectors giving the best recognition accuracy. The performance of FLDA is better than that of PCA, LDA and LDE. On the ORL database, FLDA outperforms the other methods when the pose and sample size are varied, because the two novel fuzzy neighbour graphs help to maintain the original neighbour relations for neighbouring data points. Examining the variation of accuracy with the number of eigenvectors used, and the recognition accuracy when four images per class are randomly selected for training, we can see that FLDA always performs better than the other three methods. The results also demonstrate that the proposed method outperforms the other methods under the same conditions, which further shows that it can extract more discriminative features.

5. Conclusions

In this paper, we develop a supervised discriminant technique for feature extraction and recognition, called FLDA, in which FKNN is used to obtain the local distribution information of the original samples. We redefined the fuzzy Laplacian scatter matrix according to FKNN. This reduces the sensitivity of LDA to substantial variations between face images caused by varying illumination, viewing conditions and facial expression. By using the fuzzy membership, the influence of outliers on feature extraction and final classification can be effectively reduced, and more effective discriminative features can be obtained than with the other algorithms (PCA, LDA and LDE). Experimental results on the ORL face database show the effectiveness of the proposed method.
In the future, we will conduct more tests on other types of datasets and further improve the objective function of the proposed algorithm with kernel methods for the non-linear case.

References

[1] Turk, M., Pentland, A.P.: 'Face recognition using eigenfaces'. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR'91), Maui, Hawaii, 3–6 June 1991, pp. 586–591
[2] Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: 'Eigenfaces vs. Fisherfaces: recognition using class specific linear projection', IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, (7), pp. 711–720
[3] Li, H., Jiang, T., Zhang, K.: 'Efficient and robust feature extraction by maximum margin criterion', IEEE Trans. Neural Netw., 2006, 17, (1), pp. 1157–1165
[4] Howland, P., Wang, J., Park, H.: 'Solving the small sample size problem in face recognition using generalized discriminant analysis', Pattern Recognit., 2006, 39, pp. 277–287


generalized discriminant analysis’, Pattern Recognit., 2006, 39, pp. 277–287 [5]Zheng, W., Zhao, L., Zou, C.: ‘Foley–Sammon optimal discriminant vectors using kernel approach’, IEEE Trans. Neural Netw., 2005, 16, (1), pp. 1–9 [6]Zheng, W., Zhao, L., Zou, C.: ‘An efficient algorithm to solve the small sample size problem for LDA’, Pattern Recognit., 2004, 37, pp. 1077–1079 [7]Friedman, J.H.: ‘Regularized discriminant analysis’, J. Am. Stat. Assoc., 1989, 84, (405), pp. 165–175 Golub, G.H., Van Loan, C.F.: ‘Matrix computations’ (The Johns Hopkins University Press, Baltimore, MD, 1996, 3rd edn.) [8]9 Belhumeour, P.N., Hespanha, J.P., Kriegman, D.J.: ‘Eigenfaces vs. Fisherfaces: recognition using class specific linear projection’, IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, (7), pp. 711–720 Swets, D.L., Weng, J.: ‘Using discriminant eigenfeatures for image retrieval’, IEEE Trans. Pattern Anal. Mach. Intell., 1996, 18, (8), pp. 831–836 [9]Howland, P., Jeon, M., Park, H.: ‘Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition’, SIAM J. Matrix Anal. Appl., 2003, 25, (1), pp. 165–179 Ye, J., Janardan, R., Park, C.H., Park, H.: ‘An optimization criterion for generalized discriminant analysis on undersampled problems’, IEEE Trans. Pattern Anal. Mach. Intell., 2004, 26, (8), pp. 982–994 [10]Ye, J., Li, Q.: ‘A two-stage linear discriminant analysis via QRdecomposition’, IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, (6), pp. 929–941 [11]Chen, L.-F., Hong-Yuan, X., Liao, M., Ko, M.-T., Lin, J.-C., Yu, G.-J.: ‘A new LDA-based face recognition system which can solve the small sample size problem’, Pattern Recognit., 2000, 33, pp. 1713– 1726 [12]Kirby, M., Sirovich, L.: ‘Application of the KL procedure for the characterization of human faces’, IEEE Trans. Pattern Anal. Mach. Intell., 1990, 12, (1), pp. 103–108 [13]Lee, J.M.: ‘Riemannian manifolds: an introduction to curvature’ (Springer, Berlin, 1997) [14Scholkopf, B., Smola, A., Muller, K.R.: ‘Nonlinear component analysis as a kernel eigenvalue problem’, Neural Comput., 1998, 10, (5), pp. 1299–1319 [15]Mika, S., Ratsch, G., Weston, J., Scholkopf, B., Smola, A., Muller, K.-R.: ‘Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces’, IEEE Trans. Pattern Anal. Mach. Intell., 2003, 25, (5), pp. 623–628 [16]Tenenbaum, J.B., de Silva, V., Langford, J.C.: ‘A global geometric framework for nonlinear dimensionality reduction’, Science, 2000, 290, (5500), pp. 2319–2323 [1]Roweis, S.T., Saul, L.K.: ‘Nonlinear dimensionality reduction by locally linear embedding’, Science, 2000, 290, (5500), pp. 2323–2326 [17]Saul, L.K., Roweis, S.T.: ‘Think globally, fit locally: unsupervised learning of low dimensional manifolds’, J. Mach. Learn. Res., 2003, 12, (4), pp. 119–155 [18] Belkin, M., Niyogi, P.: ‘Laplacian eigenmaps and spectral techniques for embedding and clustering’. Proc. Conf. Advances in Neural Information Processing System, Vancouver, Canada, 3–6 December 2001, vol. 15 [19]Belkin,M., Niyogi, P.: ‘Laplacian eigenmaps for dimensionality reduction and data representation’, Neural Comput., 2003, 15, (6), pp. 1373–1396 [20]Zhang, Z ., Zha, H.: ‘Principal manifolds and nonlinear dimensionality reduction via tangent space alignment’, SIAM J. Sci. Comput., 2004, 26, (1), pp. 313–338 [21]He, X., Niyogi, P.: ‘Locality preserving projections’. Proc. 17th Annual Conf. 
Neural Information Processing Systems, Vancouver and Whistler, Canada, 8–13 December 2003 [22]He, X., Yan, S., Hu, Y., Niyogi, P., Zhang, H.-J.: ‘Face recognition using Laplacianfaces’, IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, (3), pp. 328–340 [23]Chen, H., Chang, H., Liu, T.: ‘Local discriminant embedding and its variants’ (CVPR, 2005) [24]Yan, S., Xu, D., Zhang, B., Zhang, H., Yang, Q., Lin, S.: ‘Graph embedding and extensions: a general


[29] Zadeh, L.A.: 'Fuzzy sets', Inf. Control, 1965, 8, pp. 338–353
[30] Bezdek, J.C., Keller, J., Krishnapuram, R.: 'Fuzzy models and algorithms for pattern recognition and image processing' (Kluwer Academic Publishers, Dordrecht, 1999)
[31] Kwak, K.-C., Pedrycz, W.: 'Face recognition using a fuzzy Fisher classifier', Pattern Recognit., 2005, 38, (10), pp. 1717–1732
