Supervised locality pursuit embedding for pattern classification


Image and Vision Computing 24 (2006) 819–826 www.elsevier.com/locate/imavis

Zhonglong Zheng a,b,*, Jie Yang b

a Institute of Information Science and Engineering, P.O. Box 143, Zhejiang Normal University, Jinhua 321004, Zhejiang, People's Republic of China
b Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China

Received 5 April 2005; received in revised form 17 November 2005; accepted 16 February 2006

* Corresponding author. Tel.: +86 579 2282155; fax: +86 579 2298229. E-mail addresses: [email protected], [email protected] (Z. Zheng).

Abstract

In pattern recognition research, dimensionality reduction techniques are widely used, since it may be difficult to recognize multidimensional data, especially when the number of samples in a data set is not large compared with the dimensionality of the data space. Locality pursuit embedding (LPE) is a recently proposed method for unsupervised linear dimensionality reduction. LPE seeks to preserve the local structure, which is usually more significant than the global structure preserved by principal component analysis (PCA) and linear discriminant analysis (LDA). In this paper, we investigate its extension, called supervised locality pursuit embedding (SLPE), which uses the class labels of data points to enhance the discriminant power of their mapping into a low dimensional space. We compare the proposed SLPE approach with the traditional LPE, PCA and LDA methods on real-world data sets including handwritten digits, a character data set and face images. Experimental results demonstrate that SLPE is superior to the other three methods in terms of recognition accuracy.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Dimensionality reduction; Principal component analysis; Linear discriminant analysis; Locality pursuit embedding; Supervised learning methods

1. Introduction

In pattern recognition tasks, the raw data acquired by sensors such as cameras or scanners often serve as the input to a recognition system, either as they are or after simple preprocessing. Although straightforward, such an approach has drawbacks: on the one hand, a large data dimensionality makes recognition difficult and time-consuming; on the other hand, the effect known as the curse of dimensionality unavoidably lowers the accuracy rate. Therefore, some kind of dimensionality reduction is needed to eliminate the unfavorable consequences of using multidimensional data for recognition. In many cases, the underlying structure of multidimensional data can be characterized by a small number of parameters, and reducing the dimensionality of such data is also important for visualizing its intrinsic structure.

In the past decades, many methods have been proposed for dimensionality reduction [1–7,15]. Two canonical forms are principal component analysis (PCA) and

0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2006.02.007

multidimensional scaling (MDS). Both are eigenvector methods aimed at modeling linear variability in the multidimensional space. PCA computes the linear projections of greatest variance from the top eigenvectors of the data covariance matrix, whereas MDS computes the low dimensional embedding that best preserves pairwise distances between data points. The results of MDS are equivalent to those of PCA when the similarity measure is the Euclidean distance. Both methods are simple to implement and not prone to local minima. However, for data lying on a nonlinear sub-manifold embedded in the feature space, the results given by PCA preserve only the global structure. In many cases the local structure matters more, especially when a nearest neighbor classifier is used. Locally linear embedding (LLE) [6,7] and the Laplacian Eigenmap [3] are recently proposed nonlinear local approaches for discovering the nonlinear structure of the manifold. The essence of these two methods is to map nearby points on the manifold to nearby points in a low dimensional space. Isomap [5] is a nonlinear global approach based on MDS that seeks to preserve the intrinsic geometry of the data. These nonlinear methods have achieved impressive results on benchmark artificial datasets as well as in some real applications [8,13]. Nevertheless, their nonlinearity makes them computationally expensive. In addition, the mappings they derive are defined only on the training set, and how to map a novel test point remains unclear.


Recently, an unsupervised linear dimensionality reduction method, locality pursuit embedding (LPE), was proposed and applied to real datasets [9–12,26]. LPE aims to preserve the local structure of multidimensional data rather than the global structure preserved by PCA. In addition, LPE shares some properties with LLE, such as the locality preserving character, although their objective functions are quite different. LPE is the optimal linear approximation to the eigenfunctions of the Laplace–Beltrami operator on the manifold [26].

In this paper, we describe a supervised variant of LPE, called the supervised locality pursuit embedding (SLPE) algorithm. Unlike LPE, SLPE projects high dimensional data into the embedded low dimensional space while taking class membership into account, which yields well-separated clusters in the embedded space. SLPE thus gains discriminant power from the class information while inheriting the properties of LPE, and it demonstrates strong recognition performance when applied to pattern recognition tasks.

The rest of the paper is organized as follows: Section 2 describes locality pursuit embedding versus PCA and LDA. The proposed supervised locality pursuit embedding is described in Section 3. In Section 4, we apply SLPE to several real datasets, including handwritten digits, a character dataset and face datasets, to compare its performance with PCA, LDA (linear discriminant analysis) and LPE. Finally, we provide some concluding remarks and suggestions for future work in Section 5.

2. LPE versus PCA and LDA

More formally, let us consider a set of M sample images taking values in an n-dimensional image space X = \{x_1, x_2, \dots, x_M\}, and assume that each image belongs to one of c classes \{C_1, C_2, \dots, C_c\}. PCA seeks a linear transformation mapping the original n-dimensional image space into an r-dimensional feature space, where r < n. The transformed feature vectors y_k \in R^r are defined as

y_k = W^T x_k, \quad k = 1, 2, \dots, M    (1)

The total scatter matrix S_T of the original sample images is defined as

S_T = \sum_{k=1}^{M} (x_k - m)(x_k - m)^T    (2)

where M is the number of sample images and m is the mean image of all samples. The scatter of the transformed feature vectors Y = \{y_1, y_2, \dots, y_M\} is then W^T S_T W. The optimal projection W_{PCA} of PCA is selected to maximize the determinant of the total scatter matrix of the transformed samples [14], i.e.

W_{PCA} = \arg\max_W |W^T S_T W|    (3)

While PCA searches for efficient directions of representation, linear discriminant analysis (LDA) seeks efficient directions of discrimination. The between-class scatter matrix and the within-class scatter matrix are defined, respectively, as

S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T    (4)

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^T    (5)

where N_i is the number of samples in class X_i and m_i is the mean image of class X_i. The optimal projection W_{LDA} of LDA is chosen as the matrix with orthonormal columns that maximizes the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples, i.e.

W_{LDA} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}    (6)
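As a concrete illustration of Eqs. (1)–(6), the following sketch computes W_PCA from the eigenvectors of S_T and W_LDA from the generalized eigenproblem S_B w = λ S_W w. It is written in Python with NumPy/SciPy; the function names, the small ridge term, and the samples-as-columns convention are illustrative choices, not part of the original paper.

```python
import numpy as np
from scipy.linalg import eigh

def pca_projection(X, r):
    """X: (n, M) data matrix with samples as columns. Returns the n x r PCA projection."""
    m = X.mean(axis=1, keepdims=True)
    St = (X - m) @ (X - m).T                      # total scatter, Eq. (2)
    vals, vecs = eigh(St)                         # eigenvalues in ascending order
    return vecs[:, ::-1][:, :r]                   # top-r eigenvectors maximize |W^T St W|, Eq. (3)

def lda_projection(X, labels, r):
    """Between/within-class scatter (Eqs. (4)-(5)) and the LDA projection of Eq. (6)."""
    labels = np.asarray(labels)
    n = X.shape[0]
    m = X.mean(axis=1, keepdims=True)
    Sb, Sw = np.zeros((n, n)), np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T
        Sw += (Xc - mc) @ (Xc - mc).T
    # generalized eigenproblem Sb w = lambda Sw w; the top-r eigenvectors maximize Eq. (6).
    # A small ridge keeps Sw positive definite when it is (near-)singular.
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(n))
    return vecs[:, ::-1][:, :r]
```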

We can conclude that both PCA and LDA aim to preserve the global structure. In many real cases, however, the local structure is more important. Locality pursuit embedding (LPE) is a linear approximation of the nonlinear Laplacian Eigenmap for learning a locality preserving subspace [26]. LPE aims to preserve the intrinsic geometry and local structure of the data. The objective function of LPE is defined as

\min \sum_{ij} \|y_i - y_j\|^2 S_{ij}    (7)

where S is a symmetric similarity matrix. A possible way of defining S is [9]

S_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / t), & \|x_i - x_j\|^2 < \varepsilon \\ 0, & \text{otherwise} \end{cases}    (8)

or

S_{ij} = \begin{cases} \exp(-\|x_i - x_j\|^2 / t), & \text{if } x_i \text{ is among the } k \text{ nearest neighbors of } x_j \text{ or } x_j \text{ is among the } k \text{ nearest neighbors of } x_i \\ 0, & \text{otherwise} \end{cases}    (9)

where \varepsilon > 0 defines the radius of the local neighborhood. The selection of \varepsilon is similar to that in the LLE algorithm [6]. The constraint imposed in [9] is y^T D y = 1. Finally, the minimization problem reduces to the following form:

\arg\min_W W^T X L X^T W \quad \text{subject to} \quad W^T X D X^T W = 1    (10)

where D is a diagonal matrix with D_{ii} = \sum_j S_{ji}, and L = D - S is the Laplacian matrix [3]. The transformation vector W_{LPE} is then given by the minimum-eigenvalue solution to the generalized eigenvalue problem

X L X^T W = \lambda X D X^T W    (11)

For more detailed information about LPE, see [9–11,26].
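A minimal sketch of the LPE recipe in Eqs. (7)–(11), assuming the k-nearest-neighbor heat-kernel weighting of Eq. (9); again Python/NumPy/SciPy with illustrative names and a small ridge term that are not part of the original paper. The generalized symmetric eigensolver directly encodes the D-weighted constraint of Eq. (10), and the smallest eigenvectors give the embedding.

```python
import numpy as np
from scipy.linalg import eigh

def lpe_projection(X, r, k=5, t=1.0):
    """X: (n, M) data matrix with samples as columns. Returns the n x r LPE projection."""
    n, M = X.shape
    # pairwise squared Euclidean distances between samples
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    # kNN graph with heat-kernel weights, Eq. (9), then symmetrized
    S = np.zeros((M, M))
    for i in range(M):
        nbrs = np.argsort(sq[i])[1:k + 1]          # skip the point itself
        S[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    S = np.maximum(S, S.T)
    D = np.diag(S.sum(axis=1))
    L = D - S                                      # graph Laplacian
    # generalized eigenproblem of Eq. (11); smallest eigenvalues give the locality preserving directions
    vals, vecs = eigh(X @ L @ X.T, X @ D @ X.T + 1e-6 * np.eye(n))
    return vecs[:, :r]
```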


3. Supervised locality pursuit embedding

From the above analysis, both PCA and LPE are unsupervised learning methods; they do not take class membership into account. Roughly speaking, one of the differences between them lies in the global versus local preserving property, and it is the locality preserving property that leads LPE to outperform PCA in [9–12,26]. While PCA and LDA are both global methods, LDA utilizes class information to enhance its discriminant ability, which is why LDA outperforms PCA in [14,16–19,27,28]. To some extent, we can say that both the locality preserving property and discriminant ability are significant in learning a new feature subspace. Our motivation is therefore to combine the locality preserving property with discriminant information to enhance the performance of LPE in pattern analysis.

Being unsupervised, the original LPE does not make use of the class membership of the points to be projected. To complement the original LPE, a supervised variant, called SLPE, is proposed in this paper. The name implies that membership information influences which points are included in the neighborhood of each point; that is, SLPE employs prior information about the original dataset when learning the feature subspace [29].

The essence of the SLPE algorithm is how to choose the similarity matrix S in Eq. (7). In LPE, S depends only on the ε neighborhood or the k nearest neighbors. In other words, the selection of S is independent of class information: the samples in an ε neighborhood may belong to the same class, but this is not guaranteed. Due to the locality preserving property of LPE, points lying in the same ε neighborhood will be close to each other in the reduced space even if they belong to different classes. This results in an unfavorable situation in pattern analysis, especially in classification problems.

Let us rearrange the order of the samples in the original dataset, which does not affect the procedure of the algorithm. Suppose that the first M_1 columns of X are occupied by the data of the first class, the next M_2 columns by the second class, and so on, i.e. the data of each class are stored contiguously in X. This step simplifies the explanation of the SLPE algorithm. As a consequence, X becomes an ordered matrix composed of submatrices A_i of size n × M_i, i = 1, 2, \dots, c, where c is the number of classes. The nearest neighbors for each x_j ∈ A_1 are then sought in A_1 only. Applied to all x_j ∈ A_1, this procedure leads to the construction of a matrix B_1, and B_2, \dots, B_c are generated in the same manner. Once B_1, \dots, B_c are obtained, the similarity matrix S is constructed by taking the B_i as its diagonal blocks, i.e.

S = \begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_c \end{bmatrix}    (12)


To simplify the similarity computation between different points, we simply set the weight to 1 if the two points belong to the same class. Therefore, B_i takes the following form:

B_i = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix}_{M_i \times M_i}    (13)
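A minimal sketch of Eqs. (12)–(13), assuming the all-ones within-class weighting described above; written in Python/NumPy with illustrative names that are not from the paper.

```python
import numpy as np

def slpe_similarity(labels):
    """Block-diagonal similarity matrix of Eqs. (12)-(13).

    labels: length-M array of class labels, assumed already sorted so that
    samples of the same class occupy consecutive columns of X.
    Returns an (M, M) matrix with all-ones blocks B_i on the diagonal.
    """
    labels = np.asarray(labels)
    M = labels.shape[0]
    S = np.zeros((M, M))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        S[np.ix_(idx, idx)] = 1.0                 # weight 1 for same-class pairs
    return S

# usage with labels already ordered class by class:
# S = slpe_similarity([0, 0, 0, 1, 1, 2, 2, 2])
```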

We can see that S is a symmetric matrix, since each diagonal block B_i is symmetric. The algorithmic procedure of SLPE is then stated as:

(1) Order rearrangement. As described above, the samples of each class are stored contiguously in the sample matrix X after rearrangement.
(2) PCA projection. To avoid singularity of XDX^T and to reduce noise, we project X onto its PCA subspace [12]. X is still used to denote the samples in the PCA subspace, and the transformation matrix is denoted by W_{PCA}.
(3) Computing the similarity matrix S. Because the samples have been reordered, the blocks B_i in Eq. (13) are easily computed, and S is constructed by taking the B_i as its diagonal blocks.
(4) Eigenmap. Solve the generalized eigenvector problem

X L X^T W = \lambda X D X^T W    (14)

The final transformation matrix from the original sample space to the embedded feature space is then

W_{SLPE} = W_{PCA} W    (15)

where W is the solution of Eq. (14).
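The four steps can be sketched end to end as follows, reusing the pca_projection and slpe_similarity helpers from the earlier sketches. This is a minimal illustration under the same assumptions (samples as columns, a small ridge for numerical stability), not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def slpe_fit(X, labels, r_pca, r_out):
    """Sketch of SLPE: reorder, PCA, block-diagonal S, eigenmap.

    X: (n, M) matrix with training samples as columns; labels: length-M class labels.
    Returns the combined transformation W_SLPE = W_PCA @ W of Eq. (15).
    """
    labels = np.asarray(labels)
    # (1) order rearrangement: group columns class by class
    order = np.argsort(labels, kind="stable")
    X, labels = X[:, order], labels[order]

    # (2) PCA projection to avoid singularity of X D X^T
    W_pca = pca_projection(X, r_pca)
    Xp = W_pca.T @ X

    # (3) supervised block-diagonal similarity matrix, Eqs. (12)-(13)
    S = slpe_similarity(labels)
    D = np.diag(S.sum(axis=1))
    L = D - S

    # (4) eigenmap: smallest-eigenvalue solutions of Eq. (14)
    A = Xp @ L @ Xp.T
    B = Xp @ D @ Xp.T + 1e-8 * np.eye(r_pca)
    vals, vecs = eigh(A, B)
    W = vecs[:, :r_out]

    return W_pca @ W                              # Eq. (15)
```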

At first glance, it is hard to evaluate the influence of the modified similarity matrix S on the final feature space. Although the change is small, its influence is significant, because S is used to compute the matrix W that determines the embedded coordinates. As a result, the reduced feature spaces obtained with unsupervised and supervised LPE are different. To visualize the differences, SLPE and LPE were applied to a small database from Simon Lucas, together with PCA and LDA. This database is composed of the digits '0'–'9' and the capital letters 'A'–'Z', with images of size 20 × 16. We took only 'A'–'Z' from the database and randomly selected 13 samples for each class. Using the PCA, LDA, LPE and SLPE algorithms, we projected the samples into the two-dimensional embedded space. The visualization results are shown in Fig. 1. For convenience, the legend entries for 'V'–'Z' are absent from the right column of each figure, although the corresponding samples are plotted with different colors or symbols. When projected to two dimensions with the proposed SLPE algorithm, the samples belonging to the same class form a nearly compact set, which is an encouraging result. The PCA, LDA and LPE projections do not show such behavior: many samples are mixed up in the two-dimensional space.


Fig. 1. Part of samples in the data set (a) and two-dimensional projections achieved by PCA (b), LDA (c), LPE (d), and SLPE (e).

4. Experimental results

The results shown in Fig. 1 in Section 3 indicate that SLPE can have more discriminant power than PCA, LDA and LPE. In this section, several experiments are carried out on different datasets to evaluate the accuracy of the proposed SLPE for pattern classification.

4.1. Handwritten digits dataset

The dataset used in this part is a subset of the United States Postal Service (USPS) digit set; a set of representative samples is shown in Fig. 2. Each sample is a 16 × 16 image with each pixel represented by a gray level in the range [−1, 1]. The USPS dataset is divided into training and testing sets at source; however, we reduced the training set so that there is no bias toward any particular digit. In the experiment, we randomly selected 200 samples for each class in the training set; these 2000 images are used to learn the linear embedding space. In addition, we randomly selected 100 samples for each class in the testing set; these 1000 images were used to verify the effectiveness of the reduced feature subspace. The random selection for training and testing is repeated 10 times, and the average result is reported. It should be noted that Eq. (13) is used to construct the similarity matrix S instead of Eq. (8) or (9).

Fig. 2. Part of samples of USPS.

Table 1
Recognition comparison on USPS

Algorithm   Recognition rate (%)   Dimension
PCA         85.1                   27
LDA         86.5                    9
LPE         86.6                   24
SLPE        92.1                   10


When the testing images were projected to the feature subspace, the new samples were identified by a nearest neighbor classifier with the Euclidean distance measure, chosen for its simplicity. The recognition rates are listed in Table 1. The SLPE method significantly outperforms the other three methods. The recognition rates reach their best values at 27, 9, 24 and 10 dimensions for PCA, LDA, LPE and SLPE, respectively. Fig. 3 plots the recognition rates versus the reduced dimensionality.

Fig. 3. Recognition rates versus dimensionality reduction on USPS.
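The same evaluation protocol (project with the learned transformation, then classify with a Euclidean 1-nearest-neighbor rule) is reused in the following experiments. A minimal sketch, assuming the slpe_fit helper from the earlier sketch and illustrative variable names:

```python
import numpy as np

def nn_classify(W, X_train, y_train, X_test):
    """1-NN classification in the embedded space with Euclidean distance.

    W: (n, r) learned transformation; X_*: (n, M_*) matrices with samples as columns.
    """
    y_train = np.asarray(y_train)
    Z_train = W.T @ X_train                       # project training samples
    Z_test = W.T @ X_test                         # project test samples
    d = ((Z_test[:, :, None] - Z_train[:, None, :]) ** 2).sum(axis=0)
    return y_train[np.argmin(d, axis=1)]          # label of the nearest training sample

# sketch of one random split, with X_* holding vectorized images as columns:
# W = slpe_fit(X_train, y_train, r_pca=50, r_out=10)
# acc = np.mean(nn_classify(W, X_train, y_train, X_test) == y_test)
```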

4.2. ORL face dataset

The ORL face data set consists of 400 facial images of 40 people. In the following experiments each image is resized to 56 × 46, with each pixel represented by a gray level in the range [0, 255]. There are 10 images for each person, with variations in pose and expression and slight variations in illumination. Part of the images in ORL are shown in Fig. 4.

Fig. 4. Part of face images of ORL data set.

In terms of theoretical analysis, the different methods (PCA, LDA, LPE and SLPE) result in different embedded feature spaces. The basis of each feature space consists of the eigenvectors of the corresponding method, and these eigenvectors can be displayed as images. Because these images look like human faces, they are often called Eigenfaces, Fisherfaces and Laplacianfaces [10,14,20]; the eigenvectors of our SLPE may accordingly be called S-Laplacianfaces. Using the ORL face data set as the training set, we present the first 10 S-Laplacianfaces in Fig. 5, together with the other three kinds of 'faces'.

Fig. 5. Different 'faces' obtained by PCA, LDA, LPE, and SLPE, respectively.

We randomly select five images for each subject, i.e. 200 images for training, and use the remaining 200 images for testing. The random selection for training and testing is repeated 10 times, and the average result is reported. For each image in the training set, all columns are concatenated into a single column vector, with no other preprocessing; that is, each image represents a point in a (56 × 46)-dimensional space. Eq. (13) is again used to construct the similarity matrix S. The nearest neighbor classifier was adopted to perform the recognition task for its simplicity, although other classifiers such as neural networks [23], Bayesian classifiers [21] and support vector machines [22] could be used. The recognition rates approach their best values at 70, 39, 66 and 44 dimensions for PCA, LDA, LPE and SLPE, respectively. The best recognition rates are shown in Table 2 together with the corresponding dimensions. Fig. 6 illustrates the recognition rates versus the reduced dimensionality.

Table 2
Recognition comparison on ORL

Algorithm   Recognition rate (%)   Dimension
PCA         81.5                   70
LDA         81.5                   39
LPE         83.5                   70
SLPE        87.0                   44

Fig. 6. Recognition rates versus dimensionality reduction on ORL.

The ORL face dataset is a relatively easy setting for face recognition: there is no occlusion, only slight illumination variation and no exaggerated facial expression. As a result, all four methods achieve fairly good recognition accuracy. Nevertheless, SLPE still outperforms the other three methods by at least 4%. The following experiment was carried out on a more complex face data set to test their performance.

4.3. AR face dataset

The AR face data set [24] consists of 26 frontal images per subject, with different facial expressions, illumination conditions, and occlusions (sunglasses and scarf), for 126 subjects (70 men and 56 women). The images were recorded in two sessions separated by two weeks, with thirteen images taken at the CVC under strictly controlled conditions in each session. The images from the two sessions of one subject are shown in Fig. 7. The images in the AR data set are 768 × 576 pixels, with each pixel represented by 24 bits of RGB color values.

Fig. 7. Part of face images of AR data set.

We select 100 individuals (60 men and 40 women) for our experiment. Ten images from each session were chosen for every subject, seven non-occluded and three occluded (two with sunglasses and one with a scarf), giving 20 images per subject and 2000 images in total. These images were converted to grayscale and normalized, and the facial area was cropped and resized to 45 × 32. The training set consists of 10 images randomly selected from each person, and the remaining images form the testing set. The random selection for training and testing is repeated 10 times, and the average result is reported. The S-Laplacianfaces derived from the training set are shown in Fig. 8, together with the Eigenfaces, Fisherfaces and Laplacianfaces. The nearest neighbor classifier with Euclidean distance is again employed for classification. The recognition rates reach their best values at 125, 99, 124 and 105 dimensions for PCA, LDA, LPE and SLPE, respectively. Table 3 lists the best results and the corresponding dimensions, and Fig. 9 shows the recognition rates versus the reduced dimensionality.

Fig. 8. Different 'faces' obtained by PCA, LDA, LPE, and SLPE, respectively.

Table 3
Recognition comparison on AR

Algorithm   Recognition rate (%)   Dimension
PCA         48.10                  125
LDA         50.10                   99
LPE         50.00                  124
SLPE        60.50                  105

Fig. 9. Recognition rates versus dimensionality reduction on AR data set.

The SLPE method still achieves the best recognition result compared with PCA, LDA and LPE, although the recognition accuracy obtained by all of these methods is relatively low. One possible reason is that there are occluded images in both the training and testing sets; these images, together with the exaggerated expressions in the non-occluded images, severely deteriorate the discriminant power of these methods. The main point we wish to demonstrate, however, is that SLPE is more effective than the other three methods, and the experiment supports its good performance.

5. Discussion and future work

A general framework for supervised LPE was proposed in this paper. Experiments on a number of data sets demonstrated that SLPE is a powerful feature extraction method which, when coupled with simple classifiers, can yield very promising recognition results. SLPE takes class membership information into account while retaining the locality preserving property of LPE. Since both local structure and discriminant information are important for classification, SLPE outperforms the traditional LPE as well as PCA and LDA, which tend to preserve the global structure even though LDA involves class information.

It is worth mentioning that SLPE benefits from having more training samples, which is important for learning an efficient and effective embedding space that represents the nonlinear structure of the original patterns. Sometimes only one training sample is available per class; how to overcome this problem is one of our future tasks. Moreover, it is not always possible to obtain all the class information in pattern analysis: sometimes we only have unlabeled or partially labeled samples. Another extension of our work is therefore to consider how to use unlabeled samples to discover the manifold structure and, hence, improve the classification performance [10,25]. We are now actively exploring these problems.


Acknowledgements

The authors wish to acknowledge that this work is supported by the Fundamental Project of Shanghai under grant number 03DZ14015.

References

[1] I.T. Jolliffe, Principal Component Analysis, Springer, New York, 1986.
[2] H. Klock, J. Buhmann, Data visualization by multidimensional scaling: a deterministic annealing approach, Pattern Recognit. 33 (4) (1999) 651–669.
[3] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, Proceedings of the Conference on Advances in Neural Information Processing Systems, vol. 15, 2001.
[4] D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
[5] M.S. Bartlett, H.M. Lades, T.J. Sejnowski, Independent component representations for face recognition, Proceedings of the SPIE 3299 (1998) 528–539.
[6] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
[7] J.B. Tenenbaum, A global geometric framework for nonlinear dimensionality reduction, Science 290 (5500) (2000) 2319–2323.
[8] Zhonglong Zheng, Jie Yang, Extended LLE with Gabor wavelet for face recognition, The 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, 6–10 December 2004.
[9] X. He, P. Niyogi, Locality preserving projections, Proceedings of the Conference on Advances in Neural Information Processing Systems, 2003.
[10] X. He, Shuicheng Yan, et al., Face recognition using Laplacianfaces, IEEE Trans. PAMI 27 (3) (2005) 328–340.
[11] Xin Zheng, Deng Cai, Xiaofei He, Wei-Ying Ma, Xueyin Lin, Locality preserving clustering for image database, ACM Conference on Multimedia, New York City, October 10–16, 2004.
[12] Xiaofei He, Shuicheng Yan, Yuxiao Hu, Hong-Jiang Zhang, Learning a locality preserving subspace for visual recognition, IEEE International Conference on Computer Vision, 2003.
[13] D. de Ridder, R.P.W. Duin, Locally linear embedding for classification, Technical Report PH-2002-01, Pattern Recognition Group, Department of Imaging Science & Technology, Delft University of Technology, Delft, The Netherlands, 2002.
[14] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. PAMI 19 (7) (1997) 711–721.
[15] W. Zhao, R. Chellappa, N. Nandhakumar, Empirical performance analysis of linear discriminant classifiers, Proc. Comput. Vis. Pattern Recognit. (1998) 164–169.
[16] D.L. Swets, J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Trans. PAMI 18 (8) (1996) 831–836.
[17] H. Cevikalp, M. Neamtu, Discriminant common vectors for face recognition, IEEE Trans. PAMI 27 (1) (2005) 4–13.
[18] T.K. Kim, J. Kittler, Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image, IEEE Trans. PAMI 27 (3) (2005) 318–327.
[19] Jian Yang, Alejandro F. Frangi, Jing-yu Yang, KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition, IEEE Trans. PAMI 27 (2) (2005) 230–245.
[20] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3 (1991) 71–86.
[21] Ira Cohen, Nicu Sebe, Fabio G. Cozman, Marcelo C. Cirelo, Thomas S. Huang, Learning Bayesian network classifiers for facial expression recognition with both labeled and unlabeled data, IEEE Conf. Comput. Vis. Pattern Recognit. (2003).
[22] Arnulf B.A. Graf, Alexander J. Smola, Silvio Borer, Classification in a normalized feature space using support vector machines, IEEE Trans. PAMI 14 (3) (2003) 597–605.
[23] Rui-Ping Li, Masao Mukaidono, I. Burhan Turksen, A fuzzy neural network for pattern classification and feature selection, Fuzzy Sets Syst. 130 (2002) 101–108.
[24] A.M. Martinez, R. Benavente, The AR face database, CVC Technical Report #24, 1998.
[25] I. Cohen, F.G. Cozman, N. Sebe, M.C. Cirelo, T.S. Huang, Semisupervised learning of classifiers: theory, algorithms, and their application to human–computer interaction, IEEE Trans. PAMI 26 (12) (2004) 1553–1567.
[26] Wanli Min, Ke Lu, Xiaofei He, Locality pursuit embedding, Pattern Recognit. 37 (4) (2004) 781–788.
[27] W. Liu, Y. Wang, S.Z. Li, T. Tan, Null space-based kernel Fisher discriminant analysis for face recognition, IEEE Proc. Int. Conf. Autom. Face Gesture Recognit. (2004).
[28] Q.S. Liu, H.Q. Lu, S.D. Ma, Improving kernel Fisher discriminant analysis for face recognition, IEEE Trans. Circuits Syst. Video Technol. 14 (1) (2004) 42–49.
[29] D. de Ridder, et al., Supervised locally linear embedding, Artificial Neural Networks and Neural Information Processing, ICANN/ICONIP 2003 Proceedings, Lecture Notes in Computer Science 2714, Springer, pp. 333–341.