
On some variants of locality preserving projection

Gitam Shikkenawis, Suman K. Mitra
Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India

Neurocomputing 173 (2016) 196–211. http://dx.doi.org/10.1016/j.neucom.2015.01.100

Corresponding author: S.K. Mitra, DA-IICT Post Bag 4, Near Indroda Circle, Gandhinagar 382007, Gujarat, India. Tel.: +91 79 30510584; fax: +91 79 30520010. E-mail addresses: [email protected] (G. Shikkenawis), [email protected] (S.K. Mitra). URL: http://intranet.daiict.ac.in/~suman_mitra/newSite/ (S.K. Mitra).

Article history: Received 1 July 2014; received in revised form 10 November 2014; accepted 15 January 2015; available online 6 September 2015.

Abstract

High dimensional data is hard to interpret and work with in its raw form; hence dimensionality reduction is applied beforehand to discover the underlying low dimensional manifold. Locality Preserving Projection (LPP) was introduced using the concept that neighboring data points in the high dimensional space should remain neighbors in the low dimensional space as well. In a typical pattern recognition problem, true neighbors are defined as the patterns belonging to the same class. Ambiguities in regions having data points from different classes close by, low reducibility capacity and data dependent parameters are some of the issues with conventional LPP. In this article, some variants of LPP are introduced that try to resolve these problems. A weighing function that tunes the parameters depending on the data and takes care of the other issues is used in the Extended version of LPP (ELPP). Better class discrimination is obtained using the concept of intra- and inter-class distance in a supervised variant (ESLPP-MD). To capture the non-linearity of the data, Kernel based variants are used, which first map the data to a feature space. Data representation, clustering, face and facial expression recognition performances are reported on a large set of databases.

Keywords: Locality preserving projection; Subspace analysis; Dimensionality reduction; Data discrimination

1. Introduction

The number of pixels used to represent an image constitutes the dimension of the image space, in which each image is considered a data point. Out of the many possible combinations in this space, only a few turn out to be meaningful images. Generally, the distribution of such images, i.e. data points, in the original space is not uniform, and they may seek representation in a lower dimensional subspace. It is difficult to deal with very high dimensional data, be it in a classification, recognition or analysis task; hence, image representation in a subspace is advantageous. It reduces the storage space as well as the computational complexity. One simple experiment can easily convince us that image data resides compactly in the image space. The experiment exploits the redundancy present in images, as discussed in [26]. The plots of the log probabilities of horizontal and vertical derivatives of various kinds of images are shown in Fig. 1. It can be observed that the peak occurs where the derivatives are nearly zero, which signifies high correlation among neighboring pixels. Hence, it can be inferred that the data, though visually varying, actually resides in a very compact lower dimensional subspace of the high dimensional image space.


To represent the images in a subspace of much lower dimension, linear and non-linear dimensionality reduction techniques are present in the literature. These techniques try to represent the original image much more compactly while keeping the information content intact. Principal Component Analysis (PCA) [20,31] is one of the most popular dimensionality reduction methods: this linear approach finds the subspace from the data covariance information and preserves the directions of maximum variance. Linear Discriminant Analysis (LDA) [3], a supervised linear dimensionality reduction approach, maximizes inter-class variability while minimizing intra-class variability in order to obtain better separation between classes. Another popular linear approach is Independent Component Analysis (ICA) [18], which aims at making the components as independent as possible. Data does not always lie on a linear manifold; often the manifold on which the data lies is non-linear. In such cases, linear dimensionality reduction methods fail to discover the non-linearity present in the data. Isomap [30] is a non-linear dimensionality reduction technique that preserves the intrinsic geometry of the data using geodesic distances between the data points. Like Isomap, Locally Linear Embedding (LLE) [25] finds the non-linear manifold by stitching together small linear neighborhoods; LLE finds a set of weights that perform local linear interpolations to closely approximate the data.


Fig. 1. Log probabilities of horizontal (column 2) and vertical (column 3) derivatives of different kinds of images.

However, in these approaches it is not clear how to project a new data point into the lower dimensional projection space. Neighborhood Preserving Embedding (NPE) [15] is a linear approximation of LLE that aims to discover the local structure of the data manifold. Locality Preserving Projection (LPP) [16,14], a linear dimensionality reduction approach, tries to capture the non-linearity present in the data using neighborhood information. Non-orthonormal basis vectors are obtained using the notion of the Laplacian of a graph constructed by considering the data points, i.e. images, as nodes. The weight of each edge in the graph is assigned using the Euclidean distance between the data points in the original space. The orthogonal version of LPP, Orthogonal Locality Preserving Projection (OLPP) [7], produces orthogonal basis functions with the aim of having more locality preserving power. Conventional LPP is sensitive to noise and outliers and depends highly on the parameters used for constructing the neighborhood graph and the weight matrix. A few extensions [11,12,32,28,22,6] have been proposed to overcome these issues. Robust path based similarity is used in Enhanced Locality Preserving Projection [12] to obtain robustness against noise and outliers. A parameter free LPP [11] has also been developed using Pearson correlation and adaptive neighborhood information. In LPP, ambiguities can arise in regions containing data points from two or more classes, as it considers only

a few nearest neighbors in order to preserve the local structure [29]. Supervised LPP [34] uses class labels to obtain better separation between classes: only data points having the same class label are considered neighbors, thereby resolving the ambiguous situations. To find the non-linear manifold of the data more precisely, various kernels are used to map the data non-linearly into a feature space before applying conventional dimensionality reduction approaches, as in Kernel PCA [1] and Kernel Discriminant Analysis [2]. LPP preserves the local structure of the data well; still, to find the non-linear manifold of the data in a much better way, Supervised Kernel LPP [19] uses class label information as well as a non-linear kernel mapping for better data discrimination.

This paper covers locality preserving projection (LPP) and some of its variants. Regions having data points from two or more different classes nearby may have ambiguous mappings in the LPP projection space. Moreover, the values of the parameters of LPP play an important role in finding the data embedding and are data dependent. An extension of LPP (ELPP) taking care of these issues is explained in detail: a z-shaped weighing function that automatically tunes the parameters depending on the data is used. In face and facial expression recognition tasks, the class labels of the training


face images are often already available. In order to obtain better data discrimination, an extended supervised variant of LPP with modified distance (ESLPP-MD) is suggested. The approach shrinks or expands the distances between data points depending on the class information, and tends to improve class separability in the projection space. Though LPP and its variants try to discover the non-linear structure of the data, Kernel based variants are suggested to capture the non-linearity in a better way: after mapping the data into the kernel space, the embedding into the subspace is carried out using the proposed variants of LPP. Face recognition performances of the conventional and proposed approaches are compared on some widely used benchmark face databases.

The article is organized as follows: Locality Preserving Projection (LPP) and some observations based on it are explained in detail in Section 2. Section 3 describes the variants of LPP, namely Extended Locality Preserving Projection (ELPP), Supervised Locality Preserving Projection (SLPP), Extended Supervised Locality Preserving Projection with Modified Distance (ESLPP-MD) and kernel based variants of LPP. Projection as well as face and facial expression recognition results on various databases are included in Section 4, followed by concluding remarks.

2. Locality preserving projection (LPP)

The non-linear dimensionality reduction methods [30,25] do yield impressive results on some benchmark artificial data-sets as well as on real world data-sets. However, their non-linear nature makes them computationally expensive. Moreover, they yield maps that are defined only on the training data points, and how to evaluate the map on novel test points remains unclear [16]. In many real world problems, the local manifold structure is more important than the global Euclidean structure [17]. Locality Preserving Projection (LPP) [16] is a linear approach for dimensionality reduction that tries to capture the non-linear manifold structure of the data. It finds an embedding that preserves local information and obtains a subspace that best detects the essential data manifold structure. In LPP, neighborhood information is stored in a graph and the basis vectors are found using the notion of the graph Laplacian. A weighing function is used to assign weights to the edges of the graph; this function incurs a heavy penalty if neighboring data points are mapped far apart, thereby giving more emphasis to the nearest neighbors. LPP is obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold [16]. It aims to preserve the neighborhood information, with objective function

$\min \sum_{ij} (y_i - y_j)^2 S_{ij}$    (1)

where S is the similarity matrix: a symmetric matrix representing the weights of the edges of the adjacency graph. The data points are the nodes of the graph, while the existence of an edge depends on whether two nodes are considered neighbors. The procedure for calculating $S_{ij}$, as given in He and Niyogi [16], consists of two steps, after which the transformation matrix W is found.

Constructing the adjacency graph: Let G be the graph having the images in the training data-set as its nodes. An edge is present between nodes i and j if $x_i$ and $x_j$ are neighbors, i.e. close to each other. The closeness can be determined in two different ways:

- ϵ-neighborhood: Nodes i and j are connected by an edge if $\|x_i - x_j\|^2 < \epsilon$, $\epsilon \in \mathbb{R}$. Here the norm is the usual Euclidean norm in $\mathbb{R}^n$.

- k-nearest neighbors: Nodes i and j are connected by an edge if i is among the k nearest neighbors of j or vice versa, $k \in \mathbb{N}$.

Once the adjacency graph is obtained, LPP tries to optimally preserve the local structure defined by it. For constructing the adjacency graph, k-NN is widely preferred over the ϵ-neighborhood method because of its simplicity and ease of implementation, though choosing the value of k properly is difficult because of the non-linearity and variety of high dimensional data.

Estimation of weights: S is a sparse symmetric m × m matrix, with $S_{ij}$ holding the weight of the edge connecting vertices i and j, and 0 if no such edge exists. Again, there are two variations for weighing the edges:

- No parameter: $S_{ij} = 1$ if nodes i and j are connected by an edge.
- Heat kernel: $S_{ij} = e^{-\|x_i - x_j\|^2/t}$ if nodes i and j are connected by an edge, $t \in \mathbb{R}$.

The objective function with this choice of symmetric weights incurs a heavy penalty if neighboring points $x_i$ and $x_j$ are mapped far apart. This is an attempt to ensure that if $x_i$ and $x_j$ are close then their mappings in the projection space are close as well.

Transformation matrix computation: In order to compute the transformation matrix, the objective function can be reduced to matrix form, the derivation of which can be found in [17]:

$\frac{1}{2} \sum_{ij} (y_i - y_j)^2 S_{ij} = W^T X L X^T W$    (2)

where W is the transformation matrix, $L = D - S$ is the Laplacian matrix [9] and $D_{ii} = \sum_j S_{ij}$ is a diagonal matrix which provides a natural measure on the data points. Hence, a constraint $W^T X D X^T W = 1$ is imposed on the objective function, which incorporates normalization on the data points using the volume of the graph G [27,4]. The transformation matrix W that minimizes the objective function under this constraint is given by the solution of the generalized eigenvalue problem [17]:

$X L X^T W = \lambda X D X^T W$    (3)
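To make the pipeline concrete, a minimal NumPy/SciPy sketch is given below. It follows the steps above (k-NN adjacency graph, heat-kernel weights, generalized eigenproblem of Eq. (3)); the function name lpp and its parameters are illustrative, not code from the paper:

    import numpy as np
    from scipy.linalg import eigh

    def lpp(X, k=5, t=1.0, dim=2):
        # X holds the m training samples as columns (n x m)
        m = X.shape[1]
        # pairwise squared Euclidean distances between the samples
        sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
        # symmetric k-nearest-neighbour adjacency graph
        idx = np.argsort(sq, axis=1)[:, 1:k + 1]
        A = np.zeros((m, m), dtype=bool)
        A[np.repeat(np.arange(m), k), idx.ravel()] = True
        A = A | A.T
        # heat-kernel weights, kept only on the graph edges
        S = np.where(A, np.exp(-sq / t), 0.0)
        D = np.diag(S.sum(axis=1))
        L = D - S
        # generalized eigenproblem of Eq. (3); the eigenvectors of the
        # smallest eigenvalues span the locality preserving subspace
        _, W = eigh(X @ L @ X.T, X @ D @ X.T)
        return W[:, :dim]

A test sample x is then embedded as $W^T x$. In practice, a PCA pre-projection is usually applied first so that $X D X^T$ stays non-singular when the image dimension exceeds the number of samples.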

This transformation matrix is then used to project the high dimensional data into the lower dimensional subspace.

2.1. Observations from LPP

The manifold obtained using LPP depends highly on the construction of the similarity matrix, and in conventional LPP setting the values of the parameters plays a vital role in obtaining it. The first thing to be fixed is the neighborhood selection approach, as the graph structure is built from this information. Another parameter is the width of the Gaussian kernel, controlled by t in the heat kernel, which weighs the edges of the graph constructed in the previous step. Generally, the mean of the pairwise distances is used as t, but this value does not necessarily find the optimal underlying manifold for all data-sets.

The local structure of the data is well preserved by LPP, but it pays little or no attention to the overlapping regions of two or more classes. It often happens that the nearest neighbor of a data point belongs to another class; in such cases, though the points belong to different classes, they may be connected because of their closeness. One such example is shown in Fig. 2(a): two classes are denoted by A and B, and a region is shown where the boundaries of the two classes intersect. In this region, data points from both classes are neighbors.


Fig. 2. (a) An example where data points from class A and B are mapped close by. (b) Energy curve for PCA (solid) and LPP (dotted).

In LPP, the similarity function $e^{-\|x_i - x_j\|^2/t}$ is applied to a few nearest neighbors of the data point of interest. Considering only the nearest neighbors may lead to wrong classification in the region of overlap. Another observation concerns the energy preservation capacity of conventional LPP: more energy preservation ensures better reducibility capacity. Fig. 2(b) shows energy curves of PCA and LPP for one experiment. Experiments over a range of data indicate that considering only half of the dimensions in the PCA space preserves more than 90% of the energy, whereas the same in the LPP space comes out to be only about 60%. Though the figures vary with the data, in general the energy preservation capacity of LPP remains much less than that of PCA; hence, the reducibility capacity of LPP needs to be increased. With all of the above in mind, an extension of LPP is proposed.

3. Variants of locality preserving projection

3.1. Extended Locality Preserving Projection (ELPP)

This extension of LPP, Extended Locality Preserving Projection (ELPP) [29], aims at improving the reducibility capacity as well as resolving ambiguities in the overlapping regions. A weighing scheme is used that extends the neighborhood of the data and considers, along with the nearest neighbors, data points that are at a moderate distance from the point of interest. The weighing function automatically discards data points that are far apart, as it works on the distance information. In order to exploit the natural grouping of the data, the K-means algorithm is used to decide the neighbors; the value of K is assumed to be available as prior information. For constructing the adjacency graph, the decision whether to consider two data points as neighbors is taken by applying the K-means classifier, which helps exploit the natural grouping of the data-set. Conventional LPP uses the k-nearest neighbor or ϵ-neighborhood approach, where k and ϵ remain constant for all the data points and are hard to fix. Here, the training set first undergoes K-means clustering, and two points are connected by an edge if they are assigned to the same cluster. This is an adaptive strategy for the data points to select their neighbors. Weights are assigned in the similarity matrix $\tilde{S}$ according to the newly proposed z-shaped weighing function: based on the range of values given as input, weights are assigned to the distances over the complete scale as per Eq. (4). A plot of the function for different parameters is shown in Fig. 3. As the distance $x_{ij}$ between the data points increases, the assigned weight decreases, as per Eq. (4):

Fig. 3. Plot of the Z-shaped function with different parameters.

$\tilde{S}_{ij} = \begin{cases} 1, & x_{ij} \le a \\ 1 - 2\left(\frac{x_{ij} - a}{b - a}\right)^2, & a \le x_{ij} \le \frac{a+b}{2} \\ 2\left(\frac{x_{ij} - b}{b - a}\right)^2, & \frac{a+b}{2} \le x_{ij} \le b \\ 0, & \text{otherwise} \end{cases}$    (4)

Here, a and b specify the range over which the function changes its values and can be controlled. Between a and b, two quadratic pieces are combined to make the final output function z-shaped; the slope of the function depends on the parameters a and b. a is set to a very small value, while b is set using the natural clusters found by the K-means step. After K-means clustering, the data clusters are formed and every data point is assigned a label. For each cluster, the maximum pairwise distance between the data points belonging to it, referred to as the radius of the cluster, is taken as b. Hence b has a unique value for each cluster, and all data points belonging to a particular cluster share the same value of b. This makes the procedure adaptive to the data: both the selection of neighbors and the weighting become data dependent, and no parameters have to be set explicitly. As opposed to conventional LPP, where only a few neighbors are considered, data points at a moderate distance from the point of interest are also taken into consideration and weighed accordingly.
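A small sketch of this weighting scheme is given below, assuming the standard z-shaped membership function of Eq. (4) and the cluster-radius rule for b described above; the names z_shaped_weight and cluster_radius are illustrative:

    import numpy as np

    def z_shaped_weight(x, a, b):
        # z-shaped function of Eq. (4): 1 up to a, smoothly
        # decreasing to 0 at b, and 0 beyond b
        x = np.asarray(x, dtype=float)
        mid = (a + b) / 2.0
        w = np.zeros_like(x)
        w[x <= a] = 1.0
        left = (x > a) & (x <= mid)
        w[left] = 1.0 - 2.0 * ((x[left] - a) / (b - a)) ** 2
        right = (x > mid) & (x <= b)
        w[right] = 2.0 * ((x[right] - b) / (b - a)) ** 2
        return w

    def cluster_radius(points):
        # b for a cluster: maximum pairwise distance of its members
        d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
        return d.max()

For a pair of points assigned to cluster c by K-means, one would call z_shaped_weight(dist, a, b=cluster_radius(points_of_c)).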


Thus, the Laplacian matrix turns out to be $\tilde{L} = \tilde{D} - \tilde{S}$ [9], where $\tilde{D}_{ii} = \sum_j \tilde{S}_{ij}$. The objective function now becomes $\min\, w^T X \tilde{L} X^T w$ subject to the constraint $w^T X \tilde{D} X^T w = 1$. The eigenvectors of the generalized eigenvalue problem $X \tilde{L} X^T w = \lambda X \tilde{D} X^T w$ form the basis vectors of the newly found subspace of ELPP.

3.2. Supervised Locality Preserving Projection (SLPP)

As described in Section 2, LPP is an unsupervised learning method which does not take the class membership information into account. In practical scenarios, neighboring data points often belong to different classes. According to the objective function of LPP, if data points are close to each other in the original space, they should remain close in the reduced space even if they do not belong to the same class; in such cases LPP may lead to wrong classification, as it does not use the class labels. The class information can be utilized to enhance the discriminant analysis, as proposed in LDA [3]. Both the locality preserving property and the discriminant ability are significant in learning a new feature subspace [34], so the class information can be combined with the locality preserving property of LPP to enhance the performance.

Supervised Locality Preserving Projection (SLPP) [34] is a variant of LPP in which the known class labels of the data points are used. The neighbors of a point are decided based on the known class labels, i.e. prior information about the original data-set is used to learn the feature subspace. Choosing the neighbors in this way prevents points from two different classes from being projected close by. The objective function of SLPP is the same as in LPP; only the computation of the weight matrix S differs. In LPP, the similarity matrix S is computed based only on the nearest neighbors, independently of the class information, whereas in SLPP two points are considered neighbors only if they belong to the same class. To simplify the computations, the data samples are arranged compactly, i.e. all samples from the same class are arranged together in X. As a result, X becomes an ordered matrix composed of sub-matrices $X_{sub_i}$ of size n × M, $i = 1, 2, \ldots, c$, where c is the number of classes. In this way, the nearest neighbors of each $x_j \in X_{sub_i}$ are sought in $X_{sub_i}$ only. For each class, a matrix $S_{sub_i}$ is calculated, which is then used for constructing S. The weight is set to 1 if two data points belong to the same class and 0 otherwise. Hence, $S_{sub_i}$ turns out to be

$S_{sub_i} = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix}$

and S is constructed by arranging the $S_{sub_i}$ for all the classes diagonally, i.e.

$S = \begin{pmatrix} S_{sub_1} & & & \\ & S_{sub_2} & & \\ & & \ddots & \\ & & & S_{sub_c} \end{pmatrix}$

All other computations for finding the transformation matrix are the same as in LPP, i.e. the transformation matrix w is found by solving the generalized eigenvalue problem $X L X^T w = \lambda X D X^T w$.

In the formulation of SLPP, all data points belonging to the same class are treated in the same manner: weight 1 is assigned in S if the data points belong to the same class and 0 otherwise, without considering the distance between them. If the distance information is incorporated along with the class information, the underlying manifold can be revealed in a better way. Moreover, the methods discussed so far concentrate on neighboring data points coming from the same class, but nothing has been suggested about neighboring data points from different classes.

3.3. Extended Supervised Locality Preserving Projection with Modified Distance (ESLPP-MD)

An improvement over SLPP is suggested in this section that tries to address both issues raised above: all neighbors are treated the same way irrespective of the distance between them, and no attention is paid to neighboring data points belonging to different classes. The distance between data points from different classes should be relatively larger than their actual Euclidean distance, as they belong to different classes. The class information can be incorporated so as to achieve exactly this, by shrinking the intra-class distances while expanding the inter-class distances, which results in better separability between classes: data points are made strong neighbors by shrinking the distance if they belong to the same class, and weak neighbors by increasing it otherwise. To manipulate the distance accordingly, the following function is used:

$dist(x_i, x_j) = \begin{cases} \sqrt{1 - e^{-d(i,j)^2/\beta}}, & c_i = c_j \\ \sqrt{e^{d(i,j)^2/\beta}} - \alpha, & c_i \ne c_j \end{cases}$    (5)

Here, $c_i$ and $c_j$ are the class labels of $x_i$ and $x_j$, and $d(i,j) = \|x_i - x_j\|$ is the Euclidean distance between them. The constant α, taking values in [0, 1], adjusts the similarity between points from different classes. The parameter β prevents the distance from increasing too fast when d(i,j) is relatively large; usually, the average distance between all pairs of data points is taken as β. A plot of the distance functions defined in Eq. (5) is shown in Fig. 4, where the dotted line shows the behavior of the function for data points belonging to different classes and the solid line shows the same for data points from the same class. Similar functions have been used to obtain better separability in the case of Locally Linear Embedding (LLE) [8]. Modifying distances in this manner adds more discriminating power between classes; this new distance matrix is used for constructing the weight matrix $S_{ij}$.

Fig. 4. Plot of the distance function for the data points belonging to same class (solid) and different classes (dotted) [8].
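As a sketch, Eq. (5) can be computed as below (modified_distance is an illustrative name; alpha and beta are the α and β of the equation):

    import numpy as np

    def modified_distance(xi, xj, ci, cj, alpha=0.5, beta=1.0):
        # Eq. (5): shrink intra-class distances, expand inter-class
        # ones. alpha in [0, 1]; beta is usually the average pairwise
        # distance over the data set.
        d2 = np.sum((xi - xj) ** 2)
        if ci == cj:
            return np.sqrt(1.0 - np.exp(-d2 / beta))
        return np.sqrt(np.exp(d2 / beta)) - alpha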


In addition to the class information, if the weights assigned in S are based on distance, as opposed to SLPP where a fixed value is assigned to all neighboring data points, it may be more helpful in revealing the underlying manifold. One way to do so is to compute the matrix S with a weighing function, as in LPP. But, as observed for LPP, to overcome the ambiguity in the overlapping regions of two or more classes as well as to increase the reducibility capacity, the same z-shaped weighing function is used, which automatically gives emphasis to the data points at a moderate distance from the point of interest in addition to the nearest neighbors. The transformation matrix can then be found as in ELPP. This scheme is called Extended Supervised Locality Preserving Projection with modified distance (ESLPP-MD): extended in the sense that it assigns weights to data points at a moderate distance from the point of interest, in addition to the closest neighbors as in LPP; modified in the sense that it uses a modified distance between two data points instead of the Euclidean distance used in LPP. Note that the distance $x_{ij}$ used in the function to compute $\tilde{S}_{ij}$ is now the "dist" defined in Eq. (5).

3.4. Kernel based versions of LPP

As discussed in Section 1, face images often lie on a non-linear manifold due to large expression, pose and illumination variations. Linear dimensionality reduction techniques such as PCA, LDA and ICA fail to capture the non-linearity present in the data. LPP preserves the local structure of the data well and captures the non-linearity in many circumstances, but it may fail to capture complex non-linear changes. Hence, to find the non-linear manifold of the data in a much better way, kernel based methods can be used to map the data non-linearly into a feature space F. After mapping the data into this feature space using a proper kernel, the modified versions of LPP can be applied in F, as shown in Fig. 5. Kernel based versions of PCA [1], LDA [2] and LPP [19] are already available. In these approaches, a function $\phi: \mathbb{R}^n \to F$ maps the data from the original n-dimensional space to the non-linear feature space. Let the data in the original space be denoted by $x_1, x_2, \ldots, x_m$; the data mapped into the feature space is then $\phi(x_1), \phi(x_2), \ldots, \phi(x_m)$. Mapping the data into F using $\phi$ is equivalent to choosing a kernel function K, where $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. Hence, kernels are nothing but inner products between the data points in some space, and even without knowing the non-linear mapping $\phi$ explicitly, the relationship between two data points in the feature space can be determined directly using K [19]. Some very popular kernels are the linear, polynomial and Gaussian kernels. In this article, the polynomial and Gaussian kernels are used, the mathematical forms of which are as follows:

- Polynomial kernel: $K(x_i, x_j) = (x_i' x_j)^d$
- Gaussian kernel: $K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / 2\sigma^2}$

Fig. 5. General framework of Kernel based methods.

Using one of these or any other kernel, the data can be mapped into the feature space and then into the transformed space using the variants of LPP. As suggested in [19], a kernelized version of the proposed ELPP is developed. In the objective function of ELPP we have $\min \sum_{i,j} \|y_i - y_j\|^2 \tilde{S}_{ij}$; as the data is first mapped into the feature space, $y_i = w_F^T \phi(x_i)$. The transformation can be represented as $w_F = \phi(X) w$, hence the objective function turns out to be

$\min \sum_{i,j} \|y_i - y_j\|^2 \tilde{S}_{ij} = 2 w^T \phi(X)^T \phi(X) (\tilde{D} - \tilde{S}) \phi(X)^T \phi(X) w = 2 w^T K \tilde{L} K w$

K can be any of the kernels discussed above. Note that $\tilde{S}$ here is the same as in Eq. (4). Similarly, a kernelized version of ESLPP-MD is developed.
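A minimal sketch of the kernel step, assuming column-stacked data and the two kernels above (the helper names are illustrative):

    import numpy as np

    def polynomial_kernel(X, d=2):
        # K(xi, xj) = (xi' xj)^d on column-stacked data X (n x m)
        return (X.T @ X) ** d

    def gaussian_kernel(X, sigma=1.0):
        # K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))
        sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
        return np.exp(-sq / (2.0 * sigma ** 2))

With the Gram matrix K substituted for X, the $2 w^T K \tilde{L} K w$ objective above leads, by analogy with Eq. (3), to the generalized eigenproblem $K \tilde{L} K w = \lambda K \tilde{D} K w$, which can be solved with scipy.linalg.eigh exactly as in the LPP sketch.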

4. Experiments

Comparison of the methods discussed in the previous sections is carried out in this section. The existing approaches are first compared with the respective modified versions, followed by a comparison of all the approaches on three benchmark data-sets for face recognition.

4.1. LPP vs. ELPP

All the experiments with LPP have been performed using the author-specified parameters in the code available online; the neighborhood parameter in the code is set to 5 or 6. For ELPP, k-means clustering is used to obtain the natural grouping of the data: data points that are assigned the same label are considered neighbors, and weights according to the z-shaped weighing function are applied. Estimating the actual value of k from the given data is still an open problem in the pattern recognition community; a better estimate of k, i.e. the number of classes, leads to more efficient clustering. Some methods for selecting an optimal k can be found in [24,33,23]. In the current experiments we have not attempted to find an optimal k: as mentioned in Section 3.1, the number of classes is assumed to be known as prior information. However, in the face recognition experiments on the YALE and ORL face databases, results have been taken for varying k and are presented in Tables 1 and 2. It can be concluded from these tables that the actual k leads to the best results.



Table 1. Results (errors in %) of face recognition on the YALE face database with different values of k in the k-means clustering step. The actual number of classes for this database is 15.

k               11     12     13     14     15     16     17     18     19
Error rate (%)  12.7   12.7   11.5   10.9   9.1    10.9   10.9   13.3   12.7

Table 2. Results (errors in %) of face recognition on the ORL face database with different values of k in the k-means clustering step. The actual number of classes for this database is 40.

k               36     37     38     39     40     41     42     43     44
Error rate (%)  13.5   11     13.5   13     10     12.5   12.5   12.5   13

Synthetic data-sets having overlapping regions between classes were generated for the initial experiments; afterwards, various real world digit and face data-sets are used. The high dimensional images are projected onto lower dimensional subspaces using both LPP and ELPP.

Synthetic data-set: Synthetic data-sets are generated by randomly sampling data points from different 2 and 3 dimensional Gaussian distributions in such a way that some of the classes overlap. An example of a synthetic data-set and the testing samples selected from the overlapping regions of the classes are shown in Fig. 6(a) and (b) respectively. Fig. 7 shows the classification of the testing samples using LPP and ELPP, and the quantitative measure of the results is given in Table 3; a nearest neighbor classifier is used to classify the points. From Fig. 7 and Table 3, it can be concluded

that in the overlapping region, the proposed approach gives more accurate classification results. Experimental results on other synthetic data-sets are also given in Table 3.

The MNIST database of handwritten digits: Structurally similar digit pairs are chosen for experimentation from the MNIST database of handwritten digits.1 As all the images are of handwritten digits, the way of writing can make certain digits, such as 3, 8, 1 and 7, look very similar. The strongest two dimensional projections of the digit pairs using LPP and ELPP are shown in Fig. 8. The horizontal axis represents the direction of the strongest component of the transformation matrix, while the vertical axis represents the second strongest component; the same convention is used throughout this article. Each data point in the projection domain is assigned the label of its nearest data point, thus clustering the projected data. Results for increasing numbers of dimensions are shown in Table 4. As we want to show that ELPP distinguishes structurally similar digits in a much better way with very few dimensions, results of clustering on similar looking digits are included here. The energy curves for PCA, LPP and ELPP are also shown in Fig. 11. Although the reducibility capacity of ELPP is still less than that of PCA, it improves significantly on that of LPP; the results in Table 4 second this argument. A better separation between the digit pairs is achieved using ELPP.

Fig. 7. The results of applying LPP and ELPP. Empty circles (black) are the correctly classified data points using both the approaches. Asterisks (blue) are the misclassified points using both the approaches. Note that all the data points which are misclassified using ELPP are misclassified using LPP as well; in addition to those, LPP misclassifies some more data points, shown using square (red) signs. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Table 3. Results (errors in %) of classifying the testing samples from the synthetic data-sets using LPP and ELPP.

Synthetic dataset   Dimensions   Classes   LPP     ELPP
Dataset 1           2            7         30.23   18.60
Dataset 2           2            12        24      20
Dataset 3           3            6         31.66   26.6
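Overlapping synthetic classes of this kind can be generated along the following lines (a sketch; the means, covariance and class sizes are illustrative, not the exact values used in the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def overlapping_gaussians(means, cov, n_per_class=100):
        # equally sized classes sampled from Gaussians whose means are
        # close enough for the class boundaries to intersect
        X = np.vstack([rng.multivariate_normal(m, cov, n_per_class)
                       for m in means])
        y = np.repeat(np.arange(len(means)), n_per_class)
        return X, y

    # e.g. seven 2-D classes with partially overlapping regions
    X, y = overlapping_gaussians(
        means=[(0, 0), (1.5, 0), (0, 1.5), (1.5, 1.5),
               (3, 0), (0, 3), (3, 3)],
        cov=np.eye(2))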

Video data-set: To evaluate how the algorithm behaves on a person's varying expressions, a video data-set (the DA-IICT video data-set) has been created, in which videos of around 30 s of 11 subjects are recorded. A single video contains four different expressions of a subject, i.e. Angry, Normal, Smiling and Open mouth, as shown in Fig. 9. The strongest 2D projections of a single person are shown in Fig. 10. Better separation between different expressions of a person can clearly be seen using the proposed weighing function. To justify the results quantitatively, data points in the projection domain are assigned the labels of their nearest neighbors for both approaches; results of face and facial expression recognition on the whole video data-set, along with some other benchmark data-sets, are reported later in this article. For clustering, as reported in Table 5, both LPP and ELPP show similar results at maximum dimensions, but ELPP appears better than LPP at lower dimensions. The much lower error of ELPP at very low dimension implies its strong reducibility capacity. To support this argument, the energy curves for the DAIICT Video data-set using PCA, LPP and ELPP are shown in Fig. 11(b).

Fig. 6. Example of synthetic data-set: (a) Data points of the generated data-set. Each color represents a class and (b) Class boundaries of all the classes and the selected testing samples. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

1 http://yann.lecun.com/exdb/mnist/

The strongest 3 dimensional projection of the whole Video data-set (i.e. 11 subjects, 4 expressions) using ELPP is shown in Fig. 12. Also to check how different expressions of a person are


Fig. 8. Strongest 2D projection of the selected digit pairs using LPP (column 1) and ELPP (column 2).

Table 4. Results (errors in %) of clustering the projected digit pairs from the MNIST dataset of handwritten digits using LPP and ELPP.

               2              10             50             100            MAX
Digit pair     LPP    ELPP    LPP    ELPP    LPP    ELPP    LPP    ELPP    LPP    ELPP
3 and 8        42.2   17.0    52.0   14.5    45.7   10.5    51.2   9.75    48.2   9.75
1 and 7        61.2   10.2    13.5   9.25    5.5    3.0     4.25   2.75    4.25   2.5
1 and 5        33.0   4.75    13.5   2.75    3.0    2.0     2.5    2.0     2.0    1.5

being projected, the projection of some of the subjects has been enlarged, with each expression represented by a different color. It can be clearly observed that the expression sub-manifolds are also significantly discriminated.

4.1.1. Face and expression recognition

Face recognition is one of the most widely used applications in which dimensionality reduction is applied before the desired

recognition task is performed [21]. Face images are represented as high dimensional pixel arrays; due to the high correlation between neighboring pixel values, they often belong to an intrinsically lower dimensional manifold. The distribution of data in a high dimensional space is non-uniform and is generally concentrated around some kind of low dimensional structure [5]. Machine learning algorithms find it difficult to classify data in such a high dimensional space; hence a representation of the data in a lower dimensional space is highly sought. Locality Preserving Projection is widely used for face recognition tasks and performs better than some traditional dimensionality reduction approaches such as PCA and LDA. In this section, face recognition is carried out using LPP and the proposed ELPP. Four different test data-sets containing facial images not present in the training data are created from the Video data-set. In face recognition, the test image is projected using the transformation matrix and assigned the class of its nearest training image in the low dimensional space.
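This recognition rule amounts to a nearest neighbor search in the projected space; a sketch (the function name recognize is illustrative):

    import numpy as np

    def recognize(W, X_train, y_train, x_test):
        # project the gallery and the probe with the learned matrix W,
        # then assign the label of the nearest projected training image
        Y_train = W.T @ X_train              # (dim x m) projected gallery
        y = W.T @ x_test                     # projected probe
        j = np.argmin(np.linalg.norm(Y_train - y[:, None], axis=0))
        return y_train[j]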

Fig. 9. Examples of facial expressions from DAIICT Video dataset: (a) angry, (b) normal, (c) smiling, (d) open mouth respectively.


Fig. 10. Projection of a single person's face images from the DAIICT Video database using the two strongest dimensions of LPP (column 1) and ELPP (column 2).

Table 5. Results (errors in %) of clustering the projected face images from the Video data-set using LPP and ELPP.

               2               10              50             100            MAX
Person         LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
P1             72.47  3.89     73.92  0        70.52  0       67.11  0       0      0
P2             71.98  0.85     85.41  13.98    69.24  0       70.06  0       0      0
P3             71.80  0.73     82.30  0.35     87.32  0       87.32  0       0.12   0.59

Results of face recognition on the Video data-set are shown in Table 6; the results are taken using the nearest neighbor classifier. It can be observed that ELPP is almost 100% accurate using far fewer dimensions than LPP. It can also be observed that with a large number of dimensions the results are not significantly different for the two methods, but the proposed approach behaves much better with a very small number of dimensions.

In addition to identifying the person, experiments on expression recognition are carried out to check whether both approaches are able to find the expression sub-manifold, i.e. to identify the person as well as the expression at the same time. In this case, as different expressions of a single person are to be distinguished, there is a lot of overlap between the classes; hence, this experiment provides a better check of the extent to which the problem of overlapping classes has been solved by ELPP. Again, the test image is projected using the transformation matrix and assigned the class of its nearest training image in the low dimensional space; here, the classes of the training images are defined by the person as well as the expression. Results of expression recognition using the nearest neighbor approach on the DAIICT Video data-set are shown in Table 7. It can be observed that ELPP is almost 82% accurate using far fewer dimensions than conventional LPP.

To analyze the results of expression recognition in more detail, a confusion matrix is obtained for each test data-set. A confusion matrix is a table layout that allows visualization of the performance of an algorithm: each column represents the instances of a predicted class, while each row represents the instances of an actual class. This makes it easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another). Here, the columns and rows represent the four facial expressions. Results are shown in Tables 8–11 for Test data 1, 2, 3 and 4 respectively; diagonal entries represent correctly classified expressions.

Fig. 11. Energy curves: PCA (solid), LPP (dotted) and ELPP (dashed) for (a) the MNIST dataset of handwritten digits and (b) the DAIICT Video database.

Fig. 12. Strongest 3 dimensional projection of all 11 subjects from the Video data-set and some examples of expression discrimination in a single person's manifold. Each person is represented by a different sign or color in the figure on the left, whereas for a single person each expression is represented by a different color on the right. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Table 6. Results (errors in %) of face recognition on various test data-sets having 11 subjects created from the Video data-set, using the nearest neighbor approach.

               3               10              50             500            MAX
Test data      LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
1              85.22  3.86     87.5   0.22     88.18  0       88.40  0       15.68  0
2              83.63  5.0      87.5   0.22     86.81  0       86.59  0       14.77  0
3              83.86  5.23     86.36  0.68     83.40  0.45    85.45  0.22    14.77  0.02
4              86.13  5.23     87.04  1.36     88.18  0.68    87.27  0.45    15.68  0

Table 7. Results (errors in %) of expression recognition on various test data-sets having 11 subjects with 4 facial expressions, created from the Video data-set, using the nearest neighbor approach.

               3               10              50             500            MAX
Test data      LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
1              93.18  33.86    94.31  20.22    94.31  16.59   94.31  15.90   28.40  15.90
2              91.36  34.09    92.72  18.63    91.81  15.90   90.68  15.90   27.50  14.09
3              90.22  33.63    92.27  20.22    90.22  17.5    90.0   17.72   28.40  15.45
4              91.81  37.72    94.54  24.09    94.54  20.68   92.5   20.22   27.04  17.72

Table 8. Confusion matrix for Test data 1.

Expression     Angry    Normal   Smiling   Open mouth
Angry          79.09    14.54    6.36      0
Normal         7.27     90.0     2.72      0
Smiling        8.18     20.90    70.0      0.9
Open mouth     0        1.81     0         98.18

Table 9. Confusion matrix for Test data 2.

Expression     Angry    Normal   Smiling   Open mouth
Angry          80.90    13.63    5.45      0
Normal         5.45     91.81    2.72      0
Smiling        7.27     18.18    72.72     1.81
Open mouth     0        1.81     0         98.18

Table 10. Confusion matrix for Test data 3.

Expression     Angry    Normal   Smiling   Open mouth
Angry          80.90    14.54    4.54      0
Normal         7.27     84.54    7.27      0.90
Smiling        8.18     18.18    72.72     1.81
Open mouth     0        1.81     0         98.18

Table 11. Confusion matrix for Test data 4.

Expression     Angry    Normal   Smiling   Open mouth
Angry          78.18    15.45    6.36      0
Normal         6.36     83.64    7.27      0.90
Smiling        8.18     18.18    72.72     1.81
Open mouth     0        1.81     0         98.18

From some test cases it appears that the Angry and Smiling expressions get mixed up with the Normal

expression while the accuracy for Open Mouth is highest in all cases i.e. approximately 98%.
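Such a confusion matrix can be computed directly from the actual and predicted labels; a sketch with rows normalized to percentages (an illustrative helper, not code from the paper):

    import numpy as np

    def confusion_matrix(actual, predicted, n_classes=4):
        # rows: actual expression, columns: predicted expression;
        # entries: percentage of instances of each actual class
        C = np.zeros((n_classes, n_classes))
        for a, p in zip(actual, predicted):
            C[int(a), int(p)] += 1
        return 100.0 * C / C.sum(axis=1, keepdims=True)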

4.2. SLPP vs. ESLPP-MD

The results are shown on the same databases used earlier in this article, i.e. the MNIST data-set of handwritten digits and the Video data-set. Similar experiments, projecting the high dimensional data onto lower dimensional subspaces, are performed on the same subsets of the databases. The strongest 2 dimensional projections of the digit pairs 3–8 and 1–7 using SLPP and ESLPP-MD are shown in Fig. 13. In the 2 dimensional space, SLPP separates the digits nicely, but some ambiguity remains at the boundary points of the two classes. ESLPP-MD separates the structurally similar digit pairs in a much better way. Also, in the projection space, the intrinsic characteristics of a digit are revealed: the thickness and slope of the digit change as we move along the horizontal axis. Again, quantitative justification is given by clustering the projected data points and validating against the known class information for different numbers of dimensions in Table 12. The proposed approach provides clear discrimination between classes with no errors using only the 2 or 3 strongest dimensions, which shows its strong reducibility capacity; in case of SLPP, the errors decrease as more dimensions are used. The current approach is also tested on the Video data-set. The strongest 3 dimensional projection of the face images of all persons from this data-set is shown in Fig. 14, in which the corresponding person's face image is shown near each cluster. It can be clearly observed that the current proposal distinguishes different persons completely: all data points belonging to the same class are projected into one cluster. ESLPP-MD is not only able to cluster the data points belonging to the same class, but also to discriminate different classes in a much better way.

4.2.1. Face and expression recognition

Experiments for face and facial expression recognition are also performed on the Video data-set using the current proposal, with the training and testing data-sets used in Section 4.1.1. Results of face recognition on the 4 test data-sets using ESLPP-MD are included in Table 13. 100% accuracy is achieved using only 2 or 3 dimensions in almost all cases, which shows the strong reducibility capacity as well as the very strong discriminating power of ESLPP-MD. Results of facial expression recognition on the same test data-sets are reported in Table 14; on average, 85% of facial expressions are correctly recognized using the proposed approach.

4.3. KLPP vs. KELPP

Face and facial expression recognition using LPP and ELPP on the video database are performed on the same test data-sets, applying a polynomial or Gaussian kernel before the dimensionality reduction approach (either LPP or ELPP). For all the experiments involving the polynomial kernel, polynomial kernels of degree 2 are used. The results are reported in Tables 15–18. Almost 100% face recognition accuracy is obtained using only the strongest 3 dimensions for both polynomial and Gaussian kernels with ELPP, whereas around 95% accuracy for facial expression recognition is achieved using the strongest 10 dimensions for the polynomial kernel with ELPP. Hence, it is evident that mapping the data into the feature space before applying LPP or ELPP enhances the ability to capture the non-linearity present in the data and boosts the recognition performance. Experiments with the kernelized versions of SLPP and ESLPP-MD have also been carried out on some of the benchmark face databases, results for which are reported in the next sub-section.

4.4. Results on benchmark databases

In this section, the performance of all the variants of LPP discussed in this article is evaluated on some widely used benchmark face data-sets. The performance of ELPP, ESLPP-MD and their respective kernel based versions K-ELPP and K-ESLPP-MD is compared with LPP [16,17], SLPP [34], K-LPP [16] and K-SLPP [19]. A polynomial kernel of degree 2 is used for all the experiments involving kernel mapping before applying the dimensionality reduction techniques. Three databases are used for testing: the ORL database,2 the YALE face database B [13] and the AR database [10]. The databases are divided into distinct training and testing sets, and a nearest neighbor classifier is used to classify the test samples in the projection space. Training samples are used to calculate the transformation matrix that maps the data into the face subspace; test samples are then projected into the lower dimensional space using the same transformation matrix and identified with the nearest neighbor classifier. In the case of kernel based methods, samples are mapped into the feature space before finding the face subspace.

ORL database: The ORL database of faces contains 10 different images of each of 40 subjects. The face images were captured under varying illumination conditions, facial expressions such as open or closed eyes and smiling or not smiling, and facial details such as glasses or no glasses. The images were cropped and re-sized to 64 × 64 with 256 gray levels per pixel. Some sample images from the ORL database are included in Fig. 15(a). The images are randomly divided into distinct training and testing subsets: 60% of the samples are used for training and 40% for testing. Recognition results in terms of error rates are presented graphically in Fig. 16. The plot shows error rates for all the methods against the number of strongest dimensions used. It can be observed that the kernelized versions of all the methods outperform the respective non-kernelized methods at higher dimensions, with lower error rates. Also, K-ESLPP-MD outperforms all other approaches across all dimensions.

YALE face database B: The Yale face database B [13] was constructed at the Yale Center for Computational Vision and Control. It contains 5850 images of 10 subjects, each seen under 576 viewing conditions (9 poses × 64 illumination conditions). Again the images were cropped and re-sized to 64 × 64 with 256 gray levels per pixel, some of which are illustrated in Fig. 15(b). 60% of the samples from the dataset are randomly selected for training and the rest form the test set. Recognition results for varying numbers of dimensions are shown in Fig. 17. Extended Supervised LPP with modified distance (ESLPP-MD) performs better than LPP, ELPP and SLPP as well as K-LPP, and is comparable to K-ELPP, but K-ESLPP-MD beats all the approaches with 98% recognition accuracy at 30 dimensions.

2 http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html


Fig. 13. Strongest 2 dimensional projection of digits 3–8 (row 1) and 1–7 (row-2) using SLPP and ESLPP-MD respectively. In both the cases, ESLPP-MD not only discriminates the classes, but it also takes care of structural properties such as slanting and thickness within single digit.

Table 12. Results (errors in %) of classifying the projected digit pairs from the MNIST digit database with the nearest neighbor approach, comparing SLPP and ESLPP-MD for different numbers of strongest dimensions.

               2                   10                  50
Digit pair     SLPP   ESLPP-MD     SLPP   ESLPP-MD     SLPP   ESLPP-MD
3, 8           15.75  0            19.25  0            19.0   0
1, 7           9.75   0            4.75   0            5.5    0

Table 13. Results (errors in %) of face recognition on various test data-sets having 11 subjects created from the Video data-set, using the nearest neighbor approach with ESLPP-MD.

Test data      3      10     50     500    MAX
1              0      0      0      0      0
2              0      0      0      0      0
3              0.2    0.2    0      0      0
4              0.4    0.4    0.2    0.2    0

Fig. 14. Strongest 3 dimensional projection of all the persons from the Video database. All the face images of the same person are projected into the same cluster, while those of different persons are well separated. The axes represent the 3 strongest dimensions of ESLPP-MD.

Table 14. Results (errors in %) of facial expression recognition on various test data-sets having 11 subjects created from the Video data-set, using the nearest neighbor approach with ESLPP-MD.

Test data      3      10     50     500    MAX
1              28.18  22.29  16.4   15.9   15.9
2              26.6   20.22  13.85  13.64  13.64
3              26.4   19.66  12.95  12.95  12.95
4              30     22.61  15.23  15.23  15.23

Table 15. Results (errors in %) of face recognition using the polynomial kernel before applying LPP and ELPP on various test data-sets created from the Video data-set, using the nearest neighbor approach.

               3               10              50             500            MAX
Test data      LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
1              1.59   0        0      0        0      0       0      0       0      0
2              1.13   0        0      0        0      0       0      0       0      0
3              1.36   0        0.22   0        0.22   0       0.22   0       0.22   0
4              1.59   0.22     0.22   0        0.22   0       0.22   0       0      0

Table 16. Results (errors in %) of face recognition using the Gaussian kernel before applying LPP and ELPP on various test data-sets created from the Video data-set, using the nearest neighbor approach.

               3               10              50             500            MAX
Test data      LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
1              1.5    0        0      0        0      0       0      0       0      0
2              1.13   0        0      0        0      0       0      0       0      0
3              1.36   0        0.22   0        0.22   0       0.22   0       0.22   0
4              1.81   0.22     0.22   0        0.22   0       0.22   0       0.22   0

Table 17. Results (errors in %) of facial expression recognition using the polynomial kernel before applying LPP and ELPP on various test data-sets created from the Video data-set, using the nearest neighbor approach.

               3               10              50             500            MAX
Test data      LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
1              26.36  20.22    5.9    4.54     10.9   11.36   10.9   11.13   10.68  12.04
2              23.86  17.95    5.9    4.77     10.45  11.13   10.68  10.9    10.45  9.77
3              26.59  18.63    7.27   5.0      10.45  12.27   11.59  12.04   11.13  10.22
4              32.05  25.45    9.77   6.13     13.18  13.86   13.86  13.63   13.86  13.4

Table 18. Results (errors in %) of facial expression recognition using the Gaussian kernel before applying LPP and ELPP on various test data-sets created from the Video data-set, using the nearest neighbor approach.

               3               10              50             500            MAX
Test data      LPP    ELPP     LPP    ELPP     LPP    ELPP    LPP    ELPP    LPP    ELPP
1              26.36  20.68    6.81   5.9      12.05  11.36   11.36  11.36   11.13  12.95
2              23.86  19.09    7.04   5.9      9.77   10.68   10.9   10.68   10.9   12.27
3              26.5   19.77    6.81   7.2      10.22  11.36   12.05  11.59   11.59  11.81
4              32.04  25.9     9.54   9.54     13.4   14.09   13.86  13.63   13.63  15.22

AR database: The AR face database [10] contains 4000 frontal view color images of 100 subjects with different illumination conditions, facial expressions and occlusions such as scarves and sunglasses, as shown in Fig. 15(c). These complex changes make the database more challenging for face recognition problems. Each image of size 40 × 55 is converted to gray scale with 256 gray levels per pixel. 50% of the samples for each subject are used for training and the rest for testing, on a subset of 60 subjects from the database. Results of face recognition, showing error rates for various numbers of dimensions, are illustrated in Fig. 18. As can be observed from the plot, ESLPP-MD shows a significant improvement in error rates and is comparable to K-SLPP.

5. Conclusion

Some variants of LPP are suggested in this article to obtain a more robust projection of the data with far fewer dimensions, resulting in better data discrimination. Extending the neighborhood by considering moderately distant data points, weighted in a z-shaped, monotonically decreasing fashion, not only resolves problems in the overlapping regions of two or more classes to a certain extent, but also enhances the reducibility power of Extended LPP (ELPP). In recognition scenarios, the class labels of the training data points are already available (supervised) and are incorporated to enhance class discrimination: data points from the same class are made strong neighbors by reducing the distance between them, whereas data points from different classes are mapped far apart by increasing it. Projection and recognition results using ESLPP with modified distance (ESLPP-MD) are encouraging and show improved class discrimination ability with only a few strongest dimensions. The approach also reveals the expression sub-manifold of a person, boosting the facial expression recognition performance. A significant improvement in the recognition rate on a complex database such as AR shows the ability of ESLPP-MD to capture illumination and appearance changes as well as occlusions, such as faces occluded by a scarf or eyes occluded by glasses. Using kernels to map the data into a feature space before applying dimensionality reduction helps capture the non-linearity present in the data. Usually, the kernel based versions of LPP perform better than the non-kernelized ones, though ESLPP-MD is comparable to Kernel-SLPP. Moreover, the proposed kernelized version of ESLPP-MD outperforms all other techniques: less than 8% error across all the test databases is achieved with K-ESLPP-MD using only the 20 most significant dimensions.


Fig. 15. Sample face images from (a) ORL database, (b) YALE face database B, (c) AR database.


Fig. 16. Error rate (%) vs. dimensionality reduction on the ORL database.

Fig. 17. Error rate (%) vs. dimensionality reduction on the YALE face database B.

Fig. 18. Error rate (%) vs. dimensionality reduction on the AR face database.

Acknowledgments

Gitam Shikkenawis would like to acknowledge Tata Consultancy Services for providing the financial support for carrying out this work.

References

[1] B. Scholkopf, A. Smola, K.R. Müller, Kernel principal component analysis, in: Artificial Neural Networks – ICANN'97, Lecture Notes in Computer Science, 1997, pp. 583–588.
[2] G. Baudat, F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput. 12 (2000) 2385–2404.
[3] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[4] M. Belkin, Problems of learning on manifolds (Ph.D. thesis), University of Chicago, 2003.
[5] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003) 1373–1396.
[6] C. Lu, X. Liu, W. Liu, Face recognition based on two dimensional locality preserving projections in frequency domain, Neurocomputing 98 (3) (2012) 135–142.
[7] D. Cai, X. He, J. Han, H. Zhang, Orthogonal Laplacianfaces for face recognition, IEEE Trans. Image Process. (2006) 3608–3614.
[8] J. Chen, Z. Ma, Locally linear embedding: a review, Int. J. Pattern Recognit. Artif. Intell. 25 (7) (2011) 985–1008.
[9] F. Chung, Spectral graph theory, in: Regional Conference Series in Mathematics, No. 92, 1997.
[10] L. Ding, A. Martinez, Features versus context: an approach for precise and detailed detection and delineation of faces and facial features, IEEE Trans. Pattern Anal. Mach. Intell. 32 (11) (2010) 2022–2038.
[11] F. Dornaika, A. Assoum, Enhanced and parameterless locality preserving projections for face recognition, Neurocomputing 99 (2013) 448–457.
[12] G. Yun, H. Peng, J. Wei, Q. Ma, Enhanced locality preserving projections using robust path based similarity, Neurocomputing 71 (2011) 598–605.
[13] A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (6) (2001) 643–660.
[14] X. He, Locality preserving projections (Ph.D. thesis), University of Chicago, 2005.
[15] X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, in: IEEE International Conference on Computer Vision, vol. 2, 2005, pp. 1208–1213.
[16] X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, March 2003.
[17] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[18] A. Hyvärinen, Survey on independent component analysis, Neural Comput. Surv. 2 (1999) 94–128.
[19] J. Cheng, Q. Liu, H. Liu, Y.W. Chen, Supervised kernel locality preserving projections for face recognition, Neurocomputing 67 (2005) 443–449.
[20] I. Jolliffe, Principal Component Analysis, Springer, Berlin, New York, 2002.
[21] S. Li, A. Jain, Handbook of Face Recognition, Springer-Verlag, London, 2011.
[22] J. Lu, Y.P. Tan, Regularized locality preserving projections and its extensions for face recognition, IEEE Trans. Syst. Man Cybern. Part B 40 (3) (2010) 958–963.
[23] P. Jaikumar, A. Singh, S.K. Mitra, Efficient learning of finite mixture densities using mutual information, in: International Conference on Advances in Pattern Recognition, 2009, pp. 95–98.
[24] S. Ray, R. Turi, Determination of number of clusters in k-means clustering and application in color image segmentation, in: International Conference on Advances in Pattern Recognition and Digital Techniques, 1999, pp. 137–143.
[25] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.
[26] H. Scharr, M.J. Black, H.W. Haussecker, Image statistics and anisotropic diffusion, in: IEEE International Conference on Computer Vision, 2003, pp. 840–847.
[27] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (8) (2000) 888–905.
[28] G. Shikkenawis, S.K. Mitra, A new proposal for locality preserving projection, in: PerMIn, 2012, pp. 298–305.
[29] G. Shikkenawis, S.K. Mitra, Improving locality preserving projection for dimensionality reduction, in: Emerging Applications of Information Technology, IEEE Computer Society, Kolkata, 2012, pp. 161–164.
[30] J. Tenenbaum, V. de Silva, J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
[31] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3 (1) (1991) 71–86.
[32] Y. Xu, A. Zhong, J. Yang, D. Zhang, LPP solution schemes for use with face recognition, Pattern Recognit. 43 (12) (2010) 4165–4176.
[33] Y. Lee, K.Y. Lee, J. Lee, The estimating optimal number of Gaussian mixtures based on incremental k-means for speaker identification, Int. J. Inf. Technol. 12 (7) (2006) 13–21.
[34] Z. Zheng, F. Yang, W. Tan, J. Jia, J. Yang, Gabor feature-based face recognition using supervised locality preserving projection, Signal Process. 87 (2007) 2473–2483.

Gitam Shikkenawis is a Ph.D. scholar at Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, and received the TCS Research Fellowship in 2013. She obtained her M.Tech. in Machine Intelligence from DA-IICT, Gandhinagar, India, and her B.E. degree in Computer Science from G. H. Patel College of Engineering and Technology, V.V. Nagar, India, in 2010. She is currently working on dimensionality reduction and image restoration. Her areas of interest are pattern recognition and image processing. She was a recipient of the Google India Women in Engineering Award, 2011.



Suman K. Mitra is a Professor at the Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar. Dr. Mitra obtained his Ph.D. from the Indian Statistical Institute, Calcutta. Prior to his current position, he was with the Institute of Neural Computation at the University of California, San Diego, USA, and with the Department of Mathematics at the Indian Institute of Technology, Bombay. Dr. Mitra has published 70 technical papers in various prestigious national and international journals and conferences and has 2 US patents to his credit. His research interests include image processing, pattern recognition, Bayesian networks and digital watermarking. He is currently serving as an Associate Editor of the International Journal of Image and Graphics (IJIG). Dr. Mitra is a Senior Member of the IEEE.
