Enhanced regularized least square based discriminative projections for feature extraction


Accepted Manuscript

Enhanced regularized least square based discriminative projections for feature extraction
Ming-Dong Yuan, Da-Zheng Feng, Wen-Juan Liu, Chun-Bao Xiao

To appear in: Signal Processing
PII: S0165-1684(17)30158-5
DOI: 10.1016/j.sigpro.2017.04.018
Reference: SIGPRO 6464
Received date: 19 September 2016
Revised date: 21 March 2017
Accepted date: 26 April 2017

Please cite this article as: Ming-Dong Yuan, Da-Zheng Feng, Wen-Juan Liu, Chun-Bao Xiao, Enhanced regularized least square based discriminative projections for feature extraction, Signal Processing (2017), doi: 10.1016/j.sigpro.2017.04.018



Enhanced regularized least square based discriminative projections for feature extraction

Ming-Dong Yuan, Da-Zheng Feng*, Wen-Juan Liu, Chun-Bao Xiao
National Laboratory of Radar Signal Processing, Xidian University, Xi'an 710071, Shaanxi, China


Abstract: The regularized least square based discriminative projections (RLSDP) method for feature extraction was recently proposed; it seeks discriminant projection directions that maximize the between-class scatter and minimize the within-class compactness. However, in RLSDP, each sample is reconstructed using only the coefficients associated with its own class, which may lead to large reconstruction errors. Moreover, the distances between each sample and the other within-class samples, which characterize the most important within-class compactness information, are not minimized in RLSDP. To deal with these two problems, we propose enhanced regularized least square based discriminative projections (ERLSDP). ERLSDP utilizes all the related coefficients of each sample for reconstruction and explicitly minimizes the distances between all the within-class samples; it therefore has better reconstruction accuracy and more discriminating power than RLSDP. Experimental results demonstrate that ERLSDP achieves a clear improvement over RLSDP when the training sample size is small.

Keywords: feature extraction; regularized least square; collaborative representation; sparse representation


1. Introduction

Feature extraction, which aims to produce compact and effective low-dimensional feature representations of high-dimensional data, has been extensively studied over the past several decades. Compared with the global principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] approaches, manifold learning methods are more appealing since they can discover the local intrinsic structure of data. Representative manifold learning methods include locality preserving projections (LPP) [3], locality preserving discriminant projections (LPDP) [4], discriminative locality alignment (DLA) [5], discriminant locality preserving projections (DLPP) [6], and marginal Fisher analysis (MFA) [7]. Although their motivations differ, they can all be unified in the graph embedding (GE) framework [7]; their differences lie in graph construction. Manifold learning has found wide application in various fields. For example, Li et al. [8] developed a discriminative distance metric learning (DML) algorithm based on manifold learning, and further derived a distributed and parallel computational scheme to deal with the large-scale metric learning problem. Reference [9] exploited manifold learning to analyze multivariate variable-length sequence data. Gao et al. [10] integrated local and global manifold structures for face and image classification.

* Corresponding author. Email addresses: [email protected] (D. Feng); [email protected] (M. Yuan)

Recently, sparse representation has shown promising performance in many domains [11-15]. For instance, Wright et al. [11] proposed sparse representation based classification (SRC) for face recognition. Zhou et al. [12] proposed a double shrinking algorithm (DSA) for sparse projection eigenvectors. Moreover, many research efforts [16, 17] have shown that the neighborhood relationship of each data point can be adaptively obtained by sparse representation methods, and the resulting $\ell_1$-graph is robust to noise. Based on the $\ell_1$-graph, Qiao et al. [16] proposed sparsity preserving projections (SPP) for feature extraction, which aims at preserving the sparse reconstruction relationship of the data in both the original space and the low-dimensional embedding space. By combining supervised SPP and the maximum margin criterion, Gui et al. [18] introduced a discriminant sparse neighborhood preserving embedding (DSNPE) algorithm. Gao et al. [10] gave discriminative sparsity preserving projections (DSPP), which first employs sparse representation to build an intrinsic graph and a penalty graph, and then integrates the global within-class structure for dimensionality reduction. Despite their good performance, sparse representation methods need to solve an $\ell_1$-norm minimization problem, which has high computational complexity.


Zhang et al. [19, 20] claimed that the collaborative representation mechanism is the key factor in the success of SRC, and proposed a collaborative representation based classification (CRC) method. CRC replaces the $\ell_1$ norm in SRC with the simpler $\ell_2$ norm, and has similar properties and competitive classification performance compared to SRC. Based on CRC, Yang et al. [21] constructed an $\ell_2$-graph and developed collaborative representation based projections (CRP) to preserve the collaborative reconstruction relationship of the data. Hua et al. [23] proposed collaborative representation reconstruction based projections (CRRP); the projection matrix in CRRP is obtained by maximizing the collaborative reconstruction between-class scatter and minimizing the collaborative reconstruction within-class scatter. A similar method was proposed in [22]. In [24], Yang et al. developed regularized least square based discriminative projections (RLSDP), which maximizes the between-class scatter adopted by LDA and minimizes the within-class compactness via the reconstruction residual from the same class. However, RLSDP has two main problems. First, reconstructions by the coefficients corresponding only to the same class will have large errors, so RLSDP cannot give the best reconstruction for each sample. Second, it does not minimize the distances between each sample and the other within-class samples, which is important for minimizing the within-class compactness.

To address these two problems, we propose enhanced regularized least square based discriminative projections (ERLSDP). In ERLSDP, each sample is reconstructed by all the associated coefficients, which results in smaller reconstruction error. More importantly, the distances between each sample and all its reconstructed within-class samples, which characterize the most important within-class compactness, are minimized. The optimal discriminant projection of ERLSDP is achieved by maximizing the between-class scatter and minimizing the within-class compactness simultaneously. Experiments on three face databases indicate that ERLSDP performs better than RLSDP.

The main contributions of our work are as follows. 1) We make use of the whole set of representation coefficients to reconstruct each sample, whereas the original RLSDP only utilizes the partial representation coefficients corresponding to the same class. Thus, ERLSDP achieves smaller reconstruction error and better classification performance. 2) We build a weight matrix to explicitly characterize the within-class geometry of the data and minimize the distances between all the within-class samples. Meanwhile, by maximizing the between-class scatter, samples sharing the same class label are pulled together while those from different classes are pushed apart, which is a very desirable property for classification tasks.

The rest of this paper is structured as follows. In Section 2, the regularized least square (RLS) formulation and RLSDP are briefly reviewed. The proposed ERLSDP is detailed in Section 3. The experimental results are presented in Section 4, and the conclusions are given in Section 5.

2. RLS and RLSDP

Given a set of n training samples $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ with C classes, where $x_i \in \mathbb{R}^m$ is the ith sample. Based on the class labels, X can also be partitioned as $X = [X_1, X_2, \ldots, X_C]$, where $X_c = [x_1^c, x_2^c, \ldots, x_{n_c}^c] \in \mathbb{R}^{m \times n_c}$ contains the samples associated with class c, $x_j^c$ denotes the jth sample in the cth class, and $n_c$ is the number of samples in class c.

2.1. RLS

According to [24], both SRC and CRC can be unified in the regularized least square formulation

$$\min_{s_i} \; \| x_i - X s_i \|_2^2 + \lambda \| s_i \|_q, \qquad (1)$$

where $\lambda > 0$ is the regularization parameter and q is often taken as 1 or 2. When $q = 1$, Eq. (1) is the sparse representation method, which has no closed-form solution and must be solved iteratively. If $\lambda$ is large enough, some of the elements of $s_i$ will be close to zero, leading to a sparse solution. The case $q = 2$ is the collaborative representation method, which has an analytical solution and is computationally more efficient.

2.2. RLSDP


RLSDP is a supervised feature extraction method based on the $\ell_2$-norm regularized least square. It minimizes the reconstruction error of each sample using the coefficients from the same class, and simultaneously maximizes the between-class separation. For each training sample $x_i$, its reconstruction coefficient vector $s_i$ is obtained by Eq. (1) with $q = 2$, where $s_i = [s_{i,1}, \ldots, s_{i,i-1}, 0, s_{i,i+1}, \ldots, s_{i,n}]^T \in \mathbb{R}^n$. The goals of RLSDP correspond to the following two optimization problems:

$$\min_P \; \sum_{i=1}^{n} \Big\| P^T x_i - \sum_{j=1}^{n} \tilde{s}_{i,j} P^T x_j \Big\|_2^2, \qquad (2)$$

$$\max_P \; \operatorname{tr}(P^T S_b P), \qquad (3)$$

where $P \in \mathbb{R}^{m \times d}$ ($d \ll m$) is the low-dimensional projection matrix, $S_b = (1/n) \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T$ is the between-class scatter in LDA, in which $m = (1/n) \sum_{i=1}^{n} x_i$ and $m_i = (1/n_i) \sum_{j=1}^{n_i} x_j^i$ are the total mean and the mean of the ith class, respectively, $\operatorname{tr}(\cdot)$ is the trace operator, and $\tilde{s}_{i,j}$ takes the form

$$\tilde{s}_{i,j} = \begin{cases} s_{i,j}, & \text{if } x_i \text{ and } x_j \text{ are in the same class}, \\ 0, & \text{otherwise}. \end{cases} \qquad (4)$$
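As a concrete illustration, the collaborative coefficients $s_i$ of Eq. (1) with $q = 2$ and the class-restricted coefficients $\tilde{s}_{i,j}$ of Eq. (4) can be sketched in NumPy as follows. All names are ours, not the paper's; each sample is excluded from its own dictionary so that $s_{i,i} = 0$, matching the definition of $s_i$ above.

```python
import numpy as np

def rls_coefficients(X, labels, lam=0.1):
    """Columns of S are the l2-regularized coefficients s_i of Eq. (1) (q = 2),
    with sample i excluded from its own dictionary (s_{i,i} = 0).
    S_tilde keeps only the same-class entries, as in Eq. (4)."""
    labels = np.asarray(labels)
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                      # leave x_i out
        Xi = X[:, idx]
        # ridge-regression closed form: (Xi^T Xi + lam I)^{-1} Xi^T x_i
        s = np.linalg.solve(Xi.T @ Xi + lam * np.eye(n - 1), Xi.T @ X[:, i])
        S[idx, i] = s
    mask = (labels[:, None] == labels[None, :]).astype(float)
    S_tilde = S * mask                               # zero out cross-class entries
    return S, S_tilde
```

Reconstructing $x_i$ as `X @ S[:, i]` uses all training samples, while `X @ S_tilde[:, i]` uses only same-class samples, which is the restriction ERLSDP removes.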

The objective function of RLSDP is defined as

$$\max_P \; \frac{\operatorname{tr}(P^T S_b P)}{\operatorname{tr}\big( P^T X (I - \tilde{S} - \tilde{S}^T + \tilde{S}\tilde{S}^T) X^T P \big)}, \qquad (5)$$

where $\tilde{S} = [\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n] \in \mathbb{R}^{n \times n}$ and I is an identity matrix of proper size. Eq. (5) can be solved via the maximum generalized eigenvalue problem

$$S_b p = \lambda X (I - \tilde{S} - \tilde{S}^T + \tilde{S}\tilde{S}^T) X^T p,$$

where $\lambda$ is the maximum eigenvalue and p is the corresponding eigenvector.

3. Enhanced regularized least square based discriminative projections

3.1. Motivations

It is seen from Eq. (2) that, to minimize the within-class compactness, RLSDP only minimizes the reconstruction error between each sample $x_i$ and its reconstruction by the coefficients $\tilde{s}_i$, whose elements are defined in Eq. (4). There are two main problems. First, using $\tilde{s}_i$ to reconstruct $x_i$ yields larger error, since the non-zero values in $\tilde{s}_i$ are associated only with the same class as $x_i$. Second, RLSDP neglects the within-class geometry, which is very important for characterizing the within-class compactness. Eq. (2) indicates that RLSDP merely minimizes the error between each $x_i$ and its own reconstruction; it does not minimize the distances between each $x_i$ and all the other reconstructed samples sharing the same class label as $x_i$. Accordingly, samples in the same class will not be clustered together in the projected space of RLSDP. That is to say, the within-class compactness cannot be guaranteed in RLSDP.

Based on the above analysis, we propose an enhanced RLSDP (ERLSDP) which simultaneously addresses the two problems in RLSDP. In what follows, the proposed ERLSDP is first described in detail, followed by a discussion of it.

3.2. ERLSDP

In order to achieve our goals, we modify Eq. (2) as

$$\min_P \; \sum_{i,j=1}^{n} \big\| P^T x_i - P^T \tilde{x}_j \big\|_2^2 \, W_{ij}, \qquad (6)$$

where $\tilde{x}_j = X s_j$ is the reconstructed sample of $x_j$, and $W_{ij}$ models the within-class geometry:

$$W_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are from the same class}, \\ 0, & \text{otherwise}. \end{cases} \qquad (7)$$

Performing some simple algebraic operations and considering $\tilde{x}_j = X s_j$, Eq. (6) can be rewritten as

$$
\begin{aligned}
\min_P \; \sum_{i,j=1}^{n} \big\| P^T x_i - P^T X s_j \big\|_2^2 \, W_{ij}
&= \operatorname{tr}\Big( P^T X \sum_{i,j=1}^{n} (e_i - s_j) W_{ij} (e_i - s_j)^T X^T P \Big) \\
&= \operatorname{tr}\Big( P^T X \sum_{i,j=1}^{n} \big( e_i W_{ij} e_i^T - e_i W_{ij} s_j^T - s_j W_{ij} e_i^T + s_j W_{ij} s_j^T \big) X^T P \Big) \\
&= \operatorname{tr}\big( P^T X (D^W - W S^T - S W + S D^W S^T) X^T P \big) \\
&= \operatorname{tr}\big( P^T X M_W X^T P \big),
\end{aligned} \qquad (8)
$$

where $e_i = [0_1, \ldots, 0_{i-1}, 1, 0_{i+1}, \ldots, 0_n]^T \in \mathbb{R}^n$, $S = [s_1, s_2, \ldots, s_n] \in \mathbb{R}^{n \times n}$ is the reconstruction coefficient matrix, $D^W$ is the diagonal matrix with elements $D_{ii}^W = \sum_{j=1}^{n} W_{ij}$, and $M_W = D^W - W S^T - S W + S D^W S^T$.

For classification purposes, it is desirable to simultaneously minimize the within-class compactness and maximize the between-class separation. As a result, we formulate the objective function of ERLSDP as

$$\max_P \; \frac{\operatorname{tr}(P^T S_b P)}{\operatorname{tr}(P^T X M_W X^T P)}. \qquad (9)$$

The optimal projection P can be solved by the maximum generalized eigenvalue problem

$$S_b P = \lambda X M_W X^T P. \qquad (10)$$

Let $p_1, p_2, \ldots, p_d$ be the eigenvectors of Eq. (10) corresponding to the d maximum eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$; then $P = [p_1, p_2, \ldots, p_d]$ is the optimal projection matrix. In small sample size (SSS) cases, the dimension of the training sample vector is larger than the training sample size, which makes $X M_W X^T$ singular. We thus adopt PCA to reduce the dimension by discarding the smallest principal components, so that $X M_W X^T$ is nonsingular in the PCA subspace.

3.3. The outline of ERLSDP

The main procedure of ERLSDP is summarized as follows:

Step 1: Use PCA to preprocess the training data X and discard the smallest principal components. We still use X to denote the training data after PCA projection, and denote the PCA projection matrix by $V_{PCA}$.

Step 2: Calculate the reconstruction coefficient matrix $S = [s_1, s_2, \ldots, s_n]$ by Eq. (1) with $q = 2$, and construct the weight matrix W by Eq. (7).

Step 3: Compute the between-class scatter matrix $S_b = (1/n) \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T$ and the matrix $X M_W X^T$ by Eq. (8).

Step 4: Perform the generalized eigenvalue decomposition of Eq. (10) to get the optimal projection matrix $P = [p_1, p_2, \ldots, p_d]$, whose columns are the eigenvectors corresponding to the d largest eigenvalues.

Step 5: The final projection matrix is $V = V_{PCA} P$. For each sample x, its low-dimensional embedding is $y = V^T x$.
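Putting Steps 1-5 together, a compact NumPy sketch of the whole pipeline might look as follows. All names are ours; for brevity this sketch codes each sample over the full training set (rather than excluding it from its own dictionary) and adds a tiny jitter for numerical stability, both of which are our simplifications rather than part of the paper.

```python
import numpy as np

def erlsdp(X, labels, d, lam=0.1, energy=0.98):
    """X: m x n training matrix (columns are samples); returns V (m x d)."""
    labels = np.asarray(labels)
    m, n = X.shape
    # Step 1: PCA keeping `energy` of the data variance.
    Xc = X - X.mean(axis=1, keepdims=True)
    U, sv, _ = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(sv**2) / np.sum(sv**2), energy)) + 1
    Vpca = U[:, :k]
    Z = Vpca.T @ X                                   # data in the PCA subspace
    # Step 2: collaborative coefficients (Eq. (1), q = 2) and the weight matrix W.
    S = np.linalg.solve(Z.T @ Z + lam * np.eye(n), Z.T @ Z)
    W = (labels[:, None] == labels[None, :]).astype(float)
    # Step 3: between-class scatter S_b and X M_W X^T (Eq. (8)).
    mu = Z.mean(axis=1, keepdims=True)
    Sb = np.zeros((k, k))
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        diff = Zc.mean(axis=1, keepdims=True) - mu
        Sb += Zc.shape[1] * (diff @ diff.T)
    Sb /= n
    Dw = np.diag(W.sum(axis=1))
    Mw = Dw - W @ S.T - S @ W + S @ Dw @ S.T
    B = Z @ Mw @ Z.T + 1e-8 * np.eye(k)   # jitter keeps B invertible in this sketch
    # Step 4: top-d generalized eigenvectors of S_b p = lambda * B p (Eq. (10)).
    evals, evecs = np.linalg.eig(np.linalg.solve(B, Sb))
    order = np.argsort(-evals.real)[:d]
    P = evecs[:, order].real
    # Step 5: compose with the PCA projection; embed a new sample as y = V.T @ x.
    return Vpca @ P
```

A dedicated generalized symmetric eigensolver (e.g. `scipy.linalg.eigh(Sb, B)`) would be the more robust choice for Step 4; the plain `eig` above keeps the sketch NumPy-only.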

3.4. Discussions 

The first difference between RLSDP and the proposed ERLSDP is that the partial coefficients $\tilde{s}_i$ in RLSDP are replaced by the whole coefficients $s_i$ (for convenience, the variant involving only this change is denoted ERLSDP-1). We adopt the ORL database to show the benefits of this change; the description of the ORL database is postponed to Section 4. Fig. 1 gives the reconstruction errors of twenty samples (belonging to the first two classes) under RLSDP and ERLSDP-1. As shown in Fig. 1, the reconstruction errors of ERLSDP-1 are much smaller than those of RLSDP. Accordingly, the objective function value of ERLSDP-1 becomes larger, which is exactly what we are looking for. To further test their classification performance, we randomly choose 4, 5, 6 and 7 images from each subject to construct the training set, and use the rest as the testing set. Each experiment is independently repeated 20 times. It should be emphasized that the experimental settings here are the same as those in Section 4. The maximum average accuracies of RLSDP and ERLSDP-1 (with their best parameters) are shown in Fig. 2. It can be seen that ERLSDP-1 achieves better recognition accuracies than RLSDP, and the gap becomes smaller as the number of training samples per class increases. This is probably because the larger coefficients may not concentrate on a sample's own class when the number of training samples per class is small, in which case the collaborative representation effect of samples with different labels is more critical; RLSDP therefore has larger reconstruction errors. This phenomenon is alleviated as the number of training samples per class increases, since the larger coefficients are then more likely to be centered on the sample's own class.

To illustrate the second difference between RLSDP and ERLSDP, we split Eq. (6) into two parts


$$
\begin{aligned}
\min_P \; \sum_{i,j=1}^{n} \big\| P^T x_i - P^T \tilde{x}_j \big\|_2^2 \, W_{ij}
&= \underbrace{\sum_{i=1}^{n} \big\| P^T x_i - P^T \tilde{x}_i \big\|_2^2}_{\text{part 1}}
 + \underbrace{\sum_{\substack{i,j=1 \\ i \neq j}}^{n} \big\| P^T x_i - P^T \tilde{x}_j \big\|_2^2 \, W_{ij}}_{\text{part 2}} \\
&= \sum_{i=1}^{n} \Big\| P^T x_i - \sum_{j=1}^{n} s_{i,j} P^T x_j \Big\|_2^2
 + \sum_{\substack{i,j=1 \\ i \neq j}}^{n} \big\| P^T x_i - P^T \tilde{x}_j \big\|_2^2 \, W_{ij},
\end{aligned} \qquad (11)
$$

where part 1 corresponds to minimizing the error between each sample $x_i$ and its reconstruction $\tilde{x}_i = \sum_{j=1}^{n} s_{i,j} x_j$, and, due to the constraints of W, part 2 corresponds to minimizing the distances between each $x_i$ and the other

AN US

RLSDP. From this point of view, the proposed ERLSDP can minimize the distances between all the within-class samples. Therefore, all the samples belonging to the same class will be pulled together, which means that the within-class samples will be more compact after ERLSDP projection. Moreover, by maximizing the between-class

M

scatter at the same time, the distances between different classes will be drawn apart, which benefits for

ED

classification.

To clearly manifest the compact effect produced by our proposed ERLSDP, we perform two-dimensional (2-D)

PT

visualization experiments and compare it with the original RLSDP. The public available ORL and AR face

CE

databases are adopted in this experiment. The introduction of AR database can refer to Section 4, and those images only with illuminations and expressions (1400 images in total) are used. Specifically, for both databases, we

AC

randomly select 4 images from each class as the training set to learn the projection directions of RLSDP and ERLSDP, and then project all images onto a 2-D space. The scatter plots of six classes (denoted as Class 1 to Class 6) randomly picked from ORL and AR databases by RLSDP projection are illustrated in Fig. 3(a) and Fig. 3(c) respectively, and the scatter plots of the same samples using ERLSDP method are shown in Fig. 3(b) and Fig. 3(d). From Fig. 3, we observe that the 2-D embedding results of RLSDP tend to be scattered, and some data points from different classes mix together, which will easily increase the wrong classification rate of them. By contrast, the 9


distributions of the within-class samples under the proposed ERLSDP are more compact, and the margins between different classes are greater than those of RLSDP, which intuitively demonstrates the superiority of ERLSDP.

4. Experimental results

To show the effectiveness of ERLSDP, we compare it with CRP [21], LDA [2], MFA [7], DSNPE [18], DLPP [6], CRRP [23] and RLSDP [24] on three face databases, namely ORL [25], AR [26] and FERET [27]. For MFA, we empirically set the neighbor parameter $k_1$ to $n_i - 1$ and select $k_2$ from $\{1C, 3C, 5C, 7C, 9C\}$, where

$n_i$ and C are the number of training samples in class i and the number of classes, respectively. The publicly available solver $\ell_1$-magic (http://users.ece.gatech.edu/~justin/l1magic/) is used for DSNPE. The regularization parameter $\lambda$ is fixed at 0.1 in the proposed ERLSDP and searched over a grid for CRP, CRRP and RLSDP. For all the algorithms, we first use PCA to reduce the dimension while keeping 98% of the data energy, so as to alleviate the small sample size (SSS) problem. The nearest neighbor (1NN) classifier with the Euclidean metric is used for classification. All our experiments are conducted in MATLAB R2010b on a notebook with an Intel(R) Core(TM) i5-2450M 2.50 GHz CPU and 4 GB memory.

The ORL face database contains 400 images of 40 individuals, with 10 images per individual. The images were taken at different times and vary in facial expression and facial details. All the images are cropped to 32 × 32 pixels. The full AR database contains over 4000 color images of 126 individuals, including 70 men and 56 women. A subset of the AR face database provided by Martinez is used in our experiments. It contains 2600 images of 100 people (50 men and 50 women), each with 26 different images. These images were taken in two sessions of 13 images each, comprising 7 full facial images (with illumination changes and expressions) and 6 occlusion images (3 with sunglasses and 3 with scarves). The images are cropped to 55 × 40 pixels. The FERET face database was sponsored by the U.S. Department of Defense. We exploit a subset of the FERET database which contains 1400 images of 200 individuals with variations in facial expression,


illumination and pose. The images are cropped to 32 × 32 pixels. Some images from these databases are shown in Fig. 4. All the images are transformed into long vectors in a fixed order, and each vector is normalized to unit length.

We start our experiments using full facial images without occlusions. For each database, several images randomly selected from each subject are used for training, and the rest are used for testing. Specifically, 4, 5, 6 and 7 images are randomly selected from ORL, and three, four and five images are randomly selected from both AR (1400 face images with illumination changes and expressions) and FERET. Each experiment is independently repeated 20 times and the averaged results are reported. Tables 1-3 list the maximum average accuracies, the standard deviations and the corresponding dimensions of all the algorithms on ORL, AR and FERET, respectively. The best results are in boldface. Fig. 5 shows the average accuracies as the dimension varies on ORL, AR and FERET when the number of training samples per class is fixed at 5. Fig. 6 plots the maximum average accuracies of RLSDP and the proposed ERLSDP as $\lambda$ varies on the three databases when the number of training samples per class is 4. The average training time (in seconds) of each method on the FERET database across the 20 runs is tabulated in Table 4.
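For reference, the evaluation step of the protocol above (project with the learned V, then classify with 1NN under the Euclidean metric) amounts to the following sketch; names are ours, and columns are samples as elsewhere in the paper.

```python
import numpy as np

def nn_accuracy(V, Xtr, ytr, Xte, yte):
    # Project both sets with V, then assign each test sample the label of
    # its nearest training sample in Euclidean distance (1NN classifier).
    Ztr, Zte = V.T @ Xtr, V.T @ Xte
    d2 = np.sum((Zte[:, :, None] - Ztr[:, None, :]) ** 2, axis=0)  # n_te x n_tr
    pred = np.asarray(ytr)[np.argmin(d2, axis=1)]
    return float(np.mean(pred == np.asarray(yte)))
```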

To further investigate the performance of the proposed ERLSDP, we conduct three experiments on the AR database with sunglasses and scarf occlusions. The first experiment considers face images occluded by sunglasses, where the occlusion rate is about 20%. We randomly select 4 of the 14 full facial images and 1 of the 6 sunglasses-occluded images per subject for training, and use the remaining full facial images and sunglasses-occluded images for testing. In this case, we have 500 training images and 1500 testing images. The second experiment considers face images with scarves, which occlude about 40% of the image. Here we randomly select 4 of the 14 full facial images and 1 scarf-occluded image for training, and the remaining full images and scarf-occluded images for testing. The last experiment considers the


impact of images with both sunglasses and scarves. In this experiment, we randomly select four full facial images, one sunglasses-occluded image and one scarf-occluded image for training, and the rest for testing; this gives 600 training images and 2000 testing images. Each experiment is independently repeated 20 times. Since the sunglasses or scarves cover a portion of the face region, they can be viewed as large noise or outliers. Fig. 7 shows the maximum average recognition accuracies and the corresponding standard deviations of the eight methods in the three experiments, and Fig. 8 plots the recognition accuracies of all the methods across the 20 splits for the different occlusion settings.

From Tables 1-4 and Figs. 5-8, we make the following observations. (1) CRP performs the worst on all databases since it is unsupervised in nature, and label information is crucial for classification. (2) The proposed ERLSDP consistently outperforms RLSDP for different training sample sizes. (3) The improvements of ERLSDP over the other methods are more evident when the training sample size per class is small. (4) ERLSDP is robust to the regularization parameter $\lambda$ over a wide range, and its recognition performance is consistently superior to that of RLSDP as $\lambda$ varies. (5) ERLSDP is slightly slower than RLSDP in the training stage, because ERLSDP needs more matrix-matrix multiplications than RLSDP, as can be seen from their objective functions in Eq. (9) and Eq. (5), respectively. However, ERLSDP is faster than DSNPE: ERLSDP uses the $\ell_2$ norm to calculate the collaborative representation coefficients and has a closed-form solution, while the $\ell_1$ norm used in DSNPE has no analytical solution and must be solved iteratively. (6) ERLSDP achieves the best performance among all the compared methods, on both the full facial images and the occlusion images with sunglasses and scarves.

5. Conclusions and future work

In this paper, we propose ERLSDP for feature extraction. Compared with the original RLSDP, ERLSDP utilizes all the corresponding coefficients of each sample, so that it achieves better reconstruction accuracy. In addition, ERLSDP explicitly minimizes the distances between all the within-class samples at the same time; it therefore makes the within-class samples more compact, which is desirable for classification. Experimental results on three face databases validate its effectiveness.

Although our ERLSDP achieves better performance, it is still prone to being badly affected by extreme outliers or complex noise conditions, since we adopt the $\ell_2$ norm to measure both the loss function in Eq. (1) and the distances between pairwise points in the objective function of Eq. (9). When there are outliers, the $\ell_2$ norm magnifies the effect of the large deviations they cause, leading to unsatisfactory results. To address this problem, several outlier-robust metrics have been proposed, such as the $\ell_1$ norm [20, 28-31], the $\ell_{2,1}$ norm [32-34] and correntropy [35-37]. Moreover, a weighting approach is presented in [38] to suppress outliers. It would be very interesting to further study the performance of ERLSDP with different metrics and weighting schemes for more robust learning. We leave this as future work.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant 61271293.

References

[1] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci., 3 (1991) 71-86.


[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell., 19 (1997) 711-720.

[3] X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell., 27 (2005) 328-340.

[4] J. Gui, W. Jia, L. Zhu, S.-L. Wang, D.-S. Huang, Locality preserving discriminant projections for face and palmprint recognition, Neurocomputing, 73 (2010) 2696-2707.
[5] T. Zhang, D. Tao, X. Li, J. Yang, Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng., 21 (2009) 1299-1313.
[6] W. Yu, X. Teng, C. Liu, Face recognition using discriminant locality preserving projections, Image Vis. Comput., 24 (2006) 239-248.
[7] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell., 29 (2007) 40-51.
[8] J. Li, X. Lin, X. Rui, Y. Rui, D. Tao, A distributed approach toward discriminative distance metric learning, IEEE Trans. Neural Netw. Learn. Syst., 26 (2015) 2111-2122.
[9] S.S. Ho, P. Dai, F. Rudzicz, Manifold learning for multivariate variable-length sequences with an application to similarity search, IEEE Trans. Neural Netw. Learn. Syst., 27 (2016) 1333-1344.
[10] Q. Gao, Y. Huang, H. Zhang, X. Hong, K. Li, Y. Wang, Discriminative sparsity preserving projections for image recognition, Pattern Recognit., 48 (2015) 2543-2553.
[11] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 31 (2009) 210-227.
[12] T. Zhou, D. Tao, Double shrinking sparse dimension reduction, IEEE Trans. Image Process., 22 (2013) 244-257.
[13] H. Cheng, Z. Liu, L. Yang, X. Chen, Sparse representation and learning in visual recognition: theory and applications, Signal Process., 93 (2013) 1408-1425.
[14] S. Huang, Y. Yang, D. Yang, L. Huangfu, X. Zhang, Class specific sparse representation for classification, Signal Process., 116 (2015) 38-42.
[15] X. Gao, N. Wang, D. Tao, X. Li, Face sketch–photo synthesis and retrieval using sparse representation, IEEE Trans. Circuits Syst. Video Technol., 22 (2012) 1213-1226.

[16] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognit., 43 (2010) 331-341.

[17] B. Cheng, J. Yang, S. Yan, Y. Fu, T.S. Huang, Learning with l1-graph for image analysis, IEEE Trans. Image Process., 19 (2010) 858-866.

[18] J. Gui, Z. Sun, W. Jia, R. Hu, Y. Lei, S. Ji, Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognit., 45 (2012) 2884-2893.


[19] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition?, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 471-478.
[20] L. Zhang, M. Yang, X. Feng, Y. Ma, D. Zhang, Collaborative representation based classification for face recognition, arXiv preprint arXiv:1204.2358, 2012.

[21] W. Yang, Z. Wang, C. Sun, A collaborative representation based projections method for feature extraction, Pattern Recognit., 48 (2015) 20-27.


[22] J. Yin, L. Wei, M. Song, W. Zeng, Optimized projection for Collaborative Representation based Classification and its applications to face recognition, Pattern Recognit. Lett., 73 (2016) 83-90.

CE

[23] J. Hua, H. Wang, M. Ren, H. Huang, Dimension reduction using collaborative representation reconstruction based projections, Neurocomputing, 193 (2016) 1-6. [24] W. Yang, C. Sun, W. Zheng, A regularized least square based discriminative projections for feature extraction,

AC

Neurocomputing, 175 (2016) 198-205. [25] F.S. Samaria, A.C. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of the Second IEEE Workshop on Applications of Computer Vision, Sarasota, FL, 1994, pp. 138-142. [26] A.M. Martínez, A.C. Kak, Pca versus lda, IEEE Trans. Pattern Anal. Mach. Intell., 23 (2001) 228-233. [27] P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput., 16 (1998) 295-306. [28] H. Wang, X. Lu, Z. Hu, W. Zheng, Fisher Discriminant Analysis With L1-Norm, IEEE T. Cybern., 44 (2014) 828-842. [29] F. Zhong, J. Zhang, D. Li, Discriminant Locality Preserving Projections Based on L1-Norm Maximization, IEEE Trans. Neural Netw. Learn. Syst., 25 (2014) 2065-2074. [30] H. Wang, F. Nie, H. Huang, Robust Distance Metric Learning via Simultaneous L1-Norm Minimization and 14

ACCEPTED MANUSCRIPT

Maximization, in: Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 1836-1844. [31] G.-F. Lu, J. Zou, Y. Wang, L1-norm and maximum margin criterion based discriminant locality preserving projections via trace Lasso, Pattern Recognit., 55 (2016) 207-214. [32] C.-X. Ren, D.-Q. Dai, H. Yan, Robust classification using ℓ 2, 1-norm based regression model, Pattern Recognit., 45 (2012) 2708-2718. [33] F. Nie, H. Huang, X. Cai, C.H. Ding, Efficient and Robust Feature Selection via Joint ℓ2, 1-Norms Minimization, in: Advances in Neural Information Processing Systems, 2010, pp. 1813-1821. IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013) 171-184.

CR IP T

[34] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust Recovery of Subspace Structures by Low-Rank Representation, [35] R. He, B.G. Hu, W.S. Zheng, X.W. Kong, Robust Principal Component Analysis Based on Maximum Correntropy Criterion, IEEE Trans. Image Process., 20 (2011) 1485-1494.

[36] R. He, W.S. Zheng, B.G. Hu, Maximum Correntropy Criterion for Robust Face Recognition, IEEE Trans. Pattern Anal. Mach. Intell., 33 (2011) 1561-1576.

[37] W. Liu, P.P. Pokharel, J.C. Principe, Correntropy: Properties and Applications in Non-Gaussian Signal Processing,

AN US

IEEE Trans. Signal Process., 55 (2007) 5286-5298.

[38] C.X. Ren, D.A.I. D. Q, X. He, H. Yan, Sample Weighting: An Inherent Approach for Outlier Suppressing

AC

CE

PT

ED

M

Discriminant Analysis, IEEE Trans. Knowl. Data Eng., 27 (2015) 3070-3083.

15

ACCEPTED MANUSCRIPT

Figure captions

Fig 1 Reconstruction errors of twenty samples by RLSDP and ERLSDP-1 on the ORL face database.

Fig 2 The maximum average accuracies of RLSDP and ERLSDP-1 on the ORL database when the number of training samples per class increases from 4 to 7.

Fig 3 Scatter plots of six classes randomly picked from the ORL and AR face databases by the RLSDP and ERLSDP methods, respectively. (a) RLSDP on ORL, (b) ERLSDP on ORL, (c) RLSDP on AR and (d) ERLSDP on AR.

Fig 4 Some samples from the three face databases.

Fig 5 The average accuracies versus dimensions of all methods on the ORL, AR and FERET face databases when the number of training samples per class is fixed at 5. (a) 5 Train on ORL, (b) 5 Train on AR and (c) 5 Train on FERET.

Fig 6 The maximum average accuracies of RLSDP and the proposed ERLSDP with the changes of λ when the number of training samples per class is 4. (a) 4 Train on ORL, (b) 4 Train on AR and (c) 4 Train on FERET.

Fig 7 The maximum average recognition accuracies and the corresponding standard deviations of eight methods under three experiments (sunglass, scarf, and sunglass+scarf) on the AR database.

Fig 8 The recognition accuracies of eight methods across 20 runs on the AR database using occlusion images such as sunglass and scarf. The x-axis represents 20 different partitions into training and testing sets and the y-axis is the corresponding recognition accuracy. (a) Sunglass, (b) Scarf and (c) Sunglass+Scarf.

Fig 1 (line plot: Reconstruction Error vs. Index of Samples, 0-20; curves: RLSDP, ERLSDP-1)

Fig 2 (plot: Recognition Accuracy for 4 Train to 7 Train; curves: RLSDP, ERLSDP-1)

Fig 3 (scatter plots, panels (a)-(d): RLSDP (ORL), ERLSDP (ORL), RLSDP (AR), ERLSDP (AR); Classes 1-6)

Fig 4 (sample images, panels: (a) ORL face database, (b) AR face database, (c) FERET face database)

Fig 5 (plots, panels (a)-(c): Recognition Accuracy vs. Dimensions; curves: CRP, LDA, MFA, DSNPE, DLPP, CRRP, RLSDP, ERLSDP)

Fig 6 (plots, panels (a)-(c): Recognition Accuracy vs. λ, from 1e-3 to 0.8; curves: RLSDP, ERLSDP)

Fig 7 (bar chart: Recognition Accuracy of CRP, LDA, MFA, DSNPE, DLPP, CRRP, RLSDP, ERLSDP under Sunglass, Scarf and Sunglass+Scarf)

Fig 8 (plots, panels (a)-(c): Recognition Accuracy vs. twenty different splits; curves: CRP, LDA, MFA, DSNPE, DLPP, CRRP, RLSDP, ERLSDP)

Table captions

Table 1 The maximum average accuracies (%), the standard deviations (%) and the corresponding dimensions of 8 algorithms on the ORL database.

Table 2 The maximum average accuracies (%), the standard deviations (%) and the corresponding dimensions of 8 algorithms on the AR database.

Table 3 The maximum average accuracies (%), the standard deviations (%) and the corresponding dimensions of 8 algorithms on the FERET database.

Table 4 The average training time (seconds) of different methods on the FERET database.

Table 1

Method    4 Train            5 Train            6 Train            7 Train
CRP       84.96±2.13(107)    88.70±2.59(100)    90.87±2.84(86)     92.54±2.01(152)
LDA       90.00±2.03(39)     93.95±1.65(39)     95.47±2.15(39)     96.42±2.51(38)
MFA       90.15±2.06(47)     93.90±1.67(43)     95.56±2.05(50)     96.71±1.88(54)
DSNPE     89.58±2.06(40)     93.58±1.82(40)     95.28±1.97(40)     96.37±2.23(40)
DLPP      90.02±2.04(40)     94.18±1.61(40)     95.69±2.26(45)     96.67±2.39(42)
CRRP      89.31±2.10(39)     93.25±1.81(39)     94.94±2.16(39)     96.13±2.69(37)
RLSDP     89.37±2.29(39)     93.80±1.88(39)     95.38±2.21(39)     96.50±2.34(39)
ERLSDP    94.29±1.66(39)     96.55±1.38(38)     97.37±1.44(39)     98.08±1.33(39)

Table 2

Method    3 Train            4 Train            5 Train
CRP       55.25±8.38(135)    64.90±8.22(156)    70.26±5.18(179)
LDA       86.15±1.50(72)     92.33±1.02(69)     95.18±0.86(69)
MFA       86.57±1.30(75)     92.51±1.21(65)     95.35±0.80(71)
DSNPE     85.85±1.46(62)     92.20±1.07(67)     95.25±0.85(49)
DLPP      86.15±1.47(72)     92.34±1.04(68)     95.18±0.86(69)
CRRP      85.41±1.35(72)     91.80±1.18(68)     94.80±0.72(76)
RLSDP     86.07±1.43(72)     92.40±0.98(68)     95.14±0.81(65)
ERLSDP    90.59±1.16(72)     95.03±1.07(66)     96.75±0.72(63)

Table 3

Method    3 Train            4 Train            5 Train
CRP       34.28±1.78(186)    39.01±1.65(203)    44.24±2.38(211)
LDA       77.22±1.69(22)     86.06±1.12(22)     89.88±0.94(18)
MFA       77.24±1.89(23)     86.05±0.94(22)     89.89±1.07(31)
DSNPE     76.49±1.84(23)     85.72±1.10(22)     89.80±1.00(23)
DLPP      77.30±1.66(22)     86.10±1.03(22)     89.89±1.07(18)
CRRP      74.44±1.96(23)     85.02±1.29(21)     89.13±1.06(22)
RLSDP     75.85±1.91(20)     85.34±1.21(19)     89.35±1.22(19)
ERLSDP    84.79±0.94(21)     88.69±0.83(25)     90.94±0.91(25)

Table 4

Method    3 Train    4 Train    5 Train
CRP       1.2675     3.1262     5.7782
LDA       0.7862     1.8946     3.4140
MFA       7.5660     15.0673    24.6021
DSNPE     5.1745     8.3795     11.9262
DLPP      0.8260     1.8883     3.4569
CRRP      18.5032    25.2518    31.7259
RLSDP     1.4133     3.2924     5.9217
ERLSDP    1.6816     3.9990     6.7766