Accepted Manuscript

Enhanced regularized least square based discriminative projections for feature extraction

Ming-Dong Yuan, Da-Zheng Feng, Wen-Juan Liu, Chun-Bao Xiao

PII: S0165-1684(17)30158-5
DOI: 10.1016/j.sigpro.2017.04.018
Reference: SIGPRO 6464

To appear in: Signal Processing

Received date: 19 September 2016
Revised date: 21 March 2017
Accepted date: 26 April 2017

Please cite this article as: Ming-Dong Yuan, Da-Zheng Feng, Wen-Juan Liu, Chun-Bao Xiao, Enhanced regularized least square based discriminative projections for feature extraction, Signal Processing (2017), doi: 10.1016/j.sigpro.2017.04.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Enhanced regularized least square based discriminative projections for feature extraction *
Ming-Dong Yuan, Da-Zheng Feng , Wen-Juan Liu, Chun-Bao Xiao National Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, Shaanxi, China
Abstract: The regularized least square based discriminative projections (RLSDP) method for feature extraction was recently proposed; it seeks discriminant projection directions that maximize the between-class scatter and minimize the within-class compactness. However, in RLSDP each sample is reconstructed only from the coefficients associated with its own class, which may lead to large reconstruction errors. Moreover, the distances between each sample and the other within-class samples, which characterize the most important within-class compactness information, are not minimized in RLSDP. To deal with these two problems, we propose an enhanced regularized least square based discriminative projections (ERLSDP) method. ERLSDP utilizes all the related coefficients of each sample for reconstruction and explicitly minimizes the distances between all the within-class samples, and thus it has better reconstruction accuracy and more discriminating power than RLSDP. Experimental results demonstrate that ERLSDP achieves a clear improvement over RLSDP when the training sample size is small.

Keywords: feature extraction; regularized least square; collaborative representation; sparse representation
1. Introduction
Feature extraction, which aims to produce compact and effective low-dimensional feature representations of high-dimensional data, has been extensively studied over the past several decades. Compared with the global principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] approaches, manifold learning methods are more appealing since they can discover the local intrinsic structure of data. Representative manifold learning methods include locality preserving projections (LPP) [3], locality preserving discriminant projections (LPDP) [4], discriminative locality alignment (DLA) [5], discriminant locality preserving projections (DLPP) [6], marginal Fisher analysis (MFA) [7], etc. Although their motivations are different, they can all be unified in the graph embedding (GE) framework [7], and their differences lie in graph construction. Manifold learning has found wide applications in various fields. For example, Li et al. [8] developed a discriminative distance metric learning (DML) algorithm based on manifold learning, and further derived a distributed and parallel computational scheme to deal with the large-scale metric learning problem. Reference [9] exploited the manifold learning method to analyze multivariate variable-length sequence data. Gao et al. [10] integrated local and global manifold structures for face and image classification.

* Corresponding author. Email addresses: [email protected] (D. Feng); [email protected] (M. Yuan)
Recently, sparse representation has shown promising performance in many domains [11-15]. For instance, Wright et al. [11] proposed sparse representation based classification (SRC) for face recognition. Zhou et al. [12] proposed a double shrinking algorithm (DSA) for sparse projection eigenvectors. Moreover, many research efforts [16, 17] have shown that the neighborhood relationship of each data point can be adaptively obtained by sparse representation methods, and the resulting ℓ1-graph is robust to noise. Based on the ℓ1-graph, Qiao et al. [16] proposed sparsity preserving projections (SPP) for feature extraction, which aims at preserving the sparse reconstruction relationship of the data in both the original space and the low-dimensional embedding space. By combining supervised SPP and the maximum margin criterion, Gui et al. [18] introduced a discriminant sparse neighborhood preserving embedding (DSNPE) algorithm. Gao et al. [10] gave discriminative sparsity preserving projections (DSPP), which first employs sparse representation to build an intrinsic graph and a penalty graph, and then integrates the global within-class structure for dimensionality reduction. Despite their good performance, sparse representation methods need to solve an ℓ1-norm minimization problem, which has high computational complexity.
Zhang et al. [19, 20] claimed that the collaborative representation mechanism is the key factor in the success of SRC, and proposed a collaborative representation based classification (CRC) method, which replaces the ℓ1 norm in SRC with the simpler ℓ2 norm. CRC has similar properties and competitive classification performance compared to SRC. Based on CRC, Yang et al. [21] constructed an ℓ2-graph and developed collaborative representation based projections (CRP) to preserve the collaborative reconstruction relationship of the data. Hua et al. [23] proposed collaborative representation reconstruction based projections (CRRP), in which the projection matrix is obtained by maximizing the collaborative reconstruction between-class scatter and minimizing the collaborative reconstruction within-class scatter; a similar method was proposed in [22]. In [24], Yang et al. developed regularized least square based discriminative projections (RLSDP), which maximizes the between-class scatter adopted by LDA and minimizes the within-class compactness through the reconstruction residual from the same class. However, RLSDP has two main problems. First, reconstructing each sample from the coefficients of its own class alone can incur large errors, so RLSDP cannot give the best reconstruction for each sample. Second, it does not minimize the distances between each sample and the other within-class samples, which is important for minimizing the within-class compactness.

To address these two problems, we propose an enhanced regularized least square based discriminative projections (ERLSDP) method. In ERLSDP, each sample is reconstructed from all its associated coefficients, which results in a smaller reconstruction error. More importantly, the distances between each sample and all its reconstructed within-class samples, which characterize the most important within-class compactness, are minimized. The optimal discriminant projection of ERLSDP is obtained by simultaneously maximizing the between-class scatter and minimizing the within-class compactness. Experiments on three face databases indicate that ERLSDP performs better than RLSDP.

The main contributions of our work are as follows. 1) We make use of the whole set of representation coefficients to reconstruct each sample, whereas the original RLSDP only uses the partial coefficients corresponding to the same class. Thus, our ERLSDP achieves smaller reconstruction error and better classification performance. 2) We build a weight matrix to explicitly characterize the within-class geometry of the data, and minimize the distances between all the within-class samples. Meanwhile, by maximizing the between-class scatter, samples sharing the same class label are pulled together while those from different classes are pushed apart, which is a very desirable property for classification tasks.

The rest of this paper is structured as follows. In Section 2, the regularized least square (RLS) formulation and RLSDP are briefly reviewed. The proposed ERLSDP is detailed in Section 3. The experimental results are presented in Section 4, and conclusions are given in Section 5.

2. RLS and RLSDP
Given a set of n training samples X = [x_1, x_2, \dots, x_n] \in R^{m \times n} from C classes, where x_i \in R^m is the ith sample. Based on the class labels, X can also be partitioned as X = [X_1, X_2, \dots, X_C], where X_c = [x_1^c, x_2^c, \dots, x_{n_c}^c] \in R^{m \times n_c} contains the samples associated with class c, x_j^c denotes the jth sample of the cth class, and n_c is the number of samples in class c.

2.1. RLS

According to [24], both SRC and CRC can be unified in the regularized least square formulation

\min_{s_i} \| x_i - X s_i \|_2^2 + \lambda \| s_i \|_q,   (1)

where \lambda > 0 is the regularization parameter, and q is often taken as 1 or 2. When q = 1, Eq. (1) is the sparse representation method, which has no closed-form solution and must be solved iteratively. If \lambda is large enough, some of the elements of s_i will be close to zero, leading to a sparse solution. The case q = 2 is the collaborative representation method, which has an analytical solution and is computationally more efficient.
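Reading the q = 2 penalty as the squared ℓ2 norm, as in CRC, Eq. (1) has the familiar closed-form ridge solution s_i = (X^T X + \lambda I)^{-1} X^T x_i. A minimal numpy sketch of this (our own illustration, not the authors' code; following the convention used later, each sample is removed from its own dictionary so that s_{i,i} = 0):

```python
import numpy as np

def rls_coefficients(X, lam=0.1):
    """Collaborative representation coefficients, Eq. (1) with q = 2.

    X: (m, n) matrix whose columns are training samples.
    For each x_i the dictionary is X with the i-th column removed,
    so s_{i,i} = 0. Returns the (n, n) coefficient matrix S = [s_1, ..., s_n].
    """
    m, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)            # leave x_i out
        D = X[:, idx]                               # (m, n-1) dictionary
        # closed-form ridge solution (D^T D + lam I)^{-1} D^T x_i
        S[idx, i] = np.linalg.solve(D.T @ D + lam * np.eye(n - 1),
                                    D.T @ X[:, i])
    return S
```

Each column s_i then reconstructs x_i as X s_i; zeroing the entries of s_i that belong to other classes recovers the partial coefficients used by RLSDP below.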
2.2. RLSDP

RLSDP is a supervised feature extraction method based on the ℓ2-norm regularized least square. It minimizes the reconstruction error of each sample using the coefficients from the same class, and simultaneously maximizes the between-class separation. For each training sample x_i, its reconstruction coefficient vector s_i = [s_{i,1}, \dots, s_{i,i-1}, 0, s_{i,i+1}, \dots, s_{i,n}]^T \in R^n can be obtained by Eq. (1) with q = 2. The goals of RLSDP correspond to the following two optimization problems:

\min_P \sum_{i=1}^{n} \left\| P^T x_i - \sum_{j=1}^{n} \bar{s}_{i,j} P^T x_j \right\|_2^2,   (2)

\max_P tr(P^T S_b P),   (3)

where P \in R^{m \times d} (d \le m) is the low-dimensional projection matrix, S_b = (1/n) \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T is the between-class scatter matrix of LDA, in which m = (1/n) \sum_{i=1}^{n} x_i and m_i = (1/n_i) \sum_{j=1}^{n_i} x_j^i are the total mean and the mean of the ith class, respectively, tr(\cdot) is the trace operator, and \bar{s}_{i,j} takes the form

\bar{s}_{i,j} = \begin{cases} s_{i,j}, & \text{if } x_i \text{ and } x_j \text{ are in the same class}, \\ 0, & \text{otherwise}. \end{cases}   (4)

The objective function of RLSDP is defined as

\max_P \frac{tr(P^T S_b P)}{tr(P^T X (I - \bar{S} - \bar{S}^T + \bar{S} \bar{S}^T) X^T P)},   (5)

where \bar{S} = [\bar{s}_1, \bar{s}_2, \dots, \bar{s}_n] \in R^{n \times n} and I is an identity matrix of proper size. Eq. (5) can be solved through the maximum generalized eigenvalue problem S_b p = \mu X (I - \bar{S} - \bar{S}^T + \bar{S} \bar{S}^T) X^T p, where \mu is the maximum eigenvalue and p is the corresponding eigenvector.

3. Enhanced regularized least square based discriminative projections

3.1. Motivations

It is seen from Eq. (2) that, to minimize the within-class compactness, RLSDP only minimizes the reconstruction error between each sample x_i and its reconstruction by the coefficients \bar{s}_i, whose elements are defined in Eq. (4). There are mainly two problems. First, using \bar{s}_i to reconstruct x_i will have larger error
since the non-zero values in \bar{s}_i are only associated with the same class as x_i. Second, RLSDP neglects the within-class geometry, which is very important for characterizing the within-class compactness. Eq. (2) indicates that RLSDP merely minimizes the error between each x_i and its reconstruction; it does not minimize the distances between each x_i and all the other reconstructed samples sharing the same class label as x_i. Accordingly, the samples of the same class will not be clustered together in the projected space by RLSDP. That is to say, the within-class compactness cannot be guaranteed in RLSDP.

Based on the above analysis, we propose an enhanced RLSDP (ERLSDP) which simultaneously addresses the two problems in RLSDP. In what follows, the proposed ERLSDP is first described in detail, followed by a discussion.

3.2. ERLSDP
In order to achieve our goals, we modify Eq. (2) as

\min_P \sum_{i,j=1}^{n} \| P^T x_i - P^T \hat{x}_j \|_2^2 W_{ij},   (6)

where \hat{x}_j = X s_j is the reconstructed sample of x_j, and W_{ij} is used to model the within-class geometry, defined as

W_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are from the same class}, \\ 0, & \text{otherwise}. \end{cases}   (7)

Performing some simple algebraic operations and using \hat{x}_j = X s_j, Eq. (6) can be rewritten as

\min_P \sum_{i,j=1}^{n} \| P^T x_i - P^T X s_j \|_2^2 W_{ij}
= \min_P tr\left( P^T X \sum_{i,j=1}^{n} (e_i - s_j) W_{ij} (e_i - s_j)^T X^T P \right)
= \min_P tr\left( P^T X \sum_{i,j=1}^{n} \left( e_i W_{ij} e_i^T - e_i W_{ij} s_j^T - s_j W_{ij} e_i^T + s_j W_{ij} s_j^T \right) X^T P \right)
= \min_P tr\left( P^T X \left( D_W - W S^T - S W + S D_W S^T \right) X^T P \right)
= \min_P tr\left( P^T X M_W X^T P \right),   (8)

where e_i = [0, \dots, 0, 1, 0, \dots, 0]^T \in R^n is the ith standard basis vector, S = [s_1, s_2, \dots, s_n] \in R^{n \times n} is the reconstruction coefficient matrix, D_W is the diagonal matrix with entries (D_W)_{ii} = \sum_{j=1}^{n} W_{ij}, and M_W = D_W - W S^T - S W + S D_W S^T.
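The matrix identity in Eq. (8) can be checked numerically: for any P and S, and a symmetric W as in Eq. (7), the weighted pairwise sum in Eq. (6) equals tr(P^T X M_W X^T P). A small numpy sketch of the construction (our own illustration; `build_MW` is a hypothetical helper name):

```python
import numpy as np

def build_MW(S, W):
    """M_W = D_W - W S^T - S W + S D_W S^T from Eq. (8).

    S: (n, n) reconstruction coefficient matrix with columns s_j.
    W: (n, n) symmetric 0/1 within-class weight matrix, Eq. (7).
    The compact form relies on the symmetry of W.
    """
    DW = np.diag(W.sum(axis=1))     # D_W: diagonal of row sums of W
    return DW - W @ S.T - S @ W + S @ DW @ S.T
```

Since W from Eq. (7) is symmetric by construction, the identity holds exactly (up to floating point).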
For classification purposes, it is desirable to simultaneously minimize the within-class compactness and maximize the between-class separation. As a result, we formulate the objective function of ERLSDP as

\max_P \frac{tr(P^T S_b P)}{tr(P^T X M_W X^T P)}.   (9)

The optimal projection P can be obtained from the maximum generalized eigenvalue problem

S_b p = \mu X M_W X^T p.   (10)

Let p_1, p_2, \dots, p_d be the eigenvectors of Eq. (10) corresponding to the d largest eigenvalues \mu_1 \ge \mu_2 \ge \dots \ge \mu_d; then P = [p_1, p_2, \dots, p_d] is the optimal projection matrix. In small sample size (SSS) cases, the dimension of the training sample vectors exceeds the number of training samples, which makes X M_W X^T singular. We thus adopt PCA to reduce the dimension by discarding the smallest principal components, so that X M_W X^T is nonsingular in the PCA subspace.

3.3. The outline of ERLSDP
The main procedure of ERLSDP is summarized as follows:

Step 1: Use PCA to preprocess the training data X and discard the smallest principal components. We still use X to denote the training data after the PCA projection, and denote the PCA projection matrix by V_PCA.

Step 2: Calculate the reconstruction coefficient matrix S = [s_1, s_2, \dots, s_n] by Eq. (1) with q = 2, and construct the weight matrix W by Eq. (7).

Step 3: Compute the between-class scatter matrix S_b = (1/n) \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T and the matrix X M_W X^T by Eq. (8).

Step 4: Perform the generalized eigenvalue decomposition of Eq. (10) to obtain the optimal projection matrix P = [p_1, p_2, \dots, p_d], whose columns are the eigenvectors corresponding to the d largest eigenvalues of Eq. (10).

Step 5: The final projection matrix is given by V = V_PCA P. For each sample x, its low-dimensional embedding is y = V^T x.
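The five steps above can be sketched end to end in numpy/scipy. This is a non-authoritative illustration under our own implementation choices (mean-centering before PCA, and a small regularizer added to X M_W X^T so the generalized eigensolver sees a positive definite matrix), not the authors' code:

```python
import numpy as np
from scipy.linalg import eigh

def erlsdp_fit(X, labels, lam=0.1, d=5, energy=0.98):
    """ERLSDP training sketch, Steps 1-5. X: (m, n), columns are samples."""
    m, n = X.shape
    # Step 1: PCA keeping `energy` of the variance
    mu = X.mean(axis=1, keepdims=True)
    U, sv, _ = np.linalg.svd(X - mu, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(sv**2) / np.sum(sv**2), energy)) + 1
    V_pca = U[:, :k]
    Y = V_pca.T @ (X - mu)                       # (k, n) projected data

    # Step 2: S by Eq. (1) with q = 2 (each sample left out of its own
    # dictionary), and the within-class weight matrix W by Eq. (7)
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        D = Y[:, idx]
        S[idx, i] = np.linalg.solve(D.T @ D + lam * np.eye(n - 1),
                                    D.T @ Y[:, i])
    W = (labels[:, None] == labels[None, :]).astype(float)

    # Step 3: between-class scatter S_b and the matrix X M_W X^T
    mtot = Y.mean(axis=1, keepdims=True)
    Sb = np.zeros((k, k))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        diff = Yc.mean(axis=1, keepdims=True) - mtot
        Sb += (Yc.shape[1] / n) * (diff @ diff.T)
    DW = np.diag(W.sum(axis=1))
    MW = DW - W @ S.T - S @ W + S @ DW @ S.T
    B = Y @ MW @ Y.T

    # Step 4: generalized eigenproblem S_b p = eigval * B p; keep d largest
    B += 1e-6 * np.trace(B) / k * np.eye(k)      # jitter keeps B positive definite
    vals, vecs = eigh(Sb, B)
    P = vecs[:, np.argsort(vals)[::-1][:d]]

    # Step 5: final projection V = V_PCA P; embed a sample as y = V^T (x - mu)
    return V_pca @ P, mu
```

Embedding then reduces to `V.T @ (X_test - mu)` followed by a 1NN classifier, mirroring the protocol in Section 4.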
3.4. Discussions
The first difference between RLSDP and the proposed ERLSDP is that the partial coefficients \bar{s}_i in RLSDP are replaced by the whole coefficients s_i (for convenience, the algorithm with only this change is denoted ERLSDP-1). We adopt the ORL database to show the benefits of this change; the description of the ORL database is postponed to Section 4. Fig. 1 gives the reconstruction errors of twenty samples (belonging to the first two classes) by RLSDP and ERLSDP-1. As shown in Fig. 1, the reconstruction errors of ERLSDP-1 are much smaller than those of RLSDP. Accordingly, the objective function value of ERLSDP-1 becomes larger, which is exactly what we want. To further test their classification performance, we randomly choose 4, 5, 6 and 7 images from each subject to construct the training set and use the rest as the testing set. Each experiment is independently repeated 20 times. It should be emphasized that the experimental settings here are the same as those in Section 4. The maximum average accuracies of RLSDP and ERLSDP-1 (with their best parameters) are shown in Fig. 2. It can be seen from Fig. 2 that ERLSDP-1 achieves better recognition accuracies than RLSDP, and the gap becomes smaller as the number of training samples per class increases. This is probably because, when the number of training samples per class is small, the larger coefficients may not concentrate on the sample's own class and the collaborative representation effect of samples with different labels is more critical; RLSDP therefore has larger reconstruction errors. This phenomenon is alleviated when the number of training samples per class increases, in which case the larger coefficients are approximately centered on the sample's own class with higher probability.

To illustrate the second difference between RLSDP and ERLSDP, we split Eq. (6) into two parts
\min_P \sum_{i,j=1}^{n} \| P^T x_i - P^T \hat{x}_j \|_2^2 W_{ij}
= \underbrace{\sum_{i=1}^{n} \left\| P^T x_i - \sum_{j=1}^{n} s_{i,j} P^T x_j \right\|_2^2}_{\text{part 1}} + \underbrace{\sum_{i,j=1, i \ne j}^{n} \| P^T x_i - P^T \hat{x}_j \|_2^2 W_{ij}}_{\text{part 2}},   (11)

where part 1 corresponds to minimizing the error between each sample x_i and its reconstruction \hat{x}_i = \sum_{j=1}^{n} s_{i,j} x_j, and, due to the constraints of W, part 2 corresponds to minimizing the distances between each x_i and the other reconstructed samples with the same class label as x_i. Compared with Eq. (2) in RLSDP, if the whole coefficients of the related sample are used, Eq. (2) becomes equivalent to part 1 in Eq. (11), but part 2 is neglected in RLSDP. From this point of view, the proposed ERLSDP minimizes the distances between all the within-class samples. Therefore, all the samples belonging to the same class are pulled together, which means that the within-class samples become more compact after the ERLSDP projection. Moreover, by simultaneously maximizing the between-class scatter, the distances between different classes are enlarged, which benefits classification.

To clearly show the compacting effect produced by the proposed ERLSDP, we perform two-dimensional (2-D) visualization experiments and compare it with the original RLSDP. The publicly available ORL and AR face databases are adopted in this experiment. The AR database is introduced in Section 4; only the images with illumination and expression variations (1400 images in total) are used. Specifically, for both databases, we randomly select 4 images from each class as the training set to learn the projection directions of RLSDP and ERLSDP, and then project all the images onto a 2-D space. The scatter plots of six classes (denoted Class 1 to Class 6) randomly picked from the ORL and AR databases under the RLSDP projection are illustrated in Fig. 3(a) and Fig. 3(c), respectively, and the scatter plots of the same samples using ERLSDP are shown in Fig. 3(b) and Fig. 3(d). From Fig. 3, we observe that the 2-D embeddings of RLSDP tend to be scattered, and some data points from different classes mix together, which easily increases their misclassification rate. By contrast, the
distributions of the within-class samples given by the proposed ERLSDP are more compact, and the margins between different classes are larger than those of RLSDP, which intuitively demonstrates the superiority of ERLSDP.

4. Experimental results

To show the effectiveness of ERLSDP, we compare it with CRP [21], LDA [2], MFA [7], DSNPE [18], DLPP [6], CRRP [23] and RLSDP [24] on three face databases, namely ORL [25], AR [26] and FERET [27]. For MFA, we empirically set the neighbor parameter k1 to n_i - 1 and select k2 from {1C, 3C, 5C, 7C, 9C}, where n_i and C are the number of training samples in class i and the number of classes, respectively. The publicly available solver ℓ1-magic (http://users.ece.gatech.edu/~justin/l1magic/) is used for DSNPE. The regularization parameter \lambda is fixed at 0.1 in the proposed ERLSDP and searched over a grid for CRP, CRRP and RLSDP. For all the algorithms, we first use PCA to reduce the dimension while keeping 98% of the data energy, so as to alleviate the small sample size (SSS) problem. The nearest neighbor (1NN) classifier with the Euclidean metric is used for classification. All our experiments are conducted in MATLAB R2010b on a notebook with an Intel(R) Core(TM) i5-2450M 2.50 GHz CPU and 4 GB memory.

The ORL face database contains 400 images of 40 individuals, with 10 images per individual. These images were taken at different times and vary in facial expressions and facial details. All the images are cropped to 32×32 pixels. The full AR database contains over 4000 color images of 126 individuals, including 70 men and 56 women. A subset of the AR face database provided by Martinez is used in our experiments. It contains 2600 images of 100 people (50 men and 50 women), each person having 26 different images. These images were taken in two sessions of 13 images each, including 7 full facial images (with illumination changes and expressions) and 6 occlusion images (3 with sunglasses and 3 with scarves). The images are cropped to 55×40 pixels. The FERET face database was sponsored by the U.S. Department of Defense. We use a subset of the FERET database which contains 1400 images of 200 individuals with variations of facial expression,
illumination and pose. The images are cropped to 32×32 pixels. Some images from these databases are shown in Fig. 4. All the images are transformed into long vectors in a fixed order, and each vector is normalized to unit length. We start our experiments using the full facial images without occlusions. For each database, several images randomly selected from each subject are used for training, and the rest for testing. Specifically, 4, 5, 6 and 7 images are randomly selected from ORL, while three, four and five images are randomly selected from both AR (the 1400 face images with illumination changes and expressions) and FERET. Each experiment is independently repeated 20 times and the averaged results are reported. Tables 1-3 list the maximum average accuracies, the standard deviations and the corresponding dimensions of all the algorithms on ORL, AR and FERET, respectively. The best results are in boldface. Fig. 5 shows the average accuracies versus the number of dimensions on ORL, AR and FERET when the number of training samples per class is fixed at 5. Fig. 6 plots the maximum average accuracies of RLSDP and the proposed ERLSDP as \lambda changes on the three databases when the number of training samples per class is 4. The average training time (in seconds) of each method on the FERET database across the 20 runs is tabulated in Table 4.

To further investigate the performance of our proposed ERLSDP, we conduct three experiments on the AR database with sunglasses and scarf occlusions. The first experiment considers face images occluded by sunglasses, where the occlusion rate is about 20%. We randomly select 4 images from the 14 full facial images and 1 image from the 6 occlusion images with sunglasses as the training set, and use the remaining full facial images and occlusion images with sunglasses as the testing set. In this case, we have 500 training images and 1500 testing images. The second experiment considers face images with scarves, which occlude about 40% of the image. In this experiment, we randomly select 4 of the 14 full facial images and 1 occlusion image with scarves for training, and the remaining full images and images with scarves for testing. The last experiment considers images with both sunglasses and scarves. In this experiment, we randomly select four full facial images, one occlusion image with sunglasses and one occlusion image with scarves for training, and the rest for testing. Therefore, we have 600 training images and 2000 testing images. Each experiment is independently repeated 20 times. Since the sunglasses or scarves cover a portion of the face region, they can be viewed as large noise or outliers. Fig. 7 shows the maximum average recognition accuracies and the corresponding standard deviations of the eight methods in the three experiments, and Fig. 8 plots the recognition accuracies of all the methods across the 20 different splits for the different occlusion settings.

From Tables 1-4 and Figs. 5-8, we make the following observations. (1) CRP performs the worst on all databases since it is unsupervised in nature, and the label information is crucial for classification. (2) The proposed ERLSDP consistently outperforms RLSDP for different training sample sizes. (3) The improvements of ERLSDP over the other methods are more evident when the training sample size per class is small. (4) ERLSDP is robust to the regularization parameter \lambda over a wide range, and its recognition performance is consistently superior to that of RLSDP as \lambda varies. (5) ERLSDP is slightly slower than RLSDP in the training stage, because it needs more matrix-matrix multiplications, as can be seen from the objective functions in Eq. (9) and Eq. (5), respectively. However, ERLSDP is faster than DSNPE, since ERLSDP uses the ℓ2 norm to calculate the collaborative representation coefficients, which admits a closed-form solution, while the ℓ1 norm used in DSNPE has no analytical solution and must be solved iteratively. (6) ERLSDP achieves the best performance among all the compared methods on both the full facial images and the occlusion images with sunglasses and scarves.
5. Conclusions and future work

In this paper, we have proposed ERLSDP for feature extraction. Compared with the original RLSDP, ERLSDP utilizes all the corresponding coefficients of each sample, so it achieves better reconstruction accuracy. In addition, ERLSDP explicitly minimizes the distances between all the within-class samples at the same time. It therefore makes the within-class samples more compact, which is desirable for classification. Experimental results on three face databases validate its effectiveness.

Although better performance has been achieved by our ERLSDP, it is still prone to be badly affected by extreme outliers or complex noise conditions, since we adopt the ℓ2 norm to measure both the loss function in Eq. (1) and the distances between pairwise points in the objective function of Eq. (9). When there are outliers, the ℓ2 norm magnifies the effect of outliers with large deviations, leading to unsatisfactory results. To address this problem, some outlier-robust metrics have been proposed, such as the ℓ1 norm [20, 28-31], the ℓ2,1 norm [32-34] and correntropy [35-37]. Moreover, a weighting approach was presented in [38] to suppress outliers. It would be very interesting to further study the performance of ERLSDP with different metrics and weighting schemes for more robust learning; we leave this as future work.
Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant 61271293.

References
[1] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci., 3 (1991) 71-86.
[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell., 19 (1997) 711-720.
[3] X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang, Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell., 27 (2005) 328-340.
[4] J. Gui, W. Jia, L. Zhu, S.-L. Wang, D.-S. Huang, Locality preserving discriminant projections for face and palmprint recognition, Neurocomputing, 73 (2010) 2696-2707.
[5] T. Zhang, D. Tao, X. Li, J. Yang, Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng., 21 (2009) 1299-1313.
[6] W. Yu, X. Teng, C. Liu, Face recognition using discriminant locality preserving projections, Image Vis. Comput., 24 (2006) 239-248.
[7] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell., 29 (2007) 40-51.
[8] J. Li, X. Lin, X. Rui, Y. Rui, D. Tao, A distributed approach toward discriminative distance metric learning, IEEE Trans. Neural Netw. Learn. Syst., 26 (2015) 2111-2122.
[9] S.S. Ho, P. Dai, F. Rudzicz, Manifold learning for multivariate variable-length sequences with an application to similarity search, IEEE Trans. Neural Netw. Learn. Syst., 27 (2016) 1333-1344.
[10] Q. Gao, Y. Huang, H. Zhang, X. Hong, K. Li, Y. Wang, Discriminative sparsity preserving projections for image recognition, Pattern Recognit., 48 (2015) 2543-2553.
[11] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 31 (2009) 210-227.
[12] T. Zhou, D. Tao, Double shrinking sparse dimension reduction, IEEE Trans. Image Process., 22 (2013) 244-257.
[13] H. Cheng, Z. Liu, L. Yang, X. Chen, Sparse representation and learning in visual recognition: Theory and applications, Signal Process., 93 (2013) 1408-1425.
[14] S. Huang, Y. Yang, D. Yang, L. Huangfu, X. Zhang, Class specific sparse representation for classification, Signal Process., 116 (2015) 38-42.
[15] X. Gao, N. Wang, D. Tao, X. Li, Face sketch–photo synthesis and retrieval using sparse representation, IEEE Trans. Circuits Syst. Video Technol., 22 (2012) 1213-1226.
[16] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognit., 43 (2010) 331-341.
[17] B. Cheng, J. Yang, S. Yan, Y. Fu, T.S. Huang, Learning with l1-graph for image analysis, IEEE Trans. Image Process., 19 (2010) 858-866.
[18] J. Gui, Z. Sun, W. Jia, R. Hu, Y. Lei, S. Ji, Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognit., 45 (2012) 2884-2893.
[19] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: Which helps face recognition?, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 471-478.
[20] L. Zhang, M. Yang, X. Feng, Y. Ma, D. Zhang, Collaborative representation based classification for face recognition, arXiv preprint arXiv:1204.2358, 2012.
[21] W. Yang, Z. Wang, C. Sun, A collaborative representation based projections method for feature extraction, Pattern Recognit., 48 (2015) 20-27.
[22] J. Yin, L. Wei, M. Song, W. Zeng, Optimized projection for collaborative representation based classification and its applications to face recognition, Pattern Recognit. Lett., 73 (2016) 83-90.
[23] J. Hua, H. Wang, M. Ren, H. Huang, Dimension reduction using collaborative representation reconstruction based projections, Neurocomputing, 193 (2016) 1-6.
[24] W. Yang, C. Sun, W. Zheng, A regularized least square based discriminative projections for feature extraction, Neurocomputing, 175 (2016) 198-205.
[25] F.S. Samaria, A.C. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of the Second IEEE Workshop on Applications of Computer Vision, Sarasota, FL, 1994, pp. 138-142.
[26] A.M. Martínez, A.C. Kak, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell., 23 (2001) 228-233.
[27] P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput., 16 (1998) 295-306.
[28] H. Wang, X. Lu, Z. Hu, W. Zheng, Fisher discriminant analysis with L1-norm, IEEE Trans. Cybern., 44 (2014) 828-842.
[29] F. Zhong, J. Zhang, D. Li, Discriminant locality preserving projections based on L1-norm maximization, IEEE Trans. Neural Netw. Learn. Syst., 25 (2014) 2065-2074.
[30] H. Wang, F. Nie, H. Huang, Robust distance metric learning via simultaneous L1-norm minimization and maximization, in: Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 1836-1844.
[31] G.-F. Lu, J. Zou, Y. Wang, L1-norm and maximum margin criterion based discriminant locality preserving projections via trace Lasso, Pattern Recognit., 55 (2016) 207-214.
[32] C.-X. Ren, D.-Q. Dai, H. Yan, Robust classification using ℓ2,1-norm based regression model, Pattern Recognit., 45 (2012) 2708-2718.
[33] F. Nie, H. Huang, X. Cai, C.H. Ding, Efficient and robust feature selection via joint ℓ2,1-norms minimization, in: Advances in Neural Information Processing Systems, 2010, pp. 1813-1821.
[34] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell., 35 (2013) 171-184.
[35] R. He, B.G. Hu, W.S. Zheng, X.W. Kong, Robust principal component analysis based on maximum correntropy criterion, IEEE Trans. Image Process., 20 (2011) 1485-1494.
[36] R. He, W.S. Zheng, B.G. Hu, Maximum correntropy criterion for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 33 (2011) 1561-1576.
[37] W. Liu, P.P. Pokharel, J.C. Principe, Correntropy: Properties and applications in non-Gaussian signal processing, IEEE Trans. Signal Process., 55 (2007) 5286-5298.
[38] C.-X. Ren, D.-Q. Dai, X. He, H. Yan, Sample weighting: An inherent approach for outlier suppressing discriminant analysis, IEEE Trans. Knowl. Data Eng., 27 (2015) 3070-3083.
15
ACCEPTED MANUSCRIPT
Figure captions
Fig 1 Reconstruction errors of twenty samples by RLSDP and ERLSDP-1 on ORL face database.
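The reconstruction error plotted in Fig 1 can be sketched as follows. This is a minimal illustration, assuming the standard ridge-regression form of the regularized least square coding step and class-wise reconstruction described in the abstract; the function name, the λ value, and the synthetic data are ours, not the paper's:

```python
import numpy as np

def class_reconstruction_error(X, labels, y, k, lam=0.01):
    """Code y over all training samples with regularized least squares,
    then reconstruct it using only the coefficients of class k and
    return the squared residual (the per-sample quantity Fig 1 plots)."""
    n = X.shape[1]
    # ridge (regularized least squares) coding: (X'X + lam*I) c = X'y
    c = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    mask = labels == k
    recon = X[:, mask] @ c[mask]  # keep only the class-k part
    return float(np.linalg.norm(y - recon) ** 2)

# Illustrative use on synthetic data: 3 classes, 4 samples each, 30-dim features.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 12))
labels = np.repeat(np.arange(3), 4)
y = rng.normal(size=30)
err = class_reconstruction_error(X, labels, y, k=0)
```

A large value of this residual indicates that the coefficients of a single class alone reconstruct the query poorly, which is the weakness of RLSDP that Fig 1 highlights.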
Fig 2 The maximum average accuracies by RLSDP and ERLSDP-1 on ORL database when the number of training samples per class increases from 4 to 7.
Fig 3 Scatter plots of six classes randomly picked from ORL and AR face databases by RLSDP and ERLSDP methods, respectively. (a) RLSDP on ORL, (b) ERLSDP on ORL, (c) RLSDP on AR and (d) ERLSDP on AR.
Fig 4 Some samples from three face databases.
Fig 5 The average accuracies versus dimensions of all methods on ORL, AR and FERET face databases when the number of training samples per class is fixed as 5. (a) 5 Train on ORL, (b) 5 Train on AR and (c) 5 Train on FERET.
Fig 6 The maximum average accuracies of RLSDP and the proposed ERLSDP with the changes of λ when the number of training samples per class is 4. (a) 4 Train on ORL, (b) 4 Train on AR and (c) 4 Train on FERET.
Fig 7 The maximum average recognition accuracies, and the corresponding standard deviations of eight methods under three experiments (sunglass, scarf, and sunglass+scarf) on AR database.
Fig 8 The recognition accuracies of eight methods across 20 runs on AR database using occlusion images such as sunglass and scarf. The x-axis represents 20 different partitions into the training and testing sets and the y-axis shows the corresponding recognition accuracies. (a) Sunglass, (b) Scarf and (c) Sunglass+Scarf.
Fig 1 [plot: reconstruction error versus index of samples (1-20); legend: RLSDP, ERLSDP-1]
Fig 2 [plot: recognition accuracy versus number of training samples (4 Train to 7 Train); legend: RLSDP, ERLSDP-1]
Fig 3 [scatter plots of Classes 1-6: (a) RLSDP on ORL, (b) ERLSDP on ORL, (c) RLSDP on AR, (d) ERLSDP on AR]
Fig 4 [face image samples: (a) ORL face database, (b) AR face database, (c) FERET face database]
Fig 5 [plots: recognition accuracy versus dimensions; legend: CRP, LDA, MFA, DSNPE, DLPP, CRRP, RLSDP, ERLSDP; panels (a)-(c)]
Fig 6 [plots: recognition accuracy versus λ (1e-3 to 0.8); legend: RLSDP, ERLSDP; panels (a)-(c)]
Fig 7 [bar chart: recognition accuracy under Sunglass, Scarf, Sunglass+Scarf; legend: CRP, LDA, MFA, DSNPE, DLPP, CRRP, RLSDP, ERLSDP]
Fig 8 [plots: recognition accuracy versus twenty different splits; legend: CRP, LDA, MFA, DSNPE, DLPP, CRRP, RLSDP, ERLSDP; panels (a) Sunglass, (b) Scarf, (c) Sunglass+Scarf]
Table captions

Table 1 The maximum average accuracies (%), the standard deviations (%) and the corresponding dimensions of 8 algorithms on ORL database.

Table 2 The maximum average accuracies (%), the standard deviations (%) and the corresponding dimensions of 8 algorithms on AR database.

Table 3 The maximum average accuracies (%), the standard deviations (%) and the corresponding dimensions of 8 algorithms on FERET database.

Table 4 The average training time (seconds) of different methods on FERET database.
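The reporting convention used in Tables 1-3 (the maximum average accuracy, its standard deviation over the random splits, and the dimension at which the maximum occurs) can be reproduced with a short script. The accuracies below are synthetic placeholders, not the paper's results; the 20-split, 80-dimension setup mirrors the experimental protocol only as an assumption:

```python
import numpy as np

# acc[s, d]: recognition accuracy of random split s at reduced dimension d + 1.
# Synthetic stand-in for the 20-split experimental protocol.
rng = np.random.default_rng(0)
acc = rng.uniform(0.85, 0.95, size=(20, 80))

mean_per_dim = acc.mean(axis=0)                  # average accuracy per dimension
best_dim = int(mean_per_dim.argmax())            # dimension with the best average
max_avg = float(mean_per_dim[best_dim])
std_best = float(acc[:, best_dim].std(ddof=1))   # std over the 20 splits

# Same "accuracy±std(dimension)" format as the table cells.
print(f"{100 * max_avg:.2f}±{100 * std_best:.2f}({best_dim + 1})")
```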
Table 1
Method    4 Train            5 Train            6 Train            7 Train
CRP       84.96±2.13(107)    88.70±2.59(100)    90.87±2.84(86)     92.54±2.01(152)
LDA       90.00±2.03(39)     93.95±1.65(39)     95.47±2.15(39)     96.42±2.51(38)
MFA       90.15±2.06(47)     93.90±1.67(43)     95.56±2.05(50)     96.71±1.88(54)
DSNPE     89.58±2.06(40)     93.58±1.82(40)     95.28±1.97(40)     96.37±2.23(40)
DLPP      90.02±2.04(40)     94.18±1.61(40)     95.69±2.26(45)     96.67±2.39(42)
CRRP      89.31±2.10(39)     93.25±1.81(39)     94.94±2.16(39)     96.13±2.69(37)
RLSDP     89.37±2.29(39)     93.80±1.88(39)     95.38±2.21(39)     96.50±2.34(39)
ERLSDP    94.29±1.66(39)     96.55±1.38(38)     97.37±1.44(39)     98.08±1.33(39)
Table 2
Method    3 Train            4 Train            5 Train
CRP       55.25±8.38(135)    64.90±8.22(156)    70.26±5.18(179)
LDA       86.15±1.50(72)     92.33±1.02(69)     95.18±0.86(69)
MFA       86.57±1.30(75)     92.51±1.21(65)     95.35±0.80(71)
DSNPE     85.85±1.46(62)     92.20±1.07(67)     95.25±0.85(49)
DLPP      86.15±1.47(72)     92.34±1.04(68)     95.18±0.86(69)
CRRP      85.41±1.35(72)     91.80±1.18(68)     94.80±0.72(76)
RLSDP     86.07±1.43(72)     92.40±0.98(68)     95.14±0.81(65)
ERLSDP    90.59±1.16(72)     95.03±1.07(66)     96.75±0.72(63)
Table 3
Method    3 Train            4 Train            5 Train
CRP       34.28±1.78(186)    39.01±1.65(203)    44.24±2.38(211)
LDA       77.22±1.69(22)     86.06±1.12(22)     89.88±0.94(18)
MFA       77.24±1.89(23)     86.05±0.94(22)     89.89±1.07(31)
DSNPE     76.49±1.84(23)     85.72±1.10(22)     89.80±1.00(23)
DLPP      77.30±1.66(22)     86.10±1.03(22)     89.89±1.07(18)
CRRP      74.44±1.96(23)     85.02±1.29(21)     89.13±1.06(22)
RLSDP     75.85±1.91(20)     85.34±1.21(19)     89.35±1.22(19)
ERLSDP    84.79±0.94(21)     88.69±0.83(25)     90.94±0.91(25)
Table 4
Method    3 Train    4 Train    5 Train
CRP       1.2675     3.1262     5.7782
LDA       0.7862     1.8946     3.4140
MFA       7.5660     15.0673    24.6021
DSNPE     5.1745     8.3795     11.9262
DLPP      0.8260     1.8883     3.4569
CRRP      18.5032    25.2518    31.7259
RLSDP     1.4133     3.2924     5.9217
ERLSDP    1.6816     3.9990     6.7766