A new decision rule for sparse representation based classification for face recognition

Jiang Li (a), Can-Yi Lu (b,c,*)

(a) College of Information System and Management, National University of Defense Technology, Changsha 410073, China
(b) Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, China
(c) Department of Automation, University of Science and Technology of China, Hefei, China

* Corresponding author at: Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, China. E-mail addresses: [email protected], [email protected] (C.-Y. Lu).
Keywords: Face recognition; Sparse representation; Decision rule
Abstract

The sparse representation based classification (SRC) method has attracted much attention in recent years, owing to its promising results and robustness for face recognition. Unlike previous improved versions of SRC, which put more emphasis on sparsity, we focus on the decision rule of SRC. SRC predicts the label of a given test sample from the residual, which measures the representational capability of the training data of each class. This decision rule is the same as that of the nearest feature classifiers (NFCs), but it is not optimal for SRC, which is built on the mechanism of sparsity. In this paper, we first review the NFCs and rewrite them in a unified formulation. We find that the objective of the NFCs differs from that of SRC, yet they use the same decision rule. To capture more discriminative information from the sparse coding coefficient, we propose a new decision rule, the sum of coefficient (SoC), which matches SRC well. SoC is based on the fact that the sparse coefficient reflects the similarities between data, and it takes full advantage of sparsity for classification. SoC can be regarded as the voting decision rule widely used in ensemble learning, e.g., AdaBoost and Bagging. We compare our method with the original SRC on three representative face databases and show that SoC is considerably more discriminative and accurate.

1. Introduction

Face recognition is one of the most widely studied classification problems in computer vision and pattern recognition. Recently, sparse representation (or sparse coding) based classification (SRC) [1] has been applied successfully and has led to promising results in face recognition. SRC exploits the linear subspace structure of face images and casts face recognition in a sparse representation framework. Specifically, SRC looks for the sparsest representation of a query image in a dictionary composed of all training data across all classes, and then assigns the query image to the class of training data with the minimal reconstruction error. SRC is robust to face occlusion and corruption. Building on sparse representation, many methods have been proposed to improve performance and robustness, and the idea has been applied to other machine learning problems. For robustness, [2] casts the face recognition problem as a structured sparse representation problem.


It explicitly takes advantage of the block structure in which the training data of each class form a few blocks of the dictionary; compared with SRC, it is more robust to occlusion, corruption, and disguise. Ref. [3] presents another robust sparse coding method, which uses maximum likelihood estimation (MLE) to model the coding residual. Analogously, [4] uses the maximum correntropy criterion to model the coding residual, aiming to detect the noise and exploit the uncontaminated data to yield a robust sparse representation. A kernel version of SRC, i.e., sparse coding in a high-dimensional feature space, is introduced in [5]. In addition to classifier construction and label prediction, feature extraction is also an important stage of face recognition. Principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [7] are the two most popular methods. To further exploit discriminative information from sparsity, [8] proposes a sparsity preserving projections (SPP) method. Ref. [9] further utilizes the label information and derives an optimized projection for sparse representation based classification (OP-SRC); it uses the decision rule of SRC to design the projections and is optimal for SRC. The sparse coding technique has also been applied to other learning problems. Ref. [10] uses the sparse coding coefficient to measure the similarities between data, and then employs the normalized cuts method [11] to segment the data into different subspaces.


Furthermore, [12] uses the sparse coding coefficient to construct an $\ell_1$-graph and applies it to various machine learning tasks, e.g., clustering, subspace learning, and semi-supervised learning. Another important topic in sparse coding is how to solve the optimization problem for face recognition; a review can be found in [13], which shows that homotopy [14] and ALM [15] methods are preferable for their accuracy and speed in face recognition. Different from the above improved or extended sparse representation based methods, we focus on the decision rule of SRC, which is also very important but has not been deeply studied before. For a given query image, after obtaining the sparse coding coefficient, SRC makes its decision based on the residual, as do the nearest feature classifiers (NFCs), i.e., nearest neighbor (NN), nearest feature line (NFL), nearest feature plane (NFP), and nearest feature subspace (NFS). The NFCs seek the minimal residual between the query image and the subspace of the training data of each class, while SRC looks for the sparsest representation of the query image over all training data. The objective of SRC therefore does not match its decision rule well. Motivated by the work in [10,12], which uses the sparse coefficient to measure the similarities between data, we propose a new decision rule, the sum of coefficient (SoC), which takes the similarity between the query image and each class to be the sum of the sparse representation coefficients corresponding to the training data of that class. SoC captures more discriminative information from sparsity and matches the mechanism of SRC for classification well.

The rest of this paper is structured as follows. We first briefly review the NFCs and SRC, rewrite the NFCs as similar optimization problems, and show that the residual based decision rule is suitable for the NFCs but not for SRC. In Section 3, we propose the SoC rule and discuss its advantages. In Section 4, the proposed method is validated by experiments on three face databases, with comparisons against the original SRC. We conclude the paper in Section 5.

2. Related works

A basic problem in pattern recognition is to correctly determine the class to which a new test sample $y \in \mathbb{R}^d$ belongs. We arrange the $n_c$ training samples of the $c$-th class as columns of a matrix $X_c = [x_1^c, \ldots, x_{n_c}^c] \in \mathbb{R}^{d \times n_c}$, where $d$ is the dimension. By concatenating the training samples of all $k$ classes, we obtain a matrix for the entire training set, $X = [X_1, \ldots, X_k] \in \mathbb{R}^{d \times n}$, where $n = \sum_{c=1}^{k} n_c$ is the total number of training samples. The task is to predict the class label of a given test sample $y$ with a classifier constructed from the training data. A family of methods, the NFCs and SRC, assume that the training data of a single class lie on a subspace. If the training samples are sufficient, a query image $y$ of subject $c$ can be approximated by a linear combination of the columns of $X_c$, i.e., $y = X_c \alpha^c$ for some coefficient vector $\alpha^c \in \mathbb{R}^{n_c}$. Since the true label $c$ of $y$ is unknown, we can only express $y$ as a linear combination of all the training data:

$$y = X\alpha = X_1 \alpha^1 + \cdots + X_k \alpha^k, \qquad (1)$$

where $\alpha = [\alpha^1; \cdots; \alpha^k] \in \mathbb{R}^n$. The major difference between the NFCs and SRC lies in how they learn the coefficient vector $\alpha$ for face recognition. We review these works below in a unified formulation.

2.1. Nearest feature classifiers

Nearest feature classifiers are a family of face recognition methods, including NN [16], NFL [17], NFP [18], and NFS [18]. SRC can be regarded as a generalization of NN and NFS: like NFL and NFP, it strikes a balance between the two. We first review the NFCs and then reformulate them in a general way.

2.1.1. Nearest neighbor

NN is one of the most widely used pattern recognition methods, owing to its simplicity and efficiency. It classifies the query image $y$ by assigning it the class label $\hat{c}$ according to

$$d(y, x_{\hat{i}}^{\hat{c}}) = \min_{1 \le c \le k,\; 1 \le i \le n_c} d(y, x_i^c),$$

where $d(y, x_i^c) = \|y - x_i^c\|_2$ denotes the distance between samples.

2.1.2. Nearest feature line

NFL is an extension of NN. It generalizes each pair of points of the same class, $\{x_i^c, x_j^c\}$, by a line $L_{ij}^c$ called a feature line. NFL classifies the query image $y$ by assigning it the class label $\hat{c}$ according to

$$d(y, L_{\hat{i}\hat{j}}^{\hat{c}}) = \min_{1 \le c \le k,\; 1 \le i,j \le n_c,\; i \ne j} d(y, L_{ij}^c) = \min_{1 \le c \le k,\; 1 \le i,j \le n_c,\; i \ne j} \|y - p_{ij}^c\|_2,$$

where $p_{ij}^c$ is the projection of $y$ onto $L_{ij}^c$, computed as $p_{ij}^c = x_i^c + \lambda (x_j^c - x_i^c)$ with $\lambda = (y - x_i^c)^T (x_j^c - x_i^c) / \|x_j^c - x_i^c\|_2^2$.

2.1.3. Nearest feature plane

NFP is an extension of NFL. It assumes that at least three linearly independent points are available for each class and generalizes three points $\{x_i^c, x_j^c, x_m^c\}$ of the same class by a feature plane $F_{ijm}^c$. NFP classifies the query image $y$ by assigning it the class label $\hat{c}$ according to

$$d(y, F_{\hat{i}\hat{j}\hat{m}}^{\hat{c}}) = \min_{1 \le c \le k,\; 1 \le i,j,m \le n_c,\; i \ne j \ne m} d(y, F_{ijm}^c) = \min_{1 \le c \le k,\; 1 \le i,j,m \le n_c,\; i \ne j \ne m} \|y - p_{ijm}^c\|_2,$$

where $p_{ijm}^c$ is the projection of $y$ onto $F_{ijm}^c$, computed as $p_{ijm}^c = X_{ijm}^c (X_{ijm}^{cT} X_{ijm}^c)^{-1} X_{ijm}^{cT} y$ with $X_{ijm}^c = [x_i^c\; x_j^c\; x_m^c]$.

2.1.4. Nearest feature subspace

NFS is an extension of NFP. It generalizes all the points of the $c$-th class by a feature subspace $S_c$. NFS classifies the query image $y$ by assigning it the class label $\hat{c}$ according to

$$d(y, S_{\hat{c}}) = \min_{1 \le c \le k} d(y, S_c) = \min_{1 \le c \le k} \|y - p^c\|_2,$$

where $p^c$ is the projection of $y$ onto $S_c$, computed as $p^c = X_c (X_c^T X_c)^{-1} X_c^T y$.

It is easy to verify that

$$d(y, F_{ijm}^c) \le \min\big(d(y, L_{ij}^c),\, d(y, L_{im}^c),\, d(y, L_{jm}^c)\big) \le \min\big(d(y, x_i^c),\, d(y, x_j^c),\, d(y, x_m^c)\big)$$

and

$$d(y, S_c) \le \min_{i,j,m} d(y, F_{ijm}^c).$$

Consequently, NFL is expected to handle more variations of each class than NN, NFP more than NFL, and NFS more than NFP; accordingly, NFL should outperform NN, NFP should perform better than NFL, and NFS should be more accurate than NFP. The improvement gained by using more training data per representation has been attributed to the expanded representational ability of the available feature points, which accounts for new conditions not represented by the original set.
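To make these projection-based distances concrete, here is a minimal NumPy sketch (our illustration, not code from the paper) of the NN, NFL, and NFS distances; `X_c` is assumed to hold the training images of one class as columns.

```python
import numpy as np

def nn_distance(y, X_c):
    """NN: distance from y to the nearest column (training sample) of X_c."""
    return np.linalg.norm(X_c - y[:, None], axis=0).min()

def nfl_distance(y, x_i, x_j):
    """NFL: distance from y to the feature line through x_i and x_j,
    via the projection p = x_i + lambda * (x_j - x_i) given in the text."""
    d = x_j - x_i
    lam = (y - x_i) @ d / (d @ d)
    return np.linalg.norm(y - (x_i + lam * d))

def nfs_distance(y, X_c):
    """NFS: distance from y to the span of the columns of X_c, via the
    least-squares projection p = X_c (X_c^T X_c)^{-1} X_c^T y."""
    coef, *_ = np.linalg.lstsq(X_c, y, rcond=None)
    return np.linalg.norm(y - X_c @ coef)

# Toy data: 100-dimensional "images", 5 training samples of one class.
rng = np.random.default_rng(0)
X_c = rng.standard_normal((100, 5))
y = X_c @ rng.standard_normal(5) + 0.01 * rng.standard_normal(100)
# The line through x_1 and x_2 lies inside the class subspace, so the
# NFS residual can never exceed this particular NFL residual.
print(nn_distance(y, X_c), nfl_distance(y, X_c[:, 0], X_c[:, 1]), nfs_distance(y, X_c))
```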


Table 1
A general formulation of NFCs and SRC.

Methods | Objective function | Constraints | Decision rule
NN | $\min_{\{\alpha^c\}} \sum_{c=1}^{k} \|y - X_c \alpha^c\|_2$ | $\|\alpha^c\|_0 = 1,\; \mathbf{1}^T \alpha^c = 1$ | $\arg\min_c \|y - X_c \alpha^c\|_2$
NFL | (same) | $\|\alpha^c\|_0 = 2,\; \mathbf{1}^T \alpha^c = 1$ | (same)
NFP | (same) | $\|\alpha^c\|_0 = 3,\; \mathbf{1}^T \alpha^c = 1$ | (same)
NFS | (same) | – | (same)
SRC | $\min_{\alpha} \|\alpha\|_1$ | $y = X\alpha$ | (same)
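As a sanity check on the table (a brute-force sketch we add for illustration), the constraints in the NN row, $\|\alpha^c\|_0 = 1$ and $\mathbf{1}^T \alpha^c = 1$, force every feasible $\alpha$ to select exactly one training sample with weight one, so minimizing the objective reduces to ordinary nearest-neighbor search:

```python
import numpy as np

def nn_via_table1(y, X, labels_per_column):
    """Evaluate the NN row of Table 1 by enumeration: each feasible alpha is
    a standard basis vector e_i, so the objective equals ||y - X[:, i]||_2."""
    dists = np.linalg.norm(X - y[:, None], axis=0)
    return labels_per_column[int(np.argmin(dists))]
```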

Fig. 1. A test image. (a) The values of the sparse representation coefficient obtained by the $\ell_1$-minimization problem; the test image belongs to subject 1. (b) The residuals of the test image with respect to the training data of each class; the ratio between the two smallest residuals is 1:2.6. (c) The sums of coefficient of the test image with respect to the training data of each class; the ratio between the two largest sums of coefficient is 1:5.6.

2.1.5. A general formulation of nearest feature classifiers

The NFCs differ in their representational ability for the query image: they compute the distance between $y$ and the subspace of the $c$-th class in different ways. As shown in Table 1, the NFCs can be rewritten as optimization problems, which offers a new perspective for understanding the NFCs and their relationships. They share the same objective function and use the residual as the decision rule, but they use different numbers of training samples of each class to express the query image: NN uses only one, NFL uses two, NFP uses three, and NFS uses all the training data of each class. Using more training samples per class to represent the query image yields greater representational ability, so more variations can be captured.

2.2. Sparse representation based classification

SRC is a generalization of the NFCs that strikes a balance between NN and NFS. Different from the NFCs, the representation of SRC is global, using all the training data as a dictionary. Motivated by the fact that a correct representation is sparse, SRC aims to solve the following $\ell_0$-minimization problem:

$$\hat{\alpha}_0 = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{subject to} \quad y = X\alpha, \qquad (2)$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, which counts the number of nonzero entries of a vector. The $\ell_0$-minimization problem is NP-hard, but it is equivalent to the $\ell_1$-minimization problem when the solution is sufficiently sparse [19,20]. Problem (2) can therefore be relaxed to the following $\ell_1$-minimization problem:

$$\hat{\alpha}_1 = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad y = X\alpha, \qquad (3)$$

where $\|\alpha\|_1 = \sum_{i=1}^{n} |\alpha_i|$. To handle occlusions and corruptions in real-world face recognition, the $\ell_1$-minimization problem is extended to the following stable $\ell_1$-minimization problem:

$$\hat{\alpha}_1 = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y - X\alpha\|_2 \le \varepsilon, \qquad (4)$$

where $\varepsilon \ge 0$ is a given tolerance. This problem can be solved in polynomial time by standard linear programming methods [21]. After obtaining the sparse coefficient $\hat{\alpha}_1$, SRC uses the same residual based decision rule as the NFCs:

$$\text{Identity}(y) = \arg\min_{c} \|y - X_c \alpha^c\|_2.$$

The NFCs seek the minimal residual between $y$ and each class of training data under constraints on the coefficient, while SRC looks for the sparsest coefficient under a constraint on the residual. The objective and the decision rule therefore may not match well for SRC; an alternative decision rule should be based on the coefficient.
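As a concrete illustration of problem (4) together with the residual rule, the sketch below (ours, not the authors' implementation) solves the Lagrangian form of the stable $\ell_1$ problem using scikit-learn's `Lasso` as a stand-in for a dedicated solver such as SPAMS; the penalty weight `lam` plays the role of the tolerance $\varepsilon$.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_residual_classify(y, X, labels_per_column, lam=0.01):
    """SRC with the residual decision rule.

    y: query image, shape (d,).
    X: dictionary with all training samples as (L2-normalized) columns, shape (d, n).
    labels_per_column: length-n array with the class of each column of X.
    """
    # Sparse coding: min_a ||y - X a||_2^2 / (2d) + lam * ||a||_1.
    alpha = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_

    # Residual rule: keep only the class-c entries of alpha and measure
    # how well that class alone reconstructs y.
    classes = np.unique(labels_per_column)
    residuals = [np.linalg.norm(y - X @ np.where(labels_per_column == c, alpha, 0.0))
                 for c in classes]
    return classes[int(np.argmin(residuals))], alpha
```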


3. Decision by coefficient

The residual based decision rule is not optimal for SRC. We propose a new rule that takes full advantage of sparsity. Motivated by [10], which shows that the entries of the sparse coefficient obtained by solving the $\ell_1$-minimization problem (3) or (4) reflect the similarities between data, we explicitly use the sum of coefficient (SoC) for decision:

$$\text{Identity}(y) = \arg\max_{c} \mathbf{1}_c^T \alpha^c,$$

where $\mathbf{1}_c$ is the all-ones vector of compatible dimension. There are at least two reasons that make SoC reasonable: (1) from the formulation of the NFCs and SRC in Table 1, the residual based decision rule is the more natural choice for the NFCs, while a coefficient based decision rule suits SRC; (2) the sparse coding coefficient measures the similarity between the query image $y$ and each training sample, so SoC measures the similarity between $y$ and the $c$-th class. On the other hand, SoC can be regarded as the voting method widely used in ensemble learning [22,23]: for a new test sample, all the training data take a vote, and the decision is made by the maximum SoC.

Compared with the original residual based decision rule, SoC is much more discriminative. Fig. 1 shows an example from the Extended Yale B database: the ratio between the two smallest residuals is 1:2.6, while the ratio between the two largest SoC values is 1:5.6, so it is much easier to obtain a correct prediction by SoC. The ratio between the two smallest residuals is limited, since the residuals are neither too small (all are nonnegative) nor too large (the data are normalized). Since the coefficient measures the similarity between the query image $y$ and each training sample, we directly use the sum of coefficient to measure the similarity between $y$ and the $c$-th class. The range of SoC is larger, which may correct an erroneous prediction when the residuals are similar. Generally speaking, SoC is more discriminative and better matched to the mechanism of sparse representation.
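In code, SoC is a one-line change relative to the residual rule: instead of recomputing class-wise reconstruction errors, we simply sum the coefficients belonging to each class. The hypothetical helper below reuses `alpha` and `labels_per_column` from the SRC sketch in Section 2.2.

```python
import numpy as np

def soc_classify(alpha, labels_per_column):
    """SoC rule: Identity(y) = argmax_c 1^T alpha^c.  Each training sample
    "votes" with its coefficient; the class with the largest total wins."""
    classes = np.unique(labels_per_column)
    sums = np.array([alpha[labels_per_column == c].sum() for c in classes])
    return classes[int(np.argmax(sums))]
```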

Fig. 2. Some sample images from the (a) Yale, (b) Extended Yale B, and (c) AR databases.

Fig. 3. Plots of face recognition accuracies versus reduced dimensions under different sizes of training set on the Yale database: (a) 4 Train, (b) 5 Train, (c) 6 Train, and (d) 7 Train.


5

Table 2 The best accuracies under different sizes of training set on the Yale database(%). Methods

4 Train

5 Train

6 Train

7 Train

Residual SoC

72.67 76.10

75.00 77.00

80.80 83.33

81.17 83.50


4. Experimental verification

In this section, we demonstrate the results of SoC for face recognition, along with a comparison with the original SRC. Experiments are performed on three public face recognition databases, namely the Yale [7], Extended Yale B [24], and AR [25] databases. Some sample images are shown in Fig. 2. We use PCA [6] to reduce the dimension of the images before classification. SRC is employed for classification by solving the $\ell_1$-minimization problem (4) with the SPAMS package (http://www.di.ens.fr/willow/SPAMS/) [26,27]; we experimentally set $\varepsilon = 0.005$. We then compare the SoC decision rule with the original residual based SRC method.
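A hedged sketch of this preprocessing step (our reconstruction using scikit-learn's PCA rather than the paper's own code; the unit normalization mirrors the paper's note that the data are normalized):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_and_normalize(X_train, X_test, dim):
    """Project d x n image matrices (images as columns) onto `dim` PCA
    dimensions learned from the training set, then L2-normalize columns."""
    pca = PCA(n_components=dim).fit(X_train.T)  # sklearn expects samples as rows
    A = pca.transform(X_train.T).T
    B = pca.transform(X_test.T).T
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    return A, B
```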


Fig. 4. The ratios between the two smallest residuals and the two largest sums of coefficient on the Yale database (7 Train).

4.1. Yale database

The Yale database contains 165 face images of 15 subjects. Each subject has 11 images, taken under the following facial expressions or configurations: center-light, wearing glasses, happy, left-light, wearing no glasses, normal, right-light, sad, sleepy, surprised, and wink. In our experiment, the images are cropped to a size of 32 × 32 and normalized to [0, 1]. A random subset with l (= 4, 5, 6, 7) images per subject is taken with labels to form the training set, and the rest is used for testing. For each given l, we average the recognition accuracies over 10 random splits. Fig. 3 plots the accuracies versus the reduced dimensions, and the best results obtained by the residual rule and by SoC are shown in Table 2. SoC performs better than the residual based method, especially as the dimension increases. Fig. 4 plots the ratios between the two smallest residuals and between the two largest sums of coefficient; it shows that decision by SoC is much more discriminative.
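The discriminability ratios plotted in Fig. 4 (and later in Figs. 5(b) and 6(b)) can be computed from the per-class scores with a small helper (hypothetical code; it assumes the two leading SoC values are positive):

```python
import numpy as np

def decision_margins(residuals, soc_sums):
    """Ratios as in Fig. 4: second-smallest/smallest residual and
    largest/second-largest SoC; larger means a more confident decision."""
    r = np.sort(np.asarray(residuals))        # ascending order
    s = np.sort(np.asarray(soc_sums))[::-1]   # descending order
    return r[1] / r[0], s[0] / s[1]
```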

4.2. Extended Yale B database

The Extended Yale B database consists of 2414 frontal face images of 38 subjects, each with about 64 images taken under various laboratory-controlled lighting conditions. In our experiments, we use the cropped 32 × 32 images (http://www.zjucadcg.cn/dengcai/Data/FaceData.html), and the data are normalized. For each subject, we randomly select half of the images for training (i.e., about 32 images per subject) and the rest for testing. We compute the recognition accuracies with feature dimensions 30, 56, 120, and 504, the same setting as in [1]. Fig. 5 shows the comparison of accuracies and ratios with respect to the different feature spaces. SoC outperforms the original SRC with its residual based decision rule.

4.3. AR database

The AR database consists of over 4000 frontal images of 126 subjects. We use the same experimental setting as [1], taking a subset (with only illumination and expression changes) that contains 50 male subjects and 50 female subjects. For each subject, seven images from session 1 are used for training, with the other images from session 2 used for testing. The data are converted to gray scale and normalized. We evaluate four feature space dimensions: 30, 54, 130, and 540. The comparison results are shown in Fig. 6. SoC improves the original SRC by about 2% across the different dimensions, which indicates that a coefficient based decision takes full advantage of sparsity.

4.4. Overall observations and discussions

1. Based on the same sparse representation coefficient obtained by solving the $\ell_1$-minimization problem, our SoC decision rule always outperforms the original residual based rule on the three databases. This indicates that measuring similarity by the entries of the sparse coefficient is effective, and that a coefficient based decision rule is well suited to sparse representation based face recognition.
2. Our experiments also show that the ratio between the two largest sums of coefficient is much larger than the ratio between the two smallest residuals, which reveals the superiority of SoC: it is much more discriminative.
3. The SoC decision rule can also be applied to related works, such as [3–5].




Fig. 5. Face recognition results on the Extended Yale B database. (a) Plots of accuracies versus reduced dimensions. (b) Plots of ratios between the two smallest residuals and the two largest sums of coefficient.


Fig. 6. Face recognition results on the AR database. (a) Plots of accuracies versus reduced dimensions. (b) Plots of ratios between the two smallest residuals and the two largest sums of coefficient.

5. Conclusion

In this paper, we reformulated the nearest feature classifiers as optimization problems and found that the residual based decision rule matches the NFCs well, but not SRC. Motivated by the fact that the sparse coefficient reflects the similarities between samples, we explicitly use the sum of coefficient for decision. This voting decision rule matches the objective of SRC, is more discriminative, and takes full advantage of sparsity for classification. Our experiments on three face databases demonstrate the superiority of the proposed method.

Acknowledgments

We would like to thank Jian-Xun Mi and Jie Gui for their discussions on this work. This work was supported by grants from the National Science Foundation of China (Nos. 60975005, 61005010, 60873012, 60805021, 60905023, 31071168, 30900321) and by the Knowledge Innovation Program of the Chinese Academy of Sciences (Y023A61121).

References

[1] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 210–227.

[2] E. Elhamifar, R. Vidal, Robust classification using structured sparse representation, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1873–1879.
[3] M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 625–632.
[4] R. He, W.S. Zheng, B.G. Hu, Maximum correntropy criterion for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 33 (2011) 1561–1576.
[5] S. Gao, I.W.-H. Tsang, L.-T. Chia, Kernel sparse representation for image classification and face recognition, in: European Conference on Computer Vision, pp. 1–14.
[6] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591.
[7] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.
[8] L.S. Qiao, S.C. Chen, X.Y. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (2010) 331–341.
[9] C.-Y. Lu, D.-S. Huang, Optimized projection for sparse representation based classification, Neurocomputing (2012).
[10] E. Elhamifar, R. Vidal, Sparse subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790–2797.
[11] J.B. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 888–905.
[12] B. Cheng, J. Yang, S. Yan, Y. Fu, T.S. Huang, Learning with $\ell_1$-graph for image analysis, IEEE Trans. Image Process. 19 (2010) 858–866.
[13] A.Y. Yang, S.S. Sastry, A. Ganesh, Y. Ma, Fast $\ell_1$-minimization algorithms and an application in robust face recognition: a review, in: IEEE International Conference on Image Processing, pp. 1849–1852.
[14] D.M. Malioutov, M. Cetin, A.S. Willsky, Homotopy continuation for sparse signal representation, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 733–736.
[15] J. Yang, Y. Zhang, Alternating direction algorithms for $\ell_1$-problems in compressive sensing, SIAM J. Sci. Comput. 33 (2011) 250–278.


[16] T.M. Cover, P.E. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory 13 (1967) 21–27.
[17] S.Z. Li, J.W. Lu, Face recognition using the nearest feature line method, IEEE Trans. Neural Networks 10 (1999) 439–443.
[18] J.-T. Chien, C.-C. Wu, Discriminant waveletfaces and nearest feature classifiers for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 1644–1649.
[19] D.L. Donoho, For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59 (2006) 797–829.
[20] E.J. Candes, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59 (2006) 1207–1223.
[21] S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (2001) 129–159.
[22] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1997) 119–139.
[23] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
[24] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 643–660.
[25] A. Martinez, The AR face database, CVC Technical Report 24 (1998).
[26] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res. 11 (2010) 19–60.
[27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online dictionary learning for sparse coding, in: International Conference on Machine Learning, vol. 382.

Jiang Li, Ph.D., is an Associate Professor in the College of Information System and Management, National University of Defense Technology, Changsha, China. His research interests cover neural networks, risk analysis, and intelligent decision-making technology. He has presided over several important scientific research projects and has published over 20 academic papers in international journals and conferences.

Can-Yi Lu received the B.S. degree in Information and Computing Science & Applied Mathematics from Fuzhou University (FZU), Fuzhou, China, in 2009. He is currently a Master's candidate in Pattern Recognition & Intelligent Systems at the University of Science and Technology of China (USTC), Hefei, China. His research interests include sparse representation and low-rank based machine learning and their applications.
