Author's Accepted Manuscript

Sparsity Analysis versus Sparse Representation Classifier

Baochang Zhang, Suli Ji, Li Li, Shengping Zhang, Wankou Yang

www.elsevier.com/locate/neucom

PII: S0925-2312(15)00898-X
DOI: http://dx.doi.org/10.1016/j.neucom.2015.06.052
Reference: NEUCOM15719

To appear in: Neurocomputing

Received date: 9 February 2015
Revised date: 10 April 2015
Accepted date: 24 June 2015

Cite this article as: Baochang Zhang, Suli Ji, Li Li, Shengping Zhang, Wankou Yang, Sparsity Analysis versus Sparse Representation Classifier, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2015.06.052

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Sparsity Analysis versus Sparse Representation Classifier

Baochang Zhang (a,*), Suli Ji (a), Li Li (a), Shengping Zhang (b), Wankou Yang (c)

a School of Automation Science and Electrical Engineering, Beihang University, China
b School of Computer Science and Technology, Harbin Institute of Technology at Weihai, Weihai 264209, China
c School of Automation, Southeast University, Nanjing 210096, China

Abstract

The Sparse Representation Classifier (SRC) is a state-of-the-art method whose theory has interesting links to compressed sensing. This paper proposes a new method, named Sparse Regression Analysis (SRA), for object representation and recognition. In SRA, the L1-norm minimization method is combined with a regression process to represent the input signal. We show that the discriminative ability of SRC and SRA derives from the fact that the subset which most compactly expresses the input signal is activated in the regression analysis, and that SRA is a more direct and powerful way to use compressed sensing for recognition tasks. To achieve a further improvement, a Kernelized SRA (KSRA) is developed as a nonlinear extension of SRA. Extensive experiments on both palmprint and face recognition show that the proposed methods achieve a much better performance than the sparse representation classifier, the linear regression classification method, principal component analysis, kernel discriminant analysis, and linear discriminant analysis.

Keywords: Sparse representation, L1-norm minimization, face recognition, palmprint recognition, kernel function.

* Corresponding author. Email addresses: [email protected] (Baochang Zhang), [email protected] (Suli Ji), [email protected] (Li Li), [email protected] (Shengping Zhang), [email protected] (Wankou Yang)

Preprint submitted to Neurocomputing, July 10, 2015

1. Introduction

Object representation in parsimony is one of the important principles for object recognition. One initial work is the principle of minimum description length in model selection [1, 2], which yields the most compact representation for decision-making tasks such as classification. Small Sample Size (SSS) statistical learning acquires a parsimonious representation based on only a few of the observations by selecting a small subset of features for classification or visualization [3]. The support vector machine (SVM) [4] is one of the most successful methods using a small subset of relevant training examples to characterize the decision function between classes. Considering that many signals and images contain redundant information, feature reduction, which aims to represent data in a parsimonious way, is often an important step in a pattern recognition system. One popular approach for compressing the input signal is Principal Component Analysis (PCA) [5]. It transforms a number of correlated variables into a small number of uncorrelated variables called principal components, by exploiting the fact that many signals have a sparse representation in terms of some basis. Discriminant versions of component analysis have been thoroughly investigated, such as Linear Discriminant Analysis (LDA) [28, 7, 8] and Kernel Discriminant Analysis (KDA) [9]. One problem with these methods is that they are optimal only when the data follow a Gaussian distribution, and the generalization problem is not yet well solved. These

works mentioned above indicate a common theme: using parsimony as the principle for choosing a limited subset of features or models from the training data, rather than directly using the data for representing or classifying the input signal or object, is more effective for object representation and recognition. Investigations of the human vision system (HVS) have shown that a selective and small subset of neurons is active for a variety of specific stimuli [10, 31, 32], such as color, texture, shape, and scale. Considering the large number of neurons in the human vision system, the firing of neurons in response to a specific input object is typically highly sparse. In the statistical signal processing community, the algorithmic problem of computing sparse linear representations over an over-complete dictionary of basis components or basis samples has been well developed [11, 12, 13, 14, 15]. One of the most important results is that a sufficiently sparse representation can be efficiently computed by convex optimization [11], and this problem is further solved by an algorithm linear in the size of the training set [15]. In [14], a promising application of sparse representation in building a classifier for face recognition shows that the sparse representation is effective for classification even using a simple L1-norm minimization scheme.

In this paper, we propose a new method, named Sparse Regression Analysis (SRA), for discriminant analysis in object recognition. The intuition behind SRA lies in the fact that, when combining L1-norm minimization and regression analysis for the reconstruction of the input object, the coefficients of the subset that compactly expresses the input object are nonzero, while the others are almost zero. It is possible to automatically represent a test object as a linear combination of only those training samples from the same class or similar samples. The coefficients of the sparse representation therefore automatically discriminate different classes


based on a common basis. The proposed method differs from other related methods [11, 12, 13, 14, 16] in three aspects. First, our method focuses on feature extraction, while the others focus on reconstruction error [11, 12, 13] or on its application to building a classifier [14] in object recognition. Secondly, we extend the sparse representation to a kernelized version to obtain nonlinear features. In addition, the proposed method can extract discriminant features from a training set without any class information (i.e., the class labels of the training samples), which is a desirable property for many recognition applications, such as face recognition and palmprint recognition. Traditional LDA and KDA need a large number of training samples and suffer from the SSS problem [3]. The proposed SRA, however, can perform the optimization process on any training set, with no class information provided. SRA exploits L1-norm minimization and regression analysis for the reconstruction of the input object, and the coefficients of the sparse representation can discriminate different classes based on a common basis. Since L1-norm minimization is robust to occlusion and noise, the proposed method naturally inherits these excellent properties.

The rest of this paper is organized as follows: Section 2 describes the proposed method, including sparse representation, sparse regression analysis, and classification with the sparse regression representation. Section 3 presents experimental results and compares SRA with other methods. The conclusion and future work are given in Section 4.


2. Sparse Regression Analysis

2.1. Sparse Representation

When we say an object y has a sparse representation, we mean that y is well approximated by a linear combination of a small subset of a vector set Ψ of size N, i.e., y ≈ Σ_{i∈I_K} w_i ψ_i, where I_K is a subset of {1, 2, ..., N} with K an integer, K ≪ N, and w_i is the weight corresponding to ψ_i ∈ Ψ. In this case, we say that y is K-sparse in Ψ. The number of non-zero coefficients is denoted by ||w||_0 with w = (w_1, w_2, ..., w_N)^T. Minimizing ||w||_0 is the principle for obtaining a sparse representation, which is, however, an NP-hard problem. Recent developments in the theory of sparse representation show that the solution of L1-norm minimization, subject to the linear regression of the input sample, can be used to find a sufficiently sparse representation. The resulting optimization problem, similar to the LASSO in statistics [17], penalizes the L1-norm of the coefficients in the linear combination, rather than directly penalizing the number of nonzero coefficients (||w||_0). The original works in this line of research aim not at classification but at the representation of signals, and algorithm performance is measured in terms of the sparsity of the representation and the reconstruction of the signals. It is possible to represent a test sample as a linear combination of only those training samples from the same class. The coefficients of the sparse representation therefore automatically discriminate the different classes present in a given dictionary of basis elements. In [14], Wright et al. provide a general classifier using the sparse representation for the face recognition problem; its validity has been verified extensively on public databases.
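To make the LASSO-style problem concrete, here is a minimal numpy sketch that solves min_w ||Ψw − y||_2^2 + λ||w||_1 by iterative soft-thresholding (ISTA). This is an illustrative substitute for the solvers cited above, not the paper's implementation; the function names and parameter values are our own:

```python
import numpy as np

def soft_threshold(x, t):
    # Element-wise shrinkage: the proximal operator of t * ||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_regression(Psi, y, lam=0.1, n_iter=500):
    """Minimize ||Psi @ w - y||_2^2 + lam * ||w||_1 by ISTA."""
    L = np.linalg.norm(Psi, 2) ** 2            # squared spectral norm of Psi
    w = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        grad = Psi.T @ (Psi @ w - y)           # (half the) gradient of the quadratic term
        w = soft_threshold(w - grad / L, lam / (2 * L))
    return w
```

With a small λ, the recovered coefficient vector is sparse: only the entries corresponding to atoms that actually explain y stay away from zero.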


2.2. Sparse Regression Analysis

In this section, we exploit the discriminative ability of the sparse regression representation to perform feature extraction. We represent a test sample in a dictionary whose basis elements are the training samples themselves, when sufficient training samples are available. To recover a signal from a training set, we choose to minimize ||w||_1. In this model, all test samples are constructed from the same set of basis vectors (the training samples), with different coefficients. The problem is formulated as follows:

    minimize ||w||_1,    (1)

    subject to Ψw = y,    (2)

where Ψ = (x_1, x_2, ..., x_N) is a matrix with the training samples arranged as its columns, and Eq. 2 is the regression representation of the input signal y, which can be evaluated by ||Ψw − y||_2 = 0. The model in Eq. 1 and Eq. 2 can be generalized to:

    minimize ||Ψw − y||_2 + λ||w||_1,    (3)

which can be further transformed into an equivalent problem that is well solved using the Interior Point Method (IPM) [18].

For some practical applications, a linear model cannot achieve good performance. To solve nonlinear problems, we exploit the kernel trick to make a nonlinear extension of the original model. In kernel-based methods, the input data is first projected into an implicit feature space F by a nonlinear mapping Φ : x → f ∈ F [4, 9]. In the high-dimensional space, we define a new constraint function as:

    Φ(Ψ)w = Φ(y),    (4)

where Φ(Ψ) = (Φ(x_1), Φ(x_2), ..., Φ(x_N)), and Φ is the projection function mapping signals from the original space to the high-dimensional space. We cannot directly calculate Eq. 4 because the mapping function is implicit. To implement it, we rewrite Eq. 4 as

    Φ^T(Ψ)Φ(Ψ)w = Φ^T(Ψ)Φ(y).    (5)

Then we have the following equation:

    | Φ^T(x_1)·Φ(x_1)   Φ^T(x_1)·Φ(x_2)   ···   Φ^T(x_1)·Φ(x_N) |
    | Φ^T(x_2)·Φ(x_1)   Φ^T(x_2)·Φ(x_2)   ···   Φ^T(x_2)·Φ(x_N) |
    |       ...               ...         ...         ...        |  w
    | Φ^T(x_N)·Φ(x_1)   Φ^T(x_N)·Φ(x_2)   ···   Φ^T(x_N)·Φ(x_N) |

    = ( Φ^T(y)·Φ(x_1), Φ^T(y)·Φ(x_2), ···, Φ^T(y)·Φ(x_N) )^T.    (6)

The inner product in the high-dimensional space can be calculated by a kernel function k(·, ·) defined in the original space as

    Φ^T(x_i) · Φ(x_j) = k(x_i, x_j).    (7)

We use the fractional kernel function [19] in our application, which is defined as:

    k(x_i, x_j) = (x_i^T · x_j)^(1/n),  where n ≥ 1 is an integer.    (8)
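For illustration, the Gram matrix and right-hand side of Eq. 6 can be assembled directly from the fractional kernel of Eq. 8. The sketch below uses our own function and variable names, and assumes non-negative inner products (true, e.g., for non-negative pixel vectors) so that the fractional power is well defined:

```python
import numpy as np

def fractional_kernel(a, b, n=4):
    # Eq. 8: k(x_i, x_j) = (x_i^T x_j)^(1/n), for a non-negative inner product.
    return float(np.dot(a, b)) ** (1.0 / n)

def kernel_system(X, y, n=4):
    """Assemble Eq. 6: K[i, j] = k(x_i, x_j) on the left-hand side and
    rhs[i] = k(y, x_i) on the right, with training samples as columns of X."""
    N = X.shape[1]
    K = np.array([[fractional_kernel(X[:, i], X[:, j], n)
                   for j in range(N)] for i in range(N)])
    rhs = np.array([fractional_kernel(y, X[:, i], n) for i in range(N)])
    return K, rhs
```

The coefficient vector w can then be obtained by applying the same L1-regularized solver to (K, rhs) in place of (Ψ, y).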

The evidence that similar images of the same class can well recover the input signal [14] motivates us to use the coefficients w_i, 1 ≤ i ≤ N, as features to discriminate among different classes, which provides a new approach for object classification. In contrast, statistical methods such as linear discriminant analysis that utilize training samples generally incorporate more explicit prior knowledge about the types of sample variations.

Figure 1: Samples in the AR database. (a) and (b) are from the same person. (c) and (d) are from another person.

To see how SRA works for sparse object representation, we show an experiment on a subset of the AR face database [21]. The subset containing the samples with indices from 1 to 7 is used to train SRA. Some samples are given in Fig. 1, and Fig. 2 shows the SRA coefficients of the four test images in Fig. 1. From Fig. 2, we can see that the distributions of the coefficients are sparse, that the distributions from the same class are similar, and that those from different classes are different. Feature reduction can be obtained by keeping only the non-zero coefficients, but in the classification phase, all the coefficients are used.

2.3. Classification with the Sparse Regression Representation

Our use of the sparse representation for classification differs from other related feature representation techniques. Instead of using sparsity to identify a relevant model or classifier that can later be used for classifying test samples, we use the sparse regression coefficients of each individual test sample directly for classification, adaptively selecting the training samples that give the most compact representation. The proposed feature extraction can be easily combined with

[Four panels, (a)-(d), each plotting Sparse Coefficients against Frame Index.]

Figure 2: Illustration of the SRA coefficients. The coefficients in (a), (b), (c), and (d) correspond to the four images in Figs. 1(a), (b), (c), and (d), respectively.


[Flowchart: the training samples feed SRA; a gallery sample p_j is encoded by SRA into w_j, a test sample p_i into w_i, and the two feature vectors enter the similarity computation sim(p_i, p_j).]

Figure 3: The flowchart of SRA for similarity computation of two samples.

popular classifiers such as the nearest neighbor (NN). For the object recognition problems discussed in the next sections, we simply use the NN classifier with the cosine distance as the similarity measure, which classifies a test sample based on its best-matching image in the gallery set. Let w_i and w_j be the features extracted from two samples p_i and p_j by the linear SRA. The similarity between them is computed by:

    Sim(p_i, p_j) = (w_i^T w_j) / (||w_i|| ||w_j||).    (9)
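A minimal sketch of Eq. 9 and the NN rule built on it (function names are our own, not from the paper):

```python
import numpy as np

def similarity(wi, wj):
    # Eq. 9: cosine similarity between two SRA coefficient vectors.
    return float(wi @ wj / (np.linalg.norm(wi) * np.linalg.norm(wj)))

def nn_classify(w_test, gallery_feats, gallery_labels):
    # Nearest-neighbour rule: assign the label of the gallery feature
    # with the highest cosine similarity to the test feature.
    sims = [similarity(w_test, g) for g in gallery_feats]
    return gallery_labels[int(np.argmax(sims))]
```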

Fig. 3 shows the flowchart of the computation of the similarity between two samples. Note that all the representation coefficients of the samples in the gallery can be pre-computed offline. The training samples are used to calculate the coefficient vectors as the features for samples in the gallery and test sets. It should be emphasized that, before classifying a test sample, it is not necessary to include samples of the same class as the test sample in the training set, which however is required by the Sparse Representation Classifier (SRC) in [14]. The proposed method relies only on the similarity of samples within the same category: human faces in face recognition, palmprints in palmprint recognition.

To validate the proposed method, we apply it to palmprint and face recognition. In palmprint recognition, we compare our method with the state-of-the-art Palmcode [20]. In face recognition, several well-known approaches, Eigenface, Fisherface, and SRC, are compared on both the AR and infrared face databases. SRA achieves a much better performance than SRC when the training set is small, as shown in Fig. 4, while the performance of both SRA and SRC improves considerably as the training set grows, which indicates that SRA is also suitable for a large training set. Furthermore, SRA is an unsupervised method and thus does not suffer from the generalization problem.

3. Experiments

3.1. Palmprint Recognition

We conduct experiments on the PolyU palmprint database [20], which contains 600 images of 100 subjects, with 6 images per person. The protocol includes a training set, a gallery set, and a probe set (containing the test images). The first image of each class is included in the gallery set, and the rest are divided into the training and probe sets. Since SRC [14] does not use a gallery set, its training set is the combination of the training and gallery sets. Both SRC and SRA use the same probe set. Compared with SRC, SRA achieves a much better performance when the training set is smaller, as shown in Fig. 4. The recognition-rate curve of SRA remains flatter, while SRC suffers considerably from the reduced training-set size. It should be noted that the training set is part of the gallery set for the SRC method.

Figure 4: The comparative results between SRA and SRC on the PolyU palmprint database, where '1-k' means the samples with indices from 1 to k in every class (person) are used for training in SRC; for example, '1-2' means that the 1st and 2nd images are in the training set, and '1-3' and '1-4' are defined similarly. Note that all the samples with index 1 are in the gallery in SRA, so the number of training samples per class in SRA is one less than that in SRC. The indices in Fig. 5 follow the same convention.

We further compare SRA with the kernelized SRA (KSRA), for which we empirically set n = 4 (see Eq. 8). As shown in Fig. 5, KSRA achieves 100% accuracy when the training set is large.

Figure 5: The comparative results between SRA and KSRA on the PolyU palmprint database.

It should be noted that SRA can be considered a general way to extract a representation from the new viewpoint of compressed sensing, while Palmcode is based on a texture descriptor, i.e. the Gabor wavelet.

3.2. Face Recognition on the AR Face Database

We test the proposed method on the AR database, which consists of 126 persons. For each person, 26 pictures were taken in two separate sessions [22]. The face regions are cropped and normalized to 88 × 88 pixels. Similar to [14], we downsample the face images to 12 × 10 so that the feature extraction process does not take too much time. These images include facial variations such as illumination changes, expressions, and facial disguises. For each person, only the images with illumination changes and expressions are selected: the images from Session 1 for training, and the images from Session 2 for testing. We try different numbers of training samples. In SRA, the gallery set contains all the 14th images in Session 2, while the images with indices from 15th to 20th in Session 2 are in the probe set for both SRC and SRA.


Figure 6: The comparative results among SRA, SRC, LDA, and PCA on the AR database.

Fig. 6 shows the comparative results among SRA, SRC, LDA, and PCA. The recognition rates of SRC are 75.3%, 59.52%, and 38.3%, while ours remain stable at 76.1%, 73.9%, and 71.55%, when the training set contains the '1-7', '1-5', and '1-3' images respectively ('1-7' means the images with labels from 1st to 7th are used; '1-5' and '1-3' are defined similarly). The proposed method achieves a much better performance than SRC when only a small set of training samples is used. The main reason is that SRC uses only part of the training samples for classification, ignoring other useful information from different classes. The LDA method also suffers considerably from the reduced training-set size, because class-discriminative information is lost when training the classification model. However, LDA obtains a better performance than SRC when only a small set of training samples is available. We also give the comparative results between KSRA and SRA in Fig. 7, from which we can see that KSRA performs even better.

Figure 7: The comparative results between KSRA and SRA on the AR database.

As an extension of the SRC method, SRA is a new investigation into compressed sensing, which is more effective as a face feature representation than as a classifier. SRA performs feature extraction in an unsupervised way, so it is also compared with the unsupervised PCA method for performance validation. We further compare it with supervised feature extraction methods, such as LDA and KDA, and show that our unsupervised method is even better than the supervised ones, which confirms that the coefficients calculated by SRA can discriminate between different classes. It should be noted that our implementation of SRC follows that in [14]. When seven images from Session 1 of each subject (100 subjects are selected in total) are used for training and the other seven from Session 2 for testing, a recognition rate of about 85% is reported in [14], while our implementation of their method achieves a similar result of 86%.

3.3. Face Recognition on the PolyU-NIRFD Face Database

Infrared face images usually do not contain information as detailed as grey-level images, but the sparse representation works well on images even with a high degree of downsampling [14]. We choose the PolyU-NIRFD face database [23] for this experiment. Some samples are shown in Fig. 8.

Figure 8: Samples of the infrared face images. (a) A neutral face. (b) A face with expression variation. (c) A face with pose variation. (d) A face with scale variation.

The training set contains 419 frontal images of 138 persons, and the probe and gallery sets have 2763 and 574 images, respectively. No person in the query and target sets appears in the training set. The facial portion of each original image is cropped and normalized to 64 × 64 pixels based on the locations of the eyes, and then further downsampled to 12 × 10. To see how well PCA, LDA, KDA, SRA, and KSRA perform on this database, their recognition performances are shown in Fig. 9. As on the AR database, SRA and KSRA work better than PCA, LDA, and KDA. The results confirm that the coefficients of the sparse representation can discriminate different classes based on a common basis. In particular, the SRA methods work well and show their robustness to low-resolution images. It should be noted that SRC cannot be tested in this experiment, because the numbers of samples for some classes in the gallery set are too small. The proposed SRA has a computational complexity similar to that of SRC, but it is not as efficient as PCA and LDA. On an Intel Core i5-3470 CPU with 8 GB RAM, the feature extraction for one face image takes 1.12 seconds. Moreover, KSRA is similar to KDA in computational complexity.

16

95

90

85

80

PCA

LDA

KDA

SRA

KSRA

Figure 9: Experiment results on the infrared PolyU-NIRFD face database

3.4. The Comparative Experiments on L1-norm and L2-norm Minimization SRA, and LRC

In this section, we conduct experiments to compare the L1-norm and L2-norm minimization schemes in SRA, as well as the Linear Regression Classification (LRC) method [29], on both the AR and Extended Yale B databases. For the Extended Yale B database, the test protocols follow those in [29]: subsets 2 and 3 include 12 images per subject with slight-to-moderate luminance variations, while subsets 4 and 5 consist of 14 and 19 images per subject, respectively. Details about the database can be found in [30]. In both databases, the face regions are cropped and downsampled to a size of 48 × 56, which is further divided into 4 × 8-sized subregions. The experimental results in Table 2 show that the L1-norm scheme can be much better than the L2-norm scheme in most situations, even though their results are the same in Table 1. SRA achieves a much better performance than LRC in many cases; for instance, SRA reaches 62.04% while LRC reaches 33.61% on subset 5 of Extended Yale B.
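For reference, the two coefficient schemes differ in how they are computed: the L1-norm variant needs an iterative solver, whereas an L2-norm penalty admits the closed ridge-regression form w = (Ψ^T Ψ + λI)^(-1) Ψ^T y. The sketch below is a hypothetical stand-in for the SRA-L2 scheme, not the paper's implementation:

```python
import numpy as np

def l2_coefficients(Psi, y, lam=0.1):
    """Coefficients of an L2-norm minimization scheme via the ridge
    closed form w = (Psi^T Psi + lam I)^(-1) Psi^T y."""
    N = Psi.shape[1]
    return np.linalg.solve(Psi.T @ Psi + lam * np.eye(N), Psi.T @ y)
```

Unlike the L1 solution, the ridge coefficients are generally dense, spreading weight over all training samples rather than concentrating it on a compact subset.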


Table 1: The comparative experiments on LRC, SRC and SRA on the AR occlusion subset.

                 Recognition Accuracy
    Approach     Sunglass     Scarf
    LRC          96%          26%
    SRC          87%          59.5%
    SRA-L1       21.67%       74.5%
    SRA-L2       21.67%       74.5%

Table 2: The comparative results between LRC, SRC and SRA on Extended Yale B.

                 Recognition Accuracy
    Approach     Subset 2     Subset 3     Subset 4     Subset 5
    LRC          100%         100%         83.27%       33.61%
    SRC          100%         100%         71.40%       17.79%
    SRA-L1       100%         73.57%       76.14%       62.04%
    SRA-L2       91.67%       72.53%       76.33%       62.04%


4. Conclusion and Future Work

This paper has proposed a new method, named Sparse Regression Analysis (SRA), for discriminant analysis in object recognition. SRA combines L1-norm minimization and regression analysis to classify samples of multiple classes. SRA and its kernelized version, KSRA, are validated on both palmprint and face recognition. Compared with the recently developed SRC and LRC methods, SRA and KSRA obtain much better performance, especially when only a small set of training samples is available. SRA and KSRA also achieve better performance than the popular PCA, LDA, and KDA methods. The SRA method performs much better than SRC on the scarf occlusion problem but worse on the sunglass problem, possibly because the eye region carries more discriminant information for SRA than the mouth region does. Future work will focus on combining SRA with the Fisher and kernel Fisher methods [28] for possibly better feature extraction. We will also try to apply the proposed methods to other problems, not only recognition but also signal encoding and detection [33].

Acknowledgements

B. Zhang acknowledges the support of the Natural Science Foundation of China, under Contracts 61272052 and 61473086, and the Program for New Century Excellent Talents of the University of the Ministry of Education of China. S. Zhang was supported in part by the National Natural Science Foundation of China under Grant 61300111; in part by the China Post-Doctoral Science Foundation under Grant 2014M550192; and in part by the Research Fund for the Doctoral Program of Higher Education of China under Grant 20132302120084.

References

[1] J. Rissanen, "Modeling by shortest data description," Automatica 14 (1978) 465–471.
[2] M. Hansen and B. Yu, "Model selection and the minimum description length principle," Journal of the American Statistical Association 96 (2001) 746–774.
[3] J.W. Lu, K.N. Plataniotis and A.N. Venetsanopoulos, "Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition," Pattern Recognition Letters 26(2) (2005) 181–191.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000.
[5] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience 3 (1991) 71–86.
[6] P. Belhumeur, J. Hespanha and D. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. Patt. Anal. Mach. Intell. 19(7) (1997) 711–720.
[7] W.S. Chen, P.C. Yuen and J. Huang, "A new regularized linear discriminant analysis method to solve small sample size problems," International Journal of Pattern Recognition and Artificial Intelligence 19(7) (2005) 917–936.
[8] W.S. Chen, P.C. Yuen, J. Huang and B. Fang, "Two-step single parameter regularization Fisher discriminant method for face recognition," International Journal of Pattern Recognition and Artificial Intelligence 20(2) (2006) 189–208.
[9] J. Yang, A.F. Frangi, J.Y. Yang, D. Zhang and Z. Jin, "KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition," IEEE Trans. Patt. Anal. Mach. Intell. 27(2) (2005) 230–244.
[10] T. Serre, "Learning a dictionary of shape-components in visual cortex: comparison with neurons, humans and machines," Ph.D. Thesis, MIT, 2006.
[11] K. Huang and S. Aviyente, "Sparse representation for signal classification," Proc. Neural Information Processing Systems, 2006, pp. 1–8.
[12] D. Donoho, "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution," Comm. on Pure and Applied Math 59(6) (2006) 797–829.
[13] E. Candes, J. Romberg and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Comm. on Pure and Applied Math 59(8) (2006) 1207–1223.
[14] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Patt. Anal. Mach. Intell. 28 (2008) 2037–2041.
[15] D. Donoho and Y. Tsaig, "Fast solution of L1-norm minimization problems when the solution may be sparse," Technical report, Stanford, 2007.
[16] H. Guo, R. Wang, J. Choi and L.S. Davis, "Face verification using sparse representations," Computer Vision and Pattern Recognition Workshops (2012) 37–44.
[17] R. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society B 58(1) (1996) 267–288.
[18] M. Grant, "Disciplined Convex Programming," Ph.D. Thesis, Stanford, 2004.
[19] C. Liu, "Capitalize on dimensionality increasing techniques for improving face recognition grand challenge performance," IEEE Trans. Patt. Anal. Mach. Intell. 28(5) (2006) 725–737.
[20] D. Zhang, A.W. Kong, J. You and M. Wong, "Online palmprint identification," IEEE Trans. Patt. Anal. Mach. Intell. 25(9) (2003) 1041–1050.
[21] W.Y. Zhao, R. Chellappa, P.J. Phillips and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys 35(4) (2003) 399–458.
[22] A. Martinez, "Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class," IEEE Trans. Patt. Anal. Mach. Intell. 24(6) (2002) 748–763.
[23] B.C. Zhang, L. Zhang, D. Zhang, L.L. Shen and Z.H. Guo, "PolyU near-infrared face database," Technical report, PolyU, 2009.
[24] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Patt. Anal. Mach. Intell. 31(2) (2009) 210–227.
[25] M. Yang, D. Zhang, J. Yang and D. Zhang, "Robust sparse coding for face recognition," IEEE Conference on Computer Vision and Pattern Recognition (2011) 625–632.
[26] D. Zhang, M. Yang and X.C. Feng, "Sparse representation or collaborative representation: which helps face recognition?" IEEE International Conference on Computer Vision (2011) 471–478.
[27] I. Naseem, R. Togneri and M. Bennamoun, "Linear regression for face recognition," IEEE Trans. Patt. Anal. Mach. Intell. 32(11) (2010) 2106–2112.
[28] P.N. Belhumeur, J.P. Hespanha and D. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. Patt. Anal. Mach. Intell. 19(7) (1997) 711–720.
[29] I. Naseem, R. Togneri and M. Bennamoun, "Linear regression for face recognition," IEEE Trans. Patt. Anal. Mach. Intell. 32(11) (2010) 2106–2112.
[30] The Extended Yale Face Database B, http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html.
[31] J. Han, S. He, X. Qian, D. Wang, L. Guo and T. Liu, "An object-oriented visual saliency detection framework based on sparse coding representations," IEEE Trans. on Circuits and Systems for Video Technology, 2013.
[32] J. Han, P. Zhou, D. Zhang, G. Cheng, L. Guo, Z. Liu, S. Bu and J. Wu, "Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding," ISPRS Journal of Photogrammetry and Remote Sensing, 2014.
[33] J. Han, D. Zhang, G. Cheng, L. Guo and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Trans. on Geoscience and Remote Sensing 53(6) (2015) 3325–3337.
