Optik 124 (2013) 3310–3313
Face image recognition via collaborative representation on selected training samples

Jian-Xun Mi a,b,∗
a Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, Guangdong Province, China
b Key Laboratory of Network Oriented Intelligent Computation, Shenzhen, Guangdong Province, China
Article history: Received 16 May 2012; Accepted 14 October 2012
Keywords: Face recognition; Image processing; Machine vision
Abstract

In this paper, we present a collaborative representation-based classification on selected training samples (CRC STS) for face image recognition. CRC STS uses a two-stage scheme: the first stage selects the most significant training samples from the original training set using a multiple-round refining process; the second stage applies a collaborative representation classifier to the selected training samples. Our method can be regarded as a sparse representation approach, but without imposing an l1-norm constraint on the representation coefficients. Experimental results on three well-known face databases show that our method works very well.

© 2012 Elsevier GmbH. All rights reserved.
1. Introduction

In recent years, face recognition has become an active research area in biometrics: the identity of a person is automatically recognized from a digital image or a video frame of his/her face. After the acquisition of facial images, building a robust classifier is of key importance. Many recently proposed classifiers for face recognition assume that the images of a subject tend to lie on a subspace [1,2]. Subspace models are flexible enough to capture many variations in face images, such as varying lighting and expression. A number of linear representation-based classification methods built on the subspace assumption perform well for face recognition [2–5]. Furthermore, those methods can be more robust and more effective if test images are represented sparsely [1,5]. Sparse representation-based classification (SRC), proposed in [1], is one of the most famous such methods; it uses training samples across different classes to perform collaborative representation (CR) with a sparsity constraint on the coding vector. Since the training samples from the correct class are most likely to represent a test image compactly and precisely, the authors of [1] argued that sparse representation is inherently discriminative. However, recent studies reached a different conclusion: enforcing sparsity alone does not increase recognition accuracy [5–7]. In [5], the authors held that the CR mechanism, not the sparsity constraint, is the main reason for the improved face recognition.

However, when CR-based classification (CRC) is used for face recognition, the result becomes unstable when the number of classes or the number of training samples is too large. Therefore, if a technique is available to restrict the representation to only a portion of the entire training set, the accuracy of CR-based classifiers can be further improved. From this viewpoint, SRC imposes a sparsity constraint on the coding vector to restrict the number of training samples used to represent a test image. In this paper, we propose a two-stage scheme to perform CRC on a portion of the training samples. Unlike SRC, which is a single-stage optimization achieving sparse representation (SR) and CR simultaneously, our first stage selects a subset of all training samples, and we then perform CRC on the selected training samples (CRC STS). For the first stage, we propose a multiple-round refining approach to select the most representative training samples. The second stage is a typical CRC process in which the representation contribution of the correct class is stressed, leading to higher recognition accuracy. The experimental results show that the proposed CRC STS method is very competitive. This paper is organized as follows. Section 2 reviews the collaborative representation classification method. Section 3 presents our CRC STS method. The experimental results are shown in Section 4.
∗ Correspondence address: Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, Guangdong Province, China. Tel.: +86 13590158600; fax: +86 755 26032461. E-mail address: [email protected]. doi:10.1016/j.ijleo.2012.10.051

2. The collaborative representation classification method
First of all, we describe a general face recognition scenario. Suppose there are c classes and n training samples. Let C = {1, ..., c} be the label set and N = {1, ..., n} be the index set of training samples.
We transform the test image and the training images into vectors y ∈ R^{q×1} and x_i^{(m)} ∈ R^{q×1}, respectively, where i ∈ N, m ∈ C, and q is the sample dimension (n < q).

Collaborative representation (CR) classification first seeks to represent y collaboratively using t training samples from different classes, i.e. y ≈ Xα, where X = [x_{i_1}^{(j_1)}, ..., x_{i_t}^{(j_t)}], {j_1, ..., j_t} ⊆ C, {i_1, ..., i_t} ⊆ N, and α = [α_{i_1}, ..., α_{i_t}]^T. The coefficient vector α can be estimated by

α = (X^T X + λI)^{-1} X^T y    (1)

where λ is a small positive constant (we set λ = 0.01 in this paper) and I is the identity matrix. Adding λI makes the matrix inversion stable and also acts as an l2-norm constraint on α, which is believed to make the coefficients more discriminative [5]. Note that in the original CRC all training samples are used to represent a test sample, i.e. t = n. The test sample y is then classified according to the decision variable, or residual, computed as

e_i = || y − Σ_{j ∈ C(i)} α_j x_j^{(i)} ||^2    (2)
where i ∈ C and C(i) is an index set for the training samples of the ith class used in representing the test image. The test image is assigned to the class with the minimum decision variable.

3. The CRC STS method and interpretation

In our two-stage CRC STS scheme, we select the M most representative training samples from the whole training set before performing the CRC method. That is, CRC is performed on a selected subset of the training samples. The idea behind sample selection is that training samples which contribute strongly to representing the test sample are kept; in other words, the contribution of a training sample indicates its significance, and insignificant samples are removed. In CR, the coefficient α_{i_s} (s = 1, ..., t) can be viewed as the significance of x_{i_s} in representing y. Therefore, we use CR itself to select the subset. A straightforward way is to run CR once and select the M training samples with the highest contributions (the larger the coefficient, the more significant the corresponding training sample), which we call the one-round CR refining process (1 CR for short). However, the samples selected this way are not exactly the M most significant ones, as we explain below, so we propose a more general scheme. To find the significant training samples more precisely, we perform l + 1 rounds of CR refining. In the first round, we choose the M + l·k most significant training samples from the original training set; we then run l further rounds of CR refining, removing the k least significant training samples each time. The general scheme reduces to the 1 CR approach when l = 0. The most precise setting is k = 1 and l = n − M − 1, i.e. n − M rounds of CR, removing the single least significant training sample each time.
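As a rough illustration, the refining scheme above might be implemented as follows, assuming the test image y and the training images (columns of X) are already vectorized into a NumPy array. We take the coefficient magnitude as the significance score, which is our reading of the paper; the function name is ours, not from the paper:

```python
import numpy as np

def refine_training_set(X, y, M, k, l, lam=0.01):
    """Stage 1 of CRC STS (sketch): l + 1 rounds of CR refining.

    Round 1 keeps the M + l*k most significant columns of X; each of
    the l remaining rounds re-runs CR on the survivors and drops the
    k least significant, leaving exactly M column indices.
    """
    keep = np.arange(X.shape[1])
    target = M + l * k
    for _ in range(l + 1):
        Xs = X[:, keep]
        # CR solve of Eq. (1) on the surviving samples
        alpha = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(keep)), Xs.T @ y)
        order = np.argsort(-np.abs(alpha))   # most significant first
        keep = keep[order[:target]]
        target = max(M, target - k)          # shrink the target by k
    return keep                              # indices of selected samples
```

With l = 0 this reduces to the one-round (1 CR) selection, while k = 1 and l = n − M − 1 gives the most precise, and slowest, variant.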
The reason the selected samples may not be exactly the M most significant ones is that the order of significance of the remaining training samples may change after any training sample is removed. That is to say, removing insignificant training samples one by one is the most precise way to assess the contribution of each training sample. In practice, the M training samples selected by different numbers of CR refining rounds differ. In Fig. 1, we demonstrate how the number of CR rounds influences the recognition accuracy of the CRC STS method: more rounds of CR pick a better subset, which increases the recognition accuracy. However, removing one insignificant training sample at a time (k = 1) is too time-consuming. Therefore, we perform only a few rounds of CR for training sample selection (for example l < 5, k ≫ 1), which already produces a good recognition result.

Fig. 1. The average classification error rates (%) of CRC STS using different rounds of the CR refining process on the ORL database (for details, see the 5-training-sample experiments in Section 4). We test 10 different subset sizes, from 10 to 100 training samples. Note that if no CR refining process is used, i.e. the original CRC method, the mean classification error rate is 6.81%.

Next, we interpret why the CR refining process increases recognition accuracy. Previous studies suggested that the face images of a subject tend to lie on a subspace [1,2]. Therefore, the training samples from the correct class are the most significant, i.e. have the highest contribution, in representing the test image. However, there are usually many subjects in a face recognition system, and the more classes there are, the less each class contributes to representing the test image. In other words, the significance of the samples from the correct class decreases as the number of classes grows. Removing some insignificant samples therefore sharpens the significance of the samples from the correct class. Fig. 2 gives an example of why this is true. Fig. 2(a)–(d) show how the representation coefficients change over a four-round CR refining process that selects 100 significant training samples for a test sample from class 1. After each round of refining, the coefficients of the first class increase and some coefficients corresponding to insignificant training samples become 0, meaning those samples no longer contribute to representing the test sample. In other words, the representation of the test image becomes sparse and the contribution of the correct class is stressed. For comparison, we also show the coefficients of a one-round CR refining process, where the contribution of the correct class is not as pronounced as with the four-round process. The multiple-round CR refining process therefore assesses the significance of training samples more precisely. We also illustrate the residuals calculated by (2) in the classification stage.
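For reference, the residual-based decision of Eq. (2) might be coded as follows (an illustrative NumPy sketch; the function name is ours). Applying it to the selected subset rather than the full training set gives the second-stage classifier described below:

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.01):
    """CR over all columns of X (Eq. (1)), then per-class residuals
    of Eq. (2); the class with the smallest residual wins."""
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    residuals = {
        c: float(np.sum((y - X[:, labels == c] @ alpha[labels == c]) ** 2))
        for c in np.unique(labels)
    }
    return min(residuals, key=residuals.get), residuals
```

Here `labels` is an integer array giving the class of each column of X; boolean masking picks out each class's columns and coefficients for its residual.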
In the original CRC method, since no refining process is used, the residual of class 1 is 0.612, higher than that of CRC STS with 1 CR and with 4 CR. The residual of class 1 in CRC STS with 4 CR is the smallest, which again shows that the multiple-round CR refining process improves discriminative power.

The second stage of CRC STS is a standard CR classification. Here, y is represented using the M selected significant training samples, i.e. y ≈ X̃β, where X̃ = [x_{i_1}^{(j_1)}, ..., x_{i_M}^{(j_M)}], β = [β_{i_1}, ..., β_{i_M}]^T, {i_1, ..., i_M} ⊆ N, and {j_1, ..., j_M} ⊆ C. The coefficient vector β is calculated as β = (X̃^T X̃ + λI)^{-1} X̃^T y. Then we compute the residual for each remaining class:

e_i = || y − Σ_{j ∈ C̄(i)} β_j x_j^{(i)} ||^2    (3)
Fig. 3. Samples of a subject from the AT&T (ORL) database.
Fig. 2. Example of classifying a test image of the first class in the ORL database. (a)–(d) The representation coefficients for each refining round of a 4 CR process selecting 100 significant training samples (k = 30). (e) The representation coefficients obtained by 1 CR. (f) The residuals calculated for the 4 CR in the classification stage. A blank bar means all the training samples of that class were removed. (g) The residuals calculated for 1 CR. (h) The residuals for the original CRC, where no refining process is performed.
where C̄(i) is an index set for the remaining training samples of the ith class. The test sample y is assigned to the class with the minimum residual.

4. Experiments

We conducted experiments on three well-known public face databases. The ORL database includes 40 subjects with 10 images per subject (see Fig. 3). The images were resized to 46 × 56. We used 5 images of each class for training and the rest for testing, so all 252 possible combinations were tested. We then also used 6 images of each subject for training, with the remaining 4 for testing (all 210 combinations were tested). In the AR database, a subset (with only illumination and expression changes) consisting of 50 male and 50 female subjects was chosen from the original database (see Fig. 4). We used the cropped face images, resized to 40 × 50. For each subject, 7 images from Session 1 were used for training and the other 7 images from Session 2 for testing [5]. From the FERET database, we used 200 individuals with 7 images per subject (image names marked with the two-character strings 'ba', 'bj', 'bk', 'bf', 'bd', and 'bg') (see Fig. 5). We used 4 images of each class for training and the rest for testing, so all 35 possible combinations were tested. The images were resized to 40 × 40. We compared CRC STS with other state-of-the-art methods, including LRC [2], SRC [1], and CRC (called CRC RLS in [5]). These are representation-based approaches, and the latter two are CR-based. For the ORL database k was set to 30, for the other two databases k was set to 40, and l was set to 3 in all experiments. We tested values of M ranging from 5% to 95% of all training samples (in steps of 5%). In general, the best results are obtained when M is less than 30% of the training set, which supports the claim that representing a test image sparsely increases the classification accuracy of CR classification.
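As a quick sanity check, the numbers of train/test combinations quoted above are simply binomial counts of which images per subject go into the training set:

```python
from math import comb

# choosing the training images per subject fixes the partition
assert comb(10, 5) == 252  # ORL: 5 of 10 images for training
assert comb(10, 6) == 210  # ORL: 6 of 10 images for training
assert comb(7, 4) == 35    # FERET: 4 of 7 images for training
```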
Fig. 4. Samples of a subject from the AR database. Samples in the first row were captured in Session 1 and those in the second row in Session 2.
Fig. 5. Samples of a typical subject from the FERET database.
Table 1. The average recognition rates (%) on the ORL database.

                                CRC STS   SRC     LRC     CRC
5 training samples per class    95.87     93.65   95.02   93.19
6 training samples per class    96.88     95.00   95.44   95.89

Table 2. The average recognition rates (%) on the AR database.

CRC STS   SRC     LRC     CRC
77.74     75.00   76.14   73.93

Table 3. The average recognition rates (%) on the FERET database.

CRC STS   SRC     LRC     CRC
72.28     68.64   73.30   57.86

All the experimental results are shown in Tables 1–3, which demonstrate two facts. First, SRC and CRC STS usually outperform CRC, indicating that representing a test sample on a portion of the training set benefits CR-based methods. Second, using CR is quite effective for selecting a good training subset, which clearly raises the recognition accuracy of the original CRC method.

5. Conclusion

In conclusion, the proposed CRC STS represents a test image on selected significant training samples. We proposed a multiple-round refining scheme for selecting a good subset of training samples, which is very useful for increasing the discriminative power of the original CRC method. CRC STS has shown good results compared with other state-of-the-art methods on the ORL, AR, and FERET face databases. Moreover, our study provides evidence supporting the view that reducing the training samples used in representing test images helps to improve the recognition accuracy of CR-based classification methods.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61202276, 61071179, 61203376, 61263032 and 61272292.
References

[1] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 210–227.
[2] I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 2106–2112.
[3] Y. Xu, A.N. Zhong, J.A. Yang, D. Zhang, Bimodal biometrics based on a representation and recognition approach, Opt. Eng. 50 (2011).
[4] Y. Xu, Q. Zhu, D. Zhang, Combine crossing matching scores with conventional matching scores for bimodal biometrics and face and palmprint recognition experiments, Neurocomputing 74 (2011) 3946–3952.
[5] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition? in: IEEE International Conference on Computer Vision (ICCV), 2011.
[6] R. Rigamonti, M.A. Brown, V. Lepetit, Are sparse representations really relevant for image classification? in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1545–1552.
[7] Q. Shi, A. Eriksson, A. van den Hengel, C. Shen, Is face recognition really a compressive sensing problem? in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 553–560.