Optik 124 (2013) 3310–3313
Face image recognition via collaborative representation on selected training samples

Jian-Xun Mi a,b,∗
a Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, Guangdong Province, China
b Key Laboratory of Network Oriented Intelligent Computation, Shenzhen, Guangdong Province, China
Article history: Received 16 May 2012; Accepted 14 October 2012
Keywords: Face recognition; Image processing; Machine vision
Abstract

In this paper, we present a collaborative representation-based classification on selected training samples (CRC STS) for face image recognition. CRC STS uses a two-stage scheme: the first stage selects the most significant training samples from the original training set using a multiple-round refining process; the second stage applies a collaborative representation classifier to the selected training samples. Our method can be regarded as a sparse representation approach, but without imposing an l1-norm constraint on the representation coefficients. Experimental results on three well-known face databases show that our method works very well.

© 2012 Elsevier GmbH. All rights reserved.
1. Introduction

In recent years, face recognition has become an active research area in biometrics: the identity of a person is automatically recognized from a digital image or a video frame of his/her face. After the acquisition of facial images, building a robust classifier is of key importance. Many recently proposed classifiers for face recognition assume that the images of a subject tend to lie on a subspace [1,2]. Subspace models are flexible enough to capture many variations in face images, such as varying lighting and expression. A number of linear representation-based classification methods built on the subspace assumption perform well for face recognition [2–5]. Furthermore, those methods can be more robust and more effective if test images are represented sparsely [1,5]. Sparse representation-based classification (SRC), proposed in [1], is one of the most famous such methods; it uses training samples across different classes to perform collaborative representation (CR) with a sparsity constraint on the coding vector. Since the training samples from the correct class are most likely to represent a test image compactly and precisely, the authors of [1] argued that sparse representation is inherently discriminative. However, recent studies reached a different conclusion: enforcing sparsity alone does not increase recognition accuracy [5–7]. In [5], the authors held that the CR mechanism, not the sparsity constraint, is the main reason for the improved face recognition.

However, when CR-based classification (CRC) is used for face recognition, the result becomes unstable when the number of classes or the number of training samples is too large. Therefore, if a technique is available to restrict the representation to only a portion of the entire training set, the accuracy of CR-based classifiers can be further improved. From this viewpoint, SRC imposes a sparsity constraint on the coding vector to restrict the number of training samples used to represent a test image. In this paper, we propose a two-stage scheme to perform CRC on a portion of the training samples. Unlike SRC, which is a single-stage optimization achieving sparse representation (SR) and CR simultaneously, our first stage selects a subset of all training samples, and we then perform CRC on the selected training samples (CRC STS). For the first stage, we propose a multiple-round refining approach to select the most representative training samples. The second stage is a typical CRC process in which the representation contribution of the correct class is stressed, leading to higher recognition accuracy. The experimental results show that the proposed CRC STS method is very competitive. This paper is organized as follows. Section 2 reviews the collaborative representation classification method. Section 3 presents our CRC STS method. The experimental results are shown in Section 4.
∗ Correspondence address: Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, Guangdong Province, China. Tel.: +86 13590158600; fax: +86 755 26032461. E-mail address: [email protected]. doi:10.1016/j.ijleo.2012.10.051

2. The collaborative representation classification method
First of all, we describe a general face recognition scenario. Suppose there are c classes and n training samples. Let C = {1, ..., c} be the label set and N = {1, ..., n} be the index set of training samples.
We transform the test image and the training images into vectors y ∈ R^{q×1} and x_i^{(m)} ∈ R^{q×1}, respectively, where i ∈ N, m ∈ C, and q is the sample dimension (n < q).

Collaborative representation (CR) classification first seeks to represent y collaboratively using t training samples from different classes, i.e. y ≈ Xα, where X = [x_{i_1}^{(j_1)}, ..., x_{i_t}^{(j_t)}], {j_1, ..., j_t} ⊆ C, {i_1, ..., i_t} ⊆ N, and α = [α_{i_1}, ..., α_{i_t}]^T. The coefficient vector α can be estimated by

α = (X^T X + λI)^{-1} X^T y    (1)

where λ is a small positive constant (we set λ = 0.01 in this paper) and I is the identity matrix. Adding λI makes the matrix inversion stable and also acts as an l2-norm constraint on α, which is believed to make the coefficients more discriminative [5]. Note that in the original CRC all training samples are used to represent a test sample, i.e. t = n. The test sample y is then classified according to the decision variable, or residual, computed as

e_i = || y − Σ_{j ∈ C(i)} α_j x_j^{(i)} ||^2    (2)
where i ∈ C and C(i) is an index set for the training samples of the ith class used in representing the test image. The test image is assigned to the class with the minimum decision variable.

3. The CRC STS method and interpretation

In our two-stage CRC STS scheme, we select the M most representative training samples from the whole training set before performing the CRC method. That is, CRC is performed on a selected subset of the training samples. The idea behind sample selection is that training samples which contribute strongly to representing the test sample are kept; in other words, the contribution of a training sample indicates its significance, and insignificant samples are removed. In CR, the coefficient α_{i_s} (s = 1, ..., t) can be viewed as the significance of x_{i_s} in representing y. Therefore, we use CR itself to select the subset. A straightforward way is to run CR once and select the M training samples with the highest contributions (the larger the coefficient, the more significant the corresponding training sample), which we call the one-round CR refining process (1 CR for short). However, the samples selected this way are not exactly the M most significant ones, as we explain below, so we propose a more general scheme. To find the significant training samples more precisely, we perform l + 1 rounds of CR refining. In the first round, we choose the M + l·k most significant training samples from the original training set; we then run l further rounds of CR refining, removing the k least significant training samples each time. The general scheme reduces to the 1 CR approach when l = 0. The most precise setting is k = 1 and l = n − M − 1, i.e. n − M rounds of CR, removing the single least significant training sample each time.
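As a rough illustration, the refining scheme above might be implemented as follows, assuming the test image y and the training images (columns of X) are already vectorized into a NumPy array. We take the coefficient magnitude as the significance score, which is our reading of the paper; the function name is ours, not from the paper:

```python
import numpy as np

def refine_training_set(X, y, M, k, l, lam=0.01):
    """Stage 1 of CRC STS (sketch): l + 1 rounds of CR refining.

    Round 1 keeps the M + l*k most significant columns of X; each of
    the l remaining rounds re-runs CR on the survivors and drops the
    k least significant, leaving exactly M column indices.
    """
    keep = np.arange(X.shape[1])
    target = M + l * k
    for _ in range(l + 1):
        Xs = X[:, keep]
        # CR solve of Eq. (1) on the surviving samples
        alpha = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(keep)), Xs.T @ y)
        order = np.argsort(-np.abs(alpha))   # most significant first
        keep = keep[order[:target]]
        target = max(M, target - k)          # shrink the target by k
    return keep                              # indices of selected samples
```

With l = 0 this reduces to the one-round (1 CR) selection, while k = 1 and l = n − M − 1 gives the most precise, and slowest, variant.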
The reason the selected samples may not be exactly the M most significant ones is that the order of significance of the remaining training samples may change after any training sample is removed. That is to say, removing insignificant training samples one by one is the most precise way to assess the contribution of each training sample. In practice, the M training samples selected by different numbers of CR refining rounds differ. In Fig. 1, we demonstrate how the number of CR rounds influences the recognition accuracy of the CRC STS method: more rounds of CR pick a better subset, which increases the recognition accuracy. However, removing one insignificant training sample at a time (k = 1) is too time-consuming. Therefore, we perform only a few rounds of CR for training sample selection (for example l < 5, k ≫ 1), which already produces a good recognition result.

Fig. 1. The average classification error rates (%) of CRC STS using different rounds of the CR refining process on the ORL database (for details, see the 5-training-sample experiments in Section 4). We test 10 different subset sizes, from 10 to 100 training samples. Note that if no CR refining process is used, i.e. the original CRC method, the mean classification error rate is 6.81%.

Next, we interpret why the CR refining process increases recognition accuracy. Previous studies suggested that the face images of a subject tend to lie on a subspace [1,2]. Therefore, the training samples from the correct class are the most significant, i.e. have the highest contribution, in representing the test image. However, there are usually many subjects in a face recognition system, and the more classes there are, the less each class contributes to representing the test image. In other words, the significance of the samples from the correct class decreases as the number of classes grows. Removing some insignificant samples therefore sharpens the significance of the samples from the correct class. Fig. 2 gives an example of why this is true. Fig. 2(a)–(d) show how the representation coefficients change over a four-round CR refining process that selects 100 significant training samples for a test sample from class 1. After each round of refining, the coefficients of the first class increase and some coefficients corresponding to insignificant training samples become 0, meaning those samples no longer contribute to representing the test sample. In other words, the representation of the test image becomes sparse and the contribution of the correct class is stressed. For comparison, we also show the coefficients of a one-round CR refining process, where the contribution of the correct class is not as pronounced as with the four-round process. The multiple-round CR refining process therefore assesses the significance of training samples more precisely. We also illustrate the residuals calculated by (2) in the classification stage.
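For reference, the residual-based decision of Eq. (2) might be coded as follows (an illustrative NumPy sketch; the function name is ours). Applying it to the selected subset rather than the full training set gives the second-stage classifier described below:

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.01):
    """CR over all columns of X (Eq. (1)), then per-class residuals
    of Eq. (2); the class with the smallest residual wins."""
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    residuals = {
        c: float(np.sum((y - X[:, labels == c] @ alpha[labels == c]) ** 2))
        for c in np.unique(labels)
    }
    return min(residuals, key=residuals.get), residuals
```

Here `labels` is an integer array giving the class of each column of X; boolean masking picks out each class's columns and coefficients for its residual.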
In the original CRC method, since no refining process is used, the residual of class 1 is 0.612, higher than that of CRC STS with 1 CR and with 4 CR. The residual of class 1 in CRC STS with 4 CR is the smallest, which again shows that the multiple-round CR refining process improves discriminative power.

The second stage of CRC STS is a standard CR classification. Here, y is represented using the M selected significant training samples, i.e. y ≈ X̃β, where X̃ = [x_{i_1}^{(j_1)}, ..., x_{i_M}^{(j_M)}], β = [β_{i_1}, ..., β_{i_M}]^T, {i_1, ..., i_M} ⊆ N, and {j_1, ..., j_M} ⊆ C. The coefficient vector β is calculated as β = (X̃^T X̃ + λI)^{-1} X̃^T y. Then we compute the residual for each remaining class:

e_i = || y − Σ_{j ∈ C̄(i)} β_j x_j^{(i)} ||^2    (3)
Fig. 3. Samples of a subject from the AT&T (ORL) database.
Fig. 2. Example of classifying a test image of the first class in the ORL database. (a)–(d) The representation coefficients for each refining round of a 4 CR process selecting 100 significant training samples (k = 30). (e) The representation coefficients obtained by 1 CR. (f) The residuals calculated for the 4 CR in the classification stage. A blank bar means all the training samples of that class were removed. (g) The residuals calculated for 1 CR. (h) The residuals for the original CRC, where no refining process is performed.
where C̄(i) is an index set for the remaining training samples of the ith class. The test sample y is assigned to the class with the minimum residual.

4. Experiments

We conducted experiments on three well-known public face databases. The ORL database includes 40 subjects with 10 images per subject (see Fig. 3). The images were resized to 46 × 56. We used 5 images of each class for training and the rest for testing, so all 252 possible combinations were tested. We then also used 6 images of each subject for training, with the remaining 4 for testing (all 210 combinations were tested). In the AR database, a subset (with only illumination and expression changes) consisting of 50 male and 50 female subjects was chosen from the original database (see Fig. 4). We used the cropped face images, resized to 40 × 50. For each subject, 7 images from Session 1 were used for training and the other 7 images from Session 2 for testing [5]. From the FERET database, we used 200 individuals with 7 images per subject (image names marked with the two-character strings 'ba', 'bj', 'bk', 'bf', 'bd', and 'bg') (see Fig. 5). We used 4 images of each class for training and the rest for testing, so all 35 possible combinations were tested. The images were resized to 40 × 40. We compared CRC STS with other state-of-the-art methods, including LRC [2], SRC [1], and CRC (called CRC RLS in [5]). These are representation-based approaches, and the latter two are CR-based. For the ORL database k was set to 30, for the other two databases k was set to 40, and l was set to 3 in all experiments. We tested values of M ranging from 5% to 95% of all training samples (in steps of 5%). In general, the best results are obtained when M is less than 30% of the training set, which supports the claim that representing a test image sparsely increases the classification accuracy of CR classification.
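As a quick sanity check, the numbers of train/test combinations quoted above are simply binomial counts of which images per subject go into the training set:

```python
from math import comb

# choosing the training images per subject fixes the partition
assert comb(10, 5) == 252  # ORL: 5 of 10 images for training
assert comb(10, 6) == 210  # ORL: 6 of 10 images for training
assert comb(7, 4) == 35    # FERET: 4 of 7 images for training
```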
Fig. 4. Samples of a subject from the AR database. Samples in the first row were captured in Session 1 and those in the second row in Session 2.
Fig. 5. Samples of a typical subject from the FERET database.
Table 1. The average recognition rates (%) on the ORL database.

                                CRC STS   SRC     LRC     CRC
5 training samples per class    95.87     93.65   95.02   93.19
6 training samples per class    96.88     95.00   95.44   95.89

Table 2. The average recognition rates (%) on the AR database.

CRC STS   SRC     LRC     CRC
77.74     75.00   76.14   73.93

Table 3. The average recognition rates (%) on the FERET database.

CRC STS   SRC     LRC     CRC
72.28     68.64   73.30   57.86

All the experimental results are shown in Tables 1–3, which demonstrate two facts. First, SRC and CRC STS usually outperform CRC, indicating that representing a test sample on a portion of the training set benefits CR-based methods. Second, using CR is quite effective for selecting a good training subset, which clearly raises the recognition accuracy of the original CRC method.

5. Conclusion

In conclusion, the proposed CRC STS represents a test image on selected significant training samples. We proposed a multiple-round refining scheme for selecting a good subset of training samples, which is very useful for increasing the discriminative power of the original CRC method. CRC STS has shown good results compared with other state-of-the-art methods on the ORL, AR, and FERET face databases. Moreover, our study provides evidence supporting the view that reducing the training samples used in representing test images helps to improve the recognition accuracy of CR-based classification methods.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61202276, 61071179, 61203376, 61263032 and 61272292.
References

[1] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 210–227.
[2] I. Naseem, R. Togneri, M. Bennamoun, Linear regression for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 2106–2112.
[3] Y. Xu, A.N. Zhong, J.A. Yang, D. Zhang, Bimodal biometrics based on a representation and recognition approach, Opt. Eng. 50 (2011).
[4] Y. Xu, Q. Zhu, D. Zhang, Combine crossing matching scores with conventional matching scores for bimodal biometrics and face and palmprint recognition experiments, Neurocomputing 74 (2011) 3946–3952.
[5] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition? in: IEEE International Conference on Computer Vision (ICCV), 2011.
[6] R. Rigamonti, M.A. Brown, V. Lepetit, Are sparse representations really relevant for image classification? in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1545–1552.
[7] Q. Shi, A. Eriksson, A. van den Hengel, C. Shen, Is face recognition really a compressive sensing problem? in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 553–560.