Optik 140 (2017) 853–859
Original research article
Weighted sparse representation based on virtual test samples for face recognition

Ningbo Zhu, Shuoxuan Chen∗
College of Information Science and Engineering, Hunan University, Changsha, China
∗ Corresponding author. E-mail address: [email protected] (S. Chen).
Article info
Article history: Received 3 December 2016; Accepted 4 May 2017.
Keywords: Face recognition; Sparse representation; Virtual sample; Automatic weight assignment.
Abstract

Face recognition with a very limited number of training samples per subject, or even a single one, is a difficult task, and raising face recognition accuracy under such a condition is very challenging. In this paper, we propose a novel weighted sparse representation method based on virtual test samples for face recognition. The presented method includes three steps. First, virtual test samples are generated for each original test sample, and the distance between the test sample and each training sample is computed to build a weighted training set. Second, the test sample is represented over the weighted training set. Finally, the weight of each test sample is computed and classification is conducted. The use of virtual samples of each individual allows us to obtain more distinguishing features and facial variation information from the external data. The weights enhance the importance of the training images closest to a query image when representing that query image. An important advantage of the proposed approach is that the weight of each test sample is computed dynamically rather than set manually. Extensive experiments on the YALE, AR and FERET face databases indicate that the proposed approach outperforms the competing methods.

© 2017 Elsevier GmbH. All rights reserved.
1. Introduction

As a very active and important topic in computer vision, face recognition (FR) has a large number of applications, including access control, social networks, human-computer interfaces, criminal investigation, etc. [1–3]. During the past two decades, thousands of algorithms and methods have been proposed for FR [4–14]. In recent years, sparse representation based classification (SRC) [15] has been widely applied to face recognition due to its excellent performance. SRC [15] uses a linear combination of all the training images to represent the query face image, computes the residual of each class in representing the query image, and finally classifies the query image according to the residuals of all classes. This method boosted the study of sparsity-based pattern classification [16–22]. Gao et al. [17] combined the Gaussian kernel function with sparse coding for FR, while Yang et al. [18] used local Gabor features for SRC with a Gabor occlusion dictionary to deal with occluded face images. Yang and Zhang [19] proposed robust sparse coding to handle different types of outliers (e.g., expression, occlusion, pose angle, etc.). Sparse representation based face recognition methods can achieve interesting results when each class provides enough training samples. However, in some practical applications, there is just a very limited, or even a single, training sample for
each individual. The recognition accuracy of SRC declines sharply as the number of training samples per person decreases. The main reason for this phenomenon is that the facial variation of the query image cannot be predicted well from such limited training samples. Thus, many studies [29–31] used virtual samples to improve recognition accuracy when each individual provides only very limited training images. Tang and Zhu [31] expanded the training set by adding random noise to the original training images, while Xu et al. [29] exploited the 'symmetrical face' to build a symmetric training set, then used the original and symmetric training sets together to perform FR. Although the approaches in [29,31] notably improved FR performance compared with the original SRC method, they did not emphasize the distinctiveness of different training samples when building classifiers.

In the field of pattern recognition, different training samples usually make their own unique contributions to building classifiers. Some scholars have suggested assigning different weights to different samples according to the sample distribution when building classifiers. We notice that training images nearer to a query image are more important than others in representing that query image [23–28]. Fan et al. [28] constructed a weighted training set to represent and classify the test sample. Xu and Zhang [27] used a two-stage test sample classification algorithm for FR: the method first finds the M training images closest to the incoming query image, then exploits the selected M training images to represent and classify the query image. The algorithm in [27] shows excellent performance in FR, but leaves open the question of how many nearest neighbors should be selected to achieve the best classification.

In this paper, we propose a weighted sparse representation based on virtual test samples for face recognition. The use of virtual samples of each individual allows us to obtain more distinguishing features and facial variation information from the external data. The presented scheme contains three main steps. First, it produces virtual test images for the original test image and computes the weight of each training image, from which new training sets are built. Next, it represents the test samples over the new training sets and computes the residuals of each class. Finally, it computes the weight of each test sample and conducts classification. Extensive experiments indicate that our method outperforms the other competing approaches. Notably, the proposed method adopts weighted score-level fusion to automatically and dynamically assign a weight to each test image, which makes it easy to apply in practice. The proposed idea and scheme for determining the weights also seem helpful for improving other approaches.

The rest of the paper is organized as follows: Section 2 presents the proposed scheme. Section 3 analyzes the method. Section 4 reports the experiments, and Section 5 concludes the paper.

2. The weighted sparse representation method based on virtual test samples

In this section, the weighted sparse representation method based on virtual test samples (WSRVTS) is formally introduced in detail. WSRVTS includes three main steps.
Firstly, WSRVTS generates virtual test samples for the original test sample and computes the weight of each training sample; a new training set is then built. Secondly, WSRVTS represents the test samples and computes the residuals of each class. Finally, WSRVTS computes the weight of each test sample and then conducts classification.

2.1. Generating virtual test samples and producing new training sets

One can view a face image as a two-dimensional data matrix in $\mathbb{R}^{r \times c}$, where $r$ and $c$ are the numbers of rows and columns of the face image, respectively. For any incoming test sample, denote the original test sample by $y_1 = y$. We use the symmetry transformation [29] to produce two virtual test images $y_2$ and $y_3$. The virtual test samples and the original test sample satisfy the following relationships:
$$y_2(i, j) = \begin{cases} y_1(i, j), & i = 1, 2, \ldots, r,\; j = 1, 2, \ldots, c/2 \\ y_1(i, c - j + 1), & i = 1, 2, \ldots, r,\; j = c/2 + 1, c/2 + 2, \ldots, c \end{cases} \tag{1}$$

$$y_3(i, j) = \begin{cases} y_1(i, c - j + 1), & i = 1, 2, \ldots, r,\; j = 1, 2, \ldots, c/2 \\ y_1(i, j), & i = 1, 2, \ldots, r,\; j = c/2 + 1, c/2 + 2, \ldots, c \end{cases} \tag{2}$$
where $i$ and $j$ are the row and column coordinates, respectively. Denote the training set by $X = [x_1, x_2, \ldots, x_n]$, where $n$ denotes the number of training images and $x_i$ is the $i$-th training image in $X$. Next, we compute the distance between each query image and each training image by Eq. (3):

$$d(x_i, y_j) = \exp(-\|x_i - y_j\|_2^2 / 2\sigma^2), \quad i = 1, 2, \ldots, n,\; j = 1, 2, 3 \tag{3}$$

where $\sigma$, the Gaussian kernel width, is simply set to the average Euclidean distance [28] of all the training images in training set $X$ in our experiments. From Eq. (3), the value of the Gaussian kernel distance [30] between any two images ranges from 0 to 1, so we use this value as the weight of the training image. That is, for any training sample $x_i \in \mathbb{R}^m$, its weight is denoted by $w_i$ and equals $d(x_i, y_j)$. Thus, we obtain the new training set by weighting each training image:

$$X_j = [w_1 x_1, w_2 x_2, \ldots, w_n x_n] \tag{4}$$
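For concreteness, a minimal NumPy sketch of Eqs. (1)–(4) follows. It assumes face images arrive as r × c grayscale arrays and that training images are stored, vectorized, as the columns of a matrix; all function names (make_virtual_samples, average_pairwise_distance, gaussian_weights, weighted_training_set) are ours, not from the paper.

```python
import numpy as np

def make_virtual_samples(y):
    """The 'symmetrical face' transformation of Eqs. (1)-(2).

    y is an r x c grayscale image. y2 keeps the left half of y and mirrors it
    onto the right half; y3 mirrors the right half of y onto the left half.
    """
    r, c = y.shape
    half = c // 2
    mirrored = y[:, ::-1]              # column j picks up y(i, c - j + 1)
    y2 = y.copy()
    y2[:, half:] = mirrored[:, half:]
    y3 = y.copy()
    y3[:, :half] = mirrored[:, :half]
    return y2, y3

def average_pairwise_distance(X):
    """The kernel width sigma of Eq. (3): the average Euclidean distance
    over all pairs of training images (columns of X), as in [28]."""
    n = X.shape[1]
    dists = [np.linalg.norm(X[:, a] - X[:, b])
             for a in range(n) for b in range(a + 1, n)]
    return float(np.mean(dists))

def gaussian_weights(X, y, sigma):
    """Gaussian kernel distances d(x_i, y_j) of Eq. (3), one per training image."""
    diffs = X - y[:, None]
    return np.exp(-np.sum(diffs ** 2, axis=0) / (2.0 * sigma ** 2))

def weighted_training_set(X, y, sigma):
    """The weighted training set X_j = [w_1 x_1, ..., w_n x_n] of Eq. (4)."""
    w = gaussian_weights(X, y, sigma)
    return X * w[None, :], w
```

Note that the weights depend on the test sample, so each of y1, y2 and y3 induces its own weighted training set X1, X2 and X3.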
Table 1
The WSRVTS algorithm.

Input: the query sample y, training set X, and regularization parameter λ.
Output: the class label of sample y.
• Normalize each column of X and y to have unit l2-norm.
• Given test sample y, denote the original test sample by y1 = y, and use it to generate two 'symmetrical face' test samples y2 and y3.
• Use the Gaussian kernel distance to determine the weight of each training sample and to generate the new training sets X1, X2 and X3 for y1, y2 and y3, respectively.
• Code $y_j$ over $X_j$ (j = 1, 2, 3) via l1-minimization:
$$\hat{\beta}_j = \arg\min_{\beta_j} \|\beta_j\|_1, \quad \text{s.t.}\; \|y_j - X_j \beta_j\|_2 < \varepsilon \tag{10}$$
• Compute the residuals by Eq. (7), where $\beta_j^k$ is the coding coefficient vector associated with class k in training set $X_j$.
• Compute the weight of each test sample by Eq. (8).
• Output the label of sample y by Eq. (9).
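Note that Table 1 states the coding step as an l1-minimization (Eq. (10)), while Section 2.2 below solves a regularized least-squares form (Eq. (6)) instead. For readers who want the l1 variant, here is a minimal iterative shrinkage-thresholding (ISTA) sketch of the equivalent Lagrangian problem min_β 0.5‖y − Xβ‖²₂ + λ‖β‖₁; the solver choice and names are ours, not prescribed by the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_code(X, y, lam=0.01, n_iter=500):
    """ISTA for min_beta 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)      # gradient of the smooth term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```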
2.2. Representing the test samples and computing the residuals

As in the classical SRC algorithm, we represent a query image as a linear combination of all training images via Eq. (5):

$$y_j = X_j \beta_j \tag{5}$$

We can solve for $\beta_j$ by Eq. (6):

$$\beta_j = (X_j^T X_j + \lambda I)^{-1} X_j^T y_j \tag{6}$$

where $\lambda$ is the regularization parameter and $I$ is the identity matrix of the same size as $X_j^T X_j$. Every class makes its own contribution to coding the query image as a linear combination of all training images, and this contribution is measured by

$$e_k(y_j) = \|y_j - X_j^k \beta_j^k\|_2 \tag{7}$$
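A minimal sketch of Eqs. (6)–(7) under the same conventions as above; the integer array labels, giving the class of each training column, is our assumption about how class membership is stored.

```python
import numpy as np

def ridge_code(X, y, lam=0.001):
    """Coding coefficients of Eq. (6): beta = (X^T X + lam I)^{-1} X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def class_residuals(X, y, beta, labels):
    """Per-class residuals e_k(y_j) of Eq. (7): only the columns of class k
    (and their coefficients) take part in that class's reconstruction."""
    return {k: np.linalg.norm(y - X[:, labels == k] @ beta[labels == k])
            for k in np.unique(labels)}
```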
where $\beta_j^k$ denotes the coding coefficient vector associated with class $k$ in the training set, and the residual $e_k(y_j)$ represents the contribution of the $k$-th class to coding the query image. A greater $e_k(y_j)$ signifies a lower contribution.

2.3. Computing the weight of each test sample for classification

Considering the fact that different test samples (i.e., the original test sample or either of the two corresponding virtual test samples) may play different roles in building classifiers, we propose to assign a weight to each test image, computed dynamically by Eq. (8):

$$w_j = \exp(-\|y_j - X_j \beta_j\|_2^2 / 2\sigma^2) \tag{8}$$

where $\sigma^2 = \frac{1}{3}\sum_{j=1}^{3} \|y_j - X_j \beta_j\|_2^2$. Concretely, a test sample receives a small weight when its reconstruction error is large, so the influence of that test sample on the classifier is suppressed. Finally, we identify the query image by Eq. (9):

$$\mathrm{Label}(y) = \arg\min_k \sum_{j=1}^{3} w_j e_k(y_j) \tag{9}$$
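Putting the steps together, the sketch below assembles Eqs. (8)–(9) on top of the helpers sketched in the previous sections (make_virtual_samples, weighted_training_set, ridge_code, class_residuals, all hypothetical names of ours). It illustrates the WSRVTS flow rather than reproducing the authors' implementation.

```python
import numpy as np

def classify_wsrvts(X, labels, y_img, sigma, lam=0.001):
    """WSRVTS classification, Eqs. (1)-(9), for one query image.

    X: m x n matrix of vectorized, l2-normalized training images;
    labels: length-n array of class labels; y_img: the r x c query image;
    sigma: the Gaussian kernel width of Eq. (3).
    """
    # Step 1: original plus two 'symmetrical face' virtual test samples.
    y2_img, y3_img = make_virtual_samples(y_img)
    tests = [im.ravel() / np.linalg.norm(im) for im in (y_img, y2_img, y3_img)]

    errs, residuals = [], []
    for y in tests:
        Xj, _ = weighted_training_set(X, y, sigma)   # Eqs. (3)-(4)
        beta = ridge_code(Xj, y, lam)                # Eq. (6)
        errs.append(np.sum((y - Xj @ beta) ** 2))    # squared reconstruction error
        residuals.append(class_residuals(Xj, y, beta, labels))  # Eq. (7)

    # Step 3: dynamic test-sample weights (Eq. (8)) and score fusion (Eq. (9)).
    sigma2 = np.mean(errs)
    w = np.exp(-np.asarray(errs) / (2.0 * sigma2))
    classes = np.unique(labels)
    scores = [sum(w[j] * residuals[j][k] for j in range(3)) for k in classes]
    return classes[int(np.argmin(scores))]
```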
The WSRVTS algorithm is summarized in Table 1.

3. Analysis of WSRVTS

The proposed scheme is discussed and analyzed in this section. Previous work [29–31] has shown that generating virtual images of each individual can provide more distinguishing features and capture facial variation from external data. Building on this knowledge, we use the original test sample to produce virtual test samples that extract more varied information about the test sample, and we design a scheme to assign a weight to each test sample. Figs. 1–3 show some original test images and the corresponding virtual test images from the YALE, AR and FERET databases, respectively. From these figures, one can observe that the virtual query images generated from the original query
Fig. 1. Some images (first row) in the YALE face database; the two virtual test images are shown in the second and third rows, respectively.

Fig. 2. Some images (first row) in the AR face database; the two virtual test images are shown in the second and third rows, respectively.

Fig. 3. Some images (first row) in the FERET face database; the two virtual test images are shown in the second and third rows, respectively.
image indeed reflect some possible changes of the image in pose, expression, and illumination. Therefore, the virtual images are useful for improving FR performance.

We observe that training images nearer to a query image tend to be more important than other training images in representing that query image [23–26]. Therefore, the proposed method uses this distance, or similarity, to set the weights of the training samples: it assumes that the greater the distance, the smaller the weight. This assumption is implemented via a Gaussian function of the query image and a training image.

An important advantage of the WSRVTS algorithm is that the weight of each query sample is computed dynamically rather than set manually. As is well known, setting proper parameter values is a hard problem, and the optimal values usually vary with the dataset. Thus, automatically setting parameter values is an important feature of our method, and it makes our method easy to use in real-world applications.

The presented scheme WSRVTS can be seen as a strong classifier composed of three weak classifiers. Concretely, each test sample (i.e., the original test sample or either of the two corresponding virtual test samples) can be seen as an independent
Table 2
Mean recognition error rates (%) of different methods on the YALE face database.

Method              n=1    n=2    n=3    n=4    n=5
SRMVS [31]          32.24  12.44  10.17  6.19   4.11
MNNC [36]           32.61  16.00  13.83  10.38  10.67
TPMTS [27] (M=10)   33.21  16.37  13.83  12.10  11.78
TPMTS [27] (M=20)   *      15.66  13.42  11.33  13.00
TPMTS [27] (M=40)   *      *      11.75  8.48   8.89
WSRC [28]           32.36  12.89  10.75  6.29   4.00
WSRVTS              29.09  8.07   5.75   3.43   3.22

The symbol '*' in the table means that the method TPMTS cannot be tested under that condition.

Table 3
Mean recognition error rates (%) of different methods on the AR face database (part 1).

Method              n=1    n=2    n=3    n=4    n=5
SRMVS [31]          28.43  14.04  9.14   7.40   4.67
MNNC [36]           28.19  17.04  11.05  8.75   6.17
TPMTS [27] (M=10)   28.38  14.33  10.73  8.05   6.33
TPMTS [27] (M=20)   30.30  13.92  10.05  8.00   5.72
TPMTS [27] (M=40)   26.82  14.50  8.82   7.50   5.44
WSRC [28]           27.83  12.04  8.59   6.40   4.06
WSRVTS              26.02  9.42   7.05   5.40   4.17
sample for classification; a weak classifier can be learned from each test sample, and finally the classifiers on all test samples are combined to determine the final result [32]. A large number of experiments on the YALE, AR and FERET databases demonstrate that the proposed scheme is superior to the other methods.

4. Experimental results

We analyzed the performance of the proposed WSRVTS algorithm on three face databases: YALE [33], AR [34] and FERET [35]. We compare WSRVTS with the sparse representation method based on virtual samples (SRMVS) [31], the modified nearest neighbor classifier (MNNC) [36], the two-phase test sample sparse representation method (TPMTS) [27], and the weighted sparse representation based face classification (WSRC) [28]. For each face database, n of the s images per individual are selected as training images, while the remaining images are used as query images. Thus there are

$$C_s^n = \frac{s(s-1)\cdots(s-n+1)}{n(n-1)\cdots 1}$$

different combinations. For simplicity, we randomly select ten combinations when n equals 2, 3, 4 or 5 in the following experiments.

4.1. YALE database

The YALE face database [33] consists of 165 face images of 15 subjects, all captured under different illumination conditions. Each face image is resized to 100 × 100. For TPMTS [27], the number of neighbors M is set to 10, 20 and 40 to show the effect of the number of neighbors on the performance of this method. Table 2 shows the recognition results of all the methods, with the best result for each training subset given by the lowest error rate in the corresponding column. The symbol '*' in the table means that TPMTS cannot be tested under some conditions; for example, we cannot select 20 or 40 neighbors to evaluate TPMTS when the number of training images per person equals 1. It can be observed that WSRVTS obtains the best recognition result for every training subset. For example, when the number of training images equals 3, the recognition error rate of WSRVTS is lower than that of SRMVS, MNNC, TPMTS and WSRC by 4.42%, 8.08%, 6% and 5%, respectively. The essential reasons for the better performance of WSRVTS are that it exploits the similarity relationship between the query image and each training image, and that it extracts more varied information about the test sample by producing the corresponding virtual test samples.

4.2. AR database

The AR face database [34] consists of about 4000 color face images of 126 subjects, covering a variety of lightings, facial expressions and disguises. Each image is converted to grayscale and resized to 165 × 120. We divided the database into two parts as follows. In part 1, for each subject, 14 face samples with only different illuminations and facial expressions are chosen to test the performance of WSRVTS. In part 2, for each individual, 12 face images with only different disguises (i.e., scarf and sunglasses) are selected to evaluate the performance of WSRVTS. The recognition results on part 1 and part 2 are reported in Tables 3 and 4, respectively. In part 1, WSRVTS exhibits better recognition results than the other methods. For instance, the recognition error rate of WSRVTS is lower than that of SRMVS, MNNC, TPMTS and WSRC by 4.62%, 7.62%, 4.5% and 2.62%, respectively, when the number of training images equals 2.
Table 4
Mean recognition error rates (%) of different methods on the AR face database (part 2).

Method              n=1    n=2    n=3    n=4    n=5
SRMVS [31]          64.17  42.25  23.50  9.19   5.71
MNNC [36]           64.47  48.40  32.94  19.62  13.29
TPMTS [27] (M=10)   65.19  50.50  32.94  18.31  13.50
TPMTS [27] (M=20)   64.55  49.55  30.28  15.63  8.29
TPMTS [27] (M=40)   62.19  42.95  26.83  10.88  7.36
WSRC [28]           64.85  45.10  24.00  10.19  6.57
WSRVTS              61.70  42.95  19.78  6.44   5.00

Table 5
Mean recognition error rates (%) of different methods on the FERET face database.

Method              n=1    n=2    n=3    n=4    n=5
SRMVS [31]          60.19  40.76  30.75  28.93  23.90
MNNC [36]           60.24  43.04  31.58  29.63  22.75
TPMTS [27] (M=10)   55.95  38.56  27.10  22.50  15.95
TPMTS [27] (M=20)   55.95  36.36  25.20  21.43  15.15
TPMTS [27] (M=40)   57.64  36.34  24.85  20.43  14.40
WSRC [28]           55.45  36.24  24.90  22.23  15.65
WSRVTS              54.12  35.02  23.03  20.50  14.52
In part 2, compared with the recognition results on part 1, the performance of all the methods is much worse when the number of training images is not larger than 3, but it improves dramatically as the number of training samples increases. On the whole, the presented scheme WSRVTS achieves the best results: the recognition error rate of WSRVTS is lower than that of SRMVS, MNNC, TPMTS and WSRC by 3.72%, 13.16%, 7.05% and 4.22%, respectively, when the number of training images equals 3.

4.3. FERET database

The FERET face database [35] includes 1400 face images of 200 persons. A randomly selected subset of 100 individuals is used to evaluate the performance of the proposed method WSRVTS and the other competing methods. Each face image is resized to 40 × 40. The recognition results of all the approaches are reported in Table 5, where the best result for each training subset is the lowest error rate in the corresponding column. It can be observed that the proposed approach WSRVTS obtains the best recognition result when the number of training images equals 1, 2 or 3, while TPMTS achieves the best result when it equals 4 or 5. The experiments on the FERET face database, with its different lightings, pose angles and facial expressions, confirm the effectiveness of WSRVTS.

5. Conclusions

A weighted sparse representation method based on virtual test samples for face recognition is proposed in this paper. The approach utilizes virtual samples to learn facial variations from the test image, and uses the distance between the query image and each training image as the weight of the corresponding training image to build new training sets. The presented approach can be seen as a strong classifier composed of three weak classifiers: each test sample (i.e., the original test sample or either of the two corresponding virtual test samples) is viewed as an independent sample for classification, a weak classifier is obtained from each test sample, and finally the weights of all weak classifiers are computed dynamically to form a strong classifier that outputs the final result. Extensive experiments show that the proposed approach outperforms the other methods.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61572177.

References

[1] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Comput. Surv. 35 (2003) 399–458.
[2] W. Gao, S. Shan, Face verification for access control, in: D. Zhang (Ed.), Biometrics Solutions for Authentication in an E-World, Kluwer Academic Publishers, 2002, pp. 339–376.
[3] Y. Xu, A. Zhong, J. Yang, D. Zhang, LPP solution schemes for use with face recognition, Pattern Recogn. 43 (2010) 4165–4176.
[4] D. Tao, X. Tang, Kernel full-space biased discriminant analysis, Proc. IEEE Int'l Conf. Multimedia and Expo 2 (2004) 1287–1290.
[5] J. Yang, D. Zhang, A.F. Frangi, J.Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 131–137.
[6] Y. Xu, D. Zhang, J. Yang, A feature extraction method for use with bimodal biometrics, Pattern Recogn. 43 (2010) 1106–1115.
[7] Q. Ye, N. Ye, H. Zhang, C. Zhao, Fast orthogonal linear discriminant analysis with applications to image classification, International Joint Conference on Neural Networks (2014) 299–306.
[8] Y. Xu, D. Zhang, Represent and fuse bimodal biometric images at the feature level: complex-matrix-based fusion scheme, Opt. Eng. 49 (3) (2010) 037002.
[9] S.W. Park, M. Savvides, A multifactor extension of linear discriminant analysis for face recognition under varying pose and illumination, EURASIP J. Adv. Signal Process. (2010).
[10] M. Debruyne, T. Verdonck, Robust kernel principal component analysis and classification, Adv. Data Anal. Classification 4 (2010) 151–167.
[11] N. Sun, H. Wang, Z. Ji, C. Zou, L. Zhao, An efficient algorithm for kernel two-dimensional principal component analysis, Neural Comput. Appl. 17 (2008) 59–64.
[12] Y. Xu, D. Zhang, F. Song, J. Yang, Z. Jing, M. Li, A method for speeding up feature extraction based on KPCA, Neurocomputing 70 (2007) 1056–1061.
[13] K.I. Kim, K. Jung, H.J. Kim, Face recognition using kernel principal component analysis, IEEE Signal Proc. Lett. 9 (2002) 40–42.
[14] K.R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Trans. Neural Netw. 12 (2001) 181–201.
[15] J. Wright, A. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 210–227.
[16] Z. Zhang, Y. Xu, J. Yang, X. Li, D. Zhang, A survey of sparse representation: algorithms and applications, IEEE Access 3 (2015) 490–530.
[17] S. Gao, W.H. Tsang, L.T. Chia, Kernel sparse representation for image classification and face recognition, ECCV (2010).
[18] M. Yang, L. Zhang, Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary, ECCV (2010).
[19] M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, CVPR (2011).
[20] R. Rigamonti, M. Brown, V. Lepetit, Are sparse representations really relevant for image classification?, CVPR (2011).
[21] X. Jie, J. Yang, A nonnegative sparse representation based fuzzy similar neighbor classifier, Neurocomputing 99 (2013) 76–86.
[22] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition?, ICCV (2011).
[23] X. Yang, Q. Song, Y. Wang, A weighted support vector machine for data classification, Int. J. Pattern Recogn. Artif. Intell. 21 (2007) 961–976.
[24] R. Paredes, E. Vidal, Learning weighted metrics to minimize nearest-neighbor classification error, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 1100–1110.
[25] Q. Hu, P. Zhu, Y. Yang, D. Yu, Large-margin nearest neighbor classifiers via sample weight learning, Neurocomputing 74 (2011) 656–660.
[26] J. Wang, P. Neskovic, L.N. Cooper, Improving nearest neighbor rule with a simple adaptive distance measure, Pattern Recogn. Lett. 28 (2007) 207–213.
[27] Y. Xu, D. Zhang, J. Yang, J.Y. Yang, A two-phase test sample sparse representation method for use with face recognition, IEEE Trans. Circuits Syst. Video Technol. 21 (2011) 1255–1262.
[28] Z. Fan, M. Ni, Q. Zhu, E. Liu, Weighted sparse representation for face recognition, Neurocomputing 151 (2015) 304–309.
[29] Y. Xu, X. Zhu, Z. Li, G. Liu, Y. Lu, H. Liu, Using the original and 'symmetrical face' training samples to perform representation based two-step face recognition, Pattern Recogn. 46 (2013) 1151–1158.
[30] N. Zhu, T. Tang, S. Tang, D. Tang, F. Yu, A sparse representation method based on kernel and virtual samples for face recognition, Optik 124 (2013) 6236–6241.
[31] D. Tang, N. Zhu, Y. Fu, W. Chen, T. Tang, A novel sparse representation method based on virtual samples for face recognition, Neural Comput. Appl. 24 (2014) 513–519.
[32] P. Zhu, M. Yang, L. Zhang, I.Y. Lee, Local generic representation for face recognition with single sample per person, ACCV (2014).
[33] The Yale face database.
[34] A. Martinez, The AR face database, CVC Technical Report 24, 1998.
[35] The FERET face database.
[36] Y. Xu, Q. Zhu, Y. Chen, J.S. Pan, An improvement to the nearest neighbor classifier and face recognition experiments, Int. J. Innovative Comput. Inform. Control 9 (2013) 543–554.