Applied Mathematics and Computation 219 (2013) 6410–6419
Median null(S_w)-based method for face feature recognition

Jian-qiang Gao a,⇑, Li-ya Fan b, Li-zhong Xu a

a College of Computer and Information Engineering, Hohai University, Nanjing 210098, PR China
b School of Mathematical Sciences, Liaocheng University, Liaocheng, Shandong 252059, PR China

⇑ Corresponding author. E-mail address: [email protected] (J.-q. Gao).
Keywords: M-N(S_w); Linear discriminant analysis (LDA); Within-class median; Null space; Face recognition
Abstract

With the progress of science and technology, artificial intelligence is receiving more and more attention, and people want to use computers to deal with complex practical problems. Linear discriminant analysis (LDA) is therefore widely used as a dimensionality-reduction technique in image and text recognition and classification tasks. However, a weakness of the LDA model is that its class average vector depends entirely on the class sample average. Under special circumstances such as noise or bright light, outliers appear in practical input databases; with only a few practical samples available, the class sample average is then not sufficient to estimate the class average accurately, and the recognition performance of the LDA model declines. Compared to human intelligence, computers lack the fundamental knowledge of judgment that people normally acquire during the formative years of their lives. To solve this problem and render the LDA model more robust, we propose a within-class scatter matrix null-space median method (M-N(S_w)), which first transforms the original space by employing a basis of the null space of the within-class scatter matrix and then pursues the maximum of the between-class scatter in the transformed space. In this second stage, the within-class median vector is used in place of the class mean of the traditional LDA model. Experiments on the ORL, FERET and Yale face data sets are performed to test and evaluate the effectiveness of the proposed method.

© 2013 Elsevier Inc. All rights reserved.
1. Introduction

Face recognition has been researched in many areas such as pattern recognition and computer vision, because it is a natural and direct biometric approach. Under controlled or uncontrolled conditions, many developments have been made towards recognizing faces, as described in [1-7]. Linear discriminant analysis (LDA) [8] and principal component analysis (PCA) [8] are two popular methods used in face recognition tasks. PCA aims to generate a set of orthonormal projections by maximizing the covariance over all samples; it is therefore an effective approach for representing each face image. From the classification point of view, however, it is not the best method, because it does not make full use of the class information. LDA is a well-known linear learning method whose goal is to seek optimal linear projection vectors such that the Fisher criterion, the ratio of the between-class scatter to the within-class scatter, is maximized; it therefore achieves better recognition performance than PCA. The Fisher criterion is defined as
$$J(W) = \frac{W^{T} S_b W}{W^{T} S_w W}, \qquad (1)$$

where W is the projection matrix and S_b and S_w are the between-class and within-class scatter matrices defined in Section 2.
However, we often meet small sample size and high-dimensional data problems in face classification and recognition tasks, so the traditional LDA cannot be used directly because the within-class scatter matrix is always singular. Several methods have been proposed to overcome this problem [9-17]. The LDA/QR method introduced by Ye and Li [14] is a popular approach in face recognition: it is a two-stage linear discriminant analysis that aims to overcome the singularity problem of traditional LDA, and it achieves efficiency and stability simultaneously. Chen et al. [15] proposed an effective null-subspace discriminant method to handle the small sample size problem, and Yu and Yang [10] proposed a direct linear discriminant analysis method for high-dimensional image data. Song et al. [18] suggested a maximum scatter difference method, which adopts the difference of the between-class and within-class scatters as the discriminant criterion; because no inverse matrix needs to be computed, the small sample size problem is avoided by nature. A common weakness of most of these approaches should be mentioned: the inverse of the between-class, within-class or total scatter matrix must be calculated, which is a rather complex procedure.

Another important problem of the existing linear discriminant analysis models should be mentioned. In traditional linear discriminant analysis models, the class average vector is always estimated by the class sample average. However, with only a few practical samples, the class sample average cannot estimate the class average accurately, in particular when there are outliers in the images caused by noise and occlusion [19]. Li et al. [20] proposed a median-based maximum scatter difference method to address this problem. We propose a two-stage linear discriminant analysis method called median-based null space of S_w, or M-N(S_w) for short. The M-N(S_w) method should be more robust than the traditional linear discriminant analysis models based on the class sample average. Experiments on the ORL, FERET and Yale face databases are performed to test and evaluate the robustness of the proposed method.

The rest of this paper is organized as follows. The traditional linear discriminant analysis model is briefly introduced in Section 2. The existing PCA and LDA/QR methods are reviewed in Section 3. Section 4 introduces the concept of the median. Our proposed method is introduced in Section 5. Experimental results and conclusions are summarized in Sections 6 and 7, respectively.

2. Traditional linear discriminant analysis (TLDA)

In this section, we first introduce some important notation used in this paper. For a data matrix A ∈ R^{n×N}, we seek a linear transformation G ∈ R^{n×l} that maps each column a_i of A to an l-dimensional vector y_i = G^T a_i ∈ R^l. Assume that the data in A are partitioned into c classes as A = [A_1, ..., A_c], where A_i ∈ R^{n×N_i} contains the sample points of the ith class and N = Σ_{i=1}^{c} N_i. Finding the optimal transformation matrix G is the core problem of the traditional linear discriminant analysis model; the class structure of the original high-dimensional space should then be preserved in the reduced-dimensional space. In discriminant analysis, the between-class, within-class and total scatter matrices are defined as follows (see [21]):
$$S_b = \frac{1}{N}\sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T = H_b H_b^T, \qquad (2)$$

$$S_w = \frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N_i} (x_j^i - m_i)(x_j^i - m_i)^T = H_w H_w^T, \qquad (3)$$

$$S_t = S_b + S_w, \qquad (4)$$

where the precursors H_b and H_w of the between-class and within-class scatter matrices in (2) and (3) are

$$H_b = \frac{1}{\sqrt{N}}\left[\sqrt{N_1}(m_1 - m), \ldots, \sqrt{N_c}(m_c - m)\right], \qquad (5)$$

$$H_w = \frac{1}{\sqrt{N}}\left[A_1 - m_1 e_1^T, \ldots, A_c - m_c e_c^T\right], \qquad (6)$$
and e_i = (1, ..., 1)^T ∈ R^{N_i}. Here A_i is the data matrix of the ith class, m_i is the centroid of the ith class and m is the total centroid of the training sample set. It is worth noting that the total scatter matrix S_t is called the covariance matrix in statistics. Using the linear transformation matrix G, the between-class, within-class and total scatter matrices in the reduced-dimensional space are S_b^l = G^T S_b G, S_w^l = G^T S_w G and S_t^l = G^T S_t G, respectively.
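To make the constructions in (2)-(6) concrete, here is a minimal NumPy sketch (ours, not from the paper; the function name and the column-per-sample data layout are assumptions):

```python
import numpy as np

def scatter_matrices(A, labels):
    """Build H_b, H_w and the scatter matrices of Eqs. (2)-(6).

    A      : n x N data matrix, one sample per column.
    labels : length-N array of class indices.
    """
    n, N = A.shape
    m = A.mean(axis=1, keepdims=True)            # total centroid
    Hb_cols, Hw_blocks = [], []
    for c in np.unique(labels):
        Ai = A[:, labels == c]                   # samples of class i
        Ni = Ai.shape[1]
        mi = Ai.mean(axis=1, keepdims=True)      # class centroid m_i
        Hb_cols.append(np.sqrt(Ni) * (mi - m))   # sqrt(N_i)(m_i - m)
        Hw_blocks.append(Ai - mi)                # A_i - m_i e_i^T
    Hb = np.hstack(Hb_cols) / np.sqrt(N)         # Eq. (5)
    Hw = np.hstack(Hw_blocks) / np.sqrt(N)       # Eq. (6)
    Sb, Sw = Hb @ Hb.T, Hw @ Hw.T                # Eqs. (2) and (3)
    return Hb, Hw, Sb, Sw, Sb + Sw               # S_t = S_b + S_w, Eq. (4)
```

With this layout rank(S_w) ≤ N − c, so S_w is necessarily singular whenever the dimension n exceeds N − c, which is exactly the small sample size situation discussed above.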
An optimal transformation G would maximize trace(S_b^l) and minimize trace(S_w^l). Common optimization criteria in traditional linear discriminant analysis include (see [21]):
$$\max_{G} \; \mathrm{trace}\{(S_w^l)^{-1} S_b^l\} \quad \text{and} \quad \min_{G} \; \mathrm{trace}\{(S_b^l)^{-1} S_w^l\}. \qquad (7)$$
The optimization problems in (7) are equivalent to finding the generalized eigenvectors x satisfying S_b x = λ S_w x with λ ≠ 0. The solution can be obtained from the eigen-decomposition of S_w^{-1} S_b if S_w is nonsingular, or of S_b^{-1} S_w if S_b is nonsingular.
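As a hedged illustration of this classical solution (our sketch, only valid when S_w is nonsingular, which fails in the small sample size setting discussed above), one can call SciPy's generalized symmetric eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

def fisher_directions(Sb, Sw, l):
    # Solve S_b x = lambda S_w x; eigh handles the symmetric-definite
    # generalized problem and returns eigenvalues in ascending order.
    evals, evecs = eigh(Sb, Sw)
    return evecs[:, ::-1][:, :l]   # the l most discriminant directions
```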
3. PCA and LDA/QR methods

3.1. Principal component analysis (PCA) method

The PCA technique is equivalent to the so-called Karhunen-Loeve method; its goal is to choose a dimensionality-reducing linear projection that maximizes the scatter of all projected samples. Given a set of N sample images x_k ∈ R^n (k = 1, 2, ..., N), each belonging to one of c classes {X_1, X_2, ..., X_c}, consider a linear transformation mapping the original n-dimensional image space into an l-dimensional feature space, where l < n. The new feature vectors y_k ∈ R^l are defined by the linear transformation

$$y_k = W^T x_k, \qquad (8)$$

where W ∈ R^{n×l} is a matrix with orthonormal columns. Let the total scatter matrix be

$$S_T = \frac{1}{N}\sum_{k=1}^{N}(x_k - m)(x_k - m)^T,$$

where N is the number of sample images and m ∈ R^n is the average image of all samples. After applying the linear transformation W^T, the scatter of the transformed feature vectors y_1, y_2, ..., y_N in the reduced-dimensional space is W^T S_T W. In PCA, we maximize the determinant of the total scatter matrix of the projected samples:

$$W_{\mathrm{opt}} = \arg\max_{W} |W^T S_T W| = [w_1, w_2, \ldots, w_l], \qquad (9)$$
where {w_i | i = 1, 2, ..., l} is the set of n-dimensional eigenvectors of S_T corresponding to the l largest eigenvalues. Since these eigenvectors have the same dimension as the original images, they are referred to as eigenfaces or eigenpictures. Algorithm 1 shows the pseudo-code.

Algorithm 1. PCA algorithm
Input: Data matrix A ∈ R^{n×N}
Output: Reduced data matrix A_l
1. Compute H_b and H_w according to (5) and (6), respectively;
2. Compute S_t according to (2)-(4);
3. Compute W from the eigenvalue decomposition (EVD) of S_t: S_t = W Σ W^T;
4. Assign the first l columns of W to G;
5. A_l = G^T A.
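A minimal NumPy rendering of Algorithm 1 (our own sketch; it forms S_t explicitly, which is affordable at the reduced image sizes used in the experiments below):

```python
import numpy as np

def pca_reduce(A, l):
    """Algorithm 1: project the columns of A onto the l leading
    eigenvectors (eigenfaces) of the total scatter matrix S_t."""
    N = A.shape[1]
    m = A.mean(axis=1, keepdims=True)
    St = (A - m) @ (A - m).T / N        # S_t = (1/N) sum (x_k - m)(x_k - m)^T
    _, W = np.linalg.eigh(St)           # EVD; eigenvalues in ascending order
    G = W[:, ::-1][:, :l]               # first l columns after reordering
    return G.T @ A                      # A_l = G^T A
```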
3.2. LDA/QR method

In this subsection, we review the linear learning algorithm LDA/QR, proposed by Ye and Li [14] to handle the small sample size problem. It is an extension of TLDA with two stages, the first of which maximizes the separation between the different classes via the orthogonal-triangular (QR) decomposition of a matrix [22]. In the first stage, LDA/QR computes the optimal transformation matrix G by solving the optimization problem
$$G = \arg\max_{G^T G = I_l} \mathrm{trace}(G^T S_b G). \qquad (10)$$
The second stage of LDA/QR refines the first by addressing the within-class scatter. The within-class scatter information is incorporated into W through a relaxation scheme; the optimal W solves the optimization problem
$$W = \arg\min_{W} \mathrm{trace}\left((W^T(Q^T S_b Q)W)^{-1}(W^T(Q^T S_w Q)W)\right). \qquad (11)$$
W can be obtained from the eigen-decomposition of (Q^T S_b Q)^{-1}(Q^T S_w Q). The pseudo-code for this algorithm is given in Algorithm 2.
Algorithm 2. LDA/QR algorithm
Input: Data matrix A ∈ R^{n×N}
Output: Reduced data matrix A_l
1. Compute H_b and H_w according to (5) and (6), respectively;
2. Apply QR decomposition to H_b as H_b = QR, where Q ∈ R^{n×p}, R ∈ R^{p×c} and p = rank(H_b);
3. S̃_b ← RR^T; S̃_w ← Q^T H_w H_w^T Q;
4. Compute W from the EVD of S̃_b^{-1} S̃_w, with the corresponding eigenvalues sorted in nondecreasing order;
5. G ← QW, where W = [W_1, ..., W_q] and q = rank(S_w);
6. A_l = G^T A.
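The following sketch mirrors Algorithm 2 with NumPy/SciPy (our own, not the authors' code; for simplicity it keeps all p columns of W rather than the first q = rank(S_w), and uses a pivoted QR to determine p numerically):

```python
import numpy as np
from scipy.linalg import qr, eig

def lda_qr_reduce(A, Hb, Hw):
    """Algorithm 2: two-stage LDA via QR decomposition of H_b.
    Hb, Hw can come from the scatter_matrices() helper above."""
    Q, R, _ = qr(Hb, mode='economic', pivoting=True)  # step 2, pivoted QR
    p = np.linalg.matrix_rank(Hb)                     # p = rank(H_b)
    Q, R = Q[:, :p], R[:p, :]
    Sb_t = R @ R.T                                    # tilde S_b = R R^T
    HwQ = Q.T @ Hw
    Sw_t = HwQ @ HwQ.T                                # tilde S_w = Q^T H_w H_w^T Q
    evals, W = eig(np.linalg.solve(Sb_t, Sw_t))       # step 4: EVD of S̃_b^{-1} S̃_w
    W = W[:, np.argsort(evals.real)].real             # nondecreasing eigenvalues
    G = Q @ W                                         # step 5
    return G.T @ A                                    # A_l = G^T A
```

Truncating Q and R to the numerical rank p keeps S̃_b = RR^T invertible, since rank(H_b) is generally c − 1 rather than c.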
4. Concept of median

In this section, we briefly review the concept of the median, used by Yang et al. [19] for pattern recognition tasks. The median is the middle value of a finite list of numbers: above and below it lie an equal number of values. In other words, the median is the number located at the middle of a set of numbers that have been arranged in ascending (or descending) order; when there are two values in the middle, their mean is taken as the median. We give two simple examples of choosing the median:

Case 1. The number of input data values is odd, e.g. 9:
Input data A1 = {4.3, 4.0, 12, 4.1, 2.0, 4.2, 4.4, 4.6, 4.5}.
Ordered data A1 = {2.0, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 12}.
Median of A1 = 4.3; average of A1 = 4.9.

Case 2. The number of input data values is even, e.g. 10:
Input data A2 = {4.3, 4.0, 12, 4.1, 2.0, 4.2, 4.4, 4.6, 4.5, 4.7}.
Ordered data A2 = {2.0, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 12}.
Median of A2 = (4.3 + 4.4)/2 = 4.35; average of A2 = 4.88.

From these examples we can draw two conclusions. First, the median, like the sample average, can be used as an estimator of central tendency, but the median is more robust than the sample average for input values with outliers. Second, the median indeed works better than the average when outliers such as "2.0" and "12" exist in the input data. Moreover, the median vector can be calculated by the following procedure [19]. Given a random sequence of n-dimensional column vectors Z_1, Z_2, ..., Z_q, we form the data matrix
$$Z = (Z_1, Z_2, \ldots, Z_q) = \begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1q} \\ z_{21} & z_{22} & \cdots & z_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nq} \end{pmatrix}. \qquad (12)$$
The median vector of Z_1, Z_2, ..., Z_q is then defined as M = (m_1, m_2, ..., m_n)^T, where m_i is the median of the elements in the ith row of the data matrix Z, i.e. m_i = Median({z_i1, z_i2, ..., z_iq}), with Median() denoting the median operator of a set of numbers.
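In NumPy the median vector of (12) is a one-liner; the small check below reproduces Case 1 from the text (a sketch under our assumption that samples are stored as columns):

```python
import numpy as np

def median_vector(Z):
    """M = (m_1, ..., m_n)^T with m_i = Median({z_i1, ..., z_iq}):
    the row-wise median of the data matrix Z in Eq. (12)."""
    return np.median(Z, axis=1)

# Case 1 from the text: the median resists the outliers 2.0 and 12.
A1 = np.array([4.3, 4.0, 12, 4.1, 2.0, 4.2, 4.4, 4.6, 4.5])
print(np.median(A1), round(A1.mean(), 1))   # prints: 4.3 4.9
```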
5. Proposed method M-N(S_w)

We first recall the main idea of TLDA for high-dimensional data such as images, text and voice. In TLDA, the linear transformation is chosen to maximize the class separability in the reduced-dimensional space; the criteria of maximizing the between-class scatter and minimizing the within-class scatter are formulated in the dimension-reduction model. As noted above, the definitions of the within-class scatter matrix S_w and the between-class scatter matrix S_b employ the class average, which is generally estimated by the class sample average. Hence the class sample average plays an important role in the construction of the scatter matrices and strongly influences the projection directions of the TLDA model.

Face recognition is a complex practical pattern classification task: face images involve many variations such as facial expression and occlusion, and the influences of noise, illumination and bright light are serious in real-world environments. These complicated realistic conditions produce outliers in the training sample sets. Moreover, in real recognition tasks only a few image samples per class are available, so the class mean must be estimated from the class sample mean. This inappropriate estimate of the class average degrades TLDA models, in particular their robustness, and the issue becomes worse when there are outliers in the input sample sets. We therefore propose the M-N(S_w) method, which first projects the original space onto the null space of S_w using an orthonormal basis of null(S_w); in the projected space, a transformation that maximizes the between-class scatter is then computed, and the median is used only in this second stage. Consider the eigenvalue decomposition (EVD) of S_w ∈ R^{n×n}, S_w = U_w Σ_w U_w^T, and partition U_w so that
$$S_w = U_w \Sigma_w U_w^T = [U_{w1}\ U_{w2}] \begin{bmatrix} \Sigma_{w1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} U_{w1}^T \\ U_{w2}^T \end{bmatrix}, \qquad (13)$$

where U_{w1} ∈ R^{n×s_1}, U_{w2} ∈ R^{n×(n−s_1)}, s_1 = rank(S_w) and null(S_w) = span(U_{w2}). First, the transformation by U_{w2} projects the original data onto null(S_w); then the eigenvectors corresponding to the largest eigenvalues of the between-class scatter matrix S̃_b in the projected space are found:
$$\tilde{S}_b = \sum_{i=1}^{c} N_i (\tilde{z}_i - \tilde{m}_0)(\tilde{z}_i - \tilde{m}_0)^T, \qquad (14)$$
where c is the number of pattern classes, N_i is the number of training samples in class i, z̃_i is the within-class median of class i in the projected space, computed as in (12), and m̃_0 is the average vector of all training samples in the projected space. Let the eigenvalue decomposition (EVD) of S̃_b be

$$\tilde{S}_b = \tilde{U}_b \tilde{\Sigma}_b \tilde{U}_b^T = [\tilde{U}_{b1}\ \tilde{U}_{b2}] \begin{bmatrix} \tilde{\Sigma}_{b1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \tilde{U}_{b1}^T \\ \tilde{U}_{b2}^T \end{bmatrix}, \qquad (15)$$
where Ũ_b^T Ũ_b = I, s_2 = rank(S̃_b), Σ̃_{b1} ∈ R^{s_2×s_2} and Ũ_{b1} ∈ R^{(n−s_1)×s_2}. The transformation matrix is then obtained as G = U_{w2} Ũ_{b1}; we call this method M-N(S_w) for short. The pseudo-code for this algorithm is given in Algorithm 3.

Algorithm 3. M-N(S_w) algorithm
Input: Data matrix A ∈ R^{n×N}
Output: Reduced data matrix A_l
1. Compute H_b and H_w according to (5) and (6), respectively;
2. S_w = H_w H_w^T;
3. Apply the eigenvalue decomposition (EVD) to S_w as in (13);
4. Project the original data onto null(S_w) via U_{w2};
5. Construct the matrix S̃_b as in (14);
6. Compute the EVD of S̃_b as in (15);
7. G ← U_{w2} Ũ_{b1};
8. A_l = G^T A.
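A compact NumPy sketch of Algorithm 3 (ours, not the authors' code; the tolerance used to split off the null space is an assumption, and H_w can come from the scatter_matrices() helper above):

```python
import numpy as np

def mn_sw_reduce(A, labels, Hw, tol=1e-10):
    """Algorithm 3: project onto null(S_w), then keep the leading
    eigenvectors of the median-based between-class scatter (14)."""
    Sw = Hw @ Hw.T                                     # step 2
    evals, Uw = np.linalg.eigh(Sw)                     # step 3: EVD of S_w
    Uw2 = Uw[:, evals <= tol * evals.max()]            # basis of null(S_w)
    Z = Uw2.T @ A                                      # step 4: projected data
    m0 = Z.mean(axis=1, keepdims=True)                 # projected total mean
    Sb_t = np.zeros((Z.shape[0], Z.shape[0]))
    for c in np.unique(labels):                        # step 5: Eq. (14)
        Zi = Z[:, labels == c]
        zi = np.median(Zi, axis=1, keepdims=True)      # within-class median
        Sb_t += Zi.shape[1] * (zi - m0) @ (zi - m0).T
    evb, Ub = np.linalg.eigh(Sb_t)                     # step 6: EVD of S̃_b
    idx = np.argsort(evb)[::-1]                        # sort descending
    s2 = int((evb > tol * max(evb.max(), 1.0)).sum())  # s2 = rank(S̃_b)
    G = Uw2 @ Ub[:, idx[:s2]]                          # step 7: G = U_w2 Ũ_b1
    return G.T @ A                                     # step 8: A_l = G^T A
```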
6. Experiments and analysis

In this section, to demonstrate the effectiveness of the proposed method, we conduct a series of experiments on three popular face data sets and compare the recognition performance of the proposed method with LDA/QR and PCA in an all-round way.

6.1. Experiments with the ORL face database

The ORL (also called AT&T) face database [23] contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, with varying lighting, facial details and facial expressions. The size of each image is 92 × 112 pixels, with 256 grey levels per pixel; to reduce the computation, the face portion of each image is resized to 23 × 28 pixels. In our experiments, the training and testing sets are selected randomly for each individual. The number of training samples per person is set to 3, 5, 7 and 9, respectively, and the corresponding remaining samples are used for testing.
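For reproducibility, here is a hedged sketch of the evaluation protocol used throughout this section, as detailed in the following paragraphs: random per-class splits, a projection learned on the training part, 1-NN classification, and averaging over repeated runs (the `method` callback returning a projection matrix G, and the data layout, are our own assumptions):

```python
import numpy as np

def evaluate(A, labels, n_train, method, runs=20, seed=0):
    """Average 1-NN recognition rate over `runs` random splits.
    `method(A_train, y_train)` returns a projection matrix G."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        tr, te = [], []
        for c in np.unique(labels):                    # per-class random split
            idx = rng.permutation(np.where(labels == c)[0])
            tr += list(idx[:n_train]); te += list(idx[n_train:])
        tr, te = np.array(tr), np.array(te)
        G = method(A[:, tr], labels[tr])
        Ytr, Yte = G.T @ A[:, tr], G.T @ A[:, te]
        # nearest neighbor in the reduced space
        d = ((Yte[:, :, None] - Ytr[:, None, :]) ** 2).sum(axis=0)
        pred = labels[tr][d.argmin(axis=1)]
        accs.append((pred == labels[te]).mean())
    return float(np.mean(accs))
```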
We repeat the recognition procedure 20 times with different randomly chosen training and testing sets, and the nearest neighbor classifier is used for classification. The number of discriminant vectors is varied from 14 to 39 for all three methods. As an example, the average recognition rates with 23 discriminant vectors are shown in Table 1 for each method and each number of training samples.

Table 1 shows the recognition rates of PCA, LDA/QR and M-N(S_w) with respect to the number of training samples. Our proposed method achieves its maximum recognition rate of 97.25% with 9 training samples and 23 discriminant vectors, whereas PCA and LDA/QR [8,14] achieve maxima of 96.75% and 96.87%, respectively, under the same setting. The number of training samples per person also affects the recognition performance of each method. The experimental results show that M-N(S_w) is more robust to outliers than the traditional linear discriminant analysis model.

To evaluate the performance of the proposed method further, we ran many experiments with different numbers of discriminant vectors. The results are shown in Figs. 1 and 2, which plot the recognition rate curves of PCA, LDA/QR and M-N(S_w) with respect to the number of discriminant vectors. Our proposed method gives nearly its best recognition performance when the number of discriminant vectors is between 14 and 39, and as the number of training samples per person increases, PCA, LDA/QR and M-N(S_w) all achieve better recognition rates. Our method reaches a maximum recognition rate of 98.50% with 39 discriminant vectors and 9 training samples, whereas PCA and LDA/QR [8,14] reach maxima of 97.50% and 97.25% with 9 training samples, respectively. For comparison, [19] reports a maximal recognition rate of 97.0% for median LDA, while our M-N(S_w) method reaches 98.50%.
Table 1
The comparison of recognition rates (%) using PCA, LDA/QR and M-N(S_w) with different numbers of training samples.

Methods     Training samples: 3     5       7       9
PCA         86.18   92.82   95.21   96.75
LDA/QR      82.96   92.02   95.71   96.87
M-N(S_w)    86.20   93.15   95.96   97.25
Fig. 1. The recognition rates of PCA, LDA/QR and M-N(S_w) on the ORL face database (training samples = 3, 5); x-axis: number of discriminant vectors, y-axis: recognition accuracy (%).
Fig. 2. The recognition rates of PCA, LDA/QR and M-N(S_w) on the ORL face database (training samples = 7, 9); x-axis: number of discriminant vectors, y-axis: recognition accuracy (%).
6.2. Experiments with the FERET face database

The FERET face database [24] was sponsored by the US Department of Defense through the DARPA program and has become a standard database for testing and evaluating state-of-the-art face recognition algorithms. Our proposed method is tested on a subset of the FERET database containing 1000 images of 200 persons (five images per person), with variations in illumination, pose and facial expression. In the experiment, an 80 × 80 pixel image without histogram equalization is obtained from each original image; to reduce the computation, the face portion of each image is further resized to 40 × 40 pixels. The training and testing sets are selected randomly for each person, with 3 or 4 training samples per person and the corresponding remaining samples used for testing. The recognition system is run 20 times with different randomly chosen training and testing sets, and the nearest neighbor classifier is used for classification. The number of discriminant vectors is set to 14, 15 and 16 for all methods. The average recognition rates for each approach and each number of training samples are shown in Table 2.

Table 2
The comparison of recognition rates (%) using PCA, LDA/QR and M-N(S_w) with different numbers of training samples (columns) and discriminant vectors (rows).

Discriminant vectors   Methods     3       4
14                     PCA         32.17   36.45
                       LDA/QR      52.37   60.62
                       M-N(S_w)    64.59   61.20
15                     PCA         33.15   36.72
                       LDA/QR      52.86   61.87
                       M-N(S_w)    65.86   62.85
16                     PCA         34.28   37.05
                       LDA/QR      53.17   61.94
                       M-N(S_w)    66.79   63.77
Table 2 shows the recognition rates of PCA, LDA/QR and M-N(S_w) with respect to the number of training samples on the FERET face database. Our method achieves its maximum recognition rate of 66.79% with 3 training samples, whereas PCA and LDA/QR [8,14] achieve maxima of 37.05% and 61.94% (see Table 2) with 4 training samples, respectively; Table 2 thus indicates that our method is better than PCA and LDA/QR. With larger numbers of discriminant vectors, the recognition rates of PCA, LDA/QR and M-N(S_w) all improve, and the proposed method achieves higher recognition accuracy than the other two methods. Note that the recognition rates of the methods in [8,14] improve as the number of training samples grows, while that of the proposed M-N(S_w) does not, because some discriminant information in the null space of S_w may be lost. Overall, the proposed method M-N(S_w) is effective. Fig. 3 shows the recognition rates of the methods given in [8,14] and of our method with respect to the number of discriminant vectors; it can be seen that our method generally works better than the PCA and LDA/QR methods of [8,14].

6.3. Experiments with the Yale face database

The Yale database [8] contains 165 images of 15 individuals under various lighting conditions and facial expressions. Each person has 11 different images, and each image is manually cropped and resized to 32 × 32 pixels in our experiments. The training and testing sets are selected randomly for each individual, with 3, 5, 7 and 9 training samples per person, respectively, and the corresponding remaining samples used for testing. We repeat the recognition procedure 20 times with different randomly chosen training and testing sets, and the nearest neighbor classifier is used for classification. The number of discriminant vectors is varied from 1 to 14 for all methods. The average recognition rates for each method and each number of training samples are shown in Figs. 4 and 5, respectively.
Fig. 3. The recognition rates of PCA, LDA/QR and M-N(S_w) on the FERET face database (training samples = 3, 4); x-axis: number of discriminant vectors, y-axis: recognition accuracy (%).
Fig. 4. The recognition rates of PCA, LDA/QR and M-N(S_w) on the Yale face database (training samples = 3, 5); x-axis: number of discriminant vectors, y-axis: recognition accuracy (%).
Fig. 5. The recognition rates of PCA, LDA/QR and M-N(S_w) on the Yale face database (training samples = 7, 9); x-axis: number of discriminant vectors, y-axis: recognition accuracy (%).
Fig. 6. The recognition rates of PCA, LDA/QR and M-N(S_w) on the Yale face database (training samples = 10, leave-one-out); x-axis: number of discriminant vectors, y-axis: recognition accuracy (%).
Figs. 4 and 5 show the recognition rates of PCA and LDA/QR, given in [8,14], and of our proposed method with respect to the number of discriminant vectors. Our proposed method gives nearly its best performance when the number of discriminant vectors is between 3 and 14, and PCA, LDA/QR and M-N(S_w) all achieve better recognition performance as the number of training samples per person increases. Our method achieves its maximum recognition performance with 14 discriminant vectors and 9 training samples.

To obtain reliable experimental results, further experiments with 10 training images per class were carried out using the leave-one-out cross-validation method; the recognition rate of each method is illustrated in Fig. 6. Note that the recognition rates obtained with the LDA/QR method may be unreliable here, because the matrix S̃_b in Algorithm 2 is close to singular or badly scaled. Fig. 6 shows that our method generally works better than the PCA and LDA/QR methods of [8,14] when the number of discriminant vectors is not more than 7. From a practical point of view, M-N(S_w) offers a stable algorithm for recognition tasks, and the stability of the recognition system should be taken into account; in particular, the proposed M-N(S_w) method has stable recognition rates even when only a few practical samples are given.

7. Conclusion

In this paper, a novel feature extraction method M-N(S_w) is proposed. The M-N(S_w) method first transforms the original space by applying a basis of null(S_w) and then pursues the maximum of the between-class scatter in the transformed space; in this second stage, the within-class median vector is used in place of the class mean of the traditional linear discriminant analysis model. The within-class median has two advantages: it preserves useful details in the sample images, and it is robust to outliers in the input images. The proposed method is therefore more robust than traditional linear discriminant analysis. We conducted experiments on three popular data sets (the ORL database, a subset of the FERET database and the Yale database) using PCA, LDA/QR and the proposed method; the experimental results indicate that the proposed method achieves better recognition rates. In addition, from a practical point of view, a stable method is needed for recognition tasks. Future work on this subject will investigate the influence of model parameters and kernel functions in face feature recognition problems; exploring new algorithms for solving the corresponding optimization problems is another direction for further research.

Acknowledgments

First of all, we thank the anonymous reviewers for their constructive comments and suggestions. In addition, this work is supported by the National Natural Science Foundation of China (10871226), the Natural Science Foundation of Shandong Province (ZR2009AL006) and the Young and Middle-Aged Scientists Research Foundation of Shandong Province (BS2010SF004), PR China.

References

[1] Q.X. Gao, L. Zhang, D. Zhang, Face recognition using FLDA with single training image per person, Applied Mathematics and Computation 205 (2) (2008) 726-734.
[2] M. Koc, A. Barkana, A new solution to one sample problem in face recognition using FLDA, Applied Mathematics and Computation 217 (2011) 10368-10376.
[3] X.Y. Tan, S.C. Chen, Face recognition from a single image per person: a survey, Pattern Recognition 39 (2006) 1725-1745.
[4] P.J. Phillips et al., Overview of the face recognition grand challenge, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 947-954.
[5] W. Zhao et al., Face recognition: a literature survey, ACM Computing Surveys 35 (4) (2003) 399-458.
[6] Q.S. Liu, H.Q. Lu, S.D. Ma, Improving kernel Fisher discriminant analysis for face recognition, IEEE Transactions on Circuits and Systems for Video Technology 14 (1) (2004) 42-49.
[7] S.C. Yan, D. Xu, B.Y. Zhang, H.J. Zhang, et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (1) (2007) 40-51.
[8] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711-720.
[9] L.F. Chen, H.Y.M. Liao, M.T. Ko, et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33 (10) (2000) 1713-1726.
[10] H. Yu, J. Yang, A direct LDA algorithm for high dimensional data with application to face recognition, Pattern Recognition 34 (10) (2001) 2067-2070.
[11] H. Yu, J.Y. Yang, Why can LDA be performed in PCA transformed space, Pattern Recognition 36 (2) (2003) 563-566.
[12] J. Yang, Z. Jin, J.Y. Yang, et al., Essence of kernel Fisher discriminant: KPCA plus LDA, Pattern Recognition 37 (2004) 2097-2100.
[13] F.X. Song, D. Zhang, et al., A parameterized direct LDA and its application to face recognition, Neurocomputing 71 (2007) 191-196.
[14] J.P. Ye, Q. Li, A two-stage linear discriminant analysis via QR-decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (6) (2005) 929-941.
[15] L. Chen, H.M. Liao, M. Ko, J. Lin, G. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33 (2000) 1713-1726.
[16] J.Q. Gao, L.Y. Fan, Kernel-based weighted discriminant analysis with QR decomposition and its application to face recognition, WSEAS Transactions on Mathematics 10 (10) (2011) 358-367.
[17] J.Q. Gao, L.Y. Fan, L.Z. Xu, Solving the face recognition problem using QR factorization, WSEAS Transactions on Mathematics 8 (11) (2012) 728-737.
[18] F.X. Song, K. Cheng, J.Y. Yang, et al., Maximum scatter difference large margin linear projection and support vector machines, Acta Automatica Sinica 30 (6) (2004) 890-896 (in Chinese).
[19] J. Yang, D. Zhang, J.Y. Yang, Median LDA: a robust feature extraction method for face recognition, in: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2006, pp. 4208-4213.
[20] X.D. Li, S.M. Fei, T. Zhang, Median MSD-based method for face recognition, Neurocomputing 72 (2009) 3930-3934.
[21] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, CA, 1990.
[22] G.H. Golub, C.F. Van Loan, Matrix Computations, third ed., The Johns Hopkins Univ. Press, 1996.
[23] ORL Face Database, AT&T Laboratories Cambridge, 1992-1994.
[24] P.J. Phillips, H. Moon, S.A. Rizvi, P. Rauss, The FERET evaluation methodology for face recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1090-1104.