Pattern Recognition 37 (2004) 1949–1952
Rapid and Brief Communication
www.elsevier.com/locate/patcog
doi:10.1016/j.patcog.2003.07.014
An analytical algorithm for determining the generalized optimal set of discriminant vectors

Wu Xiao-Jun^{a,b,c,*}, Josef Kittler^{c}, Yang Jing-Yu^{d}, Wang Shi-Tong^{d}

^{a} School of Electronics and Information, East China Shipbuilding Institute, No. 2 Huancheng Road, Zhenjiang 212003, China
^{b} Robotics Laboratory, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110015, China
^{c} CVSSP, University of Surrey, Surrey GU2 7XH, UK
^{d} School of Information, Nanjing University of Science & Technology, Nanjing 210094, China

* Corresponding author. School of Electronics and Information, East China Shipbuilding Institute, No. 2 Huancheng Road, Zhenjiang 212003, China. Tel.: +86-511-440-1616; fax: +86-511-440-4905. E-mail address: wu [email protected] (W. Xiao-Jun).
Received 11 July 2003; accepted 25 July 2003
Abstract

Generalized linear discriminant analysis has been successfully used as a dimensionality-reduction technique in many classification tasks. An analytical method for finding the optimal set of generalized discriminant vectors is proposed in this paper. Compared with other methods, the proposed method has the advantage of requiring less computational time while achieving higher recognition rates. The results of experiments conducted on the Olivetti Research Lab facial database show the effectiveness of the proposed method. © 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Feature extraction; Generalized optimal discriminant vectors; Face recognition; LDA
1. Introduction

Feature extraction is one of the most fundamental problems in pattern recognition [1]. Turk and Pentland used 'eigenfaces' as bases for constructing the features for human face recognition [1]. The Foley–Sammon transform (FST) has been considered one of the most effective methods of dimensionality reduction in terms of the discriminatory content of the extracted features [2]. Liu proposed a new class-separability criterion leading to a generalized optimal set of discriminant vectors for linear feature extraction, and a unified approach to optimizing this criterion is developed in Ref. [2]. Unfortunately, because the discriminant vectors are calculated step by step, the features yielded by the axes of the generalized optimal set do not exhibit the best separability in the global sense [3]. Based on the generalized Fisher discriminant criterion, a generalized Foley–Sammon transform (GFST) was proposed in Ref. [3], together with an iterative algorithm for determining the corresponding set of discriminant vectors for linear feature extraction in face recognition. Unfortunately, the iterative method is computationally demanding. In this paper, we show that the optimal discriminant vectors can be obtained simultaneously. We propose an analytical solution for the generalized optimal set of discriminant vectors, which is a complete solution to the problem of the GFST. The results of experiments conducted on the Olivetti Research Lab (ORL) facial database show the effectiveness of the proposed algorithm.

The rest of the paper is organized as follows. An analytical solution for the generalized discriminant vectors is proposed in Section 2. Experimental results are presented in Section 3 and conclusions are drawn in Section 4.

2. An analytical solution to the generalized optimal set of discriminant vectors

Let $w_1, w_2, \ldots, w_m$ be $m$ known pattern classes and $X = \{x_i\}$, $i = 1, 2, \ldots, N$, be a set of $n$-dimensional samples.
Each $x_i$ in $X$ belongs to a class $w_j$, i.e. $x_i \in w_j$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, m$. Suppose the mean vector, the covariance matrix and the a priori probability of class $w_i$ are $m_i$, $C_i$ and $P(w_i)$, respectively. Then the between-class scatter matrix $S_b$, the within-class scatter matrix $S_w$ and the population scatter matrix $S_t$ can be determined by maximum-likelihood (ML) estimation.

Definition 1 (Guo [3]). Let
$$J(\Phi) = \frac{\operatorname{tr}(\Phi^{\mathrm T} S_b \Phi)}{\operatorname{tr}(\Phi^{\mathrm T} S_w \Phi)} = \frac{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_b \varphi_i}{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_w \varphi_i}, \tag{1}$$
$$J(\tilde{\Phi}) = \max_{\Phi} J(\Phi), \tag{2}$$
$$y = \tilde{\Phi}^{\mathrm T} x, \tag{3}$$
where $\Phi = (\varphi_1, \varphi_2, \ldots, \varphi_r)$ and $\tilde{\Phi} = (\tilde{\varphi}_1, \tilde{\varphi}_2, \ldots, \tilde{\varphi}_r)$, and $\varphi_1, \ldots, \varphi_r$ and $\tilde{\varphi}_1, \ldots, \tilde{\varphi}_r$ are orthonormal column vectors in the $n$-dimensional space. Then $J(\Phi)$ is called the generalized Fisher discriminant function, $\tilde{\varphi}_1, \ldots, \tilde{\varphi}_r$ are the generalized optimal discriminant vectors, and (3) is the so-called GFST [3].

Lemma 1 (Wang [4]). Assume $A$ is a matrix of size $n \times n$ and $X$ is a matrix variable of size $n \times m$; then
$$\frac{\mathrm d}{\mathrm d X} \operatorname{tr}(X^{\mathrm T} A X) = (A + A^{\mathrm T}) X.$$

Corollary 1. If, in addition, $A$ is a symmetric matrix, then from Lemma 1
$$\frac{\mathrm d}{\mathrm d X} \operatorname{tr}(X^{\mathrm T} A X) = 2 A X.$$

Lemma 2 (Wang [4]). Assume $A$ is a real positive-definite matrix; then there exists an invertible matrix $Q$ such that $A = Q^{\mathrm T} Q$.

Theorem. Let $S_b$ be a real symmetric matrix and $S_t$ a real positive-definite matrix. Then
$$\max_{\Phi} J(\Phi) = \max_{\Phi} \frac{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_b \varphi_i}{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_t \varphi_i} = \sum_{i=1}^{r} \lambda_i,$$
where $\varphi_i = Q^{-1} \xi_i$; $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r$ are the largest eigenvalues of the matrix $\tilde{S}_b = (Q^{-1})^{\mathrm T} S_b Q^{-1}$; $\xi_i$ is the eigenvector of $\tilde{S}_b$ corresponding to $\lambda_i$; and $S_t = Q^{\mathrm T} Q$.

Proof. $S_t$ is positive definite, so there exists an invertible matrix $Q$ such that $S_t = Q^{\mathrm T} Q$ according to Lemma 2. Thus we obtain
$$J(\Phi) = \frac{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_b \varphi_i}{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_t \varphi_i} = \frac{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_b \varphi_i}{\sum_{i=1}^{r} \varphi_i^{\mathrm T} Q^{\mathrm T} Q \varphi_i}.$$
Let $\xi_i = Q \varphi_i$, so that $\varphi_i = Q^{-1} \xi_i$. Then
$$J(\Phi) = \frac{\sum_{i=1}^{r} \xi_i^{\mathrm T} (Q^{-1})^{\mathrm T} S_b Q^{-1} \xi_i}{\sum_{i=1}^{r} \xi_i^{\mathrm T} \xi_i} = \frac{\sum_{i=1}^{r} \xi_i^{\mathrm T} \tilde{S}_b \xi_i}{\sum_{i=1}^{r} \xi_i^{\mathrm T} \xi_i} = \frac{\operatorname{tr}(A^{\mathrm T} \tilde{S}_b A)}{\operatorname{tr}(A^{\mathrm T} A)},$$
where $A = (\xi_1, \ldots, \xi_r)$. Differentiating $J(\Phi)$ with respect to $A$ by Corollary 1, we have
$$\frac{\mathrm d J(\Phi)}{\mathrm d A} = \frac{2 \tilde{S}_b A \operatorname{tr}(A^{\mathrm T} A) - 2 A \operatorname{tr}(A^{\mathrm T} \tilde{S}_b A)}{[\operatorname{tr}(A^{\mathrm T} A)]^{2}}.$$
Setting $\mathrm d J(\Phi) / \mathrm d A = 0$ and writing
$$\lambda = \frac{\operatorname{tr}(A^{\mathrm T} \tilde{S}_b A)}{\operatorname{tr}(A^{\mathrm T} A)},$$
we obtain $\tilde{S}_b A = \lambda A$, so the columns of $A$ are eigenvectors of $\tilde{S}_b$, and the maximum is attained when $\xi_1, \ldots, \xi_r$ are the eigenvectors associated with the $r$ largest eigenvalues. Therefore
$$\max_{\Phi} J(\Phi) = \max_{\Phi} \frac{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_b \varphi_i}{\sum_{i=1}^{r} \varphi_i^{\mathrm T} S_t \varphi_i} = \sum_{i=1}^{r} \lambda_i.$$

However, the population scatter matrix $S_t$ is not always invertible, especially in the small-sample-size case. The following algorithm is designed to solve for the generalized optimal discriminant vectors based on the above theorem.

Case 1: $S_t$ is nonsingular. In this case the null space of $S_t$ is trivial,
$$S_t^{-1}(0) = \{x \mid S_t x = 0\} = \{0\}, \qquad \overline{S_t^{-1}(0)} = \mathbb{R}^{n},$$
and $S_t$ is a positive-definite matrix. So there exists an invertible matrix $Q$ such that $S_t = Q^{\mathrm T} Q$, with
$$Q = \begin{pmatrix} \sqrt{\mu_1} & & \\ & \ddots & \\ & & \sqrt{\mu_n} \end{pmatrix} (\nu_1, \nu_2, \ldots, \nu_n)^{-1},$$
where $\mu_1, \ldots, \mu_n$ are the eigenvalues of $S_t$ and $\nu_i$ is the eigenvector of $S_t$ corresponding to $\mu_i$. Compute $\xi_1, \ldots, \xi_r$, the eigenvectors of $(Q^{-1})^{\mathrm T} S_b Q^{-1}$ corresponding to the $r$ largest eigenvalues. Then $\varphi_i = Q^{-1} \xi_i$, $i = 1, \ldots, r$, are the generalized optimal discriminant vectors according to the above analysis.
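To make Case 1 concrete, the following is a minimal NumPy sketch of the construction above. It is our illustration rather than the authors' code: the function name `gfst_case1` and all identifiers are ours, and $S_b$ and $S_t$ are assumed to be already estimated, with $S_t$ nonsingular.

```python
import numpy as np

def gfst_case1(Sb, St, r):
    """Case 1 (St nonsingular): compute the r generalized optimal
    discriminant vectors analytically, with no iteration."""
    # Factor St = Q^T Q from its eigen-decomposition St = V diag(mu) V^T,
    # so Q = diag(sqrt(mu)) V^T (V is orthogonal, hence V^{-1} = V^T).
    mu, V = np.linalg.eigh(St)
    Q_inv = V @ np.diag(1.0 / np.sqrt(mu))
    # Whitened between-class scatter: Sb_tilde = (Q^{-1})^T Sb Q^{-1}.
    Sb_tilde = Q_inv.T @ Sb @ Q_inv
    Sb_tilde = (Sb_tilde + Sb_tilde.T) / 2   # guard against round-off asymmetry
    # Eigenvectors xi_i of Sb_tilde for the r largest eigenvalues.
    lam, Xi = np.linalg.eigh(Sb_tilde)
    order = np.argsort(lam)[::-1][:r]
    # Map back: phi_i = Q^{-1} xi_i are the optimal discriminant vectors.
    Phi = Q_inv @ Xi[:, order]
    return Phi, lam[order]
```

Because $Q$ comes from a single eigen-decomposition of $S_t$, all $r$ discriminant vectors are obtained from one further eigen-decomposition, which is what makes the solution analytical rather than iterative.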
Case 2: $S_t$ is singular. Suppose
$$S_t^{-1}(0) = \operatorname{span}\{\alpha_1, \ldots, \alpha_k\}, \qquad \overline{S_t^{-1}(0)} = \operatorname{span}\{\beta_1, \ldots, \beta_{n-k}\},$$
where $\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_{n-k}$ are orthonormal vectors.

Since for every $\varphi \in S_t^{-1}(0)$ we have $\varphi^{\mathrm T} S_b \varphi = \varphi^{\mathrm T} S_w \varphi = 0$, the vectors in $S_t^{-1}(0)$ contribute nothing to classification, so the generalized optimal discriminant vectors should be selected from $\overline{S_t^{-1}(0)}$. For any $\varphi \in \overline{S_t^{-1}(0)}$,
$$\varphi = a_1 \beta_1 + a_2 \beta_2 + \cdots + a_{n-k} \beta_{n-k} = P \hat{\varphi},$$
where $P = (\beta_1, \beta_2, \ldots, \beta_{n-k})$ and $\hat{\varphi} = (a_1, a_2, \ldots, a_{n-k})^{\mathrm T}$.

Let $\varphi_l = P \hat{\varphi}_l$, $l = 1, 2, \ldots, r$, in the expression for $J(\Phi)$. Then, in the subspace $\overline{S_t^{-1}(0)}$, we have
$$J(\Phi) = \frac{\sum_{l=1}^{r} \hat{\varphi}_l^{\mathrm T} (P^{\mathrm T} S_b P) \hat{\varphi}_l}{\sum_{l=1}^{r} \hat{\varphi}_l^{\mathrm T} (P^{\mathrm T} S_t P) \hat{\varphi}_l} \equiv \hat{J}(\hat{\Phi}),$$
where $\hat{\Phi} = (\hat{\varphi}_1, \ldots, \hat{\varphi}_r)$. It is obvious that $P^{\mathrm T} S_t P$ is a positive-definite matrix. Analogously to Case 1, $\tilde{\hat{\Phi}} = (\tilde{\hat{\varphi}}_1, \ldots, \tilde{\hat{\varphi}}_r)$ can be calculated, and $\tilde{\varphi}_l = P \tilde{\hat{\varphi}}_l$ are the generalized optimal discriminant vectors. It is interesting to note that the proposed algorithm is related to the feature extraction method based on the Karhunen–Loève transformation discussed in Ref. [5].
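A corresponding sketch of Case 2, again ours and under the same assumptions. The null space of $S_t$ is read off from its eigen-decomposition using a tolerance `tol`, a numerical detail the paper leaves implicit, and `gfst_case1` is the Case 1 routine sketched above.

```python
import numpy as np

def gfst_case2(Sb, St, r, tol=1e-10):
    """Case 2 (St singular, e.g. the small-sample-size case): restrict
    the criterion to the complement of the null space of St, then
    apply the Case 1 routine there."""
    # Eigenvectors of St with (near-)zero eigenvalues span S_t^{-1}(0);
    # the remaining ones are the beta_i spanning its complement.
    mu, V = np.linalg.eigh(St)
    P = V[:, mu > tol]                 # P = (beta_1, ..., beta_{n-k})
    # Project the scatter matrices onto the complement subspace.
    Sb_hat = P.T @ Sb @ P
    St_hat = P.T @ St @ P              # positive definite by construction
    # Solve the nonsingular problem there, then map back: phi_l = P phi_hat_l.
    Phi_hat, lam = gfst_case1(Sb_hat, St_hat, r)
    return P @ Phi_hat, lam
```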
3. Numerical experiments and analysis

The above algorithm is adopted to extract features from human face images for face recognition. In all experiments, an image of size $m \times n$ is first reduced to an $n$-dimensional vector with the image discriminant analysis method of Liu [2]. In order to test the performance of the proposed algorithm, the present method, Liu's method and Guo's method are applied to the reduced space, respectively. Each transformed sample set is tested with a minimum-distance classifier designed on the subspace spanned by the discriminant vectors calculated by the corresponding method. In each experiment, we first take a part of the sample set as training samples to calculate the optimal discriminant vectors and to design the minimum-distance classifier, and then use all samples of the sample set to test the classifier.

The experiments are conducted on the ORL face database (http://www.cam-orl.co.uk/facedatabase.html), which can be used freely for academic research. The Cambridge ORL database contains 40 distinct persons, each person having 10 different images taken under different conditions. The number of features is $c - 1$ in all the experiments, where $c$ is the number of classes to be classified. Table 1 shows the experimental results for the three methods: Liu's, Guo's and the proposed method. The experimental results show that the present method outperforms the other methods mentioned above in terms of both classification rate and computation time.
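For completeness, here is a minimal sketch of the evaluation protocol described above (our illustration; loading the ORL images and Liu's image-to-vector reduction step [2] are omitted, and all names are ours): samples are projected onto the discriminant vectors and classified by the nearest projected class mean.

```python
import numpy as np

def min_distance_classify(Phi, train_X, train_y, test_X):
    """Project samples onto the discriminant subspace spanned by the
    columns of Phi, then assign each test sample to the class whose
    projected mean is nearest (minimum-distance classifier)."""
    Z_train = train_X @ Phi            # rows are projected training samples
    Z_test = test_X @ Phi
    classes = np.unique(train_y)
    means = np.array([Z_train[train_y == c].mean(axis=0) for c in classes])
    # Euclidean distance from each test sample to each projected class mean.
    d = np.linalg.norm(Z_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```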
4. Conclusion

This paper proposes an analytical algorithm for finding the optimal set of generalized discriminant vectors for high-dimensional data classification, with application to face recognition. The computational complexity is low because all the generalized optimal discriminant vectors are obtained simultaneously: the algorithm is non-iterative, in contrast to Ref. [3]. Moreover, the proposed algorithm yields a much higher recognition rate.
Acknowledgements

This work was supported in part by the following sources: EU Project Banca, the National Natural Science Foundation of P. R. China (Grant No. 60072034), the Robotics Laboratory, Chinese Academy of Sciences Foundation (Grant No. RL200108), the Natural Science Foundation of Jiangsu Province, P. R. China (Grant No. BK2002001), the University Natural Science Research Program of Jiangsu Province, P. R. China (Grant No. 01KJB520002), and the Open Foundation of the Image Processing and Image Communication Lab (Grant No. KJS03038).
Table 1
Comparison of the performance of several methods (ORL face database). Each method entry gives the number of misclassified samples / computation time.

No. of classes   No. of discriminant vectors   No. of training samples   Liu's method [2]   Guo's method [3]   This paper
 5                4                            4                          0 / 12.64          0 / 3.57           0 / 3.73
11               10                            4                          1 / 20.77          1 / 3.40           0 / 3.13
17               16                            4                         20 / 24.39         19 / 4.23          15 / 3.08
23               22                            4                         47 / 21.65         47 / 4.83          18 / 2.85
29               28                            4                         71 / 13.23         71 / 1.76          17 / 0.77
34               33                            4                         85 / 17.79         87 / 1.87          20 / 0.82
38               37                            4                         95 / 21.69         80 / 1.70          21 / 1.05
40               39                            4                         92 / 23.83         93 / 2.14          33 / 0.88
References

[1] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71–86.
[2] K. Liu, Y.Q. Cheng, J.Y. Yang, A generalized optimal set of discriminant vectors, Pattern Recognition 25 (1992) 731–739.
[3] Y. Guo, S. Li, J. Yang, et al., A generalized Foley–Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition, Pattern Recognition Lett. 24 (1–3) (2003) 147–158.
[4] G. Wang, R. Shi, Theory of Matrices, National Defense Industry Press, Beijing, 1988.
[5] P. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
About the Author—WU XIAOJUN received the B.S. degree in Mathematics from Nanjing Normal University, Nanjing, P.R. China in 1991, and the M.S. degree in Engineering in 1996 and the Ph.D. in Pattern Recognition and Intelligent Systems in 2002, both from Nanjing University of Science and Technology, Nanjing, P.R. China. Since 1996 he has been teaching at East China Shipbuilding Institute, where he is an exceptionally promoted Associate Professor in charge of the Department of Computer Science and Technology. He has published more than 50 papers. Currently, he is a visiting researcher in CVSSP, University of Surrey. His current research interests are pattern recognition, fuzzy systems, neural networks and intelligent systems.

About the Author—JOSEF KITTLER graduated from the University of Cambridge in Electrical Engineering in 1971, where he also obtained his Ph.D. in Pattern Recognition in 1974 and the Sc.D. degree in 1991. He joined the Department of Electrical Engineering of Surrey University in 1986, where he is a professor in charge of the Centre for Vision, Speech, and Signal Processing. He has worked on various theoretical aspects of pattern recognition and on many applications including automatic inspection, ECG diagnosis, remote sensing, robotics, speech recognition, and document processing. His current research interests include pattern recognition, image processing, and computer vision. He has co-authored the book Pattern Recognition: A Statistical Approach, published by Prentice-Hall, and has published more than 500 papers. He is a member of the editorial boards of Image and Vision Computing, Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, Pattern Analysis and Applications, and Machine Vision and Applications.

About the Author—YANG JINGYU received the B.S. degree in Computer Science from Harbin Institute of Military Engineering, Harbin, China. From 1982 to 1984 he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994 he was a visiting professor at the Department of Computer Science, University of Missouri. He was a visiting professor at Concordia University in Canada in 1998. He is currently a professor and Chairman of the Faculty of Information at Nanjing University of Science and Technology. His current research interests are in the areas of pattern recognition, robot vision, image processing, information fusion, and artificial intelligence.

About the Author—WANG SHITONG received the B.S. and M.S. degrees in Computer Science from Nanjing University of Aeronautics and Astronautics, Nanjing, P.R. China in 1984 and 1987, respectively. His current research interests are pattern recognition, fuzzy systems, neural networks and intelligent systems. Since 1995, Prof. Wang has been a visiting professor at London University, Bristol University, Hong Kong University of Science and Technology, Hong Kong Polytechnic University, and several universities in Japan.