Pattern Recognition 36 (2003) 1675 – 1678
Rapid and Brief Communication
www.elsevier.com/locate/patcog
Face recognition based on a group decision-making combination approach

Xiao-Yuan Jing^a, David Zhang^a,*, Jing-Yu Yang^b

^a Department of Computing, Center for Multimedia Signal Processing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, People's Republic of China
^b Department of Computer, Nanjing University of Science and Technology, Nanjing 210094, People's Republic of China

Received 6 August 2002; accepted 20 August 2002
Abstract

This paper proposes a novel, real-time classifier combination approach, the group decision-making combination (GDC) approach, which dynamically selects classifiers and performs their linear combination. We also show that the orthogonal wavelet transform can serve as an effective image preprocessing tool well suited to classifier combination. GDC has been successfully applied to face recognition, where it improves the recognition rates obtained with algebraic features. Experimental results also show that it is superior to a conventional combination method, the majority voting method. © 2003 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Face recognition; Algebraic feature; Facial image preprocessing; Group decision-making combination (GDC) approach
1. Introduction

Face recognition is a significant research topic in terms of both its theoretical value and its practical applications. The survey on human and machine recognition of faces [1] indicates that algebraic features, which reflect interior attributes of an image, are a kind of intrinsic feature. Singular value decomposition (SVD) and fisherface are two commonly used algebraic features; both have been applied to face recognition with a certain degree of success. In this paper, we design a group decision-making combination (GDC) approach to further improve their classification performance. Classifier combination has received considerable attention and application in the past decade. In this field, how to perform classifier selection and how to perform linear combination are two important research topics.
* Corresponding author. Tel.: +852-2766-7271; fax: +852-2774-0842.
E-mail addresses: [email protected] (X.-Y. Jing), [email protected] (D. Zhang).
Kuncheva presented two types of classifier selection methods, static and dynamic classifier selection [2]. A method for linearly combining multiple neural network classifiers based on statistical pattern recognition theory was proposed by Ueda [3]. Our approach simultaneously takes advantage of both dynamic classifier selection and linear combination. In fact, the group decision-making method has been applied to decision problems with M schemes and N principles [4]; however, that setting differs from our current research topic in the field of pattern recognition. GDC is therefore a modified group decision-making method that is better suited to the given combination problems.
2. Facial image preprocessing

In classifier combination, it is important to construct classifiers that are as uncorrelated as possible: the less redundant information there is between them, the more reliable the combination results. Hence, extracting uncorrelated image features is the first step in constructing them.
Table 1
Correlation analysis of SVD features from the four sub-images

Sub-image        Low-frequency   Horizontal   Vertical   Diagonal
Low-frequency    1.0             0.2182       0.3471     0.1229
Horizontal       0.2182          1.0          0.0725     0.0316
Vertical         0.3471          0.0725       1.0        0.0229
Diagonal         0.1229          0.0316       0.0229     1.0
Fig. 1. Typical example with 12 face images for one person in our database.
The orthogonal wavelet transform not only eliminates correlation and offers a unique decomposition capability, but also retains the original information of the image. Consequently, we adopt this transform as our image preprocessing tool. A conventional Daubechies orthogonal wavelet basis is selected to apply a discrete wavelet transform to each facial image. In this way, four sub-images are obtained: the low-frequency one and three directional ones (horizontal, vertical, and diagonal). We then separately extract their algebraic features with the same dimension. To verify the effectiveness of this preprocessing method, the following formula is applied to evaluate the correlation between features:

$$R_{xy}=\left|E(x^{T}y)-E(x)^{T}E(y)\right|. \tag{1}$$
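To make the preprocessing concrete, the following is a minimal sketch of the one-level 2-D wavelet decomposition and of the correlation measure of Eq. (1). The wavelet name 'db4' and the use of the PyWavelets and NumPy libraries are our assumptions (the paper specifies only a conventional Daubechies orthogonal basis), and Eq. (1) is read here with the expectations estimated as sample means over paired feature vectors.

```python
# Sketch of the preprocessing stage; 'db4' and the libraries are assumptions.
import numpy as np
import pywt

def decompose(image):
    """One-level 2-D DWT: low-frequency plus three directional sub-images."""
    low, (horizontal, vertical, diagonal) = pywt.dwt2(image, 'db4')
    return low, horizontal, vertical, diagonal

def correlation(x, y):
    """Eq. (1), with expectations estimated as sample means.

    x, y: arrays of shape (n_samples, dim) holding paired feature vectors.
    """
    e_xty = np.mean(np.sum(x * y, axis=1))            # E(x^T y)
    ex_t_ey = np.dot(x.mean(axis=0), y.mean(axis=0))  # E(x)^T E(y)
    return abs(e_xty - ex_t_ey)
```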
An example of the correlation analysis of SVD features is shown in Table 1, using one of the NUST603 face databases [5] of the Nanjing University of Science and Technology, China. It contains 18 people, each with 12 images of 64 × 64 pixels. Fig. 1 shows a typical set of images for one person, with significant changes in both facial expression and pose, and small changes in illumination and in the relative distance between the camera and the subject. From Table 1, we can see that the SVD feature of the low-frequency sub-image has a relatively large correlation with the others, whereas their mutual correlations are rather low. Although the first sub-image contains the main descriptive and classification information of the original image, we believe that the others, which contain important difference information,
will play a supplementary role in classifier combination. To construct the classifiers, we extract the SVD and fisherface features separately and use the nearest-neighbor method to perform their classification.

3. GDC approach

First, we use the Shannon entropy as the divisibility measure for each classifier. This requires an estimate of the posterior probabilities. Since the nearest-neighbor classification method is employed, we obtain the estimate from the minimal distance between each class and the test sample: the smaller the distance, the larger the sample's posterior probability for the corresponding class. Assume that c is the number of classes, d_i is the distance between the ith class and the test sample, and u_i is the estimated posterior probability. The objective function is

$$J=\sum_{i=1}^{c}u_{i}^{m}d_{i}^{2}\quad\text{subject to}\quad\sum_{i=1}^{c}u_{i}=1, \tag{2}$$

where m is the fuzzy index and m > 1. Minimizing J under this constraint yields

$$u_{i}=\left(\frac{1}{d_{i}^{2}}\right)^{1/(m-1)}\Bigg/\sum_{j=1}^{c}\left(\frac{1}{d_{j}^{2}}\right)^{1/(m-1)}. \tag{3}$$

Generally, m = 2. From the u_i, the Shannon entropy value can then be computed. Next, the similarity between classifiers should be evaluated. Several definitions are given below.
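As a minimal sketch, assuming NumPy, the posterior estimate of Eq. (3) and the Shannon entropy used as the divisibility measure can be computed as follows; the distance values in the usage example are hypothetical.

```python
# Sketch of Eq. (3) and the Shannon entropy divisibility measure.
import numpy as np

def posteriors(d, m=2.0):
    """Eq. (3): posterior estimates u_i from class-wise minimal distances d_i."""
    w = (1.0 / d ** 2) ** (1.0 / (m - 1.0))
    return w / w.sum()

def shannon_entropy(u, eps=1e-12):
    """Shannon entropy of a posterior vector; lower means more decisive."""
    return -np.sum(u * np.log(u + eps))

d = np.array([0.5, 1.2, 2.0])   # hypothetical distances to c = 3 classes
u = posteriors(d)               # the nearest class gets the largest u_i
h = shannon_entropy(u)          # h_d feeds the consistency index of Eq. (5)
```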
Definition 1. Suppose that the outputs x and y of two classifiers are c-dimensional vectors. Then

$$\cos(x,y)=\frac{x^{T}y}{|x|\,|y|},\qquad \sin(x,y)=\sqrt{1-\cos^{2}(x,y)},\qquad S(x,y)=1-\sin(x,y), \tag{4}$$

where S(x, y) is the similarity measure between x and y (0 ≤ S(x, y) ≤ 1).

Definition 2. Assume that x^d (d = 1, 2, ..., c) represents the dth classifier, h_d is its entropy value, and I^d is its individual consistency index within the group, defined as

$$I^{d}=\left(\frac{1}{c-1}\sum_{r=1,\;r\neq d}^{c}S(x^{d},x^{r})\right)\frac{1}{h_{d}}. \tag{5}$$

Clearly, both the average similarity between an individual and the rest of the group and the individual's divisibility are considered simultaneously in defining I^d. The larger the value of I^d, the more representative the dth classifier is of the group. Furthermore, I^d serves as that classifier's weight in the linear combination.

Definition 3. The combination result of GDC is defined as

$$x^{G}=\sum_{d=1}^{c}\left(I^{d}\cdot x^{d}\right)\Bigg/\sum_{d=1}^{c}I^{d}. \tag{6}$$

Definition 4. Suppose that I^G is the group consistency index. Then

$$I^{G}=\frac{1}{c}\sum_{d=1}^{c}S(x^{d},x^{G}). \tag{7}$$

Definition 5. If I^G ≥ α, we say that the group is consistent at level α; otherwise, the group is inconsistent at level α. The value of α is determined according to the specific details of the application.

The key idea of GDC is as follows. First, under the constraint I^G ≥ α, find a consistent decision for the whole group: either all classifiers provide a commonly supported opinion, or a subset of them is selected that provides the majority opinion. Second, perform the linear classifier combination weighted by I^d. The concrete algorithm is as follows.

Step 1: Compute I^d (d = 1, 2, ..., c), sort the values in descending order, then compute x^G.

Step 2: Compute I^G. If I^G ≥ α, then x^G is the result of GDC; exit. Otherwise, go to Step 3.

Step 3: Assume that I^1 ≥ I^2 ≥ ... ≥ I^c and let t^k = I^k / I^{k(L)}, where k = 1, 2, ..., c − 1 and I^{k(L)} is the average of all components ranking behind I^k in the vector (I^1, I^2, ..., I^c), that is, I^{k(L)} = (I^{k+1} + I^{k+2} + ... + I^c)/(c − k). Let T = max_k {t^k} = max_k {I^k / I^{k(L)}}; the first k classifiers (those attaining T) are then selected to form a sub-group that plays the dominant role in the whole group. Using this new group, with the number of classifiers c reassigned to k, repeat from Step 1. The new x^G so obtained is the result of GDC.
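The following is a minimal sketch of the whole GDC procedure under our own reading of Definitions 1-5 and Steps 1-3: each classifier contributes an output vector x^d (here, its posterior vector u from the previous section), the entropies h_d are assumed positive, and the Step 3 selection is repeated until the reduced group is consistent at level α. Function and variable names are illustrative assumptions.

```python
# Sketch of GDC (Definitions 1-5, Steps 1-3), assuming NumPy.
import numpy as np

def similarity(x, y):
    """Eq. (4): S(x, y) = 1 - sin(x, y) between two output vectors."""
    cos_xy = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - np.sqrt(max(0.0, 1.0 - cos_xy * cos_xy))

def gdc(outputs, h, alpha=0.8):
    """Return the combined group decision x^G.

    outputs: list of classifier output vectors x^d; h: their entropies h_d.
    """
    outputs, h = list(outputs), list(h)
    while True:
        c = len(outputs)
        if c == 1:                       # degenerate group: nothing to combine
            return outputs[0]
        # Step 1 / Eq. (5): individual consistency indices I^d.
        I = np.array([sum(similarity(outputs[d], outputs[r])
                          for r in range(c) if r != d) / ((c - 1) * h[d])
                      for d in range(c)])
        # Eq. (6): I^d-weighted linear combination x^G.
        xG = sum(Id * np.asarray(xd) for Id, xd in zip(I, outputs)) / I.sum()
        # Step 2 / Eq. (7): group consistency index I^G.
        IG = sum(similarity(xd, xG) for xd in outputs) / c
        if IG >= alpha:
            return xG
        # Step 3: keep the leading classifiers maximizing t^k = I^k / I^k(L).
        order = np.argsort(-I)           # indices sorted by descending I^d
        Is = I[order]
        t = [Is[k - 1] / Is[k:].mean() for k in range(1, c)]
        k_best = int(np.argmax(t)) + 1   # sub-group size k attaining T
        outputs = [outputs[i] for i in order[:k_best]]
        h = [h[i] for i in order[:k_best]]
```

Because k is at most c − 1, the group shrinks on every pass through Step 3, so the loop terminates.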
Fig. 2. Comparison of the recognition rates (%) of the different methods using the two features: (a) results for SVD and (b) results for fisherface. The horizontal axis is the number of training samples per class (4-6). (Sub1: low-frequency sub-image; Sub2: horizontal direction; Sub3: vertical direction; Sub4: diagonal direction; Orig: original image; MVM: majority voting method; GDC: group decision-making combination.)
4. Experiment results

Fig. 2 shows the recognition results for the two features, SVD and fisherface, with α set to 0.8. For each class, the training samples are chosen at random and the remaining images are used as test samples to compute the recognition rate. To reduce variation, each experiment is repeated at least 10 times. For the SVD feature, the maximum improvement of GDC over the original classification result is 10.7% (= 85.4% − 74.7%), with 4 training samples per class, and that of GDC over MVM is 5.8% (= 85.4% − 79.6%), also with 4. For the fisherface feature, the maximum improvement of GDC over the original classification result is 10.1% (= 91.2% − 81.1%), with 5 training samples per class, and that of GDC over MVM is 7.4% (= 93.1% − 85.7%), with 6. Moreover, like MVM, GDC requires very little computing time: less than 2.0 s for all test samples on a Pentium III computer. Our approach is thus an effective, real-time combination approach.
Acknowledgements

We would like to thank the Nanjing University of Science and Technology for its face database. This work is partially supported by the UGC/CRC fund from the HKSAR Government and the central fund from the Hong Kong Polytechnic University.

References

[1] R. Chellappa, C. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (5) (1995) 705–740.
[2] L.I. Kuncheva, Switching between selection and fusion in combining classifiers: an experiment, IEEE Trans. Syst. Man Cybern. Part B 32 (2) (2002) 146–156.
[3] N. Ueda, Optimal linear combination of neural networks for improving classification performance, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2) (2000) 207–215.
[4] R.C. Kwok, J. Ma, D. Zhou, Improving group decision making: a fuzzy GSS approach, IEEE Trans. Syst. Man Cybern. Part C 32 (1) (2002) 54–63.
[5] Z. Jin, J. Yang, Z. Hu, Z. Lou, Face recognition based on the uncorrelated discriminant transformation, Pattern Recognition 34 (7) (2001) 1405–1416.