Pattern Recognition 34 (2001) 2041–2047
A theorem on the uncorrelated optimal discriminant vectors

Zhong Jin*, Jing-Yu Yang, Zhen-Min Tang, Zhong-Shan Hu

Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, People's Republic of China

Received 28 March 2000; accepted 11 September 2000
Abstract

This paper proposes a theorem on the uncorrelated optimal discriminant vectors (UODVs). It is proved that the classical optimal discriminant vectors are equivalent to UODVs, which can be used to extract (L-1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion function. Experiments on the Concordia University CENPARMI handwritten numeral database indicate that UODVs are much more powerful than the Foley–Sammon optimal discriminant vectors. It is believed that when the number of training samples is large, the conjugate orthogonal set of discriminant vectors can be much more powerful than the orthogonal set of discriminant vectors. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Pattern recognition; Discriminant analysis; Dimensionality reduction; Feature extraction; Linear transformation
* Corresponding author. Tel.: +86-25-431-7235; fax: +86-25-431-5510.
E-mail addresses: [email protected] (Z. Jin), [email protected] (J.-Y. Yang).

1. Introduction

It is well known that linear feature extraction is an efficient way of reducing dimensionality. To date, many linear feature extraction methods have been proposed. The Fisher linear discriminant vector [1] is very useful as a technique for pattern analysis. The basic idea is to calculate the Fisher optimal discriminant vector on the condition that the Fisher criterion function takes an extremum, and then to construct a one-dimensional feature space by projecting the high-dimensional feature vector onto the obtained optimal discriminant vector. In 1962, Wilks proposed (L-1) vectors for L-class problems [2,3], which can be called the classical optimal discriminant vectors (CODVs). Based on the Fisher linear discriminant method, Sammon proposed an optimal discriminant plane technique in 1970 [4]. In 1975, Foley and Sammon presented a set of optimal discriminant vectors for two-class problems [5], which are known as the Foley–Sammon optimal discriminant vectors (FSODVs). Kittler and Young [6] presented an approach to feature selection based on the Karhunen–Loeve expansion in 1973. In 1977, Kittler [7] discussed the relationship between the method of Kittler and Young and the FSODV method, and showed that the former method is based on conjugate orthogonality constraints and is, from the point of view of dimensionality reduction, more powerful than the FSODV method, which is based on orthogonality constraints. Okada and Tomita [8] proposed an optimal orthonormal system for discriminant analysis in 1985. Duchene and Leclercq [9] solved the problem of finding the set of FSODVs for multi-class problems. Hamamoto et al. proposed orthogonal discriminant analysis in a transformed space and presented a feature extraction method based on the modified 'plus e-take away f' algorithm [10,11]. These authors claimed that the orthogonal set of discriminant vectors is more powerful than CODVs [8–11]. Longstaff combined the Fisher vector with the Fukunaga–Koontz transform or a radius vector [12]. Liu et al. presented a generalized optimal set of discriminant vectors [13]. Jin et al. [14] proposed an optimal discriminant plane, which was more powerful than Sammon's plane on the Iris data. Jin et al. [15] presented
a set of uncorrelated optimal discriminant vectors (UODVs), which was shown to be more powerful than FSODVs and has been successfully used in face feature extraction [16].

There is a dimensionality problem in pattern feature extraction. It is believed that the accuracy of statistical pattern classifiers increases as the number of features increases, and decreases as the number becomes too large [17]. Fukunaga [18] showed that for L-class problems there are (L-1) ideal features for classification. In other words, for L-class problems the optimal number of features is (L-1). However, the ideal features are too hard to obtain in practice. Although CODVs can be used to extract (L-1) features for L-class problems, they are not widely accepted to be effective and efficient.

In this paper, we present a theorem on UODVs and discuss the effectiveness of CODVs. The remainder of this paper is organized as follows: Section 2 gives an introduction to CODVs, FSODVs and UODVs. Section 3 presents a theorem on UODVs. In Section 4, experiments are performed on the Concordia University CENPARMI handwritten numeral database. A brief summary is given in Section 5.
2. CODV, FSODV and UODV

Let ω_1, ω_2, …, ω_L be L known pattern classes, and let X be an N-dimensional sample. Suppose that m_i, C_i and P_i (i = 1, 2, …, L) are the mean vector, the covariance matrix and the a priori probability of class ω_i, respectively. The between-class covariance matrix S_b, the within-class covariance matrix S_w and the population covariance matrix S_t are determined by the following formulas:

S_b = \sum_{i=1}^{L} P_i [m_i - E(X)][m_i - E(X)]^T,  (1)

S_w = \sum_{i=1}^{L} P_i E[(X - m_i)(X - m_i)^T | ω_i] = \sum_{i=1}^{L} P_i C_i,  (2)

S_t = E{[X - E(X)][X - E(X)]^T} = S_b + S_w.  (3)
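To make Eqs. (1)-(3) concrete, the following NumPy sketch (our illustration, not part of the original paper; all function and variable names are our own) estimates S_b, S_w and S_t from a labeled sample set, taking the a priori probabilities P_i as the class frequencies.

```python
# A minimal sketch (not from the paper) of estimating the scatter matrices of
# Eqs. (1)-(3) from labeled data; priors P_i are taken as class frequencies.
import numpy as np

def scatter_matrices(X, y):
    """X: (n_samples, N) data matrix; y: (n_samples,) integer class labels.
    Returns (S_b, S_w, S_t) as defined in Eqs. (1)-(3)."""
    classes = np.unique(y)
    n, N = X.shape
    m = X.mean(axis=0)                                # E(X)
    S_b = np.zeros((N, N))
    S_w = np.zeros((N, N))
    for c in classes:
        Xc = X[y == c]
        P_c = len(Xc) / n                             # a priori probability of class c
        m_c = Xc.mean(axis=0)                         # class mean m_i
        d = (m_c - m)[:, None]
        S_b += P_c * d @ d.T                          # Eq. (1)
        S_w += P_c * np.cov(Xc, rowvar=False, bias=True)   # Eq. (2): P_i * C_i
    S_t = S_b + S_w                                   # Eq. (3)
    return S_b, S_w, S_t
```

Since S_t = S_b + S_w, the returned S_t should coincide (up to round-off) with the biased sample covariance of the pooled data, which provides a quick sanity check on the implementation.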
The projection from an N-dimensional space to an (L-1)-dimensional space is accomplished by (L-1) discriminant functions:

y_i = φ_i^T X  (i = 1, 2, …, L-1).  (4)

If the y_i are viewed as components of a vector Y and the weight vectors φ_i are viewed as the columns of an N×(L-1) matrix Φ, then the projection can be written as a single matrix equation:

Y = Φ^T X.  (5)

2.1. Classical optimal discriminant vectors (CODVs) [2,3]

The problem is to find a rectangular matrix Φ that maximizes the following classifiability criterion function [2,3]:

J(Φ) = |Φ^T S_b Φ| / |Φ^T S_w Φ|.  (6)

The columns of an optimal Φ are the generalized eigenvectors ξ_i that correspond to the largest eigenvalues in

S_b ξ_i = λ_i S_w ξ_i.  (7)

These generalized eigenvectors ξ_i are called the CODVs. Suppose the within-class covariance matrix S_w is nonsingular and the between-class covariance matrix S_b is of rank (L-1); then the CODVs are the (L-1) eigenvectors of the matrix S_w^{-1} S_b corresponding to its nonzero eigenvalues.

2.2. Foley–Sammon optimal discriminant vectors (FSODVs) [5]

The Fisher criterion function can be defined as follows:

F(φ) = (φ^T S_b φ) / (φ^T S_w φ),  (8)

where φ is an arbitrary vector in the N-dimensional space. The vector φ that maximizes F(φ) is the Fisher optimal discriminant vector, i.e. the Fisher vector. It expresses the idea that the set of samples projected onto φ has the minimal within-class covariance and the maximal between-class covariance in the one-dimensional subspace spanned by φ.

Suppose that j vectors ψ_1, ψ_2, …, ψ_j (j ≥ 1) have been obtained. We can calculate the (j+1)th vector ψ_{j+1}, which maximizes the Fisher criterion function F(φ) under the following orthogonality constraints:

ψ_{j+1}^T ψ_i = 0  (i = 1, 2, …, j).  (9)

These vectors ψ_j are called the FSODVs.

2.3. Uncorrelated optimal discriminant vectors (UODVs) [14–16]

Let φ_1 be the Fisher vector. Suppose that j vectors φ_1, φ_2, …, φ_j (j ≥ 1) have been obtained. We can calculate the (j+1)th vector φ_{j+1}, which maximizes the Fisher criterion function F(φ) under the following conjugate orthogonality constraints:

φ_{j+1}^T S_t φ_i = 0  (i = 1, 2, …, j).  (10)

These vectors φ_j are called the UODVs, since for any i ≠ j, φ_i^T X and φ_j^T X are uncorrelated.

The jth UODV φ_j is the eigenvector corresponding to the maximum eigenvalue of the following eigenequation:

U_j S_b φ = λ S_w φ,  (11)

where

U_1 = I_N,
U_j = I_N - S_t D_j^T (D_j S_t S_w^{-1} S_t D_j^T)^{-1} D_j S_t S_w^{-1}  (j > 1),
D_j = [φ_1 φ_2 … φ_{j-1}]^T  (j > 1),
I_N = diag(1, 1, …, 1).  (12)

It has been shown by experiments on the Iris data [14] and the ORL face database [15,16] that UODVs are more powerful than FSODVs. Obviously, formulas (11) and (12) are not computationally efficient for computing UODVs, and more efficient algorithms need to be developed. On the other hand, it is easy to show that for any i ≠ j, φ_i^T X and φ_j^T X are uncorrelated. The relationship between UODVs and CODVs should therefore be discussed.
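The recursions (9) and (10) differ only in the metric used for the constraints: plain orthogonality for the FSODVs versus S_t-conjugacy for the UODVs. The sketch below is our own construction for illustration only; it does not implement the deflation formulas (11)-(12), and all function and variable names are ours. It computes either set by maximizing the Fisher criterion (8) restricted to the subspace that satisfies the accumulated constraints.

```python
# A hedged sketch of computing FSODVs / UODVs by sequentially maximizing the
# Fisher criterion (8) under the constraints (9) or (10).  This is a generic
# constrained-maximization approach, not the algorithm of Eqs. (11)-(12).
import numpy as np
from scipy.linalg import eigh, null_space

def fisher_value(phi, S_b, S_w):
    """Fisher criterion F(phi) of Eq. (8)."""
    return float(phi @ S_b @ phi) / float(phi @ S_w @ phi)

def sequential_discriminant_vectors(S_b, S_w, C, n_vectors):
    """Maximize F(phi) subject to phi^T C phi_k = 0 for the vectors already found.
    C = I_N reproduces the FSODV constraints (9); C = S_t the UODV constraints (10)."""
    N = S_b.shape[0]
    vectors = []
    for _ in range(n_vectors):
        if vectors:
            D = np.vstack([C @ v for v in vectors])   # each row is (C phi_k)^T
            B = null_space(D)                         # basis of the feasible subspace
        else:
            B = np.eye(N)
        # reduced generalized eigenproblem: maximize v^T(B^T S_b B)v / v^T(B^T S_w B)v
        vals, vecs = eigh(B.T @ S_b @ B, B.T @ S_w @ B)
        phi = B @ vecs[:, -1]                         # eigenvector of the largest eigenvalue
        vectors.append(phi / np.linalg.norm(phi))
    return np.array(vectors)

# toy L = 4 class problem in N = 10 dimensions (synthetic data, for illustration only)
rng = np.random.default_rng(0)
L, N, n_per = 4, 10, 200
class_means = rng.normal(scale=3.0, size=(L, N))
X = np.vstack([mu + rng.normal(size=(n_per, N)) for mu in class_means])
y = np.repeat(np.arange(L), n_per)

m = X.mean(axis=0)
S_b, S_w = np.zeros((N, N)), np.zeros((N, N))
for c in range(L):
    Xc = X[y == c]; mc = Xc.mean(axis=0)
    S_b += (len(Xc) / len(X)) * np.outer(mc - m, mc - m)              # Eq. (1)
    S_w += (len(Xc) / len(X)) * np.cov(Xc, rowvar=False, bias=True)   # Eq. (2)
S_t = S_b + S_w                                                       # Eq. (3)

fsodv = sequential_discriminant_vectors(S_b, S_w, np.eye(N), 6)
uodv = sequential_discriminant_vectors(S_b, S_w, S_t, 6)
print([round(fisher_value(v, S_b, S_w), 3) for v in fsodv])  # typically stays positive beyond L-1
print([round(fisher_value(v, S_b, S_w), 3) for v in uodv])   # expected ~0 after the first L-1 = 3
```

On such synthetic data the UODV values are expected to mirror the pattern of Table 1 below: positive for the first L-1 vectors and essentially zero afterwards, while the FSODV values decay much more slowly.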
3. A theorem on UODV

In this section, we present a theorem on UODVs and give some discussion.

Theorem 1. For L-class problems, suppose that the between-class covariance matrix S_b has rank (L-1) and the within-class covariance matrix S_w is nonsingular. Let the (L-1) nonzero eigenvalues of S_w^{-1} S_b be ordered from the largest to the smallest as

λ_1 ≥ λ_2 ≥ … ≥ λ_{L-1} > 0  (13)

and suppose

λ_i ≠ λ_j  (i ≠ j).  (14)

For r ≤ L-1, regardless of the direction of the eigenvectors, the rth UODV φ_r is the rth eigenvector ξ_r of S_w^{-1} S_b corresponding to the rth largest nonzero eigenvalue λ_r, i.e.

φ_r = ξ_r  (r = 1, 2, …, L-1).  (15)

For r > L-1, the rth UODV φ_r has a Fisher criterion value of zero, i.e.

F(φ_r) = 0  (r > L-1).  (16)

The proof of Theorem 1 is given in the appendix.

According to Eq. (16), for r > L-1 the rth UODV φ_r cannot supply any more discriminant information in the sense of the Fisher criterion function. Thus, the number of effective UODVs can be said to be (L-1) for L-class problems. Therefore, UODVs can be said to be equivalent to CODVs based on Eq. (15), and the classifiability criterion function (6) can be said to be equivalent to the Fisher criterion function (8) under the conjugate orthogonality constraints (10).

It is always advantageous to know what the best features for classification are. The Bayes error is an accepted criterion to evaluate feature sets. Since the Bayes classifier for L-class problems compares the a posteriori probabilities q_1(X), q_2(X), …, q_L(X) and classifies the unknown sample X to the class whose a posteriori probability is the largest, these L functions carry sufficient information to set up the Bayes classifier. Furthermore, since \sum_{i=1}^{L} q_i(X) = 1, only (L-1) of these L functions are linearly independent. Thus, Fukunaga [18] called q_1(X), q_2(X), …, q_{L-1}(X) the ideal feature set for classification. In practice, however, the a posteriori probability density functions are hard to obtain. The Bayes error is too complex to use for extracting features for classification and has little practical utility. The Fisher criterion functions (6) and (8) are much simpler, and UODVs (i.e. CODVs) have much more practical utility.
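A small numerical illustration of the mechanism behind Theorem 1 (again our own sketch on synthetic data, not part of the paper): with S_w nonsingular, the generalized eigenvectors of S_b ξ = λ S_w ξ are automatically S_t-conjugate, so the first (L-1) of them already satisfy the UODV constraints (10) while attaining the largest achievable Fisher values, and only (L-1) nonzero eigenvalues exist.

```python
# Numerical check (ours) of the key facts used in Theorem 1.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
L, N, n_per = 5, 12, 300
class_means = rng.normal(scale=2.5, size=(L, N))
X = np.vstack([mu + rng.normal(size=(n_per, N)) for mu in class_means])
y = np.repeat(np.arange(L), n_per)

m = X.mean(axis=0)
S_b, S_w = np.zeros((N, N)), np.zeros((N, N))
for c in range(L):
    Xc = X[y == c]; mc = Xc.mean(axis=0)
    S_b += (len(Xc) / len(X)) * np.outer(mc - m, mc - m)
    S_w += (len(Xc) / len(X)) * np.cov(Xc, rowvar=False, bias=True)
S_t = S_b + S_w

vals, vecs = eigh(S_b, S_w)             # generalized eigenproblem of Eq. (7)
order = np.argsort(vals)[::-1]          # sort eigenvalues from largest to smallest
vals, vecs = vals[order], vecs[:, order]

xi = vecs[:, :L - 1]                    # the L-1 CODVs (nonzero eigenvalues)
conj = xi.T @ S_t @ xi                  # should be numerically diagonal: S_t-conjugacy
print(np.max(np.abs(conj - np.diag(np.diag(conj)))))   # off-diagonal magnitude ~ 0
print(vals[:L - 1])                     # these equal the Fisher values F(xi_r)
print(np.max(np.abs(vals[L - 1:])))     # remaining eigenvalues ~ 0 (rank of S_b is L-1)
```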
4. Experiments and analysis

Experiments have been performed to compare UODVs with FSODVs on the Concordia University CENPARMI handwritten numeral database. Four thousand samples are used for training, and the other 2000 samples are used for testing. Hu et al. [19] had done some preprocessing work and extracted four kinds of features as follows:

X^G: 256-dimensional Gabor transformation feature [20],
X^L: 121-dimensional Legendre moment feature [21],
X^P: 36-dimensional pseudo-Zernike moment feature [22],
X^Z: 30-dimensional Zernike moment feature [23].

It is generally accepted that X^G, X^L, X^P and X^Z are effective features of handwritten numerals [19–23]. The total data of the above four kinds of features for all 6000 samples has a size of about 20.2 MB.

For each of X^G, X^L, X^P and X^Z, 30 FSODVs ψ_i (i = 1, 2, …, 30) and 30 UODVs φ_i (i = 1, 2, …, 30) can be computed from the 4000 training samples. The maximum values of the Fisher criterion function, i.e. the maximum eigenvalues of the corresponding eigenequations, can also be computed; they are listed in Table 1.

Let us take a look at the effectiveness of a set of optimal discriminant vectors φ_1, φ_2, …, φ_r. If the maximum value of the Fisher criterion function F(φ_{r+1}) is large, we must add the (r+1)th optimal discriminant vector φ_{r+1} to the set φ_1, φ_2, …, φ_r; otherwise much discriminant information may be lost. If the maximum value of the Fisher criterion function F(φ_{r+1}) is zero, or very small, we need not add the (r+1)th optimal discriminant vector φ_{r+1} to the set φ_1, φ_2, …, φ_r, because little discriminant information would be lost. From this point of view, the set of UODVs φ_1, φ_2, …, φ_9 is superior. It therefore seems clear that UODVs are more effective than FSODVs in accounting for discriminant information.
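The selection rule described in the preceding paragraph can be stated compactly: keep adding discriminant vectors while the next maximal Fisher value is non-negligible. A tiny sketch of this rule (ours; the tolerance value is an assumption, not taken from the paper):

```python
def n_effective_vectors(max_fisher_values, tol=1e-6):
    """Count the leading discriminant vectors whose maximal Fisher criterion
    value exceeds a small tolerance; the rest add little discriminant information."""
    count = 0
    for f in max_fisher_values:
        if f <= tol:          # tol is an assumed cut-off, not a value from the paper
            break
        count += 1
    return count

# Applied to a UODV column of Table 1 this returns 9 (= L-1 for ten classes),
# whereas the FSODV values never reach zero within the first 30 vectors.
```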
Table 1
Maximum values of the Fisher criterion function

        F(φ_i) for UODV                     F(ψ_i) for FSODV
 i      X^G    X^L    X^P    X^Z            X^G    X^L    X^P    X^Z
 1      3.90   4.84   3.10   2.76           3.90   4.84   3.10   2.76
 2      2.64   2.61   1.61   1.34           3.81   4.67   2.47   2.36
 3      1.86   2.18   0.95   0.95           3.69   4.41   1.72   1.96
 4      1.63   1.62   0.73   0.68           3.57   4.29   1.59   1.60
 5      1.38   1.02   0.47   0.37           3.38   4.05   1.21   1.30
 6      0.96   0.96   0.32   0.31           3.18   3.87   1.03   1.16
 7      0.69   0.69   0.26   0.24           2.94   3.69   0.90   0.99
 8      0.54   0.43   0.13   0.12           2.76   3.46   0.74   0.86
 9      0.50   0.42   0.08   0.06           2.60   3.30   0.61   0.77
10      0      0      0      0              2.52   3.11   0.52   0.65
11      0      0      0      0              2.41   2.92   0.48   0.57
15      0      0      0      0              2.00   2.49   0.22   0.34
20      0      0      0      0              1.65   2.08   0.07   0.12
30      0      0      0      0              1.25   1.63   0.01   0.01
Table 2
Classification error rates in the discriminant spaces

No. of          UODV Y-space                        FSODV Z-space
vectors    X^G     X^L     X^P     X^Z          X^G     X^L     X^P     X^Z
  1        0.674   0.616   0.697   0.652        0.674   0.616   0.697   0.652
  2        0.489   0.360   0.572   0.539        0.668   0.614   0.626   0.621
  3        0.340   0.285   0.442   0.435        0.664   0.616   0.533   0.618
  4        0.277   0.180   0.364   0.378        0.656   0.613   0.484   0.583
  5        0.262   0.160   0.337   0.353        0.654   0.615   0.457   0.500
  6        0.207   0.130   0.306   0.324        0.648   0.608   0.440   0.480
  7        0.191   0.110   0.285   0.311        0.644   0.608   0.438   0.463
  8        0.190   0.108   0.299   0.305        0.634   0.609   0.427   0.441
  9        0.183   0.106   0.296   0.299        0.481   0.609   0.418   0.446
 10        –       –       –       –            0.485   0.602   0.431   0.453
 11        –       –       –       –            0.470   0.590   0.446   0.464
 15        –       –       –       –            0.448   0.421   0.407   0.446
 20        –       –       –       –            0.290   0.324   0.416   0.436
 30        –       –       –       –            0.260   0.268   0.430   0.449
For each of X^G, X^L, X^P and X^Z, the linear transformations based on the UODVs and the FSODVs can be performed, respectively, on all 6000 samples as follows:

Y = [y_1 y_2 … y_30]^T = [φ_1 φ_2 … φ_30]^T X,  (17)

Z = [z_1 z_2 … z_30]^T = [ψ_1 ψ_2 … ψ_30]^T X,  (18)

so that there are three feature spaces, i.e. the original feature X-space, the UODV Y-space and the FSODV Z-space. Classification experiments have been performed in the UODV Y-space and the FSODV Z-space, respectively.
The common minimum distance classifier is used. The classification error rates on the 2000 test samples are computed and listed in Table 2 as the number of discriminant vectors varies from 1 to 30.

In Table 2, the classification error rates in the discriminant spaces decrease as the number of optimal discriminant vectors increases, and they decrease much faster in the UODV Y-spaces than in the FSODV Z-spaces. It is clear that the UODV classification with nine vectors easily outperforms the FSODV classification, even when 30 FSODVs are used.

Experiments have also been performed in the original X-spaces. For each of X^G, X^L, X^P and X^Z, the classification error rates in the original X-spaces are computed; they are listed in the first column of Table 3.
Table 3
Classification results in the three spaces

Kind of         X-space                    UODV Y-space               FSODV Z-space
feature     Error rate  Dimension      Error rate  Dimension      Error rate  Dimension
X^G           0.269        256           0.183          9           0.260         30
X^L           0.479        121           0.106          9           0.268         30
X^P           0.429         36           0.296          9           0.430         30
X^Z           0.449         30           0.299          9           0.449         30
The second and third column groups of Table 3 are taken from Table 2. Table 3 shows that the error rates in the UODV Y-spaces are much lower than those in the original X-spaces and in the FSODV Z-spaces, although the number of discriminant vectors is only nine. This is a convincing demonstration of the efficiency of using the UODV transformation for classification.
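For completeness, here is a sketch (ours) of the experimental protocol of this section: compute the discriminant vectors from the training set, project both sets onto the leading nine vectors, and classify each test sample to the class with the nearest projected mean. The CENPARMI feature files are not distributed with the paper, so the loader call at the end is purely a placeholder.

```python
# A hedged sketch of the Y-space classification experiment; data loading is assumed.
import numpy as np
from scipy.linalg import eigh

def min_distance_error_rate(X_train, y_train, X_test, y_test, n_vectors=9):
    """Project onto the leading discriminant vectors of S_w^{-1} S_b and classify
    each test sample to the class whose projected mean is nearest (Euclidean)."""
    classes = np.unique(y_train)
    N = X_train.shape[1]
    m = X_train.mean(axis=0)
    S_b, S_w = np.zeros((N, N)), np.zeros((N, N))
    for c in classes:
        Xc = X_train[y_train == c]; mc = Xc.mean(axis=0)
        P = len(Xc) / len(X_train)
        S_b += P * np.outer(mc - m, mc - m)
        S_w += P * np.cov(Xc, rowvar=False, bias=True)
    vals, vecs = eigh(S_b, S_w)
    Phi = vecs[:, np.argsort(vals)[::-1][:n_vectors]]    # UODVs (= CODVs by Theorem 1)
    Y_train, Y_test = X_train @ Phi, X_test @ Phi        # Eq. (17): Y = Phi^T X
    means = np.array([Y_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Y_test[:, None, :] - means[None, :, :], axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return np.mean(pred != y_test)

# Hypothetical usage (the loader name and file layout are assumptions):
# X_train, y_train, X_test, y_test = load_cenparmi_legendre()   # 4000 / 2000 split
# print(min_distance_error_rate(X_train, y_train, X_test, y_test, n_vectors=9))
```

The nearest-projected-mean rule is one common reading of "minimum distance classifier"; if the paper's classifier measured distance differently, the resulting error rates would differ accordingly.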
5. Conclusions

This paper proposes a theorem on UODVs. CODVs are proved to be equivalent to UODVs, which can be used to extract (L-1) uncorrelated discriminant features for L-class problems without losing any discriminant information in the sense of the Fisher discriminant criterion. The classifiability criterion function (6) can be said to be equivalent to the Fisher criterion function (8) under the conjugate orthogonality constraints (10). Experiments on the Concordia University CENPARMI handwritten numeral database indicate that UODVs (i.e. CODVs) are superior to FSODVs.

Some authors have claimed that the orthogonal set of discriminant vectors is more powerful than the conjugate orthogonal set of discriminant vectors [8–11]. A general rule for feature extraction is to extract features that are as uncorrelated as possible. In our opinion, when the number of training samples is large, the population covariance matrix S_t can be estimated accurately, and the conjugate orthogonal set of discriminant vectors can then be much more powerful than the orthogonal set of discriminant vectors.
Acknowledgements

We wish to thank K. Liu and C.Y. Suen of Concordia University for their support with the CENPARMI handwritten numeral database.
Appendix. Proof of Theorem 1

Step 1: Find an adequate basis for the N-dimensional space.

Since ξ_i is the ith eigenvector of S_w^{-1} S_b corresponding to the ith largest nonzero eigenvalue λ_i, we have

S_b ξ_i = λ_i S_w ξ_i  (i = 1, 2, …, L-1).  (A.1)

Due to the conditions λ_i ≠ λ_j (i ≠ j), we have

ξ_j^T S_b ξ_i = ξ_j^T S_w ξ_i = 0  (j ≠ i).  (A.2)

Then ξ_1, ξ_2, …, ξ_{L-1} can be proved to be linearly independent from the following relation:

ξ_j^T S_t ξ_i = ξ_j^T S_b ξ_i + ξ_j^T S_w ξ_i  ≥ ξ_i^T S_w ξ_i > 0  (j = i),
ξ_j^T S_t ξ_i = ξ_j^T S_b ξ_i + ξ_j^T S_w ξ_i  = 0  (j ≠ i).  (A.3)

The dimensionality of the subspace {x : x^T S_b x = 0} is (N-L+1). Let β_1, β_2, …, β_{N-L+1} be a set of linearly independent vectors in this subspace with the following constraints:

β_j^T S_w β_i = 0  (i ≠ j).  (A.4)

Therefore, ξ_1, ξ_2, …, ξ_{L-1}, β_1, β_2, …, β_{N-L+1} is a basis for the N-dimensional space. It is obvious that

S_b β_i = 0  (i = 1, 2, …, N-L+1).  (A.5)

Thus, we have

β_j^T S_w ξ_i = β_j^T S_b ξ_i / λ_i = 0  (A.6)

and

β_j^T S_t ξ_i = β_j^T S_b ξ_i + β_j^T S_w ξ_i = 0.  (A.7)

Step 2: Prove φ_1 = ξ_1.

φ_1 can be represented as

φ_1 = \sum_{i=1}^{L-1} a_i ξ_i + \sum_{j=1}^{N-L+1} b_j β_j,  (A.8)

where a_i (i = 1, 2, …, L-1) and b_j (j = 1, 2, …, N-L+1) are coefficients.
According to Eqs. (A.5) and (A.6), the Fisher criterion function F(φ_1) can be calculated and bounded using condition (13) as follows:

F(φ_1) = [\sum_{i=1}^{L-1} a_i^2 λ_i ξ_i^T S_w ξ_i] / [\sum_{i=1}^{L-1} a_i^2 ξ_i^T S_w ξ_i + \sum_{j=1}^{N-L+1} b_j^2 β_j^T S_w β_j] ≤ λ_1.  (A.9)

Furthermore, equality holds in Eq. (A.9) if and only if

a_1 = 1,  a_i = 0  (i = 2, 3, …, L-1),  b_j = 0  (j = 1, 2, …, N-L+1),  (A.10)

i.e. φ_1 = ξ_1.

Step 3: Suppose that φ_i = ξ_i (i = 1, 2, …, r < L-1). Prove that φ_{r+1} = ξ_{r+1}.

φ_{r+1} can be represented as

φ_{r+1} = \sum_{i=1}^{L-1} a_i ξ_i + \sum_{j=1}^{N-L+1} b_j β_j.  (A.11)

According to conditions (A.3) and (A.7), we have

φ_{r+1}^T S_t ξ_k = \sum_{i=1}^{L-1} a_i ξ_i^T S_t ξ_k + \sum_{j=1}^{N-L+1} b_j β_j^T S_t ξ_k = a_k ξ_k^T S_t ξ_k  (k = 1, 2, …, r).  (A.12)

From the conjugate orthogonality constraints (10), we obtain

a_k = 0  (k = 1, 2, …, r),  (A.13)

i.e.

φ_{r+1} = \sum_{i=r+1}^{L-1} a_i ξ_i + \sum_{j=1}^{N-L+1} b_j β_j.  (A.14)

According to Eqs. (A.5) and (A.6), the Fisher criterion function F(φ_{r+1}) can be calculated and bounded using condition (13) as follows:

F(φ_{r+1}) = [\sum_{i=r+1}^{L-1} a_i^2 λ_i ξ_i^T S_w ξ_i] / [\sum_{i=r+1}^{L-1} a_i^2 ξ_i^T S_w ξ_i + \sum_{j=1}^{N-L+1} b_j^2 β_j^T S_w β_j] ≤ λ_{r+1}.  (A.15)

Furthermore, equality holds in Eq. (A.15) if and only if

a_{r+1} = 1,  a_i = 0  (i = r+2, …, L-1),  b_j = 0  (j = 1, 2, …, N-L+1),  (A.16)

i.e. φ_{r+1} = ξ_{r+1}.

Step 4: Prove that F(φ_r) = 0 for r > L-1.

For r > L-1, φ_r can be represented as

φ_r = \sum_{i=1}^{L-1} a_i ξ_i + \sum_{j=1}^{N-L+1} b_j β_j.  (A.17)

According to conditions (A.3) and (A.7), we have

φ_r^T S_t ξ_k = \sum_{i=1}^{L-1} a_i ξ_i^T S_t ξ_k + \sum_{j=1}^{N-L+1} b_j β_j^T S_t ξ_k = a_k ξ_k^T S_t ξ_k  (k = 1, 2, …, L-1).  (A.18)

From the conjugate orthogonality constraints (10), we obtain

a_k = 0  (k = 1, 2, …, L-1),  (A.19)

i.e.

φ_r = \sum_{j=1}^{N-L+1} b_j β_j.  (A.20)

According to Eq. (A.5), we have

F(φ_r) = (φ_r^T S_b φ_r) / (φ_r^T S_w φ_r) = 0 / [\sum_{j=1}^{N-L+1} b_j^2 β_j^T S_w β_j] = 0.  (A.21)
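As a numerical sanity check on Step 1 (our own sketch, not part of the proof), one can build the eigenvector basis ξ_1, …, ξ_{L-1} together with a basis β_1, …, β_{N-L+1} of the null space of S_b and confirm relations (A.2), (A.6) and (A.7) on synthetic data:

```python
# Numerical illustration (ours) of the basis relations used in Step 1 of the proof.
import numpy as np
from scipy.linalg import eigh, null_space

rng = np.random.default_rng(2)
L, N, n_per = 4, 9, 250
class_means = rng.normal(scale=2.0, size=(L, N))
X = np.vstack([mu + rng.normal(size=(n_per, N)) for mu in class_means])
y = np.repeat(np.arange(L), n_per)

m = X.mean(axis=0)
S_b, S_w = np.zeros((N, N)), np.zeros((N, N))
for c in range(L):
    Xc = X[y == c]; mc = Xc.mean(axis=0)
    S_b += (len(Xc) / len(X)) * np.outer(mc - m, mc - m)
    S_w += (len(Xc) / len(X)) * np.cov(Xc, rowvar=False, bias=True)
S_t = S_b + S_w

vals, vecs = eigh(S_b, S_w)
order = np.argsort(vals)[::-1]
xi = vecs[:, order[:L - 1]]     # xi_1 ... xi_{L-1}: eigenvectors with nonzero eigenvalues
beta = null_space(S_b)          # N-L+1 vectors spanning {x : S_b x = 0}, cf. (A.5)
# Note: null_space returns Euclidean-orthonormal columns; S_w-orthogonalizing them
# (to satisfy (A.4) exactly) is not needed for the checks below.

off = lambda M: np.max(np.abs(M - np.diag(np.diag(M))))
print(off(xi.T @ S_b @ xi), off(xi.T @ S_w @ xi))       # (A.2): both ~0 off the diagonal
print(np.max(np.abs(beta.T @ S_w @ xi)))                 # (A.6): ~0 up to round-off
print(np.max(np.abs(beta.T @ S_t @ xi)))                 # (A.7): ~0 up to round-off
```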
References

[1] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (1936) 178–188.
[2] S.S. Wilks, Mathematical Statistics, Wiley, New York, 1962, pp. 577–578.
[3] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[4] J.W. Sammon Jr., An optimal discriminant plane, IEEE Trans. Comput. 19 (9) (1970) 826–829.
[5] D.H. Foley, J.W. Sammon Jr., An optimal set of discriminant vectors, IEEE Trans. Comput. 24 (3) (1975) 281–289.
[6] J. Kittler, P.C. Young, A new approach to feature selection based on the Karhunen–Loeve expansion, Pattern Recognition 5 (1973) 335–352.
[7] J. Kittler, On the discriminant vector method of feature selection, IEEE Trans. Comput. 26 (6) (1977) 604–606.
[8] T. Okada, S. Tomita, An optimal orthonormal system for discriminant analysis, Pattern Recognition 18 (2) (1985) 139–144.
[9] J. Duchene, S. Leclercq, An optimal transformation for discriminant and principal component analysis, IEEE Trans. Pattern Anal. Mach. Intell. 10 (6) (1988) 978–983.
[10] Y. Hamamoto, T. Kanaoka, S. Tomita, Orthogonal discriminant analysis for interactive pattern analysis, Proceedings of the Tenth International Conference on Pattern Recognition, 1990, pp. 424–427.
[11] Y. Hamamoto, Y. Matsuura, T. Kanaoka, S. Tomita, A note on the orthonormal discriminant vector method for feature extraction, Pattern Recognition 24 (7) (1991) 681–684.
[12] I.D. Longstaff, On extensions to Fisher's linear discriminant function, IEEE Trans. Pattern Anal. Mach. Intell. 9 (2) (1987) 321–324.
[13] K. Liu, Y.Q. Cheng, J.Y. Yang, A generalized optimal set of discriminant vectors, Pattern Recognition 25 (7) (1992) 731–739.
[14] Z. Jin, Z. Lou, J.Y. Yang, An optimal discriminant plane with uncorrelated features, Pattern Recognition Artif. Intell. 12 (3) (1999) 334–339 (in Chinese).
[15] Z. Jin, J.Y. Yang, J.F. Lu, An optimal set of uncorrelated discriminant features, Chinese J. Comput. 22 (10) (1999) 1105–1108 (in Chinese).
[16] Z. Jin, J.Y. Yang, Z.S. Hu, Z. Lou, Face recognition based on the uncorrelated discriminant transformation, Pattern Recognition, to appear.
[17] G.F. Hughes, On the mean accuracy of statistical pattern recognizers, IEEE Trans. Inform. Theory 14 (1) (1968) 55–63.
[18] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
[19] Z.S. Hu, Z. Lou, J.Y. Yang, K. Liu, C.Y. Suen, Handwritten digit recognition based on multi-classifier combination, Chinese J. Comput. 22 (4) (1999) 369–374 (in Chinese).
[20] H. Yoshihiko et al., Recognition of handwritten numerals using Gabor features, Proceedings of the Thirteenth ICPR, pp. 250–253.
[21] S.X. Liao, M. Pawlak, On image analysis by moments, IEEE Trans. Pattern Anal. Mach. Intell. 18 (3) (1996) 254–266.
[22] R.R. Bailey, S. Mandyam, Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 18 (4) (1996) 389–398.
[23] K. Alireza, H. Yawhua, Invariant image recognition by Zernike moments, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 489–497.
About the Author: ZHONG JIN was born in Jiangsu, China, on 4 December 1961. He received the B.S. degree in Mathematics, the M.S. degree in Applied Mathematics and the Ph.D. degree in Pattern Recognition and Intelligence Systems from Nanjing University of Science and Technology (NUST), Nanjing, China in 1982, 1984 and 1999, respectively. He is now an Associate Professor in the Department of Computer Science at NUST. He is the author of over 10 scientific papers in pattern recognition, image processing, and artificial intelligence. His current interests are in the areas of pattern recognition, video image processing, face recognition, and content-based image retrieval.

About the Author: JING-YU YANG received the B.S. degree in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984, he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. He is currently Professor and Chairman in the Department of Computer Science at NUST. He is the author of over 100 scientific papers in computer vision, pattern recognition, robotics, and artificial intelligence. His current interests are in the areas of pattern recognition, robot vision, image processing, and artificial intelligence.

About the Author: ZHEN-MIN TANG received the B.S. and M.S. degrees in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. He is now a Professor in the Department of Computer Science at NUST. He is the author of over 40 scientific papers in pattern recognition, image processing, and artificial intelligence. His current interests are in the areas of pattern recognition, image processing, artificial intelligence, and expert systems.

About the Author: ZHONG-SHAN HU was born in Jiangsu, China, in 1973. He received the B.S. degree in Applied Mathematics and the Ph.D. degree in Pattern Recognition and Intelligence Systems from Nanjing University of Science and Technology (NUST), Nanjing, China in 1995 and 1999, respectively. He is now an Assistant Professor in the Department of Computer Science at NUST. He is the author of over 10 scientific papers in pattern recognition, image processing, and artificial intelligence. His current interests are in the areas of pattern recognition, image processing, handwritten numeral recognition, and face recognition.