Fusing multiple features for Fourier-Mellin-based face recognition with single example image per person

Yee Ming Chen, Jen-Hong Chiang
Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan, Taiwan, ROC
Article history: Received 27 June 2009; received in revised form 22 April 2010; accepted 27 June 2010; available online 29 July 2010. Communicated by Y. Fu.

Abstract
At present, many methods deal well with frontal-view face recognition, but most of them fail when only a single example image per person is stored in the system, as is common in real-world applications. In this paper, a comparative study of the AFMT, Fourier-AFMT and Taylor-AFMT invariants for face recognition is carried out, and a hybrid Fourier-AFMT framework for feature extraction is presented. First, both edge-directionality and intensity facial features are extracted; the two kinds of features are then fused and classified with the correlation coefficient method (CCM). Experiments on the YALE and ORL face databases demonstrate the effectiveness of the proposed methods. The experimental results show that the average recognition accuracy rate of the proposed fusion of multiple feature domains is much higher than that of any single feature domain. © 2010 Elsevier B.V. All rights reserved.
Keywords: Face recognition; single example image per person; gradient direction; wavelet transform; Fourier transform; analytical Fourier-Mellin transform (AFMT); Taylor invariant
1. Introduction

Face recognition has been an active research area of computer vision and pattern recognition for decades [1-3,14-16]. Many face recognition methods have been proposed to date; according to Brunelli and Poggio [1], they can be roughly classified into two categories: geometric feature-based algorithms and template-based ones. Geometric feature-based methods analyze explicit local features (such as the eyes, mouth and nose) and their geometric relationships; representative works include elastic bunch graph matching [3] and the Hidden Markov Model (HMM) [14]. Template-based (or appearance-based) methods match faces using holistic features of face images; the current state of the art of such methods is characterized by a family of subspace methods originating from the "eigenface" [2]. Recently, neural networks [4-6], support vector machines [7], kernel methods [8] and ensemble techniques [7] have also found wide application in this area. All these algorithms work well given a large training set. But in some specific scenarios, such as law enforcement, surveillance, smart cards and access control, only one image per person may be available for training the face recognition system. Under this condition, most traditional methods, such as linear discriminant analysis (LDA) [8,10], discriminant eigenfeatures [11] and fisherface [9], can hardly be
used to obtain good recognition performance, owing to the absence of enough samples for reliable covariance estimation. This problem, called the one image per person problem, is defined as follows: given a stored database of faces with only one image per person, the goal is to identify a person from the database later in time, under different and unpredictable poses, lighting, etc. Because of its challenge and significance for real-world applications, several researchers have recently taken up this problem [12]. The methods in the literature include synthesizing virtual samples [14], probabilistic matching [13], SVD perturbation [17], parallel deformation [18] and so on. But all these methods still suffer from some problems. For example, their procedure is divided into two stages: (1) constructing a new image by combining it with the original image; (2) performing traditional methods on the newly combined images. This incurs high storage and computational cost.
2. Related works

In this section, we review existing methods for robust face recognition from a single image. The aim of face recognition research is to represent face images with distinct feature vectors that are invariant to transformation [19,20]. Face images share similar geometrical features, so discriminating one face from another in the database is a challenging task. In [21], a new method for holistic face
representation, called spectroface, was presented. The spectroface representation combines the wavelet transform and the Fourier transform, and it is proved to be invariant to translation, scale and in-plane rotation. To enhance the classification information of the single example image, each sample can be combined with a reconstructed image, obtained by perturbing the image's singular values, into a new image sample [22]. The Fourier spectrum is often used as a feature for recognition, and many types of invariant features and invariant representations have been studied [19,23]. There are two broad approaches: frequency-domain invariants and moment-domain invariants. One approach is to represent the images with frequency-domain invariants derived from the Fourier-Mellin transform. The magnitude of the Fourier-Mellin transform is invariant with respect to rotation and scaling, but it is incomplete. Although several sets of rotation- and scaling-invariant descriptors have been designed under the Fourier-Mellin transform framework [24], the completeness property could usually not be satisfied, because the phase spectrum was always ignored. To overcome this problem, Ghorbel [25] proposed a complete set of rotation and scaling invariants under the analytical Fourier-Mellin transform (AFMT), based on the complete complex spectra. In this paper, a comparative study of the AFMT, Fourier-AFMT and Taylor-AFMT is carried out for face recognition, and we address the above problems within a hybrid Fourier-AFMT framework based on the Fourier-Mellin transform modulus. This hybrid similarity transform combines a Fourier translation invariant with an AFMT invariant. To improve face recognition performance, we first extract gradient direction features and intensity values from the image, and then fuse the two kinds of features and classify with the correlation coefficient method (CCM). Using higher-order statistics, such as the gradient, in the feature vector is expected to give better recognition performance. For the classification of high-dimensional data, the complexity of the similarity invariants is managed by dimensionality reduction with the 2D wavelet transform. Using image intensity as the raw feature, the hybrid Fourier-AFMT framework yields promising recognition performance. To improve it further, we exploit the directionality of the image gradient, which is an intuitive and stable shape descriptor: we measure the gradient directions numerically and store them in a feature vector for classification. We have tried three options for gradient features: the gradient vector, its directional decomposition, and the fusion of the directional decomposition with the image intensity. The directional decomposition of the gradient vector significantly improves recognition performance, while the direct use of
the gradient vector shows no advantage over the image intensity. The best result is produced by fusing the directional decomposition and the intensity into one feature vector.

The rest of this paper is organized as follows. Section 3 introduces the pattern recognition system, which includes gradient feature extraction, the hybrid similarity invariants that are compared under the Fourier-Mellin scheme so as to choose the adequate one, and a correlation coefficient method for classification. Section 4 reports our experiments. Section 5 compares the recognition rate with other algorithms. Finally, Section 6 draws conclusions and points out directions for future research.
3. The hybrid Fourier-AFMT framework

The proposed hybrid Fourier-AFMT framework includes three stages: the preprocessing stage, the feature extraction stage and the classification stage. The framework of feature extraction is shown in Fig. 1. Given an input face image, parallel feature extractions are implemented with the Fourier-AFMT; the intensity and gradient features are then fused with the sum rule, and a novel face image is finally classified with the CCM.

3.1. The preprocessing stage

The first stage is important to a recognition system but often receives little attention. There are a few standard image preprocessing techniques, such as histogram equalization and histogram matching. In this paper, we use two techniques for this stage.

3.1.1. Normalization

The major role of the preprocessing stage is intensity normalization. The image intensities are locally normalized by smoothing the image with a Gaussian filter (3 by 3, with sigma 1) for noise reduction, and then globally normalized to have zero mean and unit standard deviation to reduce intensity differences. For a 2D image f(x, y), the global normalization is given by

f_s(x,y) = \frac{f(x,y) - \mu}{\sigma}\,\sigma_s + \mu_s,  (1)

where \mu and \sigma are the intensity mean and deviation of the image, and \mu_s and \sigma_s are the expected mean and deviation, here zero and one, respectively.
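For concreteness, Eq. (1) together with the local Gaussian smoothing can be sketched in a few lines of Python (a minimal sketch, not the authors' code; the function name and the SciPy call are our own choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize(image, mu_s=0.0, sigma_s=1.0):
    """Preprocessing of Section 3.1.1: local smoothing, then global normalization (Eq. (1))."""
    f = gaussian_filter(image.astype(np.float64), sigma=1.0)  # noise reduction (3x3, sigma 1)
    mu, sigma = f.mean(), f.std()
    return (f - mu) / (sigma + 1e-12) * sigma_s + mu_s        # zero mean, unit deviation
```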
Fig. 1. Block diagram of the proposed hybrid Fourier-AFMT framework.
3.1.2. Gradient field

Most classification-based methods use the intensity values of images as the input features of the classifier. Edge detection has frequently been applied in feature-based face detection for facial feature extraction, so we measure the directionality of edges in feature space for recognition. The directionality of edges is a visually prominent feature and is highly stable for characterizing shapes. Specifically, in frontal or nearly frontal face images, despite variations in face identity and expression, the facial contour is approximately an oval, and the eyes and mouth are approximately horizontal lines. Therefore, input features composed of both edge directionality and intensity values should be more informative and more robust to illumination and facial expression changes. In our approach, we extract direction features via the directional decomposition of the gradient field; the direct use of the gradient field as classifier input does not yield promising recognition performance, because the directionality is not measured explicitly. The use of the directional decomposition of the gradient field in face recognition is inspired by previous work on on-line character recognition, which reported promising results with directional gradient features [26].

The gradient direction features are obtained in three steps: gradient computation, directional decomposition and feature reduction. The gradient vector g(x, y) = [g_x, g_y]^T is computed at each pixel location using the Sobel operator. The two Sobel masks for the horizontal and vertical gradient components are shown in Fig. 2. Accordingly, the two components are computed by (a code sketch of this computation follows the feature list below)

g_x(x,y) = f(x+1,y-1) + 2f(x+1,y) + f(x+1,y+1) - f(x-1,y-1) - 2f(x-1,y) - f(x-1,y+1),  (2)

g_y(x,y) = f(x-1,y+1) + 2f(x,y+1) + f(x+1,y+1) - f(x-1,y-1) - 2f(x,y-1) - f(x+1,y-1).  (3)

Fig. 2. Sobel masks for gradient computation.

The gradient vector g(x, y) is stored in a gradient map containing the two components g_x(x, y) and g_y(x, y), which correspond to two sub-images. The gradient vector can equivalently be represented by its magnitude (vector length) and direction angle in 2-D space. Fig. 3 shows an example of gradient feature extraction: the intensity image and the gradient vector field are shown in the upper row, and the two component sub-images of the gradient (g_x and g_y) in the lower row.

Fig. 3. The intensity image (upper-left), gradient vector field (upper-right) and two gradient components (lower).

In addition to the directional decomposition of the gradient, we test the recognition performance of several feature vectors in our experiments. The feature vectors are listed as follows.

Feature vector Intensity: intensity values of the image.
Feature vector Gradient vector: the masked Sobel gradient field, representing the gradient vector.
Feature vector Decom Gx: x-axis directional decomposition of the gradient field.
Feature vector Decom Gy: y-axis directional decomposition of the gradient field.
Feature vector Decom Gx–Gy: fusion feature vector of Decom Gx and Decom Gy.
Feature vector Fusion: fusion feature vector of intensity and Decom Gx–Gy.
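As announced above, a hedged sketch of the Sobel gradient computation of Eqs. (2) and (3) is given below; the helper name is ours, and the mask orientation may differ from the paper's convention by a sign flip:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel masks of Fig. 2 (one common orientation convention).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def gradient_components(f):
    """Return the two gradient sub-images g_x and g_y (the Decom Gx / Decom Gy maps)."""
    f = f.astype(np.float64)
    return convolve(f, SOBEL_X), convolve(f, SOBEL_Y)
```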
3.2. The feature extraction stage

In some cases, a scene description needs to be invariant under certain geometrical transformations of the Euclidean space. Invariant description is usually achieved with Fourier descriptors or Fourier-Mellin descriptors, which have been the subject of numerous studies. A useful general-purpose invariant description method should make accurate and reliable recognition of an object possible; such a description should therefore satisfy the completeness and stability properties, which imply the existence of a natural Euclidean distance between shapes. Below we compare candidate invariants and then present our proposed hybrid complete invariant under the AFMT framework.

For clarity, the basic notation is as follows. We denote the Cartesian spatial domain by (x, y) and the Cartesian frequency domain by (u, v); we denote the polar coordinates by (r, θ) and the log-polar coordinates by (q, θ) with q = ln r. When an image (or a spectrum) is converted to polar or log-polar coordinates, the intensity (or spectral) function name is preserved while the variables are replaced accordingly.

3.2.1. Fourier transform

Let the image be a real-valued continuous function f(x, y) defined on an integer-valued Cartesian grid 0 ≤ x < M, 0 ≤ y < N. The discrete Fourier transform (DFT) is defined as

F(u,v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-i 2\pi (ux/M + vy/N)}.  (4)

The inverse transform is

f(x,y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\, e^{i 2\pi (ux/M + vy/N)}.  (5)

The DFT of a real image is generally complex-valued, which leads to a magnitude and phase representation of the image.
3.2.2. Taylor invariant

The basic idea of this invariant descriptor is to eliminate the linear part of the phase spectrum by subtracting the linear phase from the phase spectrum. Let F(u, v) be the Fourier transform of an image f(x, y) and \phi(u, v) its phase spectrum, i.e. F(u,v) = |F(u,v)|\exp(i\phi(u,v)). The following complex function is called the Taylor invariant:

F_T(u,v) = e^{-i(au + bv)}\, F(u,v),  (6)

where a and b are, respectively, the derivatives of \phi(u, v) with respect to u and v at the origin (0, 0), i.e. a = \phi_u(0,0), b = \phi_v(0,0).
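A possible discrete implementation of the Taylor invariant of Eq. (6) is sketched below; estimating the phase derivatives at the origin by forward differences, and the use of NumPy's FFT, are our assumptions rather than details given in the paper:

```python
import numpy as np

def taylor_invariant(f):
    """Eq. (6): remove the linear part of the phase spectrum (discretization ours)."""
    F = np.fft.fft2(f)
    phi = np.unwrap(np.unwrap(np.angle(F), axis=0), axis=1)  # continuous phase estimate
    a = phi[1, 0] - phi[0, 0]                                # phi_u(0, 0), forward difference
    b = phi[0, 1] - phi[0, 0]                                # phi_v(0, 0), forward difference
    u = np.arange(F.shape[0])[:, None]
    v = np.arange(F.shape[1])[None, :]
    return np.exp(-1j * (a * u + b * v)) * F                 # F_T(u, v)
```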
3.2.3. Analytical Fourier-Mellin transform (AFMT)

It is well known that the direct similarity group on the plane is equivalent to the space of polar coordinates

P = \{(r, \theta) \mid r > 0 \ \text{and} \ 0 \le \theta < 2\pi\}.

The Fourier-Mellin transform (FMT) on P can be defined as

\hat f(u,v) = M_f(u,v) = \int_0^{+\infty} \int_0^{2\pi} f(r,\theta)\, e^{-iu\theta}\, r^{-iv}\, d\theta\, \frac{dr}{r}, \quad u \in \mathbb{Z},\ v \in \mathbb{R}.  (7)
The Fourier-Mellin transform of the irradiance distribution f(r, θ) of a two-dimensional image expressed in polar coordinates can be taken about the image's center of gravity in order to obtain invariance under translation. The integral (7) diverges in general; convergence holds only under the assumption that f(r, θ) is equivalent to K r^a (a > 0 and K a constant) in a neighborhood of the origin (the center of gravity of the observed image). For this reason, the analytical Fourier-Mellin transform (AFMT) was defined by

M_{f_\sigma}(u, \sigma + iv) = \int_{\mathbb{R}_+} \int_0^{2\pi} f(r,\theta)\, e^{-iu\theta}\, r^{\sigma - iv}\, d\theta\, \frac{dr}{r}  (8)

for u ∈ Z, v ∈ R and σ > 0, where f is assumed to be square summable under the measure dθ dr/r. The AFMT of an object f can be seen as the usual FMT of the distorted object f_σ(r,θ) = r^σ f(r,θ). The AFMT gives a complete description of gray-level objects, since f can be retrieved by its inverse transform:

f(r,\theta) = \sum_{u \in \mathbb{Z}} \int_{\mathbb{R}} M_{f_\sigma}(u, \sigma + iv)\, e^{iu\theta}\, r^{-\sigma + iv}\, dv.
With the change of variable q = ln(r), Eq. (8) can be rewritten in terms of Fourier transforms as

M_{f_\sigma}(u,v) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \int_0^{2\pi} e^{q\sigma} f(e^q, \theta)\, e^{-i(u\theta + qv)}\, d\theta\, dq.  (9)

A fast algorithm is obtained by computing a two-dimensional fast Fourier transform of the log-polar distorted object e^{qσ} f(e^q, θ). The log-polar sampling is built from the points obtained as the intersections between N beams originating from the image centroid and M concentric circles with exponentially increasing radii. In this paper, we have chosen N = 128 and M = 128 for the YALE database, N = 92 and M = 92 for the ORL database, and σ = 0.5 (Fig. 4).

Let us recall the transformation law of the analytical Fourier-Mellin transform for planar similarities. Let g be the orientation and size change of an object f by the angle β ∈ [0, 2π) and the scale factor α ∈ R_+, i.e. g(r, θ) = f(αr, θ + β). These two objects have the same shape and are called similar objects. One easily shows that the AFMTs of g and f are related by

M_{g_\sigma}(k,v) = \alpha^{-\sigma + iv}\, e^{ik\beta}\, M_{f_\sigma}(k,v)  (10)

for all k in Z, v in R and σ > 0.
Fig. 4. The original image (left), the log-polar image of origin (middle) and the magnitude image of AFMT (right).
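The fast AFMT of Eq. (9) can be sketched as follows: resample the image on the log-polar grid described above (N beams, M circles), weight by e^{qσ}, and take a 2-D FFT. Bilinear interpolation, centering on the image centre and the helper name are our assumptions:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def afmt(f, N=128, M=128, sigma=0.5):
    """Fast AFMT per Eq. (9), sampled on an N x M log-polar grid (a sketch)."""
    rows, cols = f.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0         # image centre / centroid
    q = np.linspace(0.0, np.log(min(cy, cx)), M)        # log-radii: exponentially spaced circles
    theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
    rr = np.exp(q)[:, None]                             # (M, 1) radii
    yy = cy + rr * np.sin(theta)[None, :]               # (M, N) sampling grid
    xx = cx + rr * np.cos(theta)[None, :]
    logpolar = map_coordinates(f.astype(np.float64), [yy, xx], order=1)
    distorted = np.exp(sigma * q)[:, None] * logpolar   # e^{q sigma} f(e^q, theta)
    return np.fft.fft2(distorted) / (2.0 * np.pi)       # samples of M_{f_sigma}(u, v)
```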
Eq. (10) is called the shift theorem; it suggests that the AFMT is well suited for computing global shape features that are invariant to the object's position, orientation and size. Since the usual Fourier-Mellin transforms of two similar objects differ only by a phase factor (Eq. (10) without the α^{-σ} term), a set of global invariant descriptors, regardless of object position, orientation and size, is generally extracted by computing the modulus of some Fourier-Mellin coefficients. Such a set is not complete, since the phase information is lost; it only represents a signature of the shape. Owing to this lack of completeness, one can find distinct objects with identical descriptor values, and a classification process may mix up objects, which is critical for content-based retrieval from image databases (producing both false positive and false negative matches). A complete family of similarity-invariant descriptors based on the AFMT has been suggested [27]. This family can be written, for any strictly positive σ, as

I_{f_\sigma}(k,v) = M_{f_\sigma}(0,0)^{(-\sigma + iv)/\sigma}\, e^{-ik\,\mathrm{Arg}(M_{f_\sigma}(1,0))}\, M_{f_\sigma}(k,v)  (11)
for all k in Z, v in R. The set in Eq. (11) is complete, since it is possible (i) to recover the FMT of an object from all of its invariant descriptors and the two normalization parameters by inverting Eq. (11), and (ii) to reconstruct the original image by the inverse AFMT.

3.2.4. Proposed hybrid similarity transform

When translation, rotation and scaling are considered together, we combine the translational invariant with the rotation and scaling invariant to construct a hybrid complete invariant under the Fourier-Mellin transform scheme. Indeed, the translation property of the Fourier transform is the basis of the above invariants, and it is also satisfied in the complex domain. Through the combination of the Fourier invariant and the AFMT invariant, we can construct a hybrid complete similarity invariant as follows:

S(\cdot) = \mathrm{AFMT}(F(\cdot)).  (12)
Note that AFMT(·) is applied directly to a magnitude spectrum, rather than separately to a complex spectrum or a phase spectrum to generate two invariant descriptors. Owing to the reciprocal scaling property of the Fourier transform, when the property of Eq. (10) is applied in the polar domain of Fourier spectra, it must be modified as follows:

M_{g_\sigma}(k,v) = \alpha^{-\sigma - 2 - ik}\, e^{iv\beta}\, M_{f_\sigma}(k,v).  (13)

The AFMT invariant of Eq. (11) is modified accordingly:

I_{f_\sigma}(k,v) = M_{f_\sigma}(0,0)^{-(\sigma + 2 + ik)/(\sigma - 2)}\, \exp(-iv\,\mathrm{arg}(M_{f_\sigma}(0,1)))\, M_{f_\sigma}(k,v).  (14)

In the same manner as Eq. (12), we can construct a hybrid complete invariant under the AFMT scheme as follows:

S(\cdot) = \mathrm{AFMT}_M(F(\cdot)).  (15)
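Putting the two pieces together, the hybrid invariant of Eq. (15) amounts to feeding the translation-invariant Fourier magnitude into the AFMT. A sketch reusing the afmt() helper above (the fftshift centering of the spectrum is our assumption):

```python
import numpy as np

def hybrid_invariant(f):
    """Eq. (15): S(.) = AFMT_M(F(.)) -- AFMT of the centered Fourier magnitude (a sketch)."""
    magnitude = np.abs(np.fft.fftshift(np.fft.fft2(f)))  # removes translation dependence
    return afmt(magnitude)                               # rotation/scale handled by the AFMT
```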
3.2.5. Extracting expression-insensitive features using the wavelet transform

The wavelet transform can decompose an image into a lower-dimensional multi-resolution representation, which provides a
compact hierarchical framework for interpreting the image information. The wavelet decomposition of a signal f(x) ∈ L²(R) is obtained by convolving the signal with a family of real orthonormal basis functions \psi_{a,b}(x):

(W_\psi f)(a,b) = |a|^{-1/2} \int_{\mathbb{R}} f(x)\, \psi\!\left(\frac{x-b}{a}\right) dx,  (16)

where a, b ∈ R, a ≠ 0 are the dilation parameter and the translation parameter, respectively. The basis function \psi_{a,b}(x) is obtained through translation and dilation of a kernel function \psi(x), known as the mother wavelet:

\psi_{a,b}(x) = 2^{a/2}\, \psi(2^a x - b).  (17)

The mother wavelet \psi(x) can be constructed from a scaling function \phi(x), which satisfies the two-scale difference equation

\phi(x) = \sqrt{2} \sum_n h(n)\, \phi(2x - n),  (18)

where h(n) is the impulse response of a discrete filter that has to meet several conditions for the set of basis wavelet functions to be orthonormal and unique. The scaling function \phi(x) is related to the mother wavelet \psi(x) via

\psi(x) = \sqrt{2} \sum_n g(n)\, \phi(2x - n).  (19)

The coefficients of the filter g(n) are conveniently obtained from the filter h(n) through the relation

g(n) = (-1)^n\, h(1 - n).  (20)

For a 2D signal such as an image, there exists an algorithm similar to the one-dimensional case, with two-dimensional wavelets and scaling functions obtained from the one-dimensional ones by tensor product. This two-dimensional wavelet transform decomposes the approximation coefficients at level j-1 into four components: the approximation at level j, L_j, and the details in three orientations (horizontal, vertical and diagonal), D_j^{horizontal}, D_j^{vertical} and D_j^{diagonal}, e.g.

L_j(m,n) = \big[H_x [H_y L_{j-1}]_{\downarrow 2,1}\big]_{\downarrow 1,2}(m,n),  (21)

D_j^{vertical}(m,n) = \big[H_x [G_y L_{j-1}]_{\downarrow 2,1}\big]_{\downarrow 1,2}(m,n),  (22)

where H and G denote filtering with h(n) and g(n) along the indicated axis and ↓ denotes downsampling. The original image is thus decomposed into four subband images; similarly, two levels of wavelet decomposition are obtained by applying the wavelet transform to the low-frequency band again. According to wavelet theory, the low-frequency band is a smoothed version of the original image in a lower-dimensional space; it also contains the highest energy content among the four subbands, and its features are insensitive to facial expressions and small occlusion. Applying an n-level wavelet decomposition to the face image affects both the recognition performance and the dimension of the feature space; in this paper we let n = 1 for both YALE and ORL.
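With PyWavelets, the expression-insensitive low-frequency band can be extracted as below (a sketch; the paper fixes n = 1 decomposition level but does not state the wavelet family, so the db1/Haar choice is our assumption):

```python
import pywt

def low_frequency_band(image, level=1, wavelet="db1"):
    """Keep only the approximation subband L_j after `level` analysis steps (Eqs. (21)-(22))."""
    ll = image
    for _ in range(level):
        ll, (lh, hl, hh) = pywt.dwt2(ll, wavelet)  # one 2-D decomposition into 4 subbands
    return ll                                      # smoothed, lower-dimensional representation
```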
3.3. Classification stage

The feature vector computed from an image (the measurement) has to be compared and matched with a class of feature vectors stored a priori (the references), to establish the correspondence between the given image of an unknown pattern and a standard image of a known pattern. We denote the feature vector corresponding to an image k by

V^{(k)} = \{v_1^{(k)}, v_2^{(k)}, v_3^{(k)}, \ldots, v_n^{(k)}\},

where each component v_i^{(k)} is typically an invariant function of the image. The set of all V^{(k)}'s constitutes the reference library of feature vectors. The images for which the reference vectors are
computed and stored as above form the set of reference patterns used for recognition. The problem considered here is to match the feature vector V_u = \{v_{u1}, v_{u2}, v_{u3}, \ldots, v_{un}\} of the image of an unknown pattern with the vectors in the reference library in order to identify the pattern. The most popular classifiers are the nearest-neighbor classifier, the weighted Euclidean distance measure, the correlation coefficient method, the logarithmic magnitude distance measure and statistical measures using minimum and maximum. In this paper, we use the correlation coefficient method (CCM), which gave the best performance among these in our experiments.

3.3.1. Correlation coefficient method

The cross-correlation between V_u and V^{(k)} is defined as

\rho(V_u, V^{(k)}) = \frac{\sum_{i=1}^n v_i^{(k)} v_{ui}}{\left(\sum_{i=1}^n (v_i^{(k)})^2\right)^{1/2} \left(\sum_{i=1}^n (v_{ui})^2\right)^{1/2}}.  (23)

The value of k that makes \rho closest to 1 is chosen as the matched image index.

3.3.2. Combining classifiers

The proposed Fourier-AFMT features and the CCM classify the test images independently for each feature type, which encode different kinds of information (intensity vs. gradient field). Note that a min-max score normalization step is generally necessary before combining scores originating from different matchers. To combine the different types of information, the soft outputs of the classifiers were fused by a sum rule before labeling the images, i.e. the final assignment was determined by the class that received the highest normalized combined output [28].
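The CCM of Eq. (23) and the sum-rule fusion of Section 3.3.2 can be sketched as follows (the helper names and the exact min-max normalization formula are our assumptions):

```python
import numpy as np

def ccm_scores(v_u, library):
    """Correlation coefficients of Eq. (23) between a probe vector and each reference vector."""
    V = np.asarray(library, dtype=np.float64)  # one reference vector per row
    return (V @ v_u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(v_u) + 1e-12)

def fuse_and_classify(scores_a, scores_b):
    """Min-max normalize each matcher's scores, fuse by the sum rule, return the best index."""
    minmax = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-12)
    return int(np.argmax(minmax(scores_a) + minmax(scores_b)))
```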
4. Experiments and results

Two standard databases, from Yale University and the Olivetti Research Laboratory, are selected to evaluate the recognition accuracy of the proposed system. These databases include face images with different expressions, small occlusion, different illumination conditions and different poses. The Yale database contains 15 persons, each with 11 different facial views representing various expressions, illumination conditions and small occlusion (by glasses); hence there are 165 face images in the database. The resolution of all images is 128 × 128. Fig. 5 shows an example image set of one person.

The ORL face database consists of 400 images collected from 40 people, most of them 20-35 years old. The face images are 92 × 112 pixels with 8-bit gray levels. They include variations in facial expression, luminance, scale and viewing angle, and were shot at different times. Limited side movement and tilt of the head were tolerated, and some samples were captured with and without glasses. These characteristics make correct recognition difficult and the database particularly interesting. The final resolution of all images, which were manually cropped and rescaled, is 92 × 92. Fig. 6 shows an example image set of one subject.

4.1. Comparison of the AFMT, Fourier-AFMT and Taylor-AFMT

This section illustrates our proposed hybrid complete invariant descriptors (Fig. 7), applies the wavelet approach to these invariant descriptions, and tests the resulting compact representation on face recognition (Table 1).
Fig. 5. Various view images of one person in Yale database.
Fig. 6. Images from one sample of the ORL database.
Fig. 7. The computational procedures of the hybrid invariants of proposed approach.
Table 1. Classification accuracy for different approaches and databases (classifier: CCM).

Method       | Yale | Olivetti
AFMT         | 0.70 | 0.51
Fourier-AFMT | 0.79 | 0.62
Taylor-AFMT  | 0.79 | 0.61

The numerical procedure of the hybrid invariant computation is described in Fig. 7. As Table 1 shows, the Fourier-AFMT surpasses the other methods.

4.2. Hybrid Fourier-AFMT framework and experimental results

We implemented experiments on ORL and YALE in two manners. For the deterministic manner, the training set and the testing set are constructed as shown in Tables 2 and 3.

Table 2. Deterministic training and test sets on the ORL face database.

Manner | Training set | Test set
ORL_A  | 1#           | 2#, 3#, 4#, 5#, 6#, 7#, 8#, 9#, 10#
ORL_B  | 2#           | 1#, 3#, 4#, 5#, 6#, 7#, 8#, 9#, 10#
ORL_C  | 3#           | 1#, 2#, 4#, 5#, 6#, 7#, 8#, 9#, 10#
ORL_D  | 4#           | 1#, 2#, 3#, 5#, 6#, 7#, 8#, 9#, 10#
ORL_E  | 5#           | 1#, 2#, 3#, 4#, 6#, 7#, 8#, 9#, 10#
ORL_F  | 6#           | 1#, 2#, 3#, 4#, 5#, 7#, 8#, 9#, 10#

Notes: 1# denotes the first image of each person; the other images are marked in the same way.

Table 3. Deterministic training and test sets on the YALE face database.

Manner | Training set | Test set
YALE_a | 1#           | 2#, 3#, 4#, 5#, 6#, 7#, 8#, 9#, 10#, 11#
YALE_b | 2#           | 1#, 3#, 4#, 5#, 6#, 7#, 8#, 9#, 10#, 11#
YALE_c | 3#           | 1#, 2#, 4#, 5#, 6#, 7#, 8#, 9#, 10#, 11#
YALE_d | 4#           | 1#, 2#, 3#, 5#, 6#, 7#, 8#, 9#, 10#, 11#
YALE_e | 5#           | 1#, 2#, 3#, 4#, 6#, 7#, 8#, 9#, 10#, 11#
YALE_f | 6#           | 1#, 2#, 3#, 4#, 5#, 7#, 8#, 9#, 10#, 11#

Our goal is to take a close look at the performance on each specific partition of the
database, so that we can see how the recognition rate is influenced under different pose, illumination and expression (PIE). For the random manner, we randomly select one image from each subject of the ORL face database and use the remaining images for testing. Similarly, only one image of each person, randomly selected from the YALE database, is used to construct the training set, and the remaining images of each person are used to test the performance of the algorithms. It is worth emphasizing that each experiment is run 10 times, and the average recognition accuracy rate is used to evaluate classification performance.

The experiments use the feature vectors presented in Section 3.1.2, i.e. (a) intensity only, (b) gradient vector, (c) Decom Gx only, (d) Decom Gy only, (e) the fusion of Decom Gx and Decom Gy through the sum rule (denoted Decom Gx–Gy), (f) the fusion of intensity and gradient vector through the sum rule (denoted intensity + gradient vector) and (g) the fusion of intensity and Decom Gx–Gy through the sum rule (denoted intensity + Decom Gx–Gy). Comparing the results in Tables 4-6, we found the following. (1) The proposed fusion methods (intensity + gradient vector and intensity + Decom Gx–Gy) give the highest recognition accuracy rates compared with methods using intensity or the gradient field alone, even though the ORL database poses the PIE problem of face recognition. The proposed methods improve the overall recognition accuracy rate, which shows the complementarity between intensity and gradient field; in other words, the more features are adopted, the better the performance attained. (2) The recognition rates of the two proposed fusion methods (intensity + gradient vector and intensity + Decom Gx–Gy) are very close to each other, which indicates that the fusion itself contributes most to the final classification results. The recognition performance is poor on YALE_d owing to illumination effects and on ORL_D owing to PIE, but in those same instances the proposed fusion methods outperform the others, as shown in the last two rows of Tables 5 and 6.
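The random-manner protocol described above can be summarized in a short sketch (all helper names are ours; extract() stands for any of the feature pipelines of Section 3, and classify() for the fused CCM matcher):

```python
import numpy as np

def random_manner_accuracy(images, labels, extract, classify, runs=10, seed=0):
    """One random training image per subject, test on the rest, average over `runs` trials."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    accuracies = []
    for _ in range(runs):
        train_idx = [int(rng.choice(np.where(labels == p)[0])) for p in np.unique(labels)]
        gallery = [extract(images[i]) for i in train_idx]
        test_idx = [i for i in range(len(images)) if i not in set(train_idx)]
        hits = sum(labels[train_idx[classify(extract(images[i]), gallery)]] == labels[i]
                   for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return float(np.mean(accuracies))
```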
Table 4. Recognition performance on the YALE and ORL databases in the random manner.

Database | Intensity | Gradient vector | Decom Gx | Decom Gy | Decom Gx–Gy | Intensity + gradient vector | Intensity + Decom Gx–Gy
YALE     | 0.65      | 0.72            | 0.72     | 0.68     | 0.75        | 0.75                        | 0.76
ORL      | 0.72      | 0.59            | 0.48     | 0.56     | 0.60        | 0.75                        | 0.72

Table 5. Recognition performance on the YALE database in the deterministic manner.

Method                      | YALE_a | YALE_b | YALE_c | YALE_d | YALE_e | YALE_f
Intensity                   | 0.52   | 0.70   | 0.74   | 0.19   | 0.69   | 0.75
Gradient vector             | 0.57   | 0.76   | 0.81   | 0.44   | 0.78   | 0.76
Decom Gx                    | 0.62   | 0.75   | 0.81   | 0.31   | 0.80   | 0.81
Decom Gy                    | 0.55   | 0.76   | 0.76   | 0.42   | 0.69   | 0.77
Decom Gx–Gy                 | 0.67   | 0.82   | 0.79   | 0.43   | 0.77   | 0.83
Intensity + gradient vector | 0.64   | 0.80   | 0.85   | 0.42   | 0.79   | 0.79
Intensity + Decom Gx–Gy     | 0.69   | 0.83   | 0.80   | 0.51   | 0.79   | 0.78

Table 6. Recognition performance on the ORL database in the deterministic manner.

Method                      | ORL_A | ORL_B | ORL_C | ORL_D | ORL_E | ORL_F
Intensity                   | 0.76  | 0.73  | 0.75  | 0.74  | 0.69  | 0.69
Gradient vector             | 0.63  | 0.60  | 0.60  | 0.57  | 0.62  | 0.59
Decom Gx                    | 0.54  | 0.50  | 0.50  | 0.47  | 0.50  | 0.46
Decom Gy                    | 0.61  | 0.58  | 0.60  | 0.57  | 0.59  | 0.56
Decom Gx–Gy                 | 0.66  | 0.60  | 0.63  | 0.58  | 0.62  | 0.59
Intensity + gradient vector | 0.78  | 0.77  | 0.77  | 0.71  | 0.75  | 0.72
Intensity + Decom Gx–Gy     | 0.80  | 0.72  | 0.78  | 0.71  | 0.74  | 0.67

5. Performance comparison

We also compared other popular methods [29], such as PCA, 2DPCA, (PC)2A, E(PC)2A, 2D(PC)2A and SVD perturbation, for face recognition with a single example image per person. As shown in Tables 7-9, the proposed hybrid Fourier-AFMT with the intensity similarity invariant gives the highest recognition accuracy rate compared with the other popular methods. We therefore conclude that the hybrid Fourier-AFMT transform with intensity combined with the gradient is an efficient and practical approach for face recognition.

Table 7. Recognition performance of different algorithms in the random manner.

Database | PCA  | 2DPCA | (PC)2A | E(PC)2A | SVD  | 2D(PC)2A | Hybrid Fourier-AFMT
ORL      | 0.54 | 0.54  | 0.56   | 0.57    | 0.55 | 0.60     | 0.72
YALE     | 0.54 | 0.56  | 0.55   | 0.56    | 0.54 | 0.61     | 0.65

Table 8. Recognition performance of different algorithms on the ORL database in the deterministic manner.

Algorithm           | ORL_A | ORL_B | ORL_C | ORL_D | ORL_E
PCA                 | 0.55  | 0.55  | 0.55  | 0.57  | 0.57
2DPCA               | 0.54  | 0.55  | 0.56  | 0.56  | 0.57
(PC)2A              | 0.57  | 0.58  | 0.57  | 0.59  | 0.61
E(PC)2A             | 0.58  | 0.59  | 0.58  | 0.60  | 0.62
SVD                 | 0.56  | 0.59  | 0.59  | 0.59  | 0.60
2D(PC)2A            | 0.61  | 0.62  | 0.61  | 0.63  | 0.64
Hybrid Fourier-AFMT | 0.76  | 0.73  | 0.75  | 0.74  | 0.69

Table 9. Recognition performance of different algorithms on the YALE database in the deterministic manner.

Algorithm           | YALE_a | YALE_b | YALE_c | YALE_d | YALE_e
PCA                 | 0.55   | 0.56   | 0.54   | 0.57   | 0.56
2DPCA               | 0.56   | 0.57   | 0.55   | 0.58   | 0.59
(PC)2A              | 0.55   | 0.57   | 0.54   | 0.57   | 0.58
E(PC)2A             | 0.57   | 0.58   | 0.56   | 0.59   | 0.58
SVD                 | 0.55   | 0.56   | 0.55   | 0.58   | 0.58
2D(PC)2A            | 0.62   | 0.63   | 0.61   | 0.64   | 0.63
Hybrid Fourier-AFMT | 0.52   | 0.70   | 0.74   | 0.19   | 0.69

6. Conclusions

This paper proposed a hybrid Fourier-AFMT framework based on the fusion of multiple feature domains.
The experimental results show that the average recognition accuracy rate of the proposed fusion of multiple feature domains is much higher than that of any single feature domain. Experiments on two face databases also compare the performance of the proposed hybrid Fourier-AFMT with other popular methods. The proposed method is promising for solving the so-called one image per person problem. We believe it works because input features composed of both edge directionality and intensity are more informative and more robust to illumination and facial expression changes. Further performance improvement could be achieved by adding a pose invariant, which is possible if the pose transform can be obtained; an important issue is then how to integrate this invariant via the fusion methods. Our proposed framework clearly improves classification accuracy. Future work will continue to develop ways of handling (1) large rotation in depth and (2) non-uniform illumination conditions.
Acknowledgements

This research work was sponsored by the National Science Council, R.O.C., under project number NSC99-2622-E-155-055-CC3.

References

[1] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (10) (1993) 1042–1062.
[2] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 1991, pp. 586–591.
[3] L. Wiskott, J.-M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775–779.
[4] B. Raytchev, H. Murase, Unsupervised face recognition by associative chaining, Pattern Recognition 36 (2003) 245–257.
[5] J. Zhang, Y. Yan, M. Lades, Face recognition: eigenfaces, elastic matching, and neural nets, Proceedings of the IEEE 85 (1997) 1422–1435.
[6] D. Valentin, H. Abdi, A.J. O'Toole, G.W. Cottrell, Connectionist models of face processing: a survey, Pattern Recognition 27 (9) (1994) 1209–1230.
[7] S. Pang, D. Kim, S.Y. Bang, Membership authentication in the dynamic group by face classification using SVM ensemble, Pattern Recognition Letters 24 (2003) 215–225.
[8] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face recognition using kernel direct discriminant analysis algorithms, IEEE Transactions on Neural Networks 14 (1) (2003) 117–126.
[9] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[10] K. Etemad, R. Chellappa, Discriminant analysis for recognition of human face images, Journal of the Optical Society of America A: Optics, Image Science, and Vision 14 (8) (1997) 1724–1733.
[11] D.L. Swets, J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (8) (1996) 831–836.
[12] J. Wu, Z.-H. Zhou, Face recognition with one training image per person, Pattern Recognition Letters 23 (14) (2002) 1711–1719.
[13] A.M. Martinez, Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (6) (2002) 748–763.
[14] W. Zhao, R. Chellappa, A. Rosenfeld, P.J. Phillips, Face recognition: a literature survey, <http://citeseer.nj.nec.com/374297.html>, 2000.
[15] G. Sukthankar, Face recognition: a critical look at biologically-inspired approaches, Technical Report CMU-RI-TR-00-04, Carnegie Mellon University, Pittsburgh, PA, 2000.
[16] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proceedings of the IEEE 83 (5) (1995) 705–740.
[17] D. Zhang, S. Chen, Z.-H. Zhou, A new face recognition method based on SVD perturbation for single example image per person.
[18] D. Beymer, T. Poggio, Face recognition from one example view, Science 272 (5250) (1996).
[19] R.D. Brandt, F. Lin, Representations that uniquely characterize images modulo translation, rotation and scaling, Pattern Recognition Letters 17 (1996) 1001–1015.
[20] J.H. Lai, P.C. Yuen, G.C. Feng, Spectroface: a Fourier-based approach for human face recognition, in: Proceedings of the Second International Conference on Multimodal Interface, Hong Kong, 1999, pp. VI 115–120.
[21] J.H. Lai, P.C. Yuen, G.C. Feng, Face recognition using holistic Fourier invariant features, Pattern Recognition 34 (2001) 95–109.
[22] H.E. Jiazhong, D.U. Minghui, Face recognition based on image enhancement and Fourier spectrum for one training image per person, Science Technology and Engineering 6 (8) (2006) (in Chinese).
[23] F. Lin, R.D. Brandt, Towards absolute invariants of images with respect to translation, rotation and scaling, Pattern Recognition Letters 14 (5) (1993) 369–379.
[24] R. Milanese, M. Cherbuliez, A rotation, translation, and scale invariant approach to content-based image retrieval, Journal of Visual Communication and Image Representation 10 (1999) 186–196.
[25] F. Ghorbel, A complete invariant description for gray-level images by the harmonic analysis approach, Pattern Recognition Letters 15 (1994) 1043–1051.
[26] A. Kawamura et al., On-line recognition of freely handwritten Japanese characters using directional feature densities, in: Proceedings of the 11th International Conference on Pattern Recognition, The Hague, Netherlands, 1992, vol. II, pp. 183–186.
[27] S. Derrode, F. Ghorbel, Robust and efficient Fourier-Mellin transform applications for gray-level image reconstruction and complete invariant description, Computer Vision and Image Understanding 83 (2001) 57–78.
[28] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 226–239.
[29] J. Li, J.-S. Pan, A novel pose and illumination robust face recognition with a single training image per person algorithm, Chinese Optics Letters 6 (4) (2008) 255–257.
Yee Ming Chen is a professor in the Department of Industrial Engineering and Management at Yuan Ze University, where he carries out basic and applied research in agent-based computing. His current research interests include soft computing, data fusion, machine learning and pattern recognition.
Jen-Hong Chiang is currently working toward the Ph.D. degree in the Department of Industrial Engineering and Management at Yuan Ze University. His research interests include pattern recognition theory and machine learning algorithms.