A cascade face recognition system using hybrid feature extraction

Ping Zhang*, Xi Guo

Department of Mathematics and Computer Science, Alcorn State University, MS 39096-7500, USA

Digital Signal Processing 22 (2012) 987–993

Article history: Available online 11 July 2012

Keywords: 2D Complex Wavelet feature extraction; ensemble classifier system; face detection and recognition; video image processing; pattern recognition

Abstract

A novel cascade face recognition system using hybrid feature extraction is proposed. Three sets of face features are extracted. The merits of the Two-Dimensional Complex Wavelet Transform (2D-CWT) are analyzed; for face recognition feature extraction, it is shown that 2D-CWT compares favorably with the traditionally used 2D Gabor transform in terms of computational complexity and feature stability. The proposed recognition system congregates three Artificial Neural Network (ANN) classifiers and a gating network trained by the three feature sets. A computationally efficient fitness function for the genetic algorithms is proposed to evolve the best weights of the ensemble classifier. Experiments demonstrate that the overall recognition rate and reliability are significantly improved in both still face recognition and video-based face recognition. © 2012 Elsevier Inc. All rights reserved.

1. Introduction

Reliable and fast face recognition in both still image-based and video-based applications has been extensively researched for years due to its enormous commercial and law enforcement applications. Well-known methods reported in the literature include the eigenfaces and Fisherfaces methods [1,2], Elastic Graph Matching (EGM) [3], the robust Hausdorff distance measure for face localization and recognition [4,22], and a combination strategy of neural networks for face recognition [23].

Many video-based face detection and recognition systems operate in the still-to-video scenario, in which a subject's face images are taken from still images in order to train classifiers beforehand (off-line); during the recognition stage, the subject's face is detected and recognized online. How to quickly and accurately detect a non-tilted frontal face and then recognize it from a video clip is a challenging research topic.

An important prerequisite for video-based face recognition is face detection. In order to capture frontal facial images timely and accurately, many face detection methods have been proposed. For example, Shih and Liu [5] proposed a face detection method applying discriminating feature analysis (DFA) combined with a support vector machine (SVM). Rowley, Baluja and Kanade [6] presented a neural network-based upright frontal face detection system, in which a retinally connected neural network examines small windows of an image and decides whether each window contains a face. A fast face detection method was proposed by Viola and Jones [7]. Face color information is an important feature in face detection: Wu, Chen and Yachida [8] described a new

* Corresponding author. E-mail addresses: [email protected] (P. Zhang), [email protected] (X. Guo).


method to detect faces in color images based on fuzzy theory. In Ref. [9], the authors used quantized skin color regions for face detection, and a survey of skin-color modeling and detection methods can be found in Ref. [10]. In addition, the eye is another important feature for face detection and recognition: a robust method for eye feature extraction in color images was reported in Ref. [11], and eye detection using optimal wavelet packets and radial basis functions was introduced in Ref. [12]. Some face detection applications on video clips can be found in Refs. [13,14]. A new and comprehensive survey on eye detection in facial images was presented in Ref. [24].

In recent years, research on video-based face recognition has remained of great interest to scientists and researchers worldwide, due to the availability of cheap, high-resolution cameras and the simplification of the interface between computers and cameras. For instance, face detection and tracking in a video by propagating detection probabilities was proposed in Ref. [15]. Lee and Tsao [16] proposed a face recognizability measurement for visual surveillance. Video-based face recognition using adaptive hidden Markov models was presented in Ref. [17], and an adaptive fusion of multiple matchers for face recognition was introduced in Ref. [18]. A framework for evaluating object detection and tracking in video, specifically for face, text and vehicle objects, was elaborated in Ref. [25]. Face recognition using dual-tree complex wavelet features was introduced in [26].

Ensemble classifiers have been used in many pattern recognition systems in order to increase recognition accuracy and reliability while suppressing the error rate and rejection rate. In Ref. [27], the authors proposed a cascade of boosted ensembles for face detection. An ensemble-based discriminant learning scheme with boosting for face recognition was presented in [28].


In this paper, a new cascade face recognition scheme is proposed. The paper is divided into four parts. In the first part, the flowchart of video-based face detection and recognition is presented. In the second part, three sets of facial features are extracted: the merits of the Two-Dimensional Complex Wavelet Transform (2D-CWT) are analyzed, and a detailed scheme for extracting facial features using 2D-CWT is proposed. In the third part, an ensemble classifier scheme is used to congregate three individual Artificial Neural Network (ANN) classifiers trained by the three feature sets; a computationally efficient fitness function for the genetic algorithms is presented and successfully used to evolve the best weights for the proposed ensemble classifier. In the last part, experiments demonstrate that the proposed face recognition system yields a significant improvement in terms of recognition rate and reliability.

2. Face detection and verification modules

In order to extract non-tilted frontal facial images from a video clip, three face verification modules are applied in series. The flowchart of the proposed video-based cascade face detection and ensemble classifier system for face recognition is shown in Fig. 1. First, the spectra of face and non-face areas are analyzed and used for face skin verification. Second, a fast face symmetry verification algorithm is applied. Finally, three eye templates are used to further verify the non-tilted frontal face. This has two advantages: (1) the computer recognizes only non-tilted facial images in the video, and (2) face recognition performance is increased accordingly. The detailed algorithms can be found in Ref. [35].

Fig. 1. Face detection and recognition schematic flowchart.

3. Feature extraction for face recognition

Feature extraction is one of the most important steps in designing a pattern recognition system. It requires that the extracted features have small variation within a class and strong discriminating ability among classes. In this section, three sets of features for face recognition are extracted.

3.1. Complex Wavelet Transform for feature extraction

The Complex Wavelet Transform (CWT) was developed to keep the attractive attributes of the Discrete Wavelet Transform (DWT), such as the approximate half-sample delay property, perfect reconstruction (orthogonal or biorthogonal), finite support (FIR filters), vanishing moments/good stopband, and linear-phase filters [30,34]. Furthermore, the CWT adds some new merits [31,33]: approximate shift invariance, good directional selectivity for 2D images, efficient order-N computation and limited redundancy. The computational complexity of the CWT is only twice that of the DWT for 1D signals (2^m times for m-D signals). These properties have made the CWT successfully applicable to image processing. The dual-tree CWT has proved a suitable solution for numerous applications, including pattern feature extraction and recognition [32]. The 2D Complex Wavelet Transform (2D-CWT) provides true directional selectivity and insensitivity to pixel shifts. The six subband images of the 2D-CWT can be represented by the following wavelet core functions:

$$
\begin{aligned}
\psi_1(x, y) &= \psi(x)\psi(y) & \psi_2(x, y) &= \psi(x)\overline{\psi(y)} \\
\psi_3(x, y) &= \phi(x)\psi(y) & \psi_4(x, y) &= \psi(x)\phi(y) \\
\psi_5(x, y) &= \phi(x)\overline{\psi(y)} & \psi_6(x, y) &= \psi(x)\overline{\phi(y)}
\end{aligned} \qquad (1)
$$

where $\phi(x) = \phi_h(x) + j\phi_g(x)$ and $\psi(x) = \psi_h(x) + j\psi_g(x)$ are both complex functions, and the $\psi_i(x, y)$ $(i = 1, \ldots, 6)$ yield six subbands of complex coefficients at each level, oriented at angles of ±75°, ±45° and ±15°.

The 2D-CWT can be implemented using a dual-tree structure. Each tree has a structure similar to the 2D-DWT, with two decomposition operations at each level (row decomposition and column decomposition), except that different filters are applied to achieve perfect reconstruction, and the outputs of the subband images are combined into complex wavelet coefficients. Fig. 2 shows the 2D-CWT feature extraction scheme. The dual-tree complex wavelet decomposition consists of two trees, Tree A and Tree B, with the same structure. In order to realize perfect reconstruction from the decomposed subimages, a lowpass filter and a highpass filter at the first level need to be specially designed; they are denoted h00, g00 for Tree A and h10, g10 for Tree B, and are called pre-filters. The complex filters at the higher levels are h01 and g01 for Tree A, and h11 and g11 for Tree B. For example, a facial image of size N × N is decomposed into four subband images LL, LH, HL and HH at the first level of each tree, each of size N/2 × N/2. At the higher levels, the decomposition is applied to the LL subband image of the previous level. The complex wavelet coefficients are then used for feature extraction.

Fig. 2. The schematic diagram of 2D-CWT for feature extraction.
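For illustration, the 4-level decomposition and the subband-magnitude computation described here can be sketched with the open-source dtcwt Python package. This is a stand-in for the authors' own filter design, and the 64 × 64 input size is taken from the experiments in Section 5.1; only the magnitude portion of the full feature scheme is shown.

```python
import numpy as np
import dtcwt  # open-source dual-tree CWT package, assumed here in place of the authors' filters

def cwt_subband_magnitudes(image):
    """Decompose a 64x64 face image to 4 levels and return the 4th-level
    subband magnitudes |c| = sqrt(u^2 + v^2), which are insensitive to
    small pixel shifts."""
    transform = dtcwt.Transform2d()
    pyramid = transform.forward(image.astype(float), nlevels=4)
    # pyramid.highpasses[3]: complex coefficients of the 4th level,
    # shape (4, 4, 6) -- one 4x4 array per oriented subband
    # (roughly +/-15, +/-45, +/-75 degrees)
    level4 = pyramid.highpasses[3]
    return np.abs(level4).ravel()
```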


3.2. Comparison between 2D-CWT and 2D Gabor transform

The Gabor transform has been widely used in image processing and pattern analysis. A 2D Gabor function is a 2D Gaussian window multiplied by a complex sinusoid:

$$f(x, y) = e^{-\left((x/\delta_x)^2 + (y/\delta_y)^2\right)}\, e^{-j(\omega_x x + \omega_y y)} \qquad (2)$$
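For reference, Eq. (2) can be sampled directly on a grid; the kernel size and parameters below are illustrative only, not values from the paper.

```python
import numpy as np

def gabor_kernel(size, delta_x, delta_y, omega_x, omega_y):
    """Sample the 2D Gabor function of Eq. (2): a Gaussian window
    multiplied by a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    window = np.exp(-((x / delta_x) ** 2 + (y / delta_y) ** 2))
    carrier = np.exp(-1j * (omega_x * x + omega_y * y))
    return window * carrier
```

A full Gabor feature set requires one such kernel, and one image convolution, per scale and orientation; this per-orientation cost is what the fast dual-tree algorithm of the 2D-CWT avoids.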

In the Two-Dimensional Complex Wavelet Transform (2D-CWT), the basis functions can be set to closely approximate complex Gabor-like functions, which exhibit strong spatial locality and orientation selectivity and are optimally localized in the space and frequency domains. The 2D-CWT basis functions therefore have the form

$$h(x, y) = a(x, y)\, e^{-j(\omega_x x + \omega_y y)} \qquad (3)$$

where a(x, y) is a slowly varying Gaussian-like real window function centered at (0, 0), and (ω_x, ω_y) is the center frequency of the corresponding subband. The complex coefficient of the i-th subband at the l-th level can thus be written as

$$c_l^i = u_l^i + j v_l^i \qquad (4)$$

and the magnitude of each coefficient of each subband is calculated as

$$C_l^i = \sqrt{(u_l^i)^2 + (v_l^i)^2} \qquad (5)$$

Since a(x, y) is slowly varying, this magnitude is insensitive to small image shifts, which is a valuable property for feature extraction. The directional properties of the 2D-CWT arise from the fact that h(x, y) has constant phase along lines on which ω_x x + ω_y y is constant. For 2D image processing and feature extraction, the computation of a typical Gabor transform is expensive, whereas the 2D-CWT has a dual-tree implementation in which each tree admits a fast algorithm. Furthermore, from the biological vision point of view, oriented wavelet and wavelet-like transforms are natural for image processing and pattern recognition applications [29].

3.3. Gradient-based directional feature

In order to keep line-like features, the gradient-based directional feature is used for face recognition. The main advantage of this kind of feature is that the algorithm is simple and easily implemented [20]. Denote the input image by I_z; the templates of the Sobel operators S_x and S_y [19] are listed in Tables 1 and 2.

Table 1
Template of Sobel operator S_x.

−1  −2  −1
 0   0   0
 1   2   1

Table 2
Template of Sobel operator S_y.

 1   0  −1
 2   0  −2
 1   0  −1

The X-gradient of the frontal face image is calculated by

$$I_x = I_z * S_x \qquad (6)$$

and the Y-gradient of the frontal face image is calculated by

$$I_y = I_z * S_y \qquad (7)$$

where * denotes 2D convolution. The gradient magnitude and phase are then obtained by

$$r(i, j) = \sqrt{I_x^2(i, j) + I_y^2(i, j)}, \qquad \theta(i, j) = \operatorname{atan2}(I_y, I_x) \qquad (8)$$

Then the gradient direction of each pixel of the convolved image with nonzero gradient magnitude is counted as a directional feature. In order to generate a fixed number of features, each gradient direction is quantized into one of eight directions at π/4 intervals. Each normalized gradient image is then divided into 4 × 4 = 16 sub-images, and the count for each direction in each sub-image is taken as a feature. In total, the number of features is 4 × 4 × 8 = 128.
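A minimal NumPy/SciPy sketch of this 128-dimensional feature follows, using the Sobel templates of Tables 1 and 2 and a uniform π/4 binning; the exact bin edges and the treatment of zero-gradient pixels are our assumptions, since the paper does not specify them.

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel templates from Tables 1 and 2
SX = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
SY = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])

def directional_features(image, grid=4, n_dirs=8):
    """128-dim gradient-direction histogram: grid x grid sub-images,
    n_dirs direction bins per sub-image (4 * 4 * 8 = 128)."""
    ix = convolve2d(image, SX, mode="same")   # Eq. (6)
    iy = convolve2d(image, SY, mode="same")   # Eq. (7)
    r = np.hypot(ix, iy)                      # Eq. (8), gradient magnitude
    theta = np.arctan2(iy, ix)                # Eq. (8), gradient phase
    # Quantize directions into 8 bins of width pi/4
    bins = np.floor((theta + np.pi) / (np.pi / 4)).astype(int) % n_dirs
    h, w = image.shape[0] // grid, image.shape[1] // grid
    feats = np.zeros((grid, grid, n_dirs))
    for i in range(grid):
        for j in range(grid):
            cell_bins = bins[i*h:(i+1)*h, j*w:(j+1)*w]
            cell_mag = r[i*h:(i+1)*h, j*w:(j+1)*w]
            for d in range(n_dirs):
                # count pixels with nonzero gradient falling in direction d
                feats[i, j, d] = np.count_nonzero((cell_bins == d) & (cell_mag > 0))
    return feats.ravel()
```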

3.4. Gradient-based wavelet feature

We use the Kirsch nonlinear edge enhancement algorithm to extract statistical features from the frontal face image and apply a wavelet transform to these statistical features to form the feature vector. The feature extraction proceeds as follows. First, the Kirsch algorithm is applied to an N × N facial image to extract horizontal, vertical, right-diagonal and left-diagonal directional feature images, as well as a global feature image. The eight neighbors of pixel (i, j) are defined as in Fig. 3:

A0   A1      A2
A7   (i, j)  A3
A6   A5      A4

Fig. 3. Definition of the eight neighbors of pixel (i, j).

Kirsch defined the nonlinear edge enhancement as

$$G(i, j) = \max\Big\{1,\ \max_{k=0,\ldots,7} |5S_k - 3T_k|\Big\} \qquad (9)$$

where

$$S_k = A_k + A_{k+1} + A_{k+2}, \qquad T_k = A_{k+3} + A_{k+4} + A_{k+5} + A_{k+6} + A_{k+7} \qquad (10)$$

with all subscripts taken modulo 8.

In order to extract four-directional features from the horizontal (H), vertical (V), right-diagonal (R) and left-diagonal (L) directions, the following templates are used:

$$
\begin{aligned}
G(i, j)_H &= \max\{|5S_0 - 3T_0|,\ |5S_4 - 3T_4|\} \\
G(i, j)_V &= \max\{|5S_2 - 3T_2|,\ |5S_6 - 3T_6|\} \\
G(i, j)_R &= \max\{|5S_1 - 3T_1|,\ |5S_5 - 3T_5|\} \\
G(i, j)_L &= \max\{|5S_3 - 3T_3|,\ |5S_7 - 3T_7|\}
\end{aligned} \qquad (11)
$$

The 2D Daubechies wavelet transform [21] is used to filter out the high-frequency components of each directional feature image and of the global feature image, converting each feature matrix into a 4 × 4 matrix. In total, 16 × 5 = 80 features are extracted from each image.
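The pipeline of Eqs. (9)–(11) plus the 4-level wavelet reduction can be sketched as follows. PyWavelets and the "db2" filter are stand-in assumptions, since the paper does not specify the Daubechies order; the periodization mode is used so that a 64 × 64 image reduces to exactly 4 × 4.

```python
import numpy as np
import pywt  # PyWavelets, assumed stand-in for the authors' Daubechies implementation

def kirsch_directional_images(img):
    """Eqs. (9)-(11): global and four-directional Kirsch edge images."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    H, W = img.shape
    # Eight neighbors A0..A7 of each pixel, clockwise as in Fig. 3
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    A = [padded[1+di:1+di+H, 1+dj:1+dj+W] for di, dj in offsets]
    S = [A[k] + A[(k+1) % 8] + A[(k+2) % 8] for k in range(8)]        # Eq. (10)
    T = [sum(A[(k+m) % 8] for m in range(3, 8)) for k in range(8)]    # Eq. (10)
    E = [np.abs(5*S[k] - 3*T[k]) for k in range(8)]
    g_all = np.maximum(1, np.max(E, axis=0))                          # Eq. (9)
    g_h = np.maximum(E[0], E[4])                                      # Eq. (11)
    g_v = np.maximum(E[2], E[6])
    g_r = np.maximum(E[1], E[5])
    g_l = np.maximum(E[3], E[7])
    return [g_all, g_h, g_v, g_r, g_l]

def gradient_wavelet_features(img):
    """80-dim feature: 4x4 low-frequency approximation of each of the
    five Kirsch images (16 * 5 = 80)."""
    feats = []
    for g in kirsch_directional_images(img):
        # 4-level Daubechies decomposition; keep only the approximation
        approx = pywt.wavedec2(g, "db2", mode="periodization", level=4)[0]
        feats.append(approx.ravel())
    return np.concatenate(feats)
```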

4. Ensemble classifier

4.1. Ensemble classification scheme

A novel classifier combination scheme is proposed in order to achieve the lowest error rate while pursuing the highest recognition rate for face recognition. The schematic diagram is shown in Fig. 4. The output confidence values of the three ANNs are weighted by $w_{0,0}$–$w_{0,R-1}$ for ANN1, $w_{1,0}$–$w_{1,R-1}$ for ANN2, and $w_{2,0}$–$w_{2,R-1}$ for ANN3 (note: R is the number of testers to be recognized; $w_{0,0}$–$w_{0,R-1}$ are the weights of the confidence values $c_{0,0}$–$c_{0,R-1}$ of ANN1, and so on). A gating network is used to congregate the weighted confidence values, and a genetic algorithm is used to evolve the optimal weights for the gating network from the confidence values of the three ANNs.

Fig. 4. An ensemble classifier consisting of three ANNs and one gating network.

Suppose the outputs of the three ANNs are represented as $\{c_{0,0}, c_{0,1}, \ldots, c_{0,R-1}\}$, $\{c_{1,0}, c_{1,1}, \ldots, c_{1,R-1}\}$ and $\{c_{2,0}, c_{2,1}, \ldots, c_{2,R-1}\}$, respectively. The weighted outputs of the ANNs' confidence values are calculated as

$$X_i = W_i^T C_i \qquad (12)$$

where $W_i = [w_{i,0}, w_{i,1}, \ldots, w_{i,R-1}]$ and $C_i = [c_{i,0}, c_{i,1}, \ldots, c_{i,R-1}]$, $i = 0, 1, 2$, for the three ANNs; the weighting is applied component-wise, so each $X_i$ is an R-dimensional vector. The three weighted confidence vectors are summed into a vector Z:

$$Z = \sum_{i=0}^{2} X_i, \qquad Z = [z_0, z_1, \ldots, z_{R-1}] \qquad (13)$$

In order to generalize the output, the j-th output $g_j$ of the gating network is the "softmax" function of $z_j$:

$$g_j = \frac{e^{z_j}}{\sum_k e^{z_k}}, \qquad G = [g_0, g_1, \ldots, g_{R-1}]^T \qquad (14)$$

where G is the output of the gating network. Our goal is to pursue the lowest misrecognition rate and, at the same time, the highest recognition performance. We create a vector $O_{target}$ with R elements, where R is the number of testers in the video; the element at the corresponding label is set to 1.0, while all others are set to 0.0. A fitness function f is chosen to minimize the difference between the output G and the corresponding training sample vector $O_{target}$:

$$f = |G - O_{target}|^2 \qquad (15)$$
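A minimal NumPy sketch of Eqs. (12)–(15) follows, assuming the component-wise reading of the weighting noted above; the function names are ours, not the authors'.

```python
import numpy as np

def gating_output(confidences, weights):
    """Eqs. (12)-(14): component-wise weighting, summation, and softmax.
    confidences, weights: arrays of shape (3, R), one row per ANN."""
    X = weights * confidences          # Eq. (12), applied component-wise
    z = X.sum(axis=0)                  # Eq. (13): Z = X0 + X1 + X2
    g = np.exp(z - z.max())            # Eq. (14), shifted for numerical stability
    return g / g.sum()

def fitness(weights, confidences, target):
    """Eq. (15): squared distance between gating output and one-hot target."""
    G = gating_output(confidences, weights)
    return float(np.sum((G - target) ** 2))
```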

By minimizing Eq. (15) through genetic evolution, the weights tend toward the optimum. The genetic algorithms are used to train the gating network, namely to seek the optimal weights $W_i$ in Eq. (12) that give the best recognition rate. In Eq. (13), Z is the sum of the three weighted ANN confidence vectors, so the three classifiers' contributions may differ according to their weights.

4.2. Genetic algorithms for training the gating network

A GA applies the evolution-based optimization operators of selection, mutation and crossover to a population in order to compute an optimal solution, and the weight-selection problem in the gating network is well suited to evolution by GAs. In ANN training, the most difficult problem is finding a reasonable fitness function for a large set of training samples. Ideally, the recognition rate itself could be used as the fitness criterion for training a classifier; however, using the recognition rate this way is infeasible for some pattern recognition problems, because it requires an enormous amount of computation in each generation of learning. In this paper, we use GAs to train the gating network with Eq. (15) as the fitness function, so the GA pursues the smallest difference between the gating network's outputs and the target label vector $O_{target}$. As the proposed method includes three feature sets, three ANN classifiers and one gating network, the computational complexity is approximately O(N × N).
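The GA itself is not specified in detail in the paper. The following illustrative loop, built on the fitness() sketch above, shows one plausible scheme; the population size, arithmetic crossover and Gaussian mutation are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_weights(samples, R, pop_size=50, generations=200, sigma=0.1):
    """Evolve the 3 x R gating weights by minimizing the mean of Eq. (15)
    over (confidences, target) training pairs."""
    def mean_fitness(w):
        return np.mean([fitness(w, c, t) for c, t in samples])

    population = rng.uniform(0.0, 1.0, size=(pop_size, 3, R))
    for _ in range(generations):
        scores = np.array([mean_fitness(w) for w in population])
        parents = population[np.argsort(scores)[:pop_size // 2]]   # selection
        mates = parents[rng.permutation(len(parents))]
        children = 0.5 * (parents + mates)                         # arithmetic crossover
        children += sigma * rng.standard_normal(children.shape)    # mutation
        population = np.concatenate([parents, children])
    scores = np.array([mean_fitness(w) for w in population])
    return population[int(np.argmin(scores))]                      # best weight matrix
```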

5. Face recognition results

In order to test the proposed face feature algorithms and recognition system, two types of experiments were conducted: one on a standard face database (still face images), and one on video-based face recognition.


5.1. Recognition tested on a face database

In the first experiment, face recognition is conducted on a benchmark face database, the Face Database of the University of Essex [36]. The database includes 395 individuals (male and female) of different genders, races and ages; some individuals wear eyeglasses and some have beards. Fifty subjects were chosen for the experiment. For each individual, the first 10 face images in the database are used to create training samples, and the remaining 10 face images are kept for testing. In order to obtain more training samples, simple x- and y-coordinate shifts and small rotations are applied to the 10 training face images to create 100 simulated training samples for each subject.

Three-layer Artificial Neural Networks (ANNs) with Back-Propagation (BP) are employed as the three classifiers in the proposed ensemble system shown in Fig. 4. For each ANN classifier, the number of nodes in the input layer is set to the number of features, the number of nodes in the output layer is equal to the number of subjects to be recognized, and the number of nodes in the hidden layer is set to 5–10, depending on the number of training samples, the number of features used and the number of ANN outputs.

A recognition result is accepted if: (1) the three ANN classifiers vote for the same subject and the sum of their confidence values is equal to or larger than 2.25; or (2) the gating network votes for a subject with a confidence value larger than 0.65; or (3) the sum of the confidence values of any two ANNs is larger than 1.50, both vote for the same subject, and the gating network votes for that same subject. Otherwise, the subject is rejected.

A robust eye feature extraction method [11], combined with face spectrum and face symmetry analysis [25], is applied to extract the two eyes and the mouth area of the face image. The head area is then extracted and scaled to a size of 64 × 64. In the 2D-CWT face feature extraction, the 64 × 64 face image is decomposed to the 4th level, so each sub-image has a size of 4 × 4. We keep only the amplitude coefficients of the three high-frequency components of each tree, and both amplitude and phase information of the low-frequency component of each tree. The number of features is 4 × 4 (per subband image) × 3 (high-frequency subband images per tree) × 2 (trees) + 4 × 4 (per subband image) × 2 (trees) × 2 (parts: real and imaginary) = 160. Since the real and imaginary coefficients of each LL subband image are extracted as features, the phase information, which provides good directional selectivity, is preserved. As analyzed before, we extract 128 features for the Gradient-Based Directional Feature set and 80 features for the Gradient-Based Wavelet Feature set.
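The three-condition acceptance rule above translates directly into code. The thresholds 2.25, 0.65 and 1.50 are the paper's; the function and variable names are ours.

```python
import numpy as np

def decide(c, g, t_all=2.25, t_gate=0.65, t_pair=1.50):
    """Acceptance rule of Section 5.1. c: (3, R) ANN confidences;
    g: (R,) gating network output. Returns the accepted subject index,
    or None for rejection."""
    votes = c.argmax(axis=1)                   # each ANN's top subject
    tops = c.max(axis=1)
    gate_vote = int(g.argmax())
    # (1) unanimous ANN vote with total confidence >= 2.25
    if votes[0] == votes[1] == votes[2] and tops.sum() >= t_all:
        return int(votes[0])
    # (2) confident gating network vote
    if g.max() > t_gate:
        return gate_vote
    # (3) two ANNs agree with combined confidence > 1.50, gating network concurs
    for i in range(3):
        for j in range(i + 1, 3):
            if votes[i] == votes[j] == gate_vote and tops[i] + tops[j] > t_pair:
                return int(votes[i])
    return None  # reject
```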

The following criteria measure the recognition performance. The recognition rate (RR) is defined as

$$RR = \frac{\text{Number of correctly recognized subjects}}{\text{Total number of testing subjects}}$$

The rejection rate (RejR) is defined as

$$RejR = \frac{\text{Number of rejected subjects}}{\text{Total number of testing subjects}}$$

The misrecognition rate (MR) is defined as

$$MR = \frac{\text{Number of misrecognized subjects}}{\text{Total number of testing subjects}}$$

The reliability (RE) is defined as

$$RE = \frac{\text{Total number of testing subjects} - \text{Number of misrecognized subjects}}{\text{Total number of testing subjects}}$$
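In code, the four criteria are one-liners; note that RE = 1 − MR.

```python
def recognition_rates(correct, rejected, misrecognized, total):
    """RR, RejR, MR and RE as defined above, returned as fractions
    of the test set."""
    return (correct / total,                  # RR
            rejected / total,                 # RejR
            misrecognized / total,            # MR
            (total - misrecognized) / total)  # RE = 1 - MR
```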

Fig. 5. Convergence of the gating network.

Table 3
Recognition performance comparison of different classifiers trained by different features.

Classifier                                              RR (%)   RejR (%)   MR (%)   RE (%)
Classifier I: Gradient-Based Directional Feature set     81.00      5.00     14.00    86.00
Classifier II: Gradient-Based Wavelet Feature set        82.40      5.60     12.00    88.00
Classifier III: 2D-CWT feature set                       84.00      5.00     11.00    89.00
Ensemble classifier                                      85.40      8.60      5.00    95.00

Fig. 5 shows the convergence of the gating network trained by the genetic algorithm, given that the three ANNs have been well trained by the three feature sets. Table 3 lists the face recognition rate, rejection rate, misrecognition rate and reliability for the three individual classifiers trained by the three feature sets, and for the ensemble classifier system. From Table 3, it can be concluded that the three individual classifiers have similar recognition rates; however, the proposed ensemble classifier system shows the best recognition performance, and its introduction has significantly increased the system's reliability.

5.2. Face recognition conducted on video clips

To prepare the training face images, the frontal face of each subject was shot 30 times in front of the camera under different illuminations, at different distances and with different face complexions, in order to obtain enough face images to train the ANN classifiers offline. In the testing procedure, the subject moves in front of the camera, and the frontal face is detected using our proposed method [35]. If the frontal face of the same subject is shot multiple times in one video, the detected frontal face image is recognized again if and only if its overall verification confidence value is higher than that of the previous shot of the same subject; furthermore, if the new recognition confidence value is higher than that of the previous recognition for the same subject, the recognition result is updated.

In the video-based face detection experiments, 30 surveillance videos taken indoors and outdoors were used. Table 4 lists the number of subjects and the average number of face images detected per subject in the first five videos.
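A rough sketch of this keep-the-best-shot update policy follows; the data structure and names are our assumptions, not the authors' implementation.

```python
def update_recognition(state, subject_id, verify_conf, recog_conf, label):
    """Keep, per tracked subject, only the best-verified shot and its most
    confident recognition result, per the update policy described above."""
    best = state.get(subject_id)
    if best is None or verify_conf > best["verify"]:
        # a better-verified shot: re-recognize and store the new result
        state[subject_id] = {"verify": verify_conf, "recog": recog_conf, "label": label}
    elif recog_conf > best["recog"]:
        # same shot quality, but a more confident recognition: update result
        best.update(recog=recog_conf, label=label)
    return state
```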


Table 4
Number of faces detected in the first five videos.

Video       No. of testers   Average no. of face images detected per tester
Video I     10               5
Video II    15               4
Video III   18               4
Video IV    25               5
Video V     45               3

Fig. 6. Example images that fail to pass the verification modules.

Table 5
Overall recognition reliability (%) on 30 video clips.

Feature set             Recognition reliability based on 30 videos
Hybrid feature set A    86.50
Hybrid feature set B    85.10
Hybrid feature set C    88.88
Ensemble classifier     93.30

In Fig. 6, some example images pass the skin verification module but fail the symmetry module and/or the eye template module. Experiments demonstrated that the frontal face detection rate can reach 95% on low-quality video images. Compared with the benchmark face (object) detection method proposed by Viola and Jones [7], our proposed method can eliminate tilted face images and non-face objects in the video clips, so that only non-tilted frontal face images are sent for face recognition.

Comparative face recognition experiments were conducted on the three feature sets and the proposed ensemble classifier using the 30 video clips; the overall recognition reliability is shown in Table 5. Three hybrid feature sets were created by combining the three extracted feature sets with five color components (r, g, H, S, V) from the face areas. It can be concluded that the three ANNs trained by the three hybrid feature sets (hybrid feature set A: gradient-based directional feature set + (r, g, H, S, V); hybrid feature set B: gradient-based wavelet feature set + (r, g, H, S, V); hybrid feature set C: 2D-CWT feature set + (r, g, H, S, V)) have similar recognition performance. However, the ensemble classifier demonstrates an excellent recognition rate.

Fig. 7 shows the recognition rate (RR), misrecognition rate (MR) and rejection rate (RejR) with different thresholds of the ANN's confidence value, tested on video V; here the ANN classifier is trained by the hybrid 2D-CWT feature set + (r, g, H, S, V).

Fig. 7. Tradeoff among the recognition rate (RR), misrecognition rate (MR), and rejection rate (RejR) with different ANN confidence value thresholds.

All the face detection and recognition experiments were conducted on a PC with a 2.50 GHz processor and 4 GB of memory.

6. Conclusions

In this paper, a novel face detection and recognition system is presented. Three fast and efficient face detection verification modules are used to detect the facial area in video clips or still images. A novel 2D-CWT face feature extraction scheme is proposed for face recognition. The proposed ensemble classifier system congregates the outputs of three ANN classifiers, which leads to higher overall face recognition rate and reliability at the same time. In the future, hybrid feature extraction and new ensemble classifier system design will be a promising research direction. For example, other biometric features can be combined with face features to boost the system's recognition rate and decrease its error rate. In the ensemble classifier design, complementary information will be used to train the ensemble classifier system.

Acknowledgments

Part of the gating network research was conducted at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Canada. The authors wish to thank the professors and colleagues at CENPARMI for their help.

References

[1] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: A literature survey, ACM Computing Surveys 35 (4) (2003) 399–458.
[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projections, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (7) (1997) 711–720.
[3] L. Wiskott, J.M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775–779.
[4] E.P. Vivek, N. Sudha, Robust Hausdorff distance measure for face recognition, Pattern Recognition 40 (2) (2007) 431–442.
[5] P. Shih, C. Liu, Face detection using discriminating feature analysis and support vector machine, Pattern Recognition 39 (2) (2006) 260–276.
[6] H.A. Rowley, S. Baluja, T. Kanade, Neural network-based face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1) (1998) 23–28.
[7] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proc. Computer Vision and Pattern Recognition, 2001, pp. 511–518.


[8] H. Wu, Q. Chen, M. Yachida, Face detection from color images using a fuzzy pattern matching method, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (6) (1999) 557–563.
[9] C. Garcia, G. Tziritas, Face detection using quantized skin color regions, merging and wavelet packet analysis, IEEE Transactions on Multimedia 1 (3) (1999) 264–277.
[10] P. Kakumanu, S. Makrogiannis, N. Bourbakis, A survey of skin-color modeling and detection methods, Pattern Recognition 40 (3) (2007) 1106–1122.
[11] Z. Zheng, J. Yang, L. Yang, A robust method for eye features extraction on color image, Pattern Recognition Letters 26 (14) (2005) 2252–2261.
[12] J. Huang, H. Wechsler, Eye detection using optimal wavelet packets and radial basis functions, International Journal of Pattern Recognition and Artificial Intelligence 13 (7) (1999) 1009–1025.
[13] M. Lievin, F. Luthon, Nonlinear color space and spatiotemporal MRF for hierarchical segmentation of face features in video, IEEE Transactions on Image Processing 13 (1) (2004) 63–71.
[14] H. Wang, S.F. Chang, A highly efficient system for automatic face region detection in MPEG video, IEEE Transactions on Circuits and Systems for Video Technology 7 (4) (1997) 615–628.
[15] R.C. Verma, C. Schmid, K. Mikolajczyk, Face detection and tracking in a video by propagating detection probabilities, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1215–1228.
[16] H.J. Lee, Y.C. Tsao, Measurement of face recognizability for visual surveillance, in: Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2005, pp. 278–294.
[17] X. Liu, T. Cheng, Video-based face recognition using adaptive hidden Markov models, in: Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2003, pp. 340–345.
[18] U. Park, A.K. Jain, A. Ross, Face recognition in video: Adaptive fusion of multiple matchers, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[19] W.K. Pratt, Digital Image Processing, Wiley, New York, 1991.
[20] P. Zhang, T.D. Bui, C.Y. Suen, A cascade ensemble classifier system for reliable recognition of handwritten digits, Pattern Recognition 40 (2) (2007) 3415–3429.
[21] S. Mallat, A Wavelet Tour of Signal Processing, second ed., Academic Press, 1999.
[22] H. Tan, Y.J. Zhang, A novel weighted Hausdorff distance for face localization, Image and Vision Computing 24 (7) (2006) 656–662.
[23] S. Lawrence, C.L. Giles, A.C. Tsoi, A.D. Back, Face recognition: A convolutional neural-network approach, IEEE Transactions on Neural Networks 8 (1) (1997) 98–113.
[24] D.W. Hansen, Q. Ji, A survey of models for eyes and gaze, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (3) (2010) 478–500.
[25] R. Kasturi, D. Goldgof, P. Soundararajan, V. Manohar, J. Garofolo, R. Bowers, M. Boonstra, V. Korzhova, J. Zhang, Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 319–336.

[26] C. Liu, D. Dai, Face recognition using dual-tree complex wavelet features, IEEE Transactions on Image Processing 18 (11) (2009) 2593–2599.
[27] S.C. Brubaker, J.X. Wu, J. Sun, M.D. Mullin, J.M. Rehg, On the design of cascades of boosted ensembles for face detection, International Journal of Computer Vision 77 (1–3) (2008) 65–86.
[28] J.W. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, S.Z. Li, Ensemble-based discriminant learning with boosting for face recognition, IEEE Transactions on Neural Networks 17 (1) (2006) 166–178.
[29] E.L. Chen, P.C. Chung, C.L. Chen, H.M. Tsai, C.I. Chang, An automatic diagnostic system for CT liver image classification, IEEE Transactions on Biomedical Engineering 45 (6) (1998) 783–794.
[30] C.K. Chui, Wavelets: A Mathematical Tool for Signal Analysis, SIAM, Philadelphia, 1997.
[31] N.G. Kingsbury, Image processing with complex wavelets, Philosophical Transactions of the Royal Society of London A 357 (1999) 2543–2560.
[32] C.C. Liu, D.Q. Dai, Face recognition using dual-tree complex wavelet features, IEEE Transactions on Image Processing 18 (2009) 2593–2599.
[33] I.W. Selesnick, R.G. Baraniuk, N.G. Kingsbury, The dual-tree complex wavelet transform, IEEE Signal Processing Magazine (2005) 123–151.
[34] P. Zhang, T.D. Bui, C.Y. Suen, Recognition of similar objects using 2-D wavelet-multifractal feature extraction, in: Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada, 2002.
[35] P. Zhang, A video-based face detection and recognition system using cascade face verification modules, in: Proceedings of the 37th IEEE Applied Imagery Pattern Recognition Workshop, 2008, pp. 1–8.
[36] Face Recognition Database, University of Essex, UK, http://cswww.essex.ac.uk/mv/allfaces/index.html.

Ping Zhang (Ph.D.) is an Assistant Professor in the Department of Mathematics and Computer Science, Alcorn State University, USA. A senior IEEE member, he has worked in academia and industry for more than ten years. He has published more than 40 journal and international conference papers and book chapters, and has reviewed more than 60 journal papers for Pattern Recognition, Pattern Recognition Letters, and the IEEE Transactions on SMC, NN, etc. His research interests include pattern recognition, OCR, image processing and computer vision.

Xi Guo is a Computer Science graduate student. She has published several research papers. Her research interests include pattern recognition, database management design and intelligent web design.