A cascade face recognition system using hybrid feature extraction

Ping Zhang*, Xi Guo

Department of Mathematics and Computer Science, Alcorn State University, MS 39096-7500, USA

Digital Signal Processing 22 (2012) 987–993

Article history: Available online 11 July 2012

Keywords: 2D Complex Wavelet feature extraction; ensemble classifier system; face detection and recognition; video image processing; pattern recognition

Abstract

A novel cascade face recognition system using hybrid feature extraction is proposed. Three sets of face features are extracted. The merits of the Two-Dimensional Complex Wavelet Transform (2D-CWT) are analyzed; for face recognition feature extraction, it is shown that 2D-CWT compares favorably with the traditionally used 2D Gabor transform in terms of computational complexity and feature stability. The proposed recognition system congregates three Artificial Neural Network (ANN) classifiers and a gating network trained by the three feature sets. A computationally efficient fitness function for the genetic algorithms is proposed to evolve the best weights of the ensemble classifier. Experiments demonstrate that the overall recognition rate and reliability are significantly improved in both still face recognition and video-based face recognition. © 2012 Elsevier Inc. All rights reserved.

1. Introduction

Reliable and fast face recognition in both still image-based and video-based applications has been extensively researched for years due to its enormous commercial and law enforcement applications. Well-known methods reported in the literature include the eigenfaces and Fisherfaces methods [1,2], Elastic Graph Matching (EGM) [3], the robust Hausdorff distance measure for face localization and recognition [4,22], and a combination strategy of neural networks for face recognition [23].

Many video-based face detection and recognition systems operate in the still-to-video scenario, in which a subject's face images are taken from still images in order to train classifiers beforehand (off-line); during the recognition stage, the subject's face is detected and recognized online. How to quickly and accurately detect a non-tilted frontal face and then recognize it from a video clip is a challenging research topic.

An important prerequisite for video-based face recognition is face detection. In order to capture frontal facial images timely and accurately, many face detection methods have been proposed. For example, Shih and Liu [5] proposed a face detection method applying discriminating feature analysis (DFA) combined with a support vector machine (SVM). Rowley, Baluja and Kanade [6] presented a neural network-based upright frontal face detection system, in which a retinally connected neural network examines small windows of an image and decides whether each window contains a face. A fast face detection method was proposed by Viola and Jones [7]. Face color information is an important feature in face detection: Wu, Chen and Yachida [8] described a new

* Corresponding author. E-mail addresses: [email protected] (P. Zhang), [email protected] (X. Guo).


method to detect faces in color images based on fuzzy theory. In Ref. [9], the authors used quantized skin color regions for face detection, and a survey of skin-color modeling and detection methods can be found in Ref. [10]. In addition, the eye is another important feature for face detection and recognition: a robust method for eye feature extraction in color images was reported in Ref. [11], and eye detection using optimal wavelet packets and radial basis functions was introduced in Ref. [12]. Some face detection applications on video clips can be found in Refs. [13,14]. A new and comprehensive survey on eye detection in facial images was presented in Ref. [24].

In recent years, research on video-based face recognition has remained of great interest to scientists and researchers worldwide, due to the availability of cheap, high-resolution cameras and the simplification of the interface between computers and cameras. For instance, face detection and tracking in a video by propagating detection probabilities was proposed in Ref. [15]. Lee and Tsao [16] proposed a face recognizability measurement for visual surveillance. Video-based face recognition using adaptive hidden Markov models was presented in Ref. [17], and an adaptive fusion of multiple matchers for face recognition was introduced in Ref. [18]. A framework for evaluating object detection and tracking in video, specifically for face, text and vehicle objects, was elaborated in Ref. [25]. Face recognition using dual-tree complex wavelet features was introduced in [26].

Ensemble classifiers have been used in many pattern recognition systems in order to increase recognition accuracy and reliability while suppressing the error rate and rejection rate. In Ref. [27], the authors proposed a cascade of boosted ensembles for face detection. An ensemble-based discriminant learning scheme with boosting for face recognition was presented in [28].


In this paper, a new cascade face recognition scheme is proposed. The paper is divided into four parts. In the first part, the flowchart of video-based face detection and recognition is presented. In the second part, three sets of facial features are extracted: the merits of the Two-Dimensional Complex Wavelet Transform (2D-CWT) are analyzed, and a detailed scheme for extracting facial features using 2D-CWT is proposed. In the third part, an ensemble classifier scheme is used to congregate three individual Artificial Neural Network (ANN) classifiers trained by the three feature sets; a computationally efficient fitness function for the genetic algorithms is presented and successfully used to evolve the best weights for the proposed ensemble classifier. In the last part, experiments demonstrate that the proposed face recognition system yields a significant improvement in terms of recognition rate and reliability.

2. Face detection and verification modules

In order to extract non-tilted frontal facial images from a video clip, three face verification modules are applied in series. The flowchart of the proposed video-based cascade face detection and ensemble classifier system for face recognition is shown in Fig. 1. First, the spectra of face and non-face areas are analyzed and used for face skin verification. Second, a fast face symmetry verification algorithm is applied. Finally, three eye templates are used to further verify the non-tilted frontal face. This has two advantages: (1) the computer recognizes only non-tilted facial images in the video, and (2) face recognition performance is increased accordingly. The detailed algorithms can be found in Ref. [35].

Fig. 1. Face detection and recognition schematic flowchart.

3. Feature extraction for face recognition

Feature extraction is one of the most important steps in designing a pattern recognition system. It requires that the extracted features have small variation within a class and strong discriminating ability among classes. In this section, three sets of features for face recognition are extracted.

3.1. Complex Wavelet Transform for feature extraction

The Complex Wavelet Transform (CWT) was developed to keep the attractive attributes of the Discrete Wavelet Transform (DWT), such as the approximate half-sample delay property, perfect reconstruction (orthogonal or biorthogonal), finite support (FIR filters), vanishing moments/good stopband, and linear-phase filters [30,34]. Furthermore, the CWT adds some new merits [31,33]: approximate shift invariance, good directional selectivity for 2D images, efficient order-N computation and limited redundancy. The computational complexity of the CWT is only twice that of the DWT for 1D signals (2^m times for m-D signals). These properties have made the CWT successfully applicable to image processing. The dual-tree CWT has proved a suitable solution for numerous applications, including pattern feature extraction and recognition [32]. The 2D Complex Wavelet Transform (2D-CWT) provides true directional selectivity and insensitivity to pixel shifts. The six subband images of the 2D-CWT can be represented by the following wavelet core functions:

$$
\begin{aligned}
\psi_1(x, y) &= \psi(x)\psi(y) & \psi_2(x, y) &= \psi(x)\overline{\psi(y)} \\
\psi_3(x, y) &= \phi(x)\psi(y) & \psi_4(x, y) &= \psi(x)\phi(y) \\
\psi_5(x, y) &= \phi(x)\overline{\psi(y)} & \psi_6(x, y) &= \psi(x)\overline{\phi(y)}
\end{aligned} \qquad (1)
$$

where $\phi(x) = \phi_h(x) + j\phi_g(x)$ and $\psi(x) = \psi_h(x) + j\psi_g(x)$ are both complex functions, and the $\psi_i(x, y)$ $(i = 1, \ldots, 6)$ yield six subbands of complex coefficients at each level, oriented at angles of ±75°, ±45° and ±15°.

The 2D-CWT can be implemented using a dual-tree structure. Each tree has a structure similar to the 2D-DWT, with two decomposition operations at each level (row decomposition and column decomposition), except that different filters are applied to achieve perfect reconstruction, and the outputs of the subband images are combined into complex wavelet coefficients. Fig. 2 shows the 2D-CWT feature extraction scheme. The dual-tree complex wavelet decomposition consists of two trees, Tree A and Tree B, with the same structure. In order to realize perfect reconstruction from the decomposed subimages, a lowpass filter and a highpass filter at the first level need to be specially designed; they are denoted h00, g00 for Tree A and h10, g10 for Tree B, and are called pre-filters. The complex filters at the higher levels are h01 and g01 for Tree A, and h11 and g11 for Tree B. For example, a facial image of size N × N is decomposed into four subband images LL, LH, HL and HH at the first level of each tree, each of size N/2 × N/2. At the higher levels, the decomposition is applied to the LL subband image of the previous level. The complex wavelet coefficients are then used for feature extraction.

Fig. 2. The schematic diagram of 2D-CWT for feature extraction.
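For illustration, the 4-level decomposition and the subband-magnitude computation described here can be sketched with the open-source dtcwt Python package. This is a stand-in for the authors' own filter design, and the 64 × 64 input size is taken from the experiments in Section 5.1; only the magnitude portion of the full feature scheme is shown.

```python
import numpy as np
import dtcwt  # open-source dual-tree CWT package, assumed here in place of the authors' filters

def cwt_subband_magnitudes(image):
    """Decompose a 64x64 face image to 4 levels and return the 4th-level
    subband magnitudes |c| = sqrt(u^2 + v^2), which are insensitive to
    small pixel shifts."""
    transform = dtcwt.Transform2d()
    pyramid = transform.forward(image.astype(float), nlevels=4)
    # pyramid.highpasses[3]: complex coefficients of the 4th level,
    # shape (4, 4, 6) -- one 4x4 array per oriented subband
    # (roughly +/-15, +/-45, +/-75 degrees)
    level4 = pyramid.highpasses[3]
    return np.abs(level4).ravel()
```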


3.2. Comparison between 2D-CWT and 2D Gabor transform

The Gabor transform has been widely used in image processing and pattern analysis. A 2D Gabor function is a 2D Gaussian window multiplied by a complex sinusoid:

$$f(x, y) = e^{-\left((x/\delta_x)^2 + (y/\delta_y)^2\right)}\, e^{-j(\omega_x x + \omega_y y)} \qquad (2)$$
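For reference, Eq. (2) can be sampled directly on a grid; the kernel size and parameters below are illustrative only, not values from the paper.

```python
import numpy as np

def gabor_kernel(size, delta_x, delta_y, omega_x, omega_y):
    """Sample the 2D Gabor function of Eq. (2): a Gaussian window
    multiplied by a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    window = np.exp(-((x / delta_x) ** 2 + (y / delta_y) ** 2))
    carrier = np.exp(-1j * (omega_x * x + omega_y * y))
    return window * carrier
```

A full Gabor feature set requires one such kernel, and one image convolution, per scale and orientation; this per-orientation cost is what the fast dual-tree algorithm of the 2D-CWT avoids.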

In the Two-Dimensional Complex Wavelet Transform (2D-CWT), the basis functions can be set to closely approximate complex Gabor-like functions, which exhibit strong spatial locality and orientation selectivity and are optimally localized in the space and frequency domains. The 2D-CWT basis functions therefore have the form

$$h(x, y) = a(x, y)\, e^{-j(\omega_x x + \omega_y y)} \qquad (3)$$

where a(x, y) is a slowly varying Gaussian-like real window function centered at (0, 0), and (ω_x, ω_y) is the center frequency of the corresponding subband. The complex coefficient of the i-th subband at the l-th level can thus be written as

$$c_l^i = u_l^i + j v_l^i \qquad (4)$$

and the magnitude of each coefficient of each subband is calculated as

$$C_l^i = \sqrt{(u_l^i)^2 + (v_l^i)^2} \qquad (5)$$

Since a(x, y) is slowly varying, this magnitude is insensitive to small image shifts, which is a valuable property for feature extraction. The directional properties of the 2D-CWT arise from the fact that h(x, y) has constant phase along lines on which ω_x x + ω_y y is constant. For 2D image processing and feature extraction, the computation of a typical Gabor transform is expensive, whereas the 2D-CWT has a dual-tree implementation in which each tree admits a fast algorithm. Furthermore, from the biological vision point of view, oriented wavelet and wavelet-like transforms are natural for image processing and pattern recognition applications [29].

3.3. Gradient-based directional feature

In order to keep line-like features, the gradient-based directional feature is used for face recognition. The main advantage of this kind of feature is that the algorithm is simple and easily implemented [20]. Denote the input image by I_z; the templates of the Sobel operators S_x and S_y [19] are listed in Tables 1 and 2.

Table 1
Template of Sobel operator S_x.

−1  −2  −1
 0   0   0
 1   2   1

Table 2
Template of Sobel operator S_y.

 1   0  −1
 2   0  −2
 1   0  −1

The X-gradient of the frontal face image is calculated by

$$I_x = I_z * S_x \qquad (6)$$

and the Y-gradient of the frontal face image is calculated by

$$I_y = I_z * S_y \qquad (7)$$

where * denotes 2D convolution. The gradient magnitude and phase are then obtained by

$$r(i, j) = \sqrt{I_x^2(i, j) + I_y^2(i, j)}, \qquad \theta(i, j) = \operatorname{atan2}(I_y, I_x) \qquad (8)$$

Then the gradient direction of each pixel of the convolved image with nonzero gradient magnitude is counted as a directional feature. In order to generate a fixed number of features, each gradient direction is quantized into one of eight directions at π/4 intervals. Each normalized gradient image is then divided into 4 × 4 = 16 sub-images, and the count for each direction in each sub-image is taken as a feature. In total, the number of features is 4 × 4 × 8 = 128.
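A minimal NumPy/SciPy sketch of this 128-dimensional feature follows, using the Sobel templates of Tables 1 and 2 and a uniform π/4 binning; the exact bin edges and the treatment of zero-gradient pixels are our assumptions, since the paper does not specify them.

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel templates from Tables 1 and 2
SX = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
SY = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])

def directional_features(image, grid=4, n_dirs=8):
    """128-dim gradient-direction histogram: grid x grid sub-images,
    n_dirs direction bins per sub-image (4 * 4 * 8 = 128)."""
    ix = convolve2d(image, SX, mode="same")   # Eq. (6)
    iy = convolve2d(image, SY, mode="same")   # Eq. (7)
    r = np.hypot(ix, iy)                      # Eq. (8), gradient magnitude
    theta = np.arctan2(iy, ix)                # Eq. (8), gradient phase
    # Quantize directions into 8 bins of width pi/4
    bins = np.floor((theta + np.pi) / (np.pi / 4)).astype(int) % n_dirs
    h, w = image.shape[0] // grid, image.shape[1] // grid
    feats = np.zeros((grid, grid, n_dirs))
    for i in range(grid):
        for j in range(grid):
            cell_bins = bins[i*h:(i+1)*h, j*w:(j+1)*w]
            cell_mag = r[i*h:(i+1)*h, j*w:(j+1)*w]
            for d in range(n_dirs):
                # count pixels with nonzero gradient falling in direction d
                feats[i, j, d] = np.count_nonzero((cell_bins == d) & (cell_mag > 0))
    return feats.ravel()
```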

3.4. Gradient-based wavelet feature

We use the Kirsch nonlinear edge enhancement algorithm to extract statistical features from the frontal face image and apply a wavelet transform to these statistical features to form the feature vector. The feature extraction proceeds as follows. First, the Kirsch algorithm is applied to an N × N facial image to extract horizontal, vertical, right-diagonal and left-diagonal directional feature images, as well as a global feature image. The eight neighbors of pixel (i, j) are defined as in Fig. 3:

A0   A1      A2
A7   (i, j)  A3
A6   A5      A4

Fig. 3. Definition of the eight neighbors of pixel (i, j).

Kirsch defined the nonlinear edge enhancement as

$$G(i, j) = \max\Big\{1,\ \max_{k=0,\ldots,7} |5S_k - 3T_k|\Big\} \qquad (9)$$

where

$$S_k = A_k + A_{k+1} + A_{k+2}, \qquad T_k = A_{k+3} + A_{k+4} + A_{k+5} + A_{k+6} + A_{k+7} \qquad (10)$$

with all subscripts taken modulo 8.

In order to extract four-directional features from the horizontal (H), vertical (V), right-diagonal (R) and left-diagonal (L) directions, the following templates are used:

$$
\begin{aligned}
G(i, j)_H &= \max\{|5S_0 - 3T_0|,\ |5S_4 - 3T_4|\} \\
G(i, j)_V &= \max\{|5S_2 - 3T_2|,\ |5S_6 - 3T_6|\} \\
G(i, j)_R &= \max\{|5S_1 - 3T_1|,\ |5S_5 - 3T_5|\} \\
G(i, j)_L &= \max\{|5S_3 - 3T_3|,\ |5S_7 - 3T_7|\}
\end{aligned} \qquad (11)
$$

The 2D Daubechies wavelet transform [21] is used to filter out the high-frequency components of each directional feature image and of the global feature image, converting each feature matrix into a 4 × 4 matrix. In total, 16 × 5 = 80 features are extracted from each image.
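The pipeline of Eqs. (9)–(11) plus the 4-level wavelet reduction can be sketched as follows. PyWavelets and the "db2" filter are stand-in assumptions, since the paper does not specify the Daubechies order; the periodization mode is used so that a 64 × 64 image reduces to exactly 4 × 4.

```python
import numpy as np
import pywt  # PyWavelets, assumed stand-in for the authors' Daubechies implementation

def kirsch_directional_images(img):
    """Eqs. (9)-(11): global and four-directional Kirsch edge images."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    H, W = img.shape
    # Eight neighbors A0..A7 of each pixel, clockwise as in Fig. 3
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    A = [padded[1+di:1+di+H, 1+dj:1+dj+W] for di, dj in offsets]
    S = [A[k] + A[(k+1) % 8] + A[(k+2) % 8] for k in range(8)]        # Eq. (10)
    T = [sum(A[(k+m) % 8] for m in range(3, 8)) for k in range(8)]    # Eq. (10)
    E = [np.abs(5*S[k] - 3*T[k]) for k in range(8)]
    g_all = np.maximum(1, np.max(E, axis=0))                          # Eq. (9)
    g_h = np.maximum(E[0], E[4])                                      # Eq. (11)
    g_v = np.maximum(E[2], E[6])
    g_r = np.maximum(E[1], E[5])
    g_l = np.maximum(E[3], E[7])
    return [g_all, g_h, g_v, g_r, g_l]

def gradient_wavelet_features(img):
    """80-dim feature: 4x4 low-frequency approximation of each of the
    five Kirsch images (16 * 5 = 80)."""
    feats = []
    for g in kirsch_directional_images(img):
        # 4-level Daubechies decomposition; keep only the approximation
        approx = pywt.wavedec2(g, "db2", mode="periodization", level=4)[0]
        feats.append(approx.ravel())
    return np.concatenate(feats)
```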

4. Ensemble classifier

4.1. Ensemble classification scheme

A novel classifier combination scheme is proposed in order to achieve the lowest error rate while pursuing the highest recognition rate for face recognition. The schematic diagram is shown in Fig. 4. The output confidence values of the three ANNs are weighted by $w_{0,0}$–$w_{0,R-1}$ for ANN1, $w_{1,0}$–$w_{1,R-1}$ for ANN2, and $w_{2,0}$–$w_{2,R-1}$ for ANN3 (note: R is the number of testers to be recognized; $w_{0,0}$–$w_{0,R-1}$ are the weights of the confidence values $c_{0,0}$–$c_{0,R-1}$ of ANN1, and so on). A gating network is used to congregate the weighted confidence values, and a genetic algorithm is used to evolve the optimal weights for the gating network from the confidence values of the three ANNs.

Fig. 4. An ensemble classifier consisting of three ANNs and one gating network.

Suppose the outputs of the three ANNs are represented as $\{c_{0,0}, c_{0,1}, \ldots, c_{0,R-1}\}$, $\{c_{1,0}, c_{1,1}, \ldots, c_{1,R-1}\}$ and $\{c_{2,0}, c_{2,1}, \ldots, c_{2,R-1}\}$, respectively. The weighted outputs of the ANNs' confidence values are calculated as

$$X_i = W_i^T C_i \qquad (12)$$

where $W_i = [w_{i,0}, w_{i,1}, \ldots, w_{i,R-1}]$ and $C_i = [c_{i,0}, c_{i,1}, \ldots, c_{i,R-1}]$, $i = 0, 1, 2$, for the three ANNs; the weighting is applied component-wise, so each $X_i$ is an R-dimensional vector. The three weighted confidence vectors are summed into a vector Z:

$$Z = \sum_{i=0}^{2} X_i, \qquad Z = [z_0, z_1, \ldots, z_{R-1}] \qquad (13)$$

In order to generalize the output, the j-th output $g_j$ of the gating network is the "softmax" function of $z_j$:

$$g_j = \frac{e^{z_j}}{\sum_k e^{z_k}}, \qquad G = [g_0, g_1, \ldots, g_{R-1}]^T \qquad (14)$$

where G is the output of the gating network. Our goal is to pursue the lowest misrecognition rate and, at the same time, the highest recognition performance. We create a vector $O_{target}$ with R elements, where R is the number of testers in the video; the element at the corresponding label is set to 1.0, while all others are set to 0.0. A fitness function f is chosen to minimize the difference between the output G and the corresponding training sample vector $O_{target}$:

$$f = |G - O_{target}|^2 \qquad (15)$$
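A minimal NumPy sketch of Eqs. (12)–(15) follows, assuming the component-wise reading of the weighting noted above; the function names are ours, not the authors'.

```python
import numpy as np

def gating_output(confidences, weights):
    """Eqs. (12)-(14): component-wise weighting, summation, and softmax.
    confidences, weights: arrays of shape (3, R), one row per ANN."""
    X = weights * confidences          # Eq. (12), applied component-wise
    z = X.sum(axis=0)                  # Eq. (13): Z = X0 + X1 + X2
    g = np.exp(z - z.max())            # Eq. (14), shifted for numerical stability
    return g / g.sum()

def fitness(weights, confidences, target):
    """Eq. (15): squared distance between gating output and one-hot target."""
    G = gating_output(confidences, weights)
    return float(np.sum((G - target) ** 2))
```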

By minimizing Eq. (15) through genetic evolution, the weights tend toward the optimum. The genetic algorithms are used to train the gating network, namely to seek the optimal weights $W_i$ in Eq. (12) that give the best recognition rate. In Eq. (13), Z is the sum of the three weighted ANN confidence vectors, so the three classifiers' contributions may differ according to their weights.

4.2. Genetic algorithms for training the gating network

A GA applies the evolution-based optimization operators of selection, mutation and crossover to a population in order to compute an optimal solution, and the weight-selection problem in the gating network is well suited to evolution by GAs. In ANN training, the most difficult problem is finding a reasonable fitness function for a large set of training samples. Ideally, the recognition rate itself could be used as the fitness criterion for training a classifier; however, using the recognition rate this way is infeasible for some pattern recognition problems, because it requires an enormous amount of computation in each generation of learning. In this paper, we use GAs to train the gating network with Eq. (15) as the fitness function, so the GA pursues the smallest difference between the gating network's outputs and the target label vector $O_{target}$. As the proposed method includes three feature sets, three ANN classifiers and one gating network, the computational complexity is approximately O(N × N).
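The GA itself is not specified in detail in the paper. The following illustrative loop, built on the fitness() sketch above, shows one plausible scheme; the population size, arithmetic crossover and Gaussian mutation are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_weights(samples, R, pop_size=50, generations=200, sigma=0.1):
    """Evolve the 3 x R gating weights by minimizing the mean of Eq. (15)
    over (confidences, target) training pairs."""
    def mean_fitness(w):
        return np.mean([fitness(w, c, t) for c, t in samples])

    population = rng.uniform(0.0, 1.0, size=(pop_size, 3, R))
    for _ in range(generations):
        scores = np.array([mean_fitness(w) for w in population])
        parents = population[np.argsort(scores)[:pop_size // 2]]   # selection
        mates = parents[rng.permutation(len(parents))]
        children = 0.5 * (parents + mates)                         # arithmetic crossover
        children += sigma * rng.standard_normal(children.shape)    # mutation
        population = np.concatenate([parents, children])
    scores = np.array([mean_fitness(w) for w in population])
    return population[int(np.argmin(scores))]                      # best weight matrix
```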

5. Face recognition results

In order to test the proposed face feature algorithms and recognition system, two types of experiments were conducted: one on a standard face database (still face images), and one on video-based face recognition.


5.1. Recognition tested on a face database

In the first experiment, face recognition is conducted on a benchmark face database, the Face Database of the University of Essex [36]. The database includes 395 individuals (male and female) of different genders, races and ages; some individuals wear eyeglasses and some have beards. Fifty subjects were chosen for the experiment. For each individual, the first 10 face images in the database are used to create training samples, and the remaining 10 face images are kept for testing. In order to obtain more training samples, simple x- and y-coordinate shifts and small rotations are applied to the 10 training face images to create 100 simulated training samples for each subject.

Three-layer Artificial Neural Networks (ANNs) with Back-Propagation (BP) are employed as the three classifiers in the proposed ensemble system shown in Fig. 4. For each ANN classifier, the number of nodes in the input layer is set to the number of features, the number of nodes in the output layer is equal to the number of subjects to be recognized, and the number of nodes in the hidden layer is set to 5–10, depending on the number of training samples, the number of features used and the number of ANN outputs.

A recognition result is accepted if: (1) the three ANN classifiers vote for the same subject and the sum of their confidence values is equal to or larger than 2.25; or (2) the gating network votes for a subject with a confidence value larger than 0.65; or (3) the sum of the confidence values of any two ANNs is larger than 1.50, both vote for the same subject, and the gating network votes for that same subject. Otherwise, the subject is rejected.

A robust eye feature extraction method [11], combined with face spectrum and face symmetry analysis [25], is applied to extract the two eyes and the mouth area of the face image. The head area is then extracted and scaled to a size of 64 × 64. In the 2D-CWT face feature extraction, the 64 × 64 face image is decomposed to the 4th level, so each sub-image has a size of 4 × 4. We keep only the amplitude coefficients of the three high-frequency components of each tree, and both amplitude and phase information of the low-frequency component of each tree. The number of features is 4 × 4 (per subband image) × 3 (high-frequency subband images per tree) × 2 (trees) + 4 × 4 (per subband image) × 2 (trees) × 2 (parts: real and imaginary) = 160. Since the real and imaginary coefficients of each LL subband image are extracted as features, the phase information, which provides good directional selectivity, is preserved. As analyzed before, we extract 128 features for the Gradient-Based Directional Feature set and 80 features for the Gradient-Based Wavelet Feature set.
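The three-condition acceptance rule above translates directly into code. The thresholds 2.25, 0.65 and 1.50 are the paper's; the function and variable names are ours.

```python
import numpy as np

def decide(c, g, t_all=2.25, t_gate=0.65, t_pair=1.50):
    """Acceptance rule of Section 5.1. c: (3, R) ANN confidences;
    g: (R,) gating network output. Returns the accepted subject index,
    or None for rejection."""
    votes = c.argmax(axis=1)                   # each ANN's top subject
    tops = c.max(axis=1)
    gate_vote = int(g.argmax())
    # (1) unanimous ANN vote with total confidence >= 2.25
    if votes[0] == votes[1] == votes[2] and tops.sum() >= t_all:
        return int(votes[0])
    # (2) confident gating network vote
    if g.max() > t_gate:
        return gate_vote
    # (3) two ANNs agree with combined confidence > 1.50, gating network concurs
    for i in range(3):
        for j in range(i + 1, 3):
            if votes[i] == votes[j] == gate_vote and tops[i] + tops[j] > t_pair:
                return int(votes[i])
    return None  # reject
```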

The following criteria measure the recognition performance. The recognition rate (RR) is defined as

$$RR = \frac{\text{Number of correctly recognized subjects}}{\text{Total number of testing subjects}}$$

The rejection rate (RejR) is defined as

$$RejR = \frac{\text{Number of rejected subjects}}{\text{Total number of testing subjects}}$$

The misrecognition rate (MR) is defined as

$$MR = \frac{\text{Number of misrecognized subjects}}{\text{Total number of testing subjects}}$$

The reliability (RE) is defined as

$$RE = \frac{\text{Total number of testing subjects} - \text{Number of misrecognized subjects}}{\text{Total number of testing subjects}}$$
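In code, the four criteria are one-liners; note that RE = 1 − MR.

```python
def recognition_rates(correct, rejected, misrecognized, total):
    """RR, RejR, MR and RE as defined above, returned as fractions
    of the test set."""
    return (correct / total,                  # RR
            rejected / total,                 # RejR
            misrecognized / total,            # MR
            (total - misrecognized) / total)  # RE = 1 - MR
```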

Fig. 5. Convergence of the gating network.

Table 3
Recognition performance comparison of different classifiers trained by different features.

Classifier                                              RR (%)   RejR (%)   MR (%)   RE (%)
Classifier I: Gradient-Based Directional Feature set     81.00      5.00     14.00    86.00
Classifier II: Gradient-Based Wavelet Feature set        82.40      5.60     12.00    88.00
Classifier III: 2D-CWT feature set                       84.00      5.00     11.00    89.00
Ensemble classifier                                      85.40      8.60      5.00    95.00

Fig. 5 shows the convergence of the gating network trained by the genetic algorithm, given that the three ANNs have been well trained by the three feature sets. Table 3 lists the face recognition rate, rejection rate, misrecognition rate and reliability for the three individual classifiers trained by the three feature sets, and for the ensemble classifier system. From Table 3, it can be concluded that the three individual classifiers have similar recognition rates; however, the proposed ensemble classifier system shows the best recognition performance, and its introduction has significantly increased the system's reliability.

5.2. Face recognition conducted on video clips

To prepare the training face images, the frontal face of each subject was shot 30 times in front of the camera under different illuminations, at different distances and with different face complexions, in order to obtain enough face images to train the ANN classifiers offline. In the testing procedure, the subject moves in front of the camera, and the frontal face is detected using our proposed method [35]. If the frontal face of the same subject is shot multiple times in one video, the detected frontal face image is recognized again if and only if its overall verification confidence value is higher than that of the previous shot of the same subject; furthermore, if the new recognition confidence value is higher than that of the previous recognition for the same subject, the recognition result is updated.

In the video-based face detection experiments, 30 surveillance videos taken indoors and outdoors were used. Table 4 lists the number of subjects and the average number of face images detected per subject in the first five videos.
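A rough sketch of this keep-the-best-shot update policy follows; the data structure and names are our assumptions, not the authors' implementation.

```python
def update_recognition(state, subject_id, verify_conf, recog_conf, label):
    """Keep, per tracked subject, only the best-verified shot and its most
    confident recognition result, per the update policy described above."""
    best = state.get(subject_id)
    if best is None or verify_conf > best["verify"]:
        # a better-verified shot: re-recognize and store the new result
        state[subject_id] = {"verify": verify_conf, "recog": recog_conf, "label": label}
    elif recog_conf > best["recog"]:
        # same shot quality, but a more confident recognition: update result
        best.update(recog=recog_conf, label=label)
    return state
```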


Table 4
Number of faces detected in the first five videos.

Video       No. of testers   Average no. of face images detected per tester
Video I     10               5
Video II    15               4
Video III   18               4
Video IV    25               5
Video V     45               3

Fig. 6. Example images that fail to pass the verification modules.

Table 5
Overall recognition reliability (%) on 30 video clips.

Feature set             Recognition reliability based on 30 videos
Hybrid feature set A    86.50
Hybrid feature set B    85.10
Hybrid feature set C    88.88
Ensemble classifier     93.30

In Fig. 6, some example images pass the skin verification module but fail the symmetry module and/or the eye template module. Experiments demonstrated that the frontal face detection rate can reach 95% on low-quality video images. Compared with the benchmark face (object) detection method proposed by Viola and Jones [7], our proposed method can eliminate tilted face images and non-face objects in the video clips, so that only non-tilted frontal face images are sent for face recognition.

Comparative face recognition experiments were conducted on the three feature sets and the proposed ensemble classifier using the 30 video clips; the overall recognition reliability is shown in Table 5. Three hybrid feature sets were created by combining the three extracted feature sets with five color components (r, g, H, S, V) from the face areas. It can be concluded that the three ANNs trained by the three hybrid feature sets (hybrid feature set A: gradient-based directional feature set + (r, g, H, S, V); hybrid feature set B: gradient-based wavelet feature set + (r, g, H, S, V); hybrid feature set C: 2D-CWT feature set + (r, g, H, S, V)) have similar recognition performance. However, the ensemble classifier demonstrates an excellent recognition rate.

Fig. 7 shows the recognition rate (RR), misrecognition rate (MR) and rejection rate (RejR) with different thresholds of the ANN's confidence value, tested on video V; here the ANN classifier is trained by the hybrid 2D-CWT feature set + (r, g, H, S, V).

Fig. 7. Tradeoff among the recognition rate (RR), misrecognition rate (MR), and rejection rate (RejR) with different ANN confidence value thresholds.

All the face detection and recognition experiments were conducted on a PC with a 2.50 GHz processor and 4 GB of memory.

6. Conclusions

In this paper, a novel face detection and recognition system is presented. Three fast and efficient face detection verification modules are used to detect the facial area in video clips or still images. A novel 2D-CWT face feature extraction scheme is proposed for face recognition. The proposed ensemble classifier system congregates the outputs of three ANN classifiers, which leads to higher overall face recognition rate and reliability at the same time. In the future, hybrid feature extraction and new ensemble classifier system design will be a promising research direction. For example, other biometric features can be combined with face features to boost the system's recognition rate and decrease its error rate. In the ensemble classifier design, complementary information will be used to train the ensemble classifier system.

Acknowledgments

Part of the gating network research was conducted at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Canada. The authors wish to thank the professors and colleagues at CENPARMI for their help.

References

[1] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: A literature survey, ACM Computing Surveys 35 (4) (2003) 399–458.
[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projections, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (7) (1997) 711–720.
[3] L. Wiskott, J.M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775–779.
[4] E.P. Vivek, N. Sudha, Robust Hausdorff distance measure for face recognition, Pattern Recognition 40 (2) (2007) 431–442.
[5] P. Shih, C. Liu, Face detection using discriminating feature analysis and support vector machine, Pattern Recognition 39 (2) (2006) 260–276.
[6] H.A. Rowley, S. Baluja, T. Kanade, Neural network-based face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1) (1998) 23–28.
[7] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proc. Computer Vision and Pattern Recognition, 2001, pp. 511–518.


[8] H. Wu, Q. Chen, M. Yachida, Face detection from color images using a fuzzy pattern matching method, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (6) (1999) 557–563.
[9] C. Garcia, G. Tziritas, Face detection using quantized skin color regions, merging and wavelet packet analysis, IEEE Transactions on Multimedia 1 (3) (1999) 264–277.
[10] P. Kakumanu, S. Makrogiannis, N. Bourbakis, A survey of skin-color modeling and detection methods, Pattern Recognition 40 (3) (2007) 1106–1122.
[11] Z. Zheng, J. Yang, L. Yang, A robust method for eye features extraction on color image, Pattern Recognition Letters 26 (14) (2005) 2252–2261.
[12] J. Huang, H. Wechsler, Eye detection using optimal wavelet packets and radial basis functions, International Journal of Pattern Recognition and Artificial Intelligence 13 (7) (1999) 1009–1025.
[13] M. Lievin, F. Luthon, Nonlinear color space and spatiotemporal MRF for hierarchical segmentation of face features in video, IEEE Transactions on Image Processing 13 (1) (2004) 63–71.
[14] H. Wang, S.F. Chang, A highly efficient system for automatic face region detection in MPEG video, IEEE Transactions on Circuits and Systems for Video Technology 7 (4) (1997) 615–628.
[15] R.C. Verma, C. Schmid, K. Mikolajczyk, Face detection and tracking in a video by propagating detection probabilities, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (10) (2003) 1215–1228.
[16] H.J. Lee, Y.C. Tsao, Measurement of face recognizability for visual surveillance, in: Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2005, pp. 278–294.
[17] X. Liu, T. Cheng, Video-based face recognition using adaptive hidden Markov models, in: Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2003, pp. 340–345.
[18] U. Park, A.K. Jain, A. Ross, Face recognition in video: Adaptive fusion of multiple matchers, in: Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[19] W.K. Pratt, Digital Image Processing, Wiley, New York, 1991.
[20] P. Zhang, T.D. Bui, C.Y. Suen, A cascade ensemble classifier system for reliable recognition of handwritten digits, Pattern Recognition 40 (2) (2007) 3415–3429.
[21] S. Mallat, A Wavelet Tour of Signal Processing, second ed., Academic Press, 1999.
[22] H. Tan, Y.J. Zhang, A novel weighted Hausdorff distance for face localization, Image and Vision Computing 24 (7) (2006) 656–662.
[23] S. Lawrence, C.L. Giles, A.C. Tsoi, A.D. Back, Face recognition: A convolutional neural-network approach, IEEE Transactions on Neural Networks 8 (1) (1997) 98–113.
[24] D.W. Hansen, Q. Ji, A survey of models for eyes and gaze, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (3) (2010) 478–500.
[25] R. Kasturi, D. Goldgof, P. Soundararajan, V. Manohar, J. Garofolo, R. Bowers, M. Boonstra, V. Korzhova, J. Zhang, Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 319–336.

[26] C. Liu, D. Dai, Face recognition using dual-tree complex wavelet features, IEEE Transactions on Image Processing 18 (11) (2009) 2593–2599.
[27] S.C. Brubaker, J.X. Wu, J. Sun, M.D. Mullin, J.M. Rehg, On the design of cascades of boosted ensembles for face detection, International Journal of Computer Vision 77 (1–3) (2008) 65–86.
[28] J.W. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, S.Z. Li, Ensemble-based discriminant learning with boosting for face recognition, IEEE Transactions on Neural Networks 17 (1) (2006) 166–178.
[29] E.L. Chen, P.C. Chung, C.L. Chen, H.M. Tsai, C.I. Chang, An automatic diagnostic system for CT liver image classification, IEEE Transactions on Biomedical Engineering 45 (6) (1998) 783–794.
[30] C.K. Chui, Wavelets: A Mathematical Tool for Signal Analysis, SIAM, Philadelphia, 1997.
[31] N.G. Kingsbury, Image processing with complex wavelets, Philosophical Transactions of the Royal Society of London A 357 (1999) 2543–2560.
[32] C.C. Liu, D.Q. Dai, Face recognition using dual-tree complex wavelet features, IEEE Transactions on Image Processing 18 (2009) 2593–2599.
[33] I.W. Selesnick, R.G. Baraniuk, N.G. Kingsbury, The dual-tree complex wavelet transform, IEEE Signal Processing Magazine (2005) 123–151.
[34] P. Zhang, T.D. Bui, C.Y. Suen, Recognition of similar objects using 2-D wavelet-multifractal feature extraction, in: Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada, 2002.
[35] P. Zhang, A video-based face detection and recognition system using cascade face verification modules, in: Proceedings of the 37th IEEE Applied Imagery Pattern Recognition Workshop, 2008, pp. 1–8.
[36] Face Recognition Database, University of Essex, UK, http://cswww.essex.ac.uk/mv/allfaces/index.html.

Ping Zhang (Ph.D.) is an Assistant Professor in the Department of Mathematics and Computer Science, Alcorn State University, USA. A senior IEEE member, he has worked in academia and industry for more than ten years. He has published more than 40 journal and international conference papers and book chapters, and has reviewed more than 60 journal papers for Pattern Recognition, Pattern Recognition Letters, and the IEEE Transactions on SMC, NN, etc. His research interests include pattern recognition, OCR, image processing and computer vision.

Xi Guo is a Computer Science graduate student. She has published several research papers. Her research interests include pattern recognition, database management design and intelligent web design.