SWF-SIFT Approach for Infrared Face Recognition




TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 17/17 pp357-362 Volume 15, Number 3, June 2010

TAN Chunlin1,**, WANG Hongqiao2,3, PEI Deli3

1. School of Aerospace, Harbin Institute of Technology, Harbin 150001, China;
2. Xi'an Research Institute of Hi-Tech, Xi'an 710025, China;
3. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract: The scale invariant feature transform (SIFT) feature descriptor is invariant to image scale and location and robust to affine transformations and changes in illumination, so it is a powerful descriptor used in many applications, such as object recognition, video tracking, and gesture recognition. However, in noisy and non-rigid object recognition applications, especially infrared human face recognition, SIFT-based algorithms may mismatch many feature points. This paper presents a star-styled window filter-SIFT (SWF-SIFT) scheme that improves infrared human face recognition performance by filtering out incorrect matches. Performance comparisons between the SIFT and SWF-SIFT algorithms on a typical infrared human face database show the advantages of the SWF-SIFT algorithm.

Key words: infrared image; human face recognition; scale invariant feature transform (SIFT); star-styled window filter (SWF)

Introduction

There are various biometric characteristics that have been shown to provide a secure means of authentication in security and access control applications, such as fingerprint[1], iris[2], ear[3], and palmprint[4] biometrics. Given its ability to extract distinctive key points that are invariant to location, scale, and rotation, and its robustness to affine transformations and changes in illumination, the scale invariant feature transform (SIFT)[5] is widely used in object detection, recognition, and tracking applications. Tests by Mikolajczyk and Schmid[6] showed that the SIFT descriptor gave the best image matching rate and stability compared with other recognition algorithms. However, in noisy environments with non-rigid targets, the SIFT matching rate and recognition performance can decrease dramatically[7]. Nevertheless, the SIFT descriptor remains one of the most widely applied descriptors in computer vision and object recognition, and some interesting object recognition algorithms have been proposed based on it. In general object recognition SIFT has proven effective and robust, and the SIFT descriptor has recently been introduced in various fields, such as offline handwritten Chinese character recognition[8] and different biometric traits including face recognition[9], ear recognition[10], fingerprint recognition[11], and multimodal biometrics[12].

Researchers have developed various improved SIFT methods in recent years, especially for face recognition. Although SIFT features have emerged as very powerful image descriptors, their use in the face analysis context had not been systematically investigated. Bicego et al.[13] studied the application of the SIFT approach in the context of face authentication to determine the real potential and applicability of the method. Geng and Jiang[14] analyzed the SIFT performance and deficiencies when applied to face recognition and proposed keypoint-preserving-SIFT (KPSIFT), which keeps all the initial keypoints as features, and partial-descriptor-SIFT (PDSIFT), in which keypoints detected at large scales and near face boundaries are described by a partial descriptor. Tests showed that their approaches achieved better performance than the original SIFT. Luo et al.[15] used person-specific SIFT features and a simple non-statistical matching strategy combined with local and global similarity of key-point clusters to solve face recognition problems. Majumdar and Ward[16] proposed a discriminative ranking of SIFT features that can be used to prune the number of SIFT features for face recognition, with tests showing that the number of computations is reduced more than 4-fold while the recognition accuracy increases.

These methods target visible-light face images, but effective methods based on the common SIFT approach are not available for infrared face images. Because infrared images are strongly noisy and the human face is a non-rigid object, the SIFT algorithm cannot be applied directly to infrared human face recognition without changes. Glasses, head rotation, and facial expressions cause most current infrared human face recognition methods, such as template matching, basic element matching, and wavelet analysis, to have low matching rates[17]. The SIFT algorithm, independent component analysis (ICA), and fractal and genetic algorithms[5,18,19] are all effective methods for infrared face recognition owing to characteristics such as robustness to partial occlusion and affine transformations and the use of local invariants. An improved SIFT descriptor is therefore also expected to give improved infrared image recognition.

Received: 2010-04-13; revised: 2010-04-26
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 86-10-68744864

1 SIFT Algorithm

The SIFT algorithm extracts local feature points from an image that are invariant to many basic image transformations. The algorithm consists of four steps.

Step 1: Finding the extreme points in the scale space. The algorithm applies Gaussian linear transformations to the original image to obtain a set of images at different scales and then searches for extreme points by comparing every point to the points at the same scale and at the neighboring scales. The extreme points are extracted within the scale space {L(x, y, \sigma)}, a point set obtained by applying Gaussian linear transforms with different scale factors \sigma to the original image I(x, y). The scale transform operator used by Lowe[20] was the difference of Gaussian (DoG), giving the Gaussian difference scale space {D(x, y, \sigma)}. Lowe[20] proved that the points extracted by this operator are invariant to scale transformations. The spaces are formulated as follows.

Gaussian scale space:

    L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)    (1)

Gaussian difference scale space:

    D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)    (2)

where

    G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}    (3)

Each point in the scale space is compared to its 26 adjacent points (8 points at the same scale and the other 9×2 = 18 points at the two neighboring scales), and points with the maximum or minimum value are treated as extreme points, which are invariant to scale changes.

Step 2: Extracting key points. Unstable extreme points that are sensitive to noise are filtered out, and the remaining extreme points are treated as key points. The unstable extreme points are either (1) sensitive to noise or (2) on an edge of the local texture. Points that are sensitive to noise usually have small values and are easily dominated by noise, so they may not be reliably detected; points on a texture edge are sensitive to image transformations, so different extreme values may be extracted at the same location. This step keeps only key points that are insensitive to noise and invariant to affine transformations.

Step 3: Assigning direction parameters to the key points to quantize the description. Lowe[20] formulated this assignment using the norm and angle in Euclidean space, with the direction of a key point taken as the normalized gradient direction of the key point operator used in the following step. Identical directions can thus be extracted after an image rotation.

Step 4: Computing the key point descriptors. A 16×16 window is used around each key point, and the gradient of every point in the window is computed. Each 4×4 sub-window is treated as a unit, and the weighted means of the points in each sub-window are computed.

The SIFT algorithm can extract stable feature points from an image that are invariant to scale changes, rotation, affine transformations, noise, and changes in lighting conditions. However, the extraction of key points is confined to the scale spaces and only compares neighboring points, so it does not take the original image into account. The algorithm is improved here by extracting information from the original image and searching for extreme points in a larger space, without being limited to neighboring points. The improved algorithm also filters out strong noise and deals with non-rigid transformations, so it can be used for face recognition in infrared images.

2 SWF-SIFT Algorithm

SIFT has been very successful for optical images, so it is used here for face recognition in infrared images by improving the original algorithm and adding new information. First, match information for the key point descriptors (star-styled window filter (SWF) information) is added to the original SIFT algorithm. Then, an object recognition algorithm (SWF discrimination) is introduced into the SIFT algorithm to improve face recognition in infrared images.

2.1 SWF

The SIFT algorithm searches for key points, lists the candidate matched pairs of points, and then computes the key point descriptors and quantizes the candidate points. SWF is proposed to filter out unstable key points in infrared images and to add information for face recognition. Although the key points extracted by SIFT make full use of the information from their neighboring points in the scale space, the information in the original image and from the other points is neglected. Figure 1 shows the matching result for a pair of infrared human faces, in which the wrong matches are labeled by red lines. The result shows that the wrongly matched pairs have different textures. Thus, the information from neighboring points in the scale space can be combined with the texture information of neighboring points in the original image to detect the wrong matches in the infrared image.

Fig. 1 SIFT match results for a pair of candidate infrared human faces. The left face is a frontal view, while the right face is turned about 15° to the right.

The shape and size of the filter window are crucial to the algorithm performance. SWF-SIFT adopts a star-styled window, shown in Fig. 2, to keep SIFT invariant to rotation. The directions of the points in the window are the same as those of the SIFT descriptor histogram: 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°, and the window size is (2N+1)×(2N+1), N = 2, 3, 4, ..., depending on the image resolution. In this paper N was set to 2, 3, and 4. The mean of the points in the star-styled window is called the SWF information, which can be formulated as

    SWF[I(x, y)] = mean\Big[ \sum_{i=1}^{N} \big( I(x+i, y) + I(x-i, y) + I(x, y+i) + I(x, y-i) + I(x+i, y+i) + I(x-i, y-i) + I(x+i, y-i) + I(x-i, y+i) \big) \Big]    (4)

where I(x, y) is the key point.

Fig. 2 General shape of the star-styled window

SIFT computes a 128-dimensional vector for each key point, named the SIFT information (\overrightarrow{SIFT[D(x, y, \sigma)]}), which can be used for face recognition in infrared images.
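Equation (4) can be sketched as the following helper. This is a hypothetical function, not the authors' code; it assumes the center pixel is excluded and the mean is taken over all 8N arm points.

```python
def swf_value(img, x, y, N=2):
    """SWF[I(x, y)]: mean intensity over the 8 star arms of length N (Eq. (4)).

    img is indexed as img[row][col] = I(col, row); the 8 arm directions match
    the SIFT histogram directions 0 deg, 45 deg, ..., 315 deg.
    """
    total, count = 0.0, 0
    for i in range(1, N + 1):
        for dx, dy in ((i, 0), (-i, 0), (0, i), (0, -i),
                       (i, i), (-i, -i), (i, -i), (-i, i)):
            total += img[y + dy][x + dx]
            count += 1
    return total / count
```

For N = 2 this averages 16 pixels, i.e., the star arms inside the 5×5 window used in the experiments below.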

2.2 SWF discrimination

In the SIFT matching stage, the Euclidean distance is computed between key point descriptors, and the points with the minimum distance are selected as matching point candidates. Some of these candidates are then filtered out based on the SWF information SWF[I(x, y)]. If the difference in SWF[I(x, y)] between a pair of matching point candidates is larger than a threshold, the pair is discarded, because the difference in texture between the two points is too large for them to belong to the same place. The threshold is related to the image itself; higher definition images can use a smaller threshold to reflect the differences in texture. This analysis used a normalized threshold of 0.17 with the following discrimination rule:

    \big| SWF[I(x_1, y_1)] - SWF[I(x_2, y_2)] \big| \begin{cases} \leqslant 0.17, & \text{accept} \\ > 0.17, & \text{reject} \end{cases}    (5)
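The rule of Eq. (5) can be sketched as a simple filter over candidate pairs. This is a hypothetical implementation, not the paper's code; it assumes pixel intensities are normalized to [0, 1] so that the 0.17 threshold applies, and it re-uses an Eq. (4) helper for the SWF information.

```python
def swf_value(img, x, y, N=2):
    """Mean intensity over the 8 star arms of length N around (x, y) (Eq. (4))."""
    pts = [img[y + dy][x + dx]
           for i in range(1, N + 1)
           for dx, dy in ((i, 0), (-i, 0), (0, i), (0, -i),
                          (i, i), (-i, -i), (i, -i), (-i, i))]
    return sum(pts) / len(pts)

def swf_filter(matches, img1, img2, threshold=0.17, N=2):
    """Apply Eq. (5): keep a candidate pair ((x1, y1), (x2, y2)) only if the
    SWF difference between its two key points is within the threshold."""
    return [((x1, y1), (x2, y2))
            for (x1, y1), (x2, y2) in matches
            if abs(swf_value(img1, x1, y1, N) - swf_value(img2, x2, y2, N)) <= threshold]
```

The filter runs after SIFT's descriptor-distance matching, so it only removes pairs, never adds new ones.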

3 Tests and Results

This comparison between the SIFT[20] and SWF-SIFT algorithms is based on the facial infrared database provided by Terravic Research Corporation[21], which contains infrared data for 20 people, including images with small-angle rotations, with and without glasses, and with and without hats. The images also have obvious illumination differences, so the database can be used to thoroughly evaluate the performance of infrared human face recognition algorithms. From the 20 people's face images, 102 images were selected as test images. Following Lowe's suggestion, the rotation angles to the left and right were all 15°, which suits the 30° viewfinder used in 3-D photography. The SWF-SIFT result corresponding to Fig. 1, shown in Fig. 3, demonstrates that the SWF-SIFT algorithm filtered out most of the mismatched points.

Fig. 3 SWF-SIFT matching result for the same pair of candidate infrared human faces shown in Fig. 1

Current performance evaluation methods for machine recognition algorithms include receiver operating characteristics (ROC) and precision-recall (PR). Ke and Sukthankar[7] suggested that ROC is more suitable for evaluating classification performance, while PR is more suitable for recognition. Since the main goal of the SWF-SIFT algorithm is to filter out mismatched points in infrared face recognition, and the objective of this study is to improve the algorithm's accuracy, PR was used to evaluate the two algorithms. PR is defined by

    \text{recall} = \frac{tp}{tp + fn}    (6)

    1 - \text{precision} = \frac{fp}{tp + fp}    (7)

where tp is the number of correct matches, fp is the number of mismatches, and fn is the number of correctly matched points that the algorithm could not recognize. tp + fn is the number of key points in the two matching images, called the base number of key points, and tp + fp is the total number of matches. The ratio recall / (1 - precision) describes the algorithm's performance, with a larger value indicating a better matching result. It is calculated as

    \frac{\text{recall}}{1 - \text{precision}} = \frac{tp \, (tp + fp)}{fp \, (tp + fn)}    (8)

The simulation results are shown in Table 1.


Table 1 Recall-precision of the SIFT and SWF-SIFT algorithms

Algorithm   Window size   Total matched points   Mismatched points   PR
SIFT        -             59.337 84              6.945 946           2.504 19
SWF-SIFT    5×5           38.783 78              2.013 514           3.962 75
SWF-SIFT    7×7           35.797 30              2.027 027           3.336 78
SWF-SIFT    9×9           33.797 30              1.635 135           3.719 42

The data in Table 1 show that the SWF-SIFT algorithm produces fewer total matches and fewer mismatches than the SIFT algorithm, while its PR is significantly better. Thus, the SWF-SIFT algorithm improves performance by reducing the total number of matches. The reduction in the total number of matched points includes erroneously matched points as well as correctly matched points. For example, with the 5×5 window, the average number of filtered-out matched points is about 20.5 and the average number of filtered-out mismatched points is about 4.9, so the average number of filtered-out correctly matched points is about 15.6. Although more correctly matched points than mismatched points are filtered out, the total number of matched points remains large enough that the PR for SWF-SIFT is 3.962 75, much larger than the PR for SIFT. Therefore, the cost of SWF-SIFT is worthwhile. The results for the other two window sizes show the same performance improvement.

4 Conclusions

A SWF-SIFT algorithm was developed to solve the mismatching problems encountered in infrared human face recognition by filtering out mismatched points generated by the SIFT algorithm with a star-styled window filter. Comparison of the SWF-SIFT and SIFT algorithms shows that SWF-SIFT can effectively filter out SIFT's mismatched points and improve recognition performance.

References

[1] Jain A K, Hong L, Bolle R. On-line fingerprint verification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(4): 302-314.
[2] Daugman J. How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology, 2004, 14(1): 21-30.
[3] Nanni L, Lumini A. A multi-matcher for ear authentication. Pattern Recognition Letters, 2007, 28(16): 2219-2226.
[4] Jain A K, Feng J. Latent palmprint matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(6): 1032-1047.
[5] Lowe D G. Object recognition from local scale-invariant features. In: Proc. of the International Conference on Computer Vision. Corfu, Greece, 1999: 1150-1157.
[6] Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. In: Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison, USA, 2003.
[7] Ke Y, Sukthankar R. PCA-SIFT: A more distinctive representation for local image descriptors. In: Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington DC, USA, 2004.
[8] Zhang Z Y, Jin L W, Ding K, et al. Character-SIFT: A novel feature for offline handwritten Chinese character recognition. In: Proc. of International Conference on Document Analysis and Recognition. Barcelona, Spain, 2009: 763-767.
[9] Kisku D R, Tistarelli M, Sing J K, et al. Face recognition by fusion of local and global matching scores using DS theory: An evaluation with uni-classifier and multi-classifier paradigm. In: Proc. of IEEE Computer Vision and Pattern Recognition (CVPR) Workshop on Biometrics. Miami, USA, 2009.
[10] Kisku D R, Mehrotra H, Gupta P, et al. SIFT-based ear recognition by fusion of detected keypoints from color similarity slice regions. In: Proc. of International Conference on Advances in Computational Tools for Engineering Applications. Notre Dame, Lebanon, 2009: 380-385.
[11] Park U, Pankanti S, Jain A K. Fingerprint verification using SIFT features. In: Proc. of SPIE Defense and Security Symposium. Kissimmee, USA, 2008.
[12] Rattani A, Kisku D R, Bicego M, et al. Feature level fusion of face and fingerprint biometrics. In: Proc. of the 1st IEEE International Conference on Biometrics: Theory, Applications and Systems. Washington DC, USA, 2007: 1-6.
[13] Bicego M, Lagorio A, Grosso E, et al. On the use of SIFT features for face authentication. In: Proc. of Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06). New York, USA, 2006.
[14] Geng C, Jiang X D. SIFT features for face recognition. In: Proc. of the 2nd IEEE International Conference on Computer Science and Information Technology. Beijing, China, 2009: 598-602.
[15] Luo J, Ma Y, Takikawa E, et al. Person-specific SIFT features for face recognition. In: Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Honolulu, USA, 2007.
[16] Majumdar A, Ward R K. Discriminative SIFT features for face recognition. In: Proc. of Canadian Conference on Electrical and Computer Engineering (CCECE'09). Newfoundland, Canada, 2009.
[17] Prokoski F. History, current status, and future of infrared identification. In: Proc. of IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications. Hilton Head Island, USA, 2000: 5-14.
[18] Ding P L, Mei J F, Zhang L M. Research of automatic face recognition based on ICA. Journal of Infrared and Millimeter Waves, 2001, 20(5): 361-364. (in Chinese)
[19] Chen G, Qi F H. Face recognition based on fractal and genetic algorithms. Journal of Infrared and Millimeter Waves, 2000, 19(5): 371-376. (in Chinese)
[20] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
[21] Terravic Research Facial IR Database. Available: http://www.terravic.com/research/facial.htm, 2010.