Pattern Recognition Letters 36 (2014) 254–260
Fused intra-bimodal face verification approach based on Scale-Invariant Feature Transform and a vocabulary tree

Carlos M. Travieso, Marcos del Pozo-Baños, Jesús B. Alonso

Signals and Communications Department, Institute for Technological Development and Innovation in Communications (IDeTIC), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Article info

Article history: Available online 30 August 2013. Communicated by Luis Gomez Deniz.

Keywords: Bimodal interaction; Information fusion; Visible and thermal face verification; Face detection; SIFT parameters; Vocabulary tree
Abstract

This work studies an intra-bimodal face-based biometric fusion approach combining the thermal and visible domains. The distinctive feature of this work is the use of a single camera with two sensors, which returns a unique image containing thermal and visual images at the same time, as opposed to the state-of-the-art, for example multibiometric modalities and hyperspectral images. The proposed system represents a practical bimodal approach for real applications. It is composed of a verification architecture based on the Scale-Invariant Feature Transform (SIFT) algorithm with a vocabulary tree, providing a scheme that scales efficiently to a large number of features. The image database consists of front-view thermal and visual images captured as single images, which contain the facial temperature distributions of 41 different individuals in 2-dimensional format, with 18 images per subject acquired in three different-day sessions. Results showed that visible images give better accuracy than thermal information and that, independently of range, head images give the most discriminative information. Besides, fusion approaches reached better accuracies, up to 99.45% for score fusion and 100% for decision fusion. This shows the independence of information between visual and thermal images and the robustness of bimodal interaction.

© 2013 Elsevier B.V. All rights reserved.
1. Introduction

The usage of different biometric systems in security applications has become more and more common nowadays. The reason is a series of advantages over other methods, such as carrying magnetic cards or remembering passwords or PIN numbers, which can be forgotten or used by unauthorized persons. Identification systems based on human body measures are well accepted and perceived naturally by both men and women. Therefore, biometric identification methods are achieving outstanding results and trustworthiness in the security market.

Human recognition through distinctive facial features supported by an image database is still being studied. Note that this problem presents various difficulties. What will occur if the individual's haircut is changed? Is make-up a determining factor in the process of verification? Would it distort the facial features significantly?

The usage of thermal cameras, originally conceived for military purposes, has expanded to other fields of application such as process control in production lines, detection/monitoring of fire, and even security and anti-terrorism applications. Therefore, their use in human identification tasks, in scenarios where the lack of light
restricts the operation of conventional cameras, can also be considered. Thermal cameras can also be a great tool against look variations, which in some cases can be quite extreme. Different looks of the main character of the film The Saint are shown in Fig. 1; Val Kilmer modifies his look spectacularly in this film in order not to be recognized by the enemy.
Fig. 1. Facial changes of the character played by Val Kilmer in the film The Saint.
A correct matching between the test face and that stored in the image database is expected, and this is a hard task to solve even if natural distortion effects such as illumination changes or interference are not considered. The recognition problem can be split into three stages: acquisition of facial images for testing, feature extraction from specific facial regions and, finally, verification of the individual's identity (Soon-Won et al., 2007).

Currently, computational face analysis is a very lively research field, in which interesting new possibilities are being studied. For example, there are approaches aiming to improve a system's performance when working with low-resolution (LR) images while decreasing the computational load. In Huang and He (2011), a facial recognition system was presented which works with LR images, using nonlinear mappings to infer coherent features that favor higher accuracy of nearest neighbor (NN) classifiers for the recognition of a single LR face image. It is also interesting to cite the approach of Imtiaz and Fattah (2011), in which a multi-resolution feature extraction algorithm for face recognition, based on the two-dimensional discrete wavelet transform (2D-DWT), was proposed. Such a method exploits local spatial variations in a face image effectively, obtaining outstanding results on two different databases.

Images of subjects are often taken in different poses or with different modalities, such as thermographic images, presenting different degrees of difficulty for identification. In Socolinsky and Selinger (2004), results on the use of thermal infrared and visible imagery for face recognition in operational scenarios were presented. These results showed that thermal face recognition performance is stable over multiple sessions in outdoor scenarios, and that fusion of modalities increases performance. In the same year, Jiang et al. (2004) proposed an automated thermal imaging system able to discriminate frontal from non-frontal face views, under the assumption that at any time there is only one person in the field of view of the camera and no other heat-emitting objects are present. In this approach, the distance from centroid (DFC) shows its suitability for comparing the degree of symmetry of the lower face outline. The use of correlation filters in Heo et al. (2005) showed their adequacy for face recognition tasks using thermal infrared (IR) face images, due to the invariance of this type of image to visible illumination variations. The results with Minimum Average Correlation Energy (MACE) filters and the Optimum Trade-off Synthetic Discriminant Function (OTSDF) on LR images (20 × 20 pixels) prove their efficiency in Human Identification at a Distance (HID).

The Scale-Invariant Feature Transform (SIFT) algorithm (Lowe, 1999) is widely used in object recognition. In Soyel and Demirel (2011), SIFT appeared as a suitable method to enhance the recognition of facial expressions under varying poses over 2D images; the usage of affine transformation consistency between two faces to discard SIFT mismatches was demonstrated. Gender recognition is another lively research field working with the SIFT algorithm. In Jian-Gang et al. (2010), faces were represented in terms of the dense Scale-Invariant Feature Transform (d-SIFT) and shape. Instead of extracting descriptors around interest points only, local feature descriptors were extracted at regular image grid points, allowing dense descriptions of face images. However, SIFT usually generates a large number of features from an image, and the huge computational effort associated with feature matching limits its application to face recognition. To overcome this problem, Majumdar and Ward (2009) proposed the usage of a discriminating method: computational complexity was reduced by more than 4 times and accuracy increased by 1.00% on average by discarding irrelevant features. Another interesting idea is the use of tree-building methods, which scale well with the size of a database and allow finding one element among a large number of objects in acceptable time. This work is inspired by Nister and Stewenius (2006), where object recognition by a k-means vocabulary tree was presented. Efficiency was proved by a live demonstration that recognized CD covers from a database of 40,000 images. The vocabulary tree showed good results when a large number of distinctive descriptors form a large vocabulary. Many different approaches to this solution have been developed in the last few years (Ober et al., 2007; Slobodan, 2008), showing its competence in organizing large numbers of objects. Based on these
good results, this solution is tested in this paper using SIFT descriptors in a vocabulary tree.

In addition, references using two different images in different ranges, visible and infrared-thermal, can be found in the state-of-the-art. In Buyssens and Revenu (2010), the authors used PCA and sparse analysis before applying a fusion module, reaching mean recognition rates between 95% and 99% after the fusion process for 63 users. In Bhowmik et al. (2012), the system fuses the thermal and visible images, captured by two individual sensors, into a single image. Using 70% of the visual image and 30% of the thermal image and classifying with an SVM, the system reaches up to 97.28% in an identification approach. Other lines of research have focused on multimodal biometric systems. For example, Almohammad et al. (2012) was based on the fusion of face and gait biometrics, Tong et al. (2010) on face and fingerprint biometric fusion, Javadtalab et al. (2011) on the fusion of face and ear recognition, and Raghavendra (2012) on the feature-level fusion of face and palmprint biometrics. These kinds of multimodal approaches usually require longer acquisition times or devices that are uncomfortable from the user and application points of view. Finally, different references based on multispectral face recognition and multimodal score fusion can also be found; Zheng and Elmaghraby (2011) and Bourlai et al. (2012) are two examples, where the authors used different sensors and cameras, or one camera covering a certain range by means of bandpass filters or operating in broadband.

In this context, the aim of the present work is to propose and evaluate an innovative approach in the field of bimodal face biometrics, covering the visible and thermal ranges. The proposed method could be used in a real application, with the convenience of a unique device for fast tracking and the advantages of the fusion of bimodal information. All this gives it an added value versus the state-of-the-art. In addition, a study to locate the main source of discriminative information is also included here. In particular, the system applies the SIFT algorithm and obtains local distinctive descriptors from each image, based on Crespo et al. (2012). The construction of the vocabulary tree enables these descriptors to be hierarchically organized and ready for a search for a specific object. For each test image, only its new descriptors are calculated and used to search through the hierarchical tree in order to build a vote matrix, in which the most similar image of the database can easily be identified. This approach combines the distinctiveness of SIFT descriptors, which perform reliable matching between different views of a visual and thermal face, with the efficiency of the vocabulary tree for building a highly discriminative vocabulary. A more detailed description of the system is provided in the next sections.

This paper is organized as follows. The proposed approach is presented in Section 2. In Section 3, the experimental settings, results and discussion are shown. Finally, conclusions are given in Section 4.

2. Proposed approach

The innovation of this work lies in the implementation of a fused bimodal face verification approach built on a unique device, which provides one image from two sensors, for the visible and thermal ranges respectively. A bimodal verification approach is implemented using SIFT descriptors for feature extraction and a vocabulary tree, built with the k-means function, as the classification system.
Score and decision fusions of both ranges have been applied. Besides, the location of the discriminative information has been investigated across ranges and their fusions, and across regions of interest (head versus face). This work is a novel study and opens a door to applications under real conditions. In this section, the whole approach and each of its parts are explained.
2.1. General description

The proposed approach is composed of five stages: a pre-processing module, a SIFT descriptors calculator, vocabulary tree construction, a matching module and a fusion module. While face segmentation is executed manually, the matching module searches the vocabulary tree for the best correspondence between the test descriptors and those of the database. Therefore, the forthcoming explanation focuses first on the SIFT parameters and the tree classification, and a brief description of the matching module is given afterwards. A block diagram of the system is shown in Fig. 2.

2.2. Pre-processing

In this step, natural (visible range) and thermal images are extracted and isolated from the unique image supplied by the device used. Then, the system detects human faces in the natural image. Though many face detectors are available in the literature, a frontal face detector similar to the Viola-Jones cascade detector (Viola and Jones, 2004) has been used here due to its simplicity, speed and effectiveness.
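As a hedged illustration of this pre-processing step, the sketch below uses OpenCV's pretrained frontal-face Haar cascade, which follows the Viola-Jones framework; the authors' exact detector and the layout of the dual-sensor image are not specified in the paper, so both are assumptions here.

```python
import cv2

# OpenCV's pretrained frontal-face Haar cascade (Viola-Jones style).
# This is a stand-in; the paper's exact detector is not specified.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(bgr_image):
    """Return bounding boxes (x, y, w, h) of detected frontal faces."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # improve contrast before detection
    return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                    minNeighbors=5, minSize=(30, 30))

# Usage: isolate the visible half of the dual-sensor image, then detect.
# The side-by-side layout below is an assumption for illustration only.
# visible = full_image[:, :full_image.shape[1] // 2]
# boxes = detect_faces(visible)
```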
2.3. Feature extraction: scale-invariant feature transform

SIFT descriptors are applied taking the majority of the results achieved by Lowe (2004) as a guideline, and only determinant parameters are modified in order to adapt the algorithm to the system. Keypoints are detected using cascade filtering, searching for stable features across all possible scales. The scale space of an image, L(x, y, \sigma), is produced by the convolution of a variable-scale Gaussian, G(x, y, \sigma), with an input image, I(x, y):

L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)    (1)

where * is the convolution operation in x and y, and

G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}    (2)
Following Lowe (2004), the difference-of-Gaussian (DoG) function convolved with the image, D(x, y, \sigma), can be computed as the difference of two nearby scales separated by a constant factor k:

D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)    (3)
Mikolajczyk (2002) states that the maxima and minima of the scale-normalized Laplacian of Gaussian (LoG), \sigma^2 \nabla^2 G, produce the most stable image features in comparison with other functions, such as the gradient or Hessian. The relationship between D and \sigma^2 \nabla^2 G is:
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G    (4)

which follows from the heat diffusion equation \partial G / \partial\sigma = \sigma \nabla^2 G through a finite-difference approximation of the derivative.
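As an illustration of Eqs. (1) and (3), the following minimal sketch builds the Gaussian scale space and the difference-of-Gaussian images with SciPy; keypoint localization, edge-response filtering and descriptor computation from Lowe (2004) are omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** 0.5, num_scales=5):
    """Build L(x, y, sigma) for a geometric series of scales and return
    the difference-of-Gaussian images D = L(k*sigma) - L(sigma)."""
    image = image.astype(np.float64)
    sigmas = [sigma0 * k ** i for i in range(num_scales)]
    L = [gaussian_filter(image, s) for s in sigmas]   # Eq. (1)
    D = [L[i + 1] - L[i] for i in range(len(L) - 1)]  # Eq. (3)
    return sigmas, D

# Candidate keypoints are the local extrema of D across space and scale:
# each pixel is compared with its 26 neighbours in the 3x3x3 neighbourhood.
```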
The factor (k − 1) is a constant over all scales and therefore does not influence extrema location. A significant difference in scales has been chosen, k = 2^{1/2}, which has almost no impact on stability, and the initial value \sigma = 1.6 provides close-to-optimal repeatability according to Lowe (2004). After accurate keypoints have been located and strong edge responses of the DoG function removed, an orientation is assigned. There are two important parameters for varying the complexity of the descriptor: the number of orientations and the size of the array of orientation histograms. Throughout the present work, a 4 × 4 array of histograms with 8 orientations is used, resulting in characteristic vectors with 128 dimensions. The results in Lowe (2004) support the use of these parameters for object recognition purposes, since larger descriptors have been found to be more sensitive to distortion.

2.4. Classification system: vocabulary tree

The verification scheme used in this paper is based on Nister and Stewenius (2006). Once the SIFT descriptors are extracted from the image database, they are organized in a vocabulary tree. A hierarchical verification scheme allows a selective search for a specific node in the vocabulary tree, decreasing search time and computational effort. The k-means algorithm is applied to the initial point cloud of descriptors to find centroids through minimum distance estimation, so that each centroid represents a cluster of points. The k-means algorithm is applied iteratively, since recalculating the centroid locations can change the associated points; the algorithm converges when the centroid locations no longer vary. Each tree level represents a node division of the immediately superior level. After some experimentation, the initial number of clusters was set to 10, with 5 tree levels. These values have shown good results when working with the present database. A model of a vocabulary tree with 2 levels and 3 initial clusters is shown in Fig. 3.

Fig. 3. Two levels of a vocabulary tree with branch factor 3.

2.5. Fusion module

This block uses the correlation between errors of the different approaches (face and head information, and visible and thermal ranges) in order to improve the global accuracy. The system applies two different fusion strategies, based on scores and on decisions (Yang et al., 2003). For score-based fusion, this work implements the sum and product rules over the normalized thermal and visual scores, as in Fuertes et al. (2010). For decision-based fusion, the OR rule (Fuertes et al., 2010) and the weight rule (Yang et al., 2011) were implemented. The OR rule applies a logical OR to the decisions of the individual classifiers, and its output corresponds to the final decision. The weight-based method makes use of a priori information, namely the efficiency of each classifier. A higher accuracy and security system results when these fusion methods are implemented (Fuertes et al., 2010).
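To make Section 2.4 concrete, here is a minimal sketch of hierarchical k-means vocabulary tree construction and descent, assuming scikit-learn's KMeans as the clustering routine; the node layout, helper names and guard conditions are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descriptors, branch=10, depth=5):
    """Recursively cluster SIFT descriptors (an n x 128 array) with k-means.
    Internal nodes keep `branch` children tagged with their centroid;
    leaves keep the descriptor subset that reached them."""
    node = {"children": [], "descriptors": None}
    if depth == 0 or len(descriptors) < branch:
        node["descriptors"] = descriptors  # leaf of the tree
        return node
    km = KMeans(n_clusters=branch, n_init=5).fit(descriptors)
    for c in range(branch):
        child = build_tree(descriptors[km.labels_ == c], branch, depth - 1)
        child["centroid"] = km.cluster_centers_[c]
        node["children"].append(child)
    return node

def lookup(node, d):
    """Descend by nearest centroid at each level and return the leaf."""
    while node["children"]:
        node = min(node["children"],
                   key=lambda ch: np.linalg.norm(d - ch["centroid"]))
    return node
```

In the matching module, each test descriptor would be pushed through lookup and a vote cast for the database images whose descriptors populate the reached leaf, building the vote histogram described above.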
Fig. 2. Diagram of the proposed face recognition system. (Blocks: visible and thermal image database (head and face) → image segmentation → SIFT descriptors calculator → vocabulary tree construction; test image → SIFT descriptors calculator → matching module → fusion module → decision.)
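The fusion rules of Section 2.5 can be summarized with a short sketch. The paper only names the sum, product, OR and weight rules; the score normalization to [0, 1] and the accuracy-proportional weight definition below are assumptions.

```python
import numpy as np

def score_fusion(s_visible, s_thermal, rule="product"):
    """Fuse normalized match scores of the two ranges (assumed in [0, 1])."""
    if rule == "sum":
        return (s_visible + s_thermal) / 2.0
    return s_visible * s_thermal  # product rule

def decision_fusion_or(d_visible, d_thermal):
    """OR rule: accept if either classifier accepts (boolean decisions)."""
    return d_visible or d_thermal

def decision_fusion_weights(d_visible, d_thermal, acc_visible, acc_thermal):
    """Weight rule: weight each decision by its classifier's a priori
    accuracy (an assumed instantiation of Yang et al., 2011)."""
    w_v = acc_visible / (acc_visible + acc_thermal)
    w_t = acc_thermal / (acc_visible + acc_thermal)
    return w_v * float(d_visible) + w_t * float(d_thermal) >= 0.5
```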
3. Experimental settings

3.1. Databases used

The authors have built an image database in order to develop this work. This database contains 738 images of 704 × 756 pixels and 24 bits per pixel. Images were taken using a SAT-S280 SATIR camera, which contains two sensors: a thermal sensor and a visible camera. An example of such an image is shown in Fig. 4. The database is composed of 41 subjects, with 18 images per subject. Images were acquired in 3 different sessions extended along 6 months, with 6 images recorded per session. The captured images were divided into two parts (visible and thermal). Thus, the final database contains a total of 1476 images: 738 visible face images and 738 thermal face images. Note that the false thermal color is given by the sensor according to the characteristics of each person. All images have been stored in PNG format. A further segmentation process is applied as a result of the selection of regions of interest; in particular, heads and faces are the groups created. Summing up, images were divided into categories depending on the type of information they provide, resulting in a total of 2952 images:

- Heads: thermal images of full heads of subjects (738 images).
- Heads: visible images of full heads of subjects (738 images).
- Faces: thermal images of facial details (738 images).
- Faces: visible images of facial details (738 images).

Fig. 4. Example of a unique image with visible and thermal ranges.

The following figures present some examples of thermal and visible images of heads (Fig. 5) and faces (Fig. 6) in the specified format. Images were taken indoors with different facial expressions such as happiness, sadness or anger, various facial orientations and distinctive changes in haircut or facial hair. The set of head images collects interesting details for recognition tasks, such as ear shape, haircut and chin. On the other hand, the set of face images provides the minimum information, i.e. the nose, mouth and eye areas.

3.2. Experimental methodology

The aim of the experiments was to find how important the extra information provided by head shape is for human verification versus face information, for the thermal and visible ranges, in addition to the effects of fusion. The proposed methodology compares the performance of thermal and/or visible ranges using faces and heads. Therefore, four experiments were done, one for each isolated modality, varying the range (visible and thermal) and the type of information (head and face). Besides, eight more experiments were performed, varying the type of fusion (sum rule for score fusion, product rule for score fusion, OR function for decision fusion, and weights for decision fusion). Summing up, the results of these twelve experiments were used to draw the conclusions of this work.

In order to assure the independence of results, both sets of images were equally divided into 2 subsets, test and training, under a 50% hold-out cross-validation methodology (Arlot, 2010). For each modality, 369 randomly chosen test images and 369 training images were available for the experiments. For each subject, an equally random division of the image database is made, so that 9 images per individual are used for testing and the remaining 9 for training purposes. This division is carried out 41 times, i.e. subject by subject in 41 iterations (a sketch of this split is given below).

The process of face/head verification for a subject was the following. Firstly, the previously stated division of the database was made. Secondly, each of the 9 images of the test subject was compared with the 369 training images, obtaining the corresponding results. Once these 9 images were processed, the database was joined together again and the process restarted with the next subject, until the 41 subjects of the database were processed. The parameters measured in the experiments were the False Rejection Rate (FRR), False Acceptance Rate (FAR) and Equal Error Rate (EER), commonly used in biometric studies. Mean processing times were also recorded.
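As referenced above, this is a minimal sketch of the per-subject 50% hold-out division, assuming images are indexed contiguously per subject; the indexing scheme and fixed seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def holdout_split(num_subjects=41, images_per_subject=18):
    """Randomly assign 9 images per subject to test and 9 to training
    (50% hold-out), returning index arrays over the whole database."""
    test_idx, train_idx = [], []
    for s in range(num_subjects):
        perm = rng.permutation(images_per_subject)
        base = s * images_per_subject
        test_idx.extend(base + perm[:9])
        train_idx.extend(base + perm[9:])
    return np.array(test_idx), np.array(train_idx)

test_idx, train_idx = holdout_split()
assert len(test_idx) == len(train_idx) == 369  # 41 subjects x 9 images
```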
Fig. 5. Six thermal and visible head images from the database. The examples show additional facial features such as head shape, hair and chin.
Fig. 6. Six thermal and visible face images from the database, from the same subjects as in Fig. 5. The examples only show basic facial features such as eyes, lips and nose, representing the minimum information needed to verify a subject in the system.
Table 1
All accuracies from the experiments applied for the thermal and visual ranges and their fusions.

Type of experiment              Head (%)   Face (%)
Without fusion: visual range    99.05      97.65
Without fusion: thermal range   97.60      88.20
Score fusion: sum               99.15      98.15
Score fusion: product           99.45      97.65
Decision fusion: OR             100        100
Decision fusion: weights        100        100

Table 2
Average computational times of head and face image verification during the experiments for the thermal and visible ranges, in seconds.

Type of experiment                        Model building   Test verification
Visible head                              283.47           0.49
Visible face                              135.08           0.28
Thermal head                              121.56           0.26
Thermal face                              102.55           0.26
Score fusion for head (sum rule)          408.17           0.36
Score fusion for head (product rule)      419.47           0.37
Score fusion for face (sum rule)          244.68           0.26
Score fusion for face (product rule)      231.99           0.25
Decision fusion for head (OR rule)        401.01           0.37
Decision fusion for head (weight rule)    414.04           0.37
Decision fusion for face (OR rule)        235.25           0.26
Decision fusion for face (weight rule)    237.06           0.26
Such parameters were collected in the form of vectors depending on one variable: the histogram threshold. Once the verification process finished, a histogram with the contributions of each image from the database was obtained; the image that best fits the test image shows the biggest value in the histogram. In a second stage, histogram values were normalized with regard to the biggest value, ranging from −1 to 1. A threshold was then applied in order to consider only the contributions of those images above it; those below were discarded at that moment. The histogram threshold descends from 1 to −1 in order to consider different samples each time.
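As a companion to the threshold sweep just described, below is a minimal sketch of how FRR, FAR and EER curves can be computed from genuine and impostor scores; the descending threshold grid mirrors the text, while the score conventions are assumptions.

```python
import numpy as np

def frr_far_curves(genuine, impostor, thresholds=np.linspace(1, -1, 41)):
    """Compute FRR and FAR over a descending threshold sweep.
    genuine: normalized scores of same-identity comparisons;
    impostor: normalized scores of different-identity comparisons."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    frr = np.array([(genuine < t).mean() for t in thresholds])    # rejected genuines
    far = np.array([(impostor >= t).mean() for t in thresholds])  # accepted impostors
    return frr, far

def eer(frr, far):
    """Equal Error Rate: the point where FRR and FAR are closest."""
    i = int(np.argmin(np.abs(frr - far)))
    return (frr[i] + far[i]) / 2.0
```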
3.3. Results

According to the experimental methodology, twelve experiments were done. Table 1 shows the accuracies calculated under hold-out validation for a verification approach, and Table 2 shows the average computational times. Although the verification time remains the same, the database updating time (model building time) with head images is substantially higher, as these images contain more information than face images and therefore require a greater computational effort. All experiments were done using MATLAB on a computer with a 2.66-GHz CPU and 2 GB of RAM.

In Fig. 7, FRR and FAR are shown in relation to the histogram threshold. The X-axis represents the threshold variation and the Y-axis shows the FRR and FAR values for the best approaches. These are ROC curves for visual head images versus score and decision fusions, respectively. In Fig. 7 it can be seen that the response of the FRR
curve is typical, obtaining its response with thresholds between 0 and 1, while the response of the FAR curve is flat, as it needs very high threshold values to reach its typical shape. This is a good characteristic of the proposal, as it allows a better EER point to be found. In practical terms, the threshold fall represents how the system becomes less demanding, taking more samples into account and increasing the FRR and FAR, since the additional samples do not belong to the test subject.

For isolated modalities, the best result obtained in experiments with thermal head images is 97.60%, compared to 88.20% in thermal face verification; the best result for visible head images is 99.05%, compared to 97.65% in visible face verification. Therefore, the accuracy rate with head images is higher than with face images in both cases. Besides, it can be concluded that the visible range carries more discriminative information than the thermal range. Regarding the fusion experiments, head images again provided better results than face images. For both types of fusion, the accuracy is improved, reaching up to 100% for decision fusion. For score-based fusion, 99.45% was reached with the product rule.
Fig. 7. FRR (blue and green lines) and FAR (red and purple lines) in terms of the histogram threshold: (a) visual head image versus score fusion for head with the product rule; (b) visual head image versus decision fusion for head. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
For the isolated approaches shown, the EER presents a larger value compared to the fused systems. With fusion, the FRR is flat for negative thresholds and the EER is consequently lower, so the previously quoted accuracies can be reached.
3.4. Discussion

Four isolated modalities (thermal and visible ranges of face and head images) have been compared in this work under a verification problem, with the aim of studying the amount of information provided by each format. The results showed that, compared to face images, head images preserve important discriminative characteristics in both the visible and thermal ranges for identifying a subject. On the other hand, it becomes clear that head images produce more SIFT descriptors, and therefore more essential data is extracted for the verification process. Additionally, faces of different subjects often share common features that provide no discriminative information; this effect became particularly noticeable for thermal images. In the visible range, similar accuracies were found, with only a 2% decay between head and face images. In addition, visible images reached better accuracies, with a difference of 2% for head images and 9% for face images. Therefore, for the camera used and relying on SIFT keypoints, the visible range may be the better approach. The SIFT parameterization adapts better to the visual range than to the thermal range, because the visual range presents clearer details.

The verification quality was evaluated through a series of independent experiments with various results, showing the power of the system, which satisfactorily verified the identity of the database subjects, overcoming limitations such as dependency on illumination conditions and facial expressions. A comparison between head and face verification was made for both ranges. This approach reached accuracy rates of 97.60% with thermal head images and 88.20% in thermal face verification. In the visible range, 99.05% was achieved with head images and 97.65% in face verification. Therefore, the visible range gave better accuracies than the thermal range and, independently of range, head images provided the most discriminative information.

Regarding the fusion approach, the potential of both ranges has been integrated, increasing the system's performance in all cases when compared with the isolated biometrics. After the experiments, it can be observed that the fusion methods applied improve the accuracy rate. This means that the sources of information (head, face and ranges) are not correlated from the error point of view, and those errors can be corrected by using score or decision information. Besides, the parameterization method was the same for all these modalities, which reinforces the hypothesis of information independence. The usage of fusion methods builds a whole approach that increases the accuracy rate, showing its robustness. Rates of 99.45% and 100% were reached for score and decision fusions, respectively. The product rule reported the best success for score-based fusion, while both rules reached 100% accuracy for decision-based fusion. Thus, the use of bimodal information seems to be the best option under the conditions studied in this work.
4. Conclusions

The main contribution of this work resides in the usage of a unique device providing a single image with both visual and thermal information. A full study of each of these modalities has been provided. In addition, a comparison between head and face verification systems was also included. All systems were based on SIFT descriptors with a vocabulary tree and a fusion module applied to the thermal and visible ranges. The following conclusions have been found. First, head images are more discriminative than face-only images for both the thermal and visible ranges. In addition, the visible range is more discriminative
than the thermal range. Finally, the fusion of both ranges improves on the isolated biometrics. As future work, it is desired to considerably increase the size of the database, including outdoor images, so that the proposed approach can be validated on such an extended database.

Acknowledgments

This work was supported by research project TEC2012-38630-C04-02 from the Spanish Government. Special thanks to Jaime Roberto Ticay-Rivas for his valuable help during the building of this database.

References

Almohammad, M.S., Salama, G.I., Mahmoud, T.A., 2012. Human identification system based on feature level fusion using face and gait biometrics. In: 2012 International Conference on Engineering and Technology (ICET), pp. 1–5.
Arlot, S., 2010. A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79.
Bhowmik, M.K., De, B.K., Bhattacharjee, D., Basu, D.K., Nasipuri, M., 2012. Multisensor fusion of visual and thermal images for human face identification using different SVM kernels. In: 2012 IEEE Long Island Systems, Applications and Technology Conference (LISAT), pp. 1–7.
Bourlai, T., Cukic, B., 2012. Multi-spectral face recognition: identification of people in difficult environments. In: 2012 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 196–201.
Buyssens, P., Revenu, M., 2010. Fusion levels of visible and infrared modalities for face recognition. In: 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–6.
Crespo, D., Travieso, C.M., Alonso, J.B., 2012. Thermal face verification based on scale-invariant feature transform and vocabulary tree – application to biometric verification systems. Int. Conf. Bio-inspired Syst. Signal Process. 2012, 475–481.
Fuertes, J.J., Travieso, C.M., Alonso, J.B., Ferrer, M.A., 2010. Intra-modal biometric system using hand-geometry and palmprint texture. In: 44th IEEE International Carnahan Conference on Security Technology, pp. 318–322.
Heo, J., Savvides, M., Vijayakumar, B.V.K., 2005. Performance evaluation of face recognition using visual and thermal imagery with advanced correlation filters. In: CVPR'05, Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 9–15.
Huang, H., He, H., 2011. Super-resolution method for face recognition using nonlinear mappings on coherent features. IEEE Trans. Neural Networks 22 (1), 121–130.
Imtiaz, H., Fattah, S.A., 2011. A wavelet-domain local feature selection scheme for face recognition. In: ICCSP'11, 2011 International Conference on Communications and Signal Processing, Kerala, India, p. 448.
Javadtalab, A., Abbadi, L., Omidyeganeh, M., Shirmohammadi, S., Adams, C.M., El Saddik, A., 2011. Transparent non-intrusive multimodal biometric system for video conference using the fusion of face and ear recognition. In: 2011 Ninth Annual International Conference on Privacy, Security and Trust, pp. 87–92.
Jiang, L., Yeo, A., Nursalim, J., Wu, S., Jiang, X., Lu, Z., 2004. Frontal infrared human face detection by the distance-from-centroid method. In: Proceedings of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, pp. 41–44.
Jian-Gang, W., Jun, L., Wei-Yun, Y., Sung, E., 2010. Boosting dense SIFT descriptors and shape contexts of face images for gender recognition. In: CVPRW'10, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 96–102.
Lowe, D.G., 1999. Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157.
Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60 (2), 91–110.
Majumdar, A., Ward, R.K., 2009. Discriminative SIFT features for face recognition. In: CCECE'09, 2009 Canadian Conference on Electrical and Computer Engineering, pp. 27–30.
Mikolajczyk, K., 2002. Detection of Local Features Invariant to Affine Transformations. Ph.D. Thesis, Institut National Polytechnique de Grenoble, France.
Nister, D., Stewenius, H., 2006. Scalable recognition with a vocabulary tree. In: CVPR'06, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2161–2168.
Ober, S., Winter, M., Arth, C., Bischof, H., 2007. Dual-layer visual vocabulary tree hypotheses for object recognition. In: ICIP'07, 2007 IEEE International Conference on Image Processing, vol. 6, pp. VI-345–VI-348.
Raghavendra, R., 2012. PSO based framework for weighted feature level fusion of face and palmprint. In: 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 506–509.
Slobodan, I., 2008. Object labeling for recognition using vocabulary trees. In: ICPR'08, 19th International Conference on Pattern Recognition, pp. 1–4.
Socolinsky, D.A., Selinger, A., 2004. Thermal face recognition in an operational scenario. In: CVPR'04, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1012–1019.
Soon-Won, J., Youngsung, K., Teoh, A.B.J., Kar-Ann, T., 2007. Robust identity verification based on infrared face images. In: ICCIT'07, 2007 International Conference on Convergence Information Technology, pp. 2066–2071.
Soyel, H., Demirel, H., 2011. Improved SIFT matching for pose robust facial expression recognition. In: FG'11, 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops, pp. 585–590.
Tong, Y., Wheeler, F.W., Liu, X., 2010. Improving biometric identification through quality-based face and fingerprint biometric fusion. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 53–60.
Viola, P., Jones, M., 2004. Robust real-time object detection. Int. J. Comput. Vision 57 (2), 137–154.
Yang, J., Yang, J., Zhang, D., Lu, J., 2003. Feature fusion: parallel strategy vs. serial strategy. Pattern Recognit. 36 (6), 1369–1381.
Yang, S., Zuo, W., Liu, L., Li, Y., Zhang, D., 2011. Adaptive weighted fusion of local kernel classifiers for effective pattern classification. In: ICIC'11, Proceedings of the 7th International Conference on Advanced Intelligent Computing, pp. 63–70.
Zheng, Y., Elmaghraby, A., 2011. A brief survey on multispectral face recognition and multimodal score fusion. In: 2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 543–550.