Random subspace for an improved BioHashing for face authentication


Available online at www.sciencedirect.com

Pattern Recognition Letters 29 (2008) 295–300 www.elsevier.com/locate/patrec

Loris Nanni *, Alessandra Lumini
DEIS, IEIIT – CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Received 19 May 2006; received in revised form 1 March 2007; available online 13 October 2007
Communicated by T. Tan

Abstract

Verification based on tokenised pseudo-random numbers and user-specific biometric features has received much attention. In this paper, we propose a BioHashing system for automatic face recognition based on a Fisher-based feature transform, a supervised transform for dimensionality reduction that has proved very effective for the face recognition task. Since the dimension of the Fisher-transformed space is bounded by the number of classes minus 1, we use random subspace to create K feature spaces, which are concatenated into a new higher-dimensional space, in order to obtain a long and reliable "BioHash code".
© 2007 Elsevier B.V. All rights reserved.

Keywords: Face verification; BioHashing; Random subspace

1. Introduction

Establishing the identity of a person is a task of increasing importance in various areas of modern society, such as entrance control for buildings and restricted areas, authentication in day-to-day affairs like dealing with the post office, and detection of a suspect in a particular crime in the field of criminal investigation. Biometrics, which measures a physiological or behavioural characteristic of a person, such as voice, face, fingerprints, and iris, provides an effective way to solve the problems faced by traditional methods such as passwords and IC cards. One drawback of biometrics is that when a biometric is compromised a new template cannot be assigned. Substantial research is ongoing to find solutions or alternatives to contemporary non-reissuable biometrics. Some authors, like Bolle et al. (2002) and Davida et al. (1998), have introduced the terms cancellable biometrics and private biometrics. In (Jin et al., 2004), in order to solve the problem of high false rejection and to obtain a cancellable biometric,

* Corresponding author. Tel.: +39 3493511673. E-mail address: [email protected] (L. Nanni).

0167-8655/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2007.10.005

a novel two-factor authenticator based on iterated inner products between tokenised pseudo-random numbers (generated from a hash key) and user-specific fingerprint features is presented; in this way a set of user-specific compact codes, named the "BioHash code", can be produced. In (Teoh et al., 2004) a dual-factor authenticator is proposed, based on iterated inner products between tokenised pseudo-random numbers and user-specific facial features generated by Fisher Discriminant Analysis, again producing a set of user-specific compact codes. BioHashing is theoretically motivated in (Teoh et al., 2006). Unfortunately, when an "impostor" B steals the hash key or the pseudo-random numbers of A and tries to authenticate as A, the performance of BioHashing-based methods (e.g. Teoh et al., 2004; Jin et al., 2004) may be lower than that obtained using only the biometric data (Duda et al., 2001; Lumini and Nanni, 2007). In a recent paper (Kong et al., 2006) the authors highlighted the anomalies of the Base BioHashing approach and concluded that the claim of having achieved a zero equal error rate (EER) rests upon the impractical hidden assumption that the hash key is never stolen. Moreover, they proved that in a more realistic scenario, where an impostor steals


the hash key, the results are worse than when using the biometric alone.
In (Lumini and Nanni, 2007) we proposed an improved BioHashing where more projection spaces were exploited to generate more BioHash codes per user. The verification task was performed by training a classifier for each BioHash code and finally combining these classifiers by the SUM rule. Moreover, we showed that by combining the matcher trained using the biometric features with the matcher based on the BioHash code we could obtain a more robust system.
In this paper we propose a BioHashing system for automatic face recognition based on a Fisher-based feature transform. The Fisher feature transform is a supervised transform for dimensionality reduction that has proved very effective for the face recognition task (Bellhumer et al., 1997). Unfortunately, it is well known that a Fisher-based feature transform yields a feature space whose dimension is bounded by the number of classes of the training set minus 1, which is a strong limitation in a BioHashing approach. To overcome this constraint, our proposal is to use random subspace to create K feature spaces, to project each of these spaces onto a lower d-dimensional space by a Fisher-based feature transform, and finally to concatenate the K reduced spaces into a new space of K × d dimensions, in order to obtain a long and reliable "BioHash code". In this paper, we test Fisherface (FIS) and Non-Linear Fisher (NLF) (Loog et al., 2001), both implemented as in the PRTools 3.1.7 toolbox (http://130.161.42.18/prtools/). While the random subspace method has been applied to various machine learning tasks (e.g. on-line signature verification, Nanni, 2006), it has received less attention in face recognition than, for instance, boosting (Gao and Wang, 2006).
Wang and Tang (2004, 2006) applied a variant of the random subspace method using Linear Discriminant Analysis (Bellhumer et al., 1997) as the base classifier. In Gao and Wang (2006), instead of boosting in the original feature space, whose dimensionality is usually very high, multiple feature subspaces with lower dimensionality were randomly generated and boosting was carried out in each random subspace; the trained classifiers were then combined with a simple fusion method.
The work is organized as follows: in Section 2 the Base BioHashing and the Improved BioHashing are briefly reviewed; in Section 3 the proposed face recognition system is detailed; in Section 4 the experimental results are discussed; finally, in Section 5 some concluding remarks are given.

2. BioHashing

BioHashing (Jin et al., 2004) generates a vector of bits starting from the biometric feature set and a seed which represents the hash key. The biometric feature vector x ∈ R^N is reduced to a bit vector b ∈ {0,1}^m (m ≤ N) via uniformly distributed pseudo-random numbers generated from a secret seed K (the hash key), as follows:

(1) Given K, generate a sequence of real numbers to produce a set of vectors r_i ∈ R^N, i = 1, …, m. Check that they are linearly independent, discarding dependent ones.
(2) Apply the Gram–Schmidt ortho-normalization procedure to transform the basis r_i into an ortho-normal basis or_i, i = 1, …, m.
(3) Compute the inner products ⟨x | or_i⟩, i = 1, …, m, and compute b_i (i = 1, …, m) as

b_i = 0 if ⟨x | or_i⟩ ≤ τ,  b_i = 1 if ⟨x | or_i⟩ > τ,

where τ is a preset threshold.

The resulting bit vector b, which we name the "BioHash code", is compared by the Hamming distance for similarity matching. In Lumini and Nanni (2007) we suggested using more projection spaces to generate more BioHash codes per user; we called this method Improved BioHashing. Let k be the selected number of projection spaces: the BioHashing method is iterated k times on the same biometric vector in order to obtain k bit vectors b_i, i = 1, …, k (SPACES AUGMENTATION). The verification task is performed by training a classifier for each BioHash code and finally combining these classifiers by a fusion rule (we suggest the SUM rule). Moreover, we proposed to normalize each biometric vector by its module (NORMALIZATION) and to use several values for τ instead of a fixed one (τ VARIATION). In Fig. 1 a schema of the Improved BioHashing is reported; for a detailed explanation, please see Lumini and Nanni (2007). All the experiments reported in this paper have been performed using the Improved BioHashing approach with the following parameter configuration: τmax = 0.1, τmin = −0.1, p = 5, k = 5.

3. System proposed

We propose to use random subspace to create K feature spaces. The random subspace method (RS) is the combining technique proposed by Ho (1998).
This method modifies the training data set (generating K new training sets), builds classifiers on these modified training sets, and then combines them into a final decision rule. The new training sets contain only a subset (50% in this paper) of all the features. We project each of these spaces onto a lower d-dimensional space by a Fisher-based feature transform and concatenate the reduced spaces to obtain a long and reliable "BioHash code". In this way, the dimension of the pseudo-random vectors is N = K × d. In this paper, we test as Fisher-based feature transforms the Fisherface (FIS) and the Non-Linear Fisher (NLF) (Loog et al., 2001), both implemented as in the PRTools 3.1.7 toolbox (http://130.161.42.18/prtools/). Fisher mapping is a dimensionality reduction technique based on the optimization of the between class scatter


Fig. 1. The improved BioHashing method (from Lumini and Nanni, 2007).
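As a concrete illustration, the Base BioHashing steps reviewed in Section 2 can be sketched as follows. This is an illustrative Python sketch, not the authors' Matlab/PRTools code; the function names and the use of a Gaussian pseudo-random generator followed by QR factorization (which performs the Gram–Schmidt-style orthonormalization) are our own assumptions:

```python
import numpy as np

def biohash(x, m, seed, tau=0.0):
    """Base BioHashing: map a biometric vector x in R^N to an m-bit code
    (m <= N) using pseudo-random projections derived from a secret seed."""
    rng = np.random.default_rng(seed)            # the seed plays the role of the hash key
    R = rng.standard_normal((x.size, m))         # m pseudo-random vectors r_i in R^N
    Q, _ = np.linalg.qr(R)                       # orthonormalize them (Gram-Schmidt step)
    projections = x @ Q                          # inner products <x | or_i>, i = 1..m
    return (projections > tau).astype(np.uint8)  # threshold at tau -> BioHash code b

def hamming_distance(b1, b2):
    """Similarity matching between two BioHash codes."""
    return int(np.sum(b1 != b2))
```

With the same key (seed), the same user always produces the same code, so genuine comparisons yield small Hamming distances; a stolen key reduces the scheme to the underlying biometric, which is the scenario evaluated in Section 4.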

Fig. 2. System proposed in this paper: during training, random subspace generation produces subspaces 1, …, K; each subspace is reduced by a Fisher-based feature transform; the K reduced spaces are concatenated and passed to the BioHashing stage.
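The pipeline of Fig. 2 can be sketched as follows. This is an illustrative Python sketch under our own simplifying assumptions: a minimal eigendecomposition-based Fisher mapping stands in for the PRTools FIS/NLF implementations, and the function names are ours:

```python
import numpy as np

def fisher_transform(X, y, d):
    """Minimal Fisher mapping: project onto the top-d eigenvectors of
    Sw^{-1} Sb. d is bounded by (number of classes - 1)."""
    n_feat = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((n_feat, n_feat))              # within-class scatter
    Sb = np.zeros((n_feat, n_feat))              # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    Sw += 1e-6 * np.eye(n_feat)                  # regularize so the inverse exists
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    return vecs.real[:, order[:d]]               # (n_feat x d) projection matrix

def fit_random_subspace_fisher(X, y, K, d, rng):
    """K random 50% feature subsets, each reduced to d dims by the Fisher mapping."""
    n_feat = X.shape[1]
    models = []
    for _ in range(K):
        idx = rng.choice(n_feat, n_feat // 2, replace=False)
        models.append((idx, fisher_transform(X[:, idx], y, d)))
    return models

def project(x, models):
    """Concatenate the K d-dim projections into a (K*d)-dim feature vector,
    which then feeds the (Improved) BioHashing stage."""
    return np.concatenate([x[idx] @ W for idx, W in models])
```

The concatenated vector has dimension K × d, which is what lifts the "number of classes minus 1" bound on a single Fisher space.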

matrix with respect to the within scatter matrix, so as to better discriminate patterns of different classes. It has been proven (Bellhumer et al., 1997) that the Fisher transform is particularly well suited to classification tasks such as face recognition. The Non-Linear Fisher mapping has been proposed (Loog et al., 2001) to deal with the drawbacks of the Fisher transform: the Fisher transform usually gives a poor approximation of the patterns in multi-class problems, because it tends to emphasize large class distances to the detriment of very close classes. Therefore, in the Non-Linear Fisher mapping a weight factor based on the error function is introduced in the definition of the between class scatter matrix, to lower the influence of the large class distances. The feature vector obtained by concatenating the K Fisher-transformed features has a dimension (N = K × d) large enough to allow a long and reliable "BioHash code" to be obtained, which is not possible using

a single Fisher-based feature vector, whose dimension is bounded by the number of classes minus 1. In Fig. 2 the global approach is shown: the biometric feature vector obtained by concatenating the K Fisher-based reduced spaces is used to create a "BioHash code" by the Improved BioHashing approach.

4. Experiments

All the tests have been conducted on the ORL¹, YALE-B² and FERET (Wang and Tang, 2006) datasets, which are three of the most used benchmarks in this field.

ORL: It consists of 400 different images related to 40 individuals.

¹ http://www.uk.research.att.com/facedatabase.html.
² http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html.


YALE-B: It consists of images related to 10 individuals; we use only the frontal poses (108 images per individual).
FERET: It was introduced in the third FERET evaluation (September 1996) and contains images of 1196 individuals (taken over a long period under varying acquisition conditions). It consists of one training set (1196 frontal images, one for each individual) and four test sets:
• Dup I: 722 images of individuals taken on different days – duplicate images.
• Dup II: 234 images of individuals taken over a year apart.
• FB: 1195 images of individuals taken on the same day with the same lighting.
• FC: 194 images of individuals taken on the same day with different lighting.
For the verification task of the methods based solely on biometric data we adopt a 1-Nearest Neighbor classifier (Duda et al., 2001). Moreover, before the random subspace step the data are pre-processed by the Karhunen–Loève Transform (Duda et al., 2001): the gray level values are projected onto a 100-dimensional space. For the ORL and YALE-B datasets, to minimize possible misleading results caused by the choice of training data, the results have been averaged over five experiments, all conducted using the same parameters. For each experiment we randomly resampled the learning and test sets (each containing half of the patterns), maintaining the distribution of the patterns in the classes (individuals). For the FERET dataset we used the given training set and evaluated the approaches on the four test sets. For the performance evaluation we adopt the equal error rate (EER) (as in Lumini and Nanni, 2007; Duda et al., 2001). The EER is the error rate at which the frequency of fraudulent accesses equals the frequency of rejections of people who should be correctly verified (Franco et al., 2006). Table 2 reports the results of the comparison among the different methods tested in this paper.
Each method is defined by three parameters:
• BioHash indicates whether BioHashing is performed; it can assume the values "YES" or "NO". "YES" means that the classification is performed by Improved BioHashing; "NO" means that the classification is performed by the 1-Nearest Neighbor classifier.
• RS indicates whether random subspace creation is performed; it can assume the values "NO", "RS1" or "RS2". "NO" means that we do not use random subspace; "RS1" and "RS2" mean that we use random subspace. In order to obtain a fair comparison, with a similar

3 Implemented as in the PRTools 3.1.7 MATLAB toolbox, http://130.161.42.18/prtools/.

Table 1
Parameter settings for the different approaches tested in this work

Method (RS)   ORL              YALE-B           FERET
NO            d = 39           d = 9            d = 50
RS1           K = 5, d = 20    K = 11, d = 9    K = 5, d = 20
RS2           K = 10, d = 20   K = 25, d = 9    K = 10, d = 20

length of the feature vector on the three datasets, we fix the parameter configurations reported in Table 1. Please note that the configuration "RS2" is aimed at evaluating the performance that may be obtained using a very long BioHash code.
• Features indicates which feature extraction has been adopted; "FIS" denotes the Fisher mapping, "NLF" denotes the Non-Linear Fisher mapping.
Please note that all the experiments are carried out under the worst-case, and very unlikely, hypothesis that an impostor always (in each match) steals the hash key. Our best results have been obtained by BioHash = "YES"; RS = "RS2"; Features = "NLF"; we call this combination BIOHRS. BIOHRS dramatically improves the performance of BioHashing (i.e. RS = "NO") and of a matcher trained using the biometric features (i.e. BioHash = "NO"), even under the worst-case and very unlikely hypothesis that an impostor always steals the hash key. We want to stress that whenever the impostor does not steal the hash key, the EER of BIOHRS on these datasets was always 0.
Other interesting results are:
• The simple BioHashing approach (configuration BioHash = "YES"; RS = "NO"; Features = "FIS"), under the worst-case hypothesis that an impostor always steals the hash key, performs worse than the pure biometric (configuration BioHash = "NO"; RS = "NO"; Features = "FIS"); however, this is no longer true under the best-case hypothesis that nobody steals the hash key, where the performance of the BioHashing approach reaches 0 EER.
• The performance of the pure biometric methods (BioHash = "NO") increases with the number of random subspaces (RS = "RS2" vs. RS = "RS1"): this is probably due to the fact that the number of subspaces in "RS1" is not sufficient to create a robust ensemble of matchers. The problem of choosing the number of subspaces in random subspace is not novel in the literature; it is also discussed in (Wang and Tang, 2004, 2006).
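As a reference for how the reported EER figures are obtained, the EER can be computed from genuine and impostor similarity scores by sweeping a decision threshold until the false acceptance and false rejection rates meet. This is a generic sketch under our own assumptions; the paper does not specify its exact threshold-sweeping procedure:

```python
def equal_error_rate(genuine, impostor):
    """Return the EER given similarity scores (higher = more similar)
    for genuine attempts and impostor attempts."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

When FAR and FRR do not intersect exactly at a sampled threshold, the midpoint of the closest pair is returned, a common practical convention.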
The training time on the ORL dataset for the Fisher mapping feature extraction is 2 s, and it grows to 10 s using random subspace with K = 10 and d = 20. The verification time using a BioHash key of length 20 is 0.1 × 10⁻⁴ s; the verification time using the method BIOHRS is 0.1 × 10⁻³ s. All the experiments were carried out on a


Table 2
Face verification EER obtained using the following parameters: τmax = 0.1, τmin = −0.1, p = 5, k = 5. DUP1, DUP2, FB and FC are the four FERET test sets.

BioHash   RS    Features   ORL     YALE    DUP1    DUP2    FB     FC
NO        NO    FIS        2.07    0.8     17      21      6.5    14
NO        NO    NLF        2.65    0.8     15      18      6      13
NO        RS1   FIS        2.9     0.23    17.2    20      6.2    14.6
NO        RS1   NLF        2.78    0.24    16.3    20      7      13.6
NO        RS2   FIS        2       0.35    16      18      6      13.2
NO        RS2   NLF        1.48    0.23    12.9    13.5    5      10.5
YES       NO    FIS        3.48    0.55    17.5    20.5    7.9    15
YES       NO    NLF        2.15    0.53    14.8    17.1    6.5    12.4
YES       RS1   FIS        3.6     0.26    16      18      6      12
YES       RS1   NLF        1.94    0.26    13      15      5.8    11.5
YES       RS2   FIS        3       0.3     16      19      4      9.4
YES       RS2   NLF        0.82    0.23    11.9    11      4.7    10

Table 3
Average Q-statistic and average EER of the single BioHash-key classifiers in the BIOH (BioHash = "YES", RS = "NO") and BIOHRS approaches

          BIOH                            BIOHRS
Dataset   Q (NLF / FIS)   EER (NLF / FIS)   Q (NLF / FIS)   EER (NLF / FIS)
ORL       0.95 / 0.8      4.7 / 14.1        0.8 / 0.65      4 / 9
YALE      0.96 / 0.96     3.4 / 3.1         0.95 / 0.95     0.6 / 1.5
DUP1      0.92 / 0.9      19 / 25           0.95 / 0.92     18 / 22
DUP2      0.91 / 0.85     20 / 25           0.94 / 0.92     19 / 22
FB        0.85 / 0.8      14 / 22           0.88 / 0.85     13.5 / 20
FC        0.83 / 0.8      15 / 21           0.86 / 0.85     14.5 / 19.5

Pentium 1.6 GHz with 512 MB RAM, using Matlab 6.5.1, without code optimization.
We motivate the effectiveness of our method using the average Q-statistic (Kuncheva and Whitaker, 2003; Lumini and Nanni, 2007). Yule's Q-statistic is a measure for evaluating the independence of classifiers: for two classifiers Di and Dk the Q-statistic is defined as

Qi,k = (ad − bc) / (ad + bc)

where a is the probability of both classifiers being correct, d is the probability of both classifiers being incorrect, b is the probability that the first classifier is correct and the second is incorrect, and c is the probability that the second classifier is correct and the first is incorrect. Qi,k varies between −1 and 1 and is 0 for statistically independent classifiers. It is well known in the literature that a good ensemble is obtained by combining statistically independent classifiers, each of which obtains a low error rate (Kuncheva, 2005). In Table 3 we report the average Q-statistic and average EER obtained by each single hash key used in the BIOH (BioHash = "YES", RS = "NO") and BIOHRS approaches.
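The pairwise Q-statistic above can be computed directly from two classifiers' per-sample correctness. This is an illustrative sketch: counts stand in for the probabilities a, b, c, d (which leaves the ratio unchanged), and returning 0 when the denominator vanishes is our own convention:

```python
def q_statistic(correct_i, correct_k):
    """Yule's Q-statistic for two classifiers, from boolean vectors marking
    the test samples on which each classifier was correct."""
    a = sum(ci and ck for ci, ck in zip(correct_i, correct_k))          # both correct
    b = sum(ci and not ck for ci, ck in zip(correct_i, correct_k))      # only D_i correct
    c = sum(ck and not ci for ci, ck in zip(correct_i, correct_k))      # only D_k correct
    d = sum(not ci and not ck for ci, ck in zip(correct_i, correct_k))  # both incorrect
    denom = a * d + b * c
    return (a * d - b * c) / denom if denom else 0.0  # 0 when undefined (our convention)
```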


The classifiers in BIOHRS have an average Q-statistic slightly higher than the classifiers in BIOH, but the classifiers in BIOHRS have a lower average EER. Moreover, the BioHash codes obtained by the Non-Linear Fisher mapping have a considerably lower average EER than those obtained by the Fisher mapping.

5. Conclusions

We have proposed a new BioHashing approach (BIOHRS), based on the several spaces obtained by random subspace for augmenting the length of the hash code, which gains a performance improvement with respect to the method based solely on biometric data, even in the worst case when an impostor always steals the hash key. Whenever the hash key is not stolen, the EER of BIOHRS was 0. As future research, we intend to investigate the use of different sets of features less sensitive to illumination conditions, such as Gabor features (Zhang et al., 2005) or Local Binary Patterns (Ahonen et al., 2004), and other methods for dimensionality reduction. Moreover, we want to study the fusion between the matcher trained using the biometric features and the


matchers trained using the BioHash code. Preliminary results reported in Lumini and Nanni (2007) show that these two matchers can be combined to obtain a more robust system.

Acknowledgements

This work has been supported by the European Commission IST-2002-507634 Biosecure NoE project. Portions of the research in this paper use the FERET database of facial images collected under the FERET program, sponsored by the DOD Counterdrug Technology Development Program Office.

References

Ahonen, T., Hadid, A., Pietikainen, M., 2004. Face recognition with local binary patterns. In: Proc. European Conference on Computer Vision, pp. 469–481.
Bellhumer, P.N., Hespanha, J., Kriegman, D., 1997. Eigenface vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Machine Intell. 17 (7), 711–720.
Bolle, R.M., Connel, J.H., Ratha, N.K., 2002. Biometric perils and patches. Pattern Recognition 35, 2727–2738.
Davida, G., Frankel, Y., Matt, B.J., 1998. On enabling secure applications through off-line biometric identification. In: Proc. Symposium on Privacy and Security, pp. 148–157.
Duda, R., Hart, P., Stork, D., 2001. Pattern Classification. Wiley, New York.
Franco, A., Lumini, A., Maio, D., Nanni, L., 2006. An enhanced subspace method for face recognition. Pattern Recognition Lett. 27, 76–84.
Gao, Y., Wang, Y., 2006. Boosting in random subspaces for face recognition. In: Proc. ICPR 2006.
Ho, T.K., 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Machine Intell. 20 (8), 832–844.
Jin, A.T.B., Ling, D.N.C., Goh, A., 2004. BioHashing: Two factor authentication featuring fingerprint data and tokenised random number. Pattern Recognition 37 (11), 2245–2255.
Kong, B., Cheung, K., Zhang, D., Kamel, M., You, J., 2006. An analysis of BioHashing and its variants. Pattern Recognition 39 (7), 1359–1368.
Kuncheva, L.I., 2005. Diversity in multiple classifier systems. Inform. Fusion 6 (1), 3–4.
Kuncheva, L.I., Whitaker, C.J., 2003. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learn. 51, 181–207.
Loog, M., Duin, R.P.W., Haeb-Umbach, R., 2001. Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. Pattern Anal. Machine Intell. 23, 762–766.
Lumini, A., Nanni, L., 2007. An improved BioHashing for human authentication. Pattern Recognition 40 (3), 1057–1065.
Nanni, L., 2006. Experimental comparison of one-class classifiers for online signature verification. Neurocomputing 69 (7–9), 869–873.
Teoh, A.B.J., Ngo, D.C.L., Goh, A., 2004. An integrated dual factor authenticator based on the face data and tokenised random number. In: Proc. ICBA 2004, LNCS 3072, pp. 117–123.
Teoh, A.B.J., Ngo, D.C.L., Goh, A., 2006. Random multispace quantization as an analytic mechanism for BioHashing of biometric and random identity inputs. IEEE Trans. Pattern Anal. Machine Intell. 28 (12), 1892–1901.
Wang, X., Tang, X., 2004. Random sampling LDA for face recognition. In: Proc. IEEE Computer Society Conf. on CVPR.
Wang, X., Tang, X., 2006. Random sampling for subspace face recognition. Internat. J. Comput. Vision 70 (1), 91–104.
Zhang, W., Shan, S., Gao, W., Chen, X., Zhang, H., 2005. Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition. In: Proc. 10th Internat. Conf. on Computer Vision.