Pattern Recognition 60 (2016) 706–719
Learning discriminability-preserving histogram representation from unordered features for multibiometric feature-fused-template protection

Meng-Hui Lim, Sunny Verma, Guangcan Mai, Pong C. Yuen

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong

Article history: Received 28 December 2015; Received in revised form 23 April 2016; Accepted 21 June 2016; Available online 29 June 2016

Abstract

Multi-biometric feature-level fusion exploits feature information from more than one biometric source to improve recognition performance and template security. When ordered and unordered feature sets representing different biometric sources are involved, feature fusion becomes problematic. One way to mitigate this incompatibility problem is to transform the unordered feature sets into an ordered feature representation without sacrificing the discrimination power of the original features, so that a feature fusion on ordered features can subsequently be applied. Existing unordered-to-ordered feature transformation methods are designed for three-dimensional minutiae point sets and are mostly not adaptable to high-dimensional feature input. This paper proposes a feature transformation scheme that learns a histogram representation from an unordered feature set. Our algorithm estimates the component-wise correspondences among the sample feature sets of each user and then learns a set of bins per user based on the distribution of the mutually-corresponding feature instances. Given the learnt bins, the histogram representation of a sample can be generated by concatenating the normalized frequencies of the unordered features falling into the histogram bins. Experimental results on seven unimodal and three bimodal biometric databases show that our feature transformation scheme preserves the discrimination power of the original features more effectively than state-of-the-art transformation schemes.

Keywords: Histograms; Feature extraction; Biometrics; Learning; Face recognition; Fingerprint recognition

1. Introduction

Multi-biometric recognition uses more than one biometric modality, algorithm and/or sensor to achieve recognition [31]. Not only can multi-biometric recognition offer higher recognition accuracy than conventional single-biometric recognition, it also provides larger population coverage and higher system security against spoof attacks [23,32]. In multi-biometric recognition, information from multiple biometric sources can be fused at the feature level, forming a composite feature set for each individual. As this fusion approach exploits information at the level where most discriminability is preserved, it often yields better accuracy than fusion at other levels [37,40].

Biometric templates are typically stored in a protected form to prevent irrevocable compromise of biometrics [29]. When fusion is applied at the feature level, features are fused prior to protection so that a more informative multi-biometric secret can be derived for template protection. This allows one to achieve stronger security when a template protection scheme such as fuzzy commitment, fuzzy vault or fuzzy extractor is applied. Compared to fusion approaches that protect unimodal templates individually, the protected feature-fused template is more difficult to break than the multiple individually-protected templates [19,23].

A feature set extracted from each biometric source for feature fusion can be ordered or unordered, based on the characteristics of the set elements [16]. An ordered feature set is a feature vector or matrix containing ordered elements; examples include a subspace-projected feature vector, a histogram feature vector and a transformed feature matrix. An unordered feature set contains non-ordered elements, such as a fingerprint minutiae point set or a facial point descriptor set.

As feature extractors are usually developed specifically for a biometric source, the features extracted from different biometric sources could be incompatible.


This poses a great challenge when incompatible features need to be fused. Most biometric feature fusion schemes avoid fusing incompatible (ordered with unordered) features together. For instance, Chin et al. [6] extract ordered Gabor features from both fingerprint and palmprint modalities for fusion; Rattani et al. [30] fuse unordered Scale Invariant Feature Transform (SIFT)-based face features with unordered fingerprint minutiae points through concatenation; Huang et al. [11] extract ordered EigenFace and EigenEar features using Principal Component Analysis (PCA) for fusion; and Xing et al. [36] extract ordered features from face and gait modalities via coupled projections for fusion.

Although one could select feature extractors that produce only ordered or only unordered feature sets from distinct biometric sources for feature fusion, the optimal fusion performance can rarely be achieved this way, because the state-of-the-art feature sets of these biometric sources may not necessarily be of the same type. For instance, a state-of-the-art fingerprint representation to date is the unordered Minutia Cylinder Code (MCC) [5], while a state-of-the-art face representation is the ordered Monogenic Binary Code [38]. These two representations are incompatible in their present form and therefore cannot be fused at the feature level.

To obtain accuracy optimality in feature fusion, the flexibility to adopt any type of features for fusion is important. To realize this, one can either (1) propose a fusion method that works effectively on heterogeneous feature sets; or (2) propose a transformation that converts heterogeneous feature sets into homogeneous features for fusion, such that there is no significant degradation in the discrimination power of the converted feature sets. This paper focuses on the latter, so that a feature fusion technique that operates on multiple ordered feature sets, such as serial concatenation [11] or feature selection [23], can be applied, as shown in Fig. 1. To propose a feasible unordered-to-ordered transformation approach, a few challenges need to be tackled:

1.1. Feature alignment problem

The correspondence of unordered feature elements between the query and enroled samples, which is important for similarity assessment, is unknown. These element-wise correspondences are typically sought during matching via a brute-force comparison over the alignment possibilities, followed by a best-match selection [5,8]. However, when ordered feature-based fusion is concerned, the alignment of the transformed unordered feature elements is needed during fusion. This implies that the above-mentioned strategy cannot be applied, because the correspondence of unordered features between query and enroled samples has to be determined before matching. Hence, to transform an unordered feature set into an ordered set appropriately, one criterion is to ensure a clear correspondence of each transformed feature among different acquisitions, so that fusion and similarity assessment can be carried out semantically.

1.2. Variable set cardinality problem


An unordered feature set such as the fingerprint minutiae set can have an uncertain number of feature elements in every acquisition. The variability in the number of minutiae extracted in different acquisitions is mainly caused by missing or spurious feature points due to noise introduced during acquisition. To enable fusion of the transformed feature set with other ordered feature sets, a second criterion is to ensure that the transformed feature set always has a fixed size, regardless of the variable number of feature elements in different acquisitions.

Although several methods [1,10,22,24,33–35,37] have been developed to address these challenges, a major limitation is that these unordered-to-ordered feature transformation methods are designed specifically for three-dimensional minutiae points and are mostly not applicable to high-dimensional feature points. Most of these methods rely on operations/processes that are easy to apply in two- or three-dimensional space but are difficult to apply in high-dimensional space. Instances of these operations/processes include identification of clockwise orientation [34] and the Fourier transform [24,37].

The extraction of a histogram feature representation from an unordered set is a popular approach [1,10,33] because histogram features are inherently aligned and of a fixed length. A histogram feature can be extracted by counting the feature points that fall within a specific interval known as a bin. To extract a discriminative histogram feature representation, the main objective is to minimize intra-class variation and maximize inter-class variation. To achieve this objective, instances of the same feature element need to be enclosed within the same bin, while the average difference in the number of features between a genuine user and an imposter user in each bin has to be maximized.

In this paper, we explore a new perspective on deriving a discriminative histogram feature representation. Different from existing histogram-based methods that adopt a heuristic approach in constructing the histogram bins, we design a feature transformation scheme to learn a discriminative histogram representation from an unordered feature set without significantly deteriorating the discrimination power of the original unordered features. Our transformation scheme consists of two stages, as shown in Fig. 2. The first stage addresses the feature alignment problem by estimating the group correspondence for every unique feature that appears in the training set. The second stage addresses the variable set cardinality problem by learning a set of bins for each genuine user according to how the genuine and imposter link structures are distributed in the high-dimensional feature space. By ordering the bins, an ordered histogram representation can be extracted for each sample by counting the unordered feature elements that fall into these bins.

In summary, our contribution is two-fold: 1) we propose a learning-based feature transformation method to transform an unordered feature set into a histogram feature representation for the application of ordered feature fusion; and 2) we propose a discriminative measure for histogram representation, which is useful in guiding the learning of bins for deriving a discriminative representation for the target genuine user.


Fig. 1. A typical approach of transforming an unordered feature set into an ordered set for multi-biometric feature fusion.


Fig. 2. Learning histogram bins for unordered-to-ordered feature transformation in our transformation scheme.

The rest of this paper is organized as follows. In the next section, we review related work. In Section 3, we describe our unordered-to-ordered feature transformation scheme. In Section 4, the experimental results are reported and discussed. Finally, we draw concluding remarks in Section 5.

2. Literature review

Fuzzy commitment [12] and fuzzy extractor [9] are two of the most popular template protection schemes. Fuzzy commitment binds a binary key to a binary biometric representation, and the key can only be recovered if a similar binary biometric representation is presented. Fuzzy extractor directly transforms a binary biometric input into a stable binary string that can be used as an encryption key in cryptographic applications. In our context, both schemes take an ordered multi-biometric representation as input. When modalities that are represented by unordered features are fused with modalities that are represented by ordered features, an unordered-to-ordered feature transformation is required.

The existing unordered-to-ordered feature transformation methods are developed mainly for fingerprint minutiae. Most of them are not applicable when the three-dimensional minutiae input is substituted with high-dimensional feature points such as MCC and SIFT point descriptors. In the following, we revisit several recent methods.

Vij and Namboodiri [34] proposed a method that generates an ordered fixed-length representation from the minutiae points through extracting arrangement structures. Each arrangement structure is formed by the neighbouring points around a minutia, where geometric features such as relative distances, relative orientation and ratios of angles in the geometry formed by the minutia and two neighbouring minutiae are used. This arrangement structure is derived by concatenating features from multiple geometries around a minutia in a clockwise direction. However, it is not clear how such a clockwise orientation can be defined in a high-dimensional space in our context.

Sutcu et al. [33] and Nagar et al. [22] make use of random cuboids to derive a fixed-length binary representation from the minutiae points. To generate the bits of the binary representation, Sutcu et al. use the number of minutia points that fall into the random cuboids, while Nagar et al. use secondary features such as the average minutia coordinate, the standard deviation of minutiae coordinates, and the aggregate wall distance, which are extracted from the minutia points in the random cuboids. Although these methods are relatively easy to extend when the input points are high-dimensional, it is difficult to generate useful high-dimensional cuboids that contain a significant number of points from which ordered features can be extracted.

Bringer and Despiegel [4] identify a set of representative minutia vicinities (minutia neighbourhoods) and use the comparison scores with the query vicinities as features in generating the binary feature representation. Wong et al. [35] use kernel PCA based on a Gaussian kernel to perform the unordered-to-ordered feature transformation. Both these methods, however, require the training data to be stored in order to support feature transformation during

query, which would pose huge privacy risks.

Farooq et al. [10] proposed to generate a binary histogram representation from the minutiae points based on triangular features derived from minutiae triplets. These features include the side lengths, the relative difference in angle, and the height of the triangle formed by each minutiae triplet. This method incurs a very high computational cost, considering the exhaustive feature computation over a huge number of possible triplets.

Spectral transformation is another unordered-to-ordered feature transformation approach. Xu et al. [37] perform a Fourier transform on the minutiae set and re-map the Fourier magnitude spectrum onto polar-logarithmic coordinates. Nandakumar [24] suggested quantizing the phase spectrum of the minutiae set to obtain the binary representation. However, it is not clear how the Fourier transform can be applied to a high-dimensional feature set.

Ahmad et al. [1] project the minutiae points onto a line along the horizontal, vertical, and an optional orientation direction in a two-dimensional feature space. Histogram bins are then created by segmenting the projection line into multiple partitions. Finally, a histogram is produced by aggregating the number of projected points over the bins. It is not easy to extend this method to high-dimensional input features, because it is difficult, if not infeasible, to project high-dimensional features onto a one-dimensional line along a large number of directions.

Considering the restricted adaptability of existing methods to high-dimensional feature input, we propose a feature transformation approach that effectively derives a discriminative ordered feature representation from unordered high-dimensional features.

3. Our feature transformation approach

Our transformation approach converts an unordered feature set into a histogram feature representation. Different from the conventional histogram approach that pre-defines or randomly defines the histogram bins, we learn the histogram bins based on the feature correspondences estimated from the training set. More specifically, our objective is to learn a set of discriminative bins for each user, such that these bins tolerate the variation of the genuine user and the histograms generated from these bins discriminate the genuine user against the imposter users. By maximizing the discriminability of the histogram representation, the possibility of preserving the discriminability of the unordered features is thus maximized.

The difference between two histograms lies in the difference in the occurrence counts of unordered features over the histogram bins. To maximize the discrimination power of the histogram representation, the intra-class variation has to be minimized and the inter-class variation has to be maximized. To minimize the intra-user variation of a histogram representation, the same feature that appears over the genuine samples should be grouped into the same bin, so that such a feature contributes consistently to the same bin whenever it appears. To maximize the inter-user variation, the average difference in the number of different feature elements in each bin should be maximized between the imposter users and the genuine user. However, information about the component-wise feature correspondences across the samples of each user is not available, which makes the search for a set of discriminative bins challenging.

To address this problem, we propose a feature transformation scheme consisting of two stages: group-correspondence estimation and discriminative bin generation, as shown in Fig. 2. In the first stage, we estimate feature correspondences over the training samples and connect features of the same type with a graph called a

link structure. The objective is to establish a group correspondence that joins instances of the same feature in a link structure. The vertices in a link structure represent instances of a feature, and the links represent correspondences between the connected feature instances, as shown in Fig. 3. Ideally, the number of vertices in a link structure is determined by the frequency with which the associated feature appears over the training samples, while the number of link structures estimated in this stage is determined by the number of different features that appear over the training samples.

Fig. 3. An illustration of 3 link structures generated from 4 unordered sample feature sets (S1, S2, S3 and S4) of a user, where objects that have the same shape represent instances of a particular feature.

In the second stage, we learn a set of discriminative histogram bins for a target genuine user using the estimated link structures. We adopt an iterative approach to partition the high-dimensional feature space, such that every cut applied to the space warrants (1) minimum separation of genuine vertices that originate from the same link structure and (2) maximum average difference in the quantity of feature elements between the genuine user and an imposter user over the partitions (bins). Once the bins are obtained, a discriminative histogram representation can be generated for a genuine sample represented by an unordered feature set, based on the frequency of feature components falling within these bins.

3.1. Group-correspondence estimation

Let $P_1, \ldots, P_t$ be the unordered feature sets extracted from $t$ training samples of a user containing $n_1, \ldots, n_t$ features, where the labels of the features are not known. The objective is to identify which of the features are mutually corresponding, considering the entire set of features pooled over the samples.

3.1.1. Estimating pairwise correspondences

We begin by estimating the pairwise correspondence for every combination pair of samples of a user, from which the group correspondence is formed. We regard the estimation of pairwise correspondence for each sample pair as a matching problem with at most one potential correspondence per feature instance. For each sample pair $(i', i'')$ with $1 \le i' < i'' \le t$, we compute a (normalized) similarity matrix $S \in [0,1]^{n_{i'} \times n_{i''}}$ based on a distance metric, where

$$S = \left\{ S[f'][f''] \right\}_{f' \in [1, n_{i'}],\; f'' \in [1, n_{i''}]} \tag{1}$$

Based on $S$, a set of potential pairwise feature correspondences $C = \{ (f'_q, f''_q) \}_{q=1}^{n_{i'i''}} \subseteq P_{i'} \times P_{i''}$ can be determined using a greedy approach, as shown in Fig. 4, where $n_{i'i''} = \min\{ n_{i'}, n_{i''} \}$. As a feature in any sample can be involved in at most one correspondence, the indices of the features selected in any iteration are excluded from the forthcoming evaluation by being added to the respective index sets $F'$ and $F''$. Note that we slightly abuse notation by using $S[F'][F'']$ to denote the submatrix of $S$ containing the rows and columns specified by $F'$ and $F''$, respectively.

Once the $n_{i'i''}$ pairwise correspondences in $C$ are determined, a binary indicator matrix $X_{i'i''} \in \{0,1\}^{n_{i'} \times n_{i''}}$ is constructed to specify the matching configuration between the two feature sets $P_{i'}$ and $P_{i''}$, where $X_{i'i''}[f'][f''] = 1$ indicates a potential correspondence that has been sought between feature $f'$ of sample $i'$ and feature $f''$ of sample $i''$. Finally, the similarity matrix for the potential pairwise correspondences $\hat{S}_{i'i''} \in [0,1]^{n_{i'} \times n_{i''}}$ between samples $i'$ and $i''$ can be computed by

$$\hat{S}_{i'i''} = X_{i'i''} \circ S_{i'i''} \tag{2}$$

where '$\circ$' denotes the Hadamard (element-wise) product; only the similarity of the $n_{i'i''}$ pairwise correspondences is preserved in $\hat{S}_{i'i''}$.

Fig. 4. Pairwise-correspondence estimation algorithm.
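To make the greedy selection concrete, a minimal Python sketch is given below. This is our own illustration rather than the authors' implementation; it assumes the normalized similarity matrix S of (1) has already been computed, and the function name is ours.

```python
import numpy as np

def greedy_pairwise_correspondences(S):
    """Greedily select min(n1, n2) instance pairs from a normalized
    similarity matrix S (n1 x n2), using each row and column at most
    once, in the spirit of the algorithm in Fig. 4 (a sketch)."""
    S = np.asarray(S, dtype=float).copy()
    n1, n2 = S.shape
    pairs = []
    for _ in range(min(n1, n2)):
        # Pick the most similar remaining (f', f'') pair.
        f1, f2 = np.unravel_index(np.argmax(S), S.shape)
        pairs.append((int(f1), int(f2), float(S[f1, f2])))
        # Exclude the selected feature indices from further evaluation
        # (the role of the index sets F' and F'').
        S[f1, :] = -np.inf
        S[:, f2] = -np.inf
    return pairs

# Toy usage: 3 features in sample i', 4 features in sample i''.
S = np.array([[0.9, 0.1, 0.3, 0.2],
              [0.2, 0.8, 0.4, 0.1],
              [0.3, 0.2, 0.7, 0.6]])
print(greedy_pairwise_correspondences(S))
# -> [(0, 0, 0.9), (1, 1, 0.8), (2, 2, 0.7)]
```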


3.1.2. Merging corresponding feature instance pairs

Given the (corresponding) instance pairs estimated from different sample pairs of a user, instance pairs that involve a common feature of a sample can be merged to establish a link structure. As $n_{i'i''}$ instance pairs are taken from each sample pair in the pairwise-correspondence estimation, false positives may arise when there are fewer than $n_{i'i''}$ truly corresponding instance pairs in a sample pair. These false positives can be reduced by discarding instance pairs whose similarity is too low during the merging of instance pairs, using a similarity threshold $\tau$. It is noted that $\tau$ should not be fixed too high, to avoid miss detection of genuine corresponding instance pairs with lower similarity values.

As similarity is the only information used in estimating the pairwise correspondences, non-corresponding features could be erroneously joined as an instance pair during the pairwise-correspondence estimation. Most of these false positives can be detected during the merging of instance pairs. As an example, an ideal group correspondence (e.g., A1–B1–C1) consists of mutually corresponding instance pairs (e.g., A1–B1, A1–C1, B1–C1), where 'A1', 'B1' and 'C1' denote the first feature of samples A, B and C, respectively. However, as the estimation of pairwise correspondences is performed locally, a "linking collision" may occur when the instance pairs to be merged contain two different features from a single sample (e.g., A1–B1, A1–C1, B1–C2). Since only one feature from each sample can be included in a link structure, a false positive is detected whenever such a collision occurs, and the instance pair with the lower similarity is discarded from the merging.

The merging algorithm begins with N initial link structures, each containing one feature instance from the training samples. Let $\hat{S}$ denote the entire set of $\hat{S}_{i'i''}$ matrices over all combinations of $i'$ and $i''$ of a user. The algorithm initially selects the instance pair with the highest similarity within $\hat{S}$ and identifies the two link structures with which the elements of the selected instance pair are associated. The algorithm then checks whether a linking collision occurs between the two link structures; if there is no collision, it merges the two link structures together. The same process is repeated until the similarity threshold $\tau$ is reached, and the final list of M link structures $L$ representing the resulting group correspondences is obtained. Note that the number of vertices in a link structure implies the frequency with which instances of a feature appear over the training samples. In practice, the link structures can be composed of different numbers of vertices due to missing or/and spurious unordered feature elements in the training samples. The complete description of our pairwise-correspondence linking algorithm is given in Fig. 5.

Fig. 5. Pairwise-correspondence linking algorithm, where ψmax(Ŝ) returns the pair of (sample, feature) indices associated with the pairwise correspondence that has the highest similarity Ŝmax.
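The linking step of Fig. 5 can likewise be sketched with a union-find structure. This is our simplified illustration (all names are ours), assuming the candidate instance pairs have been pooled over all sample pairs together with their similarities from (2):

```python
def link_instance_pairs(pairs, tau):
    """Sketch of the pairwise-correspondence linking in Fig. 5.
    `pairs` is a pooled list of (u, v, sim) tuples, where u and v
    are (sample, feature) vertices."""
    parent = {}    # union-find parent per vertex
    members = {}   # root -> {sample: feature} of its link structure

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    # Process candidate pairs from highest to lowest similarity and
    # stop once the similarity threshold tau is reached.
    for u, v, sim in sorted(pairs, key=lambda p: -p[2]):
        if sim < tau:
            break
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        mu = members.setdefault(ru, {ru[0]: ru[1]})
        mv = members.setdefault(rv, {rv[0]: rv[1]})
        # Linking collision: both structures already contain a feature
        # from a common sample, so the lower-similarity pair is dropped.
        if mu.keys() & mv.keys():
            continue
        parent[rv] = ru
        mu.update(members.pop(rv))
    # Structures with more than one vertex; singletons are deemed
    # spurious, as in the toy example of Fig. 6.
    return [sorted(m.items()) for m in members.values() if len(m) > 1]
```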


Fig. 6. Estimation of group correspondences from 3 unordered sample feature sets (S1, S2 and S3) of a user, where the similarity threshold is preset to τ = 0.7.

A toy example illustrating how instance groups are extracted from three training samples of a user is given in Fig. 6, where the similarity threshold is preset to τ = 0.7. Objects of the same shape represent instances of a particular feature; in practice, the labels (shapes) of the features are unknown. Initially, four pairwise correspondences are formed per sample pair using the pairwise-correspondence estimation algorithm depicted in Fig. 4. In each sample pair, there is at least one instance pair that is incorrectly paired, as only 1, 3, and 2 features are truly corresponding in the first, second, and third sample pairs, respectively. These instance pairs are then linked using the algorithm depicted in Fig. 5. As a result, eight link structures are obtained. Among them, L[6] to L[8] are feature groups, where the points within each group are deemed mutually corresponding, while L[1] to L[5] are individual features that are deemed spurious. In fact, there is one false positive and one miss detection of the star feature in the final outcome, as the star feature in the first sample is more similar to the triangle feature (0.71 > τ) in the second sample than to its counterpart (0.68 < τ) in the third sample. Hence, the star feature is incorrectly tied to the triangle feature group, and the genuine star instance pair is erroneously discarded during the instance-pair linking.

3.2. Discriminative bin generation

With the link structures estimated in the first stage for all users, we can now proceed to derive a discriminative bin configuration. As the overall goal is to preserve the discriminability of the unordered feature sets upon transformation, we seek a bin configuration that minimizes the intra-user variation and maximizes the inter-user variation of the histogram representation for each genuine user.

3.2.1. The objective function

A reliable objective function is essential for deriving a discriminative bin configuration from a pool of configuration candidates. It should be a combined measure of intra-user variation and inter-user variation, indicating how well the vertices of each genuine link structure are enclosed within the same bin (intra-user variation) and how large the quantity difference is between the genuine and imposter link structures in each bin (inter-user variation). In the ideal case, all vertices of each genuine link structure should be enclosed within a common bin, so that these feature instances contribute to the count of the same bin whenever they appear, thus cancelling each other out in the computation of the frequency difference (minimizing the dissimilarity) of any two genuine histograms.

On the other hand, there should be a significant difference in the number of link structures (features) between the genuine user and every imposter user in every bin, thus enlarging the average frequency difference between the genuine histogram and any imposter histogram.

3.2.1.1. Histogram variation. We adopt the mean difference to measure the variation of two histograms drawn from a probability distribution of histograms. We express the mean difference between two histograms as a sum of mean differences over the individual bins, where the mean difference associated with each bin is defined as the mean difference in the frequency of unordered features falling within the corresponding bin between the two unordered sets (samples). Given two histograms $X_1 = \{ X_1[h] \}_{h=1}^{N_{bin}}$ and $X_2 = \{ X_2[h] \}_{h=1}^{N_{bin}}$ represented by two vectors of bin frequencies, the mean difference can be described as

$$V = E(|X_1 - X_2|) = \sum_{h=1}^{N_{bin}} \sum_{\alpha=0}^{N_1[h]} \sum_{\beta=0}^{N_2[h]} P(X_1[h] = \alpha)\, P(X_2[h] = \beta)\, |\alpha - \beta| = \sum_{h=1}^{N_{bin}} x_{2,h}^T Q\, x_{1,h} \tag{3}$$

where $x_{1,h} = \{ P(X_1[h] = \alpha) \}_{\alpha=0}^{N_1[h]}$, $x_{2,h} = \{ P(X_2[h] = \beta) \}_{\beta=0}^{N_2[h]}$, and $Q$ is an $N_2[h] \times N_1[h]$ Toeplitz matrix with entries $|\alpha - \beta|$; $N_{bin}$ denotes the number of histogram bins, and $N_1[h]$ and $N_2[h]$ denote the maximum number of features that could appear in bin $h$ for histograms $X_1$ and $X_2$, respectively.
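For illustration, once the per-bin probability vectors $x_{1,h}$ and $x_{2,h}$ are available, the per-bin term $x_{2,h}^T Q x_{1,h}$ of (3) can be evaluated directly. A minimal sketch (ours, assuming NumPy and SciPy):

```python
import numpy as np
from scipy.linalg import toeplitz

def histogram_variation(x1_bins, x2_bins):
    """Mean absolute frequency difference V of Eq. (3) (a sketch).
    x1_bins[h] and x2_bins[h] are the probability vectors
    {P(X1[h]=a)} and {P(X2[h]=b)} for bin h."""
    V = 0.0
    for x1, x2 in zip(x1_bins, x2_bins):
        # Q[b, a] = |a - b| is a Toeplitz matrix.
        Q = toeplitz(np.arange(len(x2)), np.arange(len(x1)))
        V += x2 @ Q @ x1
    return V

# Toy usage with one bin: user 1 holds 2 features with certainty and
# user 2 holds 1 feature with certainty, so V = |2 - 1| = 1.
print(histogram_variation([np.array([0.0, 0.0, 1.0])],
                          [np.array([0.0, 1.0])]))
```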


To compute the probability $P(X_1[h] = \alpha)$ of obtaining $\alpha$ features in an acquisition for the first user and the probability $P(X_2[h] = \beta)$ of obtaining $\beta$ features for the second user in (3), the occurrence probabilities of the different features in bin $h$ are required. As instances of each unique feature are represented by a link structure, the occurrence probability of an unordered feature can be estimated based on the number of vertices that the corresponding link structure has in bin $h$.

Let $S = \{ p_c \}_{c=1}^{N[h]}$ be the set of non-zero occurrence probabilities for the $N[h]$ link structures (features) of a user in bin $h$, where the link structures corresponding to these features have at least one vertex enclosed in bin $h$. Each probability element of $S$ can be estimated by

$$p_c = \frac{|L[c]_h|}{t} \tag{4}$$

where $|L[c]_h|$ represents the number of feature instances of link structure $L[c]$ that are enclosed in bin $h$. Since the occurrence probability of $\alpha$ particular features in bin $h$ can be computed as the product of the presence probabilities of the $\alpha$ link structures and the absence probabilities of the remaining $N[h] - \alpha$ link structures in bin $h$, such an occurrence probability can be computed for all different combinations of $\alpha$ link structures and summed to yield the occurrence probability of any $\alpha$ features in bin $h$ during an acquisition, such that

$$P(X[h] = \alpha) = p_1 p_2 \cdots p_{\alpha-1} p_\alpha (1 - p_{\alpha+1})(1 - p_{\alpha+2}) \cdots (1 - p_{N[h]}) + \cdots + (1 - p_1)(1 - p_2) \cdots (1 - p_{N[h]-\alpha})\, p_{N[h]-\alpha+1} p_{N[h]-\alpha+2} \cdots p_{N[h]-1} p_{N[h]} = \sum_{r=0}^{N[h]-\alpha} (-1)^r w_r\, \bar{p}_{\alpha+r}, \tag{5}$$

where

$$w_r = \frac{\binom{N[h]-\alpha}{r} \binom{N[h]}{\alpha}}{\binom{N[h]}{\alpha+r}}, \tag{6}$$

$$\bar{p}_x = \begin{cases} 1 & \text{for } x = 0 \\ \displaystyle\sum_{S_x \subseteq S} \Big( \prod_{p_c \in S_x} p_c \Big) & \text{for } x > 0 \end{cases} \tag{7}$$

and $S_x$ is a subset of $S$ containing $x$ probability components. For instance, $\bar{p}_1 = \sum_i p_i$, $\bar{p}_2 = \sum_i \sum_{j>i} p_i p_j$, and $\bar{p}_3 = \sum_i \sum_{j>i} \sum_{k>j} p_i p_j p_k$. However, the computation of $\bar{p}_x$ in (7) requires exhaustive multiplications over all $\binom{N[h]}{x}$ combinations of $x$ probability elements from $S$. When $N[h]$ (the number of possible features in bin $h$) is large (e.g., $N[h] = 30$), the computation of $P(X[h] = \alpha)$ in (5) would require the computation of $\bar{p}_x$ in (7) for $0 \le x \le N[h]$, incurring more than a billion multiplications, which is practically infeasible. To overcome this problem, we decompose $S$ into $n_s$ subgroups $\{ S_i \}_{i=1}^{n_s}$:

$$S = \bigcup_{i=1}^{n_s} S_i \tag{8}$$

such that the subgroup sizes $N_i$ sum to $N[h]$, i.e., $N[h] = \sum_{i=1}^{n_s} N_i$. With smaller subgroups, $P(X_i[h] = \alpha_i)$ in each subgroup can be computed according to (5) for $0 \le \alpha_i \le N_i$. Due to the smaller $N_i$, the entire set of subgroup probabilities is far more efficient to compute than $P(X[h] = \alpha)$ in (5). Assuming the occurrences of the unordered features to be mutually independent, $P(X[h] = \alpha)$ can be regarded as the concurrent occurrence of $n_s$ individual events (one event per subgroup). With this, $P(X[h] = \alpha)$ can be expressed as a product of the $n_s$ probabilities $P(X_i[h] = \alpha_i)$ for $1 \le i \le n_s$, summed over all combinations of $\{ \alpha_1, \ldots, \alpha_{n_s} \}$ that fulfil $\alpha = \sum_{i=1}^{n_s} \alpha_i$, such that

$$P(X[h] = \alpha) = \sum_{\{\alpha_i\}_{i=1}^{n_s} \in A} \prod_{i=1}^{n_s} P(X_i[h] = \alpha_i), \tag{9}$$

where $A$ represents the set containing all possible combinations of $\{ \alpha_i \}_{i=1}^{n_s}$ satisfying $\alpha = \sum_{i=1}^{n_s} \alpha_i$. Based on (9), we have developed an algorithm to compute $x_h = \{ P(X[h] = \alpha) \}_{\alpha=0}^{N[h]}$ in (3) efficiently, as shown in Fig. 7.
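Under the independence assumption, (9) amounts to convolving the count distributions of the subgroups, and each subgroup distribution can itself be built by convolving the per-feature Bernoulli distributions, which enumerates the same terms as (5). A minimal sketch of computing $x_h$ this way (our own illustration, not the algorithm of Fig. 7):

```python
import numpy as np

def count_pmf(probs):
    """pmf of the number of occurring features among independent
    occurrence probabilities `probs`; convolving the per-feature
    Bernoulli pmfs enumerates the same terms as Eq. (5)."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

def bin_count_pmf(subgroups):
    """x_h = {P(X[h]=a)} via the subgroup decomposition of Eq. (9):
    the overall count pmf is the convolution of the subgroup pmfs."""
    pmf = np.array([1.0])
    for Si in subgroups:
        pmf = np.convolve(pmf, count_pmf(Si))
    return pmf

# Toy usage: N[h] = 5 occurrence probabilities split into n_s = 2
# subgroups; the result has length N[h] + 1 and sums to 1.
print(bin_count_pmf([[0.5, 0.8], [0.4, 0.9, 0.3]]))
```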

3.2.1.2. Objective function. By making $P(X_1[h] = \alpha)$ and $P(X_2[h] = \beta)$ in (3) computationally efficient, the objective function to be maximized is given by

$$J = V_{inter} - V_{intra}, \tag{10}$$

where

$$V_{inter} = \frac{1}{I} \sum_{j \in U \setminus i} \hat{V}_{ij}, \tag{11}$$

$$V_{intra} = \hat{V}_{ii} \tag{12}$$

represent the inter- and intra-class components with $0 \le V_{intra}, V_{inter} \le 1$, respectively, where $I$ is the number of imposter users. For the inter-class component in (11),

$$\hat{V}_{ij} = \frac{V_{ij} - V_{ij}(\min)}{V_{ij}(\max) - V_{ij}(\min)} \tag{13}$$

denotes the variation $V$ in (3) between a genuine user $i$ and an imposter user $j$ from the population user set $U$, normalized using $V_{ij}(\max)$ and $V_{ij}(\min)$. The maximum variation $V_{ij}(\max)$ (resp. the minimum variation $V_{ij}(\min)$) corresponds to the setting where all link structures of genuine user $i$ are enclosed within a different bin (resp. the same bin) as the link structures of imposter user $j$. For the intra-class component defined in (12),

$$\hat{V}_{ii} = \frac{V_{ii} - V_{ii}(\min)}{V_{ii}(\max) - V_{ii}(\min)} \tag{14}$$

denotes the variation $V$ in (3) between two samples of a genuine user $i$, normalized using $V_{ii}(\max)$ and $V_{ii}(\min)$. The minimum variation $V_{ii}(\min)$ corresponds to the setting where all link structures of the genuine user $i$ are enclosed within a single bin, while the maximum variation $V_{ii}(\max)$ corresponds to the setting where each link structure is enclosed within a different bin.

3.2.2. The discriminative bin-configuration search

With the objective function in (10), we can now search for a set of high-dimensional histogram bins that maximizes $J$ for a target user, where a bin configuration with maximum $J$ groups the genuine feature instances into a common bin per feature with the largest probability and maintains the maximum mean quantity difference of features with the imposter users in every bin. A direct approach to obtaining such a bin configuration is to adopt a partitional clustering technique such as k-medoid [26] with the objective function in (10). However, due to potential changes in the cluster boundaries during the updating step, a re-computation of $J$ is required in each iteration. This induces a much higher computational complexity than the cluster-representative update of the original k-medoid clustering, which relies only on distance computations.


Fig. 7. Efficient computation of probability vector xh for the computation of histogram variation V in (3).

Fig. 8. The discriminative bin-configuration search algorithm.


Hence, the application of partitional clustering in this case may not be feasible when a huge number of iterations is required for clustering convergence. To overcome this issue, we adopt a dynamic programming approach. Instead of searching for the bins globally, we segment the feature space into partitions using local separators (one-dimensional cutpoints) sequentially, where each partition represents a high-dimensional histogram bin. To reduce the search space, our algorithm begins by binarizing the training data: we represent real-valued data with nst discrete states using equal-probable quantization [16], and these discrete states are then binarized using Linearly Separable SubCode (LSSC) encoding [15]. As the binary training data is high-dimensional, we define a separator candidate per dimension, where each candidate is an intermediate value between the binary values, e.g., 0.5. Initially, the separator candidates are evaluated one at a time on the Hamming feature space (without partitions). The separator candidate that creates the best partitions (achieves the highest objective J) on the feature space is selected, and these partitions form a bin configuration. In the next iteration, the remaining separator candidates are evaluated on the space that is partitioned with the previously-selected separator(s), and the bin configuration corresponding to the best separator candidate is taken. The same procedure is repeated until Niter iterations are reached, and the bin configuration that achieves the highest objective J among these iterations is chosen.

As the unordered features are not uniformly distributed, not all bins would contain at least one unordered feature point. To generate a compact histogram representation, the non-empty bins are identified and the corresponding index and mean of each are saved. These bins on the feature space serve as the final bin configuration for the target user. The pseudo-algorithm of the discriminative bin-configuration search is shown in Fig. 8.
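The sequential separator selection can be sketched as follows, with the objective $J$ of (10) abstracted as a callable; the function and parameter names are ours:

```python
def search_bin_configuration(candidates, objective, n_iter):
    """Sketch of the sequential separator search of Fig. 8: in each
    iteration, add the one-dimensional cut that maximizes the
    objective J of Eq. (10), and keep the best configuration seen.
    `objective(separators)` is assumed to score a configuration."""
    chosen, best_config, best_J = [], [], float("-inf")
    remaining = list(candidates)
    for _ in range(min(n_iter, len(remaining))):
        # Evaluate every remaining candidate on the currently
        # partitioned space and greedily keep the best one.
        J, c = max(((objective(chosen + [c]), c) for c in remaining),
                   key=lambda t: t[0])
        chosen.append(c)
        remaining.remove(c)
        if J > best_J:
            best_J, best_config = J, list(chosen)
    return best_config, best_J
```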

3.3. Generating a histogram representation

Given a biometric sample represented by an unordered set of feature components, the histogram representation of this sample can be extracted by counting the features that fall within the set of learnt bins, thus forming a fixed-length integer representation. If a component of a query feature set falls within an empty bin of the final bin configuration, this feature is associated to the nearest non-empty bin based on the means of the bins. To alleviate the effect of missing and spurious points in the query sample, the transformed representation is finally normalized by the total frequency of the histogram in both the enrolment and query phases. A minimum difference between two genuine histogram representations is obtained when the distributions of the query and enroled unordered features over the histogram bins are the same.
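As a rough sketch of this step (our simplification: every query feature is assigned to the nearest stored bin mean, which subsumes the empty-bin fallback described above):

```python
import numpy as np

def histogram_representation(features, bin_means):
    """Sketch of Section 3.3: count the features of an unordered set
    over the learnt bins and normalize by the total frequency."""
    counts = np.zeros(len(bin_means))
    for f in np.atleast_2d(features):
        # Nearest learnt (non-empty) bin by distance to its mean.
        h = int(np.argmin(np.linalg.norm(bin_means - f, axis=1)))
        counts[h] += 1.0
    return counts / max(counts.sum(), 1.0)

# Toy usage: 4 two-dimensional features against 3 learnt bin means.
feats = np.array([[0.1, 0.2], [0.9, 0.8], [0.85, 0.9], [0.2, 0.1]])
means = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
print(histogram_representation(feats, means))  # -> [0.5, 0.0, 0.5]
```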

Fig. 9. (a)–(g) The ROC performance of the transformation schemes with reference to that of the original schemes, where "[O]" and "[T]" denote the original scheme and the transformation scheme, respectively; every real-valued number in the legend denotes the normalized AUC value of the corresponding scheme.


Table 1. EER and AUC performance of the feature transformation schemes.

        EER                                AUC
        BoVW   CBH    MA     Proposed      BoVW   CBH    MA     Proposed
DB1     0.325  0.319  0.253  0.207         0.730  0.709  0.838  0.871
DB2     0.333  0.348  0.202  0.210         0.721  0.698  0.879  0.884
DB3     0.346  0.330  0.292  0.250         0.709  0.700  0.794  0.819
DB4     0.350  0.329  0.245  0.228         0.710  0.715  0.833  0.855
WVU     0.388  0.384  0.405  0.352         0.654  0.659  0.660  0.724
FERET   0.144  0.268  0.117  0.107         0.918  0.787  0.949  0.958
FRGC    0.441  0.461  0.377  0.376         0.593  0.559  0.686  0.681

4. Experimental evaluation

The experiments are divided into two parts. In the first part, we conduct a comparative study to evaluate the proposed transformation using seven unimodal biometric datasets, while the second part examines the performance of ordered feature fusion involving an ordered feature set and an unordered feature set using three bimodal biometric datasets. In the second part, the proposed feature transformation is applied to the unordered feature set prior to fusion.

4.1. Data sets

In the first part of the experiment, we adopt seven benchmark unimodal datasets: five fingerprint datasets (FVC2002 DB1-DB4 and WVU) and two face datasets (FRGC and FERET).

– FVC2002 DB1-DB4: Each of the four datasets [20] comprises 800 fingerprints of 100 subjects (8 impressions per subject). Fingerprints in DB1 to DB3 are captured using the optical sensor "TouchView II" by Identix, the optical sensor "FX2000" by Biometrika, and the capacitive sensor "100 SC" by Precise Biometrics, respectively, while fingerprints in DB4 are generated synthetically using Gabor-like space-variant filters.
– WVU: The adopted dataset is a subset of the fingerprint dataset of the WVU multimodal biometric dataset collection (release 1) [7], which contains a total of 576 fingerprint impressions with 8 impressions for each of the 72 subjects. These fingerprints are collected using the SecuGen optical fingerprint scanner.
– FRGC: The adopted dataset is a subset of the FRGC face dataset (version 2) [27], in which the images were collected under a controlled illumination condition. This dataset contains a total of 2124 images with 12 images for each of the 177 subjects.
– FERET: The adopted dataset is a subset of the FERET face dataset [28], in which the images were collected under varying illumination conditions and face expressions. It contains a total of 800 images with 8 images for each of the 100 subjects.

In the second part of the experiment, we use three bimodal datasets: a real bimodal biometric dataset (SDUMLA-HMT) and two virtual bimodal datasets (FRGC+DB3 and FERET+DB4).

– SDUMLA-HMT: The adopted dataset is a bimodal dataset [39] containing face and fingerprint images of 106 subjects. The face images are captured under different pose and expression variations, and the fingerprints are captured using the FPR620 optical fingerprint scanner. In this dataset, 8 near-frontal face images and fingerprint impressions per subject are used.


– FRGC+DB3: This virtual bimodal dataset is obtained by randomly linking 8 faces of 100 subjects from the FRGC face dataset [27] with the 8 fingerprints of 100 subjects from the FVC2002 DB3 dataset [20].
– FERET+DB4: This virtual bimodal dataset is obtained by randomly linking 8 faces of 100 subjects from the FERET face dataset [28] with the 8 fingerprints of another 100 subjects from the FVC2002 DB4 dataset [20].

4.2. Experimental settings

In the experiments, four samples are used for training and the remaining four (remaining eight for the FRGC dataset) are used for testing. We pre-process the face datasets by aligning the images based on standard face landmarks, cropping the face region and resizing the images to 61 × 73 for the FRGC and FERET datasets and 32 × 32 for the SDUMLA-HMT dataset, and finally applying histogram equalization to the images.

To evaluate the proposed transformation, we extract unordered features from the datasets, and these features serve as the input to the transformation. In the first part of the experiment, we extract from each fingerprint of the FVC2002 DB1-DB4 and WVU fingerprint datasets a 1280-dimensional unordered binary MCC [5] feature set using the Verifinger SDK [25] by Neurotechnology and the MCC SDK version 2.0 [3] by the Biometric System Laboratory at the University of Bologna, with the optimal MCC parameters from "MCCSdk1.4OptimalEnrolParameters.xml" [3]. As none of the transformation schemes in our evaluation supports the application of masks for the selection of valid MCC bits, we exclude the use of masks in our experiments. For the FRGC and FERET face datasets, we extract a 128-dimensional SIFT [17] unordered feature set from each face image using the code [18] provided by the original author.

In the second part of the experiment, where the face and fingerprint features are fused at the feature level, we apply a transformation to the unordered fingerprint features so that an ordered feature fusion can be applied to the transformed fingerprint features and the ordered face features. For all three bimodal datasets, we extract MCC features from the fingerprints and Linear Discriminant Analysis (LDA) features [2] from the faces. For LDA, the number of dimensions upon dimensionality reduction is fixed as the number of users minus one, i.e., 105 for SDUMLA-HMT and 99 for FRGC+DB3 and FERET+DB4. As a typical template protection scheme such as the fuzzy extractor or fuzzy commitment accepts only binary input, we convert both the LDA features and the non-binary transformed MCC features into their binary counterparts using equal-width quantization [16] and 20-bit LSSC encoding [15]. Subsequently, we adopt a bit-selection-based feature fusion technique [23] to fuse the unimodal binary features together, producing a 512-bit fused binary representation for each sample.

For the comparative measure, we use the Equal Error Rate (EER) and Area Under Curve (AUC), where the EER denotes the rate at which the false acceptance rate (FAR) and the false rejection rate (FRR) are equal, and the AUC denotes the normalized area under the receiver operating characteristic (ROC) curve. The lower the EER, and the higher the AUC, the better the performance. To measure the FAR, every test sample of each subject is matched against each test sample of every other subject. To measure the FRR, every test sample is matched against every other test sample of the same subject, for all subjects in the dataset. For the dissimilarity measure, the Manhattan distance is used to evaluate the dissimilarity of histograms generated using the proposed method in the first part of the experiments, while the Hamming distance is adopted to evaluate the dissimilarity of the binary fused feature vectors in the second part.
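For illustration, the binarization step can be sketched as equal-width quantization followed by LSSC encoding. Our reading of LSSC [15] is a thermometer-style code in which the Hamming distance between codewords equals the difference between their quantized states, so a 20-bit code corresponds to 21 quantization states; the helper names below are ours:

```python
import numpy as np

def equal_width_quantize(x, lo, hi, n_states):
    """Equal-width quantization of real features into n_states
    discrete states (a sketch of the step described in [16])."""
    idx = np.floor((np.asarray(x) - lo) / (hi - lo) * n_states)
    return np.clip(idx, 0, n_states - 1).astype(int)

def lssc_encode(state, n_states):
    """Sketch of LSSC [15] under our thermometer-code reading:
    state k maps to k ones followed by zeros."""
    bits = np.zeros(n_states - 1, dtype=int)
    bits[:state] = 1
    return bits

# Toy usage: one feature dimension quantized into 21 states, giving
# a 20-bit code per dimension.
s = equal_width_quantize([0.37], lo=-1.0, hi=1.0, n_states=21)
print(lssc_encode(int(s[0]), 21))
```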
4.3. Evaluated methods

4.3.1. Comparison with other transformation schemes

The proposed method is compared with three implementations of transformation algorithms that are extendable to high-dimensional feature input on all datasets. The algorithms below have been implemented as described in the corresponding papers, except for a few minor changes:


Table 2. EER and AUC performance of feature fusion using different feature transformation schemes.

             EER                                 AUC
             BoVW   CBH    MA      Proposed     BoVW   CBH    MA     Proposed
SDUMLA-HMT   0.139  0.190  0.118   0.086        0.935  0.897  0.953  0.974
FRGC+DB3     0.210  0.212  0.185   0.174        0.871  0.867  0.895  0.914
FERET+DB4    0.061  0.146  0.0237  0.0236       0.982  0.935  0.997  0.995
– Minutia Aggregate (MA) [22]: In this method, random cuboids are used to generate the binary bits based on minutiae secondary features such as the average minutiae coordinate, the standard deviation of minutiae coordinates, and the aggregate wall distance of minutiae. Here, the minutia orientation-related feature has been neglected because it is not applicable to high-dimensional spatial features. For the face datasets, 50 random cuboids are generated, of which 40 highly-overlapping cuboids are discarded. Given a 128-dimensional SIFT input feature set, 257 (128+128+1) bits are extracted from each of the remaining cuboids using the average minutia coordinate, the standard deviation of minutiae coordinates, and the aggregate wall distance in the cuboid, respectively. From a long feature vector of 2570 bits (10 cuboids × 257 bits), 1200 highly-correlated bits and 370 least-discriminative bits are discarded consecutively, producing the final 1000-bit binary representation for each face image. For the fingerprint datasets, 80 random cuboids are generated, of which 75 highly-overlapping cuboids are discarded. Given a 1280-dimensional MCC input feature set, 2561 (1280+1280+1) bits are extracted from each of the remaining cuboids using the aforementioned secondary features. From a long feature vector of 12805 bits (5 cuboids × 2561 bits), 8000 highly-correlated bits and 3805 least-discriminative bits are discarded consecutively, producing the final 1000-bit binary representation for each fingerprint impression.
– Cuboid-based Binary Histogram (CBH) [33]: This method is similar to MA, but uses the number of minutia points in the random cuboids as the features for bit generation (a sketch of this cuboid-counting idea is given after this list). For this method, 500 random cuboids are used for all datasets. By discretizing the quantity of unordered feature points in each cuboid into 1 bit based on the population mean, a final 500-bit binary representation is obtained for each sample by applying a specific order to the bits generated from the cuboids.
– Bag of Visual Words (BoVW) [21]: This is a transformation method in which an unsupervised clustering method is adopted to learn a dictionary, so that an ordered vector of occurrence counts of the unordered features can be constructed based on this dictionary. Instead of k-means or fuzzy c-means, k-medoid clustering [14] is adopted for dictionary learning because it can be applied more appropriately to binary data such as binary MCC features. In the experiments, k is fixed at 100 to construct a dictionary with 100 words and to generate a 100-dimensional discrete histogram representation per sample.
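The cuboid-counting idea behind CBH (and the counting part of MA) can be sketched as follows; the cuboid generation and the population means here are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def cuboid_counts(points, cuboids):
    """Count the unordered feature points inside each random cuboid,
    the quantity feature used by CBH."""
    return np.array([
        int(np.sum(np.all((points >= lo) & (points <= hi), axis=1)))
        for lo, hi in cuboids
    ])

def binarize_counts(counts, population_means):
    """Discretize each cuboid count into one bit against the
    population mean estimated on training data (assumed given)."""
    return (counts > population_means).astype(int)

# Toy usage: 30 two-dimensional points and 3 random cuboids given as
# (lower corner, upper corner) pairs.
pts = rng.random((30, 2))
cubs = [(rng.random(2) * 0.5, 0.5 + rng.random(2) * 0.5) for _ in range(3)]
print(binarize_counts(cuboid_counts(pts, cubs), np.array([5, 5, 5])))
```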

4.3.2. Comparison with baseline schemes

To further justify the ability of the transformed features to preserve the discrimination power of the original unordered features, we compare the above ordered-matching-based transformation schemes with the baseline scheme (without transformation) that is based on unordered matching. As the local similarity scores from matching two unordered sets can be consolidated in a few different ways, two variants of the baseline scheme are implemented based on different score consolidation techniques:

– Local Similarity Sort (LSS) [5]: This technique sorts the local similarities of all possible instance pairs between two feature sets and averages the top np local similarities in producing the final global similarity score.
– Local Similarity Assignment (LSA) [5]: This technique uses the Hungarian algorithm [13] to find a set of np local similarities that maximizes the global score upon local-similarity consolidation. Note that each feature can appear at most once in the instance pairs selected by the Hungarian algorithm, which is not guaranteed by LSS. A sketch of both consolidation strategies is given after this list.
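Both consolidation strategies admit short sketches (ours, assuming SciPy for the Hungarian assignment):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lss_score(S, n_p):
    """LSS consolidation (a sketch): average the n_p largest local
    similarities between two unordered feature sets."""
    return float(np.sort(S, axis=None)[::-1][:n_p].mean())

def lsa_score(S, n_p):
    """LSA consolidation (a sketch): a one-to-one assignment found by
    the Hungarian algorithm, consolidated over its n_p best pairs."""
    rows, cols = linear_sum_assignment(S, maximize=True)
    return float(np.sort(S[rows, cols])[::-1][:n_p].mean())
```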

In the experiments, the reported performance of LSS and LSA for each dataset corresponds to the best value of np selected over the range 1 ≤ np ≤ 20. Finally, the proposed method has been implemented with the following parameters throughout the experiments: normalized similarity threshold τ = 0.1 for accepting 90% of the corresponding instance pairs; maximum number of iterations Niter = 20 and number of subgroups of feature occurrence probabilities ns = 10 for an acceptable computational complexity; and number of discrete states nst = 6 for representing the training data sufficiently precisely in binary. It is noted that Niter can be increased to explore more options for a more discriminative bin configuration at lower efficiency; ns can be increased for higher efficiency; nst can be increased for a more precise but less compact representation; and τ can be decreased for fewer miss detections but more false positives.

4.4. Performance assessment

4.4.1. Part I: Feature transformation performance

The ROC performance of the transformation schemes, along with that of the original unordered-matching-based schemes, is shown in Fig. 9. The AUC and EER results of the transformation schemes are reported in Table 1 for the seven datasets. Among the transformation schemes, the proposed scheme achieves the best AUC and EER performance in five out of seven datasets. In terms of EER, the proposed scheme outperforms the second-ranked minutia aggregate scheme non-trivially, by 4.6% for DB1, 4.2% for DB3, 1.7% for DB4 and 5.3% for WVU. However, the GAR of the proposed scheme is lower than that of the minutia aggregate scheme at very low FAR on the DB1-DB4 and FRGC datasets. This could be due to the presence of a larger number of inaccurately-estimated link structures in the proposed scheme, which affects the quality of the transformed features in those datasets. Compared to the original schemes, i.e., LSS and LSA, it is worth noting that the proposed scheme preserves the discrimination power very well without introducing any significant degradation: it achieves better performance than LSS in all datasets except FERET and FRGC, and it outperforms LSA on the DB1, DB2 and DB4 datasets. These observations justify the stability of the proposed scheme in preserving the discrimination power of the unordered features compared to the other evaluated transformation schemes.


4.4.2. Part II: Feature fusion performance involving transformed features

The ROC performance of feature-fusion-based bimodal recognition using the evaluated transformation schemes is shown in Fig. 10, with reference to the performance of unimodal biometric recognition. The AUC and EER results corresponding to the transformation schemes are reported in Table 2 for the three fusion datasets. It is observed that feature fusion using the proposed transformation scheme achieves the best performance on the SDUMLA-HMT and FRGC+DB3 datasets. The proposed scheme outperforms the second-ranked minutia-aggregate transformation scheme by 3.2% EER and 2.1% AUC on the SDUMLA-HMT dataset, and by 1.1% EER and 1.9% AUC on the FRGC+DB3 dataset. On the FERET+DB4 dataset, the proposed scheme performs comparably to minutia-aggregate-transformation-based feature fusion, achieving 2.36% EER and 99.5% AUC. Compared to unimodal biometric recognition, the proposed scheme outperforms Face-LDA by 8.87% AUC and Fingerprint-MCC by 20.63% AUC on average. These improvements justify the feasibility of the proposed transformation scheme in feature fusion.

Fig. 10. The ROC performance of the bimodal feature fusion schemes using a transformation method with reference to that of the unimodal schemes, where "[Uni]" and "[Bi]" denote unimodal and bimodal biometric recognition, respectively; every real-valued number in the legend denotes the normalized AUC value of the corresponding scheme.

5. Conclusion

In this paper, we have proposed a feature transformation scheme that converts an unordered feature set into an ordered vector, enabling ordered feature fusion involving heterogeneous feature sets. This transformation scheme estimates group correspondences from both the genuine user's and the imposter users' unordered feature sets, and learns discriminative histogram bins for the target user using these group correspondences. Given a biometric sample represented by an unordered feature set, a histogram vector can be derived based on the quantity of unordered features falling into the histogram bins. We have conducted experiments on seven unimodal biometric datasets and three bimodal biometric datasets. Experimental results have shown that the proposed scheme achieves consistent performance improvement over well-known transformation methods, and have also demonstrated the ability of the proposed scheme to preserve the discrimination power of the original unordered features reliably. A future work in this line of research is to improve the robustness of the group-correspondence estimation of the proposed scheme in order to boost the feature transformation performance. As the existing unordered-to-ordered feature transformation schemes (including the proposed scheme) do not support the use of feature masks in ordered similarity assessment, another important future direction is to develop transformation methods that overcome this issue.

Acknowledgement
This project is partially supported by Hong Kong RGC General Research Fund HKBU211612.


Meng-Hui Lim received his Ph.D. degree from Yonsei University, South Korea, in 2012. He joined the Department of Computer Science of Hong Kong Baptist University as a postdoctoral research fellow for a year and was a research assistant professor at the same university from 2013 to 2016. His research interests include pattern recognition, cryptography and biometric security. He is a member of the IEEE.

Sunny Verma obtained his Bachelor's degree in 2010 and Master's degree in 2012 from the University of Delhi, India. He is currently a Ph.D. student at the Faculty of Engineering and Information Technology, University of Technology Sydney. From 2012 to 2014 he was a research assistant with the Department of Electrical Engineering, Indian Institute of Technology Delhi, India. He then joined the Department of Computer Science, Hong Kong Baptist University, Hong Kong, as a Senior Research Assistant from 2014 to 2015. His current research interests include biometric recognition, machine learning, and big data.

Guangcan Mai received the B.Eng. degree in computer science and technology from South China University of Technology, Guangzhou, China, in 2013. He is currently pursuing the Ph.D. degree at the Department of Computer Science, Hong Kong Baptist University, Hong Kong, and is a Visiting Scholar in the Pattern Recognition and Image Processing (PRIP) Lab, Department of Computer Science and Engineering, Michigan State University, USA. His research interests include biometric security, multibiometric template protection and feature fusion.

Pong C. Yuen received his B.Sc. degree in Electronic Engineering with First Class Honours in 1989 from City Polytechnic of Hong Kong, and his Ph.D. degree in Electrical and Electronic Engineering in 1993 from The University of Hong Kong. He joined Hong Kong Baptist University in 1993 and is currently a Professor and Head of the Department of Computer Science. Dr. Yuen was a recipient of the University Fellowship to visit The University of Sydney in 1996. In 1998, he spent a six-month sabbatical leave at The University of Maryland Institute for Advanced Computer Studies (UMIACS), University of Maryland at College Park. From June 2005 to January 2006, he was a visiting professor in the GRAVIR laboratory (GRAphics, VIsion and Robotics) of INRIA Rhone Alpes, France. Dr. Yuen was the director of the Croucher Advanced Study Institute (ASI) on biometric authentication in 2004 and the director of the Croucher ASI on Biometric Security and Privacy in 2007. He has been actively involved in many international conferences as an organizing committee and/or technical program committee member. He was the track co-chair of the International Conference on Pattern Recognition (ICPR) 2006 and the program co-chair of the IEEE Fifth International Conference on Biometrics: Theory, Applications and Systems (BTAS) 2012. Currently, Dr. Yuen is an Editorial Board Member of Pattern Recognition, an Associate Editor of IEEE Transactions on Information Forensics and Security, and a Senior Editor of the SPIE Journal of Electronic Imaging. He is also serving as a Hong Kong Research Grant Council Engineering Panel Member. Dr. Yuen's current research interests include video surveillance, human face recognition, biometric security and privacy.