Symmetric keyring encryption scheme for biometric cryptosystem

Information Sciences 502 (2019) 492–509 Contents lists available at ScienceDirect Information Sciences journal homepage: www.elsevier.com/locate/ins...



Yen-Lung Lai a, Jung Yeon Hwang b, Zhe Jin a, Soohyong Kim b, Sangrae Cho b, Andrew Beng Jin Teoh c,∗

a School of Information Technology, Monash University Malaysia, Malaysia
b Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea
c School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea

Article info

Article history: Received 4 January 2019; Revised 22 May 2019; Accepted 26 May 2019; Available online 28 May 2019

Keywords: Biometrics; Fingerprint; Face; Symmetric encryption; Locality-sensitive hashing; Keyring model

Abstract

In this paper, we propose a novel biometric cryptosystem for vectorial biometrics called symmetric keyring encryption (SKE), inspired by Rivest's keyring model (2016). Unlike conventional biometric secret-binding primitives, such as fuzzy commitment and fuzzy vault approaches, the proposed scheme reframes the biometric secret-binding problem as a fuzzy symmetric encryption problem using a concept called a resilient vector pair. In this study, this pair resembles the encryption–decryption key pair in symmetric key cryptosystems. This scheme is realized using an index of maximum hashed vectors, a special instance of the ranking-based locality-sensitive hashing function. With a simple filtering mechanism and an [m, k] Shamir's secret-sharing scheme, we show that SKE, both in theory and in an empirical evaluation, can retrieve the exact secret with overwhelming probability for a genuine input yet negligible probability for an imposter input. Although SKE can be applied to any vectorial biometrics, we adopt fingerprint and face vectors in this work. Experiments were performed using the Fingerprint Verification Competition (FVC) and Labeled Faces in the Wild (LFW) datasets. We formalize and analyze the threat model for SKE, which involves several major security attacks.

© 2019 Published by Elsevier Inc.

∗ Corresponding author: A.B.J. Teoh. https://doi.org/10.1016/j.ins.2019.05.064. 0020-0255/© 2019 Published by Elsevier Inc.

1. Introduction

Conventional security systems rely on a deterministic model to ensure information security [32]. This model works efficiently on the assumption that a legitimate user always presents a constant, unique and specifiable cryptographic key. However, in practice, situations arise in which human and other factors undermine exactness and uniqueness in a cryptosystem. A deterministic security system may not work when fuzziness is involved. This has prompted the growth of error-tolerant cryptographic systems, which are considered to be more feasible under circumstances in which deterministic models do not apply.

As a motivating example, in order to back up emails, a user may utilize the keywords in the subject line of the email. These keywords are then used as an encryption key to encrypt the content of the email, and the encrypted entity is stored safely in the cloud. When email recovery is required, the user should be able to perform decryption using the keywords of the email title as the decryption key. However, the keywords are often not identical to those used in the encryption phase, due to typos or forgotten phrases. Error tolerance with respect to the differences between the encryption and decryption keys then becomes an important property for successful and secure email recovery, and error-tolerant cryptographic systems come to the rescue in this case.

A well-known realization of error-tolerant cryptographic systems is fuzzy identity-based encryption (FIBE), as outlined by Sahai [41]. FIBE utilizes a descriptive attribute set $X$ as an identity to decrypt a ciphertext that was encrypted under $X'$, provided that $X$ and $X'$ are close enough. As an example, the chairperson of an electrical engineering department might want to encrypt a document to all members of a hiring committee. The document would be encrypted to the identity {"hiring committee", "member", "EE Engineering"}, and any party with an identity that contains all of these attributes would be able to decrypt the document. The advantage of FIBE is that the document can be stored on a simple, untrusted server, rather than requiring a trusted server to perform authentication checks before delivering a document.

The integration of cryptographic schemes and biometrics is a special instance of error-tolerant cryptographic systems. This is because biometrics are prone to diverse variations and distortions, due to the presence of inherent and environmental noise. Two instances of biometric data from the same subject are rarely identical; it is thus impossible to exploit biometrics directly as encryption/decryption keys in conventional cryptographic schemes, where both keys need to be exactly the same. However, this problem does not exist in error-tolerant cryptographic systems, as these schemes tolerate differences in the biometric keys for encryption and decryption. This field of study is known as biometric cryptosystems or biometric encryption [44].
Biometric cryptosystems are devised to protect secrets such as cryptographic keys by either binding/retrieving secrets with biometrics or by generating secrets from biometrics directly, rather than using passwords or tokens as secrets as is common in conventional cryptographic systems. Fuzzy commitment [21], fuzzy vaults [20] and fuzzy extractors [13,31] are three primitive instances of biometric cryptosystems. Biometric cryptosystems have been deployed in practice by GenKey, a Netherlands-based company, for elections and digital healthcare sectors, mainly within emerging economies [2]. GenKey products have been independently evaluated and have been shown to have a level of accuracy that is comparable to conventional biometric systems. The most notable deployment of biometric cryptosystems for face biometrics so far is an application in a watch-list scenario for a self-exclusion program for Ontario gaming sites [7].

In this paper, we introduce a novel biometric cryptosystem for vectorial biometrics (a biometric representation that appears in a fixed-size vector form) and secret binding, called symmetric keyring encryption (SKE). SKE is an error-tolerant symmetric encryption construct that is motivated by the recently developed Rivest's keyring model [40]. The key ingredient of Rivest's keyring model is a concept called the resilient vector (RV) pair, which is analogous to the random key pair in conventional symmetric cryptosystems. We illustrate the idea as follows. Let Alice encrypt a secret using an RV $\phi_x$, which is derived from a sample set $x$, known as a keyring [40]. One instance of a keyring is biometrics. The model allows either Alice herself or another user Bob to decrypt the secret via $\phi_{x'}$, which is similar to $\phi_x$. Decryption succeeds if the collision probability of $\phi_{x'}$ and $\phi_x$ exceeds a preset threshold value $\tau$. In Rivest's keyring model, the collision probability is quantified by $S(x, x')$, where $S(\cdot, \cdot) \in [0, 1]$ is a similarity measure between $x$ and $x'$. However, anyone who attempts to decrypt the secret with a keyring $x'$ that is distinct from $x$ will certainly fail if $S(x, x') < \tau$.

This paper is organized as follows. In Section 2, we present two preliminaries that are relevant to SKE: index of maximum (IoM) hashing and Rivest's keyring model. In Section 3, SKE is described in terms of the binding and retrieving phases, and Section 4 elaborates the resilience properties of SKE. The threat model and the associated security analyses are presented in Section 5. In Section 6, we provide a thorough empirical evaluation and analysis of SKE based on the fingerprint and face vectors generated from the Fingerprint Verification Competition (FVC) benchmark and the Labeled Faces in the Wild (LFW) datasets, respectively. We also provide a comparison with other competing biometric cryptosystems. In Section 7, we show that SKE satisfies the four template protection criteria. Finally, concluding remarks are given in Section 8.

1.1. Related works

In this section, we review several biometric cryptosystem primitives that are relevant to our scheme. One of the more notable works is by Juels and Wattenberg, who put forward the concept of fuzzy commitment [21]. Consider a person with two different but close biometric readings $x \in \mathbb{F}_q^n$ and $x' \in \mathbb{F}_q^n$ over a finite field $\mathbb{F}_q$ of prime power order $q$. For fuzzy commitment, binary biometric features (i.e. $q = 2$) are used to generate an offset $\delta \in \mathbb{F}_2^n$, where $\delta = x \oplus x'$. In this sense, the Hamming weight of the offset $\|\delta\|$ equals the Hamming distance between $x$ and $x'$.

During enrollment, given a random codeword $c \in \mathbb{F}_2^n$, which encodes a secret in $\mathbb{F}_2^k$, we can conceal $c$ by generating a secure sketch $ss = c \oplus x$ with $x$. The codeword $c$ is then hashed with a one-way hash function $H$ and stored along with $ss$ as a commitment $(H(c), ss)$. In the verification stage, given $(H(c), ss)$, we can compute the corrupted codeword as $c' = ss \oplus x' = \delta \oplus c$ with $x'$. Using a decoding algorithm $\mathrm{Decode}_t$ with error tolerance capacity $t$, if $\|\delta\| \le t$, then $c$ can be recovered from $c'$, i.e. $\mathrm{Decode}_t(c') = c$. Finally, the decommitment is deemed successful if $H(c) = H(\mathrm{Decode}_t(c'))$.

The fuzzy commitment scheme is conceptually simple and its error tolerance mechanism relies solely on error correction codes (ECCs) such as Bose–Chaudhuri–Hocquenghem (BCH) and Reed–Solomon codes. The security of the fuzzy commitment scheme is attributed to the reconstruction complexity of the codeword $c$, provided an adversary has no knowledge of $c$ or $x$ from $H(c)$ based on a random oracle model (ROM). One practical shortcoming of fuzzy commitment is the difficulty of providing a rigorous proof of security over the nonuniformity of $x$.
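The commit and decommit steps of fuzzy commitment described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: a 5-fold repetition code stands in for the BCH or Reed–Solomon codes named in the text, SHA-256 stands in for $H$, and all function names (`encode`, `decode`, `commit`, `decommit`) are hypothetical.

```python
import hashlib
import secrets

REP = 5  # assumption: repetition factor of the stand-in error correction code


def encode(secret_bits):
    """Repetition-code encoder (a stand-in for BCH or Reed-Solomon)."""
    return [b for b in secret_bits for _ in range(REP)]


def decode(codeword_bits):
    """Majority-vote decoder; corrects up to REP // 2 bit flips per block."""
    blocks = [codeword_bits[i:i + REP] for i in range(0, len(codeword_bits), REP)]
    return [1 if sum(block) > REP // 2 else 0 for block in blocks]


def xor(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [p ^ q for p, q in zip(a, b)]


def digest(bits):
    """One-way hash H of a bit list (SHA-256 is an assumption)."""
    return hashlib.sha256(bytes(bits)).hexdigest()


def commit(x, secret_bits):
    """Enrollment: return the commitment (H(c), ss) with ss = c XOR x."""
    c = encode(secret_bits)
    return digest(c), xor(c, x)


def decommit(x_query, commitment):
    """Verification: return the secret if H(c) = H(Decode(c')), else None."""
    h_c, ss = commitment
    c_noisy = xor(ss, x_query)          # c' = ss XOR x' = delta XOR c
    recovered = decode(c_noisy)
    return recovered if digest(encode(recovered)) == h_c else None
```

A query that differs from the enrolled bit string in at most two positions per 5-bit block is decoded back to the committed secret, whereas a query that is far from the enrolled string fails the hash check and returns `None`.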


Another major biometric cryptosystem primitive that also involves an error tolerance property is the fuzzy vault [20]. Let $k \le L$. Given an unordered set $x = \{x_1, \dots, x_L\}$ where $x_i \in \mathbb{F}_q$, such as fingerprint minutiae, a secret $s \in \mathbb{F}_q^k$ is encoded as the coefficients of a polynomial $f(\cdot)$ of order $k - 1$, and a genuine set $G = \{(x_1, f(x_1)), \dots, (x_L, f(x_L))\}$ can then be generated. The fuzzy vault protects the polynomial $f$ by concealing $G$ in a vault $V$. This can be performed by mixing in $r$ chaff entries using a chaff set $C = \{(x_{c(1)}, y_{c(1)}), \dots, (x_{c(r)}, y_{c(r)})\}$, i.e. $V = G \cup C$. These chaff entries should be randomly generated and should not lie on $f$. The chaff set $C$ is meant to prevent an adversary from distinguishing $G$ from $C$.

For secret retrieval, given a query set $x' = \{x'_1, \dots, x'_L\}$ where $x'_i \in \mathbb{F}_q$, the vault is unlocked, leading to the revelation of $f$, if $x'$ is sufficiently close to $x$ based on an error tolerance over the set difference. Here, we explain the details of standard polynomial reconstruction via polynomial interpolation over an unlocking set $U = \{(x_j, f(x_j))\}_{j=1}^{t} \subseteq V$ of size $t$. $U$ can be generated by identifying the overlapped entries of $x$ and $x'$ across the vault $V$ of size $|V| = L + r$. In practice, $U$ may also contain chaff entries due to the errors in the noisy query set $x'$. Therefore, the actual number of overlapped entries of $x$ and $x'$ is $t \le L$. If $t \ge k$, there are at least $t - k \ge 0$ redundant entries in $U$ to allow error correction. Thus, $t$ is referred to as the error tolerance capacity. The secret can then be retrieved by decoding the unlocking set; otherwise, the decoding will always fail.

Ideally, the mixing of genuine and chaff sets should be uniformly random to ensure maximum security. However, this assumption does not hold in practice, and over the years, several attempts have been made to rectify this issue. For instance, Clancy et al. [10] devised a criterion for the minimum distance for random placement of the chaff set from the genuine set in order to eliminate overlapping. This approach was later criticized, since in this case the chaff set could be easily recognized and would be prone to a statistical attack, which has a lower complexity than a brute-force attack [8]. Merkle et al. [34] observed that even when it is generated following the minutia distribution, the chaff set is unlikely to reside in those regions that are frequently occupied by the genuine set. In another work, Li et al. [29] suggested generating a 'chaff set descriptor' based on the distribution of the minutia descriptor, to give a closer chaff–genuine distribution around the associated minutia. However, this in fact contradicts the uniformity principle of a genuine–chaff set mixture. The use of additional information to generate the chaff set, such as the orientation of the minutia point, was proposed by Nandakumar et al. [38], but another study [33] revealed that minutia points in close proximity tend to have a similar orientation, allowing an adversary to filter out the chaff set from the genuine set.

The brute-force security of the fuzzy vault approach is another key concern, and is attributed to the hardness of the polynomial reconstruction. This is related to the list-decoding problem of enumerating valid solutions over a specific overlapping requirement ($t$) between $x$ and $x'$ (see [16] for a detailed survey). Mihailescu [35] pointed out that in practice, fuzzy vaults are prone to low brute-force security. An attack can systematically search through all subsets of $\mathbb{F}_q^k$, compute the polynomial reconstruction for each subset and verify its correctness, which requires $\binom{|V|}{k} / \binom{t}{k}$ trials on average. The low brute-force security may be due to the limited number of chaff entries that can be added ($r$), the limited number of minutia points ($L$), and the constraint on the polynomial order ($k - 1$). For instance, Nandakumar et al. [38] and Li et al. [29] reported their realizations as having brute-force complexity of around $2^{32}$ and $2^{35}$, respectively, i.e. relatively low values in practice.
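To make the brute-force estimate concrete, the ratio $\binom{|V|}{k} / \binom{t}{k}$ can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the parameter values are assumptions chosen for demonstration; they are not the settings used in [29] or [38].

```python
from math import comb, log2


def expected_bruteforce_trials(vault_size, overlap, k):
    """Average number of k-subsets tried: C(|V|, k) / C(t, k)."""
    return comb(vault_size, k) / comb(overlap, k)


# Hypothetical parameters for illustration only (not taken from the cited works):
# L genuine points, r chaff points, polynomial of order k - 1.
L, r, k = 24, 200, 9
trials = expected_bruteforce_trials(L + r, L, k)   # here t is taken equal to L
security_bits = log2(trials)                       # brute-force security in bits
```

Increasing `r` or `k` raises `security_bits`, which reflects the observation in the text that the limits on chaff entries, minutia points and polynomial order bound the achievable brute-force security.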

1.2. Motivation and contributions

In this work, we treat the biometric secret-binding problem as a symmetric encryption–decryption problem. Based on this, we devise an SKE that comprises Shamir's secret-sharing scheme, RV pairs and a simple filtering mechanism. In our approach, an RV pair is used that resembles a symmetric key pair in symmetric cryptosystems, and this is realized by the hashed enrolled and query biometric vectors. IoM hashing is used for its property of preserving the similarity between different inputs [19]. Though both fuzzy vault and SKE use a polynomial to bind a secret, the genuine set of an SKE (which is analogous to the plain text in symmetric key systems) is encrypted by the RV, unlike the chaff set that is required for concealment of the genuine set in the fuzzy vault approach. Hence, secret retrieval is merely the reverse operation, i.e. decryption by means of the query RV followed by polynomial interpolation.

As discussed in the previous section, resilience is another major concern of biometric cryptosystems due to the noisy nature of biometrics. In our scheme, we show that strong resilience can be achieved in SKE with an overwhelming probability for a genuine biometric input, based only on IoM hashing and a simple filtering mechanism during decryption (Section 4).

Although SKE is a biometric cryptosystem, it inherits the legacy of symmetric key cryptosystems. Hence, its threat model includes attacks on classical biometric cryptosystems such as brute-force and false-accept attacks. In addition, we consider the notion of tag indistinguishability, which is adopted from the concept of key privacy or anonymity (i.e. being unable to differentiate which key is being used for encryption) that is often used in public key cryptosystems [1] and fuzzy identity-based cryptosystems [28]. In our context, since a biometric input is unique and is associated with an individual over a lifetime, tag indistinguishability offers strong privacy protection to a user, allowing them to remain anonymous while encryption is achieved via SKE. We will analyze these aspects formally and comprehensively in Section 5.

To sum up, the contributions of this paper are fourfold:


1. We put forward a novel, simple biometric secret-binding scheme for vectorial biometrics called SKE, which is based on the notion of symmetric key cryptosystems.
2. We demonstrate an innovative use of IoM hashing that allows us to generate abundant IoM hashed entries as genuine entries in SKE, without being restricted by the original biometric vector size. Hence, the probability of failure of secret retrieval via Shamir's secret-sharing mechanism due to corrupted genuine entries can be significantly reduced. This unique trait also allows us to choose a higher order polynomial, which is beneficial in strengthening the brute-force and false-accept securities.
3. We formalize and analyze the threat model of SKE, which consists of three major security threats, i.e. the brute-force and false-accept attacks and tag indistinguishability, and show how SKE can resist these attacks with different parameterization.
4. Although SKE was motivated as an error-tolerant symmetric encryption construct, it also serves as a biometric template protection (BTP) scheme [3,30]. We therefore show that SKE satisfies the four design criteria of BTP, namely noninvertibility, unlinkability, revocability and performance (Section 7).

Although SKE is generic for any vectorial biometrics, we demonstrate the realization of SKE with fingerprint and face vectors. We present a comprehensive set of empirical results with five subsets of fingerprint benchmarks, using FVC 2002, FVC 2004 and FVC 2006, and an LFW dataset.

2. Preliminaries

In this section, we give a brief introduction to Rivest's keyring model and IoM hashing.

2.1. Rivest's keyring model

The keyring model is an error-tolerant symmetric cryptosystem that was proposed by Rivest [40]. The keyring refers to a "bag of keywords" that is distributed in a pair to both sender and receiver. This model relies on the keyword-matching game, which is favorable for two similar keyrings. Each keyring can be represented as a binary vector in which each bit "1" refers to a particular keyword, corresponding to its hashed value over the universe set $U$. For Alice and Bob with similar keyrings $x \in U$ and $x' \in U$, respectively, random vectors with length $m$ such as $\phi_x = (\varphi_{x(1)}, \dots, \varphi_{x(m)}) \in U^m$ and $\phi_{x'} = (\varphi_{x'(1)}, \dots, \varphi_{x'(m)}) \in U^m$ can be generated through their respective resilient set vectorizer (RSV), $(x, m, N) \to \phi_x$ and $(x', m, N) \to \phi_{x'}$, with a shared nonce $N$ (e.g. public random numbers). The random vector pair $\{\phi_x, \phi_{x'}\}$ can then be employed to encrypt or decrypt a secret with an ECC.

Formally, an RSV is a set vectorizer with length $m$ that has the property that, for any two keyrings $x$ and $x'$, the probability that $\phi_x$ and $\phi_{x'}$ agree at a given position is $P = J(x, x') = |x \cap x'| / |x \cup x'|$, where $J(\cdot, \cdot)$ is the Jaccard similarity coefficient. It follows that $t \sim \mathrm{Bin}(m, P)$, where $t$ is the number of positions at which $\phi_x$ and $\phi_{x'}$ agree. More explicitly, if a fraction $P$ of $x \cup x'$ is shared, then the number of positions where $\phi_x$ and $\phi_{x'}$ agree follows a binomial distribution parameterized by $m$ and $P$. In contrast, the number of disagreeing positions can be characterized by the Hamming weight of the offset $\|\delta\| = \|\phi_x \oplus \phi_{x'}\|$. Its expected value is $E(\|\delta\|) = m(1 - J(x, x'))$. In his proposal, Rivest put forward a way of playing the keyword-matching game using min-hashing [4] as the RSV.

Rivest's keyring model resembles fuzzy commitment in the sense that Alice conceals a codeword $c = (c_1, \dots, c_m) \in U^m$ with $\phi_x$ via a secure sketch $ss = c \oplus \phi_x = (c_1 \oplus \varphi_{x(1)}, \dots, c_m \oplus \varphi_{x(m)})$. Bob would then be able to identify the nearest codeword in a reverse manner from $c' = ss \oplus \phi_{x'} = c \oplus \phi_x \oplus \phi_{x'} = \delta \oplus c$. With a suitable ECC over a tolerance capacity $t$, Bob is expected to retrieve the secret with a high probability if $t > E(\|\delta\|) = m(1 - J(x, x'))$.

2.2. Index-of-Max hashing

IoM hashing [19] is a special instance of locality-sensitive hashing (LSH) [9] that uses a feature vector $x \in \mathbb{R}^d$ over a continuous domain $\mathbb{R}^d$.
Let $Y \in \mathbb{R}^{q \times d}$ be a random projection matrix composed of $q$ Gaussian row vectors, each of which follows a standard normal distribution $\mathcal{N}(0, 1)$. Furthermore, let $N = (Y_1, \dots, Y_m)$ be a set of independent random projection matrices. Given $x$, $m$, and $N$, IoM hashing is defined as a mapping function $\mathrm{IoM}(x, m, N): \mathbb{R}^{d} \times (\mathbb{R}^{q \times d})^{m} \to \mathbb{F}_q^m$. The operation of $\mathrm{IoM}(x, m, N)$ is as follows:

1. Record the index of the maximum value, $\varphi_{x(i)} = \arg\max \langle Y_i, x \rangle \in \{0, 1, \dots, q - 1\}$, where $\langle \cdot, \cdot \rangle$ is the inner product, taken between $x$ and each row of $Y_i$.
2. Repeat Step 1 for $i = 1, \dots, m$, yielding $\phi_x = (\varphi_{x(1)}, \dots, \varphi_{x(m)})$.

In other words, for prime $q$, IoM hashing embeds $x$ onto a $q$-dimensional Gaussian random subspace to output a random integer over a prime field $\mathbb{F}_q$ corresponding to the index of the maximum value. This process is repeated with $m$ independent Gaussian random matrices to yield a collection of $m$ independent IoM entries, $\varphi_{x(i)} \in \mathbb{F}_q$, for $i = 1, \dots, m$.

Both min-hashing (as suggested in Rivest's keyring model) and IoM hashing essentially follow the spirit of LSH, and both strive to ensure that two vectors $x$ and $x'$ with high similarity render a higher probability of collision (number of agreed entries) between their output vectors $\phi_x$ and $\phi_{x'}$, respectively. Conversely, if $x$ and $x'$ are far apart, this will result in a low probability of hash collision between $\phi_x$ and $\phi_{x'}$. Specifically, let $\phi_x = (\varphi_{x(1)}, \dots, \varphi_{x(m)})$ and $\phi_{x'} = (\varphi_{x'(1)}, \dots, \varphi_{x'(m)})$ be the enrolled and query IoM hashed vectors, respectively. We can calculate the similarity of $\phi_x$ and $\phi_{x'}$ by measuring their number of collisions $t = \sum_{i=1}^{m} X_i$, where $X_i \in \{0, 1\}$ is an independent and identically distributed (i.i.d.) variable such that $X_i = 1$ if $\varphi_{x(i)} = \varphi_{x'(i)}$ and 0 otherwise. Formally, we have $P(X_i = 1) = P(\varphi_{x(i)} = \varphi_{x'(i)}) = S(x, x')$, where

$$S(x, x') = \frac{1}{q} + \sum_{j=1}^{\infty} a_j(q) (\cos\theta)^j$$

is defined as the similarity function of IoM hashing with $\cos\theta = \frac{x \cdot x'}{\|x\| \|x'\|}$, and $a_j(q)$ is the coefficient that satisfies $\frac{1}{q} + \sum_{j=1}^{\infty} a_j(q) = 1$ (see [14], Lemma 3).

In this paper, IoM hashing $\mathrm{IoM}(x, m, N)$ is portrayed as an instance of RSV, where $N$ corresponds to the public nonce in Rivest's keyring model and $t \sim \mathrm{Bin}(m, S(x, x'))$ with $E(t) = mS(x, x')$.
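The two-step IoM procedure and the collision count $t$ can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions: the helper names (`make_nonce`, `iom_hash`, `collisions`), the seeds, and the toy sizes $d = 32$, $q = 5$, $m = 300$ are hypothetical, and indices are 0-based.

```python
import random


def make_nonce(m, q, d, seed):
    """Public nonce N: m independent q x d matrices with N(0, 1) entries."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(q)]
            for _ in range(m)]


def iom_hash(x, nonce):
    """Return the hashed vector: the index of the largest projection per matrix."""
    hashed = []
    for matrix in nonce:
        projections = [sum(w * v for w, v in zip(row, x)) for row in matrix]
        hashed.append(projections.index(max(projections)))
    return hashed


def collisions(phi_a, phi_b):
    """t = number of positions at which two hashed vectors agree."""
    return sum(a == b for a, b in zip(phi_a, phi_b))


# Toy demonstration with assumed sizes and synthetic vectors.
rng = random.Random(7)
d, q, m = 32, 5, 300
nonce = make_nonce(m, q, d, seed=2019)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
x_close = [v + 0.1 * rng.gauss(0.0, 1.0) for v in x]  # noisy reading of x
x_far = [rng.gauss(0.0, 1.0) for _ in range(d)]       # unrelated vector
t_close = collisions(iom_hash(x, nonce), iom_hash(x_close, nonce))
t_far = collisions(iom_hash(x, nonce), iom_hash(x_far, nonce))
```

With these settings, `t_close` is expected to be far larger than `t_far`, in line with the locality-sensitive behaviour $E(t) = mS(x, x')$ stated above.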

3. Symmetric keyring encryption

SKE inherits the notion of RSV from the keyring model as a means of symmetric key pair generation. Specifically, IoM hashing is adopted by SKE as the RSV, as described in Section 2.2. In what follows, the public random Gaussian matrix set in IoM hashing and the biometrics in vector form correspond to the nonce and keyring in the keyring model, respectively. We use different nonces $N$ and $\hat{N}$ to refer to different random Gaussian matrix sets. These nonces $N$ and $\hat{N}$ are used to generate different random vectors, $\phi_x \in \mathbb{F}_q^m$ and $\hat{\phi}_x \in \mathbb{F}_q^m$, respectively, called RVs. The main idea of SKE is set out below.

Main idea of SKE

Given that SKE is primarily used for secret binding and retrieval, we adopt Shamir's threshold secret-sharing scheme for this purpose. Here, a finite field polynomial $f(\cdot) \in \mathbb{F}_q^k$ of order $k - 1$ is used for secret embedding. During enrollment, a user with input biometric vector $x$ first generates an RV $\phi_x$ through RSV with nonce $N$, together with its polynomial projected correspondences, $f(\phi_x)$. A genuine set $G = \{(\varphi_{x(1)}, f(\varphi_{x(1)})), \dots, (\varphi_{x(m)}, f(\varphi_{x(m)}))\}$ is formed based on $\phi_x$ and $f(\phi_x)$. At this point, SKE resembles the fuzzy vault approach, where $G$ is used to bind a secret over the coefficients of $f$. In our case, we aim to conceal $f(\phi_x)$. This is done via a second RV $\hat{\phi}_x = (\hat{\varphi}_{x(1)}, \dots, \hat{\varphi}_{x(m)})$ with nonce $\hat{N}$. This second RV acts as a random "keyset," which is independent of $\phi_x$, to encrypt $f(\varphi_{x(i)})$, $i = 1, \dots, m$, and yields a public secure sketch $ss = (ss_1, \dots, ss_m)$, where $ss_i = \hat{\varphi}_{x(i)} \oplus f(\varphi_{x(i)})$. As such, the encryption function simply follows the fuzzy commitment approach with an XOR operation, but includes an authentication tag $\mathrm{TAG} = (tag_1, \dots, tag_m)$, where $tag_i = H(\varphi_{x(i)} \,\|\, ss_i \,\|\, \hat{\varphi}_{x(i)})$, $H$ is a cryptographically secure one-way hash function, and $\|$ denotes concatenation.
Note that the XOR operation is simply addition modulo two, which offers a simpler method of analysis when it comes to the study of security and attacks.

During secret retrieval, a query RV pair $(\phi_{x'}, \hat{\phi}_{x'})$ can be generated with nonces $(N, \hat{N})$ from the query biometric vector $x'$. In this way, we can verify the integrity of the secure sketch with the filtering mechanism empowered by the authentication tags $tag_i$. More specifically, since the encryption is done by releasing $ss_i = f(\varphi_{x(i)}) \oplus \hat{\varphi}_{x(i)}$, if all three conditions are satisfied (i.e. $\varphi_{x(i)} = \varphi_{x'(i)}$, $\hat{\varphi}_{x(i)} = \hat{\varphi}_{x'(i)}$, and $ss_i$ remains unaltered), we can then generate $tag'_i = H(\varphi_{x'(i)} \,\|\, ss_i \,\|\, \hat{\varphi}_{x'(i)})$, which is identical to $tag_i$. Hence, the decryption succeeds and yields $f'(\varphi_{x(i)}) = ss_i \oplus \hat{\varphi}_{x'(i)} = f(\varphi_{x(i)})$. On the other hand, if these conditions do not hold, then $f'(\varphi_{x(i)}) \neq f(\varphi_{x(i)})$ and decryption fails. In other words, the security of SKE relies upon the probability that both $\varphi_{x(i)} = \varphi_{x'(i)}$ and $\hat{\varphi}_{x(i)} = \hat{\varphi}_{x'(i)}$. This probability characterizes the security of SKE, and this can be formalized for several security attacks such as the brute-force and false-accept attacks and tag indistinguishability (Section 5).

When $tag'_i = tag_i$, a genuine pair $(\varphi_{x(i)}, f(\varphi_{x(i)})) \in G$ can be revealed, subject to the condition that $H(\cdot)$ satisfies the property of collision resistance. For a sufficient number of revealed genuine pairs, we can construct an unlocking set $U = \{(\varphi_{x(j)}, f(\varphi_{x(j)}))\}_{j=1}^{t} \subseteq G$. If $t \ge k$, the secret can be retrieved via polynomial interpolation using $U$. A high-level overview of SKE is shown in Fig. 1. The detailed steps of enrollment and secret retrieval are given in the following subsections.

Enrollment

Given input $x \in \mathbb{R}^d$, two different nonces $N, \hat{N}$, parameter $m$, a finite field polynomial $f(\cdot)$ with degree $k - 1$, and a one-way hash function $H: \{0, 1\}^* \to \{0, 1\}^l$, the enrollment procedure of SKE is as follows:

1. Generate $\phi_x \leftarrow \mathrm{IoM}(x, m, N)$ and $\hat{\phi}_x \leftarrow \mathrm{IoM}(x, m, \hat{N})$, where $\phi_x = (\varphi_{x(1)}, \dots, \varphi_{x(m)})$ and $\hat{\phi}_x = (\hat{\varphi}_{x(1)}, \dots, \hat{\varphi}_{x(m)})$.
2. Perform polynomial projection to generate $f(\phi_x) = (f(\varphi_{x(1)}), \dots, f(\varphi_{x(m)}))$.
3. Encrypt $f(\phi_x)$ to yield the secure sketch $ss = (\hat{\varphi}_{x(1)} \oplus f(\varphi_{x(1)}), \dots, \hat{\varphi}_{x(m)} \oplus f(\varphi_{x(m)}))$.
4. Generate the authentication tag $\mathrm{TAG} = (tag_1, \dots, tag_m)$, where $tag_i = H(\varphi_{x(i)} \,\|\, ss_i \,\|\, \hat{\varphi}_{x(i)})$ is the one-way hashed output.
5. Store $(N, \hat{N}, \mathrm{TAG}, ss)$ as the public helper data.
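Steps 2 to 4 of the enrollment procedure can be sketched as below. The sketch assumes that the two RVs have already been produced by IoM hashing and are passed in as integer lists; SHA-256 as $H$, the `|` separator used to serialize the concatenation, and all names and toy values are assumptions made for illustration.

```python
import hashlib


def poly_eval(coeffs, point, q):
    """Evaluate f(point) mod q by Horner's rule; coeffs[0] is the constant term."""
    result = 0
    for c in reversed(coeffs):
        result = (result * point + c) % q
    return result


def make_tag(phi_i, ss_i, phi_hat_i):
    """tag_i = H(phi_i || ss_i || phi_hat_i); SHA-256 and '|' are assumptions."""
    return hashlib.sha256(f"{phi_i}|{ss_i}|{phi_hat_i}".encode()).hexdigest()


def enroll(phi, phi_hat, coeffs, q):
    """Return the public helper data (TAG, ss) for one enrolled RV pair."""
    projected = [poly_eval(coeffs, p, q) for p in phi]      # Step 2: f(phi_x)
    ss = [ph ^ fp for ph, fp in zip(phi_hat, projected)]    # Step 3: XOR encryption
    tags = [make_tag(p, s, ph)                              # Step 4: authentication tags
            for p, s, ph in zip(phi, ss, phi_hat)]
    return tags, ss


# Hypothetical toy inputs: q = 7, f(z) = 3 + z + 2z^2 mod 7, m = 4.
tags, ss = enroll(phi=[4, 0, 6, 2], phi_hat=[1, 5, 5, 3], coeffs=[3, 1, 2], q=7)
```

Only `tags` and `ss` (together with the two nonces) would be stored; the RVs and the polynomial are discarded after enrollment.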


Fig. 1. High-level overview of the process of symmetric keyring encryption.

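Algorithm 1 Filtering mechanism for error checking.
Input: $\mathrm{TAG}$, $\mathrm{TAG}'$, $\phi_{x'}$, $\hat{\phi}_{x'}$, $ss$
$U \leftarrow \emptyset$
For $i = 1, \dots, m$
    If ($tag_i = tag'_i$)
        $f'(\varphi_{x(i)}) = ss_i \oplus \hat{\varphi}_{x'(i)}$
        $U \leftarrow U \cup \{(\varphi_{x'(i)}, f'(\varphi_{x(i)}))\}$
    End if
End for
$U \leftarrow \mathrm{Unique}(U)$
Output: $U$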
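Algorithm 1 can be expressed as a short Python function. In this sketch the query tags are recomputed inside the loop rather than passed in as a separate input, a Python `set` plays the role of Unique(U), and SHA-256 with a `|` separator stands in for $H$ and the concatenation; these choices and all names and toy values are assumptions.

```python
import hashlib


def make_tag(phi_i, ss_i, phi_hat_i):
    """tag = H(phi || ss || phi_hat); SHA-256 and '|' are assumptions."""
    return hashlib.sha256(f"{phi_i}|{ss_i}|{phi_hat_i}".encode()).hexdigest()


def filter_entries(tags, phi_query, phi_hat_query, ss):
    """Keep (phi'_i, ss_i XOR phi_hat'_i) only where the recomputed tag matches."""
    unlocking = set()                          # a set removes duplicate pairs
    for tag_i, p, ph, s in zip(tags, phi_query, phi_hat_query, ss):
        if make_tag(p, s, ph) == tag_i:        # tag_i = tag'_i
            unlocking.add((p, s ^ ph))         # decrypt f'(phi_i) = ss_i XOR phi_hat'_i
    return sorted(unlocking)


# Hypothetical enrolled helper data with a repeated entry (positions 0 and 1).
enrolled_phi, enrolled_phi_hat, f_values = [4, 4, 1], [2, 2, 6], [5, 5, 3]
ss = [ph ^ fv for ph, fv in zip(enrolled_phi_hat, f_values)]
tags = [make_tag(p, s, ph) for p, s, ph in zip(enrolled_phi, ss, enrolled_phi_hat)]

# A query whose last entry disagrees with the enrolled RV is filtered out there.
unlocked = filter_entries(tags, [4, 4, 0], [2, 2, 6], ss)
```

In this toy run the two identical leading entries collapse into a single pair and the mismatching third entry is rejected.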

Secret retrieval

Given $x' \in \mathbb{R}^d$, the query biometric vector, and the public helper data $\{N, \hat{N}, \mathrm{TAG}, ss\}$ with input parameter $m$ and the same one-way hash function $H$, the detailed steps for key retrieval are as follows:

1. Generate $\phi_{x'} \leftarrow \mathrm{IoM}(x', m, N)$ and $\hat{\phi}_{x'} \leftarrow \mathrm{IoM}(x', m, \hat{N})$, where $\phi_{x'} = (\varphi_{x'(1)}, \dots, \varphi_{x'(m)})$ and $\hat{\phi}_{x'} = (\hat{\varphi}_{x'(1)}, \dots, \hat{\varphi}_{x'(m)})$.
2. Compute $\mathrm{TAG}' = (tag'_1, \dots, tag'_m)$, where $tag'_i = H(\varphi_{x'(i)} \,\|\, ss_i \,\|\, \hat{\varphi}_{x'(i)})$.
3. Run Algorithm 1 with input $(\mathrm{TAG}, \mathrm{TAG}', \phi_{x'}, \hat{\phi}_{x'}, ss)$ to yield an unlocking set $U = \{(\varphi_{x(j)}, f(\varphi_{x(j)}))\}_{j=1}^{t} \subseteq G$.
4. Perform polynomial reconstruction with $U$.
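The polynomial reconstruction in the last step can be sketched with Lagrange interpolation over $\mathbb{F}_q$. The sketch assumes a prime $q$, that the secret is read off as the constant term $f(0)$, and that the unlocking set contains only genuine pairs with distinct first coordinates; the function names are hypothetical. The demonstration points are generated from the quadratic $x^2 + 2x + 2 \bmod 5$ used in Example 1.

```python
def lagrange_at_zero(points, q):
    """Interpolate the polynomial through `points` over F_q and return f(0)."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for i, (xi, _) in enumerate(points):
            if i != j:
                num = (num * -xi) % q           # factor (0 - x_i)
                den = (den * (xj - xi)) % q     # factor (x_j - x_i)
        # pow(den, -1, q) is the modular inverse (Python 3.8+).
        secret = (secret + yj * num * pow(den, -1, q)) % q
    return secret


def retrieve_secret(unlocking_set, k, q):
    """Return f(0) if at least k distinct pairs were unlocked, else None."""
    points = sorted(set(unlocking_set))
    if len(points) < k:
        return None
    return lagrange_at_zero(points[:k], q)


# Unlocking set built from f(z) = z^2 + 2z + 2 mod 5 at z = 3, 1, 2.
q, k = 5, 3
U = [(z, (z * z + 2 * z + 2) % q) for z in (3, 1, 2)]
recovered = retrieve_secret(U, k, q)
```

Because the filtering mechanism leaves only genuine pairs in the unlocking set, a single interpolation over any `k` of them suffices; with fewer than `k` pairs the function returns `None`.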

In Step 4, the secret can be retrieved via Lagrange interpolation if $t \ge k$. Given $(\varphi_{x(i)}, f(\varphi_{x(i)})) \in \mathbb{F}_q \times \mathbb{F}_q$, there will be at most $|\mathbb{F}_q \times \mathbb{F}_q|$ unique genuine pairs in $U$. In addition, $q$ and $m$ are interrelated, where $q < m$, and this is used to improve accuracy [19]. This implies that the key retrieval rate would be low if we opt to increase $q$ to a value close to or equal to $m$. As a solution, a large value of $m$ is needed to compensate for the increment in $q$. This may lead to more non-unique genuine pairs in $U$ and may increase the computational complexity of polynomial interpolation. Therefore, filtering out non-unique genuine pairs, as detailed in Algorithm 1, is crucial. Note that $\mathrm{Unique}(U)$ is a function that guarantees that only unique genuine pairs are present in $U$.

The use of TAG in SKE is indispensable, as it serves three purposes: (i) it performs a conditional check to ensure that only relevant $ss_i$ (i.e. when $tag_i = tag'_i$) can be decrypted, which can be seen as an error-checking mechanism for both $\varphi_{x(i)}$ and $\varphi_{x'(i)}$; (ii) it ensures the data integrity of $ss$, since if any alteration occurs in $ss_i$, the decryption would fail; (iii) it facilitates polynomial interpolation by ensuring that only genuine pairs are within $U$, and hence the secret can be retrieved simply via a single step of polynomial reconstruction rather than requiring iterative decoding.

It is useful to illustrate SKE using a numerical example. Let $x$ and $x'$ be the enrolled and query biometric vectors of Alice, respectively. $x$ and $x'$ are not identical due to noise, but are close.

Example 1.
Public algorithms: $H$, Algorithm 1
Public parameters: $m = 5$, $k = 3$, $q = 5$

Enrollment:
1. Given a secret $s = 2$, Alice encodes this secret in a finite field polynomial of order $k - 1$, i.e. $f(x) = x^2 + 2x + s \bmod q$. Suppose she generates $\phi_x = (3, 1, 2, 3, 2) \in \mathbb{F}_q^m$ and $\hat{\phi}_x = (2, 3, 0, 1, 1) \in \mathbb{F}_q^m$ with her biometric vector $x$ via $N$ and $\hat{N}$, respectively.

2. Alice performs polynomial projection $f(\phi_x) = (f(3), f(1), f(2), f(3), f(2)) = (2, 0, 0, 2, 0)$.
3. Alice encrypts $f(\phi_x)$ via $\hat{\phi}_x$ to generate the secure sketch $ss$, that is, $ss = (10, 00, 00, 10, 00) \oplus (10, 11, 00, 01, 01) = (00, 11, 00, 11, 01) = (0, 3, 0, 3, 1)$. She generates $\mathrm{TAG} = (H(3\|0\|2), H(1\|3\|3), H(2\|0\|0), H(3\|3\|1), H(2\|1\|1))$.
4. Alice stores $(N, \hat{N}, \mathrm{TAG}, ss)$.

Secret retrieval:
1. Alice generates $\phi_{x'} = (3, 1, 1, 2, 2) \in \mathbb{F}_q^m$ and $\hat{\phi}_{x'} = (2, 3, 1, 0, 1) \in \mathbb{F}_q^m$ with $x'$ via $N$ and $\hat{N}$, respectively.
2. Using the identical one-way hash function, Alice generates $\mathrm{TAG}' = (H(3\|0\|2), H(1\|3\|3), H(1\|0\|1), H(2\|3\|0), H(2\|1\|1))$.
3. She runs Algorithm 1 to yield $U$. Three entries of $\mathrm{TAG}$ and $\mathrm{TAG}'$ match, i.e. $(H(3\|0\|2), H(1\|3\|3), H(2\|1\|1))$, so Alice can regenerate $f$ by Lagrange interpolation over the unique genuine pairs, i.e. $U = \{(3, f(3)), (1, f(1)), (2, f(2))\} = \{(3, 2), (1, 0), (2, 0)\}$, since $|U| = t \ge k$. The secret can be reconstructed, and the correctness can be verified using $f'(0) = f(0) = s = 2$.

Note that the choice of $N$ and $\hat{N}$ is random and independent of the biometric input, even though $ss$ and $\mathrm{TAG}$ are dependent on the RVs or on $x$ and $x'$. Hence, without knowing $x$ or $x'$, knowledge of $N$ and $\hat{N}$ does not offer an additional advantage in generating the RV, which could reveal a clue for decrypting $ss$. Fig. 2 illustrates the enrollment and secret retrieval operation in SKE.

Fig. 2. Illustration of secret binding and retrieval in SKE.

4. Resilience analysis

In this section, we elaborate the resilience property of SKE. Suppose a pair of RV entries (enrolled and query) $\varphi_{x(i)}$ and $\varphi_{x'(i)}$ are generated by $N$. Another pair $\hat{\varphi}_{x(i)}$ and $\hat{\varphi}_{x'(i)}$ are generated by $\hat{N}$.

Based on the i.i.d. attribute of IoM hashed entries, the collision probability of the enrolled and query entries is P(ϕ_x(i) = ϕ_x′(i)) = P = S(x, x′), where S(x, x′) is the compliance similarity function of IoM hashing given in Section 2.2. The same goes for P(ϕ̂_x(i) = ϕ̂_x′(i)) = P. Hence, the joint collision probability is merely the product of these two probabilities, i.e. P^2 = (1 − dis(x, x′))^2, due to the independence of N and N̂. In what follows, the number of collisions t follows Bin(m, P^2), where P^2 = (S(x, x′))^2, with expected value E(t) = mP^2 and variance Var(t) = mP^2(1 − P^2).

For further resilience evaluation, we let δ ∈ (0, 1) be a fraction of m and k = δm. This allows us to define a security threshold in terms of the fraction δ = k/m rather than based solely on k. Intuitively, for high resilience, given x ∈ R^d to bind a secret s ∈ F_q^k with parameters (m, δ), we would expect that any similar x′ ∈ R^d will retrieve the secret with an overwhelming probability (a probability close to one) if E(t) = mP^2 > k or E(t)/m = P^2 > δ. We give an example below to illustrate the resilience of SKE.

Example 2. Consider a polynomial of order k − 1 with k = δm and m = 256. Hence, exact secret retrieval is only possible if t > k = δm = 192 for δ = 3/4. In other words, more than 75% of the TAG (TAG′) entries tag_i (tag′_i) need to be successfully verified (i.e. tag_i = tag′_i). It is reasonable to assume that for two genuine biometric vector pairs, S(x, x′) = 0.9 (i.e. 90% similar); hence, the probability of success for tag_i = tag′_i is P^2 = S(x, x′)^2 = 0.81. Since t ∼ Bin(256, 0.81) with E(t) = 207 > k, the secret can be retrieved with P(t > 192) = 0.989, i.e. an overwhelming probability for exact secret retrieval.

The above example suggests that to achieve high resilience, we can reduce δ so that E(t)/m = P^2 > δ is easily satisfied.
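The resilience estimate of Example 2 can be checked numerically. The following is a minimal sketch, assuming only the binomial model t ∼ Bin(m, P^2) stated above; the function names are illustrative and not part of SKE. With m = 256, S(x, x′) = 0.9 and threshold 192, it should give a value close to the 0.989 quoted in Example 2.

```python
import math


def binom_pmf(n: int, p: float, i: int) -> float:
    """P(t = i) for t ~ Bin(n, p), evaluated in log space for numerical stability."""
    log_term = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )
    return math.exp(log_term)


def retrieval_probability(m: int, similarity: float, threshold: int) -> float:
    """P(t > threshold), where t ~ Bin(m, similarity^2) counts matching TAG entries."""
    p2 = similarity ** 2  # joint collision probability P^2 under independent N and N-hat
    return math.fsum(binom_pmf(m, p2, i) for i in range(threshold + 1, m + 1))


if __name__ == "__main__":
    m, similarity, threshold = 256, 0.9, 192  # parameters of Example 2
    print("E(t) =", m * similarity ** 2)
    print("P(t > k) =", retrieval_probability(m, similarity, threshold))
```

Lowering the threshold (i.e. reducing δ) in this sketch raises the retrieval probability, which is the trade-off the example describes.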

5. Security

In this section, we present the threat model of SKE. We divide the threats to SKE into two categories. Firstly, since SKE is essentially a biometric cryptosystem, conventional security attacks such as the brute-force and false-accept attacks will be discussed. Secondly, since SKE is a special type of error-tolerant symmetric key cryptosystem that incorporates the tag, we will analyze SKE security in terms of tag indistinguishability.

5.1. Security model

Before we elaborate on each potential security threat, we formalize the security model of SKE. Typically, for a large enough value of q, i.e. q^2 > 2^80, we can reduce our security to the computational hardness of finding k out of m collisions on TAG, subject to tag_i = tag′_i, i = 1, …, m, by using an ROM over a uniformly random query vector. This security can be achieved by asserting higher values for q and m simultaneously in order to balance out the security performance over IoM hashing. However, this statement is not sufficient for our security goal, which involves showing that for a given query biometric vector x′, it is infeasible to retrieve the bound secret unless x′ is close to x, without assuming that x′ is uniformly random. For the sake of simplicity, one would choose to measure the biometric input entropy with Shannon entropy over a particular distribution (i.e. uniform), as demonstrated by Li et al. [26]. However, without a precise knowledge of the input distributions, it is difficult to ensure that the lower bound of the measured Shannon entropy is truly characterized by the input min-entropy over the worst-case distribution. To resolve this issue, the security goal is instead characterized by the similarity of x and x′, which is more naturally distributed over some random distribution that does not need to be uniform. In the light of this, we exploit the input structure of the biometric vectors by means of their similarity using RSV.
Formally, we adopt a random error model (see Section 8 of [13]) in which different entries of the RV pair have a probability of colliding (i.e. tag_i = tag′_i, characterized as P^2) and of not colliding (errors, given as 1 − P^2), where P^2 = S(x, x′)^2 (Section 2.2). A similar implementation of this model can be found in [24]. This permits us to show that we can achieve security that is governed by the non-uniform biometric vectors and has high resilience, based on this similarity. However, it is unrealistic to assume that the error process of the biometric vector can be modeled accurately, and the same is true for estimation of the collision probability P. Meaningful security therefore needs to involve a precise knowledge of the input distribution, and this demands considerable human effort and the collection of a massive number of biometric samples, which is an expensive process. A notable example is the collection of IrisCode by Daugman [11], where the security is based on the empirical degrees of freedom estimated from over 11.5 million iris samples. We attempt to relax the security requirements due to a limited number of samples, and therefore use the maximum collision probability, denoted P^max_{M_I}. Formally, P^max_{M_I} = max{P_1, …, P_{M_I}} is in principle the maximum imposter-matching score of the RV pairs over M_I scores of different query biometric vectors x′. This enables us to ensure security for a smaller sample size by considering the inherited non-uniform nature of the biometric vector by means of min-entropy over a limited sample size.

5.2. Brute-force attack

We first examine the brute-force attack. For SKE, this refers to the situation where an adversary attempts to guess the secret directly or to guess the TAG.

Guessing the secret s: Let the secret in the finite field be s ∈ F_q^k, where k − 1 is the polynomial order. The complexity of this attack relies on the size of the secret space, i.e. q and k.
If a secret is encoded in 8 bits, q = 2^8, and k = 20 is chosen, then the brute-force complexity of SKE is 2^160 for guessing all the polynomial coefficients.

Guessing the TAG: In addition to this, we can reason about the complexity of the brute-force attack on the tag. Given a tag_i = H(ϕ_x(i) || ss_i || ϕ̂_x(i)), where ϕ_x(i) ∈ F_q and ϕ̂_x(i) ∈ F_q, the adversary is expected to find a collision on H subject to tag_i = tag′_i after q^2 trials. We therefore need q to be high enough to resist this guessing over TAG.

5.3. False-accept attack

A false-accept attack (also known as a dictionary attack) refers to a scenario where an adversary manages to compromise a large dataset. S/he can then repeatedly attempt to verify the TAG with the compromised biometric instances. In this circumstance, the adversary is expected to gain access to the system with probability equal to the non-zero false-accept rate (FAR) of the system. As this attack exploits the FAR of the biometric system, it is possible to use an artificial template generator to launch this attack without the need to compromise any dataset. It is also feasible for any adversary with higher computational power to carry out better modeling of the input biometric distribution, allowing this adversary to launch a false-accept attack by randomly sampling a fingerprint biometric vector according to the specific distribution. From this point of view, the complexity of the false-accept attack can be defined in terms of a false-acceptance probability, which can be calculated in the same way as shown in Example 2 with P^2 = (P^max_{M_I})^2. We can therefore describe the maximum false-accept attack complexity over M_I secret retrieval attempts as

fa(m, δ, P^max_{M_I}) = −log P(t_{(P^max_{M_I})} ≥ δm),    (1)

where t_{(P^max_{M_I})} ∼ Bin(m, (P^max_{M_I})^2).

Clearly, unlike the brute-force attack, the false-accept attack is independent of q. This means that increasing q would not contribute to higher false-accept security, and we observe that the false-accept attack is a much stronger attack that could serve as the lower bound to a brute-force attack, even though a very large q could be supplied in practice.
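Eq. (1) can be evaluated directly from the binomial tail. The sketch below is illustrative only: it assumes the logarithm is taken to base 2 (so that the result reads as bits) and works in log space so that very small tail probabilities do not underflow. The function names are hypothetical.

```python
import math


def log_binom_pmf(n: int, p: float, i: int) -> float:
    """Natural log of P(t = i) for t ~ Bin(n, p)."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def false_accept_bits(m: int, k: int, p_max: float) -> float:
    """-log2 P(t >= k) with t ~ Bin(m, p_max^2), i.e. Eq. (1) with k = delta * m.

    Assumption: base-2 logarithm, so the complexity is expressed in bits.
    """
    p2 = p_max ** 2
    logs = [log_binom_pmf(m, p2, i) for i in range(k, m + 1)]
    peak = max(logs)
    # log-sum-exp: factor out the largest term before exponentiating
    log_tail = peak + math.log(math.fsum(math.exp(v - peak) for v in logs))
    return -log_tail / math.log(2)


if __name__ == "__main__":
    # Illustrative inputs: m = 1024 and a maximum impostor collision probability of 0.1.
    for k in (12, 18, 54, 60):
        print(k, round(false_accept_bits(1024, k, 0.1), 1))
```

Note that q does not appear anywhere in the computation, which reflects the observation that the false-accept complexity is independent of q.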

5.4. TAG indistinguishability

Bellare et al. [1] first introduced the notion of key indistinguishability, in which an adversary needs to determine whether ciphertexts were encrypted by the same key. This notion arose due to usability concerns when the same keys are used across different applications, as this triggers information leakage. The notion of key privacy was later used by Simoens et al. [42] in secure sketching, whereby each biometric sketching function is treated as a probabilistic encryption to study the information leakage across multiple sketches [5]. In the biometric cryptosystems literature, this notion leads to the so-called attack via record multiplicity (ARM) [34] or correlation attack [23].

In SKE, the stored tag is public. We need to evaluate the advantages gained by an adversary when the same biometric input is used to generate RVs and hence the corresponding tag. Formally, this scenario can be characterized as an indistinguishability game. In this game, an adversary holds two distinct tags and attempts to learn from them in order to identify whether the tags are from the same user. A tag that corresponds to a particular biometric should have the property of indistinguishability, to ensure that no or negligible information can be learned by the adversary.

Given x ∈ R^d and nonces N and N̂ with a polynomial f, the γ-tag indistinguishability game between a challenger and an adversary with parameter m can be described as follows:
1. The challenger generates a biometric vector x ∈ R^d and an RV φ_x = IoM(x, m, N). Meanwhile, s/he generates φ̂_x = IoM(x, m, N̂). S/he then computes the corresponding TAG* and sends it to the adversary.
2. The challenger randomly chooses an integer K ∈ {1, …, γ} and generates a sequence of tags (TAG_1, …, TAG_γ). The K-th TAG is generated by φ_x′ = IoM(x′, m, N) and φ̂_x′ = IoM(x′, m, N̂) from another biometric vector x′ ∈ R^d, where S(x, x′) > δ and δ ∈ (0, 1) is a fraction of m as defined in Section 4. The remaining γ − 1 tags are generated by φ_x_R(j) = IoM(x_R(j), m, N) and φ̂_x_R(j) = IoM(x_R(j), m, N̂) from random biometric vectors x_R(j) ∈ R^d, for which S(x, x_R(j)) < δ holds, where j = 1, …, γ − 1. The challenger then sends (TAG_1, …, TAG_γ) to the adversary.
3. The adversary outputs an integer K′ ∈ {1, …, γ} and wins if K′ = K.

The adversary's advantage in the γ-tag indistinguishability game can be deduced from the n-indistinguishability game discussed in [37]. In our context, we measure S(x, x′) and S(x, x_R(j)), corresponding to the IoM hashing similarity function over δ ∈ (0, 1). We refer to the adversary as an SKE-IND adversary, with an advantage as follows:

Adv^{SKE−Ind} = (γ/(γ − 1)) |P[K′ = K] − 1/γ|.    (2)

Definition 1. Under the same IoM hashing setup (an identical random projection matrix set), an SKE is (γ, ε)-indistinguishable in δ ∈ (0, 1) if Adv^{SKE−Ind} ≤ ε for any SKE-IND adversary, and is perfectly indistinguishable if Adv^{SKE−Ind} = 0.

It is crucial to note that the indistinguishability game considers the same secret polynomial f(·) that is protected under SKE through encryption with φ̂_x. In practice, different secrets may be protected using the same input φ̂_x for different applications. Nonetheless, the decoding can succeed only when the number of unique genuine pairs in U is at least the size of the secret, i.e. t ≥ k. This implies that as long as f is fixed at order k − 1, binding different secrets (coefficients of f) offers no significant advantage to the adversary. Hence, our security analysis should hold for both cases. Furthermore, with an XOR operation as encryption/decryption, the distribution of the tag or RV depends heavily on the biometric vector, and it is more appropriate to analyze the information leakage directly from S(x, x′) with respect to its TAG or RV. Hence, the similarity measurement S(x, x′) becomes an important factor in deriving Adv^{SKE−Ind}. While the indistinguishability game consists of samples x, x′ and x_R(j), it is natural to have different unlocking set sizes t_(x,x′) and t_(x,x_R(j)), corresponding to the different cases S(x, x′) > δ and S(x, x_R(j)) < δ for j = 1, …, γ − 1, respectively. We define the false-rejection probability FRP = 1 − P(t_(x,x′) ≥ k) and the false-acceptance probability FAP = P(t_(x,x_R(j)) ≥ k) as measures of information leakage due to S(x, x′) and S(x, x_R(j)), respectively, over the sketches during the γ-tag indistinguishability game. This information leakage will contribute to the advantage of the adversary, Adv^{SKE−Ind}.

We now provide a detailed argument for the security of tag indistinguishability with respect to Adv^{SKE−Ind}. Suppose we have a non-zero FAP for each secret retrieval attempt. This implies that some minimum amount of information must leak at

each attempt. Conversely, the FRP needs to be taken into consideration to avoid overestimation of such information leakage. Assume that an adversary possesses TAG* and (TAG_1, …, TAG_γ); then, to distinguish TAG* from (TAG_1, …, TAG_γ), s/he needs a biometric vector x* that satisfies both S(x, x*) > δ and S(x′, x*) > δ. This eventually allows the adversary to distinguish the K-th TAG from the others by simply running the secret retrieval algorithm across all the available tags (TAG_1, …, TAG_γ) and TAG*. Clearly, the adversary can achieve this with minimum effort if x* = x′ by merely focusing on the single case S(x, x*) > δ. If the adversary manages to distinguish the K-th sketch with non-negligible probability, i.e. advantage Adv^{SKE−Ind} = ε, this implies that s/he is able to find x* = x′ subject to S(x, x*) > δ with non-negligible probability as well. Thus, the notion of tag indistinguishability captures the possibility of a better solution for carrying out secret retrieval than brute-force and false-accept attacks, stimulated via the γ-tag indistinguishability game. In such an event, we find that the tag indistinguishability is at most as hard as the false-accept attack, which can be reduced to the hardness of finding a collision on H.

Lemma 1. In a γ-tag indistinguishability game, Adv^{SKE−Ind} = (1/(γ − 1)) |FAP − FRP|.

Proof. We argue that the value of ε follows Definition 1. Given that δ is neither 0 nor 1, there will be a non-zero FAP for every secret retrieval attempt with x* over the TAG_j, j = 1, …, γ − 1, generated from x_R(j). The total false-acceptance probability can therefore be defined as FAP = Σ_{j=1}^{γ−1} FAP_j = Σ_{j=1}^{γ−1} P(t_(x_R(j),x) ≥ k). On the other hand, the secret retrieval would not succeed with a probability of one but would have a significant false-rejection probability FRP = 1 − P(t_(x,x′) ≥ k). Combining these two results leads to

P[K′ = K] = (1/γ) (P(t_(x,x′) ≥ k) + Σ_{j=1}^{γ−1} P(t_(x_R(j),x) ≥ k)) = (1/γ) (1 − FRP + Σ_{j=1}^{γ−1} FAP_j).

It follows that

Adv^{SKE−Ind} = (1/(γ − 1)) |1 − FRP + Σ_{j=1}^{γ−1} FAP_j − 1| = (1/(γ − 1)) |FAP − FRP|,

and the lemma is proved.
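Lemma 1 lends itself to a direct numerical evaluation. The sketch below is a hedged illustration rather than the authors' implementation: it assumes the binomial model of Section 4 for both unlocking set sizes, and it assumes (for simplicity) that all γ − 1 random vectors share one similarity value, so that FAP = (γ − 1) FAP_j. Function names and the parameter values in the usage lines are illustrative.

```python
import math


def binom_pmf(n: int, p: float, i: int) -> float:
    """P(t = i) for t ~ Bin(n, p), evaluated in log space."""
    log_term = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )
    return math.exp(log_term)


def prob_at_least(n: int, p: float, k: int) -> float:
    """P(t >= k)."""
    return math.fsum(binom_pmf(n, p, i) for i in range(k, n + 1))


def prob_below(n: int, p: float, k: int) -> float:
    """P(t < k), summed directly to avoid cancellation in 1 - P(t >= k)."""
    return math.fsum(binom_pmf(n, p, i) for i in range(0, k))


def ske_ind_advantage(m: int, k: int, s_genuine: float, s_random: float, gamma: int) -> float:
    """Lemma 1: |FAP - FRP| / (gamma - 1).

    Assumption: every one of the gamma - 1 random tags has the same similarity s_random.
    """
    frp = prob_below(m, s_genuine ** 2, k)                    # FRP = 1 - P(t_(x,x') >= k)
    fap = (gamma - 1) * prob_at_least(m, s_random ** 2, k)    # FAP = sum of FAP_j
    return abs(fap - frp) / (gamma - 1)


if __name__ == "__main__":
    adv = ske_ind_advantage(m=256, k=128, s_genuine=0.8, s_random=0.5, gamma=2 ** 10)
    print("advantage:", adv, "bits:", -math.log2(adv))
```

With these inputs FRP dominates FAP, so the advantage is roughly FRP/(γ − 1) and shrinks as γ grows.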

A zero FAR and false-reject rate (FRR) in a biometric cryptosystem with limited query instances does not necessarily imply perfect indistinguishability, because a non-zero FAP or FRP may still exist for every match. Thus, an information leakage measure is required unless we are able to achieve ideal circumstances such as Σ_{j=1}^{γ−1} P(t_(x_R(j),x) ≥ k) = 0 and P(t_(x,x′) ≥ k) = 1; hence, Adv^{SKE−Ind} = 0. This suggests S(x, x′) = 1 (a perfect match) and S(x, x_R(j)) = 0 (complete unlinkability), which would not arise in practice. We give an illustration to highlight the computation of Adv^{SKE−Ind} for particular parameters.

Example 3. Suppose m = 256, k = δm with δ = 1/2, and x′ is 80% similar to x such that S(x, x′) = 0.8. Suppose x_R(j) is only 50% similar to x, so S(x, x_R(j)) = 0.5. Thus P^2 = (S(x, x′))^2 is 0.8^2 for the former and 0.5^2 for the latter. It follows that t_(x,x_R(j)) ∼ Bin(m, 0.5^2) and t_(x,x′) ∼ Bin(m, 0.8^2). For TAG_j, j = 1, …, γ − 1, generated from x_R(j), S(x, x_R(j)) = 0.5, and the total false-accept probability is FAP = (7.55 × 10^−18)(γ − 1). In addition, TAG* generated from x has S(x, x′) = 0.8, so that FRP = 1.75 × 10^−6. Based on Lemma 1, Adv^{SKE−Ind} = (1/(γ − 1)) |(7.55 × 10^−18)(γ − 1) − 1.75 × 10^−6|. Supposing there are γ = 2^10 = 1024 TAGs available, Adv^{SKE−Ind} = 1.71 × 10^−9. These results can further be converted into entropy as −log(Adv^{SKE−Ind}) ≈ 30 bits.

6. Experiments and discussion

In this section, we present a comprehensive performance evaluation of secret retrieval using fingerprint and face biometrics. We adopt fingerprint vectors x with length 299, generated from the fingerprint minutiae proposed in [18]. The conversion process consists of two sequential stages: generation of the minutia cylinder-code [6] with the publicly available MCC SDK, and a kernel principal component analysis (KPCA) transformation. For the face vector, we adopt a pre-trained convolutional neural network dedicated to face recognition, namely InsightFace [12]. InsightFace employs a loss function called the additive angular margin loss for learning. Using InsightFace pre-trained with MS-Celeb-1M, a face vector of length 512 can be obtained. Our experiments were carried out using FVC 2002 (subsets DB 1 and 2), FVC 2004 (subsets DB 1 and 2), and FVC 2006 (subset DB 1) [30].
Each of these datasets has a Set A, which contains 100 × 8 fingerprints for FVC2002 and FVC2004 and 140 × 12 fingerprints for FVC2006, and a Set B, which contains 10 × 8 fingerprints in FVC2002 and FVC2004 and 10 × 12 fingerprints in FVC2006. We used Set B to generate the KPCA projection matrix and Set A for the secret retrieval experiments. We adopted the LFW dataset [17] for SKE evaluation using face biometrics. LFW consists of 7701 images of 4281 subjects. Using the protocol outlined in [17], 6000 face pairs were divided into 10 disjoint subsets for cross validation, with each subset containing 300 genuine pairs and 300 impostor pairs.

6.1. Experimental performance evaluation

For the experiments on the fingerprint datasets, we used the full FVC protocol for the genuine and imposter attempt tests, which are described in [30] as follows:

Table 1
Experiment results for SKE with various k (all values in %).

Modality     Dataset       Metric   k = 8   k = 12   k = 14   k = 16   k = 18   k = 20   k = 24
Fingerprint  FVC2002 DB1   FRR      0.29    0.38     0.38     0.52     0.61     0.67     0.76
                           FAR      0.14    0.02     0.00     0.00     0.00     0.00     0.00
                           EER      0.75    0.75     0.75     0.75     0.75     0.75     0.75
             FVC2002 DB2   FRR      0.33    0.33     0.38     0.48     0.48     0.48     0.62
                           FAR      0.14    0.00     0.00     0.00     0.00     0.00     0.00
                           EER      0.30    0.30     0.30     0.30     0.30     0.30     0.30
             FVC2004 DB1   FRR      5.90    9.10     10.66    11.89    12.71    13.81    15.43
                           FAR      3.95    0.95     0.52     0.38     0.23     0.14     0.09
                           EER      4.46    4.46     4.46     4.46     4.46     4.46     4.46
             FVC2004 DB2   FRR      8.33    10.24    11.19    11.91    12.57    13.76    15.05
                           FAR      2.72    0.89     0.50     0.42     0.28     0.22     0.12
                           EER      6.64    6.64     6.64     6.64     6.64     6.64     6.64
             FVC2006 DB2   FRR      0.73    1.22     1.52     1.75     2.06     2.38     3.06
                           FAR      1.24    0.39     0.26     0.16     0.14     0.11     0.06
                           EER      0.86    0.86     0.86     0.86     0.86     0.86     0.86
Face         LFW           FRR      5.83    10.03    11.83    13.93    16.20    18.43    23.33
                           FAR      0.00    0.00     0.00     0.00     0.00     0.00     0.00
                           EER      0.97    0.97     0.97     0.97     0.97     0.97     0.97

6.1.1. Genuine attempt

For each subject, each fingerprint vector is used for enrollment, and other fingerprint vectors from the same subject are used to try to retrieve the secret with the procedure described in Section 3. For 100 subjects, the procedure leads to a total number of (100 × 8 × 7)/2 = 2800 genuine matches, denoted as M_G = 2800. For FVC2006 DB2, we have (140 × 12 × 11)/2 = 9240 genuine matches.
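The match counts are simple pair counts and can be reproduced as follows. This is a small sketch with illustrative function names; the impostor count uses the first-versus-second-sample protocol described in Section 6.1.2.

```python
from math import comb


def genuine_matches(subjects: int, samples_per_subject: int) -> int:
    """Unordered pairs of samples within each subject, summed over all subjects."""
    return subjects * comb(samples_per_subject, 2)


def impostor_matches(subjects: int) -> int:
    """Unordered pairs of distinct subjects (one comparison per pair)."""
    return comb(subjects, 2)


if __name__ == "__main__":
    print(genuine_matches(100, 8), impostor_matches(100))    # FVC2002 / FVC2004 Set A
    print(genuine_matches(140, 12), impostor_matches(140))   # FVC2006 Set A
```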

6.1.2. Imposter attempt

For each subject, the first fingerprint vector is used for enrollment, and the second fingerprint vectors of other subjects are used to decrypt C for secret retrieval. For 100 subjects, this leads to a total number of (100 × 99)/2 = 4950 imposter matches, as denoted by M_I = 4950 earlier. For FVC2006 DB2, we have (140 × 139)/2 = 9730 imposter matches. For the experiments on the face dataset, we followed the standard protocol described in [17], which yielded 3000 pairs of genuine comparisons and 3000 pairs of impostor comparisons.

Generally, once q is fixed, only two major parameters need to be tuned, i.e. (δ, m). In this case, we simply fixed q = 251 for all the experiments. Since q and m are the parameters associated with IoM hashing, we fixed m = 1024 for all datasets (FVC 2002, 2004 and LFW), as this is the best parameter for q = 251 for the FVC datasets, as reported in [19]. We repeated the experiments with different values of δ = k/m with k = 8, 12, …, 24. The FRR and FAR of the experimental results are presented in Table 1. From Table 1, we can observe similar performance trajectories for FAR and FRR for all datasets when k (or δ) grows large. A larger value of k increases the FRR, while a smaller k improves the FRR but unfortunately impairs the FAR. Nevertheless, the FAR remains extremely small for the FVC 2002, 2004 and LFW datasets regardless of the value of k, i.e. most values are zero due to the strong resilience property of SKE, as justified in Section 4. Only for FVC2006 DB2 do we expect k > 24 to give a lower FAR.

6.2. Performance preservation of SKE In this section, we demonstrate that SKE with its best parameters manages to preserve the accuracy performance at the same level as in the original ﬁngerprint and face vector. Note that this is not a trivial task; the performance of the original biometric system is often not preserved, since the errors accumulate when propagated to the encryption domain of biometric cryptosystems. Table 2 illustrates the relative performance in terms of the equal error rate (EER) of three systems in different domains: biometric vectors (feature domain), IoM hashed vectors (max ranked domain), and SKE (encryption domain). Using the same evaluation protocol for genuine and imposter attempts, we noticed to our surprise that SKE achieved better performance for the FVC2002 DB1, FVC2004 DB1, FVC2004 DB2, FVC2002 DB2 and LFW datasets, while performance for FVC2006 DB2 and LFW was slightly reduced. This performance gain could be attributed to the use of the long IoM hashed vector (m = 1024), error checking with TAG, and Shamir’s secret-sharing scheme.

Table 2
Various benchmarks for SKE. Equal error rate (EER, %), i.e. the error rate where FAR = FRR.
(A) Original biometric feature vectors (cosine similarity measure).
(B) IoM hashed vector (m = 1024, q = 251; normalized Hamming distance over finite field).
(C) SKE (m = 1024, δ = 8/1024, q = 251).

Modality     Dataset       (A)     (B)     (C)
Fingerprint  FVC2002 DB1   0.56    0.75    0.75
             FVC2002 DB2   0.28    0.28    0.30
             FVC2004 DB1   4.29    4.73    4.46
             FVC2004 DB2   6.98    6.82    6.64
             FVC2006 DB2   0.70    0.72    0.86
Face         LFW           0.72    0.83    0.97

Table 3
Average encoding and decoding times (s) for the SKE parameter setup m = 1024, q = 251 and various k.

                           Average encoding time (s)           Average genuine/imposter decoding time (s)
Modality     Dataset       k = 8   k = 18   k = 36   k = 54    k = 8         k = 18         k = 36        k = 54
Fingerprint  FVC2002 DB1   0.425   0.954    1.612    2.645     0.361/0.185   0.453/0.160    0.678/0.178   0.985/0.183
             FVC2002 DB2   0.471   0.815    1.710    2.412     0.593/0.178   0.6115/0.173   0.737/0.170   1.004/0.177
             FVC2004 DB1   0.434   0.858    1.560    2.256     0.354/0.177   0.389/0.165    0.469/0.163   0.652/0.208
             FVC2004 DB2   0.422   0.916    1.584    2.255     0.401/0.187   0.387/0.169    0.454/0.167   0.595/0.165
             FVC2006 DB2   0.444   0.903    1.677    2.554     0.533/0.213   0.641/0.310    0.804/0.238   0.939/0.171
Face         LFW           0.421   0.893    1.467    2.534     0.433/0.225   0.634/0.289    0.781/0.222   0.889/0.155

6.3. Computation time of SKE

Table 3 illustrates the average encoding time, defined as (total time taken)/(no. of enrollments) during enrollment, and the average decoding time, defined as (total time taken)/(no. of retrieval trials), for genuine and imposter attempts with different values of k. It is noted that the average encoding and decoding times for genuine attempts increase when k becomes larger. However, the decoding time for imposter attempts is close to a mean of 0.187 ± 0.12 s, regardless of the value of k. This observation supports the claim that we would expect only a genuine user to be able to retrieve a secret with a large number of unique genuine pairs.

6.4. Computational security over a specific secret parameter

In ideal circumstances, SKE can achieve a theoretically favorable level of formal security, as illustrated in Section 5. However, due to the uncertainty in the nature of biometrics, it would be imprudent to claim a guarantee of security for a limited sample size. In view of this, it would be useful to benchmark the computational security of this scheme based on the FVC and LFW datasets with respect to a specific secret parameter that is always defined computationally, i.e. against an adversary with limited computational power. In particular, this requires the individual IoM hashed vector to have high entropy. This requirement can be met by performing element-wise projection onto each output of the IoM hashed vector. More specifically, for a secret revocation set r = {r_1, …, r_m} with parameter q′, where r_i ∈ {0, 1, …, q′ − 1}, the IoM hashed vectors φ_x and φ̂_x are transformed to r ⊙ φ_x mod q′ = (r_1 ϕ_x(1), …, r_m ϕ_x(m)) mod q′ and r ⊙ φ̂_x mod q′ = (r_1 ϕ̂_x(1), …, r_m ϕ̂_x(m)) mod q′, respectively. From this perspective, the original IoM hashed vectors are updated element-wise to enclose elements of larger range, i.e. from log q to log q′ bits. Since this update is carried out element by element, its impact on the matching performance of different IoM hashed vectors is insignificant, meaning that the system performance can be preserved.
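The element-wise transformation with the revocation set can be sketched as below. This is an illustration under stated assumptions: the helper names are hypothetical, r is drawn with Python's `secrets` module simply as one way to obtain unpredictable values, and q′ = 2^80 is used as in the later evaluation. The point of the sketch is that positions where the enrolled and query entries collide still collide after the transformation.

```python
import secrets


def make_revocation_set(m: int, q_prime: int) -> list[int]:
    """Draw r = (r_1, ..., r_m) with each r_i in {0, ..., q' - 1} (illustrative generator)."""
    return [secrets.randbelow(q_prime) for _ in range(m)]


def apply_revocation(phi: list[int], r: list[int], q_prime: int) -> list[int]:
    """Element-wise product (r_i * phi_i) mod q' of an IoM hashed vector with r."""
    if len(phi) != len(r):
        raise ValueError("phi and r must have the same length m")
    return [(r_i * phi_i) % q_prime for r_i, phi_i in zip(r, phi)]


if __name__ == "__main__":
    q_prime = 2 ** 80
    enrolled = [3, 1, 2, 3, 2]   # toy IoM hashed vector
    query = [3, 1, 1, 2, 2]      # noisy query; agrees at positions 0, 1 and 4
    r = make_revocation_set(len(enrolled), q_prime)
    a = apply_revocation(enrolled, r, q_prime)
    b = apply_revocation(query, r, q_prime)
    print([x == y for x, y in zip(a, b)])
```

Replacing r yields a fresh set of transformed vectors from the same IoM hashed vector, which is the revocation behaviour the text describes.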
In practice, the secret revocation set r resembles a system-wise secret key or a user's token/smart card, and can allow the key or template revocation to be more efficient [22,39] without regeneration of the IoM hashed vector. The secret revocation set also offers high entropy (with a high value of q′) for system security without requiring precise knowledge of the biometric input distribution.

Recall that the complexity estimation of the brute-force and false-accept attacks requires a knowledge of the maximum collision probability P^max_{M_I} over M_I imposter matches of different RVs. We therefore record the values of P^max_{M_I} for FVC and LFW with M_I = 4950, which are P^max_{M_I} = 0.1006, 0.1279, 0.2047, 0.1846 and 0.1113 for FVC2002 DB1, DB2, FVC2004 DB1, DB2 and LFW, respectively. We also record the average genuine matching scores of the RV pairs over M_G = 2100 of different query biometric vectors x′, which are P̄_{M_G} = 0.5944, 0.7099, 0.5479, 0.5561 and 0.2391 for FVC2002 DB1, DB2, FVC2004 DB1, DB2 and LFW (M_G = 3000), respectively. Both P^max_{M_I} and P̄_{M_G} are needed for the calculation of computational security.

We now investigate the computational security of SKE. In particular, we show how to reduce the security of SKE to the computational hardness of searching for a collision over a typical cryptographic one-way hash function H. This can be achieved using an additional secret revocation set r. We choose m = 1024 and q = 251, where the latter is the original parameter used in Tables 1–3, and set k = 12, 18, 54 and 60 for our study of computational security. In addition, a common secret revocation set r with q′ = 2^80 > q is chosen and applied to all IoM hashed vectors; we therefore refer to q′ instead of q in the following evaluation.

Brute-force attack (Section 5.2):
(Guessing the secret s): The brute-force complexity of guessing the secret s can be estimated at log((q′)^k) > 256 bits.
(Guessing the TAG): The brute-force complexity of guessing the TAG can be estimated at log(q′) = log(2^80) = 80 bits.

False-accept attack (Section 5.3): The maximum complexity (or maximum achievable system entropy) of the false-accept attack can be estimated from fa(m, δ, P^max_{M_I}) = −log(P(t_{(P^max_{M_I})} > δm)) as 71, 43, 5, 10 and 69 bits for FVC2002 DB1, DB2, FVC2004 DB1, DB2 and LFW, respectively. Note that the adversary is expected to find a collision on H subject to tag_i = tag′_i after log(q′) trials. Therefore, if fa(m, δ, P^max_{M_I}) < log(q′), the adversary can find a collision on H with a lower number of trials compared to a brute-force (BF) approach. In this case, the complexity of the brute-force attack is reduced to that of the false-accept attack, which is the minimum attack complexity. Note that this can be further reduced to the computational hardness of searching for a collision over the cryptographic one-way hash function with −log(P(t_{(P^max_{M_I})} > δm)) bits of security, based on the collision probability argument between different IoM hashed elements.

TAG indistinguishability (Section 5.4): Following Lemma 1, the advantage for the SKE-IND adversary can be estimated from Adv^{SKE−Ind} = (1/(γ − 1)) |FAP − FRP|, where FAP = Σ_{j=1}^{γ−1} FAP_j = Σ_{j=1}^{γ−1} P(t_(x_R(j),x) ≥ k) and FRP = 1 − P(t_(x,x′) ≥ k). Since t_(x,x_R(j)) ∼ Bin(m, S(x, x_R(j))^2) and t_(x,x′) ∼ Bin(m, S(x, x′)^2), computing t_(x,x_R(j)) and t_(x,x′) requires a precise knowledge of S(x, x_R(j)) and S(x, x′). It is therefore natural to adopt P^max_{M_I} = S(x, x_R(j)) for j = 1, …, γ − 1. Conversely, P̄_{M_G} = S(x, x′) is chosen for the estimation of P(t_(x,x′) ≥ k), and hence of FRP = 1 − P(t_(x,x′) ≥ k).

We take FVC2002 DB1 as an instance of estimation, where P^max_{M_I} = 0.1006 and P̄_{M_G} = 0.5944. Given γ = 2^10, FAP = (2^10 − 1)(4.53 × 10^−22) and FRP = 1.11 × 10^−16 can be computed, and hence Adv^{SKE−Ind} = 1.081 × 10^−19, or in its entropy form −log(Adv^{SKE−Ind}) = 64 bits, which is lower than the false-accept complexity of 71 bits. In addition, it is numerically verifiable that an increase in γ would eventually lead to a lower Adv^{SKE−Ind} or a higher −log(Adv^{SKE−Ind}). For instance, for γ = 2^6, 2^10, 2^20, 2^30, 2^40 and 2^50, the values of −log(Adv^{SKE−Ind}) are 59, 64, 72, 71, 71 and 71 bits, respectively, and are soon bounded by the false-accept attack complexity. This suggests the existence of a more efficient algorithm with the computational complexity of the lower bound of the false-accept attack complexity. Remarkably, this statement holds for any two distinct TAGs sourced from similar biometric vectors over γ TAGs. A lower value of γ indicates that the adversary has a higher chance of retrieving the secret. As with the analysis of the BF and FA attacks, the TAG indistinguishability can be reduced to the computational hardness of searching for a collision over H with a random secret revocation set r. Table 4 presents the computational security of SKE with different parameter settings of k after the inclusion of the set r.

Table 4
Computational security of SKE with parameter set m = 1024, q = 251, q′ = 2^80 and k = 12, 18, 54 and 60. Complexity in bits.

k    Attack                                  FVC2002 DB1   FVC2002 DB2   FVC2004 DB1   FVC2004 DB2   LFW (face)
12   Brute force (guessing the secret s)     >2^256 (all datasets)
     Brute force (guessing TAG)              2^80 (all datasets)
     False-accept attack                     2             1             1             1             1
     TAG indistinguishability                2             1             1             1             1
18   Brute force (guessing the secret s)     >2^256 (all datasets)
     Brute force (guessing TAG)              2^80 (all datasets)
     False-accept attack                     6             2             1             1             4
     TAG indistinguishability                6             2             1             1             4
54   Brute force (guessing the secret s)     >2^256 (all datasets)
     Brute force (guessing TAG)              2^80 (all datasets)
     False-accept attack                     71            43            5             10            59
     TAG indistinguishability                64            43            5             10            12
60   Brute force (guessing the secret s)     >2^256 (all datasets)
     Brute force (guessing TAG)              2^80 (all datasets)
     False-accept attack                     87            54            8             15            72
     TAG indistinguishability                63            54            8             15            11


Table 5
Comparison based on the full FVC protocol with parameter set m = 1024, q = 251, q = 280 and k = 12, 31, 36 and 45. Entries are FRR at FAR = 0 (%) / EER (%).

| Method | FVC2002 DB1 | FVC2002 DB2 | FVC2004 DB1 | FVC2004 DB2 | FVC2006 DB2 |
|---|---|---|---|---|---|
| Yang et al. [47] | ≥35.8/11.8 | >26.8/10.4 | −/− | ≥57.6/20.6 | −/− |
| Yang et al. [46] | −/4.5 | −/5.99 | −/− | ≥26.1/− | −/3.07 |
| Li et al. [29] | 16.6/− | 11.5/− | −/− | ≥24.8/− | – |
| Li et al. [29] | −/31.2 | −/27.7 | −/− | −/− | −/− |
| Lai et al. [25] | 2.9/− | 5.3/− | 13.4/− | 24.3/− | −/− |
| SKE (k = 12) | 0.38/0.75 | 0.38/0.30 | −/4.56 | −/6.64 | −/0.86 |
| SKE (k = 31) | 0.92/0.75 | 0.84/0.30 | 18.48/4.56 | −/6.64 | −/0.86 |
| SKE (k = 36) | 0.92/0.75 | 0.99/0.30 | 22.7/4.56 | 18.04/6.64 | −/0.86 |
| SKE (k = 45) | 0.95/0.75 | 1.23/0.30 | 24.3/4.56 | 20.14/6.64 | 5.46/0.86 |

Table 6
Comparison based on the 1vs1 protocol with parameter set m = 1024, q = 251, q = 280 and k = 12, 18 and 54. Entries are FRR at FAR = 0 (%) / EER (%).

| Method | FVC2002 DB1 | FVC2002 DB2 | FVC2004 DB1 | FVC2004 DB2 | FVC2006 DB2 |
|---|---|---|---|---|---|
| Nandakumar et al. [38] | −/− | 14.0 | −/− | −/− | −/− |
| Li et al. [29] | 14.0/− | 8.0/− | −/− | −/− | −/− |
| Yang et al. [47] | ≥8.0/3.38 | >6.0/0.59 | −/− | ≥41/14.88 | −/− |
| Yang et al. [46] | 4.0/− | 2.0/1.02 | −/− | ≥24.72/− | −/4.83 |
| Yang et al. [45] | 15.0/− | 7.0/− | −/− | −/− | −/− |
| Tams et al. [43] | −/− | 21.0/− | −/− | −/− | −/− |
| Li et al. [29] | 2.0/− | 0.0/0.0 | −/− | 23.4/− | −/− |
| Nagar et al. [37] | – | 7.00/0.0 | −/− | −/− | −/− |
| Nagar et al. [36] | – | −/1.00 | −/− | −/− | −/− |
| Lai et al. [25] | 5.0/− | 5.0/− | 10.0/− | 11.0/− | |
| SKE (k = 12) | 3.0/0.81 | −/0.00 | −/3.01 | −/3.94 | −/1.06 |
| SKE (k = 18) | 4.0/0.81 | 0.00/0.00 | −/3.01 | −/3.94 | −/1.06 |
| SKE (k = 54) | 4.0/0.81 | 2.00/0.00 | 20.0/3.01 | 17.0/3.94 | 8.57/1.06 |

6.5. Comparison with the state of the art

Given that SKE shares certain similarities with the fuzzy vault approach, since both use a polynomial reconstruction mechanism, we provide a numerical comparison with state-of-the-art fuzzy vault implementations. This comparison is based on existing implementations using fingerprint datasets (FVC 2002, 2004). Table 5 presents the results of this comparison based on the full FVC testing protocol for genuine and imposter attempts. The comparison results in Table 6 are based on the 1vs1 protocol for genuine attempts (e.g. the first sample is used for enrollment, while the second sample is used for secret retrieval) and the full FVC protocol for imposter attempts. Table 6 shows the results for SKE with k = 12, 18 and 54 in comparison with alternative techniques. The SKE with k = 12 and k = 18 outperforms the other options, with higher system security (based on residue entropy in this case), while the SKE with k = 54 shows a higher value of FRR compared to the schemes of Li et al. [27] and Yang et al. [45] for FVC2002 DB1 and DB2. When the full FVC protocol is applied, which is considered more difficult than the 1vs1 protocol, the proposed SKE approach with k = 12, 31, 36 and 45 outperforms the other methods with lower values of FRR at FAR = 0% (Table 5).

7. Template protection properties of SKE

As an instance of a biometric cryptosystem, SKE can also be treated as a BTP scheme [3,28]. Such a scheme must satisfy four criteria: noninvertibility, unlinkability, revocability and performance. In this section, we evaluate SKE based on each of these properties, and primarily focus on the RV of RSV (e.g. IoM hashing).

7.1. Noninvertibility

In a BTP scheme, it should be computationally infeasible to reverse engineer the original biometric data from the protected template and/or the helper data, i.e. the sketch and TAG in SKE. This prevents privacy invasion and security attacks on biometric systems. This property is addressed by TAG indistinguishability, as described in Section 5.3. More specifically, we measure the advantage for an adversary who is trying to compromise the SKE through potential strategies that are not limited to the brute-force and false-accept approaches. Given an adversary with prior knowledge that two different TAGs among $\gamma$ TAGs are sourced from the same user, we find that s/he eventually gains an advantage and is able to reconstruct the secret with a complexity that is lower-bounded by that of the false-accept attack. Our results show that for $\gamma = 2^{6}, 2^{10}, 2^{20}, 2^{30}, 2^{40}$ and $2^{50}$, the computed advantage of the adversary is 59 bits, 64 bits, 72 bits, 71 bits, 71 bits and 71 bits, respectively, for FVC 2002 DB1. This complexity directly implies a compromise of the input biometric data, without the need to reverse engineer the input.

7.2. Revocability

This property enables a protected template to be revoked if compromised and to be replaced with a new one. FVC 2002 DB1 was used for an empirical evaluation of the revocability of SKE. Note that TAG in SKE is constructed using the RV (or the IoM hashed vector), and IoM hashing is therefore used in this experiment. The first fingerprint sample of each subject was used to generate a total of eight distinct RVs with different random Gaussian matrices (e.g. eight with different nonces, $N_1, \ldots, N_8$). Each RV was used for SKE enrollment. During retrieval, we followed the protocol discussed in Section 6.1 for imposter and genuine attempts. A total of 4950 trials for imposter attempts and 2800 trials for genuine attempts were recorded. However, since different nonces were used, we called the genuine attempt (for the same subject) the pseudo imposter attempt, following [19]. Fig. 3 shows the pseudo imposter score and imposter score (corresponding to the normalized count of genuine pairs).

Fig. 3. Pseudo imposter score and imposter score for FVC2002 DB1.
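The nonce-driven regeneration of RVs used in this experiment can be illustrated with a minimal sketch of Gaussian-random-projection IoM hashing in the style of [19]. Everything here is an assumption for illustration: the function names, the toy sizes (256 hash positions, 16 projections per position, a 32-dimensional input), and in particular the use of the nonce as the seed of a pseudo-random generator for the Gaussian matrices. It is not the authors' implementation.

```python
import random

def iom_hash(x, nonce, m=256, q=16):
    """Sketch of Gaussian-random-projection IoM hashing.

    For each of the m hash positions, project x onto q Gaussian random
    vectors and record the index of the largest projection.
    """
    rng = random.Random(nonce)  # assumption: the nonce seeds the random matrices
    hashed = []
    for _ in range(m):
        projections = [sum(rng.gauss(0.0, 1.0) * xi for xi in x) for _ in range(q)]
        hashed.append(max(range(q), key=projections.__getitem__))
    return hashed

def match_rate(a, b):
    """Fraction of positions at which two hashed vectors agree."""
    return sum(u == v for u, v in zip(a, b)) / len(a)

if __name__ == "__main__":
    source = random.Random(0)
    x = [source.gauss(0.0, 1.0) for _ in range(32)]       # toy feature vector
    noisy = [xi + source.gauss(0.0, 0.01) for xi in x]    # same subject, small noise
    print("same nonce, noisy input:", match_rate(iom_hash(x, 1), iom_hash(noisy, 1)))
    print("different nonce, same input:", match_rate(iom_hash(x, 1), iom_hash(x, 2)))
```

With the same nonce, a slightly perturbed input keeps most indices, whereas a new nonce makes the hashed vector of the same input agree only at roughly chance level, which is the behaviour the pseudo imposter comparison relies on.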
As expected, the distributions of the pseudo imposter and imposter attempts are very similar and overlap. This suggests that RVs newly generated with different nonces for the same subject are indistinguishable from RVs generated for other subjects. Therefore, a new RV can be generated to replace the old one, and the property of revocability is satisfied.

7.3. Unlinkability

The unlinkability criterion demands that it should not be possible to tell whether two sets of protected biometric data were generated from the same user's biometrics. This is to prevent matching across different applications (cross-matching). To evaluate the unlinkability of SKE, we used the unlinkability framework recently proposed by Gomez-Barrero et al. [15]. This evaluation was carried out with FVC2002 DB1, FVC2002 DB2, FVC2004 DB1 and FVC2004 DB2. For every sample in the dataset, a different nonce was applied to generate the RV pair $(\phi_x, \hat{\phi}_x)$. Each RV pair was used for SKE enrollment. Let $P(s|H_m)$ and $P(s|H_{nm})$ be probability densities with a given similarity score $s \in [0, 1]$, which are computed from mated samples $H_m$ (i.e. the same subjects) and non-mated samples $H_{nm}$ (i.e. different users), respectively. In the following, $P(s|H_m)$ and $P(s|H_{nm})$ refer to the probability density of the normalized number of revealed genuine pairs (equivalent to the similarity score s) for the genuine and imposter attempts, respectively. A total of 2800 genuine attempts and 4950 imposter attempts were recorded for each dataset. Given a likelihood ratio $LR(s) = P(s|H_m)/P(s|H_{nm})$, the property of unlinkability can be characterized by the local linkability measure $D_{\leftrightarrow}(s)$, which is defined as:

$$D_{\leftrightarrow}(s) = 2\,\frac{LR(s)\cdot\omega}{1 + LR(s)\cdot\omega} - 1 \qquad (3)$$


Fig. 4. Mated and non-mated score distributions and D↔ (s) plots for FVC2002 DB1, DB2, FVC2004 DB1 and DB2 (left to right).

Fig. 5. Normalized count of revealed genuine pairs for genuine and imposter attempts for FVC2002 DB2: (a) m = 256; (b) m = 512; (c) m = 1024.

where $\omega \in [0, 1]$ is set to 1, and the global linkability measure $D^{sys}_{\leftrightarrow}$ is:

$$D^{sys}_{\leftrightarrow} = \int P(s|H_m)\cdot D_{\leftrightarrow}(s)\,ds. \qquad (4)$$
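As a rough illustration of how Eqs. (3) and (4) can be evaluated from binned score densities, the following Python sketch computes the local measure $2\,LR(s)\,\omega / (1 + LR(s)\,\omega) - 1$ per bin and approximates the integral by a Riemann sum. The function names and the histogram inputs are hypothetical. Setting the local measure to 0 whenever $LR(s)\cdot\omega \le 1$ is an assumption taken from the framework of [15] so that the result stays in [0, 1]; it is not spelled out in Eq. (3).

```python
def local_linkability(p_mated, p_nonmated, omega=1.0):
    """Local linkability for one score bin, clamped to [0, 1] (clamp is assumed)."""
    if p_mated <= 0.0:
        return 0.0                  # no mated mass in this bin
    if p_nonmated <= 0.0:
        return 1.0                  # likelihood ratio tends to infinity
    lr_omega = (p_mated / p_nonmated) * omega
    if lr_omega <= 1.0:
        return 0.0
    return 2.0 * lr_omega / (1.0 + lr_omega) - 1.0

def global_linkability(dens_mated, dens_nonmated, bin_width, omega=1.0):
    """Riemann-sum approximation of the integral of P(s|Hm) * D(s) over s."""
    return sum(pm * local_linkability(pm, pn, omega) * bin_width
               for pm, pn in zip(dens_mated, dens_nonmated))

if __name__ == "__main__":
    width = 0.25  # four equal bins on [0, 1]
    overlapping = global_linkability([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], width)
    separated = global_linkability([0.0, 0.0, 2.0, 2.0], [2.0, 2.0, 0.0, 0.0], width)
    print(overlapping, separated)
```

Identical mated and non-mated densities give a global value of 0 (fully unlinkable), while completely separated densities give 1 (fully linkable).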

In brief, an increase in $D_{\leftrightarrow}(s) \in [0, 1]$ indicates an increasing degree of linkability. Thus, a high global linkability $D^{sys}_{\leftrightarrow}$ suggests the revelation of the genuine pair by authenticating TAG (TAG') for the mated samples. Fig. 4 shows $P(s|H_m)$, $P(s|H_{nm})$ and $D_{\leftrightarrow}(s)$ for genuine (mated) and imposter (non-mated) attempts at secret retrieval for the four datasets. It can clearly be seen that the score distributions for mated and non-mated instances overlap heavily. This implies that an adversary will be unable to determine whether or not the RV pairs $(\phi_x, \hat{\phi}_x)$ and $(\phi_{x'}, \hat{\phi}_{x'})$ (sourced from different biometric vectors $x$ and $x'$, respectively) are generated from the same fingerprint samples. Additionally, the near-zero values of $D^{sys}_{\leftrightarrow}$ for the four datasets (i.e. 0.0145, 0.0099, 0.0140 and 0.0093 for FVC2002 DB1, FVC2002 DB2, FVC2004 DB1 and FVC2004 DB2, respectively) suggest that linkability of the RV pairs is highly unlikely.

7.4. Performance

The performance criterion of a BTP scheme demands that the accuracy of the protected system should not be lower than its original counterpart. The preservation of performance from fingerprint vector to IoM hashed vector and SKE is discussed in detail in Section 6.2 above, and the results are presented in Table 2. We also provide a discussion of the parameter m, which is another factor that is critical in determining the performance of the proposed SKE scheme. Note that the number of unique genuine pairs in $U = \{(\varphi_{x}(j), f(\varphi_{x}(j)))\}_{j=1}^{t}$ follows $t \sim \mathrm{Bin}(m, P^2)$ and $E(t) = mP^2$ (Section 4). In this way, the effect of m can be empirically examined by observing the mean values of the number of revealed unique genuine pairs for genuine and imposter attempts. Fig. 5 shows that the means of both the genuine (red) and imposter (blue) curves remain unchanged with respect to m, whereas the variances are reduced when m is increased from 128 to 512 and 1024. Hence, the net effect is the gradual separation of genuine and imposter curves when m becomes large, resulting in a better secret retrieval rate in terms of FAR and FRR.

8. Conclusion

In this paper, we propose SKE for vectorial biometric secret binding. The SKE scheme involves an RV pair that resembles the encryption-decryption key pair in symmetric cryptosystems, used in conjunction with a simple filtering mechanism and Shamir's secret-sharing scheme for error checking and correction. The SKE possesses strong resilience, and was empirically validated using five subsets of the FVC fingerprint and LFW face benchmark datasets. This scheme is not limited to fingerprint and face data, but can be applied to any biometric features that are represented in vector form. The security threats to our SKE scheme are also identified and analyzed using the random error model, which can be characterized by a similarity measure over the input biometric vector using an RSV (IoM hashing). In brief, we offer an alternative that exploits


the biometric input structure via RSV, which implicitly allows us to measure the similarity of the input biometric vectors. This exploitation is important, especially for a more complete evaluation of security over the nonuniformity of the input biometric. We believe that our security model will benefit future cryptographic research into biometrics where error tolerance and nonuniformity need to be taken into consideration. In addition, we relax the security requirements based on a limited number of samples, where a precise knowledge of the input distribution of the biometrics does not exist, and show that even under this relaxed (weaker) scenario, our model can offer a better security guarantee with the secret revocation set, which enables security reduction to the typical computationally hard cryptographic problem (collision finding) compared to brute forcing via the random oracle model.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Funding: This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korean Government (MSIT) (No. 2016-0-00097, Development of a Biometrics-based Key Infrastructure Technology for Online Identification and No. 2018-0-01369, Developing Blockchain Identity Management System with Implicit Augmented Authentication and Privacy Protection for O2O Services).

References

[1] M. Bellare, A. Boldyreva, A. Desai, D. Pointcheval, Key-privacy in public-key encryption, ASIACRYPT, 2001.
[2] D. Bissessar, C. Adams, A. Stoianov, Privacy, security and convenience: biometric encryption for smartphone-based electronic travel documents, in: Recent Advances in Computational Intelligence in Defense and Security, 2016, pp. 339–366.
[3] J. Breebaart, B. Yang, I. Buhan-Dulman, C. Busch, Biometric template protection – the need for open standards, Datenschutz und Datensicherheit 33 (4) (2009) 299–304.
[4] A. Broder, On the resemblance and containment of documents, in: Proceedings of the Compression and Complexity of Sequences, 1997.
[5] I. Buhan, J. Guajardo, E. Kelkboom, Efficient strategies to play the indistinguishability game for fuzzy sketches, in: Information Forensics and Security (WIFS), 2010 IEEE International Workshop on, IEEE, 2010, pp. 1–6.
[6] R. Cappelli, M. Ferrara, D. Maltoni, Minutia cylinder-code: a new representation and matching technique for fingerprint recognition, IEEE Trans. Pattern Anal. Mach. Intell. 32 (12) (2010) 2128–2141.
[7] A. Cavoukian, M. Chibba, A. Stoianov, Advances in biometric encryption: taking privacy by design from academic research to deployment, Rev. Policy Res. 29 (1) (2012) 37–61.
[8] E.C. Chang, R. Shen, F.W. Teo, Finding the original point set hidden among chaff, in: Proc. of the 2006 ACM Symposium on Information, Computer and Communications Security, Taipei, Taiwan, 2006.
[9] M.S. Charikar, Similarity estimation techniques from rounding algorithms, in: Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 2002.
[10] T.C. Clancy, N. Kiyavash, D.J. Lin, Secure smartcard-based fingerprint authentication, in: Proceedings of the 2003 ACM SIGMM Workshop on Biometrics Methods and Applications, ACM, 2003, pp. 45–52.
[11] J. Daugman, Information theory and the IrisCode, IEEE Trans. Inf. Forensics Secur. 11 (2016) 400–409.
[12] J. Deng, J. Guo, S. Zafeiriou, ArcFace: additive angular margin loss for deep face recognition, arXiv:1801.07698 [cs], Jan. 2018.
[13] Y. Dodis, L. Reyzin, A. Smith, Fuzzy extractors: how to generate strong keys from biometrics and other noisy data, International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2004.
[14] A. Frieze, M. Jerrum, Improved approximation algorithms for max k-cut and max bisection, Algorithmica 18 (1) (1997) 67–81.
[15] M. Gomez-Barrero, C. Rathgeb, J. Galbally, C. Busch, J. Fierrez, Unlinkable and irreversible biometric template protection based on bloom filters, Inf. Sci. 370 (2016) 18–32.
[16] V. Guruswami, List Decoding of Error-Correcting Codes: Winning Thesis of the 2002 ACM Doctoral Dissertation Competition, vol. 3282, Springer Science & Business Media, 2004.
[17] G.B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: a database for studying face recognition in unconstrained environments, Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, 2008.
[18] Z. Jin, M.H. Lim, A.B.J. Teoh, B.M. Goi, Y.H. Tay, Generating fixed-length representation from minutiae using kernel methods for fingerprint authentication, IEEE Trans. Syst. Man Cybern. Syst. 46 (10) (2016) 1415–1428.
[19] Z. Jin, Y.L. Lai, J.Y. Hwang, S. Kim, A.B.J. Teoh, Ranking based locality sensitive hashing enabled cancelable biometrics: index-of-max hashing, IEEE Trans. Inf. Forensics Secur. 13 (6) (2017) 393–407.
[20] A. Juels, M. Sudan, A fuzzy vault scheme, in: Proceedings of the IEEE International Symposium, 2002.
[21] A. Juels, M. Wattenberg, A fuzzy commitment scheme, in: Proceedings of the 6th ACM conference on Computer and Communications Security, 1999.
[22] J. Kang, D. Nyang, K. Lee, Two-factor face authentication using matrix permutation transformation and a user password, Inf. Sci. 269 (2014) 1–20.
[23] A. Kholmatov, B. Yanikoglu, Realization of correlation attack against the fuzzy vault scheme, Electronic Imaging, International Society for Optics and Photonics, 2008.
[24] Y.L. Lai, How to Correct More Errors in a Secure Sketch, Cryptology ePrint Archive, Report 2018/701, 2018. https://eprint.iacr.org/2018/701.
[25] Y.L. Lai, J.Y. Hwang, Z. Jin, S.H. Kim, S. Cho, A.B.J. Teoh, Secure secret sharing enabled b-band mini vaults bio-cryptosystem for vectorial biometrics, IEEE Trans. Dependable Secure Comput. (2018) 1–1.
[26] C. Li, J. Hu, J. Pieprzyk, W. Susilo, A new biocryptosystem-oriented security analysis framework and implementation of multibiometric cryptosystems based on decision level fusion, IEEE Trans. Inf. Forensics Secur. 10 (6) (2015) 1193–1206.
[27] C. Li, J. Hu, A security-enhanced alignment-free fuzzy vault-based fingerprint cryptosystem using pair-polar minutiae structures, IEEE Trans. Inf. Forensics Secur. 11 (3) (2016) 543–555.
[28] X.-G. Li, X. Tao, F. Chen, S.-W. Guo, Efficient biometric identity-based encryption, Inf. Sci. 465 (2005) 248–264.
[29] P. Li, X. Yang, K. Cao, X. Tao, R. Wang, J. Tian, An alignment-free fingerprint cryptosystem based on fuzzy vault scheme, J. Netw. Comput. Appl. 33 (3) (2010) 207–220.
[30] D. Maltoni, D. Maio, A.K. Jain, S. Prabhakar, Handbook of Fingerprint Recognition, second ed., Springer, London, 2009.
[31] R.Á. Mariño, F.H. Álvarez, L.H. Encinas, A crypto-biometric scheme based on iris-templates with fuzzy extractors, Inf. Sci. 195 (2012) 91–102.


[32] A. Menezes, P. van Oorschot, S. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1996.
[33] J. Merkle, H. Ihmor, U. Korte, M. Niesing, M. Schwaiger, Performance of the fuzzy vault for multiple fingerprints (extended version), arXiv:1008.0807.
[34] J. Merkle, M. Niesing, M. Schwaiger, H. Ihmor, U. Korte, Security capacity of the fuzzy fingerprint vault, Int. J. Adv. Secur. (2010).
[35] P. Mihailescu, The fuzzy vault for fingerprints is vulnerable to brute force attack, arXiv preprint arXiv:0708.2974, 2007.
[36] A. Nagar, K. Nandakumar, A.K. Jain, Securing fingerprint template: fuzzy vault with minutiae descriptors, in: Proc. 19th Int. Conf. Pattern Recognition (ICPR 2008), 2008.
[37] A. Nagar, S. Rane, A. Vetro, Alignment and bit extraction for secure fingerprint biometrics, in: Proc. Media Forensics and Security, San Jose, CA, 2010.
[38] K. Nandakumar, A.K. Jain, S. Pankanti, Fingerprint-based fuzzy vault: implementation and performance, IEEE Trans. Inf. Forensics Secur. 2 (4) (2007) 744–757.
[39] N. Nedjah, R.S. Wyant, L.M. Mourelle, B.B. Gupta, Efficient fingerprint matching on smart cards for high security and privacy in smart systems, Inf. Sci. 479 (2017) 622–639.
[40] R.L. Rivest, Symmetric encryption via keyrings and ECC, July 2016. [Online]. Available: http://arcticcrypt.b.uib.no/files/2016/07/Slides-Rivest.pdf.
[41] A. Sahai, B. Waters, Fuzzy identity-based encryption, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, Berlin, Heidelberg, 2005, pp. 457–473.
[42] K. Simoens, P. Tuyls, B. Preneel, Privacy weaknesses in biometric sketches, in: 30th IEEE Symposium on Security and Privacy, IEEE, 2009, pp. 188–203.
[43] B. Tams, P. Mihăilescu, A. Munk, Security considerations in minutiae-based fuzzy vaults, IEEE Trans. Inf. Forensics Secur. 10 (5) (2015) 985–998.
[44] G.J. Tomko, C. Soutar, G.J. Schmidt, Fingerprint Controlled Public Key Cryptographic System, U.S. Patent and Trademark Office, Washington, DC, 1996, U.S. Patent No. 5,541,994.
[45] W. Yang, J. Hu, S. Wang, A Delaunay quadrangle-based fingerprint authentication system with template protection using topology code for local registration and security enhancement, IEEE Trans. Inf. Forensics Secur. 9 (7) (2014) 1179–1192.
[46] W. Yang, J. Hu, S. Wang, A Delaunay triangle group based fuzzy vault with cancellability, in: Image and Signal Processing (CISP), 2013 6th International Congress on, vol. 3, IEEE, 2013, pp. 1676–1681.
[47] W. Yang, J. Hu, S. Wang, M. Stojmenovic, An alignment-free fingerprint bio-cryptosystem based on modified Voronoi neighbor structures, Pattern Recognit. 47 (3) (2014) 1309–1320.
