Aggregating minutia-centred deep convolutional features for fingerprint indexing


Accepted Manuscript

Dehua Song, Yao Tang, Jufu Feng

PII: S0031-3203(18)30407-2
DOI: https://doi.org/10.1016/j.patcog.2018.11.018
Reference: PR 6715

To appear in: Pattern Recognition

Received date: 1 March 2018
Revised date: 31 July 2018
Accepted date: 17 November 2018

Please cite this article as: Dehua Song, Yao Tang, Jufu Feng, Aggregating Minutia-centred Deep Convolutional Features for Fingerprint Indexing, Pattern Recognition (2018), doi: https://doi.org/10.1016/j.patcog.2018.11.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Dehua Song, Yao Tang, Jufu Feng∗

Key Laboratory of Machine Perception (Ministry of Education), Department of Machine Intelligence, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, P.R. China

Abstract

Most current fingerprint indexing systems are based on minutiae-only local structures and index local features directly. For minutiae local structures, missing and spurious neighboring minutiae significantly degrade the retrieval accuracy. To overcome this issue, we employ a deep convolutional neural network to learn a minutia descriptor representing the local ridge structures. Instead of indexing local features, we aggregate the varying number of learned Minutia-centred Deep Convolutional (MDC) features of one fingerprint into a fixed-length feature vector to improve retrieval efficiency. In this paper, a novel aggregating method is proposed, which employs a 1-D convolutional neural network to learn a discriminative and compact representation of a fingerprint. In order to understand the MDC feature, a steerable fingerprint generation method is proposed to verify that it describes the attributes of minutiae and ridges. Comprehensive experimental results on five benchmark databases show that the proposed method achieves better accuracy and efficiency than other prominent approaches.

Keywords: fingerprint indexing, deep convolutional neural network, aggregating local features, representation learning, minutia descriptor

∗ Corresponding author

Email addresses: [email protected] (Dehua Song), [email protected] (Yao Tang), [email protected] (Jufu Feng)

Preprint submitted to Journal of Pattern Recognition

November 26, 2018


1. Introduction

On the basis of the uniqueness and immutability of fingerprints, Automatic Fingerprint Identification Systems (AFIS) are widely deployed in forensic and government applications. However, the size of modern AFIS gallery databases has reached billions of fingerprints, which seriously degrades the accuracy and efficiency of the system [1]. To effectively reduce the number of fingerprints to be considered, fingerprint identification systems employ pre-filtering techniques: fingerprint classification and fingerprint indexing/retrieval.

Due to the uneven natural distribution of fingerprint classes, fingerprint classification is usually unable to sufficiently reduce the search space. Besides, there are many ambiguous fingerprints whose class membership cannot be reliably determined [2]. In contrast, fingerprint indexing techniques overcome these problems by representing each fingerprint with a multidimensional vector summarizing its main features. During the search phase, the feature vector is used to pre-filter out a large number of fingerprints with low similarity to the query fingerprint. The flowchart of fingerprint indexing is illustrated in Fig. 1. Then the query fingerprint is identified from the remaining candidate hypotheses by an automated one-to-one matching algorithm.
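The pre-filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the similarity measure is taken to be cosine similarity, and the names `candidate_list`, `gallery` and `top_n` are hypothetical and do not come from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

def candidate_list(query, gallery, top_n):
    """Rank gallery templates by similarity to the query and keep the top_n."""
    scored = [(finger_id, cosine_similarity(query, template))
              for finger_id, template in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Toy gallery of three fixed-length feature templates (made-up values).
gallery = {
    "finger_a": [1.0, 0.0, 0.0],
    "finger_b": [0.0, 1.0, 0.0],
    "finger_c": [0.9, 0.1, 0.0],
}
query = [1.0, 0.0, 0.0]
shortlist = candidate_list(query, gallery, top_n=2)
```

Only the fingerprints in `shortlist` would then be passed to the one-to-one matcher, so the cost of fine matching depends on `top_n` rather than on the gallery size.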

The feature representation plays an important role in the design of a fingerprint indexing system and greatly affects the accuracy and efficiency of retrieval. Features applied to fingerprint indexing can be roughly divided into three categories: global features, local features and matching scores. Most fingerprint indexing methods based on global features [3, 4, 5] need to align the fingerprint. However, it is hard to align poor-quality and partial fingerprints. Hence, one feasible way is to focus on local features. Most indexing methods based on local features employ minutiae-only local structures and index local features directly. Unfortunately, missing and spurious neighboring minutiae (i.e., ridge endings and bifurcations) significantly degrade the retrieval accuracy. In addition, the fingerprint similarity computation with local features seriously limits the retrieval efficiency. To address these issues, we make two primary improvements in this paper: firstly, a Deep Convolutional Neural Network (DCNN) is employed to learn a minutia descriptor representing the local ridge structures; secondly, a novel method based on a 1-D convolutional neural network is proposed to aggregate the varying number of Minutia-centred Deep Convolutional (MDC) features from one fingerprint into a fixed-length, discriminative and compact feature vector for indexing.

Figure 1: Flowchart of fingerprint indexing. (Diagram labels: Gallery Database, Feature Template, Query Fingerprint, Feature, Similarity Measure, and a Candidate List with Rank, Candidate and Similarity columns.)

For local features, Minutia Cylinder-Code (MCC) [6] is the most typical descriptor used in fingerprint indexing. It can be decomposed into two steps: minutiae localization and representation. The MCC descriptor encodes spatial and directional relationships between the central minutia and its (fixed-radius) neighborhood. The minutia-only descriptor has two drawbacks (illustrated in Fig. 2). Firstly, missing and spurious neighboring minutiae cause the descriptor to match with a false mate. Secondly, the descriptor possesses limited discriminatory information when the local region contains only 2-3 neighboring minutiae. Fig. 2 shows that the ridge structures of the query local region and its false mate are different in spite of similar minutiae distributions. Hence, we employ a DCNN to learn a minutia descriptor representing the local ridge structures to address these issues. In order to understand the learned descriptor, a steerable fingerprint generation method is proposed to verify whether it describes the attributes of minutiae or ridges. The experimental results show that the descriptor represents the spatial location and angle of minutiae and the curvature of ridges.

Figure 2: Illustration of our motivation. Top row: query image and gallery images. Middle row: a portion of neighboring minutiae is missed. Bottom row: few neighboring minutiae. MCC encodes the relationships between the central minutia and neighboring minutiae. MDC represents the local ridge structures. (The missing minutiae are shown in yellow. The fingerprint similarity is defined in Eq. (14).) Panel labels: Query image, True mate, False mate. Values printed in the figure, in order of appearance: fingerprint similarity MCC 0.57 / MDC 0.62 and MCC 0.60 / MDC 0.34; region similarity MCC 0.55 / MDC 0.89 and MCC 0.63 / MDC 0.30; region similarity MCC 0.65 / MDC 0.71 and MCC 0.72 / MDC 0.17.

For methods based on local feature indexing, computing fingerprint similarity from local features incurs a high computational cost, which seriously limits the retrieval efficiency. An efficient strategy to solve this problem is to aggregate the local features of one fingerprint into a single feature vector. A preliminary version of this work [7] employed Triangulation Embedding (T-Embedding) from Content-Based Image Retrieval (CBIR) [8] to aggregate MDC features. It encodes the directional information between a local feature and its clustering centers. However, for minutia descriptors, T-Embedding results in a high-dimensional (16000-D) aggregated feature vector due to hundreds of centers, which limits the retrieval efficiency. In fact, embedding can be viewed as a nonlinear map. Besides, an eligible aggregated feature for fingerprint indexing should be discriminative and compact. Hence, a natural idea is to employ metric learning and 1-D convolutional neural networks to learn a discriminative and compact representation from a set of minutia descriptors without clustering centers. Comprehensive experiments are conducted on five benchmark databases, and the results show that the proposed approach achieves better accuracy and efficiency than other prominent methods.

In contrast to other prominent features, the Aggregated Minutia-centred Deep Convolutional (AMDC) feature has the following advantages:

• It carries more discriminative information. The AMDC feature represents both minutiae and ridge information.

• It is rotation invariant. MDC represents the attributes of a local region aligned with the minutia angle. Hence, the AMDC feature is invariant to rotation of the whole fingerprint.

• It is insensitive to nonlinear distortion. The high-dimensional, nonlinear mapping of the DCNN ensures some degree of invariance to nonlinear distortion.

• It is discriminative and compact. Metric learning ensures the discriminability and compactness of the feature.

• It could be extended to partial fingerprints.

As far as we know, there is no public rolled fingerprint database containing multiple impressions per finger that can be used to train deep neural networks. To train the aggregating network, we collect and share a benchmark rolled fingerprint database, named the Peking University and Founder (PUF) database¹. It contains 40,000 images of 1,000 fingers from 100 identities. In the PUF database, each finger possesses forty impressions.

A preliminary version of this work was presented earlier [7]. Three significant contributions are added to the initial version in this paper. Firstly, a novel aggregation method based on a 1-D convolutional neural network is proposed for minutia descriptors. It improves the accuracy and efficiency of fingerprint retrieval significantly. Secondly, we collect and share a benchmark rolled fingerprint database in which each finger has 40 impressions. As far as we know, it is the first public rolled fingerprint database that can be used to train a DCNN. Finally, we extend the original experiments from two databases to five databases and add comparison experiments to evaluate the proposed aggregating method.

The remainder of this paper is organized as follows. Section 2 introduces the related work on fingerprint indexing, minutia descriptors and local descriptor aggregation. Section 3 describes the feature representation procedure, which comprises minutia descriptor learning and minutia descriptor aggregation. Section 4 states the fingerprint indexing scheme. Section 5 details the experiments and reports the indexing performance on five benchmark databases. Besides, to understand the learned descriptor, qualitative and quantitative analyses are carried out in Section 6. Finally, conclusions are drawn in Section 7.

¹ https://github.com/DehuaSong/PUF fingerprint database

2. Related work

2.1. Fingerprint indexing

As fingerprint databases keep growing, fingerprint indexing has become increasingly urgent for fingerprint identification. The design of accurate, fast and interoperable algorithms is an open issue. In the last two decades, researchers have proposed many fingerprint indexing methods to improve the accuracy and efficiency of retrieval. Conventional features used in fingerprint indexing can be roughly divided into three categories: global features, local features and matching scores:

• Global features: These features describe the global pattern of the ridge structure, for example, directional field [3, 9, 10], FingerCode [11], reference points [12] and deep learning features [4, 5].

• Local features: Most local features attempt to represent the minutiae local structure, for example, minutiae triplets [13, 14, 15, 16], minutiae quadruplets [17, 18] and MCC [6, 19]. Besides, the Scale Invariant Feature Transform (SIFT) feature [20] and ridge invariants [21] are adopted to improve fingerprint indexing performance.

• Matching scores: Matching scores between an input fingerprint and reference fingerprints are employed to derive index keys [22].

Most fingerprint indexing methods based on global features [3, 4, 5] need to align the fingerprint. However, it is difficult to align poor-quality images and partial fingerprints [23]. Besides, global features cannot deal with partial fingerprint indexing. Matching scores cannot narrow down the search space sufficiently because of their weak discriminability. Hence, this paper focuses on local features.

2.2. Minutia descriptors in fingerprint indexing

Minutia descriptor. Most methods based on local features attempt to represent the minutiae local structure. Conventional minutia descriptors in fingerprint indexing can be roughly divided into two categories: non-central minutia-based and central minutia-based. Minutiae triplets and minutiae quadruplets are two typical non-central minutia-based methods. Boer et al. [9] first employed minutiae triplets as features for fingerprint indexing. Then, Bhanu et al. [13] extended minutiae triplets with novel attributes to improve retrieval accuracy. Recently, Andres et al. [15] defined a triangle set based on extensions of Delaunay triangulations to deal with missing and spurious minutiae. In 2017, the authors of [16] further proposed an expanded Delaunay triangulation algorithm based on image quality. In addition, minutiae quadruplets [17, 18] and minutiae pairs [24] were proposed to improve fingerprint indexing.

Central minutia-based descriptors mainly fall into two categories: minutiae-based descriptors and texture-based descriptors. The most conventional minutiae-based descriptor is MCC [6], which encodes spatial and directional relationships between the central minutia and its (fixed-radius) neighborhood. However, it cannot effectively handle missing and spurious neighboring minutiae, which significantly degrades the retrieval accuracy. In addition, the descriptor possesses limited discriminatory information when the local region contains only 2-3 neighboring minutiae. To overcome these problems, Benhammadi et al. [25] employed a Gabor filter bank to represent the local texture. Zhou et al. [26] proposed a SIFT-based minutia descriptor to encode the relationships between the central minutia and sampling points. Recently, Cao et al. [27] employed a ConvNet with three convolutional layers to learn the minutia descriptor. However, they did not take intra-class nonlinear distortion into account. Focusing on these issues, this paper employs a DCNN with a deep architecture and center loss to learn a minutia descriptor representing the local ridge structures.

Figure 3: The flowcharts of different indexing schemes based on local features. (a) The flowchart of indexing local descriptors (diagram labels: Gallery Database, Query Fingerprint, Offline Stage, Online Stage, Extract Local Features, Index Local Features, Retrieve Local Features, Calculate Fingerprint Similarity, Candidate List). (b) The flowchart of indexing aggregated feature (diagram labels: Gallery Database, Query Fingerprint, Offline Stage, Online Stage, Extract Local Features, Aggregate Local Features, Index Aggregated Features, Calculate Fingerprint Similarity, Candidate List).

Indexing strategy. For methods based on minutia descriptors, the indexing strategy is crucial to the accuracy and efficiency of fingerprint retrieval. In these approaches, each fingerprint is viewed as a set of minutia descriptors which represent the local structure. The indexing scheme should ensure that the similarity between fingerprints can be computed quickly and that it discriminates different fingerprints effectively. The typical indexing strategies can be roughly divided into two categories: indexing local descriptors and indexing aggregated features.


(1) Indexing local descriptors: A simple scheme is to index the minutia descriptors directly and calculate the similarity between fingerprints from the similarities of minutia descriptor pairs, as illustrated in Fig. 3a. However, each fingerprint contains many minutia descriptors, so retrieving every minutia descriptor directly incurs a high computational cost. Hence, Bhanu et al. [13] quantized the local features and set range constraints to narrow down the search space. Besides, Cappelli et al. [6] binarized the MCC vector and designed a Locality-Sensitive Hashing (LSH) scheme to improve retrieval efficiency. Then, Su et al. [19] further added a pose constraint to reduce the search space. Alternatively, Praveer et al. [28] proposed a tree-based algorithm to address efficiency issues.

(2) Indexing aggregated feature: Another scheme is to aggregate the local minutia descriptors of one fingerprint into a fixed-length vector which is discriminative and compact. The aggregated feature vector is indexed, and the similarity between fingerprints is computed simply by comparing aggregated feature vectors, as illustrated in Fig. 3b. Iloanusi et al. [17, 18] utilized clustering methods and encoded a fingerprint as the number of quadruplets in each cluster. The main idea is similar to Bag of Words (BoW) [29] in image retrieval. Besides, Khodadoust et al. [24] employed k-means clustering and candidate list reduction criteria to improve retrieval efficiency.

Although acceleration algorithms (e.g. LSH) are used in local descriptor indexing methods, these methods still need to compute the similarity of each local feature pair from two fingerprints, which incurs a high computational cost. In contrast, indexing aggregated features only needs to compute the similarity between two aggregated feature vectors. Hence, this paper focuses on aggregating local minutia descriptors. The aggregation methods currently used in fingerprint indexing are too simple. Therefore, this paper proposes a novel aggregation method based on a 1-D convolutional neural network to improve the accuracy and efficiency of fingerprint indexing.
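To make the contrast between the two strategies concrete, the following minimal sketch encodes a fingerprint as cluster counts, in the spirit of the BoW-style scheme mentioned above, and counts the descriptor-pair comparisons that direct local matching would need. The centers, descriptor values and function names are assumptions made for illustration; they are not taken from the cited methods.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_center(descriptor, centers):
    """Index of the cluster center closest to the descriptor."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(descriptor, centers[k]))

def bow_histogram(descriptors, centers):
    """Count descriptors per cluster; the output length is always len(centers)."""
    counts = [0] * len(centers)
    for descriptor in descriptors:
        counts[nearest_center(descriptor, centers)] += 1
    return counts

def pairwise_comparisons(n_query, n_gallery):
    """Descriptor-pair similarities needed when local features are matched directly."""
    return n_query * n_gallery

# Toy 2-D descriptors; real minutia descriptors are much longer.
centers = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
fingerprint_a = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]
fingerprint_b = [[0.0, 0.9], [0.1, 0.1]]
hist_a = bow_histogram(fingerprint_a, centers)
hist_b = bow_histogram(fingerprint_b, centers)
```

Both histograms have the same length even though the two fingerprints contain different numbers of descriptors, so a single vector comparison replaces the `pairwise_comparisons(3, 2)` descriptor-pair similarities of the direct scheme.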


2.3. Deep learning and image retrieval

Deep learning. In recent years, deep learning methods have made tremendous progress in various computer vision tasks, including face recognition [30, 31] and generic image classification [32, 33]. The main reasons for this breakthrough are advances in two technical aspects: more powerful models and effective strategies against overfitting. In order to enhance the discriminative power of the deeply learned features, researchers adopted metric learning in DCNNs. Recently, a variety of metric losses have been proposed in face recognition, such as contrastive loss [34], center loss [30] and triplet loss [31]. In addition, deep learning has begun to be used in multiple fingerprint tasks, for example, minutia extraction [35], pose estimation [36] and fingerprint indexing [4, 5]. This paper employs metric learning and a DCNN to learn a discriminative minutia descriptor for fingerprint indexing.

Figure 4: Flowchart of feature representation for fingerprint (MDC: Minutia-centred Deep Convolutional, AMDC: Aggregated Minutia-centred Deep Convolutional). (Diagram labels: Original image, FingerNet, Enhanced Image, Minutiae, Local region extraction, Minutia-centred local region, Local feature extraction, DescriptorNet, MDC feature, Feature aggregation, AggregationNet, AMDC feature.)

Aggregation methods in image retrieval. Aggregating a varying number of local descriptors into a fixed-length, discriminative and compact feature vector is crucial to image retrieval. A variety of aggregation approaches have been proposed for the SIFT descriptor in image retrieval. By counting the number of occurrences of clustering centers (visual words), Bag of Words (BoW) [29] encodes the 0-order statistics of the distribution of local descriptors. Then, Fisher Vector (FV) [37, 38] extended BoW by encoding high-order statistics to improve the discriminability of the feature. Jegou et al. [39] further proposed the Vector of Locally Aggregated Descriptors (VLAD), which encodes an image by the residuals between local descriptors and their visual words. Besides, they proposed Triangulation Embedding (T-Embedding) [40], which encodes features with directions instead of absolute distances. However, deep convolutional features are rather different from the SIFT descriptor, because they are more discriminative. Focusing on this issue, the Sum-Pooling Convolutional (SPoC) [41] and NetVLAD [42] aggregation methods were proposed for deep convolutional features. However, most methods are based on clustering centroids and use “shallow” embeddings, which can be implemented by a network with one or two layers. To acquire a fixed-length, discriminative and compact feature, this paper proposes a novel deep aggregation method based on 1-D convolutional neural networks for minutia descriptors, and it achieves better retrieval accuracy and efficiency than other typical aggregation approaches in image retrieval.
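As an illustration of the clustering-based aggregation discussed above, the sketch below implements a basic VLAD-style encoding: residuals to the nearest visual word are summed per word and the result is L2-normalized. It is a simplified reading of the general idea, with made-up centers and descriptors, and is not the implementation used in [39] or in this paper.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def vlad(descriptors, centers):
    """Sum the residuals (descriptor - nearest center) per center, then L2-normalize."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for descriptor in descriptors:
        k = min(range(len(centers)),
                key=lambda i: squared_distance(descriptor, centers[i]))
        for j in range(dim):
            residual_sums[k][j] += descriptor[j] - centers[k][j]
    flat = [value for row in residual_sums for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0.0 else flat

# Two visual words and three toy 2-D descriptors (illustrative values only).
centers = [[0.0, 0.0], [4.0, 4.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [4.0, 5.0]]
encoded = vlad(descriptors, centers)
```

The encoded vector has length (number of centers) × (descriptor dimension), which shows why clustering-based encodings of this kind become very long when hundreds of centers are used.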

3. Feature representation

The feature representation plays an important role in the design of a fingerprint indexing system and greatly affects the accuracy and efficiency of the system. An eligible feature should be discriminative and compact. In this paper, metric learning and DCNNs are employed to learn a minutia descriptor and an aggregating model. Each fingerprint is then represented by a fixed-length, discriminative and compact feature vector which encodes the main attributes of the fingerprint.

The proposed feature representation of a fingerprint consists of four stages: fingerprint enhancement and minutiae extraction, local region extraction, local feature extraction and feature aggregation. The flowchart of the proposed feature representation is illustrated in Fig. 4. The first stage (fingerprint enhancement and minutiae extraction) is implemented by FingerNet [35], which combines domain knowledge and the representation ability of deep learning to enhance the fingerprint image and extract minutiae. Then, we use the enhanced fingerprint image and minutiae to extract minutia-centred local regions. In the third stage, we extract a deep feature from each minutia-centred local region with DescriptorNet, whose architecture is illustrated in Fig. 6. Finally, the minutia local features are aggregated by AggregationNet, illustrated in Fig. 7, into a fixed-length, discriminative and compact feature vector for fingerprint indexing.

Figure 5: Aligned minutia-centred local regions: (a) original local region, (b) enhanced local region, (c, d) local regions with different area ratios r.

3.1. Minutia-centred local region

In practice, many factors, such as bruises, dry fingers and sweat, can produce a poor-quality image marked by low contrast and ill-defined boundaries between the ridges. Poor-quality fingerprint images may prevent the DCNN from learning the representation of the ridge structure. Hence, FingerNet [35] is employed to enhance fingerprint images. It can effectively improve the ridge structure and filter out noise. Fig. 5b shows an enhanced minutia-centred local region.

We represent a minutia by the ridge structures of the local region centred on it. The considerable rotation variation between different impressions of the same finger affects the accuracy of fingerprint indexing. To overcome this issue, the minutia-centred local region is aligned by the direction of the minutia. Each minutia is a triplet $m = \{x_m, y_m, \theta_m\}$, where $x_m$ and $y_m$ are the minutia location and $\theta_m$ is the minutia direction. We use $\{x_m, y_m\}$ to crop an $l \times t$ minutia-centred region from the fingerprint image and rotate the region clockwise by angle $\theta_m$. The aligned minutia-centred local region is illustrated in Fig. 5.

Figure 6: The architecture of DescriptorNet. (Conv.: convolution, C: class centers, $L_C$: center loss in Eq. (5), $L_S$: softmax loss in Eq. (4), $L$: joint loss in Eq. (3).) (Diagram labels: Input 128×128, a sequence of Conv. and Pooling stages, Fully Connected, Feature, Loss.)

When the central minutia is located near the border of the fingerprint, the local region may lie partly outside the fingerprint area and contain little ridge information (illustrated in Fig. 5d). To avoid this case, the fingerprint image is segmented into background and foreground. The area ratio $r$ of the foreground to the whole local region is defined in Eq. (1). We choose the minutia-centred local regions whose area ratio $r$ is larger than a threshold $min_r$.

$$r = \frac{S_f}{S_l} \tag{1}$$

where $S_f$ and $S_l$ denote the area of the foreground in the local region and the area of the whole local region, respectively.
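The region selection rule of Eq. (1) can be sketched as follows. Several simplifications are assumed here and are not taken from the paper: the window is axis-aligned (the rotation by $\theta_m$ is ignored), $l$ is treated as the window width and $t$ as its height, pixels outside the image count as background, and the function names are hypothetical.

```python
def area_ratio(foreground_mask, x_m, y_m, l, t):
    """Eq. (1): r = S_f / S_l for an l x t window centred on (x_m, y_m).

    foreground_mask[row][col] is True for foreground pixels.
    Window pixels that fall outside the image are counted as background.
    """
    height = len(foreground_mask)
    width = len(foreground_mask[0])
    top = y_m - t // 2
    left = x_m - l // 2
    s_f = 0
    for row in range(top, top + t):
        for col in range(left, left + l):
            if 0 <= row < height and 0 <= col < width and foreground_mask[row][col]:
                s_f += 1
    s_l = l * t
    return s_f / s_l

def select_minutiae(minutiae, foreground_mask, l, t, min_r):
    """Keep the minutiae (x, y, theta) whose local region has area ratio above min_r."""
    return [m for m in minutiae
            if area_ratio(foreground_mask, m[0], m[1], l, t) > min_r]

# 6 x 6 image that is entirely foreground; one central and one corner minutia.
mask = [[True] * 6 for _ in range(6)]
minutiae = [(3, 3, 0.0), (0, 0, 1.2)]
kept = select_minutiae(minutiae, mask, l=4, t=4, min_r=0.5)
```

In this toy case the corner minutia is discarded because three quarters of its window lie outside the image, which mirrors the border situation shown in Fig. 5d.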

3.2. Minutia-centred deep convolutional feature

We focus on improving the discriminability of the minutia descriptor. It is hard for rule-based methods to measure noise and plastic distortion by distance. Hence, we should drive the minutia descriptor to be invariant to nonlinear distortion. For these purposes, a DCNN, named DescriptorNet, is used to learn a minutia descriptor representing local ridge structures. DCNNs possess strong expressive ability and have proven to be much more effective than traditional pattern recognition methods in a variety of computer vision tasks [31, 32]. They use local receptive fields, shared weights, spatial subsampling and nonlinear activation functions to ensure some degree of translation and distortion invariance. Besides, with local receptive fields, neurons can extract elementary visual features such as oriented edges, end-points and corners.

The architecture of our DescriptorNet, illustrated in Fig. 6, is adapted from the VGG net [43]. It consists of twelve layers with weights: ten convolutional layers and two fully connected layers. Five of the convolutional layers are followed by a max pooling layer of size 2 × 2. The non-saturating nonlinear function Parametric Rectified Linear Unit (PReLU) [33], defined in Eq. (2), is employed as the activation function to alleviate gradient vanishing. Softmax loss alone cannot effectively constrain the intra-class variation. A natural idea is to employ a metric learning method to learn a discriminative feature. We adopt center loss [30] to minimize the intra-class variations. Finally, DescriptorNet is trained under the joint supervision of softmax loss and center loss for discriminative feature learning. The whole loss function is defined in Eq. (3). Each minutia-centred local region of one finger is considered as a class. In addition, hard sample mining [31] is also employed on the center loss.

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ ax & \text{if } x \le 0 \end{cases} \tag{2}$$

$$L = L_S + \lambda L_C \tag{3}$$

$$L_S = -\sum_{i=1}^{K} \log \frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{J} e^{W_j^T x_i + b_j}} \tag{4}$$

$$L_C = \frac{1}{2} \sum_{i=1}^{K} \| x_i - c_{y_i} \|_2^2 \tag{5}$$

where $x$ is the input of the nonlinear activation $f$, and $a$ is a learnable coefficient controlling the slope of the negative part. $W$ and $b$ are the parameters of the softmax layer, and $x_i$ denotes the $i$th deep feature, belonging to the $y_i$th class. $c_{y_i}$ denotes the $y_i$th class center of the deep convolutional features. The size of the mini-batch and the number of classes are $K$ and $J$, respectively. $\lambda$ is the coefficient of the center loss $L_C$.
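The activation and losses in Eqs. (2)-(5) can be written out directly for a small mini-batch. The sketch below is a plain-Python illustration only: the toy features, weights and class centers are made up, hard sample mining and the update of the class centers during training are omitted, and the function names are assumptions rather than the authors' code.

```python
import math

def prelu(x, a):
    """Eq. (2): identity for positive inputs, slope a for non-positive inputs."""
    return x if x > 0 else a * x

def softmax_loss(features, labels, W, b):
    """Eq. (4): summed negative log-probability of the true class over the mini-batch.

    W[j] is the weight vector of class j and b[j] its bias.
    """
    total = 0.0
    for x, y in zip(features, labels):
        logits = [sum(w_k * x_k for w_k, x_k in zip(W[j], x)) + b[j]
                  for j in range(len(W))]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total -= logits[y] - log_norm
    return total

def center_loss(features, labels, centers):
    """Eq. (5): half the summed squared distance of each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((x_k - c_k) ** 2 for x_k, c_k in zip(x, centers[y]))
    return 0.5 * total

def joint_loss(features, labels, W, b, centers, lam):
    """Eq. (3): L = L_S + lambda * L_C."""
    return softmax_loss(features, labels, W, b) + lam * center_loss(features, labels, centers)

# Mini-batch of K = 2 features with J = 2 classes (toy values).
features = [[1.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
centers = [[1.0, 0.0], [0.0, 1.0]]
loss = joint_loss(features, labels, W, b, centers, lam=0.1)
```

Here the first feature coincides with its class center and contributes nothing to $L_C$, while the second lies at distance 1 from its center, so the center loss term penalizes only the intra-class spread.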

The biggest challenge in applying DCNNs to fingerprints is the scarcity of training data, which results in severe overfitting. The DescriptorNet shown in Fig. 6 contains 2.58 million parameters, but the total number of fingerprint images in all four FVC databases is 16800. What is worse, each finger has only 8-12 different impressions in the FVC databases. To overcome this problem, we use three different forms of data augmentation [32]: cropping, rotation and slight shear transformation. Besides, we collect and share a rolled fingerprint database, the Peking University and Founder (PUF) database², in which each finger possesses forty impressions. The dropout technique [44], which sets the output of each hidden neuron to zero with probability p, is employed on the first fully connected layer to alleviate the overfitting of the DCNN. It forces the model to learn more robust feature detectors.

The learned minutia descriptor consists of seven convolutional layers and the first fully connected layer. We view the output of the first fully connected layer as the deep convolutional feature $f_c$. To make the features suitable for learning with AggregationNet, mean subtraction and $l_2$-normalization are applied to the deep convolutional feature $f_c$. The Minutia-centred Deep Convolutional (MDC) feature $f_M$ is defined in Eq. (6). Therefore, the elements of the MDC feature lie in the range [−1, 1].

$$f_M = \frac{f_c - \bar{f}_c}{\| f_c - \bar{f}_c \|_2} \tag{6}$$

where $\bar{f}_c$ denotes the mean of the feature $f_c$.

3.3. Minutia descriptors aggregation

As for indexing methods based on minutia descriptors, the local feature aggregation scheme is crucial to the accuracy and efficiency of fingerprint retrieval. In these approaches, each fingerprint is viewed as a set of minutia descriptors which represent the local structure. The aggregation approach needs to merge a varying number of local descriptors into a fixed-length, discriminative and compact feature vector. There are four typical aggregation methods for the SIFT descriptor in image retrieval: BoW, Fisher Vector, VLAD and Triangulation Embedding. However, there are two main differences between MDC and SIFT descriptors:

(1) Different number of descriptors: Each image contains hundreds of SIFT features, whereas each fingerprint contains only dozens of MDC features.

(2) Different discriminability: The MDC descriptor is much more discriminative than the SIFT descriptor.

² https://github.com/DehuaSong/PUF fingerprint database

Figure 7: The architecture of our AggregationNet. (The input is the representation of a fingerprint, which concatenates s MDC features from one fingerprint. C: class centers, $L_C$: center loss in Eq. (5), $L_S$: softmax loss in Eq. (4), $L$: joint loss in Eq. (3).) (Diagram labels: Input s×150 built from $f_{M_1}, f_{M_2}, \ldots, f_{M_s}$, Convolution and Pooling stages forming the Deep Embedding, Feature 300×1, Metric Loss.)

These two differences mean that the clustering-based methods suit SIFT descriptors, but for MDC descriptors they produce high-dimensional aggregated features and reduce discriminability. Hence, a new aggregation approach has to be designed for the MDC feature.

Consider the problem of representing a set of minutia descriptors by a single feature vector such that a simple comparison of two such vectors with cosine similarity reflects the similarity of the original sets. In addition, the aggregated feature vector should be fixed-length, discriminative and compact for the sake of retrieval accuracy and efficiency. Hence, descriptor aggregation for fingerprint indexing can be naturally formalized as a metric learning problem, where the similarity of images from the same identity should be larger than that of images from different identities. We can convert the procedure of a typical aggregation method into a neural network without clustering. The typical methods can be decomposed into two steps: the embedding step individually maps each vector of the set to a high-dimensional space, whilst the aggregating/pooling step merges the set of mapped vectors into a single vector [40]. VLAD and T-embedding employ hand-crafted “shallow” embeddings with clustering, which can be implemented by a network with one or two layers. However, embedding a descriptor into a high-dimensional space is a complex nonlinear mapping. Inspired by text classification with neural networks [45], we propose a 1-D convolutional neural network to implement a “deep” embedding without clustering.

Figure 8: The illustration of single MDC feature embedding through convolutional layers 1 to 4. (All MDC features from one fingerprint share the same weights of embedding. Conv. is the abbreviation of convolutional. In the figure, v = n − h + 1 = n − 3. $f_M$ denotes the MDC feature vector and $f_E$ denotes the embedded feature vector of $f_M$.)

The proposed Aggregation neural Network (AggregationNet) is illustrated in Fig. 7. The AggregationNet performs three steps: first, the deep embedding step individually maps each minutia descriptor of the set to a high-dimensional space with convolutional layers; second, the pooling step merges the set of mapped vectors into a single fixed-length vector with average- or max-pooling followed by a nonlinear activation; finally, metric learning ensures that the aggregated feature is discriminative and compact. The AggregationNet contains four convolutional layers, one pooling layer and one fully connected layer. The activation function of AggregationNet is PReLU, defined in Eq. (2).

Deep embedding. Unlike natural images and sentences, a minutia’s neighbors have no natural ordering. Thus, we employ 1-D convolution instead of the conventional 2-D convolution to implement the feature embedding. Let $f_{M_i} = [e_1, e_2, \cdots, e_n] \in \mathbb{R}^n$ be the n-dimensional MDC feature vector corresponding to the i-th minutia in the fingerprint, and let $e_{i:i+j}$ refer to the consecutive elements $[e_i, e_{i+1}, \cdots, e_{i+j}]$ of one MDC feature. A basic 1-D convolution operation involves a filter (convolutional kernel) $w \in \mathbb{R}^h$, which is applied to a window of size $1 \times h$ to generate a new feature vector. The element $a_i$ of the new feature vector $a$ is calculated by Eq. (7):

$$a_i = f(w \cdot e_{i:i+h-1} + b) \tag{7}$$

where $b \in \mathbb{R}$ is the bias term and $f$ is the activation function, PReLU. This operation is applied to each possible window of the MDC feature, $[e_{1:h}, e_{2:h+1}, \cdots, e_{n-h+1:n}]$, and produces a new feature vector $a = [a_1, a_2, \cdots, a_{n-h+1}] \in \mathbb{R}^{n-h+1}$, as illustrated in Fig. 8. One convolutional layer contains multiple filters, and every MDC feature vector shares the same weights. After passing through four convolutional layers, the MDC feature $f_{M_i}$ is embedded into a high-dimensional feature vector $f_{E_i}$, defined as

$$f_{E_i} = g(W_c * f_{M_i}) \tag{8}$$

where $W_c$ denotes all parameters of the convolutional layers, ‘∗’ denotes a set of convolution operations and $g(\cdot)$ denotes a set of PReLU operations. This operation is applied to each MDC feature in $\{f_{M_1}, f_{M_2}, \cdots, f_{M_s}\}$ and produces a new feature map, defined as

$$f_{EM} = \{f_{E_1}, f_{E_2}, \cdots, f_{E_s}\} \tag{9}$$

where $s$ denotes the number of MDC features in one fingerprint.
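The single-filter operation of Eq. (7) and the weight sharing across descriptors can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sizes `s`, `n`, `h`, the random filter and the fixed PReLU slope `alpha` are illustrative assumptions, and only one filter of one layer is shown rather than the four-layer network.

```python
import numpy as np

def prelu(x, alpha=0.25):
    """PReLU activation; alpha is fixed here for illustration (it is learned in practice)."""
    return np.where(x > 0, x, alpha * x)

def conv1d_valid(e, w, b):
    """Eq. (7): slide a length-h filter over e and return n - h + 1 activations."""
    n, h = e.shape[0], w.shape[0]
    windows = np.stack([e[i:i + h] for i in range(n - h + 1)])  # shape (n - h + 1, h)
    return prelu(windows @ w + b)

def embed_all(f_m, w, b):
    """Apply the same filter to every MDC feature (shared weights), one row at a time."""
    return np.stack([conv1d_valid(e, w, b) for e in f_m])

rng = np.random.default_rng(0)
s, n, h = 5, 150, 4                  # hypothetical: 5 minutiae, 150-D features, window 4
f_m = rng.standard_normal((s, n))    # stand-in for the MDC features of one fingerprint
w, b = rng.standard_normal(h), 0.1   # one hypothetical filter and its bias
a = embed_all(f_m, w, b)
print(a.shape)                       # (5, 147), i.e. (s, n - h + 1)
```

Because each row is processed independently with the same `w` and `b`, reordering the minutiae only reorders the rows of the output, which is what the subsequent pooling step relies on.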

Pooling. Unlike the words of a sentence, a minutia’s neighbors have no natural ordering. Hence, the pooling operation has to work on an unordered set of vectors. An ideal pooling scheme should be invariant to permutations of its input while remaining trainable. There are two candidate pooling schemes: average-pooling and max-pooling. Average-pooling simply takes the element-wise mean of the vectors $\{f_{E_1}, f_{E_2}, \cdots, f_{E_s}\}$, whereas max-pooling takes their element-wise maximum. Unlike a conventional DCNN, we apply the PReLU activation function to the pooling layer to improve representational capacity. The aggregated feature vectors of average-pooling and max-pooling are defined in Eq. (10) and Eq. (11), respectively. Comparative experiments show that average-pooling achieves better retrieval performance; therefore, we use average-pooling in the AggregationNet.

$$f_P = f\left(\frac{1}{s}\sum_{i=1}^{s} f_{E_i}\right) \tag{10}$$

$$f_P = f(f_{EM}) = f(\max(\{f_{E_i}, \forall i \in \{1, 2, \cdots, s\}\})) \tag{11}$$

where $s$ denotes the number of MDC features in one fingerprint, $f(\cdot)$ denotes the activation function, PReLU, and max denotes the element-wise max operator.
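The permutation invariance required of the pooling step can be checked numerically. The sketch below implements Eq. (10) and Eq. (11) on random stand-in vectors; the embedded length of 300, the set size of 7 and the fixed PReLU slope are assumptions chosen only for illustration.

```python
import numpy as np

def prelu(x, alpha=0.25):
    """PReLU with a fixed illustrative slope."""
    return np.where(x > 0, x, alpha * x)

def average_pool(f_em):
    """Eq. (10): PReLU of the element-wise mean over the s embedded vectors."""
    return prelu(f_em.mean(axis=0))

def max_pool(f_em):
    """Eq. (11): PReLU of the element-wise max over the s embedded vectors."""
    return prelu(f_em.max(axis=0))

rng = np.random.default_rng(1)
f_em = rng.standard_normal((7, 300))   # 7 embedded vectors of hypothetical length 300
shuffled = f_em[rng.permutation(7)]    # the same set in a different order

# Both schemes give the same result for any ordering of the set.
same_avg = np.allclose(average_pool(f_em), average_pool(shuffled))
same_max = np.array_equal(max_pool(f_em), max_pool(shuffled))
# The output length does not depend on how many descriptors are pooled.
fixed_len = average_pool(f_em[:3]).shape == average_pool(f_em).shape
```

Both schemes therefore satisfy the two structural requirements stated above (order independence and fixed output length); which one retrieves better is an empirical question that the paper answers in Table 1.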

Optimization. The aggregation model is designed to be differentiable, and the gradients are obtained by back-propagation [32] to perform optimization. Metric learning is used to ensure that the similarity of images from the same identity is larger than that of images from different identities. The AggregationNet is trained under the joint supervision of softmax loss and center loss for discriminative feature learning. The whole loss function is defined in Eq. (3). Each finger serves as a class. Hard sample mining [31] is also applied to the center loss. The optimization of AggregationNet still faces the challenge of scarce training data. To overcome this problem, we use two forms of data augmentation: minutia omitting and spurious minutia addition. We randomly omit a few MDC features from the template, or add MDC features to it, to augment the training data. In addition, we collect and share the PUF fingerprint database. We also apply dropout [44] to the pooling layer to alleviate overfitting, which forces the model to learn more robust feature detectors.

The learned aggregation model consists of four convolutional layers and one pooling layer. We view the output of the pooling layer as $f_P$. To improve the computational efficiency of the similarity measure, $f_P$ is normalized to generate the Aggregated MDC (AMDC) feature $f_A$, which is defined in Eq. (12).

$$f_A = \frac{f_P}{\lVert f_P \rVert_2} \tag{12}$$
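The two augmentations described above (minutia omitting and spurious minutia addition) can be sketched as operations on the descriptor set of one template. This is only one plausible reading: the paper does not state how many descriptors are dropped or added, nor where the added descriptors come from, so `max_drop`, `max_add` and the `spurious_pool` array (descriptors assumed to be extracted at non-minutia locations) are hypothetical.

```python
import numpy as np

def augment_template(mdc, spurious_pool, rng, max_drop=3, max_add=3):
    """Randomly omit a few rows of mdc and append a few rows taken from spurious_pool."""
    s = mdc.shape[0]
    # Minutia omitting: drop up to max_drop descriptors but always keep at least one.
    n_drop = int(rng.integers(0, min(max_drop, s - 1) + 1))
    keep = np.sort(rng.choice(s, size=s - n_drop, replace=False))
    # Spurious minutia addition: append up to max_add hypothetical spurious descriptors.
    pool_size = spurious_pool.shape[0]
    n_add = int(rng.integers(0, min(max_add, pool_size) + 1))
    extra = spurious_pool[rng.choice(pool_size, size=n_add, replace=False)]
    return np.concatenate([mdc[keep], extra], axis=0)

rng = np.random.default_rng(3)
mdc = rng.standard_normal((40, 150))            # stand-in MDC features of one template
spurious_pool = rng.standard_normal((10, 150))  # stand-in spurious descriptors
aug = augment_template(mdc, spurious_pool, rng)
```

Each call yields a template with a slightly different number of descriptors, which is consistent with the stated goal of making the aggregated feature tolerant to minutia detection errors.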

4. Fingerprint indexing

In the proposed fingerprint indexing system, each fingerprint is represented as a fixed-length AMDC feature, so we need to measure the similarity between two AMDC features. Deep convolutional features are usually compared by cosine similarity. In this paper, the feature $f_A$ is normalized; hence the scalar product of the feature vectors can be used directly as the similarity. The similarity between two AMDC features is defined in Eq. (13).

$$\mathrm{Sim}(f_{A_1}, f_{A_2}) = \langle f_{A_1}, f_{A_2} \rangle \tag{13}$$

where $f_{A_1}$ and $f_{A_2}$ denote the aggregated feature vectors of two different fingerprints and $\langle \cdot, \cdot \rangle$ denotes the scalar product.

The fingerprint indexing system illustrated in Fig. 3b contains two stages: an offline stage and an online stage. Training the neural networks and extracting the features of the gallery images are carried out during the offline stage. During the online stage, we extract the AMDC feature of the query fingerprint image and compute the similarity between the query fingerprint and the gallery fingerprints by Eq. (13). Finally, the gallery fingerprints are sorted by similarity and the top M ranked fingerprints are selected as the candidate list.
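The online stage reduces to one matrix-vector product followed by a sort. The sketch below uses random unit vectors in place of real AMDC features; the gallery size of 2000, the dimension of 300, the noise level and the helper names are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length, as in Eq. (12)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def top_m_candidates(query, gallery, m):
    """Rank gallery rows by scalar product with the query (Eq. (13)) and keep the top m."""
    sims = gallery @ query
    order = np.argsort(-sims, kind="stable")[:m]
    return order, sims[order]

rng = np.random.default_rng(2)
gallery = l2_normalize(rng.standard_normal((2000, 300)))  # offline: gallery features
# Online: a slightly perturbed copy of gallery entry 42 plays the role of the query.
query = l2_normalize(gallery[42] + 0.01 * rng.standard_normal(300))
idx, scores = top_m_candidates(query, gallery, m=10)
```

Since all vectors have unit length, the scalar product equals the cosine similarity, so no per-pair normalization is needed at query time.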

5. Performance evaluation

In this section, we first evaluate the performance of the MDC feature and the proposed aggregation approach. Then, the indexing accuracy and efficiency of the proposed method are evaluated on five databases: FVC2000 DB2a, FVC2000 DB3a, NIST4, NIST4 natural and NIST14.

5.1. Experimental details

In the experiments, FVC2002, FVC2004, FVC2006, the PUF database and other rolled fingerprint images collected in our lab are used to train the DescriptorNet and AggregationNet. The minutia patches are selected carefully for training. To make the minutia classes reliable, a selected minutia must meet the following two conditions: first, the minutia can be extracted in all impressions of the same fingerprint; second, there is no overlapping region between two different minutia classes. In the end, we generate 9272 minutia classes to train the DescriptorNet, and each class contains 9 samples on average. In total, 6855 finger classes are used to train the AggregationNet, and each finger class contains 9 impressions on average. The model is implemented in Python with the PyTorch package. The size of the minutia-centred local region is 128 × 128 and the threshold $min_r$ of the area ratio is 0.75. The dropout rate p is 0.75. The coefficient λ of the center loss is 0.001 and the learning rate is 0.03. The top-10 hard samples are mined to compute the center loss. The dimension of the final AMDC feature is 300.

To evaluate the indexing performance of our approach, we carried out experiments on five benchmark databases:

• FVC2000 DB2a: The second set of the FVC2000 database, which contains 800 fingerprint images taken from 100 fingers (eight impressions per finger).

• FVC2000 DB3a: The third set of the FVC2000 database, which contains 800 fingerprint images taken from 100 fingers (eight impressions per finger). It is a typical fingerprint database in which images are of bad quality.

• NIST4: NIST Special Database 4 contains 2000 rolled fingerprint image pairs. The fingerprints are evenly distributed over five classes (Right Loop, Left Loop, Whorl, Arch and Tented Arch).

• Modified NIST4 (natural): We extract 2408 fingerprint images taken from 1204 fingers (two impressions per finger) from the NIST4 database, following the natural proportion of the Henry class distribution [1].

• NIST14: The last 2700 fingerprint pairs of NIST Special Database 14. The distribution over the five classes in this database resembles the fingerprint distribution in nature.

There are two standard indicators for evaluating fingerprint indexing methods in the published literature. The first indicator is the trade-off between error rate and penetration rate, which usually depends on the maximum length of the candidate list. The error rate denotes the fraction of query fingerprints whose mate is not present in the candidate list. The penetration rate denotes the portion of fingerprints in the gallery database that have to be searched. The second indicator is the average penetration rate during incremental search, which uses an ideal matching algorithm to stop the search as soon as the true mate is retrieved. In the experiments, the first samples are used as the gallery templates to be indexed while the other samples serve as query fingerprints, and cross validation is performed for testing.

Figure 9: Comparison experimental results of minutia descriptors on NIST4 natural database. (Plot of error rate (%) against penetration rate (%); curves: MCC + pairwise, MDC + pairwise, MDC + AggregationNet.)

Figure 10: Fingerprint indexing results of AggregationNets with different number of convolutional layers on NIST4 natural database. (Plot of error rate (%) against penetration rate (%); curves: one, two, three and four convolutional layers.)
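The two indicators defined in Section 5.1 can be computed from a query-by-gallery similarity matrix. The sketch below is a simplified reading of those definitions, not the evaluation code used in the paper: ties are resolved pessimistically, the incremental search is assumed to have no cap on penetration, and the tiny similarity matrix is made up for illustration.

```python
import numpy as np

def mate_ranks(sims, mate_index):
    """1-based rank of each query's true mate; sims has shape (queries, gallery)."""
    mate_scores = sims[np.arange(sims.shape[0]), mate_index]
    # Pessimistic ties: every gallery entry scoring >= the mate counts as ranked before it.
    return (sims >= mate_scores[:, None]).sum(axis=1)

def error_rate_at(ranks, gallery_size, penetration):
    """Fraction of queries whose mate falls outside the top `penetration` share of the gallery."""
    list_len = int(np.floor(penetration * gallery_size))
    return float(np.mean(ranks > list_len))

def average_penetration(ranks, gallery_size):
    """Incremental search: the search stops at the mate, so rank / gallery_size is searched."""
    return float(np.mean(ranks / gallery_size))

# Made-up similarities for 3 queries against a gallery of 4 templates.
sims = np.array([[0.9, 0.1, 0.3, 0.2],
                 [0.2, 0.4, 0.8, 0.1],
                 [0.5, 0.6, 0.1, 0.7]])
mates = np.array([0, 1, 3])        # gallery index of each query's true mate
ranks = mate_ranks(sims, mates)    # mates are ranked 1st, 2nd and 1st
```

With these toy values, one of the three queries misses its mate when only 25% of the gallery is searched, and none misses it at 50%, which is the kind of trade-off the error-rate curves in this section report.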

5.2. Evaluating the design of indexing system

MDC vs. MCC. The typical indexing method based on minutiae local features measures the similarity between fingerprints by comparing every pair of local features. The similarity between fingerprints is defined in Eq. (14). Based on Eq. (14), experiments are carried out to compare the indexing performance of the MDC and MCC descriptors without aggregation. The experimental results illustrated in Fig. 9 show that the error rate of the MDC feature is much lower than that of MCC. This is mainly because the MDC feature represents both minutiae and ridge attributes, which is verified in Section 6.

$$\mathrm{Sim}(V_1, V_2) = \frac{\sum_{f_i \in V_1} \max_{f_j \in V_2} \{\mathrm{sim}(f_i, f_j)\}}{|V_1|} \tag{14}$$

where $|V_1|$ denotes the cardinality of set $V_1$, $V_1$ and $V_2$ denote the minutiae local feature sets of two different fingerprints, and $f_i$ and $f_j$ denote minutia local features.

Architecture of AggregationNet evaluation. Depth is crucial for DCNNs in computer vision tasks. To verify whether depth is also crucial for AggregationNet, we carried out experiments on the NIST4 natural database with AggregationNets containing different numbers of convolutional layers. The dimension of the aggregated feature vector is fixed at 300 in all of these AggregationNets. The experimental results illustrated in Fig. 10 reveal that a deep AggregationNet achieves better indexing performance than a shallow one. This is mainly because a deep neural network has more capacity to express the variation among impressions of the same finger. In addition, we carried out experiments to evaluate the pooling schemes. The experimental results summarized in Table 1 show that average-pooling achieves a lower error rate than max-pooling. The penetration rate at which the error rate reaches zero is also lower for average-pooling than for max-pooling (31% vs. 36%). Hence, this paper adopts average-pooling.

Table 1: Error rate versus penetration rate of different pooling schemes on NIST4 natural database.

| Pooling scheme | Penetration rate 1% | 2% | 5% | 10% | 31% | 36% |
|---|---|---|---|---|---|---|
| Max | 0.88% | 0.63% | 0.42% | 0.31% | 0.16% | 0.0% |
| Average | 0.51% | 0.46% | 0.38% | 0.25% | 0.0% | 0.0% |

Aggregation methods evaluation. Comparison experiments are carried out to compare the performance of the proposed method with four typical aggregation approaches. The experimental results are summarized in Table 2. The dimension of the aggregated feature vector in the four typical methods is 16000, whereas the feature dimension of our approach is 300. When the error rate is 0%, the penetration rates of the five aggregation methods (BoW, VLAD, FV, T-Embedding and AggregationNet) are 87%, 64%, 59%, 43% and 31%, respectively. The results show that our method, with a much lower feature dimension, achieves much better performance than the other prominent methods. The AggregationNet, with its lower feature dimension, also achieves much better indexing accuracy than the preliminary version. Besides, the results in Fig. 9 show that the aggregation method even achieves a lower error rate than the pairwise comparison defined in Eq. (14). These results reveal that the proposed method is more suitable for the MDC feature than other typical approaches.

Table 2: Error rate versus penetration rate of different aggregation methods on NIST4 natural database. (Agg.Net means AggregationNet.)

| Method | Penetration rate 1% | 2% | 5% | 10% | 31% | 43% |
|---|---|---|---|---|---|---|
| BoW | 26.38% | 19.37% | 9.90% | 5.62% | 4.75% | 4.12% |
| VLAD | 8.60% | 6.21% | 3.95% | 2.60% | 0.71% | 0.36% |
| FV | 5.23% | 3.37% | 2.15% | 1.05% | 0.45% | 0.21% |
| T-Embedding | 2.95% | 2.10% | 1.40% | 0.82% | 0.27% | 0.0% |
| Agg.Net | 0.51% | 0.46% | 0.38% | 0.25% | 0.0% | 0.0% |

The influence of minutia detection on AggregationNet. The proposed method consists of two parts: DescriptorNet and AggregationNet. First, unlike MCC, the DescriptorNet extracts the MDC feature directly from the enhanced image and does not use the neighboring minutiae. Hence, neither spurious nor missing minutiae affect an individual MDC feature. Second, the average pooling of AggregationNet enables it to aggregate a variable number of MDC features into a single vector. During the training phase, two forms of data augmentation, minutia omitting and spurious minutia addition, are used to suppress the influence of minutia detection on the AMDC feature. Hence, the AMDC feature is invariant to spurious and missing minutiae to some degree.

Figure 11: The influence of minutia detection on AggregationNet. (a) Fingerprint pair (query fingerprint and gallery fingerprint). (b) Similarity with different number of spurious or missing minutiae (similarity against missing/spurious minutiae number; curves: missing minutiae, spurious minutiae).

An experiment was carried out to analyze the influence of minutia detection on the AMDC feature. We added different numbers of spurious minutiae to the minutiae set, or removed different numbers of true minutiae from it. Subsequently, we calculated the similarity between the AMDC feature extracted with the modified minutiae set and its true mate AMDC feature extracted with the ideal minutiae set. Figure 11 illustrates the experimental results, in which spurious and missing minutiae are imposed on the query image. The location and direction of each spurious minutia are generated randomly, subject to the constraint that it lies in the foreground. The query fingerprint contains 103 true minutiae. Figure 11 shows that the similarity remains almost constant when the number of spurious minutiae is less than 15. Beyond that, the similarity declines quite slowly as the number of spurious minutiae increases. This reveals that a small number of spurious minutiae has no influence on the AMDC feature and a large number has only a slight influence. A small number of missing minutiae likewise has no influence on the AMDC feature. Unfortunately, the similarity drops rapidly when few true minutiae (fewer than 20) are left.

Figure 12: Indexing results on FVC2000 DB2a database. (MCC: Minutia Cylinder-Code, PDC: Pyramid Deep Convolutional, DT: Delaunay Triangulation.) Plot of error rate (%) against penetration rate (%); compared methods: Proposed method, PDC [4], MCC [6], Multiple features [9], Directional field [3], FingerCode [11], Minutiae triplets [15], Expanded DT [16].
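For reference, the pairwise baseline of Eq. (14), which Section 5.2 compares against the aggregated feature, can be sketched as follows. The scalar product is used as the descriptor-level similarity sim(f_i, f_j), in line with how MDC features are compared elsewhere in the paper; the random unit vectors and set sizes are stand-ins for real descriptor sets.

```python
import numpy as np

def pairwise_similarity(v1, v2):
    """Eq. (14): average, over descriptors in v1, of the best match found in v2.

    Rows are assumed to be L2-normalized, so the scalar product acts as sim(f_i, f_j).
    """
    sim = v1 @ v2.T                      # (|V1|, |V2|) descriptor-to-descriptor scores
    return float(sim.max(axis=1).mean())

def unit_rows(x):
    """Normalize every row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(4)
v1 = unit_rows(rng.standard_normal((30, 150)))   # stand-in descriptors of fingerprint 1
v2 = unit_rows(rng.standard_normal((45, 150)))   # stand-in descriptors of fingerprint 2
score_self = pairwise_similarity(v1, v1)         # every descriptor matches itself
score_other = pairwise_similarity(v1, v2)
```

Each comparison costs |V1| × |V2| descriptor products, whereas two aggregated features are compared with a single scalar product; note also that Eq. (14) is not symmetric in its two arguments.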

5.3. Indexing performance

Accuracy. Figs. 12, 13 and 14 illustrate the retrieval performance of the proposed approach and other prominent fingerprint indexing methods on the FVC2000 DB2a, FVC2000 DB3a, NIST4, NIST4 natural and NIST14 databases. The proposed method achieves a lower error rate than the other prominent approaches on all five benchmark databases. Among them, the performance of our method on FVC2000 DB3a is slightly weaker than on the other four databases, mainly because the fingerprint images in FVC2000 DB3a are of bad quality. Table 3 reports the average penetration rate for the incremental search scenario on all the benchmark databases. Note that the average penetration rate of the DC (Deep Convolutional) method reported in Table 3 is estimated from the plot in [5]. Table 3 shows that our approach achieves state-of-the-art performance on the five benchmark databases.

Note that the DC method in [5] used a large-scale non-public database (MSP longitudinal database [46]) to train the DCNN. In contrast, we use only about one-seventh of the feature dimension (300 vs. 2048) and one-seventh of the training data (62K vs. 440K), and still achieve better indexing performance (e.g. 0.41% vs. 1.07% error rate on the NIST14 database when the penetration rate is 1%).

Figure 13: Indexing results on FVC2000 DB3a database. (MCC: Minutia Cylinder-Code, PDC: Pyramid Deep Convolutional, DT: Delaunay Triangulation.) Plot of error rate (%) against penetration rate (%); compared methods: Proposed method, PDC [4], MCC [6], Directional field [3], FingerCode [11], Minutiae triplets [15], Expanded DT [16].

Efficiency. To evaluate the computational efficiency of feature extraction, the proposed approach is implemented in Python with PyTorch on a TITAN X. Enhancement and minutiae extraction with FingerNet take 183.26 ms. Extracting all MDC features from a fingerprint image takes 40.14 ms on average. Aggregating all MDC features into a fixed-length vector costs 3.84 ms. In total, extracting an AMDC feature from an original fingerprint image takes 0.23 s. This speed is feasible for fingerprint indexing when compared with hand-designed feature extraction methods.

Experiments are carried out to compare the retrieval efficiency of our approach and the MCC-based method [6]. Both approaches are implemented in MATLAB and executed on an Intel Core i5 CPU at 2.4 GHz. Searching one fingerprint against a gallery database of 2000 images takes 0.31 milliseconds with the proposed method but 56.5 milliseconds with the MCC-based method. Our method is therefore about two orders of magnitude faster than the MCC-based method.
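The timing and size figures quoted in this subsection are mutually consistent, as a quick arithmetic check using only the numbers reported above shows:

$$183.26\,\mathrm{ms} + 40.14\,\mathrm{ms} + 3.84\,\mathrm{ms} = 227.24\,\mathrm{ms} \approx 0.23\,\mathrm{s}$$

$$\frac{56.5\,\mathrm{ms}}{0.31\,\mathrm{ms}} \approx 182 \approx 10^{2.26}, \qquad \frac{300}{2048} \approx 0.146 \approx \frac{1}{6.8}, \qquad \frac{62\mathrm{K}}{440\mathrm{K}} \approx 0.141 \approx \frac{1}{7.1}$$

The speed-up factor of roughly 182 is what the text rounds to "about two orders of magnitude", and both ratios are close to the quoted one-seventh.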

Figure 14: Indexing performance results on NIST databases compared with other published results (DC: Deep Convolutional, PDC: Pyramid Deep Convolutional, MCC: Minutia Cylinder-Code, DT: Delaunay Triangulation). (a) Indexing results on NIST4 database. (b) Indexing results on NIST4 natural database. (c) Indexing results on NIST14 database. Each panel plots error rate (%) against penetration rate (%).

Table 3: Average penetration rate for incremental search scenario. (DC: Deep Convolutional, PDC: Pyramid Deep Convolutional, DF: Directional Field, Nat.: Natural)

| Method | NIST DB4 | NIST DB4 Nat. | NIST DB14 | FVC2000 DB2a | FVC2000 DB3a |
|---|---|---|---|---|---|
| Our approach | 0.26% | 0.28% | 0.06% | 1.02% | 1.21% |
| DC [5] | - | 0.48% | 0.22% | - | - |
| PDC [4] | 0.67% | 0.79% | - | 1.29% | 2.02% |
| MCC [6] | 1.32% | 1.59% | 2.19% | 1.72% | 3.63% |
| DF [3] | 2.93% | - | - | - | - |
| DF [2] | - | - | 6.90% | - | - |
| DF [9] | - | - | - | 2.58% | - |
| Triplet [9] | - | - | - | 7.27% | - |
| FingerCode [9] | - | - | - | 2.40% | - |
| Combined [9] | - | - | - | 1.34% | - |

6. What is learned in DescriptorNet?

We do not explicitly constrain the DescriptorNet to learn the ridge structures. A natural question is which attributes are represented in the learned minutia descriptor. However, understanding learned features is still a challenging problem. Focusing on this issue, we propose a steerable fingerprint generation method to verify that the learned descriptor represents the attributes of minutiae and ridges.

Qualitative analysis. To explore the representation of the MDC feature, we employ Deconvnet [47] to visualize it. In the experiment, we map the whole MDC feature and single components of the MDC feature back to the input pixel space. The experimental result is illustrated in Fig. 15b and shows that the MDC feature represents most ridges of the input image. In addition, the border of the local region is ignored in the MDC feature, which helps to alleviate border errors. We also visualize the four largest components individually, as shown in Fig. 15. The visualization reveals that the second and fourth components describe the minutiae information, while the others represent the ridge information.

Figure 15: Visualizing the MDC feature. (a) Input image. (b) Whole feature. (c-f) Single components, 1st to 4th. (comp. denotes component.)

Figure 16: Explore different angles of minutia. (a) Generated images (original, Δθ = 10°, Δθ = 30°, Δθ = 45°). (b) Similarity of the MDC feature with different angles. (Δθ denotes the angle increment of the minutia compared with the original image.)

Quantitative analysis. We use the control variable method to verify whether the MDC feature describes the angle and spatial location of minutiae and the curvature of ridges. Inspired by [48], the fingerprint image is generated from minutiae and an orientation field. The ideal fingerprint can be represented as a 2D FM [48] signal: $I(x, y) = \cos(\Psi(x, y))$. The phase can be uniquely decomposed into two parts, the continuous phase and the spiral phase, as defined in Eq. (15). To change the curvature of ridges, the continuous phase is generated by Eq. (16) and the spiral phase is generated by Eq. (17).

Figure 17: Explore location of minutiae and curvature of ridges. (a) Original. (b) Minutia at (20, 25). (c) Minutia at (73, 82). (d) $a_2 = 1$. (e) $a_2 = 1.5$. (f) Spatial location changed (similarity over x and y). (g) Curvature changed (similarity of the MDC feature against $a_2$). ((x, y) denotes the spatial location of the minutia.)

The value of $a_2$ reflects the curvature of ridges. We change the angle of the minutia, the spatial location of the minutia, or the curvature of the ridges in turn, and then compare the similarity between the changed image and the original image. The similarity is defined as $\mathrm{Sim}(f_{M_1}, f_{M_2}) = \langle f_{M_1}, f_{M_2} \rangle$. Fig. 16 and Fig. 17 show that the MDC feature represents the angle and spatial location of minutiae and the curvature of ridges. They also show that the similarity stays close to one when the angle, location or curvature changes slightly, but decreases rapidly when the change is large. This property helps to distinguish different classes while keeping invariance to intra-class distortion.

$$\Psi(x, y) = \Psi_C(x, y) + \Psi_S(x, y) \tag{15}$$

$$\Psi_C(x, y) = \sqrt{x^2 + \frac{y^2}{a_2}} \tag{16}$$

$$\Psi_S(x, y) = \sum_{i=1}^{s} p_i \arctan\left(\frac{y - y_i}{x - x_i}\right) \tag{17}$$

where $x_i$ and $y_i$ denote the coordinates of the i-th spiral and $p_i \in \{1, -1\}$ denotes its polarity. $s$ denotes the number of spirals (or minutiae).

Summarizing all the analysis experiments, two remarks can be made. First, compared with MCC, the MDC feature carries more discriminative information, covering attributes of both minutiae and ridges. Second, the MDC feature is invariant to distortion to some degree.
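The controlled generation used in the quantitative analysis, $I(x, y) = \cos(\Psi_C + \Psi_S)$ with Eqs. (15) and (17), can be sketched in NumPy. Several choices here are assumptions rather than the authors' settings: the continuous phase is taken to be an elliptical radial form with a hypothetical ridge-frequency scale `freq` and ratio `a2`, the grid is centred at the origin, and `np.arctan2` replaces the plain arctangent of Eq. (17) to avoid division by zero.

```python
import numpy as np

def synth_fingerprint(size, minutiae, a2=1.0, freq=1.0):
    """Return I = cos(Psi_C + Psi_S) on a size x size grid centred at the origin.

    minutiae: iterable of (x_i, y_i, p_i) with polarity p_i in {+1, -1}.
    a2 and freq parameterize an ASSUMED elliptical continuous phase.
    """
    coords = np.arange(size) - size / 2.0
    x, y = np.meshgrid(coords, coords)
    psi_c = freq * np.sqrt(x**2 + y**2 / a2)        # assumed form of the continuous phase
    psi_s = np.zeros_like(x)
    for xi, yi, pi in minutiae:
        psi_s += pi * np.arctan2(y - yi, x - xi)    # spiral phase around each minutia
    return np.cos(psi_c + psi_s)                    # Eq. (15) inside the cosine

plain = synth_fingerprint(128, [], a2=1.0)                     # concentric ridges only
curved = synth_fingerprint(128, [], a2=1.5)                    # ridge curvature changed
with_minutia = synth_fingerprint(128, [(10.0, -5.0, 1)], a2=1.0)
```

Varying one argument at a time (the minutia position, its polarity, or `a2`) while holding the others fixed mirrors the control variable procedure described above; the resulting patches would then be passed through the descriptor network and compared by scalar product.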

7. Conclusion

In this paper, a novel fingerprint indexing method based on representation learning is proposed to improve retrieval accuracy and efficiency. The algorithm is composed of two major components, a minutia descriptor and an aggregation model, both of which are learned by DCNNs. In contrast to the traditional approach, the learned minutia descriptor effectively represents the ridge structure of the fingerprint, which is an abundant information source for fingerprint indexing. Moreover, the MDC features from one fingerprint are aggregated into a fixed-length, discriminative and compact vector by a novel AggregationNet, which takes full advantage of data-driven learning. The proposed AMDC feature has four advantages over conventional features: (1) it carries more discriminative information; (2) it is rotation invariant; (3) it is insensitive to nonlinear distortion; (4) it is compact. That the MDC feature represents the attributes of minutiae and ridges is verified by a steerable fingerprint generation method. Finally, experimental results show that the proposed AMDC method dramatically improves indexing accuracy and efficiency on five benchmark databases.

We treat the MDC features of one fingerprint as an unordered set of vectors. This ignores the location information of the MDC features and the global pattern of the fingerprint. There are two strategies to address these issues in our future work. First, the 2-D topological structure of the MDC features, constructed from their locations, can be taken into account in the AggregationNet to improve retrieval accuracy. Second, the global feature of the fingerprint can be integrated to further improve the indexing performance.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (NSFC) under grant 61333015. Thanks to Founder International Co., Ltd, which sponsored us to collect fingerprint data at Peking University.

References

[1] D. Maltoni, D. Maio, A. K. Jain, S. Prabhakar, Handbook of fingerprint recognition, Springer Science & Business Media, 2009.

[2] A. Lumini, D. Maio, D. Maltoni, Continuous versus exclusive classification for fingerprint retrieval, Pattern Recognition Letters 18 (10) (1997) 1027–1034.

[3] X. Jiang, M. Liu, A. C. Kot, Fingerprint retrieval for identification, IEEE Transactions on Information Forensics and Security 1 (4) (2006) 532–542.

[4] D. Song, J. Feng, Fingerprint indexing based on pyramid deep convolutional feature, in: IEEE International Joint Conference on Biometrics, 2017, pp. 200–207.

[5] K. Cao, A. K. Jain, Fingerprint indexing and matching: An integrated approach, in: IEEE International Joint Conference on Biometrics, 2017, pp. 437–445.

[6] R. Cappelli, M. Ferrara, D. Maltoni, Fingerprint indexing based on minutia cylinder-code, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5) (2011) 1051–1057.

[7] D. Song, Y. Tang, J. Feng, Fingerprint indexing based on minutia-centred deep convolutional features, in: 4th IAPR Asian Conference on Pattern Recognition, 2017, pp. 770–775.

[8] L. Zheng, Y. Yang, Q. Tian, Sift meets cnn: A decade survey of instance retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence pp (99) (2017) 1–20.

605

CR IP T

[9] J. D. Boer, A. M. Bazen, S. H. Gerez, Indexing fingerprint databases based

on multiple features, Proc.annu.workshop Circuits Systems Signal Processing.

[10] M. Liu, P. T. Yap, Invariant representation of orientation fields for fingerprint indexing, Pattern Recognition 45 (7) (2012) 2532–2542.

610

AN US

[11] K. C. Leung, C. H. Leung, Improvement of fingerprint retrieval by a statistical classifier, IEEE Transactions on Information Forensics and Security 6 (1) (2011) 59–69.

[12] T. H. Le, H. T. Van, Fingerprint reference point detection for image retrieval based on symmetry and variation, Pattern Recognition 45 (9) (2012) 3360–3372.

[13] B. Bhanu, X. Tan, Fingerprint indexing based on novel features of minutiae triplets, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (5) (2003) 616–622.

[14] W. Zhou, J. Hu, S. Wang, I. Petersen, M. Bennamoun, Fingerprint indexing based on combination of novel minutiae triplet features, Applied Mechanics and Materials 644-650 (2014) 377–388.

[15] A. Gago-Alonso, J. Hernández-Palancar, E. Rodríguez-Reina, A. Muñoz-Briseño, Indexing and retrieving in fingerprint databases under structural distortions, Expert Systems with Applications 40 (8) (2013) 2858–2871.

[16] J. Khodadoust, A. M. Khodadoust, Fingerprint indexing based on expanded delaunay triangulation, Expert Systems with Applications 81 (2017) 251–267.

[17] O. Iloanusi, A. Gyaourova, A. Ross, Indexing fingerprints using minutiae quadruplets, in: 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2011, pp. 127–133.

[18] O. N. Iloanusi, Fusion of finger types for fingerprint indexing using minutiae quadruplets, Pattern Recognition Letters 38 (2014) 8–14.

[19] Y. Su, J. Feng, J. Zhou, Fingerprint indexing with pose constraint, Pattern Recognition 54 (2016) 1–13.

[20] X. Shuai, C. Zhang, P. Hao, Fingerprint indexing based on composite set of reduced sift features, in: 19th International Conference on Pattern Recognition, 2008, pp. 1–4.

[21] J. Feng, A. Cai, Fingerprint indexing using ridge invariants, in: 18th International Conference on Pattern Recognition, Vol. 4, 2006, pp. 433–436.

[22] A. Gyaourova, A. Ross, A novel coding scheme for indexing fingerprint patterns, in: Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, 2008, pp. 755–764.

[23] E. Zhu, X. Guo, J. Yin, Walking to singular points of fingerprints, Pattern Recognition 56 (2016) 116–128.

[24] J. Khodadoust, A. M. Khodadoust, Fingerprint indexing based on minutiae pairs and convex core point, Pattern Recognition 67 (2017) 110–126.

[25] F. Benhammadi, M. N. Amirouche, H. Hentous, K. Bey Beghdad, M. Aissani, Fingerprint matching from minutiae texture maps, Pattern Recognition 40 (1) (2007) 189–197.

[26] R. Zhou, D. Zhong, J. Han, Fingerprint identification using sift-based minutia descriptors and improved all descriptor-pair matching, Sensors 13 (3) (2013) 3142–56.

[27] K. Cao, A. K. Jain, Automated latent fingerprint recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2018) 1–14.

[28] P. Mansukhani, S. Tulyakov, V. Govindaraju, A framework for efficient fingerprint identification using a minutiae tree, IEEE Systems Journal 4 (2) (2010) 126–137.

[29] J. Sivic, A. Zisserman, Video google: A text retrieval approach to object matching in videos, in: IEEE International Conference on Computer Vision, 2003, p. 1470.

[30] Y. Wen, K. Zhang, Z. Li, Y. Qiao, A discriminative feature learning approach for deep face recognition, in: European Conference on Computer Vision, 2016, pp. 499–515.

[31] F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 815–823.

[32] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.

[33] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1026–1034.

[34] Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, in: International Conference on Neural Information Processing Systems, 2014, pp. 1988–1996.

[35] Y. Tang, F. Gao, J. Feng, Y. Liu, Fingernet: An unified deep network for fingerprint minutiae extraction, in: IEEE International Joint Conference on Biometrics, 2017, pp. 108–116.

[36] J. Ouyang, J. Feng, J. Lu, J. Zhou, Fingerprint pose estimation based on faster r-cnn, in: IEEE International Joint Conference on Biometrics, 2017, pp. 268–276.

[37] F. Perronnin, C. Dance, Fisher kernels on visual vocabularies for image categorization, in: IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[38] F. Perronnin, J. Sánchez, T. Mensink, Improving the fisher kernel for large-scale image classification, in: European conference on computer vision, 2010, pp. 143–156.

[39] H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Perez, C. Schmid, Aggregating local image descriptors into compact codes, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (9) (2012) 1704–1716.

[40] H. Jégou, A. Zisserman, Triangulation embedding and democratic aggregation for image search, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3310–3317.

[41] A. Babenko, V. Lempitsky, Aggregating local deep features for image retrieval, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1269–1277.

[42] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, J. Sivic, Netvlad: Cnn architecture for weakly supervised place recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5297–5307.

[43] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, Computer Science.

[44] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, Computer Science 3 (4) (2012) pgs. 212–223.

[45] X. Zhang, J. Zhao, Y. LeCun, Character-level convolutional networks for text classification, in: International Conference on Neural Information Processing Systems, 2015, pp. 649–657.

[46] S. Yoon, A. K. Jain, Longitudinal study of fingerprint recognition, Proceedings of the National Academy of Sciences of the United States of America 112 (28) (2015) 8555.

[47] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European Conference on Computer Vision, 2014, pp. 818–833.

[48] J. Feng, A. K. Jain, Fingerprint reconstruction: from minutiae to phase, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2) (2011) 209–223.

Dehua Song received the B.S. degree from Beijing University of Posts and Telecommunications, China, in 2013. He is currently working toward the Ph.D. degree in the Department of Machine Intelligence at Peking University. His current research interests include biometrics, computer vision, pattern recognition and machine learning.

Yao Tang received the B.S. degree from Xidian University, China, in 2014. He is currently working toward the Ph.D. degree in the Department of Machine Intelligence at Peking University. His current research interests include biometrics, computer vision and machine learning.

Jufu Feng is a professor of the School of Electronics Engineering and Computer Science at Peking University. He received the B.S. and Ph.D. degrees in mathematics from Peking University, in 1989 and 1997, respectively. His current research interests include biometrics, computer vision and pattern recognition.