Multi-feature fusion for thermal face recognition


Reference: INFPHY 2042
To appear in: Infrared Physics & Technology
Received Date: 14 October 2015
Revised Date: 10 May 2016
Accepted Date: 10 May 2016

Please cite this article as: Y. Bi, M. Lv, Y. Wei, N. Guan, W. Yi, Multi-feature fusion for thermal face recognition, Infrared Physics & Technology (2016), doi: http://dx.doi.org/10.1016/j.infrared.2016.05.011


Multi-feature fusion for thermal face recognition
Yin Bi, Mingsong Lv, Yangjie Wei, Nan Guan, Wang Yi
College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110809, China

Abstract

Human face recognition has been researched for the last three decades. Face recognition with thermal images now attracts significant attention, since thermal imaging can be used in poorly illuminated or completely dark environments. However, thermal face recognition performance is still insufficient for practical applications. One main reason is that most existing work leverages only a single feature to characterize a face in a thermal image. To address this problem, we propose multi-feature fusion, a technique that combines multiple features for thermal face characterization and recognition. In this work, we design a systematic way to combine four features: the local binary pattern, the Gabor jet descriptor, the Weber local descriptor, and the down-sampling feature. Experimental results show that our approach outperforms methods that leverage only a single feature and is robust to noise, occlusion, expression, low resolution, and the choice of l1-minimization method.

Keywords: Feature fusion, Sparse representation, Thermal face recognition

1. Introduction

Face recognition has a wide range of applications in video surveillance, information security, identity authentication, etc. For example, the Smile-to-Pay system conducts payments, account transfers, and transactions via the face instead of a bank card and password; the online photo sharing platform in Google Plus automatically recognizes individuals in photographs; and face recognition systems are already used for immigration control in Japanese airports. However, visible-light face recognition is still a challenge, mainly because it suffers

Preprint submitted to Infrared Physics & Technology, May 10, 2016

from complex environmental variations such as dim lighting, non-uniform illumination, and varying viewing directions [1]. Moreover, attackers may reconstruct face patterns to spoof visible-light face recognition systems [2]. Therefore, thermal imaging has been considered as a solution in recent years, since it clearly captures the heat radiation from facial skin even in completely dark environments [3]. Since facial skin temperature is closely related to the underlying blood vessels, which are unique to each individual, the possibility of forging a face pattern is largely reduced [4]. However, multiple factors, such as low resolution, a high level of noise in the images, sensitivity to temperature variation, and the opacity of glass, hinder thermal face recognition performance [5].

To improve the performance of thermal face recognition, several types of methods have been proposed, including (1) appearance-based methods, (2) local matching methods, and (3) global matching methods. However, these methods are still insufficient for practical use since they are sensitive to occlusion, noise, and photographing direction. To solve these problems, we propose a multi-feature fusion technique to improve the performance of thermal face recognition. Four features, namely the local binary pattern, the Gabor jet descriptor, the Weber local descriptor, and the down-sampling feature, are considered for face recognition. In the training stage, we design a systematic way to assign proper weights to all features and combine them for face image characterization. In the testing stage, all four features of a test face are extracted. With the weights computed in the training stage, a final residual is computed, based on which the final recognition decision is made. Extensive experiments are conducted to evaluate the accuracy of the proposed method in different settings.
Results show that our approach outperforms methods that leverage only a single feature and is robust to noise, occlusion, expression, low resolution, and different l1-minimization methods.

The paper is organized as follows. Section 2 surveys existing methods for thermal face recognition. Base techniques used in our approach, including feature extraction and the sparse representation classifier, are detailed in Section 3. Section 4 explains the overall framework of our approach with a focus on multi-feature fusion. Experiments and evaluation are given in Section 5. Discussion and future work are presented in Section 6, and the paper is concluded in Section 7.


2. Related Work

Thermal face recognition has been a hot research topic in the past decades. Currently, there are three main types of approaches: appearance-based approaches, local matching approaches, and global matching approaches.

Appearance-based approaches: these project face images into a subspace where recognition is carried out. Typical techniques include PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and ICA (Independent Component Analysis) [6][7][8]. In general, feature extraction with PCA and LDA transforms a thermal face image into a high-dimensional vector in an image vector space. With a small number of samples, evaluation of the covariance matrix yields inaccurate results. Later, Desa et al. proposed KPCA (Kernel Principal Component Analysis) and KLDA (Kernel Linear Discriminant Analysis). Compared with PCA and LDA, KPCA and KLDA take higher-order correlations into account and project faces into an even higher-dimensional feature space, but they still need to construct a covariance matrix, which suffers from the same problems as PCA and LDA.

Local matching approaches: a facial image is divided into blocks, and descriptors (local binary pattern, Gabor jet descriptor, and Weber local descriptor) are extracted from each block and concatenated into one vector. Then a specific classifier, such as K-nearest neighbor with the Euclidean distance, an artificial neural network, or a Chi-square similarity measure, is used for recognition [9][3][10][11][12]. These descriptors have different strengths. LBP performs well in thermal image recognition, even in the presence of glasses and without any preprocessing step; moreover, LBP is robust to fixed-pattern noise. WLD offers the best trade-off between recognition rate and processing speed. The Gabor jet descriptor is suitable for detecting edges, peaks, valleys, and ridges in thermal images.
However, LBP and WLD treat local and global gradient histograms as the texture representation and concatenate the regional features to obtain a global description of the thermal face, yet thermal images exhibit little gradient information generated from facial heat energy, so it is hard to achieve excellent recognition performance with them. The Gabor jet descriptor has low performance in outdoor setups and high computational complexity.

Global matching approaches: these methods detect critical points and then extract descriptors around them. Techniques include SIFT (Scale Invariant Feature Transform)-based methods and SURF

(Speeded-Up Robust Features)-based methods. In Gabriel's work, SIFT descriptors are computed on vascular images generated by processing the thermal images [13], while in his later work the SIFT methodology is used to obtain descriptors directly from the thermal face images [14]. These approaches obtained recognition rates that depended strongly on the database used. SURF is inspired by SIFT; it obtains similar image features at a higher speed. Gabriel used the OpenSURF implementation for thermal face recognition [3]. SIFT extracts local interest points independently as local descriptors, while SURF computes interest points and descriptors faster than SIFT with fewer components. However, due to the low quality and resolution of thermal face images, inter-class interest points are nearly the same as intra-class ones, so global matching approaches are not well suited to thermal face images. Besides, Bai used a Y-styled window filter to average SIFT (termed YWF-SIFT) and then combined multi-scale fusion with YWF-SIFT, which can handle facial rotation and occlusion problems such as wearing glasses [15][16]. However, these methods usually produce a number of mismatched feature points, which degrades performance.

Besides the above methods, recent work uses the vascular information of the face to develop thermal face recognition systems [17][18][19][20]. This is accomplished by extracting the network of blood vessels and facial geometric features, which are unique to each individual, and then matching them using specific classifiers. These methods achieve only about an 80% recognition rate on a non-public database, since vascular information cannot be extracted reliably and precisely; they therefore still need improvement to reach the performance of local matching methods.
Other researchers have applied artificial intelligence and optimization algorithms to thermal face recognition, such as particle swarm optimization [21], genetic algorithms [22][23], least squares [24], dictionary learning [24][25], and neural networks [26][27]. Although these methods can achieve higher recognition performance, their computational complexity is so demanding that they are not suitable for real-time scenarios. Moreover, a group of researchers showed that the combined use of visible and thermal IR imaging sensors offers a viable means of improving face recognition performance [28][29][30][31]. These methods can achieve higher recognition accuracy, but they need thermal and visible images at the same time, which makes the hardware implementation complex.

Features are an important component of thermal face recognition. Most

existing work uses a single feature, which limits the accuracy that can be achieved. In this paper, we therefore explore how to systematically combine multiple features for better thermal face recognition.

3. Technical Background

Feature extraction and classification are the two main components of face recognition. In this section, we introduce the techniques adopted in our work as background knowledge. Our main framework is presented in the next section.

3.1. Features Extracted for Thermal Faces

Features are the fundamental information that determines the performance of face recognition. In our work, we extract four main features from thermal images, detailed as follows. All four can be regarded as local features that capture efficient texture information and regional characteristics.

3.1.1. Local binary pattern (LBP)

The LBP operator was originally designed for texture description and is highly discriminative. It is well suited to demanding image analysis tasks, mainly due to its invariance to monotonic gray-level changes, and it is computationally efficient. The operator assigns a label to each pixel of an image, essentially a list of binary digits, each of which encodes the relation between the center pixel and the corresponding neighbor pixel. The histogram of the labels can be used as a texture descriptor. Timo et al. were the first to use histograms of LBP as features for face recognition [32]. In our work, an extension of the original LBP descriptor called uniform patterns [33] is adopted. A local binary pattern is called uniform if, considered circularly, it contains at most two bitwise transitions from 0 to 1 or vice versa. In the computation of the LBP histogram, each uniform pattern is assigned an independent bin and all non-uniform patterns share a single bin. In our experiments, images are first divided into 6 × 8 blocks, and we obtain 59 features for each block.
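As an illustration of the 59-bin uniform-LBP histogram described above, a NumPy sketch (our own, not the authors' code) might look like the following; the function name and the 8-neighbour/radius-1 choice are assumptions:

```python
import numpy as np

def uniform_lbp_histogram(block):
    # 8-neighbour LBP at radius 1; each uniform pattern (<= 2 circular
    # 0/1 transitions) gets its own bin, all non-uniform codes share
    # one bin, giving 58 + 1 = 59 bins per block.
    h, w = block.shape
    c = block[1:-1, 1:-1]
    neighbours = [block[0:h-2, 0:w-2], block[0:h-2, 1:w-1], block[0:h-2, 2:w],
                  block[1:h-1, 2:w],   block[2:h, 2:w],     block[2:h, 1:w-1],
                  block[2:h, 0:w-2],   block[1:h-1, 0:w-2]]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, n in enumerate(neighbours):
        codes += (n >= c) << bit          # threshold neighbours against centre

    def transitions(p):
        bits = [(p >> k) & 1 for k in range(8)]
        return sum(bits[k] != bits[(k + 1) % 8] for k in range(8))

    # Lookup table: uniform codes -> bins 0..57, everything else -> bin 58.
    lut = np.full(256, 58, dtype=np.int32)
    for i, p in enumerate(q for q in range(256) if transitions(q) <= 2):
        lut[p] = i
    hist = np.bincount(lut[codes].ravel(), minlength=59)
    return hist / hist.sum()
```

Applying this to each of the 6 × 8 blocks and concatenating the histograms yields the block-wise LBP feature vector described above.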


3.1.2. Gabor jet descriptors (Gabor)

Gabor jet descriptors represent texture information well by using the magnitudes of the Gabor wavelet transform. Gabor wavelets, which are spatially localized and selective to spatial orientations and scales, are comparable to the receptive fields of simple cells in the visual cortex [34]. Moreover, since Gabor wavelets detect amplitude-invariant spatial frequencies of pixel gray values, they are robust to illumination changes. In our feature extraction procedure, Gabor filters at five scales and eight orientations are adopted, and the Gabor magnitudes are used as local features because Gabor phases change linearly with small displacements; thus 320 Gabor magnitudes are leveraged as features.

3.1.3. Weber local descriptor (WLD)

The Weber local descriptor (WLD) can extract a broad range of local texture information and is suitable for texture-based recognition. An image is first divided into blocks; descriptors are then extracted from each block and concatenated into one vector. WLD has two components: differential excitation and orientation [35]. Differential excitation is the sum of the differences between the center pixel and its surrounding pixels, divided by the center pixel. Orientation is the ratio of the intensity change in the horizontal direction to that in the vertical direction. Once differential excitation and orientation are computed, the WLD descriptor is built by converting the two-dimensional histogram into a one-dimensional histogram. Before feature extraction, images are divided into 6 × 8 blocks, and 240 features are extracted for each block in the experiments.

3.1.4. Down-sampling features (DS)

Down-sampling refers to the process of reducing the resolution of an image. For example, in our work, we down-sample the original thermal face image from 320 × 240 to 12 × 10.
The down-sampling feature, i.e., the small image, can describe the global characteristics of a face, such as the face outline and the positions of facial organs, with little data. Compared to the above features, down-sampling is easy to implement and has lower computational complexity. In our work, linear sampling is used to down-sample an image.
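A minimal sketch of such a down-sampling feature, assuming NumPy; the grid-sampling strategy, image orientation, and unit normalization are our assumptions, not details given in the paper:

```python
import numpy as np

def downsample_feature(img, out_h=12, out_w=10):
    # Sample an out_h x out_w grid of pixels from the full-resolution
    # image and flatten it into a small global feature vector
    # (120 dimensions for the 12 x 10 case in the paper).
    h, w = img.shape
    rows = np.linspace(0, h - 1, out_h).round().astype(int)
    cols = np.linspace(0, w - 1, out_w).round().astype(int)
    small = img[np.ix_(rows, cols)].astype(float)
    v = small.ravel()
    return v / (np.linalg.norm(v) + 1e-12)   # unit norm, as SRC dictionaries assume
```

The resulting vector captures coarse global structure (face outline, organ positions) at negligible cost, which is why it complements the three texture descriptors above.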


3.2. Sparse Representation Classifier (SRC)

The sparse representation classifier [36] is a general framework for face recognition from a single feature, comprising dictionary construction, sparse representation, l1 optimization, and minimal-residual computation.

3.2.1. Dictionary construction and sparse representation

The sparse representation classifier is essentially a supervised learning algorithm that uses training samples from k distinct object classes to determine the class to which a new test sample belongs. Suppose there are n_i training samples from the i-th class; then a matrix

    A_i = [v_{i,1}, v_{i,2}, ..., v_{i,n_i}]                                (1)

is constructed, which represents the training face images of the i-th class (note that the v_{i,j} are normalized to the same scale). Then a dictionary matrix

    A = [A_1, A_2, ..., A_k]                                               (2)

is constructed to represent the entire training set by concatenating the training samples of all k classes. Given sufficient training faces of the i-th class, any new test face y from this class can be approximately represented by the training faces associated with class i:

    y = α_{i,1} v_{i,1} + α_{i,2} v_{i,2} + ... + α_{i,n_i} v_{i,n_i}      (3)

The linear representation of y can then be rewritten as

    y = A x_0                                                              (4)

where x_0 = [α_{1,1}, ..., α_{1,n_1}, ..., α_{k,1}, ..., α_{k,n_k}]^T is a coefficient vector over all training samples. Ideally, for a y belonging to class i, x_0 = [0, ..., 0, α_{i,1}, α_{i,2}, ..., α_{i,n_i}, 0, ..., 0]^T.

3.2.2. l1 optimization

A test face y can be sufficiently represented using only the training faces from the same class. This representation is naturally sparse if k, the number of object classes, is reasonably large. This motivates seeking the sparsest solution of y = A x_0 by solving the following optimization problem:

    x̂_0 = argmin ‖x‖_0   s.t.   Ax = y                                    (5)

where ‖·‖_0 denotes the l0-norm, which counts the number of nonzero entries in a vector. However, finding the sparsest solution of an underdetermined system of linear equations is NP-hard. According to the recent theory of sparse representation and compressed sensing, if the solution x_0 is sparse enough, the solution of the l0-minimization problem equals the solution of the following l1-minimization problem:

    x̂_1 = argmin ‖x‖_1   s.t.   Ax = y                                    (6)

This new problem can be solved in polynomial time by standard linear programming methods.

3.2.3. Minimal residual for classification

Ideally, all entries of x̂_1 associated with the correct class are nonzero and the other entries are zero. In practice, noise and modeling errors may lead to small nonzero values in the entries of incorrect classes. We define a function δ_i that sets the entries of classes other than i in x̂_1 to zero; the test sample y is then approximated for class i as ŷ_i = A δ_i(x̂_1). The final recognized class î of the test face image y is computed by

    î = argmin_i ‖y − A δ_i(x̂_1)‖_2                                       (7)

i.e., the final class has the minimal residual among all classes.

4. Our Method: Multi-Feature Fusion

In this section, we show how to improve thermal face recognition quality with the proposed multi-feature fusion technique. Fig. 1 illustrates our thermal face recognition framework, which comprises four components: feature extraction, local classification, feature-weight-vector computation, and final-residual computation. In the training stage, labeled thermal faces are recognized by SRC for each single feature. Then a loss function is defined and minimized to determine the weight of each feature, based on the recognition results and residuals from the previous step. In the recognition stage, residuals of the test face are obtained with the same pipeline, i.e., feature extraction and SRC classification. With the weights computed in the training stage, we can then compute a final residual
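Putting the pieces of Section 3.2 together, a single-feature SRC classifier can be sketched as follows. This is our own minimal NumPy illustration: the paper uses GPSR-style solvers for the l1 problem, and we substitute plain iterative soft-thresholding (ISTA), so `ista_l1`, `src_classify`, and all parameter values are assumptions rather than the authors' implementation:

```python
import numpy as np

def ista_l1(A, y, lam=0.01, n_iter=500):
    # Iterative soft-thresholding for min 0.5*||Ax - y||_2^2 + lam*||x||_1,
    # a simple stand-in for the GPSR/TwIST/SpaRSA solvers used in the paper.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - A.T @ (A @ x - y) / L      # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # shrinkage
    return x

def src_classify(A, labels, y, lam=0.01):
    # A: (d, n) dictionary of unit-norm training features, labels: (n,).
    # Returns the minimal-residual class and per-class residuals (Eq. (7)).
    x = ista_l1(A, y, lam)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # delta_i(x): keep class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ xc)
    pred = min(residuals, key=residuals.get)
    return pred, residuals
```

In the fusion framework that follows, one such classifier is built per feature type, and its per-class residuals feed the weight computation and the final decision.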

Figure 1: The multi-feature fusion framework for thermal face recognition (training stage: feature extraction, SRC classification of labeled faces, and weight-vector computation via w = argmin ‖e − Fw‖²₂ + λ‖w‖₁; recognition stage: feature extraction, SRC classification via argmin ‖x‖₁ s.t. Ax = y, final-residual computation R_i = Σ_j w_j r_{ij}, and face identification)

for the test face, based on which the final recognition decision is made. Multi-feature fusion is embodied in the weight-vector computation and the final-residual computation components. We now provide more technical details on these components.

4.1. Local Classifier Recognition

As described in Section 3.2, a face recognition system contains a matrix A and finds the optimal x_0 for a test face y; the features used to construct A from the training face images are described in Section 3.1. Different face recognition systems can be constructed from different features, forming several local classifiers. Accordingly, we obtain several recognition results and residuals for a face image. These are used to compute the feature weight vector in the training stage, which is later used in the recognition stage to compute the final residual over all features and identify a face.

4.2. Feature Weight Computation

To explain how weights are assigned to the features, we begin with some terms and notation. Suppose there are N labeled faces, from which we randomly choose t (t < N) faces with labels L_i (i = 1, 2, ..., t) for testing

(note that these test faces are not those of the recognition stage). Each face is represented by v features, which yields v local classifiers, since each feature type combined with SRC constitutes a local classifier C_j (j = 1, 2, ..., v). Further, we define R_{ij} as the minimal residual of the i-th test face under local classifier C_j, and T_{ij} as the classification result of the i-th test face under C_j. For all test faces, we define a residual matrix F = {f_{i,j}} as

    f_{i,j} = g(L_i, T_{ij}) = {  R_{ij}   if L_i = T_{ij}
                                 −R_{ij}   if L_i ≠ T_{ij}                 (8)

To exploit multi-feature fusion, and inspired by [37], for all test faces in a set S we define the loss function

    L(S) = Σ_{i=1}^{t} [1 − Σ_{j=1}^{v} w_j f_{i,j}]² = ‖e − Fw‖²          (9)

where w = [w_1, w_2, ..., w_v]^T is the weight vector over all involved features, and e is the all-ones vector whose size is the number of test faces. To obtain the optimal weight vector w, we minimize the total prediction error over all test faces, using least squares with regularization:

    ŵ = argmin ‖e − Fw‖²₂ + λ‖w‖₁
    s.t.  Σ_{j=1}^{v} w_j = 1,  w_j > 0,  j = 1, 2, ..., v                 (10)

where ‖w‖₁ is the regularization term and λ is the regularization parameter.

4.3. Final Recognition Result

Suppose the system has T classes and v local classifiers. For each test face, we obtain T residuals under each local classifier, denoted r_{i,j} (i = 1, 2, ..., T; j = 1, 2, ..., v). The final residual R_i for each test face, considering all features, is defined as

    R_i = Σ_{j=1}^{v} w_j · r_{i,j}                                        (11)

The class of the test face is then determined by

    î = argmin_i R_i                                                       (12)
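Since the constraint set of Eq. (10) is the probability simplex, on which ‖w‖₁ = 1 is constant, the weight learning reduces to least squares over the simplex. A possible sketch using projected gradient descent with a standard sort-based simplex projection, assuming NumPy; the function names, step size, and iteration count are our assumptions:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {w : sum(w) = 1, w >= 0}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def learn_weights(F, n_iter=2000, lr=None):
    # Minimise ||e - F w||_2^2 over the simplex (Eq. (10)); F is the
    # t x v residual matrix of Eq. (8), e the all-ones vector.
    t, v = F.shape
    e = np.ones(t)
    if lr is None:
        # Step size 1/Lipschitz; gradient of the squared error is 2 F^T(Fw - e).
        lr = 1.0 / (2.0 * np.linalg.norm(F, 2) ** 2 + 1e-12)
    w = np.full(v, 1.0 / v)
    for _ in range(n_iter):
        grad = 2.0 * F.T @ (F @ w - e)
        w = project_simplex(w - lr * grad)
    return w
```

Intuitively, a feature that classifies reliably with large positive entries in F attracts weight, while an unreliable feature (negative entries) is driven toward zero, which is then reflected in the fused residual of Eq. (11).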

5. Experiments

5.1. Experimental Setup

To evaluate the performance of the proposed method, we conduct extensive experiments on data from the public IRIS Thermal/Visible Face Database [38]. The set comprises 4228 pairs of 320 × 240-pixel thermal and visible face images, concurrently acquired but not mutually co-registered. Thirty individuals were photographed under varying expressions, poses, and illuminations; the five illumination conditions were obtained using different on/off combinations of lights. In our experiments, we select images from two lighting conditions, (i) all light sources off and (ii) all lights on, giving 22 face images per subject under dark and lit environments from different directions (e.g., Fig. 2).

Three metrics, Precision, Recall, and Accuracy, defined as the means of Precision(i), Recall(i), and Accuracy(i) respectively, are used for the evaluation. The following equations give their definitions, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, i is the index of a subject, and n(i) is the total number of samples for subject i. Twofold cross-validation was performed in the following experiments, with one fold used for training the model and one fold used for validation. Algorithm 1 is applied to compute the face recognition metrics.

    Precision(i) = TP(i) / (TP(i) + FP(i))                                 (13)

    Recall(i) = TP(i) / (TP(i) + FN(i))                                    (14)

    Accuracy(i) = (TP(i) + TN(i)) / n(i)                                   (15)
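Under this per-subject, macro-averaged reading of Eqs. (13)-(15), the metrics could be computed as follows; this is our illustrative helper, assuming NumPy, not code from the paper:

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    # Per-class TP/FP/FN/TN in a one-vs-rest fashion, then macro-average.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    P, R, A = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        tn = n - tp - fp - fn
        P.append(tp / max(tp + fp, 1))     # Eq. (13)
        R.append(tp / max(tp + fn, 1))     # Eq. (14)
        A.append((tp + tn) / n)            # Eq. (15)
    return float(np.mean(P)), float(np.mean(R)), float(np.mean(A))
```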

Figure 2: Thermal faces of one subject used in the experiments

Algorithm 1: Experimental procedure
Input: face data: C subjects, each with N training faces and N test faces
Output: recognition performance: Accuracy, Recall, and Precision
1:  Accuracy, Recall, Precision ← 0
2:  for i = 1 : 1 : 20 do
3:      For each subject, randomly select I (I = 1, 2, ..., 10) faces from the training faces
4:      Feature extraction; compute feature weights W; form A_1, A_2, ..., A_v for the v feature types
5:      for j = 1 : 1 : N * C do
6:          Feature extraction; solve the l1-minimization problem w.r.t. A_1, A_2, ..., A_v
7:          Compute the final residual; identify the face class
8:      end for
9:      Compute Precision(i), Recall(i), and Accuracy(i)
10: end for
11: Accuracy ← mean(Accuracy(i)); Precision ← mean(Precision(i)); Recall ← mean(Recall(i))
12: return Accuracy, Precision, and Recall

5.2. Recognition Performance

In this experiment, we test the performance of our multi-feature fusion technique on the IRIS thermal face data. We first evaluate recognition performance with different numbers of training faces, ranging from 1 to 10. Our method is compared with four other approaches, each of which uses only one feature. As shown in Table 1, the multi-feature fusion technique outperforms the single-feature methods overall. When the number of training faces reaches 10, the recognition performance achieves 91.5%, 91.8%, and 91.1% in terms of recall, precision, and accuracy, respectively.

Table 1: Multi-feature fusion thermal face recognition performance (%)

            Training number:     1     2     3     4     5     6     7     8     9    10
Recall      Feature Fusion    53.8  64.9  75.4  82.2  86.8  87.4  87.8  89.7  90.1  91.5
            Gabor+SRC         49.3  64.2  73.3  79.8  83.9  84.6  85.5  87.2  88.6  89.7
            LBP+SRC           52.5  65.6  73.8  80.5  84.7  84.9  85.8  88.3  89.1  89.6
            WLD+SRC           46.7  62.3  71.1  77.6  81.4  83.3  84.7  87.2  88.4  89.4
            DS+SRC            26.8  39.1  46.3  53.0  57.5  59.2  61.1  63.6  65.1  66.8
Precision   Feature Fusion    44.6  60.8  72.9  80.9  86.7  87.4  87.9  90.3  91.4  91.8
            Gabor+SRC         39.9  60.8  71.6  79.6  84.2  85.0  86.1  88.0  89.5  91.8
            LBP+SRC           40.3  59.3  70.5  78.7  83.7  84.6  85.7  88.5  89.3  90.0
            WLD+SRC           40.5  58.5  68.5  76.2  80.8  83.2  84.8  87.5  88.9  90.1
            DS+SRC            23.4  35.3  43.0  50.6  55.6  57.3  59.6  62.7  63.8  65.5
Accuracy    Feature Fusion    63.1  69.0  78.0  83.4  86.9  87.4  87.7  89.8  90.6  91.1
            Gabor+SRC         58.6  67.7  75.0  79.9  83.7  84.2  84.9  86.3  87.7  88.7
            LBP+SRC           64.9  71.9  77.1  82.3  85.6  85.2  85.9  88.1  88.8  89.2
            WLD+SRC           52.9  66.1  73.5  79.0  82.0  83.5  84.6  86.9  87.9  88.8
            DS+SRC            30.3  42.8  49.6  55.3  59.3  61.0  62.5  64.4  66.4  68.1

5.3. Robustness to Face Expressions

We next evaluate the robustness of our approach to different facial expressions. For this experiment, we use face images from the IRIS Thermal/Visible Face Database. Thermal face images of 30 subjects are chosen,

and there are 22 images for each subject covering 3 expressions: surprised, laughing, and angry. An example is shown in Fig. 3. As before, the number of training faces ranges from 1 to 10. Table 2 shows the results. The multi-feature fusion technique again works best: when the number of training faces per subject is increased to 6 or more, the recognition performance in terms of accuracy, precision, and recall exceeds 99.0%.

Figure 3: Faces of one subject with angry, laughing, and surprised expressions

5.4. Recognition Performance in the Presence of Noise

This experiment evaluates robustness to noise. We again use the IRIS thermal face images (30 subjects) and set the number of training faces per subject to 5. We add Gaussian white noise to the faces before feature extraction; the signal-to-noise ratio (SNR) quantifying the added noise increases from 3 to 33 in steps of 3. The results are shown in Table 3. Our method exhibits stable performance, with around 85% accuracy, precision, and recall regardless of the noise level, indicating that noise with SNR ranging from 3 to 33 has nearly the same (small) effect on multi-feature fusion. The main reason is that the Gabor transform enhances the edge, peak, valley, and ridge contour information of an image, so facial organs such as the nose, eyes, and mouth stand out from the noise; the Gabor feature with SRC therefore plays a key role in maintaining good recognition performance in the presence of noise. Also note that LBP is very sensitive to noise and on average has the lowest recognition accuracy.

5.5. Recognition Performance for Different Image Resolutions

This experiment evaluates robustness to different image resolutions. The IRIS thermal face data is used in this experiment. Different image

Table 2: Recognition performance w.r.t. expressions (%)

            Training number:     1     2     3     4     5     6     7     8     9    10
Recall      Feature Fusion    77.1  88.1  92.7  97.7  98.3  99.1  99.7  99.8  99.8  99.9
            Gabor+SRC         72.9  85.2  90.5  95.2  97.3  98.0  98.9  99.4  99.7  99.9
            LBP+SRC           71.2  83.8  88.7  92.8  95.2  96.6  97.6  98.2  98.8  99.2
            WLD+SRC           64.1  77.7  84.8  90.1  93.2  95.0  96.5  97.5  98.2  98.4
            DS+SRC            45.7  61.1  70.6  77.8  82.8  86.1  89.4  91.0  93.2  94.5
Precision   Feature Fusion    73.6  86.5  91.5  97.3  98.7  99.1  99.7  99.8  99.8  99.8
            Gabor+SRC         69.3  83.6  89.6  94.9  97.1  97.9  98.9  99.4  99.7  99.8
            LBP+SRC           65.8  81.0  86.9  91.9  94.8  96.3  97.4  98.1  98.7  99.2
            WLD+SRC           61.9  75.9  83.6  89.4  92.8  94.7  96.3  97.3  98.1  98.3
            DS+SRC            42.7  58.5  68.4  76.3  81.7  85.3  88.7  90.5  92.9  94.2
Accuracy    Feature Fusion    80.5  89.7  93.7  97.8  98.2  99.4  99.6  99.8  99.8  99.8
            Gabor+SRC         76.5  86.8  91.4  95.6  97.5  98.1  99.0  99.4  99.7  99.9
            LBP+SRC           76.6  86.5  90.5  93.7  95.7  96.8  97.7  98.3  98.9  99.3
            WLD+SRC           66.4  79.5  86.1  90.7  93.6  95.3  96.7  97.6  98.3  98.5
            DS+SRC            48.6  63.6  72.8  79.4  83.8  86.8  90.4  91.4  93.6  94.7

resolutions are 240×320, 120×160, 60×80, 30×40, and 15×20 (the last four are sampled from the first). The number of training faces per subject is again set to 5. The results are shown in Fig. 4. Multi-feature fusion exhibits quite stable and high performance across image resolutions; specifically, it achieves its highest recognition accuracy, precision, and recall of 87.4%, 87.3%, and 87.5%, respectively, at resolution 120×160. The down-sampling and Gabor features are less sensitive to face image resolution and still achieve high recognition performance at low resolutions, while the LBP and WLD features degrade sharply at small resolutions.

5.6. Recognition Performance in the Presence of Occlusion

We also used our own thermal face data to evaluate recognition performance in the presence of occlusion. We add an occluded area to a face image by setting the corresponding pixels to black. The ratio of the occluded part to

Recall

Precision

Accuracy

Table 3: Recognition performance w.r.t. noise (%)

SNR Feature Fusion Gabor+SCR LBP+SRC WLD+SRC DS+SRC Feature Fusion Gabor+SCR LBP+SRC WLD+SRC DS+SRC Feature Fusion Gabor+SCR LBP+SRC WLD+SRC DS+SRC

3 85.3 83.0 43.8 79.9 50.7 85.0 83.3 41.2 78.8 47.2 85.6 82.8 46.3 81.0 54.1

6 83.9 81.8 32.4 78.6 53.1 83.3 81.7 28.7 77.3 50.1 84.4 81.9 36.1 79.7 56.2

9 85.0 82.5 14.9 79.2 55.4 84.6 82.4 13.3 77.9 52.9 85.5 82.5 16.4 80.5 57.8

12 79.9 81.5 12.7 78.9 55.5 78.6 81.2 12.1 77.3 52.7 81.2 81.8 13.4 80.4 58.3

15 85.6 83.5 25.3 79.8 54.4 85.2 83.6 16.9 79.0 52.3 86.1 83.4 33.6 80.5 56.6

18 86.4 83.5 56.2 82.0 54.6 86.2 83.7 41.0 81.5 53.2 86.5 83.4 71.4 82.4 56.1

21 85.7 82.6 72.2 82.2 55.6 85.7 82.6 63.6 81.7 54.1 85.7 82.6 80.7 82.7 57.1

24 84.8 81.5 76.5 81.6 55.1 84.1 81.5 71.5 80.6 52.7 85.4 81.6 81.5 82.5 57.5

27 84.5 82.6 79.6 83.3 53.9 84.3 82.8 96.8 83.0 52.3 84.7 82.5 82.4 83.5 55.7

the whole image range from 5% to 50%. We repeated the experiments 20 times by randomly positioning the occlusion part on different areas of an image. The results are shown in TABLE. 4. Our method outperforms the other single feature recognition methods at all levels of occlusion, correctly identifying over 90% of test thermal faces even when it is up to 50% occlusion, although the performance decrease slightly with the increase of occlusion area. Gabor and Down-sampling features are robust to occlusion: the recognition performance varies slightly when the occlusion area increases, while the performances of LBP and WLD features decrease seriously when the occlusion area increases. 5.7. Recognition Performance w.r.t. l1 -minimization Method At last, we use our own thermal face image data to evaluate how different l1 -minimization methods affect face recognition accuracy. In the above experiments, GPSR Basic [39] method is used to solve l1 -minimization problem. 16

30 86.2 83.3 83.0 85.3 55.9 85.8 83.7 81.8 84.8 53.9 86.5 82.9 84.1 85.8 57.8
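The occlusion protocol described in Section 5.6 (black out a randomly positioned region covering a given fraction of the image) can be sketched as follows. The paper does not state the shape of the occluded region, so this sketch assumes a square patch; the function name and values are illustrative.

```python
import numpy as np

def occlude(face, ratio, rng):
    """Black out a square patch covering roughly `ratio` of the image
    area, placed at a uniformly random position."""
    h, w = face.shape
    side = int(round(np.sqrt(ratio * h * w)))   # square patch, area ≈ ratio * h * w
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out = face.copy()
    out[top:top + side, left:left + side] = 0   # set occluded pixels to black
    return out

rng = np.random.default_rng(0)
face = rng.uniform(0.1, 1.0, size=(240, 320))  # stand-in for a 240x320 thermal face
occ = occlude(face, 0.25, rng)
print(float(np.mean(occ == 0)))  # fraction of blacked-out pixels, close to 0.25
```

Repeating this 20 times with fresh random positions, as in the experiment, simply means calling `occlude` with the same ratio and a live `rng`.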

Figure 4: Recognition performance w.r.t. resolution regarding different metrics: (a) Accuracy; (b) Precision; (c) Recall. Each panel plots the Feature fusion method, Gabor+SRC, LBP+SRC, WLD+SRC and Downsample+SRC against thermal face image resolution (240×320 down to 15×20 pixels).
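The four lower resolutions evaluated above are sampled from the 240×320 images. The paper does not specify the resampling method; one plausible choice (an assumption here) is repeated 2×2 block averaging, which reproduces exactly the five resolutions used:

```python
import numpy as np

def downsample2(img):
    """Halve each dimension by 2x2 block averaging (one plausible way to
    derive the lower-resolution test images)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

levels = [np.ones((240, 320))]  # stand-in for a 240x320 thermal face
for _ in range(4):
    levels.append(downsample2(levels[-1]))
print([im.shape for im in levels])
# [(240, 320), (120, 160), (60, 80), (30, 40), (15, 20)]
```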

In this experiment, we use three other methods, GPSR BB [39], TwIST [40] and SpaRSA [41], to solve the l1-minimization problem in thermal face recognition. As before, the number of training faces for each subject is increased from 1 to 10. The results (Table 5) show that all methods achieve nearly the same performance, which indicates that the choice of l1-minimization method has little impact on recognition accuracy.
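The classification step these solvers plug into is the SRC scheme of Wright et al. [36]: represent a probe as a sparse combination of all training vectors and pick the class whose atoms reconstruct it with the smallest residual. The sketch below uses a plain ISTA loop as a stand-in for GPSR/TwIST/SpaRSA and synthetic vectors in place of thermal-face features; all names and values are illustrative.

```python
import numpy as np

def ista(A, y, lam=0.01, iters=500):
    """Minimise 0.5*||A x - y||_2^2 + lam*||x||_1 by iterative
    soft-thresholding (a simple stand-in for GPSR/TwIST/SpaRSA)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - A.T @ (A @ x - y) / L      # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(A, labels, y):
    """Sparse-representation classification: pick the class whose
    training atoms best reconstruct y (smallest residual)."""
    x = ista(A, y)
    residuals = {}
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        xc = np.where(mask, x, 0.0)        # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ xc)
    return min(residuals, key=residuals.get)

# Toy example: two synthetic "subjects", three random unit-norm atoms each.
rng = np.random.default_rng(1)
A = np.column_stack([v / np.linalg.norm(v)
                     for v in [rng.normal(size=50) for _ in range(6)]])
labels = [0, 0, 0, 1, 1, 1]
probe = A[:, 1] + 0.05 * rng.normal(size=50)   # noisy copy of a class-0 atom
print(src_classify(A, labels, probe))  # → 0
```

Because the probe is essentially one class-0 atom plus noise, the sparse code concentrates on that atom and the class-0 residual is far smaller than the class-1 residual.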


Table 4: Recognition performance w.r.t. occlusion (%)

                 Occlusion (%):    5    10    15    20    25    30    35    40    45    50
Accuracy   Feature Fusion      94.7  98.3  95.0  95.9  92.7  87.6  95.4  90.2  93.2  90.3
           Gabor+SRC           93.5  97.6  94.5  95.7  92.4  86.9  95.2  89.9  92.8  89.9
           LBP+SRC             92.9  95.9  90.7  88.8  82.4  77.6  75.4  66.6  57.6  50.9
           WLD+SRC             90.7  93.0  86.0  82.9  75.6  71.4  68.2  65.7  65.1  62.2
           DS+SRC              84.5  89.1  86.6  88.1  85.4  79.5  86.5  82.2  82.9  83.5
Precision  Feature Fusion      94.2  98.2  94.5  95.6  92.2  86.0  94.8  88.8  92.5  89.6
           Gabor+SRC           93.0  97.5  94.0  95.3  91.8  85.3  94.6  88.5  92.2  89.2
           LBP+SRC             92.4  95.6  89.2  86.7  78.6  72.0  66.6  56.1  39.7  38.1
           WLD+SRC             89.9  92.3  84.6  80.9  71.7  66.4  61.3  57.7  55.9  54.0
           DS+SRC              83.5  88.3  85.8  87.5  84.5  78.1  56.6  81.2  81.8  82.3
Recall     Feature Fusion      95.2  98.3  95.5  96.3  93.2  89.1  95.8  91.5  93.9  91.1
           Gabor+SRC           94.0  97.7  95.0  96.1  92.9  88.6  95.7  91.4  93.5  90.7
           LBP+SRC             93.5  96.2  92.2  90.9  86.2  83.3  84.2  77.1  75.5  63.8
           WLD+SRC             91.5  93.6  87.3  85.0  79.4  76.5  75.1  73.6  74.3  70.3
           DS+SRC              85.4  89.8  87.4  88.6  86.2  81.0  87.4  83.1  84.1  84.8

Table 5: Thermal face recognition performance w.r.t. l1-minimization method (%)

               Training Number:    1     2     3     4     5     6     7     8     9    10
Accuracy   GPSR-Basic           59.5  71.8  81.2  85.1  86.9  87.6  89.0  89.7  90.1  90.7
           GPSR-BB              55.2  61.8  78.3  87.7  89.2  88.9  88.6  89.1  91.2  91.8
           TwIST                59.5  71.8  81.2  85.1  86.9  87.6  89.0  89.7  90.1  90.7
           SpaRSA               53.8  64.9  75.5  82.1  86.8  87.4  87.8  90.0  90.9  91.4
Precision  GPSR-Basic           50.4  68.5  79.8  84.0  87.1  87.5  89.1  90.1  90.5  91.0
           GPSR-BB              43.9  57.7  77.9  87.6  89.4  89.1  88.9  90.1  91.7  92.2
           TwIST                50.5  68.5  79.9  84.1  87.2  87.5  89.1  90.1  90.6  91.1
           SpaRSA               44.6  60.8  72.9  80.9  86.7  87.4  87.9  90.3  91.4  91.8
Recall     GPSR-Basic           68.6  75.1  82.6  86.2  86.8  87.8  88.9  89.3  89.7  90.3
           GPSR-BB              66.4  66.3  78.8  87.7  88.9  88.7  88.3  89.3  90.7  91.5
           TwIST                68.7  75.2  82.6  86.2  86.8  87.8  88.9  89.3  89.7  90.3
           SpaRSA               63.1  69.0  77.9  83.4  86.9  87.4  87.7  89.7  90.5  90.1

6. Discussion & Future Work

The experimental results show that our proposed method, which combines multiple features for thermal face characterization and recognition, outperforms methods that use only a single feature. Among the four features, the Down-sampling feature cannot be recommended on its own. LBP and WLD are good local features, but each is inadequate in certain scenarios: LBP is so sensitive to noise that its recognition performance is low on low-SNR images, and both LBP and WLD perform poorly on low-resolution thermal images. When computation cost is a concern, the Gabor feature with SRC is a good choice, since its recognition performance is only slightly lower than that of the proposed method. Our work considers only local features, which capture texture and regional information; if other types of features (such as global features, or key-point descriptors like SIFT and SURF) were added to our fusion framework, the results could be further improved. A comparison of recognition rates between the present method and other commonly cited methods, all evaluated on the IRIS thermal face database, is given in Table 6. The proposed method compares favorably against the other face recognition methods.

Table 6: Comparison of recognition performance (Accuracy) with other methods

Methods                                                   Accuracy
Proposed method                                           91.5%
Multi-layer Perceptron [42]                               84.8%
Visual and Thermal face + Multi-layer Perceptron [43]     Average: 84.2%
Visual and Thermal face + PCA [44]                        Average: 75.8%

In the future, we plan to further improve our method and consider more complex environments. First, we plan to improve recognition rate and speed by combining more features, and to explore which combinations of feature types achieve the best recognition performance. Second, we will improve the feature-weighting method so that good performance can be achieved even with a small number of training images. Third, we will conduct more experiments to investigate

the effect of environmental (e.g., temperature changes), physical (e.g., lack of sleep) and physiological (e.g., fear, stress) conditions on the performance of face recognition.

7. Conclusion

Face recognition is an active research field due to its potential use in a wide variety of applications. In this paper, we proposed a multi-feature fusion method to improve recognition performance on thermal face images. The presented method combines four features, namely the Local binary pattern, the Gabor jet descriptor, the Weber local descriptor and the Down-sampling feature, for better performance. Experiments show that our method achieves very high face recognition performance and outperforms methods that leverage only a single feature. Moreover, our method is robust to noise, occlusion, expression, low resolution and the choice of l1-minimization method.

8. References

[1] Harin Sellahewa and Sabah Jassim. Image-quality-based adaptive face recognition. IEEE Transactions on Instrumentation and Measurement, 59(4):805–813, 2010.

[2] Javier Galbally, Julian Fierrez, Javier Ortega-Garcia, Chris McCool, and Sebastien Marcel. Hill-climbing attack to an eigenface-based face verification system. IEEE International Conference on Biometrics, Identity and Security, Tampa, USA, pages 1–6, 22-23 Sept., 2009.

[3] Gabriel Hermosilla, Javier Ruiz-del-Solar, Rodrigo Verschae, and Mauricio Correa. A comparative study of thermal face recognition methods in unconstrained environments. Pattern Recognition, 45:2445–2459, 2012.

[4] M Usman Akram and Aasia Khanum. Retinal images: blood vessel segmentation by threshold probing. Industrial Electronics & Applications (ISIEA), IEEE Symposium on, pages 493–497, 3-6 October, 2010, Penang, Malaysia.

[5] Mamta and Madasu Hanmandlu. A new entropy function and a classifier for thermal face recognition. Engineering Applications of Artificial Intelligence, 36:269–286, 2014.

[6] Diego A Socolinsky, Lawrence B Wolff, Joshua D Neuheisel, and Christopher K Eveland. Illumination invariant face recognition using thermal infrared imagery. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1:I–527, 2001.

[7] Diego A Socolinsky and Andrea Selinger. A comparative analysis of face recognition performance with visible and thermal infrared imagery. DTIC Document, 2002.

[8] Gabriel Hermosilla, Javier Ruiz-del-Solar, and Rodrigo Verschae. An enhanced representation of thermal faces for improving local appearance-based face recognition. Intelligent Automation & Soft Computing, pages 1–12, 2015.

[9] Heydi Mendez, Cesar San Martin, Josef Kittler, Yenisel Plasencia, and Edel Garcia-Reyes. Face recognition with LWIR imagery using local binary patterns. Advances in Biometrics, pages 327–336, 2009.

[10] Ning Wang, Qiong Li, Ahmed A Abd El-Latif, Jialiang Peng, and Xiamu Niu. An enhanced thermal face recognition method based on multiscale complex fusion for gabor coefficients. Multimedia Tools and Applications, 72(3):2339–2358, 2014.

[11] Xiaoyuan Zhang, Jucheng Yang, Song Dong, Chao Wang, Yarui Chen, and Chao Wu. Thermal infrared face recognition based on the modified blood perfusion model and improved weber local descriptor. Biometric Recognition, pages 103–110, 2014.

[12] Zhihua Xie and Zhengzi Wang. Joint encoding of multi-scale LBP for infrared face recognition. In Genetic and Evolutionary Computing, pages 269–276. Springer, 2015.

[13] Gabriel Hermosilla, Javier Ruiz-del-Solar, Rodrigo Verschae, and Mauricio Correa. Face recognition using thermal infrared images for human-robot interaction applications: a comparative study. 6th Latin American Robotics Symposium (LARS), pages 1–7, Valparaiso, 29-30 Oct. 2009.

[14] Gabriel Hermosilla, Patricio Loncomilla, and Javier Ruiz-del-Solar. Thermal face recognition using local interest points and descriptors for HRI applications. RoboCup 2010: Robot Soccer World Cup XIV, pages 25–35, 2011.

[15] Junfeng Bai, Yong Ma, Jing Li, Fan Fan, and Hongyuan Wang. Novel averaging window filter for SIFT in infrared face recognition. Chinese Optics Letters, 9(8):081002, 2011.

[16] Junfeng Bai, Yong Ma, Jing Li, Hao Li, Yu Fang, Rui Wang, and Hongyuan Wang. Good match exploration for thermal infrared face recognition based on YWF-SIFT with multi-scale fusion. Infrared Physics & Technology, 67:91–97, 2014.

[17] Pradeep Buddharaju, Ioannis Pavlidis, and Chinmay Manohar. Face recognition beyond the visible spectrum. Advances in Biometrics, pages 157–180, 2008.

[18] Siu-Yeung Cho, Lingyu Wang, and Wen Jin Ong. Thermal imprint feature analysis for face recognition. IEEE International Symposium on Industrial Electronics (ISIE), pages 1875–1880, Seoul, 5-8 July 2009.

[19] Moulay A Akhloufi and Abdelhakim Bendada. Thermal faceprint: A new thermal face signature extraction for infrared face recognition. In CRV, pages 269–272, 2008.

[20] Chun-Fu Lin and Sheng-Fuu Lin. Accuracy enhanced thermal face recognition. Infrared Physics & Technology, 61:200–207, 2013.

[21] Ayan Seal, Suranjan Ganguly, Debotosh Bhattacharjee, Mita Nasipuri, and Consuelo Gonzalo-Martin. Feature selection using particle swarm optimization for thermal face recognition. In Applied Computation and Security Systems, pages 25–35. Springer, 2015.

[22] Gabriel Hermosilla, Francisco Gallardo, Gonzalo Farias, and Cesar San Martin. Fusion of visible and thermal descriptors using genetic algorithms for face recognition systems. Sensors, 15(8):17944–17962, 2015.

[23] Chieh-Li Chen and Bo-Lin Jian. Infrared thermal facial image sequence registration analysis and verification. Infrared Physics & Technology, 69:1–6, 2015.

[24] Shuowen Hu, Jonghyun Choi, Alex L Chan, and William Robson Schwartz. Thermal-to-visible face recognition using partial least squares. JOSA A, 32(3):431–442, 2015.


[25] Shuowen Hu, Nathaniel J Short, Prudhvi K Gurram, Kristan P Gurton, and Christopher Reale. MWIR-to-visible and LWIR-to-visible face recognition using partial least squares and dictionary learning. In Face Recognition Across the Imaging Spectrum, pages 69–90. Springer, 2016.

[26] N Pattabhi Ramaiah, Earnest Paul Ijjina, and C Krishna Mohan. Illumination invariant face recognition using convolutional neural networks. In IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), pages 1–4. IEEE, 2015.

[27] Georgia Koukiou and Vassilis Anastassopoulos. Neural networks for identifying drunk persons using thermal infrared imagery. Forensic Science International, 252:69–76, 2015.

[28] Guillaume-Alexandre Bilodeau, Atousa Torabi, and François Morin. Visible and infrared image registration using trajectories and composite foreground images. Image and Vision Computing, 29(1):41–50, 2011.

[29] Socheat Sonn, Guillaume-Alexandre Bilodeau, and Philippe Galinier. Fast and accurate registration of visible and infrared videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 308–313, June 25-27, 2013, Portland, Oregon.

[30] Jiayi Ma, Chen Chen, Chang Li, and Jun Huang. Infrared and visible image fusion via gradient transfer and total variation minimization. Information Fusion, 31:100–109, 2016.

[31] Jiayi Ma, Ji Zhao, Yong Ma, and Jinwen Tian. Non-rigid visible and infrared face registration via regularized gaussian fields criterion. Pattern Recognition, 48(3):772–784, 2015.

[32] Timo Ahonen, Abdenour Hadid, and Matti Pietikainen. Face description with local binary patterns: Application to face recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(12):2037–2041, 2006.

[33] Timo Ojala, Matti Pietikainen, and Topi Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7):971–987, 2002.

[34] Jie Zou, Qiang Ji, and George Nagy. A comparative study of local matching approach for face recognition. Image Processing, IEEE Transactions on, 16(10):2617–2628, 2007.

[35] Jie Chen, Shiguang Shan, Chu He, Guoying Zhao, Matti Pietikainen, Xilin Chen, and Wen Gao. WLD: A robust local image descriptor. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1705–1720, 2010.

[36] John Wright, Allen Y Yang, Arvind Ganesh, Shankar S Sastry, and Yi Ma. Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(2):210–227, 2009.

[37] Yi Yang, Jingkuan Song, Zi Huang, Zhigang Ma, Nicu Sebe, and Alexander G Hauptmann. Multi-feature fusion via hierarchical regression for multimedia analysis. Multimedia, IEEE Transactions on, 15(3):572–581, 2013.

[38] B Abidi. IRIS thermal/visible face database. DOE University Research Program in Robotics under grant DOE-DE-FG02-86NE37968, 2007.

[39] Mario AT Figueiredo, Robert D Nowak, and Stephen J Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing, 1(4):586–597, 2007.

[40] Massimo Fornasier and Holger Rauhut. Iterative thresholding algorithms. Applied and Computational Harmonic Analysis, 25(2):187–208, 2008.

[41] Stephen J Wright, Robert D Nowak, and Mario AT Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009.

[42] MK Bhowmik, D Bhattacharjee, M Nasipuri, DK Basu, and M Kundu. Classification of polar-thermal eigenfaces using multilayer perceptron for human face recognition. 3rd IEEE Conference on Industrial and Information System, IIT Kharagpur, India, pages 1–6, 8-10 Dec., 2008.


[43] MK Bhowmik, D Bhattacharjee, M Nasipuri, DK Basu, and M Kundu. Optimum fusion of visual and thermal face images for recognition. Sixth International Conference on Information Assurance and Security, pages 311–316, 23-25 Aug., 2010, Atlanta, USA.

[44] Florin Marius Pop, Mihaela Gordan, Camelia Florea, and Aurel Vlaicu. Fusion based approach for thermal and visible face recognition under pose and expressivity variation. 9th RoEduNet International Conference, pages 61–66, 24-26 June, 2010, Sibiu, Romania.


Highlights

1. A multi-feature fusion technique for thermal face recognition is proposed, which can also be applied to other feature-fusion problems.

2. The proposed thermal face recognition method is robust to noise, occlusion, expression, low resolution and different l1-minimization methods.

3. The proposed method compares favorably against other thermal face recognition methods.