Accepted Manuscript
Pose-Robust Face Recognition with Huffman-LBP Enhanced by Divide-and-Rule Strategy Li-Fang Zhou, Yue-Wei Du, Wei-Sheng Li, Jian-Xun Mi, Xiao Luan PII: DOI: Reference:
S0031-3203(18)30005-0 10.1016/j.patcog.2018.01.003 PR 6412
To appear in:
Pattern Recognition
Received date: Revised date: Accepted date:
6 May 2017 22 October 2017 7 January 2018
Please cite this article as: Li-Fang Zhou, Yue-Wei Du, Wei-Sheng Li, Jian-Xun Mi, Xiao Luan, PoseRobust Face Recognition with Huffman-LBP Enhanced by Divide-and-Rule Strategy, Pattern Recognition (2018), doi: 10.1016/j.patcog.2018.01.003
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights • A novel LBP-like feature is proposed which takes the contribution of contrast
CR IP T
value into consideration by Huffman coding.
• The Divide-and-Rule strategy is applied to both face representation and classification with the goal of improving the robustness to pose variation.
• Face representation via Region Selection Factor (RSF) is suggested in our method to treat the face images of different poses specifically rather than generally.
AN US
• In order to further make the method tolerate the rotations, we perform the face classification at the patch-level using a patchbased SRC fusion classification
AC
CE
PT
ED
M
strategy.
1
ACCEPTED MANUSCRIPT
Pose-Robust Face Recognition with Huffman-LBP Enhanced by Divide-and-Rule Strategy
CR IP T
Li-Fang Zhoua,b,∗, Yue-Wei Dua , Wei-Sheng Lia , Jian-Xun Mia , Xiao Luana a Coll.
of Computer Science and Technology, Chongqing Univ. of Posts and Telecommunications, Chongqing 400000, People’s Republic of China b Coll. of software, Chongqing Univ. of Posts and Telecommunications, Chongqing 400000, Peoples Republic of China
AN US
Abstract
Face recognition in harsh environments is an active research topic. As one of the most important challenges, face recognition across pose has received extensive attention. LBP feature has been used widely in face recognition because of its robustness to slight illumination and pose variations. However, due to the way of pattern feature calculation, its effectiveness is limited by the big rotations. In this paper, a new LBP-
M
like feature extraction is proposed which modifies the code rule by Huffman. Besides, a Divide-and-Rule strategy is applied to both face representation and classification,
ED
which aims to improve recognition performance across pose. Extensive experiments on CMU PIE database, FERET database and LFW database are conducted to verify the efficacy of the proposed method. The experimental results show that our method
PT
significantly outperforms other approaches. Keywords: Face recognition across pose, LBP, Huffman, Divide-and-Rule strategy
CE
1. Introduction
Face recognition is of vital importance in person identification due to the rich in-
AC
formation contained in face images. Although frontal face recognition has already reached high accuracy on internationally available databases, face recognition in harsh
5
environments is still a challenging problem because the various variations of the tar∗ Corresponding
author Email address:
[email protected] (Li-Fang Zhou)
Preprint submitted to Pattern Recognition
January 10, 2018
ACCEPTED MANUSCRIPT
get face image are very common in real applications. For example, the intra-personal differences caused by pose variation can even be much larger than inter-personal differences. Therefore, pose variation is identified as one of the prominent unsolved prob-
10
CR IP T
lems in the research of face recognition [1]. Many different methods for solving the problem of pose variation have been proposed, which can be roughly divided into two categories, i.e. 3D and 2D approaches.
Since pose variations are generated by 3D rigid transformations of the face, some researchers believe that using the 3D method to solve this problem is more intuitive and accurate. One type of 3D methods builds the human face 3D model by continuously
evaluating the depth information and fitting the face shape after pose changes. Finally,
AN US
15
pose normalization is realized through 3D transformation. The most representative method is 3D Morphable Model (3DMM) [2]. Another type of methods is the landmark based 3D face model fitting [3–5]. The method in [3] relies on Active Appearance Model (AAM) [6] to fit the landmark, finds the corresponding relationship between 20
the 2D landmark point and the 3D landmark point, and finally synthesizes the multi-
M
pose view to the front view. Recently, Zhu et al. [7] improved the 3DMM by solving the “landmark marching” problem. A more precise correspondence between the 2D
ED
face landmarks and the 3D face landmarks is established, so a high-fidelity pose and expression normalization is realized. Compared with 3D methods, 2D-based methods do not need to rely on 3D informa-
25
PT
tion to identify the subject, so it is easy to implement. One type of 2D methods attempts to find out the relationship among different pose images by regression models, such as
CE
Locally Linear Regression (LLR) [8] and Orthogonal Procrustes Regression (OPR) [9]. In 2016, Tai et al. [9] applied OPR to multi-pose face recognition, which uses orthog-
30
onal procrustes analysis to find the optimal linear transformation between two images
AC
with different poses. At the same time, they proposed a stacked OPR model to solve the problem of highly nonlinear pose change. A large number of 2D-based approaches also normalize the different pose face images to a unified view by reconstructing the virtual view so that the traditional face recognition algorithm can be used. For exam-
35
ple, the method proposed in [10] uses Markov Random Fields (MRFS) to reconstruct a virtual frontal face from a non-frontal face image. In [11], a method based on the 3
ACCEPTED MANUSCRIPT
sparse representation is proposed to achieve a perfect pose-normalization by transferring the representation coefficients and searching the view-dictionary, but it requires a complex training process. In recent years, deep learning has been widely used in pattern recognition and image processing [12–18]. In particular, with the development
CR IP T
40
of high-performance computing, training neural network using a large amount of data
turns into reality. In [16], a Stacked Progressive Auto-Encoders (SPAE) is proposed for
converting non-frontal images to frontal images and learning a pose-robust feature by progressively modeling the complex nonlinear transformation using a deep network. 45
Similarly, in [17], a deep network containing the reconstructed layer and the feature
AN US
extraction layers is designed to learn a face identity-preserving feature from images of
different views. Recently, Schroff et al. [18] adopted a deep convolutional network (FaceNet) to learn a mapping from face images to a compact Euclidean space, and then the distance in the space is used to measure the similarity of face images. Although 50
the deep network has achieved perfect performance in face recognition, it needs largescale data to train and costs many computation resources. Additionally, extracting
M
pose-robust features could be another way, such as LGBP [19], Elastic Bunch Graph Matching (EBGM) [20], Orthogonal Discriminant Vector (ODV) [21], Subspace Point
ED
Representation (SPR) and Layered Canonical Correlated (LCC) [22], etc. Most methods based on holistic approaches are very sensitive to pose variations
55
[1], because the depth rotations of 3D faces almost always lead to image pixel mis-
PT
alignment which is the main classification basis for these methods. In contrast, local methods work only or mainly in a set of discrete regions or points in the images and
CE
pattern features are extracted from an isolated region in the face image. Hence, the 60
local feature extraction methods can tolerate small pose variations. One successful local face descriptor is Local Binary Patterns (LBP) [23], which does not require exact
AC
pattern locations but relies only on the histogram of feature in a region. As a result, LBP can alleviate small rotations and performance perfectly when the pose variations are less than 15◦ .
65
Among the methods based on LBP are the Extended LBP operator (ELBP) [24],
Local Ternary Pattern (LTP) [25], Local Adaptive Ternary Pattern (LATP) [26], Local Multiple Layer Contrast Pattern (LMLCP) [27] and Local Multiple Patterns (LMP) 4
ACCEPTED MANUSCRIPT
[28]. Huang et al. [29] developed a novel local descriptor called Binary Gradient Patterns (BGP). It evaluates relationships between local pixels in the image gradient 70
level and encodes the potential local structures into a string of binary values. In [30],
CR IP T
Scale Selective Local Binary Patterns (SSLBP) is proposed to handle the scale variation by capturing the scale invariant feature from different image scale spaces.
In [31], LBP feature of the image rather than the pixel value is used in Sparse Representation-based Classification (SRC) [32], making the recognition system more 75
robust to illumination variations. Chan et al. [33] argued that the LBP values are only converted from a string of binary labels, but not conventional numerical values, so it is
AN US
unreasonable to linearly combine LBP directly with SRC. Based on this observation,
LBP histogram features of image are used to SRC framework in their work. In [34], the LBP histogram based χ2 kernel is applied to the SRC, and then the Kernel Coordi80
nate Descent (KCD) algorithm is proposed to solve the LASSO problem in the kernel space, which effectively uses the kernel method to discover the nonlinear transformation relationship between the gallery samples and test samples. However, in all these
M
studies, the samples entered into the SRC algorithm are completely represented by the holistic histogram feature, which does not work well in the face recognition under pose variations, since the holistic appearance is very sensitive to change of pose.
ED
85
This paper proposes a novel local facial descriptor called Huffman-LBP to extract face features. Meanwhile, a Divide-and-Rule strategy is adopted to improve the ro-
PT
bustness of multi-pose face recognition. Compared with other existing popular poseinvariant face recognition methods, the proposed method is more natural and more consistent with the human cognitive mechanism, that is, the virtual view reconstruction is
CE
90
not required in the process of face recognition under pose variation, thus avoiding the problem that the recognition performance would be affected due to image distortion.
AC
The contribution of the work is as follows:
95
• A face detection operation is suggested to locate the face so that the background
can be removed. Meanwhile, a Face alignment method called TCDCN [35] is used to detect the landmarks with the purpose of extracting pattern features from the corresponding points associated with the training image.
5
ACCEPTED MANUSCRIPT
• The Huffman coding is utilized to evaluate the weights of contrast value in the neighborhood, so that the discriminative power can be improved. Furthermore, there is no parameter tuning problem in Huffman-LBP compared to other LBP-
100
CR IP T
based methods. • Motivated by the MECE principle (Mutually Exclusive Collectively Exhaustive)
[36], face representation and classification based on Divide-and-Rule strategy
are proposed here: first, face representation via Region Selection Factor (RSF) is suggested in our method to treat the face images of different poses specifi-
105
cally rather than generally; then, we perform the face classification at the patchmethod tolerate the rotations.
AN US
level using a patch-based SRC fusion classification strategy to further make the
• Extensive experiments illustrate the applicability of the proposed approach in
multi-pose face recognition. In particular, the above issues are proved effectively
110
2. Related work
M
by the experiment in section 4.3.
ED
In this section, some related researches on both face representation and face classification are briefly reviewed. We first review the Local Binary Patterns based face 115
representation in 2.1. Then we give a brief introduction of the sparse representation
PT
based face classification in 2.2.
CE
2.1. Local Binary Patterns based face representation LBP was first proposed by Ojala et al. [23], and then Ahonen [37] applied it to the
field of face recognition. The main idea of LBP is to convert the gray scale image into a feature vector composed of binary codes by judging the relationship (positive and
AC
120
negative) of the pixel value difference between the surrounding pixels and the central
pixel in the circular neighborhood. As long as the external factor does not change the positive and negative relationships of pixel value difference between the central pixel and the surrounding pixels, the representation of LBP to the face image is robust. In a
6
ACCEPTED MANUSCRIPT
125
pixel neighborhood, the LBP value of the center pixel can be calculated by: LBPP,R =
P−1 X t=0
s(gt − gc )2t
(1)
CR IP T
where gc and gt refer to the gray value of the center pixel and P surrounding pixels in a image neighborhood of radius R, separately. s corresponds to a thresholding function defined as follows:
1, i f x > 0 s(x) = 0, otherwise
(2)
The LBP values of the pixels in the whole image are counted to get the histogram fea130
tures. The final classification results can be determined by calculating and comparing
AN US
the histogram similarities.
2.2. Sparse representation based face classification
Let A = {A1 , A2 , ..., Ak } be a training set, which consists of k classes of face images,
where Ai = {vi,1 , vi,2 , ..., vi,n } ∈ Rm×n , for a test image y ∈ Rm belonging to the class i, which can be linearly represented by the training images from class i:
M
135
y = ai,1 vi,1 + ai,2 vi,2 + ... + ai,n vi,n
(3)
ED
Since the training set A includes all classes of images, Eq.(3) can be rewritten as follows:
y = Ax0 ∈ Rm
(4)
In the ideal case, y is only a linear combination of the training images from class i.
140
PT
That is, in the coefficient vector x0 , most coefficients are zero except for the coefficients associated with the class i: x0 = {0, ..., 0, ai,1 , ai,2 , ..., ai,n , 0..., 0}T . To find the sparsest
CE
solution of Eq.(4) then equals to solve the following optimization equation: xˆ0 = arg min kxk0
subject to
Ax = y
(5)
However, solving the l0 -minimization of an underdetermined system of linear equa-
AC
tions is NP-hard [38]. In [39], the author proposed a theory: if the solution x0 of the l0 -minimization is sufficiently sparse, then the solution of l0 -minimization is equivalent
145
to the solution of l1 -minimization: xˆ1 = arg min kxk1
subject to kAx − yk2 ≤ ε
7
(6)
ACCEPTED MANUSCRIPT
Where y and the columns of A have unit l2 -norm. Finally, the classification result can be determined by sorting the residuals: ri (y) = ky − Aδi ( xˆ1 )k2
(7)
150
CR IP T
indentity(y) = argmini ri (y) (8) Where, δi is a function that can pick out the coefficients associated with the class i. 3. Methodology
Our proposed algorithm for multi-pose face recognition consists of three main
AN US
steps: face preprocessing, feature extraction via Huffman-LBP, face representation and
classification using Divide-and-Rule strategy. Fig. 1 shows the overall face recognition flow of proposed method. Face Preprocessing
5LJKW 3DWFKHV
9
Ă
Ă 9
)DFH 'HWHFWLRQ
,QWHUHVW 3RLQWV 'HWHFWLRQ
M
/HIW 3DWFKHV
3UREH,PDJHV
Classification
Face Representation
*DOOHU\,PDJHV
9
Ă
5HVLGXDO
Ă
6XP
5HVLGXDO
5HVXOW
6XP
5HVLGXDO
5HVXOW
6XP
5HVLGXDO
5HVXOW
5HVLGXDO
9 9
Ă
Ă 9
3DWFK ([WUDFWLRQ
65&
5HVLGXDO
Ă Ă Ă
5HVLGXDO 5HVLGXDO 5HVLGXDO
9
Ă
Ă
Ă
Ă
Ă
Ă
56)
ED
Ă
9 9 9
5HVLGXDO
Ă
5HVLGXDO
9 9
PT
Huffman-LBP
Fig. 1. The flow chart of the proposed algorithm.
3.1. Face Preprocessing
CE
155
One key step for face recognition is face detection. The task of face detection is
AC
to locate the position of face in the image. Specifically, the image background will be removed so that the recognition performance can be improved. Although great improvements have been made for face detection, it is still a challenge because the ex-
160
ternal disturbance factors in the image are very common, such as variations in rotation, pose and occlusion. Thus a new approach based on the landmark localization to detect face image is developed in [40] by us. Furthermore, the proposed histogram of sparse 8
ACCEPTED MANUSCRIPT
code-based method is very effective because it can capture global elastic and multiview deformation. The face detection results of our previous work on three popular 165
face databases (FERET, CMU PIE and LFW) are shown in Fig. 2. Hence, this method
CR IP T
is used as the face preprocessing way in this paper.
(b)
(a)
(c)
AN US
Fig. 2. The face detection results. (a) FERET. (b) CMU PIE. (c) LFW.
In order to improve the robustness to facial landmark detection under severe occlusions and large pose variations, a particular type of CNN named Tasks-Constrained Deep Convolutional Network (TCDCN) is used [35]. This TCDCN optimizes facial 170
landmark detection together with heterogeneous but subtly correlated tasks. A taskconstrained loss function is formulated to allow the errors of relevant tasks to be back-
M
propagated together with the purpose of improving the effectiveness and generalization of face landmark detection. In order to address the problem that different tasks have
175
ED
different convergence rates and learning difficulties, a task-wise early stopping criterion is devised to facilitate learning convergence. Since most of the points on the contour will move with pose variations, after landmark detection using TCDCN, we select the
PT
key landmarks around the face organs (eyebrows, eyes, nose, and mouth) as the interest points. Fig. 3 illustrates the preprocessing process.
CE
3.2. Huffman-LBP
To achieve invariance with respect to any monotonic transformation of the gray
180
AC
scale, only the signs of the contrast value are considered by LBP [37]. Unfortunately, this property of LBP sometimes leads to unexpected confusion. As shown in Fig. 4, two different sets of image textures get the same result after being encoded by LBP operator. To the best of our knowledge, the Huffman coding is usually used for lossless
185
data compression [41]. However, few researchers have applied Huffman coding to feature extraction. To handle the problem of losing texture information from LBP, this 9
ACCEPTED MANUSCRIPT
)DFH'HWHFWLRQ
CR IP T
/DQGPDUN 'HWHFWLRQ
.H\3RLQWV 6HOHFWLRQ
Fig. 3. The process of face preprocessing.
AN US
paper first adopts Huffman coding to weight the contrast value so that abundant texture information can be supplemented. The novel method is called Huffman-LBP. ķ
gc
M
LBP8ˈ1 71
s
ED
s
gc
gc
H _ LBP8,1
ķ
77
H _ LBP8,1 = 8
+XIIPDQ/%3
ĸ
H _ LBP8,1+ = 40
H _ LBP8,1 = 10
ĸ
Fig. 4. Two different sets of image textures are encoded by LBP and Huffman-LBP respectively.
190
PT
Huffman coding uses variable-length codewords to encode source symbols. The coding string is determined according to the probability (frequency) of each symbol.
CE
Symbols with larger frequency will be represented by fewer bits. In other words, in the Huffman tree, the frequency of the leaf node nearer to the root node is smaller, and the frequency of the leaf node farther from the root node is larger. Moreover, the length of
AC
the Huffman code for each leaf node is consistent with the distance between the leaf
195
node and the root node. Inspired by this, we construct the Huffman tree with regarding the absolute value of gt − gc (t = 0, 1, . . . , p − 1) as the frequency of each leaf node, and
get the corresponding Huffman code. According to the code length, we can measure the weight of the contrast value. The value of the Huffman-LBP of a pixel is given by:
10
ACCEPTED MANUSCRIPT
H LBP+P,R = round(
P−1 X
s+ (t)w+ (t)2t )
H LBP−P,R = round(
t=0
s− (t)w− (t)2t )
(9)
t=0
Where s+ and s− are the thresholding functions: 1, gt − gc ≥ 0 t = 0, 1, ..., P − 1 s+ (t) = 0, Otherwise
CR IP T
200
P−1 X
1, gt − gc < 0 s+ (t) = t = 0, 1, ..., P − 1 0, Otherwise
(10)
In Eq. (10), gc and gt refer to the gray value of the center pixel and P (in this paper, the
AN US
value of P is fixed at 8)surrounding pixels in a image neighborhood of radius R. In Eq. (9), w+ and w− define the weighting functions as follows: 1 1 length(c ) length(c t t) , t ∈ index+ , t ∈ index− P P n m + − 1 1 w (t) = w (t) = i=1 length(cli ) i=1 length(cki ) 0, Otherwise 0, Otherwise
(11)
205
M
Where length(c) is the length of the code C, index+ and index− are the location index sets of surrounding pixels, defined as follows:
ED
index+ = {t|gt −gc ≥ 0} = {l1 , l2 , ...ln }
index− = {t|gt −gc < 0} = {k1 , k2 , ...km } (12)
ct (t ∈ index+ ) and ct (t ∈ index− ) are the Huffman codes of {|gt − gc | |t ∈ index+ } and
PT
{|gt − gc | |t ∈ index− } respectively:
CE
(cl1 , cl2 , ..., cln ) = Hu f f man coding{|gt − gc | |t ∈ index+ }
(ck1 , ck2 , ..., ckm ) = Hu f f man coding{|gt − gc | |t ∈ index− }
(13)
Fig. 5 demonstrates each step of getting the Huffman-LBP feature value. From the
Huffman-LBP coding process, we can see that Huffman-LBP contains both positive and negative values, which represent the sign information of contrast value. What’s
AC
210
more, a relatively precise weight relationship between surrounding pixels can be measured by Huffman encoding. With the novel encoding rule, the sign of the contrast value is no longer the only encoding object, the magnitude of the contrast value will also play a role in the encoding process. As can be seen from Fig.4, the discrimina-
11
ACCEPTED MANUSCRIPT
215
tion ability of the LBP can be improved by supplementing the weight information of the contrast value. Although two different sets of image textures have the same binary coding (s+ and s− ), with Huffman-LBP encoding, they get different feature value.
CR IP T
Furthermore, some LBP improvements (e.g. LTP [25], LMLCP [27] and LMP [28] can achieve a better recognition performance by supplementing a evaluation of 220
contrast value magnitude during encoding process. However, a common problem of the parameter optimal setting has to be considered. For example, the selection of LTP
threshold and the setting of LMLCP layer number will seriously affect the final recognition results. On the contrary, the weight of contrast value is evaluated automatically
AN US
by Huffman coding in this paper. In other words, the key advantage of Huffman coding lies in its non-interactive property, which means it can work in a flexible way.
Huffman _ coding{ gt g c | t index }
gt gc
s
g c
s
gc
gc
ED
w
gc
H _ LBP8,1 =round(0.1538* 2 0.2307 *32
0.1538*64 0.4615*128)=77
gc
0.4615* 8 0.2307 *16)=8
H _ LBP8,1 =round(0.1538*1 0.1538* 4
PT
M
(cl1 , cl2 ,..., cln )
w
CE
(ck1 , ck2 ,..., ckm )
225
Huffman _ coding{ gt g c | t index }
Fig. 5. The process of getting Huffman-LBP value.
Finally, the Huffman-LBP histogram can be obtained by accumulating the Huffman-
AC
LBP value of the pixels, and then it will be used as pattern feature to classify the face images. 3.3. Divide-and-Rule strategy for multi-pose face recognition
230
The MECE principle is popular in the business mapping processes due to its effectiveness and reasonableness. The main idea of MECE is to grasp the core of the 12
ACCEPTED MANUSCRIPT
problem effectively and find a solution through the non-overlapping and non-omission division. Inspired by this, we adopt a Divide-and-Rule strategy in the process of face representation and classification to decompose the problem of pose variation into multiple sub-problems.
CR IP T
235
3.3.1. Divide-and-Rule face representation via Region Selection Factor (RSF)
If the rotation angle is too large, some regions in the 2D face image may move to the face silhouette and become invisible. Even if some non-occluded face regions may be useless for recognition due to deformation. The location of the region with high
fidelity in the image may change with the yaw direction and angle. Therefore, it is
AN US
240
quite obvious that the face images of different poses should be treated specifically. In order to generate the different representations for the face images under different poses, the Region Selection Factor (RSF) is proposed to classify the face image. Specifically, the whole face is divided into two regions (left and right) by connecting the tip 245
of the nose and the center of eyebrows. According to the result of face detection, the
M
locations of the sideburns and the center of the eyes can be determined. Subsequently, we compute the distance (L1 ) of right eye to right sideburn and the distance (L2 ) of left
CE
250
PT
ED
eye to left sideburn respectively. Finally, the value of RSF is calculated by Eq. (14). 1 < ⇒ Extract the left features α L1 1 RS F = (14) ≤ RS F ≤ α ⇒ Extract the whole features L2 α > α ⇒ Extract the right features 1 Where α is the threshold of RSF, RS F < represents that the left face region is useful α for recognition and the features in the left-side face region will be extracted to represent
AC
the face. Similarly, the features in the right-side face region should be extracted when 1 RSF >α. Supposing ≤ RS F ≤ α, then the features in the whole face region will be α extracted. Therefore, the face image will be divided into three channels according to
255
RSF. As we know, image patches have become a popular way to represent images re-
cently [10, 42]. In some sense, image patches are optimal when the purpose is to find the correspondences between two samples with appearance variations. Obviously, the
13
ACCEPTED MANUSCRIPT
holistic structure of face will change unpredictably when the facial pose changes. That 260
is to say, the variation in the intra-class is sometimes greater than the variation in the inter-class. According to the observations, the local structure of face image should be
CR IP T
considered under pose variation. Therefore, we propose to select the local patch of face
AN US
image to extract feature.
Fig. 6. The labels of interest points in the front face.
265
M
Specifically, we select the key landmarks around the facial organs as the interest points to represent the entire face. As shown in Fig. 6, there are 42 interest points on
ED
the whole face, 25 interest points on the twin sides of the face region respectively (including 8 points on the dividing line). We extract fixed-size image patches centered on the interest points. Moreover, considering the situation that spatial information of each
270
PT
patch may be important, each patch is divided into several non-overlapping blocks, and a histogram feature is extracted from each block via Huffman-LBP. The histogram fea-
CE
tures of all the blocks in the same patch are then concatenated into one vector. Finally,
AC
a feature pool is obtained by combining feature vector of all patches in the selected face n o region. The feature pool V j , j = 1, 2, . . . , 25 represents the features on the right-side n o n o face, and similarly V j , j = 18, 19, . . . , 42 for the left-side and V j , j = 1, 2, . . . , 42 for
275
the entire face, where Vj = [h1 ; h2 ; . . . ; hk ], hm is the Huffman-LBP histogram vector for block m in the patch j, and k is the number of blocks in each patch. The workflow of the Divide-and-Rule face representation is shown in Fig. 7. Obviously, α is vital to the dividing result of face samples, which will strongly affect the final recognition results. Here the maximal accuracy rate acts as the selection 14
/
/
RS F
CR IP T
ACCEPTED MANUSCRIPT
L1 !D L2
3RVLWLRQ(\HV DQG7HPSOHV 2ULJLQDO ,PDJH
&DOFXODWH/HQJWK 2I/$QG/
AN US
([WUDFWWKHULJKWIHDWXUHV
Fig. 7. The workflow of Divide-and-Rule face representation via Region Selection Factor. 100 99
97
M
96 95 94 93
ED
Recognition Rate(%)
98
92 91
1.4
1.9
2.4
2.9
3.4
3.9
4.4
4.9
5.4
5.9
6.4
6.9
7.4
7.9
PT
90
Fig. 8. The Relationship between the value of α and recognition rate in FERET database.
criteria of parameter α. Specifically, we discuss the relationship between the value
CE
280
of α and the recognition rate in FERET database (the experimental setting in here is the same as the experimental setting of the Parameters Discussion in section 4.2). As
AC
shown in Fig. 8, the algorithm can achieve the maximal recognition rate of 96.83% when α is 2.9. Considering that the ratio of the geometric distance of the face structure
285
is generic, which does not change with the database, so we fix the value of α at 2.9 in this paper.
15
ACCEPTED MANUSCRIPT
3.3.2. Patch-based SRC fusion classification Sparse representation based classification (SRC) has been demonstrated to be superior to nearest neighbor (NN) and nearest subspace (NS) based classifiers in various subspaces (e.g. PCA or LDA). As described in section 2.2, the success of the SRC de-
CR IP T
290
pends on that the test image can be linearly represented by the gallery images from the same class. However, it is not always the case in realistic face recognition applications.
In practice, the number of images in the gallery is often limited. In order to tackle the
small sample problem, the works in [31, 33, 34] adopted SRC in feature subspace for 295
face recognition. Motivated by this viewpoint, we have made great efforts to construct
AN US
the feature subspace. First, this paper has employed Huffman concept to extract multiscale face feature (H LBP+ and H LBP− ). What’s more, the patch-based SRC fusion classification strategy is proposed here. One common observation/assumption is that pose variation will lead to the unpredictable changes of holistic appearance. There300
fore, it is important to extract robust partial feature. As shown in Fig. 1, the idea of Divide-and-Rule is still adopted in here. we model a face as a collection of patches.
M
For each of patches, SRC is applied for classification independently. The class of test image is decided based on the fusion of classification results of all patches. By adopt-
305
ED
ing the strategy, the damaged patches caused by extreme pose variation will not affect the overall classification result.
More specifically, we use the histograms of patches in the same interest point of
PT
the gallery images to form a sub-dictionary, so that, we can get many sub-dictionaries: Dm , Dm+1 , . . . Dn , n − m is the number of patches in the gallery image, where Di =
CE
[v1,i , v2,i , ..., v p,i ], here v j,i is the Huffman-LBP histogram vector for patch i in the image 310
of class j. p is the number of sample classes. For each of the test image, a feature pool {v s , v s+1 , ..., vz } is obtained by the Divide-and-Rule face representation, where {s, s +
AC
1, ..., z} are the labels of the extracted patches corresponding to the interest points, and
z − s is the number of extracted patches for the test image. For each patch, a residual
vector is obtained by using SRC. Finally, the residual vectors of each patch are summed
315
to construct an augmented residual for the classification decision. The classifier details are described in Algorithm 1.
16
ACCEPTED MANUSCRIPT
Algorithm 1 The Patch-based SRC Fusion Classification Strategy.
AN US
CR IP T
Input: the sub-dictionaries: {Dm , Dm+1 . . . Dn }, the feature pool of test image y: {v s , v s+1 , ..., vz } Output: indentity (y) for i = max (m, s) to min (n, s) do 1. Normalize vi and the columns of Di to have unit l2 -norm; 2. Solve the l1 -minimization problem: subject to kDi x − vi k2 ≤ ε xˆ 1 = arg min kxk1 3. Compute the residuals by:
r j (vi ) =
vi − Di δ j ( xˆ1 )
2 for j = 1, . . . , p, and δ j is a function that can pick out the coefficients associated with the class i. end for min(n,z) X indentity (y) = argmin j r (v ) j i i=max(m,s)
4. Experiments and Results
In this section, we conduct extensive experiments on three popular public face
320
M
databases: FERET [43], CMU PIE [44] and Labeled Faces in the Wild (LFW) [45] databases. A brief description of the databases is given in Section 4.1. Next, we intro-
ED
duce the experimental settings and parameters discussion in Section 4.2. The contribution of proposed enhancements is evaluated in Section 4.3. The experimental results of face recognition compared with existing methods can be found in Section 4.4. Section
325
PT
4.5 further compares the computation time of proposed method with several state-ofthe-art approaches. Finally, the discussion of experiments are presented in Section 4.6.
CE
4.1. Databases
(1) The FERET database.
AC
The FERET database is one of the most widely used benchmark for face recog-
330
nition methods [46]. It contains more than 14000 gray-scale face images with different viewpoints, which are divided into several subsets for different research purposes. In this paper, we use the pose subset to evaluate the performance of the proposed algorithm to cross-pose face recognition. The pose subset of the FERET database contains 200 individuals with 9 poses for each individual, these poses are 17
ACCEPTED MANUSCRIPT
ba (front face image), bb (+60◦ ), bc (+40◦ ), bd (+25◦ ), be (+15◦ ), b f (−15◦ ), bg (−25◦ ), bh (−40◦ ) and bi (−60◦ ), the examples of face images under these poses
335
are shown in Fig. 9.
CR IP T
(2) The CMU PIE database.
The CMU PIE database was designed by Carnegie Mellon University in 2000 to evaluate the performance of face recognition algorithms. It contains 41,368 images
of 68 individuals. Each individual in the database include 13 different poses, 43 dif-
340
ferent illumination conditions and 4 different expressions. In the multi-viewpoints
AN US
of each individual, there are 7 poses varies in the horizontal direction, which are taken by the cameras 02, 37, 05, 27, 29, 11 and 14 respectively. The examples of face images with neutral expression and frontal illumination in these subsets are shown in Fig. 9.
345
(3) The Labeled Faces in the Wild (LFW) database.
The LFW database was created for researching the problem of face recognition
M
under unconstrained environment and has become an academic (industry) performance evaluation benchmark. The data set consists of 13233 samples of 5749 subjects collected from the web. There are 1680 subjects with two or more dif-
ED
350
ferent photos in the database. All the images in the database were normalized to 250 × 250 pixels by cutting and resizing. There are some variations in illumination,
PT
pose, expression, makeup, and age among the images in the database to simulate
AC
CE
real faces in the unrestricted environments.
EL
EK
EI
ED
EH
EF
EE
-65q
-60q
-45q
-30q
-15q
0q
+15q
+30q
+45q
+60q
+65q
q
q
q
q
q
q
q
q
q
q
+65q
-65
-60
-45
&
EJ
-30
&
-15
0
&
&
EG
+15
&
+30
&
+45
+60
&
Fig. 9. Face image examples with different viewpoints from the FERET database(top) and CMU PIE database(bottom).
18
ACCEPTED MANUSCRIPT
355
4.2. Experimental Settings and Parameters Discussion In order to simulate the practical face recognition system, in this paper, each individual in the gallery only stores one face image. The image size of the three databases
CR IP T
are 256×383 pixels (FERET), 640×486 pixels (CMU PIE) and 250×250 pixels (LFW)
respectively. We use the subset of FERET containing 14000 face images from seven 360
poses (front: ba, non-front: bc, bd, be, b f , bg and bh ) of 200 different individuals and the subset of CMU PIE containing 340 face images from five poses (front: C27, nonfront: C11, C29, C05 and C37 ) of 68 different individuals, both of which are divided
into two halves; one for algorithm parameters discussion and the other for algorithm
AN US
performance test.
In this subsection, we conduct the experiments with an unknown probe pose so that
365
we can discuss the parameters objectively and find the relationship between parameters and the recognition result. More specifically, for FERFT database, we set the front face images in subset ba as gallery images, and the non-frontal face images in bc, bd, be, b f , bg and bh subsets as a large test set (contains 600 images of 100 people, each individual having 6 different pose images). Similar to the setting of FERET database,
M
370
we set the front face images in subset C27 of CMU PIE database as gallery images, the
ED
non-frontal face images in four subsets (C11, C29, C05 and C37) as a large test set. 100
99 98 97
AC
Recognition Rate(%)
CE
PT
96 95 94 93 92 91 90 89 88 87 86 85 84 83 82
23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59
Patch Size
Fig. 10. The relationship between the value of patch size and recognition rate.
(1) Patch size 19
ACCEPTED MANUSCRIPT
In fact, the size of the image patches around the interest points is related to the amount of the information contained. If the size is too small, it will result in loss
375
of information. If the size is too large, it will lead to information redundancy, so
CR IP T
it is important to choose an appropriate size for image patches. Fig. 10. shows that performance arrives its peak when we set the patch size to 43 in the FERET
database. Additionally, it is considered that the optimal image patch size may vary according to the size of face images. In order to estimate the corresponding patch
380
size for the different databases, the ratio r of the distance between the eye centers to the side length of patch is proposed here: m(Deyes ) Lpatch
AN US
r=
(15)
Where m(Deyes ) is the mean distance between the eye centers of the front face images in the database, Lpatch is the side length of the image patch. Through statistics and calculation, we can conclude that m(Deyes ) of FERET are 68.85 pixels, so that
385
r = 68.85/43 ≈ 1.60. At the same time, we can get the m(Deyes ) of CMU PIE is
M
82 .30 pixels, so we choose L patch = 82.30/1.60 ≈ 51 pixels as the side length of
the image patch for the CMU PIE database. Similarly, we can get the L patch for the
AC
CE
PT
ED
LFW database are 26 pixels.
Fig. 11. The relationship between the number of blocks in each patch and recognition rate on the FERET database(left) and CMU PIE database(right). NBR and NBC corresponds to the number of blocks in the row and column of the image patch, respectively.
390
(2) The number of blocks in each patch In order to remain the spatial information, we divided each patch into several non20
ACCEPTED MANUSCRIPT
overlapping blocks, the number of blocks in each patch determines the distribution of spatial information and is related to the dimensions of the histogram features. Fig. 11 shows the relationship between the number of blocks and recognition rate
CR IP T
in the FERET database and CMU PIE database, respectively. As can be seen from
395
Fig. 11, the algorithm gets the highest recognition rate in FERET and CMU PIE when the number of blocks is 3 × 4 and 2 × 2 (2 × 3) respectively. Considering
that too many blocks in the patch will result in sparse histogram and increase the
cost of computation, 3 × 4 and 2 × 2 are selected for FERET and CMU PIE in the following experiments, respectively.
400
AN US
(3) The radius for Huffman-LBP
The radius selection of Huffman-LBP will determine how much of the local texture information is captured, so it is very important. We explored the influence of radius variations on algorithm recognition rate. According to the data in Table 1, we can find that the most appropriate radius for the FIRET database and CMU PIE
405
M
database are 2 and 3, respectively.
Table 1 The recognition rates at different radiuses in the FERET database and CMU PIE database. Radius
ED
Database
2
3
95.17 94.85
96.83 98.53
96.50 100
PT
FERET CMU PIE
1
CE
4.3. Contribution of Enhancements In this subsection, the contribution of three enhancements proposed in this paper is
AC
evaluated by the following experiments:
410
(1) FP+LBP+SRC: face recognition performing only face preprocessing (as described in Section 3.1 and shown in Fig. 3.) and LBP (the feature histogram is extracted from the patches with fixed size centered on the interest points which is same as described in Section 3.3.1), without using RSF, and the final identify results is produced by original SRC (as described in Section 2.2). 21
ACCEPTED MANUSCRIPT
415
(2) FP+LBP+RSF+SRC: adding Region Selection Factor(RSF) (as described in Section 3.3.1) to the method setting of (1). (3) FP+Huffman-LBP+RSF+SRC: replacing LBP in (2) with the proposed Huffman-
CR IP T
LBP(as described in Section 3.2).
(4) FP+Huffman-LBP+RSF+Patch-based SRC: replacing the original SRC in (3) with the proposed Patch-based SRC (as described in Section 3.3.2).
420
100 99 98
AN US
97
Recognition Rate
96 95 94 93 92 91 90
FP+LBP+SRC
89
FP+LBP+RSF+SRC
FP+Huffman−LBP+RSF+SRC
88
86 85
bc
M
FP+Huffman−LBP+RSF+Patch−based SRC
87
bd
be
bf
bg
bh
ED
Probe Image Set
Fig. 12. Performance comparison versus reference image sets under 6 distinct poses on the FERET database.
Each experiment ((2)-(4)) was performed with only one additional improvement
PT
compared to previous experiment so that the contribution of improvements can be measured well. The experiments were conducted on the FERET database. The gallery
CE
consists of the frontal face images from the ba subset, and the face images from the bc, bd, be, b f , bg and bh subsets are set as test images, respectively.
AC
Table 2 Average recognition rates of different variants of our method on FERET Database. Method
FP+LBP+SRC
Avg.(%)
96.17
FP+LBP+RSF FP+Huffman-LBP FP+Huffman-LBP +SRC +RSF+SRC +RSF+Patch-based SRC 97.00
98.17
425
22
98.83
ACCEPTED MANUSCRIPT
As can be seen from Fig. 12, for the test images with near-frontal poses (−15◦ to +15◦ ), all methods can get a perfect result of 100%. However, the recognition rates decrease significantly for the big yaw angles (25◦ or more), and the differences among
430
CR IP T
the different variants of proposed method become apparent. For example, when the images with pose variation of +40◦ are recognized using FP+LBP+RSF+SRC, the face recognition rate is 87%. Replacing the LBP with Huffman-LBP (FP+HuffmanLBP+RSF+SRC), results improve to 96%. This improvement can be explained be-
cause with Huffman-LBP, the representation ability of the face local texture is improved. We can see from Table 2, although the average recognition rate obtained with
FP+LBP+RSF+SRC is only 0.83% higher than the result obtained with FP+LBP+SRC, 1 when RS F < and RSF >α, only half of points involved in the recognition, which α greatly reduces the running time of algorithm. If we further combine patch-based SRC
AN US
435
with FP, RSF and Huffman-LBP, the recognition rate achieves the highest results of 98.83, particularly for the rotation angles larger than 15◦ , which proves that the proposed patch-based SRC is less sensitive to pose variation.
M
440
4.4. Comparisons With the Existing Methods
ED
4.4.1. Comparison Experiment on the FERET Database For FERET database, we conduct experiments using the same experimental setting as section 4.3 to test the performance of proposed algorithm. We also compare our approach with the method based on Partial Least Squares(PLS) [47], Ridge Regression
PT
445
with Gabor features (RRG) [48], Adjacent Discriminant Multiple Coupled Subspace (ADMCLS) [49], a 3D method based on Morphable Displacement Field (MDF) [50],
CE
a virtual view reconstruction method based on Stack Flow [51] and SPAE [16]. In addition, to further evaluate the performance of our algorithm, a leading Deep Convolutional Neural Network (DCNN) based commercial face recognition system called
AC
450
Face++ [52] is introduced to compare with us. The performance and comparison results are reported in Table 3. It can be seen from the experimental results that the proposed method outperforms
the first five methods in most cases. Especially, for the images with pose variation 455
of −25◦ to +25◦ , the proposed method achieves a perfect classification accuracy of 23
ACCEPTED MANUSCRIPT
Table 3 Performance comparison with state-of-the-art methods on the FERET database(%). bd +25◦
be +15◦
MDF RRG ADMCLS PLS Stack Flow SPAE Face++ Our method
97 96 82 59 70 96 100 95
99 99 94 76 89 98 100 100
99 98 95 76 96 99 100 100
bf −15◦ 100 96 96 77 94 99 100 100
bg −25◦ 99 96 94 72 82 99 100 100
bh −40◦ 98 91 85 53 62 95 100 98
Avg. 98.7 96 91 68.8 82.2 97.7 100 98.8
CR IP T
bc +40◦
AN US
Methods
100%. It can also be seen that the 3D method based on MDF performs better than our method when the test images with pose variation of +40◦ , but it should be noted that MDF takes advantage of much 3D information that may be difficult to be collected, and the way of reconstructing virtual front view cannot handle small pose variations 460
well. As can be seen from Table 3, Face++ gets the best recognition performance in
M
all methods. However, the model needs to use a large amount of extra data to train its large number of parameters [53–55]. For example, the Megvii Face Recognition System [55] achieves an unprecedented accuracy on the LFW database, while a large
465
ED
dataset containing 5 million labeled faces of around 20,000 individuals has to be used to train its ten-layer deep convolutional neural network. Although the performance of
PT
our method is slightly lower (1.2%) than Face++, the number of our model parameters and the amount of training data required are relatively small.
CE
4.4.2. Comparison Experiment on the CMU PIE Database For CMU PIE database, we set the remaining 34 frontal face images in C27 as
470
gallery images, and the remaining 34 non-frontal face images in C37, C05, C29 and
AC
C11 are set as test images, respectively. We also compare our results with the Eigen Light-Field (ELF) method [56], LLR [8], the View Synthesis method based on Probabilistic Learning(VS-PL) [57], the three-point Stereo Matching Distance (3ptSMD) [58], MRFs [10], the method based on Partial Least Squares(PLS) [47], OPR [9] and
475
Face++ [52]. The comparison results are shown in Table 4.
24
ACCEPTED MANUSCRIPT
Table 4 Performance comparison with state-of-the-art methods on the CMU PIE database(%). C37
C05
C29
C11
Avg.
ELF LLR VS-PL 3ptSMD MRFs PLS OPR Face++ Our method
89.0 82.4 86.0 100 97.0 100 100 100 100
93.0 98.5 88.0 100 100 100 100 100 100
91.0 100 91.0 100 100 100 100 100 100
78.0 89.7 86.0 97.0 97.0 100 100 100 100
87.8 92.7 87.75 99.3 98.5 100 100 100 100
AN US
CR IP T
Methods
From the table 4, we can see that for ELF, LLR and VS-PL, they have a similar trend that their recognition rates will decrease when the pose variation between the gallery image and test image becomes larger. However, as discussed in Section 3.3, because of the ability of RSF and patch-based SRC to deal with rotation, compared 480
with these methods, our method achieved a stable perfect performance under various
M
pose variation angles. Meanwhile, OPR, PLS and Face++ also achieve a perfect average accuracy of 100% same as our method. However, PLS requires prior knowledge of the pose of the probe image. It indicates that PLS will fail when the pose is unknown.
485
ED
As for OPR, pose correction is required during the recognition process. For the proposed method, pose correction is not necessary. Moreover, our method obtains the best
PT
recognition performance just like Face++. However, the recognition process is much simpler than the latter.
CE
4.4.3. Comparison Experiment on the LFW Database In order to test the effectiveness of our proposed algorithm under the unconstrained
490
environment, we further conduct experiment on the Labeled Faces in the Wild (LFW)
AC
database. The images we used all aligned by deep funneling [59]. Since there is only a pair of face images provided in the face verification system, our algorithm would be inapplicable because the dictionary of SRC based classifier cannot be established. We do not carry on the traditional face verification experiment on this dataset, but face
495
recognition. We choose a subset in the LFW database, which consists of 200 samples
25
CR IP T
ACCEPTED MANUSCRIPT
D
E
Fig. 13. Examples of face image from the subset of LFW database. (a) Gallery images. (b) Test images.
AN US
of 100 subjects. Each subject contains a face image with near frontal pose(used as gallery image) and a face image with non-frontal pose (used as test image), respectively. In addition to variations in pose, there are also other external changes between the gallery image and test image, such as illumination, makeup and expression. Some 500
gallery images and their corresponding test images are shown in Fig. 13. We compare our proposed method (the experimental parameters are the same as the experimental
M
parameters in the FERET database except patch size )with High-fidelity Pose and Expression Normalization (HEPN) [7] combined with the LBP operator, and the results
ED
are shown in table 5.
Table 5 Recognition rates of different methods on the LFW database. Recognition(%)
HEPN+LBP Our method
74.0 80.0
CE
PT
Methods
As can be seen, our method gets a higher recognition rate than HEPN+LBP. This
505
may be attributed to the strong representation capability of the proposed Huffman-LBP.
AC
Moreover, although HEPPN realizes a high-fidelity normalization through 3D transformation, the distortion cannot be avoided. In contrast, our approach does not destroy the original structure of the image, avoiding the interference of the reconstruction residu-
510
als.
26
ACCEPTED MANUSCRIPT
4.5. Computation Time In this subsection, we evaluate the computation time of our method and compare it with other pose-robust methods (3D normalization method: HEPN+LBP; regression
515
CR IP T
based approaches: Mis-alignment Robust Representation (MRR) [60] and Multi-Scale
Patch based Collaborative Representation (MSPCR) [61]; patch based method: Face
Image Classification by Pooling Raw Features (FCPRF) [62]) on the LFW database. A PC with 3.6GHz CPU (i7) and 8GB RAM is used. The programming environment is Matlab R2014b. The experimental setting is the same as the description in section 4.4. It should be noted that the image sizes mentioned in the literatures of these methods are different from each other. For a fair comparison, the size of the image used in all
AN US
520
methods is fixed at 250 × 250 pixels. Correspondingly, we reset the parameters of the
some methods to accommodate the changed image size (the patch scales of MSPCR are set as [8, 23, 38, 53, 68, 83, 98], the patch sizes of FCPRF are set as {10 × 10, 25 ×
25, 40 × 40}). The average computation time per image of all methods is listed in table 6.
ACT(s)
MSPCR MRR 13.2
1.0
10.6
HEPN +LBP 20.3
FP+Huffman-LBP FP+Huffman-LBP +Patch-based SRC +RSF+Patch-based SRC 10.4
9.3
PT
525
FCPRF
ED
Method
M
Table 6 The average computation time (ACT) per image of different methods on the LFW database.
It can be seen that our method (FP+Huffman-LBP+RSF+Patch-based SRC) is the fastest in all methods except for MRR, while FP+Huffman-LBP+Patch-based SRC also
CE
has a fast speed. What’s more, suffering from its complex 3D model fitting process, HEPN+LBP spends a much longer time (nearly 20 seconds per image) on recognition. In addition, both MSPCR and FCPRF use a multi-scale patch extraction strategy,
AC
530
which causes them to be slower than our method using only a single patch scale. Benefitting from the optimized search strategy, MRR has the fastest speed among all methods. From the comparison of FP+Huffman-LBP+Patch-based SRC and FP+Huffman-
LBP+RSF+Patch-based SRC in section 4.3 and here, it can be seen that RSF not only 535
improves the recognition accuracy of the algorithm, but also shortens the running time. 27
ACCEPTED MANUSCRIPT
4.6. Discussion Overall, compared to the methods (2D and 3D) based on reconstructing virtual face view, e.g. MDF, our method carries out the whole recognition process only on the
540
CR IP T
original face images, which can avoid the problem that the recognition performance
would be affected due to image distortion. Moreover, due to the ability of RSF and patch-based SRC to handle pose variation, for the images with various rotation angles,
the recognition performance of our method is steadier than other general methods. We can also find that RSF has a potential function of roughly evaluating the pose of the face images, so that our method does not require any prior knowledge of the pose. From the
point of the data, there is still a gap between our approach and Face++. However, our
AN US
545
approach is a relatively simple and new attempt at multi-pose face recognition. 5. Conclusions
In this paper, we present a method for multi-pose face recognition. A Divide-
550
M
and-Rule strategy is applied to both face representation and face classification. In the face representation, we first carry out the Divide operation: dividing the input image into different types by evaluating the Region Selection factor (RSF). Next, the im-
ED
age patches centered on interest points are extracted in the selected face region and Huffman-LBP is developed to extract the discriminant information of each patch. In
555
PT
the face classification, a designed classification strategy based on patch-based SRC is used to produce the recognition results. Our method is robust to the pose variation because both representation and classification are carried out at patch level that is
CE
less sensitive to the pose variation. The experimental results on traditional multi-pose databases (the CMU PIE database and FERET database) and unconstrained environ-
AC
ment database (the LFW database) show that our approach outperforms the competitive
560
methods in most cases, validating the effectiveness and good generalization. In future,
we will investigate the proposed method in video-based face analysis.
28
ACCEPTED MANUSCRIPT
6. Acknowledgment This work was supported in part by Natural Science Foundation of China (No.61272195, 61472055, 61100114, 61502067, U1401252), Program for New Century Excellent Talents in University of China (NCET-11-1085), Chongqing Outstanding Youth Found
CR IP T
565
(cstc2014jcyjjq40001), Chongqing Research Program of Application Foundation and
Advanced Technology (cstc2012jjA1699, cstc2015jcyjA40013) and Science and Tech-
nology Research Project of Chongqing Municipal Education Commission(KJ1500417). It is also sponsored by China Scholarship Council (201407845019) and supported by Natural Science Foundation of CQ (cstc2015jcyjA40011). References
Li-Fang Zhou was born in Tianshui, Gansu Province, China. She received the M.S. degree from the Chongqing University of Posts and Telecommunications in July 2007 and the Ph.D. degree from Chongqing University in December 2013. Currently she is an associate professor at the Chongqing University of Posts and Telecommunications. Her research focuses on pattern recognition and machine vision.

Yue-Wei Du was born in Shanxi Province, China, in 1992. He received the B.S. degree from Changzhi College, Shanxi, P.R. China. Currently, he is an M.S. candidate in computer technology with the Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications. His research interests include image processing and pattern recognition.

Wei-Sheng Li graduated from the School of Electronics & Mechanical Engineering at Xidian University, Xi'an, China, in July 1997. He received the M.S. degree from the School of Electronics & Mechanical Engineering and the Ph.D. degree from the School of Computer Science & Technology at Xidian University in July 2000 and July 2004, respectively. Currently he is a professor at the Chongqing University of Posts and Telecommunications. His research focuses on intelligent information processing and pattern recognition.

Jian-Xun Mi received the B.S. degree in Automation from Sichuan University (SCU), Chengdu, China, in 2004 and the Ph.D. degree in Pattern Recognition & Intelligent Systems from the University of Science and Technology of China (USTC), Hefei, China, in 2010. He worked at the Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China, as a Postdoctoral Research Fellow from Sept. 2011 to Sept. 2013. Now he is an associate professor at the Chongqing University of Posts and Telecommunications, Chongqing, China.

Xiao Luan received the B.S. degree in Information and Computational Science from Henan University, Kaifeng, China, and the Ph.D. degree in Computer Science and Technology from Chongqing University, Chongqing, China. Currently he is an associate professor with the College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China. His research interests include pattern recognition and image processing.