Pose-Robust Face Recognition with Huffman-LBP Enhanced by Divide-and-Rule Strategy

Li-Fang Zhou, Yue-Wei Du, Wei-Sheng Li, Jian-Xun Mi, Xiao Luan

PII: S0031-3203(18)30005-0
DOI: 10.1016/j.patcog.2018.01.003
Reference: PR 6412

To appear in: Pattern Recognition

Received date: 6 May 2017
Revised date: 22 October 2017
Accepted date: 7 January 2018

Please cite this article as: Li-Fang Zhou, Yue-Wei Du, Wei-Sheng Li, Jian-Xun Mi, Xiao Luan, Pose-Robust Face Recognition with Huffman-LBP Enhanced by Divide-and-Rule Strategy, Pattern Recognition (2018), doi: 10.1016/j.patcog.2018.01.003


Highlights

• A novel LBP-like feature is proposed which takes the contribution of the contrast value into consideration by Huffman coding.

• The Divide-and-Rule strategy is applied to both face representation and classification with the goal of improving the robustness to pose variation.

• Face representation via Region Selection Factor (RSF) is suggested in our method to treat the face images of different poses specifically rather than generally.

• In order to further make the method tolerate rotations, we perform the face classification at the patch level using a patch-based SRC fusion classification strategy.


Pose-Robust Face Recognition with Huffman-LBP Enhanced by Divide-and-Rule Strategy

Li-Fang Zhou (a,b,∗), Yue-Wei Du (a), Wei-Sheng Li (a), Jian-Xun Mi (a), Xiao Luan (a)

(a) College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400000, People's Republic of China
(b) College of Software, Chongqing University of Posts and Telecommunications, Chongqing 400000, People's Republic of China

∗ Corresponding author. Email address: [email protected] (Li-Fang Zhou)

Abstract

Face recognition in harsh environments is an active research topic. As one of the most important challenges, face recognition across pose has received extensive attention. The LBP feature has been widely used in face recognition because of its robustness to slight illumination and pose variations. However, due to the way its pattern features are calculated, its effectiveness is limited under large rotations. In this paper, a new LBP-like feature extraction method is proposed which modifies the coding rule via Huffman coding. In addition, a Divide-and-Rule strategy is applied to both face representation and classification, which aims to improve recognition performance across pose. Extensive experiments on the CMU PIE, FERET and LFW databases are conducted to verify the efficacy of the proposed method. The experimental results show that our method significantly outperforms other approaches.

Keywords: Face recognition across pose, LBP, Huffman, Divide-and-Rule strategy

1. Introduction

Face recognition is of vital importance in person identification due to the rich information contained in face images. Although frontal face recognition has already reached high accuracy on internationally available databases, face recognition in harsh environments is still a challenging problem because variations of the target face image are very common in real applications. For example, the intra-personal differences caused by pose variation can even be much larger than inter-personal differences. Therefore, pose variation is identified as one of the prominent unsolved problems in the research of face recognition [1]. Many different methods for solving the problem of pose variation have been proposed; they can be roughly divided into two categories, i.e., 3D and 2D approaches.

Since pose variations are generated by 3D rigid transformations of the face, some researchers believe that using a 3D method to solve this problem is more intuitive and accurate. One type of 3D method builds a 3D model of the human face by continuously evaluating the depth information and fitting the face shape after pose changes; finally, pose normalization is realized through a 3D transformation. The most representative method is the 3D Morphable Model (3DMM) [2]. Another type of method is landmark-based 3D face model fitting [3–5]. The method in [3] relies on the Active Appearance Model (AAM) [6] to fit the landmarks, finds the correspondence between the 2D landmark points and the 3D landmark points, and finally synthesizes the multi-pose view to the front view. Recently, Zhu et al. [7] improved the 3DMM by solving the "landmark marching" problem. A more precise correspondence between the 2D face landmarks and the 3D face landmarks is established, so a high-fidelity pose and expression normalization is realized.

Compared with 3D methods, 2D-based methods do not need to rely on 3D information to identify the subject, so they are easier to implement. One type of 2D method attempts to find the relationship among different pose images using regression models, such as Locally Linear Regression (LLR) [8] and Orthogonal Procrustes Regression (OPR) [9]. In 2016, Tai et al. [9] applied OPR to multi-pose face recognition, which uses orthogonal Procrustes analysis to find the optimal linear transformation between two images with different poses. At the same time, they proposed a stacked OPR model to solve the problem of highly nonlinear pose change. A large number of 2D-based approaches also normalize face images of different poses to a unified view by reconstructing a virtual view so that traditional face recognition algorithms can be used. For example, the method proposed in [10] uses Markov Random Fields (MRFs) to reconstruct a virtual frontal face from a non-frontal face image. In [11], a method based on

sparse representation is proposed to achieve pose normalization by transferring the representation coefficients and searching the view-dictionary, but it requires a complex training process. In recent years, deep learning has been widely used in pattern recognition and image processing [12–18]. In particular, with the development of high-performance computing, training neural networks on large amounts of data has become practical. In [16], Stacked Progressive Auto-Encoders (SPAE) are proposed for converting non-frontal images to frontal images and learning a pose-robust feature by progressively modeling the complex nonlinear transformation using a deep network. Similarly, in [17], a deep network containing a reconstruction layer and feature extraction layers is designed to learn a face identity-preserving feature from images of different views. Recently, Schroff et al. [18] adopted a deep convolutional network (FaceNet) to learn a mapping from face images to a compact Euclidean space, in which distance is used to measure the similarity of face images. Although deep networks have achieved excellent performance in face recognition, they need large-scale data for training and consume considerable computational resources. Additionally, extracting pose-robust features could be another way, such as LGBP [19], Elastic Bunch Graph Matching (EBGM) [20], Orthogonal Discriminant Vector (ODV) [21], Subspace Point Representation (SPR) and Layered Canonical Correlated (LCC) [22], etc.

Most methods based on holistic approaches are very sensitive to pose variations [1], because the depth rotations of 3D faces almost always lead to image pixel misalignment, which is the main classification basis for these methods. In contrast, local methods work only or mainly on a set of discrete regions or points in the images, and pattern features are extracted from isolated regions of the face image. Hence, local feature extraction methods can tolerate small pose variations. One successful local face descriptor is Local Binary Patterns (LBP) [23], which does not require exact pattern locations but relies only on the histogram of features in a region. As a result, LBP can alleviate small rotations and performs well when the pose variations are less than 15◦.

Among the methods based on LBP are the Extended LBP operator (ELBP) [24], Local Ternary Pattern (LTP) [25], Local Adaptive Ternary Pattern (LATP) [26], Local Multiple Layer Contrast Pattern (LMLCP) [27] and Local Multiple Patterns (LMP)

[28]. Huang et al. [29] developed a novel local descriptor called Binary Gradient Patterns (BGP). It evaluates relationships between local pixels at the image gradient level and encodes the potential local structures into a string of binary values. In [30], Scale Selective Local Binary Patterns (SSLBP) are proposed to handle scale variation by capturing scale-invariant features from different image scale spaces.

In [31], the LBP feature of the image rather than the pixel values is used in Sparse Representation-based Classification (SRC) [32], making the recognition system more robust to illumination variations. Chan et al. [33] argued that LBP values are merely converted from a string of binary labels, not conventional numerical values, so it is unreasonable to linearly combine LBP directly with SRC. Based on this observation, LBP histogram features of the image are used in the SRC framework in their work. In [34], the LBP-histogram-based χ2 kernel is applied to SRC, and the Kernel Coordinate Descent (KCD) algorithm is proposed to solve the LASSO problem in the kernel space, which effectively uses the kernel method to discover the nonlinear transformation relationship between the gallery samples and test samples. However, in all these studies, the samples entered into the SRC algorithm are completely represented by the holistic histogram feature, which does not work well for face recognition under pose variations, since the holistic appearance is very sensitive to changes of pose.

This paper proposes a novel local facial descriptor called Huffman-LBP to extract face features. Meanwhile, a Divide-and-Rule strategy is adopted to improve the robustness of multi-pose face recognition. Compared with other existing popular pose-invariant face recognition methods, the proposed method is more natural and more consistent with the human cognitive mechanism; that is, virtual view reconstruction is not required in the process of face recognition under pose variation, thus avoiding the problem that the recognition performance would be affected by image distortion.

The contributions of this work are as follows:

• A face detection operation is suggested to locate the face so that the background can be removed. Meanwhile, a face alignment method called TCDCN [35] is used to detect the landmarks with the purpose of extracting pattern features from the corresponding points associated with the training image.

• Huffman coding is utilized to evaluate the weights of the contrast values in the neighborhood, so that the discriminative power can be improved. Furthermore, there is no parameter tuning problem in Huffman-LBP compared to other LBP-based methods.

• Motivated by the MECE principle (Mutually Exclusive, Collectively Exhaustive) [36], face representation and classification based on the Divide-and-Rule strategy are proposed here: first, face representation via a Region Selection Factor (RSF) is suggested in our method to treat the face images of different poses specifically rather than generally; then, we perform the face classification at the patch level using a patch-based SRC fusion classification strategy to further make the method tolerate rotations.

• Extensive experiments illustrate the applicability of the proposed approach to multi-pose face recognition. In particular, the above issues are verified effectively by the experiment in Section 4.3.

2. Related work

In this section, some related research on both face representation and face classification is briefly reviewed. We first review Local Binary Patterns based face representation in Section 2.1. Then we give a brief introduction to sparse representation based face classification in Section 2.2.

2.1. Local Binary Patterns based face representation

LBP was first proposed by Ojala et al. [23], and Ahonen [37] later applied it to the field of face recognition. The main idea of LBP is to convert the gray-scale image into a feature vector composed of binary codes by judging the sign (positive or negative) of the pixel value difference between the surrounding pixels and the central pixel in a circular neighborhood. As long as external factors do not change the signs of the pixel value differences between the central pixel and the surrounding pixels, the LBP representation of the face image is robust. In a pixel neighborhood, the LBP value of the center pixel can be calculated by:

$$LBP_{P,R} = \sum_{t=0}^{P-1} s(g_t - g_c)\,2^t \quad (1)$$

where $g_c$ and $g_t$ refer to the gray values of the center pixel and the $P$ surrounding pixels in an image neighborhood of radius $R$, respectively. $s$ corresponds to a thresholding function defined as follows:

$$s(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{otherwise} \end{cases} \quad (2)$$

The LBP values of the pixels in the whole image are counted to obtain the histogram features. The final classification result can be determined by calculating and comparing the histogram similarities.
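As a concrete illustration, the following is a minimal sketch (ours, not from the original paper) of the operator of Eqs. (1)–(2) for the 8-neighbor, radius-1 case; the function name and the use of NumPy are our own assumptions.

import numpy as np

def lbp_8_1(img):
    """Basic LBP_{8,1}: threshold the 8 neighbors of each interior pixel
    against the center (Eq. 2) and weight the resulting bits by 2^t (Eq. 1)."""
    img = img.astype(np.int32)
    # Offsets of the 8 neighbors, enumerated in a fixed circular order t = 0..7.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for t, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        codes += ((neighbor - center) >= 0).astype(np.int32) << t
    return codes

# A histogram of the 256 possible codes then serves as the region feature:
# hist = np.bincount(lbp_8_1(region).ravel(), minlength=256)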

2.2. Sparse representation based face classification

Let $A = \{A_1, A_2, ..., A_k\}$ be a training set consisting of $k$ classes of face images, where $A_i = \{v_{i,1}, v_{i,2}, ..., v_{i,n}\} \in R^{m \times n}$. A test image $y \in R^m$ belonging to class $i$ can be linearly represented by the training images from class $i$:

$$y = a_{i,1} v_{i,1} + a_{i,2} v_{i,2} + ... + a_{i,n} v_{i,n} \quad (3)$$

Since the training set $A$ includes all classes of images, Eq. (3) can be rewritten as follows:

$$y = A x_0 \in R^m \quad (4)$$

In the ideal case, $y$ is only a linear combination of the training images from class $i$. That is, in the coefficient vector $x_0$, most coefficients are zero except for those associated with class $i$: $x_0 = \{0, ..., 0, a_{i,1}, a_{i,2}, ..., a_{i,n}, 0, ..., 0\}^T$. Finding the sparsest solution of Eq. (4) then amounts to solving the following optimization problem:

$$\hat{x}_0 = \arg\min \|x\|_0 \quad \text{subject to} \quad Ax = y \quad (5)$$

However, solving the $\ell_0$-minimization of an underdetermined system of linear equations is NP-hard [38]. In [39], the author showed that if the solution $x_0$ of the $\ell_0$-minimization is sufficiently sparse, it is equivalent to the solution of the $\ell_1$-minimization:

$$\hat{x}_1 = \arg\min \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \leq \varepsilon \quad (6)$$

where $y$ and the columns of $A$ have unit $\ell_2$-norm. Finally, the classification result can be determined by sorting the residuals:

$$r_i(y) = \|y - A\,\delta_i(\hat{x}_1)\|_2 \quad (7)$$

$$identity(y) = \arg\min_i r_i(y) \quad (8)$$

where $\delta_i$ is a function that picks out the coefficients associated with class $i$.
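To make the classification rule of Eqs. (6)–(8) concrete, here is a small sketch of SRC, assuming scikit-learn's Lasso as an off-the-shelf $\ell_1$ solver (our choice, not the paper's); any $\ell_1$-minimization routine could be substituted.

import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y):
    """Sparse Representation-based Classification (Eqs. 6-8).
    A: m x N dictionary of training samples (one column per sample),
    labels: class label of each column, y: test sample of length m."""
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)  # unit l2 columns
    y = y / (np.linalg.norm(y) + 1e-12)
    # l1-regularized least squares as a surrogate for Eq. (6).
    x_hat = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000).fit(A, y).coef_
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x_hat, 0.0)    # delta_c keeps class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ x_c)  # Eq. (7)
    return min(residuals, key=residuals.get)        # Eq. (8)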

3. Methodology

Our proposed algorithm for multi-pose face recognition consists of three main steps: face preprocessing, feature extraction via Huffman-LBP, and face representation and classification using the Divide-and-Rule strategy. Fig. 1 shows the overall face recognition flow of the proposed method.

Fig. 1. The flow chart of the proposed algorithm.

3.1. Face Preprocessing

One key step for face recognition is face detection. The task of face detection is to locate the position of the face in the image. Specifically, the image background will be removed so that the recognition performance can be improved. Although great improvements have been made in face detection, it is still challenging because external disturbance factors in the image are very common, such as variations in rotation, pose and occlusion. Thus, a new approach based on landmark localization to detect the face image was developed by us in [40]. Furthermore, the proposed histogram-of-sparse-code-based method is very effective because it can capture global elastic and multi-view deformation. The face detection results of our previous work on three popular face databases (FERET, CMU PIE and LFW) are shown in Fig. 2. Hence, this method is used for face preprocessing in this paper.

Fig. 2. The face detection results. (a) FERET. (b) CMU PIE. (c) LFW.

In order to improve the robustness of facial landmark detection under severe occlusions and large pose variations, a particular type of CNN named Tasks-Constrained Deep Convolutional Network (TCDCN) is used [35]. TCDCN optimizes facial landmark detection together with heterogeneous but subtly correlated tasks. A task-constrained loss function is formulated to allow the errors of relevant tasks to be back-propagated together, with the purpose of improving the effectiveness and generalization of face landmark detection. To address the problem that different tasks have different convergence rates and learning difficulties, a task-wise early stopping criterion is devised to facilitate learning convergence. Since most of the points on the contour move with pose variations, after landmark detection using TCDCN we select the key landmarks around the face organs (eyebrows, eyes, nose, and mouth) as the interest points. Fig. 3 illustrates the preprocessing process.

key landmarks around the face organs (eyebrows, eyes, nose, and mouth) as the interest points. Fig. 3 illustrates the preprocessing process.

CE

3.2. Huffman-LBP

To achieve invariance with respect to any monotonic transformation of the gray

180

AC

scale, only the signs of the contrast value are considered by LBP [37]. Unfortunately, this property of LBP sometimes leads to unexpected confusion. As shown in Fig. 4, two different sets of image textures get the same result after being encoded by LBP operator. To the best of our knowledge, the Huffman coding is usually used for lossless

185

data compression [41]. However, few researchers have applied Huffman coding to feature extraction. To handle the problem of losing texture information from LBP, this 9

ACCEPTED MANUSCRIPT

)DFH'HWHFWLRQ

CR IP T

/DQGPDUN 'HWHFWLRQ

.H\3RLQWV 6HOHFWLRQ

Fig. 3. The process of face preprocessing.

AN US

paper first adopts Huffman coding to weight the contrast value so that abundant texture information can be supplemented. The novel method is called Huffman-LBP. ķ

      





gc









  

M

LBP8ˈ1 71



s



  

ED

     

s











gc

















gc









H _ LBP8,1

ķ

77

H _ LBP8,1 = 8

+XIIPDQ/%3

ĸ

H _ LBP8,1+ = 40

H _ LBP8,1 = 10

ĸ

Fig. 4. Two different sets of image textures are encoded by LBP and Huffman-LBP respectively.

190

PT

Huffman coding uses variable-length codewords to encode source symbols. The coding string is determined according to the probability (frequency) of each symbol.

CE

Symbols with larger frequency will be represented by fewer bits. In other words, in the Huffman tree, the frequency of the leaf node nearer to the root node is smaller, and the frequency of the leaf node farther from the root node is larger. Moreover, the length of

AC

the Huffman code for each leaf node is consistent with the distance between the leaf

195

node and the root node. Inspired by this, we construct the Huffman tree with regarding the absolute value of gt − gc (t = 0, 1, . . . , p − 1) as the frequency of each leaf node, and

get the corresponding Huffman code. According to the code length, we can measure the weight of the contrast value. The value of the Huffman-LBP of a pixel is given by:

10

ACCEPTED MANUSCRIPT

H LBP+P,R = round(

P−1 X

s+ (t)w+ (t)2t )

H LBP−P,R = round(

t=0

s− (t)w− (t)2t )

(9)

t=0

Where s+ and s− are the thresholding functions:      1, gt − gc ≥ 0 t = 0, 1, ..., P − 1 s+ (t) =     0, Otherwise

CR IP T

200

P−1 X

     1, gt − gc < 0 s+ (t) =  t = 0, 1, ..., P − 1    0, Otherwise

(10)

In Eq. (10), gc and gt refer to the gray value of the center pixel and P (in this paper, the

AN US

value of P is fixed at 8)surrounding pixels in a image neighborhood of radius R. In Eq. (9), w+ and w− define the weighting functions as follows:   1 1             length(c ) length(c   t t)     , t ∈ index+ , t ∈ index−    P  P n m + − 1 1 w (t) =  w (t) =          i=1 length(cli ) i=1 length(cki )             0, Otherwise 0, Otherwise

(11)

205

M

Where length(c) is the length of the code C, index+ and index− are the location index sets of surrounding pixels, defined as follows:

ED

index+ = {t|gt −gc ≥ 0} = {l1 , l2 , ...ln }

index− = {t|gt −gc < 0} = {k1 , k2 , ...km } (12)

ct (t ∈ index+ ) and ct (t ∈ index− ) are the Huffman codes of {|gt − gc | |t ∈ index+ } and

PT

{|gt − gc | |t ∈ index− } respectively:

CE

(cl1 , cl2 , ..., cln ) = Hu f f man coding{|gt − gc | |t ∈ index+ }

(ck1 , ck2 , ..., ckm ) = Hu f f man coding{|gt − gc | |t ∈ index− }

(13)

Fig. 5 demonstrates each step of getting the Huffman-LBP feature value. From the

Huffman-LBP coding process, we can see that Huffman-LBP contains both positive and negative values, which represent the sign information of contrast value. What’s

AC

210

more, a relatively precise weight relationship between surrounding pixels can be measured by Huffman encoding. With the novel encoding rule, the sign of the contrast value is no longer the only encoding object, the magnitude of the contrast value will also play a role in the encoding process. As can be seen from Fig.4, the discrimina-

11

ACCEPTED MANUSCRIPT

215

tion ability of the LBP can be improved by supplementing the weight information of the contrast value. Although two different sets of image textures have the same binary coding (s+ and s− ), with Huffman-LBP encoding, they get different feature value.

CR IP T

Furthermore, some LBP improvements (e.g. LTP [25], LMLCP [27] and LMP [28] can achieve a better recognition performance by supplementing a evaluation of 220

contrast value magnitude during encoding process. However, a common problem of the parameter optimal setting has to be considered. For example, the selection of LTP

threshold and the setting of LMLCP layer number will seriously affect the final recognition results. On the contrary, the weight of contrast value is evaluated automatically

AN US

by Huffman coding in this paper. In other words, the key advantage of Huffman coding lies in its non-interactive property, which means it can work in a flexible way.           

Huffman _ coding{ gt  g c | t  index  }

  

  

   gt  gc



  

  

s

g c 





s



gc













gc



ED

 





w











gc



 



H _ LBP8,1 =round(0.1538* 2  0.2307 *32

0.1538*64  0.4615*128)=77







gc

 0.4615* 8  0.2307 *16)=8









H _ LBP8,1 =round(0.1538*1  0.1538* 4





PT

         



M

(cl1 , cl2 ,..., cln )

w

CE

(ck1 , ck2 ,..., ckm )

225

Huffman _ coding{ gt  g c | t  index  }

Fig. 5. The process of getting Huffman-LBP value.
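The per-pixel encoding of Eqs. (9)–(13) can be sketched as follows; this is our own minimal illustration (the function names and the heap-based Huffman routine are assumptions, and ties in the Huffman tree may be broken differently from the authors' implementation, which can change individual code lengths).

import heapq
import itertools
import numpy as np

def huffman_code_lengths(freqs):
    """Code length of each symbol in a Huffman tree built from `freqs`.
    The degenerate single-symbol case gets length 1 by convention here."""
    if len(freqs) == 1:
        return [1]
    counter = itertools.count()  # tiebreaker so equal frequencies compare safely
    heap = [(f, next(counter), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        for i in a + b:          # every merge deepens the merged leaves by 1
            lengths[i] += 1
        heapq.heappush(heap, (fa + fb, next(counter), a + b))
    return lengths

def huffman_lbp(neighbors, center):
    """Eq. (9): the signed Huffman-LBP pair for one pixel (P neighbors)."""
    diffs = np.asarray(neighbors, dtype=float) - float(center)
    result = []
    for mask in (diffs >= 0, diffs < 0):          # index+ and index- (Eq. 12)
        idx = np.flatnonzero(mask)
        if idx.size == 0:
            result.append(0)
            continue
        lengths = huffman_code_lengths(np.abs(diffs[idx]).tolist())  # Eq. (13)
        inv = [1.0 / L for L in lengths]          # shorter code -> larger weight
        weights = [v / sum(inv) for v in inv]     # normalized, Eq. (11)
        result.append(round(sum(w * 2 ** int(t)   # Eq. (9)
                                for t, w in zip(idx, weights))))
    return tuple(result)  # (H_LBP+, H_LBP-)

For instance, with code lengths {3, 2, 3, 1} on one side, the normalized weights are 2/13 ≈ 0.1538, 3/13 ≈ 0.2307 and 6/13 ≈ 0.4615, matching the values shown in Fig. 5.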

Finally, the Huffman-LBP histogram can be obtained by accumulating the Huffman-LBP values of the pixels, and it is then used as the pattern feature to classify the face images.

3.3. Divide-and-Rule strategy for multi-pose face recognition

The MECE principle is popular in business mapping processes due to its effectiveness and reasonableness. The main idea of MECE is to grasp the core of the problem effectively and find a solution through a non-overlapping and non-omitting division. Inspired by this, we adopt a Divide-and-Rule strategy in the process of face representation and classification to decompose the problem of pose variation into multiple sub-problems.

3.3.1. Divide-and-Rule face representation via Region Selection Factor (RSF)

If the rotation angle is too large, some regions of the 2D face image may move to the face silhouette and become invisible, and even some non-occluded face regions may be useless for recognition due to deformation. The location of the region with high fidelity in the image may change with the yaw direction and angle. Therefore, it is quite obvious that the face images of different poses should be treated specifically. In order to generate different representations for the face images under different poses, the Region Selection Factor (RSF) is proposed to classify the face image. Specifically, the whole face is divided into two regions (left and right) by connecting the tip of the nose and the center of the eyebrows. According to the result of face detection, the locations of the sideburns and the centers of the eyes can be determined. Subsequently, we compute the distance ($L_1$) from the right eye to the right sideburn and the distance ($L_2$) from the left eye to the left sideburn. Finally, the value of RSF is calculated by Eq. (14):

$$RSF = \frac{L_1}{L_2} \;\; \begin{cases} < \dfrac{1}{\alpha} & \Rightarrow \text{Extract the left features} \\[4pt] \in \left[\dfrac{1}{\alpha},\, \alpha\right] & \Rightarrow \text{Extract the whole features} \\[4pt] > \alpha & \Rightarrow \text{Extract the right features} \end{cases} \quad (14)$$

where $\alpha$ is the threshold of RSF. $RSF < \frac{1}{\alpha}$ indicates that the left face region is useful for recognition, so the features in the left-side face region will be extracted to represent the face. Similarly, the features in the right-side face region should be extracted when $RSF > \alpha$. Supposing $\frac{1}{\alpha} \leq RSF \leq \alpha$, the features in the whole face region will be extracted. Therefore, face images are divided into three channels according to RSF.

As we know, image patches have recently become a popular way to represent images [10, 42]. In some sense, image patches are optimal when the purpose is to find correspondences between two samples with appearance variations. Obviously, the holistic structure of the face changes unpredictably when the facial pose changes; that is to say, the intra-class variation is sometimes greater than the inter-class variation. According to these observations, the local structure of the face image should be considered under pose variation. Therefore, we propose to select local patches of the face image for feature extraction.
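The three-way split of Eq. (14) reduces to a few comparisons; a small sketch (ours, with hypothetical helper inputs) might look like:

def select_region(l1, l2, alpha=2.9):
    """Region Selection Factor of Eq. (14); alpha = 2.9 as fixed in Section 3.3.1.
    l1: right-eye-to-right-sideburn distance, l2: left-eye-to-left-sideburn distance."""
    rsf = l1 / l2
    if rsf < 1.0 / alpha:
        return "left"    # only left-side patches are extracted
    if rsf > alpha:
        return "right"   # only right-side patches are extracted
    return "whole"       # near-frontal: use all 42 interest points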

Fig. 6. The labels of interest points in the front face.


M

Specifically, we select the key landmarks around the facial organs as the interest points to represent the entire face. As shown in Fig. 6, there are 42 interest points on

ED

the whole face, 25 interest points on the twin sides of the face region respectively (including 8 points on the dividing line). We extract fixed-size image patches centered on the interest points. Moreover, considering the situation that spatial information of each

270

PT

patch may be important, each patch is divided into several non-overlapping blocks, and a histogram feature is extracted from each block via Huffman-LBP. The histogram fea-

CE

tures of all the blocks in the same patch are then concatenated into one vector. Finally,

AC

a feature pool is obtained by combining feature vector of all patches in the selected face n o region. The feature pool V j , j = 1, 2, . . . , 25 represents the features on the right-side n o n o face, and similarly V j , j = 18, 19, . . . , 42 for the left-side and V j , j = 1, 2, . . . , 42 for

275

the entire face, where Vj = [h1 ; h2 ; . . . ; hk ], hm is the Huffman-LBP histogram vector for block m in the patch j, and k is the number of blocks in each patch. The workflow of the Divide-and-Rule face representation is shown in Fig. 7. Obviously, α is vital to the dividing result of face samples, which will strongly affect the final recognition results. Here the maximal accuracy rate acts as the selection 14

/

/

RS F

CR IP T

ACCEPTED MANUSCRIPT

L1 !D L2

3RVLWLRQ(\HV DQG7HPSOHV 2ULJLQDO ,PDJH

&DOFXODWH/HQJWK 2I/$QG/

AN US

([WUDFWWKHULJKWIHDWXUHV
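As an illustration of the representation just described, here is a sketch (ours; the 3 × 4 block grid follows Section 4.2, and `code_fn` is a placeholder for any per-block code image, e.g. the lbp_8_1 sketch from Section 2.1):

import numpy as np

def patch_feature(patch, code_fn, blocks=(3, 4), n_bins=256):
    """Build V_j for one patch: split it into non-overlapping blocks, take the
    histogram of the integer codes produced by `code_fn` in each block, and
    concatenate the block histograms."""
    feats = []
    for row in np.array_split(patch, blocks[0], axis=0):
        for block in np.array_split(row, blocks[1], axis=1):
            codes = np.asarray(code_fn(block)) % n_bins
            hist = np.bincount(codes.ravel(), minlength=n_bins).astype(float)
            feats.append(hist / max(hist.sum(), 1.0))  # normalized block histogram
    return np.concatenate(feats)  # V_j = [h_1; h_2; ...; h_k]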

Fig. 7. The workflow of Divide-and-Rule face representation via Region Selection Factor.

Fig. 8. The Relationship between the value of α and recognition rate in FERET database.

Obviously, α is vital to the division of the face samples and strongly affects the final recognition results. Here the maximal accuracy rate acts as the selection criterion for parameter α. Specifically, we discuss the relationship between the value of α and the recognition rate on the FERET database (the experimental setting here is the same as that of the parameter discussion in Section 4.2). As shown in Fig. 8, the algorithm achieves its maximal recognition rate of 96.83% when α is 2.9. Considering that this ratio of geometric distances of the face structure is generic and does not change with the database, we fix the value of α at 2.9 in this paper.

285

is generic, which does not change with the database, so we fix the value of α at 2.9 in this paper.

15

ACCEPTED MANUSCRIPT

3.3.2. Patch-based SRC fusion classification Sparse representation based classification (SRC) has been demonstrated to be superior to nearest neighbor (NN) and nearest subspace (NS) based classifiers in various subspaces (e.g. PCA or LDA). As described in section 2.2, the success of the SRC de-

CR IP T

290

pends on that the test image can be linearly represented by the gallery images from the same class. However, it is not always the case in realistic face recognition applications.

In practice, the number of images in the gallery is often limited. In order to tackle the

small sample problem, the works in [31, 33, 34] adopted SRC in feature subspace for 295

face recognition. Motivated by this viewpoint, we have made great efforts to construct

AN US

the feature subspace. First, this paper has employed Huffman concept to extract multiscale face feature (H LBP+ and H LBP− ). What’s more, the patch-based SRC fusion classification strategy is proposed here. One common observation/assumption is that pose variation will lead to the unpredictable changes of holistic appearance. There300

fore, it is important to extract robust partial feature. As shown in Fig. 1, the idea of Divide-and-Rule is still adopted in here. we model a face as a collection of patches.

M

For each of patches, SRC is applied for classification independently. The class of test image is decided based on the fusion of classification results of all patches. By adopt-

305

ED

ing the strategy, the damaged patches caused by extreme pose variation will not affect the overall classification result.

More specifically, we use the histograms of patches in the same interest point of

PT

the gallery images to form a sub-dictionary, so that, we can get many sub-dictionaries: Dm , Dm+1 , . . . Dn , n − m is the number of patches in the gallery image, where Di =

CE

[v1,i , v2,i , ..., v p,i ], here v j,i is the Huffman-LBP histogram vector for patch i in the image 310

of class j. p is the number of sample classes. For each of the test image, a feature pool {v s , v s+1 , ..., vz } is obtained by the Divide-and-Rule face representation, where {s, s +

AC

1, ..., z} are the labels of the extracted patches corresponding to the interest points, and

z − s is the number of extracted patches for the test image. For each patch, a residual

vector is obtained by using SRC. Finally, the residual vectors of each patch are summed

315

to construct an augmented residual for the classification decision. The classifier details are described in Algorithm 1.

16

ACCEPTED MANUSCRIPT

Algorithm 1 The Patch-based SRC Fusion Classification Strategy.

Input: the sub-dictionaries $\{D_m, D_{m+1}, \ldots, D_n\}$; the feature pool of test image $y$: $\{v_s, v_{s+1}, ..., v_z\}$
Output: $identity(y)$

for $i = \max(m, s)$ to $\min(n, z)$ do
  1. Normalize $v_i$ and the columns of $D_i$ to have unit $\ell_2$-norm;
  2. Solve the $\ell_1$-minimization problem: $\hat{x}_1 = \arg\min \|x\|_1$ subject to $\|D_i x - v_i\|_2 \leq \varepsilon$;
  3. Compute the residuals $r_j(v_i) = \|v_i - D_i\,\delta_j(\hat{x}_1)\|_2$ for $j = 1, \ldots, p$, where $\delta_j$ picks out the coefficients associated with class $j$.
end for

$$identity(y) = \arg\min_j \sum_{i=\max(m,s)}^{\min(n,z)} r_j(v_i)$$
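A compact sketch of Algorithm 1 follows, again our own illustration with scikit-learn's Lasso as an assumed stand-in for the $\ell_1$ solver (as in the Section 2.2 sketch):

import numpy as np
from sklearn.linear_model import Lasso

def patch_src_fusion(sub_dicts, labels, feature_pool):
    """sub_dicts: {patch_id: m x p dictionary}, labels: class label per column,
    feature_pool: {patch_id: test feature vector}. Returns the fused identity."""
    total = {c: 0.0 for c in np.unique(labels)}
    shared = sorted(set(sub_dicts) & set(feature_pool))  # i = max(m,s) .. min(n,z)
    for i in shared:
        D = sub_dicts[i] / (np.linalg.norm(sub_dicts[i], axis=0, keepdims=True) + 1e-12)
        v = feature_pool[i] / (np.linalg.norm(feature_pool[i]) + 1e-12)
        x_hat = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000).fit(D, v).coef_
        for c in total:                      # accumulate per-class residuals
            x_c = np.where(labels == c, x_hat, 0.0)
            total[c] += np.linalg.norm(v - D @ x_c)
    return min(total, key=total.get)         # class with the smallest summed residual

Because the residuals are summed across patches, a few patches corrupted by extreme pose change contribute only a bounded share of the final decision, which is the point of the fusion step.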

4. Experiments and Results

In this section, we conduct extensive experiments on three popular public face databases: FERET [43], CMU PIE [44] and Labeled Faces in the Wild (LFW) [45]. A brief description of the databases is given in Section 4.1. Next, we introduce the experimental settings and the parameter discussion in Section 4.2. The contribution of the proposed enhancements is evaluated in Section 4.3. The experimental results of face recognition compared with existing methods can be found in Section 4.4. Section 4.5 further compares the computation time of the proposed method with several state-of-the-art approaches. Finally, a discussion of the experiments is presented in Section 4.6.

4.1. Databases

(1) The FERET database.

The FERET database is one of the most widely used benchmarks for face recognition methods [46]. It contains more than 14000 gray-scale face images with different viewpoints, which are divided into several subsets for different research purposes. In this paper, we use the pose subset to evaluate the performance of the proposed algorithm for cross-pose face recognition. The pose subset of the FERET database contains 200 individuals with 9 poses per individual; these poses are ba (frontal face image), bb (+60◦), bc (+40◦), bd (+25◦), be (+15◦), bf (−15◦), bg (−25◦), bh (−40◦) and bi (−60◦). Examples of face images under these poses are shown in Fig. 9.

(2) The CMU PIE database.

The CMU PIE database was designed by Carnegie Mellon University in 2000 to evaluate the performance of face recognition algorithms. It contains 41,368 images of 68 individuals. Each individual in the database includes 13 different poses, 43 different illumination conditions and 4 different expressions. Among the multiple viewpoints of each individual, there are 7 poses varying in the horizontal direction, which are taken by cameras 02, 37, 05, 27, 29, 11 and 14 respectively. Examples of face images with neutral expression and frontal illumination in these subsets are shown in Fig. 9.

(3) The Labeled Faces in the Wild (LFW) database.

The LFW database was created for researching the problem of face recognition in unconstrained environments and has become an academic (industry) performance evaluation benchmark. The data set consists of 13233 samples of 5749 subjects collected from the web. There are 1680 subjects with two or more different photos in the database. All the images in the database were normalized to 250 × 250 pixels by cropping and resizing. There are variations in illumination, pose, expression, makeup, and age among the images in the database to simulate real faces in unrestricted environments.

Fig. 9. Face image examples with different viewpoints from the FERET database (top) and CMU PIE database (bottom).

4.2. Experimental Settings and Parameters Discussion

In order to simulate a practical face recognition system, in this paper each individual in the gallery stores only one face image. The image sizes of the three databases are 256 × 383 pixels (FERET), 640 × 486 pixels (CMU PIE) and 250 × 250 pixels (LFW), respectively. We use the subset of FERET containing 1400 face images from seven poses (front: ba; non-front: bc, bd, be, bf, bg and bh) of 200 different individuals, and the subset of CMU PIE containing 340 face images from five poses (front: C27; non-front: C11, C29, C05 and C37) of 68 different individuals, both of which are divided into two halves: one for the algorithm parameter discussion and the other for the algorithm performance test.

In this subsection, we conduct the experiments with an unknown probe pose so that we can discuss the parameters objectively and find the relationship between the parameters and the recognition result. More specifically, for the FERET database, we set the frontal face images in subset ba as gallery images, and the non-frontal face images in the bc, bd, be, bf, bg and bh subsets as a large test set (containing 600 images of 100 people, each individual having 6 different pose images). Similar to the setting of the FERET database, we set the frontal face images in subset C27 of the CMU PIE database as gallery images, and the non-frontal face images in four subsets (C11, C29, C05 and C37) as a large test set.

AC

Recognition Rate(%)

CE

PT

96 95 94 93 92 91 90 89 88 87 86 85 84 83 82

23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59

Patch Size

Fig. 10. The relationship between the value of patch size and recognition rate.

(1) Patch size

In fact, the size of the image patches around the interest points is related to the amount of information they contain. If the size is too small, it results in loss of information; if the size is too large, it leads to information redundancy. It is therefore important to choose an appropriate size for the image patches. Fig. 10 shows that performance reaches its peak when we set the patch size to 43 in the FERET database. Additionally, the optimal image patch size may vary with the size of the face images. In order to estimate the corresponding patch size for different databases, the ratio $r$ of the distance between the eye centers to the side length of the patch is proposed here:

$$r = \frac{m(D_{eyes})}{L_{patch}} \quad (15)$$

where $m(D_{eyes})$ is the mean distance between the eye centers of the frontal face images in the database and $L_{patch}$ is the side length of the image patch. Through statistics and calculation, we find that $m(D_{eyes})$ of FERET is 68.85 pixels, so that $r = 68.85/43 \approx 1.60$. At the same time, the $m(D_{eyes})$ of CMU PIE is 82.30 pixels, so we choose $L_{patch} = 82.30/1.60 \approx 51$ pixels as the side length of the image patch for the CMU PIE database. Similarly, we get an $L_{patch}$ of 26 pixels for the LFW database.

LFW database are 26 pixels.

Fig. 11. The relationship between the number of blocks in each patch and recognition rate on the FERET database(left) and CMU PIE database(right). NBR and NBC corresponds to the number of blocks in the row and column of the image patch, respectively.

390

(2) The number of blocks in each patch In order to remain the spatial information, we divided each patch into several non20

ACCEPTED MANUSCRIPT

overlapping blocks, the number of blocks in each patch determines the distribution of spatial information and is related to the dimensions of the histogram features. Fig. 11 shows the relationship between the number of blocks and recognition rate

CR IP T

in the FERET database and CMU PIE database, respectively. As can be seen from

395

Fig. 11, the algorithm gets the highest recognition rate in FERET and CMU PIE when the number of blocks is 3 × 4 and 2 × 2 (2 × 3) respectively. Considering

that too many blocks in the patch will result in sparse histogram and increase the

cost of computation, 3 × 4 and 2 × 2 are selected for FERET and CMU PIE in the following experiments, respectively.

400

(3) The radius for Huffman-LBP

The radius selected for Huffman-LBP determines how much of the local texture information is captured, so it is very important. We explored the influence of radius variations on the recognition rate. According to the data in Table 1, the most appropriate radii for the FERET database and the CMU PIE database are 2 and 3, respectively.

Table 1. The recognition rates (%) at different radii on the FERET and CMU PIE databases.

Database | Radius 1 | Radius 2 | Radius 3
FERET    | 95.17    | 96.83    | 96.50
CMU PIE  | 94.85    | 98.53    | 100

4.3. Contribution of Enhancements

In this subsection, the contribution of the three enhancements proposed in this paper is evaluated by the following experiments:

ACCEPTED MANUSCRIPT

415

(2) FP+LBP+RSF+SRC: adding Region Selection Factor(RSF) (as described in Section 3.3.1) to the method setting of (1). (3) FP+Huffman-LBP+RSF+SRC: replacing LBP in (2) with the proposed Huffman-

CR IP T

LBP(as described in Section 3.2).

(4) FP+Huffman-LBP+RSF+Patch-based SRC: replacing the original SRC in (3) with the proposed Patch-based SRC (as described in Section 3.3.2).

420

Fig. 12. Performance comparison versus reference image sets under 6 distinct poses on the FERET database.

Each experiment ((2)–(4)) was performed with only one additional improvement compared to the previous experiment so that the contribution of each improvement can be measured well. The experiments were conducted on the FERET database. The gallery consists of the frontal face images from the ba subset, and the face images from the bc, bd, be, bf, bg and bh subsets are set as test images, respectively.

Table 2. Average recognition rates (%) of different variants of our method on the FERET database.

Method  | FP+LBP+SRC | FP+LBP+RSF+SRC | FP+Huffman-LBP+RSF+SRC | FP+Huffman-LBP+RSF+Patch-based SRC
Avg.(%) | 96.17      | 97.00          | 98.17                  | 98.83

As can be seen from Fig. 12, for the test images with near-frontal poses (−15◦ to +15◦ ), all methods can get a perfect result of 100%. However, the recognition rates decrease significantly for the big yaw angles (25◦ or more), and the differences among

430

CR IP T

the different variants of proposed method become apparent. For example, when the images with pose variation of +40◦ are recognized using FP+LBP+RSF+SRC, the face recognition rate is 87%. Replacing the LBP with Huffman-LBP (FP+HuffmanLBP+RSF+SRC), results improve to 96%. This improvement can be explained be-

cause with Huffman-LBP, the representation ability of the face local texture is improved. We can see from Table 2, although the average recognition rate obtained with

FP+LBP+RSF+SRC is only 0.83% higher than the result obtained with FP+LBP+SRC, 1 when RS F < and RSF >α, only half of points involved in the recognition, which α greatly reduces the running time of algorithm. If we further combine patch-based SRC

AN US

435

with FP, RSF and Huffman-LBP, the recognition rate achieves the highest results of 98.83, particularly for the rotation angles larger than 15◦ , which proves that the proposed patch-based SRC is less sensitive to pose variation.

M

440

4.4. Comparisons With the Existing Methods

ED

4.4.1. Comparison Experiment on the FERET Database For FERET database, we conduct experiments using the same experimental setting as section 4.3 to test the performance of proposed algorithm. We also compare our approach with the method based on Partial Least Squares(PLS) [47], Ridge Regression

PT

445

with Gabor features (RRG) [48], Adjacent Discriminant Multiple Coupled Subspace (ADMCLS) [49], a 3D method based on Morphable Displacement Field (MDF) [50],

CE

a virtual view reconstruction method based on Stack Flow [51] and SPAE [16]. In addition, to further evaluate the performance of our algorithm, a leading Deep Convolutional Neural Network (DCNN) based commercial face recognition system called

AC

450

Face++ [52] is introduced to compare with us. The performance and comparison results are reported in Table 3. It can be seen from the experimental results that the proposed method outperforms

the first five methods in most cases. Especially, for the images with pose variation 455

of −25◦ to +25◦ , the proposed method achieves a perfect classification accuracy of 23


Table 3. Performance comparison with state-of-the-art methods on the FERET database (%).

Methods    | bc +40◦ | bd +25◦ | be +15◦ | bf −15◦ | bg −25◦ | bh −40◦ | Avg.
MDF        | 97      | 99      | 99      | 100     | 99      | 98      | 98.7
RRG        | 96      | 99      | 98      | 96      | 96      | 91      | 96
ADMCLS     | 82      | 94      | 95      | 96      | 94      | 85      | 91
PLS        | 59      | 76      | 76      | 77      | 72      | 53      | 68.8
Stack Flow | 70      | 89      | 96      | 94      | 82      | 62      | 82.2
SPAE       | 96      | 98      | 99      | 99      | 99      | 95      | 97.7
Face++     | 100     | 100     | 100     | 100     | 100     | 100     | 100
Our method | 95      | 100     | 100     | 100     | 100     | 98      | 98.8

It can also be seen that the 3D method based on MDF performs better than our method for test images with a pose variation of +40◦, but it should be noted that MDF takes advantage of much 3D information that may be difficult to collect, and the way of reconstructing a virtual frontal view cannot handle small pose variations well. As can be seen from Table 3, Face++ obtains the best recognition performance of all methods. However, the model needs a large amount of extra data to train its large number of parameters [53–55]. For example, the Megvii Face Recognition System [55] achieves an unprecedented accuracy on the LFW database, but a large dataset containing 5 million labeled faces of around 20,000 individuals had to be used to train its ten-layer deep convolutional neural network. Although the performance of our method is slightly lower (1.2%) than Face++, the number of our model parameters and the amount of training data required are relatively small.

CE

4.4.2. Comparison Experiment on the CMU PIE Database For CMU PIE database, we set the remaining 34 frontal face images in C27 as

470

gallery images, and the remaining 34 non-frontal face images in C37, C05, C29 and

AC

C11 are set as test images, respectively. We also compare our results with the Eigen Light-Field (ELF) method [56], LLR [8], the View Synthesis method based on Probabilistic Learning(VS-PL) [57], the three-point Stereo Matching Distance (3ptSMD) [58], MRFs [10], the method based on Partial Least Squares(PLS) [47], OPR [9] and

475

Face++ [52]. The comparison results are shown in Table 4.

24

ACCEPTED MANUSCRIPT

Table 4. Performance comparison with state-of-the-art methods on the CMU PIE database (%).

Methods    | C37  | C05  | C29  | C11  | Avg.
ELF        | 89.0 | 93.0 | 91.0 | 78.0 | 87.8
LLR        | 82.4 | 98.5 | 100  | 89.7 | 92.7
VS-PL      | 86.0 | 88.0 | 91.0 | 86.0 | 87.75
3ptSMD     | 100  | 100  | 100  | 97.0 | 99.3
MRFs       | 97.0 | 100  | 100  | 97.0 | 98.5
PLS        | 100  | 100  | 100  | 100  | 100
OPR        | 100  | 100  | 100  | 100  | 100
Face++     | 100  | 100  | 100  | 100  | 100
Our method | 100  | 100  | 100  | 100  | 100

Methods

From the table 4, we can see that for ELF, LLR and VS-PL, they have a similar trend that their recognition rates will decrease when the pose variation between the gallery image and test image becomes larger. However, as discussed in Section 3.3, because of the ability of RSF and patch-based SRC to deal with rotation, compared 480

with these methods, our method achieved a stable perfect performance under various

M

pose variation angles. Meanwhile, OPR, PLS and Face++ also achieve a perfect average accuracy of 100% same as our method. However, PLS requires prior knowledge of the pose of the probe image. It indicates that PLS will fail when the pose is unknown.

485

ED

As for OPR, pose correction is required during the recognition process. For the proposed method, pose correction is not necessary. Moreover, our method obtains the best

PT

recognition performance just like Face++. However, the recognition process is much simpler than the latter.

CE

4.4.3. Comparison Experiment on the LFW Database

In order to test the effectiveness of our proposed algorithm in an unconstrained environment, we further conduct an experiment on the Labeled Faces in the Wild (LFW) database. The images we used were all aligned by deep funneling [59]. Since only a pair of face images is provided in a face verification setting, our algorithm would be inapplicable there because the dictionary of the SRC-based classifier cannot be established. We therefore do not carry out the traditional face verification experiment on this dataset, but face recognition. We choose a subset of the LFW database, which consists of 200 samples

of 100 subjects. Each subject contains a face image with near frontal pose(used as gallery image) and a face image with non-frontal pose (used as test image), respectively. In addition to variations in pose, there are also other external changes between the gallery image and test image, such as illumination, makeup and expression. Some 500

gallery images and their corresponding test images are shown in Fig. 13. We compare our proposed method (the experimental parameters are the same as the experimental

M

parameters in the FERET database except patch size )with High-fidelity Pose and Expression Normalization (HEPN) [7] combined with the LBP operator, and the results

ED

are shown in table 5.

Table 5 Recognition rates of different methods on the LFW database. Recognition(%)

HEPN+LBP Our method

74.0 80.0

CE

PT

Methods

As can be seen, our method gets a higher recognition rate than HEPN+LBP. This

505

may be attributed to the strong representation capability of the proposed Huffman-LBP.

AC

Moreover, although HEPPN realizes a high-fidelity normalization through 3D transformation, the distortion cannot be avoided. In contrast, our approach does not destroy the original structure of the image, avoiding the interference of the reconstruction residu-

510

als.

26

ACCEPTED MANUSCRIPT

4.5. Computation Time In this subsection, we evaluate the computation time of our method and compare it with other pose-robust methods (3D normalization method: HEPN+LBP; regression

515

CR IP T

based approaches: Mis-alignment Robust Representation (MRR) [60] and Multi-Scale

Patch based Collaborative Representation (MSPCR) [61]; patch based method: Face

Image Classification by Pooling Raw Features (FCPRF) [62]) on the LFW database. A PC with 3.6GHz CPU (i7) and 8GB RAM is used. The programming environment is Matlab R2014b. The experimental setting is the same as the description in section 4.4. It should be noted that the image sizes mentioned in the literatures of these methods are different from each other. For a fair comparison, the size of the image used in all

AN US

520

methods is fixed at 250 × 250 pixels. Correspondingly, we reset the parameters of the

some methods to accommodate the changed image size (the patch scales of MSPCR are set as [8, 23, 38, 53, 68, 83, 98], the patch sizes of FCPRF are set as {10 × 10, 25 ×

25, 40 × 40}). The average computation time per image of all methods is listed in table 6.

ACT(s)

MSPCR MRR 13.2

1.0

10.6

HEPN +LBP 20.3

FP+Huffman-LBP FP+Huffman-LBP +Patch-based SRC +RSF+Patch-based SRC 10.4

9.3

PT

525

FCPRF

ED

Method

M

Table 6 The average computation time (ACT) per image of different methods on the LFW database.

It can be seen that our method (FP+Huffman-LBP+RSF+Patch-based SRC) is the fastest of all methods except MRR, while FP+Huffman-LBP+Patch-based SRC is also fast. What's more, suffering from its complex 3D model fitting process, HEPN+LBP spends a much longer time (nearly 20 seconds per image) on recognition. In addition, both MSPCR and FCPRF use a multi-scale patch extraction strategy, which causes them to be slower than our method, which uses only a single patch scale. Benefitting from its optimized search strategy, MRR has the fastest speed among all methods. From the comparison of FP+Huffman-LBP+Patch-based SRC and FP+Huffman-LBP+RSF+Patch-based SRC in Section 4.3 and here, it can be seen that RSF not only improves the recognition accuracy of the algorithm but also shortens the running time.

540

CR IP T

original face images, which can avoid the problem that the recognition performance

would be affected due to image distortion. Moreover, due to the ability of RSF and patch-based SRC to handle pose variation, for the images with various rotation angles,

the recognition performance of our method is steadier than other general methods. We can also find that RSF has a potential function of roughly evaluating the pose of the face images, so that our method does not require any prior knowledge of the pose. From the

point of the data, there is still a gap between our approach and Face++. However, our

AN US

545

approach is a relatively simple and new attempt at multi-pose face recognition. 5. Conclusions

In this paper, we present a method for multi-pose face recognition. A Divide-

550

M

and-Rule strategy is applied to both face representation and face classification. In the face representation, we first carry out the Divide operation: dividing the input image into different types by evaluating the Region Selection factor (RSF). Next, the im-

ED

age patches centered on interest points are extracted in the selected face region and Huffman-LBP is developed to extract the discriminant information of each patch. In

555

PT

the face classification, a designed classification strategy based on patch-based SRC is used to produce the recognition results. Our method is robust to the pose variation because both representation and classification are carried out at patch level that is

CE

less sensitive to the pose variation. The experimental results on traditional multi-pose databases (the CMU PIE database and FERET database) and unconstrained environ-

AC

ment database (the LFW database) show that our approach outperforms the competitive

560

methods in most cases, validating the effectiveness and good generalization. In future,

we will investigate the proposed method in video-based face analysis.

28

ACCEPTED MANUSCRIPT

6. Acknowledgment

This work was supported in part by the Natural Science Foundation of China (Nos. 61272195, 61472055, 61100114, 61502067, U1401252), the Program for New Century Excellent Talents in University of China (NCET-11-1085), the Chongqing Outstanding Youth Fund (cstc2014jcyjjq40001), the Chongqing Research Program of Application Foundation and Advanced Technology (cstc2012jjA1699, cstc2015jcyjA40013) and the Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ1500417). It is also sponsored by the China Scholarship Council (201407845019) and supported by the Natural Science Foundation of CQ (cstc2015jcyjA40011).

References

570

[1] X. Zhang, Y. Gao, Face recognition across pose: a review, Pattern Recognit. 42 (11) (2009) 2876–2896.
[2] V. Blanz, T. Vetter, Face recognition based on fitting a 3D morphable model, IEEE Trans. Pattern Anal. Mach. Intell. 25 (9) (2003) 1063–1074.
[3] A. Asthana, T. K. Marks, M. J. Jones, K. H. Tieu, M. Rohith, Fully automatic pose-invariant face recognition via 3D pose normalization, in: Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 937–944.
[4] O. Aldrian, W. A. Smith, Inverse rendering of faces with a 3D morphable model, IEEE Trans. Pattern Anal. Mach. Intell. 35 (5) (2013) 1080–1093.
[5] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: closing the gap to human-level performance in face verification, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1701–1708.
[6] T. F. Cootes, G. J. Edwards, C. J. Taylor, Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. 23 (6) (2001) 681–685.
[7] X. Zhu, Z. Lei, J. Yan, D. Yi, S. Z. Li, High-fidelity pose and expression normalization for face recognition in the wild, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2015, pp. 787–796.
[8] X. Chai, S. Shan, X. Chen, W. Gao, Locally linear regression for pose-invariant face recognition, IEEE Trans. Image Process. 16 (7) (2007) 1716–1725.
[9] Y. Tai, J. Yang, Y. Zhang, L. Luo, J. Qian, Y. Chen, Face recognition with pose variations and misalignment via orthogonal Procrustes regression, IEEE Trans. Image Process. 25 (6) (2016) 2673–2683.
[10] H. T. Ho, R. Chellappa, Pose-invariant face recognition using Markov random fields, IEEE Trans. Image Process. 22 (4) (2013) 1573–1584.
[11] H. Zhang, Y. Zhang, T. S. Huang, Pose-robust face recognition via sparse representation, Pattern Recognit. 46 (5) (2013) 1511–1521.
[12] J. Yu, X. Yang, F. Gao, D. Tao, Deep multimodal distance metric learning using click constraints for image ranking, IEEE Trans. Cybern. (2016) 1–11.
[13] J. Du, Y. Xu, Hierarchical deep neural network for multivariate regression, Pattern Recognit. 63 (2017) 149–157.
[14] C. Hong, J. Yu, J. Wan, D. Tao, M. Wang, Multimodal deep autoencoder for human pose recovery, IEEE Trans. Image Process. 24 (12) (2015) 5659–5670.
[15] J. Yu, B. Zhang, Z. Kuang, D. Lin, J. Fan, iPrivacy: image privacy protection by identifying sensitive objects via deep multi-task learning, IEEE Trans. Inf. Forensics Secur. 12 (5) (2017) 1005–1016.
[16] M. Kan, S. Shan, H. Chang, X. Chen, Stacked progressive auto-encoders (SPAE) for face recognition across poses, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1883–1890.
[17] Z. Zhu, P. Luo, X. Wang, X. Tang, Deep learning identity-preserving face space, in: Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 113–120.
[18] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: a unified embedding for face recognition and clustering, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2015, pp. 815–823.
[19] W. Zhang, S. Shan, W. Gao, X. Chen, H. Zhang, Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition, in: Proc. IEEE Int. Conf. Comput. Vis., Vol. I, 2005, pp. 786–791.
[20] L. Wiskott, J. M. Fellous, N. Krüger, C. Von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 775–779.
[21] J. Wang, J. You, Q. Li, Y. Xu, Orthogonal discriminant vector for face recognition across pose, Pattern Recognit. 45 (12) (2012) 4069–4079.
[22] S. Sanyal, S. P. Mudunuri, S. Biswas, Discriminative pose-free descriptors for face and object matching, Pattern Recognit. 67 (2017) 353–365.
[23] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognit. 29 (1) (1996) 51–59.
[24] D. Huang, Y. Wang, Y. Wang, A robust method for near infrared face recognition based on extended local binary pattern, in: Lect. Notes Comput. Sci., Vol. 4842, 2007, pp. 437–446.
[25] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Trans. Image Process. 19 (6) (2010) 1635–1650.
[26] M. A. Akhloufi, A. Bendada, Locally adaptive texture features for multispectral face recognition, in: Proc. IEEE Int. Conf. Syst. Man Cybern., 2010, pp. 3308–3314.
[27] H. X. Chen, Y. Y. Tang, B. Fang, P. S. Wang, A multi-layer contrast analysis method for texture classification based on LBP, Int. J. Pattern Recognit. Artif. Intell. 25 (1) (2011) 147–155.
[28] C. Zhu, R. Wang, Local multiple patterns based multiresolution gray-scale and rotation invariant texture classification, Inf. Sci. 187 (1) (2012) 93–108.
[29] W. Huang, H. Yin, Robust face recognition with structural binary gradient patterns, Pattern Recognit. 68 (2017) 126–140.
[30] Z. Guo, X. Wang, J. Zhou, J. You, Robust texture image representation by scale selective local binary patterns, IEEE Trans. Image Process. 25 (2) (2016) 687–699.
[31] X. T. Yuan, X. Liu, S. Yan, Visual classification with multitask joint sparse representation, IEEE Trans. Image Process. 21 (10) (2012) 4349–4360.
[32] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 210–227.
[33] C. H. Chan, J. Kittler, Sparse representation of (multiscale) histograms for face recognition robust to registration and illumination problems, in: Proc. IEEE Int. Conf. Image Process., 2010, pp. 2441–2444.
[34] C. Kang, S. Liao, S. Xiang, C. Pan, Kernel sparse representation with pixel-level and region-level local feature kernels for face recognition, Neurocomputing 133 (2014) 141–152.
[35] Z. Zhang, P. Luo, C. C. Loy, X. Tang, Learning deep representation for face alignment with auxiliary attributes, IEEE Trans. Pattern Anal. Mach. Intell. 38 (5) (2016) 918–930.
[36] C.-Y. Lee, B.-S. Chen, Mutually-exclusive-and-collectively-exhaustive feature selection scheme, Appl. Soft Comput. (2017) 1–11.
[37] T. Ahonen, A. Hadid, M. Pietikäinen, Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (12) (2006) 2037–2041.
[38] R. Min, J. L. Dugelay, Improved combination of LBP and sparse representation based classification (SRC) for face recognition, in: Proc. IEEE Int. Conf. Multimed. Expo, 2011, pp. 1–6.
[39] D. L. Donoho, For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59 (6) (2006) 797–829.
[40] Q. Zhang, L.-F. Zhou, Y. Y. Tang, W.-S. Li, K. Ricanek, X.-Y. Li, Face detection method based on histogram of sparse code in tree deformable model, in: Proc. Int. Conf. Mach. Learn. Cybern., 2016, pp. 996–1002.
[41] X. Kavousianos, E. Kalligeros, D. Nikolos, Optimal selective Huffman coding for test-data compression, IEEE Trans. Comput. 56 (8) (2007) 1146–1152.
[42] J. Zhang, Y. Deng, Z. Guo, Y. Chen, Face recognition using part-based dense sampling local features, Neurocomputing 184 (2016) 176–187.
[43] P. J. Phillips, H. Moon, S. A. Rizvi, P. J. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 22 (10) (2000) 1090–1104.
[44] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Trans. Pattern Anal. Mach. Intell. 25 (12) (2003) 1615–1618.
[45] G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller, Labeled faces in the wild: a database for studying face recognition in unconstrained environments, Tech. rep. (2007).
[46] J. Zou, Q. Ji, G. Nagy, A comparative study of local matching approach for face recognition, IEEE Trans. Image Process. 16 (10) (2007) 2617–2628.
[47] A. Sharma, D. W. Jacobs, Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 593–600.
[48] A. Li, S. Shan, W. Gao, Coupled bias-variance tradeoff for cross-pose face recognition, IEEE Trans. Image Process. 21 (1) (2012) 305–315.
[49] A. Sharma, M. A. Haj, J. Choi, L. S. Davis, D. W. Jacobs, Robust pose invariant face recognition using coupled latent space discriminant analysis, Comput. Vis. Image Underst. 116 (11) (2012) 1095–1110.
[50] S. Li, X. Liu, X. Chai, H. Zhang, S. Lao, S. Shan, Maximal likelihood correspondence estimation for face recognition across pose, IEEE Trans. Image Process. 23 (10) (2014) 4587–4600.
[51] A. B. Ashraf, S. Lucey, T. Chen, Learning patch correspondences for improved viewpoint invariant face recognition, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2008, pp. 1–8.
[52] Megvii, Face++ Research Toolkit, www.faceplusplus.com.
[53] H. Fan, Z. Cao, Y. Jiang, Q. Yin, C. Doudou, Learning deep face representation, arXiv preprint arXiv:1403.2802.
[54] H. Fan, M. Yang, Z. Cao, Y. Jiang, Q. Yin, Learning compact face representation: packing a face into an int32, in: Proc. ACM Conf. Multimedia, 2014, pp. 933–936.
[55] E. Zhou, Z. Cao, Q. Yin, Naive-deep face recognition: touching the limit of LFW benchmark or not?, arXiv preprint arXiv:1501.04690.
[56] R. Gross, I. Matthews, S. Baker, Appearance-based face recognition and light-fields, IEEE Trans. Pattern Anal. Mach. Intell. 26 (4) (2004) 449–465.
[57] M. Saquib Sarfraz, O. Hellwich, Probabilistic learning for fully automatic face recognition across pose, Image Vis. Comput. 28 (5) (2010) 744–753.
[58] C. D. Castillo, D. W. Jacobs, Using stereo matching with general epipolar geometry for 2D face recognition across pose, IEEE Trans. Pattern Anal. Mach. Intell. 31 (12) (2009) 2298–2304.
[59] G. B. Huang, M. A. Mattar, H. Lee, E. Learned-Miller, Learning to align from scratch, Adv. Neural Inf. Process. Syst. 1 (2012) 764–772.
[60] M. Yang, L. Zhang, D. Zhang, Efficient misalignment-robust representation for real-time face recognition, in: Eur. Conf. Comput. Vis., 2012, pp. 850–863.
[61] P. Zhu, L. Zhang, Q. Hu, S. C. K. Shiu, Multi-scale patch based collaborative representation for face recognition with margin distribution optimization, in: Eur. Conf. Comput. Vis., 2012, pp. 822–835.
[62] F. Shen, C. Shen, X. Zhou, Y. Yang, H. T. Shen, Face image classification by pooling raw features, Pattern Recognit. 54 (2016) 94–103.

Li-Fang Zhou was born in Tianshui, Gansu Province, China. She received her M.S. degree from the Chongqing University of Posts and Telecommunications in July 2007 and her Ph.D. degree from Chongqing University in December 2013. She is currently an associate professor at the Chongqing University of Posts and Telecommunications. Her research focuses on pattern recognition and machine vision.

Yue-Wei Du was born in Shanxi Province, China, in 1992. He received his B.S. degree from Changzhi College, Shanxi, P.R. China. He is currently an M.S. candidate in computer technology with the Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications. His research interests include image processing and pattern recognition.

Wei-Sheng Li graduated from the School of Electronics & Mechanical Engineering at Xidian University, Xi'an, China, in July 1997. He received his M.S. and Ph.D. degrees from the School of Electronics & Mechanical Engineering and the School of Computer Science & Technology at Xidian University in July 2000 and July 2004, respectively. He is currently a professor at the Chongqing University of Posts and Telecommunications. His research focuses on intelligent information processing and pattern recognition.

Jian-Xun Mi received his B.S. degree in Automation from Sichuan University (SCU), Chengdu, China, in 2004 and his Ph.D. degree in Pattern Recognition & Intelligent Systems from the University of Science and Technology of China (USTC), Hefei, China, in 2010. He worked at the Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China, as a Postdoctoral Research Fellow from Sept. 2011 to Sept. 2013. He is now an associate professor at the Chongqing University of Posts and Telecommunications, Chongqing, China.

Xiao Luan received his B.S. degree in Information and Computational Science from Henan University, Kaifeng, China, and his Ph.D. degree in Computer Science and Technology from Chongqing University, Chongqing, China. He is currently an associate professor with the College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China. His research interests include pattern recognition and image processing.