Pseudo-full-space representation based classification for robust face recognition


PII: S0923-5965(17)30159-5
DOI: https://doi.org/10.1016/j.image.2017.09.006
Reference: IMAGE 15279

To appear in: Signal Processing: Image Communication

Received date: 30 March 2017
Revised date: 1 August 2017
Accepted date: 13 September 2017

Please cite this article as: X. Yang, F. Liu, L. Tian, H. Li, X. Jiang, Pseudo-full-space representation based classification for Robust Face Recognition, Signal Processing: Image Communication (2017), https://doi.org/10.1016/j.image.2017.09.006


Pseudo-Full-Space Representation Based Classification for Robust Face Recognition

Xiaohui Yang (a,*), Fang Liu (b), Li Tian (a), Haifei Li (a), Xiaoying Jiang (a)

(a) Institute of Applied Mathematics, Data Analysis Technology Lab, School of Mathematics and Statistics, Henan University, Kaifeng 475004, China
(b) Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi’an 710071, China

Abstract: Sparse representation based classification shows significant performance on face recognition (FR) when enough training samples per subject are available. However, FR often suffers from insufficient training samples. To tackle this problem, a novel classification technique is presented that utilizes existing available samples rather than constructing auxiliary training samples. An inverse projection-based pseudo-full-space representation (PFSR) is first proposed to stably and effectively exploit the complementary information between samples. The representation ability of sparse representation-based methods is quantified by defining a category concentration index. To match PFSR and complete classification, a simple classification criterion, the category contribution rate, is designed. Extensive experiments on the AR, Extended Yale B and CMU Multi-PIE databases demonstrate that the PFSR-based classification method is competitive and robust for the FR problem with insufficient training samples.

Key words: sparse representation; pseudo-full-space representation; category concentration index; category contribution rate; face recognition

* Corresponding author. E-mail: [email protected]

1. Introduction

Face recognition (FR) has been a well-studied problem despite its many inherent

difficulties, such as limited training samples, varying illumination, occlusion and pose. In general, FR methods for static frontal faces are divided into three categories: statistics-based methods (Eigenfaces and hidden Markov models, etc.), connectionist/neural-network techniques (artificial neural networks and elastic matching, etc.) and other synthesis methods [1]. Standard sparse representation is an effective recognition technique based on an over-complete dictionary without learning. Sparse representation based classification (SRC) has been successfully applied to robust FR [2] when there are sufficient training samples per subject. FR, however, often suffers from a small number of available training samples [3], even one sample per person [4]. Based on the idea of SRC, many improvements have been made. Some research focuses on dictionary construction and analysis. L. Zhang [5][6] proved that it is collaborative representation, rather than the sparsity constraint, that plays the key role in SRC. Z. Hu [7] introduced a novel dictionary building method based on sparse representation, considering the relationship between dictionary atoms and category labels. Y. Xu [8] integrated conventional and inverse representation to better recognize a face by constructing symmetric virtual samples, where the inverse representation projects each training sample into a space consisting of a test sample and the training samples of the other categories. Making use of low-rank structural information, J. Yang [9] presented a two-dimensional image-matrix-based error model for face representation and classification. Y. Su [10] put forward an adaptive generic learning (AGL) model for single sample per person. Other works mainly study feature representation. M. Yang [11] presented a


Gabor-feature-based robust representation and classification. M. Yang [12] replaced Eigenfaces in SRC with statistical local features. P. Zhu [13] proposed an image-set-based collaborative representation for FR. G. Grossi [14][15] significantly promoted the ability of sparse coding by exploiting local-feature and multi-feature representations. W. Deng [16] presented expanded SRC (ESRC) via an intraclass variant dictionary for undersampled FR. S. Gao [17] gave a patch-based SRC (PSRC) algorithm, similar to patch-based CRC (PCRC) in [18]. Introducing learning strategies into SRC also achieves good results. Z. Feng [19] proposed training a dictionary through joint discriminative dimensionality reduction and dictionary learning for FR. L. Liu [20] proposed a sparse representation-based partial FR approach by virtue of supervised dictionary learning. Recently, deep-learning-based methods have proved effective for FR [21][22][23][24]; it should be noted, however, that the success of deep learning relies on big data, complex network structures and advanced hardware. In SRC, the $l_1$-norm constraint is used to depict sparsity. Stimulated by the success of sparsity, researchers have developed various models for FR [25]. X. Shi [26] and Z. Lai [27] developed a novel regression model integrating the manifold structure to learn an approximately orthogonal sparse projection. Z. Lai [28] extended multilinear discriminant analysis to a sparse case by introducing $L_1$- and $L_2$-norms. Z. Lai [29] relaxed multilinear principal component analysis [30] for sparse regression from tensor data. X. Shi [31] replaced the elastic net in sparse discriminant analysis with an $l_{2,1}$-norm penalty term for FR. However, the $l_1$-norm


sparsity-constraint-based classification is time consuming [5]. To tackle this problem, many fast algorithms have been proposed [32], [33], [34]. Replacing the $l_1$-norm sparsity constraint with an $l_2$-norm constraint is another commonly used strategy [5]. Based on $l_1$-norm and $l_2$-norm constraints, Y. Meng [35] proposed a new face coding model named regularized robust coding (RRC) and gave an iteratively reweighted algorithm to solve the RRC model effectively. In general, these methods are all based on standard sparse representation and may suffer from requiring sufficient training samples, being less able to take advantage of test samples, or requiring high computational complexity to solve the representation model. Moreover, two facts should be noted: first, all the sparse representation based works represent each test sample by the training samples, except [8]; second, the classification criterion matched to standard SRC is the minimum reconstruction error. From another point of view, in semi-supervised discriminant analysis [36], the labeled data, combined with the unlabeled data, are used to build a graph incorporating neighborhood information of the data set: the labeled data are used to infer the discriminative structure, while the intrinsic geometric structure is inferred from both labeled and unlabeled samples. All this shows that unlabeled samples help greatly in exploring the intrinsic structure of data. It is therefore useful and interesting to ask how to make full use of existing labeled and unlabeled samples to obtain the best representation. Motivated by these works, a simple but effective technique, pseudo-full-space


representation (PFSR) based classification (PFSRC), is proposed for robust FR. We restrict our attention to representation without learning and focus on utilizing existing available samples rather than constructing auxiliary training samples. The main contributions of our work are as follows. (1) An improved inverse-projection-based sparse representation method, named PFSR, is proposed to enlarge the representation space as much as possible and to exploit the complementary information contained in available samples. (2) A statistical index, the category concentration index (CCI), is defined to quantify the representation ability of representation-based methods. (3) A simple and robust classification decision rule, the category contribution rate (CCR), is designed to match PFSR and complete the classification. The rest of this paper is organized as follows. Section 2 briefly reviews SRC, then introduces and theoretically analyzes PFSRC. Section 3 conducts extensive experiments exploring the performance of our technique on three publicly available benchmark face databases. Finally, conclusions are drawn in Section 4.

2. Pseudo-full-space representation based classification

2.1 Sparse representation based classification

Suppose

$X = [X_1, \dots, X_i, \dots, X_c] \in \mathbb{R}^{m \times s_c}$ is a training sample set, where $X_i = [x_{s_{i-1}+1}, \dots, x_{s_i}] \in \mathbb{R}^{m \times (s_i - s_{i-1})}$ contains the samples of the $i$-th category, $i = 1, \dots, c$, and $c$ is the number of categories. $Y = [y_1, y_2, \dots, y_k] \in \mathbb{R}^{m \times k}$ is a test sample set. In SRC, each test sample $y_l \in \mathbb{R}^m$ can be linearly represented by all training samples:

$$y_l = \alpha_{l,1} x_1 + \dots + \alpha_{l,s_c} x_{s_c} = \alpha_l X, \qquad (1)$$

where  l  R n are the corresponding coefficient vector. For seeking the sparsest solution to Eq. (1), we can solve the following optimization problem,

arg min || l ||0 ,

s. t. yl   l X ,

l

(2)

where ||  ||0 denotes the l0 -norm. l0 -minimization problem is non-convex and NP-hard. According to the theory of compressive sensing and sparse representation, l0 - minimization problem can be relaxed to a convex optimization problem, that is, l1 -minimization problem.

arg min || l ||1 ,

s. t. yl   l X ,

l

(3)

where ||  ||1 denotes the l1 -norm. Tao and Candès [37] has proved that under the condition of restricted isometry constants, the l0 -minimization problem and the l1 -minimization problem has similar solution. The l1 -minimization problem can be

transferred into non-condition l1 -minimization problem,



ˆl  arg min yl   l X 2    l l

2

1

.

(4)

Eq. (4) can be solved by l1 -regularized least square approximation. Let  i : R n  R n be a characteristic function that selects coefficients associated with the i -th category, where n denotes the dimension of coefficient vector. According to reconstruction error ei ( yl ) || yl   i (ˆl ) X ||2 , yl can be classified, i.e.,

Identity( yl ) a r g meii n. { }

(5)

i
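As a concrete sketch of the SRC pipeline above, the following Python fragment solves Eq. (4) with a simple proximal-gradient (ISTA) loop and then applies the per-category residual rule of Eq. (5). The synthetic data, the lambda value and the iteration count are illustrative assumptions, not the paper's settings; the code uses the column-dictionary convention $y \approx X\alpha$:

```python
import numpy as np

def ista_l1(X, y, lam=0.01, iters=500):
    """Approximate Eq. (4): min_a ||y - X a||_2^2 + lam * ||a||_1 via ISTA."""
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz scale of the smooth part
    a = np.zeros(X.shape[1])
    for _ in range(iters):
        z = a - X.T @ (X @ a - y) / L        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # soft threshold
    return a

def src_classify(X, labels, y, lam=0.01):
    """Eq. (5): keep each category's coefficients in turn and pick the
    category with the smallest reconstruction error."""
    a = ista_l1(X, y, lam)
    labels = np.asarray(labels)
    errs = {c: np.linalg.norm(y - X @ np.where(labels == c, a, 0.0))
            for c in np.unique(labels)}
    return min(errs, key=errs.get)

# toy data: 3 categories, 5 samples each, scattered around random class centers
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 50))
X = np.column_stack([centers[c] + 0.05 * rng.normal(size=50)
                     for c in range(3) for _ in range(5)])
labels = [c for c in range(3) for _ in range(5)]
y = centers[1] + 0.05 * rng.normal(size=50)  # a test sample from category 1
```

With near-orthogonal random class centers the coefficient mass concentrates on category 1's atoms, so `src_classify(X, labels, y)` recovers the correct category here.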

Next, the stability of SRC is analyzed. Let $e_{l,i} = y_l - \delta_i(\hat{\alpha}_l) X$. Suppose $X_i, X_j \in \mathbb{R}^{m \times n}$ and $X_j = X_i + \delta(X_i)$, or $y_l$ has a disturbance $\delta(y_l)$. If $\delta(X_i)$ or $\delta(y_l)$ is small, then

$$\varepsilon = \max\left\{ \frac{\|\delta(X_i)\|_2}{\|X_i\|_2}, \frac{\|\delta(y_l)\|_2}{\|y_l\|_2} \right\} \ll \frac{\sigma_n(X_i)}{\sigma_1(X_i)}, \qquad (6)$$

where $\sigma_1(X_i)$ and $\sigma_n(X_i)$ are the largest and smallest singular values of $X_i$, respectively. The relationship between $e_{l,i}$ and $e_{l,j}$ can be written as [38]

$$\frac{\|e_{l,j} - e_{l,i}\|_2}{\|y_l\|_2} \le \varepsilon \left(1 + \kappa_2(X_i)\right) \min\{1, m - n\} + O(\varepsilon^2), \qquad (7)$$

where $\kappa_2(X_i)$ is the $l_2$-norm condition number of $X_i$. From Eq. (7), one can see that if $\varepsilon$ is small, i.e., $X_i$ and $X_j$ look similar to each other, then the distance between $e_{l,i}$ and $e_{l,j}$ can be very small, which makes classification unstable: a small disturbance can lead to $\|e_{l,j}\|_2 \le \|e_{l,i}\|_2$.
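This instability can be seen in a few lines of Python. The dimensions and noise levels below are arbitrary illustrative choices: a category $X_j$ that is a near-duplicate of $X_i$ yields per-category least-squares residuals that are almost indistinguishable, so the arg-min decision of Eq. (5) hinges on a tiny perturbation:

```python
import numpy as np

rng = np.random.default_rng(2)
Xi = rng.normal(size=(30, 5))                # category i: 5 samples in R^30
Xj = Xi + 1e-8 * rng.normal(size=Xi.shape)   # category j: a near-duplicate of i
y = Xi @ rng.normal(size=5) + 1e-3 * rng.normal(size=30)  # noisy sample from i

# per-category least-squares residuals (the e_{l,i} of the text)
ei = np.linalg.norm(y - Xi @ np.linalg.lstsq(Xi, y, rcond=None)[0])
ej = np.linalg.norm(y - Xj @ np.linalg.lstsq(Xj, y, rcond=None)[0])
gap = abs(ei - ej) / ei   # relative gap between the two residuals
```

Here `gap` comes out orders of magnitude below 1, so a disturbance of comparable size can flip the minimum-residual decision between the two categories.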

2.2 Pseudo-full-space representation

Let us begin the main discussion with the problem of insufficient training data. An intuition is that we can enrich the representation by integrating the available test samples into the sparse representation space.

2.2.1 Pseudo-full-space

PFSR aims to seek a representation space as large as possible. As stated in Subsection 2.1, the training and test sample spaces are $X$ and $Y$, respectively. The space $\{X, Y\}$, which contains all training samples and test samples, is called the full space. The largest representation space of a sample $x_j$ is then

$$V_j = \{X, Y\} \setminus \{x_j\}, \quad j = 1, \dots, s_c.$$

Naturally, $V_j$ is just the full space without the training sample $x_j$ itself, and is called the pseudo-full-space of $x_j$. Obviously, $V_j$ provides richer information than the training sample space because of the addition of the test samples.

2.2.2 Pseudo-full-space representation

PFSR means that a training sample $x_j$ from a category $i$ is represented over its corresponding pseudo-full-space:

$$x_j = \alpha_{i,1} x_1 + \dots + \alpha_{i,j-1} x_{j-1} + \alpha_{i,j+1} x_{j+1} + \dots + \alpha_{i,s_c} x_{s_c} + \beta_{j,1} y_1 + \dots + \beta_{j,k} y_k, \qquad (8)$$

where  i , s  R and  j ,l  R are the corresponding coefficients before training samples s  1, 2,

and

test

samples

, sc , s  j , l  1, 2,

Let Aj   , i , j 1 , i , j 1 ,

respectively,

i  1, 2,

,c

,

j  1, 2,

, sc

,

,k .

,  j ,1 ,

T

,  j ,k  , x j can be rewritten as x j  A jV j .

(9)

All training samples can be linearly represented as, X  AV ,

(10)

where V  [V j ] and A  [ Aj ] are the pseudo-full-space and the corresponding coefficient matrix respectively. It is worth mentioning that the PFSR, from the view of the highest utilization rate of samples, looks a bit like leave one out technique. However, there is an essential difference between PFSR and leave one out technique. Leave one out technique is an almost unbiased statistical estimator of the performance of a learning algorithm and is frequently used for model selection. PFSR is a sparse representation technique, which focuses on improving classification ability by taking full use of limited training samples and other available samples comes from unknown categories (easy to get). Here, the so-called test sample space in pseudo-full-space may have different

8

scenarios: (1) if the test sample space is null, pseudo-full-space is just the training sample space. (2) the test sample space may contain unlabeled samples which come from totally different categories with those of training samples. (3) the test sample space may contain some unlabeled samples which have the same categories with those of training samples, while others come from different categories. An important fact of the effect of the proposed method is that we can easily obtain a test sample and some unlabeled samples for assisted recognition at the same time, no matter which categories these unlabeled samples come from. The performance of PFSR in different cases will be demonstrated in Subsection 3.3. The regularized least square method is used to calculate the coefficients of PFSR. Aˆ j  arg min Aj

where 

 x  AV j

j

2 j 2

  Aj

2 2

,

(11)

is regularization parameter. Projection coefficient matrix can be

analytically derived as, A  (V TV   I )1V T X .

(12)
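The inverse projection of Eqs. (8) to (12) can be sketched in a few lines of NumPy: each training sample is regressed, with ridge regularization, on its own pseudo-full-space. The matrix dimensions below are made-up test values:

```python
import numpy as np

def pfsr_coefficients(X, Y, lam=1e-3):
    """Eq. (11)/(12) per column: represent each training sample x_j over
    V_j = [all other training samples | all test samples] via ridge regression.
    Returns a list of coefficient vectors A_j (each of length s-1+k)."""
    m, s = X.shape
    k = Y.shape[1]
    coeffs = []
    for j in range(s):
        Vj = np.column_stack([np.delete(X, j, axis=1), Y])  # pseudo-full-space of x_j
        G = Vj.T @ Vj + lam * np.eye(s - 1 + k)
        coeffs.append(np.linalg.solve(G, Vj.T @ X[:, j]))   # (V^T V + lam I)^{-1} V^T x_j
    return coeffs

# toy check: with more atoms than pixels, x_j is reconstructed almost exactly
rng = np.random.default_rng(3)
X, Y = rng.normal(size=(10, 6)), rng.normal(size=(10, 8))
A = pfsr_coefficients(X, Y)
```

The last k entries of each coefficient vector are the test-sample coefficients $\beta_{j,l}$ on which the classification criterion of Subsection 2.3 is built.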

Comparing Eq. (1) and Eq. (8), one can see that the pseudo-full-space does enrich the representation space by introducing the test samples. As a result, a sample obtains a more comprehensive representation: when we consider a test sample, the other test samples in the pseudo-full-space provide complementary information that standard sparse representation does not take into account. Experiments in Subsection 3.2 demonstrate the performance of the proposed PFSR. Assuming the training set has been arranged in the order of categories, Fig. 1 gives a motivating example. From Fig. 1(a), one can see that in standard sparse representation each test sample is linearly represented by all training samples, while PFSR projects each training sample onto the pseudo-full-space with an inverse projection approach, as shown in Fig. 1(b).

Fig. 1. Comparison of representations: (a) standard sparse representation, where each test sample $y_l$ is expanded over all training samples; (b) PFSR, where each training sample $x_j$ is expanded over the remaining training samples and all test samples.

The feasibility of the proposed PFSR is shown as follows. For ease of analysis, the $l_2$-norm regularization term in Eq. (11) is removed, and the representation becomes a least squares problem: $\hat{A}_j = \arg\min_{A_j} \|x_j - V_j A_j\|_2^2$. Write $V_j = [V_{1,j}, V_{2,j}, \dots, V_{c,j}]$, where $V_{i,j}$ ($i = 1, \dots, c$) collects all samples of the $i$-th category in $V_j$. The associated representation $\hat{x}_j = \sum_i V_{i,j} \delta_i(\hat{A}_j)$ is the perpendicular projection of $x_j$ onto the space spanned by $V_j$. With $e_i = \|x_j - V_{i,j} \delta_i(\hat{A}_j)\|_2^2$, it can readily be derived that

$$e_i = \|x_j - V_{i,j} \delta_i(\hat{A}_j)\|_2^2 = \|x_j - \hat{x}_j\|_2^2 + \|\hat{x}_j - V_{i,j} \delta_i(\hat{A}_j)\|_2^2.$$

Obviously, it is the quantity $e_i^* = \|\hat{x}_j - V_{i,j} \delta_i(\hat{A}_j)\|_2^2$ that matters, because $\|x_j - \hat{x}_j\|_2^2$ is a constant over all categories.


Fig. 2. Geometric illustration of the representation of $x_j$ over $V_j$: $\hat{x}_j$ is the perpendicular projection of $x_j$ onto the space spanned by $V_j$, and $e_i^*$ depends on the angles between $\chi_i$, $\bar{\chi}_i$ and $\hat{x}_j$.

Denote $\chi_i = V_{i,j} \delta_i(\hat{A}_j)$ and $\bar{\chi}_i = \sum_{m \ne i} V_{m,j} \delta_m(\hat{A}_j)$. Fig. 2 shows the geometric representation of $x_j$ over $V_j$. Since $\bar{\chi}_i$ is parallel to $\hat{x}_j - V_{i,j} \delta_i(\hat{A}_j)$, one has

$$\|\hat{x}_j\|_2 \sin(\chi_i, \bar{\chi}_i) = \|\hat{x}_j - V_{i,j} \delta_i(\hat{A}_j)\|_2 \sin(\chi_i, \hat{x}_j),$$

where $(\chi_i, \bar{\chi}_i)$ is the angle between $\chi_i$ and $\bar{\chi}_i$, and $(\chi_i, \hat{x}_j)$ is the angle between $\chi_i$ and $\hat{x}_j$. Finally, the representation error can be written as

$$e_i^* = \|\hat{x}_j - V_{i,j} \delta_i(\hat{A}_j)\|_2^2 = \left( \frac{\sin(\chi_i, \hat{x}_j)}{\sin(\chi_i, \bar{\chi}_i)} \right)^2 \|\hat{x}_j\|_2^2. \qquad (13)$$

Eq. (13) shows that the collaborative representation in the PFSR model not only checks whether $\sin(\chi_i, \hat{x}_j)$ is small, but also whether $\sin(\chi_i, \bar{\chi}_i)$ is large. Such "double checking" makes PFSR more effective and robust.

2.2.3 Measurement of representation ability: category concentration index

As reported for SRC [2], the minimal reconstruction error criterion implies that the larger the proportion of one category's coefficients among all coefficients, the better the representation. Inspired by this, a statistical index, the CCI, is introduced to measure the representation ability of the coefficient vectors of sparse representation-based methods.

Definition (Category Concentration Index, CCI). Suppose $\alpha_l \in \mathbb{R}^n$ is a coefficient vector over the training space, $\delta_i$ is the characteristic function, and $\delta_i(\alpha_l) \in \mathbb{R}^n$ is a vector

whose nonzero entries are the entries in $\alpha_l$ associated with the $i$-th category. The CCI of $\alpha_l$ is defined as

$$\mathrm{CCI}(\alpha_l) = \frac{\max_i \|\delta_i(\alpha_l)\|_1}{\|\alpha_l\|_1}. \qquad (14)$$
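Eq. (14) translates directly into code. The following helper is an illustrative sketch (the `labels` argument, which assigns a category to each coefficient, is our own convention): it returns 1 when all coefficient mass sits on one category and roughly $1/c$ when the mass is spread evenly over $c$ categories:

```python
import numpy as np

def cci(alpha, labels):
    """Category Concentration Index (Eq. (14)): the largest per-category
    l1-mass of the coefficient vector divided by its total l1-norm."""
    alpha = np.abs(np.asarray(alpha, dtype=float))
    labels = np.asarray(labels)
    per_cat = [alpha[labels == c].sum() for c in np.unique(labels)]
    return max(per_cat) / alpha.sum()
```

For example, `cci([1, 1, 0, 0], [0, 0, 1, 1])` is 1.0 (all mass on category 0), while `cci([1, 0, 1, 0], [0, 0, 1, 1])` is 0.5 (mass split evenly).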

The larger the CCI is, the better the representation. In order to use the CCI to compare the representation ability of PFSR and standard sparse representation, we randomly select a training sample $x_j \in \mathbb{R}^m$ and conduct PFSR (Eq. (8)) and standard sparse representation (Eq. (15)):

$$x_j = \alpha_1 x_1 + \dots + \alpha_{j-1} x_{j-1} + \alpha_{j+1} x_{j+1} + \dots + \alpha_{s_c} x_{s_c}. \qquad (15)$$

The difference between Eq. (8) and Eq. (15) is whether the representation space contains the test samples. Let $\alpha = (\alpha_1, \dots, \alpha_{j-1}, \alpha_{j+1}, \dots, \alpha_{s_c})$ denote the coefficients of Eq. (15) and $\beta = (\alpha_{i,1}, \dots, \alpha_{i,j-1}, \alpha_{i,j+1}, \dots, \alpha_{i,s_c}, \beta_{j,1}, \dots, \beta_{j,k})$ those of Eq. (8); $\mathrm{CCI}(\alpha)$ and $\mathrm{CCI}(\beta)$ are then compared to assess the representation ability of

standard sparse representation and PFSR. Detailed examples and experimental results are given in Subsection 3.3.

2.3 Pseudo-full-space representation based classification

2.3.1 Classification criterion: category contribution rate

For a test sample $y_l$, classification in SRC is based on the coefficients $\alpha_l$, while in PFSR we focus on the representation coefficients $\beta_{j,l}$ in Eq. (8), as can also be seen from Fig. 1. Note that the conventional classification criterion, minimum reconstruction error, no longer works, since the test samples in the representation dictionary are unlabeled. To address this difficulty, a simple and robust decision rule, the CCR, is defined to match PFSR and complete the classification. Based on the projection matrix $A$ in Eq. (12), the CCR is defined as follows: by calculating the relevancy between a test sample and every category, the CCRs are obtained, and the test sample is classified into the category with the maximal CCR.

Definition (Category Contribution Rate, CCR). For a test sample $y_l$, the projection coefficients of every category in the projection coefficient matrix $A$ are normalized by Eq. (16), giving the CCR matrix $C_{i,l}$, $i \in \{1, 2, \dots, c\}$, of $y_l$ for all categories:

$$C_{i,l} = \frac{\|\delta_i(\beta_l)\|_2}{\sum_{i=1}^{c} \|\delta_i(\beta_l)\|_2}, \qquad (16)$$

where $\beta_l = (\beta_{1,l}, \dots, \beta_{s_c,l})^T$ is the column vector of representation coefficients corresponding to $y_l$, and $\delta_i(\cdot)$ is a vector whose entries are zero except for those associated with the $i$-th category. The coefficients of every category are normalized using the $l_2$-norm to eliminate the effect of training sample sizes, which may differ across categories. The CCR matrix $[C_{i,l}]$, $i = 1, \dots, c$, $l = 1, \dots, k$, for all test samples is then calculated. The CCR indicates the relevance between a test sample and a category: the larger the CCR, the stronger the relevance. The test sample $y_l$ is classified into the $m$-th category with the maximal CCR:

$$m = \arg\max_{i \in \{1, \dots, c\}} C_{i,l}. \qquad (17)$$

Moreover, the CCR matrix $[C_{i,l}]$ can be seen as a membership degree matrix of all categories for each test sample; in this way the CCRs of all test samples are calculated simultaneously. To fully verify the performance of the CCR, the following work considers the relationship not only between a test sample and all categories, but also between all samples in the pseudo-full-space and a certain category (demonstrated in Subsection 3.4). Based on the CCR, there are the following obvious facts: if a test sample belongs to a subject present in the training set, the maximum CCR should be significantly larger than the others; otherwise, there should be no obvious difference between the maximal CCR and the other CCRs. Thus, based on the CCRs, we can define a statistical measure of category sparsity and use it to reject outliers.

Definition (Category Sparsity Index, CSI). Suppose $\mathrm{CCR}_{best}^1$ is the maximal CCR, corresponding to the most suitable category, and $\mathrm{CCR}_{best}^2$ is the second maximal CCR, corresponding to the hypo-suitable one. The CSI of a test sample is defined as

$$\mathrm{CSI}(\cdot) = \frac{\mathrm{CCR}_{best}^2(\cdot)}{\mathrm{CCR}_{best}^1(\cdot)} \in [0, 1]. \qquad (18)$$

If $\mathrm{CSI}(\cdot) = 0$, the test image is represented using images from a single subject only; if $\mathrm{CSI}(\cdot) = 1$, the coefficients are not concentrated on any individual subject but are spread evenly over the categories. We can therefore choose a threshold $\tau \in [0, 1]$ and accept a test image as valid if

$$\mathrm{CSI}(\cdot) \le \tau, \qquad (19)$$

and otherwise reject it as invalid. The corresponding experiments are given in Subsection 3.5.4.

2.3.2 Stability analysis of PFSRC

The stability of the proposed PFSRC is analyzed theoretically as follows.

Theorem (Classification Stability of PFSRC). Suppose $x_{j_2} = x_{j_1} + \delta(x_{j_1})$, or the corresponding pseudo-full-space has a disturbance, i.e., $V_{j_2} = V_{j_1} + \delta(V_{j_1})$, with corresponding PFSRs $x_{j_1} = V_{j_1} A_{j_1}$ and $x_{j_2} = V_{j_2} A_{j_2}$. If

$$\varepsilon = \max\left\{ \frac{\|\delta(x_{j_1})\|_2}{\|x_{j_1}\|_2}, \frac{\|\delta(V_{j_1})\|_2}{\|V_{j_1}\|_2} \right\} \ll \frac{\sigma_n(V_{j_1})}{\sigma_1(V_{j_1})}$$

and $\sin(\theta) = \|\rho_{LS}\|_2 / \|x_{j_1}\|_2 < 1$, where $\rho_{LS} = V_{j_1} A_{LS} - x_{j_1}$ and $A_{LS} = \arg\min \|x_{j_1} - V_{j_1} A_{j_1}\|_2^2$, then

$$\frac{\|A_{j_2} - A_{j_1}\|_2}{\|A_{j_1}\|_2} \le \varepsilon \left( \frac{2 \kappa_2(V_{j_1})}{\cos(\theta)} + \tan(\theta)\, \kappa_2^2(V_{j_1}) \right) + O(\varepsilon^2). \qquad (20)$$

Proof. Let $E = \delta(x_{j_1}) / \varepsilon$ and $f = \delta(V_{j_1}) / \varepsilon$. From the assumption it is easily derived that $\mathrm{rank}(V_{j_1} + tf) = n$ for all $t \in [0, \varepsilon]$. Then

$$(V_{j_1} + tf)^T (V_{j_1} + tf)\, A_{j_1}(t) = (V_{j_1} + tf)^T (x_{j_1} + tE), \qquad (21)$$

whose solution $A_{j_1}(t)$ is continuously differentiable. Because $A_{j_1}(0) = A_{j_1}$ and $A_{j_2} = A_{j_1}(\varepsilon)$, we have

$$A_{j_2} - A_{j_1} = \varepsilon\, \dot{A}_{j_1}(0) + O(\varepsilon^2). \qquad (22)$$

Differentiating Eq. (21) at $t = 0$ gives

$$\dot{A}_{j_1}(0) = (V_{j_1}^T V_{j_1})^{-1} V_{j_1}^T (E - f A_{j_1}) + (V_{j_1}^T V_{j_1})^{-1} f^T (x_{j_1} - V_{j_1} A_{j_1}). \qquad (23)$$

Substituting Eq. (23) into Eq. (22), taking norms, and using $\|V_{j_1} A_{j_1}\|_2^2 + \|\rho_{LS}\|_2^2 = \|x_{j_1}\|_2^2$ together with the definitions of $\kappa_2(V_{j_1})$ and $\theta$ yields Eq. (20).
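The message of Eq. (20), namely that a well-conditioned pseudo-full-space keeps the representation coefficients stable under small disturbances, can be sanity-checked numerically; the dimensions and perturbation size below are arbitrary test values:

```python
import numpy as np

rng = np.random.default_rng(1)
V1 = rng.normal(size=(40, 10))             # a well-conditioned "pseudo-full-space"
x = V1 @ rng.normal(size=10)               # a sample lying in its span
A1 = np.linalg.lstsq(V1, x, rcond=None)[0]

eps = 1e-4
V2 = V1 + eps * rng.normal(size=V1.shape)  # small disturbance of the space
A2 = np.linalg.lstsq(V2, x, rcond=None)[0]

rel = np.linalg.norm(A2 - A1) / np.linalg.norm(A1)
# rel stays on the order of eps times the condition number, far below 1
```

For this well-conditioned random dictionary the relative coefficient change `rel` is tiny, consistent with the bound.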

Therefore, Eq. (20) can be inferred, and the stability theorem of PFSRC is proved. Thus, for nonzero-residual problems, it is the square of the condition number that measures the sensitivity of the coefficients. In contrast, according to Subsection 2.1, the sensitivity of the reconstruction error depends only linearly on $\kappa_2(X_i)$. That is, the coefficients are more sensitive to a small disturbance $\varepsilon$ than the reconstruction error is. Even so, it is worth noting that the difference between coefficients caused by $\varepsilon$ has a positive impact when the CCRs of different categories are calculated.

2.3.3 The PFSRC algorithm

The flowchart of PFSRC-based robust FR is shown in Fig. 3: training samples are projected onto the pseudo-full-space, the projection coefficient matrix yields the category contribution rates, and a test sample is classified into the category with the maximal CCR.

Fig. 3. Pseudo-full-space representation based classification for face recognition.

Algorithm 1 summarizes the procedure of the proposed PFSRC algorithm.


Algorithm 1. The PFSRC algorithm
Input: a training sample set $X = [x_1, x_2, \dots, x_{s_c}] \in \mathbb{R}^{m \times s_c}$ and a test sample set $Y = [y_1, y_2, \dots, y_k]$.
Step 1: Form the PFSR by Eq. (10).
Step 2: Obtain the projection coefficient matrix $A$ by Eq. (12).
Step 3: Acquire the CCRs of each test sample $y_l$ for all categories by Eq. (16).
Output: the category label $m$ with the largest CCR, by Eq. (17).
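Step 2's linear solve need not form an explicit inverse. The following conjugate-gradient sketch (using SciPy; the dimensions below are hypothetical test values) applies the normal-equations operator $V^T V + \lambda I$ matrix-free, so each iteration costs one multiplication by $V$ and one by $V^T$:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def pfsr_solve_cg(V, x, lam=1e-3, maxiter=30):
    """Solve (V^T V + lam*I) a = V^T x by conjugate gradients without
    ever forming V^T V explicitly."""
    n = V.shape[1]
    op = LinearOperator((n, n), matvec=lambda a: V.T @ (V @ a) + lam * a,
                        dtype=float)
    a, info = cg(op, V.T @ x, maxiter=maxiter)
    return a

# agreement with the direct solve on a small random instance
rng = np.random.default_rng(4)
V = rng.normal(size=(30, 20))
x = rng.normal(size=30)
a_cg = pfsr_solve_cg(V, x, lam=0.1)
a_direct = np.linalg.solve(V.T @ V + 0.1 * np.eye(20), V.T @ x)
```

Since the system matrix is symmetric positive definite, CG converges quickly, which is the basis of the complexity comparison in the next subsection.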

2.3.4 Complexity analysis

Generally speaking, the complexity of SRC and RRC [35], based on $l_1$-minimization, lies mainly in the coding process, with a computational complexity of $O(n^2 m^{1.5})$ [39], where $n$ is the dimensionality of the face feature and $m$ is the number of dictionary atoms. It is also reported that commonly used $l_1$-minimization solvers, e.g., $l_1$-magic [40] and $l_1\_ls$ [41], have an empirical complexity of $O(n^2 m^{1.3})$ [41]. For PFSRC, the coding (i.e., Eq. (11)) is an $l_2$-regularized least squares problem, whose standard time complexity is $O(k_1 n m)$, where $k_1$ is the iteration number of the conjugate gradient method and $k$ is the number of test samples. The solution $\hat{A}_j = (V_j^T V_j + \lambda I)^{-1} V_j^T x_j$ can be obtained by solving $(V_j^T V_j + \lambda I) \hat{A}_j = V_j^T x_j$ efficiently via the conjugate gradient method [42], with time complexity about $O(k_1 n (m + k - 1))$ because the original dictionary is extended to the pseudo-full-space. In our experience, $k_1$ is less than 30. It is easy to see that PFSRC has much lower complexity than SRC.

3. Experiments and discussions

In this section, we conduct extensive experiments on three standard face databases, the AR database [43], the Extended Yale B database [44] and the CMU Multi-PIE

database [45], to verify the efficacy of PFSRC. Five sets of experiments are performed: parameter analysis, performance of the proposed representation, performance of the corresponding classification criterion, robustness analysis and time comparison. The compared methods are SVM [46], NN [47], NS [48], SRC [2], CRC [5], PCRC [34], PSRC [33], ESRC [32], AGL [31], SRC-MP [49], WSRC [50], PCA-SRC-1 [7], PCA-SRC-2 [7], RRC [35] and PPL [51]. All experiments are carried out using MATLAB R2013a on a 3.30 GHz machine with 4.00 GB RAM. Considering accuracy and efficiency, we chose $l_1\_ls$ to solve the $l_1$-regularized minimization in SRC.

3.1 Databases

As stated in [2], there are two assumptions in standard SRC: (1) the experiments are confined to frontal human FR, which allows small variations in pose and displacement; (2) detection, cropping and normalization of the face have been performed prior to applying the corresponding algorithms. Similar to SRC [2], our work also considers data obtained under these conditions. The AR face database contains over 4000 face images of 126 subjects, with 26 images per subject including frontal views with different facial expressions, lighting conditions and occlusions. The 26 images of each individual can be divided into two sections, each containing 7 clean images without occlusion, 3 images with sunglasses and 3 images with a scarf. The Extended Yale B database contains 38 subjects with about 2414 frontal face images in total, abundant in illumination variation. Cropped


Yale B is a subset of Extended Yale B. Cropped Yale B contains 38 categories in 3 subsets: subset 1 contains 263 samples, subset 2 contains 456 samples with more expressions, and subset 3 contains 455 samples with more varied illumination. On the Cropped Yale B database, we conduct two experiments: in one, subset 1 is the training set and subsets 2 and 3 form the test set; in the other, subset 2 is the training set and subsets 1 and 3 form the test set. CMU Multi-PIE is a database of 68 individuals and 41368 images, captured under 13 different poses, 43 different illumination conditions and 4 different expressions.
feature dimensions are illustrated in Fig. 4. One can see that PFSRC with about 200, 200 and 300 dimensions performs best on the AR, Extended Yale B and CMU Multi-PIE databases, respectively. So, in the following experiments, we use feature dimensionalities of 200, 200 and 300 on the three databases, respectively. For a more detailed discussion of feature extraction, please refer to [53] and [54].
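The Eigenfaces features used as input can be computed with a standard PCA projection; the sketch below is generic (not the authors' implementation), and all function and variable names are our own:

```python
import numpy as np

def eigenface_features(X, d):
    """Project vectorized face images (columns of X) onto the top-d
    principal components (Eigenfaces); a generic PCA sketch rather
    than the authors' exact implementation."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                       # center the data
    # Left singular vectors of the centered data are the Eigenfaces
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d].T @ Xc              # d x n feature matrix

rng = np.random.default_rng(1)
X = rng.standard_normal((1024, 40))     # 40 vectorized 32x32 images
F = eigenface_features(X, 20)
print(F.shape)  # (20, 40)
```

In the experiments above, d would be 200, 200 or 300 depending on the database.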
Fig. 4. The curves of classification accuracy versus feature dimension. (a) AR database, (b) Extended Yale B database and (c) CMU Multi-PIE database.
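The regularized coding step that the parameter λ controls can be sketched as follows; this is a minimal illustration assuming a CRC-style l2-regularized solver with the paper's rule λ = 0.001·n/m, where the function and variable names are our own:

```python
import numpy as np

def ridge_code(D, y, n_full, m_train):
    """Solve min_a ||y - D a||_2^2 + lam * ||a||_2^2 in closed form,
    with lam set by the paper's rule lam = 0.001 * n / m (n = number
    of pseudo-full-space samples, m = number of training samples)."""
    lam = 0.001 * n_full / m_train
    d = D.shape[1]
    # Closed-form ridge solution: a = (D^T D + lam * I)^{-1} D^T y
    a = np.linalg.solve(D.T @ D + lam * np.eye(d), D.T @ y)
    return a, lam

# Tiny synthetic check: with such a small lam, a well-posed system
# is recovered almost exactly.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 10))
a_true = rng.standard_normal(10)
y = D @ a_true
a_hat, lam = ridge_code(D, y, n_full=60, m_train=10)
print(round(lam, 4))  # 0.006
print(np.allclose(a_hat, a_true, atol=1e-1))  # True
```

In the PFSRC setting, the columns of D would hold the pseudo-full-space samples.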

3.3. Performance of pseudo-full-space representation

As mentioned before, the difference between standard sparse representation and PFSR is whether the representation space contains unlabeled test samples. To verify that the proposed PFSR is indeed better than standard sparse representation, we test three different representation spaces: the training sample space alone (named SR), the training sample space plus the test sample space excluding those test samples that come from the same category as the sample to be represented (named PFSR-1), and the training sample space plus the whole test sample space (named PFSR-2). The representation ability can be demonstrated by comparing the CCI results.

Fig. 5 gives the CCI values of standard sparse representation (SR) (blue lines), PFSR-1 (green lines) and PFSR-2 (red lines) on the AR database (Fig. 5

(a)) and the Extended Yale B database (Fig. 5 (b)). The bigger the index, the better the representation. One can see that for all samples, CCI(PFSR-2) ≥ CCI(PFSR-1) ≥ CCI(SR) is almost always satisfied, that is, the CCI of PFSR is larger than that of standard sparse representation. Moreover, CCI(PFSR-2) ≥ CCI(PFSR-1) holds for 88.43% and 73.46% of samples, CCI(PFSR-2) ≥ CCI(SR) for 98.43% and 98.25%, and CCI(PFSR-1) ≥ CCI(SR) for 96.57% and 92.54% on the AR and Extended Yale B databases, respectively. These experimental results demonstrate that: (1) the proposed PFSR has better representation ability than standard sparse representation; (2) test samples, coming not only from the same category as the sample to be represented but also from different categories, have a positive effect on representation.
Fig. 5. Comparison of CCI values among standard sparse representation (SR, blue lines), PFSR-1 (green lines) and PFSR-2 (red lines) on (a) the AR database and (b) the Extended Yale B database.
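The three representation spaces compared in Fig. 5 can be assembled as in the sketch below; samples are columns, and the helper name and argument layout are our own:

```python
import numpy as np

def build_space(X_train, Y_test, label_of_sample, test_labels, mode):
    """Assemble the dictionary used to represent one test sample.

    mode = 'SR'     : training samples only
    mode = 'PFSR-1' : training samples + test samples of OTHER categories
    mode = 'PFSR-2' : training samples + all remaining test samples
    (a sketch of the spaces compared in Fig. 5; names are our own)"""
    if mode == 'SR':
        return X_train
    if mode == 'PFSR-1':
        keep = test_labels != label_of_sample
        return np.hstack([X_train, Y_test[:, keep]])
    if mode == 'PFSR-2':
        return np.hstack([X_train, Y_test])
    raise ValueError(mode)

# Columns are samples: 4 training, 3 test (test labels 0, 1, 1)
X = np.ones((5, 4))
Y = np.ones((5, 3))
labels = np.array([0, 1, 1])
print(build_space(X, Y, 1, labels, 'SR').shape[1])      # 4
print(build_space(X, Y, 1, labels, 'PFSR-1').shape[1])  # 5
print(build_space(X, Y, 1, labels, 'PFSR-2').shape[1])  # 7
```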

To display the complementarity of PFSR more intuitively, two samples are randomly selected from the AR and Extended Yale B databases, and their projection coefficients under the two representations are exhibited in Fig. 6. The blue curves show the coefficients of PFSR (left column, Fig. 6 (a) and (c)) and of standard sparse representation (right column, Fig. 6 (b) and (d)) on the AR and Extended Yale B databases. By comparing Figs. 6 (a) and (b), and Figs. 6 (c) and (d), one can see that those test samples (with red frames) which come from the same category do provide complementary information in representing a sample.
Fig. 6. Comparison of coefficients (blue curves) between PFSR (images with red frames are test samples) and sparse representation. (a) and (b) are the coefficients of PFSR and sparse representation on the AR database; (c) and (d) are those on the Extended Yale B database.

To further verify the positive effect of the test image space on PFSR, we test the case where test images arrive one by one on the AR database. Apart from the aforementioned SRC and PFSR-1, we also test another case, named PFSR-0, where half of the subjects (50 categories) are randomly selected as the test set and the other 20 categories serve as the training set. Compared with the representation of SRC (Eq. (1)), PFSR-0 and PFSR-1 can be written as Eq. (24).

x_j = α_{i,1} x_1 + … + α_{i,j-1} x_{j-1} + α_{i,j+1} x_{j+1} + … + α_{i,s_c} x_{s_c} + β_{j,1} y_1 + … + β_{j,k} y_k + β_{j,p} y_p,   (24)

where j = 1, …, s_{c1}. For PFSR-0, y_p comes from training category i (i ∈ {1, …, c1}, c1 < c), and y_1, …, y_k come from categories outside the training space. For PFSR-1, y_p comes from training category i (i ∈ {1, …, c1}, c1 < c) and y_1, …, y_k come from the training categories except i, i.e., {1, …, c1} \ i.

It is clear that SRC, PFSR-0 and PFSR-1 are three cases in which the test samples arrive one by one. The classification accuracies of SRC, PFSR-0 and PFSR-1 are 47.5%, 80.83% and 88.33%, respectively. From these results one can see that there is indeed a positive effect on representation and recognition from the test sample space, which consists of samples coming not only from the same categories as the training samples but also from totally different categories. A classification result for the three compared methods is shown in Fig. 7.
Fig. 7. An example of the three compared methods when test images arrive one by one.

3.4. Performance of category contribution rate

To demonstrate the performance of CCR, some individuals of Extended Yale B are taken as examples, where subset2 is the training set and subset1 and subset3 are the test sets. Fig. 8 gives the CCR results of some randomly chosen test samples with respect to all 38 classes, with one test sample shown per category. It can be seen that there is only one peak (circled in red) in each subfigure, and the test sample is then assigned to the corresponding category.
Fig. 8. The curves of category contribution rates of some test samples versus all categories on Extended Yale B.
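The peak-picking behaviour shown in Fig. 8 can be illustrated as follows. The exact CCR formula is defined earlier in the paper and is not reproduced here; this hypothetical stand-in simply aggregates absolute coefficients per category and normalizes, to show classification by the largest peak:

```python
import numpy as np

def category_contribution_rates(coef, sample_labels, n_categories):
    """Hedged sketch of a CCR-style rule: sum the absolute
    representation coefficients belonging to each category, normalize,
    and classify by the largest peak (the paper's exact CCR formula
    may differ; this only illustrates the peak-picking idea)."""
    mass = np.zeros(n_categories)
    for c in range(n_categories):
        mass[c] = np.abs(coef[sample_labels == c]).sum()
    return mass / mass.sum()

# Coefficients over 5 dictionary samples labeled 0, 0, 1, 1, 2
coef = np.array([0.05, 0.02, 0.80, 0.70, 0.03])
labels = np.array([0, 0, 1, 1, 2])
ccr = category_contribution_rates(coef, labels, 3)
print(int(np.argmax(ccr)))  # 1, the category with the single peak
```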
Fig.9. An example of category contribution rate of all samples in pseudo-full-space (the 15-th category of AR database).

To further illustrate the performance of CCR, another example from the AR database is given in Fig. 9, which shows the CCR results of all samples in the pseudo-full-space with respect to the 15-th category. There are 5 peaks because of the location distribution of the samples. For the AR database, test samples of the same category are located in 4 parts, so the first 4 peaks correspond to the four locations of the 15-th category in the test set. Apart from the training sample being represented, the other training samples are put together, which gives the 5-th peak.

3.5. Robustness

In this subsection, the robustness of the proposed PFSRC method is verified, including random block occlusion, small training sets, and different poses and illumination. The performance of CCR for rejecting invalid test images is also analyzed.

3.5.1. Extended Yale B database

Cropped Yale B, a subset of Extended Yale B, is used to verify the robustness of PFSRC to random block occlusion and small training sets. Example images of an individual are shown in Fig. 10.

Fig.10. Example images of an individual in Cropped Yale B.

Occlusions of different proportions are randomly added to images used as test samples. Fig. 11 gives some example images of an individual with random occlusions. There is clearly some complementary information between these test images, because the occlusions are placed at random locations. The occlusion covers 0%, 10%, 20%, 30%, 40% or 50% of an image.

Fig. 11. Example images of an individual with random occlusions in Cropped Yale B.
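The randomly occluded test images can be generated as in this sketch; the square block shape and the zero (black) fill value are our assumptions:

```python
import numpy as np

def occlude(img, proportion, rng):
    """Overwrite a randomly placed square block covering roughly the
    given proportion of the image area (a sketch of the test-image
    corruption used in Section 3.5)."""
    out = img.copy()
    h, w = img.shape
    side = int(round(np.sqrt(proportion * h * w)))
    if side == 0:
        return out
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out[top:top + side, left:left + side] = 0.0  # black block
    return out

rng = np.random.default_rng(0)
img = np.ones((32, 32))
occ = occlude(img, 0.30, rng)
covered = 1.0 - occ.mean()
print(round(covered, 3))  # close to the requested 0.30
```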

The experimental setting is the same as described at the beginning of Section 3. Classification accuracies with different sizes of occlusion are shown in Table 1 and Fig. 12 (a). It can be seen that the accuracy of our PFSRC method decreases much more slowly than those of SRC, CRC, SVM, NN and NS as the proportion of occlusion increases.

Table 1. Classification accuracies (%) for different sizes of occlusion on the Cropped Yale B database (subset2 and subset3 as training set, subset1 as test set).

Occlusion percentage   0%      10%     20%     30%     40%     50%
Our method             99.01   98.46   99.01   98.68   98.13   97.15
SRC                    99.89   99.58   97.04   82.22   60.59   44.13
CRC                    99.89   98.05   90.23   70.80   48.85   31.50
SVM                    94.73   97.21   85.29   66.08   46.76   29.75
NN                     92.21   96.08   84.08   71.02   50.93   33.26
NS                     100     99.72   97.91   86.17   60.81   37.10

To illustrate the stability, we also reduce the number of training samples per category by choosing subset1 and subset3 as the test set and randomly selecting ten, eight or six samples per individual from subset2 for training. The experimental results are shown in Figs. 12 (b), (c) and (d), respectively. They indicate that our method performs very well in the presence of occlusion and small training sets. It is worth noting that PFSR is more robust than standard sparse representation to illumination, expressions and occlusions. A potential cause is that PFSR can take full advantage of the complementarity and redundancy between samples: when we consider a test sample, the other test samples in the pseudo-full-space provide complementary information, which is not taken into consideration in standard sparse representation.
Fig. 12. Classification accuracy rates for different sizes of random block occlusion on the Cropped Yale B database. The test set is subset1 and subset3; the training samples are ten (b), eight (c) or six (d) per individual, randomly selected from subset2.

3.5.2. CMU Multi-PIE database

On this database, a subset with 5 different poses (C05, C07, C09, C27, C29), containing 68 individuals and 11554 images, is utilized for testing. All images are cropped and resized to 32 × 32 pixels. Fig. 13 shows some examples.

Fig.13. Sample images of two individuals with different poses in CMU Multi-PIE database.

To illustrate the performance of PFSRC under small sample size (SSS) and occlusion, randomly occluded images (from 0% to 50%) are selected for training (30% and 10%) and the rest are used for testing. The classification accuracy for different sizes of occlusion is shown in Fig. 14. One can see that the classification accuracy of our algorithm is slightly lower than that of some compared methods when the proportion of occlusion is below 20%, while with increasing occlusion the accuracy of all the other methods decreases much more quickly than that of our method. Furthermore, the recognition rate remains competitive when the proportion of training samples is reduced; fewer training samples mean that relatively more test-sample information can be used in PFSRC. All of this shows that the proposed PFSRC is more robust than SRC to occlusion and the limited-training-sample problem.
Fig.14. Classification accuracies in CMU Multi-PIE database. (a) 30% as training set, (b) 10% as training set.

3.5.3. AR database

Fig. 15. Example images of an individual in the two sections of the AR database.

Some example images of an individual in the AR database are shown in Fig. 15. The experimental setting is again the same as described at the beginning of Section 3. Redundancy in images is useful for robust image recognition [2]: when a portion of an image is occluded, the non-occluded region of the same image still contains useful information for identification, so better recognition results can be expected when the complementary information in non-occluded regions is used reasonably and effectively.

Firstly, we randomly select 700 images as the training set and 1200 occluded images as the test set. The classification accuracy of SRC is 43.83%, of CRC 46.50%, of SVM 24.50%, of NN 29.25%, of NS 29.42% and of PFSRC 77.92%. Our PFSRC achieves the highest accuracy and outperforms all the other methods.

Secondly, we reduce the number of training samples per category one by one from 7 to 3 to test our method further. Fig. 16 shows that our method performs better than the others; the compared methods clearly rely more on the number of training samples per category.
Fig. 16. Classification accuracy on the AR database. The number of training samples per category decreases from 7 to 3.

Finally, the performance of PFSRC is also examined while decreasing the number of test samples per category from 12 to 2. Without loss of generality, we reduce the number of test samples per category by randomly selecting the same number of images with sunglasses as with scarves. Fig. 17 shows how our method is affected by gradually reducing the number of test samples. The experiment is repeated ten times and the results are averaged. From Fig. 17, one can see that the curve declines slowly as the number of test samples is reduced, and even with only two test samples our method still obtains a recognition rate above 70%. This indicates that the proposed PFSRC is robust to the number of test samples.
Fig. 17. Classification accuracy on AR database. The number of test samples per category decreases from 12 to 2.

Table 2 compares the proposed PFSRC with the other five algorithms described in [2], with the same experimental setting as Subsection 4.5 of [2]. For images occluded by sunglasses, our method achieves a recognition rate of 88%, 1% better than SRC; for occlusion by scarves, our recognition rate is 89.1%, exceeding that of SRC by about 30%.
Table 2. Comparison of classification accuracies (%) on the AR database.

Algorithm        With sunglasses   With scarves
SRC [6]          87.0              59.5
PCA + NN [6]     70.0              12.0
ICA I + NN [6]   53.5              15.0
LNMF + NN [6]    33.5              24.4
l2 + NS [6]      64.5              12.5
Our method       88.0              89.1

Next, we compare PFSRC with the competing method RRC [35] in terms of accuracy and time complexity on the AR database. As shown in [35], the computational complexity of RRC_L1 with occlusion is about O(t n^2 m) (usually t > 15 with occlusion), and that of RRC_L2 is O(t k1 n m) (usually t is less than 15). According to Subsection 2.3.4, the time complexity of PFSRC is O(k1 n (m + k + 1)), lower than that of RRC_L2 and RRC_L1 on the AR database, which is verified by the results in Table 3. The proposed PFSRC is thus more efficient than the other two methods for FR, with higher classification accuracy and less classification time.

Table 3. Comparison of classification accuracy (%) and time on the AR database.

Method         RRC_L1 [35]   RRC_L2 [35]   Our method
Accuracy (%)   65.08         63.42         77.92
Time (s)       2788.52       1185.19       448.52

3.5.4. Outlier rejection

Practical FR systems are sometimes confronted with invalid test images: images from a completely different database, or images from the same database but from categories other than those of the training samples. An effective FR system should recognize the correct category of a valid image and reject invalid images outright. Based on Eqs. (18) and (19) in Subsection 2.3.1, we test the simple outlier rejection rule, give the corresponding statistical results of the coefficients, and reject invalid images based on the proposed CSI.

Firstly, we test images which do not belong to any category of the existing databases. We randomly select a face image from the ORL (Olivetti Research Laboratory) database [55] and a non-face image from the Brodatz texture database [56] for verification. Compared to the coefficients of a valid test image in Fig. 8 (Subsection 3.4), Fig. 18 shows that the coefficients of invalid samples are not concentrated on any one category but spread widely across the entire training set.
Fig. 18. Results of the proposed technique for invalid test samples (first column) on the AR (second column), Cropped Yale B (third column) and CMU Multi-PIE (fourth column) databases. The vertical and horizontal axes of the second, third and fourth columns are CCR and categories, respectively. The images in the upper-left and lower-left corners are from the ORL face database and the Brodatz texture database, respectively.

The distribution of coefficients contains important information about the validity of test images: a valid test image should have a sparse representation whose nonzero entries concentrate mostly on one category, whereas the coefficients of an invalid image spread widely rather than concentrating in one place. According to [36], Eigenfaces can also be used to judge whether an invalid image is a face image or not. On the other hand, images from different databases differ not only in subject identity but also in acquisition and environmental conditions. Therefore, we also test within the same database using the leave-one-out technique and give the average results in Fig. 19. The average recognition results for invalid samples are similar to Fig. 18, in that the coefficients are spread across several subjects. An example of one subject in the AR database is randomly selected and given in Fig. 20.
Fig. 19. Average recognition results of invalid samples on the (a) AR, (b) Extended Yale B and (c) PIE databases. Vertical axis: average CCR. Horizontal axis: categories.
Fig. 20. CCR of the samples of one category in the AR database.

Based on Eq. (19) in Subsection 2.3.1, we test the system's ability to determine whether a given test subject is in the training database or not by sweeping the threshold τ through a range of values in [0, 1], generating the receiver operating characteristic (ROC) curves in Fig. 21. For comparison, we also consider outlier rejection by SRC based on the SCI [6]; those curves (blue lines) are also displayed in Fig. 21. Notice that the simple rejection rule (Eqs. (18) and (19)) performs better on all three databases.
Fig. 21. ROC curves for outlier rejection on the (a) AR, (b) Extended Yale B and (c) PIE databases. Vertical axis: true positive rate. Horizontal axis: false positive rate. The red curves are generated by PFSRC rejecting outliers via Eq. (19); the blue curves are SRC based on the SCI (Eq. (14) in [2]).
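The SCI rule used by the blue baseline curves (Wright et al. [2]) can be sketched as follows; the paper's CSI criterion built on CCR is analogous, but its exact formula is given in Subsection 2.3.1 and is not reproduced here:

```python
import numpy as np

def sparsity_concentration_index(coef, sample_labels, k):
    """SCI from Wright et al. [2]: measures how concentrated the
    coefficients are on a single category. Values near 1 mean one
    dominant class; values near 0 mean the coefficients are spread
    out. A test image is rejected as an outlier when SCI < tau."""
    l1 = np.abs(coef).sum()
    per_class = np.array([np.abs(coef[sample_labels == c]).sum()
                          for c in range(k)])
    return (k * per_class.max() / l1 - 1.0) / (k - 1.0)

labels = np.array([0, 0, 1, 1, 2, 2])
concentrated = np.array([0.9, 0.8, 0.01, 0.0, 0.02, 0.0])
spread = np.array([0.3, 0.3, 0.3, 0.3, 0.3, 0.3])
print(sparsity_concentration_index(concentrated, labels, 3) > 0.9)  # True
print(sparsity_concentration_index(spread, labels, 3) < 0.1)        # True
```

Sweeping the rejection threshold tau over [0, 1] and recording true/false positive rates yields ROC curves like those in Fig. 21.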

3.6. Comparison with state-of-the-art face recognition classification algorithms

Apart from standard sparse representation-based FR methods and some popular classifiers, we also compare our method with other state-of-the-art methods, including AGL, ESRC, PSRC and PCRC, on the AR and Extended Yale B databases. Table 4 shows that our algorithm remains competitive, with higher accuracies than these methods.

The performance of PFSRC is also compared with recent sparse-representation-based FR methods: SRC [2], SRC-MP [49], WSRC [50], PCA-SRC [7], PCA-SRC-2 [7], RRC_L1 [35] and RRC_L2 [35]. PCA-SRC-2 is named for convenience; the difference between PCA-SRC and PCA-SRC-2 is the use of minimal residuals versus maximal probability. RRC_L1 and RRC_L2 are the RRC coding model with L1 and L2 constraints, respectively. Table 5 shows the results on the CMU Multi-PIE database under the same experimental setting as [7]. One can observe that our method achieves competitive classification accuracies.

Table 4. Classification accuracies (%) of different methods on the AR and Cropped Yale B databases.

Algorithm    AR      Cropped Yale B
AGL [10]     74.17   60.23
ESRC [16]    70.90   96.67
PSRC [17]    69.38   96.88
PCRC [18]    74.57   96.88
Our method   79.50   99.16

Table 5. Classification accuracies (mean ± std-dev, %) on the CMU Multi-PIE database for different numbers of training samples per category.

Algorithm       5              6              7              8
SRC [2]         81.41 ± 3.54   87.58 ± 2.89   88.14 ± 2.53   90.21 ± 4.12
SRC-MP [49]     83.36 ± 2.89   89.12 ± 3.34   89.67 ± 3.27   90.85 ± 3.98
WSRC [50]       83.05 ± 3.25   89.97 ± 2.89   90.14 ± 4.21   90.75 ± 3.67
PCA-SRC [7]     88.97 ± 3.42   90.21 ± 1.24   91.59 ± 2.16   90.85 ± 2.04
PCA-SRC-2 [7]   91.91 ± 3.64   92.79 ± 3.14   93.01 ± 3.89   93.43 ± 3.52
RRC_L1 [35]     96.01 ± 1.28   95.57 ± 2.19   96.28 ± 2.43   96.88 ± 0.40
RRC_L2 [35]     86.52 ± 4.48   87.73 ± 2.58   87.23 ± 3.99   90.72 ± 3.34
Our method      95.54 ± 2.06   96.24 ± 1.14   97.34 ± 1.53   97.51 ± 1.55

The PPL algorithm [51] is a recent FR approach that integrates supervised and unsupervised information and achieves good performance, so we also carry out the following comparison experiments. The compared methods are PPL [51], the projection-based algorithms SPP [57] and SRC-DP [58], and the dictionary-learning method FDDL [59]. Tables 6 and 7 show the results on the Extended Yale B and AR databases under the same experimental setting as [51]. Our PFSRC method clearly obtains very competitive results: on Extended Yale B, PFSRC achieves the best results, and on AR, PFSRC has a significant advantage in the cases with fewer training samples.

Table 6. Classification accuracies (mean ± std-dev, %) on the Extended Yale B database for different numbers of training samples per category.

Method               5              8              16             24             32
SPP+SRC [57]         70.28 ± 2.46   81.77 ± 1.84   90.81 ± 2.04   94.41 ± 0.86   96.11 ± 0.65
SRC-DP+SRC [58]      69.42 ± 2.52   81.51 ± 1.05   91.04 ± 0.63   94.27 ± 0.60   96.36 ± 0.74
FDDL [59]            67.04 ± 3.69   82.89 ± 1.54   90.60 ± 0.88   94.33 ± 0.61   96.23 ± 0.67
PPL [51] (Table 2)   70.14 ± 1.94   81.90 ± 1.89   90.11 ± 1.63   94.29 ± 0.72   96.26 ± 0.47
Our method           88.29 ± 4.35   91.99 ± 2.67   95.65 ± 2.21   96.18 ± 2.42   97.11 ± 1.34

Table 7. Classification accuracies (mean ± std-dev, %) on the AR database for different numbers of training samples per category.

Method               5              7              9              11             13
SPP+SRC [57]         69.78 ± 3.23   79.02 ± 3.36   84.73 ± 1.54   87.93 ± 1.70   90.63 ± 1.74
SRC-DP+SRC [58]      72.25 ± 5.02   83.07 ± 4.95   88.78 ± 3.47   90.12 ± 4.22   95.92 ± 0.94
FDDL [59]            75.25 ± 3.58   88.45 ± 4.64   91.21 ± 3.60   91.47 ± 4.47   94.32 ± 1.94
PPL [51] (Table 3)   75.52 ± 4.17   90.58 ± 3.97   92.59 ± 3.60   94.27 ± 2.66   97.23 ± 0.82
Our method           87.00 ± 2.78   91.15 ± 2.25   91.51 ± 2.70   93.47 ± 2.06   96.00 ± 0.75

4 Conclusions

In this paper, we presented an improved sparse representation based classification technique, named PFSRC. Its superiority lies in its high utilization rate of samples. PFSRC helps alleviate the insufficient-training-sample problem, is robust to illumination, pose and occlusion, and can meet real-time identification requirements. Moreover, PFSRC is not limited to FR; it can be used for other object recognition tasks as long as the linearity condition is satisfied. PFSRC is most suitable for situations where a test sample and some unlabeled samples for assisted recognition can easily be obtained at the same time, no matter which categories those unlabeled samples come from; applicable environments include places crowded with objects (faces, vehicles, etc.), such as stations and airports.

Many interesting questions remain. PFSRC offers no advantage when test samples arrive one by one and no auxiliary unlabeled samples can be obtained simultaneously. As an open system, its recognition performance could be further improved by integrating other advanced techniques, for instance deep learning. Another challenging and interesting direction is to extend it to FR under unconstrained conditions.

Acknowledgments

The authors would like to thank the anonymous reviewers and the associate editor for their valuable comments and thoughtful suggestions, which improved the quality of the presented work. We also thank Dr. Yunmei Chen and Dr. Xianqi Li at the University of Florida for helpful and informative discussions, and Dr. Haishun
Du at Henan University for his useful suggestions and experiments. This work was supported in part by the Key Project of Science and Technology of the Education Department of Henan Province (14A120009), the Natural Science Foundation of Henan Province (162300410061), and the Program of Henan Province Young Scholar (2013GGJS-027) of China.

References

[1] J. Zhang, Y. Yan, M. Lades, Face recognition: eigenface, elastic matching and neural nets, Proc. IEEE 85 (9) (1997) 1423-1435.
[2] J. Wright, A. Y. Yang, A. Ganesh, S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 210-227.
[3] S. Raudys, A. K. Jain, Small sample size effects in statistical pattern recognition: recommendations for practitioners, IEEE Trans. Pattern Anal. Mach. Intell. 13 (3) (1991) 252-264.
[4] X. Tan, S. Chen, Z. Zhou, F. Zhang, Face recognition from a single image per person: a survey, Pattern Recogn. 39 (9) (2006) 1725-1745.
[5] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition? in: Proc. Int. Conf. Computer Vision, 2011, pp. 471-478.
[6] S. Cai, L. Zhang, W. Zuo, X. Feng, A probabilistic collaborative representation based approach for pattern classification, in: Proc. Int. Conf. Computer Vision and Pattern Recognition, 2016, pp. 2950-2959.
[7] Z. P. Hu, F. Bai, S. H. Zhao, M. Wang, Z. Sun, Extended common molecular and discriminative atom dictionary based sparse representation for face recognition, J. Vis. Commun. Image Represent. 40 (2016) 42-50.
[8] Y. Xu, X. Li, J. Yang, Z. Lai, D. Zhang, Integrating conventional and inverse representation for face recognition, IEEE Trans. Cybern. 44 (10) (2014) 1738-1746.
[9] J. Yang, L. Luo, J. Qian, Y. Tai, F. Zhang, Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes, IEEE Trans. Pattern Anal. Mach. Intell. 39 (1) (2017) 156.



Highlights:

• PFSR is presented in a manner opposite to sparse representation.

• CCR is designed to match PFSR and complete the classification.

• CCI is defined to measure representation ability; CSI is defined to describe category sparsity and reject outliers.

• PFSRC utilizes existing available samples rather than constructing auxiliary training samples.

• The feasibility and stability of PFSRC are analyzed theoretically.

• PFSRC is competitive and robust for the FR problem with insufficient training samples.
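As background for the representation-based classification pipeline the highlights summarize, the sketch below illustrates the closely related collaborative-representation baseline (CRC) that this family of methods builds on: code a test sample over all training samples with an l2-regularized least-squares fit, then assign the class whose class-restricted reconstruction residual is smallest. This is a generic illustration, not the paper's PFSRC algorithm; the function name, data, and regularization value are all hypothetical.

```python
import numpy as np

def crc_classify(A, labels, y, lam=1e-3):
    """Collaborative-representation style classifier (illustrative sketch).

    A      : d x n matrix whose columns are l2-normalized training samples.
    labels : length-n array of class labels, one per column of A.
    y      : length-d test sample.
    lam    : ridge regularization weight (hypothetical default).
    """
    n = A.shape[1]
    # Ridge-regularized coding has the closed form (A^T A + lam*I)^{-1} A^T y.
    x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
    best_label, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y using only this class's columns and coefficients.
        res = np.linalg.norm(y - A[:, mask] @ x[mask])
        if res < best_res:
            best_label, best_res = c, res
    return best_label

# Toy demo: two classes clustered around different directions in R^4.
A = np.array([[1.00, 0.95, 0.00, 0.05],
              [0.00, 0.05, 1.00, 0.95],
              [0.00, 0.00, 0.05, 0.00],
              [0.05, 0.00, 0.00, 0.00]])
A = A / np.linalg.norm(A, axis=0)          # l2-normalize each column
labels = np.array([0, 0, 1, 1])
pred = crc_classify(A, labels, np.array([0.9, 0.1, 0.0, 0.02]))
```

The class-wise residual step is the part PFSRC-style methods revisit: instead of a sparse or collaborative code over the usual training dictionary, the paper's approach builds its representation in the opposite way, over the pseudo-full space of existing samples.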