Optimized projection for Collaborative Representation based Classification and its applications to face recognition


Accepted Manuscript

Optimized Projection for Collaborative Representation based Classification and Its Applications to Face Recognition

Jun Yin, Lai Wei, Miao Song, Weiming Zeng

PII: S0167-8655(16)00024-6
DOI: 10.1016/j.patrec.2016.01.012
Reference: PATREC 6435

To appear in: Pattern Recognition Letters

Received date: 4 September 2015
Accepted date: 26 January 2016

Please cite this article as: Jun Yin, Lai Wei, Miao Song, Weiming Zeng, Optimized Projection for Collaborative Representation based Classification and Its Applications to Face Recognition, Pattern Recognition Letters (2016), doi: 10.1016/j.patrec.2016.01.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Optimized Projection for Collaborative Representation based Classification and Its Applications to Face Recognition

Jun Yin, Lai Wei, Miao Song, Weiming Zeng

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

Highlights

- A new dimensionality reduction method called OP-CRC is proposed.
- OP-CRC is designed based on Collaborative Representation based Classification (CRC).
- The projection matrix of OP-CRC is solved by an iterative algorithm.
- OP-CRC is effective for face recognition.


ABSTRACT


Collaborative Representation based Classification (CRC) is powerful for face recognition and has lower computational complexity than Sparse Representation based Classification (SRC). To improve the performance of CRC, this paper proposes a new dimensionality reduction method called Optimized Projection for Collaborative Representation based Classification (OP-CRC), which has a direct connection to CRC. CRC uses the minimum reconstruction residual based on collaborative representation as its decision rule, and OP-CRC is designed according to this rule. The criterion of OP-CRC is to simultaneously maximize the collaborative representation based between-class scatter and minimize the collaborative representation based within-class scatter in the transformed space. This criterion is solved by an iterative algorithm that converges quickly. CRC performs very well in the transformed space of OP-CRC. Experimental results on the Yale, AR, FERET, CMU_PIE and LFW databases show the effectiveness of OP-CRC in face recognition.

Keywords: Dimensionality reduction; Collaborative representation; Face recognition; Discriminant analysis.

1. Introduction


Face recognition has been a research focus of pattern recognition and computer vision over the past few decades [1]. The dimensionality of face images is often very high, which increases the computational cost and may cause the "curse of dimensionality". To solve this problem, numerous dimensionality reduction methods have been proposed. These methods reduce the dimension of face images and extract discriminating features.


Principal Component Analysis (PCA) [2] and Linear Discriminant Analysis (LDA) [3] are two classical dimensionality reduction methods. PCA finds a mapping by maximizing the variance, and LDA derives a projection by maximizing the between-class scatter and minimizing the within-class scatter simultaneously. However, these two methods fail to discover the manifold structure of data. Manifold learning methods such as Locality Preserving Projection (LPP) [4], Neighborhood Preserving Embedding (NPE) [5] and Isometric Projection (IsoProjection) [6] overcome this limitation. They assume the data lie on a low dimensional manifold of the high dimensional space and discover this manifold. LPP, NPE and IsoProjection are unsupervised methods. To further increase the discriminating ability, some supervised manifold learning methods were developed; Local Discriminant Embedding (LDE) [7], Marginal Fisher Analysis (MFA) [8], Locality Preserving Discriminant Projections (LPDP) [9] and Discriminant Simplex Analysis (DSA) [10] are representative methods. Moreover, a Graph Embedding framework [8] was proposed, in which the manifold learning methods can be reformulated.


Recently, a Sparse Representation based Classification (SRC) [11] method was proposed for face recognition. In SRC, a test sample is represented as a sparse linear combination of all the training samples and is assigned to the class with the smallest reconstruction residual. Kernel Sparse Representation based Classification (KSRC) [12, 13] performs SRC in a higher dimensional space by using a kernel function and is more effective for nonlinear data. In SRC and KSRC, sparse representation is used for classification. Sparse representation is also used for dimensionality reduction in methods such as Sparsity Preserving Projection (SPP) [14] and Sparse Neighborhood Preserving Embedding (SNPE) [15], which construct an L1-graph using sparse representation coefficients and then obtain the low dimensional features by Graph Embedding. Discriminant Sparsity Preserving Embedding (DSPE) [16] and Discriminant Sparse Neighborhood Preserving Embedding (DSNPE) [17] construct a discriminant L1-graph by introducing class information into sparse representation. Furthermore, Yang et al. proposed SRC steered Discriminative Projection (SRC-DP) [18] and Lu et al. proposed Optimized Projection for Sparse Representation based Classification (OP-SRC) [19]. SRC-DP and OP-SRC both try to find low dimensional features that are optimal for SRC, so SRC can perform better in their transformed spaces. Very recently, Zhang et al. indicated that it is collaborative representation, not L1-norm sparsity, that makes SRC powerful, and developed a face recognition method called Collaborative Representation based Classification (CRC) [20]. For a test sample, CRC also

 Corresponding author. Tel.: +8602138282823; e-mail: [email protected]

uses all the training samples to represent it. The main difference between SRC and CRC is the regularization term: the L2-norm rather than the L1-norm is adopted in the objective function of CRC, which gives CRC significantly lower complexity than SRC. Kernel Collaborative Representation Classification (KCRC) [21] maps the data into a higher dimensional space where different classes are more separable and then performs CRC in this new space. Using collaborative representation, Yang et al. proposed a dimensionality reduction method called Collaborative Representation based Projections (CRP) [22] for face recognition. CRP constructs an L2-graph using collaborative representation coefficients and reduces dimensionality by graph embedding. It preserves the collaborative representation based reconstruction relationship and is faster than dimensionality reduction methods based on sparse representation. Although CRP utilizes collaborative representation, it has no direct connection to CRC. To improve the classification performance of CRC, the connection between the dimensionality reduction method and CRC should be strengthened. In this paper, a new dimensionality reduction method named Optimized Projection for Collaborative Representation based Classification (OP-CRC) is developed. OP-CRC is designed according to the classification criterion of CRC. It seeks a projection that simultaneously maximizes the collaborative representation based between-class reconstruction residual and minimizes the collaborative representation based within-class reconstruction residual. In the transformed space of OP-CRC, CRC can achieve optimal classification performance.

The remainder of the paper is organized as follows: Section 2 briefly reviews CRC. Section 3 presents the proposed OP-CRC method. Section 4 reports experimental results that demonstrate the effectiveness of OP-CRC. Finally, conclusions are drawn in Section 5.

2. Collaborative Representation based Classification

Suppose there are $n$ training samples from $c$ classes and the $i$-th class has $n_i$ training samples. Let $X = [X_1, X_2, \ldots, X_c] \in \mathbb{R}^{m \times n}$ be the training sample matrix, where $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ is the training sample matrix of the $i$-th class and $m$ is the dimension of the samples.

In CRC, a test sample $y$ is collaboratively represented by all the training samples. The collaborative representation coefficient vector $\hat{\alpha}$ is obtained by solving the following regularized minimization problem:

\[ \hat{\alpha} = \arg\min_{\alpha} \left\{ \| y - X\alpha \|_2^2 + \lambda \| \alpha \|_2^2 \right\} \]  (1)

where $\lambda$ is the regularization parameter. Unlike SRC, CRC uses the L2-norm instead of the L1-norm in the regularization term. The solution $\hat{\alpha}$ of Eq. (1) can be easily derived as:

\[ \hat{\alpha} = (X^T X + \lambda I)^{-1} X^T y \]  (2)

With $\hat{\alpha}$, the collaborative representation based reconstruction residual of each class can be calculated. As in SRC, the test sample $y$ is assigned to the class with the minimum residual. Considering that the L2-norm of $\hat{\alpha}_i$ contains some discrimination information, CRC uses the following classification criterion:

\[ \mathrm{identity}(y) = \arg\min_i \left\{ \| y - X_i \hat{\alpha}_i \|_2 \, / \, \| \hat{\alpha}_i \|_2 \right\} \]  (3)

where $\hat{\alpha}_i$ is the coefficient vector associated with the $i$-th class.
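For concreteness, the CRC decision rule of Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is our own minimal illustration, not the authors' code; the data, function name and parameter names are ours:

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.001):
    """Classify test sample y by CRC. X is the (m, n) training matrix,
    labels a length-n array of class labels, lam the parameter lambda."""
    m, n = X.shape
    # Eq. (2): closed-form collaborative representation coefficients.
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    classes = np.unique(labels)
    scores = []
    for c in classes:
        idx = labels == c
        # Eq. (3): class-wise residual, regularized by the coefficient norm.
        resid = np.linalg.norm(y - X[:, idx] @ alpha[idx])
        scores.append(resid / np.linalg.norm(alpha[idx]))
    return classes[int(np.argmin(scores))]

# Toy example: two well-separated classes in R^2.
X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
print(crc_classify(X, labels, np.array([1.0, 0.05])))  # → 0
```

A test sample close to the first class's columns is reconstructed almost entirely from them, so the class-0 residual is tiny while its coefficient norm is large, and the ratio in Eq. (3) strongly favors class 0.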


3. Optimized Projection for Collaborative Representation based Classification


CRC performs very well in face recognition and it is computationally more efficient than SRC. To enhance the performance of CRC, we propose Optimized Projection for Collaborative Representation based Classification (OP-CRC). The optimum features for CRC can be achieved by the transformation of OP-CRC; therefore, CRC can perform better in the transformed space of OP-CRC.

3.1. The OP-CRC Algorithm

Let $P \in \mathbb{R}^{m \times d}$ be the optimized projection matrix. Each training sample $x_{ij} \in \mathbb{R}^m$ is mapped into $y_{ij} = P^T x_{ij} \in \mathbb{R}^d$ and the training sample matrix $X \in \mathbb{R}^{m \times n}$ is mapped into $Y = P^T X \in \mathbb{R}^{d \times n}$. In the transformed space, each training sample $y_{ij}$ from the $i$-th class is linearly represented by the remaining training samples. The collaborative representation coefficient vector $\hat{\alpha}$ is obtained by solving the regularized minimization problem in Eq. (1). Let $\hat{\alpha}_k$ be the coefficient vector associated with the $k$-th class. The collaborative representation based reconstruction residual of the $k$-th class is:

\[ R_k(y_{ij}) = \| y_{ij} - Y \hat{\alpha}_k \|_2 \]  (4)

To make CRC perform well in the transformed space, for the sample $y_{ij}$, the residual of the $i$-th class should be as small as possible and the residuals of the other classes should be as large as possible. Using the residual of each class, we define the collaborative representation based within-class scatter and the collaborative representation based between-class scatter. The collaborative representation based within-class scatter is defined as:

\[
\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \left( R_i(y_{ij}) \right)^2
= \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \| y_{ij} - Y \hat{\alpha}_i \|_2^2
= \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - Y \hat{\alpha}_i)^T (y_{ij} - Y \hat{\alpha}_i)
= \mathrm{tr}\!\left[ P^T \left( \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - X \hat{\alpha}_i)(x_{ij} - X \hat{\alpha}_i)^T \right) P \right]
= \mathrm{tr}\left( P^T S_w P \right)
\]  (5)

where

\[ S_w = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - X \hat{\alpha}_i)(x_{ij} - X \hat{\alpha}_i)^T \]  (6)

$S_w$ is called the collaborative representation based within-class scatter matrix. The collaborative representation based between-class scatter is defined as:

\[
\frac{1}{n(c-1)} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \sum_{k \neq i} \left( R_k(y_{ij}) \right)^2
= \frac{1}{n(c-1)} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \sum_{k \neq i} \| y_{ij} - Y \hat{\alpha}_k \|_2^2
= \frac{1}{n(c-1)} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \sum_{k \neq i} (y_{ij} - Y \hat{\alpha}_k)^T (y_{ij} - Y \hat{\alpha}_k)
= \mathrm{tr}\!\left[ P^T \left( \frac{1}{n(c-1)} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \sum_{k \neq i} (x_{ij} - X \hat{\alpha}_k)(x_{ij} - X \hat{\alpha}_k)^T \right) P \right]
= \mathrm{tr}\left( P^T S_b P \right)
\]  (7)

where

\[ S_b = \frac{1}{n(c-1)} \sum_{i=1}^{c} \sum_{j=1}^{n_i} \sum_{k \neq i} (x_{ij} - X \hat{\alpha}_k)(x_{ij} - X \hat{\alpha}_k)^T \]  (8)

$S_b$ is called the collaborative representation based between-class scatter matrix.
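The key step in Eq. (5) is the identity $y_{ij} - Y\hat{\alpha} = P^T(x_{ij} - X\hat{\alpha})$, which moves the scatter from the transformed space back to the input space. A quick numerical check of this identity (our own, with random data and arbitrary coefficient vectors, since the identity holds for any coefficients) confirms that the averaged squared residuals equal $\mathrm{tr}(P^T S_w P)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 6, 10, 3
X = rng.standard_normal((m, n))   # training matrix in the input space
P = rng.standard_normal((m, d))   # any projection matrix
Y = P.T @ X                       # transformed training matrix
A = rng.standard_normal((n, n))   # one coefficient vector per sample (column j)

# Left-hand side of Eq. (5): average squared residual in the transformed space.
lhs = np.mean([np.linalg.norm(Y[:, j] - Y @ A[:, j]) ** 2 for j in range(n)])

# Right-hand side: tr(P^T S_w P) with the scatter built in the input space (Eq. 6).
Sw = np.zeros((m, m))
for j in range(n):
    e = X[:, j] - X @ A[:, j]     # input-space reconstruction error
    Sw += np.outer(e, e) / n
rhs = np.trace(P.T @ Sw @ P)

assert np.isclose(lhs, rhs)
print(round(lhs, 6), round(rhs, 6))
```

Since $\|P^T e\|_2^2 = \mathrm{tr}(P^T e e^T P)$ for every error vector $e$, the two sums agree term by term; the same argument gives Eq. (7) for $S_b$.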

According to the classification criterion of CRC, we define the criterion of OP-CRC as maximizing:

\[ J(P) = \frac{\mathrm{tr}(P^T S_b P)}{\mathrm{tr}(P^T S_w P)} \]  (9)

With known $S_w$ and $S_b$, the optimum projection matrix $P$ can be obtained by solving the following generalized eigenvalue problem:

\[ S_b p = \lambda S_w p \]  (10)

$P$ is formed by the generalized eigenvectors corresponding to the $d$ largest eigenvalues. However, without $P$, the collaborative representation coefficient vector $\hat{\alpha}$ in the transformed space cannot be calculated, so $S_w$ and $S_b$ cannot be constructed directly. To solve this problem, we adopt an iterative algorithm. At the beginning of the iteration, an initial projection matrix $P_1 \in \mathbb{R}^{m \times d}$ is given. The training sample $x_{ij}$ in the input space is mapped into $y_{ij} = P_1^T x_{ij}$ in the transformed space by $P_1$. Each training sample $y_{ij}$ is linearly represented by the remaining training samples in the transformed space, and the collaborative representation coefficient vector $\hat{\alpha}$ of each training sample is obtained by Eq. (2). Then $S_w$ and $S_b$ can be calculated and the new projection matrix $P_2$ is obtained by solving the generalized eigenvalue problem in Eq. (10). Following this idea, in the $k$-th iteration, we obtain the new projection matrix $P_{k+1}$ by using $P_k$ as the current projection matrix. The iteration continues until the algorithm converges. Convergence is determined by the value of $J(P)$, with the convergence criterion defined as:

\[ \left( J(P_{k+1}) - J(P_k) \right) / J(P_{k+1}) < \varepsilon \]  (11)
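For illustration, the eigenvector extraction of Eq. (10) can be carried out with SciPy's symmetric-definite solver. This is a sketch of ours, not the authors' code; the small ridge added to $S_w$ is our own numerical safeguard for the case where $S_w$ is singular:

```python
import numpy as np
from scipy.linalg import eigh

def top_projection(Sb, Sw, d, ridge=1e-8):
    """Form P from the generalized eigenvectors of S_b p = lambda S_w p
    (Eq. 10) corresponding to the d largest eigenvalues."""
    m = Sw.shape[0]
    # eigh solves the symmetric-definite generalized problem A v = w B v;
    # eigenvalues come back in ascending order, so we reverse the sort.
    w, V = eigh(Sb, Sw + ridge * np.eye(m))
    return V[:, np.argsort(w)[::-1][:d]]

# Toy symmetric positive semi-definite scatter matrices for the demo.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((5, 8)), rng.standard_normal((5, 8))
Sb, Sw = A @ A.T, B @ B.T
P = top_projection(Sb, Sw, d=2)
print(P.shape)  # (5, 2)
```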

The iterative algorithm of OP-CRC is summarized as follows:

Input: training samples $x_{ij}$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, n_i$), a small positive number $\varepsilon$.
Output: projection matrix $P$.
Step 1. Choose an initial projection matrix $P_1$ and set $k = 1$.
Step 2. Use the known projection matrix $P_k$ to calculate the new training samples $y_{ij} = P_k^T x_{ij}$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, n_i$) in the transformed space.
Step 3. Calculate the collaborative representation coefficient vector $\hat{\alpha}$ of each training sample $y_{ij}$ in the transformed space by Eq. (2) and then construct $S_w$ and $S_b$ by Eq. (6) and Eq. (8).
Step 4. Solve the generalized eigenvalue problem in Eq. (10) and use the generalized eigenvectors corresponding to the $d$ largest eigenvalues to form the projection matrix $P_{k+1}$.
Step 5. If $(J(P_{k+1}) - J(P_k)) / J(P_{k+1}) < \varepsilon$, the algorithm converges; the final projection matrix is $P = P_{k+1}$ when $J(P_{k+1}) \geq J(P_k)$ and $P = P_k$ when $J(P_{k+1}) < J(P_k)$. Otherwise, set $k = k + 1$ and go to Step 2.
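The whole iteration can be sketched end to end as follows. This is our own illustrative NumPy/SciPy version, not the authors' MATLAB code: coefficients are solved leave-one-out in the current transformed space, the initialization uses the "fit matrix" idea described in Section 3.2 (coefficients first computed in the input space), and the small ridge on $S_w$ is our numerical safeguard:

```python
import numpy as np
from scipy.linalg import eigh

def op_crc(X, labels, d, lam=0.001, eps=1e-7, max_iter=20):
    """Iterative OP-CRC (Steps 1-5): returns an (m, d) projection matrix P."""
    m, n = X.shape
    classes = np.unique(labels)
    c = len(classes)

    def scatter(P):
        # Coefficients are solved in the transformed space Y = P^T X
        # (each sample represented by the remaining samples), while
        # S_w and S_b of Eqs. (6) and (8) are accumulated in the input space.
        Y = P.T @ X
        Sw, Sb = np.zeros((m, m)), np.zeros((m, m))
        for j in range(n):
            keep = np.arange(n) != j
            Yr = Y[:, keep]
            a = np.linalg.solve(Yr.T @ Yr + lam * np.eye(n - 1), Yr.T @ Y[:, j])
            Xr, lr = X[:, keep], labels[keep]
            for cl in classes:
                idx = lr == cl
                e = X[:, j] - Xr[:, idx] @ a[idx]
                if cl == labels[j]:
                    Sw += np.outer(e, e) / n                  # Eq. (6)
                else:
                    Sb += np.outer(e, e) / (n * (c - 1))      # Eq. (8)
        return Sw, Sb

    def project(Sw, Sb):
        # Eq. (10): generalized eigenproblem; keep the d largest eigenvalues.
        w, V = eigh(Sb, Sw + 1e-8 * np.eye(m))
        return V[:, np.argsort(w)[::-1][:d]]

    # Step 1: "fit matrix" initialization -- coefficients in the input space.
    P = project(*scatter(np.eye(m)))
    J_prev = None
    for _ in range(max_iter):
        Sw, Sb = scatter(P)                                   # Steps 2-3
        P_new = project(Sw, Sb)                               # Step 4
        J = np.trace(P_new.T @ Sb @ P_new) / np.trace(P_new.T @ Sw @ P_new)
        if J_prev is not None and abs(J - J_prev) / J < eps:  # Step 5, Eq. (11)
            return P_new if J >= J_prev else P
        P, J_prev = P_new, J
    return P
```

As a rough complexity note, each iteration performs $n$ regularized solves of size roughly $n \times n$, which is what makes collaborative representation much cheaper than the sparse coding inside SRC-DP.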

3.2. Initial Projection Matrix of OP-CRC


To run the OP-CRC algorithm, the initial projection matrix $P_1$ must be given. A simple way is to use a random matrix as $P_1$. Here, to improve the performance of OP-CRC, we employ another way to calculate $P_1$. We suppose that the collaborative representation coefficient vector $\hat{\alpha}$ is unchanged after projection. Therefore, without the projection matrix $P$, $\hat{\alpha}$ can be calculated directly in the input space by Eq. (2), and $S_w$ and $S_b$ are constructed from $\hat{\alpha}$ by Eq. (6) and Eq. (8). A projection matrix is then obtained by solving the generalized eigenvalue problem in Eq. (10). We use this projection matrix as the initial projection matrix $P_1$ of the OP-CRC algorithm. Because this initial projection matrix is derived from the criterion of OP-CRC itself, it fits OP-CRC better; we call it the fit matrix.

4. Experiments


In this section, we evaluate the effectiveness of the proposed OP-CRC algorithm through face recognition experiments. OP-CRC is compared with popular dimensionality reduction methods such as PCA, LDA, LPP and SPP, and with recent methods such as SRC-DP, OP-SRC and CRP. To address the small sample size problem, PCA is performed before implementing LDA, LPP, SPP, SRC-DP, CRP and OP-CRC. As suggested in [20], the regularization parameter of OP-CRC is set as $0.001 \cdot n/700$, where $n$ is the number of training samples. For SRC-DP and OP-CRC, the parameter $\varepsilon$ in the convergence criterion is set as $10^{-7}$. For LPP, the $K$-nearest-neighbors parameter $K$ is set as $l - 1$, where $l$ is the number of training samples per class. For OP-SRC, the weight parameter is set as 0.25. After dimensionality reduction, CRC is used for classification. All experiments are carried out in MATLAB on an Intel Core i5-4300U 1.9 GHz machine with 8 GB RAM.


4.1. Experiments on Yale database


The Yale face database contains 165 gray-scale images of 15 individuals, and each person has 11 different images under various facial expressions (normal, happy, sad, sleepy, surprised, and wink), lighting conditions (left-light, center-light, right-light) and with/without glasses. In the experiment, each image was manually cropped and resized to 100 × 80 pixels. Fig.1 shows sample images of one person.

Fig.1. Sample images of one person on Yale database

In the first experiment, we use the first three, four, five and six images of each person for training and the remaining images for testing. To test the convergence of OP-CRC, the random matrix and the fit matrix are used as the initial projection matrix of OP-CRC respectively. Fig.2 shows the values of the criterion function $J(P)$ of OP-CRC versus the number of iterations. From Fig.2, we can see that, no matter how many images are used for training, OP-CRC converges well with both the random matrix and the fit matrix as the initial projection matrix. However, the fit matrix leads to faster convergence: in all cases it converges by the second iteration. Therefore, OP-CRC using the fit matrix has lower computational complexity.


Fig.2. The convergence of OP-CRC on Yale database. (a) Three images of each person are used for training. (b) Four images of each person are used for training. (c) Five images of each person are used for training. (d) Six images of each person are used for training.


SRC-DP is similar to OP-CRC in that it uses an iterative algorithm to search for the optimal projection matrix for SRC. SRC-DP can also use the random matrix or the fit matrix as the initial projection matrix. Table 1 lists the maximal recognition rates and the corresponding dimensions of SRC-DP and OP-CRC with the two kinds of initial projection matrices. From Table 1, it can be found that SRC-DP obtains the same recognition rate with the random matrix and with the fit matrix, irrespective of the training sample size. We can also find that OP-CRC using the fit matrix performs slightly better than OP-CRC using the random matrix when the first three and four images per person are used for training, and the two obtain the same recognition rates when the first five and six images per person are used for training. OP-CRC using the fit matrix thus has better performance in general. In addition, for all training sample sizes, OP-CRC performs better than SRC-DP, no matter which kind of initial projection matrix is used. Considering that SRC-DP and OP-CRC converge faster with the fit matrix than with the random matrix [18], the fit matrix is used as the initial projection matrix for these two methods in the following experiments. Fig.3 shows the recognition rates of SRC-DP and OP-CRC versus the dimensions. It can be seen from Fig.3 that, once the dimension is large enough, OP-CRC almost always obtains higher recognition rates than SRC-DP.


Table 1. The maximal recognition rates (percent) and the corresponding dimensions (in parentheses) of SRC-DP and OP-CRC with two kinds of initial projection matrices on Yale database when the first three, four, five and six images per person are used for training

Method                   Three training   Four training   Five training   Six training
SRC-DP (random matrix)   86.7 (14)        90.5 (14)       93.3 (13)       94.7 (14)
SRC-DP (fit matrix)      86.7 (14)        90.5 (14)       93.3 (13)       94.7 (14)
OP-CRC (random matrix)   89.2 (14)        93.3 (13)       95.6 (13)       96.0 (11)
OP-CRC (fit matrix)      90.0 (13)        94.3 (14)       95.6 (14)       96.0 (13)

Fig.3. The recognition rates (percent) of SRC-DP and OP-CRC versus the dimensions on Yale database. (a) Three images of each person are used for training. (b) Four images of each person are used for training. (c) Five images of each person are used for training. (d) Six images of each person are used for training.

To evaluate the performance of OP-CRC for different training samples, in the second experiment, we randomly choose three, four, five and six images of each person for training, and the remaining images for testing. We run the system 20 times. PCA, LDA, LPP,


SPP, SRC-DP, OP-SRC, CRP and OP-CRC are used to extract low dimensional face features. The maximal average recognition rates, the corresponding dimensions and the average running time of each method are shown in Table 2. It can be seen from Table 2 that OP-CRC achieves the highest recognition rate among all eight methods, irrespective of the training sample size. OP-CRC can improve the classification ability of CRC. LDA, SRC-DP and OP-SRC also perform well. However, the running time of SRC-DP is long, which is caused by the repeated sparse representation in its iterations. OP-CRC instead iterates collaborative representation, which has lower complexity than sparse representation, so OP-CRC is faster than SRC-DP. CRP is very fast, but it does not achieve high recognition rates.

Table 2. The maximal average recognition rates (percent), the corresponding dimensions (in parentheses) and the average running time (second) of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC on Yale database when three, four, five and six images of each person are randomly chosen for training

Method   Three training       Four training        Five training        Six training
         Rate       Time      Rate       Time      Rate       Time      Rate       Time
PCA      87.1 (16)  0.062     89.1 (19)  0.073     89.1 (19)  0.093     90.2 (20)  0.108
LDA      88.1 (14)  0.067     90.8 (14)  0.084     90.7 (14)  0.099     92.8 (14)  0.120
LPP      87.2 (16)  0.071     89.3 (17)  0.086     90.4 (18)  0.108     92.1 (18)  0.137
SPP      87.5 (16)  1.055     89.5 (19)  3.115     89.8 (20)  2.732     91.0 (18)  6.076
SRC-DP   87.8 (15)  3.539     89.7 (15)  11.81     90.6 (15)  8.394     92.5 (14)  23.554
OP-SRC   88.0 (14)  3.952     90.0 (14)  4.921     90.4 (14)  7.451     92.5 (14)  9.514
CRP      87.3 (16)  0.101     89.6 (19)  0.135     89.7 (22)  0.182     91.1 (19)  0.249
OP-CRC   88.5 (14)  0.220     91.7 (14)  0.345     92.4 (14)  0.519     94.1 (14)  0.827

4.2. Experiments on AR database

The AR face database contains over 4000 color face images of 126 people, including 26 frontal views of faces with different facial expressions, lighting conditions, and occlusions for each person. The pictures of 120 individuals were collected in two sessions (14 days apart) and each session contains 13 color images. We selected fourteen face images (seven from each session) of these 120 individuals for our experiment. The images were converted to grayscale. The face portion of each image was manually cropped and normalized to 50 × 40 pixels. Fig.4 shows sample images of one person. These images vary as follows: neutral expression, smiling, angry, screaming, left light on, right light on, all sides light on.


Fig.4. Sample images of one person on AR database


Firstly, the seven images of each person collected in the first session are used for training and the seven images collected in the second session for testing. The maximal recognition rates, the corresponding dimensions and the running time of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC are shown in Table 3. Secondly, we randomly select three images of each person for training and the remaining images for testing. This process is repeated 10 times. Table 4 lists the maximal average recognition rates, the corresponding dimensions and the average running time of the eight dimensionality reduction methods. From Table 3 and Table 4, we can find that, in the first experiment, SPP performs well while LDA, SRC-DP and OP-SRC do not perform very well; in the second experiment, LDA, SRC-DP and OP-SRC perform well and the performance of SPP is not very good. However, OP-CRC achieves the highest recognition rate in both experiments, no matter which samples are used for training. It can also be seen from Table 3 and Table 4 that OP-CRC is faster than SRC-DP.


Table 3. The maximal recognition rates (percent), the corresponding dimensions (in parentheses) and the running time (second) of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC on AR database when the first seven images per person are used for training

                   PCA     LDA     LPP     SPP      SRC-DP    OP-SRC   CRP      OP-CRC
Recognition rate   75.2    74.4    75.0    76.2     74.4      74.8     74.5     77.0
Dimension          111     101     119     81       112       105      120      106
Running time       6.46    7.13    11.93   453.45   1362.70   589.84   396.91   1026.35

Table 4. The maximal average recognition rates (percent), the corresponding dimensions (in parentheses) and the average running time (second) of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC on AR database when three images per person are randomly selected for training

                   PCA     LDA     LPP     SPP      SRC-DP    OP-SRC   CRP      OP-CRC
Recognition rate   94.8    95.3    95.2    94.4     95.3      95.2     94.8     96.1
Dimension          118     119     120     119      115       118      117      127
Running time       0.63    0.73    1.53    87.96    477.17    109.86   20.75    96.05

4.3. Experiments on FERET database

The FERET face database was sponsored by the US Department of Defense through the DARPA Program. It has become a standard database for testing and evaluating face recognition algorithms. We ran the algorithms on a subset of the FERET database composed of 1400 images of 200 individuals, with seven images per individual. It involves variations in facial expression, pose and illumination. In the experiment, the facial portion of each original image was cropped based on the locations of the eyes and mouth, and the cropped images were resized to 80 × 80 pixels. Seven sample images of one individual are shown in Fig. 5.

Fig.5. Sample images of one person on FERET database

Three and four images of each person are randomly selected to form the training set and the rest form the testing set. This process is repeated 10 times. Table 5 lists the maximal average recognition rates, the corresponding dimensions and the average running time of the eight dimensionality reduction methods. From Table 5, it can be seen that the supervised methods LDA, SRC-DP, OP-SRC and OP-CRC significantly outperform the unsupervised methods PCA, LPP, SPP and CRP under different training sample sizes. The number of classes is relatively large on this database; in this situation, supervised methods that use class information may perform much better than unsupervised methods. The iterative algorithms SRC-DP and OP-CRC perform better than LDA and OP-SRC, since more effective features in the transformed space can be obtained by iteration. The proposed OP-CRC achieves the highest recognition rate under the CRC classifier, irrespective of the training sample size. This indicates that OP-CRC extracts low dimensional features that fit CRC better than the other methods do. We can also find from Table 5 that the running time of OP-CRC is shorter than that of SRC-DP.


Table 5. The maximal average recognition rates (percent), the corresponding dimensions (in parentheses) and the average running time (second) of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC on FERET database when three and four images of each person are randomly selected for training

         Three training        Four training
         Rate       Time       Rate       Time
PCA      41.3 (89)  2.60       56.8 (89)  4.26
LDA      46.4 (89)  2.72       64.2 (88)  4.75
LPP      42.4 (89)  5.09       58.0 (90)  9.12
SPP      40.6 (87)  185.80     56.4 (90)  330.31
SRC-DP   49.4 (82)  675.98     68.7 (76)  1193.20
OP-SRC   46.0 (88)  255.86     64.6 (87)  331.17
CRP      41.3 (83)  127.22     56.6 (90)  340.11
OP-CRC   50.4 (89)  452.21     69.5 (68)  726.69

4.4. Experiments on CMU_PIE and LFW databases


The CMU PIE database contains 41368 images of 68 people in total. In our experiment, 18 images of each person are selected, including 7 poses with neutral expression in different views and 11 talking images in frontal view. All these images were aligned by the centers of the eyes and mouth and then cropped and resized to 217×178 pixels. Fig.6 shows 18 sample images of one person.


Fig.6. Sample images of one person on CMU_PIE database

Fig.7. Sample images of one person on LFW database


Labeled Faces in the Wild (LFW) is a face database designed for studying unconstrained face recognition. The database contains more than 13,000 images of faces collected from the web [23]. There are four different sets of LFW images including the original and three types of aligned images. In our experiment, the aligned images called "deep funneled" images were used [24]. We selected 500 images of 50 persons. Each person has 10 images. The face portions of all the 500 images were cropped and resized to 120×100 pixels. Fig.7 shows 10 sample images of one person.

On CMU_PIE database, we use the first 7 images of each person for training and the remaining 11 talking images for testing. Table 6 lists the experimental results of the eight dimensionality reduction methods. From Table 6, we can find that the sparse representation based dimensionality reduction methods SPP, SRC-DP and OP-SRC do not perform well, whereas the collaborative representation based dimensionality reduction methods CRP and OP-CRC perform very well. The proposed OP-CRC obtains the highest recognition rate and has a shorter running time than SRC-DP. On LFW database, we randomly select five images of each person for training and the remaining images for testing. This process is repeated 10 times. The experimental results of the eight dimensionality reduction methods are shown in Table 7. Since the images are collected in an unconstrained environment, the recognition rates on this database are low. LDA and the two iterative algorithms SRC-DP and OP-CRC obtain relatively high recognition rates. The best recognition rate is obtained by our OP-CRC, and OP-CRC has higher efficiency than SRC-DP.

Table 6. The maximal recognition rates (percent), the corresponding dimensions (in parentheses) and the running time (second) of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC on CMU_PIE database when the first seven images per person are used for training

                   PCA     LDA     LPP     SPP     SRC-DP   OP-SRC   CRP      OP-CRC
Recognition rate   66.3    59.8    61.4    57.9    59.0     52.3     67.6     68.4
Dimension          49      67      72      51      79       86       52       72
Running time       31.78   32.53   33.58   138.1   319.53   122.76   125.35   184.57

Table 7. The maximal average recognition rates (percent), the corresponding dimensions (in parentheses) and the average running time (second) of PCA, LDA, LPP, SPP, SRC-DP, OP-SRC, CRP and OP-CRC on LFW database when five images per person are randomly selected for training

                   PCA     LDA     LPP     SPP     SRC-DP   OP-SRC   CRP     OP-CRC
Recognition rate   35.9    39.3    36.7    36.0    38.7     36.6     35.9    41.3
Dimension          60      40      60      57      60       58       57      53
Running time       0.88    0.89    1.15    29.05   103.69   32.14    5.64    37.70

4.5. Discussion

From the above experiments, we have the following findings:

1) In contrast to a random matrix, using the fit matrix as the initial projection matrix speeds up the convergence of OP-CRC. It also gives OP-CRC slightly better performance.

2) On the different databases, the classifier CRC obtains the highest recognition rate when it uses the low-dimensional features extracted by OP-CRC. This indicates that OP-CRC is a dimensionality reduction method well suited to CRC.

3) As an iterative algorithm, OP-CRC is faster than SRC-DP, which also requires iteration. However, the speed difference between the two methods decreases as the training sample size grows. For example, OP-CRC is significantly faster than SRC-DP on the Yale database but only slightly faster on the FERET database. This is mainly because the time needed to compute the inverse of the matrix X^T X + λI increases quickly with the number of training samples, and according to the algorithm of OP-CRC this computation is indispensable.
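The collaborative representation coding behind finding 3) can be made concrete. Below is a minimal NumPy sketch of the CRC decision rule with regularized least-squares coding a = (X^T X + λI)^{-1} X^T y; the function name and the residual normalization follow the standard CRC-RLS formulation and are our assumptions, not code from the paper:

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.01):
    """Classify query y by collaborative representation (CRC-RLS sketch).

    X      : d x n matrix whose columns are training samples
    labels : length-n array of class labels for the columns of X
    y      : length-d query vector
    lam    : regularization parameter (lambda)
    """
    n = X.shape[1]
    # Regularized least-squares coding: a = (X^T X + lam*I)^{-1} X^T y.
    # Inverting this n x n matrix is the step whose cost grows quickly
    # with the number of training samples n, as noted in the discussion.
    a = np.linalg.inv(X.T @ X + lam * np.eye(n)) @ (X.T @ y)
    # Assign y to the class with the smallest regularized residual.
    best_class, best_score = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        a_c = a[mask]
        residual = np.linalg.norm(y - X[:, mask] @ a_c)
        score = residual / (np.linalg.norm(a_c) + 1e-12)
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```

Note that the projection matrix (X^T X + λI)^{-1} X^T depends only on the training set, so in practice it is precomputed once and reused for every query.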

5. Conclusions


In this paper, we propose Optimized Projection for Collaborative Representation based Classification (OP-CRC) and successfully apply it to face recognition. OP-CRC is a dimensionality reduction method that fits Collaborative Representation based Classification (CRC) well. OP-CRC seeks a projection under which the ratio of the collaborative representation based between-class scatter to the collaborative representation based within-class scatter is maximized in the projected space. It is faster than Sparse Representation based Classification steered Discriminative Projection (SRC-DP) because of the low complexity of collaborative representation. Experiments are performed on the Yale, AR, FERET, CMU_PIE and LFW databases, and the results demonstrate the advantages of OP-CRC.
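For concreteness, the scatter-ratio maximization at the core of such a projection can be sketched as a generalized eigenvalue problem. This is a generic sketch assuming the collaborative representation based scatter matrices Sb and Sw have already been formed; the small ridge term `reg` is our addition for numerical stability and is not part of the paper's algorithm:

```python
import numpy as np

def scatter_ratio_projection(Sb, Sw, k, reg=1e-6):
    """Projection maximizing the ratio of between- to within-class scatter.

    Solves the generalized eigenproblem Sb w = lambda (Sw + reg*I) w and
    returns the k eigenvectors with the largest eigenvalues as the
    columns of the projection matrix.
    """
    d = Sw.shape[0]
    # The small ridge keeps Sw invertible when it is rank deficient.
    M = np.linalg.inv(Sw + reg * np.eye(d)) @ Sb
    vals, vecs = np.linalg.eig(M)
    # Keep the k directions with the largest scatter ratio.
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:k]].real
```

In an iterative scheme such as OP-CRC, the scatter matrices are recomputed from the collaborative representations in the current projected space and this step is repeated until convergence.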

Acknowledgements


This work is supported by the Shanghai Municipal Natural Science Foundation under Grants No. 13ZR1455600 and No. 14ZR1419300, and the National Natural Science Foundation of China under Grants No. 31470954, No. 61203240 and No. 61403251.

References


[1] W. Zhao, R. Chellappa, A. Rosenfeld, P.J. Phillips, Face recognition: a literature survey, ACM Computing Surveys 35 (2003) 399-458.
[2] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1991, pp. 586-591.
[3] P.N. Belhumeur, J.P. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 711-720.
[4] X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang, Face recognition using Laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 328-340.
[5] X. He, D. Cai, S. Yan, H.-J. Zhang, Neighborhood preserving embedding, in: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV), 2005, pp. 1208-1213.
[6] D. Cai, X. He, J. Han, Isometric projection, in: Proceedings of the National Conference on Artificial Intelligence (AAAI), 2007, pp. 528-533.
[7] H.-T. Chen, H.-W. Chang, T.-L. Liu, Local discriminant embedding and its variants, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 846-853.
[8] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007) 40-51.
[9] J. Gui, C. Wang, L. Zhu, Locality preserving discriminant projections, in: Proceedings of the International Conference on Intelligent Computing, 2009, pp. 566-572.
[10] Y. Fu, S. Yan, T.S. Huang, Classification and feature extraction by simplexization, IEEE Transactions on Information Forensics and Security 3 (2008) 91-100.
[11] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2009) 210-227.
[12] J. Yin, Z. Liu, Z. Jin, W. Yang, Kernel sparse representation based classification, Neurocomputing 77 (2012) 120-128.
[13] L. Zhang, W.-D. Zhou, P.-C. Chang, J. Liu, Kernel sparse representation-based classifier, IEEE Transactions on Signal Processing 60 (2012) 1684-1695.
[14] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (2010) 331-341.
[15] B. Cheng, J. Yang, S. Yan, Y. Fu, T.S. Huang, Learning with l1-graph for image analysis, IEEE Transactions on Image Processing 19 (2010) 858-866.
[16] J. Lai, X. Jiang, Discriminative sparsity preserving embedding for face recognition, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), 2013, pp. 3695-3699.
[17] J. Gui, Z. Sun, W. Jia, R. Hu, Y. Lei, S. Ji, Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognition 45 (2012) 2884-2893.
[18] J. Yang, D. Chu, L. Zhang, Y. Xu, J.-Y. Yang, Sparse representation classifier steered discriminative projection with application to face recognition, IEEE Transactions on Neural Networks and Learning Systems 24 (2013) 1023-1035.
[19] C.-Y. Lu, D.-S. Huang, Optimized projections for sparse representation based classification, Neurocomputing 113 (2013) 213-219.
[20] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition?, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 471-478.
[21] W. Liu, L. Lu, H. Li, W. Wang, Y. Zou, A novel kernel collaborative representation approach for image classification, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), Paris, France, 2014, pp. 4241-4245.
[22] W. Yang, Z. Wang, C. Sun, A collaborative representation based projections method for feature extraction, Pattern Recognition 48 (2015) 20-27.
[23] G.B. Huang, M. Ramesh, T. Berg, E. Learned-Miller, Labeled Faces in the Wild: a database for studying face recognition in unconstrained environments, Technical Report, University of Massachusetts, Amherst, 2007.
[24] G.B. Huang, M. Mattar, H. Lee, E. Learned-Miller, Learning to align from scratch, in: Advances in Neural Information Processing Systems (NIPS), 2012.