Robust 2DLDA based on correntropy


Brief papers

Fujin Zhong a,b,*, Li Liu a,b, Jun Hu a,b

a Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
b School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Corresponding author at: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. E-mail address: [email protected] (F. Zhong).

Article info

Article history: Received 23 November 2017; Revised 23 July 2018; Accepted 10 August 2018; Available online xxx.

Keywords: 2DLDA; Outliers; Robustness; Correntropy; Optimization

Abstract

To further improve the robustness of two-dimensional LDA (2DLDA) methods against outliers, this paper proposes a new robust 2DLDA that obtains the optimal projection transformation by maximizing the correntropy-based within-class similarity while simultaneously maintaining the global dispersity. The objective problem of the proposed method can be solved by an iterative optimization algorithm which is proved to converge to a local maximum point. Experimental results on the FERET face database, the PolyU palmprint database and the Binary Alphadigits database illustrate that the proposed method outperforms three conventional 2DLDA methods when there are outliers.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2] are two well-known dimensionality reduction techniques for feature extraction, and they have played a significant role in pattern recognition, machine learning and computer vision. PCA is an unsupervised technique that seeks the optimal projection transformation by maximizing the variance of the training data in the low-dimensional feature space. LDA is a supervised method that obtains the maximum class discrimination by simultaneously maximizing the within-class similarity and the between-class dissimilarity. LDA has been widely applied in fields such as face recognition [3], hepatitis diagnosis [4], human action recognition [5] and motor bearing fault diagnosis [6]. Classic LDA extracts low-dimensional features only from vector-based (1D) training data, which forces matrix-based (2D) data, such as images, to be transformed into vectors. Obviously, this transformation ignores the local spatial structure that is useful for expressing the intrinsic features of image data. A multidimensional extension can improve the adaptability of the 1D model [7]. Therefore, 2DLDA was proposed to extract features directly from image matrices [8]. However, Frobenius-norm 2DLDA (2DLDA-L2) is sensitive to outliers because its similarity/dissimilarity metric adopts the Frobenius-norm distance, which magnifies the influence of outlying data due to the square operation.

In the last several years, the L1-norm has been widely applied to improve the robustness of PCA against outliers [9–12]. Motivated by PCA based on the rotational invariant L1-norm (PCA-R1) [9], LDA-R1 and 2DLDA-R1 were proposed to improve the robustness of LDA-L2 and 2DLDA-L2 [13]. However, LDA-R1 incurs a high time complexity when the dimensionality of the training data is high. Similarly, inspired by PCA-L1 [10] and CSP-L1 [14], LDA-L1 [15,16] was proposed to further improve on the robustness of LDA-L2 and the performance of LDA-R1. Subsequently, Li et al. [17] extended LDA-L1 to two-dimensional linear discriminant analysis (L1-2DLDA) to make full use of the local structure of images. But the optimization of L1-2DLDA cannot obtain the solution of the corresponding trace ratio form [18]. Recently, Liu et al. [19] proposed an iterative non-greedy framework to optimize the trace ratio form of LDA-L1 and analyzed the convergence of their algorithm. The idea of LDA-L1 was also extended to optimize the objective function of 2DLDA-L1 [18].

Correntropy was proposed as a similarity metric which reflects the second-order statistics of the input data in the transformed space [20]. Theoretically, the maximum correntropy criterion (MCC) is a local similarity criterion of data pairs and has outstanding advantages when large nonzero-mean and non-Gaussian outliers are present [20]. In 2011, He et al. [21] proposed a robust PCA based on MCC (PCA-MCC) by applying correntropy to measure the reconstruction error. PCA-MCC outperforms PCA-R1 and PCA-L1 when there are large outliers in the training data. He et al. [22] further proposed a robust face recognition method to validate the feasibility of correntropy. Subsequently, MCC was used to improve the robustness of LDA [23] and LPP [24] against outliers, respectively.


Recently, correntropy has also been successfully applied to multidimensional scaling [25], tensor factorization [26] and multi-label active learning [27]. The aforementioned methods adequately demonstrate the robustness of correntropy.

Although 2DLDA-R1 [13] and 2DLDA-L1 [18] can alleviate the impact of outliers to a certain extent, they have difficulty coping with large nonzero-mean and non-Gaussian outliers. Correntropy has been theoretically proved to be a robust metric for dealing with such outliers [20], and several applications have also illustrated that MCC is very robust against outliers [21–27]. In this paper, therefore, correntropy is applied to further enhance the robustness of 2DLDA methods against outliers, and a new robust 2DLDA method based on MCC (2DLDA-MCC) is proposed. 2DLDA-MCC adopts a correntropy-based within-class similarity to alleviate the negative effect of outliers, and aims to obtain the optimal projection transformation by maximizing this similarity while simultaneously maintaining the global dispersity. Moreover, the objective problem of 2DLDA-MCC is solved by an iterative half-quadratic optimization procedure. Compared with existing 2DLDA methods, 2DLDA-MCC has two appealing aspects.

1) Traditional 2DLDA methods apply the same metric (i.e. Frobenius-norm, R1-norm or L1-norm) to measure both the within-class similarity and the global dispersity in their objective problems. 2DLDA-MCC adopts two different metrics to measure the within-class similarity and the global dispersity, respectively. The main advantage is that 2DLDA-MCC simultaneously maintains the within-class cohesion, owing to the robustness of correntropy against outliers, and the global dispersity, owing to the properties of the Frobenius-norm. This idea may inspire further research on robust supervised subspace learning with multiple metrics.

2) The proposed optimization procedure of 2DLDA-MCC is in effect a reweighting method: the Gaussian-like weighting function attenuates large outlying terms so that outliers have less impact on the adaptation during the optimization procedure [20]. Therefore, 2DLDA-MCC is more robust against outliers than the traditional 2DLDA methods, as several experiments illustrate.

The remainder of this paper is organized as follows. In Section 2, the objective problem of 2DLDA-MCC is formulated and its optimization algorithm is given in detail. In Section 3, we report the experimental results on three image databases. Finally, Section 4 concludes this paper.

2. 2DLDA based on correntropy

2.1. Objective problem

Based on the information potential, correntropy can be viewed as a generalized similarity metric of two arbitrary random variables x and y [20]. Correntropy is defined as follows [20]

$$C_\sigma(x, y) = E[k_\sigma(x - y)] \tag{1}$$

where $E(\cdot)$ denotes the mathematical expectation and $k_\sigma(\cdot)$ denotes a kernel function that satisfies Mercer's theorem, so that it induces a nonlinear mapping from the input space to an infinite-dimensional reproducing kernel Hilbert space [20]. However, the joint probability density function of x and y cannot be accurately estimated in practice, and only a finite set of sample pairs $\{(x_i, y_i)\}_{i=1}^{N}$ is available. Therefore, the sample estimator of correntropy is computed as follows

$$\hat{C}_{N,\sigma}(x, y) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(x_i - y_i). \tag{2}$$

If the kernel function $k_\sigma(\cdot)$ is set to the Gaussian kernel $g(a) = \exp(-\|a\|_2^2 / 2\sigma^2)$, where $\sigma$ denotes the kernel size, Eq. (2) can be rewritten as follows

$$\hat{C}_{N,\sigma}(x, y) = \frac{1}{N} \sum_{i=1}^{N} g(x_i - y_i). \tag{3}$$

Maximizing the sample correntropy of Eq. (3) is called the maximum correntropy criterion (MCC), which is a robust local similarity metric [20]. To improve robustness, 2DLDA-MCC adopts correntropy to measure the similarity between the training samples and their class means. Suppose that there are N training samples $X_j \in \mathbb{R}^{m \times n}$ ($j = 1, 2, \ldots, N$) belonging to $\ell$ classes. Let $\Omega_i$ denote the index set of the $i$th class, which has $N_i$ samples, so that $\sum_{i=1}^{\ell} N_i = N$. The global mean of all training samples and the class mean of the $i$th class are denoted by $U$ and $U_i$, respectively. Then, the total correntropy-based within-class similarity in the feature space is computed as follows

$$J_{Corr}(P) = \sum_{i=1}^{\ell} \sum_{j \in \Omega_i} g\big((X_j - U_i)\,P\big) \tag{4}$$

where $P \in \mathbb{R}^{n \times d}$ ($d < n$) is a linear transformation from the $m \times n$-dimensional sample space to the $m \times d$-dimensional feature space. To obtain discrimination, 2DLDA-MCC formulates the objective problem by maximizing the total correntropy-based within-class similarity while simultaneously maintaining the global dispersity, as follows

$$P_{opt} = \arg\max_P J_{Corr}(P) = \arg\max_P \sum_{i=1}^{\ell} \sum_{j \in \Omega_i} g\big((X_j - U_i)\,P\big) \quad \text{s.t.} \quad \sum_{j=1}^{N} \big\| (X_j - U)\,P \big\|_F^2 = c \tag{5}$$

where c is a positive constant. In the next subsection, we present an optimization algorithm based on an iterative half-quadratic programming framework to solve the objective problem of 2DLDA-MCC.
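The following is a minimal sketch of how the objective (4) and the constraint of (5) can be evaluated for a candidate projection P; the array shapes and names are assumptions (X is an N × m × n array of image matrices).

```python
import numpy as np

def j_corr(X, labels, P, sigma):
    # Eq. (4): total correntropy-based within-class similarity in feature space
    total = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        Uc = Xc.mean(axis=0)                 # class mean U_i
        for Xj in Xc:
            diff = (Xj - Uc) @ P             # (X_j - U_i) P, an m x d matrix
            total += np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))
    return total

def global_dispersity(X, P):
    # Left-hand side of the constraint in Eq. (5): sum_j ||(X_j - U) P||_F^2
    U = X.mean(axis=0)
    return sum(np.sum(((Xj - U) @ P) ** 2) for Xj in X)
```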


2.2. Optimization algorithm

In information theoretic learning, the half-quadratic programming method is often used to optimize nonlinear objective problems [21]. Following the half-quadratic method, an iterative procedure is designed to optimize problem (5). Derived from the theory of convex conjugated functions, the following proposition is first introduced [21].


Proposition 1. There exists a convex conjugated function δ of the Gaussian function g(a) such that

$$g(a) = \max_q \left( q\,\frac{\|a\|_2^2}{2\sigma^2} - \delta(q) \right) \tag{6}$$

where $q \in \mathbb{R}$ denotes a scalar auxiliary variable. For a fixed a, the maximum is reached at $q = -g(a)$ [21].



By substituting (6) into (4), the expanded objective function of 2DLDA-MCC can be derived in the augmented parameter space as follows

$$\hat{J}_{Corr}(P, q) = \sum_{i=1}^{\ell} \sum_{j \in \Omega_i} \Big( q_j \big\| (X_j - U_i)\,P \big\|_F^2 - \delta(q_j) \Big) \tag{7}$$

where the entries $q_j$ ($1 \le j \le N$) of the vector q are the auxiliary variables. According to Proposition 1, the following equation holds for a fixed P

$$J_{Corr}(P) = \max_q \hat{J}_{Corr}(P, q). \tag{8}$$
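A sketch of evaluating the augmented objective (7); delta is passed in as a callable since the paper only uses its existence, and class_means is assumed to map each label to its class mean U_i.

```python
import numpy as np

def j_hat_corr(X, labels, P, q, class_means, delta):
    # Eq. (7): sum_j ( q_j * ||(X_j - U_i) P||_F^2 - delta(q_j) )
    total = 0.0
    for Xj, c, qj in zip(X, labels, q):
        diff = (Xj - class_means[c]) @ P
        total += qj * np.sum(diff ** 2) - delta(qj)
    return total
```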


Thus, we have

$$\max_P J_{Corr}(P) = \max_{P,\,q} \hat{J}_{Corr}(P, q) \tag{9}$$

which illustrates that the maximization of $J_{Corr}(P)$ is equivalent to the maximization of $\hat{J}_{Corr}(P, q)$. According to (9), an alternating maximization method in an iterative procedure is introduced to obtain a local maximum point of the objective problem. Firstly, suppose that $U_i^{t-1}$ is the class mean of the $i$th class training data and $P^{t-1}$ is the projection matrix at the $(t-1)$th iteration. For the fixed $P^{t-1}$, we compute $q_j^t$ ($1 \le j \le N$, $j \in \Omega_i$) as follows

$$q_j^t = -g\big((X_j - U_i^{t-1})\,P^{t-1}\big). \tag{10}$$

Secondly, we update the class means and the global mean as follows

$$U_i^t = \frac{1}{\sum_{j \in \Omega_i} q_j^t} \sum_{j \in \Omega_i} q_j^t X_j \tag{11}$$

$$U^t = \frac{1}{\sum_{j=1}^{N} q_j^t} \sum_{j=1}^{N} q_j^t X_j. \tag{12}$$
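A sketch of one reweighting pass, Eqs. (10)–(12); shapes and names are assumptions (X is N × m × n, prev_class_means maps each label to the previous class mean).

```python
import numpy as np

def update_weights_and_means(X, labels, P_prev, prev_class_means, sigma):
    # Eq. (10): q_j = -g((X_j - U_i^{t-1}) P^{t-1}); note q_j is negative
    q = np.empty(len(X))
    for j, (Xj, c) in enumerate(zip(X, labels)):
        diff = (Xj - prev_class_means[c]) @ P_prev
        q[j] = -np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))
    classes = np.unique(labels)
    # Eq. (11): weighted class means (q-weighted average of class samples)
    class_means = {c: np.tensordot(q[labels == c], X[labels == c], axes=1)
                      / q[labels == c].sum() for c in classes}
    # Eq. (12): weighted global mean
    global_mean = np.tensordot(q, X, axes=1) / q.sum()
    return q, class_means, global_mean
```

Because the common negative sign cancels in the ratios, (11) and (12) remain well-defined convex-like weightings in which heavily outlying samples (small |q_j|) contribute less.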

Thirdly, for the fixed $q_j = q_j^t$, we obtain the projection matrix $P^t$ by maximizing the objective function of (7). The convex conjugated function $\delta(q_j)$ does not depend on $P$ for fixed $q_j$, so we can derive $P^t$ by solving the following optimization problem

$$P^t = \arg\max_P \hat{J}_{Corr}(P, q^t) = \arg\max_P \sum_{i=1}^{\ell} \sum_{j \in \Omega_i} q_j^t \,\big\| (X_j - U_i^t)\, P \big\|_F^2 \quad \text{s.t.} \quad \sum_{j=1}^{N} \big\| (X_j - U^t)\, P \big\|_F^2 = c. \tag{13}$$

By a simple algebraic transformation, (13) can be rewritten as follows

$$P^t = \arg\max_P \operatorname{tr}\big(P^T S_w P\big) \quad \text{s.t.} \quad \operatorname{tr}\big(P^T S_t P\big) = c \tag{14}$$

where $\operatorname{tr}(\cdot)$ is the trace of a matrix, $S_w$ can be viewed as the weighted within-class scatter matrix, and $S_t$ is the global scatter matrix. $S_w$ and $S_t$ are defined as follows

$$S_w = \sum_{i=1}^{\ell} \sum_{j \in \Omega_i} q_j^t \big(X_j - U_i^t\big)^T \big(X_j - U_i^t\big) \tag{15}$$

$$S_t = \sum_{j=1}^{N} \big(X_j - U^t\big)^T \big(X_j - U^t\big). \tag{16}$$

The solution of (14) is given by the following generalized eigenvalue problem

$$S_w p = \lambda S_t p. \tag{17}$$
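A sketch of Eqs. (15)–(17) using SciPy's generalized symmetric eigensolver (a tooling assumption; the paper does not prescribe one). Since the q_j are negative, S_w is negative semidefinite, and "largest eigenvalues" means the least negative ones; S_t is assumed positive definite.

```python
import numpy as np
from scipy.linalg import eigh

def update_projection(X, labels, q, class_means, global_mean, d_r):
    n = X.shape[2]
    Sw = np.zeros((n, n))
    for Xj, c, qj in zip(X, labels, q):
        D = Xj - class_means[c]
        Sw += qj * (D.T @ D)                 # Eq. (15), weighted within-class scatter
    St = np.zeros((n, n))
    for Xj in X:
        D = Xj - global_mean
        St += D.T @ D                        # Eq. (16), global scatter
    w, V = eigh(Sw, St)                      # Eq. (17): Sw p = lambda St p
    P = V[:, np.argsort(w)[::-1][:d_r]]      # d_r largest generalized eigenvalues
    # Orthogonal-cone projection: P <- P (P^T P)^(-1/2), done here via thin SVD
    U_, _, Vt_ = np.linalg.svd(P, full_matrices=False)
    return U_ @ Vt_
```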

$P^t$ is constituted of the d eigenvectors corresponding to the d largest eigenvalues. Then, $P^t$ is projected onto an orthogonal cone as $P^t = P^t \big( (P^t)^T P^t \big)^{-1/2}$. During the aforementioned iterative procedure, the objective of 2DLDA-MCC is a nondecreasing function of t. Proposition 2 states the convergence of the proposed optimization algorithm and is proved as follows.

Proposition 2. $\{\hat{J}_{Corr}(P^t, q^t),\ t = 1, 2, 3, \ldots\}$ converges during the iterative optimization procedure.

Proof. Proposition 1 illustrates that the right side of Eq. (6) reaches its maximum for a fixed a when $q = -g(a)$. So, for the fixed $P^{t-1}$, $\hat{J}_{Corr}(P^{t-1}, q)$ reaches its maximum when q is updated from $q^{t-1}$ to $q^t$ by (10) at the first step of the $t$th iteration, i.e. $\hat{J}_{Corr}(P^{t-1}, q^{t-1}) \le \hat{J}_{Corr}(P^{t-1}, q^t)$. Meanwhile, according to (13), $P^t$ is the maximum point of $\hat{J}_{Corr}(P, q^t)$ for the fixed $q^t$, i.e. $\hat{J}_{Corr}(P^{t-1}, q^t) \le \hat{J}_{Corr}(P^t, q^t)$. As a result, we can conclude that

$$\hat{J}_{Corr}(P^{t-1}, q^{t-1}) \le \hat{J}_{Corr}(P^{t-1}, q^t) \le \hat{J}_{Corr}(P^t, q^t) \tag{18}$$

which illustrates that $\hat{J}_{Corr}(P^t, q^t)$ does not decrease at any iteration of the alternating maximization procedure. In addition, $J_{Corr}(P)$ is bounded owing to the boundedness of correntropy [20], so $\hat{J}_{Corr}(P, q)$ is bounded because of Eq. (9). Finally, it is concluded that $\{\hat{J}_{Corr}(P^t, q^t),\ t = 1, 2, 3, \ldots\}$ converges.

We summarize the proposed optimization procedure in Algorithm 1. To reduce the time cost, Algorithm 1 constitutes the projection matrix $P^t$ of only $d_r$ ($d_r < d$) eigenvectors before convergence; the final optimal solution $P_{opt}$ is constituted of d eigenvectors obtained at the last iteration, which carry more discriminating information.

Algorithm 1 2DLDA-MCC.
Input: Training samples $X_j \in \mathbb{R}^{m \times n}$ ($j = 1, 2, \ldots, N$) and an initial orthonormal projection matrix $P^0 \in \mathbb{R}^{n \times d_r}$ ($d_r < d < n$).
Output: $P_{opt} \in \mathbb{R}^{n \times d}$.
Procedure:
1: Initialize $t = 0$, $U_i^t = \frac{1}{N_i} \sum_{j \in \Omega_i} X_j$, $U^t = \frac{1}{N} \sum_{j=1}^{N} X_j$, and convergence = FALSE
2: REPEAT
3: t = t + 1
4: Update $q_j^t$ according to (10)
5: Update $U_i^t$ and $U^t$ according to (11) and (12), respectively
6: Update $P^t \in \mathbb{R}^{n \times d_r}$ according to (17), then set $P^t = P^t \big( (P^t)^T P^t \big)^{-1/2}$
7: IF $J_{Corr}(P^t)$ does not significantly increase
8: convergence = TRUE
9: ENDIF
10: UNTIL convergence
11: For more discriminating information, $P_{opt} \in \mathbb{R}^{n \times d}$ is constituted of the d eigenvectors corresponding to the d largest eigenvalues at the last iteration.
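Putting the pieces together, here is a condensed sketch of Algorithm 1 built from the helper sketches above; n_iter, tol and the single fixed sigma are simplifying assumptions (the paper adapts a per-class kernel size at each iteration, see Eq. (19)).

```python
import numpy as np

def fit_2dlda_mcc(X, labels, d, d_r, sigma, n_iter=50, tol=1e-6):
    """Sketch of Algorithm 1. X: (N, m, n) image stack; labels: (N,)."""
    n = X.shape[2]
    P = np.linalg.qr(np.random.randn(n, d_r))[0]          # initial orthonormal P^0
    class_means = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    prev_obj = -np.inf
    for t in range(n_iter):                               # steps 2-10
        q, class_means, global_mean = update_weights_and_means(
            X, labels, P, class_means, sigma)             # steps 4-5: Eqs. (10)-(12)
        P = update_projection(X, labels, q, class_means,
                              global_mean, d_r)           # step 6: Eq. (17) + cone
        obj = j_corr(X, labels, P, sigma)                 # step 7: convergence check
        if obj - prev_obj < tol:
            break
        prev_obj = obj
    # step 11: keep d (> d_r) eigenvectors at the last iteration
    return update_projection(X, labels, q, class_means, global_mean, d)
```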

3. Experimental results

In this section, the proposed 2DLDA-MCC is evaluated on three image databases (the FERET face database [28], the PolyU palmprint database [29] and the Binary Alphadigits database [30]) and compared with 2DLDA-L2 [8], 2DLDA-R1 [13] and 2DLDA-L1 [18]. Motivated by reference [21], the kernel size of 2DLDA-MCC for the $i$th class samples is set as follows

$$\big(\sigma_i^t\big)^2 = 2 \cdot \frac{1}{N_i} \sum_{j \in \Omega_i} \big\| X_j - U_i^{t-1} \big\|_F^2. \tag{19}$$
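A sketch of the per-class kernel size rule of Eq. (19); it returns σ_i² per class, and the dict-based interface is an assumption.

```python
import numpy as np

def kernel_size(X, labels, prev_class_means):
    # Eq. (19): (sigma_i^t)^2 = 2 * (1/N_i) * sum_{j in class i} ||X_j - U_i^{t-1}||_F^2
    sigma2 = {}
    for c in np.unique(labels):
        diffs = X[labels == c] - prev_class_means[c]
        sigma2[c] = 2.0 * np.mean(np.sum(diffs ** 2, axis=(1, 2)))
    return sigma2
```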

Moreover, the nearest neighbor (1NN) classifier based on Euclidean distance is adopted for classification, and all experiments are conducted on a desktop system with an Intel i5-4590 3.30 GHz CPU, 8 GB RAM and Matlab R2014b.
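A sketch of the evaluation pipeline: project each image with the learned P and classify with Euclidean-distance 1NN; the helper names follow the earlier sketches and are not from the paper.

```python
import numpy as np

def extract_features(X, P):
    # Each m x n image becomes an m x d feature matrix
    return np.array([Xj @ P for Xj in X])

def predict_1nn(train_feats, train_labels, test_feats):
    # Nearest neighbor under the Euclidean (Frobenius) distance
    preds = []
    for F in test_feats:
        dists = [np.linalg.norm(F - G) for G in train_feats]
        preds.append(train_labels[int(np.argmin(dists))])
    return np.array(preds)
```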

3.1. Results on FERET database

The FERET face database is constituted of 14,051 gray-scale images of 1199 individuals with several variations including lighting, expression and pose. In this work, we only consider the frontal facial images and select a subset of the FERET database. This subset is constituted of 1200 facial images of 200 individuals, and each individual has six facial images, named "Ia", "Ib", "Ic", "Id", "Ie" and "If", respectively. All images are manually cropped and normalized to 80 × 80 pixels according to the locations of the mouth and eyes. Fig. 1(a) shows the images of two individuals in the FERET subset. Then, an artificial rectangle occlusion is added to the image named "Ia" of each individual. The rectangle occlusions are constituted of random white or black pixels, are placed at random positions in the images, and their sizes are randomly set from 20 × 20 to 60 × 60. Fig. 1(b) gives some facial images with occlusion.

Fig. 1. Facial images of FERET subset: (a) the images of two individuals with no occlusion, (b) the occluded images of six individuals.

Table 1 Optimal classification accuracy (%) on FERET subset.

| Methods | Test #1 | Test #2 | Test #3 | Test #4 | Average |
|---|---|---|---|---|---|
| 2DLDA-L2 | 64.00 | 64.83 | 82.67 | 85.67 | 74.29 |
| 2DLDA-R1 | 73.33 | 73.17 | 85.50 | 86.33 | 79.58 |
| 2DLDA-L1 | 70.67 | 75.00 | 83.50 | 87.17 | 79.08 |
| 2DLDA-MCC | 75.17 | 73.50 | 86.17 | 89.17 | 81.00 |

This subsection conducts four tests, named Test #1, Test #2, Test #3 and Test #4. The training set of each test contains three images of each individual, and the remaining images are used as the testing set. Specifically, the training set of Test #1 contains images "Ib", "Ic" and the occluded "Ia"; the training set of Test #2 contains images "Id", "Ie" and the occluded "Ia"; the training set of Test #3 contains images "Ic", "Id" and the occluded "Ia"; and the training set of Test #4 contains images "Ib", "Ie" and the occluded "Ia". We focus on comparing the classification performance versus the feature dimensionality and the average optimal classification accuracy. Fig. 2 shows the classification performance versus the size of the projection matrix, and Table 1 lists the optimal correct classification rates of the four methods. Obviously, 2DLDA-MCC outperforms 2DLDA-L2 and 2DLDA-R1, and gains higher correct classification rates than 2DLDA-L1 except for Test #2. The average optimal correct classification rate of 2DLDA-MCC is the highest of the four methods. The results illustrate that 2DLDA-MCC is more robust against outliers than 2DLDA-L2, 2DLDA-R1 and 2DLDA-L1.

Fig. 2. Classification accuracy versus the size of projection matrix on FERET subset: (a) Test #1, (b) Test #2, (c) Test #3, (d) Test #4.


3.2. Results on PolyU database

The PolyU palmprint database is constituted of 600 gray-scale images of 100 palms, with six images of each palm. The central parts of all images are cropped and normalized to 64 × 64 pixels using an algorithm similar to that of reference [29]. In our experiments, these preprocessed images without occlusion form the first dataset, called "Original". Six images of one palm from the "Original" dataset are shown in the first row of Fig. 3. In addition, three other datasets are formed from the "Original" dataset by adding rectangle occlusions as outliers; the occlusions are generated as in the above subsection. The second dataset, called "Outlier 1", includes the 600 preprocessed images of the "Original" dataset, among which two images of every palm are randomly selected to receive rectangle occlusions whose sizes range from 10 × 10 to 30 × 30. The second row of Fig. 3 gives the corresponding samples of one palm in the second dataset. Similarly, we construct the third dataset "Outlier 2" and the fourth dataset "Outlier 3"; their samples of one palm are shown in the third and fourth rows of Fig. 3, respectively. The sizes of the occlusions in dataset "Outlier 2" range from 20 × 20 to 40 × 40, and the sizes of the occlusions in dataset "Outlier 3" range from 30 × 30 to 50 × 50.

Fig. 3. Images of one palm from four datasets based on PolyU database: (first row) Original, (second row) Outlier 1, (third row) Outlier 2 and (fourth row) Outlier 3.

The classification experiments are conducted on the four datasets respectively. All images are normalized by mapping each image's mean to 0 and standard deviation to 1 before the projection matrix is learned. We randomly choose three images of each palm to form the training set and adopt the remaining images as the testing set. Considering that each palm has just six images, we repeat the experiments ten times and record the optimal correct classification rates of the four methods. Then, we calculate the average optimal correct classification rates and the standard deviations, which are shown in Table 2. The experimental results clearly indicate that 2DLDA-MCC outperforms the other three 2DLDA methods.

Table 2 Average optimal classification accuracy (%) and standard deviation on PolyU database.

| Methods | Original | Outlier 1 | Outlier 2 | Outlier 3 |
|---|---|---|---|---|
| 2DLDA-L2 | 99.90 ± 0.16 | 98.30 ± 2.12 | 98.03 ± 1.58 | 90.83 ± 4.01 |
| 2DLDA-R1 | 99.87 ± 0.42 | 99.23 ± 1.33 | 98.40 ± 2.33 | 91.03 ± 3.25 |
| 2DLDA-L1 | 99.73 ± 0.73 | 99.03 ± 1.20 | 98.05 ± 2.33 | 90.97 ± 3.34 |
| 2DLDA-MCC | 99.90 ± 0.23 | 99.47 ± 0.91 | 98.43 ± 2.10 | 92.97 ± 1.92 |

3.3. Results on Binary Alphadigits database

The ten binary digits "0" through "9" are selected from the Binary Alphadigits database to form the experimental dataset. Each digit has 39 instances of size 20 × 16, and Fig. 4 gives all 390 instances. Most instances are relatively standardized, but a few are illegible and hardly distinguishable visually. To some extent, we can view the normative instances as inlying data and the illegible instances as outlying data.

Fig. 4. Digit instances of Binary Alphadigits database.

The experiments on the Binary Alphadigits dataset mainly evaluate the optimal correct classification rates of the four 2DLDA methods under different numbers of training instances per digit. We randomly choose K (K = 8, 10, 12, 14, 16, 18, 20) instances of each digit to form the training set, and the remaining 39 − K instances of each digit are used as the testing set. For a specific K, we repeat the random classification test twenty times for every method, where each random classification test selects the K training instances of each digit at random from the Binary Alphadigits dataset. The optimal correct classification rates of the four methods are recorded, and then the average optimal correct classification rates and the standard deviations are computed. The experimental results are shown in Table 3, which clearly indicates that 2DLDA-MCC outperforms the other three methods.

Table 3 Average optimal classification accuracy (%) and standard deviation on Binary Alphadigits database.

| Methods | K = 8 | K = 10 | K = 12 | K = 14 | K = 16 | K = 18 | K = 20 |
|---|---|---|---|---|---|---|---|
| 2DLDA-L2 | 82.55 ± 2.46 | 83.95 ± 2.28 | 85.61 ± 1.85 | 86.96 ± 1.58 | 87.50 ± 1.69 | 88.55 ± 1.72 | 89.55 ± 1.72 |
| 2DLDA-R1 | 84.63 ± 2.56 | 86.97 ± 1.57 | 87.50 ± 1.47 | 88.44 ± 1.59 | 89.50 ± 1.71 | 90.12 ± 1.72 | 91.21 ± 1.56 |
| 2DLDA-L1 | 82.32 ± 3.30 | 83.55 ± 2.53 | 84.91 ± 2.17 | 85.06 ± 2.23 | 86.11 ± 2.47 | 85.17 ± 2.34 | 87.58 ± 2.80 |
| 2DLDA-MCC | 85.24 ± 2.40 | 87.07 ± 1.78 | 88.30 ± 1.84 | 89.42 ± 1.19 | 89.89 ± 1.87 | 90.60 ± 1.51 | 91.76 ± 1.91 |


4. Conclusion

In this paper, we propose a robust 2DLDA method, 2DLDA-MCC, which aims to obtain the optimal projection transformation by maximizing the correntropy-based within-class similarity while maintaining the global dispersity. To solve the objective problem of 2DLDA-MCC, we present an iterative optimization algorithm based on a half-quadratic programming framework, which obtains a local maximum solution. The experimental results on the FERET face database, the PolyU palmprint database and the Binary Alphadigits database illustrate that 2DLDA-MCC gains better classification accuracy than 2DLDA-L2, 2DLDA-R1 and 2DLDA-L1 when there are outliers. In the future, we will explore the application of 2DLDA-MCC to other tasks, such as image retrieval [31].

Acknowledgments

The authors extend special thanks to all reviewers and the associate editor for their constructive comments and suggestions. This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFC0804002, in part by the National Natural Science Foundation of China under Grant 61533020 and Grant 61751312, and in part by the Chongqing Research Program of Basic Research and Frontier Technology under Grant cstc2017jcyjAX0406 and Grant cstc2017jcyjAX0325.

References

[1] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, 1986.
[2] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics 7 (1936) 179–188.
[3] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[4] E. Dogantekin, A. Dogantekin, D. Avci, Automatic hepatitis diagnosis system based on linear discriminant analysis and adaptive network based fuzzy inference system, Expert Syst. Appl. 36 (8) (2009) 11282–11286.
[5] A. Iosifidis, A. Tefas, N. Nikolaidis, I. Pitas, Multi-view human movement recognition based on fuzzy distances and linear discriminant analysis, Comput. Vis. Image Understand. 116 (3) (2012) 347–360.
[6] X. Jin, M. Zhao, T.W. Chow, M. Pecht, Motor bearing fault diagnosis using trace ratio linear discriminant analysis, IEEE Trans. Ind. Electron. 61 (5) (2014) 2441–2451.
[7] Y. Luo, D. Tao, K. Ramamohanarao, C. Xu, Tensor canonical correlation analysis for multi-view dimension reduction, IEEE Trans. Knowl. Data Eng. 27 (11) (2015) 3111–3124.
[8] J. Yang, D. Zhang, Y. Xu, J.Y. Yang, Two-dimensional discriminant transform for face recognition, Pattern Recognit. 38 (7) (2005) 1125–1129.
[9] C. Ding, D. Zhou, X. He, H. Zha, R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization, in: Proceedings of the International Conference on Machine Learning, 2006, pp. 281–288.
[10] N. Kwak, Principal component analysis based on L1-norm maximization, IEEE Trans. Pattern Anal. Mach. Intell. 30 (9) (2008) 1672–1680.
[11] X.L. Li, Y.W. Pang, Y. Yuan, L1-norm-based 2DPCA, IEEE Trans. Syst., Man, Cybern. B 40 (4) (2009) 1170–1175.
[12] Y. Pang, X. Li, Y. Yuan, Robust tensor analysis with L1-norm, IEEE Trans. Circuits Syst. Video Technol. 20 (2) (2010) 172–178.
[13] X. Li, W. Hua, H. Wang, Z. Zhang, Linear discriminant analysis using rotational invariant L1 norm, Neurocomputing 73 (2010) 2571–2579.
[14] H. Wang, Q. Tang, W. Zheng, L1-norm-based common spatial patterns, IEEE Trans. Biomed. Eng. 59 (3) (2012) 653–662.
[15] F. Zhong, J. Zhang, Linear discriminant analysis based on L1-norm maximization, IEEE Trans. Image Process. 22 (8) (2013) 3018–3027.
[16] H. Wang, X. Lu, Z. Hu, W. Zheng, Fisher discriminant analysis with L1-norm, IEEE Trans. Cybern. 44 (6) (2014) 828–842.
[17] C. Li, Y. Shao, N. Deng, Robust L1-norm two-dimensional linear discriminant analysis, Neural Netw. 65 (5) (2015) 92–104.
[18] M. Li, J. Wang, Q. Wang, Q. Gao, Trace ratio 2DLDA with L1-norm optimization, Neurocomputing (2017).
[19] Y. Liu, Q. Gao, S. Miao, X. Gao, F. Nie, Y. Li, A non-greedy algorithm for L1-norm LDA, IEEE Trans. Image Process. 26 (2) (2017) 684–695.
[20] W. Liu, P.P. Pokharel, J.C. Principe, Correntropy: properties and applications in non-Gaussian signal processing, IEEE Trans. Signal Process. 55 (11) (2007) 5286–5298.

[21] R. He, B. Hu, W. Zheng, X. Kong, Robust principal component analysis based on maximum correntropy criterion, IEEE Trans. Image Process. 20 (6) (2011) 1485–1494.
[22] R. He, W. Zheng, B. Hu, Maximum correntropy criterion for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1561–1576.
[23] W. Zhou, S. Kamata, Linear discriminant analysis with maximum correntropy criterion, in: Proceedings of the Asian Conference on Computer Vision, 2012, pp. 500–511.
[24] F. Zhong, D. Li, J. Zhang, Robust locality preserving projection based on maximum correntropy criterion, J. Vis. Commun. Image Represent. 25 (7) (2014) 1676–1685.
[25] F.D. Mandanas, C.L. Kotropoulos, Robust multidimensional scaling using a maximum correntropy criterion, IEEE Trans. Signal Process. 65 (4) (2017) 919–932.
[26] M. Zhang, Y. Gao, C. Sun, J. La Salle, J. Liang, Robust tensor factorization using maximum correntropy criterion, in: Proceedings of the IEEE International Conference on Pattern Recognition (ICPR 2016), 2016, pp. 4184–4189.
[27] B. Du, Z. Wang, L. Zhang, L. Zhang, D. Tao, Robust and discriminative labeling for multi-label active learning based on maximum correntropy criterion, IEEE Trans. Image Process. 26 (4) (2017) 1694–1707.
[28] P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The FERET database and evaluation procedure for face recognition algorithms, Image Vis. Comput. 16 (5) (1998) 295–306.
[29] D. Zhang, W.K. Kong, J. You, M. Wong, Online palmprint identification, IEEE Trans. Pattern Anal. Mach. Intell. 25 (9) (2003) 1041–1050.
[30] Binary Alphadigits database. [Online]. Available: http://www.cs.nyu.edu/~roweis/data.html.
[31] J. Li, C. Xu, W. Yang, C. Sun, D. Tao, Discriminative multi-view interactive image re-ranking, IEEE Trans. Image Process. 26 (7) (2017) 3113–3127.

Fujin Zhong received the B.S. and M.S. degrees from Hefei University of Technology, Hefei, China, in 2002 and 2005, respectively, and the Ph.D. degree from Southwest Jiaotong University, Chengdu, China, in 2015. He is currently an Associate Professor with the School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China, and also a member of the Chongqing Key Laboratory of Computational Intelligence. His current research interests include machine learning and its application fields such as pattern recognition, computer vision, knowledge discovery, and biometrics and security.

Li Liu received the B.S. degree in information management and information systems from Chongqing University of Posts and Telecommunications, the M.S. degree in computer science from Kunming University of Science and Technology, and the Ph.D. degree in computer science from Beijing Institute of Technology, in 2009, 2012 and 2016, respectively. He is currently an Assistant Professor at Chongqing University of Posts and Telecommunications. His research interests include machine learning and social computing.

Jun Hu is an associate professor in the School of Computer Science and Technology, Chongqing University of Posts and Telecommunications. He received his Ph.D. degree in pattern recognition and intelligent systems in 2010 from Xidian University, Xi’an, China. His primary research interests include granular computing, rough set, intelligent information processing and data mining.
