
Low-Rank Projection Learning via Graph Embedding

Yingyi Liang^a, Lei You^a, Xiaohuan Lu^a, Zhenyu He^{a,∗}, Hongpeng Wang^{a,∗}

^a Department of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen, China

Abstract


With robustness to various corruptions, the local geometrical relationship among data plays an important role in the recognition and clustering tasks of subspace learning (SL). However, many previous SL methods cannot take both the local neighborhood and robustness into consideration, which results in poor performance in image classification and feature extraction. In this paper, a robust SL method named Low-Rank Projection Learning via Graph Embedding (LRP-GE) is proposed to solve the feature extraction problem. The proposed algorithm enjoys two merits. First, it preserves the local neighborhood information among data by introducing graph embedding (GE). Second, it alleviates the impact of noise and corruption by learning a robust subspace based on a low-rank projection. We cast the problem as a convex optimization problem and provide an iterative solution that can be computed efficiently in polynomial time. Extensive experiments on four benchmark data sets demonstrate that the proposed method performs favorably against other well-established SL methods in image classification.

Keywords: Subspace learning, Graph Embedding, Low-rank Representation, Sparse Constraint, Image Classification

1. Introduction


Projection-based subspace learning (SL) methods have played a vital role in real-life applications such as document analysis [1, 2], image recognition [3–5, 18, 28], and video surveillance [6–13]. As a canonical and popular method, principal component analysis (PCA) has been widely used for decades, and several extensions of PCA have been proposed, such as Kernel PCA [14] and graph-based kernel PCA. Well-known discriminant methods include Linear Discriminant Analysis (LDA) [15] and its extensions, such as model-based discriminant analysis [16] and kernelized LDA [17]. Since there is no prior knowledge or label information about the original data in many scenarios, we focus on unsupervised SL for image classification and feature extraction in this paper. For SL methods, constructing a good graph is an effective strategy for successful application in practice [19, 20]. Graph-based SL methods view all samples as vertices of a graph and the relationship between two samples as an edge. In the graph, an adjacency matrix contains non-negative weights corresponding to pairs of connected samples. Assuming that the data are sampled from independent subspaces without noise, the graph generated by SL methods should be block-diagonal, with each block corresponding to one subspace. Based on the block-diagonal graph, SL methods can cluster the data samples into their own subspaces exactly.

∗ Corresponding author. Y. Liang and L. You contributed equally to this paper. Email addresses: [email protected] (Yingyi Liang), [email protected] (Lei You), [email protected] (Xiaohuan Lu), [email protected] (Zhenyu He), [email protected] (Hongpeng Wang)


Low-Rank Representation (LRR), proposed by Liu et al. [21], is one of the representative methods that use the block-diagonal graph for subspace clustering [22, 23]. However, the block-diagonal graph is often fragile due to the assumption that the manifold structure is linearly or nearly linearly embedded in the observed data space. The basic idea of manifold learning-based methods is to assume that the raw data are sampled from a latent manifold structure embedded in a high-dimensional space, and the purpose is to represent the high-dimensional samples with low-dimensional data for convenient computation. Different methods impose different requirements on the manifold properties, which leads to different assumptions about the manifold structure. For example, Laplacian Eigenmaps [24] assumes that the raw data lie on a compact Riemannian manifold. However, the assumption of linear manifolds is not always valid in practice, because nonlinear manifolds are often found in high-dimensional spaces. In this situation, block-diagonal graph-based SL methods also fail to cluster the data into the exact subspace(s), and the block-diagonal property loses its clustering effect.

In the past years, with the theoretical and technical development of manifold learning, many methods using graph embedding (GE) [25–27] have been proposed, because GE can not only greatly improve the smoothness of the underlying manifold in the observed data space but also preserve the local geometric relationships between data samples. In fact, GE is a mathematical mapping that takes all vertices of a high-dimensional graph to a low-dimensional subspace while preserving the local geometrical relationships between vertices. A representative GE method is Locally Linear Embedding (LLE) [29], which assumes that a manifold structure is embedded in the 3D Swiss Roll data set and spreads the roll out over a 2D plane while preserving the Euclidean distances among samples.


Zhang and Zhao [30] proposed Manifold Regularized Matrix Factorization (MMF) for data clustering in a low-dimensional space under the implicit assumption that the graph Laplacian matrix is smooth. Modified PCA [31] is a direct extension of PCA that discovers the similarity structure of data by applying multiple similarity metrics. Other methods, such as Neighborhood Preserving Embedding (NPE) [32] and Locality Preserving Projection (LPP) [33], preserve the intrinsic structure of data by using reconstructed neighborhood information. Yan et al. [34] proposed a general framework that unifies the previous methods (i.e., Modified PCA, LLE, LE, LPP, ISOMAP, and Linear Discriminant Analysis) into one model by using the statistical and geometrical information in data. The Sparsity Preserving Projection (SPP) [35] was proposed for feature extraction from face images by using the sparse reconstruction information among data. In general, the performance of these methods in feature extraction and image classification improves when the underlying hypothesis holds (i.e., each object in the images lies on one manifold and the number of objects corresponds to the number of manifolds), and the key issue is how to model the intrinsic structure of the data. However, due to the presence of random noise and gross corruption, it remains a challenging problem to construct such a robust geometrical structure for feature extraction and image classification. This is because the similarity between two images is severely affected by gross corruption, and therefore cannot reflect the actual geometrical connection among images or the underlying manifolds of the data. Thus, the performance of the above methods is greatly reduced.

Recently, many robust methods have been proposed for image classification and feature extraction [36–38]. Graph-Laplacian PCA (gLPCA) [39] learns a robust and low-dimensional representation of raw data by imposing the ℓ2,1-norm on the objective function. The Sparse Representation Classifier (SRC) [40] was proposed for robust face recognition by using ℓ1-norm constrained sparsity. Similarly, SPP uses the ℓ1-norm constrained sparsity for robust feature extraction from face images. Sparse PCA (SPCA) [41, 42] computes the principal components of the input data by selecting the most important variables in the scattering distribution for dimensionality reduction and handles the noise with a sparse projection for robustness. As seen from the above methods, constructing the intrinsic geometry of data and introducing the ℓ2,1-norm or ℓ1-norm as a regularization term provides a certain degree of robustness. Therefore, when dealing with noisy and corrupted data, these methods with robust norms may perform better than methods based purely on the local neighborhood, such as LPP and NPE.

In the past decade, low-rank minimization based methods have attracted a lot of attention [43–47]. Supposing the original data consist of a low-rank subspace and sparse errors, the Robust Principal Component Analysis (RPCA) proposed by Wright et al. [48] aims at efficiently recovering the low-rank subspace and robustly correcting the errors, which differs from PCA in that PCA assumes the errors in the data follow a Gaussian distribution. The Low-Rank Representation (LRR) was proposed to segment a union of low-rank subspaces, extending the single low-rank subspace problem of RPCA, and its extension, the Latent Low-Rank Representation (LatLRR), was proposed to exploit latent information in images for subspace segmentation and feature extraction.


However, these low-rank representation based methods have a limitation in applications due to the assumption that each subspace is independently drawn from a union of subspaces. Relaxing this assumption leads to the disjoint subspace segmentation problem, which can be solved by the structure-constrained low-rank representation [49]. Some other methods using low-rankness have also been proposed for image composition, image colorization, video restoration, foreground detection, and motion saliency detection [50, 51, 61]. However, the above methods, which focus on learning a low-rank representation matrix, are essentially transductive and cannot handle well new samples that are not involved in the training procedure. In order to solve this problem, Bao et al. [52] proposed the so-called Inductive Robust Principal Component Analysis (IRPCA), which imposes the low-rankness on the projection rather than on the representation matrix. IRPCA can project corrupted data points into the true subspace by learning a low-rank projection. The projection obtained in the training procedure can be directly used to handle new samples (not involved in the training), whereas previous works require all the training samples and the new samples to be trained again to learn a new projection. A successful application of IRPCA is video surveillance, where the background is static (which can be seen as the true underlying subspace) and the foreground is dynamic (which can be seen as the sparse errors). Bao et al. [53] also proposed an improved version termed corruptions tolerant discriminant analysis (CTDA) for feature extraction, which integrates three important components (i.e., the label information, the neighborhood graph, and the low-rankness). However, IRPCA and CTDA cannot be directly applied for dimensionality reduction.

In this paper, taking full advantage of all the above methods, we propose a robust SL method for image classification and feature extraction called Low-Rank Projection Learning via Graph Embedding (LRP-GE). Fig. 1 shows the flowchart of the proposed method. LRP-GE combines low-rank projection learning with the neighborhood relationships between data samples (i.e., graph embedding) in a robust feature extraction framework, which couples the low-rankness with strong robustness to gross corruption and can thus process occluded and noisy data in image classification. The contributions of this paper are summarized as follows:

(1) We propose a robust unsupervised SL framework, i.e., Low-Rank Projection Learning via Graph Embedding (LRP-GE), for image classification and feature extraction.

(2) We incorporate three important components (i.e., the low-rank constraint, the GE, and the sparse norm) into one framework and provide an iterative solution to the convex optimization problem.

(3) Extensive experiments demonstrate that LRP-GE performs favorably against other state-of-the-art methods in image classification, even when the data is grossly corrupted by noise and occlusions.

The remainder of the paper is organized as follows. The review of Graph Embedding (GE), Low-Rank Representation (LRR), and Inductive RPCA (IRPCA) is presented in Section 2. The proposed LRP-GE is discussed in detail in Section 3. Experiments and analysis are presented in Section 4. The conclusion of this paper is provided in Section 5.


Figure 1: Flowchart of the proposed method. For simplicity, only three classes of data samples are chosen for illustration, denoted by three shapes (star, square, and circle) in three colors (red, green, and blue), with noise. The original noisy data space is processed by the proposed model in three steps: project the raw data into the subspace, separate the noise with the sparse constraint, and construct an effective graph for the projection while preserving the local relationships among data samples in the projected subspace. Finally, the proposed model gives well-separated results in the classification task.

2. Related work

In this section, we briefly review some related works, namely GE, LRR, and IRPCA. We denote the data samples by the matrix X = [x_1, x_2, ..., x_n], where each column x_i ∈ R^m is a sample.

2.1. Graph Embedding (GE)

As stated in Section 1, GE is a dimensionality reduction technique based on the local neighborhood information of data. In this paper, GE is established on the basis of locality preserving projection (LPP), and the optimal projection derived from LPP is formulated as follows:

    \langle P \rangle = \arg\min_{P} \frac{1}{2} \sum_{i}\sum_{j} \| P^{T} x_i - P^{T} x_j \|^{2} W_{ij} = \arg\min_{P} \mathrm{Tr}\big(P^{T} X (D - W) X^{T} P\big),   (1)

where D = diag(d_i) is a diagonal matrix with entries d_i = \sum_j W_{ij}, and W is the weight matrix defined as

    W_{ij} = \begin{cases} 1, & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0, & \text{otherwise,} \end{cases}   (2)
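For concreteness, the neighborhood graph in (2) and the Laplacians used below can be assembled in a few lines of NumPy. The following sketch is our own illustration (the function name and the use of Euclidean distances are assumptions, not part of the original formulation):

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Build the binary k-NN weight matrix W of (2) and the graph Laplacians.

    X : (m, n) data matrix whose columns x_i are samples.
    Returns W, L = D - W, and L_norm = D^{-1/2} (D - W) D^{-1/2}.
    """
    n = X.shape[1]
    # pairwise squared Euclidean distances between columns
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)          # exclude self-neighbors

    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]      # indices of the k nearest neighbors
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                  # symmetrize: x_i in N_k(x_j) OR x_j in N_k(x_i)

    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                               # unnormalized Laplacian used in (1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_norm = d_inv_sqrt @ L @ d_inv_sqrt    # normalized Laplacian
    return W, L, L_norm
```

where N_k(x) denotes the k nearest neighbors of x. Eq. (2) guarantees that any two samples (x_i and x_j) that are close in the original space remain close in the projected space (P^T X), which preserves the local neighborhood relationship among samples. L = D − W is defined as the Laplacian matrix that GE uses to construct an affinity graph. Let G be the graph whose vertices are the samples of X (i.e., the x_i) and whose edges carry the weights among samples (i.e., W_{ij}). Through a simple transformation, we obtain the normalized graph Laplacian L = D^{-1/2}(D − W)D^{-1/2}. The purpose of GE is to reduce dimensionality by exploring a low-dimensional embedding Y of the data (i.e., the GE).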

2.2. Low-Rank Representation (LRR)

To overcome the limitation that the canonical PCA cannot deal with grossly corrupted data, Wright et al. proposed the so-called RPCA [48]. It is assumed that the given data matrix X is composed of a low-rank matrix R and an error term E. The purpose of RPCA is to recover the low-rank structure R and to correct the errors E, which is formulated as follows:

    \langle R, E \rangle = \arg\min_{R,E} \operatorname{rank}(R) + \lambda \|E\|_{\ell}, \quad \text{s.t. } X = R + E,   (3)

where rank(·) denotes the rank of a matrix, ‖·‖_ℓ is the regularization strategy for characterizing errors (e.g., the ℓ0-norm, ℓ1-norm, and ℓ2,1-norm) [54], and λ > 0 is the Lagrange multiplier parameter. By introducing a dictionary matrix, the general formulation of LRR is written as

    \langle R, E \rangle = \arg\min_{R,E} \operatorname{rank}(R) + \lambda \|E\|_{\ell}, \quad \text{s.t. } X = AR + E,   (4)

where A linearly spans the original input matrix. By setting A = I, LRR is identical to RPCA, and therefore LRR can be viewed as a generalization of RPCA [21, 54]. By relaxing the optimization problem and setting A = X and ℓ = (2, 1), we have the following formulation:

    \langle R, E \rangle = \arg\min_{R,E} \|R\|_{*} + \lambda \|E\|_{2,1}, \quad \text{s.t. } X = XR + E,   (5)

where ‖·‖_* denotes the nuclear norm (the sum of singular values). Assuming (R*, E*) is the optimal solution to (5), the recovered data can be represented by XR* or X − E*, whose rank is at most rank(R*). LRR has proved to be a very effective SL method for addressing the subspace segmentation or clustering problem. However, LRR cannot be applied for dimensionality reduction and cannot handle well new samples that are not involved in the training procedure. Moreover, the ℓ2,1-norm is designed to characterize sample-specific corruptions and is not good at modeling block occlusions or random noise [21].
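As a quick reference for the norms appearing in (5), the following small helper (ours, not from the paper) evaluates the nuclear norm and the ℓ2,1-norm, taking the latter as the sum of the ℓ2-norms of the columns (i.e., of the samples):

```python
import numpy as np

def nuclear_norm(M):
    """||M||_* : sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

def l21_norm(E):
    """||E||_{2,1} : sum of the l2 norms of the columns of E."""
    return np.linalg.norm(E, axis=0).sum()

# example: a rank-5 matrix observed through a few corrupted columns
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 80))
print(nuclear_norm(M), l21_norm(M))
```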


2.3. Inductive Robust Principal Component Analysis (IRPCA)

IRPCA, proposed by Bao et al. [52], learns an optimal low-rank projection that projects grossly corrupted data into the true subspace, and is formulated as follows:

    \langle P, E \rangle = \arg\min_{P,E} \|P\|_{*} + \lambda \|E\|_{1}, \quad \text{s.t. } X = PX + E.   (6)

Assuming P* is the optimal solution to (6), a new sample x can be represented by P*x, and the corrupted component e by x − P*x. As seen from the equation, the input data consist of column vectors, each column being a data point, and the local neighborhood relationship among the data is not taken into consideration, especially for face and object images. Thus, IRPCA fails to preserve the intrinsic geometrical structure of the data.
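The inductive behaviour described above reduces, at test time, to a single matrix–vector product. A minimal sketch, assuming a learned projection P_star stored as a NumPy array:

```python
import numpy as np

def project_new_sample(P_star, x):
    """Apply a learned low-rank projection to an unseen sample (cf. (6)).

    The clean part is P* x and the corruption estimate is x - P* x.
    """
    x_clean = P_star @ x
    e = x - x_clean
    return x_clean, e
```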


3. LRP-GE


In this section, the motivation of Low-Rank Projection Learning via Graph Embedding is first presented. Then, the problem addressed by LRP-GE is formulated and its solution is introduced in detail.


3.1. Motivation of LRP-GE

In the past years, many classical manifold learning methods have been proposed for dimensionality reduction, such as LLE, ISOMAP, and LE, which explore the intrinsic geometry of data to find the manifold embedded in a high-dimensional space. LPP and its extensions were also proposed to preserve the local neighborhood relationship among data by learning an optimal projection. These earlier works focus on providing a linear manifold embedding for data, and in many applications they are not suitable for data embedded in a nonlinear manifold. In order to solve this problem, a popular way is to use a Laplacian graph embedding (LGE). LGE can not only preserve the local geometrical relationship but also greatly improve the smoothness of the underlying manifold in the data. From the above facts, it is seen that the neighborhood relationship is important in manifold learning-based methods. Manifold learning makes the general assumption that the manifold structure of the data is smooth and latently embedded in the high-dimensional space in the ideal situation. However, this assumption is not valid in real-life applications. For example, when the data is grossly corrupted by random noise or outliers, even the similarity between two data points of the same class is severely damaged, so that the neighborhood relationship among the data can be seriously distorted. This makes manifold learning fail to effectively reduce dimensionality.

Recently, methods based on low-rank minimization have played important roles in feature extraction, dimensionality reduction, and subspace segmentation. For example, LRR, as a representative low-rank representation method, can exactly segment the true subspace(s) from the observed space, which is assumed to be drawn approximately from a subspace or a union of multiple subspaces [49]. Even under occlusion and corruption, LRR remains robust for subspace clustering. IRPCA is another low-rank minimization based method, which learns an optimal projection by imposing the low-rank constraint on the projection. It can project grossly corrupted data into the true subspace and handle well new data samples that are not involved in the training procedure. However, both LRR and IRPCA cannot be directly used for dimensionality reduction. In this paper, we expect to take full advantage of the GE and the low-rank projection for learning a robust subspace for image classification and feature extraction. The GE plays an important role in preserving the local geometrical structure of the data and exploring the smooth manifold latently embedded in the high-dimensional space, while the low-rank projection learning is strongly robust to random noise and block occlusion. Therefore, we incorporate these two important properties into one robust SL framework called Low-Rank Projection Learning via Graph Embedding (LRP-GE).

3.2. Problem Formulation

In this paper, we consider capturing the local geometrical relationship among data, with robustness to various corruptions, in robust SL-based feature extraction. In order to explore the local neighborhood of the data, we take the GE into account as a good mapping, and we assume that the corrupted data can be approximately projected into the true subspace by an underlying low-rank projection. Based on the above interpretation, the objective function of LRP-GE is defined as

    \langle P \rangle = \arg\min \operatorname{rank}(P) + \gamma \sum_{i=1}^{n}\sum_{j=1}^{n} \| P x_i - P x_j \|_F^2 W_{ij}, \quad \text{s.t. } X = PX,   (7)

where γ is a parameter that weights the importance of the GE term. The objective function aims at exploring the optimal low-rank projection P* and capturing the underlying manifold structure of the data in the original space, which is measured by the term \sum_{i=1}^{n}\sum_{j=1}^{n} \|Px_i - Px_j\|_F^2 W_{ij}. In real-life applications, the original data space is generally corrupted by noise. Thus, (7) is re-defined by adding a sparse error term as follows:

    \langle P, E \rangle = \arg\min \operatorname{rank}(P) + \lambda \|E\|_1 + \gamma \sum_{i=1}^{n}\sum_{j=1}^{n} \| P x_i - P x_j \|_F^2 W_{ij}, \quad \text{s.t. } X = PX + E,   (8)
where λ is a balance parameter that controls the importance of the error term E. It is worth noting that the purpose of imposing the ℓ1-norm on E is to characterize the various corruptions well. Since minimizing the rank function is an NP-hard problem, the above objective function can be approximately solved by converting it into a nuclear norm minimization problem. Therefore, we have the following formulation:

    \langle P, E \rangle = \arg\min_{P,E} \|P\|_{*} + \lambda \|E\|_{1} + \gamma \sum_{i=1}^{n}\sum_{j=1}^{n} \| P x_i - P x_j \|_F^2 W_{ij}, \quad \text{s.t. } X = PX + E.   (9)
According to optimization theory, the objective function (9) degenerates to (7) when the Lagrange multiplier (i.e., the value of λ) is relatively small. By considering the sparse error term, we expect the proposed objective function to be more robust to various kinds of corruptions in the data.
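For monitoring purposes, the relaxed objective (9) can be evaluated directly from its definition. The helper below is our own sketch (names and the NumPy representation are assumptions), computing the graph term by its double sum exactly as written in (9):

```python
import numpy as np

def lrpge_objective(P, E, X, W, lam, gamma):
    """Value of (9): ||P||_* + lam*||E||_1 + gamma * sum_ij ||P x_i - P x_j||^2 W_ij."""
    nuclear = np.linalg.svd(P, compute_uv=False).sum()
    sparse = np.abs(E).sum()
    PX = P @ X                                   # projected samples as columns
    sq = np.sum(PX ** 2, axis=0)
    pdist2 = sq[:, None] + sq[None, :] - 2.0 * (PX.T @ PX)
    graph = np.sum(W * np.maximum(pdist2, 0.0))  # clip tiny negatives from round-off
    return nuclear + lam * sparse + gamma * graph
```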

3.3. Solution to LRP-GE

Although the solution to LRR could be used for the proposed method, the computational cost (i.e., O(d^3)) is still very expensive for high-dimensional data according to [55]. In order to reduce the cost, we apply a mathematical transformation based on Theorem 1 of [21]: the transpose of the optimal solution P* always lies within the subspace spanned by the columns of X, i.e., P* = U*(Q)^T with Q = orthogonal(X). By using this transformation, solving for P* is changed into solving for another optimal variable U*. The graph regularization term can be transformed into trace form, i.e., \sum_{i=1}^{n}\sum_{j=1}^{n} \|Px_i - Px_j\|_F^2 W_{ij} = \mathrm{Tr}(PXLX^T P^T), where L is the Laplacian matrix of W. Thus, by replacing P with UQ^T we can equivalently transform (9) into a simpler problem as follows:

    \min \|U\|_{*} + \lambda \|E\|_{1} + \gamma \operatorname{Tr}(U A L A^{T} U^{T}), \quad \text{s.t. } X = UA + E,   (10)

where A = (Q)^T X and U is a substitute matrix for P. Problem (10) can be solved by the inexact Augmented Lagrange Multiplier (iALM) method. For convenience in optimization, an auxiliary variable Z is adopted, which makes the computation converge faster:

    \min \|Z\|_{*} + \lambda \|E\|_{1} + \gamma \operatorname{Tr}(U A L A^{T} U^{T}), \quad \text{s.t. } X = UA + E, \; U = Z.   (11)

The augmented Lagrangian function of problem (11) is

    \Phi(U, Z, E) = \|Z\|_{*} + \lambda \|E\|_{1} + \gamma \operatorname{Tr}(U A L A^{T} U^{T}) + \operatorname{Tr}\big(Y_1^{T}(X - UA - E)\big) + \operatorname{Tr}\big(Y_2^{T}(U - Z)\big) + \frac{\mu}{2}\|X - UA - E\|_F^2 + \frac{\mu}{2}\|U - Z\|_F^2,   (12)

where Y1 and Y2 are the Lagrange multipliers and µ is a penalty parameter. The idea of iALM for updating the variables (U, Z, and E) is to fix the others and update one alternately in the process of minimizing Φ. Accordingly, we have the following four steps to obtain the optimal variables. When the convergence conditions are reached, the optimal solutions U*, Z*, and E* are obtained. Note that all formulations in the four steps have closed-form solutions.

Step 1. Update U: U is updated by solving problem (13), which has the closed-form solution (14):

    L_U = \arg\min_{U} \gamma \operatorname{Tr}(U A L A^{T} U^{T}) + \langle Y_{1,k}, X - UA - E_k \rangle + \langle Y_{2,k}, U - Z_k \rangle + \frac{\mu}{2}\|X - UA - E_k\|_F^2 + \frac{\mu}{2}\|U - Z_k\|_F^2,   (13)

    \Rightarrow U_{k+1} = \big[(X - E_k) A^{T} + Z_k + (Y_{1,k} A^{T} - Y_{2,k})/\mu_k\big]\,\big[I + A A^{T} + 2(\gamma/\mu_k) A L A^{T}\big]^{-1}.   (14)

Step 2. Update Z: Z is updated by solving the optimization problem (15), which has the closed-form solution (16):

    L_Z = \arg\min_{Z} \|Z\|_{*} + \langle Y_{2,k}, U_{k+1} - Z \rangle + \frac{\mu_k}{2}\|U_{k+1} - Z\|_F^2,   (15)

    \Rightarrow Z_{k+1} = J_{1/\mu_k}(U_{k+1} + Y_{2,k}/\mu_k),   (16)

where J_{1/µk}(X) = U S_{1/µk} V^T is the singular value thresholding operator with respect to 1/µk; S_{1/µk}(X_{ij}) = sign(X_{ij}) max(0, |X_{ij}| − 1/µk) is the soft-thresholding operator; and UΣV^T is the SVD of X, i.e., X = UΣV^T.

Step 3. Update E: E is updated by solving the optimization problem (17), which has the closed-form solution (18):

    L_E = \arg\min_{E} \lambda \|E\|_{1} + \langle Y_{1,k}, X - U_{k+1} A - E \rangle + \frac{\mu_k}{2}\|X - U_{k+1} A - E\|_F^2,   (17)

    \Rightarrow E_{k+1} = S_{\lambda/\mu_k}(X - U_{k+1} A + Y_{1,k}/\mu_k),   (18)

where S is the shrinkage operator [56].

Step 4. Update Y1, Y2, and µ: The Lagrange multipliers and the penalty parameter are updated as follows (ρ > 1):

    Y_{1,k+1} = Y_{1,k} + \mu_k (X - U_{k+1} A - E_{k+1}), \quad Y_{2,k+1} = Y_{2,k} + \mu_k (U_{k+1} - Z_{k+1}), \quad \mu_{k+1} = \min(\rho \mu_k, \mu_{\max}).   (19)

The complete procedure is outlined in Algorithm 1. U*, Z*, and E* are the optimal solutions of the proposed algorithm. In ADMM, we generally initialize them to values that are close to optimal (here, all of them are set to zero); for example, assuming that the optimal solution U* is an orthogonal basis, the initial value of U could be the PCA result of the original data space. Y1 and Y2 are two Lagrange multipliers that are normally set to zero. µ is the step for updating the values of the two Lagrange multipliers, and it changes during the iterations from a small value (i.e., 10^-6) to a large value (i.e., 10^6). ε is a threshold that stops the iteration when the number of iterations becomes large; in some situations convergence cannot be guaranteed, or it costs much time to reach convergence, so ε is used as another tool to cut off the iterations for fast convergence in Algorithm 1. ρ is the step length for updating a variable during the iteration and it is required to be greater than 1; when it is initialized to a larger number, the update speed becomes too high to reach an optimal value, and vice versa. It is still an open problem to choose a precise value for ρ; generally, ρ is initialized to 1.1 through empirical observation.
Algorithm 1 Solving Problem (11) by Inexact ALM

Input: matrices X and A, the Laplacian L; parameters λ and γ.
Initialize: U = 0, Z = 0, E = 0, Y1 = 0, Y2 = 0, µ = 10^-6, µmax = 10^6, ρ = 1.1, and ε = 10^-8.
while not converged do
  1. Fix the other variables and update U by U = [(X − E)A^T + Z + (Y1 A^T − Y2)/µ][I + AA^T + 2(γ/µ)ALA^T]^-1.
  2. Fix the other variables and update Z by Z = J_{1/µ}(U + Y2/µ).
  3. Fix the other variables and update E by E = S_{λ/µ}(X − UA + Y1/µ).
  4. Update the multipliers: Y1 = Y1 + µ(X − UA − E); Y2 = Y2 + µ(U − Z).
  5. Update the parameter µ by µ = min(ρµ, µmax).
  6. Check the convergence conditions: ‖X − UA − E‖∞ < ε and ‖U − Z‖∞ < ε.
end while
Output: (U, Z, E)
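For readers who prefer runnable code, the following is a compact NumPy sketch of Algorithm 1. It assumes that X stores samples as columns and that L is the graph Laplacian of Section 2.1; the function names, the use of numpy.linalg.qr for the orthogonal basis Q, and the default parameter values are our own choices, so this should be read as an illustration of the update rules (14), (16), (18), and (19) rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise shrinkage operator S_tau of (18)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding operator J_tau of (16)."""
    U_, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U_ @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def lrpge_ialm(X, L, lam=1e-2, gamma=1e-2, rho=1.1,
               mu=1e-6, mu_max=1e6, eps=1e-8, max_iter=500):
    """Inexact-ALM sketch of Algorithm 1.

    X : (m, n) data matrix (columns are samples), L : (n, n) graph Laplacian.
    Returns the learned projection P = U Q^T together with U, Z, E.
    """
    Q, _ = np.linalg.qr(X)          # orthogonal basis of the column space of X
    A = Q.T @ X
    m, n = X.shape
    r = A.shape[0]

    U = np.zeros((m, r)); Z = np.zeros((m, r)); E = np.zeros((m, n))
    Y1 = np.zeros((m, n)); Y2 = np.zeros((m, r))
    I = np.eye(r)

    for _ in range(max_iter):
        # Step 1: closed-form update of U, eq. (14); the bracketed matrix is symmetric PD
        rhs = (X - E) @ A.T + Z + (Y1 @ A.T - Y2) / mu
        M = I + A @ A.T + 2.0 * (gamma / mu) * (A @ L @ A.T)
        U = rhs @ np.linalg.inv(M)
        # Step 2: Z update by singular value thresholding, eq. (16)
        Z = svt(U + Y2 / mu, 1.0 / mu)
        # Step 3: E update by soft thresholding, eq. (18)
        E = soft_threshold(X - U @ A + Y1 / mu, lam / mu)
        # Step 4: multipliers and penalty, eq. (19)
        res1 = X - U @ A - E
        res2 = U - Z
        Y1 = Y1 + mu * res1
        Y2 = Y2 + mu * res2
        mu = min(rho * mu, mu_max)
        # convergence check on the infinity norms of both residuals
        if max(np.abs(res1).max(), np.abs(res2).max()) < eps:
            break

    P = U @ Q.T
    return P, U, Z, E
```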

3.4. Computational Complexity and Convergence Analysis

It can be seen from Algorithm 1 that the overall computational complexity of our LRP-GE method mainly comes from the number of iterations, the matrix inversion in (14), and the singular value decompositions in (16) and (18). Suppose the dimension m of the data is larger than the number n of samples and the algorithm converges within δ iterations. Compared with the matrix inversion, the SVD takes up the largest share of the computation in each iteration, with a complexity of O(m^3). Adding up the computational complexity, we obtain O(δ m^3), which leads to a dimensional catastrophe when the dimension of the samples and the value of δ are very large. Fortunately, we can use the classical PCA method to compute the standard eigenvectors and reduce the dimension m of the data. Meanwhile, the number of iterations δ is inversely proportional to the update step ρ, i.e., the larger the step, the smaller the number of iterations, and vice versa. In the experiments, ρ is initialized to 1.1 because it keeps δ within a small range. Readers can refer to [21, 57] for more details about approaches to the problem of large computational cost.

When the objective function is smooth, the exact augmented Lagrange multiplier (EALM) algorithm can be proven to converge [58]. For iALM, Liu et al. have also proved convergence in another way [21]. According to [21, 58, 59], the conditions for the convergence of Algorithm 1 are sufficient but may be unnecessary:
(1) The parameter µ in Step 5 needs to be upper bounded.
(2) The dictionary (replaced by X in this paper) is of full column rank.
(3) The value of (Z_k, L_{Z_k}) obtained in every iteration is close to (Z, L_Z), not (Z_{k+1}, L_{Z_{k+1}}), which guarantees that ∆_k = ‖(Z_k, L_{Z_k}) − (Z, L_Z)‖_F^2 is monotonically decreasing.
In terms of [21], the above conditions can be satisfied in specific situations. For condition 1, the upper bound of µ can be proven in iALM; for condition 2, the dictionary can be represented by its orthogonal basis, which has been proven in [21]; for condition 3, the condition cannot be proved strictly from theory, but it ensures that ∆_k is monotonically decreasing to a certain extent owing to the convexity of the Lagrange function.

4. Experiments

In this section, we evaluate LRP-GE on the image classification task and conduct experiments on four public data sets: the Extended Yale B face image set [56, 60], the Face Recognition Technology (FERET) image set [36, 62], the AR image set [60, 62, 63], and the COIL20 object image set [36], shown in Fig. 2.

4.1. Description of Data Sets

The Extended Yale B face data set contains about 2432 near-frontal images taken from 38 individuals (64 images per individual) under various illuminations. Half of the images are corrupted by shadows or reflections. For efficiency, the face images have been cropped and resized to 32×32 pixels. (http://vision.ucsd.edu/content/yale-face-database)

The FERET face data set contains 1400 images taken from 200 individuals (7 images per individual) under various expressions, illuminations, and poses. The original images are cropped automatically, leaving only the face part, and for efficiency we resize the face images to 40×40 pixels. (https://www.nist.gov/programs-projects/face-recognition-technology-feret)

The AR face data set contains over 4000 color face images taken from 70 men and 56 women under various facial expressions, lighting conditions, and occlusions in two sessions. From the original images we select 3120 images for the experiments, covering 120 people (26 images per person). For efficiency, all selected images are converted to grayscale and resized to 40×50 pixels. (http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html)

The COIL20 data set contains 1440 images taken from 20 objects (72 images per object) captured at pose intervals of 5 degrees. The images are normalized to 128×128 pixels. For efficiency, all original images are converted to grayscale and resized to 40×50 pixels. (http://www.cs.columbia.edu/CAVE/software/softlib/coil20.php)

4.2. Experimental Preparation

To begin with, the above image data are converted to grayscale for the experiments. At the same time, PCA is applied to the images to find the main principal components before running the algorithms, which is necessary for preserving energy and improving computational efficiency. In this paper, we preserve 98% of the energy for Extended Yale B and 95% for the remaining data sets. In order to evaluate the performance of the proposed algorithm, we compare LRP-GE with other conventional SL methods, e.g., PCA, NPE [32], LPP [33], SPP [35], IRPCA [52], Principal Component Analysis with non-greedy ℓ1-norm maximization (PCA ℓ1) [64], and Optimal Mean Robust Principal Component Analysis (RPCA OM) [65]. The K-nearest-neighbor classifier (K = 1) is applied to evaluate the results of all the above algorithms owing to its simple parameter settings. To evaluate the robustness of the proposed method, we also conduct experiments on contaminated data sets with various corruptions.

In the experiments, part of the samples from each subject are randomly selected for training and the remaining ones are used for testing. According to the number of individuals/objects in the different data sets, the numbers of training samples are set to 20/30 for Extended Yale B, 3/5 for FERET, 6/8 for AR, and 4/6 for COIL20, respectively. Like cross-validation, the algorithms are run independently 10 times to obtain the mean and standard deviation of the classification accuracy (MEAN±STD%), which are reported in Tables 1-4. In each run, the best parameters are determined on the validation set and used to learn the optimal projection matrix that is used for the final feature extraction. For robustness testing, we only choose images from the Extended Yale B data set with various corruptions, namely two different sizes of block occlusion and two different percentages of random pixel corruption. A block occlusion of size 10×10 or 20×20 pixels is randomly attached to various locations to replace pixels in the images, as seen in Fig. 3 (c) and (d), and the pixel corruptions contain 10% or 20% salt & pepper noise respectively, randomly generated in each image and illustrated in Fig. 3 (a) and (b).

Figure 2: Sample images from the four data sets: (a) Extended Yale B, (b) FERET, (c) AR, and (d) COIL20. (a)-(c) are face image data sets, while (d) is an object image data set.

Figure 3: (a) 10% salt & pepper noise corruption. (b) 20% salt & pepper noise corruption. (c) 10×10 block occlusion. (d) 20×20 block occlusion.
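To make the protocol above concrete, the sketch below reproduces its main ingredients (PCA energy preservation, random per-class splits, 1-NN classification, and mean±std over repeated runs) in plain NumPy. It is our own reconstruction; the `extract` callback stands for any feature extractor (LRP-GE, LPP, PCA, ...) that returns a projection matrix applied to the columns of the data.

```python
import numpy as np

def pca_projection(X, energy=0.98):
    """Return a PCA basis keeping the given fraction of energy (columns of X are samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U_, s, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, energy)) + 1
    return U_[:, :k]                              # (m, k) basis

def nn_accuracy(F_train, y_train, F_test, y_test):
    """1-NN classification accuracy on extracted features (columns are samples)."""
    d = (np.sum(F_test ** 2, axis=0)[:, None]
         + np.sum(F_train ** 2, axis=0)[None, :]
         - 2.0 * F_test.T @ F_train)
    pred = y_train[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_test))

def evaluate(X, y, extract, n_train_per_class, runs=10, seed=0):
    """Mean and std of accuracy over random per-class splits.

    `extract` maps a training matrix to a (p, m) projection,
    e.g. lambda Xtr: pca_projection(Xtr, 0.95).T
    """
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        tr, te = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.where(y == c)[0])
            tr.extend(idx[:n_train_per_class]); te.extend(idx[n_train_per_class:])
        tr, te = np.array(tr), np.array(te)
        P = extract(X[:, tr])
        accs.append(nn_accuracy(P @ X[:, tr], y[tr], P @ X[:, te], y[te]))
    return float(np.mean(accs)), float(np.std(accs))
```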

4.3. Experimental Results and Analysis

From the experimental results in Tables 1 to 4, the proposed method shows competitive performance against the other methods on the four public data sets, and we give some interesting observations and analysis as follows.

(1) The methods based on low-rankness are, to some extent, robust to the image data used in the experiments. On the different data sets with and without corruptions, LRP-GE shows the best performance among the compared methods in the image classification task. However, RPCA OM (the extension of RPCA) and IRPCA cannot obtain good performance because they cannot be used to reduce dimensionality and are therefore not suited to feature extraction.

(2) In previous research, SPP is reported to possess the robustness property. However, compared with SPP, LRP-GE has better classification accuracy on the four data sets, especially on the COIL20 data set. As shown in Table 4, the classification accuracies obtained by SPP are 76.47% (±1.66) and 76.71% (±2.09), which are 19% to 21% lower than the 90.13% (±0.57) and 93.09% (±0.66) obtained by LRP-GE. When the corruptions (i.e., the 10×10 and 20×20 block occlusions) are introduced into the images, LRP-GE is also more robust than SPP. Because SPP uses the sparse reconstruction information among data, LRP-GE falls behind it when a small percentage (i.e., 10%) of salt & pepper noise occurs; when a large percentage (i.e., 20%) of salt & pepper noise occurs, the reconstruction information becomes useless and loses its sparse property among the data, and LRP-GE outperforms SPP in this situation.

(3) It is a challenge for the state-of-the-art methods to handle images with glasses or scarf occlusion, illumination effects, and rich expressions in the AR data set and to assign the images to the correct classes. However, LRP-GE achieves satisfying results in image classification, which demonstrates its good ability in dimensionality reduction.

(4) Some learning methods (e.g., NPE) do not yield results on the AR and COIL20 data sets because they fail to capture the local geometrical structure of data when the number of training samples is small. However, LRP-GE can still effectively explore the intrinsic neighborhood information among data by combining GE with the low-rank projection.

(5) Because the sparse error is introduced into the proposed model, LRP-GE has more robust performance than the others in the classification of block-occluded images. As shown in Table 5, LRP-GE achieves superior performance in comparison with other state-of-the-art methods. Even in the case of 10×10 block occlusion, the classification accuracy obtained by LRP-GE is very close to the result obtained by LPP, because both methods similarly pursue learning an optimal projection.


Table 1: Classification accuracies (MEAN±STD%) of the compared SL methods on the Extended Yale B dataset

Train (#) | PCA        | PCA ℓ1     | LPP        | NPE        | SPP        | RPCA OM    | IRPCA      | Ours
20        | 60.35±1.38 | 60.40±0.80 | 85.05±0.71 | 80.58±0.64 | 89.07±0.92 | 57.82±0.62 | 59.98±0.64 | 89.81±3.50
30        | 67.08±1.00 | 67.29±1.16 | 87.03±0.87 | 85.09±0.66 | 88.97±0.54 | 64.91±0.67 | 67.79±0.67 | 91.84±0.51

Table 2: Classification accuracies (MEAN±STD%) of the compared SL methods on the FERET dataset

Train (#) | PCA        | PCA ℓ1     | LPP        | NPE        | SPP        | RPCA OM    | IRPCA      | Ours
3         | 34.46±1.20 | 33.75±1.09 | 42.09±2.17 | 35.33±1.68 | 48.23±1.33 | 33.74±1.88 | 34.50±2.09 | 48.71±1.79
5         | 44.35±2.35 | 43.80±2.25 | 52.55±1.84 | 52.28±1.73 | 49.24±1.29 | 42.85±2.11 | 43.98±0.61 | 60.23±1.32

Table 3: Classification accuracies (MEAN±STD%) of the compared SL methods on the AR dataset

Train (#) | PCA        | PCA ℓ1     | LPP        | NPE        | SPP        | RPCA OM    | IRPCA      | Ours
6         | 62.57±1.26 | 62.90±0.92 | 85.55±1.14 | 82.25±0.76 | 89.32±0.77 | 59.82±0.89 | 62.14±1.30 | 89.93±0.58
8         | 68.06±0.89 | 68.37±0.86 | 90.06±1.23 | -          | 90.18±0.76 | 64.84±1.44 | 68.22±0.74 | 93.16±0.69

Table 4: Classification accuracies (MEAN±STD%) of the compared SL methods on the COIL20 dataset

Train (#) | PCA        | PCA ℓ1     | LPP        | NPE        | SPP        | RPCA OM    | IRPCA      | Ours
4         | 82.44±2.64 | 80.93±1.76 | 82.60±1.57 | -          | 76.47±1.66 | 82.33±1.59 | 81.59±1.92 | 90.13±0.57
6         | 86.91±1.25 | 87.15±0.78 | 87.70±1.00 | -          | 76.71±2.09 | 86.94±1.75 | 87.63±1.20 | 93.09±0.66

Table 5: Robust testing of various methods on the Extended Yale B dataset with two types of corruptions (i.e., salt & pepper noise and block occlusion), with different percentages (10% and 20%) and sizes (10×10 and 20×20).

Corruptions | PCA        | PCA ℓ1     | LPP        | NPE        | SPP        | RPCA OM    | IRPCA      | Ours
10%         | 62.01±0.90 | 62.40±1.23 | 59.79±1.19 | 64.32±1.57 | 82.39±1.10 | 62.40±1.19 | 63.59±1.40 | 74.00±0.68
20%         | 51.33±0.64 | 51.44±1.55 | 28.78±1.47 | 42.91±1.40 | 60.24±1.23 | 55.75±0.82 | 53.24±0.97 | 77.27±1.59
10×10       | 17.53±0.73 | 17.97±0.83 | 85.53±0.86 | 42.09±1.77 | 81.95±1.05 | 15.98±0.64 | 17.72±0.72 | 85.31±0.73
20×20       | 15.78±0.61 | 16.00±0.74 | 44.35±1.05 | 17.48±1.01 | 40.57±1.49 | 14.68±0.57 | 16.05±0.51 | 47.92±1.50

4.4. Parameter Sensitivity and Convergence Condition

In order to investigate the parameter sensitivity of the proposed model, the classification accuracies versus various values of the regularization parameters λ and γ are explored in this subsection. To comprehensively verify the effects of the two terms, we conduct experiments on the above four databases, i.e., Extended Yale B, FERET, AR, and COIL20. In the proposed method, the values of λ and γ have to be determined beforehand. Specifically, we select values for both parameters from the empirical and reasonable range [1e-6, 1e6] with a multiplicative step of 10. Different value combinations affect the classification performance of LRP-GE to different extents. Fig. 4 shows the classification accuracies of LRP-GE over the variations of λ and γ. From Fig. 4 we can see that the classification performance remains almost the same over a wide range of parameter values on the four databases. This shows that the classification results are not severely influenced when the parameter values are neither too large nor too small, and that the proposed model is robust to the parameter settings. Thus, we can tune both parameters to their best values for image classification. In this paper, λ = 10^-4 and γ = 10^3 are set for the Extended Yale B data set, λ = 0.1 and γ = 0.1 for FERET, λ = 1 and γ = 0.01 for AR, and λ = 0.01 and γ = 10^3 for COIL20.
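A simple way to realize the grid described above is a double loop over powers of ten. The small helper below is our own sketch; `eval_fn` is assumed to return a validation accuracy, for example via the evaluation sketch in Section 4.2.

```python
import numpy as np

def grid_search(eval_fn, exponents=range(-6, 7)):
    """Sweep lambda and gamma over 10^-6 ... 10^6 and return the best pair.

    eval_fn(lam, gamma) is assumed to return a (higher-is-better) validation accuracy.
    """
    grid = [10.0 ** e for e in exponents]
    best = (None, None, -np.inf)
    for lam in grid:
        for gamma in grid:
            acc = eval_fn(lam, gamma)
            if acc > best[2]:
                best = (lam, gamma, acc)
    return best
```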




To verify the efficient convergence of the LRP-GE method, we conduct experiments on the four databases (i.e., Extended Yale B, FERET, AR, and COIL20). Fig. 4 shows how the performance of LRP-GE on image classification varies as λ and γ change, and what the convergence curves of our LRP-GE model look like on the four data sets. As seen from the first two columns, the classification accuracy curves are closely symmetric on the four data sets, while on COIL20 the gap between the top and the bottom values becomes larger. This phenomenon indicates that the performance of the proposed model is somewhat sensitive to the settings of the parameters λ and γ. Where the model is insensitive to the parameters, the accuracy curve tends to be a straight line, as on the first three face image data sets (Extended Yale B, FERET, and AR). As seen from the third column, the objective function value monotonically decreases (or increases) to a stationary point as the iteration number grows; in particular, fewer than 100 iterations are needed on the Extended Yale B data set. This demonstrates the fast convergence of LRP-GE.



[Figure 4 consists of twelve panels: (a)-(b) classification accuracy on Extended Yale B versus γ (λ = 10) and versus λ (γ = 0.001); (c) convergence curve on Extended Yale B; (d)-(e) accuracy on FERET versus γ (λ = 0.1) and versus λ (γ = 0.1); (f) convergence curve on FERET; (g)-(h) accuracy on AR versus γ (λ = 0.00001) and versus λ (γ = 0.01); (i) convergence curve on AR; (j)-(k) accuracy on COIL20 versus γ (λ = 10^-2) and versus λ (γ = 10^3); (l) convergence curve on COIL20.]
Figure 4: The effects of the parameters λ and γ on the performance of LRP-GE are shown in the first two columns. The third column shows the convergence curves (objective function value versus the number of iterations) of LRP-GE on the four databases.



4.5. Efficiency Comparison

To demonstrate the computational efficiency of the proposed method, we present runtime comparisons of our method with other state-of-the-art methods in this subsection. All algorithms are implemented in Matlab 2015a under Windows 10 on a PC with a 3.4-GHz CPU and 8 GB of memory. We conduct experiments on the Extended Yale B data set and evaluate the computation time of all the methods. For convenience, 30 images are randomly selected from each subject as training samples and the remaining images are treated as test samples. The runtime comparison among all methods is summarized in Table 6. Each method has both a training time and a test time: our method requires 7.72 seconds for training and 4.93 seconds for testing. Because KNN is only used for image classification during testing, the test time of all methods is almost the same, while the training time of our method is much less than that of RPCA OM (177.38 seconds). The result thus demonstrates the efficiency of the proposed method.

Table 6: Run time comparison among different methods (seconds).

Alg.     | Train  | Test
PCA      | 0.03   | 6.05
PCA ℓ1   | 2.07   | 4.20
LPP      | 0.14   | 3.74
NPE      | 0.12   | 3.56
SPP      | 9.62   | 4.14
RPCA OM  | 177.38 | 4.11
IRPCA    | 1.90   | 4.93
Ours     | 7.72   | 4.93


5. Conclusion


In this paper, we proposed a robust SL method based on low-rank projection and graph embedding, named LRP-GE, which addresses the problem of exploring the intrinsic geometrical relationships among data with strong robustness to random noise, gross corruption, and block occlusion. Since low-rank projection learning is strongly robust to various corruptions and the GE plays an important role in preserving the local neighborhood among data, the proposed method benefits from the robustness of the low-rank projection when learning an optimal projection of the subspace. Owing to the alternating update strategy, the low-rank projection learning and the GE are smoothly integrated into one model, and the whole model is optimized to learn a robust subspace. By using the singular value decomposition (SVD) and the inexact augmented Lagrange multiplier method (iALM), we obtain the optimal solution to the LRP-GE problem. We also provide theoretical analysis of the computational complexity and parameter sensitivity of the proposed method. Compared with other state-of-the-art methods, the experiments conducted on four public data sets show the superior performance of the proposed method in the image classification task. For robustness testing, we conduct experiments with two types of corruptions, and the results show that the proposed method achieves better classification accuracy than the other methods.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant No. 61672183), by the Shenzhen Research Council (Grant Nos. JCYJ20170413104556946, JCYJ20160406161948211, JCYJ20160226201453085), and by the Natural Science Foundation of Guangdong Province (Grant No. 2015A030313544).

References

[1] X. You, W. Guo, S. Yu, K. Li, J.C. Principe, D. Tao, Kernel learning for dynamic texture synthesis, IEEE Transactions on Image Processing 25 (10) (2016) 4782–4795.
[2] Z. He, Y.Y. Tang, Writer identification of Chinese handwriting documents using hidden Markov tree model, Pattern Recognition 41 (2008) 1295–1307.
[3] P. Zhang, W. Ou, C.L.P. Chen, Y.-M. Cheung, Sparse discriminative multi-manifold embedding for one-sample face identification, Pattern Recognition 52 (2016) 249–259.
[4] Z. Zhang, Y. Xu, L. Shao, J. Yang, Discriminative block-diagonal representation learning for image recognition, IEEE Transactions on Neural Networks and Learning Systems (2017), DOI 10.1109/TNNLS.2017.2712801.
[5] X. Yang, W. Liu, D. Tao, J. Cheng, Canonical correlation analysis networks for two-view image recognition, Information Sciences 385 (C) (2017) 338–352.
[6] X. You, Q. Li, D. Tao, W. Ou, M. Gong, Local metric learning for exemplar-based object detection, IEEE Transactions on Circuits and Systems for Video Technology 24 (2014) 1265–1276.
[7] Y. Wei, H. Li, Multiscale patch-based contrast measure for small infrared target detection, Pattern Recognition 58 (2016) 216–226.
[8] Z. He, S. Yi, Y.-M. Cheung, Y.Y. Tang, Robust object tracking via key patch sparse representation, IEEE Transactions on Cybernetics 47 (2) (2016) 354–364.
[9] Q. Liu, X. Lu, C. Zhang, W.-S. Chen, Deep convolutional neural networks for thermal infrared object tracking, Knowledge-Based Systems 134 (2017) 189–198.
[10] X. Li, Q. Liu, H. Wang, C. Zhang, W.-S. Chen, A multi-view model for visual tracking via correlation filters, Knowledge-Based Systems 113 (2016) 88–99.
[11] C. Hong, J. Yu, J. Wan, D. Tao, M. Wang, Multimodal deep autoencoder for human pose recovery, IEEE Transactions on Image Processing 24 (12) (2015) 5659–5670.
[12] J. Yu, Z. Kuang, B. Zhang, W. Zhang, D. Lin, J. Fan, Leveraging content sensitiveness and user trustworthiness to recommend fine-grained privacy settings for social image sharing, IEEE Transactions on Information Forensics & Security 13 (5) (2018) 1317–1332.
[13] J. Yu, D. Tao, M. Wang, Y. Rui, Learning to rank using user clicks and visual features for image retrieval, IEEE Transactions on Cybernetics 45 (4) (2015) 767–779.
[14] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (5) (1998) 1299–1319.
[15] P.N. Belhumeur, J.P. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[16] C. Fraley, A.E. Raftery, Model-based clustering, discriminant analysis and density estimation, Publications of the American Statistical Association 97 (1) (2002) 611–631.
[17] Q. Liu, H. Lu, S. Ma, Improving kernel Fisher discriminant analysis for face recognition, IEEE Transactions on Circuits and Systems for Video Technology 14 (1) (2004) 42–49.
[18] X. You, Q. Peng, Y. Yuan, Y. Cheung, J. Lei, Segmentation of retinal blood vessels using the radial projection and semi-supervised approach, Pattern Recognition 44 (2011) 2314–2324.
[19] W. Liu, Z.-J. Zha, Y. Wang, K. Lu, D. Tao, p-Laplacian regularized sparse coding for human activity recognition, IEEE Transactions on Industrial Electronics 63 (8) (2016) 5120–5129.
[20] H. Du, Z. Zhao, S. Wang, F. Zhang, Discriminative low-rank graph preserving dictionary learning with Schatten-p quasi-norm regularization for image recognition, Neurocomputing 275 (2018) 697–710.
[21] G. Liu, Z. Lin, S.C. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 171–184.
[22] Y. Song, Y. Wu, Subspace clustering based on latent low rank representation with Frobenius norm minimization, Neurocomputing (2017) 2479–2489.
[23] J. Chen, H. Zhang, H. Mao, Y. Sang, Z. Yi, Symmetric low-rank representation for subspace clustering, Neurocomputing 173 (2016) 1192–1202.
[24] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Proceedings of Neural Information Processing Systems, 2001, pp. 585–591.
[25] Z. Zhang, L. Shao, Y. Xu, L. Liu, J. Yang, Marginal representation learning with graph structure self-adaptation, IEEE Transactions on Neural Networks and Learning Systems (2017), DOI 10.1109/TNNLS.2017.2772264.
[26] X. Fang, Y. Xu, X. Li, Z. Fan, H. Liu, Y. Chen, Locality and similarity preserving embedding for feature selection, Neurocomputing 128 (2014) 304–315.
[27] X. Fang, Y. Xu, X. Li, Z. Lai, S. Teng, L. Fei, Orthogonal self-guided similarity preserving projection for classification and clustering, Neural Networks 88 (2017) 1–8.
[28] Z. He, A.C.S. Chung, 3-D B-spline wavelet-based local standard deviation (BWLSD): its application to edge detection and vascular segmentation in magnetic resonance angiography, International Journal of Computer Vision 87 (3) (2010) 235–265.
[29] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
[30] Z. Zhang, K. Zhao, Low-rank matrix approximation with manifold regularization, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (7) (2013) 1717–1729.
[31] Z.Z. Fan, Y. Xu, W.M. Zuo, J. Yang, J.H. Tang, Z.H. Lai, D. Zhang, Modified principal component analysis: An integration of multiple similarity subspace models, IEEE Transactions on Neural Networks and Learning Systems 25 (8) (2014) 1538–1552.
[32] X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, in: Proceedings of the International Conference on Computer Vision, 2005, pp. 1208–1213.
[33] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (3) (2005) 328–340.
[34] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: A general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (1) (2007) 40–51.
[35] L.S. Qiao, S.C. Chen, X.Y. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (1) (2010) 331–341.
[36] Z. Lai, Y. Xu, Q. Chen, J. Yang, D. Zhang, Multilinear sparse principal component analysis, IEEE Transactions on Neural Networks and Learning Systems 25 (10) (2014) 1538–1552.
[37] S. Yi, Y. Li, Y.-M. Cheung, Unified sparse subspace learning via self-contained regression, IEEE Transactions on Circuits and Systems for Video Technology (2017) 1–1.
[38] S. Yi, Z. Lai, Y.-M. Cheung, Y. Liu, Joint sparse principal component analysis, Pattern Recognition 61 (2016) 524–536.
[39] B. Jiang, C. Ding, B. Luo, J. Tang, Y. Xie, Graph PCA: Closed-form solution and robustness, in: Proceedings of Computer Vision and Pattern Recognition, 2013, pp. 3492–3498.
[40] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2009) 210–227.
[41] Z. He, X. Li, D. Tao, Y.Y. Tang, Connected component model for multi-object tracking, IEEE Transactions on Image Processing 25 (8) (2015).
[42] H. Zhou, T. Hastie, R. Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics 15 (2) (2006) 265–286.
[43] L. Fei, Y. Xu, X. Fang, J. Yang, Low rank representation with adaptive distance penalty for semi-supervised subspace classification, Pattern Recognition 67 (2017) 252–262.
[44] X. Fang, Y. Xu, X. Li, Z. Lai, W.K. Wong, Robust semi-supervised subspace clustering via non-negative low-rank representation, IEEE Transactions on Cybernetics 46 (8) (2016) 1828–1838.
[45] L. Fei, Y. Xu, B. Zhang, X. Fang, J. Wen, Low-rank representation integrated with principal line distance for contactless palmprint recognition, Neurocomputing 218 (19) (2016) 264–275.
[46] B. Chen, Z. Yang, Z. Yang, An algorithm for low-rank matrix factorization and its applications, Neurocomputing 275 (2018) 1012–1020.
[47] H. Du, Z. Zhao, S. Wang, F. Zhang, A Riemannian rank-adaptive method for low-rank optimization, Neurocomputing 192 (2016) 72–80.
[48] E. Candès, X.D. Li, J. Wright, Robust principal component analysis?, Journal of the ACM 58 (3) (2011) 1–37.
[49] K. Tang, R. Liu, Z. Su, J. Zhang, Structure-constrained low-rank representation, IEEE Transactions on Neural Networks and Learning Systems 25 (12) (2014) 2167–2179.
[50] W.K. Wong, Z. Lai, J. Wen, X. Fang, Y. Lu, Low-rank embedding for robust image feature extraction, IEEE Transactions on Image Processing 26 (6) (2017) 2905–2917.
[51] X. You, R. Wang, D. Tao, Diverse expected gradient active learning for relative attributes, IEEE Transactions on Image Processing 23 (7) (2014) 3203–3217.
[52] B.-K. Bao, C. Xu, S. Yan, Inductive robust principal component analysis, IEEE Transactions on Image Processing 21 (8) (2012) 3794–3800.
[53] B.-K. Bao, G. Liu, R. Hong, S. Yan, C. Xu, General subspace learning with corrupted training data via graph embedding, IEEE Transactions on Image Processing 22 (11) (2013) 4380–4393.
[54] G. Liu, S. Yan, Latent low-rank representation for subspace segmentation and feature extraction, in: Proceedings of the International Conference on Computer Vision, 2011, pp. 1615–1622.
[55] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: Proceedings of the International Conference on Machine Learning, 2010, pp. 1–8.
[56] L.S. Zhuang, H.Y. Gao, Z.C. Lin, Y. Ma, X. Zhang, N.H. Yu, Non-negative low-rank and sparse graph for semi-supervised learning, in: Proceedings of Computer Vision and Pattern Recognition, 2012, pp. 2328–2335.
[57] M. Shao, D. Kit, Y. Fu, Generalized transfer subspace learning through low-rank constraint, International Journal of Computer Vision 109 (1-2) (2014) 74–93.
[58] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Elsevier Inc., 1996.
[59] J. Eckstein, D. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55 (1-3) (1992) 293–318.
[60] Z.L. Jiang, Z. Lin, L.S. Davis, Label consistent K-SVD: Learning a discriminative dictionary for recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11) (2013) 2651–2664.
[61] S. Zhang, X. Lan, Robust visual tracking via basis matching, IEEE Transactions on Circuits and Systems for Video Technology 27 (3) (2017) 421–430.
[62] Y. Xu, X.Z. Fang, X.L. Li, J. Yang, J. You, H. Liu, S.H. Teng, Data uncertainty in face recognition, IEEE Transactions on Cybernetics 44 (10) (2014) 1950–1961.
[63] Z.Z. Fan, Y. Xu, D. Zhang, Local linear discriminant analysis framework using sample neighbor, IEEE Transactions on Neural Networks 22 (7) (2011) 1119–1132.
[64] F.P. Nie, H. Huang, C. Ding, D.J. Luo, H. Wang, Principal component analysis with non-greedy ℓ1-norm maximization, in: Proceedings of the International Joint Conference on Artificial Intelligence, 2011, pp. 1433–1438.
[65] F.P. Nie, J.J. Yuan, H. Huang, Optimal mean robust principal component analysis, in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 1062–1070.




Xiaohuan Lu received the B.E. degree in 2011 and is currently a Ph.D. candidate with the Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School. His research interests lie primarily in visual tracking, image processing, and machine learning.

Zhenyu He received his Ph.D. degree from the Department of Computer Science, Hong Kong Baptist University, Hong Kong, in 2007. He is currently a full professor with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. His research interests include sparse representation and its applications, deep learning and its applications, pattern recognition, image processing, and computer vision.


Yingyi Liang received the B.S. degree from Wuhan University of Science and Technology, in 2007, and the M.S. degree from Guangdong University of Technology, in 2010. He is currently pursuing the Ph.D. degree in computer science with Harbin Institute of Technology, Shenzhen, China. His research interests include pattern recognition, computer vision and machine learning.


Hongpeng Wang received the Ph.D. degree from the Department of Computer Science, Harbin Institute of Technology, in 2001. He is currently a full professor with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. His research interests include intelligent robot, pattern recognition, image processing and computer vision.

Lei You received the M.S. degree in computer science and technology from Harbin Institute of Technology, Shenzhen, China, in 2012, where he is currently pursuing the Ph.D. degree in computer science. His research interests lie primarily in intelligent robots, machine learning, and image processing.