Neurocomputing 122 (2013) 398–405
Low-rank representation with local constraint for graph construction

Yaoguo Zheng, Xiangrong Zhang*, Shuyuan Yang, Licheng Jiao

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, China
Article history: Received 23 July 2012; received in revised form 28 May 2013; accepted 3 June 2013; available online 5 July 2013. Communicated by Zhouchen Lin.

Abstract

Graph-based semi-supervised learning has been widely researched in recent years. A novel Low-Rank Representation with Local Constraint (LRRLC) approach for graph construction is proposed in this paper. The LRRLC is derived from the original Low-Rank Representation (LRR) algorithm by incorporating the local information of the data. The rank constraint has the capacity to capture the global structure of the data; therefore, LRRLC is able to capture both the global structure, through LRR, and the local structure, through the locally constrained regularization term, simultaneously. The regularization term is induced by the locality assumption that similar samples have large similarity coefficients, where the similarity among samples is measured by LRR in this paper. Considering the non-negativity restriction on the coefficients in the physical interpretation, the regularization term can be written as a weighted ℓ1-norm. A semi-supervised learning framework based on local and global consistency is then used for the classification task. Experimental results show that the LRRLC algorithm provides a better representation of the data structure and achieves higher classification accuracy in comparison with state-of-the-art graphs on real face and digit databases.

Keywords: Rank minimization; Local regularization; Weighted ℓ1-norm; Semi-supervised learning; Classification
1. Introduction

Graph-based algorithms have attracted much attention in machine learning and computer vision because a graph is a powerful tool for capturing the structural information hidden in data. Graphs have proved successful in characterizing pair-wise data relationships and exploring manifolds, and are thus widely used in dimensionality reduction [1–5], semi-supervised learning [6–12] and machine learning [13–16]. The procedure of graph construction essentially determines the potential of graph-based learning algorithms, so for a specific task a graph that aptly models the data structure will correspondingly achieve good performance. How to construct a good graph describing the relationships among samples has been widely studied in recent years, and it is still an open problem [17]. Recently, Liu et al. [20] proposed to use the samples to represent themselves. By solving a rank minimization problem, the obtained coefficient matrix is used for the subsequent graph-based algorithm. Due to the discrete nature of the rank function, the problem was relaxed to a nuclear norm optimization problem. By solving the nuclear norm minimization problem [20], the lowest-rank representation of all samples is found jointly and the global structure of the data is captured. It also
* Corresponding author. Tel.: +86 2988202279.
E-mail addresses: [email protected] (Y. Zheng), [email protected] (X. Zhang).
http://dx.doi.org/10.1016/j.neucom.2013.06.013
automatically corrects noise and corruption in the data [20]. This process is called Low-Rank Representation (LRR); we explain it in detail in Section 2. Various algorithms have been proposed to construct graphs. According to Yan and Wang [18], traditional graph construction methods can be decomposed into two steps: adjacency construction and weight calculation. In the first step, rules such as k-nearest neighbors and ε-ball neighborhood are often used to link some of the samples with edges, and then the weights of the edges are calculated from Euclidean distances on the fixed adjacency. Frequently used approaches for weight calculation include the Heat Kernel [3], Inverse Euclidean Distance [19], Local Linear Reconstruction Coefficients [2] and Dot-Product weights, etc. Traditional methods use the pair-wise relationship of samples, which describes the relationship between one sample and other individual samples. Yan and Wang [18] recently presented a new algorithm which implements graph construction in a parameter-free way in one step. By solving an ℓ1-norm optimization problem, the neighborhood relationships are determined by the samples corresponding to the nonzero coefficients, and the values of these nonzero coefficients determine the contributions of the selected samples to the reconstruction of a given sample. The nonzero coefficients can be regarded as graph weights, so graph weights and neighborhood relationships are obtained simultaneously. In comparison with the traditional approaches, the ℓ1-graph represents the relationship between one sample and all the remaining ones, and the same procedure is applied to each sample individually. Such a graph and its improved variants [40–42] have achieved state-of-
the-art results. However, when no "clean" data are available, the performance of algorithms based on the ℓ1-graph is degraded [20]. The lack of sufficient labeled training data is one of the most intractable problems for classification tasks in real applications. Labeling large numbers of samples is difficult and time-consuming, whereas abundant unlabeled samples are easy to collect, and a large quantity of unlabeled data can reveal the data structure. Semi-supervised learning serves exactly this purpose: using limited labeled samples together with large numbers of unlabeled samples to improve classification and generalization on test data. Various graph-based semi-supervised learning methods have been proposed and have shown promising results. Zhu et al. [7] utilized the harmonic property of a Gaussian random field over the graph for semi-supervised learning, and in [22] active learning and semi-supervised learning are combined using Gaussian fields and harmonic functions, where active learning is used to select queries from unlabeled data to form or augment the labeled set. Zhou et al. [6] proposed to conduct semi-supervised classification with local and global consistency, where the labels are propagated on the constructed graph. Belkin et al. [23] combined graph Laplacian regularization with regularized least squares and with support vector machines for semi-supervised learning. Wang et al. [24] proposed an alternating minimization method for graph transduction. A detailed survey is available in [25]. Minimizing a rank function can address the problems of the ℓ1-graph, but it constructs a dense graph (ideally a block-diagonal coefficient matrix). However, a sparse graph can convey valuable information for classification, so the dense graph obtained by LRR is not very desirable for graph-based semi-supervised learning [21]. Zhuang et al. [44] proposed a Non-Negative Low-Rank and Sparse graph (NNLRS) which adds non-negative and sparse constraints to the original LRR model: the global structure of the samples is obtained by the low-rank constraint and the locally linear structure is captured by the sparse constraint. The sparsity of the obtained graph is improved and satisfactory results are achieved. Different from NNLRS, which uses a constant weight on the sparsity term, and inspired by the observation of the locally invariant structure of data, in this paper a local constraint with data-dependent weights on the representation coefficients is introduced into graph construction. The weights change with the distance between samples. One criterion often used in machine learning is the local consistency assumption: if two samples are close in the intrinsic geometry of the data distribution, they should have a large similarity coefficient. By incorporating the local consistency assumption into the LRR model, we obtain a graph that captures both the global structure of the data, through the low-rank constraint, and the local structure, through the local regularization term. The local regularization term also makes distant samples have small similarity coefficients. The main difference between LRRLC and NNLRS is as follows: NNLRS captures the locally linear structure of the data from the viewpoint of data representation, as a complement to the global structure captured by the low-rank constraint in LRR.
In NNLRS, the sparsest representation is used to calculate the coefficients, which describe the linear relationship of each given sample to the dictionary individually. In our LRRLC, by contrast, we consider the local property from the viewpoint of the intrinsic data structure. Considering the locally invariant structure of the data and the non-negativity of the representation coefficients, the added regularization term can be written as a weighted ℓ1-norm. After adding the local regularization term to the LRR model, the sparsity of the solution is promoted. The remainder of this paper is organized as follows: in Section 2, we give an overview of the low-rank representation algorithm.
Section 3 presents the local constraint and our LRRLC approach. Section 4 gives the graph-based semi-supervised classification framework. Section 5 presents the experimental results and also describes the traditional graph construction methods used for comparison. Finally, we conclude the paper and provide some suggestions for future work in Section 6.
2. Low-rank representation

Let $X = [x_1, x_2, \cdots, x_n]$ be a set of samples, where each column is a sample that can be represented by a linear combination of the samples in $X$:

$$X = XZ \qquad (1)$$

where $Z = [z_1, z_2, \cdots, z_n]$ is the coefficient matrix, with each $z_i$ being the representation coefficient of sample $x_i$. Each element of $z_i$ can be regarded as the contribution to the reconstruction of $x_i$ with $X$ as the basis. Different from the sparse representation, which may not capture the global structure of the data $X$, we consider the low-rankness criterion:

$$\min_Z \ \operatorname{rank}(Z) \quad \text{s.t.} \quad X = XZ \qquad (2)$$

Due to the discrete nature of the rank function, the above optimization problem can be relaxed to the following convex optimization:

$$\min_Z \ \|Z\|_* \quad \text{s.t.} \quad X = XZ \qquad (3)$$

Here $\|\cdot\|_*$ denotes the nuclear norm of a matrix (the sum of its singular values). Recently, Zhang et al. [46] proved that (2) and (3) have closed-form solutions. Considering the noise or corruption in real observations, even when the data vectors are grossly corrupted, a more reasonable objective function is

$$\min_{Z,E} \ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E \qquad (4)$$

where $\|E\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{n} ([E]_{ij})^2}$ is the $\ell_{2,1}$-norm and the parameter $\lambda$ trades off the low-rank part against the noise term. According to Liu et al. [20], low-rankness is a more appropriate criterion for capturing the global structure of the data, so the global structure is obtained by solving the above optimization problem. Such a model has found wide application in image segmentation [28], saliency detection [29], feature extraction [30] and background modeling [31], among others.
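For the noiseless problem (3), the closed-form solution discussed in [20,46] is the shape interaction matrix $Z^* = VV^T$, where $X = U\Sigma V^T$ is the skinny SVD of $X$. The following Python sketch illustrates this special case; the function name and the rank tolerance are our own choices, not from the paper.

```python
import numpy as np

def lrr_noiseless(X, tol=1e-10):
    """Closed-form solution of min ||Z||_* s.t. X = XZ (problem (3)):
    the shape interaction matrix Z* = V V^T from the skinny SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol))      # numerical rank of X
    V = Vt[:r, :].T               # right singular vectors of the nonzero singular values
    return V @ V.T                # n x n lowest-rank representation

# Toy usage: 30 samples drawn from a 2-D subspace of R^5.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 30))
Z = lrr_noiseless(X)
print(np.allclose(X, X @ Z))      # the self-expression constraint X = XZ holds
```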
3. Low-rank representation with local constraint

3.1. Local constraint

The importance of the geometrical information of samples for discrimination has been demonstrated in many studies. Motivated by recent progress in manifold learning, in this paper we propose a novel algorithm that explicitly considers the geometrical structure of the samples. Most manifold learning algorithms use the locally invariant idea: if two data points $x_i$ and $x_j$ are close in the intrinsic structure of the data distribution, then the weight $W_{ij}$ between them should be large and they are likely to have similar representations in the new data representation space. The locally invariant idea has been utilized in various algorithms, including dimensionality reduction [9], clustering [43], classification [8] and semi-supervised learning [6].
Considering the locally invariant property of the data, a natural regularization term can be defined as

$$\sum_{i,j} \|x_i - x_j\|^2 W_{ij} \qquad (5)$$

In real applications the subtraction operator lacks a physical interpretation for visual data, so we constrain the values in the similarity matrix $W$ to be non-negative. By adding this constraint on the coefficients, we obtain

$$\sum_{i,j} \|x_i - x_j\|^2 |W_{ij}| \qquad (6)$$

which is consistent with (5) in essence. Different from the common practice of converting (5) into the Laplacian form $X^T L X$, where $L = D - W$ and $D_{ii} = \sum_j W_{ij}$, we leave it in the absolute-value form. Then (6) can be regarded as a weighted $\ell_1$-norm. In this sense, the non-negative local regularization can be seen as a weighted sparse coding, and thus it promotes sparsity. According to [3], a sparse graph that characterizes the locality relationships is beneficial for classification. We denote (6) as $g(W) = \sum_{i,j} f_{ij} |W_{ij}|$, where $f_{ij} = \|x_i - x_j\|^2$.
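As a concrete illustration, the following sketch computes the locality weights $f_{ij}$ and evaluates the regularizer $g$ for a given coefficient matrix; the function names are ours.

```python
import numpy as np

def locality_weights(X):
    """f_ij = ||x_i - x_j||^2 for the columns x_i of X (the weights in g)."""
    sq = np.sum(X**2, axis=0)                        # ||x_i||^2 for each column
    F = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.maximum(F, 0.0)                        # guard against small negative values

def g(W, F):
    """Locally constrained regularizer g(W) = sum_ij f_ij |W_ij|, a weighted l1-norm."""
    return np.sum(F * np.abs(W))
```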
3.2. LRRLC graph

In this subsection, we introduce our LRRLC algorithm, which captures both the global and the local structure of the data by incorporating the locally constrained regularization term. Incorporating (6) into the low-rank representation model (4) gives the following optimization problem:

$$\min_{Z,E} \ \|Z\|_* + \lambda_1 \|E\|_{2,1} + \lambda_2 g(Z) \quad \text{s.t.} \quad X = XZ + E \qquad (7)$$
Here the parameter $\lambda_1 > 0$ is used to balance the effect of noise, and the parameter $\lambda_2 > 0$ controls the trade-off between the low-rank representation part and the local regularization part. Following Lin et al. [26], we separate the variables in the objective function (7) with an auxiliary variable $M$, so problem (7) can be rewritten as

$$\min_{Z,M,E} \ \|Z\|_* + \lambda_1 \|E\|_{2,1} + \lambda_2 g(M) \quad \text{s.t.} \quad X = XZ + E, \ Z = M \qquad (8)$$

The augmented Lagrange function of problem (8) is

$$L(Z, M, E, Y_1, Y_2, \mu) = \|Z\|_* + \lambda_1 \|E\|_{2,1} + \lambda_2 g(M) + \operatorname{tr}(Y_1^T (X - XZ - E)) + \operatorname{tr}(Y_2^T (Z - M)) + \frac{\mu}{2} \left[ \|X - XZ - E\|_F^2 + \|Z - M\|_F^2 \right] \qquad (9)$$

where $Y_1$ and $Y_2$ are Lagrange multipliers, $\mu > 0$ is a penalty parameter, and $\|\cdot\|_F$ is the Frobenius norm. The linearized alternating direction method with adaptive penalty (LADMAP) [26] is used to solve problem (9), as summarized in Algorithm 1. Note that Step 1 can be solved by the singular value thresholding operator [32], Step 2 by the soft thresholding operator [33] with $T_\varepsilon(x) = \operatorname{sgn}(x)\max(|x| - \varepsilon, 0)$, and Step 3 by the $\ell_{2,1}$ minimization operator [20].

Algorithm 1. Solving problem (7) by LADMAP

Input: Data matrix $X$, parameters $\lambda_1$ and $\lambda_2$.
Initialize: $Z = M = E = Y_1 = Y_2 = 0$, $\mu_0 = 0.1$, $\mu_{\max} = 10^{10}$, $\rho_0 = 1.1$, $\varepsilon_1 = 10^{-4}$, $\varepsilon_2 = 10^{-2}$, $\eta = \|X\|_2^2 + 1$, $k = 0$.
while $\|X - XZ_k - E_k\|_F / \|X\|_F \ge \varepsilon_1$ or $\mu_k \max(\sqrt{\eta}\|Z_k - Z_{k-1}\|_F, \|M_k - M_{k-1}\|_F, \|E_k - E_{k-1}\|_F)/\|X\|_F \ge \varepsilon_2$ do
1. Fix the others and update $Z$ by
$Z_{k+1} = \arg\min_Z \ \frac{1}{\eta\mu_k}\|Z\|_* + \frac{1}{2}\big\|Z - \big(Z_k + [X^T(X - XZ_k - E_k + Y_{1,k}/\mu_k) - (Z_k - M_k + Y_{2,k}/\mu_k)]/\eta\big)\big\|_F^2$;
2. Fix the others and update $M$ by
$M_{k+1} = \arg\min_M \ \frac{\lambda_2}{\mu_k} g(M) + \frac{1}{2}\|M - (Z_{k+1} + Y_{2,k}/\mu_k)\|_F^2 = \hat{M}_{k+1}$,
where $(\hat{M}_{k+1})_{ij} = T_{\lambda_2 f_{ij}/\mu_k}\big((Z_{k+1} + Y_{2,k}/\mu_k)_{ij}\big)$;
3. Fix the others and update $E$ by
$E_{k+1} = \arg\min_E \ \frac{\lambda_1}{\mu_k}\|E\|_{2,1} + \frac{1}{2}\|E - (X - XZ_{k+1} + Y_{1,k}/\mu_k)\|_F^2$;
4. Update the multipliers by
$Y_{1,k+1} = Y_{1,k} + \mu_k(X - XZ_{k+1} - E_{k+1})$, $Y_{2,k+1} = Y_{2,k} + \mu_k(Z_{k+1} - M_{k+1})$;
5. Update the parameter $\mu$ by $\mu_{k+1} = \min(\mu_{\max}, \rho\mu_k)$, where $\rho = \rho_0$ if $\mu_k \max(\sqrt{\eta}\|Z_{k+1} - Z_k\|_F, \|M_{k+1} - M_k\|_F, \|E_{k+1} - E_k\|_F)/\|X\|_F < \varepsilon_2$, and $\rho = 1$ otherwise;
6. Update $k \leftarrow k + 1$.
end while
Output: $Z$, $E$.
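As a rough guide to how Algorithm 1 can be implemented, the following Python sketch follows the update rules above; the helper names (svt, prox_l21, lrrlc_ladmap) and the default arguments are ours, and a careful implementation should follow [26] for the convergence details.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau*||.||_* [32]."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_l21(Q, tau):
    """Column-wise shrinkage: the proximal operator of tau*||.||_{2,1} [20]."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return Q * scale

def lrrlc_ladmap(X, lam1, lam2, max_iter=500,
                 mu=0.1, mu_max=1e10, rho0=1.1, eps1=1e-4, eps2=1e-2):
    """Sketch of Algorithm 1: LADMAP iterations for problem (7)."""
    d, n = X.shape
    sq = np.sum(X**2, axis=0)
    F = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)  # f_ij = ||x_i - x_j||^2
    Z = np.zeros((n, n)); M = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))
    eta = np.linalg.norm(X, 2) ** 2 + 1.0     # eta = ||X||_2^2 + 1
    normX = np.linalg.norm(X, 'fro')
    for _ in range(max_iter):
        Zp, Mp, Ep = Z, M, E
        # Step 1: linearized nuclear-norm step, solved by SVT
        R = X - X @ Z - E + Y1 / mu
        Z = svt(Z + (X.T @ R - (Z - M + Y2 / mu)) / eta, 1.0 / (eta * mu))
        # Step 2: weighted soft thresholding for the local regularization term
        A = Z + Y2 / mu
        M = np.sign(A) * np.maximum(np.abs(A) - lam2 * F / mu, 0.0)
        # Step 3: l2,1 shrinkage for the noise term
        E = prox_l21(X - X @ Z + Y1 / mu, lam1 / mu)
        # Step 4: multiplier updates
        Y1 = Y1 + mu * (X - X @ Z - E)
        Y2 = Y2 + mu * (Z - M)
        # Steps 5-6: adaptive penalty and stopping criteria
        change = mu * max(np.sqrt(eta) * np.linalg.norm(Z - Zp, 'fro'),
                          np.linalg.norm(M - Mp, 'fro'),
                          np.linalg.norm(E - Ep, 'fro')) / normX
        if np.linalg.norm(X - X @ Z - E, 'fro') / normX < eps1 and change < eps2:
            break
        mu = min(mu_max, (rho0 if change < eps2 else 1.0) * mu)
    return Z, E
```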
After solving problem (7), in the same way as [20], we define the affinity matrix of an undirected graph using the obtained $Z$. With normalized coefficient vectors, and similar to [44], we set small coefficients to zero and then compute the affinity between data points $x_i$ and $x_j$ as $(|Z^*_{ij}| + |Z^*_{ji}|)/2$. The construction procedure is summarized in Algorithm 2.

Algorithm 2. LRRLC graph construction

Input: Data matrix $X$, regularization parameters $\lambda_1$ and $\lambda_2$.
1. Solve the optimization problem
$$\min_{Z,E} \ \|Z\|_* + \lambda_1 \|E\|_{2,1} + \lambda_2 \sum_{i,j} \|x_i - x_j\|^2 |Z_{ij}| \quad \text{s.t.} \quad X = XZ + E$$
using Algorithm 1, and obtain the optimal solution $(Z, E)$.
2. Normalize all column vectors of $Z$ by $z_i^* = z_i / \|z_i\|$ and set small coefficients to zero.
3. Construct the graph weight matrix $W^*$ by $W^* = (|Z^*| + |Z^*|^T)/2$.
Output: The weight matrix $W^*$ of the LRRLC graph.
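A minimal sketch of steps 2–3, assuming ℓ2 normalization of the columns and a fixed threshold for discarding small coefficients (the paper does not state the threshold value for LRRLC; 0.05 is borrowed from the NNLRS setting purely as an illustration):

```python
import numpy as np

def lrrlc_graph(Z, threshold=0.05):
    """Steps 2-3 of Algorithm 2: normalize the columns of Z, zero small
    coefficients, then symmetrize to obtain the graph weights W*."""
    Zn = Z / np.maximum(np.linalg.norm(Z, axis=0), 1e-12)   # z_i / ||z_i||
    Zn[np.abs(Zn) < threshold] = 0.0                         # drop small coefficients
    return (np.abs(Zn) + np.abs(Zn).T) / 2.0                 # W* = (|Z*| + |Z*|^T)/2
```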
4. Graph-based label propagation

Given a set of samples $X = \{x_1, \cdots, x_l, x_{l+1}, \cdots, x_n\}$ and a label set $L = \{1, 2, \cdots, c\}$, the first $l$ samples $x_i$ ($i \le l$) are labeled and the remaining $n - l$ samples $x_u$ ($l+1 \le u \le n$) are unlabeled. The goal is to predict the labels of the unlabeled samples through the given labels and the graph constructed above. Define an $n \times c$ matrix $Y$ as the initial label matrix, with $Y_{ij} = 1$ if $x_i$ is labeled as $y_i = j$ and $Y_{ij} = 0$ otherwise. Let $F$ denote the final label matrix of size $n \times c$. All samples are used to construct an affinity matrix $W$ whose elements measure the pair-wise relationships within the data set $X$. Consider a graph $G = (V, E)$ defined on $X$, where the vertex set $V$ consists of the samples in $X$ and the edge set $E$ is weighted by the matrix $W$. We iteratively propagate the label information to every sample's neighbors through the constructed affinity matrix $W$ until a global state is achieved.
In each iteration, every sample receives label information from its neighbors and also retains its initial label information. According to Zhou et al. [6], the iteration process can be described as

$$F(t+1) = \alpha S F(t) + (1 - \alpha) Y \qquad (10)$$

where $\alpha$ ($0 < \alpha < 1$) is a parameter controlling the relative contributions of the information from a sample's neighbors and its initial label information, and $S$ is the normalization of $W$ given by $S = D^{-1/2} W D^{-1/2}$, where $D$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$. The convergence of the sequence $\{F(t)\}$ has been proved in [6], and its limit is $F^* = (1 - \alpha)(I - \alpha S)^{-1} Y$. Finally, sample $x_i$ is labeled as $y_i = \arg\max_{j \le c} F^*_{ij}$. In all our experiments, we fix the parameter $\alpha$ to 0.01 to pay more attention to the original label information in each iteration.
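A small sketch of this propagation step using the closed-form limit; the symbols match the description above, and the numerical guard on isolated vertices is our addition.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.01):
    """Compute F* = (1 - alpha) (I - alpha S)^{-1} Y for iteration (10) [6]
    and return the predicted class index for every sample."""
    d = W.sum(axis=1)
    dinv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))      # guard isolated vertices
    S = W * dinv_sqrt[:, None] * dinv_sqrt[None, :]       # S = D^{-1/2} W D^{-1/2}
    n = W.shape[0]
    F = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return np.argmax(F, axis=1)
```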
5. Experimental analysis

In this section, we conduct a set of experiments to evaluate the effectiveness of our LRRLC algorithm in the semi-supervised classification setting. We use three popular face databases with various illuminations, poses and facial expressions, together with a digit dataset. For comparison, traditional graph approaches are implemented as evaluation baselines. All experiments are performed with Matlab 2010a on a personal computer with a 2 GHz Intel Core2 Duo CPU and 2 GB RAM.
5.1. Compared algorithms

In order to demonstrate how the classification performance is improved by our LRRLC, we first list the traditional graphs and the recently proposed LRR-graph and NNLRS-graph used for comparison.

KNN-graph: the samples $x_i$ and $x_j$ are considered neighbors if $x_i$ is among the $k$ nearest neighbors of $x_j$ or $x_j$ is among the $k$ nearest neighbors of $x_i$, with the $k$ nearest neighbors measured by Euclidean distance. The weight of a connected edge is set as $W_{ij} = e^{-\|x_i - x_j\|^2 / 2s}$, where the Gaussian kernel parameter $s$ is set to 1. Similar to [18], we use different numbers of nearest neighbors, 3 and 7, to form two configurations.

$\ell_1$-graph [18]: the $\ell_1$-graph uses the reconstruction coefficients of the sparse sample representation, obtained by solving $\hat{a} = \arg\min_a \|y - Xa\|_1$. The graph weight is defined as $W_{ij} = |a_{ij}|$, and we symmetrize the weight matrix as $W = (W + W^T)/2$.

LLE-graph [2]: the LLE-graph reconstructs each sample from its neighboring points by minimizing the $\ell_2$ reconstruction error $\xi(W) = \sum_i \|x_i - \sum_j W_{ij} x_j\|^2$, s.t. $\sum_j W_{ij} = 1$, with $W_{ij} = 0$ if $x_j$ does not belong to the set of neighbors of $x_i$. Finally, symmetrization is performed in the graph construction. Different numbers of neighbors, 3 and 7, are also used to form two configurations.

LRR-graph [20]: after solving problem (4), the representation matrix $Z$ is obtained over the basis of all data vectors $X$. Symmetrization is also applied in the graph construction.

NNLRS-graph [44]: in the implementation of NNLRS, the two parameters are the same as those in [44]. The optimal solution $Z$ is obtained by solving $\min_{Z,E} \|Z\|_* + \beta\|Z\|_1 + \lambda\|E\|_{2,1}$ s.t. $X = AZ + E$, $Z \ge 0$. As suggested in NNLRS, we normalize $Z$ and set values below the threshold 0.05 to zero. The parameter in label propagation is also the same as in [44].
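As an illustration of the simplest baseline above, a sketch of the heat-kernel KNN-graph construction; the function name and the symmetrization via an element-wise maximum are our choices.

```python
import numpy as np

def knn_graph(X, k=3, s=1.0):
    """Heat-kernel KNN graph over the columns of X: link i and j if either
    is among the other's k nearest neighbors, weighted by exp(-d^2 / 2s)."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D2[i])[1:k + 1]          # k nearest neighbors, skipping the sample itself
        W[i, idx] = np.exp(-D2[i, idx] / (2.0 * s))
    return np.maximum(W, W.T)                      # symmetrize the adjacency
```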
5.2. Databases

Three popular face databases (Extended YaleB, ORL and AR) and a handwritten digit database (USPS) are used in our experiments. The Extended YaleB database varies mostly in illumination and has a linear subspace structure, the ORL database varies mostly in pose, and the AR database varies mostly in facial expression and illumination. For all three face databases, the facial images are aligned by fixing the locations of the two eyes. The statistics of the four databases are summarized below.

Extended YaleB database [34]: the Extended YaleB face database contains 16,128 images of 38 human subjects under 9 poses and 64 illumination conditions. In our experiments, similar to [35], we choose the frontal pose and use all images under different illuminations, giving 64 images per person. We use the first 10 persons and resize the images to 32 × 32.

ORL database [36]: there are 10 different images for each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). We resize the images to 32 × 32.

AR database [37]: the AR face database contains over 4000 face images of 126 people, including frontal views of faces with different facial expressions, lighting conditions and occlusions. The images of 120 individuals were taken in two sessions (separated by two weeks), and each session contains 13 color images. Similar to [38], we select 14 face images (7 from each session) for each of these 120 individuals and sample one pixel in every four pixels.

USPS database [45]: we select 200 samples from each of the digit classes 1, 2, 3 and 4; the size of the handwritten digit images is 16 × 16.

PCA has been widely used for unsupervised dimensionality reduction, especially for face recognition. We use it in our experiments for the following reasons. Firstly, the computing cost of our algorithm is related to the dimensionality of the samples, and a higher dimensionality requires a higher computing cost. Secondly, a face image after PCA projection is called an Eigenface [27], which has an explicit physical interpretation. Thirdly, it is very easy to implement. For all four databases, the dimensionality after PCA projection is set to 100.¹ Each sample is then normalized to have unit norm. Fig. 1 shows some samples from the four databases.

¹ We consider only the problem of computing cost here; a suitable dimensionality will improve the performance.
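A brief sketch of this preprocessing step (PCA via the SVD followed by unit-norm scaling); implementation details such as the centering and the guard constant are ours.

```python
import numpy as np

def pca_unit_norm(X, dim=100):
    """Project the columns of X onto the top `dim` principal components and
    normalize each projected sample to unit length."""
    Xc = X - X.mean(axis=1, keepdims=True)             # center the samples
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = U[:, :dim].T @ Xc                               # dim x n projected samples
    return P / np.maximum(np.linalg.norm(P, axis=0), 1e-12)
```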
Fig. 1. Some samples from the four databases. (a) Samples of two persons on the YaleB database. (b) Samples of two persons on the ORL database. (c) Samples of two persons on the AR database. (d) Samples of two digits on the USPS database.
Table 1
Accuracies and standard deviations (%) of different graphs using the label propagation algorithm on the four databases. The bold numbers in the original layout are the best accuracies for each configuration, and the percentage after the database name is the percentage of labeled samples.

Database (label %) | ℓ1 | LLE3 | LLE7 | KNN3 | KNN7 | ε-ball | LRR | NNLRS | LRRLC
YaleB (10%) | 66.62(4.93) | 75.47(2.97) | 76.58(2.43) | 73.72(2.90) | 69.69(2.25) | 63.42(4.74) | 83.02(3.75) | 91.78(1.96) | 93.73(1.75)
YaleB (20%) | 82.29(4.06) | 80.40(2.85) | 82.85(3.27) | 78.56(2.79) | 76.14(3.52) | 75.89(5.46) | 91.02(3.54) | 92.40(2.04) | 94.96(1.44)
YaleB (30%) | 89.86(3.83) | 84.36(2.52) | 87.07(2.90) | 83.03(2.56) | 80.72(3.31) | 83.78(3.76) | 94.07(2.80) | 93.29(2.40) | 95.84(1.25)
YaleB (40%) | 93.37(2.79) | 85.16(2.88) | 88.13(2.30) | 83.39(3.08) | 82.82(2.82) | 85.97(3.17) | 95.29(1.78) | 93.79(1.86) | 96.79(1.24)
YaleB (50%) | 96.17(1.87) | 85.94(2.77) | 89.52(2.85) | 84.55(3.04) | 84.73(3.28) | 87.34(3.48) | 96.12(1.62) | 93.62(2.48) | 97.16(1.33)
ORL (10%) | 32.93(1.80) | 65.85(2.12) | 65.83(2.81) | 66.47(1.95) | 66.36(3.00) | 59.50(2.43) | 67.69(2.46) | 58.90(3.41) | 69.79(1.98)
ORL (20%) | 52.14(2.16) | 74.00(2.05) | 73.03(2.22) | 74.39(2.01) | 73.05(2.26) | 71.75(1.88) | 79.39(1.90) | 73.19(2.76) | 80.19(2.45)
ORL (30%) | 66.48(3.70) | 82.16(3.20) | 79.87(3.13) | 82.11(3.44) | 79.20(3.84) | 80.77(3.18) | 84.91(2.25) | 81.43(2.46) | 86.80(2.40)
ORL (40%) | 76.00(3.21) | 86.40(2.45) | 84.40(2.22) | 86.50(3.19) | 82.90(3.11) | 85.42(2.51) | 87.69(2.15) | 85.56(2.38) | 90.81(1.47)
ORL (50%) | 83.78(2.76) | 89.40(2.68) | 88.00(2.63) | 89.20(1.89) | 85.05(2.79) | 89.28(2.76) | 88.85(2.08) | 88.80(2.11) | 92.25(1.71)
AR (10%) | 32.13(4.49) | 43.97(7.55) | 56.19(4.47) | 43.24(8.50) | 52.19(6.64) | 55.71(4.85) | 78.79(1.98) | 80.36(2.42) | 85.75(1.51)
AR (20%) | 66.58(7.18) | 70.53(8.36) | 75.56(5.54) | 69.20(8.14) | 71.62(5.38) | 77.31(6.32) | 91.26(4.32) | 92.75(4.02) | 96.34(4.38)
AR (30%) | 77.89(8.80) | 77.44(8.80) | 81.01(5.52) | 75.09(8.40) | 75.60(5.49) | 80.97(7.08) | 93.73(3.58) | 94.08(3.10) | 97.61(3.22)
AR (40%) | 88.01(5.59) | 80.54(11.46) | 83.58(7.67) | 78.27(11.09) | 78.53(8.33) | 84.26(6.61) | 95.11(2.36) | 95.23(1.93) | 98.76(0.69)
AR (50%) | 93.40(4.08) | 83.43(8.93) | 86.45(6.19) | 81.05(9.04) | 81.08(7.13) | 87.29(5.75) | 96.18(0.99) | 96.12(0.97) | 99.05(0.39)
USPS (10%) | 71.40(2.04) | 91.83(1.72) | 90.83(1.50) | 92.13(1.66) | 91.78(1.62) | 91.96(1.53) | 83.35(2.11) | 87.90(2.60) | 93.16(1.10)
USPS (20%) | 84.23(1.22) | 94.62(1.04) | 94.01(0.98) | 94.86(1.20) | 95.09(1.02) | 94.45(0.95) | 89.91(1.63) | 92.45(1.90) | 95.54(0.77)
USPS (30%) | 90.05(1.29) | 95.95(1.14) | 94.86(1.19) | 96.10(0.92) | 95.93(0.91) | 96.01(0.66) | 91.86(1.25) | 93.65(1.77) | 96.85(0.66)
USPS (40%) | 93.43(1.63) | 96.53(0.80) | 96.17(0.92) | 96.72(0.76) | 96.90(0.86) | 96.66(0.84) | 93.71(1.23) | 95.10(1.79) | 97.75(0.87)
USPS (50%) | 95.43(1.01) | 97.18(0.85) | 97.16(0.72) | 97.26(0.75) | 97.28(0.96) | 97.19(0.97) | 94.03(1.40) | 94.90(1.62) | 98.23(0.87)
5.3. Evaluation of performance

The evaluations are conducted on 20 randomly selected subsets of each database, and we report the mean and the standard deviation of the results. For each test, a fixed number of labeled samples is randomly chosen from each class to form the labeled set, and the remaining samples are used for testing. We vary the label rate from 10% to 50% to evaluate the classification performance. The results of semi-supervised classification based on different graphs are shown in Table 1. It can be observed that: (1) the LRRLC graph achieves the highest accuracies, which validates that adopting the local information of samples is reasonable; (2) the performance fluctuations of LRRLC are smaller than those of the other traditional graphs, and the standard deviations of LRRLC are often smaller than those of the other methods, which shows the stability of LRRLC; (3) LRRLC with even a low label rate can achieve better performance than the traditional graphs; for example, with 10% labeled samples LRRLC achieves 93.73% on YaleB, better than the ℓ1-graph with 40% labeled samples.
Fig. 2. The impact of the parameters λ1 and λ2 on the performance for the four databases. The evaluations are conducted on 20 randomly generated subsets from YaleB, ORL, AR and USPS, and the average classification accuracies are reported. (a), (c), (e) and (g) show the impact of parameter λ1; (b), (d), (f) and (h) show the impact of parameter λ2.
Table 2
Comparison of operation time (in seconds) among the ℓ1-graph, LRR-graph, NNLRS-graph and LRRLC-graph.

Dataset | ℓ1-graph | LRR-graph | NNLRS-graph | LRRLC-graph
YaleB | 36.48 | 7.35 | 4.29 | 5.72
ORL | 17.23 | 3.69 | 2.86 | 2.83
AR | 261.99 | 13.02 | 24.49 | 35.56
USPS | 52.78 | 6.32 | 5.74 | 4.51
5.4. Parameters analysis

5.4.1. Effect of the regularization parameters λ1 and λ2

In this subsection, we analyze the effect of the two regularization parameters on the classification performance. We set λ1 to a relatively large value and λ2 to a relatively small value. For the YaleB and AR databases, λ1 is set to 10 and λ2 to 0.3. For the ORL dataset, λ1 and λ2 are set to 1.5 and 0.2, respectively. For USPS, both parameters are set to 0.3. In Fig. 2(a), λ2 is fixed at 0.3; the performance of LRRLC remains almost unchanged for relatively large values of λ1. In Fig. 2(b), λ1 is fixed at 10 and the performance is tested with various values of λ2. We can see that for relatively small values of λ2 the accuracy increases as λ2 increases, and the algorithm still performs well even for large values of λ2. A similar effect of λ1 can be seen in Fig. 2(c), (e) and (g). For ORL, the performance is mostly improved with a smaller λ2. For AR, the parameter λ2 has little influence on the accuracy. As can be seen from Fig. 2, our algorithm works well under a wide range of parameter settings.

5.4.2. Effect of the number of labeled samples

In this subsection, we evaluate the influence of the number of labeled samples on the classification performance. As can be seen from Table 1, the accuracy rises as the percentage of labeled samples increases. In the YaleB experiment, our algorithm with 10% labeled samples achieves a higher accuracy than the ℓ1-graph with 40% labeled samples. With 20% labeled samples, our algorithm also obtains a higher accuracy than the ℓ1-graph with 50% labeled samples on the AR dataset.

5.4.3. Time comparison among different methods

In this subsection, we compare the operation time of the ℓ1-graph, LRR-graph, NNLRS-graph and LRRLC-graph on the four datasets with 50% labeled samples. The ℓ1-graph spends more time than the other methods because it reconstructs each sample individually. The NNLRS-graph and LRRLC-graph spend less time than the LRR-graph since they only need one auxiliary variable in the optimization process, and they have comparable operation times. Details of the operation time in seconds are given in Table 2.
6. Conclusions

A novel graph for the semi-supervised classification problem has been presented in this paper. The graph is obtained by integrating a local regularization term with a rank minimization function, so that it captures both the global and the local structure of the samples. The locally invariant information of the data is written as a weighted ℓ1-norm, and the graph adjacency structure and the graph weights are derived simultaneously. Experimental results on four datasets show the effectiveness of our approach. Several problems remain for future study: (1) the feature dimensionality used in our experiments, since a better choice may further improve the accuracy; (2) a better dictionary, which, as pointed out in [39], may also improve the performance; (3) the usability of the algorithm on large-scale datasets; and (4) the performance of LRRLC on low-dimensional data (e.g. UCI datasets). Extending the algorithm to more kinds of data structures is also a direction for our future research.
Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which significantly improved the quality of this paper. We also appreciate Dr. Zhouchen Lin of Peking University and Dr. Yuanyuan Liu of Xidian University for discussions and suggestions. This work was supported by the National Basic Research Program of China (973 Program) under Grant no. 2013CB329402, the National Natural Science Foundation of China (Nos. 61272282, 61203303, 61072108 and 61001206), the Fundamental Research Funds for the Central Universities (Grant K50511020011), the National Science Basic Research Plan in Shaanxi Province of China (Grant 2011JQ8020), NCET-10-0668, and the Program for Cheung Kong Scholars and Innovative Research Team in University (IRT1170).

References

[1] J. Tenenbaum, V. Silva, J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
[2] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.
[3] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003) 1373–1396.
[4] X. He, P. Niyogi, Locality preserving projections, Adv. Neural Inf. Process. Syst. 16 (2003) 585–591.
[5] S. Yan, S. Bouaziz, D. Lee, J. Barlow, Semi-supervised dimensionality reduction for analyzing high-dimensional data with constraints, Neurocomputing 76 (2012) 114–124.
[6] D. Zhou, O. Bousquet, T. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, Adv. Neural Inf. Process. Syst. 16 (2003) 321–328.
[7] X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in: Proceedings of ICML, 2003, pp. 912–919.
[8] F. Wu, W. Wang, Y. Yang, Y. Zhuang, F. Nie, Classification by semi-supervised discriminative regularization, Neurocomputing 73 (2010) 1641–1651.
[9] D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: Proceedings of ICCV, 2007, pp. 1–7.
[10] G. Camps-Valls, T. Bandos, D. Zhou, Semi-supervised graph-based hyperspectral image classification, IEEE Trans. Geosci. Remote Sensing 45 (10) (2007) 3044–3054.
[11] M. Zheng, J. Bu, C. Chen, C. Wang, L. Zhang, G. Qiu, D. Cai, Graph regularized sparse coding for image representation, IEEE Trans. Image Process. 20 (5) (2011) 1327–1336.
[12] X. Tian, G. Gasso, S. Canu, A multiple kernel framework for inductive semi-supervised SVM learning, Neurocomputing 90 (2012) 46–58.
[13] H. Zhang, J. Yu, M. Wang, Y. Liu, Semi-supervised distance metric learning based on local linear regression for data clustering, Neurocomputing 93 (2012) 100–105.
[14] D. Cai, X. He, J. Han, T. Huang, Graph regularized non-negative matrix factorization for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1548–1560.
[15] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (8) (2000) 888–905.
[16] S. Gao, I. Tsang, L. Chia, P. Zhao, Local features are not lonely – Laplacian sparse coding for image classification, in: Proceedings of CVPR, 2010, pp. 3555–3561.
[17] W. Liu, S. Chang, Robust multi-class transductive learning with graphs, in: Proceedings of CVPR, 2009, pp. 381–388.
[18] S. Yan, H. Wang, Semi-supervised learning by sparse representation, in: Proceedings of SDM, 2009, pp. 792–801.
[19] I. Joliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[20] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell. 35 (1) (2013) 171–184.
[21] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T.S. Huang, S. Yan, Sparse representation for computer vision and pattern recognition, Proc. IEEE 98 (6) (2010) 1031–1044.
[22] X. Zhu, J. Lafferty, Z. Ghahramani, Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, in: Proceedings of ICML, 2003, pp. 58–65.
[23] M. Belkin, V. Sindhwani, P. Niyogi, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res. 7 (2006) 2399–2434.
[24] J. Wang, T. Jebara, S. Chang, Graph transduction via alternating minimization, in: Proceedings of ICML, 2008, pp. 1144–1151.
[25] X. Zhu, Semi-Supervised Learning Literature Survey, Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.
[26] Z. Lin, R. Liu, Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation, Adv. Neural Inf. Process. Syst. (2011) 612–620.
[27] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3 (1) (1991) 71–86.
[28] B. Cheng, G. Liu, J. Wang, Z. Huang, S. Yan, Multi-task low-rank affinity pursuit for image segmentation, in: Proceedings of ICCV, 2011, pp. 2439–2446.
[29] C. Lang, G. Liu, J. Yu, S. Yan, Saliency detection by multi-task sparsity pursuit, IEEE Trans. Image Process. 21 (3) (2012) 1327–1338.
[30] G. Liu, S. Yan, Latent low-rank representation for subspace segmentation and feature extraction, in: Proceedings of ICCV, 2011, pp. 1615–1622.
[31] Y. Mu, J. Dong, X. Yuan, S. Yan, Accelerated low-rank visual recovery by random projection, in: Proceedings of CVPR, 2011, pp. 2609–2616.
[32] J. Cai, E. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optim. 20 (4) (2010) 1956–1982.
[33] Z. Lin, M. Chen, L. Wu, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, UIUC Technical Report UILU-ENG-09-2215, 2009, arXiv:1009.5055.
[34] A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (6) (2001) 643–660.
[35] D. Cai, X. He, J. Han, Spectral regression for efficient regularized subspace learning, in: Proceedings of ICCV, 2007, pp. 1–8.
[36] F. Samaria, A. Harter, Parameterisations of a stochastic model for human face identification, in: Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, December 1994.
[37] A. Martinez, R. Benavente, The AR Face Database, CVC Technical Report no. 24, 1998.
[38] Y. Yan, Y. Zhang, 1D correlation filter based class-dependence feature analysis for face recognition, Pattern Recognition 41 (12) (2008) 3834–3841.
[39] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2233–2246.
[40] C. Li, J. Guo, H. Zhang, Local sparse representation based classification, in: Proceedings of ICPR, 2010, pp. 649–652.
[41] R. He, W. Zheng, B. Hu, X. Kong, Nonnegative sparse coding for discriminative semi-supervised learning, in: Proceedings of CVPR, 2011, pp. 2849–2856.
[42] L. Zhang, S. Chen, L. Qiao, Graph optimization for dimensionality reduction with sparsity constraints, Pattern Recognition (2012) 1205–1210.
[43] Q. Gu, J. Zhou, Co-clustering on manifolds, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009, pp. 359–368.
[44] L. Zhuang, H. Gao, Z. Lin, Y. Ma, N. Yu, Non-negative low rank and sparse graph for semi-supervised learning, in: Proceedings of CVPR, 2012, pp. 2328–2335.
[45] J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell. 16 (5) (1994) 550–554.
[46] H. Zhang, Z. Lin, C. Zhang, A counterexample for the validity of using nuclear norm as a convex surrogate of rank, arXiv:1304.6233.
Yaoguo Zheng received the B.S. degree from the School of Electronic Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, China, in 2010. He is currently pursuing the Ph.D. degree in circuits and systems at the Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi'an, China. His current research interests include machine learning and hyperspectral image classification.
Xiangrong Zhang received the B.S. and M.S. degrees from the School of Computer Science, Xidian University, Xi'an, China, in 1999 and 2003, respectively, and the Ph.D. degree from the School of Electronic Engineering, Xidian University, in 2006. Since 2006, she has been working in the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, China. Her research interests include remote sensing image analysis and understanding, pattern recognition, and machine learning.
Shuyuan Yang received the B.A. degree in electrical engineering from Xidian University, Xi'an, China, in 2000, and the M.S. and Ph.D. degrees in circuits and systems from Xidian University, Xi'an, China, in 2003 and 2005, respectively. She is now a professor of electrical engineering at Xidian University, and her main current research interests are machine learning and multiscale geometric analysis.
Licheng Jiao received the B.S. degree from Shanghai Jiaotong University, Shanghai, China, in 1982 and the M. S. and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China, in 1984 and 1990, respectively. He is the author or coauthor of more than 150 scientific papers. His current research interests include signal and image processing, nonlinear circuit and systems theory, learning theory and algorithms, optimization problems, wavelet theory, and data mining.