Structured optimal graph based sparse feature extraction for semi-supervised learning

Journal Pre-proof

Zhonghua Liu, Zhihui Lai, Weihua Ou, Kaibing Zhang, Ruijuan Zheng

PII: S0165-1684(20)30003-7
DOI: https://doi.org/10.1016/j.sigpro.2020.107456
Reference: SIGPRO 107456

To appear in: Signal Processing

Received date: 2 December 2019
Revised date: 31 December 2019
Accepted date: 3 January 2020

Please cite this article as: Zhonghua Liu, Zhihui Lai, Weihua Ou, Kaibing Zhang, Ruijuan Zheng, Structured optimal graph based sparse feature extraction for semi-supervised learning, Signal Processing (2020), doi: https://doi.org/10.1016/j.sigpro.2020.107456

© 2020 Published by Elsevier B.V.

Highlights

• Sparse representation, discriminative projection and manifold learning are integrated into a unified model.
• An adaptive graph is constructed for capturing local manifold information.
• An iterative scheme is presented to solve the optimization problem.

Structured optimal graph based sparse feature extraction for semi-supervised learning

Zhonghua Liu1,*, Zhihui Lai2, Weihua Ou3, Kaibing Zhang4, Ruijuan Zheng1

(1 Information Engineering College, Henan University of Science and Technology; 2 College of Computer Science and Software Engineering, Shenzhen University; 3 School of Big Data and Computer Science, Guizhou Normal University; 4 College of Electronics and Information, Xi'an Polytechnic University)

Abstract: Graph-based feature extraction is an efficient technique for data dimensionality reduction, and it has gained intensive attention in various fields such as image processing, pattern recognition, and machine learning. However, conventional graph-based dimensionality reduction algorithms usually depend on a fixed weight graph, called the similarity matrix, which seriously affects the subsequent feature extraction process. In this paper, a novel structured optimal graph based sparse feature extraction (SOGSFE) method for semi-supervised learning is proposed. In the proposed method, local structure learning, sparse representation, and label propagation are framed simultaneously to perform data dimensionality reduction. In particular, the similarity matrix and the projection matrix are obtained in an iterative manner. Experimental results on several public image datasets demonstrate the robustness and effectiveness of the proposed method.

Keywords: feature extraction; semi-supervised learning; graph construction; sparse representation

1 Introduction

The original data sets in many applications, including image processing, data mining, machine learning, and pattern recognition, are composed of thousands of features and are therefore of high dimensionality. More computational time and storage space are needed to process such high-dimensional data, which may result in the "curse of dimensionality" [1]. To overcome this problem, many dimensionality reduction methods have been applied in different fields [2-4]. For example, Taibi et al. [5] presented a robust reservoir rock fracture recognition method based on sparse feature learning and data training, aiming to identify the sine fractures of reservoir rock automatically. Based on the wavelet transform and support vector machines, Akbarizadeh et al. [6] proposed a kurtosis wavelet energy (KWE) method for texture recognition in synthetic aperture radar (SAR) images. Sharifzadeh et al. [7] developed a hybrid algorithm based on a convolutional neural network (CNN) and a multilayer perceptron for SAR image classification. Recently, deep learning has also been widely applied to image denoising and segmentation [8, 9].

It has been shown that images derived from different classes lie on a low-dimensional manifold embedded in a high-dimensional space. Manifold-based learning algorithms are expected to capture the local geometric structure and the nonlinear information of the observed samples. The most classical manifold-based learning methods include locally linear embedding (LLE) [10], ISOMAP [11], and Laplacian eigenmaps [12]. However, these nonlinear manifold learning algorithms face the so-called out-of-sample issue [13], i.e., they have no projection axes and cannot deal with new samples. To address this issue, several improved manifold-based methods [14-17] have been developed for dimension reduction. He et al. [18] presented the locality preserving projections (LPP) algorithm, where an affinity graph is constructed and the local manifold structure of the observed samples is preserved. Neighborhood preserving embedding (NPE) [19] aims to retain the local neighborhood structure of the observed samples in the transformed space by constructing the affinity matrix with a local least-squares approximation. Lai et al. [20] proposed a robust discriminant regression (RDR) algorithm by integrating an L2,1-norm constraint. Lu et al. [21] suggested a low-rank preserving projections (LRPP) algorithm to handle corrupted data. Jing et al. [22] presented a low-rank regularized tensor discriminant representation (LRRTDR) algorithm, in which the lowest-rank representation and local discriminative information are learned simultaneously.

The success of manifold learning indicates that data points in a high-dimensional space can be sparsely encoded or represented by a few representative data points on the manifold [23]. Previous studies reveal that sparse representation based methods can effectively improve the effectiveness and robustness of clustering and classification. Wright et al. [24] proposed a sparse representation technique for robust image classification, which obtains the sparse representation coefficients by L1-norm minimization. In addition, integrating an L1 constraint with the graph construction of manifold learning can well describe the sparse reconstruction relationships among samples. By imposing sparse and low-rank constraints on the reconstruction coefficient matrix, Xu et al. [25] elaborated a discriminative transfer subspace model, which is capable of preserving the local and global structure information of the observed data.

The use of label information is beneficial to image classification and clustering. Owing to the use of the label information of the observed data, supervised learning algorithms generally outperform unsupervised ones. However, collecting labeled data is time-consuming, since labeling samples is laborious and tedious, and the performance of supervised methods degrades when labeled samples are scarce. Meanwhile, a large amount of unlabeled data can be obtained easily in practice. To fully utilize both the limited labeled samples and the massive unlabeled samples, researchers have proposed many semi-supervised methods, which demonstrate satisfactory results for classification and clustering. Peng et al. [26] proposed a manifold-based low-rank representation (MLRR) method, in which the local manifold structure of the data is described by a weight graph constructed from both labeled and unlabeled data points. To improve the classification accuracy of hyperspectral images, Andekah et al. [27] presented a semi-supervised

hyperspectral image classification method using spatial-spectral features and superpixel-based sparse codes. Based on sparse representation and adaptive learning, Zeng et al. [28] presented a local adaptive learning approach for semi-supervised feature selection.

Despite their effectiveness, traditional manifold learning algorithms have two noticeable disadvantages. First, graph construction and the subsequent operations are performed independently, which usually cannot yield the optimal solution. Second, the final results depend heavily on the quality of the constructed graph. Although sparse representation based methods exhibit good performance, they focus only on image classification and fail to deal with feature extraction. To address or alleviate the above problems, we present a structured optimal graph based sparse feature extraction (SOGSFE) method for semi-supervised learning. The major advantages of the proposed SOGSFE are summarized as follows. Firstly, the local manifold sparse structure and semi-supervised feature extraction are obtained simultaneously. Secondly, the similarity matrix is not fixed but is updated in every iteration until the optimal value is achieved. Thirdly, SOGSFE utilizes both the limited labeled data and a large number of unlabeled data, which is helpful for capturing the manifold structure information of the original samples.

In summary, the main contributions of this paper are as follows. (1) We integrate sparse representation, discriminative projection, and manifold learning into a unified framework for sparse feature extraction. (2) An adaptive graph is constructed for capturing local manifold information. (3) An iterative scheme is proposed to solve the regression learning problem.

The remainder of the paper is organized as follows. Section 2 briefly describes the structured optimal graph model. Section 3 presents and analyzes the proposed algorithm. Section 4 presents the experimental results and their analysis. Finally, Section 5 concludes the paper.

2 Structured optimal graph

It is widely recognized that data in a high-dimensional space generally embed in a low-dimensional manifold [29]. Preserving local manifold structure information is a key factor for graph-based dimensionality reduction methods. The local manifold structure is captured by a similarity matrix, which determines the ultimate performance of graph-based methods. Suppose that there are $N$ samples from $C$ classes, denoted by $X=[x_1,x_2,\dots,x_N]\in R^{m\times N}$, where $m$ is the dimension of the observed data and $x_i\in R^{m\times 1}$ $(i=1,2,\dots,N)$ is the $i$th sample of the sample set $X$. Any given sample $x_i$ can be connected to the other samples with a probability $s_{ij}$, where $s_{ij}$ is an element of the similarity matrix $S\in R^{N\times N}$. The closer the distance between two samples is, the greater their connection probability (or similarity) is, and vice versa. Therefore, the similarity $s_{ij}$ between $x_i$ and $x_j$ is inversely related to their distance. The similarities $s_{ij}$ can be obtained by solving the following Equation (1).

min  ( xi  x j sij  sij2 ) 2 2

i, j

(1)

s.t. i, s 1  1,0  sij  1 T i

where si  R

n1

is a vector whose jth element is sij in similarity matrix S , and

 is a

regularization parameter. The second item in Equation (1) is mainly used to avoid a trivial solution. It is an ideal state for each sample to include C nearest neighbor numbers. That is to say, each

sij (i  1,2,, N ) in similarity matrix S has exact C connected components. Actually, the obtained similarity matrix S in Equation (1) fails to meet this requirement in most cases. The problem can be solved by



2

f i  f j sij  2Tr ( F T LS F ) , 2

i, j

where F  [ f1 , f 2 ,, f N ]  R

N C

(2)

denotes as a class label matrix corresponding to the

observed data X ; LS  D  ( S  S ) / 2 refers to the Laplacian matrix; and the ith entry in T



the diagonal matrix D is set to

( sij  s ji ) j

2

.
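The identity in Equation (2) is easy to check numerically. The following sketch (using numpy, an assumption, since the paper gives no code) builds the Laplacian from a random similarity matrix and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 6, 2
S = rng.random((N, N))

# Graph Laplacian L_S = D - (S + S^T)/2, with d_ii = sum_j (s_ij + s_ji)/2.
W = (S + S.T) / 2.0
D = np.diag(W.sum(axis=1))
L = D - W

F = rng.random((N, C))
# Left-hand side: sum_{i,j} ||f_i - f_j||^2 * s_ij  (symmetry of ||f_i - f_j||^2
# makes the raw S and its symmetrized form give the same sum).
lhs = sum(S[i, j] * np.sum((F[i] - F[j]) ** 2) for i in range(N) for j in range(N))
# Right-hand side: 2 * Tr(F^T L_S F)
rhs = 2.0 * np.trace(F.T @ L @ F)
ok = np.isclose(lhs, rhs)
```

Both sides agree to floating-point precision, which is why the graph term can be swapped for the trace form in the derivations that follow.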

If the rank of the Laplacian matrix $L_S$ equals $N-C$, namely $\mathrm{rank}(L_S)=N-C$, the obtained similarity matrix $S$ will contain exactly $C$ connected components [30]. Adding this constraint to Equation (1), Equation (1) is rewritten as

$$\min_{S}\ \sum_{i,j}\left(\|x_i-x_j\|_2^2\, s_{ij}+\gamma s_{ij}^2\right) \quad \mathrm{s.t.}\ \forall i,\ s_i^T\mathbf{1}=1,\ 0\le s_{ij}\le 1,\ \mathrm{rank}(L_S)=N-C \qquad (3)$$
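The link between the rank constraint and the number of connected components can be checked numerically: the multiplicity of the zero eigenvalue of $L_S$ equals the number of components. A small sketch (numpy assumed; the block-diagonal toy graph is an illustration, not data from the paper):

```python
import numpy as np

# Block-diagonal similarity matrix with 3 disjoint groups:
# the graph then has exactly 3 connected components.
blocks = [np.ones((3, 3)), np.ones((2, 2)), np.ones((4, 4))]
N = sum(b.shape[0] for b in blocks)
S = np.zeros((N, N))
r = 0
for b in blocks:
    n = b.shape[0]
    S[r:r + n, r:r + n] = b
    r += n

W = (S + S.T) / 2.0
L = np.diag(W.sum(axis=1)) - W
eigvals = np.sort(np.linalg.eigvalsh(L))

# Multiplicity of the zero eigenvalue = number of connected components,
# i.e., rank(L_S) = N - C.
num_zero = int(np.sum(np.abs(eigvals) < 1e-8))
rank_L = N - num_zero
```

Here `num_zero` is 3 and `rank_L` is $N-3$, matching the rank condition rank$(L_S)=N-C$.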

In order to solve Equation (3), let $\sigma_i(L_S)$ denote the $i$th smallest eigenvalue of the Laplacian matrix $L_S$. It is well known that the eigenvalues of a positive semi-definite matrix are nonnegative. Considering that the Laplacian matrix $L_S$ is positive semi-definite, $\sigma_i(L_S)\ge 0$. Consequently, $\mathrm{rank}(L_S)=N-C$ holds exactly when $\sum_{i=1}^{C}\sigma_i(L_S)=0$ is satisfied [31]. According to Ky Fan's theorem [32], we have the following Equation (4):

$$\sum_{i=1}^{C}\sigma_i(L_S)=\min_{F\in R^{N\times C},\ F^T F=I}\ \mathrm{Tr}(F^T L_S F) \qquad (4)$$
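Equation (4) can be verified numerically. The following sketch (numpy assumed) checks that the trace attained by the eigenvectors of the $C$ smallest eigenvalues equals their sum, and that any other orthonormal $F$ does no better:

```python
import numpy as np

rng = np.random.default_rng(1)
N, C = 8, 3
A = rng.random((N, N))
L = A @ A.T  # any symmetric positive semi-definite matrix stands in for L_S

vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
F_opt = vecs[:, :C]                     # eigenvectors of the C smallest eigenvalues
kyfan = np.trace(F_opt.T @ L @ F_opt)   # attains the Ky Fan minimum

# Any other orthonormal F gives a trace no smaller than the minimum.
Q, _ = np.linalg.qr(rng.random((N, C)))
other = np.trace(Q.T @ L @ Q)
ok = np.isclose(kyfan, vals[:C].sum()) and other >= kyfan - 1e-10
```

This is exactly the property that lets the intractable rank constraint in Equation (3) be replaced by the trace term below.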

Therefore, based on Equation (4), Equation (3) can be further rewritten as:

$$\min_{S,F}\ \sum_{i,j}\left(\|x_i-x_j\|_2^2\, s_{ij}+\gamma s_{ij}^2\right)+2\lambda\,\mathrm{Tr}(F^T L_S F) \quad \mathrm{s.t.}\ \forall i,\ s_i^T\mathbf{1}=1,\ 0\le s_{ij}\le 1,\ F\in R^{N\times C},\ F^T F=I \qquad (5)$$

where $\lambda$ is a balance parameter; when $\lambda$ is large enough, the trace term drives $\sum_{i=1}^{C}\sigma_i(L_S)$ toward zero, so the rank constraint is satisfied.

3 Structured optimal graph based sparse feature extraction

3.1 The idea of SOGSFE

Let $X=[x_1,x_2,\dots,x_N]\in R^{m\times N}$ be the training data points from $C$ classes in the original high-dimensional space. Given a linear transformation $y=P^T x$ with $P\in R^{m\times d}$, each sample $x_i\in R^{m\times 1}$ in the $m$-dimensional space is mapped to $y_i=P^T x_i\in R^{d\times 1}$ in a $d$-dimensional space. For each sample $x_i$ from the training set, the remaining samples (excluding $x_i$) are denoted by $X_i=[x_1,x_2,\dots,x_{i-1},x_{i+1},\dots,x_N]$. Following the objective function of sparse representation based classification (SRC) [23, 24], the objective function of sparse dimension reduction can be defined as:

$$J(P,\{\alpha_i\})=\arg\min\ \sum_{i=1}^{N}\left(\|P^T x_i-P^T X_i\alpha_i\|_F+\eta\|\alpha_i\|_1\right)+\mu\|X-PP^T X\|_F^2 \quad \mathrm{s.t.}\ P^T P=I \qquad (6)$$

where $\alpha_i$ is the sparse representation coefficient vector of the sample $x_i$ over the remaining training samples $X_i$; $\eta$ and $\mu$ are two balance parameters. The first and second terms in Equation (6) are the loss function of the linear reconstruction and a sparse constraint term, respectively. The third term ensures that the sample set $X$ can be well reconstructed from the low-dimensional space obtained by $P$. Combining Equation (6) with Equation (5), the objective function of the proposed SOGSFE algorithm is given as follows:

min  ( PT xi  PT x j sij   sij2 )  2Tr ( F T LS F ) 2

P,S ,F

2

i, j

N

  ( Pxi  PX i i i 1

2 F

  i 1 )   X  PT PX

2

(7)

F

s.t. siT 1  1, 0  sij  1, F T F  I , PT P  I The objective function of SOGSFE is a joint optimization problem based on similarity matrix

S , projection matrix P , label matrix F , and sparse representation coefficients { i } , which is difficult to solve directly. Hence, an alternative iterative algorithm can be used to solve this problem. 

Update $\{\alpha_i\}$ by fixing the other variables. With all variables except $\alpha_i$ fixed, Equation (7) can be transformed into:

$$\min_{\{\alpha_i\}}\ \sum_{i=1}^{N}\left(\|P^T x_i-P^T X_i\alpha_i\|_F+\eta\|\alpha_i\|_1\right) \qquad (8)$$

Equation (8) has the same form of objective function as SRC [24], and the convex optimization technique in [24] can be used to obtain each $\alpha_i$.

Update $P$ by fixing the other variables. With all variables except $P$ fixed, Equation (7) can be transformed into:

$$\min_{P^T P=I}\ \sum_{i,j}\|P^T x_i-P^T x_j\|_2^2\, s_{ij}+\beta\sum_{i=1}^{N}\|P^T x_i-P^T X_i\alpha_i\|_F+\mu\|X-PP^T X\|_F^2 \qquad (9)$$

According to Equation (2), the first term equals $2\,\mathrm{Tr}(P^T X L_S X^T P)$; moreover, since $P^T P=I$, we have $\|X-PP^T X\|_F^2=\mathrm{Tr}(X^T X)-\mathrm{Tr}(P^T X X^T P)$. Handling the unsquared norm in the second term by the standard reweighting $\|P^T\psi_i\|_F\approx\|P^T\psi_i\|_F^2/(2\|P^T\psi_i\|_F)$, with the denominator evaluated at the current iterate, Equation (9) can be rewritten, up to an additive constant, as:

$$\min_{P^T P=I}\ \mathrm{Tr}\left(P^T\left(2XL_S X^T+\beta\Psi D\Psi^T-\mu XX^T\right)P\right) \qquad (10)$$

where $\Psi=[\psi_1,\psi_2,\dots,\psi_N]$ with $\psi_i=x_i-X_i\alpha_i$, and $D$ is the diagonal matrix with $d_{ii}=1/(2\|P^T\psi_i\|_F)$. $P$ can then be computed from the following eigenvalue problem:

$$\left(2XL_S X^T+\beta\Psi D\Psi^T-\mu XX^T\right)P=P\Lambda \qquad (11)$$

P is composed of the eigenvectors associated with the $d$ smallest eigenvalues in Equation (11).
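The P-update is an ordinary symmetric eigenvalue problem. A minimal sketch (numpy assumed; random matrices stand in for quantities produced by the other update steps):

```python
import numpy as np

def update_P(X, L_S, Psi, beta, mu, d, P_prev):
    """One P-update per Equation (11): eigenvectors of the d smallest
    eigenvalues of M = 2 X L_S X^T + beta * Psi D Psi^T - mu * X X^T.
    X is m x N (columns are samples); Psi holds psi_i = x_i - X_i alpha_i."""
    # Reweighting diagonal d_ii = 1 / (2 ||P_prev^T psi_i||), guarded against 0.
    w = 1.0 / (2.0 * np.maximum(np.linalg.norm(P_prev.T @ Psi, axis=0), 1e-12))
    M = 2.0 * X @ L_S @ X.T + beta * (Psi * w) @ Psi.T - mu * X @ X.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order
    return vecs[:, :d]                            # d smallest

rng = np.random.default_rng(3)
m, N, d = 10, 15, 4
X = rng.standard_normal((m, N))
S = rng.random((N, N))
W = (S + S.T) / 2.0
L_S = np.diag(W.sum(axis=1)) - W
Psi = rng.standard_normal((m, N))
P0 = np.linalg.qr(rng.standard_normal((m, d)))[0]
P = update_P(X, L_S, Psi, beta=0.5, mu=0.1, d=d, P_prev=P0)
orth_err = np.linalg.norm(P.T @ P - np.eye(d))
```

Because `eigh` returns orthonormal eigenvectors, the constraint $P^TP=I$ is satisfied by construction.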

Update $S$ by fixing the other variables. With all variables except $S$ fixed, Equation (7) can be transformed into:

$$\min_{S}\ \sum_{i,j}\left(\|P^T x_i-P^T x_j\|_2^2\, s_{ij}+\gamma s_{ij}^2\right)+2\lambda\,\mathrm{Tr}(F^T L_S F) \quad \mathrm{s.t.}\ s_i^T\mathbf{1}=1,\ 0\le s_{ij}\le 1 \qquad (12)$$

According to Equation (2), Equation (12) can be rewritten as:

$$\min_{S}\ \sum_{i,j}\left(\|P^T x_i-P^T x_j\|_2^2\, s_{ij}+\gamma s_{ij}^2\right)+\lambda\sum_{i,j}\|f_i-f_j\|_2^2\, s_{ij} \quad \mathrm{s.t.}\ s_i^T\mathbf{1}=1,\ 0\le s_{ij}\le 1 \qquad (13)$$

Since the similarity vector of each data point is independent of those of the other data points, the problem can be solved separately for each sample:

$$\min_{s_i}\ \sum_{j}\left(\|P^T x_i-P^T x_j\|_2^2\, s_{ij}+\gamma s_{ij}^2+\lambda\|f_i-f_j\|_2^2\, s_{ij}\right) \quad \mathrm{s.t.}\ s_i^T\mathbf{1}=1,\ 0\le s_{ij}\le 1 \qquad (14)$$

Let $m_{ij}=\|P^T x_i-P^T x_j\|_2^2$, $n_{ij}=\lambda\|f_i-f_j\|_2^2$ and $d_{ij}=m_{ij}+n_{ij}$. Then Equation (14) can be rewritten as:

$$\min_{s_i^T\mathbf{1}=1,\ 0\le s_{ij}\le 1}\ \left\|s_i+\frac{1}{2\gamma}d_i\right\|_2^2 \qquad (15)$$
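Equation (15) is a Euclidean projection of $-d_i/(2\gamma)$ onto the probability simplex, for which a sort-based closed form exists. A sketch (numpy assumed; the distance vector is an invented toy example):

```python
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}; this solves
    min ||s - v||^2 under the constraints of Equation (15)."""
    u = np.sort(v)[::-1]                 # sort descending
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)       # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)

gamma = 1.0
d_i = np.array([0.1, 0.4, 3.0, 0.2, 5.0])        # combined distances d_ij
s_i = simplex_projection(-d_i / (2.0 * gamma))   # row s_i of the similarity matrix
```

Distant samples (large $d_{ij}$) receive zero similarity, so the learned graph is naturally sparse; the upper bound $s_{ij}\le 1$ is implied by the simplex constraints.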

According to [28], the optimal solution of Equation (15) can be obtained in closed form. Update $F$ by fixing the other variables. With all variables except $F$ fixed, Equation (7) can be transformed into:

$$\min_{F^T F=I}\ \mathrm{Tr}(F^T L_S F) \qquad (16)$$

Suppose that there are $l$ labeled samples and $u$ unlabeled samples in the training set $X$. Without loss of generality, the training samples are rearranged so that the $l$ labeled samples come first, followed by the unlabeled samples. In this way, the Laplacian matrix $L_S$ and the label matrix $F$ are split into blocks, denoted as

$$L_S=\begin{bmatrix}L_{ll}&L_{lu}\\ L_{ul}&L_{uu}\end{bmatrix}, \qquad F=[F_l;\,F_u],$$

respectively. According to GFHF [33], the optimal solution of Equation (16) can be obtained as:

$$F_u=-L_{uu}^{-1}L_{ul}F_l \qquad (17)$$
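Equation (17) is the harmonic-function label propagation step and is a single linear solve. A sketch on a tiny hand-built graph (numpy assumed; the toy weights are invented for illustration):

```python
import numpy as np

def propagate_labels(L_S, F_l):
    """Harmonic label propagation (Equation (17)): with L_S partitioned into
    labeled/unlabeled blocks, F_u = -L_uu^{-1} L_ul F_l."""
    l = F_l.shape[0]
    L_uu = L_S[l:, l:]
    L_ul = L_S[l:, :l]
    return -np.linalg.solve(L_uu, L_ul @ F_l)

# Toy graph, sample order [labeled0, labeled1, unlabeled2, unlabeled3]:
# node 2 is tied to labeled node 0, node 3 to labeled node 1,
# with a weak link between the two unlabeled nodes.
W = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.0, 1.0, 0.2, 0.0]])
L_S = np.diag(W.sum(axis=1)) - W
F_l = np.array([[1.0, 0.0],
                [0.0, 1.0]])          # one-hot labels of the two labeled nodes
F_u = propagate_labels(L_S, F_l)
```

Each unlabeled node is assigned the class of its strongly connected labeled neighbor, and each row of `F_u` sums to one, as expected of the harmonic solution.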

The specific process of the proposed SOGSFE is summarized in Algorithm 1.

Algorithm 1: Structured optimal graph based sparse feature extraction (SOGSFE)
Input: Training sample matrix $X=[x_1,x_2,\dots,x_N]$, the number of classes $C$, the parameters $\beta$, $\gamma$, $\eta$, $\lambda$ and $\mu$, and the initial projection matrix $P\in R^{m\times d}$.
Initialization: $P=P_{pca}$, maxiter = 1000, iter = 0.
While iter < maxiter:
  Step 1. Update $\{\alpha_i\}$ by solving Equation (8).
  Step 2. Update $P$ by solving Equation (11).
  Step 3. Update $S$: each vector $s_i$ of $S$ is obtained by solving Equation (15).
  Step 4. Update the unlabeled part of the label matrix $F$ by Equation (17).
End while
Output: The projection matrix $P$ and the label matrix $F$.

3.2 Convergence analysis of the proposed SOGSFE

The proposed SOGSFE can obtain a locally optimal solution. To prove the convergence of the proposed SOGSFE method, a lemma introduced by Nie et al. [34] is described as follows.

Lemma 1. If $u$ and $v$ are any two positive real numbers, the following inequality holds:

$$\sqrt{u}-\frac{u}{2\sqrt{v}}\le\sqrt{v}-\frac{v}{2\sqrt{v}} \qquad (18)$$

Theorem 1. In Algorithm 1, updating $P$ decreases the objective value of Equation (9) until convergence.

Proof. Let $\tilde P$ denote the updated projection matrix obtained in an iteration. Since $\tilde P$ minimizes the reweighted problem (10), in which the diagonal matrix $D$ is computed from the current $P$, it is easy to obtain:

$$2\mathrm{Tr}(\tilde P^T XL_S X^T\tilde P)+\beta\sum_{i=1}^{N}\frac{\|\tilde P^T\psi_i\|_F^2}{2\|P^T\psi_i\|_F}+\mu\|X-\tilde P\tilde P^T X\|_F^2 \le 2\mathrm{Tr}(P^T XL_S X^T P)+\beta\sum_{i=1}^{N}\frac{\|P^T\psi_i\|_F^2}{2\|P^T\psi_i\|_F}+\mu\|X-PP^T X\|_F^2 \qquad (19)$$

According to Lemma 1, applied with $u=\|\tilde P^T\psi_i\|_F^2$ and $v=\|P^T\psi_i\|_F^2$, the following inequality (20) can be derived:

$$\sum_{i=1}^{N}\|\tilde P^T\psi_i\|_F-\sum_{i=1}^{N}\frac{\|\tilde P^T\psi_i\|_F^2}{2\|P^T\psi_i\|_F} \le \sum_{i=1}^{N}\|P^T\psi_i\|_F-\sum_{i=1}^{N}\frac{\|P^T\psi_i\|_F^2}{2\|P^T\psi_i\|_F} \qquad (20)$$

Summing inequality (19) and $\beta$ times inequality (20), we obtain the following inequality:

$$2\mathrm{Tr}(\tilde P^T XL_S X^T\tilde P)+\beta\sum_{i=1}^{N}\|\tilde P^T\psi_i\|_F+\mu\|X-\tilde P\tilde P^T X\|_F^2 \le 2\mathrm{Tr}(P^T XL_S X^T P)+\beta\sum_{i=1}^{N}\|P^T\psi_i\|_F+\mu\|X-PP^T X\|_F^2 \qquad (21)$$

That is, the objective value of Equation (9) does not increase when $P$ is updated. Therefore, the proof is completed.
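Before turning to the experiments, the alternation of Algorithm 1 can be sketched end to end. The sketch below is a simplified illustration on synthetic data, not the paper's exact recipe: ridge regression stands in for the L1 subproblem (8), the parameters are fixed small constants, and $\gamma$ is a constant rather than adaptively tuned (all assumptions; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
m, N, d, C, l = 8, 12, 3, 2, 4           # dims, samples, subspace, classes, labeled
X = rng.standard_normal((m, N))          # columns are samples
S = np.full((N, N), 1.0 / N)             # uniform initial similarity graph
F = np.zeros((N, C))
F[np.arange(l), rng.integers(0, C, l)] = 1.0          # one-hot labels, first l samples
P = np.linalg.svd(X, full_matrices=False)[0][:, :d]   # PCA-style initialization
beta, mu, lam, gamma = 0.1, 0.01, 0.1, 1.0

def simplex_proj(v):
    # Euclidean projection onto {s >= 0, sum(s) = 1} (solves Equation (15))
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(v.size) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

for _ in range(3):
    # Step 1: codes alpha_i (ridge stand-in for Equation (8)); psi_i = x_i - X_i alpha_i
    Psi = np.empty_like(X)
    for i in range(N):
        Xi = np.delete(X, i, axis=1)
        a = np.linalg.solve(Xi.T @ Xi + 0.1 * np.eye(N - 1), Xi.T @ X[:, i])
        Psi[:, i] = X[:, i] - Xi @ a
    # Step 2: P from the d smallest eigenvectors of Equation (11)
    W = (S + S.T) / 2.0
    L_S = np.diag(W.sum(1)) - W
    w = 1.0 / (2.0 * np.maximum(np.linalg.norm(P.T @ Psi, axis=0), 1e-12))
    M = 2.0 * X @ L_S @ X.T + beta * (Psi * w) @ Psi.T - mu * X @ X.T
    P = np.linalg.eigh((M + M.T) / 2.0)[1][:, :d]
    # Step 3: each row of S by the simplex problem of Equation (15)
    Y = P.T @ X
    for i in range(N):
        dist = ((Y[:, [i]] - Y) ** 2).sum(0) + lam * ((F[[i]] - F) ** 2).sum(1)
        S[i] = simplex_proj(-dist / (2.0 * gamma))
    # Step 4: propagate labels to the unlabeled block (Equation (17));
    # a tiny ridge keeps L_uu invertible if the graph disconnects.
    F[l:] = -np.linalg.solve(L_S[l:, l:] + 1e-6 * np.eye(N - l), L_S[l:, :l] @ F[:l])
```

After each pass, $P$ stays orthonormal and every row of $S$ remains a valid probability vector, which matches the constraints of Equation (7).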


4 Experiments and results

In this section, we evaluate the proposed SOGSFE algorithm on five public image databases. Six state-of-the-art methods, including PCA [5], MSEC [35], DLSR [36], NLDLSR [37], SOGFS [31], and SDR [23], are used to verify the effectiveness of our SOGSFE. The test samples serve as the unlabeled samples in the SOGSFE algorithm. In addition, the optimal parameter values are selected for each algorithm.

4.1 Data sets

Experiments are performed on five image databases: AR [38], FERET [39], GT [40], ORL [41] and UMIST [36]. The image sets are described below.

AR image database. The AR dataset contains over 4000 color images of 126 persons, with 26 frontal-view images per person under various occlusions, facial expressions, and lighting conditions. The images of 120 persons were obtained in two sessions (14 days apart), each session containing 13 color face images. Fourteen images per person (seven from each session) of the 120 persons were used in the experiment. Each image was converted to grayscale and resized to 50×40 pixels.

FERET image database. The whole FERET database includes 13539 images of 1565 subjects, captured under different illumination conditions and facial expressions. A selected subset contains 1400 face images of 200 subjects, with seven images per subject. Each image was resized to 40×40 pixels.

Georgia Tech image database. The Georgia Tech (GT) database contains images of 50 subjects, each with fifteen images taken against a complex background. The original image size is 640×480 pixels. The images, which may be frontal and/or tilted, vary in scale, expression and illumination. All images were resized to 30×25 pixels.

ORL image database. The ORL database contains 400 face images of 40 people, with 10 images per person obtained at different times and showing various facial details and expressions. All images were resized to 56×46 pixels.

UMIST image database. The UMIST database contains 575 images of 20 people of mixed appearance, sex, and race. The images were obtained in a range of poses from frontal to profile views, with 19 to 48 images per person. All face images were resized to 56×46 pixels.

More details of the five data sets are listed in Table 1.

Table 1. The description of five datasets

Database        Size in each class    Dimension       Class number
AR              14                    2000 (50×40)    120
FERET           7                     1600 (40×40)    200
Georgia Tech    15                    750 (30×25)     50
ORL             10                    2576 (56×46)    40
UMIST           19 to 48              2576 (56×46)    20

Some sample images from each database are shown in Figure 1.

Figure 1. Various samples from five databases. (a) AR, (b) FERET, (c) GT, (d) ORL, (e) UMIST.

4.2 Recognition results

The above five image databases are used to evaluate the proposed SOGSFE algorithm. In each experiment, the first l (l = 1, 2,
) samples per subject are used as the training set, and the remaining samples per subject are used as the test set. The recognition performance on the AR, FERET, GT, ORL, and UMIST image databases is reported in Tables 2 to 6, respectively. From the results in the tables, we can draw the following conclusions. Firstly, in the proposed SOGSFE algorithm the similarity matrix S is learned as part of the classification task; the similarity matrix S, the label matrix F, and the projection matrix P are obtained by iterative calculation. It is worth noting that the structured optimal graph is applied in the proposed SOGSFE algorithm, and feature extraction and local structure learning are performed simultaneously, which ensures favorable recognition performance and high robustness. Secondly, the L1 constraint imposed on the observed data in the projection space guarantees robust extracted features. Thirdly, the proposed SOGSFE algorithm outperforms the other state-of-the-art methods on all five image databases.

Table 2. Recognition rates of seven algorithms on AR dataset (%)

           Number of training samples per subject
Methods    1      2      3      4      5      6      7
PCA        43.93  43.45  46.07  46.31  53.69  60.36  62.98
MSEC       67.02  67.14  66.79  65.83  70.06  74.52  75.12
DLSR       68.21  67.38  67.62  66.07  70.60  75.48  75.71
NLDLSR     68.89  69.17  70.71  70.60  73.10  76.55  77.38
SOGFS      54.05  55.24  47.98  42.62  50.95  60     62.14
SDR        45     44.76  46.43  46.90  54.76  60.48  63.81
SOGSFE     68.10  70.88  71.5   72.19  74.67  77.29  79.31

Table 3. Recognition rates of seven algorithms on FERET dataset (%)

           Number of training samples per subject
Methods    1      2      3      4      5      6
PCA        40     48.10  42.50  55     43.25  45
MSEC       43.67  57.60  45.12  62.36  65.68  68.91
DLSR       44.54  58.78  59.24  63.38  67.96  69.67
NLDLSR     48.31  60.73  62.94  70.89  74.66  77.98
SOGFS      37.83  48.20  45.50  52.50  55.25  58.5
SDR        42.75  50.9   45.25  57.5   45.5   48.6
SOGSFE     51.58  66.7   63.12  79     81.75  83

Table 4. Recognition rates of seven algorithms on Georgia Tech dataset (%)

           Number of training samples per subject
Methods    1      2      3      4      5
PCA        36.57  45.23  47.67  51.82  53.60
MSEC       33.00  42.77  46.00  48.36  51.60
DLSR       33.71  44.46  49.00  50.55  54.40
NLDLSR     36.14  45.38  49.00  53.64  56.80
SOGFS      43     54.77  55.5   60.55  61.8
SDR        40.14  47.38  50.17  52.55  56
SOGSFE     44.25  56.15  57.67  61.36  64

Table 5. Recognition rates of seven algorithms on ORL dataset (%)

           Number of training samples per subject
Methods    1      2      3      4      5
PCA        69.44  80.31  85     87.92  90
MSEC       68.12  78.43  82.91  83.95  88.63
DLSR       71.65  81.95  83.13  84.61  86.56
NLDLSR     73.28  83.47  86.64  89.79  90.89
SOGFS      68.33  83.75  84.64  89.17  89.5
SDR        70.83  81.25  85.71  88.75  90.5
SOGSFE     79.72  87.5   88.21  91.25  93.25

Table 6. Recognition rates of seven algorithms on UMIST dataset (%)

           Number of training samples per subject
Methods    1      2      3      4      5      6      7
PCA        46.53  60.00  71.72  75.63  84.60  88.51  89.66
MSEC       58.28  58.28  69.79  75.31  83.29  88.74  90.57
DLSR       55.45  60.00  74.25  80.23  85.98  88.74  91.95
NLDLSR     56.78  60.69  73.79  81.61  88.28  89.89  93.10
SOGFS      36.09  61.38  72.87  73.56  88.28  96.55  96.55
SDR        50.11  62.07  72.41  77.93  86.21  88.97  91.49
SOGSFE     68.74  78.62  87.59  88.51  95.63  97.47  97.93

4.3 Image visualization

To illustrate that the proposed SOGSFE method captures local discriminative information well in the transformed space, we visualize its results on two databases. For the ORL and GT databases, five samples from each class are used as the training set and the rest as the test set. After the projection matrix P is obtained, all samples (training and test) of the ORL and GT databases are visualized by t-SNE [42, 43] in Figures 2 and 3, respectively. From Figures 2 and 3 we can observe that: 1) the local structure information is preserved in the projection space; 2) the proposed method cannot capture global discriminative information well; and 3) the visualization is consistent with the recognition performance (e.g., Figure 2 corresponds to Table 5, and Figure 3 corresponds to Table 4).

Figure 2. The t-SNE visualization of (a) the original features and (b) the features extracted by the proposed method on the ORL image database.

Figure 3. The t-SNE visualization of (a) the original features and (b) the features extracted by the proposed method on the GT image database.

4.4 Parameter selection

There are five parameters in the SOGSFE algorithm, and their effects are investigated in this section. According to [31], the graph regularization parameter can be adaptively determined, so only the remaining four parameters are discussed here. For brevity, the FERET and GT databases are used to verify the effect of each parameter on our algorithm. Figure 4 displays the experimental results. We can see that the proposed SOGSFE algorithm is robust to two of the parameters; although it is slightly sensitive to the other two, suitable value ranges for them can still be determined.

Figure 4. Classification accuracy of SOGSFE under different parameter settings. (a) FERET, (b) GT.

5 Conclusion

In this paper, a new structured optimal graph based sparse feature extraction (SOGSFE) algorithm has been proposed for semi-supervised learning. In the proposed approach, manifold learning, sparse representation, label propagation, and discriminant projection are integrated into a unified framework for dimension reduction. An efficient optimization method is also presented to solve the objective function of the proposed SOGSFE algorithm. Extensive results on several image databases indicate that the presented SOGSFE method outperforms other competing feature extraction approaches. In the future, local discriminative information could be further exploited to obtain a more discriminative projection.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant U1504610, Grant 61971339 and Grant 61471161, in part by the Key Project of the Natural Science Foundation of Shaanxi Province under Grant 2018JZ6002, in part by the Scientific and Technological Innovation Team of Colleges and Universities in Henan Province under Grant 20IRTSTHN018, in part by the National Key Research and Development Project under Grant 2016YFE0104600, and in part by the Natural Science Foundations of Henan Province under Grant 192102210130 and Grant 19B520008.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Author contributions

Zhonghua Liu: Conceptualization, Methodology, Software, Investigation, Writing - Original Draft. Zhihui Lai: Resources, Writing - Review & Editing, Supervision. Weihua Ou: Writing - Review & Editing. Kaibing Zhang: Writing - Review & Editing. Ruijuan Zheng: Writing - Review & Editing.

Reference

[1] Luofeng Xie, Ming Yin, Xiangyun Yin, et al. Low-rank sparse preserving projections for dimensionality reduction. IEEE Transactions on Image Processing, 2018, 27(11): 5261-5274.

[2] Ya Li, Xinmei Tian, Tongliang Liu, et al. On better exploring and exploiting task relationships in multitask learning: joint model and feature learning. IEEE transactions on neural networks and learning systems, 2018, 29(5): 1975-1985. [3] Gholamreza Akbarizadeh, Zeinab Tirandaz. Segmentation parameter estimation algorithm based on curvelet transform coefficients energy for feature extraction and texture description of SAR images. 7th Conference on Information and Knowledge Technology (IKT), Urmia, Iran, 2015, DOI: 10.1109/IKT.2015.7288778. [4] Minnan Luo, Feiping Nie, Xiaojun Chang, et al. Adaptive unsupervised feature selection with structure regularization. IEEE transactions on neural networks and learning systems, 2018, 29(4): 944-956. [5] F Taibi, G Akbarizadeh, E Farshidi. Robust reservoir rock fracture recognition based on a new sparse feature learning and data training method. Multidimensional Systems and Signal Processing, 2019, DOI: 10.1007/s11045-019-00645-8. [6] Gholamreza Akbarizadeh. A new statistical-based kurtosis wavelet energy feature for texture recognition of SAR images. IEEE Transactions on Geoscience and Remote Sensing, 2012, 50(11): 4358-4368. [7] Foroogh Sharifzadeh, Gholamreza Akbarizadeh, Yousef Seifi Kavian. Ship classification in SAR Images using a new hybrid CNN-MLP classifier. Journal of the Indian Society of Remote Sensing, 2019, 47(4): 551-562. [8] Akrem Sellami, Mohamed Farah, Imed Riadh Farah, et al. Hyperspectral imagery classification based on semi-supervised 3-D deep neural network and adaptive band selection. Expert Systems with Applications, 2019, 129: 246-259. [9] Mohammad Modava, Gholamreza Akbarizadeh. Coastline extraction from SAR images using spatial fuzzy clustering and the active contour method. International Journal of Remote Sensing, 2017, 38(2): 355-370. [10] S. T. Roweis, L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323–2326. [11] J. B. Tenenbaum, V. 
de Silva, J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000, 290(5500): 2319–2323. [12] M. Belkin, P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003, 15(6): 1373–1396. [13] Yoshua. Bengio, Jean-Francois Paiement, Pascal Vincent, et al. Out-of-sample extensions for LLE, isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing, Vancouver, BC, Canada, 2004, pp. 1–8. [14] Wenbo Yu, Miao Zhang, Yi Shen. Learning a local manifold representation based on improved neighborhood rough set and LLE for hyperspectral dimensionality reduction. Signal Processing, 2019, 164: 20-29. [15] Wankou Yang, Zhenyu Wang, Changyin Sun. A collaborative representation based projections method for feature extraction. Pattern Recognition, 2015, 48:20-27. [16] Zhenyue Zhang, Jing Wang, Hongyuan Zha. Adaptive manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(2): 253-265. [17] Jia Zhang, Zhiming Luo, Candong Li, et al. Manifold regularized discriminative feature selection for multi-label learning. Pattern Recognition, 2019, 95: 136-150. [18] X. He, S. Yan, Y. Hu, et al. Face recognition using Laplacian faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3): 328–340. [19] X. He, D. Cai, S. Yan, et al. Neighborhood preserving embedding. IEEE International Conference on Computer Vision (ICCV), Beijing, China, 2005, pp. 1208–1213.

[20] Zhihui Lai, Dongmei Mo, Wai Keung Wong, et al. Robust discriminant regression for feature extraction. IEEE Transactions on Cybernetics, 2018, 48(8): 2472-2484. [21] Yuwu Lu, Zhihui Lai, Yong Xu, et al. Low-rank preserving projections. IEEE Transactions on Cybernetics, 2016, 46(8): 1900-1912. [22] Peiguang Jing, Yuting Su, Zhengnan Li, et al. Low-rank regularized tensor discriminant representation for image set classification. Signal Processing, 2019, 156: 62-70. [23] Lei Zhang, Meng Yang, Zhizhao Feng, et al. On the dimensionality reduction for sparse representation based face recognition. International Conference on Pattern Recognition, 2010, pp. 1237-1240. [24] J. Wright, A. Y. Yang, A. Ganesh, et al. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210–227. [25] Yong Xu, Xiaozhao Fang, Jian Wu, et al. Discriminative transfer subspace learning via low-rank and sparse representation. IEEE Transactions on Image Processing, 2016, 25(2): 850-863. [26] Yong Peng, BaoLiang Lu, Suhang Wang. Enhanced low-rank representation via sparse manifold adaption for semi-supervised learning. Neural Networks, 2015, 65: 1-17. [27] Zehtab Alasvand Andekah, Marjan Naderan,Gholamreza Akbarizadeh. Semi-supervised Hyperspectral image classification using spatial-spectral features and superpixel-based sparse codes. 25th Iranian Conference on Electrical Engineering (ICEE), 2017, pp. 2229-2234. [28] Zhiqiang Zeng, Xiaodong Wang, Fei Yan. Local adaptive learning for semi-supervised feature selection with group sparsity. Knowledge-based Systems, 2019, 181: 104787. [29] Dalton Lunga, Saurabh Prasad, Melba M. Crawford, et al. Manifold-learning-based feature extraction for classification of hyperspectral data. IEEE Signal Processing Magazine, 2014, 31(1): 55-66. [30] B. Mohar, Y. Alavi, G. Chartrand, et al. The laplacian spectrum of graphs. Graph theory, combinatorics, and applications, 1991, 2: 871–898. 
[31] Feiping Nie, Wei Zhu, Xuelong Li. Unsupervised feature selection with structured graph optimization. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 2016, pp.1302-1308. [32] Fan, K. On a theorem of weyl concerning eigenvalues of linear transformations i. Proceedings of the National Academy of Sciences of the United States of America, 1949, 35(11): 652. [33] X. Zhu, Z. Ghahramani, J. D. Lafferty. Semisupervised learning using gaussian fields and harmonic functions. In Machine Learning, Proceedings of the Twentieth International Conference (ICML), August 21-24, Washington, DC, USA, 2003, pp. 912–919. [34] F. Nie, H. Huang, X. Cai, et al. Efficient and robust feature selection via joint L2,1-norms minimization. In Advances in Neural Information Processing Systems, 2010, pp.1813–1821. [35] Yong Xu, Xiaozhao Fang, Qi Zhu, et al. Modified minimum squared error algorithm for robust classification and face recognition experiments. Neurocomputing, 2014, 135:253-261. [36] Shiming Xiang, Feiping Nie, Gaofeng Meng, et al. Discriminative least squares regression for multiclass classification and feature selection. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(11): 1738-1754. [37] Zhonghua Liu, Gang Liu, Jiexin Pu, et al. Noisy label based discriminative least squares regression and its kernel extension for object identification. KSII Transactions on Internet & Information Systems, 2017, 11(5): 2523-2538. [38] A. Martinez, R. benavente. The AR face database. CVC Tech. Report

No. 24, 1998.

[39] P. Jonathon Phillips, Harry Wechsler, Jeffery Huang, et al. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 1998, 16(5): 295-306. [40] Yong Xu, Xuelong Li. Jian Yang, et al. Integrating conventional and inverse representation for face recognition. IEEE Transactions on Cybernetics, 2014, 44(10): 1738–1746. [41] F. Samaria, A. Harter. Parameterisation of a stochastic model for human face identification, in: Second IEEE Workshop on Applications of Computer Vision, Sarasota, FL, December 1994. [42] L. V. D. Maaten, G. Hinton (2008). Visualizing data using t-sne. Journal of Machine Learning Research, 2008, 9: 2579–2605. [43] Jie Wen, Yong Xu, Zuoyong Li, et al. Inter-class sparsity based discriminative least square regression. Neural Networks, 2018, 102: 36-47.