
Multi-manifold Locality Graph Preserving Analysis for Hyperspectral Image Classification

Guangyao Shi, Hong Huang*, Zhengying Li, Yule Duan

Key Laboratory of Optoelectronic Technology and Systems of the Education Ministry of China, Chongqing University, Chongqing 400044, China

Abstract

Manifold learning has been successfully applied to hyperspectral image (HSI) classification by modeling different land covers as smooth manifolds embedded in a high-dimensional space. However, traditional manifold learning algorithms assume a single manifold structure in HSI, whereas samples in different subsets may belong to different sub-manifolds. In this paper, a novel dimensionality reduction (DR) method called multi-manifold locality graph preserving analysis (MLGPA) is proposed for feature learning of HSI data. According to the label information of HSI, MLGPA divides the samples into different subsets, and each subset is treated as a sub-manifold. Then, it constructs a within-manifold graph and a between-manifold graph for each sub-manifold to characterize within-manifold compactness and between-manifold separability, and a discriminant projection matrix is obtained by simultaneously maximizing the between-manifold scatter and minimizing the within-manifold scatter. Finally, the low-dimensional embedding features of the different sub-manifolds are fused to improve the classification performance. MLGPA can effectively reveal the multi-manifold structure and improve the classification performance of HSI. Experimental results on three real-world HSI data sets demonstrate that MLGPA is superior to some state-of-the-art methods in terms of classification accuracy.

* Corresponding author.
Email addresses: [email protected] (Guangyao Shi), [email protected] (Hong Huang), [email protected] (Zhengying Li), [email protected] (Yule Duan)


Keywords: Manifold learning, Hyperspectral image, Dimensionality reduction, Multi-manifold structure, Discriminant projection matrix

1. Introduction


Hyperspectral images (HSIs) provide much richer spectral information than multispectral images, and they are widely used in urban planning, geological research, crop analysis, and mineral identification [1, 2, 3, 4]. However, because each pixel contains a large number of spectral bands, the classification process requires large computational resources and storage capacity [5, 6, 7, 8], and the classification performance often decreases as the data dimension increases, especially when only a few labeled samples are available [9, 10]. To address this issue, dimensionality reduction (DR) is explored to reduce the number of bands while preserving the desirable intrinsic information [11, 12, 13, 14, 15].

To reveal the manifold structure in high-dimensional data, many manifold learning methods have been proposed [16, 17, 18], such as locally linear embedding (LLE) [19], isometric feature mapping (ISOMAP) [20], Laplacian eigenmaps (LE) [21], and local tangent space alignment (LTSA) [22]. LLE assumes that the data are linear over a small locality and preserves the local linear structure of the data in the low-dimensional space [23]. ISOMAP characterizes the data distribution by geodesic distances instead of Euclidean distances, and it seeks a lower-dimensional embedding that maintains the geodesic distances between all points [24]. LE reveals the local similarity relationships between data points and preserves the local neighborhood information through a Laplacian matrix [25]. LTSA characterizes the local geometry of each neighborhood via its tangent space and performs a global optimization that aligns these local tangent spaces to learn the embedding. However, all of these methods are nonlinear DR methods, and they provide no direct mapping from the high-dimensional space to the low-dimensional space [26]. To overcome this problem, some linear manifold learning methods were proposed to approximate the nonlinear ones, such as locality preserving projections (LPP) [27], neighborhood preserving embedding (NPE) [28], and linear LTSA (LLTSA) [29]. These methods can directly map new samples to the low-dimensional space through the corresponding mapping matrix. However, LPP, NPE, and LLTSA do not utilize the label information of training samples, and they cannot achieve good classification performance in certain scenes [30, 31].


To unify the above DR methods, a graph embedding (GE) framework has been proposed that reformulates many DR methods in terms of statistics and geometry [32, 33]. Based on this framework, local geometric structure Fisher analysis (LGSFA) was developed for DR of HSI data [34]. LGSFA builds an intrinsic graph and a penalty graph to reveal the intrinsic structure in HSI and to enhance the intraclass compactness and interclass separability of the low-dimensional features. However, the above manifold learning algorithms assume that the high-dimensional data lie on a single manifold, whereas in practical applications the high-dimensional data may contain many different subsets, and each subset lies on a low-dimensional sub-manifold [35, 36, 37]. Single-manifold learning methods cannot reveal this multi-manifold structure in HSI, which degrades the performance of HSI classification.

To address this problem, some multi-manifold learning methods have been explored to discover the intrinsic structure in high-dimensional data. Xiao et al. [38] proposed a multi-manifold learning algorithm based on LPP (M-LPP), which assumes that different facial expressions may reside on different manifolds and designs a generalized framework for modeling and recognizing facial expressions on multiple manifolds. Wang et al. [39] proposed a multi-manifold LLGPE (M-LLGPE) algorithm based on the linear local and global preserving embedding (LLGPE) method; it models the manifolds of different classes with LLGPE and then projects the data into low-dimensional spaces. Huang et al. [40] proposed a multi-feature manifold discriminant analysis (MFMDA) algorithm, which constructs the intrinsic and penalty graphs of spectral and textural features within the GE framework and learns a low-dimensional embedding space from the original spectral and textural features that compacts intramanifold samples while separating intermanifold samples. An et al. [41] proposed a tensor-based low-rank graph with multi-manifold regularization (T-LGMR) algorithm, which constructs tensor-based within-class and between-class graphs to characterize the within-class compactness and the between-class separability and exploits the geometric information of tensor samples along the spatial and spectral dimensions. Shi et al. [42] proposed a supervised multi-manifold learning (SMML) algorithm, which extracts multi-manifold features by maximizing a between-class Laplacian graph and projects samples from different classes into their respective sub-manifolds; however, it considers only the interclass structure and ignores the intraclass information.

To address the aforementioned issue, we propose a new multi-manifold DR method, called multi-manifold locality graph preserving analysis (MLGPA), for HSI classification. MLGPA makes full use of the multi-manifold structure and spectral information in HSI to extract discriminant features for classification. The main contributions of MLGPA can be summarized as follows.

• According to the label information of HSI, MLGPA divides the samples into different subsets, and each subset is treated as a sub-manifold, which helps describe the intrinsic characteristics of samples from the same sub-manifold.

• Based on GE theory, a within-manifold graph and a between-manifold graph are constructed for each sub-manifold, which aims to enhance the within-manifold compactness and the between-manifold separability.


• MLGPA seeks a discriminant projection matrix for each sub-manifold by maximizing the between-manifold scatter and minimizing the within-manifold scatter, and the low-dimensional embedding of each sub-manifold is obtained with the corresponding projection matrix. These embedding features reflect the low-dimensional characteristics of each sub-manifold, and they are fused to enhance the classification performance.

The remainder of the paper is organized as follows. In Section 2, we briefly review the theories of GE and marginal Fisher analysis (MFA). Section 3 details the proposed MLGPA method. Experimental results on three HSI data sets are presented in Section 4 to demonstrate the effectiveness of MLGPA. Finally, Section 5 provides some concluding remarks and suggestions for future work.

2. Related Works


Suppose an HSI data set with D bands is defined as X = [x_1, x_2, ..., x_N] ∈ R^{D×N}, where N refers to the number of samples in the HSI. The class label of x_i is denoted as l_i ∈ {1, 2, ..., c}, where c represents the number of land-cover types. The goal of linear dimensionality reduction methods is to seek a projection matrix V ∈ R^{D×d} that maps the high-dimensional data X to low-dimensional embedding features Y = V^T X ∈ R^{d×N}, where d ≪ D is the embedding dimension. For simplicity, Table 1 summarizes the mathematical notation used in this paper.

Table 1: Notation and definitions.

X: Training set
Y: Low-dimensional embedding of X
D: Number of spectral bands
N: Number of training samples
c: Number of land-cover types
x_i: Vector representation of the i-th pixel
y_i: Low-dimensional representation of the i-th pixel
l_i: Class label of the i-th pixel
d: Embedding dimension of y_i
G: Intrinsic graph
G^P: Penalty graph
W: Similarity matrix of G
W^P: Similarity matrix of G^P
w_ij: Weight of x_i and x_j in graph G
w^P_ij: Weight of x_i and x_j in graph G^P
h: A constant
L: Laplacian matrix of G
L^P: Laplacian matrix of G^P
H: Constraint matrix of the GE framework
m: Number of within-class (within-manifold) neighbors
n: Number of between-class (between-manifold) neighbors
V: Projection matrix
M_r: r-th sub-manifold
M_r^i: i-th sample in the r-th sub-manifold
n_r: Number of samples in the r-th sub-manifold
G_r^w: Within-manifold graph of the r-th sub-manifold
G_r^b: Between-manifold graph of the r-th sub-manifold
W_r^w: Similarity matrix of G_r^w
W_r^b: Similarity matrix of G_r^b
C_im(M_r^i): Neighbor set of M_r^i in graph G_r^w
C_in(M_r^i): Neighbor set of M_r^i in graph G_r^b
y_r^i: Low-dimensional representation of the i-th pixel in the r-th sub-manifold
V_r: Projection matrix of the r-th sub-manifold
λ_r: Eigenvalues associated with the r-th sub-manifold
v_r^1, v_r^2, ..., v_r^d: The first d eigenvectors for the r-th sub-manifold


2.1. Graph Embedding

The graph embedding (GE) framework unifies most DR algorithms. The main idea of GE is to characterize the statistical or geometric properties of data by constructing two undirected weighted graphs: an intrinsic graph and a penalty graph. The intrinsic graph G = {X, W} represents the similarity relationships between samples, while the penalty graph G^P = {X, W^P} reveals the dissimilarity relationships between samples, where X is the vertex set of the graphs, and W ∈ R^{N×N} and W^P ∈ R^{N×N} are the similarity matrices of the intrinsic and penalty graphs, respectively. If x_i and x_j are adjacent points, an edge is placed between them. The weight w_ij of x_i and x_j in graph G measures the degree of similarity between x_i and x_j, while the weight w^P_ij of x_i and x_j in graph G^P reflects their dissimilarity. The goal of GE is to obtain a low-dimensional embedding of each vertex that preserves the similarity relationships between vertex pairs during the mapping process. The objective function of GE can be given as

$$\min_{\mathrm{tr}(\mathbf{Y}\mathbf{H}\mathbf{Y}^{T})=h}\;\frac{1}{2}\sum_{i\neq j}\left\|\mathbf{y}_i-\mathbf{y}_j\right\|^{2} w_{ij}\;=\;\min_{\mathrm{tr}(\mathbf{Y}\mathbf{H}\mathbf{Y}^{T})=h}\;\mathrm{tr}\!\left(\mathbf{Y}\mathbf{L}\mathbf{Y}^{T}\right)$$ (1)

where h is a constant, tr(·) denotes the trace of a square matrix, L is the Laplacian matrix of G, and H is a constraint matrix, typically the Laplacian matrix L^P of the penalty graph G^P. The Laplacian matrices L and L^P are defined as

$$\mathbf{L}=\mathbf{D}-\mathbf{W},\qquad \mathbf{L}^{P}=\mathbf{D}^{P}-\mathbf{W}^{P}$$ (2)

$$\mathbf{D}=\mathrm{diag}\!\left(\Big\{\sum_{j=1}^{N} w_{ij}\Big\}_{i=1}^{N}\right),\qquad \mathbf{D}^{P}=\mathrm{diag}\!\left(\Big\{\sum_{j=1}^{N} w^{P}_{ij}\Big\}_{i=1}^{N}\right)$$ (3)

in which W = [w_ij]_{i,j=1}^{N} and W^P = [w^P_ij]_{i,j=1}^{N}.
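For readers who prefer code, the following minimal NumPy sketch (illustrative only; the paper's experiments were run in MATLAB, and the function name here is ours) shows how the Laplacian matrices in (2)-(3) can be formed from given similarity matrices.

```python
import numpy as np

def graph_laplacian(W):
    """Laplacian L = D - W of a symmetric similarity matrix W,
    with D = diag(sum_j w_ij) as in (2)-(3)."""
    return np.diag(W.sum(axis=1)) - W

# Toy example with N = 4 samples: an intrinsic graph W and a penalty graph W_P.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
W_P = np.array([[0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [1, 1, 1, 0]], dtype=float)

L = graph_laplacian(W)      # Laplacian of the intrinsic graph G
L_P = graph_laplacian(W_P)  # Laplacian of the penalty graph G^P (a common choice for H)
```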

2.2. Marginal Fisher Analysis

Based on the GE framework, marginal Fisher analysis (MFA) was proposed to exploit the prior information of training samples. In the intrinsic graph G, each vertex is connected with its neighbor points from the same class. The weight w_ij between x_i and x_j in G is defined as

$$w_{ij}=\begin{cases}1, & \big(\mathbf{x}_j\in C_{im}(\mathbf{x}_i)\ \text{or}\ \mathbf{x}_i\in C_{im}(\mathbf{x}_j)\big)\ \text{and}\ l_i=l_j\\ 0, & \text{otherwise}\end{cases}$$ (4)

in which C_im(x_i) denotes the within-class neighbor set of x_i in G, and m is a positive integer specifying the number of within-class neighbors used to construct graph G; it is a predefined parameter in the experiments. The intrinsic graph characterizes the similarity relationships of points within the same class.

In the penalty graph G^P, each vertex is connected with its neighbor points from different classes. The weight w^P_ij between x_i and x_j in G^P can be expressed as

$$w^{P}_{ij}=\begin{cases}1, & \big(\mathbf{x}_i\in C_{in}(\mathbf{x}_j)\ \text{or}\ \mathbf{x}_j\in C_{in}(\mathbf{x}_i)\big)\ \text{and}\ l_i\neq l_j\\ 0, & \text{otherwise}\end{cases}$$ (5)

where C_in(x_i) is the between-class neighbor set of x_i in G^P, and n denotes the number of between-class neighbors used to construct graph G^P; it is also a predefined parameter in the experiments. The penalty graph reveals the similarity relationships between samples from different classes.

To aggregate the data with the same label and separate the data from different classes, the projection matrix V is calculated by solving the following objective function:

$$J(\mathbf{V})=\min\frac{\sum_{i,j}\left\|\mathbf{V}^{T}\mathbf{x}_i-\mathbf{V}^{T}\mathbf{x}_j\right\|^{2}w_{ij}}{\sum_{i,j}\left\|\mathbf{V}^{T}\mathbf{x}_i-\mathbf{V}^{T}\mathbf{x}_j\right\|^{2}w^{P}_{ij}}=\min\frac{\mathrm{tr}\!\left(\mathbf{V}^{T}\mathbf{X}\mathbf{L}\mathbf{X}^{T}\mathbf{V}\right)}{\mathrm{tr}\!\left(\mathbf{V}^{T}\mathbf{X}\mathbf{L}^{P}\mathbf{X}^{T}\mathbf{V}\right)}$$ (6)
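As an illustration of how the 0/1 graphs in (4)-(5) can be built, the sketch below constructs both weight matrices with NumPy/SciPy; it is our own example (not code from the paper), and the function name and default neighbor counts are arbitrary.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mfa_graphs(X, labels, m=5, n=20):
    """Build the MFA intrinsic weights W (eq. 4) and penalty weights W_P (eq. 5).
    X: array of shape (D, N); labels: array of shape (N,)."""
    N = X.shape[1]
    dist = cdist(X.T, X.T)                      # pairwise Euclidean distances
    W = np.zeros((N, N))
    W_P = np.zeros((N, N))
    for i in range(N):
        same = np.flatnonzero(labels == labels[i])
        same = same[same != i]
        diff = np.flatnonzero(labels != labels[i])
        for j in same[np.argsort(dist[i, same])[:m]]:   # m nearest within-class neighbors
            W[i, j] = W[j, i] = 1.0
        for j in diff[np.argsort(dist[i, diff])[:n]]:   # n nearest between-class neighbors
            W_P[i, j] = W_P[j, i] = 1.0
    return W, W_P
```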

3. Multi-Manifold Locality Graph Preserving Analysis

To reveal the intrinsic multi-manifold structure embedded in HSI, a multi-manifold locality graph preserving analysis (MLGPA) method is proposed for HSI classification. MLGPA divides the samples into different subsets based on their label information and treats each subset as a sub-manifold. Then, it chooses within-manifold and between-manifold neighbors to construct a within-manifold graph and a between-manifold graph for each sub-manifold.


Each sub-manifold then seeks a discriminant projection matrix by simultaneously maximizing the between-manifold scatter and minimizing the within-manifold scatter, and the low-dimensional embedding of each sub-manifold is obtained via the corresponding projection matrix. Finally, the embedding features of all sub-manifolds are fused for classification. MLGPA enhances the within-manifold compactness and the between-manifold separability of HSI, and it possesses better discriminative power to improve the classification performance of HSI. The process of the MLGPA method is shown in Figure 1.

Denote the r-th sub-manifold as M_r = [M_r^1, M_r^2, ..., M_r^{n_r}], where n_r is the number of samples in sub-manifold M_r. The HSI data set can then be expressed as X = [M_1, M_2, ..., M_c] ∈ R^{D×N}, where c is the number of land-cover types. To characterize the discriminative multi-manifold structure of HSI, MLGPA constructs a within-manifold graph and a between-manifold graph for each sub-manifold. In the within-manifold graph, each vertex and its neighbor points come from the same manifold; in the between-manifold graph, a vertex and its neighbor points belong to different manifolds. Denote the within-manifold graph of sub-manifold M_r as G_r^w = (M_r, W_r^w), where M_r is the vertex set of G_r^w and W_r^w is the similarity matrix of G_r^w. The connecting weight w^w_ij between vertex M_r^i and its j-th neighbor point M_r^j is given as

$$w^{w}_{ij}=\begin{cases}\exp\!\left(-\dfrac{\left\|\mathbf{M}_r^{i}-\mathbf{M}_r^{j}\right\|^{2}}{2t_{ri}^{2}}\right), & \mathbf{M}_r^{j}\in C_{im}(\mathbf{M}_r^{i})\ \text{or}\ \mathbf{M}_r^{i}\in C_{im}(\mathbf{M}_r^{j})\\ 0, & \text{otherwise}\end{cases}$$ (7)

where C_im(M_r^i) denotes the within-manifold neighbor set of M_r^i in graph G_r^w, m is the number of within-manifold neighbors used to construct G_r^w, and t_ri = (1/m) Σ_{j=1}^{m} ||M_r^i − M_r^j|| is a kernel parameter.

Figure 1: Flowchart of the proposed MLGPA method.

Similarly, denote the between-manifold graph of sub-manifold M_r as G_r^b = (M_r, W_r^b), where M_r is the vertex set of G_r^b and W_r^b is the dissimilarity matrix of G_r^b. The weight w^b_ij between vertex M_r^i and its j-th neighbor point M_s^j (s ≠ r) can be expressed as

$$w^{b}_{ij}=\begin{cases}\exp\!\left(-\dfrac{\left\|\mathbf{M}_r^{i}-\mathbf{M}_s^{j}\right\|^{2}}{2t_{is}^{2}}\right), & \mathbf{M}_s^{j}\in C_{in}(\mathbf{M}_r^{i})\ \text{or}\ \mathbf{M}_r^{i}\in C_{in}(\mathbf{M}_s^{j})\\ 0, & \text{otherwise}\end{cases}$$ (8)

in which C_in(M_r^i) denotes the between-manifold neighbor set of M_r^i in graph G_r^b, n denotes the number of between-manifold neighbors used to construct G_r^b, and t_is = (1/n) Σ_{j=1}^{n} ||M_r^i − M_s^j|| is a kernel parameter. Figure 2 shows the construction process of the within-manifold and between-manifold graphs.
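A minimal NumPy sketch of the heat-kernel weights in (7) and (8) is given below; it is illustrative only (the function names and the symmetrization choice are ours, not from the paper).

```python
import numpy as np
from scipy.spatial.distance import cdist

def within_manifold_weights(Mr, m=3):
    """Eq. (7): heat-kernel weights inside one sub-manifold Mr of shape (D, n_r)."""
    dist = cdist(Mr.T, Mr.T)
    W = np.zeros_like(dist)
    for i in range(Mr.shape[1]):
        idx = np.argsort(dist[i])[1:m + 1]          # m nearest neighbors, skipping i itself
        t_ri = dist[i, idx].mean()                  # kernel width t_ri
        W[i, idx] = np.exp(-dist[i, idx] ** 2 / (2.0 * t_ri ** 2))
    return np.maximum(W, W.T)                       # edge if either sample is a neighbor of the other

def between_manifold_weights(Mr, Ms, n=3):
    """Eq. (8): heat-kernel weights from samples of Mr to their n nearest neighbors in Ms."""
    dist = cdist(Mr.T, Ms.T)
    W = np.zeros_like(dist)
    for i in range(Mr.shape[1]):
        idx = np.argsort(dist[i])[:n]
        t_is = dist[i, idx].mean()                  # kernel width t_is
        W[i, idx] = np.exp(-dist[i, idx] ** 2 / (2.0 * t_is ** 2))
    return W
```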

Figure 2: Construction process of the within-manifold and between-manifold graphs.

As can be seen from Figure 2, M_r^1 and M_s^1 are points of manifolds M_r and M_s, respectively. In the within-manifold graph G_r^w, a vertex such as M_r^1 obtains its neighbors from the manifold it lies on, while in the between-manifold graph G_r^b, a vertex such as M_r^1 finds its neighbors in the other manifolds.

To characterize the structure of the within-manifold graph G_r^w, the scatter matrix of vertex M_r^i and its neighbor set C_im(M_r^i) in graph G_r^w is given as

$$\mathbf{S}_{ri}=\sum_{h=1}^{m}\left(\mathbf{M}_r^{i}-\mathbf{M}_r^{h}\right)\left(\mathbf{M}_r^{i}-\mathbf{M}_r^{h}\right)^{T}w^{w}_{ih},\quad \mathbf{M}_r^{h}\in C_{im}\!\left(\mathbf{M}_r^{i}\right)$$ (9)

For all vertexes in the r-th sub-manifold, the total within-manifold scatter matrix S_r^w of sub-manifold M_r can be represented as

$$\mathbf{S}_r^{w}=\sum_{i=1}^{n_r}\mathbf{S}_{ri}$$ (10)

where n_r refers to the number of samples in the r-th sub-manifold.

In the between-manifold graph G_r^b, the scatter matrix from vertex M_r^i to sub-manifold M_s (s ≠ r) is defined as

$$\mathbf{H}_1\!\left(\mathbf{M}_r^{i},\mathbf{M}_s\right)=\sum_{f=1}^{n_s}\left(\mathbf{M}_r^{i}-\mathbf{M}_s^{f}\right)\left(\mathbf{M}_r^{i}-\mathbf{M}_s^{f}\right)^{T}w^{b}_{if},\quad \mathbf{M}_s^{f}\in C_{in}\!\left(\mathbf{M}_r^{i}\right)$$ (11)

in which n_s refers to the number of samples in manifold M_s, and the scatter matrix from vertex M_s^j to sub-manifold M_r can be given as

$$\mathbf{H}_2\!\left(\mathbf{M}_s^{j},\mathbf{M}_r\right)=\sum_{h=1}^{n_r}\left(\mathbf{M}_r^{h}-\mathbf{M}_s^{j}\right)\left(\mathbf{M}_r^{h}-\mathbf{M}_s^{j}\right)^{T}w^{b}_{hj},\quad \mathbf{M}_r^{h}\in C_{in}\!\left(\mathbf{M}_s^{j}\right)$$ (12)

Therefore, the scatter matrix between sub-manifold M_r and sub-manifold M_s (s ≠ r) can be represented as

$$\mathbf{H}\!\left(\mathbf{M}_r,\mathbf{M}_s\right)=\mathbf{H}_1\!\left(\mathbf{M}_r,\mathbf{M}_s\right)+\mathbf{H}_2\!\left(\mathbf{M}_s,\mathbf{M}_r\right)=\sum_{i=1}^{n_r}\mathbf{H}_1\!\left(\mathbf{M}_r^{i},\mathbf{M}_s\right)+\sum_{j=1}^{n_s}\mathbf{H}_2\!\left(\mathbf{M}_s^{j},\mathbf{M}_r\right)$$ (13)

Then, the total between-manifold scatter matrix H_r^b between sub-manifold M_r and the other manifolds can be computed as

$$\mathbf{H}_r^{b}=\sum_{\substack{s=1\\ s\neq r}}^{c}\mathbf{H}\!\left(\mathbf{M}_r,\mathbf{M}_s\right)$$ (14)
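To make (9)-(14) concrete, the following NumPy sketch accumulates the two scatter matrices for one sub-manifold; `weight_fn` stands for a between-manifold weight construction such as the eq. (8) sketch above, and all names are ours rather than the authors'.

```python
import numpy as np

def within_manifold_scatter(Mr, W_w):
    """Total within-manifold scatter S_r^w of eqs. (9)-(10).
    Mr: (D, n_r) samples; W_w: (n_r, n_r) weights from eq. (7)."""
    S = np.zeros((Mr.shape[0], Mr.shape[0]))
    for i in range(Mr.shape[1]):
        for h in np.flatnonzero(W_w[i]):
            d = (Mr[:, i] - Mr[:, h])[:, None]
            S += W_w[i, h] * (d @ d.T)
    return S

def between_manifold_scatter(Mr, others, weight_fn):
    """Total between-manifold scatter H_r^b of eqs. (11)-(14).
    `others`: list of the other sub-manifolds M_s (each of shape (D, n_s))."""
    H = np.zeros((Mr.shape[0], Mr.shape[0]))
    for Ms in others:
        W_rs = weight_fn(Mr, Ms)     # weights from M_r^i to its neighbors in M_s (H_1 terms)
        W_sr = weight_fn(Ms, Mr)     # weights from M_s^j to its neighbors in M_r (H_2 terms)
        for i in range(Mr.shape[1]):
            for f in range(Ms.shape[1]):
                w = W_rs[i, f] + W_sr[f, i]
                if w > 0.0:
                    d = (Mr[:, i] - Ms[:, f])[:, None]
                    H += w * (d @ d.T)
    return H
```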

Figure 3: An example of the processing of MLGPA.

To enhance the aggregation of samples from the same manifold in the low-dimensional space, an objective function for the within-manifold graph of the r-th sub-manifold M_r can be defined as

$$J_r^{w}=\arg\min\sum_{i=1}^{n_r}\sum_{j=1}^{m}\left\|\mathbf{z}_i-\mathbf{z}_j\right\|^{2}w^{w}_{ij}$$ (15)

where z_i and z_j represent the low-dimensional representations of M_r^i and M_r^j in sub-manifold M_r, respectively. With some algebraic manipulation, (15) can be reformulated as

$$J^{w}(\mathbf{V}_r)=\arg\min\sum_{i=1}^{n_r}\sum_{j=1}^{m}\left\|\mathbf{z}_i-\mathbf{z}_j\right\|^{2}w^{w}_{ij}=\sum_{i=1}^{n_r}\sum_{j=1}^{m}\left\|\mathbf{V}_r^{T}\mathbf{M}_r^{i}-\mathbf{V}_r^{T}\mathbf{M}_r^{j}\right\|^{2}w^{w}_{ij}=\sum_{i=1}^{n_r}\sum_{j=1}^{m}\mathrm{tr}\!\left[\mathbf{V}_r^{T}\left(\mathbf{M}_r^{i}-\mathbf{M}_r^{j}\right)\left(\mathbf{M}_r^{i}-\mathbf{M}_r^{j}\right)^{T}w^{w}_{ij}\,\mathbf{V}_r\right]=\mathrm{tr}\!\left[\mathbf{V}_r^{T}\left(\sum_{i=1}^{n_r}\sum_{j=1}^{m}\left(\mathbf{M}_r^{i}-\mathbf{M}_r^{j}\right)\left(\mathbf{M}_r^{i}-\mathbf{M}_r^{j}\right)^{T}w^{w}_{ij}\right)\mathbf{V}_r\right]=\mathrm{tr}\!\left(\mathbf{V}_r^{T}\mathbf{S}_r^{w}\mathbf{V}_r\right)$$ (16)
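The key step in (16), rewriting a weighted sum of squared distances as the trace of a scatter matrix, can be checked numerically; the short sketch below does so on random data (our illustration, with arbitrary toy dimensions).

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_r, d = 6, 10, 3
Mr = rng.normal(size=(D, n_r))                   # samples of one sub-manifold
W = rng.uniform(size=(n_r, n_r))                 # toy symmetric weights w_ij
W = (W + W.T) / 2.0
Vr = rng.normal(size=(D, d))                     # a candidate projection matrix

lhs = sum(W[i, j] * np.sum((Vr.T @ (Mr[:, i] - Mr[:, j])) ** 2)
          for i in range(n_r) for j in range(n_r))
S = sum(W[i, j] * np.outer(Mr[:, i] - Mr[:, j], Mr[:, i] - Mr[:, j])
        for i in range(n_r) for j in range(n_r))
rhs = np.trace(Vr.T @ S @ Vr)
assert np.allclose(lhs, rhs)                     # the identity used in eq. (16)
```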

To improve the between-manifold separability of the HSI data in the low-dimensional space, an objective function is designed for the between-manifold graph of sub-manifold M_r:

$$J^{b}(\mathbf{V}_r)=\arg\max\sum_{\substack{s=1\\ s\neq r}}^{c}\left(\sum_{i=1}^{n_r}\sum_{f=1}^{n}\left\|\mathbf{z}_i-\mathbf{z}_f\right\|^{2}w^{b}_{if}+\sum_{j=1}^{n_s}\sum_{h=1}^{n}\left\|\mathbf{z}_j-\mathbf{z}_h\right\|^{2}w^{b}_{hj}\right)$$ (17)

With some algebraic manipulation, it can be rewritten as

$$J^{b}(\mathbf{V}_r)=\mathrm{tr}\!\left[\mathbf{V}_r^{T}\left(\sum_{\substack{s=1\\ s\neq r}}^{c}\left(\sum_{i=1}^{n_r}\mathbf{H}_1\!\left(\mathbf{M}_r^{i},\mathbf{M}_s\right)+\sum_{j=1}^{n_s}\mathbf{H}_2\!\left(\mathbf{M}_s^{j},\mathbf{M}_r\right)\right)\right)\mathbf{V}_r\right]=\mathrm{tr}\!\left(\mathbf{V}_r^{T}\mathbf{H}_r^{b}\mathbf{V}_r\right)$$ (18)

The goal of (15) and (17) is to enhance the within-manifold compactness and the between-manifold separability of HSI, and Figure 3 shows how the intramanifold and intermanifold neighbors are used to maximize the manifold margins and separate different manifolds. Therefore, MLGPA should satisfy the following two optimization criteria:

$$\arg\min_{\mathbf{V}_r}\ \mathrm{tr}\!\left(\mathbf{V}_r^{T}\mathbf{S}_r^{w}\mathbf{V}_r\right),\qquad \arg\max_{\mathbf{V}_r}\ \mathrm{tr}\!\left(\mathbf{V}_r^{T}\mathbf{H}_r^{b}\mathbf{V}_r\right)$$ (19)

Then, (19) can be further reformulated as

$$J(\mathbf{V}_r)=\arg\min\frac{J^{w}(\mathbf{V}_r)}{J^{b}(\mathbf{V}_r)}=\min\frac{\mathrm{tr}\!\left(\mathbf{V}_r^{T}\mathbf{S}_r^{w}\mathbf{V}_r\right)}{\mathrm{tr}\!\left(\mathbf{V}_r^{T}\mathbf{H}_r^{b}\mathbf{V}_r\right)}$$ (20)

With the Lagrange multiplier method, the optimization problem of (20) can be transformed into the following generalized eigenvalue problem:

$$\mathbf{S}_r^{w}\mathbf{V}_r=\lambda_r\mathbf{H}_r^{b}\mathbf{V}_r$$ (21)

in which λ_r denotes an eigenvalue of (21). Let v_r^1, v_r^2, ..., v_r^d be the eigenvectors corresponding to the d smallest eigenvalues of (21); the projection matrix is then defined as V_r = [v_r^1, v_r^2, ..., v_r^d]. With the same operations, we can obtain c projection matrices V_1, V_2, ..., V_c corresponding to the different sub-manifolds of HSI. The low-dimensional features of X for each sub-manifold are Y_1 = V_1^T X, Y_2 = V_2^T X, ..., Y_c = V_c^T X; a feature fusion step then combines them for better classification, and the overall low-dimensional embedding features can be represented as Y = [Y_1; Y_2; ...; Y_c], i.e., the per-manifold features stacked along the feature dimension. The detailed steps of the proposed MLGPA method are shown in Algorithm 1.

Algorithm 1 MLGPA
Input: HSI data set X = [x_1, x_2, ..., x_N] with class labels l_i ∈ {1, 2, ..., c}, intramanifold neighbor number m, intermanifold neighbor number n, embedding dimension d (d ≪ D).
1: for r = 1 to c do
2: Find the m intramanifold neighbors C_im(M_r^i) and the n intermanifold neighbors C_in(M_r^i) of each sample M_r^i.
3: Calculate the intramanifold and intermanifold weights by (7) and (8).
4: Compute the intramanifold scatter matrix and the intermanifold scatter matrix by (10) and (14).
5: Solve the generalized eigenvalue problem S_r^w V_r = λ_r H_r^b V_r.
6: Obtain the projection matrix from the eigenvectors corresponding to the d smallest eigenvalues: V_r = [v_r^1, v_r^2, ..., v_r^d] ∈ R^{D×d}.
7: end for
8: Obtain the projection matrices of all sub-manifolds V_1, V_2, ..., V_c and the low-dimensional features Y_1 = V_1^T X, Y_2 = V_2^T X, ..., Y_c = V_c^T X.
Output: The overall embedding features Y = [Y_1; Y_2; ...; Y_c].
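A compact NumPy/SciPy sketch of this training loop is shown below. It is our illustration rather than the authors' MATLAB implementation: `scatter_fn` is a placeholder for the eq. (7)-(14) computations sketched earlier, and the small ridge added to H_r^b is a practical regularization choice of ours, not part of the paper.

```python
import numpy as np
from scipy.linalg import eigh

def mlgpa_fit(X, labels, d, scatter_fn):
    """Learn one projection matrix V_r per sub-manifold (Algorithm 1).
    X: (D, N) training samples; labels: (N,) integer class labels;
    scatter_fn(Mr, others) must return the pair (S_w, H_b) of eqs. (10) and (14)."""
    classes = np.unique(labels)
    manifolds = [X[:, labels == r] for r in classes]      # one sub-manifold per class
    projections = []
    for r, Mr in enumerate(manifolds):
        others = [M for s, M in enumerate(manifolds) if s != r]
        S_w, H_b = scatter_fn(Mr, others)
        # Generalized eigenproblem S_w v = lambda H_b v (eq. 21); the ridge keeps
        # H_b positive definite so SciPy's symmetric solver can be used.
        H_b = H_b + 1e-6 * np.trace(H_b) / H_b.shape[0] * np.eye(H_b.shape[0])
        _, eigvecs = eigh(S_w, H_b)                        # eigenvalues in ascending order
        projections.append(eigvecs[:, :d])                 # d smallest eigenvalues -> V_r
    return projections

def mlgpa_transform(X, projections):
    """Project X with every V_r and stack the results (the feature-fusion step)."""
    return np.vstack([V.T @ X for V in projections])       # shape (c*d, N)
```

In use, the fused features of the training and test pixels would then be fed to a classifier such as the nearest-neighbor classifier employed in the experiments below.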


4. Experimental Results and Analysis

In this section, three public HSI data sets are adopted to demonstrate the effectiveness of the MLGPA algorithm by comparing it with several state-of-the-art DR algorithms.


4.1. Experimental Data Sets

Botswana data set: This data set was collected by the NASA EO-1 satellite over the Okavango Delta, Botswana, on May 31, 2001. The image contains 242 bands in the range of 400-2,500 nm, and the spatial resolution is 30 m. Because 97 bands suffer from serious water absorption, the remaining 145 spectral bands are used in the experiments. The size of the image is 1476 × 256 pixels, of which 3,248 pixels are labeled. The data set contains 14 land-cover types; its false-color scene and corresponding ground truth are shown in Figure 4.

Figure 4: Botswana hyperspectral image. (a) HSI in false color; (b) ground-truth map. Classes (labeled samples): 1 Water (270), 2 Hippo Grass (101), 3 Floodplain 1 (251), 4 Floodplain 2 (215), 5 Reeds 1 (269), 6 Riparian (269), 7 Fire trace 2 (259), 8 Island Interior (203), 9 Acacia Woodlands (314), 10 Acacia Scrublands (248), 11 Acacia Grasslands (305), 12 Short Mopane (181), 13 Mixed Mopane (268), 14 Exposed Soils (95).


Kennedy Space Center (KSC) data set: This data set was acquired by the NASA AVIRIS instrument over the Kennedy Space Center on March 23, 1996. The image has 224 bands in the range of 400-2,500 nm, and the spatial resolution is 18 m. Because 48 bands are strongly affected by water absorption, the remaining 176 spectral bands are adopted for the experiments. The size of the image is 614 × 512 pixels, and 5,211 pixels are labeled. The data set contains 13 land-cover types; its false-color scene and corresponding ground truth are shown in Figure 5.

Figure 5: KSC hyperspectral image. (a) HSI in false color; (b) ground-truth map.

Figure 6: Washington DC Mall hyperspectral image. (a) HSI in false color; (b) ground-truth map.


240

Washington DC Mall data set: This data set was collected by the airborne HYDICE sensor over the Mall in Washington, DC. The full scene consists of 1208 × 307 pixels with a spatial resolution of about 3 m, and all pixels belong to 6 land-cover types. The data set contains 210 bands ranging from 0.4 µm to 2.4 µm. After removing 19 bands affected by water absorption, we adopt the remaining 191 spectral bands for classification. Figure 6 shows its false-color scene and corresponding ground truth.

4.2. Experimental Setup

In each experiment, we randomly divided the HSI data into training and test samples. The training samples were used to construct a DR model, while the test samples were adopted to verify the effectiveness of the DR model. We then adopted the overall classification accuracy (OA), average classification accuracy (AA), and kappa coefficient (κ) to evaluate the performance of the different DR algorithms. To evaluate the classification performance robustly, we repeated each experiment 10 times and report the average classification result for each condition.

To assess the classification performance of the proposed MLGPA algorithm, several state-of-the-art DR methods were selected for comparison: PCA, LDA, NPE, LPP, MFA, MMC, SMML, MMDA, M-LLGPE, and M-LPP. RAW indicates that the samples are classified directly without DR. Among these algorithms, PCA, NPE, LPP, LDA, MMC, and MFA are single-manifold learning methods, while SMML, MMDA, M-LLGPE, and M-LPP are feature learning methods based on multi-manifold structure. SMML seeks an explicit mapping between the high-dimensional and low-dimensional spaces by maximizing the between-class Laplacian scatter matrix of the samples. MMDA designs a within-class graph and a between-class graph to characterize the within-class compactness and the between-class separability.


275

M-LLGPE learns the embedding features of the different manifolds with LLGPE and then projects the data into different low-dimensional spaces. M-LPP assumes that different classes may reside on different manifolds and designs a generalized framework for modeling and classifying different classes on multiple manifolds. For the single-manifold algorithms, which have only one projection matrix, we use the nearest-neighbor (NN) classifier for classification. The multi-manifold algorithms have c different projection matrices, so the NN classifier cannot be adopted directly; therefore, for M-LPP and M-LLGPE, we used a reconstruction-error-based classifier (REC) to implement the classification. Although SMML and MMDA are called multi-manifold methods, they have only one projection matrix, so they were classified with NN. To achieve better classification results for the various algorithms, we optimized the parameters of each algorithm. For NPE, LPP, SMML, MMDA, and M-LPP, we set the number of neighbors to 5. For MFA and LGSFA, the numbers of intraclass and interclass neighbors were set to k1 = 9 and k2 = 40, respectively.

4.3. Experiments on the Botswana Data Set

To investigate the influence of different numbers of intramanifold and intermanifold neighbors on the classification results, 10 samples were randomly selected from each land-cover class as the training set, and the remaining samples were used as the test set. The parameters m and n were both selected from the set {1, 2, 3, ..., 9, 10}, and the experiment was repeated 10 times to provide more confidence in the robustness of the results. Figure 7 shows the OAs for different values of m and n.

As can be seen from Figure 7, when m is no more than 3, the classification performance improves as the number of intramanifold neighbors increases, and when m is larger than 3 the OA reaches and maintains a stable value. The reason is that too few intramanifold neighbors are insufficient to reveal the multi-manifold structure of HSI, whereas when m is too large the intramanifold information becomes redundant for DR of HSI, which limits further improvement of the classification performance. The OA also varies with the number of intermanifold neighbors, showing subtle changes for a fixed m, because intermanifold neighbors help represent the dissimilarity relationships between different manifolds in HSI, so their number affects the validity of the between-manifold graph construction and limits further improvement of the classification performance.


Based on the above analysis, we set m and n to 3 and 4 in the following experiments. To analyze the influence of the embedding dimension on each DR algorithm, 20 samples were randomly selected from each land-cover class for training, and the remaining samples were used for testing. Figure 8 shows the OAs of the different DR algorithms under different embedding dimensions.
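The OA values reported in these parameter analyses, together with the AA and kappa coefficient used in the later tables, can all be derived from a confusion matrix; the following sketch (ours, assuming integer-coded labels 0, ..., n_classes−1) shows one way to compute them.

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and kappa coefficient."""
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1.0                                   # confusion matrix
    total = C.sum()
    oa = np.trace(C) / total                             # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))             # mean of per-class accuracies
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / total ** 2    # expected chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```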

Figure 7: Classification results for Botswana data set with different m and n.

"&* "&*" "&) "&)" "&( 4,5 678, 697: 6;77 6;-, 6<<8 6<=, 6><<; 6<<-, 6
"&(" "&'

+,

280

"&'" "& "& " "&% "&%" "&$ "&$" !"

!

#"

-./012.31

#

$"

$

%"



Figure 8: Classification results with different dimensions on Botswana.

As can be seen from Figure 8, the classification accuracy of each algorithm increases gradually as the embedding dimension increases.


This is because a larger embedding dimension retains more feature information for classification. However, when the embedding dimension increases beyond a certain point, the classification accuracy of most algorithms tends to stabilize, since the feature information contained in the embedding space is close to saturation and no longer improves the classification accuracy. Moreover, the classification accuracy of MLGPA is higher than that of the other algorithms at most dimensions, because it better characterizes the intrinsic multi-manifold structure of HSI and obtains more effective low-dimensional discriminant features. Therefore, the embedding dimension of the algorithms is set to 30, except that the embedding dimension of LDA is c−1, where c is the number of classes in the Botswana data set.

Table 2: Classification results with different numbers of training samples on the Botswana data set (OA ± std (%), with kappa in parentheses).

ni = 5 ni = 10 ni = 15 ni = 20 ni = 25 ni = 30 79.08±2.67 81.80±1.84 84.77±0.97 85.24±0.78 86.96±0.58 87.14±0.32 (0.773) (0.803) (0.835) (0.840) (0.859) (0.861) 79.05±2.66 81.70±1.82 84.73±1.01 85.20±0.73 86.88±0.53 86.99±0.29 PCA (0.773) (0.802) (0.835) (0.839) (0.858) (0.859) 44.34±2.06 52.41±3.00 74.92±2.49 81.07±1.24 84.13±1.11 85.21±0.89 NPE (0.398) (0.484) (0.728) (0.795) (0.828) (0.840) 41.22±2.86 51.47±2.10 69.57±1.33 80.26±0.83 84.27±1.33 86.49±0.90 LPP (0.365) (0.472) (0.670) (0.786) (0.829) (0.853) 51.78±1.28 54.97±1.99 72.52±2.38 83.87±1.56 87.26±0.85 90.19±1.00 LDA (0.478) (0.498) (0.702) (0.825) (0.862) (0.894) 78.50±2.78 81.04±2.18 83.31±1.42 84.06±0.74 85.27±0.69 85.49±0.39 MMC (0.767) (0.795) (0.819) (0.827) (0.840) (0.840) 80.94±2.81 84.41±1.70 86.77±0.91 87.09±1.42 88.58±0.66 89.01±0.33 MFA (0.793) (0.831) (0.857) (0.860) (0.876) (0.880) 78.79±2.59 81.58±2.02 84.41±1.01 85.24±0.76 86.71±0.57 86.90±0.33 SMML (0.770) (0.801) (0.831) (0.840) (0.856) (0.858) 87.36±2.21 88.60±1.19 90.69±1.19 90.80±0.87 91.10±0.56 91.22±0.61 MMDA (0.863) (0.877) (0.899) (0.900) (0.904) (0.905) 83.12±1.81 87.40±1.21 90.30±0.86 90.49±1.01 90.60±0.58 91.28±0.57 M-LLGPE (0.817) (0.863) (0.895) (0.896) (0.898) (0.905) 78.07±4.55 84.31±0.70 87.38±1.54 88.30±0.79 88.63±0.66 89.45±0.85 M-LPP (0.762) (0.830) (0.863) (0.873) (0.877) (0.885) 87.70±1.28 90.41±1.45 91.97±0.94 93.17±0.74 93.93±0.23 94.18±0.52 MLGPA (0.867) (0.896) (0.913) (0.926) (0.934) (0.937) RAW


To verify the classification performance of the various algorithms with different numbers of training samples, n_i (n_i = 5, 10, 15, 20, 25, 30) samples were randomly selected from each land-cover class as training samples, and the rest were used as test samples. The average OAs with standard deviations (std) of each DR method on the Botswana data set are given in Table 2.

Table 3: Classification results of different DR methods on the Botswana data set.

Classes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 AA OA Kappa


RAW 100.00 95.60 92.12 90.73 68.34 54.44 93.17 93.78 75.66 75.63 80.00 87.72 77.52 100.00 85.82 83.87 0.825

PCA 100.00 96.70 94.12 89.22 82.42 52.34 93.90 95.34 77.18 69.07 86.55 94.74 67.45 100.00 85.65 83.67 0.823

NPE 99.22 87.91 47.06 52.94 53.13 38.28 78.05 72.54 51.01 36.02 52.76 66.67 43.92 84.71 61.73 58.80 0.554

LPP 98.83 63.74 47.06 52.94 51.17 31.64 86.18 60.62 42.28 38.14 54.14 47.95 27.45 69.41 55.11 53.85 0.500

LDA 93.75 92.31 52.10 68.63 42.19 41.41 79.27 67.88 49.33 29.24 53.45 39.18 25.49 49.41 55.97 54.41 0.506

MMC 100.00 96.70 94.54 89.22 81.64 50.00 93.90 89.64 70.81 58.05 87.93 95.91 62.75 100.00 83.65 81.43 0.799

MFA 100.00 95.60 95.80 90.20 82.42 51.17 94.31 96.37 77.18 66.53 86.90 93.57 67.84 100.00 85.56 83.64 0.823

SMML 100.00 96.70 95.38 90.20 82.81 52.73 93.90 95.85 77.85 65.68 86.90 95.32 67.06 100.00 85.74 83.77 0.824

MMDA M-LLGPE M-LPP MLGPA 100.00 100.00 100.00 100.00 98.90 98.90 83.52 98.90 97.06 94.54 76.47 99.58 97.06 95.10 95.10 98.53 82.81 84.77 74.22 86.72 66.80 75.00 62.11 71.88 93.50 95.53 94.72 96.75 89.64 94.82 92.75 98.96 81.54 92.28 70.47 90.27 80.08 71.19 61.44 94.92 88.28 91.38 78.97 87.59 86.55 97.66 78.95 84.80 69.80 78.04 54.90 81.57 100.00 100.00 98.82 100.00 88.00 90.66 80.17 92.17 86.50 89.46 78.44 91.19 0.854 0.886 0.767 0.905

It can be seen from Table 2 that, as the number of training samples increases, the classification accuracy of each algorithm improves significantly, because more training samples provide more abundant prior information for feature learning. In addition, unsupervised single-manifold algorithms such as PCA, NPE, and LPP have limited classification performance because they fail to utilize the category information in the HSI data, while the supervised single-manifold algorithms LDA, MMC, and MFA consider both the spectral information and the category information, and their classification performance improves greatly. Moreover, most of the multi-manifold learning methods, such as M-LLGPE, M-LPP, and MLGPA, achieve higher classification accuracies than the single-manifold methods; in particular, MLGPA effectively enhances the intramanifold aggregation and intermanifold separation by constructing within-manifold and between-manifold graphs, extracts more effective discriminant features, and achieves the best classification results under all training conditions.

To compare the classification performance of the algorithms on each land-cover type, 5% of the samples were randomly selected as the training set, and the rest were used as the test set. Table 3 reports the per-class accuracies together with the OA, AA, and kappa for the Botswana data set, and Figure 9 shows the corresponding classification maps of the different DR algorithms.


As can be seen from Table 3, compared with the other algorithms, MLGPA achieves better classification results for most land-cover types, and its OA, AA, and kappa are the highest. In addition, the classification map obtained by MLGPA is smoother, especially for the "Floodplain 1", "Reeds 1", and "Acacia Scrublands" classes, which indicates that MLGPA can effectively reveal the intrinsic multi-manifold structure of HSI and better extract the discriminant features.


Figure 9: Classification maps of different methods on the Botswana data set. (a) RAW; (b) PCA; (c) NPE; (d) LPP; (e) LDA; (f) MMC; (g) MFA; (h) SMML; (i) MMDA; (j) M-LLGPE; (k) M-LPP; (l) MLGPA.



4.4. Experiments on the KSC Data Set

To further evaluate the performance of MLGPA on a different HSI data set, we continued the classification experiments on the KSC data set. All samples were randomly divided into a training set and a test set; the training set was used to construct the corresponding DR model, and the test set was used to verify the effectiveness of the model. For the reliability of the experimental results, we repeated the experiments 10 times in each case. To demonstrate the effect of the numbers of intramanifold and intermanifold neighbors on the classification performance of MLGPA, we randomly selected 10 samples from each class for training, and the remaining samples were used for testing. Figure 10 shows the classification results on the KSC data set with different m and n.

Figure 10: Classification results for KSC data set with different m and n.


As shown in Figure 10, as m increases, the classification accuracy first increases and then reaches a stable value. When the number of intramanifold neighbors is insufficient, the intrinsic information of the manifold that they can represent is limited, so the OA improves as the number of intramanifold neighbors increases. However, once the number of intramanifold neighbors reaches a critical point, the neighbors are sufficient to represent the corresponding intramanifold structure, so the overall classification accuracy no longer increases significantly and stabilizes. At the same time, the number of intermanifold neighbors also affects the classification performance of MLGPA and produces subtle changes in classification accuracy.

To achieve the best classification performance, we set m = 4 and n = 3 in the following experiments.

"&' "& "&% "&$ "&# !"

!

#"

-./012.31

#

$"

$

%"



Figure 11: Classification results with different dimensions on the KSC data set.

Table 4: Classification results with different numbers of training samples on the KSC data set (OA ± std (%), with kappa in parentheses).

ni = 5 ni = 10 ni = 15 ni = 20 ni = 25 ni = 30 68.53±2.32 73.52±1.72 76.15±1.29 78.14±1.09 79.42±0.79 80.36±1.00 RAW (0.651) (0.706) (0.735) (0.757) (0.771) (0.781) 68.49±2.33 73.49±1.73 76.11±1.30 78.11±1.09 79.36±0.79 80.32±1.00 PCA (0.651) (0.706) (0.735) (0.757) (0.770) (0.781) 54.28±3.52 54.60±2.89 62.48±3.56 71.17±1.56 75.21±1.61 77.63±1.74 NPE (0.495) (0.499) (0.586) (0.680) (0.724) (0.751) 48.62±3.17 52.96±2.90 57.56±4.03 66.40±1.95 72.73±1.72 76.82±1.63 LPP (0.433) (0.481) (0.532) (0.628) (0.697) (0.742) 54.91±3.76 62.93±2.72 70.73±2.12 79.58±1.48 84.07±0.85 86.69±0.83 LDA (0.502) (0.591) (0.676) (0.773) (0.823) (0.852) 67.30±2.37 72.21±1.91 74.44±1.43 76.19±1.33 77.72±1.08 78.77±1.14 MMC (0.638) (0.692) (0.716) (0.735) (0.752) (0.764) 68.50±2.33 73.46±1.72 76.06±1.29 78.05±1.08 79.31±0.80 80.29±1.00 MFA (0.651) (0.705) (0.734) (0.756) (0.770) (0.780) 68.38±2.34 73.25±1.74 75.89±1.31 77.84±1.11 79.06±0.77 80.10±0.99 SMML (0.650) (0.703) (0.732) (0.754) (0.767) (0.778) 70.05±2.35 73.99±1.66 76.01±1.60 76.57±1.79 77.52±2.04 78.01±1.94 MMDA (0.668) (0.711) (0.733) (0.740) (0.750) (0.755) 71.15±3.48 75.50±2.08 78.20±1.76 79.83±1.42 81.19±1.19 81.84±1.24 M-LLGPE (0.681) (0.728) (0.758) (0.776) (0.791) (0.798) 68.35±3.37 79.46±2.48 82.59±1.47 83.54±1.64 84.27±1.37 85.06±1.35 M-LPP (0.650) (0.772) (0.807) (0.817) (0.825) (0.834) 78.62±2.20 82.88±1.59 84.95±1.10 86.36±0.92 87.28±0.79 87.90±0.93 MLGPA (0.763) (0.810) (0.833) (0.848) (0.858) (0.865)


To explore the classification performance of the different algorithms at different embedding dimensions, 20 samples were randomly selected from each class of the KSC data set for the experiment.


Figure 11 shows the classification results under different embedding dimensions. In Figure 11, the OA of each algorithm increases as the embedding dimension increases and then remains stable. At the same time, MLGPA achieves better classification accuracy than the other algorithms at most embedding dimensions, which indicates that the proposed method can effectively discover the multi-manifold structure in hyperspectral data and extract low-dimensional discriminant features. To achieve optimal classification performance for each algorithm, the embedding dimensions of the DR algorithms except LDA are set to 30, and the dimension of LDA is c−1, where c is the number of land-cover types in the KSC data set.

Table 5: Classification results of different DR methods on the KSC data set.

Classes 1 2 3 4 5 6 7 8 9 10 11 12 13 AA OA Kappa


RAW 87.14 71.43 63.79 49.37 49.01 28.90 71.58 67.97 94.53 73.18 93.97 73.22 99.21 71.02 78.82 0.764

PCA 87.28 71.00 63.79 48.95 49.01 29.36 71.58 67.97 94.53 73.18 93.97 72.80 99.21 70.97 78.78 0.763

NPE 74.83 65.80 42.80 32.22 43.05 28.90 62.11 52.08 88.46 67.97 88.44 71.34 99.43 62.88 71.62 0.684

LPP 77.59 53.25 56.79 24.69 31.79 16.06 44.21 45.97 74.09 61.72 86.18 62.13 98.75 56.41 66.89 0.631

LDA 89.76 87.01 60.08 50.21 51.66 40.37 61.05 78.24 87.25 95.57 89.95 91.00 99.89 75.54 83.56 0.817

MMC 85.20 66.67 61.32 47.70 47.68 33.03 70.53 63.33 92.11 73.18 93.97 69.87 99.21 69.52 77.29 0.747

MFA 87.14 68.83 63.37 51.46 45.70 29.36 71.58 67.97 94.53 73.44 93.47 72.80 99.43 70.70 78.68 0.762

SMML 87.00 70.56 64.61 50.63 48.34 28.44 70.53 67.24 94.33 72.14 93.72 71.34 99.09 70.61 78.42 0.759

MMDA M-LLGPE M-LPP 85.62 76.21 68.88 74.89 80.95 87.01 63.79 62.14 67.90 47.70 45.61 41.00 52.32 77.48 82.78 32.11 34.40 30.73 74.74 75.79 90.53 56.48 68.70 87.53 87.65 88.87 95.34 72.92 87.50 96.35 94.47 94.97 97.99 69.25 76.15 74.90 99.09 99.43 99.55 70.08 74.48 78.50 76.96 79.61 82.20 0.743 0.774 0.803

MLGPA 91.98 86.58 84.36 64.02 65.56 35.32 92.63 86.80 98.99 91.93 95.98 85.56 99.55 83.02 88.03 0.867

To verify the classification performance of the various algorithms with different numbers of training samples, n_i (n_i = 5, 10, 15, 20, 25, 30) samples were randomly selected from each land-cover class of the KSC data set as training samples, and the rest were used as test samples. Table 4 shows the overall classification accuracy and its standard deviation for different numbers of training samples on the KSC data set. As seen from Table 4, the OA of each algorithm increases as the number of training samples grows. Among the single-manifold methods, the supervised manifold learning methods perform better than the unsupervised ones. In addition, under the different experimental conditions, MLGPA is superior to the other algorithms, especially when few training samples are available. The reason is that MLGPA divides the high-dimensional data into


Figure 12: Classification maps of different methods on the KSC data set. (a) RAW; (b) PCA; (c) NPE; (d) LPP; (e) LDA; (f) MMC; (g) MFA; (h) SMML; (i) MMDA; (j) M-LLGPE; (k) M-LPP; (l) MLGPA.



420

several sub-manifolds according to the multi-manifold structure of HSI and then reveals the intrinsic feature representation of HSI by learning on each sub-manifold separately. Therefore, the data within each manifold are aggregated as much as possible while the data from different manifolds are separated as far as possible, which effectively improves the classification performance and is more conducive to practical applications.

To further demonstrate the per-class classification performance of the MLGPA algorithm, we randomly selected 5% of the samples of the KSC data set for training, and the rest were used for testing. Table 5 shows the per-class accuracies together with the OA, AA, and kappa for the KSC data set, and Figure 12 shows the corresponding classification maps of the different algorithms. It can be seen from Table 5 that, compared with the other algorithms, MLGPA obtains the best classification results for most land-cover types, and its OA, AA, and kappa coefficient are the highest among all algorithms, which indicates that MLGPA can fully reveal the intrinsic properties of hyperspectral data and is more conducive to HSI classification. In addition, as shown in Figure 12, the classification map obtained by MLGPA is smoother, and classes such as "Cabbage palm", "Cabbage oak", and "Hardwood swamp" are more clearly distinguished.

4.5. Experiments on the Washington DC Mall Data Set

To further verify the reliability of MLGPA, we conducted experiments on the Washington DC Mall data set. Ten samples were randomly selected from each class of the Washington DC data set for training, and the remaining samples were used for testing. Figure 13 shows the classification results on the Washington DC data set with different m and n. According to Figure 13, as the number of intramanifold neighbors increases, the classification accuracy tends to increase gradually and then maintains a stable value, because too few intramanifold neighbors are not enough to reveal the intrinsic structure inside the manifold, and only a sufficient number of intramanifold neighbors can effectively represent the within-manifold discriminative information. Besides, the classification performance of MLGPA is also affected by the number of intermanifold neighbors; when m takes its optimal value and remains fixed, increasing the number of intermanifold neighbors leads to an improvement in classification accuracy. Therefore, to obtain the best classification results, we set m = 4 and n = 3 in the following experiments.

Figure 13: Classification results for Washington DC Mall data set with different m and n.

"&* "&*" "&) "&)" "&( "&(" 4,5 678, 697: 6;77 6;-, 6<<8 6<=, 6><<; 6<<-, 6
+,

"&' "&'" "& "& " "&% "&%" "&$ "&$" "&# "&#" !"

!

#"

-./012.31

#

$"

$

%"



Figure 14: Classification results with different dimensions on the Washington DC Mall data set.


To explore the influence of the embedding dimension on the classification performance, we randomly chose 20 samples from each land-cover class as training samples, and the remaining samples were used as test samples. Figure 14 shows the classification results with different dimensions on the Washington DC data set. As we can see, the OAs of most algorithms improve as the embedding dimension increases and then remain stable. Therefore, to reach a stable value for all algorithms as far as possible, we reduce the dimension of all algorithms except LDA to 30, and the dimension of LDA is c−1, where c is the number of land-cover types in the Washington DC data set.


To compare the classification performance of each DR algorithm under different numbers of training samples, we randomly selected n_i (n_i = 5, 10, 15, 20, 25, 30) samples per class from the Washington DC data set for training, and the rest were used for testing. The average OAs with standard deviations are given in Table 6.

Table 6: Classification results with different numbers of training samples on the Washington DC Mall data set (OA ± std (%), with kappa in parentheses).

ni = 5 ni = 10 ni = 15 ni = 20 ni = 25 ni = 30 82.09±3.48 82.85±2.84 84.31±2.61 84.97±1.97 86.07±1.53 86.58±1.48 RAW (0.778) (0.787) (0.805) (0.813) (0.827) (0.833) 82.09±3.48 82.85±2.84 84.30±2.62 84.96±1.98 86.06±1.53 86.56±1.48 PCA (0.778) (0.787) (0.805) (0.813) (0.827) (0.833) 61.81±5.29 72.40±4.07 72.72±4.61 73.84±3.86 78.79±4.39 80.38±3.40 NPE (0.534) (0.661) (0.665) (0.679) (0.738) (0.757) 62.33±5.32 72.87±5.08 72.89±4.28 73.21±4.27 77.60±4.58 78.04±4.28 LPP (0.537) (0.666) (0.666) (0.670) (0.723) (0.729) 64.07±5.19 76.11±3.86 76.20±4.06 76.21±4.17 79.92±3.88 80.41±3.63 LDA (0.561) (0.705) (0.706) (0.706) (0.751) (0.757) 81.73±3.38 82.38±2.83 83.66±2.90 84.65±2.00 85.76±1.56 86.23±1.54 MMC (0.773) (0.782) (0.798) (0.809) (0.823) (0.829) 82.48±3.49 83.82±2.61 85.96±2.27 86.29±2.09 87.48±1.47 87.94±1.45 MFA (0.783) (0.799) (0.825) (0.830) (0.844) (0.850) 81.91±3.46 82.67±2.87 83.91±2.73 84.65±1.98 85.64±1.52 86.13±1.55 SMML (0.776) (0.785) (0.801) (0.809) (0.821) (0.827) 81.05±2.23 87.17±2.66 88.23±1.79 88.71±1.93 89.46±1.24 90.17±1.27 MMDA (0.753) (0.843) (0.852) (0.861) (0.868) (0.877) 81.29±5.97 84.95±4.07 87.99±3.72 88.02±3.02 88.97±2.95 90.21±2.37 M-LLGPE (0.768) (0.814) (0.851) (0.851) (0.863) (0.878) 77.34±7.68 84.40±3.54 87.86±2.74 88.66±2.25 89.33±2.44 90.32±1.83 M-LPP (0.721) (0.807) (0.849) (0.859) (0.867) (0.879) 84.80±3.39 87.32±2.72 90.19±1.65 91.07±1.26 92.12±1.52 92.92±1.13 MLGPA (0.811) (0.842) (0.878) (0.888) (0.902) (0.911)

Table 7: Classification results of different DR methods on the Washington DC Mall data set. Class 1 2 3 4 5 6 AA OA Kappa

RAW 92.40 96.39 80.63 97.76 63.27 90.40 86.81 88.46 0.856

PCA 92.33 96.39 80.61 97.76 63.32 90.49 86.82 88.45 0.855

NPE 64.11 74.37 52.79 81.38 35.06 61.48 61.53 63.18 0.542

LPP 79.58 72.72 40.88 76.39 26.81 54.08 58.41 62.31 0.524

LDA 50.24 77.80 59.78 74.18 54.40 61.06 62.91 61.15 0.527

MMC 92.05 96.39 79.81 97.76 61.87 90.44 86.39 88.06 0.851

MFA 94.54 96.76 85.10 97.83 69.62 90.36 89.04 90.64 0.883

SMML 91.42 96.39 79.31 97.74 61.32 90.49 86.11 87.73 0.846

MMDA M-LLGPE M-LPP 90.72 83.94 93.49 97.19 96.45 97.68 87.98 97.07 94.73 94.45 97.61 98.78 83.84 69.79 65.50 90.06 88.96 69.09 90.71 88.97 86.55 90.74 89.91 89.76 0.884 0.874 0.871

MLGPA 96.91 97.19 88.55 97.93 86.85 89.98 92.90 93.63 0.920

It can be seen from Table 6 that the classification accuracy of the various algorithms increases as the number of training samples increases, because more training samples contain more prior information, which is more


Figure 15: Classification maps of different methods on the Washington DC data set. (a) RAW; (b) PCA; (c) NPE; (d) LPP; (e) LDA; (f) MMC; (g) MFA; (h) SMML; (i) MMDA; (j) M-LLGPE; (k) M-LPP; (l) MLGPA.



470

conducive to feature extraction. Among the single-manifold DR methods, the classification accuracies of LDA, MMC, and MFA are significantly better than those of PCA, NPE, and LPP, because the supervised algorithms exploit the prior label information of the training samples and therefore obtain more discriminative feature representations and improve the overall classification performance. Besides, the multi-manifold methods M-LLGPE, M-LPP, and MLGPA achieve higher classification accuracy than the single-manifold algorithms in every case, because multi-manifold algorithms are better able to reveal the intrinsic multi-manifold structure of HSI. At the same time, compared with the other multi-manifold algorithms, MLGPA achieves the highest classification accuracy, because it maximizes the manifold margins and separates the different manifolds through its intramanifold and intermanifold graphs, which more effectively reflects the differences between land-cover types.

To analyze the per-class classification performance of each algorithm, 1% of the samples were randomly selected for training, and the remaining samples were used for testing. The per-class accuracies, OA, AA, and kappa on the Washington DC data set are shown in Table 7, and Figure 15 shows the corresponding classification maps. A comprehensive comparison of Table 7 and Figure 15 shows that MLGPA achieves the highest classification accuracy in all cases and is more suitable for revealing the structures embedded in hyperspectral images.

4.6. Computational Complexity

To analyze the computational complexity of MLGPA, denote the numbers of intramanifold and intermanifold neighbors as m and n, respectively, and let n_r be the number of samples of the r-th class. For sub-manifold M_r, constructing the intramanifold weight matrix W_r^w and the intermanifold weight matrix W_r^b takes O(n_r m) and O(n_r n), respectively. The intramanifold scatter matrix S_r^w and the intermanifold scatter matrix H_r^b are computed in O(n_r m^2) and O(n^2 (n_r + n_s)). Solving the generalized eigenvalue problem of (21) takes O(D^3). Therefore, the overall computational complexity of MLGPA is O(D^3 + n^2 n_r + n^2 n_s + n_r m^2), and it mainly depends on the number of bands, the number of neighbors, and the number of samples per class.


Table 8: Computational time (in seconds) of different algorithms on Botswana, KSC and Washington DC Mall data sets. Data

475

480

485

490

495

LDA

MMC

MFA

Botswana

RAW

0.032 0.031 0.046 0.037 0.033

PCA

NPE

LPP

0.151

0.082

SMML MMDA M-LLGPE M-LPP MLGPA 0.050

0.252

5.362

5.243

3.586

KSC

0.086 0.075 0.094 0.079 0.077

0.409

0.182

0.099

0.771

9.016

11.487

5.012

Washington DC Mall 0.252 0.221 0.242 0.223 0.212

0.691

0.288

0.235

1.785

22.309

22.311

5.368

To quantitatively compare the complexity of the algorithms, Table 8 reports the computational time of each algorithm. All results were obtained on a personal computer with an Intel i3-7100 CPU and 12 GB of memory, running 64-bit Windows 10 and MATLAB 2017a. As shown in Table 8, the proposed MLGPA method is faster than M-LLGPE and M-LPP but slower than the other DR algorithms. However, the slight increase in computational cost is acceptable relative to the improvement in classification performance.

5. Conclusion

High-dimensional data can be divided into different subsets, with each subset located on a particular sub-manifold. In this paper, we proposed a multi-manifold locality graph preserving analysis (MLGPA) algorithm for HSI classification. The proposed MLGPA method divides the HSI data into different sub-manifolds based on the label information; a within-manifold graph and a between-manifold graph are then constructed for each sub-manifold to characterize the within-manifold compactness and the between-manifold separability. Finally, each sub-manifold seeks a discriminant projection matrix by simultaneously maximizing the between-manifold scatter and minimizing the within-manifold scatter, and the embedding features of the different sub-manifolds are fused to enhance the classification performance. Experimental results on the Botswana, KSC, and Washington DC hyperspectral data sets show that the proposed MLGPA algorithm can effectively characterize the intrinsic multi-manifold structure of HSI and achieves higher classification accuracy than other single-manifold and multi-manifold algorithms.

Acknowledgments

The authors would like to thank the anonymous reviewers for their comments on this paper. This work was supported by the Basic and Frontier Research Programmes of Chongqing under Grant cstc2018jcyjAX0093,


the Chongqing University Postgraduates Innovation Project under Grants CYB18048 and CYS18035, and the National Science Foundation of China under Grant 41371338.


Guangyao Shi received the M.S. degree in instrument science and technology from Chongqing University, Chongqing, China, in 2017. He is currently pursuing the Ph.D. degree in instrument science and technology with Chongqing University. His research interests include image processing, hyperspectral image classification, machine vision, and target tracking in general. E-mail: [email protected].

Hong Huang received the M.S. degree in pattern recognition in 2005 and the Ph.D. degree in instrument science and technology in 2008, both from Chongqing University, China. He is currently a professor at Chongqing University. His research interests include pattern analysis, manifold learning, hyperspectral remote sensing, and image acquisition and processing in general. E-mail: [email protected].

Zhengying Li received the B.S. degree in measurement and control technology and instrument from North University of China, Taiyuan, China, in 2016. He is currently pursuing the Ph.D. degree in instrument science and technology at Chongqing University. His research interests include machine learning, image processing, and remote sensing classification in general. E-mail: zhengying [email protected].


Yule Duan received the B.S. degree in measuring and testing technology and instruments from Tianjin Polytechnic University, Tianjin, China, in 2016. He is currently pursuing the Ph.D. degree in instrument science and technology at Chongqing University. His research interests include pattern recognition, machine learning, and image processing in general. E-mail: [email protected].



Declaration of Interest Statement

We choose within-manifold and between-manifold neighbors to construct a within-manifold graph and a between-manifold graph for each sub-manifold, and each sub-manifold then seeks a discriminant projection matrix by maximizing the between-manifold scatter and minimizing the within-manifold scatter simultaneously.

Conflict of interest

The authors declare that they have no conflict of interest regarding this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
