Hyperspectral image classification using multi-feature fusion
Fang Li a,b, Jie Wang a, Rushi Lan a,⁎, Zhenbing Liu a, Xiaonan Luo a

a Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Image and Graphics, Guilin University of Electronic Technology, Guilin 541004, China
b Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Highlights
• A multi-feature fusion framework is proposed.
• A feature representation is developed for HSI using the proposed framework.
• State-of-the-art results have been achieved by the proposed method.
Keywords: Hyperspectral image classification; Spectral-spatial feature learning; Local binary pattern; Feature fusion; Kernel extreme learning machine

Abstract
Traditional hyperspectral image (HSI) classification methods typically use only the spectral features and do not make full use of the spatial or other features of the HSI. To address this problem, this paper proposes a novel HSI classification method based on a multi-feature fusion strategy. The spectral-spatial features are first extracted by spectral-spatial feature learning (SSFL), a deep hierarchical architecture. Additionally, texture features computed from the local binary pattern (LBP) image are fused with the spectral-spatial features. Then, the kernel extreme learning machine (KELM) is used to classify the hyperspectral images. The results of a number of experiments show that the proposed method effectively improves the classification accuracy of hyperspectral images.
1. Introduction

Hyperspectral remote sensing was one of the major breakthroughs in Earth observation technology at the end of the 20th century and has been a research hotspot in recent years. Compared with traditional multispectral remote sensing images, hyperspectral images (HSIs) provide a large amount of information and high spectral resolution and can more accurately describe and analyze the spectral information of land-cover types. HSI classification has been widely used in the fields of surveying, archeology, precision agriculture, biomedicine, environmental and disaster monitoring, and food detection. A large number of methods have been proposed to solve the HSI classification problem. In view of the high dimensionality of hyperspectral images, dimension reduction is an effective way to improve the accuracy of hyperspectral image classification.
Linear discriminant analysis [1,2] and principal component analysis [3,4] are effective methods for dimension reduction. Hyperspectral image classification methods can be divided mainly into supervised and unsupervised ones [5–8]. The support vector machine (SVM) has been widely used in hyperspectral image classification [8]; however, choosing an appropriate kernel function and selecting its parameters are difficult for SVM. Recently, the kernel-based extreme learning machine (KELM) has been applied successfully to HSI classification [9]. Compared with SVM, KELM is computationally efficient and has good classification performance [10,11].

Hyperspectral remote sensing image data are three-dimensional and contain a wealth of spectral and spatial information, and spectral-spatial classification techniques have therefore received considerable attention [12–15]. Early studies focused only on the spectral characteristics, and the classification results were not satisfactory. Recent studies have shown that making full use of the spatial features can effectively improve the classification results. Zhou et al. [16] proposed a spatial and spectral regularized local discriminant embedding method for dimension reduction of hyperspectral data. Sun et al. [17] developed a new approach for hyperspectral image classification exploiting spectral-spatial information. Xia et al. [18] proposed a new spectral-spatial classification strategy that enhances the classification performance on hyperspectral images by integrating rotation forests and Markov random fields (MRFs). Fang et al. [20] proposed a multiscale adaptive sparse representation framework for spectral-spatial classification. Kang et al. [21] proposed spectral-spatial hyperspectral image classification with edge-preserving filtering, which combines spectral and spatial features to classify hyperspectral images. Li et al. [7,19] proposed a multiple feature learning method for hyperspectral image classification based on the integration of different types of (linear and nonlinear) features. Recently, Zhou et al. [22] proposed a spectral-spatial network, a deep hierarchical model, for HSI classification.

Based on the above discussion, this paper proposes a hyperspectral image classification algorithm based on multi-feature fusion, in which the spectral-spatial features and the texture features of hyperspectral images are effectively combined. First, the spectral-spatial features and the texture features are extracted from the hyperspectral image separately. Next, these features are fused, and the fused features are used as the input of the kernel-based extreme learning machine (KELM), which performs the classification. Several experiments demonstrate the effectiveness of the proposed method.
☆ This work was partially supported by the National Natural Science Foundation of China (Nos. 61702129, 61772149, U1701267, and 61320106008), the Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Image and Graphics (No. GIIP201606), and the Guangxi Key Laboratory of Trusted Software (No. kx201628).
⁎ Corresponding author. E-mail address: [email protected] (R. Lan).
Fig. 1. Extraction of LBP features from hyperspectral images.
Fig. 2. Flowchart of the proposed classification method.
Table 1
Numbers of samples in each ground-truth class in the Indian Pines dataset.

| Class No. | Class Name | Samples |
|---|---|---|
| 1 | alfalfa | 54 |
| 2 | corn-notill | 1434 |
| 3 | corn-mintill | 834 |
| 4 | corn | 234 |
| 5 | grass-pasture | 497 |
| 6 | grass-trees | 747 |
| 7 | grass-pasture-mowed | 26 |
| 8 | hay-windrowed | 489 |
| 9 | oats | 20 |
| 10 | soybean-notill | 968 |
| 11 | soybean-mintill | 2468 |
| 12 | soybean-clean | 614 |
| 13 | wheat | 212 |
| 14 | woods | 1294 |
| 15 | building-grass-trees-drives | 380 |
| 16 | stone-steel-towers | 95 |
| Total | | 10,366 |
Table 2
Numbers of samples in each ground-truth class in the University of Pavia dataset.

| Class No. | Class Name | Samples |
|---|---|---|
| 1 | asphalt | 6852 |
| 2 | meadows | 18,686 |
| 3 | gravel | 2207 |
| 4 | trees | 3436 |
| 5 | painted metal sheets | 1378 |
| 6 | bare soil | 2104 |
| 7 | bitumen | 1356 |
| 8 | self-blocking bricks | 3878 |
| 9 | shadows | 1026 |
| Total | | 43,923 |

Fig. 3. OAs of different methods with different numbers of training samples on the Indian Pines dataset.
The rest of this paper is organized as follows. Section 2 briefly reviews some related works. In Section 3, we present the proposed method in detail. In Section 4, the experimental results are provided to evaluate the proposed method. Finally, Section 5 presents the conclusions.
2. Related works
In this paper, the spatial-spectral features are extracted by spectral-spatial feature learning (SSFL), which is a deep hierarchical architecture. Deep learning is a complex machine learning methodology that achieves much better results in speech and image recognition than previous techniques. It has produced notable achievements in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech processing, recommendation and personalization, and other related fields [23–27]. Deep learning makes machines mimic human activities such as seeing, hearing, and thinking, solves many complex pattern recognition problems, and has driven great progress in artificial-intelligence-related technology.

Deep learning automatically learns feature representations. Suppose we have several instances of an input I (such as images or text) and that we have designed a system S with n layers whose output, obtained by adjusting the system parameters, reproduces the input I. Then we can automatically obtain a series of hierarchical features of the input I, namely S_1, …, S_n. The idea of deep learning is thus to stack multiple layers, that is, the output of one layer is used as the input of the next layer. In this way, the input information can be represented hierarchically.
Table 3
The classification results with different weights.

| Weight (α, β) | Indian Pines | Pavia |
|---|---|---|
| (0.9, 0.1) | 0.8533 | 0.9865 |
| (0.8, 0.2) | 0.8899 | 0.9798 |
| (0.7, 0.3) | 0.8660 | 0.9686 |
| (0.6, 0.4) | 0.8471 | 0.9573 |
| (0.5, 0.5) | 0.8361 | 0.9447 |
| (0.4, 0.6) | 0.8275 | 0.9377 |
| (0.3, 0.7) | 0.8260 | 0.9327 |
| (0.2, 0.8) | 0.8238 | 0.9278 |
| (0.1, 0.9) | 0.8242 | 0.9255 |
Table 4
OA of different methods on the Indian Pines dataset with different proportions of training samples.

| Method | 1% | 2% | 3% | 4% | 5% |
|---|---|---|---|---|---|
| KELM | 58.52 ± 3.30 | 66.08 ± 1.03 | 71.35 ± 1.15 | 74.18 ± 0.87 | 75.87 ± 0.89 |
| LBP-ELM | 80.26 ± 2.35 | 87.92 ± 0.95 | 90.59 ± 0.78 | 91.69 ± 0.68 | 92.25 ± 0.39 |
| EPF | 71.75 ± 2.58 | 76.34 ± 7.38 | 83.51 ± 2.67 | 86.22 ± 1.61 | 88.46 ± 1.16 |
| MPM-LBP | 77.16 ± 3.46 | 84.29 ± 1.84 | 88.19 ± 1.34 | 90.82 ± 1.47 | 91.60 ± 0.98 |
| SADL | 78.95 ± 1.09 | 88.19 ± 1.49 | 90.74 ± 0.97 | 92.95 ± 0.66 | 94.47 ± 1.12 |
| SSN | 84.70 ± 2.28 | 91.33 ± 0.97 | 93.96 ± 1.33 | 95.59 ± 0.61 | 97.02 ± 0.36 |
| SST1 | 85.84 ± 2.52 | 92.47 ± 0.57 | 94.98 ± 0.91 | 96.01 ± 0.72 | 97.06 ± 0.47 |
| SST2 | 86.61 ± 1.00 | 93.65 ± 1.15 | 95.29 ± 0.82 | 96.53 ± 0.57 | 97.33 ± 0.42 |

The results of the proposed methods are highlighted in bold.
Fig. 4. Classification maps on Indian Pines data set using different methods.
3. Multi-feature fusion classification algorithm

3.1. Image preprocessing

In the data preprocessing, we first normalize the image data. I_{max} and I_{min} represent the maximum and minimum pixel values of the image, respectively. The normalization is conducted as follows:

I_{mn} = \frac{I_{mn} - I_{min}}{I_{max} - I_{min}},    (1)

where I_{mn} represents the pixel in the mth row and nth column of the image.
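For concreteness, the min-max normalization of Eq. (1) can be sketched in a few lines of NumPy (the cube layout and function name are illustrative, not from the paper):

```python
import numpy as np

def minmax_normalize(cube):
    """Scale a hyperspectral cube (rows x cols x bands) to [0, 1] per Eq. (1)."""
    i_min, i_max = cube.min(), cube.max()
    return (cube - i_min) / (i_max - i_min)
```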
3.2. Spatial-spectral feature extraction

In this work, the spatial-spectral features are extracted by spectral-spatial feature learning (SSFL) [22], which comprises spectral feature extraction followed by spatial feature extraction. The output of each layer is used as the input of the next layer, and multiple rounds of SSFL are stacked to form a deep structural model with multiple layers; the extracted spatial-spectral features constitute the final feature representation. The spectral characteristics of hyperspectral images are obtained by linear discriminant analysis (LDA) [28], and adaptive weighted filters (AWFs) are used to extract the spatial information.

LDA is a dimensionality reduction technique for supervised learning. Traditional LDA projects the data onto a lower-dimensional vector space such that the ratio of the between-class distance to the within-class distance is maximized, thus achieving maximum discrimination. Therefore, when applied to hyperspectral image data, the smallest within-class distance and the largest between-class distance are obtained. Let S_b and S_w denote the between-class and within-class scatter matrices, respectively, let the training set be x_r = {x_1, x_2, …, x_N}, and let W_{spe} be the projection matrix. N_j (j = 1, 2, …, k) is the number of samples in the jth class, and u_j (j = 1, 2, …, k) is the mean vector of the jth class. The projection is obtained as follows:

\max_{W_{spe}^{T} W_{spe} = 1} \frac{W_{spe}^{T} S_b W_{spe}}{W_{spe}^{T} S_w W_{spe}},    (2)

where S_b = \sum_{j=1}^{k} N_j (u_j - u)(u_j - u)^T, S_w = \sum_{j=1}^{k} \sum_{x \in x_r} (x - u_j)(x - u_j)^T, u is the mean vector of all samples, and x is a sample in the training set x_r. To extract the spectral features from the hyperspectral image, we sort the resulting eigenvalues from largest to smallest to build the projection vectors.
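A minimal sketch of the LDA projection of Eq. (2), solved through the equivalent generalized eigenproblem (our own implementation outline; the paper provides no code):

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Solve Eq. (2): maximize W^T S_b W / (W^T S_w W) via the
    generalized eigenproblem S_w^{-1} S_b w = lambda w."""
    n_features = X.shape[1]
    u = X.mean(axis=0)                        # global mean vector u
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        u_c = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(u_c - u, u_c - u)
        S_w += (Xc - u_c).T @ (Xc - u_c)
    vals, vecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-vals.real)            # eigenvalues, largest first
    return vecs[:, order[:n_components]].real  # projection matrix W_spe
```

The spectral feature of each pixel is then its spectrum multiplied by the returned projection matrix.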
For the output of the spectral features, the spatial features are exploited using adaptive weighted filters (AWFs). An AWF is a block spatial filter in which the feature of the center pixel is determined by the weights of its neighbors. The adaptive weights are defined by

W_{spa}^{i,j} = \frac{S_{i,j}}{\sum_{1}^{m \times m} S_{i,j}},    (3)

where m × m is the size of the filter and S_{i,j} is a similarity measure. Through adaptive weighted filtering, pixels in the same class have similar features; thus, we obtain the spatial characteristics of the hyperspectral image.
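Since the paper does not specify the similarity measure S_{i,j}, the following sketch assumes a Gaussian similarity between the center and neighbor features; the window size m and the bandwidth sigma are illustrative:

```python
import numpy as np

def adaptive_weighted_filter(feat, m=3, sigma=1.0):
    """Adaptive weighted filtering per Eq. (3) on a rows x cols x d feature map.
    S_{i,j} is assumed to be a Gaussian similarity between the center pixel's
    features and each neighbor's features (an assumption, not from the paper)."""
    r = m // 2
    pad = np.pad(feat, ((r, r), (r, r), (0, 0)), mode='edge')
    out = np.empty_like(feat)
    rows, cols, d = feat.shape
    for i in range(rows):
        for j in range(cols):
            win = pad[i:i + m, j:j + m].reshape(-1, d)
            center = feat[i, j]
            s = np.exp(-np.sum((win - center) ** 2, axis=1) / (2 * sigma ** 2))
            w = s / s.sum()          # Eq. (3): weights normalized over the window
            out[i, j] = w @ win      # weighted average of the neighborhood
    return out
```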
3.3. Texture feature extraction of HSI

3.3.1. Texture feature extraction using the local binary pattern

Hyperspectral images have many bands that contain redundant information, so we use PCA to reduce the dimensionality of the hyperspectral data before feature extraction. The local binary pattern (LBP) is an operator used to describe the local texture features of an image [29]. LBP has the distinct advantages of being rotation-invariant and grayscale-invariant, and several algorithms have been proposed to improve its performance [30–33]. The basic principle of LBP is to compare the center pixel of a selected image region with the pixels in its local neighborhood, using the center value as a threshold, to obtain a binary code that describes the local texture. When a neighbor's value is less than the threshold value, that neighbor is assigned the value "0"; otherwise, it is assigned "1". The binary code is accumulated bit by bit in the clockwise direction, and the resulting decimal value is the LBP value of the center pixel. The LBP is described as follows:

LBP_{M,N} = \sum_{i=0}^{M-1} s(g_i - g_k) 2^i, \quad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0, \end{cases}    (4)

where M represents the number of sample points and N represents the radius of the neighborhood. g_k is the grayscale value of the center pixel, and g_i (i = 0, 1, …, M-1) is the grayscale value of the ith pixel in its local neighborhood.

We use the idea of segmentation to extract the texture features of the image. An LBP histogram extracted directly from the whole hyperspectral image describes its texture only globally, so the details of the image are not fully reflected. Considering this deficiency, each hyperspectral image is first divided into several small patches; then, the LBP of each patch is extracted to obtain its LBP histogram. The feature of each hyperspectral image is the concatenation of the LBP histograms of all of its patches. The feature extraction is visually shown in Fig. 1.
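A sketch of the patch-wise LBP histogram extraction of Eq. (4) using scikit-image; the patch size and the number of histogram bins are illustrative choices, not values from the paper:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def patchwise_lbp_features(band, n_points=8, radius=1, patch=32, n_bins=256):
    """Compute LBP codes on a 2-D band (e.g., one principal component),
    then concatenate normalized per-patch histograms as in Fig. 1."""
    codes = local_binary_pattern(band, n_points, radius, method='default')
    feats = []
    for i in range(0, band.shape[0] - patch + 1, patch):
        for j in range(0, band.shape[1] - patch + 1, patch):
            hist, _ = np.histogram(codes[i:i + patch, j:j + patch],
                                   bins=n_bins, range=(0, n_bins))
            feats.append(hist / hist.sum())   # normalized patch histogram
    return np.concatenate(feats)
```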
The dimensionality of the features extracted by the local binary pattern is relatively high, and the computational complexity is high when these features are used directly. Therefore, we reduce the dimensionality of the features by principal component analysis (PCA). The main reason for choosing PCA is that the principal components capture most of the variation in the spatial planes of the hyperspectral image cube. PCA reduces the dimensionality of the dataset and extracts the main components of the information represented by the data. The texture features are reduced by PCA in the spectral dimension, and the compressed spectral dimension is S. Some of the spectral information is lost in this step, but the texture information of the hyperspectral image is not affected.
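As an illustration, the spectral-dimension reduction to S components can be written with scikit-learn's PCA (the function name, shapes, and the value of S below are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube, S=10):
    """Reduce the spectral dimension of a rows x cols x bands cube to S
    principal components, keeping the spatial layout intact."""
    rows, cols, bands = cube.shape
    flat = cube.reshape(-1, bands)                 # one spectrum per pixel
    reduced = PCA(n_components=S).fit_transform(flat)
    return reduced.reshape(rows, cols, S)
```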
3.4. Feature fusion

Hyperspectral images not only contain abundant spectral and spatial information but also have rich texture information, and using only the spectral or the spatial information causes information loss. Therefore, we extract both the spectral-spatial and the texture features, and we name the proposed feature the spectral-spatial-texture (SST) feature. These features must be fused after the spectral, spatial, and texture features of the hyperspectral images have been extracted. A method of data fusion is presented in Ref. [34]; common approaches to feature fusion are stack fusion and compound function fusion [35]. To simplify the treatment, this paper first chooses the simplest method, namely direct combination; we call this feature fusion method SST1. Suppose the spectral-spatial features are x_{spec-spat} and the texture features are x_{tex}. Then, the directly combined features can be represented as follows:

R_1 = [x_{spec-spat}, x_{tex}].    (5)
The fused features can also be obtained by weighted fusion; this method is named SST2. Let the weights satisfy α + β = 1. Then, the features after fusion can be expressed as

R_2 = α x_{spec-spat} + β x_{tex}.    (6)
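The two fusion rules of Eqs. (5) and (6) are straightforward to express in code; the weighted variant assumes the two feature vectors have been brought to a common dimensionality, and the default weights follow the best values reported in Section 4:

```python
import numpy as np

def fuse_sst1(x_spec_spat, x_tex):
    """SST1, Eq. (5): direct concatenation of the two feature vectors."""
    return np.concatenate([x_spec_spat, x_tex], axis=-1)

def fuse_sst2(x_spec_spat, x_tex, alpha=0.9, beta=0.1):
    """SST2, Eq. (6): weighted fusion with alpha + beta = 1; assumes both
    feature vectors have the same dimensionality."""
    return alpha * x_spec_spat + beta * x_tex
```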
3.5. Classification based on the kernel extreme learning machine

The fused features are used as input vectors to train the kernel extreme learning machine classifier, and the test samples are then classified by the trained classifier. The extreme learning machine (ELM) [9] is a model based on a single-hidden-layer feedforward neural network (SLFN). Traditional feedforward neural networks are widely used, but training them is time consuming and sensitive to the selection of the learning rate. In the ELM algorithm, the connection weights between the input layer and the hidden layer and the thresholds of the hidden-layer neurons are randomly generated and are not changed during training; a good solution can be obtained simply by setting the number of hidden-layer neurons. The ELM algorithm therefore has the advantages of fast learning and good performance compared with traditional training methods. Kernel-based ELM (KELM) uses a kernel function to improve the stability of ELM [9].

Let N = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m} denote the training set, and let g(x) denote the activation function of the hidden-layer neurons. L represents the number of hidden-layer nodes, W and b represent the randomly initialized input weights and hidden-layer biases, respectively, H is the output matrix of the hidden layer, and β is the output weight of the hidden layer. Then, the minimum-norm least-squares solution \hat{β} of β is given as

\hat{β} = H^T \left( \frac{1}{C} + H H^T \right)^{-1} T.    (7)

The output equation of KELM is given by
f(x_i) = h(x_i) H^T \left( \frac{1}{C} + H H^T \right)^{-1} T,    (8)

where h(x_i) = [g(a_1 \cdot x_i + b_1), …, g(a_L \cdot x_i + b_L)] is the hidden-layer output for x_i. Then, H can be represented by

H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix}_{N \times L}.    (9)

As in SVM, any function that satisfies Mercer's condition can be used as the kernel function of the extreme learning machine to obtain the KELM. Commonly used kernel functions are linear, polynomial, and Gaussian functions. In our paper, h(x)H^T and HH^T are replaced by the kernel K(u, v); then, f(x) can be defined as

f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^T \left( \frac{1}{C} + \Omega_{ELM} \right)^{-1} T,    (10)

where \Omega_{ELM} is the kernel matrix of the training samples and can be expressed as

\Omega_{ELM} = [K(x_i, x_j)], \quad i, j = 1, 2, …, N.    (11)
The process of hyperspectral image classification based on multi-feature fusion is shown in Fig. 2.
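A compact sketch of KELM training and prediction following Eqs. (7)–(11), with a Gaussian kernel; the regularization value C and the kernel parameter gamma are illustrative, and the 1/C term is read as I/C (a common interpretation, stated here as an assumption):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kelm_train(X, T, C=100.0, gamma=1.0):
    """Solve alpha = (I/C + Omega)^(-1) T per Eqs. (7)/(10);
    T is the one-hot label matrix of the training samples."""
    omega = rbf_kernel(X, X, gamma)               # Omega_ELM of Eq. (11)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kelm_predict(X_train, alpha, X_test, gamma=1.0):
    """Evaluate f(x) of Eq. (10) and return the winning class index."""
    return (rbf_kernel(X_test, X_train, gamma) @ alpha).argmax(axis=1)
```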
Fig. 5. OAs of different methods with various numbers of training samples on the University of Pavia dataset.
4. Experimental results

4.1. Datasets and experimental settings

In our paper, the Indian Pines and University of Pavia datasets are used to verify the effectiveness of the proposed algorithm. The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 [36]. The image size is 145 × 145 pixels, and there are 16 different land-cover classes and 220 bands covering the 0.4–2.45 µm region of the visible and infrared spectrum. In this paper, 202 bands are used after removing substantially noisy bands. The total number of labeled samples in this dataset is 10,366. Table 1 shows the name and the number of samples of each class.

The University of Pavia dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) [37]. The size of the images is 610 × 340 pixels. The data contain 9 ground-truth classes and 115 bands with 1.3-m spatial resolution. After removing 12 noisy bands, the remaining 103 spectral bands are used for classification. The name and the number of samples of each class are shown in Table 2.

To avoid contingency in the classification, we randomly choose m% (m = 1, 2, 3, 4, 5) of the labeled samples from each class for training, and the remaining samples are used for testing. The filter window size of the adaptive weighted filter (AWF) is chosen from {3, 5, …, 11}. The classification performance is assessed on the test set in terms of the overall accuracy (OA), the average accuracy (AA), and the kappa coefficient κ. The OA is the proportion of correctly classified samples to the total number of classified samples; the AA is the average of the per-class classification accuracies; and κ is a robust measure of the degree of agreement. All the experiments are conducted using MATLAB R2014a on an Intel i5-3230M 2.60 GHz machine with 8 GB RAM. The formulas for OA and AA are

OA = \sum_{u=1}^{T} CM_{uu} / n,    (12)

AA = \frac{1}{T} \sum_{u=1}^{T} \frac{CM_{uu}}{\sum_{v=1}^{T} CM_{uv}},    (13)

where T is the total number of categories, CM_{uu} is the uth diagonal element of the confusion matrix (the number of samples of the uth category classified as the uth category), and n is the total number of classified samples. The formula for κ is

κ = \frac{P_A - P_e}{1 - P_e},    (14)

where P_A is the observed agreement between the two observations, which in remote sensing image classification is the overall classification accuracy, and P_e is the expected chance agreement between the two observations.

Table 3 shows the results of feature fusion classification with different weights. We choose the best weight that can be applied to the two datasets simultaneously; the experimental results show that our approach achieves the best performance when α = 0.9 and β = 0.1.
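The three metrics of Eqs. (12)–(14) can be computed from a confusion matrix as follows (a sketch; rows are assumed to index the true classes):

```python
import numpy as np

def classification_metrics(cm):
    """OA, AA and kappa from a T x T confusion matrix, per Eqs. (12)-(14)."""
    n = cm.sum()
    oa = np.trace(cm) / n                              # Eq. (12)
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))         # Eq. (13)
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2    # expected chance agreement
    kappa = (oa - pe) / (1 - pe)                       # Eq. (14)
    return oa, aa, kappa
```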
Fig. 6. Classification maps on University of Pavia data set using different methods.
Table 5
OA (%), AA (%) and κ of different methods for the Indian Pines dataset.

| Class | KELM | LBP-ELM | EPF | MPM-LBP | SADL | SSN | SST1 | SST2 |
|---|---|---|---|---|---|---|---|---|
| Alfalfa | 10.57 | 83.02 | 28.04 | 36.04 | 69.81 | 81.32 | 100 | 94.34 |
| Corn-notill | 44.88 | 87.46 | 64.71 | 71.42 | 68.73 | 80.51 | 86.89 | 87.24 |
| Corn-mintill | 39.94 | 72.48 | 75.69 | 54.16 | 65.58 | 79.75 | 87.15 | 80.36 |
| Corn | 15.50 | 8.39 | 49.84 | 42.30 | 62.81 | 68.57 | 97.84 | 89.03 |
| Grass-pasture | 57.62 | 64.02 | 96.28 | 78.23 | 82.34 | 79.25 | 81.71 | 91.46 |
| Grass-trees | 82.95 | 75.10 | 73.47 | 96.81 | 93.84 | 95.18 | 87.69 | 88.63 |
| Grass-pasture-mowed | 59.20 | 100 | 47.86 | 52.40 | 100 | 95.20 | 96.00 | 100 |
| Hay-windrowed | 79.36 | 96.07 | 89.39 | 95.56 | 97.48 | 93.08 | 100 | 98.23 |
| Oats | 39.47 | 78.95 | 20.32 | 58.42 | 81.05 | 89.47 | 100 | 100 |
| Soybean-mintill | 45.22 | 78.60 | 77.40 | 64.21 | 74.11 | 77.85 | 86.33 | 80.08 |
| Soybean-notill | 67.26 | 81.78 | 66.69 | 86.99 | 81.34 | 85.62 | 89.64 | 96.81 |
| Soybean-clean | 32.19 | 65.07 | 52.19 | 63.38 | 57.71 | 69.95 | 83.69 | 85.01 |
| Wheat | 90.81 | 89.47 | 97.26 | 99.23 | 98.66 | 96.36 | 88.99 | 99.04 |
| Woods | 85.98 | 99.53 | 86.60 | 97.42 | 93.71 | 96.28 | 98.67 | 97.11 |
| Buildings-Grass-Trees-Drives | 25.27 | 76.33 | 72.26 | 40.08 | 68.70 | 83.16 | 76.33 | 79.26 |
| Stone-Steel-Towers | 34.57 | 79.79 | 20.00 | 19.15 | 79.57 | 96.06 | 80.85 | 95.23 |
| OA | 58.52 ± 3.30 | 80.26 ± 2.35 | 71.25 ± 2.58 | 77.16 ± 3.46 | 78.95 ± 1.09 | 84.70 ± 2.28 | 85.84 ± 2.52 | 86.61 ± 1.00 |
| AA | 50.67 ± 2.46 | 81.92 ± 3.15 | 62.35 ± 7.46 | 65.97 ± 5.18 | 79.72 ± 2.78 | 85.48 ± 2.10 | 86.07 ± 2.49 | 86.01 ± 1.93 |
| κ | 0.5235 ± 0.0368 | 0.8035 ± 0.0315 | 0.6709 ± 0.0311 | 0.7364 ± 0.0405 | 0.7598 ± 0.0127 | 0.8257 ± 0.0255 | 0.8387 ± 0.0287 | 0.8470 ± 0.0115 |

The results of the proposed methods are highlighted in bold.

Table 6
OA of different methods on the Pavia dataset with different proportions of training samples.

| Method | 1% | 2% | 3% | 4% | 5% |
|---|---|---|---|---|---|
| KELM | 89.08 ± 0.72 | 90.76 ± 0.24 | 91.92 ± 0.33 | 92.43 ± 0.18 | 92.74 ± 0.16 |
| LBP-ELM | 91.55 ± 0.63 | 94.57 ± 0.52 | 96.05 ± 0.28 | 97.05 ± 0.23 | 97.54 ± 0.12 |
| EPF | 95.06 ± 1.16 | 96.72 ± 0.47 | 97.39 ± 0.34 | 97.57 ± 0.19 | 97.74 ± 0.27 |
| MPM-LBP | 95.42 ± 0.62 | 96.95 ± 0.38 | 97.48 ± 0.39 | 97.74 ± 0.30 | 97.96 ± 0.30 |
| SADL | 93.28 ± 0.50 | 96.03 ± 0.31 | 97.32 ± 0.35 | 98.39 ± 0.13 | 98.76 ± 0.13 |
| SSN | 97.53 ± 0.20 | 98.33 ± 0.26 | 98.95 ± 0.11 | 99.22 ± 0.12 | 99.36 ± 0.11 |
| SST1 | 98.20 ± 0.41 | 99.04 ± 0.26 | 99.34 ± 0.14 | 99.57 ± 0.05 | 99.67 ± 0.06 |
| SST2 | 98.57 ± 0.44 | 99.28 ± 0.18 | 99.60 ± 0.08 | 99.64 ± 0.06 | 99.70 ± 0.03 |

The results of the proposed methods are highlighted in bold.

Table 7
OA (%), AA (%) and κ of different methods for the Pavia dataset.

| Class | KELM | LBP-ELM | EPF | MPM-LBP | SADL | SSN | SST1 | SST2 |
|---|---|---|---|---|---|---|---|---|
| Asphalt | 85.67 | 85.01 | 93.05 | 97.38 | 92.66 | 98.79 | 97.78 | 96.48 |
| Meadows | 97.50 | 99.55 | 95.37 | 99.45 | 98.92 | 99.85 | 99.93 | 99.90 |
| Gravel | 66.83 | 88.98 | 96.87 | 79.02 | 74.65 | 87.10 | 98.75 | 96.24 |
| Trees | 85.34 | 51.63 | 99.46 | 90.62 | 93.50 | 93.99 | 90.64 | 95.45 |
| Painted metal sheets | 97.69 | 100 | 98.09 | 97.52 | 99.38 | 99.61 | 100 | 100 |
| Bare Soil | 78.14 | 99.62 | 98.52 | 93.37 | 94.57 | 98.34 | 98.34 | 99.76 |
| Bitumen | 76.19 | 96.20 | 99.54 | 82.95 | 77.26 | 92.62 | 97.19 | 100 |
| Self-Blocking Bricks | 84.25 | 92.84 | 87.63 | 91.52 | 77.73 | 93.97 | 98.85 | 99.23 |
| Shadows | 96.96 | 60.41 | 96.88 | 98.94 | 99.14 | 94.20 | 81.54 | 92.10 |
| OA | 89.08 ± 0.72 | 91.07 ± 0.57 | 95.06 ± 1.16 | 95.42 ± 0.62 | 93.28 ± 0.50 | 97.53 ± 0.20 | 98.20 ± 0.41 | 98.57 ± 0.44 |
| AA | 85.40 ± 1.75 | 83.58 ± 1.33 | 96.16 ± 0.75 | 92.31 ± 1.06 | 89.76 ± 0.92 | 95.83 ± 0.42 | 96.29 ± 0.71 | 97.49 ± 0.75 |
| κ | 0.8553 ± 0.0101 | 0.8813 ± 0.0177 | 0.9345 ± 0.0157 | 0.9395 ± 0.0082 | 0.9116 ± 0.0067 | 0.9675 ± 0.0026 | 0.9762 ± 0.0054 | 0.9810 ± 0.0058 |

The results of the proposed methods are highlighted in bold.
4.2. Experimental results on the Indian Pines dataset
Table 4 and Fig. 3 show the OA of the different methods with different numbers of training samples on the Indian Pines dataset. In the comparison experiments, the kernel-based ELM (KELM) was used as a baseline; in recent years, kernel-based ELM methods have been widely used and have achieved good results in the classification of hyperspectral images. Each reported result is the average over ten runs. Our method is compared with SSN [22], SADL [38], MPM-LBP [39], EPF [21], LBP-ELM [40], and KELM [9]. The classification maps for the Indian Pines dataset are shown in Fig. 4, which shows that the methods based on multi-feature fusion perform better than those based only on spectral or spatial features. Table 5 reports the OA, AA, κ, and per-class classification accuracy when 1% of the labeled samples of each class are randomly chosen for training. The experimental results show the effectiveness of our method.

4.3. Experimental results on the University of Pavia dataset

The experimental results are shown in Table 6 and Fig. 5. They show that our method (SST) performs best when the training set is limited. Fig. 6 shows the classification maps of the different classifiers; the maps generated by SST are less noisy and more accurate than those generated by the other methods. The per-class classification accuracy, OA, AA, and κ are reported in Table 7, where 1% of the labeled samples of each class are randomly chosen for training.

According to the results of the above experiments, the multi-feature fusion method proposed in this paper improves the classification results of hyperspectral images and is superior to classification using spectral or spatial features alone. Good classification performance is still obtained when only a small number of labeled samples is used.
5. Conclusions

In this paper, we proposed a classification algorithm that fuses multiple features of hyperspectral images for land-cover classification. Experiments were conducted to evaluate the performance of the proposed method on two benchmark hyperspectral image datasets. The developed method greatly improves the accuracy compared with traditional classification using only spectral features or a single spatial feature. In addition, the speed of our method is also improved. In the future, we will focus on developing more ways to fuse different types of features for HSI classification.
References
[1] A. Berge, A.C. Jensen, A.H.S. Solberg, Sparse inverse covariance estimates for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 45 (5) (2007) 1399–1407.
[2] T.V. Bandos, L. Bruzzone, G. Camps-Valls, Classification of hyperspectral images with regularized linear discriminant analysis, IEEE Trans. Geosci. Remote Sens. 47 (3) (2009) 862–873.
[3] A. Agarwal, T. El-Ghazawi, H. El-Askary, et al., Efficient hierarchical-PCA dimension reduction for hyperspectral imagery, 2007 IEEE International Symposium on Signal Processing and Information Technology, IEEE, 2007, pp. 353–356.
[4] M. Fauvel, J. Chanussot, J.A. Benediktsson, Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas, EURASIP J. Adv. Signal Process. 2009 (1) (2009) 783194.
[5] L.Y. Chang, N.J. Liu, C.C. Han, et al., Hyperspectral image classification using nearest feature line embedding approach, IEEE Trans. Geosci. Remote Sens. 52 (1) (2014) 278–287.
[6] Q. Shi, L. Zhang, B. Du, Semisupervised discriminative locally enhanced alignment for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 51 (9) (2013) 4800–4815.
[7] J. Li, X. Huang, P. Gamba, et al., Multiple feature learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (3) (2015) 1592–1606.
[8] F. Melgani, L. Bruzzone, Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens. 42 (8) (2004) 1778–1790.
[9] G.-B. Huang, H. Zhou, X. Ding, et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst., Man, Cybernet., Part B: Cybernet. 42 (2) (2012) 513–529.
[10] C. Chen, W. Li, E.W. Tramel, et al., Reconstruction of hyperspectral imagery from random projections using multihypothesis prediction, IEEE Trans. Geosci. Remote Sens. 52 (1) (2014) 365–374.
[11] C. Chen, J.E. Fowler, Single-image super-resolution using multihypothesis prediction, 2012 Conference Record of the Forty-Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2012, pp. 608–612.
[12] P. Ghamisi, M.D. Mura, J.A. Benediktsson, A survey on spectral-spatial classification techniques based on attribute profiles, IEEE Trans. Geosci. Remote Sens. 53 (5) (2015) 2335–2353.
[13] S. Bernabe, P.R. Marpu, A. Plaza, M.D. Mura, J.A. Benediktsson, Spectral-spatial classification of multispectral images using kernel feature space representation, IEEE Geosci. Remote Sens. Lett. 11 (1) (2014) 288–292.
[14] R. Ji, et al., Spectral-spatial constraint hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 52 (3) (2014) 1811–1824.
[15] M. Fauvel, Y. Tarabalka, J.A. Benediktsson, J. Chanussot, J.C. Tilton, Advances in spectral-spatial classification of hyperspectral images, Proc. IEEE 101 (3) (2013) 652–675.
[16] Y. Zhou, J. Peng, C.L.P. Chen, Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (2) (2015) 1082–1095.
[17] L. Sun, Z. Wu, J. Liu, L. Xiao, Z. Wei, Supervised spectral-spatial hyperspectral image classification with weighted Markov random fields, IEEE Trans. Geosci. Remote Sens. 53 (3) (2015) 1490–1503.
[18] J. Xia, et al., Spectral-spatial classification for hyperspectral data using rotation forests with local feature extraction and Markov random fields, IEEE Trans. Geosci. Remote Sens. 53 (5) (2015) 2532–2546.
[19] J. Li, X. Huang, P. Gamba, J.M. Bioucas-Dias, L. Zhang, J.A. Benediktsson, A. Plaza, Multiple feature learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (3) (2015) 1592–1606.
[20] L. Fang, S. Li, X. Kang, J.A. Benediktsson, Spectral-spatial hyperspectral image classification via multiscale adaptive sparse representation, IEEE Trans. Geosci. Remote Sens. 52 (12) (2014) 7738–7749.
[21] X. Kang, S. Li, J.A. Benediktsson, Spectral-spatial hyperspectral image classification with edge-preserving filtering, IEEE Trans. Geosci. Remote Sens. 52 (5) (2014) 2666–2677.
[22] Y. Zhou, Y. Wei, Learning hierarchical spectral-spatial features for hyperspectral image classification, IEEE Trans. Cybernet. 46 (7) (2016) 1667–1678.
[23] H. Lu, Y. Li, S. Mu, D. Wang, H. Kim, S. Serikawa, Motor anomaly detection for unmanned aerial vehicles using reinforcement learning, IEEE Internet Things J. (2017).
[24] H. Lu, Y. Li, M. Chen, H. Kim, S. Serikawa, Brain intelligence: go beyond artificial intelligence, Mobile Networks Appl. 23 (2) (2018) 368–375.
[25] H. Lu, B. Li, J. Zhu, Y. Li, Y. Li, X. Xu, S. Serikawa, Wound intensity correction and segmentation with convolutional neural networks, Concurr. Comput.: Pract. Exp. 29 (6) (2017).
[26] H. Lu, Y. Li, T. Uemura, H. Kim, S. Serikawa, Low illumination underwater light field images reconstruction using deep convolutional neural networks, Future Gener. Comput. Syst. (2018).
[27] X. Xu, L. He, H. Lu, L. Gao, Y. Ji, Deep adversarial metric learning for cross-modal retrieval, World Wide Web: Internet and Web Information Systems (2018) 1–16.
[28] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[29] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971–987.
[30] A. Satpathy, X. Jiang, H. Eng, LBP-based edge-texture features for object recognition, IEEE Trans. Image Process. 23 (5) (2014) 1953–1964.
[31] R. Lan, Y. Zhou, Y.Y. Tang, Quaternionic local ranking binary pattern: a local descriptor of color images, IEEE Trans. Image Process. 25 (2) (2016) 566–579.
[32] R. Lan, Y. Zhou, Quaternion-Michelson descriptor for color image classification, IEEE Trans. Image Process. 25 (11) (2016) 5281–5292.
[33] R. Lan, Y. Zhou, Y.Y. Tang, et al., Person reidentification using quaternionic local binary pattern, 2014 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2014, pp. 1–6.
[34] M. Fauvel, J.A. Benediktsson, J. Chanussot, et al., Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles, IEEE Trans. Geosci. Remote Sens. 46 (11) (2008) 3804–3814.
[35] R.M. Haralick, S.R. Sternberg, X. Zhuang, Image analysis using mathematical morphology, IEEE Trans. Pattern Anal. Mach. Intell. (1987) 532–550.
[36] J. Li, J.M. Bioucas-Dias, A. Plaza, Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning, IEEE Trans. Geosci. Remote Sens. 51 (2) (2013) 844–856.
[37] J. Li, J.M. Bioucas-Dias, A. Plaza, A comparative study of spatial approaches for urban mapping using hyperspectral ROSIS images over Pavia City, Northern Italy, Int. J. Remote Sens. 30 (12) (2009) 3205–3221.
[38] A. Soltani-Farani, H.R. Rabiee, S.A. Hosseini, Spatial-aware dictionary learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (1) (2015) 527–541.
[39] J. Li, J.M. Bioucas-Dias, A. Plaza, Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning, IEEE Trans. Geosci. Remote Sens. 51 (2) (2013) 844–856.
[40] W. Li, C. Chen, H. Su, et al., Local binary patterns and extreme learning machine for hyperspectral imagery classification, IEEE Trans. Geosci. Remote Sens. 53 (7) (2015) 3681–3693.