Hyperspectral image classification using multi-feature fusion
Fang Li a,b, Jie Wang a, Rushi Lan a,⁎, Zhenbing Liu a, Xiaonan Luo a

a Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Image and Graphics, Guilin University of Electronic Technology, Guilin 541004, China
b Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Highlights
• A multi-feature fusion framework is proposed.
• A feature representation is developed for HSI using the proposed framework.
• State-of-the-art results have been achieved by the proposed method.
Keywords: Hyperspectral image classification; Spectral-spatial feature learning; Local binary pattern; Feature fusion; Kernel extreme learning machine

Abstract
Traditional hyperspectral image (HSI) classification methods typically use only the spectral features and do not make full use of the spatial or other features of the HSI. To address this problem, this paper proposes a novel HSI classification method based on a multi-feature fusion strategy. The spectral-spatial features are first extracted by spectral-spatial feature learning (SSFL), a deep hierarchical architecture. Additionally, texture features computed from the local binary pattern (LBP) image are fused with the spectral-spatial features. Then, the kernel extreme learning machine (KELM) is used to classify the hyperspectral images. The results of a number of experiments show that the proposed method effectively improves the classification accuracy of hyperspectral images.
1. Introduction

Hyperspectral remote sensing was one of the major breakthroughs in Earth observation technology at the end of the 20th century and has been a research hotspot in recent years. Compared with traditional multispectral remote sensing images, hyperspectral images (HSIs) provide a large amount of information and high spectral resolution and can more accurately describe and analyze the spectral information of land-cover types. HSI classification has been widely used in the fields of surveying, archeology, precision agriculture, biomedicine, environmental and disaster monitoring, and food detection. A large number of methods have been proposed to solve the HSI classification problem. In view of the high dimensionality of hyperspectral images, dimension reduction is an effective way to improve the accuracy of hyperspectral image classification.
Linear discriminant analysis [1,2] and principal component analysis [3,4] are effective methods for dimension reduction. Hyperspectral image classification methods can be divided mainly into supervised and unsupervised ones [5–8]. The support vector machine (SVM) has been widely used in hyperspectral image classification [8]; however, choosing an appropriate kernel function and selecting its parameters are difficult for SVM. Recently, the kernel-based extreme learning machine (KELM) has been applied successfully to HSI classification [9]. Compared with SVM, KELM is computationally efficient and has good classification performance [10,11].

Hyperspectral remote sensing image data are three-dimensional and contain a wealth of spectral and spatial information, and spectral-spatial classification techniques have therefore received considerable attention [12–15]. Early studies focused only on the spectral characteristics, and the classification results were not satisfactory. Recent studies have shown that making full use of the spatial features can effectively improve the classification results. Zhou et al. [16] proposed a spatial and spectral regularized local discriminant embedding method for dimension reduction of hyperspectral data. Sun et al. [17] developed a new approach for hyperspectral image classification exploiting spectral-spatial information. Xia et al. [18] proposed a new spectral-spatial classification strategy that enhances the classification performance on hyperspectral images by integrating rotation forests and Markov random fields (MRFs). Fang et al. [20] proposed a multiscale adaptive sparse representation framework for spectral-spatial classification. Kang et al. [21] proposed spectral-spatial hyperspectral image classification with edge-preserving filtering, which combines spectral and spatial features to classify hyperspectral images. Li et al. [7,19] proposed a multiple feature learning method for hyperspectral image classification based on the integration of different types of (linear and nonlinear) features. Recently, Zhou et al. [22] proposed a spectral-spatial network, a deep hierarchical model, for HSI classification.

Based on the above discussion, this paper proposes a hyperspectral image classification algorithm based on multi-feature fusion, in which the spectral-spatial features and the texture features of hyperspectral images are effectively combined. First, the spectral-spatial features and the texture features are extracted from the hyperspectral image separately. Next, these features are fused, and the fused features are used as the input of the kernel-based extreme learning machine (KELM), which performs the classification. Several experiments demonstrate the effectiveness of the proposed method.
☆ This work was partially supported by the National Natural Science Foundation of China (Nos. 61702129, 61772149, U1701267, and 61320106008), the Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Image and Graphics (No. GIIP201606), and the Guangxi Key Laboratory of Trusted Software (No. kx201628).
⁎ Corresponding author. E-mail address: [email protected] (R. Lan).
Fig. 1. Extraction of LBP features from hyperspectral images.
Fig. 2. Flowchart of the proposed classification method.
Table 1
Numbers of samples in each ground-truth class in the Indian Pines dataset.

| Class No. | Class Name | Samples |
|---|---|---|
| 1 | alfalfa | 54 |
| 2 | corn-notill | 1434 |
| 3 | corn-mintill | 834 |
| 4 | corn | 234 |
| 5 | grass-pasture | 497 |
| 6 | grass-trees | 747 |
| 7 | grass-pasture-mowed | 26 |
| 8 | hay-windrowed | 489 |
| 9 | oats | 20 |
| 10 | soybean-notill | 968 |
| 11 | soybean-mintill | 2468 |
| 12 | soybean-clean | 614 |
| 13 | wheat | 212 |
| 14 | woods | 1294 |
| 15 | building-grass-trees-drives | 380 |
| 16 | stone-steel-towers | 95 |
| Total | | 10,366 |
Table 2
Numbers of samples in each ground-truth class in the University of Pavia dataset.

| Class No. | Class Name | Samples |
|---|---|---|
| 1 | asphalt | 6852 |
| 2 | meadows | 18,686 |
| 3 | gravel | 2207 |
| 4 | trees | 3436 |
| 5 | painted metal sheets | 1378 |
| 6 | bare soil | 2104 |
| 7 | bitumen | 1356 |
| 8 | self-blocking bricks | 3878 |
| 9 | shadows | 1026 |
| Total | | 43,923 |

Fig. 3. OAs of different methods with different numbers of training samples on the Indian Pines dataset.
The rest of this paper is organized as follows. Section 2 briefly reviews some related works. In Section 3, we present the proposed method in detail. In Section 4, the experimental results are provided to evaluate the proposed method. Finally, Section 5 presents the conclusions.
2. Related works
In this paper, the spatial-spectral features are extracted by spectral-spatial feature learning (SSFL), which is a deep hierarchical architecture. Deep learning is a complex machine learning methodology that achieves much better results in speech and image recognition than previous techniques. It has produced notable achievements in search technology, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech processing, recommendation and personalization, and other related fields [23–27]. Deep learning makes machines mimic human activities such as seeing, hearing, and thinking, solves many complex pattern recognition problems, and has driven great progress in artificial-intelligence-related technology.

Deep learning automatically learns feature representations. Suppose we have several instances of an input I (such as images or text) and that we have designed a system S with n layers whose output, obtained by adjusting the system parameters, reproduces the input I. Then we can automatically obtain a series of hierarchical features of the input I, namely S_1, …, S_n. The idea of deep learning is thus to stack multiple layers, that is, the output of one layer is used as the input of the next layer. In this way, the input information can be represented hierarchically.
Table 3
The classification results with different weights.

| Weight (α, β) | Indian Pines | Pavia |
|---|---|---|
| (0.9, 0.1) | 0.8533 | 0.9865 |
| (0.8, 0.2) | 0.8899 | 0.9798 |
| (0.7, 0.3) | 0.8660 | 0.9686 |
| (0.6, 0.4) | 0.8471 | 0.9573 |
| (0.5, 0.5) | 0.8361 | 0.9447 |
| (0.4, 0.6) | 0.8275 | 0.9377 |
| (0.3, 0.7) | 0.8260 | 0.9327 |
| (0.2, 0.8) | 0.8238 | 0.9278 |
| (0.1, 0.9) | 0.8242 | 0.9255 |
Table 4
OA of different methods on the Indian Pines dataset with different proportions of training samples.

| Method | 1% | 2% | 3% | 4% | 5% |
|---|---|---|---|---|---|
| KELM | 58.52 ± 3.30 | 66.08 ± 1.03 | 71.35 ± 1.15 | 74.18 ± 0.87 | 75.87 ± 0.89 |
| LBP-ELM | 80.26 ± 2.35 | 87.92 ± 0.95 | 90.59 ± 0.78 | 91.69 ± 0.68 | 92.25 ± 0.39 |
| EPF | 71.75 ± 2.58 | 76.34 ± 7.38 | 83.51 ± 2.67 | 86.22 ± 1.61 | 88.46 ± 1.16 |
| MPM-LBP | 77.16 ± 3.46 | 84.29 ± 1.84 | 88.19 ± 1.34 | 90.82 ± 1.47 | 91.60 ± 0.98 |
| SADL | 78.95 ± 1.09 | 88.19 ± 1.49 | 90.74 ± 0.97 | 92.95 ± 0.66 | 94.47 ± 1.12 |
| SSN | 84.70 ± 2.28 | 91.33 ± 0.97 | 93.96 ± 1.33 | 95.59 ± 0.61 | 97.02 ± 0.36 |
| SST1 | 85.84 ± 2.52 | 92.47 ± 0.57 | 94.98 ± 0.91 | 96.01 ± 0.72 | 97.06 ± 0.47 |
| SST2 | 86.61 ± 1.00 | 93.65 ± 1.15 | 95.29 ± 0.82 | 96.53 ± 0.57 | 97.33 ± 0.42 |

The results of the proposed methods are highlighted in bold.
Fig. 4. Classification maps on Indian Pines data set using different methods.
3. Multi-feature fusion classification algorithm

3.1. Image preprocessing

In the data preprocessing, we first normalize the image data. I_{max} and I_{min} represent the maximum and minimum pixel values of the image, respectively. The normalization is conducted as follows:

I_{mn} = \frac{I_{mn} - I_{min}}{I_{max} - I_{min}},    (1)

where I_{mn} represents the pixel in the mth row and nth column of the image.
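For concreteness, the min-max normalization of Eq. (1) can be sketched in a few lines of NumPy (the cube layout and function name are illustrative, not from the paper):

```python
import numpy as np

def minmax_normalize(cube):
    """Scale a hyperspectral cube (rows x cols x bands) to [0, 1] per Eq. (1)."""
    i_min, i_max = cube.min(), cube.max()
    return (cube - i_min) / (i_max - i_min)
```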
3.2. Spatial-spectral feature extraction

In this work, the spatial-spectral features are extracted by spectral-spatial feature learning (SSFL) [22], which comprises spectral feature extraction followed by spatial feature extraction. The output of each layer is used as the input of the next layer, and multiple rounds of SSFL are stacked to form a deep structural model with multiple layers; the extracted spatial-spectral features constitute the final feature representation. The spectral characteristics of hyperspectral images are obtained by linear discriminant analysis (LDA) [28], and adaptive weighted filters (AWFs) are used to extract the spatial information.

LDA is a dimensionality reduction technique for supervised learning. Traditional LDA projects the data onto a lower-dimensional vector space such that the ratio of the between-class distance to the within-class distance is maximized, thus achieving maximum discrimination. Therefore, when applied to hyperspectral image data, the smallest within-class distance and the largest between-class distance are obtained. Let S_b and S_w denote the between-class and within-class scatter matrices, respectively, let the training set be x_r = {x_1, x_2, …, x_N}, and let W_{spe} be the projection matrix. N_j (j = 1, 2, …, k) is the number of samples in the jth class, and u_j (j = 1, 2, …, k) is the mean vector of the jth class. The projection is obtained as follows:

\max_{W_{spe}^{T} W_{spe} = 1} \frac{W_{spe}^{T} S_b W_{spe}}{W_{spe}^{T} S_w W_{spe}},    (2)

where S_b = \sum_{j=1}^{k} N_j (u_j - u)(u_j - u)^T, S_w = \sum_{j=1}^{k} \sum_{x \in x_r} (x - u_j)(x - u_j)^T, u is the mean vector of all samples, and x is a sample in the training set x_r. To extract the spectral features from the hyperspectral image, we sort the resulting eigenvalues from largest to smallest to build the projection vectors.
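A minimal sketch of the LDA projection of Eq. (2), solved through the equivalent generalized eigenproblem (our own implementation outline; the paper provides no code):

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Solve Eq. (2): maximize W^T S_b W / (W^T S_w W) via the
    generalized eigenproblem S_w^{-1} S_b w = lambda w."""
    n_features = X.shape[1]
    u = X.mean(axis=0)                        # global mean vector u
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        u_c = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(u_c - u, u_c - u)
        S_w += (Xc - u_c).T @ (Xc - u_c)
    vals, vecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-vals.real)            # eigenvalues, largest first
    return vecs[:, order[:n_components]].real  # projection matrix W_spe
```

The spectral feature of each pixel is then its spectrum multiplied by the returned projection matrix.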
For the output of the spectral features, the spatial features are exploited using adaptive weighted filters (AWFs). An AWF is a block spatial filter in which the feature of the center pixel is determined by the weights of its neighbors. The adaptive weights are defined by

W_{spa}^{i,j} = \frac{S_{i,j}}{\sum_{1}^{m \times m} S_{i,j}},    (3)

where m × m is the size of the filter and S_{i,j} is a similarity measure. Through adaptive weighted filtering, pixels in the same class have similar features; thus, we obtain the spatial characteristics of the hyperspectral image.
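Since the paper does not specify the similarity measure S_{i,j}, the following sketch assumes a Gaussian similarity between the center and neighbor features; the window size m and the bandwidth sigma are illustrative:

```python
import numpy as np

def adaptive_weighted_filter(feat, m=3, sigma=1.0):
    """Adaptive weighted filtering per Eq. (3) on a rows x cols x d feature map.
    S_{i,j} is assumed to be a Gaussian similarity between the center pixel's
    features and each neighbor's features (an assumption, not from the paper)."""
    r = m // 2
    pad = np.pad(feat, ((r, r), (r, r), (0, 0)), mode='edge')
    out = np.empty_like(feat)
    rows, cols, d = feat.shape
    for i in range(rows):
        for j in range(cols):
            win = pad[i:i + m, j:j + m].reshape(-1, d)
            center = feat[i, j]
            s = np.exp(-np.sum((win - center) ** 2, axis=1) / (2 * sigma ** 2))
            w = s / s.sum()          # Eq. (3): weights normalized over the window
            out[i, j] = w @ win      # weighted average of the neighborhood
    return out
```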
3.3. Texture feature extraction of HSI

3.3.1. Texture feature extraction using the local binary pattern

Hyperspectral images have many bands that contain redundant information, so we use PCA to reduce the dimensionality of the hyperspectral data before feature extraction. The local binary pattern (LBP) is an operator used to describe the local texture features of an image [29]. LBP has the distinct advantages of being rotation-invariant and grayscale-invariant, and several algorithms have been proposed to improve its performance [30–33]. The basic principle of LBP is to compare the center pixel of a selected image region with the pixels in its local neighborhood, using the center value as a threshold, to obtain a binary code that describes the local texture. When a neighbor's value is less than the threshold value, that neighbor is assigned the value "0"; otherwise, it is assigned "1". The binary code is accumulated bit by bit in the clockwise direction, and the resulting decimal value is the LBP value of the center pixel. The LBP is described as follows:

LBP_{M,N} = \sum_{i=0}^{M-1} s(g_i - g_k) 2^i, \quad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0, \end{cases}    (4)

where M represents the number of sample points and N represents the radius of the neighborhood. g_k is the grayscale value of the center pixel, and g_i (i = 0, 1, …, M-1) is the grayscale value of the ith pixel in its local neighborhood.

We use the idea of segmentation to extract the texture features of the image. An LBP histogram extracted directly from the whole hyperspectral image describes its texture only globally, so the details of the image are not fully reflected. Considering this deficiency, each hyperspectral image is first divided into several small patches; then, the LBP of each patch is extracted to obtain its LBP histogram. The feature of each hyperspectral image is the concatenation of the LBP histograms of all of its patches. The feature extraction is visually shown in Fig. 1.
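A sketch of the patch-wise LBP histogram extraction of Eq. (4) using scikit-image; the patch size and the number of histogram bins are illustrative choices, not values from the paper:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def patchwise_lbp_features(band, n_points=8, radius=1, patch=32, n_bins=256):
    """Compute LBP codes on a 2-D band (e.g., one principal component),
    then concatenate normalized per-patch histograms as in Fig. 1."""
    codes = local_binary_pattern(band, n_points, radius, method='default')
    feats = []
    for i in range(0, band.shape[0] - patch + 1, patch):
        for j in range(0, band.shape[1] - patch + 1, patch):
            hist, _ = np.histogram(codes[i:i + patch, j:j + patch],
                                   bins=n_bins, range=(0, n_bins))
            feats.append(hist / hist.sum())   # normalized patch histogram
    return np.concatenate(feats)
```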
The dimensionality of the features extracted by the local binary pattern is relatively high, and the computational complexity is high when these features are used directly. Therefore, we reduce the dimensionality of the features by principal component analysis (PCA). The main reason for choosing PCA is that the principal components capture most of the variation in the spatial planes of the hyperspectral image cube. PCA reduces the dimensionality of the dataset and extracts the main components of the information represented by the data. The texture features are reduced by PCA in the spectral dimension, and the compressed spectral dimension is S. Some of the spectral information is lost in this step, but the texture information of the hyperspectral image is not affected.
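As an illustration, the spectral-dimension reduction to S components can be written with scikit-learn's PCA (the function name, shapes, and the value of S below are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube, S=10):
    """Reduce the spectral dimension of a rows x cols x bands cube to S
    principal components, keeping the spatial layout intact."""
    rows, cols, bands = cube.shape
    flat = cube.reshape(-1, bands)                 # one spectrum per pixel
    reduced = PCA(n_components=S).fit_transform(flat)
    return reduced.reshape(rows, cols, S)
```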
3.4. Feature fusion

Hyperspectral images not only contain abundant spectral and spatial information but also have rich texture information, and using only the spectral or the spatial information causes information loss. Therefore, we extract both the spectral-spatial and the texture features, and we name the proposed feature the spectral-spatial-texture (SST) feature. These features must be fused after the spectral, spatial, and texture features of the hyperspectral images have been extracted. A method of data fusion is presented in Ref. [34]; common approaches to feature fusion are stack fusion and compound function fusion [35]. To simplify the treatment, this paper first chooses the simplest method, namely direct combination; we call this feature fusion method SST1. Suppose the spectral-spatial features are x_{spec-spat} and the texture features are x_{tex}. Then, the directly combined features can be represented as follows:

R_1 = [x_{spec-spat}, x_{tex}].    (5)
The fused features can also be obtained by weighted fusion; this method is named SST2. Let the weights satisfy α + β = 1. Then, the features after fusion can be expressed as

R_2 = α x_{spec-spat} + β x_{tex}.    (6)
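The two fusion rules of Eqs. (5) and (6) are straightforward to express in code; the weighted variant assumes the two feature vectors have been brought to a common dimensionality, and the default weights follow the best values reported in Section 4:

```python
import numpy as np

def fuse_sst1(x_spec_spat, x_tex):
    """SST1, Eq. (5): direct concatenation of the two feature vectors."""
    return np.concatenate([x_spec_spat, x_tex], axis=-1)

def fuse_sst2(x_spec_spat, x_tex, alpha=0.9, beta=0.1):
    """SST2, Eq. (6): weighted fusion with alpha + beta = 1; assumes both
    feature vectors have the same dimensionality."""
    return alpha * x_spec_spat + beta * x_tex
```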
3.5. Classification based on the kernel extreme learning machine

The fused features are used as input vectors to train the kernel extreme learning machine classifier, and the test samples are then classified by the trained classifier. The extreme learning machine (ELM) [9] is a model based on a single-hidden-layer feedforward neural network (SLFN). Traditional feedforward neural networks are widely used, but training them is time consuming and sensitive to the selection of the learning rate. In the ELM algorithm, the connection weights between the input layer and the hidden layer and the thresholds of the hidden-layer neurons are randomly generated and are not changed during training; a good solution can be obtained simply by setting the number of hidden-layer neurons. The ELM algorithm therefore has the advantages of fast learning and good performance compared with traditional training methods. Kernel-based ELM (KELM) uses a kernel function to improve the stability of ELM [9].

Let N = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m} denote the training set, and let g(x) denote the activation function of the hidden-layer neurons. L represents the number of hidden-layer nodes, W and b represent the randomly initialized input weights and hidden-layer biases, respectively, H is the output matrix of the hidden layer, and β is the output weight of the hidden layer. Then, the minimum-norm least-squares solution \hat{β} of β is given as

\hat{β} = H^T \left( \frac{1}{C} + H H^T \right)^{-1} T.    (7)

The output equation of KELM is given by
f(x_i) = h(x_i) H^T \left( \frac{1}{C} + H H^T \right)^{-1} T,    (8)

where h(x_i) = [g(a_1 \cdot x_i + b_1), …, g(a_L \cdot x_i + b_L)] is the hidden-layer output for x_i. Then, H can be represented by

H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix}_{N \times L}.    (9)

As in SVM, any function that satisfies Mercer's condition can be used as the kernel function of the extreme learning machine to obtain the KELM. Commonly used kernel functions are linear, polynomial, and Gaussian functions. In our paper, h(x)H^T and HH^T are replaced by the kernel K(u, v); then, f(x) can be defined as

f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^T \left( \frac{1}{C} + \Omega_{ELM} \right)^{-1} T,    (10)

where \Omega_{ELM} is the kernel matrix of the training samples and can be expressed as

\Omega_{ELM} = [K(x_i, x_j)], \quad i, j = 1, 2, …, N.    (11)
The process of hyperspectral image classification based on multi-feature fusion is shown in Fig. 2.
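A compact sketch of KELM training and prediction following Eqs. (7)–(11), with a Gaussian kernel; the regularization value C and the kernel parameter gamma are illustrative, and the 1/C term is read as I/C (a common interpretation, stated here as an assumption):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kelm_train(X, T, C=100.0, gamma=1.0):
    """Solve alpha = (I/C + Omega)^(-1) T per Eqs. (7)/(10);
    T is the one-hot label matrix of the training samples."""
    omega = rbf_kernel(X, X, gamma)               # Omega_ELM of Eq. (11)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kelm_predict(X_train, alpha, X_test, gamma=1.0):
    """Evaluate f(x) of Eq. (10) and return the winning class index."""
    return (rbf_kernel(X_test, X_train, gamma) @ alpha).argmax(axis=1)
```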
Fig. 5. OAs of different methods with various numbers of training samples on the University of Pavia dataset.
4. Experimental results

4.1. Datasets and experimental settings

In our paper, the Indian Pines and University of Pavia datasets are used to verify the effectiveness of the proposed algorithm. The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 [36]. The image size is 145 × 145 pixels, and there are 16 different land-cover classes and 220 bands covering the 0.4–2.45 µm region of the visible and infrared spectrum. In this paper, 202 bands are used after removing substantially noisy bands. The total number of labeled samples in this dataset is 10,366. Table 1 shows the name and the number of samples of each class.

The University of Pavia dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) [37]. The size of the images is 610 × 340 pixels. The data contain 9 ground-truth classes and 115 bands with 1.3-m spatial resolution. After removing 12 noisy bands, the remaining 103 spectral bands are used for classification. The name and the number of samples of each class are shown in Table 2.

To avoid contingency in the classification, we randomly choose m% (m = 1, 2, 3, 4, 5) of the labeled samples from each class for training, and the remaining samples are used for testing. The filter window size of the adaptive weighted filter (AWF) is chosen from {3, 5, …, 11}. The classification performance is assessed on the test set in terms of the overall accuracy (OA), the average accuracy (AA), and the kappa coefficient κ. The OA is the proportion of correctly classified samples to the total number of classified samples; the AA is the average of the per-class classification accuracies; and κ is a robust measure of the degree of agreement. All the experiments are conducted using MATLAB R2014a on an Intel i5-3230M 2.60 GHz machine with 8 GB RAM. The formulas for OA and AA are

OA = \sum_{u=1}^{T} CM_{uu} / n,    (12)

AA = \frac{1}{T} \sum_{u=1}^{T} \frac{CM_{uu}}{\sum_{v=1}^{T} CM_{uv}},    (13)

where T is the total number of categories, CM_{uu} is the uth diagonal element of the confusion matrix (the number of samples of the uth category classified as the uth category), and n is the total number of classified samples. The formula for κ is

κ = \frac{P_A - P_e}{1 - P_e},    (14)

where P_A is the observed agreement between the two observations, which in remote sensing image classification is the overall classification accuracy, and P_e is the expected chance agreement between the two observations.

Table 3 shows the results of feature fusion classification with different weights. We choose the best weight that can be applied to the two datasets simultaneously; the experimental results show that our approach achieves the best performance when α = 0.9 and β = 0.1.
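The three metrics of Eqs. (12)–(14) can be computed from a confusion matrix as follows (a sketch; rows are assumed to index the true classes):

```python
import numpy as np

def classification_metrics(cm):
    """OA, AA and kappa from a T x T confusion matrix, per Eqs. (12)-(14)."""
    n = cm.sum()
    oa = np.trace(cm) / n                              # Eq. (12)
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))         # Eq. (13)
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2    # expected chance agreement
    kappa = (oa - pe) / (1 - pe)                       # Eq. (14)
    return oa, aa, kappa
```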
Fig. 6. Classification maps on University of Pavia data set using different methods.
Table 5
OA (%), AA (%) and κ of different methods for the Indian Pines dataset.

| Class | KELM | LBP-ELM | EPF | MPM-LBP | SADL | SSN | SST1 | SST2 |
|---|---|---|---|---|---|---|---|---|
| Alfalfa | 10.57 | 83.02 | 28.04 | 36.04 | 69.81 | 81.32 | 100 | 94.34 |
| Corn-notill | 44.88 | 87.46 | 64.71 | 71.42 | 68.73 | 80.51 | 86.89 | 87.24 |
| Corn-mintill | 39.94 | 72.48 | 75.69 | 54.16 | 65.58 | 79.75 | 87.15 | 80.36 |
| Corn | 15.50 | 8.39 | 49.84 | 42.30 | 62.81 | 68.57 | 97.84 | 89.03 |
| Grass-pasture | 57.62 | 64.02 | 96.28 | 78.23 | 82.34 | 79.25 | 81.71 | 91.46 |
| Grass-trees | 82.95 | 75.10 | 73.47 | 96.81 | 93.84 | 95.18 | 87.69 | 88.63 |
| Grass-pasture-mowed | 59.20 | 100 | 47.86 | 52.40 | 100 | 95.20 | 96.00 | 100 |
| Hay-windrowed | 79.36 | 96.07 | 89.39 | 95.56 | 97.48 | 93.08 | 100 | 98.23 |
| Oats | 39.47 | 78.95 | 20.32 | 58.42 | 81.05 | 89.47 | 100 | 100 |
| Soybean-mintill | 45.22 | 78.60 | 77.40 | 64.21 | 74.11 | 77.85 | 86.33 | 80.08 |
| Soybean-notill | 67.26 | 81.78 | 66.69 | 86.99 | 81.34 | 85.62 | 89.64 | 96.81 |
| Soybean-clean | 32.19 | 65.07 | 52.19 | 63.38 | 57.71 | 69.95 | 83.69 | 85.01 |
| Wheat | 90.81 | 89.47 | 97.26 | 99.23 | 98.66 | 96.36 | 88.99 | 99.04 |
| Woods | 85.98 | 99.53 | 86.60 | 97.42 | 93.71 | 96.28 | 98.67 | 97.11 |
| Buildings-Grass-Trees-Drives | 25.27 | 76.33 | 72.26 | 40.08 | 68.70 | 83.16 | 76.33 | 79.26 |
| Stone-Steel-Towers | 34.57 | 79.79 | 20.00 | 19.15 | 79.57 | 96.06 | 80.85 | 95.23 |
| OA | 58.52 ± 3.30 | 80.26 ± 2.35 | 71.25 ± 2.58 | 77.16 ± 3.46 | 78.95 ± 1.09 | 84.70 ± 2.28 | 85.84 ± 2.52 | 86.61 ± 1.00 |
| AA | 50.67 ± 2.46 | 81.92 ± 3.15 | 62.35 ± 7.46 | 65.97 ± 5.18 | 79.72 ± 2.78 | 85.48 ± 2.10 | 86.07 ± 2.49 | 86.01 ± 1.93 |
| κ | 0.5235 ± 0.0368 | 0.8035 ± 0.0315 | 0.6709 ± 0.0311 | 0.7364 ± 0.0405 | 0.7598 ± 0.0127 | 0.8257 ± 0.0255 | 0.8387 ± 0.0287 | 0.8470 ± 0.0115 |

The results of the proposed methods are highlighted in bold.

Table 6
OA of different methods on the Pavia dataset with different proportions of training samples.

| Method | 1% | 2% | 3% | 4% | 5% |
|---|---|---|---|---|---|
| KELM | 89.08 ± 0.72 | 90.76 ± 0.24 | 91.92 ± 0.33 | 92.43 ± 0.18 | 92.74 ± 0.16 |
| LBP-ELM | 91.55 ± 0.63 | 94.57 ± 0.52 | 96.05 ± 0.28 | 97.05 ± 0.23 | 97.54 ± 0.12 |
| EPF | 95.06 ± 1.16 | 96.72 ± 0.47 | 97.39 ± 0.34 | 97.57 ± 0.19 | 97.74 ± 0.27 |
| MPM-LBP | 95.42 ± 0.62 | 96.95 ± 0.38 | 97.48 ± 0.39 | 97.74 ± 0.30 | 97.96 ± 0.30 |
| SADL | 93.28 ± 0.50 | 96.03 ± 0.31 | 97.32 ± 0.35 | 98.39 ± 0.13 | 98.76 ± 0.13 |
| SSN | 97.53 ± 0.20 | 98.33 ± 0.26 | 98.95 ± 0.11 | 99.22 ± 0.12 | 99.36 ± 0.11 |
| SST1 | 98.20 ± 0.41 | 99.04 ± 0.26 | 99.34 ± 0.14 | 99.57 ± 0.05 | 99.67 ± 0.06 |
| SST2 | 98.57 ± 0.44 | 99.28 ± 0.18 | 99.60 ± 0.08 | 99.64 ± 0.06 | 99.70 ± 0.03 |

The results of the proposed methods are highlighted in bold.

Table 7
OA (%), AA (%) and κ of different methods for the Pavia dataset.

| Class | KELM | LBP-ELM | EPF | MPM-LBP | SADL | SSN | SST1 | SST2 |
|---|---|---|---|---|---|---|---|---|
| Asphalt | 85.67 | 85.01 | 93.05 | 97.38 | 92.66 | 98.79 | 97.78 | 96.48 |
| Meadows | 97.50 | 99.55 | 95.37 | 99.45 | 98.92 | 99.85 | 99.93 | 99.90 |
| Gravel | 66.83 | 88.98 | 96.87 | 79.02 | 74.65 | 87.10 | 98.75 | 96.24 |
| Trees | 85.34 | 51.63 | 99.46 | 90.62 | 93.50 | 93.99 | 90.64 | 95.45 |
| Painted metal sheets | 97.69 | 100 | 98.09 | 97.52 | 99.38 | 99.61 | 100 | 100 |
| Bare Soil | 78.14 | 99.62 | 98.52 | 93.37 | 94.57 | 98.34 | 98.34 | 99.76 |
| Bitumen | 76.19 | 96.20 | 99.54 | 82.95 | 77.26 | 92.62 | 97.19 | 100 |
| Self-Blocking Bricks | 84.25 | 92.84 | 87.63 | 91.52 | 77.73 | 93.97 | 98.85 | 99.23 |
| Shadows | 96.96 | 60.41 | 96.88 | 98.94 | 99.14 | 94.20 | 81.54 | 92.10 |
| OA | 89.08 ± 0.72 | 91.07 ± 0.57 | 95.06 ± 1.16 | 95.42 ± 0.62 | 93.28 ± 0.50 | 97.53 ± 0.20 | 98.20 ± 0.41 | 98.57 ± 0.44 |
| AA | 85.40 ± 1.75 | 83.58 ± 1.33 | 96.16 ± 0.75 | 92.31 ± 1.06 | 89.76 ± 0.92 | 95.83 ± 0.42 | 96.29 ± 0.71 | 97.49 ± 0.75 |
| κ | 0.8553 ± 0.0101 | 0.8813 ± 0.0177 | 0.9345 ± 0.0157 | 0.9395 ± 0.0082 | 0.9116 ± 0.0067 | 0.9675 ± 0.0026 | 0.9762 ± 0.0054 | 0.9810 ± 0.0058 |

The results of the proposed methods are highlighted in bold.
4.2. Experimental results on the Indian Pines dataset
Table 4 and Fig. 3 show the OA of the different methods with different numbers of training samples on the Indian Pines dataset. In the comparison experiments, the kernel-based ELM (KELM) was used as a baseline; in recent years, kernel-based ELM methods have been widely used and have achieved good results in the classification of hyperspectral images. Each reported result is the average over ten runs. Our method is compared with SSN [22], SADL [38], MPM-LBP [39], EPF [21], LBP-ELM [40], and KELM [9]. The classification maps for the Indian Pines dataset are shown in Fig. 4, which shows that the methods based on multi-feature fusion perform better than those based only on spectral or spatial features. Table 5 reports the OA, AA, κ, and per-class classification accuracy when 1% of the labeled samples of each class are randomly chosen for training. The experimental results show the effectiveness of our method.

4.3. Experimental results on the University of Pavia dataset

The experimental results are shown in Table 6 and Fig. 5. They show that our method (SST) performs best when the training set is limited. Fig. 6 shows the classification maps of the different classifiers; the maps generated by SST are less noisy and more accurate than those generated by the other methods. The per-class classification accuracy, OA, AA, and κ are reported in Table 7, where 1% of the labeled samples of each class are randomly chosen for training.

According to the results of the above experiments, the multi-feature fusion method proposed in this paper improves the classification results of hyperspectral images and is superior to classification using spectral or spatial features alone. Good classification performance is still obtained when only a small number of labeled samples is used.
5. Conclusions

In this paper, we proposed a classification algorithm that fuses multiple features of hyperspectral images for land-cover classification. Experiments were conducted to evaluate the performance of the proposed method on two benchmark hyperspectral image datasets. The developed method greatly improves the accuracy compared with traditional classification using only spectral features or a single spatial feature. In addition, the speed of our method is also improved. In the future, we will focus on developing more ways to fuse different types of features for HSI classification.
References
[1] A. Berge, A.C. Jensen, A.H.S. Solberg, Sparse inverse covariance estimates for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 45 (5) (2007) 1399–1407.
[2] T.V. Bandos, L. Bruzzone, G. Camps-Valls, Classification of hyperspectral images with regularized linear discriminant analysis, IEEE Trans. Geosci. Remote Sens. 47 (3) (2009) 862–873.
[3] A. Agarwal, T. El-Ghazawi, H. El-Askary, et al., Efficient hierarchical-PCA dimension reduction for hyperspectral imagery, 2007 IEEE International Symposium on Signal Processing and Information Technology, IEEE, 2007, pp. 353–356.
[4] M. Fauvel, J. Chanussot, J.A. Benediktsson, Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas, EURASIP J. Adv. Signal Process. 2009 (1) (2009) 783194.
[5] L.Y. Chang, N.J. Liu, C.C. Han, et al., Hyperspectral image classification using nearest feature line embedding approach, IEEE Trans. Geosci. Remote Sens. 52 (1) (2014) 278–287.
[6] Q. Shi, L. Zhang, B. Du, Semisupervised discriminative locally enhanced alignment for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 51 (9) (2013) 4800–4815.
[7] J. Li, X. Huang, P. Gamba, et al., Multiple feature learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (3) (2015) 1592–1606.
[8] F. Melgani, L. Bruzzone, Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens. 42 (8) (2004) 1778–1790.
[9] G.-B. Huang, H. Zhou, X. Ding, et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst., Man, Cybernet., Part B: Cybernet. 42 (2) (2012) 513–529.
[10] C. Chen, W. Li, E.W. Tramel, et al., Reconstruction of hyperspectral imagery from random projections using multihypothesis prediction, IEEE Trans. Geosci. Remote Sens. 52 (1) (2014) 365–374.
[11] C. Chen, J.E. Fowler, Single-image super-resolution using multihypothesis prediction, 2012 Conference Record of the Forty-Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2012, pp. 608–612.
[12] P. Ghamisi, M.D. Mura, J.A. Benediktsson, A survey on spectral-spatial classification techniques based on attribute profiles, IEEE Trans. Geosci. Remote Sens. 53 (5) (2015) 2335–2353.
[13] S. Bernabe, P.R. Marpu, A. Plaza, M.D. Mura, J.A. Benediktsson, Spectral-spatial classification of multispectral images using kernel feature space representation, IEEE Geosci. Remote Sens. Lett. 11 (1) (2014) 288–292.
[14] R. Ji, et al., Spectral-spatial constraint hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 52 (3) (2014) 1811–1824.
[15] M. Fauvel, Y. Tarabalka, J.A. Benediktsson, J. Chanussot, J.C. Tilton, Advances in spectral-spatial classification of hyperspectral images, Proc. IEEE 101 (3) (2013) 652–675.
[16] Y. Zhou, J. Peng, C.L.P. Chen, Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (2) (2015) 1082–1095.
[17] L. Sun, Z. Wu, J. Liu, L. Xiao, Z. Wei, Supervised spectral-spatial hyperspectral image classification with weighted Markov random fields, IEEE Trans. Geosci. Remote Sens. 53 (3) (2015) 1490–1503.
[18] J. Xia, et al., Spectral-spatial classification for hyperspectral data using rotation forests with local feature extraction and Markov random fields, IEEE Trans. Geosci. Remote Sens. 53 (5) (2015) 2532–2546.
[19] J. Li, X. Huang, P. Gamba, J.M. Bioucas-Dias, L. Zhang, J.A. Benediktsson, A. Plaza, Multiple feature learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (3) (2015) 1592–1606.
[20] L. Fang, S. Li, X. Kang, J.A. Benediktsson, Spectral-spatial hyperspectral image classification via multiscale adaptive sparse representation, IEEE Trans. Geosci. Remote Sens. 52 (12) (2014) 7738–7749.
[21] X. Kang, S. Li, J.A. Benediktsson, Spectral-spatial hyperspectral image classification with edge-preserving filtering, IEEE Trans. Geosci. Remote Sens. 52 (5) (2014) 2666–2677.
[22] Y. Zhou, Y. Wei, Learning hierarchical spectral-spatial features for hyperspectral image classification, IEEE Trans. Cybernet. 46 (7) (2016) 1667–1678.
[23] H. Lu, Y. Li, S. Mu, D. Wang, H. Kim, S. Serikawa, Motor anomaly detection for unmanned aerial vehicles using reinforcement learning, IEEE Internet Things J. (2017).
[24] H. Lu, Y. Li, M. Chen, H. Kim, S. Serikawa, Brain intelligence: go beyond artificial intelligence, Mobile Networks Appl. 23 (2) (2018) 368–375.
[25] H. Lu, B. Li, J. Zhu, Y. Li, Y. Li, X. Xu, S. Serikawa, Wound intensity correction and segmentation with convolutional neural networks, Concurr. Comput.: Pract. Exp. 29 (6) (2017).
[26] H. Lu, Y. Li, T. Uemura, H. Kim, S. Serikawa, Low illumination underwater light field images reconstruction using deep convolutional neural networks, Future Gener. Comput. Syst. (2018).
[27] X. Xu, L. He, H. Lu, L. Gao, Y. Ji, Deep adversarial metric learning for cross-modal retrieval, World Wide Web: Internet and Web Information Systems (2018) 1–16.
[28] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[29] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971–987.
[30] A. Satpathy, X. Jiang, H. Eng, LBP-based edge-texture features for object recognition, IEEE Trans. Image Process. 23 (5) (2014) 1953–1964.
[31] R. Lan, Y. Zhou, Y.Y. Tang, Quaternionic local ranking binary pattern: a local descriptor of color images, IEEE Trans. Image Process. 25 (2) (2016) 566–579.
[32] R. Lan, Y. Zhou, Quaternion-Michelson descriptor for color image classification, IEEE Trans. Image Process. 25 (11) (2016) 5281–5292.
[33] R. Lan, Y. Zhou, Y.Y. Tang, et al., Person reidentification using quaternionic local binary pattern, 2014 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2014, pp. 1–6.
[34] M. Fauvel, J.A. Benediktsson, J. Chanussot, et al., Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles, IEEE Trans. Geosci. Remote Sens. 46 (11) (2008) 3804–3814.
[35] R.M. Haralick, S.R. Sternberg, X. Zhuang, Image analysis using mathematical morphology, IEEE Trans. Pattern Anal. Mach. Intell. (1987) 532–550.
[36] J. Li, J.M. Bioucas-Dias, A. Plaza, Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning, IEEE Trans. Geosci. Remote Sens. 51 (2) (2013) 844–856.
[37] J. Li, J.M. Bioucas-Dias, A. Plaza, A comparative study of spatial approaches for urban mapping using hyperspectral ROSIS images over Pavia City, Northern Italy, Int. J. Remote Sens. 30 (12) (2009) 3205–3221.
[38] A. Soltani-Farani, H.R. Rabiee, S.A. Hosseini, Spatial-aware dictionary learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 53 (1) (2015) 527–541.
[39] J. Li, J.M. Bioucas-Dias, A. Plaza, Spectral-spatial classification of hyperspectral data using loopy belief propagation and active learning, IEEE Trans. Geosci. Remote Sens. 51 (2) (2013) 844–856.
[40] W. Li, C. Chen, H. Su, et al., Local binary patterns and extreme learning machine for hyperspectral imagery classification, IEEE Trans. Geosci. Remote Sens. 53 (7) (2015) 3681–3693.