Hyperspectral image reconstruction by deep convolutional neural network for classification


To appear in: Pattern Recognition. Received date: 22 May 2016; Revised date: 22 August 2016; Accepted date: 15 October 2016. Cite this article as: Yunsong Li, Weiying Xie and Huaqing Li, Hyperspectral image reconstruction by deep convolutional neural network for classification, Pattern Recognition, http://dx.doi.org/10.1016/j.patcog.2016.10.019

Yunsong Li a,b, Weiying Xie a,b,*, Huaqing Li a,b

a State Key Laboratory of Integrated Service Network, Xidian University, Xi’an, China, 710071
b Joint Laboratory of High Speed Multi-source Image Coding and Processing, Xidian University, Xi’an, China, 710071
* Corresponding author. Tel.: +8618309215199; fax: +862988204271. E-mail address: [email protected] (W. Xie)

Abstract

Spatial features of hyperspectral imagery (HSI) have gained increasing attention in recent years. Since a deep convolutional neural network (CNN) can extract a hierarchy of increasingly abstract spatial features, this paper proposes an HSI reconstruction model based on a deep CNN to enhance spatial features. The framework introduces, for the first time, a spatial-features-based band selection strategy to define a training label with rich information. The hyperspectral data is then trained by a deep CNN to build a model with optimized parameters suitable for HSI reconstruction. Finally, the reconstructed image is classified by the efficient extreme learning machine (ELM), which has a very simple structure. Experimental results indicate that the framework built on CNN and ELM provides competitive performance with a small number of training samples. Specifically, by using the reconstructed image, the average accuracy of ELM can be improved by as much as 30.04%, while it runs tens to hundreds of times faster than state-of-the-art classifiers.

Keywords: hyperspectral imagery; deep convolutional neural network; extreme learning machine; reconstruction; band selection; pattern classification.


1. Introduction

Remote sensing imaging instruments are capable of collecting continuous data that include spatial and spectral information simultaneously [1]. Thus, hyperspectral imagery (HSI) is a three-dimensional data cube comprising two spatial dimensions, which carry the spatial arrangement of pixels, and one spectral dimension of high-dimensional reflectance vectors. Generally, HSI has high spectral resolution, which can inherently distinguish many different materials. However, the spectra of some classes may be similar to those of others, and spatial features are often omitted in the traditional approach, which is not sufficient for accurate classification [2]. Spatial features have become increasingly important for remote sensing image analysis [3], so enhancing them has become a significant topic. To meet this requirement for effective and accurate HSI classification, several algorithms have been widely used. Kang et al. [4] applied a transform-domain recursive filter to the pixel-wise classification maps obtained by SVM for each class, thereby considering the neighborhood information of the HSI. In [5] and [6], HSI segmentation algorithms were proposed to incorporate spatial information (the relationship between neighboring pixels) into the classification process by partitioning an image into different regions. However, these algorithms assume that pixels within a local region share similar spectral characteristics, ignoring the fact that each pixel in a high-spectral-resolution HSI may represent different materials of interest, which results in spectral distortion. Another category of approaches considering spatial information is based on injecting details from panchromatic [7], multispectral [8,9], or other higher-spatial-resolution images [10-16] into the HSI. Such detail-injection methods require these images to be acquired over the same scene and under the same conditions as the corresponding HSI, which is difficult in practice. In addition, these methods may produce serious spatial and spectral distortion.


Commonly, to represent a pixel in HSI well, an integration of spatial features with spectral features is necessary for HSI classification. Rellier et al. [17] proposed a texture feature analysis using a Gauss-Markov random field model for HSI classification. In [18] and [19], a patch alignment framework combining multiple features (e.g. spectral, texture, shape, etc.) was proposed for the subsequent classification. Zhang et al. [20] also considered multiple types of features for HSI classification using sparse representation methods. All of these methods have given good performance in terms of classification accuracy, which indicates that combining spectral and spatial features has a positive influence on classification. However, combining multiple features enlarges the training samples and thus increases the computational cost. Given the important role of spatial features in HSI analysis, our approach is motivated by the successful super-resolution model SRCNN [21], built on a deep convolutional neural network (CNN). SRCNN is a deep CNN model for single-image super-resolution that directly learns an end-to-end mapping between low- and high-resolution images: the network takes the low-resolution image as input and outputs the high-resolution one. Inspired by this, we propose a deeper CNN that directly learns, for the first time, an end-to-end mapping between the HSI and a label with rich spatial features. Considering the difficulty of obtaining a label image acquired over the same scene under the same conditions by a different sensor, we propose a new spatial-features-based strategy for selecting a band whose spatial information is then enhanced by a deeper CNN model, which has not been previously presented in the available literature. This idea is validated by our previous work [22], in which a selected band served as a guided image to estimate the spectral background and foreground with a matting model [23]. We refer to this approach as HSI reconstruction with deep convolutional neural network (HRCNN); it does not depend on a corresponding multispectral or panchromatic image and is therefore easier to apply. The experimental results show that our HRCNN model is good at enhancing spatial features while retaining spectral features, which is useful for the


subsequent classification. We present several types of image quality evaluation, with both subjective and objective indices, supporting this claim. Furthermore, to achieve low computational cost and good generalization performance, the extreme learning machine (ELM) is utilized to classify the reconstructed HSI under a small training set. In summary, the main contributions of this paper are as follows:

1) The feature enhancement step is introduced in the proposed HRCNN model because the HSI data is contaminated with noise; this step is not present in other schemes.

2) Since using the first principal component (PC) as the training label may bring about spectral distortion, a new band selection method based on spatial features is incorporated into the training model for the first time, aiming not only at enhancing spatial features but also at retaining the spectral information of the HSI data.

3) A new combination of CNN with ELM for HSI classification under a small number of training samples achieves good generalization and low computational cost for the first time.

Specifically, the HRCNN model is first employed to enhance the spatial features without spectral distortion. In addition, a spatial-features-based band selection is proposed to find a band with distinctive and informative spatial information to serve as the training label, which has not been clearly demonstrated in state-of-the-art schemes. The remainder of this paper is organized as follows. Section 2 reviews related work on CNN and ELM in HSI applications. Section 3 describes the proposed methodology. Section 4 is devoted to experimental results. In Section 5, the reconstructed HSI is applied to classification. The last section concludes the paper.

2. Related work

Deep learning models can learn a hierarchy of increasingly complex features and have been widely used in the field of HSI applications. Zabalza et al. segmented the spectral domain of HSI into different regions, to

which the stacked autoencoder (SAE) is applied individually [24]. Ma et al. introduced the SAE to integrate spectral-spatial features [25]. Chen et al. utilized the SAE to classify HSI data, choosing 60% of the tagged samples as the training set [26]. Chen et al. also introduced the deep belief network (DBN) to extract spectral-spatial features for HSI classification, choosing 50% of the tagged samples as the training set [27]. However, the requirement that training samples be flattened to one dimension to satisfy the input of the SAE and DBN neglects the spatial information of HSI data. In addition, the hidden layers of SAE and DBN contain too many parameters, whereas CNN has the advantage of local connections and shared weights, which reduce the computational cost. As far as we know, previous studies of CNN in the field of HSI applications have tended to concentrate on extracting spatial/structural features for improving classification accuracy. Zhao et al. [28] explored a multiscale CNN to transform the original HSI into a pyramid structure containing spatial features for HSI classification, with the final result obtained by majority voting. Zhao and Du exploited a CNN to extract spatial features and used a logistic regression (LR) classifier [29]; their experiments were conducted only on two hyperspectral images (Pavia center and University of Pavia) collected by the same sensor, which cannot illustrate generalization. Romero et al. introduced a CNN to learn hierarchical features of remote sensing images, including multi- and hyperspectral images [30]; the flaws of this work are a complex network structure and relatively poor performance. Makantasis et al. exploited a CNN to encode spectral and spatial features and a multi-layer perceptron (MLP) to classify HSI data [31], choosing 80% of the samples for training, which is impractical. Tuia et al. substituted the convolution operators with spatial filters of known properties to extract deep features of HSI data [32], with lower performance. Thus, challenges remain in achieving robust performance, good generalization and high speed under a small training set of HSI data using a CNN and an efficient classifier. In this paper, the CNN is mainly equipped with convolutional layers. It learns the feature representation from raw pixels and can be trained in an end-to-end manner by the back propagation algorithm


[33], which can be viewed as a transformation from the input map to the output map. In addition, the issue of defining a training label from which to extract effective spatial features has not been clearly addressed in state-of-the-art works; more details are discussed in Sec. 3. Although the main idea of this paper is to introduce an HSI reconstruction technique based on a deep CNN, it is also of interest to see the effectiveness of the reconstructed data on classification. Owing to its simple principle, low computational cost, good generalization performance and need for little human intervention [34-38], ELM is applied to the final classification, unlike the LR, SVM and CR based classifiers used in [24-32]. In the field of HSI classification, ELM and its extensions have recently attracted increasing attention [39-44]. Pal et al. [39] employed a kernel-based ELM for land cover classification in remote sensing images. Moreno et al. [40] successfully applied ELM to soybean classification in HSI. In [41], Bazi et al. utilized an ELM optimized by differential evolution for HSI classification. Ensemble ELMs were introduced in [42] for HSI classification. In [43], ELM was employed to classify HSI using multiple features. In our previous work [44], an optimized ELM was presented for HSI classification. All of the aforementioned studies have verified that ELM-based classifiers provide performance comparable to SVM in both classification accuracy and computational time. ELM randomly generates the weights connecting the input nodes and hidden nodes; therefore, only the linear weights in the output layer need to be tuned, which makes the learning of ELM simple and efficient. This leads to extremely fast learning and good generalization performance.

3. Proposed framework

The proposed framework consists of four main parts: normalization, band selection, reconstruction and classification. The following subsections briefly describe these procedures.


3.1 Normalization

Let $Y \in \mathbb{R}^{MN \times L}$ denote a hyperspectral data cube with $L$ spectral bands (columns of $Y$) and $M \times N$ pixels (rows of $Y$). We may interpret $Y$ either as a collection of $L$ 2-D images (or bands) of size $M \times N$, or as a collection of $M \times N$ spectral vectors of length $L$. Normalization is the only pre-processing we perform; it maps each band of the hyperspectral data into the range $[0, 1]$:

$$Y_l = \frac{Y_l}{\max\left(Y_l(:)\right)}, \quad l = 1, 2, \dots, L \tag{1}$$

where $Y_l$ is the $l$th spectral band of the hyperspectral data $Y$, and $\max(Y_l(:))$ denotes the maximum pixel value in $Y_l$.
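For concreteness, the following is a minimal sketch of the per-band normalization in Eq. (1), assuming the cube is stored as a NumPy array of shape (M, N, L); the function name is illustrative and not from the original implementation.

```python
import numpy as np

def normalize_bands(Y):
    """Scale each spectral band of Y (shape (M, N, L)) into [0, 1]
    by dividing by that band's own maximum pixel value, as in Eq. (1)."""
    Y = Y.astype(np.float64)
    band_max = Y.max(axis=(0, 1), keepdims=True)  # one maximum per band
    return Y / band_max
```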

3.2 Band selection

Before introducing our band selection method, we first describe the spectral characteristics of HSI. The spectra of two different classes, trees and meadow, in the Center of Pavia image are shown in Fig. 1, and those of two different classes, grass/pasture and woods, in the Indian Pines image are shown in Fig. 2. As can be observed from Figs. 1 and 2, the spectral magnitude values and tendencies of these classes, in different images obtained by different sensors, are close to each other. Notably, while the spectra of many classes show good discriminating characteristics, several classes remain difficult to distinguish, such as trees and meadow in the Center of Pavia image, or grass/pasture and woods in the Indian Pines image. Thus, in this section, regarding the choice of the corresponding label for each HSI data set, we enhance spatial features without spectral distortion in order to classify classes with similar spectral features.


[Figure: spectral magnitude value vs. number of bands for each class]
Fig. 1. Spectra of a ROSIS image over the Center of Pavia: (a) Trees; (b) Meadow.

[Figure: spectral magnitude value vs. number of bands for each class]
Fig. 2. Spectra of an AVIRIS image over Indian Pines: (a) Grass/Pasture; (b) Woods.

Kang et al. adopted the first PC as the guidance image in order to filter the classification map of each class in HSI [4]. The principal component analysis (PCA) method ensures that most information is preserved in a small number of significant PCs, but it cannot ensure that the spectral signatures of interest are emphasized. In addition, it has been shown in [45] and [46] that using selected bands may offer slightly better performance than using PCs. Furthermore, we validate in Sec. 4 that using the first PC as the guidance image may result in spectral distortion. There is no denying, however, that the PC image contains rich spatial information, and several algorithms extract spatial features from PC images [17-20] for HSI classification. Hence, we focus on selecting a band using the first PC image as a reference. Owing to their wide applicability, six gray level co-occurrence matrix (GLCM) measures, namely angular second moment (ASM), contrast, entropy, variance, correlation, and dissimilarity, are adopted to extract spatial features. These features are defined and


listed in Table 1. Each feature of every band is compared to the corresponding feature of the first PC, and the following ratio is used to select a band:

$$\delta_l = \sum_{i=1}^{6} k_i \, \frac{\left| \mathrm{PC}_{F_i} - \mathrm{Band}_{F_i}^{l} \right|}{\mathrm{PC}_{F_i}}, \quad l = 1, 2, 3, \dots, L \tag{2}$$

where $\mathrm{Band}^l$ represents the $l$th band of the hyperspectral data, $F_i$ is the $i$th feature listed in Table 1, and $\mathrm{PC}$ represents the first PC. The band with the minimum value of $\delta_l$ is selected as the training label; in other words, the spatial features of the selected band are more similar to those of the first PC than those of any other band.

Table 1 Parameters for texture features.

Feature   Description     Feature   Description
F1        ASM             F4        variance
F2        contrast        F5        correlation
F3        entropy         F6        dissimilarity

Under training with this label, spatial features are enhanced while spectral information is retained, as examined in the experimental section. The histograms of the original and reconstructed bands are shown in Fig. 3. The histograms of the reconstructed bands are distributed relatively uniformly, which illustrates that our band selection algorithm has a positive effect on the reconstruction procedure, and thus the spatial features are enhanced.
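As a concrete illustration of the selection rule in Eq. (2), the sketch below computes the six GLCM measures with scikit-image and picks the band whose features best match those of the first PC. Entropy and variance are not exposed by graycoprops, so they are derived from the co-occurrence matrix directly; equal weights k_i = 1 are assumed because the paper does not report the weight values, and the GLCM distance and angle are likewise illustrative choices.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(img):
    """Six GLCM measures (ASM, contrast, entropy, variance, correlation,
    dissimilarity) of an image already scaled to [0, 1]."""
    g = graycomatrix((img * 255).astype(np.uint8), distances=[1],
                     angles=[0], levels=256, symmetric=True, normed=True)
    p = g[:, :, 0, 0]                        # normalized co-occurrence matrix
    i, _ = np.indices(p.shape)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    variance = np.sum(p * (i - np.sum(i * p)) ** 2)
    return np.array([graycoprops(g, 'ASM')[0, 0],
                     graycoprops(g, 'contrast')[0, 0],
                     entropy,
                     variance,
                     graycoprops(g, 'correlation')[0, 0],
                     graycoprops(g, 'dissimilarity')[0, 0]])

def select_band(Y, first_pc):
    """Index of the band minimizing the feature ratio of Eq. (2)."""
    pc_f = glcm_features(first_pc)
    deltas = [np.sum(np.abs(pc_f - glcm_features(Y[:, :, l]))
                     / (pc_f + 1e-12))        # k_i = 1 assumed
              for l in range(Y.shape[2])]
    return int(np.argmin(deltas))
```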

Fig. 3. The histograms of (a) the original 1st band, (b) the original 25th band, (c) the reconstructed 1st band, (d) the reconstructed 25th band.


3.3 HRCNN

The flowchart of the proposed HRCNN, containing four convolutional layers, is illustrated in Fig. 4. The four main parts of the proposed HSI reconstruction algorithm are: patch extraction and representation, feature enhancement, non-linear mapping and reconstruction. The following subsections briefly describe these procedures.

f1  f1

...

f2  f2

Patch extraction and representation

...

Feature enhancement

f3  f3

...

Non-linear mapping

f4  f4

Reconstruction

Fig. 4. The framework of HRCNN. The network consists of four convolutional layers, each of which is responsible for a specific operation (i.e. patch extraction and representation, feature enhancement, non-linear mapping and reconstruction).

3.3.1 Patch extraction and representation

This process aims at extracting overlapping patches from the original hyperspectral data and representing each patch as a high-dimensional vector, which determines what should be emphasized and restored in the following stages. This layer is expressed as an operation h1, which produces a set of feature maps of the same size. The input image Y is first resized to a fixed height (38 pixels throughout our experiments), keeping its aspect ratio. Each feature map is extracted by performing convolution with convolutional filters (W1) and biases (B1):

 L  h1  Y     W1  Yl  B1   l 1 

(3)

where $L$ is the number of spectral bands and $Y_l$ is the $l$th band of the input hyperspectral data. The star operator $*$ indicates the 2-D convolution operation. Here, we use the thresholding function $\max(0, x)$ as the element-wise non-linearity $\sigma(\cdot)$, also known as the ReLU [47]. A positive response indicates the presence of certain


discriminative patterns, and is kept, while negative responses are suppressed by setting them to zero.

3.3.2 Feature enhancement

The first layer extracts an n1-dimensional feature for each patch. However, the HSI is contaminated with noise, which not only decreases the accuracy of subsequent processing but also degrades the visual effect. Hence, the features extracted in the first layer may be noisy; they are enhanced by the second convolutional layer and combined to form another set of feature maps, following the feature enhancement step in [48]. This operation h2 convolves the feature maps from the preceding layer with learned convolutional filters (W2) and biases (B2):

 L  h2  Y     W2  h1  Yl   B2   l 1 

(4)

where $h_1(Y_l)$ represents the feature maps extracted in the first convolutional layer for the $l$th band. The first two convolutional layers aim at extracting information-rich features while reducing noise and enhancing contrast.

3.3.3 Non-linear mapping

The remaining feature levels are extracted recursively by performing convolution with convolutional filters (W3) and biases (B3) on the feature maps from the preceding layer:

 L  h3  Y     W3  h2  Yl   B3   l 1 

(5)

where $h_2(Y_l)$ represents the feature maps extracted by the second convolutional layer for the $l$th band. Inspired by the non-linear mapping step in [21], this operation maps each of the $n_2$-dimensional vectors into an $n_3$-dimensional one; thus, we use filters of size 1×1.

3.3.4 Reconstruction

The last convolutional layer aims at generating the final reconstructed hyperspectral data with spatial feature enhancement:

$$h_4(Y) = \left\{ W_4 * h_3(Y_l) + B_4 \right\}_{l=1}^{L} \tag{6}$$

where $h_3(Y_l)$ represents the feature maps extracted by the third convolutional layer for the $l$th band. To obtain a valid reconstruction, we pass the reconstructed HSI through a ReLU non-linearity, which rectifies the feature maps. Overall, the HRCNN model consists of four layers; it is a deeper convolutional neural network than SRCNN [21], aimed at feature enhancement for HSI.
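A minimal sketch of the four-layer forward pass in Eqs. (3)-(6) is given below, written in PyTorch as an assumed framework (the paper does not name its implementation library). Kernel sizes and channel counts follow the settings later reported in Sec. 4.2, and each band is fed through the network independently as a single-channel image.

```python
import torch
import torch.nn as nn

class HRCNN(nn.Module):
    """Sketch of the HRCNN of Fig. 4 with f1=9, f2=7, f3=1, f4=5 and
    n1=64, n2=32, n3=16, n4=1 (Sec. 4.2)."""
    def __init__(self):
        super().__init__()
        # "same" zero-padding keeps every output the size of the input band
        self.patch_extraction = nn.Conv2d(1, 64, kernel_size=9, padding=4)      # Eq. (3)
        self.feature_enhancement = nn.Conv2d(64, 32, kernel_size=7, padding=3)  # Eq. (4)
        self.nonlinear_mapping = nn.Conv2d(32, 16, kernel_size=1)               # Eq. (5)
        self.reconstruction = nn.Conv2d(16, 1, kernel_size=5, padding=2)        # Eq. (6)
        self.relu = nn.ReLU()

    def forward(self, band):                  # band: (batch, 1, H, W)
        h1 = self.relu(self.patch_extraction(band))
        h2 = self.relu(self.feature_enhancement(h1))
        h3 = self.relu(self.nonlinear_mapping(h2))
        # the final ReLU rectifies the reconstructed band (Sec. 3.3.4)
        return self.relu(self.reconstruction(h3))
```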

3.3.5 Model learning

The band selected by the spatial-features-based strategy described above is adopted as the training label. All the other bands are trained against this ideal output. In particular, the parameters, i.e. the weight and bias matrices, are updated as:

i 1  0.9  i    where

E    , i 1  i  i 1 i

(7)

1, 2, 3, 4 represents the indices of layers, and i is the iteration, η is the learning rate. The

parameters (weight and bias matrices) are trained using gradient descent with standard back-propagation, realized by minimizing a loss function between the reconstructed hyperspectral data and the training label and computing the partial derivative of the loss function with respect to each trainable parameter (weight or bias). The loss function used in this work is defined as:

$$E(\theta) = \frac{1}{L} \sum_{l=1}^{L} \left\| h(Y_l; \theta) - X \right\|^2 \tag{8}$$

where $X$ is the selected band with distinctive and informative spatial features and $L$ is the number of bands. The learning procedure of HRCNN is summarized in Algorithm 1. Once the architecture and all corresponding trainable parameters are specified, we can build the CNN model and reload the saved parameters for HSI reconstruction.

3.4 Classification

Given a training set $\{(x_i, t_i) \mid x_i \in \mathbb{R}^n, t_i \in \mathbb{R}^m, i = 1, \dots, N\}$ randomly chosen from the reconstructed HSI, the ELM procedure can be summarized as follows:

Step 1 Randomly assign the hidden node parameters $(a_i, b_i)$, $i = 1, \dots, Q$, where $Q$ is the number of hidden nodes.

Step 2 Calculate the hidden layer output matrix H, where $N$ is the number of training samples and $g(x)$ is the activation function:

g  aQ  x1  bQ      g  aQ  x N  bQ   N Q

 g  a1  x1  b1   H   g  a1  x N  b1 

(9)

Step 3 Calculate the output weights:

$$\beta = H^{\dagger} T \tag{10}$$

where $H^{\dagger}$ is the Moore–Penrose generalized inverse of the matrix H.

Finally, the classification result can be predicted by ELM according to the following function:

f  x i   h  x i    h  x i  H† T

(11)

The number of hidden nodes is an important parameter; a suitable value is determined experimentally.
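A compact NumPy sketch of the three ELM steps (Eqs. (9)-(11)) is given below; the sigmoid activation is an assumed choice of g(x), and T holds one-hot class targets.

```python
import numpy as np

def elm_train(X, T, Q, seed=0):
    """X: (N, n) training samples, T: (N, m) one-hot targets, Q hidden nodes."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((X.shape[1], Q))   # random input weights a_i
    b = rng.standard_normal(Q)                 # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))     # hidden layer output, Eq. (9)
    beta = np.linalg.pinv(H) @ T               # Moore-Penrose solution, Eq. (10)
    return a, b, beta

def elm_predict(X, a, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return np.argmax(H @ beta, axis=1)         # class decision from Eq. (11)
```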

Algorithm 1. Training the HRCNN model
Initialize the learning rate, maximum number of iterations, training batch size, kernel sizes, numbers of kernels, kernel types, and so on.
Generate random weights of Gaussian type and set the biases to 0;
cnnModel = InitCNNModel(weight and bias matrices, [n1-n4]);
while iter < max iteration or err > min error do
    compute err according to the loss function, Eq. (8);
    for iter = 1 to iter <= number/(batch size) do
        cnnModel.train(TrainingData, TrainingLabels), minimizing the loss with BP;
        update the weight and bias matrices with Eq. (7);
    end for
    iter++;
end while
Save the parameters (weights, biases) of the CNN.
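Assuming the PyTorch HRCNN sketch from Sec. 3.3, the loop of Algorithm 1 reduces to standard momentum SGD on the MSE loss; `training_batches` is a hypothetical iterable of (sub-image, label-patch) tensor pairs, and the per-layer learning rates follow Sec. 4.2.

```python
import torch

def train_hrcnn(model, training_batches, epochs=1):
    # Per-layer learning rates from Sec. 4.2 (1e-4 for the first three
    # layers, 1e-5 for the last) and a momentum of 0.9, as in Eq. (7).
    optimizer = torch.optim.SGD([
        {'params': model.patch_extraction.parameters(), 'lr': 1e-4},
        {'params': model.feature_enhancement.parameters(), 'lr': 1e-4},
        {'params': model.nonlinear_mapping.parameters(), 'lr': 1e-4},
        {'params': model.reconstruction.parameters(), 'lr': 1e-5},
    ], momentum=0.9)
    mse = torch.nn.MSELoss()                   # the loss of Eq. (8)
    for _ in range(epochs):
        for bands, label in training_batches:  # tensors of shape (B, 1, 38, 38)
            optimizer.zero_grad()
            loss = mse(model(bands), label)
            loss.backward()                    # standard back-propagation
            optimizer.step()                   # momentum update, Eq. (7)
```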

To evaluate the effectiveness of the proposed algorithm, support vector machine (SVM) [49] and sparse representation [50] classifiers are used for comparison on the reconstructed HSI. Here, the Orthogonal Matching Pursuit (OMP), Simultaneous Orthogonal Matching Pursuit (SOMP) and First-order neighborhood system weighted constraint OMP (FOMP) algorithms are used to solve the sparse optimization problem.

4. Experimental results and analysis

4.1 Dataset

Three hyperspectral data sets are used to evaluate the performance of the reconstruction method in our experiments. The first is the Indian Pines image¹, captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1992 over the Indian Pines test site in Northwestern Indiana. The spatial resolution of this image is 20 m per pixel. The test image consists of 145×145 pixels; each pixel has 220 bands, of which 20 noisy bands (nos. 104-108, 150-163, and 220) are removed. The second image is the Salinas image², also collected by the AVIRIS sensor over Salinas Valley, California. The spatial resolution is 3.7 m per pixel. The area comprises 512×217 pixels and has 224 bands, of which 20 bands (nos. 108-112, 154-167, and 224) are removed.

¹ https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html
² http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes


The last image is the Center of Pavia, acquired by the Reflective Optics System Imaging Spectrometer (ROSIS)³. This image comprises 1096×492 pixels and has 115 bands ranging from 0.43 to 0.86 μm, with a spatial resolution of 1.3 m per pixel. The 13 noisy bands are removed.

4.2 Training details

The weight matrices of each convolutional layer are initialized by drawing randomly from a Gaussian distribution with zero mean and standard deviation 0.001. The biases are initialized to 0. It is empirically found that a momentum term of 0.9 is important for the network to converge. Stochastic gradient descent with a mini-batch size of 128 and a learning rate of 10⁻⁴ in the first three layers and 10⁻⁵ in the last layer is used to update the weight and bias matrices. The settings of HRCNN are f1=9, f2=7, f3=1, f4=5, n1=64, n2=32, n3=16, n4=1. The training pairs {Y, X} are prepared as 38×38-pixel sub-images. Hyperspectral data of arbitrary size can be tested: all convolutional layers are given sufficient zero-padding during testing, and no pooling or fully connected layers are adopted, so that the output band has the same size as the input. The sub-images are extracted with a stride of 10. Smaller strides were attempted but increased the training time without significantly improving performance.
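The sub-image preparation can be sketched as follows, with Y the normalized cube of shape (M, N, L) and X the selected label band of shape (M, N); the function name and return format are illustrative only.

```python
import numpy as np

def extract_pairs(Y, X, size=38, stride=10):
    """Cut 38x38 training pairs with a stride of 10 (Sec. 4.2); every band
    of a sub-image is paired with the co-located patch of the label band."""
    pairs = []
    M, N, L = Y.shape
    for r in range(0, M - size + 1, stride):
        for c in range(0, N - size + 1, stride):
            for l in range(L):
                pairs.append((Y[r:r + size, c:c + size, l],
                              X[r:r + size, c:c + size]))
    return pairs
```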

4.3 Mid-level representation

Fig. 5 shows some examples of learned filters trained on the hyperspectral data. Fig. 6 shows example feature maps in different layers. To illustrate the generalization of our HRCNN model, we demonstrate two HSI data sets obtained by different sensors, i.e. Indian Pines obtained by AVIRIS and the Center of Pavia obtained by ROSIS. Clearly, the feature maps of the first layer contain different structures, while those of the second layer are mainly enhanced. The third layer shows the restored feature maps.

³ http://www.ehu.eus/ccwintco/uploads


Fig. 5. First layer filters learned on HSI.

Fig. 6. Example feature maps of the different layers: (a) Indian Pines; (b) the Center of Pavia.

4.4 Quality assessment

In this section, the proposed algorithm is evaluated by objective indexes, comparing the reconstructed HSI with the original one. The spectral angle mapper (SAM), spectral information divergence (SID) and root-mean-squared error (RMSE) are adopted to evaluate spectral information. Tables 2-4 show the quality assessment of the reconstructed Indian Pines, Salinas and Center of Pavia images, respectively. Compared with the reconstruction procedure that uses the first PC as the training label, the selected band compares favorably on these quantitative indexes: the values of SAM, SID, and RMSE obtained with the selected band are closer to the reference value 0, which means the proposed reconstruction method gives better performance in terms of objective quality.
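For reference, the three spectral indexes can be computed as in the NumPy sketch below, with SAM averaged over pixels and reported in degrees and SID formed from symmetric relative entropies; the small epsilon terms are numerical safeguards not mentioned in the paper.

```python
import numpy as np

def sam(ref, rec, eps=1e-12):
    """Mean spectral angle (degrees) between cubes of shape (M, N, L)."""
    x = ref.reshape(-1, ref.shape[-1]); y = rec.reshape(-1, rec.shape[-1])
    cos = np.sum(x * y, axis=1) / (np.linalg.norm(x, axis=1)
                                   * np.linalg.norm(y, axis=1) + eps)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()

def sid(ref, rec, eps=1e-12):
    """Mean spectral information divergence between two cubes."""
    p = ref.reshape(-1, ref.shape[-1]) + eps
    q = rec.reshape(-1, rec.shape[-1]) + eps
    p = p / p.sum(axis=1, keepdims=True); q = q / q.sum(axis=1, keepdims=True)
    return (np.sum(p * np.log(p / q), axis=1)
            + np.sum(q * np.log(q / p), axis=1)).mean()

def rmse(ref, rec):
    return np.sqrt(np.mean((ref - rec) ** 2))
```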

Table 2 Spectral quality assessment for the reconstructed Indian Pines image.

Method                                  SAM       SID      RMSE
Reference values                        0         0        0
Reconstructed with the selected band    1.6083    0.0045   0.0263
Reconstructed with the 1st PC           10.9380   0.1832   0.2261

Table 3 Spectral quality assessment for the reconstructed Salinas image.

Method                                  SAM       SID      RMSE
Reference values                        0         0        0
Reconstructed with the selected band    1.2669    0.0010   0.0183
Reconstructed with the 1st PC           6.0193    0.0429   0.1388

Table 4 Spectral quality assessment for the reconstructed Center of Pavia image.

Method                                  SAM       SID      RMSE
Reference values                        0         0        0
Reconstructed with the selected band    2.3025    0.1578   0.0178
Reconstructed with the 1st PC           16.6186   0.2215   0.1340

Fig. 7 shows the spectra of some pixels in the Indian Pines image. It can be observed from Fig. 7 that the spectral magnitude values and tendencies produced by our reconstruction method are quite similar to those of the original image, while the spectra reconstructed with the first PC image as the training label are quite different compared to the original one.

[Figure: spectral magnitude value vs. number of bands; legend: Original, Reconstructed with the selected band, Reconstructed with the 1st PC]
Fig. 7. Spectra of some pixels in the original and reconstructed HSI.

Figs. 8 and 9 show the reconstructed spectra of five pixels in the Indian Pines and Center of Pavia images, respectively. The original spectrum is drawn with a solid line ("-") and the reconstructed spectrum with a dashed line ("--"); most of them overlap. It is apparent that the reconstruction procedure based on the deep CNN model and the band selection method preserves the spectral information. This validates the aforementioned idea that the proposed HRCNN reconstruction model enhances spatial features without spectral distortion.

[Figure: per-class spectral magnitude value vs. number of bands; panels: Alfalfa, Corn-no till, Corn-min till, Corn, Grass/Pasture, Grass/Trees, Grass/Pasture-mowed, Hay-windrowed, Oats, Soybean-no till, Soybean-min till, Soybean-clean till, Wheat, Woods, Bldg-Grass-Tree-Drivers, Stone-steel towers]
Fig. 8. Reconstructed spectra of an AVIRIS image over Indian Pines.

5. Classification

For the ELM classifier, we focus on both classification accuracy and computational time. The first data set is the Indian Pines image, containing 16 classes; we randomly choose about 10% of the labeled samples for training and use the remaining samples for testing, as is common in the literature. For the Salinas image, training sets accounting for about 10% of the ground truth were chosen randomly. For the Center of Pavia image, training sets accounting for about 5% were chosen randomly. In the following, the CNN-based reconstructed data classified by SVM, OMP, SOMP, FOMP and ELM are denoted R-SVM, R-OMP, R-SOMP, R-FOMP and R-ELM, respectively. For the SVM classifier, parameters are determined by fivefold cross-validation. Classification performance is evaluated by overall accuracy (OA), average accuracy (AA) and the kappa coefficient (Kappa).
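These three measures follow directly from the confusion matrix, as in the short sketch below, where C[i, j] counts test samples of class i assigned to class j; this is the standard computation rather than code from the authors.

```python
import numpy as np

def accuracy_measures(C):
    n = C.sum()
    oa = np.trace(C) / n                               # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))           # average accuracy
    pe = np.sum(C.sum(axis=0) * C.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                       # kappa coefficient
    return oa, aa, kappa
```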

[Figure: per-class spectral magnitude value vs. number of bands; panels: Water, Trees, Meadow, Brick, Soil, Asphalt, Bitumen, Tile, Shadow]
Fig. 9. Reconstructed spectra of a ROSIS image over the Center of Pavia.

Fig. 10. For the Indian Pines image: (a) training set and (b) test set. Classification results obtained by (c) SVM, (d) R-SVM, (e) OMP, (f) R-OMP, (g) SOMP, (h) R-SOMP, (i) FOMP, (j) R-FOMP, (k) ELM, (l) R-ELM.

5.1 Classification results


Fig. 10 illustrates the classification maps obtained by the aforementioned classifiers on the original Indian Pines image and on the reconstructed one, respectively. The first two panels of Fig. 10 show the map of the training samples and the map of the test samples. For each individual classifier, the classification performance on the reconstructed data is much better than on the original data. This means that the reconstructed HSI preserves distinctive and refined information for accurate classification.

Table 5 Classification accuracies (in percent) of the Indian Pines image.

Class   SVM     R-SVM   OMP     R-OMP   SOMP    R-SOMP  FOMP    R-FOMP  ELM     R-ELM
1       35.00   90.00   47.50   100     62.50   100     60.00   100     50.00   97.50
2       70.64   94.55   53.27   93.46   61.84   93.46   71.18   94.94   64.72   97.51
3       67.69   95.04   51.88   95.71   54.70   95.17   67.29   96.65   54.83   96.11
4       40.85   95.31   43.66   97.65   46.01   97.18   53.99   99.06   41.31   97.18
5       84.76   97.46   83.14   99.08   88.45   99.08   90.76   98.15   86.37   99.08
6       94.81   97.86   92.82   99.09   98.17   99.24   96.79   98.93   94.96   99.54
7       80.00   100     92.00   96.01   84.00   100     92.00   96.00   68.00   96.00
8       98.60   97.90   90.44   99.77   95.10   99.53   96.04   100     98.14   98.83
9       22.22   83.33   27.78   100     44.44   100     44.44   100     0       100
10      77.94   94.40   61.71   96.91   66.40   97.26   75.31   97.37   64.80   96.00
11      83.61   97.33   70.33   98.14   74.86   98.32   84.46   98.91   75.18   98.19
12      65.35   88.70   41.24   98.11   47.46   98.31   64.03   97.93   55.74   93.60
13      94.54   99.45   94.53   98.36   97.81   98.91   97.81   98.91   97.27   100
14      96.04   99.47   88.98   99.65   92.25   99.47   95.51   100     91.45   99.82
15      55.46   90.80   31.90   93.10   36.78   93.97   50.57   96.84   45.98   93.97
16      72.29   92.77   90.36   90.36   96.39   92.77   87.95   90.36   86.75   92.77
OA      79.72   95.97   67.87   97.28   72.87   97.37   80.39   97.97   73.35   97.62
AA      71.24   94.65   66.35   97.21   71.70   97.67   76.76   97.75   67.22   97.26
Kappa   76.76   95.40   63.33   96.90   69.02   97.00   77.54   97.68   69.45   97.29

Furthermore, Table 5 reports the classification accuracies of the different methods. The OA, AA and Kappa of R-SVM increase to 95.97%, 94.65%, and 95.40%, respectively. R-OMP yields 29.41% higher OA than OMP, R-SOMP offers over 24.5% higher OA than SOMP, and R-FOMP yields 17.58% higher OA than FOMP. Moreover, the AA and Kappa of R-OMP, R-SOMP and R-FOMP also increase significantly. The OA, AA, and Kappa of R-ELM increase from 73.35%, 67.22%, and 69.45% to 97.62%, 97.26%, and 97.29%, respectively; OA is improved by 24.27%, and R-ELM offers 30.04% higher AA and 27.84% higher Kappa than ELM. This clearly demonstrates that the reconstructed Indian Pines data is highly distinctive and informative for classification. In addition, the FOMP classifier, proposed in our previous study [40], achieves the highest classification accuracy when classifying the reconstructed Indian Pines image.

Fig. 11. For the Salinas image: (a) training set and (b) test set. Classification results obtained by (c) SVM, (d) R-SVM, (e) OMP, (f) R-OMP, (g) SOMP, (h) R-SOMP, (i) FOMP, (j) R-FOMP, (k) ELM, (l) R-ELM.

Table 6 Classification accuracies (in percent) of the Salinas image.

Class   SVM     R-SVM   OMP     R-OMP   SOMP    R-SOMP  FOMP    R-FOMP  ELM     R-ELM
1       99.50   98.01   99.50   98.56   99.78   98.23   100     99.50   99.61   98.12
2       100     98.42   99.43   97.64   99.52   97.73   99.64   99.08   97.17   98.87
3       98.99   97.53   96.68   98.09   97.81   98.26   98.31   98.93   91.57   99.94
4       99.44   94.26   99.60   96.02   99.36   95.86   99.68   98.25   90.52   97.77
5       99.17   98.71   97.06   98.47   96.43   98.63   98.84   98.76   93.28   98.80
6       99.94   99.52   99.89   99.92   99.86   99.97   99.94   100     99.55   99.55
7       99.72   95.93   99.60   96.93   99.41   97.02   99.97   98.88   98.98   98.42
8       89.79   98.46   78.77   98.17   82.29   98.28   87.51   99.17   81.66   98.80
9       99.80   98.57   99.12   99.39   99.44   99.46   99.84   99.73   96.96   99.73
10      95.29   93.70   95.39   96.54   94.71   96.65   98.20   98.27   86.00   97.12
11      97.51   96.57   97.71   97.71   96.88   97.82   99.79   97.09   93.14   93.66
12      99.60   98.56   99.65   98.85   100     98.56   100     99.37   99.37   99.37
13      97.45   97.70   97.58   95.27   96.00   95.52   99.15   97.33   96.73   99.40
14      93.67   97.61   94.91   98.23   96.57   98.03   97.20   99.17   92.00   99.27
15      67.84   98.81   65.81   99.45   71.29   99.50   78.10   99.91   60.29   99.16
16      98.46   96.44   98.46   95.64   98.40   95.76   98.71   98.03   94.65   99.57
OA      92.83   97.86   89.99   98.25   91.46   98.30   94.06   99.17   87.91   98.84
AA      96.01   97.42   94.95   97.80   95.49   97.83   97.18   98.84   91.97   98.60
Kappa   92.00   97.62   88.85   98.05   90.49   98.11   93.38   99.08   86.51   98.71

Fig. 12. For the Center of Pavia image: (a) training set and (b) test set. Classification results obtained by (c) SVM, (d) R-SVM, (e) OMP, (f) R-OMP, (g) SOMP, (h) R-SOMP, (i) FOMP, (j) R-FOMP, (k) ELM, (l) R-ELM.

Table 7 Classification accuracies (in percent) of the Center of Pavia image.

Class   SVM     R-SVM   OMP     R-OMP   SOMP    R-SOMP  FOMP    R-FOMP  ELM     R-ELM
1       99.92   99.96   99.21   99.26   99.87   99.29   99.97   99.95   98.54   100
2       97.10   98.39   87.70   92.50   87.93   92.73   87.70   94.69   88.35   98.62
3       97.01   99.15   95.92   96.58   97.68   96.73   97.15   99.29   92.31   99.43
4       94.92   97.82   81.27   96.86   73.60   97.34   83.38   92.02   76.37   98.43
5       97.22   98.64   94.08   96.30   96.67   96.67   95.51   96.25   89.51   98.18
6       98.03   97.83   80.15   78.47   77.44   78.69   78.66   87.62   94.09   98.64
7       93.75   97.25   91.09   96.19   94.75   96.42   92.98   95.18   84.32   96.53
8       99.38   99.86   97.79   95.55   98.48   95.86   98.62   99.48   95.27   99.55
9       99.85   99.54   74.72   68.43   83.20   68.48   95.53   95.13   46.85   99.54
OA      98.89   99.39   95.45   96.20   96.20   96.30   96.56   97.98   94.52   99.43
AA      97.47   98.72   89.10   91.13   89.96   91.36   92.17   95.51   85.07   98.77
Kappa   97.98   98.89   91.74   93.08   93.07   93.27   93.73   96.32   90.11   98.95


Figs. 11 and 12 show the classification maps obtained by the different methods for the Salinas and Center of Pavia images, respectively. As shown in these figures, the maps obtained on the reconstructed images are much closer to the ground truth. Tables 6 and 7 report the classification accuracies. From these two examples, it can be seen that the reconstructed data are consistently more suitable for classification: the results on the reconstructed HSI outperform those on the original images in terms of OA, AA and Kappa, and the performance of all competing algorithms improves when the HSI is reconstructed by the proposed HRCNN model. It can also be observed from Figs. 11 and 12 and Tables 6 and 7 that the classification map obtained by R-FOMP approximates the ground truth map, and the classification accuracies of R-FOMP are slightly higher than those of R-ELM. In other words, our previously proposed FOMP [40] algorithm performs well on the reconstructed Indian Pines and Salinas images. However, for the Center of Pavia image, R-ELM is better than R-FOMP. Notably, ELM shows the largest improvement on the HSI reconstructed by the deep CNN model. In addition, the Indian Pines and Salinas images are obtained by the same sensor (AVIRIS), while the Center of Pavia image is obtained by the ROSIS sensor, which illustrates that the proposed R-ELM generalizes better.

5.2 Classification results with different training and test sets

In this section, the influence of the training sample rate on the proposed methods is analyzed on the three hyperspectral data sets, as shown in Fig. 13. Training sets are randomly chosen at different percentages (from 5% to 45% per class for the Indian Pines and Salinas images, and from 1% to 5% per class for the Center of Pavia image), and the remaining samples are used for testing. The results are averaged over ten random runs. As shown in Fig. 13, the OA of the classifiers generally improves as the number of training samples increases. The performance of R-ELM consistently yields higher OA than that of R-SVM for all training sample sizes.

[Figure: overall accuracy (%) vs. percentage (%) of training samples per class; curves: R-ELM, R-SVM]
Fig. 13. Effect of different numbers of training samples for: (a) Indian Pines; (b) Salinas; (c) Center of Pavia.

We also compare the proposed scheme with recent deep learning works on HSI applications. Though different numbers of training samples are employed by different methods, the reported performance is still an informative standard for assessing the effectiveness of the proposed scheme. As shown in Table 8, Indian Pines and Center of Pavia are the most commonly used data sets. Among all these methods, our classification scheme obtains comparable performance (OA, AA and Kappa) under a small training set with less computational complexity. More details are given below.

Table 8 Classification performance of different methods in literature.

Reference                 Datasets                 Baseline   Training set   OA      AA      Kappa
Zabalza et al. [24]       Indian Pines             SAE, SVM   5%             80.66   73.63   -
                          Subset of Pavia center              5%             97.42   86.26   -
Ma et al. [25]            Indian Pines             SAE, CR    10%            99.22   98.57   99.11
                          Center of Pavia                     5%             99.90   99.73   99.83
Chen et al. [26]          University of Pavia      SAE, LR    60%            98.52   97.82   98.07
                          Indian Pines                        60%            95.95   95.45   95.39
Chen et al. [27]          University of Pavia      DBN, LR    50%            99.05   98.48   98.75
Zhao & Du [29]            Center of Pavia          CNN, LR    10 per class   95.42   -       -
                          University of Pavia                 10 per class   88.97   -       -
Romero et al. [30]        Indian Pines             CNN, SVM   30%            -       -       84
Makantasis et al. [31]    Indian Pines             CNN, MLP   80%            98.88   -       -
                          Salinas                             80%            99.53   -       -
                          Center of Pavia                     80%            99.91   -       -
                          University of Pavia                 80%            99.62   -       -
Tuia et al. [32]          Indian Pines             CNN, LR    30 per class   -       -       88
Proposed algorithm        Indian Pines             CNN, ELM   10%            97.62   97.26   97.29
                          Salinas                             10%            98.84   98.60   98.71
                          Center of Pavia                     5%             99.43   98.77   98.95

5.3 Computational time

More importantly, ELM is a highly efficient algorithm; the computational time of all the aforementioned classifiers on the reconstructed HSI is reported in Table 9. Experiments are performed using MATLAB on a laptop with a 3.4 GHz CPU and 8 GB of RAM. ELM is implemented in MATLAB, while SVM is implemented in mixed C and MATLAB code; the sparsity-based classifiers are also implemented in MATLAB. As can be seen from Table 9, the computational time of ELM is much less than that of SVM and the sparsity-based classifiers, and the sparsity-based classifiers are particularly time-consuming.

Table 9 Computing time (in seconds) for the classification procedure on the reconstructed images.

Methods   Indian Pines   Salinas    Center of Pavia
SVM       3.91           28.95      10.68
OMP       60.02          1736.06    8715.46
SOMP      97.41          2054.63    22619.49
FOMP      111.01         2596.14    35037.51
ELM       0.59           0.624      4.29

6. Conclusion

In this paper, a novel approach for HSI reconstruction with a deep CNN model is introduced for the first time for subsequent classification by ELM under a small training set. Through experiments on three hyperspectral data sets, we found that the selected band contains distinctive and informative spatial information and can serve as the training label. By training the bands against this ideal output using a deeper CNN, we obtain and save a model with optimized parameters (weights and biases). This model is then loaded and used to reconstruct the HSI. The reconstructed HSI serves as input for various classifiers, so the reconstruction can be regarded as a pre-processing step for classification. The results confirm that the reconstructed HSI dramatically increases the OA, AA and Kappa of the widely used SVM classifier, three sparsity-based classifiers (i.e. OMP, SOMP, FOMP) and the ELM classifier. By using the reconstructed HSI, the AA of the ELM classifier can be improved by as much as 30.04%. Among these classification results, R-FOMP, based on our previously proposed FOMP, achieves slightly higher classification accuracies than R-ELM when classifying the Indian Pines and Salinas images, which are obtained by the same sensor. For the Center of Pavia image, obtained by a different sensor, the ELM classifier achieves the highest classification accuracies. All of this illustrates that R-ELM has the best generalization under a small training set. In addition, the sparsity-based classifiers are time-consuming, whereas the ELM classifier is much faster.

Acknowledgments

The authors would like to thank the Editor-in-Chief and Yi Chen for providing the software of the SOMP method. This work was partially supported by the National Natural Science Foundation of China (nos. 61222101, 61272120, 61301287, 61301291 and 61350110239).

References

[1] J. A. Richards, Remote Sensing Digital Image Analysis: An Introduction, New York, 2013.
[2] C.-I. Chang, Hyperspectral Data Processing: Algorithm Design and Analysis, New York, 2013.
[3] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, J. C. Tilton, Advances in spectral–spatial classification of hyperspectral images, Proc. IEEE 101 (3) (2013) 652–675.
[4] X. Kang, S. Li, J. A. Benediktsson, Spectral–spatial hyperspectral image classification with edge-preserving filtering, IEEE Trans. Geosci. Remote Sens. 52 (5) (2014) 2666–2677.
[5] Y. Tarabalka, J. Chanussot, J. A. Benediktsson, Segmentation and classification of hyperspectral images using watershed transformation, Pattern Recognit. 43 (7) (2010) 2367–2379.
[6] P. Ghamisi, M. S. Couceiro, N. M. F. Ferreira, J. A. Benediktsson, Integration of segmentation techniques for classification of hyperspectral images, IEEE Geosci. Remote Sens. Lett. 11 (1) (2014) 342–346.
[7] M. Cetin, N. Musaoglu, Merging hyperspectral and panchromatic image data: qualitative and quantitative analysis, Int. J. Remote Sens. 30 (7) (2009) 1779–1804.
[8] Y. Zhang, M. He, Multi-spectral and hyperspectral image fusion using 3-D wavelet transform, Chinese J. Electron. 24 (2) (2007) 218–224.
[9] Y. Zhang, S. De Backer, P. Scheunders, Noise-resistant wavelet-based Bayesian fusion of multispectral and hyperspectral images, IEEE Trans. Geosci. Remote Sens. 47 (11) (2009) 3834–3842.
[10] R. C. Patel, M. V. Joshi, Super-resolution of hyperspectral images: use of optimum wavelet filter coefficients and sparsity regularization, IEEE Trans. Geosci. Remote Sens. 53 (4) (2015) 1728–1736.
[11] R. Kawakami, J. Wright, Y.-W. Tai, Y. Matsushita, M. Ben-Ezra, K. Ikeuchi, High-resolution hyperspectral imaging via matrix factorization, in Proc. of CVPR, 2011.
[12] B. Huang, H. Song, H. Cui, J. Peng, Z. Xu, Spatial and spectral image fusion using sparse matrix factorization, IEEE Trans. Geosci. Remote Sens. 52 (3) (2014) 1693–1704.
[13] N. Yokoya, T. Yairi, A. Iwasaki, Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion, IEEE Trans. Geosci. Remote Sens. 50 (2) (2012) 528–537.
[14] N. Akhtar, F. Shafait, A. Mian, Sparse spatio-spectral reconstruction for hyperspectral image super-resolution, in Proc. of ECCV, 2014.
[15] M. A. Veganzones, M. Simões, G. Licciardi, N. Yokoya, J. M. Bioucas-Dias, J. Chanussot, Hyperspectral super-resolution of locally low rank images from complementary multisource data, IEEE Trans. Image Process. 25 (1) (2016) 274–288.
[16] A. Villa, J. Chanussot, J. A. Benediktsson, C. Jutten, Spectral unmixing for the classification of hyperspectral images at a finer spatial resolution, IEEE J. Sel. Topics Signal Process. 5 (3) (2011) 521–533.
[17] G. Rellier, X. Descombes, F. Falzon, J. Zerubia, Texture feature analysis using a Gauss–Markov model in hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 42 (7) (2004) 1543–1551.
[18] L. Zhang, L. Zhang, D. Tao, X. Huang, On combining multiple features for hyperspectral remote sensing image classification, IEEE Trans. Geosci. Remote Sens. 50 (3) (2012) 879–893.
[19] J. Li, H. Zhang, L. Zhang, X. Huang, L. Zhang, Joint collaborative representation with multitask learning for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 52 (9) (2014) 5923–5936.
[20] E. Zhang, L. Jiao, X. Zhang, H. Liu, S. Wang, Class-level joint sparse representation for multifeature-based hyperspectral image classification, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7 (4) (2016) 1012–1022.
[21] C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2) (2016) 295–307.
[22] W. Xie, Y. Li, C. Ge, Reconstruction of hyperspectral image using matting model for classification, Opt. Eng. 55 (5) (2016) 053104.
[23] A. Levin, D. Lischinski, Y. Weiss, A closed-form solution to natural image matting, IEEE Trans. Pattern Anal. Mach. Intell. 30 (2) (2008) 228–242.
[24] J. Zabalza et al., Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging, Neurocomputing 185 (12) (2016) 1–10.
[25] X. Ma, H. Wang, J. Geng, Spectral-spatial classification of hyperspectral image based on deep auto-encoder, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. PP (99) (2016) 1–13.
[26] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7 (6) (2014) 2094–2107.
[27] Y. Chen, X. Zhao, X. Jia, Spectral-spatial classification of hyperspectral data based on deep belief network, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 8 (6) (2015) 2381–2392.
[28] W. Zhao, S. Du, Learning multiscale and deep representations for classifying remotely sensed imagery, ISPRS J. Photogramm. Remote Sens. 113 (2016) 155–165.
[29] W. Zhao, S. Du, Spectral-spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach, IEEE Trans. Geosci. Remote Sens. 54 (8) (2016) 4544–4554.
[30] A. Romero, C. Gatta, G. Camps-Valls, Unsupervised deep feature extraction for remote sensing image classification, IEEE Trans. Geosci. Remote Sens. 54 (3) (2016) 1349–1362.
[31] K. Makantasis, K. Karantzalos, A. Doulamis, N. Doulamis, Deep supervised learning for hyperspectral data classification through convolutional neural networks, in Proc. of IGARSS, 2015, pp. 4959–4962.
[32] D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions, ISPRS J. Photogramm. Remote Sens. 105 (2015) 272–285.
[33] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324.
[34] G.-B. Huang, Q. Y. Zhu, C. K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006) 489–501.
[35] Z. Q. Wang, G. Yu, Y. Kang, Y. J. Zhao, Q. X. Qu, Breast tumor detection in digital mammography based on extreme learning machine, Neurocomputing 128 (2014) 175–184.
[36] W. Zong, G.-B. Huang, Face recognition based on extreme learning machine, Neurocomputing 74 (2011) 2541–2551.
[37] G. Huang, G.-B. Huang, S. Song, K. You, Trends in extreme learning machines: a review, Neural Networks 61 (2015) 32–48.
[38] W. Xie, Y. Li, Y. Ma, Breast mass classification in digital mammography based on extreme learning machine, Neurocomputing 173 (2016) 930–941.
[39] M. Pal, A. E. Maxwell, T. A. Warner, Kernel-based extreme learning machine for remote sensing image classification, Remote Sens. Lett. 9 (4) (2013) 852–862.
[40] R. Moreno, F. Corona, A. Lendasse, M. Grana, L. S. Galvao, Extreme learning machines for soybean classification in remote sensing hyperspectral images, Neurocomputing 128 (27) (2014) 207–216.
[41] Y. Bazi et al., Differential evolution extreme learning machine for the classification of hyperspectral images, IEEE Geosci. Remote Sens. Lett. 11 (6) (2014) 1066–1070.
[42] A. Samat, P. Du, S. Liu, J. Li, L. Cheng, E2LMs: ensemble extreme learning machines for hyperspectral image classification, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7 (4) (2014) 1060–1069.
[43] W. Li, C. Chen, H. Su, Q. Du, Local binary patterns and extreme learning machine for hyperspectral imagery classification, IEEE Trans. Geosci. Remote Sens. 53 (7) (2015) 3681–3693.
[44] J. Li, Q. Du, W. Li, Y. Li, Optimizing extreme learning machine for hyperspectral image classification, J. Appl. Remote Sens. 9 (1) (2015).
[45] Q. Du, H. Yang, Similarity-based unsupervised band selection for hyperspectral image analysis, IEEE Geosci. Remote Sens. Lett. 5 (4) (2008) 564–568.
[46] W. Li, Q. Du, Gabor-filtering based nearest regularized subspace for hyperspectral image classification, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7 (4) (2014) 1012–1022.
[47] V. Nair, G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proc. of ICML, 2010.
[48] Z. Xiong, X. Sun, F. Wu, Image hallucination with feature enhancement, in Proc. of CVPR, 2009.
[49] G. Mountrakis, J. Im, C. Ogole, Support vector machines in remote sensing: A review, ISPRS J. Photogramm. Remote Sens. 66 (3) (2011) 247–259.
[50] J. Wright et al., Sparse representation for computer vision and pattern recognition, Proc. IEEE 98 (6) (2010) 1031–1044.


Yunsong Li received the Ph.D. degree in signal and information processing from Xidian University, China, in 2002. Prof. Li is the director of the image coding and processing center at the State Key Laboratory of Integrated Service Networks. His research interests focus on image and video processing and high-performance computing.

Weiying Xie received the M.S. degree in communication and information systems from Lanzhou University in 2014. She is currently a Ph.D. student at the State Key Laboratory of Integrated Service Networks of Xidian University. Her research interests include neural networks, image processing and high-performance computing.

Huaqing Li received the Bachelor's degree in signal and information processing from Xidian University. He is currently pursuing the Master's degree at the State Key Laboratory of Integrated Service Networks of Xidian University, working on deep learning, hyperspectral image processing and high-performance computing.

Highlights

• The feature enhancement step is introduced in the proposed HRCNN model for the first time because the HSI is contaminated with noise.

• A new band selection method based on spatial features is proposed in this training model, which aims at enhancing spatial features while retaining the spectral information of the HSI.

• A new exploration combining CNN with ELM for HSI classification under a small number of training samples achieves good generalization and low computational cost for the first time.
