Hyperspectral image classification using k-sparse denoising autoencoder and spectral–restricted spatial characteristics


Applied Soft Computing Journal 74 (2019) 693–708


Hyperspectral image classification using k-sparse denoising autoencoder and spectral–restricted spatial characteristics

Rushi Lan, Zeya Li, Zhenbing Liu∗, Tianlong Gu, Xiaonan Luo — Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China

Highlights

• Propose k-sparse denoising autoencoders (KDAE) to extract the features of hyperspectral images.
• Develop spectral–restricted spatial characteristics for hyperspectral images.
• A framework, integrating k-sparse denoising autoencoders and spectral–restricted spatial characteristics, is proposed for hyperspectral image classification.

Article info

Article history: Received 31 January 2018; Received in revised form 20 June 2018; Accepted 7 August 2018; Available online 7 November 2018.

Keywords: Hyperspectral image classification; Restricted spatial information; k-sparse denoising autoencoders; Softmax

Abstract

Hyperspectral images (HSIs) have both spectral and spatial characteristics that possess considerable information. This paper proposes a novel k-sparse denoising autoencoder (KDAE) with a softmax classifier for HSI classification. Based on the stack-type autoencoder, KDAE adopts k-sparsity and random noise, employs the dropout method at the hidden layers, and finally classifies HSIs through the softmax classifier. Moreover, an operation referred to as restricted spatial information (RSI) is conducted to obtain the spatial information of the HSI. The proposed method extracts features by KDAE combined with RSI, effectively taking the spectral and spatial information into account. The obtained features are finally fed into the softmax classifier for classification. The experimental results obtained on two benchmark hyperspectral datasets indicate that the proposed method achieves state-of-the-art performance compared with other competing methods. © 2018 Elsevier B.V. All rights reserved.

1. Introduction Hyperspectral image (HSI) classification is a hot topic in the remote sensing and machine learning communities. With the rapid development of sensor technologies, hyperspectral remote sensing is able to capture spatially and spectrally continuous data simultaneously, and the spectral and spatial resolutions of remote sensing images have both been significantly improved, thereby substantially enriching the content of remote sensing images. Undetectable features in wide-band remote sensing images can now be detected. However, HSI classification is still challenging due to the complicated characteristics of HSI data. Many methods have been proposed from different perspectives for HSI classification, such as Gaussian processes [1], manifold learning [2], spectral matching [3], and linear regression [4]. However, these traditional methods still fail to provide satisfactory accuracy. In particular, support vector machine (SVM) [5,6] has proven to be powerful for solving the problem of the Hughes phenomenon and has improved the effectiveness of classification. ∗ Corresponding author. E-mail address: [email protected] (Z. Liu). https://doi.org/10.1016/j.asoc.2018.08.049 1568-4946/© 2018 Elsevier B.V. All rights reserved.

Since HSI classification faces the problems of a high-dimensional feature space and a limited amount of labeled data, the majority of the aforementioned algorithms suffer greatly from the ‘‘curse of dimensionality’’. According to the classification criteria, the characteristics that can be applied are spectral characteristics, spatial characteristics, and spectral–spatial characteristics. Previous studies mainly focused on the spectral characteristics of hyperspectral data, which contain considerable redundancy; however, spatial information, when added to the feature extraction process, enriches the sample information and thus improves the classification performance. It is therefore necessary to propose a new spatial characteristic and a selection criterion for spectral–spatial classification. Conventional image classification approaches treat the entire image as the classification sample, whereas the purpose of remote sensing image classification is to classify the acquired spectral data so as to obtain an overall picture of the geomorphology. Remote sensing image classification is therefore based on either pixel classification or object classification. Pixel-based classification suffers from a long-standing difficulty, the ‘‘salt-and-pepper’’ phenomenon, which is caused by the large interclass differences arising in the complex imaging


Fig. 1. BP neural network; x1 , x2 , . . ., xN1 are input variables (vector), and h1 , h2 , . . ., hN2 are hidden layer nodes. The BP neural network obtains the difference error (∆1 , ∆2 , . . ., ∆N3 ) between the desired output (y1 , y2 , . . ., yN3 ) and the actual output (o1 , o2 , . . ., oN3 ), and the entire network uses the BP principle to distribute the error to all layers of the hierarchy to correct the weight of each unit.

process of hyperspectral remote sensing images. Object-based classification offers a new idea to solve this problem: image objects composed of multiple adjacent pixels carrying more semantic information are used as the basic unit for classification. Traditional spectral–spatial feature extraction methods include the quadtree decomposition and multidimensional Wiener filter combination model, the 3D wavelet coefficient model, the linear sparse logistic regression model, and so forth. Recently, deep learning has remarkably improved the state of the art in speech recognition, visual object recognition, object detection and many other domains [7]. Deep learning methods for acquiring spectral–spatial information mainly include the stacked autoencoder, denoising autoencoder, k-sparse autoencoder, and deep belief networks (DBNs). Common HSI classification methods include approaches based on maximum likelihood estimation (MLE), Gaussian mixture models (GMMs), SVMs, and Markov random field post-processing. Ye et al. [8] developed a dictionary-learning-based feature-level domain adaptation for cross-scene HSI classification. Ma et al. applied a stacked autoencoder to extract spectral and spatial features, which were then classified through collaborative representation [9]. Similarly, an unsupervised multimanifold and contractive autoencoder was proposed for hyperspectral remote image classification [10]. In [11], stacked sparse autoencoders were used to extract spectral and spatial features, an SVM was used to classify the extracted features, and a transfer learning strategy was finally applied to transfer the trained model to other datasets.
Segmented stacked autoencoders were used to address this problem by dividing the spectral data into different bands; the features of the segmented data were extracted by stacked autoencoders, and the extracted features were then stitched into a 1-D eigenvector and fed into a stacked autoencoder [12]. A framework based on deep learning was proposed that combines spectral and spatial information, and the features extracted through stacked autoencoders and DBNs can achieve good results [13–15]. In [16], k-sparse autoencoders were proposed based on autoencoders. Pan et al. combined a rolling guidance filter and a vertex component analysis network to propose R-VCANet, which provides more representative features with limited training samples [17]. In [18], by reviewing the classification methods used in evolutionary computation and neural networks, the authors discussed the core potential of computational intelligence for optical remote sensing image processing. Medjahed et al. proposed a new optimization-based framework to reduce the dimensionality of HSIs [19]. In [20], Ohno et al. presented the connection between

LGAE and partial least squares from the perspective of information theory for denoising autoencoders. Autoencoders attempt to learn a straightforward encoder and the corresponding codes for samples by minimizing the error between the sample data and their reconstruction, which maps the codes back into the input data [21,22]. Sparsity in autoencoders is achieved by computing the relative entropy between the average activation level of the hidden layer nodes and the target activation level and adding it to the loss function as a sparse constraint [23,24]. A neuron is considered ‘‘active’’ (or ‘‘firing’’) if its output value is close to 1 and ‘‘inactive’’ if its output value is close to 0; we would like to constrain the neurons to be inactive most of the time (this discussion assumes a sigmoid activation function). This sparse approach can only guarantee the sparsity of the activation values of the hidden layer as a whole: the activation values of the hidden units may all be small and may jointly encode the input, but a sparse expression for each individual input is not guaranteed. The k-sparse autoencoder, proposed by Makhzani et al., is an autoencoder with a linear activation function in which only the k highest activities in the hidden layer are retained while the remaining nodes are set to zero, thus ensuring a sparse expression for each input; it can enforce exact and arbitrary sparsity levels in the hidden layers. Inspired by the aforementioned algorithms, in this work, we add noise processing to propose a k-sparse denoising autoencoder (KDAE) based on k-sparse autoencoders, and the dropout method is further applied to the hidden layer. Then, we combine KDAE with a softmax classifier and call the result KDAES.
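The k-sparse selection described above (keep the k highest activities, zero the rest) can be sketched as follows. This is an illustrative NumPy version, not the authors' code; the activation matrix `h` and the value of `k` are hypothetical:

```python
import numpy as np

def k_sparse(h, k):
    """Keep only the k largest activations per sample; zero the rest.

    h: (n_samples, n_hidden) hidden-layer activations.
    Returns the sparsified activations and the kept indices (the support set).
    """
    h = np.asarray(h, dtype=float)
    # Indices of the k largest activations in each row.
    support = np.argpartition(h, -k, axis=1)[:, -k:]
    sparse = np.zeros_like(h)
    rows = np.arange(h.shape[0])[:, None]
    sparse[rows, support] = h[rows, support]
    return sparse, support

h = np.array([[0.1, 0.9, 0.3, 0.7],
              [0.5, 0.2, 0.8, 0.4]])
sparse, _ = k_sparse(h, k=2)
# Each row retains exactly its 2 largest activations; the others become zero.
```

Unlike the KL-divergence penalty, this operation guarantees that every individual input receives a representation with exactly k nonzero hidden units.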
Moreover, this paper proposes a novel spatial feature extraction method called restricted spatial information (RSI) and combines it with the spectral information to obtain good classification accuracy. Experiments have been conducted to demonstrate the effectiveness of the proposed method from different perspectives. In summary, the main contributions of this work are twofold: 1. We propose a novel KDAE for HSI classification. KDAE improves the classification effect in both the training phase and the test phase. Based on the stack-type autoencoder, the algorithm adopts k-sparsity and random noise, and it employs the dropout method at the hidden layer. Dropout breaks up the co-adaptation of neurons, which prevents overfitting and improves the classification accuracy. 2. A feature named spectral–restricted spatial information is developed to comprehensively represent the characteristics of HSIs. The comparison results show that the classification performance of different classifiers can be improved using spectral–restricted spatial information. The remainder of this paper is organized as follows: Section 2 briefly introduces some related works. Sections 3 and 4 present the proposed KDAE and RSI, respectively. Section 5 provides several evaluation results to demonstrate the effectiveness of the proposed method. Section 6 finally presents the conclusion.

2. Related works This section briefly reviews some related works, including backpropagation (BP) neural networks, stacked autoencoder, denoising autoencoder, and sparse autoencoder.


Fig. 2. Framework of k-sparse denoising autoencoder classification.

2.1. BP neural network

The BP neural network is one of the most widely used algorithms in artificial neural networks. The autoencoder algorithm used in this paper relates to the principle of the BP neural network (see Fig. 1). Suppose that there are m fixed training samples. For a single sample (x, y), we define the loss function as follows:

J(W, b; x, y) = \frac{1}{2}\|o - y\|^2    (1)

where W and b represent the weights and bias values, respectively, and o is the actual output. For a batch of m samples, the overall loss function can be defined as:

J(W, b) = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{2}\left(f(z_{W,b}(x_k)) - y_k\right)^2    (2)

The gradient descent method is described in Eqs. (3)–(4), where β is the learning rate used to control the speed of the parameter change, W_{ij}^{(l)} represents the weight parameter between the ith and the jth neurons in the lth layer, and b_i^{(l)} indicates the bias value of the ith neuron in the lth layer.

W_{ij}^{(l)} = W_{ij}^{(l)} - \beta \frac{\partial J(W, b)}{\partial W_{ij}^{(l)}}    (3)

b_i^{(l)} = b_i^{(l)} - \beta \frac{\partial J(W, b)}{\partial b_i^{(l)}}    (4)

To obtain the derivative terms in the gradient descent method, the ‘‘residual’’ concept is introduced: \delta_i^{(l)} = \partial J(W, b; x, y) / \partial z_i^{(l)}, where z_i^{(l)} represents the input value of the ith neuron in the lth layer. For the output layer, the residual of the ith node is

\delta_i^{(l)} = -(y_i - f(z_i^{(l)})) f'(z_i^{(l)})    (5)

The relationship between the residuals of two adjacent layers can be obtained as in Eq. (6), where s_{l+1} is the number of neurons in layer l+1:

\delta_i^{(l)} = \left(\sum_{j=1}^{s_{l+1}} W_{ji}^{(l)} \delta_j^{(l+1)}\right) f'(z_i^{(l)})    (6)

Finally, we can find the partial derivatives required for gradient descent, as shown in Eqs. (7)–(8), where a_j^{(l)} is the jth activation value of the lth layer:

\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(l)}} = a_j^{(l)} \delta_i^{(l+1)}    (7)

\frac{\partial J(W, b; x, y)}{\partial b_i^{(l)}} = \delta_i^{(l+1)}    (8)

2.2. Stacked autoencoder

The autoencoder algorithm is the basic structure of the stacked autoencoder [25]. The autoencoder network is an unsupervised learning model: similar to the data reduction of principal component analysis (PCA), the data are compressed through the hidden layer and then reconstructed by the output reconstruction layer. By stacking multiple hidden layers and applying greedy layer-wise training, more abstract and stable characteristics can be extracted. After each layer is trained, its output reconstruction layer is removed, and the hidden layer output is used as the input for training the next layer; these input–hidden layers are then connected to form a stacked autoencoder [26].

2.3. Denoising autoencoder and sparse autoencoder

A denoising autoencoder adds noise to the input data; in this paper, such noise is realized by randomly setting a certain proportion of the inputs to zero. A model trained in this way has stronger robustness. Adding noise results in better classification than using clean inputs, which can be understood from the following two aspects [27]: 1. By using corrupted data to train the model, the model is forced to extract more stable characteristics for data classification. 2. Randomly zeroed data are equivalent to damaged data; this alleviates the difference between the training set and the test set to some extent, whereas using raw data for training often makes the model prone to overfitting, making it difficult to obtain strong generalization capabilities. The sparsity of the sparse autoencoder is realized through sparsity restrictions. Taking a single neuron as the object of study, the neuron is regarded as activated when its output is close to 1 and as inhibited when its output is close to 0; the sparsity restriction keeps the neurons inhibited most of the time. We use aj to represent the activation degree of neuron j, and ρ is used


to represent the sparsity parameter. Eq. (9) uses \hat{\rho}_j to represent the average activation degree of neuron j, where x_i represents the ith input sample, m represents the total number of input samples, and a_j(x_i) is the activation degree of neuron j on sample x_i:

\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x_i)    (9)

Then, the KL divergence (relative entropy) is used to keep \hat{\rho}_j close to ρ as the sparsity constraint. The relative entropy calculation method is shown in Eq. (10):

KL(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}    (10)

For the stacked autoencoder, the sparseness of each hidden layer is controlled individually: the relative entropies of all neurons in that hidden layer are summed and added to the loss function J(W, b). The total cost function can be expressed by Eq. (11), where S_1 is the number of hidden layer neurons and β is the weight controlling the relative entropy term:

J_{sparse}(W, b) = J(W, b) + \beta \sum_{j=1}^{S_1} KL(\rho \,\|\, \hat{\rho}_j)    (11)
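The KL-divergence sparsity penalty of Eqs. (9)–(11) can be sketched numerically as follows. This is an illustrative NumPy version, not the authors' code; the activation matrix and the values of `rho` and `beta` are assumptions for demonstration:

```python
import numpy as np

def sparsity_penalty(A, rho=0.05, beta=3.0):
    """KL-divergence sparsity penalty for one hidden layer.

    A: (m, S1) matrix of hidden activations a_j(x_i) for m samples.
    rho: target average activation; beta: penalty weight.
    Returns the term beta * sum_j KL(rho || rho_hat_j) added to J(W, b).
    """
    rho_hat = A.mean(axis=0)                                   # Eq. (9)
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))     # Eq. (10)
    return beta * kl.sum()                                     # penalty of Eq. (11)

rng = np.random.default_rng(0)
A = rng.uniform(0.01, 0.99, size=(100, 8))   # toy sigmoid activations
penalty = sparsity_penalty(A)
# The penalty vanishes only when every unit's mean activation equals rho.
```

Note that this penalty constrains only the average activation of each unit, which is exactly the weakness the k-sparse selection addresses.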

Fig. 3. The hidden layer of the sparse autoencoder cannot effectively implement the sparse situation.

3. The proposed k-sparse denoising autoencoder

As shown in Algorithm 1, in this paper, based on the denoising autoencoder, we use the k-sparse method to realize sparsity. By preserving the k maximum activation values at the hidden layer, we zero out the hidden layer neurons beyond the k maximum activation values so that each input achieves a sparse representation. Let Γ represent the set composed of the k maximum activation values, and let Γ^C represent its complementary set. A sparse mismatch between the training phase and the test phase can result in a slight improvement in the classification effect; in other words, in the test phase, we take the αk largest activation values of the hidden layer nodes, where α ≥ 1.

Algorithm 1 k-sparse denoising autoencoders

Training:
(1) Input the dataset and add noise: randomly set input layer nodes to zero with a uniform probability distribution.
(2) Perform the feedforward phase and compute h = f(W1 x + b1).
(3) Employ the dropout method to randomly set hidden layer nodes to zero with a uniform probability distribution.
(4) Find the k largest activations of h and set the remainder to zero: h_{Γ^C} = 0, where Γ = supp_k(h).
(5) Compute the output and the error using the sparsified h.
(6) Backpropagate the error through the k largest activations defined by Γ and iterate.
Test:
Add noise and compute the features h = f(W1 x + b1). Find its αk largest activations and set the remainder to zero: h_{Γ^C} = 0, where Γ = supp_{αk}(h).

After numerous debugging runs, we found the most suitable parameter values for the experiments. The random noise and dropout methods both use a uniform probability distribution between 0 and 1, and the random zero-setting probabilities are set to 0.1 and 0.5, respectively. The value of k in our proposed method is also based on experimental experience, and the value of k affects the classification performance. For the datasets in this paper, we apply the dropout [28] method in the hidden layers on the training dataset, which improves the classification accuracy to a certain extent. The idea of dropout is to train the overall DNN and average the entire set of results rather than train a single DNN: neurons are discarded with probability p, the other neurons are preserved with probability q = 1 − p, and the output of the discarded neurons is set to zero. Dropout works well in practice because it prevents neuron co-adaptation during the training phase. Our proposed k-sparse denoising autoencoder introduces noise at the input layer, random zeroing at the hidden layer, and k-sparse processing, which effectively alleviates the problem that the test samples and training samples do not follow the same distribution, which would otherwise result in a poor classification effect. We have constructed a neural network of k-sparse denoising autoencoders; the proposed method effectively mitigates the overfitting problem of existing methods and is a mature feature extraction method. This network has produced good results in the experiments.
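The training steps of Algorithm 1 can be sketched as a single forward pass. This is an illustrative NumPy version under assumed shapes and a sigmoid activation (the paper's noise ratio 0.1 and dropout ratio 0.5 are used as defaults; `W1`, `b1`, and the toy data are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kdae_forward(x, W1, b1, k, noise_ratio=0.1, dropout_ratio=0.5, rng=None):
    """One training-phase forward pass of the k-sparse denoising autoencoder.

    Follows Algorithm 1: (1) corrupt the input by randomly zeroing entries,
    (2) compute h = f(W1 x + b1), (3) apply dropout to the hidden layer,
    (4) keep the k largest activations and zero the rest.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x_noisy = x * (rng.random(x.shape) >= noise_ratio)      # step (1)
    h = sigmoid(x_noisy @ W1 + b1)                          # step (2)
    h = h * (rng.random(h.shape) >= dropout_ratio)          # step (3)
    support = np.argsort(h, axis=1)[:, -k:]                 # step (4): Γ
    h_sparse = np.zeros_like(h)
    rows = np.arange(h.shape[0])[:, None]
    h_sparse[rows, support] = h[rows, support]              # h restricted to Γ
    return h_sparse

rng = np.random.default_rng(1)
x = rng.random((4, 20))                    # 4 toy samples, 20 input dims
W1 = 0.1 * rng.standard_normal((20, 10))   # hypothetical encoder weights
b1 = np.zeros(10)
h = kdae_forward(x, W1, b1, k=3)
```

At test time, step (4) would use αk instead of k (with α ≥ 1), and the error in steps (5)–(6) would be backpropagated only through the support set Γ.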

A general flowchart of HSI classification is shown in Fig. 2, which consists of three parts: (1) feature representation, (2) feature extraction, and (3) classification. Feature extraction draws on both spatial and spectral features and uses KDAE to train on the data. In the original data, the r–c plane reflects the spatial information of the HSI, and the λ dimension reflects the spectral information of the multiple channels. The spectrum of the pixel p to be classified is used unprocessed, and the extracted neighbor region is taken as the spatial feature after PCA. The neighborhood space is viewed from the angle of the r–c plane; from the perspective of spectral information, only n channels are retained. Next, the spectral information of the point to be classified and the spatial information of the neighbor space are stacked to form a 1-D eigenvector. Finally, the eigenvectors are fed into the KDAE for feature learning, where the softmax classifier is used as the output layer activation, and the output layer size equals the total number of classes. The softmax method, a generalization of the logistic regression method, is used for multiclass classification after the KDAE. The specific steps are to determine the hypothesis function, set the loss function within the deep learning framework, and finally use BP backpropagation to continuously reduce the loss function, by which the regression parameters


Fig. 4. Spectral characteristics of unmarked pixels in the Indian Pines and Salinas datasets.

Fig. 5. General view of the hyperspectral remote sensing images: (a) Indian Pines data and (b) Salinas data.

are obtained. After many experiments, we found that using the softmax method for multiclass classification is superior to other methods, such as SVMs and extreme learning machines. We combine KDAE with a softmax classifier and call the result KDAES. As shown in Fig. 2, a pixel region with a window size of w × w around the pixel to be classified is defined as the neighbor region to be extracted. Due to the hundreds of channels along the spectral dimension, the data at this initial layer always have tens of thousands of dimensions, containing a large amount of redundant information. The dimensionality of the spatial information data can affect the feature extraction and classification performance. Thus, the framework applies PCA before extracting the neighbor-region spatial information. When the neighbor region of an unclassified pixel cannot be completely extracted, the mirror symmetry approach is used [11].
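The spatial-feature step described above (PCA on the spectral dimension, then a w × w neighbor window with mirror symmetry at the borders) can be sketched as follows. This is an illustrative NumPy version on a toy cube, not the authors' MATLAB code; the cube size is hypothetical, while n = 5 components and the 5 × 5 window follow the experimental settings reported later:

```python
import numpy as np

def spatial_features(cube, n_components=5, w=5):
    """Extract w x w neighbor-region features from the first n PCA bands.

    cube: (rows, cols, bands) hyperspectral cube.
    Returns an array of shape (rows * cols, w * w * n_components).
    """
    r, c, b = cube.shape
    X = cube.reshape(-1, b).astype(float)
    # PCA via eigen-decomposition of the band covariance matrix.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    pcs = eigvecs[:, ::-1][:, :n_components]          # top components
    reduced = (Xc @ pcs).reshape(r, c, n_components)
    # Mirror padding so border pixels also get a full w x w neighborhood.
    pad = w // 2
    padded = np.pad(reduced, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    feats = np.empty((r * c, w * w * n_components))
    for i in range(r):
        for j in range(c):
            feats[i * c + j] = padded[i:i + w, j:j + w, :].ravel()
    return feats

cube = np.random.default_rng(0).random((10, 12, 30))  # toy HSI cube
F = spatial_features(cube)
```

Each row of `F` would then be stacked with the corresponding raw spectrum to form the 1-D eigenvector fed into the KDAE.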

Sparse implementation in sparse autoencoders can only guarantee the sparsity of activation for the hidden layer as a whole, as shown in Fig. 3. Most of the hidden layer units may have small activation values (between 0 and 1) and may jointly encode the input, but a sparse representation for each individual input is not guaranteed. The k-sparse approach is more straightforward and effective: it preserves a sparse representation of each input by zeroing the hidden neurons outside the k maximal activation values. As shown in Fig. 4, the spectra of the unlabeled pixels are very different. The neighbor-region spatial information also varies from pixel to pixel, as shown in Fig. 5; for example, for pixels p1 and p2, the spatial information of the areas adjacent to base point p2 differs greatly. In this paper, spectral and spatial information are combined. There will be great differences in spatial information for labeled pixels of the same class to be classified if one relies solely on


Fig. 6. (a) Influence of the depths and (b) influence of the training set ratio. (Spectral–restricted spatial classification of HSI using KDAES.)

the neighbor region spatial information method to obtain spatial information. Fig. 5 shows a general view of the hyperspectral remote sensing images: (a) Indian Pines data and (b) Salinas data. Regarding the aforementioned problem, this paper proposes a new strategy, RSI, to obtain spatial information, given in Algorithm 2. This strategy eliminates the great uncertainty of spatial information and reduces the intraclass differences among unclassified pixels. Pixels share the same spatial information, namely the neighbor region of a base point, which limits the variation of the spatial information to a certain range. As shown in Fig. 5, if p1 is the base point whose neighbor region all class-six pixels take as their spatial information, then the spatial information will be identical or similar within the same class. In this way, the intraclass differences of the spectral–spatial stacked vector can be reduced, and improved classification accuracy can therefore be achieved. The selection of the base points should follow two principles. First, the number of base points should be greater than the total number of classes in the dataset. Second, the spatial distribution of the base points should be uniform. Note that the first principle allows the variety of spatial information to match the total number of classes; because the distributions of labeled and unlabeled points in remote sensing images, and the distributions and numbers of samples of different classes, are often very uneven, a number of base points equal to several multiples of the total number of categories is best. The second principle makes the limited spatial information more reasonable in the spatial sense.
Algorithm 2 Restricted spatial information extraction algorithm
Input: Spatial information of the HSI.
Output: Restricted spatial information of the HSI.
(1) begin
(2) Select a number of base points: take one row in every i rows and one column in every j columns of the image, and take the intersection points of these rows and columns as the base points.
(3) For each sample (pixel) to be classified, calculate the score value S = m · r / (d + ε) for each base point.
(4) Compute the scores of all base points.
(5) Assign the neighborhood spatial information of the highest-scoring base point to the pixel to be classified.
(6) end
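Step (3) of Algorithm 2 can be sketched as follows. This is an illustrative version under assumed inputs (the spectra, coordinates, and labeled-pixel count are hypothetical); in the full method m, r, and d would additionally be normalized to [0, 1] over all base points:

```python
import numpy as np

def rsi_score(pixel_spec, pixel_xy, base_spec, base_xy,
              n_labeled, w=5, eps=0.01):
    """Score S = m * r / (d + eps) of one base point for one pixel.

    m: cosine (spectral) similarity between pixel and base point, Eq. (12).
    r: fraction of labeled pixels in the base point's w x w window, Eq. (13).
    d: Euclidean distance between the two image coordinates, Eq. (14).
    """
    m = (pixel_spec @ base_spec) / (
        np.linalg.norm(pixel_spec) * np.linalg.norm(base_spec))
    r = n_labeled / (w * w)
    d = np.hypot(pixel_xy[0] - base_xy[0], pixel_xy[1] - base_xy[1])
    return m * r / (d + eps)

spec = np.array([0.2, 0.5, 0.3])
s_near = rsi_score(spec, (10, 10), 2.0 * spec, (11, 10), n_labeled=20)
s_far = rsi_score(spec, (10, 10), 2.0 * spec, (40, 40), n_labeled=20)
# A spectrally identical base point scores higher when it is closer.
```

The pixel is then assigned the neighbor-region spatial information of the base point with the highest S.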

As shown in Algorithm 2, each pixel to be classified is matched to the most suitable base point according to S, which is calculated from m, r, and d. As shown in Eqs. (12)–(14), the variable m represents the spectral similarity between the pixel p to be classified and the base point b; the neighbor-region spatial information of pixels of the same type should be similar, so the base point b is chosen according to the value of m. The variable r represents the proportion of labeled pixels among all pixels in the neighbor region of the base point, where a represents the number of labeled points and w represents the side length of the neighbor region. For pixels of the same class, not all of them will choose the same base point and share the same spatial information; the variable r minimizes the interference caused by unlabeled pixels. The variable d represents the Euclidean distance between pixel p and base point b on the remote sensing image, where x and y indicate pixel coordinates. Normally, pixels of the same type are spatially clustered; thus, the spatial distance d must be taken into account when choosing the base point. ε is a small positive constant introduced to prevent the denominator from being zero. m, r, and d are normalized to the interval [0, 1].

m = \frac{p \cdot b}{\|p\|\,\|b\|}    (12)

r = \frac{a}{w \times w}    (13)

d = \sqrt{(x_p - x_b)^2 + (y_p - y_b)^2}    (14)

Compared with the existing spectral–neighbor spatial information feature extraction method, combining the spectral information of the pixel to be classified with the restricted spatial information of the adjacent space largely eliminates the uncertainty of the spatial information, so the intraclass differences and spatial information discrepancies among the samples to be classified are small. Combined with the KDAES classification framework, which effectively integrates the existing denoising autoencoder and sparse autoencoder with the sparsity treatment of the k-sparse concept, the overall classification performance is significantly improved. 5. Experiments In this section, several experiments are conducted to evaluate the proposed method. We first introduce the HSI datasets and


Fig. 7. According to the classification of spectral feature information, the classification effect of each method was compared, and the training set proportion was set to 0.1, 0.2, . . ., 0.5. (a) Comparison results of the overall classification accuracy for each method. (b) Comparison results of the average classification accuracy for each method. (c) Comparison results of the Kappa coefficients for each method.

experimental design, and then we present the classification performance of KDAES. Subsequently, the influences of depth and training samples of KDAE are studied. Finally, comparisons of different features are presented (see Tables 1 and 2).

5.1. Data description and experimental design As shown in Fig. 5, the hyperspectral data used in the first experiment were recorded by the AVIRIS sensor over the Indian


Fig. 8. The results of different classification methods on the Indian Pines dataset using spectral characteristic information.

Table 1
Distribution of sample data for each category of the Indian Pines dataset.

#   Class                          Samples
1   Alfalfa                        46
2   Corn-notill                    1428
3   Corn-mintill                   830
4   Corn                           237
5   Grass-pasture                  483
6   Grass-trees                    730
7   Grass-pasture-mowed            28
8   Hay-windrowed                  478
9   Oats                           20
10  Soybean-notill                 972
11  Soybean-mintill                2455
12  Soybean-clean                  593
13  Wheat                          205
14  Woods                          1265
15  Buildings-Grass-Trees-Drives   386
16  Stone-steel-Towers             93

Table 2
Distribution of sample data for each category of the Salinas dataset.

#   Class                          Samples
1   Broccoli_green_weeds_1         2009
2   Broccoli_green_weeds_2         3726
3   Fallow                         1976
4   Fallow_rough_plow              1394
5   Fallow_smooth                  2678
6   Stubble                        3959
7   Celery                         3579
8   Grapes_untrained               11,271
9   Soil_vineyard_develop          6203
10  Corn_senesced_green_weeds      3278
11  Lettuce_romaine_4wk            1068
12  Lettuce_romaine_5wk            1927
13  Lettuce_romaine_6wk            916
14  Lettuce_romaine_7wk            1070
15  Vineyard_untrained             7268
16  Vineyard_vertical_trellis      1807

Pines test site in Northwestern Indiana. The image has a spatial dimension of 145 × 145 pixels (10,249 pixels labeled) with 224

spectral reflectance bands, spectral coverage in the 400–2500 nm range, and a spatial resolution of 20 m per pixel. After processing the raw data, 20 water absorption bands and 4 severely noisy bands were discarded, leaving 200 spectral bands for the experiment. As shown in Fig. 5, the second dataset was also collected by AVIRIS, in the Salinas Valley, California, with a spatial resolution of 3.7 m. The image has a spatial dimension of 512 × 217 pixels (54,129 pixels labeled) with 224 spectral bands; 20 water absorption bands were discarded, leaving 204 spectral bands for the experiments. Both datasets have 16 different classes. For a better comparison, refer to the electronic versions of all the figures in these experiments. We use MATLAB 2014b on a computer with an Intel Core i7-4790 3.6 GHz CPU and 16 GB of memory. The codes are implemented using the deep learning toolbox of Rasmus Berg Palm, a MATLAB library for deep learning methods; the SVMs are implementations from LIBSVM; and the PCA and locality-preserving projection methods are implemented from the toolbox of Laurens van der Maaten, a MATLAB library for dimensionality reduction. For evaluating the classification accuracy, the labeled samples are randomly divided into a training set and a test set with a ratio of 1:1 according to [15]. Overall accuracy (OA) is the number of correctly classified samples divided by the number of test samples. Average accuracy (AA) is the average of the per-class classification accuracies. The Kappa coefficient is a robust measurement of the degree of agreement. The specific formulas are shown in Eqs. (15)–(17), where T is the total number of categories, CM_{uu} is the diagonal entry of the confusion matrix giving the number of class-u samples classified as class u, and n is the total number of classification samples.
Here p_A is the observed agreement between the two "observers" (in remote sensing image classification, the classification result and the ground truth), which equals the overall classification accuracy; p_e is the agreement expected by chance: for each class, the total number of ground-truth samples of that class is multiplied by the total number of samples classified as that class, and these products are summed over all classes and normalized.

OA = \frac{\sum_{u=1}^{T} CM_{uu}}{n} \qquad (15)


Fig. 9. The results of different classification methods on the Salinas dataset using spectral characteristic information.

AA = \frac{1}{T} \sum_{u=1}^{T} \frac{CM_{uu}}{\sum_{v=1}^{T} CM_{uv}} \qquad (16)

Kappa = \frac{p_A - p_e}{1 - p_e} \qquad (17)
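As a concrete check of Eqs. (15)–(17), all three metrics can be computed directly from a confusion matrix. The sketch below assumes rows index the true class and columns the predicted class (the paper does not fix this orientation) and uses NumPy rather than the MATLAB toolchain of the experiments.

```python
import numpy as np

def classification_metrics(cm):
    """Compute OA, AA, and Kappa from a T x T confusion matrix.

    cm[u, v] counts samples of true class u classified as class v
    (an assumed orientation; the paper does not specify it).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    # Eq. (15): overall accuracy = correctly classified / total.
    oa = np.trace(cm) / n
    # Eq. (16): average of the per-class accuracies.
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))
    # Eq. (17): Kappa with p_A the observed agreement (= OA) and
    # p_e the chance agreement from the row/column totals.
    p_a = oa
    p_e = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2
    kappa = (p_a - p_e) / (1 - p_e)
    return oa, aa, kappa
```

For a two-class confusion matrix [[8, 2], [1, 9]], this yields OA = AA = 0.85 and Kappa = 0.7, matching a hand computation of the three formulas.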

Both the testing and training samples are randomly drawn from the ground truth without overlap. Each experiment is repeated 20 times, and the average accuracy with its standard deviation is reported. For both datasets, the KDAE has three hidden layers, each with 100 hidden units. Noise is added at the input layer (10% of the inputs are randomly set to zero), the dropout method is employed in each hidden layer (dropout ratio 0.5), and the k-sparse method is used (the k values vary with the dataset, and the α value is 2, following [16]). With regard to the spatial information, after tuning, the number of PCA components is set to 5, which achieves the best performance; the window size for extracting the neighbor-region spatial information and the RSI is set to 5 × 5; and i and j in the RSI method are set

to 10 for both datasets. The ϵ value is taken as 0.01 for the Indian Pines dataset and 0.001 for the Salinas dataset. Regarding the comparison experiments, SDAE is used as one comparative method. The other comparative methods are based on SVM [29]; the optimal gamma (-g) and cost (-c) are determined by a grid search. The SVM-based approaches include the linear-kernel SVM, denoted LSVM; the polynomial-kernel SVM, denoted PSVM; and SVM combined with the locality-preserving projection method, which is widely used in HSI classification, denoted LPP-LSVM and LPP-PSVM [30–32].
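The per-layer operations configured above — masking noise at the input and k-sparsity in the hidden layers — can be sketched in a few lines. This is an illustrative NumPy rendering of the two operations, not the paper's MATLAB implementation; `ksparse` follows the top-k support selection of [16].

```python
import numpy as np

def ksparse(h, k):
    """Support selection of the k-sparse autoencoder [16]:
    keep the k largest activations in each row of h, zero the rest."""
    idx = np.argsort(h, axis=1)[:, -k:]        # indices of the top-k units
    mask = np.zeros_like(h)
    np.put_along_axis(mask, idx, 1.0, axis=1)  # 1 on the kept support
    return h * mask

def corrupt(x, rate=0.1, rng=None):
    """Masking noise of the denoising autoencoder: randomly set a
    fraction `rate` of the inputs to zero (0.1 in the setup above)."""
    rng = np.random.default_rng() if rng is None else rng
    return x * (rng.random(x.shape) >= rate)
```

During training, a hidden representation would then be computed roughly as `ksparse(activation(corrupt(x) @ W + b), k)`, with dropout applied on top of the hidden units; at test time neither corruption nor dropout is applied.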

5.2. Classification of HSI using KDAES

HSI classification is the classification of labeled pixels; unlabeled pixels do not directly participate as samples in the classification, but they can be exploited when extracting neighbor-region spatial information. For each


Fig. 10. Comparison of the classification performance of each method using spectral and spatial feature information, with the training set proportion varied from 0.1 to 0.5. (a) Comparison results of the overall classification accuracy for each method. (b) Comparison results of the average classification accuracy for each method. (c) Comparison results of the Kappa coefficients for each method.


Fig. 11. Results of different classification methods on the Indian Pines dataset using spectral and spatial feature information, where (a) is the ground truth result, and (b)–(g) correspond to the results of LPP-LSVM, LPP-PSVM, LSVM, PSVM, SDAE, and KDAES, respectively.

pixel to be classified, we describe the pixel vector with two kinds of features: spectral–spatial information and spectral–restricted spatial information. Tables 3 and 4 present the classification accuracies for the two experimental datasets. The accuracies are given as mean and standard deviation, the latter serving as a measure of volatility. As shown in Table 3, compared with the approaches based on SVM and SDAE, KDAES leads to better performance, particularly when combined with spectral–restricted spatial feature extraction. The proposed method attains the highest OA and Kappa, i.e., the highest percentage of correctly classified test pixels, with improvements of 2.08%, 0.31% and 2.36% over the DBN in OA, AA and Kappa, respectively. Table 4 shows the experimental results for the Salinas dataset. KDAES again provides the better result and outperforms the SVM-based approaches on average in terms of OA, AA and Kappa. Moreover, combining KDAES with spectral–restricted spatial feature extraction leads to improvements of 2.29%, 0.86% and 2.51% over combining KDAES with spectral–spatial feature extraction. Clearly, the combination of the proposed KDAES and spectral–restricted spatial feature extraction achieves the best classification accuracy, which demonstrates that the RSI contributes to improving the classification results. An analysis of the running times shows that the SVM-based classifiers take less time than the other methods over the entire classification process, while the time of the proposed method is comparable to that of SDAE. As shown in Tables 3 and 4, when spectral–spatial features are used, the k-sparse denoising autoencoder method is superior to the other methods to some extent.
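The spectral–spatial description used here stacks each pixel's spectrum with its flattened neighborhood window taken from the leading PCA components (a 5 × 5 window over 5 components in the setup above). A minimal sketch, with edge padding at the image borders as an assumption the paper does not specify:

```python
import numpy as np

def spectral_spatial_vector(pca_cube, spectra, row, col, w=5):
    """Stack the spectrum of a pixel with its flattened w x w
    neighborhood taken from the first PCA components of the cube.

    pca_cube: (H, W, C) leading PCA components of the HSI.
    spectra:  (H, W, B) original spectral bands.
    Border pixels are handled by edge padding (an assumption).
    """
    r = w // 2
    padded = np.pad(pca_cube, ((r, r), (r, r), (0, 0)), mode="edge")
    patch = padded[row:row + w, col:col + w, :]   # w x w x C window
    return np.concatenate([spectra[row, col], patch.ravel()])
```

For the Indian Pines setup (200 bands, 5 components, 5 × 5 window) this yields a 200 + 5·5·5 = 325-dimensional vector per pixel, which is then fed to the classifier.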
Regarding why the accuracy improves: SDAE enforces sparsity by pushing the activation values of the hidden-layer nodes toward a small target value, so when the activations of most hidden nodes are already small, no additional sparsity is obtained. K-sparsity is designed to remedy exactly this situation; since the situation does not always occur, the gain from k-sparsity is not dramatic. At the same time, the Indian Pines and Salinas datasets suffer from class imbalance, which lowers the average accuracy (AA); as the tables show, the SVM-based methods perform better in this respect. Tables 3 and 4 also show that combining the restricted spatial information proposed in this paper with the spectral information greatly improves the classification. This indicates that using raw neighborhood information as the spatial feature, without any processing, wastes much of the potential of the hyperspectral data, and that the spectral differences of unlabeled points within a wide neighborhood do interfere with the classification.

5.3. Influences of the depths and the training sample size

Depth plays an important role in HSI classification. We train our method with different depths but with fixed numbers of principal components and hidden units to investigate how the depth affects the classification accuracy. The results are shown in Fig. 6(a): the best classification accuracy is obtained with three hidden layers. We then examine how the number of training samples affects the results of the proposed method. All parameters are fixed except the number of training samples per class. Classification OA under various training sizes is plotted in Fig. 6(b). OA increases monotonically with the training-set ratio, and the Indian Pines dataset is more sensitive to this ratio than the Salinas dataset. This illustrates that deep learning methods require a large training set to show their advantages.

5.4. Evaluation of different feature representations of HSI

As shown in Fig. 7, using spectral information alone, we conduct experiments on the two datasets to verify the performance of the proposed method. The results show that the proposed k-sparse denoising autoencoder improves the classification


Fig. 12. The results of different classification methods on the Salinas dataset using spectral and spatial feature information, where (a) is the ground truth result, and (b)–(g) correspond to the results of LPP-LSVM, LPP-PSVM, LSVM, PSVM, SDAE, and KDAES, respectively.

Table 3
Experimental results on the Indian Pines dataset, with the training set ratio set to 50%; each entry reports the mean ± standard deviation.

Method     Features                       OA(%)             AA(%)             Kappa
LPP-LSVM   Spectral–Spatial               92.64 ± 0.3457    88.22 ± 1.3140    0.9160 ± 0.0040
LPP-LSVM   Spectral–Restricted Spatial    96.45 ± 0.1926    94.05 ± 1.1159    0.9594 ± 0.0022
LPP-PSVM   Spectral–Spatial               92.35 ± 0.4462    88.85 ± 1.5416    0.9127 ± 0.0051
LPP-PSVM   Spectral–Restricted Spatial    96.98 ± 0.2447    94.05 ± 1.4842    0.9655 ± 0.0028
LSVM       Spectral–Spatial               86.11 ± 0.4525    89.28 ± 0.8772    0.8414 ± 0.0051
LSVM       Spectral–Restricted Spatial    97.73 ± 0.2097    96.24 ± 1.2907    0.9741 ± 0.0024
PSVM       Spectral–Spatial               92.23 ± 0.3608    93.05 ± 1.0847    0.9113 ± 0.0041
PSVM       Spectral–Restricted Spatial    97.75 ± 0.1721    96.29 ± 1.1447    0.9744 ± 0.0020
SDAE       Spectral–Spatial               92.61 ± 0.1771    91.38 ± 0.1938    0.9156 ± 0.0011
SDAE       Spectral–Restricted Spatial    97.11 ± 0.0058    94.97 ± 0.0794    0.9707 ± 0.0005
DBN        Spectral–Spatial               95.95 ± 0.1872    95.45 ± 0.1745    0.9539 ± 0.0014
KDAES      Spectral–Spatial               93.68 ± 0.2616    94.53 ± 0.0742    0.9279 ± 0.0014
KDAES      Spectral–Restricted Spatial    98.03 ± 0.1235    95.76 ± 0.3544    0.9775 ± 0.0014

performance and is superior to the SVM-based method [33]. Early studies of hyperspectral remote sensing images focused mainly on spectral features. Because of the large variations in hyperspectral remote sensing data, traditional classification methods do not perform well, and this article therefore adopts a deep learning approach to the remote sensing image classification problem. The experimental results show that when the training set is small, the classification performance is


Fig. 13. Comparison of the classification performance of each method using spectral–restricted spatial features, with the training set proportion varied from 0.1 to 0.5. (a) Comparison results of the overall classification accuracy for each method. (b) Comparison results of the average classification accuracy for each method. (c) Comparison results of the Kappa coefficients for each method.

still not ideal; once the training set is large enough to meet the demands that deep learning methods place on the amount of data, a better classification result is achieved. As Figs. 7–9 show, the OA, AA and Kappa of all classification methods increase steadily as the proportion of the training set grows, again indicating that deep learning methods require a large training set to show their advantages. When the


Fig. 14. The results of different classification methods on the Indian Pines dataset using the spectral–restricted spatial feature, where (a) is the ground truth result, and (b)–(g) correspond to the results of LPP-LSVM, LPP-PSVM, LSVM, PSVM, SDAE, and KDAES, respectively.

Table 4
Experimental results on the Salinas dataset, with the training set ratio set to 50%; each entry reports the mean ± standard deviation.

Method     Features                       OA(%)             AA(%)             Kappa
LPP-LSVM   Spectral–Spatial               91.58 ± 0.2180    95.23 ± 0.1689    0.9061 ± 0.0024
LPP-LSVM   Spectral–Restricted Spatial    98.99 ± 0.0903    97.95 ± 0.1551    0.9888 ± 0.0010
LPP-PSVM   Spectral–Spatial               92.55 ± 0.1916    95.98 ± 0.1596    0.9171 ± 0.0021
LPP-PSVM   Spectral–Restricted Spatial    99.25 ± 0.0847    98.42 ± 0.1670    0.9916 ± 0.0009
LSVM       Spectral–Spatial               94.01 ± 0.1962    97.54 ± 0.1216    0.9333 ± 0.0022
LSVM       Spectral–Restricted Spatial    99.78 ± 0.0187    99.69 ± 0.0390    0.9977 ± 0.0002
PSVM       Spectral–Spatial               96.37 ± 0.1255    98.52 ± 0.0623    0.9596 ± 0.0014
PSVM       Spectral–Restricted Spatial    99.82 ± 0.0276    99.71 ± 0.0477    0.9980 ± 0.0003
SDAE       Spectral–Spatial               97.32 ± 0.1970    98.72 ± 0.0506    0.9702 ± 0.0022
SDAE       Spectral–Restricted Spatial    99.81 ± 0.0058    99.65 ± 0.0008    0.9979 ± 0.0006
KDAES      Spectral–Spatial               97.56 ± 0.3212    98.79 ± 0.0369    0.9729 ± 0.0014
KDAES      Spectral–Restricted Spatial    99.85 ± 0.0235    99.65 ± 0.0029    0.9980 ± 0.0001

training set is 50%, the k-sparse denoising autoencoder method is superior to the other classification methods to some extent in terms of all evaluation indices. In the classification maps drawn at a 50% training ratio, the differences among the methods are not particularly evident, but a serious "salt-and-pepper" effect is visible. As shown in Fig. 10, this part of the experiments combines the spectral information with the spatial feature information of the neighborhood: the neighborhood window is stretched into a one-dimensional vector and stacked with the spectral features, which enlarges the amount of information available for each sample and thus improves the classification [13,34,35]. The datasets adopted in this paper are farmland scenes, so the ground objects in the remote sensing images are mainly crops, and each category is distributed over the image in blocks. Geographically, crops exhibit a certain continuity: if the pixels surrounding a point are corn, that point is most likely corn as well. From this perspective, taking the neighborhood spatial information into account in classification has practical significance. When the spatial feature information is considered,

as shown in Figs. 11 and 12, the classification maps drawn at a 50% training ratio improve markedly: the proposed method achieves the best classification performance and the clearest classification map. This confirms that directly using the neighboring pixels of the point to be classified is an effective way to enlarge the information content of the sample. In addition, the high spectral resolution of HSIs makes the spectrum of each pixel vector highly redundant, so fully exploiting the spatial information is a direct and effective approach to hyperspectral remote sensing image classification. As shown in Fig. 13, this part of the experiments uses the spectral and restricted spatial information presented in this paper. KDAES combined with spectral–restricted spatial information obtains the best overall classification accuracy on both datasets. Although the graphs show that KDAES achieves the best results, its margin over the other methods is small; from another perspective, this is because the restricted spatial information substantially improves the classification of all methods. In this paper, the two defects of classification with neighborhood spatial information are remedied, and a


Fig. 15. The results of different classification methods on the Salinas dataset using the spectral–restricted spatial feature, where (a) is the ground truth result, and (b)–(g) correspond to the results of LPP-LSVM, LPP-PSVM, LSVM, PSVM, SDAE, and KDAES, respectively.

new method for combining the feature information is proposed. The experimental results show that this way of combining spatial information improves the classification results of all methods. Figs. 14 and 15 show the classification maps drawn at a 50% training ratio. Based on the spectral–restricted spatial features, all classification methods perform well, and the method proposed in this article has clear advantages: except for the AA indicator, where it performs similarly to the SVM-based classifiers, it achieves the best results on the other two indicators. Moreover, in the resulting classification maps, the salt-and-pepper noise is clearly reduced, and the map obtained by the overall framework proposed in this paper is the clearest.

6. Conclusion

In this paper, we combined the k-sparse autoencoder with a denoising autoencoder to propose KDAE, which can be used to extract

the characteristics of HSIs. Based on the spectral–spatial feature extraction framework, we also developed the spectral–restricted spatial information, which effectively handles the large variations in the spatial information of the regions neighboring a pixel. Experiments on two benchmark HSI databases have confirmed the effectiveness of KDAE, and the results indicate that the spectral–restricted spatial characteristics are useful for classification. The performance of the traditional SVMs and the sparse denoising autoencoder, when combined with the developed features, has also been improved.

Acknowledgments

The authors would like to sincerely thank the reviewers for their comments and suggestions, which significantly improved the quality of this paper. Rushi Lan and Zeya Li contributed equally to this work. This work was supported in part by the National Natural Science Foundation of China (Nos. 61562013, 61702129, 61772149, and 61320106008), the Natural Science Foundation of Guangxi Province (CN) (No. 2017GXNFDA198025), and the Guangxi Key Laboratory of Trusted Software (CN) (No. kx201730).


References

[1] Y. Bazi, F. Melgani, Classification of hyperspectral remote sensing images using Gaussian processes, in: IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2008), 2008, pp. II-1013–II-1016.
[2] H. Huang, Classification of hyperspectral remote-sensing images based on sparse manifold learning, J. Appl. Remote Sens. 7 (1) (2013) 073464.
[3] S. Shanmugam, P. Srinivasaperumal, Spectral matching approaches in hyperspectral image processing, Int. J. Remote Sens. 35 (24) (2014) 8217–8251.
[4] H. Yuan, Y.T. Yuan, Spectral–spatial shared linear regression for hyperspectral image classification, IEEE Trans. Cybern. 47 (4) (2016) 934–945.
[5] N. Peng, W. Wei, Classification of hyperspectral remote sensing images with dynamic support vector machine ensemble, J. Comput. Appl. 30 (6) (2010) 1590–1593.
[6] M. Khodadadzadeh, H. Ghassemian, Contextual classification of hyperspectral remote sensing images using SVM-PLR, Aust. J. Basic Appl. Sci. 5 (8) (2011) 374–382.
[7] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436.
[8] M. Ye, Y. Qian, J. Zhou, Y.Y. Tang, Dictionary learning-based feature-level domain adaptation for cross-scene hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 55 (3) (2017) 1544–1562.
[9] X. Ma, H. Wang, J. Geng, Spectral–spatial classification of hyperspectral image based on deep auto-encoder, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9 (9) (2016) 4073–4085.
[10] A. Hassanzadeh, A. Kaarna, T. Kauranne, Unsupervised multi-manifold classification of hyperspectral remote sensing images with contractive autoencoder, 2017.
[11] C. Tao, H. Pan, Y. Li, Z. Zou, Unsupervised spectral–spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification, IEEE Geosci. Remote Sens. Lett. 12 (12) (2015) 2438–2442.
[12] J. Zabalza, J. Ren, J. Zheng, H. Zhao, C. Qing, Z. Yang, P. Du, S. Marshall, Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging, Neurocomputing 214 (C) (2016) 1062.
[13] Z. Lin, Y. Chen, X. Zhao, G. Wang, Spectral–spatial classification of hyperspectral image using autoencoders, 2015, pp. 1–5.
[14] Y. Chen, Z. Lin, X. Zhao, G. Wang, Y. Gu, Deep learning-based classification of hyperspectral data, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7 (6) (2017) 2094–2107.
[15] Y. Chen, X. Zhao, X. Jia, Spectral–spatial classification of hyperspectral data based on deep belief network, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8 (6) (2015) 2381–2392.
[16] A. Makhzani, B. Frey, k-sparse autoencoders, Comput. Sci. (2013).
[17] B. Pan, Z. Shi, X. Xu, R-VCANet: A new deep-learning-based hyperspectral image classification method, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 10 (5) (2017) 1975–1986.
[18] Y. Zhong, A. Ma, Y.S. Ong, Z. Zhu, L. Zhang, Computational intelligence in optical remote sensing image processing, Appl. Soft Comput. 64 (2018) 75–93.
[19] S.A. Medjahed, T.A. Saadi, A. Benyettou, M. Ouali, Gray wolf optimizer for hyperspectral band selection, Appl. Soft Comput. 40 (C) (2016) 178–186.
[20] H. Ohno, Linear guided autoencoder: Representation learning with linearity, Appl. Soft Comput. 55 (2017) 566–575.
[21] H. Lu, B. Li, J. Zhu, Y. Li, Y. Li, X. Xu, L. He, X. Li, J. Li, S. Serikawa, Wound intensity correction and segmentation with convolutional neural networks, Concurrency Comput. Pract. Exp. 29 (6) (2017).
[22] H. Lu, Y. Li, M. Chen, H. Kim, S. Serikawa, Brain intelligence: Go beyond artificial intelligence, Mobile Netw. Appl. (2017) 1–8.
[23] Y. Chen, N.M. Nasrabadi, T.D. Tran, Hyperspectral image classification using dictionary-based sparse representation, IEEE Trans. Geosci. Remote Sens. 49 (10) (2011) 3973–3985.
[24] W. Li, Q. Du, F. Zhang, W. Hu, Hyperspectral image classification by fusing collaborative and sparse representations, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9 (9) (2017) 4178–4187.
[25] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (12) (2010) 3371–3408.
[26] C. Xing, L. Ma, X. Yang, Stacked denoise autoencoder based feature extraction and classification for hyperspectral images, J. Sens. 2016 (2016) 1–10.
[27] P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: International Conference on Machine Learning, 2008, pp. 1096–1103.
[28] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[29] F. Melgani, L. Bruzzone, Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens. 42 (8) (2004) 1778–1790.
[30] G. Camps-Valls, D. Tuia, L. Bruzzone, J.A. Benediktsson, Advances in hyperspectral image classification, IEEE Signal Process. Mag. 31 (1) (2014) 45–54.
[31] R. Archibald, G. Fann, Feature selection and classification of hyperspectral images with support vector machines, IEEE Geosci. Remote Sens. Lett. 4 (4) (2007) 674–677.
[32] Y. Tarabalka, M. Fauvel, J. Chanussot, J.A. Benediktsson, SVM- and MRF-based method for accurate classification of hyperspectral images, IEEE Geosci. Remote Sens. Lett. 7 (4) (2010) 736–740.
[33] W. Zhao, H. Lu, D. Wang, Multisensor image fusion and enhancement in spectral total variation domain, IEEE Trans. Multimed. 20 (4) (2018) 866–879.
[34] C. Chen, W. Li, E.W. Tramel, M. Cui, S. Prasad, J.E. Fowler, Spectral–spatial preprocessing using multihypothesis prediction for noise-robust hyperspectral image classification, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7 (4) (2014) 1047–1059.
[35] Z. Chen, B. Wang, Semisupervised spectral–spatial classification of hyperspectral imagery with affinity scoring, IEEE Geosci. Remote Sens. Lett. 12 (8) (2015) 1710–1714.