Robust multitask learning with three-dimensional empirical mode decomposition-based features for hyperspectral classification

ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27 Contents lists available at ScienceDirect ISPRS Journal of Photogrammetry and Re...

Download PDF

6MB Sizes 0 Downloads 40 Views

Report

Full Text

ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Contents lists available at ScienceDirect

ISPRS Journal of Photogrammetry and Remote Sensing journal homepage: www.elsevier.com/locate/isprsjprs

Robust multitask learning with three-dimensional empirical mode decomposition-based features for hyperspectral classification Zhi He a,⇑, Lin Liu a,b a School of Geography and Planning, Guangdong Provincial Key Laboratory of Urbanization and Geo-simulation, Center of Integrated Geographic Information Analysis, Sun Yat-sen University (SYSU), Guangzhou 510275, China b Department of Geography, University of Cincinnati (UC), Cincinnati 45221, USA

a r t i c l e

i n f o

Article history: Received 14 May 2016 Received in revised form 15 August 2016 Accepted 24 August 2016

Keywords: Hyperspectral image (HSI) Classification Three-dimensional empirical mode decomposition (3D-EMD) Multitask learning Sparse Low-rank

a b s t r a c t Empirical mode decomposition (EMD) and its variants have recently been applied for hyperspectral image (HSI) classification due to their ability to extract useful features from the original HSI. However, it remains a challenging task to effectively exploit the spectral-spatial information by the traditional vector or image-based methods. In this paper, a three-dimensional (3D) extension of EMD (3D-EMD) is proposed to naturally treat the HSI as a cube and decompose the HSI into varying oscillations (i.e. 3D intrinsic mode functions (3D-IMFs)). To achieve fast 3D-EMD implementation, 3D Delaunay triangulation (3D-DT) is utilized to determine the distances of extrema, while separable filters are adopted to generate the envelopes. Taking the extracted 3D-IMFs as features of different tasks, robust multitask learning (RMTL) is further proposed for HSI classification. In RMTL, pairs of low-rank and sparse structures are formulated by trace-norm and l1;2 -norm to capture task relatedness and specificity, respectively. Moreover, the optimization problems of RMTL can be efficiently solved by the inexact augmented Lagrangian method (IALM). Compared with several state-of-the-art feature extraction and classification methods, the experimental results conducted on three benchmark data sets demonstrate the superiority of the proposed methods. Ó 2016 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

1. Introduction Hyperspectral imaging sensors acquire the radiance of materials in hundreds of contiguous bands from the visible to the infrared spectrum, providing high spectral resolution for each pixel to distinguish various materials. Due to the rich information captured by sensors, hyperspectral image (HSI) has opened up new opportunities and challenges in many remote sensing applications, such as classification (Jia et al., 2015; Tan et al., 2015), unmixing (BioucasDias et al., 2012), data fusion (Wang and Glennie, 2015), target detection (Dong et al., 2015), and so on. Supervised classification, which aims at labeling each pixel by one of the land-cover classes based on training samples given for different classes, is an important application that has attracted extensive research efforts over the past few years. The framework of HSI classification contains two main aspects: feature acquisition and classifier construction. To acquire discriminative features for HSI classification, many feature selection/extraction methods have been developed in the ⇑ Corresponding author. E-mail address: [email protected] (Z. He).

last decades. Feature selection aims to select a subset of the original features which make the classes to be distinguished accurately. Some representative feature selection methods involve the suboptimal search strategy (Serpico and Bruzzone, 2001), clustering (Martínez-Usó et al., 2007; Yuan et al., 2016), genetic algorithm (Ghamisi and Benediktsson, 2015), partial least squares regression (Li et al., 2015; Neumann et al., 2016) and game theory (Gurram et al., 2016). One of the focus of this paper is feature extraction, which transforms the original HSI into new features and preserves the vast majority information. Widespread feature extraction methods include the principle component analysis (PCA) (Zabalza et al., 2014), gray level co-occurrence matrix (GLCM) (Huang et al., 2014), wavelet transform (WT) (Bruce et al., 2002; Hsu, 2007), one-dimensional singular spectrum analysis (1D-SSA) (Zabalza et al., 2014), one-dimensional empirical mode decomposition (1D-EMD) (He et al., 2014) and manifold learning-based methods (Huang et al., 2015; Ma et al., 2016). Besides spectral information, the spatial relationship of neighboring pixels is also crucial to yield promising results. In this regard, spatial information is extracted by extended morphological profiles (EMPs) (Fauvel et al., 2013; Gu et al., 2016), extended multiattribute

http://dx.doi.org/10.1016/j.isprsjprs.2016.08.007 0924-2716/Ó 2016 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

12

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

profiles (EMAPs) (Song et al., 2014), image segmentation (He et al., 2016; Li et al., 2015), two-dimensional EMD (2D-EMD) (Demir and Ertürk, 2010; Gormus et al., 2012; Ertürk et al., 2013; He et al., 2013) and two-dimensional SSA (2D-SSA) (Zabalza et al., 2015). The aforementioned feature extraction methods only deal with the HSI by vector or image-based strategies. However, a HSI data set is naturally modeled as three-dimensional (3D) cube which contains a spectral dimension and two spatial dimensions. As such, much work has been carried out in the literature to treat the HSI as volumetric data and detect the spectral-spatial features simultaneously. For instance, 3D discrete WT (3D-DWT) is adopted by Qian et al. (2013) to capture geometrical and statistical spectralspatial structures of the HSI cube. The traditional GLCM is extended into 3D scenario (Tsai and Lai, 2013) to extract discriminant texture features. Moreover, some researchers (Zhang et al., 2013; Guo et al., 2013; Zhong et al., 2015; Veganzones et al., 2016) extract spectral-spatial features by considering the HSI as a whole tensor rather than a vector or matrix. Designing suitable classifiers also plays a vital role to yield significant classification performance. Two types of state-of-the-art classifiers are support vector machine (SVM) (Vapnik, 1995; Cavallaro et al., 2015; Dalponte et al., 2015) and sparse representation-based classification (SRC) (Chen et al., 2011, 2013; Wu et al., 2015). The former is based on structural risk minimization, while the latter is motivated by the rapid development of compressed sensing. Other powerful methods, such as ensemble learning (Samat et al., 2014), active learning (Crawford et al., 2013) and deep learning (Zhao and Du, 2016) also announce impressive results for HSI classification. Recently, many techniques, which include some variations of SVM or SRC-based methods, have been proposed to incorporate both spectral and spatial characteristics of the samples. For instance, composite kernels (Camps-Valls et al., 2006) and multiple kernel learning (He and Li, 2015; Wang et al., 2016) are proposed to balance the spectral and spatial content by optimizing the linear combination of various kernels. Markov random filed (MRF) regularization is adopted by Tarabalka et al. (2010) to take advantage of the spatial contextual information to refine the classification results. Support tensor machines (STM) (Guo et al., 2016) is developed to tackle the classification problem of HSI with tensor-based data structure. Joint sparsity model (JSM) is applied in (Chen et al., 2011; Fu et al., 2016) to incorporate the spatial correlation between neighboring pixels into a joint SRC. Multitask learning (MTL) (He et al., 2014; Li et al., 2015; Yuan et al., 2015; Jia et al., 2016) is proposed to deal with multiple features of the HSI simultaneously by treating each type of feature as a task. Moreover, low-rank representation (LRR) (He et al., 2016; Xu et al., 2015; de Morsier et al., 2016) has also gained much popularity since the high spatial similarity of HSI implies the low-rank characteristic and LRR provides a robust tool for capturing the correlation of data belonging to several subspaces. Versatile as the MTL and LRR are, to the best of our knowledge, existing MTL rarely exploits the low-rank structure of HSI to improve the classification performance. In this paper, we extend the traditional 1D/2D-EMD into 3D (i.e. 3D-EMD) by naturally treating the HSI as a cube. The HSI can be decomposed into varying oscillations named 3D intrinsic mode functions (3D-IMFs), each of which is a 3D feature of the original HSI. In general, the computational efforts and memory requirements will exponentially increase with the addition of dimension. Therefore, two strategies are adopted to accelerate the 3D-EMD: (1) 3D Delaunay triangulation (3D-DT) (Golias and Dutton, 1997) is used to determine the distances of extrema. Rather than all of the extrema, only part of the extrema are involved in calculating the filter size. (2) Separable filters are adopted to generate the envelopes. That means, instead of performing a complicated 3D filter, we separately execute a 1D filter three times to achieve the

same results as 3D filter, thus reducing the computational requirements. Subsequently, robust MTL (RMTL) is proposed to classify all of the 3D-IMFs simultaneously by taking each IMF as a task. The RMTL captures the task relationships by low-rank structure, while the specificities can also be identified in the RMTL by sparse structure. The pairs of low-rank and sparse structures are realized by trace-norm and l12 -norm, respectively. Inexact augmented Lagrangian method (IALM) (Lin et al., 2009) is adopted to effectively solve the optimization problem in the RMTL. Based upon the above analysis, the framework of the proposed HSI classification method is outlined in Fig. 1. To sum up, the main innovative contributions of this work lie in the following two aspects: (1) We present the first attempt to develop a fast 3D-EMD, which treats the HSI as a data cube and decomposes the HSI into several 3D-IMFs. Instead of extending the 1D/2DEMD directly into 3D, two strategies (i.e. 3D-DT and separable filters) are adopted to facilitate implementation of the proposed 3D-EMD in filter size determination and envelope generation. Moreover, the 3D nature of 3D-IMFs help to better represent the spectral-spatial features of the samples in HSI. As such, compared to the vector or image-based methods, the proposed 3D-EMD method facilitates the preservation of spectral-spatial information. (2) We take each 3D-IMF as a task and simultaneously classify the 3D-IMFs by the RMTL, which integrates multiple features with both low-rank and sparse regularizations. It is of great importance to combine multiple features due to their potential advantages in characterizing the HSI over a single feature. Note that the high spatial similarity of the HSI implies the low-rank property and the sparse matrix identifies task specificities, the proposed low-rank and sparse basedRMTL can capture both shared factors and specificities of the tasks. The layout of this paper is as follows. Section 2 describes the proposed 3D-EMD method. Section 3 presents the RMTL for HSI classification. Experimental results on three benchmark HSI data sets are illustrated in Section 4. Finally, conclusions are drawn in Section 5.

2. 3D extension of EMD for analyzing HSI EMD, which is first proposed by Huang et al. (1998) has attracted a great deal of attention in various applications (Li et al., 2013; Song et al., 2014) due to its ability to extract local characteristics of the non-linear and/or non-stationary data. Contrary to most of the signal processing methods (e.g. Fourier transform and WT), the EMD can adaptively decompose the non-linear and/ or non-stationary data as sum of zero-mean amplitude modulation and frequency modulation (AM-FM) components termed as IMFs. As stated in Rilling et al. (2003), Deléchelle et al. (2005), Patel et al. (2016), and Ren et al. (2016), the sifting process of EMD is realized by detecting the local maxima and minima, generating the upper and lower envelops of the extrema, subtracting the mean envelopes to isolate the high-frequency oscillatory components and repeating the above-mentioned procedures recursively on the rest of data. Both 1D-EMD and 2D-EMD have been receiving considerable attention in the signal/image processing literature. As mentioned earlier, the 1D-EMD/2D-EMD methods have been successfully applied in hyperspectral classification (He et al., 2014; Demir and Ertürk, 2010; Gormus et al., 2012; Ertürk et al., 2013; He et al., 2013). Note that a HSI data is naturally formed as 3D cube, it would be promising to develop the 3D extension of

13

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

X1 y1

Normalized residues

0.4

3D-IMF 1

0.3

X2

Input: HSI

0.2

RMTL

y2

0.1 0

3D-EMD

3D-IMF 2

XK

1 2 3 4 5 6 7 8 9 10111213141516

Class

Output: Class label

yK

Test Sample Training Samples 3D-IMF K

Fig. 1. Schematic illustration of the proposed HSI classification method.

EMD (Riffi et al., 2015) for HSI analysis. However, the 3D-EMD presented by Riffi et al. (2015) is time-consuming1 and thus requires a much faster implementation to facilitate its application. In this section, a fast 3D-EMD is proposed to extract spectral-spatial features of HSI.

2.4 Generate the upper envelope Emax ðm; n; bÞ of the maxima and lower envelope Emin ðm; n; bÞ of the minima by MAX and MIN filters, respectively. The mean envelope Mðm; n; bÞ is subsequently obtained by

Mðm; n; bÞ ¼ 2.1. 3D-EMD scheme The proposed 3D-EMD is a sifting process that iteratively decomposes a volumetric data into a small number of 3D-IMFs and a final residue. Let Vðm; n; bÞ represent the HSI data cube with m; n; b 2 Nþ denote the data sizes, Fig. 2 depicts the procedure of the 3D-EMD, whose main steps can be summarized as follows. 1. Initialization: i ¼ 1 and Ri ðm; n; bÞ ¼ Vðm; n; bÞ; 2. Compute the envelopes: 2.1 j ¼ 1 and hi;j1 ðm; n; bÞ ¼ Ri ðm; n; bÞ; 2.2 Recognize the maps of local maxima and minima (denoted as M max ðm; n; bÞ and M min ðm; n; bÞ, respectively) by browsing hi;j1 ðm; n; bÞ ¼ Ri ðm; n; bÞ with a 3 3 3 neighboring window.2 A local maximum (or minimum) is a particular point that is strictly larger (or smaller) than all of its neighbors; 2.3 Determine the filter size (i.e. MAX/MIN filters and average filter), which can be used to construct the upper/lower envelopes of the extrema and the smoothed mean envelopes in steps 2.4 and 2.5, respectively. As will be shown in Section 2.2, 3D-DT is adopted to calculate the distances of local maxima (or minima), and the filter size can then be effectively determined by only part of the extrema; 1 As stated in Riffi et al. (2015), the execution time of 3D-EMD for decomposing a 256 256 26 brain volume is more than 1 h, while more than 4 h are required for a brain volume with size 256 256 35. 2 The reasons for using a 3 3 3 neighboring window in the proposed 3D-EMD are twofold. First, it is consistent with the window size used in Riffi et al. (2015). As shown in step 2 of the Riffi et al. (2015), a cube sized 3 3 3 is adopted to generate the maps of the maxima and minima. Second, it is much faster and more flexible to identify the local maxima and minima with a small neighboring window than a larger one. For instance, only 26 adjacent points should be compared with the center point in the 3 3 3 neighboring window, while 124 adjacent points are taken into consideration with a 5 5 5 neighboring window.

Emax ðm; n; bÞ þ Emin ðm; n; bÞ 2

ð1Þ

2.5 Compute the smoothed mean envelope M s ðm; n; bÞ by the average filter. It is notable that the aforementioned MAX/ MIN filters and average filter are separable filters, which help to accelerate the 3D-EMD (see Section 2.3); 2.6 Update the jth intermediate variable hi;j ðm; n; bÞ produced in extracting the ith 3D-IMF by subtracting the smoothed mean envelope M s ðm; n; bÞ

hi;j ðm; n; bÞ ¼ hi;j1 ðm; n; bÞ M s ðm; n; bÞ

ð2Þ

3. Decide whether hi;j ðm; n; bÞ is a 3D-IMF. The stop criterion utilized in this paper is the standard deviation (SD) between two successive iterations

2 Pm Pn Pb x¼1 y¼1 z¼1 hi;j ðx; y; zÞ hi;j1 ðx; y; zÞ S Dij ¼ 2 Pm Pn Pb x¼1 y¼1 z¼1 hi;j1 ðx; y; zÞ

ð3Þ

where ðx; y; zÞ refers to the coordinate of the 3D cube, SDij denotes the stop criterion generated in the jth iteration when extracting the ith 3D-IMF. In case SDij is low (e.g. lower than 0.05), hi;j ðm; n; bÞ is taken as the ith 3D-IMF, and let C i ðm; n; bÞ ¼ hi;j ðm; n; bÞ. Otherwise, set j ¼ j þ 1 and go back to step 2.2; 4. Update the residue by Riþ1 ðm; n; bÞ ¼ Ri ðm; n; bÞ C i ðm; n; bÞ; 5. Check whether Riþ1 ðm; n; bÞ contains less than three extrema. If so, take Riþ1 ðm; n; bÞ as the final residue and complete the whole decomposition. Otherwise, set i ¼ i þ 1 and go back to step 2.1. According to the above-mentioned steps 1–5, the original data Vðm; n; bÞ can be reconstructed from a number of 3D-IMFs and the residue

Vðm; n; bÞ ¼

Iþ1 X C i ðm; n; bÞ i¼1

ð4Þ

14

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Fig. 2. Flowchart of the proposed 3D-EMD algorithm. Steps in the blue dashed box demonstrate the main differences between our 3D-EMD algorithm and the one proposed by Riffi et al. (2015). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

where C i ðm; n; bÞ represents the ith (i ¼ 1; 2; . . . ; I) 3D-IMF, and the final residue RIþ1 ðm; n; bÞ is expressed as C Iþ1 ðm; n; bÞ for simplicity. Moreover, it is noteworthy that a common problem of the data interpolation in EMD is that the maxima and minima maps often do not have any data points at the boundary. As a consequence, the EMD based approaches which are associated with incorrect interpolation at the boundary often have significant problems with ‘‘end effects”. To avoid interpolation, the upper and lower envelopes in the proposed 3D-EMD method are generated by MAX and MIN filters (see Step 2.4), respectively. Therefore, the 3DEMD is inherently free of end effects and does not require extra end effects mitigation process. 2.2. Filter size determination In the 3D-EMD method proposed by Riffi et al. (2015), it is necessary to calculate the Euclidean distances among all maxima and minima to determine the distance vectors dadjmax and dadjmin , where dadjmax (or dadjmin ) is generated by sorting the distance array of the nearest Euclidean distances among each maximum (or minimum) and the other maxima (or minima) in descending order. Specifically, Fig. 3 depicts an example about how to obtain the dadjmax with 6 local maxima. For each maximum, we find the nearest Euclidean distances among the particular maximum and the other maxima. As shown in Fig. 3, 6 nearest Euclidean distances (i.e. dmax;1 ; dmax;2 ; . . . ; dmax;6 ) are found and the dadjmax can be generated by sorting dmax;1 to dmax;6 in descending order. The dadjmin can also be similarly calculated with the minima map. However, we observe that only the nearest Euclidean distances are involved in calculating the distance vectors. That means, a small number of

the maxima (or minima) near the particular maximum (or minimum) are needed, while the others far away from the particular maximum (or minimum) are redundant. In this regard, we can speed up the 3D-EMD by decreasing the amount of distance calculation. In this paper, 3D-DT (Golias and Dutton, 1997) is adopted to identify the small number of maxima (or minima) that are near a particular maximum (or minimum), and thus we can effectively obtain the distance vectors dadjmax and dadjmin . As noted in Golias and Dutton (1997), 3D-DT is a triangulation of 3D scatter data such that the circumsphere associated with any tetrahedron is empty. Specifically, the maps of local maxima M max ðm; n; bÞ (or minima M min ðm; n; bÞ) can be regarded as 3D scatter data, on which 3D-DT can be constructed to recognize a number of nearest neighbors of each extremum. Note that a large number of local maxima/ minima is involved in HSI processing, the advantage of 3D-DT will become obvious. Having obtained the distance vectors dadjmax and dadjmin , the filter size d yields

d¼

d1 þ d2 þ d3 þ d4 4 odd

ð5Þ

where ½odd calculates the nearest odd integer, and d1 ; d2 ; d3 ; d4 are the existing four types of distances modeled by

d1 ¼ minfminfdadjmax g; minfdadjmin gg

ð6Þ

d2 ¼ maxfminfdadjmax g; minfdadjmin gg

ð7Þ

d3 ¼ minfmaxfdadjmax g; maxfdadjmin gg

ð8Þ

d4 ¼ maxfmaxfdadjmax g; maxfdadjmin gg

ð9Þ

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

15

Fig. 3. An example about how to obtain the dadjmax with 6 local maxima. dmax;1 ; dmax;2 ; . . . ; dmax;6 are the nearest Euclidean distances among the particular maximum and the other maxima.

2.3. Envelope generation In step 2.4 of the 3D-EMD, the upper envelope Emax ðm; n; bÞ and lower envelope Emin ðm; n; bÞ can be respectively generated by

Emax ðx; y; zÞ ¼ MAX hi;j1 ðs; t; uÞ Emin ðx; y; zÞ ¼ MIN hi;j1 ðs; t; uÞ

ð10Þ ð11Þ

where MAX{} and MIN{} refer to the MAX and MIN filter, respectively, ðx; y; zÞ denotes the coordinate of envelopes, h i h i h i ~ ~ ~ ~ ~ ~ ~ s 2 x d; x þ d ; t 2 y d; y þ d ; u 2 z d; z þ d and d ¼ d1 . 2

At this point, the average filter in Eq. (12) can be realized by sequentially performing an 1D convolution along the first, second and third dimensions. As an example, Fig. 4 gives a schematic illustration of the separability property of the MAX filter. The size of the original 3D volume is 2 4 2 and the filter size d ¼ 3. As observed from Fig. 4, the MAX filter can be achieved by successively implementing an 1D MAX filter along the first, second and third dimensions. In addition, the MIN filter and average filter, which are not shown in Fig. 4, can also be separably performed along the first to third dimensions.

Accordingly, the mean envelope is obtained by Eq. (1) and the smoothed mean envelope M s ðm; n; bÞ in step 2.5 can be given by the following average filter

3. Robust multitask learning for HSI classification

~ yþd ~ xþd zþd X X~ X 1 M s ðx; y; zÞ ¼ Mðs; t; uÞ ddd ~ ~ ~

In this section, a RMTL method is proposed to make full use of the 3D-IMFs for HSI classification. Taking each 3D-IMF as a task, and there are K ¼ I þ 1 tasks in the RMTL. Suppose the HSI contains J classes of land-covers, the training feature matrix of the kth

ð12Þ

s¼xdt¼ydu¼zd

As emphasized in Section 2.1, all of the MAX/MIN and average filters appeared in Eqs. (10)–(12) are separable filters. As to the MAX/MIN filters, it is notable from Siskos et al. (1998) that they have separability property and thus can be achieved by independently applying them along rows and columns in a 2D image. In 3D-EMD, we generalize the separability property of MAX/MIN filters to 3D scenario and separably implement them along the first, second and third dimensions in a 3D HSI cube. As to the average filter, we rewrite Eq. (12) as

M s ðx; y; zÞ ¼

~ yþd ~ xþd zþd X X~ X

~t¼yd ~ ~u¼zd s¼xd

ð13Þ

e ¼ K denotes a where ‘’ represents the convolution operator, K ddd 3D convolution kernel, and K is an all-ones 3D matrix, which can also be treated as a third-order tensor. As such, the CANDECOMP/PARAFAC (CP) decomposition of K is

K ¼abc

where X kj 2 Rmk pj ; j ¼ 1; 2; . . . ; J is associated with the jth class, mk denotes the feature dimensionality of the kth task, pj indicates P the number of training samples belong to class j, and Jj¼1 pj ¼ p is the total number of training samples. To classify an unlabeled pixel y ¼ ½y1 ; y2 ; . . . ; yK , we try to solve the following problem

min L;E

ð14Þ

where ‘’ represents the outer product, a; b and c are all-ones vectors with lengths equal to d. The average filter is separable because the rank of the Kruskal form of K shown in the right-hand side of Eq. (14) is equal to 1.

kLk þ kkEk1;2 yk ¼ X k ðLk þ Ek Þ; k ¼ 1; 2; . . . ; K

s:t:

e ðx s; y t; z uÞ Mðs; t; uÞ K

e ðx; y; zÞ ¼ MK

(k ¼ 1; 2; . . . ; K) task can be expressed as X k ¼ ½X k1 ; X k2 ; . . . ; X kJ ,

ð15Þ

where L is a low-rank matrix to identify the shared characters among multiple tasks, E denotes a row-sparse matrix to recognize the task specificities, yk ; X k ; Lk ; Ek are related to the kth task, k k refers to the trace norm (also known as nuclear norm) of a matrix, i.e., the sum of singular values of a matrix, and k k1;2 encourages the row-sparsity in E. Since the matrices Lk and Ek in the constraint equation are multiplied by the training feature matrix X k , two auxiliary matrices P and Q are added to simplify the solution procedure. As such, the problem in Eq. (15) can be reformulated as

min P;Q

s:t:

kPk þ kkQ k1;2 Lk ¼ P k ; Ek ¼ Q k ; yk ¼ X k Lk þ X k Ek ; k ¼ 1; 2; . . . ; K

ð16Þ

16

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Fig. 4. An example for the separability property of the MAX filter. The size of the 3D volume is 2 4 2 and the filter size d ¼ 3.

It is notable that IALM (Lin et al., 2009) can be applied to solve the above problem in Eq. (16). Specifically, the augmented Lagrangian function is expressed as K n X Ll ðL;E;P;Q Þ ¼ kPk þ kkQ k1;2 þ hT k1 ;yk X k Lk X k Ek i

Moreover, fix the others and update Q in the ðt þ 1Þth iteration by

Q ðtþ1Þ ¼ arg minLl ðLðtþ1Þ ; EðtÞ ; P ðtþ1Þ ; Q Þ Q

¼ arg minkkQ k1;2 Q

k¼1

þhT k2 ;Lk P k i þ hT k3 ;Ek Q k i þ

l 2

kyk X k Lk X k Ek k2F þ kLk P k k2F þ kEk Q k k2F

þ

o

¼ arg minkkQ k1;2 þ Q

l > 0 is a penalty parameter, T k1 ; T k2 ; T k3 ; k ¼ 1; 2; ; K are

Lagrange multipliers, Lk ; Ek ; P k ; Q k are the kth column of L; E; P; Q , respectively. The variables L; E; P; Q can be updated one at a time by IALM. In greater detail, fix the others and update P in the ðt þ 1Þth iteration by

E

ðtþ1Þ

kLk;ðtÞ P k k2F

F

ð21Þ

k¼1

l k Q k;ðtþ1Þ i þ ky X k Lk;ðtþ1Þ X k Ek k2F 2 o þkEk Q k;ðtþ1Þ k2F

o

2

2 K X l k 1 k;ðtÞ k;ðtÞ P L þ T ¼ arg minkPk þ P 2 k¼1 l 2 F

2 1 1 1 ðtÞ ðtÞ ¼ arg min kPk þ P L þ T 2 P l 2 l F

ð22Þ

In analogy to L, the kth column of Eðtþ1Þ can be obtained by

ð18Þ

E

k;ðtþ1Þ

¼

K X T Iþ ðX k Þ X k

!1

k¼1

T T k;ðtÞ 1 k;ðtÞ ðX k Þ ðyk X k Lk;ðtþ1Þ Þ þ Q k;ðtþ1Þ þ ððX k Þ T 1 T 3 Þ

l

Fix the others and update L in the ðt þ 1Þth iteration by

L

l

k;ðtÞ þhT 3 ; Ek

k¼1

ðtþ1Þ

k¼1

2 Q k Ek;ðtÞ þ 1 T k;ðtÞ 3

K n X k;ðtÞ ¼ arg min hT 1 ; yk X k Lk;ðtþ1Þ X k Ek i E

l

K X

L

P

k;ðtÞ

o

¼ arg minLl ðLðtþ1Þ ; E; P ðtþ1Þ ; Q ðtþ1Þ Þ

¼ arg minkPk hT 2 ; Lk;ðtÞ P k i þ

2

2

kEk Q k k2F

Finally, fix the others and update E in the ðt þ 1Þth iteration by

P

K n X

l

l

2 k 1 1 ðtÞ Q EðtÞ þ T 3 ¼ arg min kQ k1;2 þ Q l 2 l F

P ðtþ1Þ ¼ arg minLl ðLðtÞ ; EðtÞ ; P; Q ðtÞ Þ

þ

k;ðtÞ

hT 3 ; Ek;ðtÞ Q k i þ

k¼1

ð17Þ where

K n X

ð23Þ

¼ arg minLl ðL;EðtÞ ;P ðtþ1Þ ;Q ðtÞ Þ L

K n X k;ðtÞ k;ðtÞ ¼ arg min hT 1 ;yk X k Lk X k Ek;ðtÞ i þ hT 2 ;Lk P k;ðtþ1Þ i

þ

l 2

L

k¼1

kyk X k Lk X k Ek;ðtÞ k2F þ kLk P k;ðtþ1Þ k2F

Accordingly, the kth column of L

Lk;ðtþ1Þ ¼

Iþ

K X

ðtþ1Þ

o

ð19Þ

yields

!1

T

ðX k Þ X k

k¼1

T T k;ðtÞ 1 k;ðtÞ ðX k Þ ðyk X k Ek;ðtÞ Þ þ P k;ðtþ1Þ þ ððX k Þ T 1 T 2 Þ ð20Þ

l

Although the optimization problems in Eqs. (18) and (21) are nonlinear convex, fortunately, both of them have closed-form solutions. On the one hand, Eq. (18) can be solved by the singular value thresholding operator

2 1 1 ðtÞ ðtÞ kPk þ þ T P L P l 2 l 2 F

1 ðtÞ ¼ Dl1 LðtÞ þ T 2 ¼ UDl1 ðRÞV T l !

1 VT ¼ Udiag pi

P ðtþ1Þ ¼ arg min

1

l

þ

16i6~r

ð24Þ

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

where ðÞþ ¼ maxð; 0Þ; ~r and pi indicates the rank and singular val ðtÞ ues of LðtÞ þ l1 T 2 , respectively. On the other hand, the solution of Eq. (21) satisfies

Q

ðtþ1Þ

2 1 1 ðtÞ ðtÞ ¼ arg min kQ k1;2 þ Q E þ T 3 Q l 2 l F

1 ðtÞ ðtÞ ¼ S lk E þ T 3 k

l

ð25Þ

~ ¼ EðtÞ þ 1 T ðtÞ Let the ith row of optimal solution Q ðtþ1Þ and C l 3 c i respectively, and then in Eq. (25) be qi and ~

( k~c k

qi ¼

k i 2 l ~c i ; k~ci k2

if lk < k~c i k2 otherwise

0;

ð26Þ

Based upon the above analysis, the main steps for solving Eq. (15) can be outlined in Algorithm 1. After solving L and E, the class label of the test sample y is determined by the minimal total residual K X Class y ¼ arg minrj ¼ arg min kyk X kj ðLkj þ Ekj Þk22 j

j

ð27Þ

k

Algorithm 1. Solving Problem Eq. (15) by IALM Require: The test feature matrix y, the training feature matrix X k ; k ¼ 1; 2; . . . ; K, the parameters k; l and the scalar q Ensure: The optimal solution L and E 1: Initialize: L ¼ E ¼ P ¼ Q ¼ T 2 ¼ T 3 ¼ 0; T 1 ¼ 0; q ¼ 1:1,

l ¼ 106 , lmax ¼ 106 and t ¼ 0 2: while Convergence is not attained do 3: Fix the others and update P by Eq. (18) 4: Fix the others and update L by Eqs. (19) and (20) 5: Fix the others and update Q by Eq. (21) 6: Fix the others and update E by Eqs. (22) and (23) 7: Update Lagrange multipliers T 1 ; T 2 , and T 3 by k;ðtþ1Þ

T1

k;ðtþ1Þ T2 k;ðtþ1Þ T3

8:

k;ðtÞ

¼ T1 ¼ ¼

k;ðtÞ T2 k;ðtÞ T3

þ lðyk X k;ðtþ1Þ Lk;ðtþ1Þ X k;ðtþ1Þ Ek;ðtþ1Þ Þ

Indiana, USA. This is a commonly used data set in HSI classification, which contains 220 contiguous bands across the spectral range from 400 to 2500 nm. In the experiments, we reduce the number of bands to 200, by removing 20 water absorption bands. The spatial dimension is 145 145 for each band and the spatial resolution is 20 m per pixel. This data set contains 16 land-cover classes and 10366 labeled pixels (see Table 1) range unbalanced from 20 to 2468 for different classes, which poses a big challenge for the classification task. The three-band false color composite image and the ground truth map are shown in Fig. 5. The background color in Table 1 corresponds to the color in Fig. 5(b). (2) The University of Pavia data set was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS-03) optical sensor over an urban area surrounding the University of Pavia, North Italy, on July 8, 2002. The original image provides 115 spectral bands ranging from 430 to 860 nm. 12 most noisy bands are removed and 103 bands remained for experiments. The image size is 610 340 pixels with very high spatial resolution of 1.3 m per pixel. There are 9 classes of interests available and each class contains more than 900 pixels (see Table 2). However, the available training samples of each class are less than 600. The false color composite image, the ground truth map as well as the available training samples are shown in Fig. 6. Similar to the Indian Pines data set, the background color in Table 2 also agrees with the color in Fig. 6(b) and (c). (3) The Salinas data set was collected by the AVIRIS sensor over the Valley of Salinas, Southern California, USA, on October 8, 1998. The original data contains 224 spectral channels, 20 water absorption bands are removed and 204 bands are preserved for experiments. This image has 16 exclusive classes of 512 217 pixels with a spatial resolution of 3.7 m per pixel. The specific classes and the number of samples for each class are displayed in Table 3. Fig. 7 depicts the false color image together with the corresponding ground truth. Moreover, the background color of Table 3 is also in accord with that of Fig. 7(b).

þ lðLk;ðtþ1Þ P k;ðtþ1Þ Þ

4.2. Experimental setting

þ lðEk;ðtþ1Þ Q k;ðtþ1Þ Þ

The purpose of the experiments is to investigate the effectiveness of the 3D-EMD-based feature descriptor and the RMTL method for hyperspectral classification. To that end, we compare the proposed methods with other state-of-the-art methods. For the feature extraction, we take into consideration the raw spectral profiles (denoted as ‘‘Spec”), 1D-EMD (He et al., 2014), 2D-EMD (Demir and Ertürk, 2010) and 3D-DWT (Qian et al., 2013). For the classifiers, we compare RMTL against SVM (Vapnik, 1995), SVM with composite kernel (SVMCK) (Camps-Valls et al., 2006), JSM solved by the simultaneous orthogonal matching pursuit (SOMP) (Chen et al., 2011) and MTL (He et al., 2014). Notably that it is difficult even infeasible to identify the label of samples in practice, we use very few labeled samples in the experiments. That means, 10 samples for each class in the Indian Pines data and Salinas data are randomly selected as training samples, while the rest are taken as test ones. In the University of Pavia data, only 10 of the available training samples from each class are randomly chosen for training. All of the experiments are implemented on the above-mentioned three benchmark hyperspectral images. Different methods are compared numerically by overall accuracy (OA) and average accuracy (AA) and statistically by kappa coefficient (j). Our experiments are performed by MATLAB on a computer with Intel(R) Xeon(R) CPU (3.3 GHz), 8 GB RAM and Windows 7 operating system. The experiments reported in this section are the average

Update the parameter

l by

l ¼ minðlmax ; qlÞ 9:

17

Update t by

t ¼tþ1 10: end while

4. Experiments and analysis 4.1. Hyperspectral data sets The effectiveness of the proposed 3D-EMD and RMTL methods are evaluated using three real-world remote sensing hyperspectral data sets. Here, we briefly describe those data sets as follows. (1) The Indian Pines data set was gathered by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) instrument on June 12, 1992, over the Indian Pine test site in Northwestern

18

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Table 1 Number of samples (NoS) used in the Indian Pines data.

results from ten times of repeated random training-sample selection. It is crucial to tune the parameters of various methods. For the feature extraction methods, the stopping criteria of 1D-EMD is equal to He et al. (2014), 2D-EMD (Demir and Ertürk, 2010) is speed up by the fast and adaptive version, 3D-DWT is realized by two levels decomposition with the ‘‘Haar” mother wavelet, and 5 (or 6 and 5) 3D-IMFs are generated in the Indian Pines data (or University of Pavia data and Salinas data) by 3D-EMD. The ‘‘Spec” and 1D-EMD are spectral feature-based methods, the 2D-EMD can effectively extract the spatial features, while 3D-DWT and 3D-EMD respect the 3D nature of HSI and explore spectralspatial features. The components (i.e. 1D/2D/3D-IMFs and wavelet components) obtained by different methods are respectively

regarded as the features of different tasks in the RMTL. The ‘‘Spec” can also be viewed as a special case of the RMTL with the task K ¼ 1. For the classification methods, 3D-EMD-based features are applied in various classifiers. Regarding no kernel is used in SOMP and the proposed RMTL, we adopt linear kernel in SVM, SVMCK and MTL for consistency. The penalty term in both SVM and SVMCK is tuned with f0:1; 1; 10; 20; . . . ; 1000g. Note that each task in the MTL and RMTL is treated equally, to be fair, the composite weight in SVMCK is set to K1 for all of the 3D-IMFs. The sparsity level of SOMP ranges from 10 to 100, and the spatial window sizes are taken as 9 9; 5 5 and 9 9 in the Indian Pines data, University of Pavia data and Salinas data, respectively. Moreover, the regularization parameter k in both MTL and RMTL is selected from f0:01; 0:1; . . . ; 1g. To examine the effect of k on the classification accuracy of RMTL, a simple experimental study is performed in which the OA of the Indian Pines data is computed at various values of k. It is shown from Fig. 8 that there is a fluctuation in the accuracy and the OA reaches to the maxima when k is set at 0.8. Moreover, the size of HSI data cube Vðm; n; bÞ and the number of 3D-IMFs are important parameters in the 3D-EMD. Specifically, m; n; b have much more impact on the computational time. An experiment is performed on the Indian Pines data to analyze the influence of the data size. We use a series of subsets cropped from the whole image. The data sizes are originally set to m ¼ 45; n ¼ 45; b ¼ 100, and then uniformly increased with a step size of 1, i.e. the sizes of subsets are 45 45 100; 46 46 101; . . . ; 145 145 200. The computational time of 3D-EMD with various data sizes are shown in Fig. 9, from which we can find that the size of HSI data cube Vðm; n; bÞ has direct effect to the computational time in hyperspectral classification, i.e. the larger the data sizes, the longer the computational time. In addition, the number of 3D-IMFs is an important factor for the classification accuracy. It is notable from the procedure of 3D-EMD (see Section 2.1) that

Fig. 5. Indian Pines data. (a) Three-band false color composite and (b) ground truth data with 16 classes.

Table 2 NoS and available training samples (NoATS) used in the University of Pavia data.

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

19

Fig. 6. University of Pavia data. (a) Three-band false color composite, (b) ground truth data with 9 classes, and (c) available training samples.

Table 3 NoS used in the Salinas data.

the whole sifting process stops until the residue Riþ1 ðm; n; bÞ contains less than three extrema, and thus the number of 3D-IMFs is obtained automatically in this paper. In case the iteration process is terminated, we will obtain less number of 3D-IMFs. As an example, the OA (%) of the RMTL with various number of 3D-IMFs for the Indian Pines data is shown in Fig. 10, from which we can observe that the OA with 0 3D-IMFs (i.e. the original HSI data set) is the lowest, while 5 3D-IMFs provides better accuracy than any other cases.

4.3. Classification results and discussions Experimental results are provided in this subsection to validate the proposed 3D-EMD and RMTL methods. As an example, the decomposition results of 3D-EMD for the Indian Pines data are shown in Fig. 11. It is notable that the lower-order 3D-IMFs have higher local spatial frequencies and therefore, involve less spatial integration. Opposite results can be observed from the higherorder 3D-IMFs. Moreover, Fig. 12 depicts the test samples of classes 4, 7, 9 and 15 in ‘‘Spec” and 3D-EMD feature spaces under 2D condition. We choose classes 4, 7, 9 and 15 due to the fact that those classes are difficult to separate. Bands #40 and #70 are used to form the features of different methods. Comparing the ‘‘Spec” features shown in Fig. 12(a) and the 3D-EMD features shown in Fig. 12

(b)–(f), we can observe that the separability is significantly improved in the 3D-EMD feature space. Without loss of generality, the thematic maps of various methods in a trial are visually compared in Figs. 13–15. More specifically, Tables 4–6 quantitatively summarize the classification accuracies (i.e. classification accuracy of each class, OA, AA and j) and computational time of various methods under comparison. In addition, Fig. 16 depicts the normalized residues of a sample from class 15 of the Indian Pines data located at (23,26). The horizontal and vertical axis refer to the class number and corresponding residue, respectively. It is shown in Fig. 16(b)–(d) that the 3DEMD features with RMTL correctly classifies the sample located at (23,26) into class 15, whereas the other two methods (i.e. Spec (RMTL) and 3D-EMD(MTL)) classify the sample into class 10 and class 3, respectively. This indicates that 3D-EMD with RMTL can potentially yield good performance. A few observations and discussions can be highlighted from Figs. 13–15, and Tables 4–6. It can be first seen that, the ‘‘Spec” and 1D-EMD-based features generate more scattered noise than the 2D-EMD, 3D-DWT or 3D-EMD-based ones. For instance, it is clearly visible from Fig. 13(a)–(e) that many salt-and-pepper-like misclassification pixels are appeared in the classification maps of ‘‘Spec” and 1D-EMD, while the wrongly classified samples obtained by other methods (i.e. 2D-EMD, 3D-DWT and 3D-EMD) are spatially concentrated. Similar properties can also be found from Figs. 14 and 15. Furthermore, with the same RMTL classifier, ‘‘Spec”

20

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Overall accuracy (%)

100 80 60 40 20 0

0

1

2

3

4

5

Number of 3D-IMFs Fig. 10. Overall accuracy (%) of the RMTL with various number of 3D-IMFs for the Indian Pines data.

Fig. 7. Salinas data. (a) Three-band false color composite and (b) ground truth data with 16 classes.

Overall accuracy (%)

100 95 90 85 80

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

in the proposed RMTL

Computational time (s)

Fig. 8. Effect of k on the overall accuracy (%) for the Indian Pines data.

250 200 150 100 50 0

45 × 45 × 100

65 × 65 × 120

85 × 85 × 140

105 × 105 × 160 125 × 125 × 180 145 × 145 × 200

Data sizes Fig. 9. The computational time of 3D-EMD with various data sizes (i.e. 45 45 100; 46 46 101; . . . ; 145 145 200) for the Indian Pines data.

and 1D-EMD-based features lead to lower classification accuracies than other features (see Tables 4–6). As an example, it is observed from the second to sixth columns of Table 4 that the OAs of ‘‘Spec” and 1D-EMD are at least 4% lower than 2D-EMD, 3D-DWT or 3D-

EMD. The above-mentioned phenomena stress the importance of spatial information for HSI data. We then compare various feature descriptors in greater detail by using the same classifiers (i.e. RMTL). It can be seen from Tables 4–6 that the classification accuracies of ‘‘Spec” are worst among all of the feature extraction methods, 1D-EMD provides much better results than ‘‘Spec”, while 2D-EMD is comparable to 3D-DWT and both of them are superior to ‘‘Spec” and 1D-EMD. Moreover, 3D-EMD generally leads to better results than the other four methods. As shown in Table 6 that the OA of ‘‘Spec” is 2.67%, 3.93%, 4.58% and 8.99% lower than 1D-EMD, 2D-EMD, 3D-DWT and 3DEMD, respectively. The main reason for unsatisfactory performance of ‘‘Spec” is due to the absence of powerful feature extraction technologies. The OA, AA and j of 2D-EMD in Table 6 trail slightly behind 3D-DWT, but both of them yield higher classification accuracies than ‘‘Spec” and 1D-EMD. This highlights yet again the importance of spatial information for hyperspectral classification. Although both 3D-DWT and 3D-EMD are 3D-based methods that respect the 3D nature of HSI, the OA, AA and j of 3D-EMD in Table 6 are still 4.41%, 2.76% and 4.90% higher than 3D-DWT. Similar results can also be found in Tables 4 and 5. The reason for good results of 3D-EMD with respect to 3D-DWT is that datadependent methods are naturally discriminative as compared to the methods with predetermined basis functions. An additional goal of Tables 4–6 are to compare the performance of different classifiers with the same 3D-EMD-based features. It should be mentioned that the sum of 3D-IMFs3 are used as input features of SVM and SOMP, while each 3D-IMF is taken as a kind of feature/task in SVMCK,4 MTL and RMTL. It can be observed from Tables 4–6 that although SVM is a state-of-the-art method for HSI classification, it leads to lower classification accuracies than almost all of the other classifiers under comparison. SVMCK leads to better performance than SVM, while MTL and RMTL provide higher classification accuracies than any other baseline methods. For instance, it is shown from Table 5 that the OA of SVM is 2.01%, 3.07% and 7.13% lower than SVMCK, MTL and RMTL, respectively, while the OAs of MTL and RMTL are 1.06% and 5.12% higher than SVMCK. As to SOMP, it achieves comparable or slightly better classification performance than SVMCK in the Indian Pines data and Salinas data. However, SOMP exhibits poor results for the University Pavia data. As displayed in Table 5, the OA, AA and j of SOMP are 2.07%, 0.81% and 2.49% lower than SVM. This is due to the fact that the available training samples (see Fig. 6(c)) are composed of small

3 Although the sum of 3D-IMFs involves various scenarios (e.g. C 1 þ C 2 ; C 1 þ C 2 þ C 3 , etc.), we choose only one scenario that leads to the best classification performance. 4 Each 3D-IMF is taken as a kind of feature in SVMCK and the composite weight is set to K1 for all of the 3D-IMFs. That means, the composite kernel K in SVMCK is P constructed by Kðxi ; xj Þ ¼ K1 Kk¼1 Kk ðxki ; xkj Þ, where xki denotes the ith sample from the kth 3D-IMF and Kk refers to the kth kernel to construct the composite kernel K.

21

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Fig. 11. Decomposition results of 3D-EMD for the Indian Pines data. (a) The original data cube, (b)–(f) the 1st to 5th 3D-IMFs, and (g) the final residue.

1

1

1

0.8

0.8

0.8

0.6

0.6

0.6

0.4

0.4

0.4

Class 4 Class 7 Class 9 Class 15

0.2 0 0.2

0.4

0.6

0.8

Class 4 Class 7 Class 9 Class 15

0.2 0 0.2

1

0.4

(a)

0.6

0.8

Class 4 Class 7 Class 9 Class 15

0.2 0 0.2

1

0.4

(b)

1

1

0.8

0.8

0.6

0.6

0.8

1

(c) 1 0.8 0.6

0.6

0.4

0.4

0.4 Class 4 Class 7 Class 9 Class 15

0.2 0 0.2

0.4

0.6

(d)

0.8

1

Class 4 Class 7 Class 9 Class 15

0.2

0

0

0.2

0.4

0.6

(e)

0.8

1

Class 4 Class 7 Class 9 Class 15

0.2 0 0

0.2

0.4

0.6

0.8

1

(f)

Fig. 12. Two-dimensional representation of the ‘‘Spec” and 3D-EMD features for the Indian Pines data. (a) The ‘‘Spec” features and (b)–(f) the 1st to 5th 3D-IMF features.

22

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

(a) Spec(RMTL)

(b) 1D-EMD(RMTL) (c) 2D-EMD(RMTL)

(f) 3D-EMD(SVM) (g) 3D-EMD(SVMCK) (h) 3D-EMD(SOMP)

(d) 3D-DWT(RMTL) (e) 3D-EMD(RMTL)

(i) 3D-EMD(MTL)

(j) Ground truth

Fig. 13. Classification maps of the Indian Pines data with (a) Spec(RMTL), (b) 1D-EMD(RMTL), (c) 2D-EMD(RMTL), (d) 3D-DWT(SRMTL), (e) 3D-EMD(RMTL), (f) 3D-EMD (SVM), (g) 3D-EMD(SVMCK), (h) 3D-EMD(SOMP), (i) 3D-EMD(MTL)-based methods and (j) denotes the ground truth.

(a) Spec(RMTL)

(b) 1D-EMD(RMTL)

(c) 2D-EMD(RMTL) (d) 3D-DWT(RMTL)

(f) 3D-EMD(SVM) (g) 3D-EMD(SVMCK) (h) 3D-EMD(SOMP)

(i) 3D-EMD(MTL)

(e) 3D-EMD(RMTL)

(j) Ground truth

Fig. 14. Classification maps of the University of Pavia data with (a) Spec(RMTL), (b) 1D-EMD(RMTL), (c) 2D-EMD(RMTL), (d) 3D-DWT(SRMTL), (e) 3D-EMD(RMTL), (f) 3D-EMD (SVM), (g) 3D-EMD(SVMCK), (h) 3D-EMD(SOMP), (i) 3D-EMD(MTL)-based methods and (j) denotes the ground truth.

patches, which may cause the window around a pixel contains no training samples. Particularly, the first class (i.e. asphalt) covering narrow regions provide quite low accuracy. Moreover, RMTL shows superior classification results to MTL, which indicates the advantages of low-rank and sparse constrains in HSI classification.

Finally, we discuss the computational time cost issue. As to the feature extraction methods, ‘‘Spec” spends less time than any other methods since the raw spectral profiles involve no feature extraction strategies. Comparing against 3D-DWT and 3D-EMD, 1D-EMD and 2D-EMD consume much larger amount of time. As shown in

23

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

(a) Spec(RMTL)

(b) 1D-EMD(RMTL) (c) 2D-EMD(RMTL) (d) 3D-DWT(RMTL) (e) 3D-EMD(RMTL)

(f) 3D-EMD(SVM) (g) 3D-EMD(SVMCK) (h) 3D-EMD(SOMP)

(i) 3D-EMD(MTL)

(j) Ground truth

Fig. 15. Classification maps of the Salinas data with (a) Spec(RMTL), (b) 1D-EMD(RMTL), (c) 2D-EMD(RMTL), (d) 3D-DWT(SRMTL), (e) 3D-EMD(RMTL), (f) 3D-EMD(SVM), (g) 3D-EMD(SVMCK), (h) 3D-EMD(SOMP), (i) 3D-EMD(MTL)-based methods and (j) denotes the ground truth.

Table 4 Classification accuracy (%), standard deviation (in bracket) and computational time of various methods for the Indian Pines data. The best accuracy or shortest time is in bold. Class

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

RMTL

3D-EMD

Spec

1D-EMD

2D-EMD

3D-DWT

3D-EMD

SVM

SVMCK

SOMP

MTL

RMTL

65.91 59.97 53.16 79.91 82.75 87.65 62.50 97.08 60.00 38.83 60.05 37.58 99.01 90.89 62.70 87.06

70.46 74.02 60.56 86.16 82.14 92.27 68.75 98.54 70.00 71.29 70.42 63.41 99.51 95.48 68.38 96.47

86.36 78.79 75.97 89.73 74.13 84.53 68.75 94.36 70.00 91.65 73.76 72.68 97.03 95.25 75.41 96.47

90.91 81.53 73.30 86.16 84.19 94.57 68.75 98.96 80.00 89.88 70.83 73.51 99.51 98.29 83.51 97.65

93.18 87.99 88.35 87.50 86.45 89.42 81.25 98.33 90.00 93.84 90.24 75.50 94.06 92.21 92.97 97.65

84.09 74.44 73.42 81.25 82.75 81.82 68.75 92.90 70.00 84.03 79.66 63.74 98.52 87.62 76.22 92.94

88.64 84.83 82.77 81.70 78.23 85.07 75.00 97.08 70.00 90.71 66.11 78.97 99.01 91.98 92.43 92.94

90.91 81.81 81.67 84.82 88.71 83.58 75.00 95.20 0.00 84.24 76.81 76.49 98.02 92.45 80.81 94.12

90.91 85.67 79.00 85.27 84.39 84.12 81.25 97.91 80.00 82.67 84.13 74.83 92.08 89.41 83.51 94.12

93.18 87.99 88.35 87.50 86.45 89.42 81.25 98.33 90.00 93.84 90.24 75.50 94.06 92.21 92.97 97.65

(continued on next page)

24

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

Table 4 (continued) Class

a

RMTL

3D-EMD

Spec

1D-EMD

2D-EMD

3D-DWT

3D-EMD

SVM

SVMCK

SOMP

MTL

RMTL

OA

66.41 (1.89)

77.46 (1.81)

81.82 (1.75)

83.28 (1.68)

89.80 (1.25)

80.21 (1.79)

82.08 (1.72)

83.33 (1.65)

84.83 (1.59)

89.80 (1.25)

AA

70.32 (1.21)

79.24 (1.18)

82.81 (1.10)

85.72 (0.98)

89.93 (0.74)

80.76 (1.13)

84.72 (1.02)

80.29 (1.19)

85.58 (0.99)

89.93 (0.74)

j

61.77 (2.13)

74.37 (1.98)

79.41 (1.92)

81.02 (1.88)

88.42 (1.42)

77.59 (1.94)

79.74 (1.91)

81.16 (1.85)

82.77 (1.76)

88.42 (1.42)

time (s)a

0 +33.85

1152.25 +33.74

2580.37 +33.94

8.12 +33.65

240.25 +33.58

240.25 +1.27

240.25 +8.17

240.25 +183.34

240.25 +43.52

240.25 +33.58

The first and second rows denote the computational time of feature extraction methods and classification methods, respectively.

Table 5 Classification accuracy (%), standard deviation (in bracket) and computational time of various methods for the University of Pavia data. The best accuracy or shortest time is in bold. Class

a

RMTL

3D-EMD

Spec

1D-EMD

2D-EMD

3D-DWT

3D-EMD

SVM

SVMCK

SOMP

MTL

RMTL

1 2 3 4 5 6 7 8 9

66.32 61.04 38.40 96.77 99.11 52.97 86.69 76.21 98.94

64.14 77.81 61.74 96.70 99.33 53.63 85.41 80.01 99.37

77.09 71.86 71.56 96.18 99.55 59.53 92.86 87.97 98.84

72.61 81.54 71.84 96.54 99.33 56.75 90.53 90.30 99.89

85.14 81.24 77.69 98.04 97.30 80.66 96.84 97.15 99.62

73.81 79.48 60.22 96.36 99.73 54.07 85.67 92.45 99.87

69.05 85.42 57.79 95.04 99.85 55.16 91.13 93.35 98.63

68.13 77.42 64.57 94.75 98.02 56.23 94.29 84.63 96.23

77.08 82.64 68.94 98.76 99.85 54.92 96.02 94.30 98.84

85.14 81.24 77.69 98.04 97.30 80.66 96.84 97.15 99.62

OA

66.50 (1.71)

74.99 (1.65)

76.46 (1.58)

79.84 (1.41)

85.38 (0.89)

78.25 (1.45)

80.26 (1.38)

76.18 (1.63)

81.32 (1.29)

85.38 (0.89)

AA

75.16 (1.23)

79.79 (1.05)

83.94 (0.97)

84.37 (0.91)

90.41 (0.54)

82.40 (1.04)

82.82 (0.84)

81.59 (1.01)

85.70 (0.72)

90.41 (0.54)

j

58.06 (1.92)

67.80 (1.74)

69.95 (1.68)

73.84 (1.48)

80.91 (1.01)

71.37 (1.61)

74.10 (1.32)

68.88 (1.71)

75.71 (1.24)

80.91 (1.01)

time(s)a

0 +150.61

2993.45 +150.24

5922.64 +150.87

20.87 +149.59

540.47 +149.25

540.47 +1.75

540.47 +17.98

540.47 +971.32

540.47 +352.35

540.47 +149.25

The first and second rows denote the computational time of feature extraction methods and classification methods, respectively.

Table 6 Small classification accuracy (%), standard deviation (in bracket) and computational time of various methods for the Salinas data. The best accuracy or shortest time is in bold. Class

RMTL

3D-EMD

Spec

1D-EMD

2D-EMD

3D-DWT

3D-EMD

SVM

SVMCK

SOMP

MTL

RMTL

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

97.95 97.99 95.27 99.49 96.29 98.84 99.33 69.60 98.90 88.00 88.85 96.92 98.23 91.23 51.12 94.21

99.10 98.22 96.59 98.99 97.71 99.59 99.44 71.86 98.45 86.02 96.69 99.69 98.90 93.49 63.81 98.39

99.25 98.68 96.24 97.83 96.40 98.73 99.47 75.88 99.03 89.41 92.63 99.74 98.68 96.98 66.44 96.55

99.07 98.92 98.09 98.27 95.73 98.66 99.61 75.50 99.01 89.98 94.94 100.00 99.06 94.91 70.56 98.30

99.80 99.88 99.57 99.21 98.43 99.94 99.58 84.22 99.05 95.76 98.77 99.90 99.61 99.39 82.93 98.69

93.30 98.39 95.47 98.05 95.43 99.16 98.23 60.38 96.80 77.08 88.37 98.38 96.03 93.11 67.42 92.32

96.05 96.58 96.34 99.35 90.07 99.06 99.36 65.69 99.35 84.09 88.00 98.33 98.90 96.13 68.60 93.77

99.60 98.92 98.83 97.33 91.45 99.37 97.65 71.06 98.64 87.21 98.68 100.00 98.45 89.72 75.19 97.77

99.20 98.63 96.49 98.12 98.05 98.68 99.52 79.41 99.10 90.82 92.72 100.00 98.45 96.79 71.44 99.00

99.80 99.88 99.57 99.21 98.43 99.94 99.58 84.22 99.05 95.76 98.77 99.90 99.61 99.39 82.93 98.69

OA

84.80 (1.61)

87.47 (1.41)

88.73 (1.35)

89.38 (1.31)

93.79 (1.05)

83.87 (1.65)

85.85 (1.54)

88.54 (1.37)

90.41 (1.26)

93.79 (1.05)

AA

91.39 (0.63)

93.56 (0.52)

93.87 (0.45)

94.41 (0.41)

97.17 (0.22)

90.50 (0.69)

91.85 (0.60)

93.74 (0.48)

94.78 (0.37)

97.17 (0.22)

j

83.08

86.08

87.47

88.20

93.10

82.10

84.29

87.28

89.34

93.10

25

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27 Table 6 (continued) Class

RMTL

time (s)a a

3D-EMD

Spec

1D-EMD

2D-EMD

3D-DWT

3D-EMD

SVM

SVMCK

SOMP

MTL

RMTL

(1.72)

(1.47)

(1.38)

(1.35)

(1.12)

(1.84)

(1.68)

(1.41)

(1.31)

(1.12)

0 +199.52

3192.61 +199.14

6230.41 +199.83

22.81 +198.97

570.54 +198.87

570.54 +2.09

570.54 +21.54

570.54 +994.41

570.54 +409.57

570.54 +198.87

The first and second rows denote the computational time of feature extraction methods and classification methods, respectively.

(23,26) Normlized Residues

0.4 0.3 0.2 0.1 0

1

2

3

4

5

6

7

8

9 10 11 12 13 14 15 16

Class

(a) Detailed location

(b) Spec(RMTL) 0.4

Normlized Residues

Normlized Residues

0.4 0.3 0.2 0.1 0

0.3 0.2 0.1 0

1

2

3

4

5

6

7

8

9 10 11 12 13 14 15 16

1

2

3

4

5

6

7

8

9 10 11 12 13 14 15 16

Class

Class

(c) 3D-EMD(MTL)

(d) 3D-EMD(RMTL)

Fig. 16. Normalized residues of a sample from class 15 located at (23,26). (a) Detailed location of the sample in class 15, (b) Normalized residues of RMTL with ‘‘Spec” features, (c) and (d) are normalized residues of MTL and RMTL with 3D-EMD features, respectively.

Table 4, 1D-EMD and 2D-EMD take more than 19 and 40 min, respectively, while 3D-DWT is quite fast and only spends 8.12 s. The computational time of 3D-EMD is about 4 min, which is also much faster than 1D-EMD and 2D-EMD. It is noteworthy from Tables 5 and 6 that the efficiency of 3D-EMD becomes much more obvious with larger sizes of data sets. As to the classifiers, SVM and SVMCK are much faster than the other methods, while SOMP costs the most time among all classifiers. Moreover, the computational time of MTL and RMTL is moderate. As displayed in Table 4, both SVM and SVMCK only require a few seconds to classify the Indian Pines data. This is due to the fact that C++ software5 is utilized to speed up the implementation of those two classifiers. Both MTL and RMTL are more than 100 s faster than SOMP in Table 4. In addition, RMTL solved by IALM is about 10 s faster than the accelerated proximal gradient (APG) based MTL. Similar phenomena can also be observed from the University of Pavia data and Salinas data. The above analysis validates the proposed 3D-EMD and RMTL methods in computational efficiency.

5

http://www.csie.ntu.edu.tw/cjlin/index.html.

5. Conclusions In this paper, we have developed the 3D-EMD and RMTL methods for feature extraction and classification of HSI. Comparing against techniques in the literature, the proposed methods show significant benefits and generate impressive results for hyperspectral classification. The main advantage of the 3D-EMD is that it respects the 3D nature of HSI and treats the HSI as a data cube so as to preserve the spectral-spatial information. Notably that it is challenging to extend the existing 1D/2D-EMD to their 3D version, two strategies are adopted for fast implementation of the 3D-EMD. 3D-DT is used in the filter size determination, while separable filters are applied to speed up the envelope generation process. The RMTL can also provide several advantages. On the one hand, all of the 3D-IMFs are fully utilized by simultaneously treating each 3D-IMF component as a task in the RMTL. On the other hand, both shared factors and specificities can be detected by integrating multiple features with pairs of low-rank and sparse regularizations. The experimental results on three real-world hyperspectral data sets have consistently validated the superiority of the proposed methods, especially with limited number of training samples.

26

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27

The proposed method can be further improved. For instance, the mathematical foundation of the 3D-EMD should be investigated to optimize its performance. Moreover, the kernel extension of RMTL can be studied for further improving the hyperspectral classification performance.

Acknowledgments This work was supported by the National Natural Science Foundation of China under Grants 41501368, 41522104 and 41531178, and the Fundamental Research Funds for the Central Universities under Grants 16lgpy04 and 15lgjc24. The authors would like to thank Prof. D. Landgrebe from Purdue University for providing the AVIRIS image of Indian Pines and the Prof. Gamba from University of Pavia for providing the ROSIS data set. Last but not least, we would like to take this opportunity to thank the Editors and the Anonymous Reviewers for their detailed comments and suggestions, which greatly helped us to improve the clarity and presentation of our manuscript.

References Bioucas-Dias, J.M., Plaza, A., Dobigeon, N., Parente, M., Du, Q., Gader, P., Chanussot, J., 2012. Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 5, 354–379. Bruce, L.M., Koger, C.H., Li, J., 2002. Dimensionality reduction of hyperspectral data using discrete wavelet transform feature extraction. IEEE Trans. Geosci. Remote Sens. 40, 2331–2338. Camps-Valls, G., Gomez-Chova, L., Muñoz-Marí, J., Vila-Francés, J., Calpe-Maravilla, J., 2006. Composite kernels for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 3, 93–97. Cavallaro, G., Riedel, M., Richerzhagen, M., Benediktsson, J.A., Plaza, A., 2015. On understanding big data impacts in remotely sensed image classification using support vector machine methods. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 8, 4634–4646. Chen, Y., Nasrabadi, N.M., Tran, T.D., 2011. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 49, 3973–3985. Chen, Y., Nasrabadi, N.M., Tran, T.D., 2013. Hyperspectral image classification via kernel sparse representation. IEEE Trans. Geosci. Remote Sens. 51, 217–231. Crawford, M.M., Tuia, D., Yang, H.L., 2013. Active learning: any value for classification of remotely sensed data? Proc. IEEE 101, 593–608. Dalponte, M., Ene, L.T., Marconcini, M., Gobakken, T., Næsset, E., 2015. Semisupervised SVM for individual tree crown species classification. ISPRS J. Photogramm. Remote Sens. 110, 77–87. Deléchelle, E., Lemoine, J., Niang, O., 2005. Empirical mode decomposition: an analytical approach for sifting process. IEEE Signal Process. Lett. 12, 764–767. Demir, B., Ertürk, S., 2010. Empirical mode decomposition of hyperspectral images for support vector machine classification. IEEE Trans. Geosci. Remote Sens. 48, 4071–4084. de Morsier, F., Borgeaud, M., Gass, V., Thiran, J.P., Tuia, D., 2016. Kernel low-rank and sparse graph for unsupervised and semi-supervised classification of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 54, 3410–3420. Dong, Y., Zhang, L., Zhang, L., Du, B., 2015. Maximum margin metric learning based target detection for hyperspectral images. ISPRS J. Photogramm. Remote Sens. 108, 138–150. Ertürk, A., Güllü, M.K., Ertürk, S., 2013. Hyperspectral image classification using empirical mode decomposition with spectral gradient enhancement. IEEE Trans. Geosci. Remote Sens. 51, 2787–2798. Fauvel, M., Tarabalka, Y., Benediktsson, J.A., Chanussot, J., Tilton, J.C., 2013. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE 101, 652– 675. Fu, W., Li, S., Fang, L., Kang, X., Benediktsson, J.A., 2016. Hyperspectral image classification via shape-adaptive joint sparse representation. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 9, 556–567. Ghamisi, P., Benediktsson, J.A., 2015. Feature selection based on hybridization of genetic algorithm and particle swarm optimization. IEEE Geosci. Remote Sens. Lett. 12, 309–313. Golias, N.A., Dutton, R.W., 1997. Delaunay triangulation and 3D adaptive mesh generation. Finite Elem. Anal. Des. 25, 331–341. Gormus, E.T., Canagarajah, N., Achim, A., 2012. Dimensionality reduction of hyperspectral images using empirical mode decompositions and wavelets. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 5, 1821–1830. Gu, Y., Liu, T., Jia, X., Benediktsson, J.A., Chanussot, J., 2016. Nonlinear multiple kernel learning with multiple-structure-element extended morphological profiles for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 54, 3235–3247.

Guo, X., Huang, X., Zhang, L., Zhang, L., 2013. Hyperspectral image noise reduction based on rank-1 tensor decomposition. ISPRS J. Photogramm. Remote Sens. 83, 50–63. Guo, X., Huang, X., Zhang, L., Zhang, L., Plaza, A., Benediktsson, J.A., 2016. Support tensor machines for classification of hyperspectral remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 54, 3248–3264. Gurram, P., Kwon, H., Davidson, C., 2016. Coalition game theory-based feature subspace selection for hyperspectral classification. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens., 1–11 http://dx.doi.org/10.1109/JSTARS.2016.2531984 (in press). He, Z., Li, J., 2015. Multiple data-dependent kernel for classification of hyperspectral images. Expert Syst. Appl. 42, 1118–1135. He, Z., Liu, L., Deng, R., Shen, Y., 2016. Low-rank group inspired dictionary learning for hyperspectral image classification. Signal Process. 120, 209–221. He, Z., Wang, Q., Shen, Y., Jin, J., Wang, Y., 2013. Multivariate gray model-based BEMD for hyperspectral image classification. IEEE Trans. Instrum. Meas. 62, 889–904. He, Z., Wang, Q., Shen, Y., Sun, M., 2014. Kernel sparse multitask learning for hyperspectral image classification with empirical mode decomposition and morphological wavelet-based features. IEEE Trans. Geosci. Remote Sens. 52, 5150–5163. Hsu, P.H., 2007. Feature extraction of hyperspectral images using wavelet and matching pursuit. ISPRS J. Photogramm. Remote Sens. 62, 78–92. Huang, H., Luo, F., Liu, J., Yang, Y., 2015. Dimensionality reduction of hyperspectral images based on sparse discriminant manifold embedding. ISPRS J. Photogramm. Remote Sens. 106, 42–54. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.-C., Tung, C.C., Liu, H.H., 1998. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London A 454, 903–995. Huang, X., Liu, X., Zhang, L., 2014. A multichannel gray level co-occurrence matrix for multi/hyperspectral image texture representation. Remote Sens. 6, 8424– 8445. Jia, M., Gong, M., Jiao, L., 2015. Hyperspectral image classification using discontinuity adaptive class-relative nonlocal means and energy fusion strategy. ISPRS J. Photogramm. Remote Sens. 106, 16–27. Jia, S., Hu, J., Xie, Y., Shen, L., Jia, X., Li, Q., 2016. Gabor cube selection based multitask joint sparse representation for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 54, 3174–3187. Li, F., Hu, D., Ding, C., Zhang, W., 2013. InSAR phase noise reduction based on empirical mode decomposition. IEEE Geosci. Remote Sens. Lett. 10, 1180–1184. Li, J., Zhang, H., Zhang, L., 2015. Efficient superpixel-level multitask joint sparse representation for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 53, 5338–5351. Li, X., Zhang, Y., Bao, Y., Luo, J., Jin, X., Xu, X., Song, X., Yang, G., 2015. Exploring the best hyperspectral features for LAI estimation using partial least squares regression. Remote Sens. 6, 6221–6241. Lin, Z., Chen, M., Ma, Y., 2009. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. Univ. Illinois at Urbana-Champaign, Champaign, IL, USA, UIUC Tech. Rep. UILU-ENG-09-2215, pp. 1–20. Ma, L., Zhang, X., Yu, X., Luo, D., 2016. Spatial regularized local manifold learning for classification of hyperspectral images. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 9, 609–624. Martínez-Usó, A., Pla, F., Sotoca, J.M., García-Sevilla, P., 2007. Clustering-based hyperspectral band selection using information measures. IEEE Trans. Geosci. Remote Sens. 45, 4158–4171. Neumann, C., Förster, M., Kleinschmit, B., Itzerott, S., 2016. Utilizing a PLSR-based band-selection procedure for spectral feature characterization of floristic gradients. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens., 1–15 http://dx. doi.org/10.1109/JSTARS.2016.2536199 (in press). Patel, R., Janawadkar, M., Sengottuvel, S., Gireesan, K., Radhakrishnan, T., 2016. Suppression of eye-blink associated artifact using single channel EEG data by combining cross-correlation with empirical mode decomposition. IEEE Sens. J. http://dx.doi.org/10.1109/JSEN.2016.2591580 (in press). Qian, Y., Ye, M., Zhou, J., 2013. Hyperspectral image classification based on structured sparse logistic regression and three-dimensional wavelet texture features. IEEE Trans. Geosci. Remote Sens. 51, 2276–2291. Ren, Y., Suganthan, P.N., Srikanth, N., 2016. A novel empirical mode decomposition with support vector regression for wind speed forecasting. IEEE Trans. Neural Netw. Learn. Syst. 27, 1793–1798. Riffi, J., Mahraz, A.M., Abbad, A., Tairi, H., 2015. 3D extension of the fast and adaptive bidimensional empirical mode decomposition. Multidimens. Syst. Signal Process. 26, 823–834. Rilling, G., Flandrin, P., Goncalves, P., 2003. On empirical mode decomposition and its algorithms. IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing, vol. 2003. IEEE, pp. 8–11. Samat, A., Du, P., Liu, S., Li, J., Cheng, L., 2014. E2 LMs: ensemble extreme learning machines for hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 7, 1060–1069. Serpico, S.B., Bruzzone, L., 2001. A new search algorithm for feature selection in hyperspectral remote sensing images. IEEE Trans. Geosci. Remote Sens. 39, 1360–1367. Siskos, S., Vlassis, S., Pitas, I., 1998. Analog implementation of fast min/max filtering. IEEE Trans. Circuits Syst. II 45, 913–918. Song, B., Li, J., Dalla Mura, M., Li, P., Plaza, A., Bioucas-Dias, J.M., Chanussot, J., 2014. Remotely sensed image classification using sparse representations of

Z. He, L. Liu / ISPRS Journal of Photogrammetry and Remote Sensing 121 (2016) 11–27 morphological attribute profiles. IEEE Trans. Geosci. Remote Sens. 52, 5122– 5136. Song, R., Guo, H., Liu, G., Perski, Z., Fan, J., 2014. Improved Goldstein SAR interferogram filter based on empirical mode decomposition. IEEE Geosci. Remote Sens. Lett. 11, 399–403. Tan, K., Hu, J., Li, J., Du, P., 2015. A novel semi-supervised hyperspectral image classification approach based on spatial neighborhood information and classifier combination. ISPRS J. Photogramm. Remote Sens. 105, 19–29. Tarabalka, Y., Fauvel, M., Chanussot, J., Benediktsson, J.A., 2010. SVM-and MRFbased method for accurate classification of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 7, 736–740. Tsai, F., Lai, J.S., 2013. Feature extraction of hyperspectral image cubes using threedimensional gray-level cooccurrence. IEEE Trans. Geosci. Remote Sens. 51, 3504–3513. Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, USA. Veganzones, M.A., Cohen, J.E., Cabral Farias, R., Chanussot, J., Comon, P., 2016. Nonnegative tensor CP decomposition of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 54, 2577–2588. Wang, H., Glennie, C., 2015. Fusion of waveform LiDAR data and hyperspectral imagery for land cover classification. ISPRS J. Photogramm. Remote Sens. 108, 1–11. Wang, Q., Gu, Y., Tuia, D., 2016. Discriminative multiple kernel learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens., 1–16 http://dx.doi.org/10.1109/TGRS.2016.2530807 (in press). Wu, Z., Wang, Q., Plaza, A., Li, J., Liu, J., Wei, Z., 2015. Parallel implementation of sparse representation classifiers for hyperspectral imagery on GPUs. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 8, 2912–2925.

27

Xu, Y., Wu, Z., Wei, Z., 2015. Spectral-spatial classification of hyperspectral image based on low-rank decomposition. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. 8, 2370–2380. Yuan, Y., Lin, J., Wang, Q., 2016. Dual-clustering-based hyperspectral band selection by contextual analysis. IEEE Trans. Geosci. Remote Sens. 54, 1431–1445. Yuan, Y., Zhu, G., Wang, Q., 2015. Hyperspectral band selection by multitask sparsity pursuit. IEEE Trans. Geosci. Remote Sens. 53, 631–644. Zabalza, J., Ren, J., Wang, Z., Marshall, S., Wang, J., 2014. Singular spectrum analysis for effective feature extraction in hyperspectral imaging. IEEE Geosci. Remote Sens. Lett. 11, 1886–1890. Zabalza, J., Ren, J., Yang, M., Zhang, Y., Wang, J., Marshall, S., Han, J., 2014. Novel Folded-PCA for improved feature extraction and data reduction with hyperspectral imaging and SAR in remote sensing. ISPRS J. Photogramm. Remote Sens. 93, 112–122. Zabalza, J., Ren, J., Zheng, J., Han, J., Zhao, H., Li, S., Marshall, S., 2015. Novel twodimensional singular spectrum analysis for effective feature extraction and data classification in hyperspectral imaging. IEEE Trans. Geosci. Remote Sens. 53, 4418–4433. Zhang, L., Zhang, L., Tao, D., Huang, X., 2013. Tensor discriminative locality alignment for hyperspectral image spectral-spatial feature extraction. IEEE Trans. Geosci. Remote Sens. 51, 242–256. Zhao, W., Du, S., 2016. Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS J. Photogramm. Remote Sens. 113, 155–165. Zhong, Z., Fan, B., Duan, J., Wang, L., Ding, K., Xiang, S., Pan, C., 2015. Discriminant tensor spectral-spatial feature extraction for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 12, 1028–1032.

Robust multitask learning with three-dimensional empirical mode decomposition-based features for hyperspectral classification

Robust multitask learning with three-dimensional empirical mode decomposition-based features for hyperspectral classification

Recommend Documents