Hyperspectral remote sensing image change detection based on tensor and deep learning


Accepted Manuscript

Hyperspectral Remote Sensing Image Change Detection based on Tensor and Deep Learning
Fenghua Huang, Ying Yu, Tinghao Feng

PII: S1047-3203(18)30277-3
DOI: https://doi.org/10.1016/j.jvcir.2018.11.004
Reference: YJVCI 2322

To appear in: J. Vis. Commun. Image R.

Received Date: 12 October 2018
Revised Date: 3 November 2018
Accepted Date: 3 November 2018

Please cite this article as: F. Huang, Y. Yu, T. Feng, Hyperspectral Remote Sensing Image Change Detection based on Tensor and Deep Learning, J. Vis. Commun. Image R. (2018), doi: https://doi.org/10.1016/j.jvcir.2018.11.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Hyperspectral Remote Sensing Image Change Detection based on Tensor and Deep Learning

Fenghua Huang 1,2,*, Ying Yu 1,2, Tinghao Feng 3

1 Spatial Data Mining and Application Research Center of Fujian Province, Yango University, Fuzhou 350015, China; [email protected]
2 Information Engineering College, Yango University, Fuzhou 350015, China; [email protected]
3 College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA; [email protected]
* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +86-0591-8397-6047.

Abstract: Considering the bottleneck in improving the performance of existing multi-temporal hyperspectral remote sensing (HSRS) image change detection methods, an HSRS image change detection solution based on tensors and deep learning is proposed in this study. First, a tensor-based model of underlying-feature change information in HSRS images (TFS-Cube) is established. The wavelet texture feature change, spectral feature change, and spatiotemporal autocorrelation coefficient of corresponding pixels in different-temporal images are combined into a three-order tensor, so as to make full use of the underlying-feature change information of HSRS images, optimize its organization, and preserve the integrity of the constraints between different underlying features. Secondly, a restricted Boltzmann machine based on three-order tensors (Tensor3-RBM) is designed. The input, output, and unsupervised learning of TFS-Cube tensor data are realized by multi-linear operations in Tensor3-RBMs, and a large number of unlabeled samples are trained layer by layer through multiple stacked Tensor3-RBMs. Finally, the traditional BP neural network on the top layer of the deep belief network (DBN) is replaced with a support tensor machine (STM), yielding a deep belief network with multi-layer Tensor3-RBMs and an STM (TRS-DBN). A small number of labeled samples are used for supervised learning and global parameter optimization to improve the change detection accuracy of TRS-DBN.

Two types of HSRS images from different sensors, AVIRIS and EO-1 Hyperion, are used as the (double-temporal) data sources. Four representative experimental regions are randomly selected from the two areas covered by the AVIRIS and EO-1 Hyperion images (two regions in each area) to detect land use changes. Experimental results demonstrate that TRS-DBN achieves higher change detection accuracy than similar methods and a good level of automation.

Keywords: Tensor model, Deep learning, Support tensor machine, Hyperspectral remote sensing images, Change detection

1. Introduction

The change detection of surface objects in multi-temporal remote sensing images has long been a research hotspot in the field of remote sensing applications. With the increasing abundance of spatial data sources and the improving performance of machine learning methods, the accuracy of surface-object change detection has steadily improved. Hyperspectral remote sensing (HSRS) images have hundreds or even thousands of continuous, narrow bands, which provide abundant spectral and spatial information. With the ability to identify the details of surface features, they have been extensively used in surface-object classification and change detection. Multi-temporal HSRS images are of great significance in revealing subtle changes of surface objects through multi-temporal high-dimensional feature information. So far, many scholars have carried out extensive research on HSRS image change detection. Most traditional HSRS image change detection methods are derived from classical single-band or multi-spectral change detection methods, such as linear transform detection, classification detection, abnormal change detection, and change analysis detection [1]. The performance of the above methods is limited in HSRS images with high feature dimensionality and spectral interference. In recent years, some scholars have proposed new change detection methods for multi-temporal HSRS images. Tensor factorization [2], orthogonal subspace mapping [3], multi-source target feature support [4], mixed pixel decomposition [5], and independent component analysis (ICA) [6] have been used to detect changes in HSRS images, improving the accuracy of HSRS image change detection to some extent.
However, the existing methods have the following shortcomings: (1) the time-space-spectral structural constraint information in remote sensing data is destroyed when underlying features are described with vectors only, causing the loss of part of the underlying-feature information; (2) most existing change detection methods are based on shallow-layer machine learning algorithms and have limited generalization ability; (3) there are no efficient dimensionality-reduction methods; and (4) it is challenging to obtain enough labeled samples, which limits the accuracy of change detection methods. Due to the above problems, there is a bottleneck in improving the performance of multi-temporal HSRS image change detection. Recently, the emergence of deep learning methods and the extensive application of tensor techniques have provided new ideas for solving these problems. Compared with shallow-structure neural networks, deep network structures composed of multiple nonlinear mapping layers have stronger feature learning ability and better accuracy in the representation and classification of complex functions. In particular, the deep belief network (DBN), which combines the advantages of unsupervised and supervised learning, has good learning ability for high-dimensional datasets. Remote sensing images with multiple bands have multidimensional data structures. A tensor-model representation can not only effectively preserve the structural constraints and association information among the various features in remote sensing images, but also avoid the loss of underlying features that results from completely vectorizing the underlying-feature information. In view of the above problems in HSRS image change detection, an HSRS image change detection scheme based on tensors and deep learning is proposed in this study.
First of all, a tensor-based underlying-feature change information model (TFS-Cube) for HSRS images is established to make full use of the underlying-feature change information of multi-temporal HSRS images and optimize its organization. Secondly, a restricted Boltzmann machine based on three-order tensors (Tensor3-RBM) is constructed to realize the efficient processing of TFS-Cube data. Additionally, a deep learning network containing multi-layer Tensor3-RBMs and a support tensor machine (TRS-DBN) is proposed to effectively improve the accuracy and automation level of HSRS image change detection.

2. Tensor-based HSRS image underlying-feature change information model (TFS-Cube)

2.1 Description of the TFS-Cube model

A tensor, as a multi-dimensional array, is the extension of vectors and matrices to higher dimensions, and the latter can be regarded as expressions of the former in particular cases. The advantage of describing complex data in tensor form is that the data can retain its structural information without changing its original form [7]. Remote sensing images usually have multiple bands. In addition to the width and height of two-dimensional images, they generally have a spectral dimension (band dimension) and a spatial dimension (texture and scale information). Multi-temporal remote sensing images have an additional time dimension. High-order tensors can effectively describe the information structure of HSRS data. Multi-temporal HSRS images can be described by a five-order tensor H ∈ R^(L1×L2×L3×L4×L5), as shown in Figure 1 [8]. H is the tensor corresponding to the HSRS image; L1, L2, L3, L4, and L5 denote the lengths of the five dimensions of width, height, spectrum, space, and time in multi-temporal HSRS images.

Figure 1. Tensor structure of multi-temporal remote sensing images

When surface objects change over time, the spatial and spectral features of the corresponding positions in different-temporal HSRS images vary in terms of spatial features, spectral features, and the relationship between surface-object distribution and time. In general, surface spatial features are described by texture features at different scales, while spectral features are described by spectral curves. In particular, there are hundreds of consecutive fine bands in HSRS images, which can form spectral curves with strong distinguishing ability. The relationship between surface-object distribution and time can be described using the spatiotemporal autocorrelation function (ST-ACF) [9]. When detecting changes in registered different-temporal remote sensing images, the width, height, and time dimensions of the five-order tensor model H in Figure 1 can be folded into an ST-ACF dimension, which comprehensively describes the correlation between time and space while reducing the tensor order. Therefore, this study proposes a tensor-based HSRS image underlying-feature change information model (TFS-Cube) to improve the accuracy of multi-temporal HSRS image change detection. A tensor model including a spatiotemporal autocorrelation dimension, a spectral change dimension, and a spatial change dimension is used to describe and organize the underlying-feature change information of the HSRS images. This not only maintains the correlation and constraint integrity between the change information of different features, but also avoids the loss of time-space-spectrum feature association and constraint information that would result from describing each feature change separately with vectors. Figure 2 presents the structure of the TFS-Cube model.

Figure 2. TFS-Cube model structure

Figure 2 shows that TFS-Cube is a three-order tensor model, which can be expressed as TFS-Cube ∈ R^(M1×M2×M3), where M1, M2, and M3 denote the lengths of the spatial change dimension, the spectral change dimension, and the ST-ACF dimension of the multi-temporal HSRS images.

2.2 Extraction of spatial feature change information in TFS-Cube

Spatial information in HSRS images generally refers to multi-scale texture information. To better extract spatial features, this study collects stereo image blocks through convolution over the local neighborhood windows of central pixels, realizing a localized description of the remote sensing images. The multi-scale texture features of the stereo image block corresponding to a pixel's neighborhood window are taken as the spatial features of that central pixel. The extraction process, shown in Figure 3, consists of three parts: optimal size selection for the pixel neighborhood window, three-dimensional discrete wavelet transformation (3D-DWT) of the pixel-neighborhood stereo image block, and calculation of texture statistical features for each wavelet sub-component.

Figure 3. The process of TFS-Cube spatial feature extraction

(1) Optimal size selection for the pixel neighborhood window

According to Figure 3, at the beginning of TFS-Cube spatial feature extraction, a pixel neighborhood window is used to obtain a stereo image block with the optimal spatial size K×K×N, where K is the side length of the square base of the stereo image block and N is the height of the block (the number of bands in the HSRS image); the value of N is known. Therefore, the optimal value of K is the key to determining the optimal size of the neighborhood window of the central pixel. In this study, a semivariogram was used to obtain the average correlation distance between the pixels in each band of the HSRS image and thus determine the optimal value of K. Semi-variance (SV) is a commonly used statistical index of spatial correlation in geostatistics. As an index that comprehensively measures the local variability and spatial correlation of remote sensing image texture [10], it can measure the spatial dependence of spatially continuous data [11,12]. SV represents the relationship between the spatial distance d between points and the difference in their gray values G, and is defined in formula (1) [12]:

SV(d) = (1 / (2N(d))) × Σ_{i=1}^{N(d)} [G(s_i) − G(s_i + d)]²   (1)

where N(d) is the number of point pairs at distance d, and G(s_i) and G(s_i + d) denote the gray values at points s_i and s_i + d. Existing research [10-12] shows that the semi-variance function in a specific direction is usually a rising curve that climbs gradually as d increases and finally stabilizes, as presented in Figure 4. The stable value of SV(d) is the sill, and the interval of d values before SV(d) reaches the sill is the range. In 2D image texture feature extraction, the range of the semivariogram curve can be used to estimate the spatial correlation distance between pixels and hence the size of the sliding window.
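As a concrete illustration of formula (1) and range-based window sizing, the sketch below computes an empirical semivariogram for one band along the horizontal direction and derives K = 2 × range_avg. It is a minimal numpy sketch: the single direction, the tolerance-based sill test, and the function names are illustrative assumptions, not the paper's implementation (which fits curves in several directions on the mean image of all bands).

```python
import numpy as np

def empirical_semivariance(band, max_lag):
    """Empirical SV(d) along the horizontal direction, per formula (1):
    SV(d) = 1/(2 N(d)) * sum over pairs of (G(s_i) - G(s_i + d))^2."""
    sv = np.zeros(max_lag)
    for d in range(1, max_lag + 1):
        diffs = band[:, d:] - band[:, :-d]      # gray-value differences at lag d
        sv[d - 1] = 0.5 * np.mean(diffs ** 2)   # mean over all N(d) point pairs
    return sv

def estimate_range(sv, tol=0.05):
    """Smallest lag at which SV(d) first comes within `tol` of its sill
    (approximated here by the curve's maximum) -- an illustrative heuristic."""
    sill = sv.max()
    for d, v in enumerate(sv, start=1):
        if v >= (1 - tol) * sill:
            return d
    return len(sv)

# Toy usage on a synthetic spatially correlated band; real use would apply
# this to the mean image of all HSRS bands, in several directions.
rng = np.random.default_rng(0)
band = np.cumsum(rng.normal(size=(64, 64)), axis=1)
sv = empirical_semivariance(band, max_lag=20)
K = 2 * estimate_range(sv)                      # window width K = 2 * range_avg
```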

Figure 4. Semivariogram curve in a specific direction

Remote sensing images have multiple bands, and each band has different texture features. Therefore, it is impossible to process all bands uniformly with a single traditional semivariogram curve. To solve this problem, this study first calculates the mean image of the DN values of all bands in the HSRS image. Secondly, the semi-variance value of each pixel in the mean image is calculated according to formula (1), and semivariogram curves are fitted in different directions to find the average of the ranges in the different directions (range_avg). Since the convolution result of the neighborhood (sliding) window in TFS-Cube spatial feature extraction represents the value of the central pixel, the width of the neighborhood window is set to twice range_avg, that is, K = 2 × range_avg.

(2) 3D-DWT wavelet decomposition of the pixel-neighborhood stereo image block

Since tensor discrete wavelet decomposition can obtain multi-scale change information of high-dimensional data in the frequency domain, it was used to extract spatial change features in TFS-Cube in this study. First, the stereo image block of the pixel's neighborhood window is obtained. Secondly, a three-dimensional discrete wavelet transform (3D-DWT) is applied to obtain wavelet sub-components and extract their texture features at different scales. By considering spatial neighborhood information at different scales, the scheme enhances the robustness of the image content structure through multi-scale analysis, thus effectively describing the distinctive imaging features of different surface objects at various scales. As shown in Figure 3, due to the low spatial resolution of HSRS images, this study used 3D-DWT to perform a single-level wavelet decomposition of the neighborhood-window stereo image blocks, obtaining 8 wavelet sub-components. The relationship between the low- and high-frequency components is shown in formula (2) [13]:

I(x, y, z) = (Lx ⊕ Hx) ⊗ (Ly ⊕ Hy) ⊗ (Lz ⊕ Hz)
           = LxLyLz ⊕ LxLyHz ⊕ LxHyLz ⊕ LxHyHz ⊕ HxLyLz ⊕ HxLyHz ⊕ HxHyLz ⊕ HxHyHz   (2)

where ⊕ and ⊗ represent the direct-sum operation and the Kronecker product of tensors, and L and H denote the low-pass and band-pass filters of the one-dimensional discrete wavelet in the three directions X, Y, and Z. LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH represent the eight wavelet sub-components obtained by single-level 3D-DWT decomposition, where LLL is the approximation sub-component and HHH is a detail sub-component. In Figure 3, the three directions X, Y, and Z in formula (2) correspond to the width, height, and spectral dimensions of the stereo image block of the neighborhood window. In addition, the choice of wavelet basis function is important in 3D-DWT decomposition. Existing research [14,15] demonstrates that the Daubechies wavelet basis handles discontinuities well in texture operations. The basis contains a smooth scaling function with orthogonality and compact support, so orthogonal wavelets can be obtained in the discrete wavelet transform. Daubechies wavelet analysis is an advanced texture-extraction technique for texture image analysis [14,15] and can achieve high classification accuracy in texture classification [16,17]. Therefore, the Daubechies wavelet basis was adopted for 3D-DWT decomposition in this study.

(3) Wavelet sub-component texture feature calculation

In this study, information entropy (IE) and the angular second moment (ASM) are computed from the wavelet coefficients of each wavelet sub-component to generate texture statistical features. IE measures the amount of information in an image and is proportional to the degree of non-uniformity or complexity of the image texture [18]. ASM, also known as energy, reflects the uniformity of the gray distribution and the roughness of the texture in an image. The larger the ASM, the more uneven the gray distribution of the image and the rougher the texture [19]. IE and ASM are defined in formulas (3) and (4):

IE = − Σ_{i=1}^{G} Σ_{j=1}^{G} p(i, j) log p(i, j)   (3)

ASM = Σ_{i=1}^{K/2^L} Σ_{j=1}^{K/2^L} Σ_{h=1}^{N/2^L} M(i, j, h)²   (4)

where L is the number of wavelet decomposition levels, M(i, j, h) is the wavelet coefficient at local position (i, j, h) in each wavelet sub-component, p(i, j) is the corresponding gray-level co-occurrence matrix, and G is the number of gray levels. After 3D-DWT decomposition of the pixel-neighborhood stereo image block, the size of each wavelet sub-component is (K/2^L) × (K/2^L) × (N/2^L).
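The single-level decomposition of formula (2) and the statistics of formulas (3)-(4) can be sketched end-to-end as follows. This is a hedged toy implementation: it uses the Haar wavelet rather than the paper's Daubechies basis to stay dependency-free (PyWavelets' `dwtn` would give a Daubechies version), and the co-occurrence matrix for IE is built from horizontally adjacent coefficients quantized to 8 levels, a detail the paper does not specify.

```python
import numpy as np

def haar_step(x, axis):
    """One level of the 1-D Haar transform along `axis`: (low, high) halves."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def dwt3_single_level(block):
    """Single-level separable 3-D DWT: the 8 sub-components of formula (2)."""
    subbands = {'': block}
    for axis in range(3):                       # filter X, then Y, then Z
        subbands = {name + tag: part
                    for name, data in subbands.items()
                    for tag, part in zip('LH', haar_step(data, axis))}
    return subbands                             # keys 'LLL' ... 'HHH'

def ie_asm(subband, levels=8):
    """IE from a horizontal co-occurrence matrix of quantized coefficients
    (formula (3)); ASM as the sum of squared coefficients (formula (4))."""
    flat = subband.reshape(subband.shape[0], -1)
    lo, hi = flat.min(), flat.max()
    q = np.clip(((flat - lo) / (hi - lo + 1e-12) * levels).astype(int),
                0, levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    p = glcm / glcm.sum()
    ie = -np.sum(p[p > 0] * np.log(p[p > 0]))
    asm = float(np.sum(subband ** 2))
    return ie, asm

# Toy 8 x 8 x 16 neighborhood block: 8 sub-bands x 2 statistics give the
# 16 wavelet texture features assigned to the central pixel.
block = np.random.default_rng(1).normal(size=(8, 8, 16))
subs = dwt3_single_level(block)
features = [v for name in sorted(subs) for v in ie_asm(subs[name])]
```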

IE and ASM are calculated so that each local wavelet sub-component is compressed into two eigenvalues, which are assigned to the central pixel of the sliding-window stereo image block. When the sliding neighborhood window completes its convolution over the image, the single-level tensor wavelet decomposition outputs a 16-band tensor wavelet texture, generating a tensor dataset of size M×N×16 (where M and N are the width and height of each band of the 3D-DWT input image). In this study, the TFS-Cube spatial feature change of two corresponding pixels in two different-temporal HSRS images is described by the difference between their wavelet texture feature vectors of length 16.

2.3 TFS-Cube spectral feature change information extraction

There are hundreds or even thousands of continuous, narrow bands in hyperspectral remote sensing images, giving them strong feature-recognition ability, which is also their main advantage. Change vector analysis (CVA) is used to analyze the difference between the spectral features of corresponding pixels in two different-temporal HSRS images; the difference is described by the angle θ between the two spectral eigenvectors and the modulus r of their difference vector. θ and r are calculated as presented in formulas (5) and (6):

θ = arccos( v1 · v2 / (|v1| × |v2|) )   (5)

r = |v1 − v2|   (6)

where v1 and v2 are the spectral vectors of corresponding pixels in the two different-temporal HSRS images, |v1| and |v2| are their moduli, and v1 · v2 is their inner product. In this study, the spectral changes of different-temporal HSRS images are analyzed using the CVA method, and the two-band spectral difference information is output to generate a tensor dataset of size M×N×2 (where M and N are the width and height of each band of the CVA input image). The TFS-Cube spectral feature change of two corresponding pixels in two different-temporal HSRS images is described by the above spectral feature difference vector of length 2.
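Formulas (5) and (6) for one pixel pair can be written directly in numpy; the `clip` call is only a numerical guard against rounding, not part of the paper's formulation.

```python
import numpy as np

def cva(v1, v2):
    """Change Vector Analysis for one pixel pair (formulas (5) and (6)):
    theta is the spectral angle between the two spectra, r the magnitude
    of the spectral difference vector."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))   # guard rounding errors
    r = np.linalg.norm(v1 - v2)
    return theta, r

# Identical spectra: no change, so theta = 0 and r = 0.
theta, r = cva([0.2, 0.4, 0.6], [0.2, 0.4, 0.6])
# Scaled spectra: same shape, so theta = 0 but r > 0 (brightness change only).
theta2, r2 = cva([0.2, 0.4, 0.6], [0.4, 0.8, 1.2])
```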

2.4 TFS-Cube spatiotemporal autocorrelation information extraction

In this study, the relationship between the temporal and spatial distribution of related surface objects in different-temporal HSRS images is described by the global spatiotemporal autocorrelation coefficient [20,21], defined in formula (7):

P_x0(t) = r_x(t) / √( δ_x(0) δ_0(0) ) = cov( [W_x S(T)] [W_0 S(T + t)] ) / √( var(W_x S(T)) var(W_0 S(T)) )   (7)

where P_x0(t) is the spatiotemporal autocorrelation coefficient; S(T) and S(T + t) are the sample values at time phases T and T + t; x and t are the spatial and temporal delays; W_x is the (normalized) spatial weight matrix with spatial delay x; r_x(t) is the sample covariance of the spatial delay matrix with spatial delay x and time delay t; and δ_x(t) is the sample variance of the spatial weight matrix with spatial delay x and time delay t [22]. The pixels are traversed with the same central-pixel windowing method used for spatial feature extraction. To keep the spatial features and the spatiotemporal autocorrelation coefficients consistent in TFS-Cube, the spatial delay is set to 1, 2, and 3 (i.e., x = 1, 2, 3) according to the different sizes of surface objects, and the detection window is set to the square base of the pixel-neighborhood stereo image block (i.e., K×K). The spatial weight matrix is constructed in "Queen" mode, and the spatiotemporal autocorrelation coefficients of the two-temporal HSRS images are calculated. This yields ST-ACF feature information in three bands and generates a tensor dataset of size M×N×3 (where M and N are the width and height of each band of the ST-ACF input image). Finally, for any two corresponding pixels in the two-temporal HSRS images, the spatial feature difference vector, the spectral feature difference vector, and the ST-ACF feature vector are combined to construct the TFS-Cube tensor model of size 16×2×3, namely TFS-Cube ∈ R^(16×2×3).

3. Support tensor machine (STM)

Although the support vector machine (SVM) can process ordinary vector data, it cannot directly learn from 2D and higher-dimensional sample data.
The support tensor machine (STM) is the extension of SVM to high-dimensional (tensor) spaces and is mainly used for the direct training and classification of multidimensional feature data [23]. The working principle of STM is similar to that of SVM. First, it finds classification hyperplanes in the high-dimensional space that distinguish the different classes of tensor samples according to the principle of structural risk minimization. Secondly, two-class classification is extended to multi-class classification through the one-against-one strategy [23,24]. The classification decision function of the (two-class) STM classifier is presented in formula (8) [25]:

g(D) = sgn( D Π_{j=1}^{N} ×_j ω_j + b )   (8)

where D is an unlabeled tensor sample, N is the number of dimensions (orders) of the tensor samples, b is the bias term, ω_j is the projection matrix in the j-th dimension, and g(D) ∈ {−1, +1} is the predicted category of D. The values of ω_j and b are obtained by solving the optimization problem in formula (9) [25]:

min_{ω_j, b, ξ}  f( ω_j |_{j=1}^{N}, b, ξ ) = (1/2) Σ_{j=1}^{N} ||ω_j||² + c Σ_{i=1}^{M} ξ_i
s.t.  g_i ( D_i Π_{j=1}^{N} ×_j ω_j + b ) ≥ 1 − ξ_i,  ξ_i ≥ 0,  1 ≤ i ≤ M   (9)

where ∘ denotes the tensor outer product, M is the total number of training samples, N is the number of dimensions of the tensor samples, and D_i (1 ≤ i ≤ M) is the i-th tensor training sample. ξ and c are the slack variable and the penalty coefficient, respectively; the larger the value of c, the stronger the penalty for misclassification. The Lagrangian expansion of the optimization problem in formula (9) is shown in formula (10) [26]:

L( ω_j |_{j=1}^{N}, b, ξ, α ) = f( ω_j |_{j=1}^{N}, b, ξ ) − Σ_{i=1}^{M} α_i [ g_i ( D_i Π_{j=1}^{N} ×_j ω_j + b ) − 1 + ξ_i ]   (10)

where α_i (α_i ≥ 0, 1 ≤ i ≤ M) is a Lagrange multiplier, which links the constraint function to the original objective. The alternating projection optimization algorithm [27] can be employed to alternately solve for the classifier projection matrix in each dimension until local convergence, at which point the final classification hyperplanes are obtained. Like SVM, STM divides the input samples into positive and negative tensor samples for two-class classification in the high-dimensional feature space. The heterogeneous tensor samples closest to the classification hyperplanes are called support tensors [23]. Additionally, the two-class STM can be generalized to a multi-class classifier [28] through the one-against-one (OAO) strategy.

4. DBN-STM deep learning model

4.1 DBN and the traditional RBM

The deep belief network (DBN) is a deep learning model proposed by G. E. Hinton [29] in 2006. It is composed of a multi-layer unsupervised restricted Boltzmann machine (RBM) network and an additional top-layer supervised backpropagation neural network; the former is mainly used for unsupervised feature learning, while the latter is used for global parameter fine-tuning and classification. Unlike the autoencoder (AE) and the convolutional neural network (CNN), the DBN is a semi-supervised learning model that combines the advantages of supervised and unsupervised learning, making it an ideal deep learning model when labeled training samples are scarce. The DBN has good feature learning ability, forming more abstract high-level representations from low-level features and thereby making classification or prediction easier [30,31]. The DBN pre-trains the RBM in each layer through layer-by-layer training: the hidden-layer output of the lower RBM is used as the visible-layer input of the RBM above it. The fine-tuning stage then uses supervised learning to train the last network layer.
The differences between the actual and expected outputs are backpropagated layer by layer, and the weights of the entire DBN are fine-tuned to achieve global optimization [32,33]. The DBN is thus a probabilistic generative model composed of multiple RBM layers, and it learns the initial parameters between adjacent RBM layers with a greedy algorithm [34]. Traditional multi-layer neural networks may fall into local optima because they rely on gradient-based global optimization algorithms, and their global fine-tuning is also time-consuming. Each RBM layer in a DBN is composed of two sub-layers: a hidden layer and a visible layer. The visible layer is mainly used to input and filter samples, while the hidden layer is used for data processing, analysis, and output. There are bidirectional connections between the hidden and visible layers but no connections between units within the same layer, so parallel processing can be implemented. The structure of the RBM in the DBN is shown in Figure 5 [29].

Figure 5. Structure of RBM in DBN

Here, m and n denote the numbers of neurons in the hidden layer and the visible layer, respectively. The visible-layer and hidden-layer vectors are v = (v1, v2, ..., vn) and h = (h1, h2, ..., hm), where vi and hj represent the states of the i-th visible-layer neuron and the j-th hidden-layer neuron. In addition, c1~cm and b1~bn are the bias terms of the hidden-layer and visible-layer neurons, and Wij is the connection weight between neurons hi and vj. Since the RBM is an energy-based model, the key to constructing the RBM in a DBN is to obtain the joint probability distribution P(v, h) of (v, h), which is derived as presented in formulas (11), (12), and (13) [35]:

P(v, h) = (1/M) e^(−E(v, h))   (11)

M = Σ_v Σ_h e^(−E(v, h))   (12)

E(v, h) = −b·v − c·h − h·W·v   (13)

where M is the normalization factor (partition function). The key to solving P(v, h) is to calculate the optimal values of the parameter vectors b, c, and W, and the activation probability of each neuron in the RBM. In a DBN, the RBM is trained on the samples layer by layer with an iterative method to obtain the values of b, c, and W, gradually fitting the given training data. There are no connections between neurons within the hidden layer or within the visible layer, so given the states of the neurons in one layer of the RBM, the activation states of the neurons in the other layer are mutually independent. The activation probabilities of the i-th visible-layer neuron and the j-th hidden-layer neuron are given in formulas (14) and (15), respectively [33]:

p(v_i = 1) = 1 / (1 + e^(−(b_i + (hW)_i)))   (14)

p(h_j = 1) = 1 / (1 + e^(−(c_j + (vW)_j)))   (15)

4.2 Restricted Boltzmann machine based on three-order tensors (Tensor3-RBM)

In this study, the input data of the DBN are TFS-Cube tensors, and TFS-Cube is a three-order tensor model. Therefore, the traditional RBM structure in the DBN must be modified to support learning from three-order tensor samples. A restricted Boltzmann machine based on three-order tensors (Tensor3-RBM) is proposed in this study. The energy function of the Tensor3-RBM model is defined in formula (16) [36]:

E(X, Y | θ) = − Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} Σ_{l=1}^{L} Σ_{m=1}^{M} Σ_{n=1}^{N} x_ijk ω_ijklmn y_lmn − Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} x_ijk a_ijk − Σ_{l=1}^{L} Σ_{m=1}^{M} Σ_{n=1}^{N} y_lmn b_lmn   (16)

Where, X=[xijk]∈RI×J×K is the input tensor of the Tensor3-RBM visible layer (three-order tensor), Y=[ylmn]∈R L×M×N is the output tensor of Tensor3-RBM hidden layer, and xijk, ylmn∈0,1. W=[ωijklmn]∈R I×J×K×L×M×N is a six-order tensor representing the weight tensor of the connection between the visible layer and the hidden layer of Tensor3-RBM. A=[aijk]∈R I×J×K and B = [blmn]∈R L×M×N represent the offsets of the visible layer and the hidden layer, respectively. θ={W,B,C} represents all the parameters in Tensor3-RBM model. Obviously, since there are too many free variable parameters (I×J×K×L×M×N+I×J×K+ L×M×N) in θ, the training of Tensor3-RBM requires a large number of training samples and time, leading to low learning efficiency. To solve the above problem, this study uses the tensor decomposition technique to decompose the six-order weight tensor W into the product of two three-dimensional tensors U=[uklm]∈R K×L×M and V=[vnij]∈R N×I×J so as to decrease the number of free variables. All parameters in the Tensor3-RBM model after decomposition of W are θ={U, V, B, C}, where the number of free variables is greatly reduced (K×L×M+ N×I×J+I×J×K+ L×M×N). Consequently, the operation efficiency of Tensor3-RBM is obviously improved. The solution process of Tensor3-RBM is similar to the traditional RBM model. At first, the conditional distributions of the visible layer and the hidden layer are obtained, as shown in formulas (17) and (18) [37]. Secondly, the maximum likelihood estimation and the gradient descent method are used to solve the model. The update iteration rules of each parameter are similar to the traditional RBM, which will not be described again here. L

M

N

p ( xijk  1| Y ; )  sigmoid (aijk   ylmnuklm vnij )

(17)

l 1 m 1 n 1

I

J

K

p ( ylmn  1| X ; )  sigmoid (blmn   xijk uklm vnij )

(18)

i 1 j 1 k 1

4.3 Deep belief neural network based on multi-layer Tensor3-RBM and STM (TRS-DBN)
To achieve better deep learning of TFS-Cube tensor data, a deep belief neural network based on multi-layer Tensor3-RBMs and an STM (TRS-DBN) is proposed. The structure of TRS-DBN is presented in Figure 6. Similar to the traditional DBN, the output of the hidden layer of the i-th Tensor3-RBM in TRS-DBN is used as the input of the visible layer of the (i+1)-th Tensor3-RBM, and the input of the first Tensor3-RBM is the input of the entire TRS-DBN network. The visible layer of a Tensor3-RBM inputs and filters tensor data, while the hidden layer processes and outputs tensor data. The output of the last Tensor3-RBM is taken as the output of the entire multilayer network. Different from the traditional DBN, since TRS-DBN learns tensor data, a logistic regression layer (top layer) is added after the last Tensor3-RBM, and the traditional BP network is replaced with an STM as the top-layer classifier. TRS-DBN first initializes each Tensor3-RBM to obtain the initial weights between the visible-layer neurons and the hidden-layer neurons, and then formally trains the multi-layer Tensor3-RBMs with multiple iterations. Finally, the TRS-DBN network parameters are fine-tuned with a small number of labeled samples and the top-layer STM classifier.
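The layer-to-layer wiring described above can be sketched as a forward pass through a list of factorized Tensor3-RBMs; the per-layer tensor shapes are illustrative assumptions, not the paper's configuration, and pretraining is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_probs(X, layer):
    # Factorized Tensor3-RBM hidden conditional (cf. Formula (18)).
    U, V, B = layer
    return sigmoid(B + np.einsum('ijk,klm,nij->lmn', X, U, V))

def make_layer(rng, vis_shape, hid_shape):
    # One Tensor3-RBM: factor tensors U (K x L x M), V (N x I x J), hidden offsets B.
    (I, J, K), (L, M, N) = vis_shape, hid_shape
    return (0.01 * rng.standard_normal((K, L, M)),
            0.01 * rng.standard_normal((N, I, J)),
            np.zeros((L, M, N)))

rng = np.random.default_rng(1)
# Hypothetical stack: TFS-Cube input (16x2x3) through two smaller hidden tensors.
shapes = [(16, 2, 3), (8, 2, 3), (4, 2, 3)]
layers = [make_layer(rng, shapes[i], shapes[i + 1]) for i in range(len(shapes) - 1)]

# Output of layer i's hidden layer feeds the visible layer of layer i+1.
T = (rng.random(shapes[0]) < 0.5).astype(float)
for layer in layers:
    T = hidden_probs(T, layer)
```

The final tensor T would then be handed to the top-layer STM classifier in place of the BP network of a traditional DBN.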

Figure 6. The structure of the TRS-DBN network

5. HSRS image change detection method based on TRS-DBN
The process of HSRS image change detection based on TRS-DBN is shown in Figure 7.

Figure 7. Process of HSRS image change detection based on TRS-DBN

Firstly, two HSRS images of different time phases covering the same experimental region are acquired. After preprocessing and registration, a convolution operation over the window centered at each pixel (a stereoscopic image block) is used to extract the spatial features (3D-DWT wavelet texture features), the spectral features and the ST-ACF features of the image data (the specific extraction processes are described in Sections 2.2, 2.3 and 2.4). The spatial difference and spectral difference of each pixel pair in the two images are calculated and, combined with the ST-ACF vector, a TFS-Cube tensor with the size of 16×2×3 is established (the specific structure is shown in Figure 2). At the same time, the TRS-DBN deep learning network composed of multi-layer Tensor3-RBMs and an STM classifier is constructed.
Secondly, a certain number of unlabeled training samples (expressed as TFS-Cubes) are randomly selected and input into the TRS-DBN network. The TRS-DBN network is pre-trained to obtain the initial weights between the nodes of the visible layer and the hidden layer of each Tensor3-RBM. Each Tensor3-RBM is learned layer by layer, and the global fine-tuning and optimization of the parameters in the TRS-DBN model are realized through a small number of labeled samples and the support tensor machine (STM). After each iteration and global fine-tuning, the output result is checked against the termination condition. If it is not satisfied, TRS-DBN training and global fine-tuning are performed again until the trained TRS-DBN satisfies the termination condition, at which point training is complete.
Finally, the test samples are selected, the trained TRS-DBN model is used for change detection, and the test results are output to evaluate the detection performance of TRS-DBN.
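As a minimal sketch, the per-pixel TFS-Cube can be assembled by stacking the three kinds of change information into a 16×2×3 tensor. The slice layout below (which axis holds the texture change, spectral change and ST-ACF components) is an assumption for illustration only, since the actual structure is defined in Figure 2.

```python
import numpy as np

def build_tfs_cube(texture_change, spectral_change, st_acf):
    """Stack three 16x2 feature-change matrices into a 16x2x3 TFS-Cube.

    Hypothetical layout: axis 2 indexes the feature type
    (0: wavelet texture change, 1: spectral change, 2: ST-ACF),
    while axes 0 and 1 index the 16x2 components of each feature.
    """
    slices = [np.asarray(s, dtype=float)
              for s in (texture_change, spectral_change, st_acf)]
    if any(s.shape != (16, 2) for s in slices):
        raise ValueError("each feature-change slice must be 16x2")
    return np.stack(slices, axis=2)  # shape (16, 2, 3)

rng = np.random.default_rng(2)
cube = build_tfs_cube(rng.random((16, 2)), rng.random((16, 2)), rng.random((16, 2)))
```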
In the proposed study, the change state of hyperspectral image pixels in different phases is divided into "change" and "non-change". The detection results of different time-phase HSRS image changes are evaluated according to Precision P, Recall R, F-score and time consumption T. N1 is the number of changed pixels correctly detected in the experimental region, N2 represents the number of all changed pixels detected in the experimental region, and N3 denotes the number of actually changed pixels in the experimental region. P, R and F are defined in Formulas (19), (20) and (21):

P = N1 / N2 × 100%    (19)

R = N1 / N3 × 100%    (20)

F = 2 × P × R / (P + R)    (21)

F is the harmonic mean of P and R. As a combined evaluation index of the above two indicators, it reflects the overall performance of the algorithm and can alleviate, to a certain extent, the possible contradiction between P and R. The larger the F value, the better the performance of the algorithm [38]. T mainly reflects the efficiency of pixel change detection, and is another important performance indicator in addition to the change detection accuracy.
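Formulas (19)-(21) can be checked against the Region 1 counts reported in Table 3 (N1 = 11154, N2 = 11857, N3 = 12300); only the helper function itself is new here.

```python
def change_detection_scores(n1, n2, n3):
    """Precision, recall (in percent) and F-score per Formulas (19)-(21).

    n1: correctly detected changed pixels
    n2: all pixels detected as changed
    n3: actually changed pixels
    """
    p = n1 / n2 * 100.0   # Formula (19)
    r = n1 / n3 * 100.0   # Formula (20)
    # Formula (21): harmonic mean of P and R, reported as a fraction as in Table 3.
    f = 2.0 * (p / 100.0) * (r / 100.0) / ((p + r) / 100.0)
    return p, r, f

# Region 1 counts from Table 3:
p, r, f = change_detection_scores(11154, 11857, 12300)
```

The returned values reproduce the Region 1 row of Table 3 (P = 94.07%, R = 90.69%, F = 0.9235) up to rounding.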

6. Experiment and result analysis
6.1 Experimental region and sample selection

(1) Basic data and pre-processing
To improve the credibility of the research results, two different-temporal AVIRIS HSRS images and two different-temporal EO-1 Hyperion HSRS images (L1G level) are taken as the experimental data sources. The former covers parts of the Los Angeles area, California, USA, and the latter is located near Xinyang City, Henan Province, China. Both AVIRIS and EO-1 Hyperion images are extensively used HSRS data. The AVIRIS airborne images have 224 visible and infrared bands with a spatial resolution of 15.5 meters; the spectral range is 0.2~2.4 μm and the spectral resolution is 10 nm. The EO-1 Hyperion (L1G) data have 242 bands (35 visible-light bands, 35 near-infrared bands and 172 short-wave infrared bands). The spectral range, spectral resolution and spatial resolution of the EO-1 Hyperion images are 0.4~2.5 μm, 10 nm and 30 m, respectively. Two representative experimental regions are selected from the coverage area of the AVIRIS images and two from that of the EO-1 Hyperion images (four experimental regions in total). The details of each experimental region are shown in Table 1. In addition to the above-mentioned double-temporal AVIRIS and EO-1 Hyperion images, a large amount of land use change data for the four experimental regions was obtained through field investigation and the collection of related historical data; these are the main sample sources of the HSRS image change detection experiments in this study.

Table 1. Details of the four experimental regions

Region name | Sensor type | Size (pixels) | Central location (rectangle) | Imaging time | Address
Region 01 | AVIRIS | 147×316 | 33°56'9.08"N, 117°36'30.29"W | 2015-10-21 / 2018-06-25 | Part of Eastvale city, Riverside county, California, USA
Region 02 | AVIRIS | 149×299 | 33°12'58.85"N, 115°35'35.88"W | 2015-10-21 / 2018-06-25 | Nearby Mullet island in Salton Sea, San Diego City, California, USA
Region 03 | EO-1 Hyperion | 185×271 | 31°55'21.17"N, 114°26'58.81"E | 2003-03-06 / 2006-04-16 | Nearby Panxin Town in Luoshan County, Xinyang City, Henan Province
Region 04 | EO-1 Hyperion | 187×268 | 32°16'56.19"N, 114°32'52.52"E | 2003-03-06 / 2006-04-16 | Nearby Dalin Town and Dongpu Town in Luoshan County, Xinyang City, Henan Province

Before carrying out the experiments, the AVIRIS and EO-1 Hyperion images of the experimental regions are preprocessed, including removal of problem bands, radiometric correction, bad-line repair, stripe removal and atmospheric correction. Geometric registration is then performed between the different-temporal HSRS images.
(2) Sample selection

There are a large number of water bodies, tidal flats, bare lands and grasslands, as well as a small amount of industrial and mining land and forest land, in Experimental region 1. The land types in Experimental region 2 include grassland, forest land, bare land, construction land, industrial and mining land, and a small amount of artificial water bodies and landscapes. The land use types in Experimental regions 3 and 4 include grassland, forest land, bare land, fired land and water bodies. The training and test samples for land use change detection in each experimental region contain two types, namely changed samples and unchanged samples. Through comprehensive analysis of the field survey data, historical data and high-spatial-resolution remote sensing images, a small number of labeled training samples and some unlabeled samples are selected from each experimental region. The details of sample selection for each experimental region are shown in Table 2. After the relevant samples are extracted, the training and test samples are normalized to eliminate the influence of the different feature indexes' dimensions, magnitudes and data dispersion.

Table 2. Sample selection of each experimental region

Experimental regions | Labeled training samples (changed) | Labeled training samples (unchanged) | Unlabeled training samples | Test samples
Region 1 | 40 | 40 | 800 | 46452
Region 2 | 50 | 50 | 1000 | 44551
Region 3 | 60 | 60 | 850 | 50135
Region 4 | 70 | 70 | 900 | 50116

6.2 TRS-DBN-based land use change detection experiment
In this study, the TensorFlow framework provided by Google is used to build a TRS-DBN network consisting of multiple layers of Tensor3-RBMs and one STM classifier. The data of each experimental region, organized with the TFS-Cube model, are processed directly, and the optimal values of the TRS-DBN network structure parameters are determined from relevant empirical parameters and simulation experiments. According to the research results, when there are three Tensor3-RBMs in TRS-DBN and 192 hidden-layer nodes in each Tensor3-RBM, the change detection accuracy of TRS-DBN is the highest overall. For the different experimental regions, the TRS-DBN network converges gradually after 500-600 iterations and the related global parameter fine-tuning, and the overall change detection performance tends to be stable. Additionally, the number of neurons in the input layer (the visible layer of the first Tensor3-RBM) of the entire TRS-DBN is equal to the product of the lengths of the three dimensions of the TFS-Cube (i.e., 16×2×3 = 96). There is one neuron in the output layer (the output of the STM classifier), since the change detection results have only two categories: changed and unchanged (1 for changed, 0 for unchanged). Land use change detection is carried out in the four experimental regions (Region 1~Region 4) using the TRS-DBN method. The land use type identification standard follows the Classification Standard of Land Use Status (GB/T 21010-2017) promulgated by the relevant Chinese authorities in 2017. The change detection results are presented in Figures 8-11.
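The layer sizes described above can be summarized as a quick sanity check. The 8×4×6 hidden-tensor factorization of the 192 hidden nodes is a hypothetical choice for illustration, since the paper only gives the node count.

```python
# Input layer: one neuron per TFS-Cube entry.
tfs_cube_shape = (16, 2, 3)
input_neurons = tfs_cube_shape[0] * tfs_cube_shape[1] * tfs_cube_shape[2]

# Three Tensor3-RBMs with 192 hidden nodes each; a hidden tensor shape of
# 8 x 4 x 6 (hypothetical) would realize that count. One STM output neuron.
hidden_shape = (8, 4, 6)
hidden_neurons = hidden_shape[0] * hidden_shape[1] * hidden_shape[2]
layer_sizes = [input_neurons] + [hidden_neurons] * 3 + [1]
```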


Figure 8. Original HSRS images (AVIRIS images, RGB: 37/18/9) and detection results of land use change in experimental region 1: (a) Time phase 1: 2015-10-21; (b) Time phase 2: 2018-06-25; (c) Actual change reference binary image; (d) TRS-DBN detection result

Figure 9. Original HSRS images (AVIRIS images, RGB: 37/18/9) and detection results of land use change in experimental region 2: (a) Time phase 1: 2015-10-21; (b) Time phase 2: 2018-06-25; (c) Actual change reference binary image; (d) TRS-DBN detection result

Figure 10. Original HSRS images (Hyperion image, RGB: 164/48/31) and detection results of land use change in experimental region 3: (a) Time phase 1: 2003-03-06; (b) Time phase 2: 2006-04-16; (c) Actual change reference binary image; (d) TRS-DBN detection result

Figure 11. Original HSRS images (Hyperion image, RGB: 34/20/12) and detection results of land use change in experimental region 4: (a) Time phase 1: 2003-03-06; (b) Time phase 2: 2006-04-16; (c) Actual change reference binary image; (d) TRS-DBN detection result

In Figures 8-11, subimage (a) is the original HSRS image of the first time phase, subimage (b) is the original HSRS image of the second time phase, subimage (c) shows the actual land use change reference binary image of each experimental region (white for changed, black for unchanged), and subimage (d) shows the TRS-DBN detection result of land use change in each experimental region. "RGB: a/b/c" means that bands a, b and c of the HSRS image correspond to the red, green and blue channels of the related RGB image. According to Figures 8-11, although the compositions and distributions of the surface objects differ across the four experimental regions and a small number of land use changes are mis-detected or omitted, the TRS-DBN method correctly detects most land use changes in all four experimental regions. The performance of TRS-DBN land use change detection in each experimental region is analyzed in Table 3. The average recall and average precision of land use change detection over the four experimental regions using the TRS-DBN method are 93.6860% and 94.8477%, respectively, and the average F value reaches 0.9425, indicating that TRS-DBN can accurately detect land use changes in different-temporal HSRS images.

Table 3. Analysis of change detection results in each experimental region

Experimental regions | N1 | N2 | N3 | P (%) | R (%) | F
Region 1 | 11154 | 11857 | 12300 | 94.07 | 90.69 | 0.9235
Region 2 | 4755 | 5005 | 4911 | 95.01 | 96.82 | 0.9591
Region 3 | 788 | 828 | 836 | 95.17 | 94.26 | 0.9471
Region 4 | 490 | 515 | 527 | 95.15 | 92.98 | 0.9405
Averages | — | — | — | 94.85 | 93.69 | 0.9425

In Experimental region 1, a small number of tidal flats are mis-detected as land use change due to flooding by water bodies, and sandy land with low water content and grassland with sparse vegetation are easily confused with bare land, leading to false detection or omission. In Experimental region 2, grasslands with a large change in vegetation density or color (such as from green to withered yellow) are mis-detected as having undergone a change in land use type, and a small part of the bare lands with a large difference in brightness or color is also mis-detected. In Experimental region 3, some changes in the tidal flats on the river banks are mis-detected; in addition, grasslands or woodlands with obvious changes in vegetation density, and bare lands with large differences in color (such as after the occurrence of fire), are both easily mis-detected as land use changes. The mis-detected or omitted land use changes in Experimental region 4 are substantially the same as those in Experimental region 3. The main reasons for the above false or omitted detections include the low spatial resolution of HSRS images, the high proportion of mixed pixels, and the pronounced phenomena of the same surface objects having different spectra and different surface objects having the same spectra. For example, tidal flats with low water content and high brightness are easily confused with bare lands. Another important reason is that the Classification Standard of Land Use Status (GB/T 21010-2017) issued by the relevant Chinese authorities in 2017 classifies land use not only according to the spectral or spatial characteristics of the surface objects, but also according to the social function of the surface objects and the convenience of land management.
6.3 Comparison of TRS-DBN and other change detection methods
In this study, five other representative methods, namely the HSRS image change detection method based on DBN and combined feature vectors (CDHRS-VDBN) [39], the HSRS change detection method based on tensor wavelet texture (CDHRS-TWT) [23], the HSRS image change detection method based on feature subspaces (CDHRS-SCD) [3], the HSRS image change detection method based on mixed pixel decomposition (CDHRS-MPD) [5] and the HSRS image change detection method based on independent component analysis (CDHRS-ICA) [6], are compared with TRS-DBN in land use change detection experiments in the four experimental regions (Region 1~Region 4). The comprehensive performance of these methods is evaluated according to average precision (P_avg), average recall (R_avg), average F value (F_avg) and average time consumption (T_avg). Table 4 shows the comparison results.

Table 4. Performance comparison of different detection methods

Detection methods | P_avg (%) | R_avg (%) | F_avg | T_avg (seconds)
TRS-DBN | 94.85 | 93.69 | 0.9425 | 342.5
CDHRS-VDBN | 90.61 | 90.92 | 0.9075 | 262.7
CDHRS-TWT | 92.55 | 91.42 | 0.9197 | 200.32
CDHRS-SCD | 88.43 | 89.49 | 0.8894 | 181.9
CDHRS-MPD | 86.76 | 87.88 | 0.8730 | 172.3
CDHRS-ICA | 87.14 | 88.12 | 0.8761 | 131.4

According to Table 4, although the T_avg of the TRS-DBN method is 1.30~2.61 times that of the other methods, its P_avg, R_avg and F_avg are higher than those of all the others, reaching 94.85%, 93.69% and 0.9425, respectively. The detection accuracy of CDHRS-TWT is second only to TRS-DBN, but its T_avg is significantly lower than that of TRS-DBN. The change detection accuracy of CDHRS-VDBN is second only to CDHRS-TWT, while its T_avg is somewhat higher than that of CDHRS-TWT. The CDHRS-SCD method has lower detection accuracy than CDHRS-VDBN, and its T_avg is lower than that of CDHRS-TWT. The detection accuracies of CDHRS-MPD and CDHRS-ICA are relatively close to each other and lower than that of CDHRS-SCD, while their T_avg values are also lower than that of CDHRS-SCD; CDHRS-MPD has the lowest change detection accuracy, and CDHRS-ICA has the lowest T_avg.
CDHRS-VDBN uses a combined feature vector composed of the pixel spectral difference vector, the endmember abundance difference vector and the angle between pixel spectral feature vectors to construct the underlying features of the HSRS image, and detects land use changes through DBN deep learning. Compared with the traditional methods, its change detection accuracy is greatly improved. However, CDHRS-VDBN considers only the spectral feature information, without the spatial feature information, and organizes the underlying features of HSRS images purely as vectors, which may destroy the time-space-spectrum structural constraints of the remote sensing data and lead to the loss of part of the underlying features. Therefore, the change detection accuracy of CDHRS-VDBN is significantly lower than that of TRS-DBN. CDHRS-TWT describes the space-spectrum feature change and performs weighted fusion of the wavelet texture and spectral feature change information through a tensor wavelet, effectively overcoming the influence of feature dimensionality and achieving high detection accuracy.
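The timing comparison can be reproduced directly from the T_avg column of Table 4:

```python
# T_avg values (seconds) from Table 4.
t_avg = {
    "TRS-DBN": 342.5,
    "CDHRS-VDBN": 262.7,
    "CDHRS-TWT": 200.32,
    "CDHRS-SCD": 181.9,
    "CDHRS-MPD": 172.3,
    "CDHRS-ICA": 131.4,
}

# How many times slower TRS-DBN is than each competing method.
ratios = {name: t_avg["TRS-DBN"] / t for name, t in t_avg.items() if name != "TRS-DBN"}
lo, hi = min(ratios.values()), max(ratios.values())  # spans roughly 1.30 to 2.61
```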
However, it only uses an automatic threshold selection method based on the ROC (receiver operating characteristic) curve to realize unsupervised change detection, limiting its generalization ability for complex tensor change information. Consequently, its change detection accuracy is also lower than that of TRS-DBN. CDHRS-SCD, CDHRS-MPD and CDHRS-ICA perform well in HSRS image change detection, but they lack effective feature organization modes, and their utilization of the underlying features of HSRS images is low compared with methods using deep learning and tensor models. Moreover, their ability to express complex functions is limited when samples and computational units are restricted, leading to lower HSRS image change detection accuracy. CDHRS-SCD targets the pixels of the second-temporal image and constructs a background subspace from the corresponding pixels and other related information in the first-temporal image. The orthogonal subspace mapping method is used to calculate the distance from the background subspace to judge whether a pixel is abnormal, and the abnormal pixels are taken as the changed objects. However, this method has some drawbacks; for example, its detection accuracy for subtle spectral changes needs to be improved, and the subspace construction mode is complicated and mainly dependent on artificial experience. CDHRS-MPD first determines the number of dimensions and the number of endmembers, then extracts the pure spectral features (endmembers) of the surface objects to calculate the abundance maps; finally, the changes are detected through mixed pixel decomposition and the abundance maps. CDHRS-MPD considers only the spectral feature information of the HSRS image and neglects the spatial feature information, which limits its change detection accuracy. CDHRS-ICA decomposes the mixed pixels through ICA and carries out skewness-based independent component analysis of the endmember difference image; the change of each single class of surface objects appears in a different component image, from which the change information is extracted. This method can maintain a low false-detection rate while achieving a high detection rate. However, it also uses only spectral feature information without considering spatial feature information, and the ICA decomposition may cause a loss of underlying features, which limits the change detection accuracy. Compared with the five other methods, the proposed TRS-DBN has better overall performance. In addition, since TRS-DBN is a semi-supervised learning method, its automation level is higher than that of the other, supervised HSRS image change detection methods.

7. Analysis and discussion
Due to the low spatial resolution of HSRS images, the textures around pixels are vague and mixed pixels occupy a large proportion, so HSRS image change detection remains a complex problem. Compared with the other existing methods, the proposed TRS-DBN method achieves good results in HSRS image change detection. However, some limitations remain:
(1) The mixed pixel problem needs to be solved
Although the TFS-Cube model considers the time-space-spectrum features comprehensively, its spectral dimension is based on the original spectra of pixels, which are essentially mixed spectra. This effectively reduces the complexity of the TFS-Cube model, but decreases the change detection accuracy of TRS-DBN. There are many mixed pixel decomposition methods for HSRS images, but most have complex processes and lack generality. How to efficiently extract endmember spectra from HSRS images and improve the description of spectral changes in the TFS-Cube tensor model needs further study.
(2) The time complexity of TRS-DBN needs to be further reduced
Due to the combined use of the DBN deep network and the TFS-Cube tensor model, the time consumption of TRS-DBN is significantly higher than that of the other methods. It can be decreased by improving the computer hardware (e.g., adding a GPU module) or by simplifying the algorithm.

8. Conclusions
The proposed TRS-DBN method makes full use of the advantages of the tensor model, deep belief networks, the support tensor machine and 3D-DWT wavelet texture extraction, and uses the TFS-Cube tensor model to organize and describe the underlying feature changes of HSRS images. A TRS-DBN deep learning network composed of multilayer Tensor3-RBMs and an STM classifier is used to realize HSRS image change detection, effectively improving the change detection accuracy for HSRS images. Compared with five other representative methods (CDHRS-VDBN, CDHRS-TWT, CDHRS-SCD, CDHRS-MPD and CDHRS-ICA), the proposed TRS-DBN method has better overall performance: the average precision and average recall reach 94.85% and 93.69%, respectively, and the average F value is 0.9425. Certainly, the proposed study still has some shortcomings, for example, how to efficiently extract endmember spectra from HSRS images and improve the description of the spectral dimension in the TFS-Cube tensor model, and how to reduce the time complexity of the TRS-DBN method. Future research will propose feasible solutions to these problems so as to further enhance the accuracy, efficiency and automation level of change detection for HSRS images.

Acknowledgments
This work was funded by the National Natural Science Foundation of China (NSFC, 41501451), the Program for New Century Excellent Talents in Fujian Province Universities (MinJiaoKe [2016] 23) and the Program for Outstanding Youth Scientific Research Talents in Fujian Province Universities (MinJiaoKe [2015] 54). The authors would like to thank Zhengyuan Mao and Yinan He for their assistance, suggestions, and discussions.

References
[1] Lu, D.; Mausel, P.; Brondízio, E.; Moran, E. Change detection techniques. International Journal of Remote Sensing. 2004, 12, 2365-2407.
[2] Du, Q. A new method for change analysis of multi-temporal hyperspectral images. The 4th Workshop on Hyperspectral Image and Signal Processing (WHISPERS), 2012.
[3] Wu, C.; Du, B.; Zhang, L.P. A Subspace-Based Change Detection Method for Hyperspectral Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2013, 2, 815-830.
[4] Jiang, B.T.; Fan, B.; Zhang, X.L. A Novel Change Detection Method for Hyperspectral Images Based on Multisource Target Feature Support. China Patent, 201210247785.3, 2012-12-12.
[5] Ertürk, A.; Plaza, A. Informative Change Detection for Hyperspectral Images by Unmixing. IEEE Geoscience and Remote Sensing Letters. 2017, 6, 1252-1256.
[6] Wu, C.; Du, B.; Zhang, L.P. Hyperspectral change detection based on independent component analysis. Journal of Remote Sensing. 2012, 3, 545-561.
[7] Zhang, L.F. Tensor Representation and Manifold Learning Methods for Remote Sensing Images. Wuhan University, Wuhan, China, 2013.
[8] Guo, X.; Huang, X.; Zhang, L.F.; et al. Tensor Subspace Based Multiway Filtering Algorithm for Hyperspectral Images. Acta Geodaetica et Cartographica Sinica. 2013, 2, 253-259, 267.
[9] Reynolds, K.M. Analysis of epidemics using spatio-temporal autocorrelation. Phytopathology. 1988, 2, 240-246.
[10] Cai, Y.Y. Relationship between semi-variance function and mathematical morphology in texture features extraction of remote sensing images. West-China Exploration Engineering. 2010, 7, 131-135.
[11] Huang, X.; Zhang, L.P.; Li, P.X. Methods for Classification of the High Spatial Resolution Remotely Sensed Images Based on Wavelet Transform. Geomatics and Information Science of Wuhan University. 2006, 1, 66-69.
[12] He, Y.T.; Ke, C.Q. Application of Semivariogram Texture Distilling for Remote Sensing Image Classification. Remote Sensing Technology and Application. 2008, 5, 571-575.
[13] Yoo, H.Y.; Lee, K.; Kwon, B.D. Quantitative indices based on 3D discrete wavelet transform for urban complexity estimation using remotely sensed imagery. International Journal of Remote Sensing. 2009, 23, 6219-6239.
[14] Ma, W.; Manjunath, B. A comparison of wavelet transform features for texture image annotation. Proceedings of the IEEE International Conference on Image Processing. Washington, DC, USA, 1995.
[15] Wang, Z.; Yong, J. Texture analysis and classification with linear regression model based on wavelet transform. IEEE Transactions on Image Processing. 2008, 8, 1421-1430.
[16] Manian, V.; Vasquez, R. Scaled and rotated texture classification using a class of basis functions. Pattern Recognition. 1998, 12, 1937-1948.
[17] Borah, S.; Hines, E.L.; Bhuyan, M. Wavelet transform based image texture analysis for size estimation applied to the sorting of tea granules. Journal of Food Engineering. 2007, 2, 629-639.
[18] Guo, D.J.; Song, Z.C. A Study on Texture Image Classifying Based on Gray-level Co-occurrence Matrix. Forestry Machinery & Woodworking Equipment. 2005, 7, 21-23.
[19] Feng, J.H.; Yang, Y.J. Study of Texture Images Extraction Based on Gray Level Co-occurrence Matrix. Beijing Surveying and Mapping. 2007, 3, 19-22.
[20] Zhao, L.; Wang, J.Q.; Deng, M.; Huang, J. Spatial-temporal autocorrelation model of road network based on traveling time. Journal of Central South University (Science and Technology). 2012, 10, 365-373.
[21] Kamarianakis, Y.; Prastacos, P. Space-time modeling of traffic flow. Computers & Geosciences. 2005, 2, 119-133.
[22] Huang, K.; Mao, Z.Y. Vegetation Change Detection Using Spatiotemporal Autocorrelation Index. Remote Sensing Information. 2016, 3, 37-44.
[23] Guo, X. A tensor based image noise reduction, feature extraction and classification framework for remotely sensed imagery. Wuhan University, Wuhan, 2015.
[24] Huang, C.; Davis, L.S.; Townshend, J.R.G. An assessment of support vector machines for land cover classification. International Journal of Remote Sensing. 2002, 4, 725-749.
[25] Lu, C.T.; Li, F.Z.; Zhang, L.; Zhang, Z. Least Squares Semi-supervised Support Tensor Machine. Pattern Recognition and Artificial Intelligence. 2016, 7, 633-640.
[26] Zeng, K.; He, L.F.; Yang, X.W. Multilinear principal component analysis based support higher-order tensor machine. Journal of Nanjing University (Natural Sciences). 2014, 2, 219-227.
[27] Bezdek, J.C.; Hathaway, R.J. Some Notes on Alternating Optimization. Lecture Notes in Computer Science. 2002, 4, 288-300.
[28] Cortes, C.; Vapnik, V. Support-Vector Networks. Machine Learning. 1995, 3, 273-297.
[29] Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Computation. 2006, 7, 1527-1554.
[30] Kang, Y.; Lu, M.C.; Yan, G.W. Soft Sensor for Ball Mill Fill Level Based on DBN-ELM Model. Instrument Technique and Sensor. 2015, 4, 73-75, 92.
[31] Wang, Y.H.; Di, K.S.; Zhang, S.; Shang, C.; Huang, D.X. Melt index prediction of polypropylene based on DBN-ELM. CIESC Journal. 2016, 12, 5163-5168.
[32] Chen, H.J.; Huang, M.L.; Jiang, L.; Tao, J.H.; Yang, J.H.; Zheng, G.Z.; Wu, Z.H. A remote sensing image automatic annotation method based on deep learning. China Patent, 201410039584.3, 2014-05-28.
[33] Li, X.L.; Zhang, Z.X.; Wang, Y.H.; Liu, Q.J. Aerial Images Categorization with Deep Learning. Journal of Frontiers of Computer Science & Technology. 2014, 3, 305-312.
[34] Lv, G.; Hao, P.; Sheng, J.R. On applying an improved deep neural networks in tiny image classification. Computer Applications and Software. 2014, 4, 182-184, 213.
[35] Lv, Q.; Dou, Y.; Niu, X.; Xu, J.Q.; Xia, F. Remote Sensing Image Classification Based on DBN Model. Journal of Computer Research and Development. 2014, 9, 1911-1918.
[36] Qi, G.; Sun, Y.; Gao, J.; Hu, Y.; Li, J. Matrix Variate Restricted Boltzmann Machine. Proceedings of the International Joint Conference on Neural Networks. 2016, 389-395.
[37] Sun, Y.F.; Liu, S.M.; Hu, Y.L.; Yin, B.C. Image recognition method based on matrix variables and Gauss Boltzmann machine model. China Patent, 201710141534.X, 2017-3-10.
[38] Ok, A.O. Automated Detection of Buildings from Single VHR Multispectral Images Using Shadow Information and Graph Cuts. ISPRS Journal of Photogrammetry and Remote Sensing. 2013, 86, 21-40.
[39] Huang, F.H.; Chao, X.; Zhu, Z.Z. Change Detection of Hyperspectral Remote Sensing Images Based on Deep Belief Network. International Journal of Earth Sciences and Engineering. 2016, 5, 2096-2105.