Automatic extraction of urban impervious surfaces based on deep learning and multi-source remote sensing data

Accepted Manuscript

Fenghua Huang, Ying Yu, Tinghao Feng

PII: S1047-3203(18)30378-X
DOI: https://doi.org/10.1016/j.jvcir.2018.12.051
Reference: YJVCI 2416
To appear in: J. Vis. Commun. Image R.

Received Date: 29 November 2018
Revised Date: 30 December 2018
Accepted Date: 31 December 2018

Please cite this article as: F. Huang, Y. Yu, T. Feng, Automatic Extraction of Urban Impervious Surfaces Based on Deep Learning and Multi-Source Remote Sensing Data, J. Vis. Commun. Image R. (2018), doi: https://doi.org/10.1016/j.jvcir.2018.12.051

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Fenghua Huang 1,2,*, Ying Yu 1,2, Tinghao Feng 3

1 Spatial Data Mining and Application Research Center of Fujian Province, Yango University, Fuzhou 350015, China; [email protected]
2 Information Engineering College, Yango University, Fuzhou 350015, China; [email protected]
3 College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA; [email protected]
* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +86-0591-8397-6047.

Abstract: Conventional methods for extracting urban impervious surfaces mainly apply shallow machine learning algorithms to medium- or low-resolution remote sensing images. They tend to give low accuracy and a poor level of automation, because the potential of multi-source remote sensing data is not fully exploited and the low-level features are not effectively organized. To address this problem, a novel method (AEIDLMRS) is proposed to automatically extract impervious surfaces based on deep learning and multi-source remote sensing data. First, the multi-source remote sensing data, consisting of LIDAR point cloud data, Landsat8 images and Pléiades-1A images, are pre-processed, resampled and registered, and the combined spectral, elevation and intensity features are then denoised using the minimum noise fraction (MNF) method to generate representative MNF features. A small number of reliable labelled samples are automatically extracted by fuzzy C-means (FCM) clustering on the MNF features. Secondly, a convolutional neural network (CNN) is used to extract representative features from the neighborhood window of each pixel in the fused Pléiades-1A image through multi-layer convolution and pooling operations. Finally, the combined MNF and CNN features are pre-learned via a deep belief network (DBN). The DBN parameters are globally optimized using the Extreme Learning Machine (ELM) classifier at the top level together with the small set of labelled samples extracted via FCM, and the urban impervious surfaces are separated from other surfaces using the trained ELM classifier and morphological operations. Experiments compare the proposed method with three other related methods in three different experimental regions. The results demonstrate that AEIDLMRS achieves better accuracy and a higher level of automation than the other methods at relatively good efficiency, and that it is better suited to the extraction of complex urban impervious surfaces.

Keywords: multi-source remote sensing data; deep learning; extraction of urban impervious surface; ELM classifier; fuzzy C-means clustering

1. Introduction

Extracting urban impervious surfaces such as houses, roads and parking lots is a great challenge, owing to the diversity of urban underlying surfaces, the complexity of the materials of artificial ground objects and the heterogeneity of the ground surface. Rapid advances in remote sensing technology have led to a massive accumulation of multi-spectral, hyperspectral and high-spatial-resolution remote sensing data from various sensors, as well as airborne laser radar data (such as LIDAR point clouds). Remote sensing is therefore increasingly used for the extraction of urban impervious surfaces [1]. Traditional approaches to impervious surface extraction [2-3] analyze the differences in the reflectance spectra of different ground objects via spectral analysis and mixed-pixel decomposition. The data are mainly collected from medium- and low-resolution remote sensing images, so the spectra of different ground objects interfere strongly with one another, which decreases the accuracy of impervious surface extraction. To address this problem, several studies have combined object-oriented segmentation algorithms with spectral analysis methods [3-5]: they extract and combine the spectral, texture and shape features of impervious surfaces from medium- and high-spatial-resolution images, and then extract the impervious surfaces using various supervised or unsupervised machine learning algorithms. Some authors [6] proposed collecting data mainly from hyperspectral remote sensing images, which provide a broad variety of spectral features in the visible, near-infrared and short-wave infrared ranges, and then extracting the urban impervious surfaces using mixed-pixel decomposition algorithms. This approach can be more effective than the traditional methods based on multi-spectral images.
In addition to 2-D remote sensing images, other authors [7-9] proposed jointly using multi-source remote sensing data, including multi-spectral images, high-spatial-resolution images and airborne laser radar data (such as LIDAR point clouds), to extract urban impervious surfaces. By making full use of the visible, near-infrared, thermal-infrared and elevation features extracted from these data, they achieved more accurate extraction of urban impervious surfaces [7-9]. However, most existing extraction methods based on multi-source data rely on shallow machine learning algorithms. They generally suffer from low utilization of low-level features, over-reliance on human experience in the selection and combination of feature vectors, a poor level of automation in the extraction process and unsatisfactory extraction accuracy. Recently, deep learning has become one of the most active topics in machine learning. It is characterized by its ability to learn features automatically and by its strength in representing and fitting complicated non-linear functions [37, 38]. By combining low-level features, it can generate more abstract high-level representations. Owing to these advantages over traditional machine learning algorithms, deep learning has been successfully used for various visual recognition tasks in remote sensing images, such as the classification of surface objects, object detection and change detection [10]. Current deep learning frameworks mainly include four prevalent algorithms: the deep belief network (DBN), the convolutional neural network (CNN), the autoencoder (AE) and the recurrent neural network (RNN), each of which has its own functions and uses. DBN is a semi-supervised learning algorithm that consists of several hidden layers (restricted Boltzmann machines) and needs only a few labelled samples for the optimization and fine-tuning of its global parameters. CNN is a supervised learning algorithm that mainly consists of multiple convolutional layers with associated pooling layers and an optional fully connected layer, and it is well suited to learning from images (especially large images). AE is an unsupervised learning algorithm that consists of several hidden layers and is usually used for dimensionality reduction and denoising. RNN is better suited to regression analysis or classification when the input is a continuous time sequence. Despite their success, these algorithms are rarely used for the extraction of urban impervious surfaces from multi-source images. In this paper, a new method (AEIDLMRS) is proposed to improve the accuracy and effectiveness of automatically extracting urban impervious surfaces based on deep learning and multi-source remote sensing data. AEIDLMRS makes full use of the features from the multi-source data, jointly uses CNN and DBN to combine and deeply learn these features, and finally uses the ELM classifier to extract urban impervious surfaces.

2. Related Models and Theories

2.1 CNN

CNN is currently the most widely used deep learning algorithm; it was first proposed by K. Fukushima [11] in 1980 as a feedforward neural network. Compared with traditional machine learning algorithms, CNN can learn more robust features that are invariant to translation, scaling and distortion of the input samples, which makes it suited to learning and reconstructing complex features. A typical CNN consists of an input layer, several convolutional layers with corresponding pooling layers and an optional fully connected layer [12-15]. The convolutional and pooling layers are responsible for feature extraction.
The fully connected layer is usually used as a regressor or a classifier. The typical process of CNN is illustrated in Figure 1. A CNN without a fully connected layer or classifier can be used directly for feature learning; with one, it can be used for supervised learning. Feature learning in CNN is mainly implemented by alternating convolutional and pooling layers. Specifically, the number of network weight parameters is considerably reduced by local connections, weight sharing and spatial down-sampling, which improves the representation ability of the learned features.

Figure 1. Typical process of CNN

In CNN, the input of each neuron in a convolutional layer is connected to a local receptive field (i.e. a set of connected, neighboring local units) in the previous layer, from which the local feature is extracted [14]. That is, multiple 2-D matrices from the previous layer form the input to the convolutional layer. The convolution operation is performed on these input matrices with trainable kernels, and a feature tensor (multiple 2-D output feature maps) is then generated using the activation function, as in equation (1) [15]:

$$x_j^n = F\Big(\sum_{i \in M_j^n} x_i^{n-1} * k_{ij}^n + b_j^n\Big) \qquad (1)$$

Here, $F$ denotes the activation function (the sigmoid function in this paper) and $M_j^n$ denotes the set of indexes of the input matrices corresponding to the j-th output feature map in the n-th layer. $x_j^n$ denotes the j-th output of the n-th layer, "*" denotes the convolution operation, $k_{ij}^n$ denotes the convolutional kernel that links the i-th input matrix in $M_j^n$ to the j-th output of the n-th layer, and $b_j^n$ denotes the bias term of the j-th output of the n-th layer. The pooling layer abstracts each feature map of the preceding convolutional layer and generates down-sampled feature maps. The down-sampling process of the pooling layer is defined in equation (2) [15-17]:

$$x_j^n = \beta_j^n D(x_j^{n-1}) + b_j^n \qquad (2)$$

Here, D(·) denotes the down-sampling function. This function can be used to calculate the weighted sum, the maximum or other values (the mean square deviation is used in this paper) of each n×n neighborhood window in the input images (the output feature maps of the convolutional layers) and generate a representative value. The length and width of the output images are both reduced to 1/n of the original, and every output includes a multiplicative bias β and an additive bias b. Applying equation (2) to the feature maps output by the convolutional layers yields the feature maps of the down-sampling layers. CNN can be applied directly to machine learning on two-dimensional visual images. It can extract translation- and distortion-invariant features in a distributed manner from the original pixels and perform classification. CNN has been successfully applied to paper fragment stitching [12], handwritten digit recognition [13], image classification [14] and plant leaf classification [15], among other fields.

2.2 ELM-based DBN (DBN-ELM)

DBN is a deep learning network model proposed by G.E. Hinton [18] in 2006. It consists of multiple layers of unsupervised restricted Boltzmann machines (RBMs) and an additional top-level supervised back-propagation neural network. The former are mainly responsible for unsupervised feature learning, and the latter is used for global parameter optimization and sample classification. Unlike AE and CNN, DBN is a semi-supervised learning algorithm that combines the merits of supervised and unsupervised learning. DBN is an ideal deep learning algorithm when only a few labelled samples are available or when labelled samples are very difficult to obtain. Owing to its excellent feature learning ability, DBN can combine low-level features into more abstract high-level representations, thereby facilitating classification or prediction [19-20]. In DBN, the RBMs are pre-trained layer by layer, and the output of the hidden layer of one RBM is the input to the visible layer of the next RBM. During the tuning stage, the last layer of the network is trained in a supervised manner: the differences between the actual and expected outputs are estimated and back-propagated layer by layer, so that the weights of the entire DBN are fine-tuned towards a global optimum [21-22]. As a probabilistic generative model consisting of multiple levels of RBM, DBN learns the initial parameters between two neighboring RBM layers with a greedy layer-by-layer algorithm [23]. Traditional DBN algorithms adopt only gradient-based global optimization, which is very time-consuming and prone to falling into local optima. The extreme learning machine (ELM) is a machine learning algorithm based on the single-hidden-layer feed-forward neural network (SLFN) [19]. ELM does not need an iterative training process; it only computes the minimum-norm least-squares solution [19], which is more efficient than training traditional multi-layer neural networks (e.g., with the BP algorithm). In this paper, global parameter optimization and classification in DBN are improved by substituting the ELM classifier for the traditional neural network at the top of the DBN. The structure of DBN-ELM is illustrated in Figure 2.
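To make the non-iterative training of ELM concrete, the following is a minimal sketch, not the authors' implementation. It assumes a sigmoid hidden layer with random, fixed weights and solves for the output weights with the Moore-Penrose pseudoinverse. The function names `elm_fit` and `elm_predict`, the hidden-layer size and the toy XOR data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden, rng):
    """Fit an ELM: random fixed hidden layer, least-squares output layer."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
    H = sigmoid(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy XOR data (illustrative only, not from the paper)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
W, b, beta = elm_fit(X, T, n_hidden=20, rng=np.random.default_rng(0))
labels = (elm_predict(X, W, b, beta) > 0.5).astype(int).ravel()
print(labels)
```

The only trained quantity is `beta`, obtained in a single linear solve, which is why no back-propagation loop appears. In a DBN-ELM setting the input `X` would presumably be the features produced by the top RBM layer rather than raw data.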

Figure 2. The structure of DBN-ELM

Each RBM consists of a hidden layer and a visible layer. The visible layer is mainly responsible for filtering inputs, while the hidden layer processes and analyzes them and produces the output data. The neurons in the hidden layer are connected bidirectionally to those in the corresponding visible layer, whereas the neurons within the same layer are not connected to one another, which promotes parallelism. The structure of RBM in DBN [18] is shown in Figure 3.

Figure 3. Structure of RBM in DBN

Here, m and n denote the numbers of neurons in the hidden layer and the corresponding visible layer, $v=(v_1,v_2,\ldots,v_n)$ denotes the vector of the visible layer, $h=(h_1,h_2,\ldots,h_m)$ denotes the vector of the hidden layer, $v_i$ and $h_j$ denote the states of the i-th neuron in the visible layer and the j-th neuron in the hidden layer, $c_1$~$c_m$ and $b_1$~$b_n$ denote the bias terms in the hidden layer and the corresponding visible layer, and $W_{ij}$ denotes the weight of the connection between nodes $h_i$ and $v_j$. Because RBM is an energy-based model, the key to building an RBM in DBN is to obtain the joint probability distribution P(v, h) at (v, h), as in equations (3), (4) and (5) [24]:

$$P(v,h) = \frac{1}{M} e^{-E(v,h)} \qquad (3)$$

$$M = \sum_{v}\sum_{h} e^{-E(v,h)} \qquad (4)$$

$$E(v,h) = -bv - ch - hWv \qquad (5)$$

Here, M denotes the normalization factor. Computing the optimal values of the parameter vectors (b, c and W) and the activation probability of each neuron in an RBM is the key to obtaining P(v, h). Each RBM in DBN is trained iteratively on the samples to obtain the parameter values (b, c and W) that fit the given training data. Because there are no connections between neurons within the hidden layer or within the visible layer, the activation states of the neurons in one layer are mutually independent when the states of the neurons in the other layer are given. Therefore, the activation probabilities of the i-th neuron in the visible layer and the j-th neuron in the hidden layer are computed as in equations (6) and (7) [22]:

$$p(v_i = 1) = \frac{1}{1 + e^{-(b_i + hW)}} \qquad (6)$$

$$p(h_j = 1) = \frac{1}{1 + e^{-(c_j + vW)}} \qquad (7)$$
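The relationship between the energy function and the activation probabilities can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: W is stored with shape (m, n) so that W[i, j] links hidden unit i and visible unit j, the term hW in equation (6) is read as the i-th entry of the row vector hW, and vW in equation (7) as the j-th entry of Wv. The layer sizes and the random parameters are arbitrary.

```python
import numpy as np
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(v, h, b, c, W):
    # Equation (5): E(v, h) = -b.v - c.h - h.W.v, with W of shape (m, n)
    return -(b @ v) - (c @ h) - (h @ W @ v)

def p_visible(h, b, W):
    # Equation (6), evaluated for all visible units at once
    return sigmoid(b + h @ W)

def p_hidden(v, c, W):
    # Equation (7), evaluated for all hidden units at once
    return sigmoid(c + W @ v)

n, m = 3, 2  # visible and hidden layer sizes (arbitrary toy values)
rng = np.random.default_rng(0)
b, c, W = rng.normal(size=n), rng.normal(size=m), rng.normal(size=(m, n))

# Brute-force check of equation (7): enumerate all hidden states for a fixed v
v = np.array([1.0, 0.0, 1.0])
hs = [np.array(s, dtype=float) for s in product([0, 1], repeat=m)]
weights = np.array([np.exp(-energy(v, h, b, c, W)) for h in hs])
brute = sum(w for w, h in zip(weights, hs) if h[0] == 1) / weights.sum()
print(brute, p_hidden(v, c, W)[0])
```

The normalization factor M cancels in the conditional probability, so the brute-force ratio and the sigmoid expression agree without ever evaluating equation (4) over all visible states.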

2.3 Fuzzy C-means clustering (FCM)

FCM is an unsupervised fuzzy clustering algorithm that improves on the traditional K-means algorithm. Whereas K-means requires each sample to belong to exactly one class, FCM takes into account the fuzziness and uncertainty found in nature and allows a sample to belong to several classes to different degrees. The membership degree indicates the degree to which each sample belongs to a certain class [25-26]. The principle of FCM clustering is to maximize the similarity of samples within the same class and the difference between samples in different classes, so as to achieve automatic clustering of the input samples [27]. Compared with traditional hierarchical clustering algorithms and K-means, FCM is more desirable and stable [27]. Consider a set X of n samples, i.e. $X=[x_1,x_2,x_3,\ldots,x_n]$. The loss function of FCM clustering on X is defined in equation (8) [28-29]:

$$F = \sum_{j=1}^{c}\sum_{i=1}^{n} [S_j(x_i)]^d \, \|x_i - p_j\|^2 \qquad (8)$$

Here, c denotes the pre-set number of classes, $p_j$ (j=1,2,…,c) denotes the clustering center of the j-th class, $S_j(x_i)$ denotes the membership function, i.e. the degree to which the i-th sample belongs to the j-th class, and d denotes a constant that controls the fuzziness of the clustering results (d>1). In FCM, the membership degrees of each sample over all classes must sum to 1, as in equation (9):

$$\sum_{j=1}^{c} S_j(x_i) = 1, \quad i = 1, 2, \ldots, n. \qquad (9)$$

The minimum value of F in equation (8) can be calculated subject to the constraint in equation (9). Setting the partial derivatives of F with respect to $p_j$ and $S_j(x_i)$ to zero and iterating, we obtain equations (10) and (11) [28-29]:

$$p_j = \frac{\sum_{i=1}^{n} [S_j(x_i)]^d \, x_i}{\sum_{i=1}^{n} [S_j(x_i)]^d}, \quad j = 1, 2, \ldots, c. \qquad (10)$$

$$S_j(x_i) = \frac{\left(1/\|x_i - p_j\|^2\right)^{1/(d-1)}}{\sum_{k=1}^{c} \left(1/\|x_i - p_k\|^2\right)^{1/(d-1)}}, \quad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, c. \qquad (11)$$
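As a rough illustration of how the center update in equation (10) and the membership update in equation (11) alternate, here is a minimal NumPy sketch. It is not the authors' code. The function name `fcm`, the random initialization, the stopping tolerance, the default d=2.0 and the toy 2-D data are all assumptions made for the example, and squared Euclidean distances are used so that the updates match the loss in equation (8).

```python
import numpy as np

def fcm(X, c, d=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-means: returns centers P (c, features) and memberships S (n, c)."""
    rng = np.random.default_rng(seed)
    S = rng.random((X.shape[0], c))
    S /= S.sum(axis=1, keepdims=True)                 # enforce equation (9)
    for _ in range(n_iter):
        Sd = S ** d
        P = (Sd.T @ X) / Sd.sum(axis=0)[:, None]      # equation (10)
        dist2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
        dist2 = np.maximum(dist2, 1e-12)              # guard against division by zero
        inv = (1.0 / dist2) ** (1.0 / (d - 1.0))
        S_new = inv / inv.sum(axis=1, keepdims=True)  # equation (11)
        converged = np.abs(S_new - S).max() < tol
        S = S_new
        if converged:
            break
    return P, S

# Two well-separated toy blobs (illustrative data only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
P, S = fcm(X, c=2)
hard_labels = S.argmax(axis=1)
print(hard_labels)
```

The membership matrix `S` is what a downstream step would threshold. For instance, keeping only samples whose largest membership is high gives a small set of confidently labelled samples.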

When the algorithm converges, $p_j$ and $S_j(x_i)$ are obtained, and the fuzzy classification of all samples can then be achieved using the membership degree matrix.

3. The proposed method (AEIDLMRS)

Urban impervious surfaces are generally very complex and contain a large number of buildings, artificial hard roads or squares (e.g., ground covered with cement, asphalt, slabstones or ceramic tiles), bare rocks and so on. They are nevertheless clearly different from the pervious objects on the ground (e.g., vegetation, bare earth, water bodies and earth roads): most ground objects in impervious surfaces have distinctive spectral reflectance features, infrared features, colors, shapes and sizes. In this paper, we propose to automatically extract urban impervious surfaces by jointly using different deep learning algorithms and the spectral, texture, elevation and thermal-infrared features of the multi-source data. The procedure of the proposed algorithm AEIDLMRS is illustrated in Figure 4.

Figure 4. Procedure of AEIDLMRS

Firstly, the multi-source remote sensing data, consisting of LIDAR point cloud data, Landsat8 images and Pléiades-1A images, are pre-processed, resampled and registered, and the combined spectral, elevation and intensity features are then denoised using the minimum noise fraction (MNF) method to generate representative MNF features. Airborne LIDAR is currently one of the most widely used high-resolution measurement technologies. The resolution of the LIDAR data used in this work is 0.5 meters. The LIDAR point cloud data provide elevation and intensity features. As is well known, impervious surfaces show high radiance in the thermal-infrared bands but low reflectivity in the near-infrared bands, which represent vegetation biomass. The reflectivity of soil and sandy ground in the middle-infrared and visible bands is higher than that of impervious surfaces. The infrared bands therefore play an important role in distinguishing vegetation, soil and impervious surfaces. Because of its low spatial resolution, the Landsat8 data must be resampled before being registered and fused with the other two types of data (i.e. the LIDAR point cloud and Pléiades-1A). In this work, we register and resample the four multi-spectral bands of Pléiades-1A, the two thermal-infrared bands of Landsat8 and the two short-wave infrared (SWIR) bands of Landsat8 (in combination with their respective panchromatic bands) to the same 0.5-meter resolution as the LIDAR point cloud. Because the panchromatic band of Pléiades-1A has the same spatial resolution as the LIDAR data, it does not need to be resampled. Then, the visible, near-infrared, medium-infrared and thermal-infrared bands are jointly used to compute the normalized difference vegetation index (NDVI), the modified normalized difference water index (MNDWI) [30], the modified normalized difference impervious surface index (MNDISI) [31] and the normalized difference built-up index (NDBI), as in equations (12), (13), (14) and (15):

NDVI = (NIR-R)/(NIR+R) (12)
MNDWI = (G-MIR)/(G+MIR) (13)
MNDISI = [TR-(MNDWI+NIR+MIR)/3]/[TR+(MNDWI+NIR+MIR)/3] (14)
NDBI = (MIR-NIR)/(MIR+NIR) (15)

Here, NIR denotes the first near-infrared band (0.75~0.95µm) of the Pléiades-1A image. R and G denote the visible red (0.60~0.72µm) and green (0.49~0.61µm) bands of the Pléiades-1A image, respectively. MIR and TR denote the first SWIR band (1.57~1.65µm) and the second thermal-infrared band (11.50~12.51µm) of the Landsat8 image, respectively. The feature vector Vmul=[Lh, Li, NDVI, MNDWI, MNDISI, NDBI, W1~W4] is obtained by combining the elevation feature Lh and the intensity feature Li of the point cloud data, the four feature indexes NDVI, MNDWI, MNDISI and NDBI, and the multi-spectral band features (W1~W4) of the Pléiades-1A image. Then, Vmul is denoised with MNF into VMNF, which is a more abstract and robust high-level feature vector.

Secondly, a small number of reliable labelled samples are automatically extracted using fuzzy C-means (FCM) clustering on the MNF features. Based on VMNF, the input samples are clustered at pixel level by the FCM algorithm to generate a membership degree matrix, and a few labelled samples of the impervious surfaces are extracted as the input samples for the global parameter optimization of DBN-ELM. In this work, two classes (impervious and pervious surfaces) are set in FCM. After clustering, the pixels with the largest membership degrees (top 30%) in the largest connected region are selected as the labelled training samples. The value of the clustering fuzziness constant d is determined by experiments.
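The four band indexes in equations (12) to (15) reduce to element-wise array arithmetic. The sketch below is illustrative: the function name `spectral_indices` and the sample band values are assumptions, the bands are assumed to be co-registered arrays already rescaled to comparable ranges, and MNDISI is written in the usual normalized-difference form with a sum in the denominator.

```python
import numpy as np

def spectral_indices(nir, r, g, mir, tr):
    """Element-wise band indexes; inputs are co-registered arrays of equal shape."""
    ndvi = (nir - r) / (nir + r)                # equation (12)
    mndwi = (g - mir) / (g + mir)               # equation (13)
    mean_term = (mndwi + nir + mir) / 3.0
    mndisi = (tr - mean_term) / (tr + mean_term)  # equation (14), difference over sum (assumed form)
    ndbi = (mir - nir) / (mir + nir)            # equation (15)
    return ndvi, mndwi, mndisi, ndbi

# Single-pixel toy values (made up for illustration, not measured reflectances)
nir, r, g = np.array([0.5]), np.array([0.1]), np.array([0.2])
mir, tr = np.array([0.3]), np.array([0.6])
ndvi, mndwi, mndisi, ndbi = spectral_indices(nir, r, g, mir, tr)
print(ndvi, mndwi, mndisi, ndbi)
```

Pixels where a denominator is zero would need masking in practice; the sketch leaves that out for brevity.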
Thirdly, the convolutional neural network (CNN) is used to extract representative features from the neighborhood window of each pixel in the fused Pléiades-1A image through multi-layer convolution and pooling operations; the detailed process is illustrated in Figure 5. The pre-processed panchromatic band (W5) and the multi-spectral bands (W1~W4) of the Pléiades-1A image are fused via the Gram-Schmidt method. Through the C convolutional layers and P pooling layers of the CNN in Figure 5, the abstract features of the k×k neighborhood window of each pixel in the fused image are extracted as the representative features (VCNN) of the central pixel.

Figure 5. Feature learning process in CNN

Finally, the combined MNF and CNN features are pre-learned via the deep belief network (DBN). The DBN parameters are globally optimized using the Extreme Learning Machine (ELM) classifier at the top level together with the small set of labelled samples extracted via FCM, and the urban impervious surfaces are separated from other surfaces using the trained ELM classifier and morphological operations. VMNF and VCNN are combined into the feature vector VDBN=[VMNF,VCNN]. VDBN is used as the input of the DBN-ELM classifier and learned with the small set of labelled samples to generate more essential features. The DBN-ELM network, which consists of L RBM layers (with H neurons in each hidden layer) and an ELM classifier, is optimized on VDBN and finally used to classify the input samples with the ELM classifier. Because AEIDLMRS is a pixel-wise algorithm for urban impervious surface extraction, a morphological opening operation is performed on the ELM classification results to reduce salt-and-pepper noise. The results of the morphological operation are taken as the final automatically extracted impervious surfaces. The key factors in implementing AEIDLMRS are the structures of the CNN and DBN, the number of iterations, the fuzzy clustering constant and the size of the neighborhood window. The values of these parameters can be obtained from experience or experiments.

4. The Evaluation Methodologies of Impervious Surface Extraction

The distribution of ground objects in urban impervious surfaces is highly complex. In urban remote sensing images, the actual ground surface corresponding to a single pixel usually consists of both impervious and pervious surfaces. Therefore, the classification results of individual pixels cannot accurately reveal the distribution of urban impervious surfaces. To address this problem, each a×b rectangular research region is partitioned with a square grid (the cell size is s×s). Let INT(x) denote the largest integer not exceeding the real number x.
Then, (INT(a/s)+1)×(INT(b/s)+1) experimental sub-regions are generated, including INT(a/s)×INT(b/s) square sub-regions (each of size s×s) and INT(a/s)+INT(b/s)+1 rectangular sub-regions. The value of s can be determined by experiments based on the average scale of impervious objects. The partitioning of a rectangular research region is illustrated in Figure 6.
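The sub-region counts and the per-cell share of impervious pixels can be sketched in a few lines. This is a hedged illustration rather than the authors' procedure: `count_subregions` and `pis_per_cell` are hypothetical helper names, sizes are assumed to be in pixels, and the example uses the 353 × 339 extent quoted for Region A with a 61-pixel cell. The binary mask is synthetic.

```python
import numpy as np

def count_subregions(a, b, s):
    """Counts implied by the partition rule (a and b not multiples of s)."""
    na, nb = a // s, b // s          # INT(a/s), INT(b/s)
    total = (na + 1) * (nb + 1)
    square = na * nb
    rectangular = na + nb + 1
    return total, square, rectangular

def pis_per_cell(mask, s):
    """Share of impervious pixels per grid cell; mask is 2-D, True = impervious."""
    rows, cols = mask.shape
    return {(i // s, j // s): float(mask[i:i + s, j:j + s].mean())
            for i in range(0, rows, s)
            for j in range(0, cols, s)}

counts = count_subregions(353, 339, 61)

# Synthetic mask: only the top-left 61 x 61 cell is impervious
mask = np.zeros((353, 339), dtype=bool)
mask[:61, :61] = True
pis = pis_per_cell(mask, 61)
print(counts, len(pis))
```

The square and rectangular counts sum to the total, since (p+1)(q+1) = pq + p + q + 1. When a or b is an exact multiple of s the closed-form count would include an empty strip, whereas the loop in `pis_per_cell` simply produces no cell for it.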

Figure 6. Partitioning of a rectangular research region

Next, AEIDLMRS estimates the percentage of impervious surfaces (PIS) for each experimental sub-region. The PIS is defined as the ratio of the area of impervious surfaces to the area of the entire experimental sub-region. The real (ground-truth) values of PIS in each sub-region can be obtained through manual visual interpretation of airborne photographs and field investigation. Given s=30.5 meters, the estimated and real values of PIS in some representative sub-regions are shown in Figure 7.

Figure 7. Estimated and real values of PIS in some representative sub-regions (s=30.5 meters)

Finally, a scatter diagram is plotted from the estimated and real PIS values to show the accuracy and effectiveness of AEIDLMRS. The metrics employed for the analysis are the coefficient of determination ($R^2$), the root mean square error (RMSE) and the mean absolute error (MAE), defined in equations (16), (17) and (18) [3,32], respectively:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (P_i^r - P_i^e)^2}{\sum_{i=1}^{n} (P_i^r - \bar{P})^2} \qquad (16)$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i^r - P_i^e)^2}{n}} \qquad (17)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| P_i^r - P_i^e \right| \qquad (18)$$

Here, $P_i^e$ denotes the estimated value of PIS for the i-th sub-region, $P_i^r$ denotes the real value of PIS for the i-th sub-region, $\bar{P}$ denotes the average value of the real PIS, and n denotes the number of experimental sub-regions.

5. Experimental Results and Analysis

5.1 Experimental data selection

In the experiments of this work, the multi-source remote sensing data of the experimental regions include one Pléiades-1A high-resolution remote sensing image (acquired on July 14, 2014), one Landsat8 multi-spectral image (Landsat Level 1T product, acquired on July 12, 2014) containing thermal-infrared bands, and LIDAR point cloud data (LAS format, produced from April 23, 2014 to July 25, 2014) provided by OpenTopography. The three types of data were acquired at similar imaging angles and at similar times, so they can be fused after pre-processing and the related corrections. The Landsat8 image has 11 bands, including two thermal-infrared bands (10.60~11.19µm and 11.50~12.51µm) with a spatial resolution of 100 meters (30 meters after terrain correction), two SWIR bands (1.57~1.65µm and 2.11~2.29µm) with a spatial resolution of 30 meters and a panchromatic band with a spatial resolution of 15 meters. The Landsat8 data have undergone radiometric, geometric and terrain correction, after which the spatial resolution of the thermal-infrared bands is 30 meters. The Pléiades-1A image has four multi-spectral bands (0.43~0.95µm) with a spatial resolution of 2 meters and one panchromatic band with a spatial resolution of 0.5 meters. The LAS point cloud data are generated through photogrammetric surveying based on the corresponding Pléiades-1A images. The values of NDVI, MNDWI, MNDISI and NDBI can be calculated by jointly using the visible, near-infrared, medium-infrared and thermal-infrared bands of the multi-source data described above, while the elevation and intensity features are obtained from the LIDAR point cloud data. Three representative experimental regions A, B and C are randomly selected from the area covered by the multi-source data; they are located near Chonggang Town, Shizuishan City, Ningxia, China (southeast of the Helan Mountains), as shown in Figure 8.
To show the elevation of the various ground objects and the LIDAR signal reflection intensity more intuitively, the 3-D point clouds in Figure 8 are viewed from different angles [34-36]. Different colors denote different signal reflection intensities. Because AEIDLMRS is an unsupervised learning algorithm, the training samples of each sub-region are collected automatically rather than specified in advance.

5.2 Determination of AEIDLMRS' Parameters

AEIDLMRS involves 8 major parameters: the size k of the pixel's square neighborhood window, the fuzzy clustering constant d in equation (8), two parameters related to CNN (the number of convolutional layers C and the number of pooling layers P), three parameters related to DBN (the number of hidden layers L, the number of neurons in each hidden layer H and the number of iterations M), and the size s of the square cells in the grid (as shown in Figure 6) used to partition the experimental regions. Based on the size of the urban ground objects in impervious surfaces and on relevant experiments, the size of the pixels' neighborhood windows in the fused Pléiades-1A image is set to 15×15 (i.e. k=15). The size of each square cell in the grid used to partition the experimental regions is set to 61×61 pixels (i.e. 30.5 meters × 30.5 meters). The value of d is set to 1.5535 based on experimental analysis with FCM: when d=1.5535, the clustering result is closest to the real sample categories and the highest classification accuracy is achieved. An AlexNet-like CNN structure is adopted in this work; the detailed structure of the employed CNN is shown in Figure 9.

Figure 8. RGB images of the Pléiades-1A data and 3-D views of the point cloud data in the experimental regions (spatial resolution of 0.5 meters). (a-1) RGB image of Region A (353 × 339); (a-2) 3-D point cloud view of Region A; (b-1) RGB image of Region B (440 × 396); (b-2) 3-D point cloud view of Region B; (c-1) RGB image of Region C (353 × 394); (c-2) 3-D point cloud view of Region C.

The input of the CNN in Figure 9 is the square neighborhood window of each pixel in the Pléiades-1A fused image. To facilitate the convolution operations, every input window is resampled to 30×30. Three convolutional layers and three pooling layers (i.e. C=3 and P=3) then process the resampled windows layer by layer. The convolution kernels in the three convolutional layers are all 3×3, and the down-sampling windows in the three pooling layers are all 2×2. The strides of the convolution kernels and the down-sampling windows are 1 and 2, respectively. After feature extraction, a feature vector of length 2×2×32=128 is obtained for the neighborhood window of each pixel in the Pléiades-1A fused image and is used as the representative feature of that pixel.
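The feature length of 128 can be checked with a short size calculation. The sketch below assumes that convolutional and pooling layers alternate, that no padding is used, and that the last layer produces 32 feature maps (read off from the stated 2×2×32). The function names are hypothetical.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_feature_map(size=30, n_stages=3, conv_kernel=3, conv_stride=1,
                      pool_kernel=2, pool_stride=2):
    """Return the spatial size after every layer of conv/pool stages."""
    sizes = [size]
    for _ in range(n_stages):
        size = output_size(size, conv_kernel, conv_stride)  # 3x3 conv, stride 1
        sizes.append(size)
        size = output_size(size, pool_kernel, pool_stride)  # 2x2 pool, stride 2
        sizes.append(size)
    return sizes

sizes = trace_feature_map()
feature_length = sizes[-1] * sizes[-1] * 32  # 32 feature maps assumed
print(sizes)           # [30, 28, 14, 12, 6, 4, 2]
print(feature_length)  # 128
```

Under these assumptions the 30×30 window shrinks to 2×2 after the third pooling layer, which agrees with the stated vector length.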


Figure 9. Structure of the CNN employed in AEIDLMRS.

The bias term b of the DBN hidden layer, the bias term c of the visible layer and the connection weight matrix W are determined by maximizing the log-likelihood of the activation probability on the training set [24], and the contrastive divergence (CD) algorithm proposed by Hinton [33] can be used to update these parameters (including the learning rate). Three parameters are particularly important for the DBN: the number of neurons in each hidden layer H, the number of hidden layers L and the number of iterations M. Each hidden layer in our DBN has the same number of neurons. H is determined from the reconstruction error of the RBM: a rough range of H is first determined, the reconstruction error of the RBM is then calculated for different numbers of neurons in that range, and the value with the minimum reconstruction error is taken as H. In this work, H=400. The number of hidden layers L can greatly affect the effect of deep learning. However, if L and M are too large, the learning efficiency and generalization ability of AEIDLMRS are reduced and overfitting may even occur. The best values of L and M are obtained experimentally. In the experiments, AEIDLMRS was used to extract the impervious surfaces of experimental regions A, B and C from the multi-source data under different values of L and M. Then, 60 verification sub-regions (61×61) were randomly selected from each of the three experimental regions and combined into a set of 180 verification sub-regions in total. The estimated and real values of PIS were calculated for each sub-region in the set, and the relationship between the two was analyzed with a scatter diagram. The trends of R², RMSE and MAE with L and M are shown in Figure 10.
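The selection of H by reconstruction error can be sketched as follows. This is a minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1) on toy binary data, not the authors' implementation. The candidate sizes, learning rate, number of epochs and data are hypothetical, and the names b_hid, c_vis and w simply mirror b, c and W above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rbm_reconstruction_error(data, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train a Bernoulli RBM with CD-1 and return its mean squared
    reconstruction error on the training data."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    w = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_hid = np.zeros(n_hidden)   # hidden-layer bias (b in the text)
    c_vis = np.zeros(n_visible)  # visible-layer bias (c in the text)
    for _ in range(epochs):
        # Positive phase: hidden activations driven by the data.
        h_prob = sigmoid(data @ w + b_hid)
        h_state = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visible layer.
        v_recon = sigmoid(h_state @ w.T + c_vis)
        h_recon = sigmoid(v_recon @ w + b_hid)
        # CD-1 parameter updates.
        w += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
        c_vis += lr * (data - v_recon).mean(axis=0)
    v_final = sigmoid(sigmoid(data @ w + b_hid) @ w.T + c_vis)
    return float(np.mean((data - v_final) ** 2))

# Toy binary data standing in for the real feature vectors.
rng = np.random.default_rng(1)
toy_data = (rng.random((200, 20)) < 0.3).astype(float)

candidates = [4, 8, 16]  # hypothetical rough range for H
errors = {h: rbm_reconstruction_error(toy_data, h) for h in candidates}
best_h = min(errors, key=errors.get)
print(errors, best_h)
```

The same loop applies unchanged to a wider candidate range; only the cost of training one RBM per candidate grows.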

Figure 10. Influence of parameters L and M on the performance of AEIDLMRS. (a) The number of hidden layers L in the DBN; (b) the number of iterations M.

Figure 10(a) shows that as L increases (L<3), R² increases while RMSE and MAE decrease. When L=3, R² is the highest, while RMSE and MAE are slightly larger than their minimal values. When L>3, R², RMSE and MAE are basically stable, which indicates that further increasing the number of hidden layers in the DBN cannot obviously improve the performance of AEIDLMRS. L is therefore set to 3. Similarly, as shown in Figure 10(b), when M<1000, R² increases continuously while RMSE and MAE decrease gradually. When M=1000, R² peaks, RMSE reaches its lowest level, and MAE falls in an appropriate range (merely 0.0674). Accordingly, M is set to 1000.

5.3 Analysis of experimental results

Given the parameters described in Section 5.2, the impervious surfaces of experimental regions A, B and C are extracted by AEIDLMRS based on the multi-source data. The results are shown in Figure 11.

Figure 11. The high-resolution RGB reference images and the impervious surfaces extracted by AEIDLMRS in the experimental regions. (a-1) Reference image of Region A; (a-2) impervious surfaces of Region A; (b-1) reference image of Region B; (b-2) impervious surfaces of Region B; (c-1) reference image of Region C; (c-2) impervious surfaces of Region C.

Subimages (a-1), (b-1) and (c-1) are the high-resolution RGB reference images (the Pléiades-1A fused images) of regions A, B and C, while subimages (a-2), (b-2) and (c-2) are the binary images of the impervious surfaces extracted from regions A, B and C using the pixel-based extraction method (i.e. AEIDLMRS). In subimages (a-2), (b-2) and (c-2), the white parts represent impervious surfaces and the black parts represent pervious surfaces.

The reference images of the three experimental regions, i.e. subimages (a-1), (b-1) and (c-1) in Figure 11, show that the impervious surfaces in the three regions differ in composition and distribution. Regions A and B are typical agglomeration areas of urban population and industries. Their impervious surfaces mainly consist of middle- and low-rise regular buildings, cement roads, squares (including vehicles), a few bare rocks and man-made impervious facilities, while their pervious surfaces mainly consist of regular farmland, urban man-made vegetation, bare land used for industrial and mining purposes, and a few man-made water bodies or pools. Impervious surfaces containing buildings and roads occupy a high proportion. Compared with region B, the impervious surfaces in region A occupy a higher proportion and are distributed more regularly and densely. The extraction of impervious surfaces in region A is subject to more interference from building shadows, vegetation shadows and vegetation cover, while region B is more obviously affected by vehicles, vegetation cover and bare land of high luminance. Region C is a typical suburban area with ground objects similar to those of regions A and B. However, its impervious surfaces occupy a lower proportion, are more scattered and disordered than in the other two regions, and are interlaced with the pervious surfaces, so the extraction in region C faces more interference factors.

According to the binary images of the impervious surfaces extracted from regions A, B and C, i.e. subimages (a-2), (b-2) and (c-2) in Figure 11, although the composition and distribution of the impervious surfaces differ among the three regions, most of them are correctly extracted by AEIDLMRS, and the distribution of the extracted impervious surfaces is basically consistent with the actual situation on the ground. To further evaluate the performance of AEIDLMRS in extracting impervious surfaces, regions A, B and C are each partitioned with a square grid (each cell is 61×61 pixels, i.e. 30.5 meters × 30.5 meters) into three sub-region sets, which contain 306, 440 and 360 sub-regions respectively.
The scatter diagram of each set is plotted from the real and estimated PIS values of each sub-region in the set, and the results are shown in Figure 12. The metrics R², RMSE and MAE are calculated to evaluate the extraction accuracy and effectiveness of AEIDLMRS.
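The per-cell evaluation described above can be sketched in a few lines. PIS is treated here as the fraction of impervious pixels inside each 61×61 cell of a binary map, and R² is taken as the coefficient of determination. Both readings are assumptions, as is the choice to drop incomplete cells at the image border. The function names and the toy masks are hypothetical.

```python
import numpy as np

def pis_per_cell(mask, cell=61):
    """Fraction of impervious (value 1) pixels in each cell x cell block.
    Incomplete blocks at the right and bottom borders are discarded."""
    mask = np.asarray(mask, dtype=float)
    rows, cols = mask.shape[0] // cell, mask.shape[1] // cell
    trimmed = mask[:rows * cell, :cols * cell]
    blocks = trimmed.reshape(rows, cell, cols, cell)
    return blocks.mean(axis=(1, 3))

def fit_metrics(real, estimated):
    """Return (R2, RMSE, MAE) between real and estimated PIS values."""
    real = np.asarray(real, dtype=float).ravel()
    estimated = np.asarray(estimated, dtype=float).ravel()
    residual = estimated - real
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    mae = float(np.mean(np.abs(residual)))
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((real - real.mean()) ** 2))
    return 1.0 - ss_res / ss_tot, rmse, mae

# Toy 122x122 maps (2x2 cells): the reference has one fully impervious cell,
# the extracted map adds a 10-pixel-wide strip of false positives.
reference = np.zeros((122, 122), dtype=int)
reference[:61, :61] = 1
extracted = reference.copy()
extracted[:61, 61:71] = 1

real_pis = pis_per_cell(reference)
est_pis = pis_per_cell(extracted)
r2, rmse, mae = fit_metrics(real_pis, est_pis)
print(real_pis, est_pis, r2, rmse, mae)
```

Each point of a scatter diagram such as those in Figure 12 corresponds to one pair of entries from real_pis and est_pis.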

Figure 12. Fitting accuracy analysis of PIS in the three experimental regions. (a) Region A; (b) Region B; (c) Region C.

As shown in Figure 12, for the experimental regions A, B and C, the values of R² are 0.9522, 0.9475 and 0.9384 respectively (0.9460 on average), the values of RMSE are 6.526%, 6.955% and 6.962% respectively (6.814% on average), and the values of MAE are 5.121%, 5.469% and 5.564% respectively (5.385% on average). This shows that AEIDLMRS achieves a good fit between the estimated and real PIS values of the three experimental regions and effectively extracts urban impervious surfaces with complex compositions and distributions. However, a few parts of the impervious surfaces are still omitted or mis-detected. Comparing the values of R², RMSE and MAE of regions A, B and C, region A has the highest R² and the lowest RMSE and MAE, so its impervious surfaces are extracted most accurately and effectively. In contrast, region C has the lowest R² and the highest RMSE and MAE, which means the worst accuracy and effect of impervious surface extraction. AEIDLMRS correctly extracts most of the impervious surfaces in the experimental regions and excludes the interference of most building and vegetation shadows. The main omitted impervious surfaces in regions A and B are a few tree-covered road surfaces along the roadsides and some areas affected by vegetation shadows. The mis-detected impervious surfaces mainly consist of a few bare lands of high luminance and some edges (or ridges) of farmlands with regular geometric shapes. Region C suffers the least interference from shadows and has few vegetation-covered road surfaces. However, the impervious surfaces in region C are scattered widely, and most of the roads have a complex composition. Some sections of the cement roads have damaged surfaces or have degraded into earth roads. In addition, some sections of the cement roads are covered by sand and are prone to be omitted or mis-detected. Accordingly, the extraction accuracy of region C is lower than that of the other two regions.

5.4 Comparative experiment and analysis

AEIDLMRS is a pixel-wise extraction method for impervious surfaces.
To evaluate the performance, three representative algorithms are compared as follows: (1) IEMRSSVM [7]: an estimation method of impervious surfaces based on multi-source remote sensing data (WorldView-2 fused images and LIDAR data) and the support vector machine (SVM); (2) IEMRSGC [8]: an extraction method of impervious surfaces based on multi-source remote sensing data (Landsat7, Landsat8, GF1 and LIDAR) and the GrabCut image segmentation algorithm; (3) IEMRML [9]: an extraction method of impervious surfaces based on multi-source remote sensing data (LIDAR data and high-resolution color aerial images) and the maximum likelihood algorithm. In IEMRSSVM, the haze- and ratio-based (HR) algorithm is first used to fuse the WorldView-2 multi-spectral bands and the panchromatic band. Next, an elevation threshold on the normalized digital surface model (nDSM) of the LIDAR data is used to distinguish lower ground objects from higher ones. The proportion of urban impervious surfaces is then estimated using pixel-wise hierarchical SVMs. The impervious surfaces in shadow are extracted jointly using feature thresholds and GIS spatial analysis methods. IEMRSGC combines the abundant spectral and spatial information of the multi-spectral and high spatial resolution satellite images with the elevation and intensity information of the LIDAR point cloud data. An optimal segmentation method based on graph theory is proposed to extract the impervious surfaces, and the extraction is converted into an optimal labelling problem under different data sources, so as to further improve the accuracy of impervious surface extraction through multi-source feature fusion. IEMRML makes full use of the LIDAR point cloud data and the high-resolution color aerial images, which contain elevation, spectral and spatial information. The maximum likelihood algorithm is used to determine whether a pixel belongs to the impervious surfaces.

The impervious surface extraction experiments are performed with AEIDLMRS and the above three representative algorithms (IEMRSSVM, IEMRSGC and IEMRML) in experimental regions A, B and C, using the same sub-region partition and scatter diagram plotting methods. The averages of the three accuracy metrics (Avg_R², Avg_RMSE and Avg_MAE) and the average time consumption (Avg_T) are used to evaluate the comprehensive performance of AEIDLMRS, IEMRSSVM, IEMRSGC and IEMRML, as shown in Table 1.

Table 1. Comparison of experimental results generated by different algorithms

Algorithms    Avg_R²    Avg_RMSE    Avg_MAE    Avg_T (min)
AEIDLMRS      0.9578    6.6583%     5.1218%    5.941
IEMRSSVM      0.9105    8.364%      6.456%     3.532
IEMRSGC       0.8985    8.916%      6.878%     2.624
IEMRML        0.8586    9.342%      7.365%     1.285

As shown in Table 1, AEIDLMRS outperforms the other algorithms in terms of Avg_R², Avg_RMSE and Avg_MAE and has a higher automation level in impervious surface extraction, because it jointly uses the deep learning algorithms CNN and DBN and makes full use of the abundant features of the multi-source remote sensing data. However, the time consumption of AEIDLMRS is higher than that of the other three algorithms. This problem can be alleviated by improving the computer hardware (e.g. adding GPU modules) or by using distributed parallel computing models (e.g. cloud computing). IEMRSSVM has the second-best accuracy. It extracts impervious and pervious surfaces independently by classification and processes special ground objects such as shadows and water bodies separately; this improves the extraction accuracy to a certain extent but increases the time consumption considerably. In addition, SVM is a shallow learning algorithm whose learning ability is obviously inferior to that of DBN and CNN on complicated learning problems. Therefore, compared with AEIDLMRS, IEMRSSVM is obviously less accurate, and its time consumption is not greatly reduced. The automation level of extracting impervious surfaces with IEMRSSVM is also limited. IEMRSGC is an unsupervised learning algorithm like AEIDLMRS and uses the GrabCut algorithm to perform clustering based on the multi-source feature data. Although IEMRSGC is considerably more efficient than AEIDLMRS and IEMRSSVM, its accuracy is limited by the use of GrabCut for two-class classification. IEMRSGC cannot correctly recognize most building shadows, vegetation-covered road surfaces, bare lands of high luminance and vehicles, so it suffers many mis-detections and omissions and has lower extraction accuracy than IEMRSSVM.
In IEMRML, the LIDAR data and the high-resolution color aerial images are used, but the thermal infrared spectral feature is not employed. In addition, the learning ability of the maximum likelihood algorithm on high-dimensional feature data is limited, so IEMRML has the lowest accuracy of the four algorithms, although it also has the lowest time consumption. In short, under obvious interference from building shadows, vegetation and other ground objects, AEIDLMRS performs better overall than IEMRSSVM, IEMRSGC and IEMRML based on the multi-source remote sensing data.
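The trade-off between accuracy and time consumption in Table 1 can be made explicit with a short calculation. The sketch below only restates the values reported in Table 1; the dictionary layout and variable names are hypothetical.

```python
# Values copied from Table 1 (RMSE and MAE in percent, time in minutes).
table1 = {
    "AEIDLMRS": {"r2": 0.9578, "rmse": 6.6583, "mae": 5.1218, "minutes": 5.941},
    "IEMRSSVM": {"r2": 0.9105, "rmse": 8.364, "mae": 6.456, "minutes": 3.532},
    "IEMRSGC": {"r2": 0.8985, "rmse": 8.916, "mae": 6.878, "minutes": 2.624},
    "IEMRML": {"r2": 0.8586, "rmse": 9.342, "mae": 7.365, "minutes": 1.285},
}

best_r2 = max(table1, key=lambda name: table1[name]["r2"])
fastest = min(table1, key=lambda name: table1[name]["minutes"])
print("highest Avg_R2:", best_r2)
print("lowest Avg_T:", fastest)

# Gap between AEIDLMRS and each baseline.
proposed = table1["AEIDLMRS"]
gaps = {}
for name, row in table1.items():
    if name == "AEIDLMRS":
        continue
    gaps[name] = {
        "r2_gain": proposed["r2"] - row["r2"],
        "rmse_reduction": row["rmse"] - proposed["rmse"],
        "time_ratio": proposed["minutes"] / row["minutes"],
    }
    print(f"{name}: R2 gain {gaps[name]['r2_gain']:.4f}, "
          f"RMSE reduction {gaps[name]['rmse_reduction']:.3f} points, "
          f"time ratio {gaps[name]['time_ratio']:.2f}")
```

Read this way, the table orders the four algorithms identically by accuracy and by running time, so each step up in accuracy is paid for with additional computation.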

6. Discussions and Further Works

Due to the diversity of underlying surfaces, the complexity of the materials of artificial ground objects and the heterogeneity of ground surface composition, accurately extracting urban impervious surfaces is a complex problem. Compared with existing similar methods, the proposed AEIDLMRS method has better accuracy and automation in the extraction of urban impervious surfaces, but the following deficiencies still need to be studied and addressed in the future:

(1) The efficiency of AEIDLMRS needs to be further improved. Because it jointly uses two deep learning algorithms (i.e. DBN and CNN), the time complexity of AEIDLMRS is higher than that of the other three methods. This could be alleviated by improving computer hardware performance (e.g. adding GPU modules) or through distributed parallel computation (e.g. cloud computing).

(2) The omission of vegetation-covered road surfaces needs to be addressed. The extraction of impervious surfaces in this work is mainly disturbed by building shadows and road-side vegetation. Impervious surfaces covered by building shadows are prone to be omitted, and vegetation-covered road surfaces are usually mis-detected by AEIDLMRS as vegetation, which also causes omission. The AEIDLMRS method proposed in this study handles most of the effects of building shadows on impervious surface extraction well, but it is still not ideal for the extraction of vegetation-covered road surfaces, which needs further research.

7. Conclusions

In this work, AEIDLMRS is proposed to address the challenging problem of extracting urban impervious surfaces under complex background interference. AEIDLMRS makes full use of the multi-source remote sensing data (Landsat8 and Pléiades-1A remote sensing images and LIDAR point cloud data) and jointly uses several powerful algorithms (e.g. CNN, DBN and FCM) to improve the accuracy and effectiveness of urban impervious surface extraction. The impervious surface extraction experiments based on multi-source remote sensing data are performed with AEIDLMRS and three other representative algorithms (IEMRSSVM, IEMRSGC and IEMRML) in three different experimental regions, using the same sub-region partition and scatter diagram plotting methods. The experimental results demonstrate that AEIDLMRS has higher accuracy and a higher automation level than the other three algorithms and is more suitable for the extraction of complex urban impervious surfaces. However, several deficiencies still need to be studied and addressed in the future, such as the relatively low efficiency and the omission of vegetation-covered road surfaces.

Conflict of interest

There is no conflict of interest.

Acknowledgments

This work was funded by the National Natural Science Foundation of China (NSFC, 41501451), the Program for New Century Excellent Talents in Fujian Province Universities (MinJiaoKe[2016]23) and the Program for Outstanding Youth Scientific Research Talents in Fujian Province Universities (MinJiaoKe[2015]54). The authors would like to thank Dr. Gang Chen and Dr. Yinan He at the University of North Carolina at Charlotte for their assistance, suggestions, and discussions.

References

[1] Arnold, C.L.; Gibbons, C.J. Impervious surface coverage: The emergence of a key environmental indicator. Journal of the American Planning Association. 1996, 2, 243-258.
[2] Wang, H.; Lu, S.L.; Wu, B.F.; Li, X.S. Advances in Remote Sensing of Impervious Surface Extraction and Its Applications. Advances in Earth Science. 2013, 3, 327-336.
[3] Cheng, X.; Shen, Z.F.; Luo, J.C.; Zhu, C.M.; Zhou, Y.N.; Hu, X.D. Estimating impervious surface base on comparison of spectral mixture analysis and support vector machine methods. Journal of Remote Sensing. 2011, 6, 1235-1247.
[4] Li, C.L.; Du, J.K.; Zuo, T.H. The Study of Extracting Impervious Surface Information Based on High-resolution Remote Sensing Image. Remote Sensing Information. 2009, 5, 36-40.
[5] Sun, Z.Y.; Zhao, Y.F.; Chen, J.; Li, G.L.; Tan, M.Z. Application of Object-oriented Classification in Extraction of Impervious Degree of Urban Surface. Scientia Geographica Sinica. 2007, 6, 837-842.
[6] Xia, J.S.; Du, P.J.; Feng, Y.F.; Cao, W.; Wang, X.L.; He, J.G.; Chen, X. Urban impervious surface area extraction and analysis based on hyperspectral remote sensing image. Journal of China University of Mining & Technology. 2011, 4, 660-666.
[7] Wu, M.F.; Sun, Z.C.; Li, H.; Yang, B.; Yu, S.S. Synergistic Use of WorldView-2 Imagery and Airborne LiDAR Data for Urban Impervious Surface Estimation. Remote Sensing Information. 2017, 2, 79-88.
[8] Yu, J.S. Extracting impervious surfaces from multi-source Remote Sensing Data. Master thesis. Wuhan: Wuhan University, 2017.
[9] Hodgson, M.E.; Jensen, J.R.; Tullis, J.A.; et al. Synergistic use of lidar and color aerial photography for mapping urban parcel imperviousness. Photogrammetric Engineering and Remote Sensing. 2003, 9, 973-980.
[10] Zhang, L.; Song, M.; Zhao, Q.; Liu, X.; Bu, J.; Chen, C. Probabilistic graphlet transfer for photo cropping. IEEE Transactions on Image Processing. 2013, 22(2), 802-815.
[11] Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics. 1980, 4, 193-202.
[12] Duan, B.B.; Han, L.X. Improved convolutional neural networks and its application in stitching of scrapped paper. Computer Engineering and Applications. 2014, 9, 176-181, 270.
[13] LeCun, Y.; Boser, B.; Denker, J.S.; et al. Backpropagation applied to handwritten zip code recognition. Neural Computation. 1989, 4, 541-551.
[14] Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems Conference and Workshop (NIPS), Lake Tahoe, Nevada, 2012, 1106-1114.
[15] Gong, D.X.; Cao, C.R. Plant Leaf Classification Based on CNN. Computer and Modernization. 2014, 4, 12-15, 19.
[16] Zhang, L.; Song, M.; Yang, Y.; Zhao, Q.; Zhao, C.; Sebe, N. Weakly supervised photo cropping. IEEE Transactions on Multimedia. 2014, 16(1), 94-107.
[17] Zhao, W.Z.; Du, S.H. Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS Journal of Photogrammetry and Remote Sensing. 2016, 113, 155-165.
[18] Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Computation. 2006, 7, 1527-1554.
[19] Kang, Y.; Lu, M.C.; Yan, G.W. Soft Sensor for Ball Mill Fill Level Based on DBN-ELM Model. Instrument Technique and Sensor. 2015, 4, 73-75, 92.
[20] Wang, Y.H.; Di, K.S.; Zhang, S.; Shang, C.; Huang, D.X. Melt index prediction of polypropylene based on DBN-ELM. CIESC Journal. 2016, 12, 5163-5168.
[21] Chen, H.J.; Huang, M.L.; Jiang, L.; Tao, J.H.; Yang, J.H.; Zheng, G.Z.; Wu, Z.H. A remote sensing image automatic annotation method based on deep learning. China Patent, 201410039584.3, 2014-05-28.
[22] Li, X.L.; Zhang, Z.X.; Wang, Y.H.; Liu, Q.J. Aerial Images Categorization with Deep Learning. Journal of Frontiers of Computer Science & Technology. 2014, 3, 305-312.
[23] Lv, G.; Hao, P.; Sheng, J.R. On applying an improved deep neural networks in tiny image classification. Computer Applications and Software. 2014, 4, 182-184, 213.
[24] Lv, Q.; Dou, Y.; Niu, X.; Xu, J.Q.; Xia, F. Remote Sensing Image Classification Based on DBN Model. Journal of Computer Research and Development. 2014, 9, 1911-1918.
[25] Dunn, J.C. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well Separated Cluster. Journal of Cybernetics. 1974, 3, 32-57.
[26] Bezdek, J.C. Pattern recognition with fuzzy objective function algorithms. Advanced Applications in Pattern Recognition. 1981, 1171, 203-239.
[27] Bezdek, J.C.; Ehrlich, R. FCM: The fuzzy C-means clustering algorithm. Computers & Geosciences. 1984, 10, 191-203.
[28] Liu, Z.Y.; Geng, X.Q. Text Mining Algorithm Based on Fuzzy Clustering. Text Mining Algorithm Based on Fuzzy Clustering. 2009, 5, 44-49.
[29] Zhang, L.; Gao, Y.; Ji, R.; Xia, Y.; Dai, Q.; Li, X. Actively learning human gaze shifting paths for semantics-aware photo cropping. IEEE Transactions on Image Processing. 2014, 23(5), 2235-2245.
[30] Xu, H.Q. A Study on Information Extraction of Water Body with the Modified Normalized Difference Water Index (MNDWI). Journal of Remote Sensing. 2005, 5, 589-595.
[31] Wang, T.; Zhao, L.; Wu, J.; Zhang, P.; Zhang, Y.N.; Zhang, X.B. Modified Form of Normalized Difference Impervious Surface Index: A Case Study on Urban Built-Up Area of Lanzhou City. Geomatics & Spatial Information Technology. 2015, 3, 182-185.
[32] Cheng, X.; Shen, Z.F.; Luo, J.C.; Zhou, Y.N.; Zhang, X. A "global-local" impervious surface area extraction model using multispectral remote sensing images. Journal of Remote Sensing. 2013, 5, 163177.
[33] Hinton, G.E. Training products of experts by minimizing contrastive divergence. Neural Computation. 2002, 8, 1771-1800.
[34] Zhang, L.; Wang, M.; Li, W.; Hong, R.; Liu, M. An automatic three-dimensional scene reconstruction system using crowdsourced Geo-tagged videos. IEEE Transactions on Industrial Electronics. 2015, 62(9), 5738-5746.
[35] Huang, Q.R. Automatic Classification Algorithm of Remote Sensing Image Based on FCM and SVM. Journal of North China University of Water Resources and Electric Power (Natural Science Edition). 2015, 4, 84-88.
[36] Wang, Z.M.; Cao, H.J.; Fan, L. Method on Human Activity Recognition Based on Convolutional Neural Networks. Computer Science. 2016, 11A, 56-58.
[37] Zhang, L.; Gao, Y.; Zimmermann, R.; Tian, Q.; Li, X. Fusion of multichannel local and global structural cues for photo aesthetics evaluation. IEEE Transactions on Image Processing. 2014, 23(3), 1419-1429.
[38] Yin, B.C.; Wang, W.T.; Wang, L.C. Review of Deep Learning. Journal of Beijing University of Technology. 2015, 1, 48-59.