Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach



ISPRS Journal of Photogrammetry and Remote Sensing 105 (2015) 107–119

Contents lists available at ScienceDirect

ISPRS Journal of Photogrammetry and Remote Sensing journal homepage: www.elsevier.com/locate/isprsjprs

Shihong Du ⇑, Fangli Zhang, Xiuyuan Zhang

Institute of Remote Sensing and GIS, Peking University, Beijing 100871, China

Article info

Article history: Received 26 November 2014; received in revised form 16 March 2015; accepted 17 March 2015

Keywords: Very high resolution (VHR) images; Urban buildings; Semantic classification; Random forest; Object-based image analysis (OBIA)

Abstract

While most existing studies have focused on extracting geometric information on buildings, only a few have concentrated on semantic information. Without semantic information, many demands arising from environmental and social issues cannot be met. This study presents an approach to semantically classify buildings into much finer categories than those of existing studies by training a random forest (RF) classifier on a large number of imbalanced samples with high-dimensional features. First, a two-level segmentation mechanism combining GIS data and a VHR image produces single image objects at a large scale and intra-object components at a small scale. Second, a semi-supervised method chooses a large number of unbiased samples by considering the spatial proximity and intra-cluster similarity of buildings. Third, two improvements to the RF classifier are made: a voting-distribution ranked rule for reducing the influence of imbalanced samples on classification accuracy, and a feature importance measurement for evaluating each feature's contribution to the recognition of each category. Fourth, the semantic classification of urban buildings is conducted in Beijing, and the results demonstrate that the proposed approach is effective and accurate. The seven categories used in the study are finer than those in existing work and more helpful for studying many environmental and social problems.

© 2015 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

1. Introduction

As the main sites of urban activities and important components of cities, urban buildings are a vital foundation of urban studies. Semantic classification of buildings labels buildings with a set of semantic categories cognized and conceptualized by people, such as low-story shantytowns, middle-story apartments, high-story apartments, administrative buildings, and commercial buildings. These categories strongly correlate with urban environment analyses (e.g. ecological and environmental evaluation), urban resource allocation (e.g. resource management, transportation planning, and disaster reduction) and urban social analyses (e.g. population estimation and market research) (Wu et al., 2005). Existing work has focused on how to extract building contours or accurately distinguish buildings from non-buildings. However, geometric information alone cannot fulfill the demands of urban ecological, resource, and social research (Paul et al., 2001). Therefore, semantic classification of urban buildings is required.

⇑ Corresponding author. Tel.: +86 10 62750294; fax: +86 10 62751961. E-mail address: [email protected] (S. Du).
http://dx.doi.org/10.1016/j.isprsjprs.2015.03.011
0924-2716/© 2015 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

Geometric analyses of buildings aim to extract the geometric contours of buildings or to distinguish buildings from other objects by using geometric or spectral features. In the middle-to-late 1980s, researchers started to extract urban buildings from aerial photos (Huertas and Nevatia, 1988). With the explosive increase in image data and the continuous development of sensor techniques, techniques for extracting urban buildings have made great progress. From the perspective of the images used, buildings can be extracted from either low- and medium-resolution images or high-resolution images (Lin and Nevatia, 1998). Due to the limits of spatial resolution, only large areas of buildings or residential areas, rather than individual buildings, can be obtained from low- and medium-resolution images (Nevatia et al., 1997). On the other hand, VHR images provide finer texture and more accurate locations of buildings. Thus, they are used more widely to extract buildings with high accuracy (Myint et al., 2011). From the perspective of extraction methods, existing work generally falls into edge-based geometric grouping or object-based classification. The former first extracts edges from images, and then uses geometric models of buildings as prior constraints to find edges belonging to the same buildings and group them into complete contours. These works have often used optical VHR data (Kim and Muller, 1999; Sirmacek and Unsalan, 2009; Ok, 2013) or a combination of optical and LiDAR data (Sohn and Dowman, 2007; Awrangjeb et al., 2010, 2013). Unlike edge-based methods, object-based methods first segment VHR images into image objects, and then distinguish image objects of buildings from those of non-buildings using image features (Myint et al., 2011). However, in VHR images, a lot of detailed information emerges, and the heterogeneity of buildings becomes much larger. Consequently, it is difficult to find appropriate segmentation scales and image features to classify complete buildings with different shapes, sizes, and structures.

Semantic analyses of urban buildings concentrate on distinguishing different categories of buildings. These categories are cognized and conceptualized by people and described in natural language. More importantly, they are strongly correlated with environmental and social variables and have special implications for these variables. A few studies have concentrated on recognizing the categories or neighborhoods of urban buildings. For identifying the categories of buildings, Lu et al. (2014) used spatial attributes calculated from LiDAR and other land-use features to classify buildings into three categories: single-family houses, multiple-family houses, and non-residential buildings. Belgiu et al. (2014) used airborne laser scanning data to group buildings into three categories: residential/small buildings, apartments/block buildings, and industrial/factory buildings. For the classification of neighborhoods, Graesser et al. (2012) defined urban neighborhoods as homogeneous zones and classified them as formal and informal areas, but they did not recognize subtypes, such as residential, commercial, and industrial structures. Other work in this field includes extracting unplanned settlements (Kuffer et al., 2014) and slums (Kohli et al., 2012) from VHR images.
In light of the review above, most existing studies have focused on extracting geometric information on buildings, while only a few have concentrated on semantic analysis. In addition, some important issues remain to be resolved. First, existing work on semantic analyses has distinguished too few categories to satisfy the many demands of the environmental and social sciences (Graesser et al., 2012; Kohli et al., 2012; Belgiu et al., 2014; Kuffer et al., 2014; Lu et al., 2014). Second, there have been no appropriate segmentation scales and algorithms to produce single image objects for diverse buildings. This lack of computational methods leads to low classification accuracies, as image features strongly depend on segmentation scales. Third, a small number of manually chosen samples and features may be practical for classifying a few categories of buildings. To distinguish between more building categories that vary greatly in size, shape, structure, and spectrum, however, a large number of samples and high-dimensional, heterogeneous features are required. In this situation, the samples are often imbalanced, and the features are often auto-correlated and differ in their importance for distinguishing different categories. Unfortunately, there is still a lack of related work on reducing the influence of imbalanced samples on classification and on evaluating the importance of features for classifying each category.

To resolve the issues raised above, this study presents a two-level segmentation mechanism (i.e. a large-scale layer constrained by GIS data for producing single image objects and a small-scale layer providing intra-object component features) and a semi-supervised method to choose a large number of unbiased samples by considering the spatial proximity and intra-cluster similarity of buildings. The random forest (RF) classifier is used to semantically classify buildings, for it is capable of handling a large number of samples and high-dimensional, heterogeneous features.
Moreover, to improve classification accuracy and evaluate feature importance, two improvements to the RF classifier are presented: a voting-distribution ranked rule for reducing the influence of imbalanced samples, and a feature importance measurement for each category based on Gini descent and a path tracing strategy.

The first contribution of this study is the improvement of the RF classifier in its voting rule and feature importance evaluation. Although some researchers have used the RF classifier to classify VHR images, effective approaches to handling imbalanced samples and evaluating feature importance for each category are still lacking. The improvements fill this gap from a methodological perspective. Another contribution is the semantic classification of urban buildings. The seven categories used in this study are finer than those used in existing studies and more appropriate for many environmental and social variables, such as population distribution (Wu et al., 2005; Lu et al., 2006) and small-scale heating networks (Geiß et al., 2011). Existing categories are ineffective at handling these variables. This situation is worse in China because the categories differ greatly in the number of families they can hold. Therefore, this study is motivated by both theoretical and practical demands.

2. Semantic category system of buildings

This section first discusses the cognition and representation of urban buildings in the real physical world, the geoinformatic world, and the cognition world, and then analyzes the transformations of real-world urban buildings into object features and semantic categories. Finally, it constructs a semantic category system of buildings.

2.1. Category system of urban buildings

Urban buildings, made of various materials with assorted styles and appearances in the real physical world, are the basis on which people cognize semantic categories and remote sensors sense buildings (the middle section of Fig. 1). In the geoinformatic world (the right section of Fig. 1), buildings are abstracted into contours in GIS data and into image pixels or image objects in VHR images. Thus, they are described in terms of spectrums, shapes, and textures.
In the cognition world, people cognize, understand, and communicate their ideas about buildings through appropriate semantic categories (the left section of Fig. 1). Therefore, building a semantic category system helps to transform the feature representations in the geoinformatic world into the concepts in the cognition world. The goal of semantic classification is to build relationships between the concepts of buildings in the cognition world and the features of buildings in the geoinformatic world. Accordingly, the semantic category system is built by discriminating the appearances and functions of urban buildings. It includes low-story (LS) shantytowns, medium-story (MS) apartments, high-rising (HR) apartments, administrative (AD) buildings, commercial (CM) buildings, industrial (ID) buildings, and auxiliary (AU) buildings (Table 1).

Fig. 1. Urban buildings in the three worlds. [Diagram: semantic categories (left) are linked through the appearance styles of real buildings (middle) to RS and GIS data (right), where objects, sub-objects, and pixels are described by distribution, geometry, structure, texture, and spectrum.]

Table 1. Semantic classification system for urban buildings.

| Category | Code | Definition |
|---|---|---|
| Low-story (LS) shantytowns | 0 | Bungalows densely distributed for daily life |
| Medium-story (MS) apartments | 1 | Slab-type apartment buildings like rectangles with larger lengths than widths, and normally lower than ten stories (Fig. 2b) |
| High-rising (HR) apartments | 2 | Residential buildings which are squares and higher than 12 stories (Fig. 2c) |
| Administrative (AD) buildings | 3 | Office buildings for government, medical, educational and commercial administration |
| Commercial (CM) buildings | 4 | Buildings for commodity trading |
| Industrial (ID) buildings | 5 | Buildings for industrial production and processing |
| Auxiliary (AU) buildings | 6 | Annex buildings |

2.2. Inter-category variations of buildings

Fig. 2 illustrates typical images of the seven categories of buildings. These categories clearly vary in the following aspects. First, the buildings in those categories have different sizes. Most buildings are single objects, while some (e.g. LS shantytowns) refer to the extents of spatially dense buildings and are much larger than other buildings. Even buildings in the same category may have different sizes. Second, spectral values differ significantly between the seven categories. Generally, LS shantytowns are represented as gray pixels, while CM and ID buildings often consist of colored pixels. Some categories of buildings (e.g. AD buildings) may be composed of multi-colored pixels. Therefore, spectral features can distinguish urban buildings to some degree. Third, the structures of urban buildings vary greatly. For example, LS shantytowns are composed of small buildings distributed both densely and randomly in their extents (Fig. 2a), while MS apartments are often more homogeneous and composed of one component, and HR apartments and AD buildings often have multiple components, each with a distinct size, shape, and spectrum. Fourth, the seven categories have different shapes: MS apartments are often rectangular, HR apartments may be rectangular or circular with holes (Fig. 2c), and AD buildings are polygons with complex shapes.

Fig. 2. Typical images of various buildings: (a) LS shantytowns; (b) MS apartments; (c) HR apartments; and further panels for AD, CM, ID, and AU buildings.

The variations of urban buildings in size, shape, structure, and spectrum pose many challenges to semantically classifying buildings. First, although some work has been done on automatically parameterizing multi-scale segmentation (Drăguţ et al., 2014), it is impractical to find appropriate segmentation scales or algorithms to extract image objects for one or multiple categories without guidance from prior information. Inappropriate segmentation often produces over- or under-segmented image objects instead of single objects for buildings, which leads to a lack of appropriate features for describing the properties of buildings as components and as a whole. Second, a large number of samples and high-dimensional image features are required because a few samples are insufficient to train a classifier. How to choose a large number of samples and guarantee that they are evenly distributed over the whole study area needs to be addressed. Third, a powerful classifier is needed to handle a large number of samples with high-dimensional features and to evaluate which features are important for classifying different categories of buildings.

3. Methodology

To solve the issues raised above (the complete segmentation of buildings, the choice of training samples, and the classification approach of semantic categories), this study presents an approach for the semantic classification of urban buildings. Four steps are required (Fig. 3):

(1) VHR image segmentation constrained by GIS data. Since existing segmentation methods cannot produce a single image object for each building, this study adopts a segmentation constrained by GIS data to obtain a single object for each building. The pixels inside each GIS contour are first merged into a single object, and then image features are computed.
(2) Feature extraction of buildings. Image features are the bridge connecting building objects to semantic categories. In this study, four types of features are used: spectrum, texture, geometry, and spatial distribution. The first three are used to choose samples and classify buildings, while the last is used to evaluate clustering results and choose samples.
(3) Sample selection with a semi-supervised method. A large number of samples is required for semantic classification. The ISODATA algorithm is first used to cluster buildings according to spectrum, geometry, and texture features. Then, intra-cluster similarity is used to choose corresponding samples of semantic categories in a semi-supervised way.
(4) Semantic classification of buildings using the improved RF classifier. The three steps above collect a large number of samples as well as high-dimensional, heterogeneous features for each sample. To train a classifier from these complex features and numerous samples, a semantic classification approach using the RF classifier is presented. Furthermore, RF is improved to reduce the influence of imbalanced samples and to evaluate the contribution of each feature to each category.

3.1. Image segmentation constrained by GIS data


Due to the complexity of roof structures, the influences of sidewalls, shadows and occlusions, and the great variations in the image features of different buildings in VHR images, it is difficult to obtain a single image object for each of the diverse buildings from image information alone, which in turn limits classification accuracy. Therefore, the contours of GIS buildings are used as constraints to segment VHR images. The two-level segmentation proceeds in the following two steps (Fig. 4). In the first step, at the first level, the pixels within each building contour are directly merged to create a single image object. In the second step, at the second level, each image object obtained in the first step is further segmented into sub-objects using smaller segmentation scales (Baatz and Schape, 2000). The image features of image objects help describe the shapes and spectrums of buildings from a global perspective, while those of sub-objects disclose the internal structures from a local perspective.

Fig. 3. Flowchart of semantic classification of urban buildings. [Flowchart: GIS-constrained segmentation of RS imagery with building edges into image objects; extraction of spectral, texture, geometric, and structure features; ISODATA clustering in the spectral and geometry feature spaces and K-NN analysis, mapping building clusters to the classification system to sample training data; random forest training (bootstrap, decision tree training, voting) with OOB assessment, vote rule adjustment from the OOB vote distribution, feature importance, and classification of semantic categories.]

Fig. 4. Two-level image segmentation constrained by GIS data. [Diagram: image pixels are constrained by GIS contours into image objects, which multi-scale segmentation divides into image sub-objects.]

3.2. Feature extraction of buildings

Based on the two-level image segmentation above, spectral, texture, geometric and distribution features can be defined and extracted (Table 2). Spectral and geometric features can distinguish buildings with different colors, sizes and shapes (Nevatia and Babu, 1980) in VHR images. Furthermore, some buildings cannot be distinguished by visual features; texture and distribution features are used instead to describe the distributions of pixels or sub-objects.

3.2.1. Spectral features

Spectral features, such as the mean, brightness, maximal difference ratio, and standard deviation of pixel values, are determined by the construction materials and appearances of buildings. These features can be defined using the spectral values of pixels in four bands, i.e., one definition often corresponds to four features. A total of 47 spectral features are presented.
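As a hedged illustration of the first segmentation level and of a few spectral features from Section 3.2.1, the sketch below merges all pixels that fall inside one building footprint into a single object and computes per-band mean, brightness, maximal difference ratio, and standard deviation. The tiny image, the `footprint_ids` raster, and all function names are hypothetical; brightness is simplified here to the unweighted mean of the band means, which may differ from the weighting used in the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def objects_from_footprints(image, footprint_ids):
    """Group pixel vectors by footprint id (0 = background, an assumed convention)."""
    objects = defaultdict(list)
    for row_pixels, row_ids in zip(image, footprint_ids):
        for pixel, fid in zip(row_pixels, row_ids):
            if fid != 0:
                objects[fid].append(pixel)
    return dict(objects)


def spectral_features(pixels):
    """Compute a few object-level spectral features from a list of pixel vectors."""
    n_bands = len(pixels[0])
    band_means = [mean(p[b] for p in pixels) for b in range(n_bands)]
    band_stds = [pstdev([p[b] for p in pixels]) for b in range(n_bands)]
    brightness = mean(band_means)  # simplified: unweighted mean of band means
    # Maximal difference of band means, relative to brightness.
    max_diff = (max(band_means) - min(band_means)) / brightness if brightness else 0.0
    return {"mean": band_means, "std": band_stds,
            "brightness": brightness, "max_diff": max_diff}


# Toy 2 x 3 image with four bands (blue, green, red, NIR) and two footprints.
image = [
    [(10, 20, 30, 40), (0, 0, 0, 0), (100, 100, 100, 100)],
    [(30, 40, 50, 60), (0, 0, 0, 0), (100, 100, 100, 100)],
]
footprint_ids = [
    [1, 0, 2],
    [1, 0, 2],
]

objects = objects_from_footprints(image, footprint_ids)
features = {fid: spectral_features(pixels) for fid, pixels in objects.items()}
```

Because each definition is evaluated per band, one definition yields four values for a four-band image, which is how a handful of definitions expands into dozens of spectral features.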

3.2.2. Texture features

Image objects differ in both their gray levels at different bands and the distributions of their sub-objects and pixels. Accordingly, texture features include the structure features defined on sub-objects and the textures of pixels derived from the gray-level co-occurrence matrix (GLCM) (Honeycutt and Plotnick, 2008). The texture features derived from GLCM depend on the directions (0°, 45°, 90° and 135°), and in total, 16 features are obtained from four bands in four directions (Table 2). The structure features refer to the standard deviations of the mean spectrum, areas and main directions of sub-objects. Altogether, there are 216 texture features for classification.

Table 2. The defined features of urban buildings.

| Type | Name | Meaning | Range |
|---|---|---|---|
| Spectrum (47) | [Mean] | The average spectrum of pixels | [m, M] |
| | [Brightness] | The weighted average spectrum of pixels on each band | [m, M] |
| | [Max. diff.] | The ratio of the maximal difference of average spectrum of each band to the brightness | [0, 1] |
| | [Std. dev] | The standard deviation of pixels' grayscale of an image object | [0, 1] |
| | [Skewness] | The skewness of grayscale histogram of pixels | [-M3, M3] |
| | [Ratio] | The ratio of average spectrum to brightness | [0, 1] |
| | [HIS] | The hue, saturation and brightness of RGB color | [0, 1] |
| | [Mean border] | The average spectrum on the inside and outside border | [m, M] |
| | [Contrast border] | The average differences between border pixels and their neighborhood | [-M, M] |
| | [Circular] | The average spectrum and brightness of pixels within a ring around the center of a building | [m, M] |
| | [NDVI] | Normalized difference vegetation index | [-1, 1] |
| Texture (216) | [SO std. dev.] | The standard deviation of spectrums of sub-objects | [0, 1] |
| | [SO mean diff.] | The average and standard deviation of spectrum differences between adjacent sub-objects | [0, 1] |
| | [SO area] | The standard deviation of the areas of sub-objects | [0, N] |
| | [SO density] | The standard deviation of the average density of sub-objects | [0, 1] |
| | [SO direction] | The average and standard deviation of main directions of all sub-objects | [0, 180] |
| | [Homogeneity] | The homogeneity derived from GLCM | [0, 1] |
| | [Contrast] | The contrast derived from GLCM | [0, 65,025] |
| | [Dissimilarity] | The heterogeneity parameters derived from GLCM | [0, 255] |
| | [Entropy] | The information entropy derived from GLCM | [0, 10,404] |
| | [2nd Mom.] | The second moment derived from GLCM | [0, 1] |
| | [Correlation] | The correlation derived from GLCM | [0, 1] |
| | [GLDV] | The vector composed of diagonal elements of GLCM | [0, 255] |
| Geometry (44) | [Area] | The number of pixels within image objects | [0, N] |
| | [Border length] | The number of pixels on the inner and outer borders | [0, 1] |
| | [Length/width] | The length-width ratio of the envelope rectangle | [0, 1] |
| | [Border index] | The ratio between the border lengths of a building and the smallest enclosing rectangle | [1, 1] |
| | [Compactness] | The ratio of the product of length and width to the area of an image object | [0, 1] |
| | [Ellipse fit] | The goodness of a building fitting into an ellipse | [0, 1] |
| | [Main direction] | The direction of the eigenvector belonging to the larger of the two eigenvalues | [0, 180] |
| | [Radius ellipse] | The radiuses of the maximal inscribed ellipse and the minimal circumscribed ellipse | [0, 1] |
| | [Rectangular fit] | The goodness of a building fitting into a rectangle | [0, 1] |
| | [Roundness] | The difference between the radiuses of the maximal inscribed and minimal circumscribed ellipses | [0, 1] |
| | [Shape index] | The ratio of the perimeter of a building to four times the square root of its area | [1, 1] |
| | [Sub-objects] | The number of sub-objects | [1, 1] |

Fig. 5. The structure features of sub-objects. [Plot: distributions of sub-object brightness values (mean and standard deviation, brightness axis from 0 to 1000) for buildings a, b, and c in the blue, green, red, and NIR bands.]

Fig. 5 illustrates the structure features of three buildings: a, b and c. Building a has the lowest standard deviation of average spectrum, and b has the largest values of both average spectrum and standard deviation. If only average spectrums are considered, buildings a and b can be confused with each other, as their spectrums are very similar. However, if standard deviations are considered, the three buildings are distinct, because the differences between their standard deviations are larger than those between their average spectrums.

3.2.3. Geometric features

Because spectral information is insufficient and easily confused, it is hard to classify buildings accurately relying solely on spectral features. At the first segmentation level, image objects as wholes can directly measure the geometric shapes and arrangements of buildings. Besides the area and perimeter, how well image objects fit basic shapes of similar sizes (e.g. squares, circles, and ellipses) is important for distinguishing different categories. Such features include the shape index, rectangular fit, length of skeleton lines, roundness, and elliptic fit (Trimble, 2011).

3.2.4. Distribution features

Distribution features measure the spatial proximity and category similarity between buildings. Buildings that neighbor each other in space and are similar in shape are more likely to belong to the same category. Two distribution features serve this purpose: the proportion of buildings belonging to the same category among the K-nearest neighbors (abbreviated as the proportion in KNN), and the average distance between a building and its K-nearest neighbors of the same category (abbreviated as the average distance of KNN). The proportion in KNN (Eq. (1)) considers both spatial proximity and categorical similarity to measure the degree to which a building belongs to the category of its neighbors. The average distance of KNN (Eq. (2)) measures the mean distance from a building to those neighbors counted by the proportion in KNN, i.e. the K-nearest neighbors that share its category. Therefore, the two features contribute to choosing samples.
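The two distribution features, formalized in Eqs. (1) and (2), can be sketched as follows. This is a minimal illustration under stated assumptions: buildings are represented by hypothetical centroid coordinates, dis(o, o_i) is taken to be the Euclidean distance between centroids, and the function name `knn_distribution_features` is invented for this example. When no neighbor shares the category, the average distance is reported as infinity, a convention chosen here rather than taken from the study.

```python
import math


def knn_distribution_features(centroids, labels, index, k):
    """Return (proportion in KNN in percent, average distance of KNN) for one building."""
    ox, oy = centroids[index]
    # Distances from building `index` to every other building, nearest first.
    dists = sorted(
        (math.hypot(x - ox, y - oy), j)
        for j, (x, y) in enumerate(centroids)
        if j != index
    )
    nearest = dists[:k]
    # Keep only the neighbors that share the category (or cluster) of the building.
    same = [d for d, j in nearest if labels[j] == labels[index]]
    prop = 100.0 * len(same) / k
    dis = sum(same) / len(same) if same else math.inf
    return prop, dis


# Toy data: five buildings, two categories.
centroids = [(0, 0), (1, 0), (0, 2), (3, 0), (10, 10)]
labels = ["A", "A", "A", "B", "B"]

prop, dis = knn_distribution_features(centroids, labels, index=0, k=3)
```

For building 0 with K = 3, two of the three nearest neighbors share its category, so the proportion is about 66.7% and the average distance over those two neighbors is 1.5.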

The proportion in KNN:

$$\mathrm{Prop}_K(o) = \frac{|KNN(o)|}{K} \times 100\% \tag{1}$$

The average distance of KNN:

$$\mathrm{Dis}_K(o) = \frac{\sum_{o_i \in KNN(o)} \mathrm{dis}(o, o_i)}{|KNN(o)|} \tag{2}$$

where $KNN(o)$ refers to those of the K-nearest neighbors of $o$ that have the same category as $o$, $|KNN(o)|$ the number of buildings in $KNN(o)$, and $\mathrm{dis}(o, o_i)$ the distance between buildings $o$ and $o_i$.

3.3. Sample selection with a semi-supervised method

The quality of samples heavily determines the accuracy of semantic classification. Thus, it is of great importance to obtain a large number of samples that are evenly distributed over the whole study area and in proportion to the ratio of each category of buildings to the total number of buildings. That is, the samples should reflect the real distribution of the features in each category. However, manual selection cannot guarantee a sufficient and even distribution of samples in a large study area, let alone that the samples follow the real distributions. The discrepancies between the samples and the real distributions lie in two aspects. First, the distributions of the feature values of selected samples do not follow the real ones in the image data. Second, the ratio of the samples in each category to the total samples does not match the actual one in the image data. These two discrepancies affect the classification results.

For the first discrepancy, this study presents a semi-supervised approach that chooses a large number of samples guided by the results of the ISODATA algorithm (Memarsadeghi et al., 2007). This approach uses feature similarity to group image objects into clusters by measuring how feature values are distributed over different clusters (Fig. 6). Furthermore, objects with high confidences are chosen as samples. Formally, the confidence is defined as $\mathrm{conf}(o) = \mathrm{Prop}_K(o) / \mathrm{Dis}_K(o)$. The more of an object's neighbors belong to the same cluster, and the shorter the distances to those neighbors, the higher the confidence that the object will be chosen as a sample. Generally, one category corresponds to multiple clusters; thus, the categories of samples are manually specified by combining multiple clusters into one category. Because clusters are obtained by considering the feature similarity and feature distribution of buildings, the feature distributions of samples should be consistent with the actual ones. However, this approach inevitably produces imbalanced samples. In the physical world, buildings of some categories greatly outnumber those of others, and so do their samples.
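A minimal sketch of the confidence-based choice of samples is given below, assuming that cluster labels (e.g. from ISODATA) and the two distribution features have already been computed for every object. The record layout, the `per_cluster` cut-off, and the final cluster-to-category mapping are hypothetical; the study only states that objects with high confidence are chosen and that clusters are combined into categories manually.

```python
from collections import defaultdict


def confidence(prop_k, dis_k):
    """conf(o) = Prop_K(o) / Dis_K(o); zero distance is guarded against (assumption)."""
    return prop_k / dis_k if dis_k > 0 else 0.0


def select_samples(objects, per_cluster):
    """Pick the `per_cluster` most confident objects of each cluster."""
    by_cluster = defaultdict(list)
    for obj in objects:
        by_cluster[obj["cluster"]].append(obj)
    chosen = {}
    for cluster, members in by_cluster.items():
        ranked = sorted(
            members,
            key=lambda o: confidence(o["prop_k"], o["dis_k"]),
            reverse=True,
        )
        chosen[cluster] = [o["id"] for o in ranked[:per_cluster]]
    return chosen


# Toy objects with precomputed cluster labels and distribution features.
objects = [
    {"id": 1, "cluster": 0, "prop_k": 100.0, "dis_k": 10.0},
    {"id": 2, "cluster": 0, "prop_k": 60.0, "dis_k": 20.0},
    {"id": 3, "cluster": 0, "prop_k": 80.0, "dis_k": 5.0},
    {"id": 4, "cluster": 1, "prop_k": 40.0, "dis_k": 8.0},
    {"id": 5, "cluster": 1, "prop_k": 100.0, "dis_k": 25.0},
]

chosen = select_samples(objects, per_cluster=2)

# Clusters are then mapped to semantic categories by hand (hypothetical mapping).
cluster_to_category = {0: "MS", 1: "LS"}
samples = [(oid, cluster_to_category[c]) for c, ids in chosen.items() for oid in ids]
```

Ranking within each cluster, rather than globally, keeps every cluster represented, but it does nothing about clusters of very different sizes, which is why the resulting samples remain imbalanced.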
Therefore, the imbalance of samples is considered in the following sections.

3.4. Improved RF classifier for semantic classification

The presented semi-supervised method can obtain a large number of samples (thousands) with high-dimensional (hundreds of dimensions) and heterogeneous features. To exploit the samples and their features reasonably, the RF classifier is employed: it randomly selects samples and features to train decision trees and integrates all trained decision trees to vote for the most popular category (Breiman, 2000). In this study, RF is improved to evaluate the contribution of each feature to classifying each category and to reduce the influence of imbalanced samples on classification results.
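The random ingredients of an RF mentioned above can be sketched in a few lines. The example below is illustrative only: it draws a bootstrap sample and its out-of-bag (OOB) complement, picks a random subset of about the square root of the number of features, and takes a simple majority vote; no trees are actually trained. The function names are invented for this sketch, and the feature count 307 is simply the sum of the spectral, texture, and geometric feature counts listed in Table 2.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Sample n_samples indices with replacement; the rest are out-of-bag (OOB)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def random_feature_subset(n_features, rng):
    """Choose about sqrt(n_features) candidate features for one tree node."""
    size = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), size))


def majority_vote(tree_predictions):
    """Simple majority voting over the predictions of all trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, oob = bootstrap_indices(1000, rng)
candidate_features = random_feature_subset(307, rng)
vote = majority_vote(["MS", "MS", "HR", "MS", "LS"])
```

Roughly a third of the samples end up out-of-bag for each tree, which is what makes OOB samples usable as a built-in validation set for the improvements described in the following subsections.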


Fig. 6. Sample selection using ISODATA clustering.

3.4.1. The principles of RF classifier

Based on the bagging ensemble learning algorithm (Efron, 1979), the RF classifier trains each decision tree independently (Pal, 2005), and the votes of all decision trees determine the final result. The training steps are as follows: (1) choose a subset of samples using the Bootstrap sampling method, (2) randomly choose $\sqrt{M}$ features from the $M$ available ones for each node, (3) construct a CART decision tree from the chosen samples by using the Gini coefficient (Eq. (3)) as the measure of information gain (Quinlan, 1986), and (4) repeat these steps until $N$ CART decision trees have been built to form the RF.

$$\mathrm{Gini}(S_{LM}) = 1 - \sum_{i=1}^{K} p_i^2 \tag{3}$$

where $S_{LM}$ refers to the chosen subset consisting of $L$ samples and $M$ features, $p_i$ the frequency of samples of the $i$th category, and $K$ the number of categories. The classification process of the decision trees mirrors the training process: for each building, each tree independently predicts a category, and the resulting category is the most popular one.

3.4.2. Learning from imbalanced samples: a voting-distribution ranked rule

Traditional RF uses the simple majority voting rule to make decisions and tends to misclassify the minority categories. Therefore, imbalanced samples heavily affect the classification accuracy. Existing work on learning from imbalanced data includes the pseudo balanced random forest (BRF) and the weighted random forest (WRF) (Chen et al., 2004). BRF copies a small number of samples for the minority categories and randomly chooses the same number of samples for the majority categories. WRF weighs the samples of different categories to reduce the differences in sample numbers. However, BRF needs to identify the number of copied samples adaptively, which changes the original distribution of samples. Meanwhile, WRF requires a weight to be specified for each category, and how to specify these weights is still unresolved. Therefore, a new voting rule, the voting-distribution ranked rule, is proposed to replace the simple majority voting rule.

Suppose there are $N$ decision trees and $K$ categories, and each tree has one vote for a sample $o$. Then the votes of the $N$ trees can be represented as a distribution $\mathrm{Vote}(o) = (n_1, n_2, \ldots, n_K)$ with $\sum_{i=1}^{K} n_i = N$. The simple majority vote rule (Kontschieder et al., 2014) assigns a sample to the most popular category. The vote distributions of out-of-bag (OOB) samples are significant in discovering confused categories. For example, let $n_1$ and $n_2$ be the first and second maximum votes; if they are close, unclassified samples tend to be misclassified. Due to the influence of imbalanced samples, it is easier for majority categories than for minority categories to obtain more votes. Once the votes of the majority categories exceed those they deserve, misclassification occurs. Fortunately, imbalanced samples do not hide the vote distribution over categories, even though they affect the number of votes for each category. Let $p_i = n_i / N$ $(i = 1, 2, \ldots, K)$ be the probability of the $i$th category; then the vote distribution of an unclassified sample $o$ is defined as $\mathrm{prob}(o) = (p_1, p_2, \ldots, p_K)$. For each OOB sample, a vote distribution can be obtained; as a result, multiple vote distributions can be obtained for each category, since there are many OOB samples for each category. The reliable distributions of each category (those ranked in the top 5% of the probabilistic distributions) are averaged into a representative one. Accordingly, for an unclassified sample, its vote distribution is compared with the representative distributions of the $K$ categories. The sample is then assigned to the category with the largest similarity, i.e. the smallest distance between the two distributions (Eq. (4)).

$$\mathrm{Class}(o) = \arg\min_{i} \sqrt{\sum_{j} \left(\tilde{p}_j - \tilde{p}_{ij}\right)^2} \quad (i, j = 1, 2, \ldots, K) \qquad (4)$$

where $\tilde{p}_i$ refers to the representative distribution of category $i$ (with components $\tilde{p}_{ij}$) and $\tilde{p}$ the vote distribution of unclassified sample $o$ (with components $\tilde{p}_j$).

3.4.3. Evaluating feature importance: a Gini coefficient descent approach

To rank feature importance per category, a Gini coefficient descent approach for each category is proposed. Two existing methods have been used to rank feature importance: permutation importance and Gini importance (Calle and Urrea, 2011). The first method randomly permutes the values of feature $v_i$ in all samples, and then classifies OOB samples with the trained RF. The average decrease in accuracy caused by the feature permutation is regarded as the importance. The Gini importance computes the information gain $\Delta\mathrm{Gini}$ of branch nodes in each tree caused by feature $v_i$, and the total $\Delta\mathrm{Gini}$ over all trees is considered as the feature importance. Feature permutation measures the importance for each feature


independently, while Gini importance evaluates the importance of each feature using the information on all features in all decision trees. Therefore, a path tracing approach is presented to evaluate feature importance per category based on Gini coefficient descent. To estimate the importance of feature $v_i$ to category $c_j$, the leaf nodes labeled as $c_j$ in all trees are first chosen, and the paths from these leaves to the branch nodes that apply $v_i$ to split samples are found. Let $\Delta\mathrm{Gini}_{v_i}(c_j)$ be the information gain of one path from a branch node with $v_i$ to a leaf labeled as $c_j$. Then, the sum of the information gain over all such paths is defined as the importance of $v_i$ to category $c_j$ (Eq. (5)). Similarly, the importance of every feature to every category can be obtained.

$$VI_G(v_i, c_j) = \sum \Delta\mathrm{Gini}_{v_i}(c_j) \qquad (5)$$
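The path tracing idea of Eq. (5) can be illustrated with a minimal Python sketch on a hand-built tree. The `Node` class, the feature names and the toy counts are hypothetical and are not taken from the study. The sketch also assumes one particular reading of "the information gain of one path": for every leaf labeled $c_j$, each branch node on the path to that leaf that splits on $v_i$ contributes its weighted Gini decrease. The text does not fix this detail, so the code should be read as one possible interpretation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def gini(counts: List[int]) -> float:
    """Gini coefficient of Eq. (3): 1 - sum_i p_i^2."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


@dataclass
class Node:
    """Hypothetical tree node; `counts` holds samples per category at the node."""
    counts: List[int]
    feature: Optional[str] = None  # splitting feature, None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def label(self) -> int:
        # A leaf is labeled with its most frequent category.
        return max(range(len(self.counts)), key=self.counts.__getitem__)


def gini_decrease(node: Node) -> float:
    """Weighted Gini decrease produced by the split at a branch node."""
    n = sum(node.counts)
    n_left, n_right = sum(node.left.counts), sum(node.right.counts)
    return (gini(node.counts)
            - (n_left / n) * gini(node.left.counts)
            - (n_right / n) * gini(node.right.counts))


def per_category_importance(trees: List[Node]) -> Dict[Tuple[str, int], float]:
    """Accumulate Gini decreases per (feature, category) along root-to-leaf paths."""
    importance: Dict[Tuple[str, int], float] = {}

    def walk(node: Node, path: List[Node]) -> None:
        if node.feature is None:
            # Assumption: every branch node on the path is credited to the leaf label.
            for branch in path:
                key = (branch.feature, node.label)
                importance[key] = importance.get(key, 0.0) + gini_decrease(branch)
            return
        walk(node.left, path + [node])
        walk(node.right, path + [node])

    for root in trees:
        walk(root, [])
    return importance


# Toy tree with three categories: "area" isolates category 0,
# then "texture" separates categories 1 and 2.
leaf_a = Node(counts=[4, 0, 0])
leaf_b = Node(counts=[0, 2, 0])
leaf_c = Node(counts=[0, 0, 2])
inner = Node(counts=[0, 2, 2], feature="texture", left=leaf_b, right=leaf_c)
root = Node(counts=[4, 2, 2], feature="area", left=leaf_a, right=inner)

importance = per_category_importance([root])
for (feature, category), value in sorted(importance.items()):
    print(f"VI_G({feature}, c{category}) = {value:.3f}")
```

In this toy case "texture" receives no score for category 0, because no path to a leaf of that category passes through a "texture" split, which is the per-category behaviour the approach aims at.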

4. Case studies

To verify the feasibility of the presented approach, a series of experiments is conducted, including sample selection, semantic classification with the traditional RF, semantic classification with the improved RF for handling imbalanced samples, and evaluation of feature importance.

4.1. Study area and used data

Since urban buildings vary notably in size, shape, structure and spectrum, the experiments focus on them to test the presented method. The study area is located in the Haidian and Xicheng districts of Beijing (Fig. 7), which belong to the city expansion area and host many high-tech companies, cultural and educational institutions, commercial and service businesses, and research institutions. In addition, due to rapid urbanization in recent years, the area has a large number of informal settlements and developing zones. Therefore, it is important to analyze the semantic categories of buildings in this area.

For semantic classification of buildings, both GIS and Quickbird data are used:

(1) Quickbird image data. The panchromatic band with a resolution of 0.61 m and four multispectral bands with a resolution of 2.44 m are fused to produce four-band data with a resolution of 0.61 m. The spectral, geometric, and texture features are extracted from the Quickbird image.

(2) GIS data of buildings. To obtain a single image object for each building, the contours in the GIS data were used.

The study area covers 38.7 km² and contains 8831 buildings. The largest contour, about 96,462 m², represents an LS shantytown, while the smallest contour is only 70 m² and refers to an AU building. Before semantic classification and sample selection, two-level segmentation was conducted and 307 image features were computed for each image object. In total, 8831 image objects and 15,258 sub-objects were obtained. The largest object has 118 sub-objects while the smallest object has only one sub-object.

Fig. 7. Study area and Quickbird data.

4.2. Selected samples with the semi-supervised method

The presented semi-supervised method (Section 3.3) was used to choose 2747 unbiased samples from the 8831 buildings in the study area (Fig. 8). As shown in Fig. 8, the selected samples of each category are distributed sparsely and evenly over the whole study area, instead of being clustered together. Without the assistance of the presented semi-supervised method, it would be a heavy burden for users to choose so many samples from such a large number of candidates while keeping them distributed over the whole study area.

Fig. 8. The selected samples and unclassified buildings.

4.3. Results of parameter sensitivity analysis

Two types of parameters are used in this study: those defining image features and those of the RF classifier. For the first kind of parameters, multiple features are defined using different parameters, and thus 307 features are obtained for classification. Therefore, only the parameter sensitivity analysis of the RF classifier is provided. The overall accuracy of the RF classifier is mainly determined by the number of decision trees, as the performance of RF is insensitive to the other parameters (Cutler et al., 2007). For each decision tree, about two-thirds of the samples are chosen for training, and the remaining ones are used as OOB samples to evaluate the accuracy. Table 3 reports the overall accuracies of RF classifiers with different numbers of decision trees. The accuracies tend to increase with the number of decision trees and to become stable when the number of trees is larger than 200. Accordingly, the RF classifier with 200 decision trees is used to classify buildings.

Table 3
The overall accuracies of RF classifiers with different numbers of trees.

Number of trees                       50      100     150     200     250     300
The simple majority voting rule       0.7117  0.7059  0.7011  0.7175  0.7135  0.7084
The voting-distribution ranked rule   0.6727  0.7353  0.7561  0.7750  0.7521  0.7685

4.4. Classification results with the simple majority vote rule

An RF classifier with 200 decision trees is trained using the chosen 2747 samples, and the 6084 unclassified buildings are classified using the trained RF. For each OOB sample, each decision tree has one vote for classification. The final category is the one with the most votes, and the confusion matrix for accuracy assessment is created based on the predicted and existing categories of OOB samples (Table 4). The overall accuracy is 71.50%, and the overall kappa coefficient is 0.59. From Table 4, the overall accuracy is greatly reduced by the misclassification of 303 high-story apartments and 35 commercial buildings. The possible reasons are: (1) the samples are unevenly distributed over categories, and (2) classification difficulty varies considerably across categories.

Among the 2747 samples, MS apartments greatly outnumber buildings of other categories, accounting for 39.06% of total samples, while CM buildings have the smallest number of samples, taking up only 1.27%. The imbalanced samples are in proportion to the real distribution of buildings in the physical world, but they can greatly affect the classification accuracy. For example, when choosing a feature to partition samples at a branch node, the partition stops if there are only a few samples at that node. Therefore, if a category has too few samples to be distinguished from other categories, it tends to be overlooked.

Furthermore, the feature differences between categories also affect the classification results. According to the understanding of building semantics and the object features of each category, it is relatively easy to classify LS shantytowns and AU buildings due to their distinctive area features, as well as MS and HR apartments due to their distinctive geometric shapes. However, it is relatively difficult to distinguish CM and ID buildings due to their confusing and diverse features, leading to a low classification accuracy for the two categories.
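The statement in Section 4.3 that about two-thirds of the samples train each tree, with the rest serving as OOB samples, can be checked with a short Python sketch. It assumes standard bagging, i.e. each tree draws as many samples as there are labeled samples, with replacement; the study does not spell out this detail. Under that assumption the expected in-bag share is $1 - (1 - 1/n)^n$, which approaches $1 - e^{-1} \approx 63.2\%$. The function names are illustrative only.

```python
import random


def in_bag_fraction(n: int) -> float:
    """Expected share of distinct samples in a bootstrap draw of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulate_oob_share(n: int, seed: int = 0) -> float:
    """Draw one bootstrap sample of size n and return the out-of-bag share."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}  # indices picked at least once
    return 1.0 - len(drawn) / n


n_samples = 2747  # number of labeled samples reported in Section 4.2
expected_in_bag = in_bag_fraction(n_samples)
simulated_oob = simulate_oob_share(n_samples)

print(f"expected in-bag share: {expected_in_bag:.4f}")
print(f"simulated OOB share:   {simulated_oob:.4f}")
```

With 200 trees, each sample is therefore expected to be out of bag for roughly a third of the trees, which is what makes the OOB vote distributions used in Section 3.4.2 available for every labeled sample.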


Table 4
Confusion matrix and overall accuracy of the original RF.

                 Reference
Category code    0      1      2      3      4      5      6      Total   Producer's accuracy (%)   User's accuracy (%)
0                159    22     0      18     0      1      0      200     83.68                     79.50
1                1      1050   0      21     0      0      1      1073    62.87                     97.86
2                0      255    0      4      0      0      4      303     –                         0.00
3                8      99     0      322    0      1      0      430     72.20                     74.88
4                9      4      0      21     0      1      0      35      –                         0.00
5                13     20     0      52     0      12     0      97      80.00                     12.37
6                0      180    0      8      0      0      421    609     98.83                     69.13
Total            190    1670   0      446    0      15     426    2747

Overall accuracy = 71.50%. Overall kappa statistics = 0.59.
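The two summary statistics of Table 4 can be reproduced from the diagonal and the marginal totals alone. The following Python sketch uses the standard definitions of overall accuracy and Cohen's kappa; the function name is illustrative, and the numbers are copied from the table.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(diagonal: Sequence[int],
                       row_totals: Sequence[int],
                       col_totals: Sequence[int]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a confusion matrix summary."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same count")
    observed = sum(diagonal) / n  # share of correctly classified samples
    # Agreement expected by chance, from the products of the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Values copied from Table 4 (category codes 0 to 6).
diagonal = [159, 1050, 0, 322, 0, 12, 421]
row_totals = [200, 1073, 303, 430, 35, 97, 609]
col_totals = [190, 1670, 0, 446, 0, 15, 426]

overall_accuracy, kappa = accuracy_and_kappa(diagonal, row_totals, col_totals)
print(f"overall accuracy = {100 * overall_accuracy:.2f}%")  # 71.50%
print(f"kappa = {kappa:.2f}")                               # 0.59
```

The large gap between accuracy and kappa reflects how much of the agreement is explained by the dominant category 1 column, which alone accounts for 1670 of the 2747 assignments.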

Fig. 9. The distributions of voting proportions of OOB samples. The colored vertical segments represent the correct results, while the white ones represent the wrong results. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4.5. Classification results with the voting-distribution ranked rule

The imbalanced samples result from the real distribution of buildings in each category, and they lead to a certain bias when identifying the categories using the simple majority voting rule. Fig. 9 illustrates the distribution of voting results for the 2747 samples, where the x-axis denotes all OOB samples and the y-axis the probability distribution of the voting results. Each vertical line represents the votes for the categories assigned to a sample, with the height of each colored segment being the ratio of the votes for one category to the votes for all categories. Take the region in the circle as an example: the piecewise vertical segments represent the vote distribution of 1670 MS apartments, and only 1050 are correctly classified (Table 4) while the other 620 are misclassified.

The minority categories will probably be misclassified. Due to the small number of samples, MS apartments, as well as CM and AD buildings, are often confused with HR apartments. This confusion makes it difficult to correctly classify these buildings by the simple majority voting rule. The voting-distribution ranked rule presented in Section 3.4.2 is used to reclassify unclassified buildings, and the results are reported in Table 5. The overall accuracy increases to 79.54%, and the kappa coefficient increases remarkably to 0.72, demonstrating that the new vote rule can improve the classification accuracy: the misclassified MS and HR apartments are greatly reduced.

Fig. 10 illustrates the improved classification results using the voting-distribution ranked rule. The spatial distributions of different categories are consistent with the actual situation. LS shantytowns are mainly located in the periphery of the city, HR apartments in the core area, and MS apartments are distributed uniformly and locally clustered. LS shantytowns, MS apartments, HR apartments and AU buildings are classified with high accuracies, while AD, CM and ID buildings have relatively low accuracies, mainly because of the complexity of the buildings themselves.

Table 5
Confusion matrix and overall accuracy of the improved voting rule.

                 Reference
Category code    0      1      2      3      4      5      6      Total   Producer's accuracy (%)   User's accuracy (%)
0                158    21     0      17     2      2      0      200     89.27                     79.00
1                1      1041   7      23     0      0      1      1073    72.44                     97.02
2                0      134    162    3      0      0      4      303     86.17                     53.47
3                5      86     3      327    3      6      0      430     77.12                     76.05
4                7      0      0      3      22     3      0      35      73.33                     62.86
5                6      15     3      44     3      26     0      97      70.27                     26.80
6                0      140    13     7      0      0      449    609     98.90                     73.73
Total            177    1437   188    424    30     37     454    2747

Overall accuracy = 79.54%. Overall kappa statistics = 0.72.

Fig. 10. The classification results with the improved voting rule.
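To make the voting-distribution ranked rule of Section 3.4.2 more concrete, a minimal Python sketch is given here. It is not the implementation used in the study. All function names and the toy numbers are hypothetical, and one detail is an assumption: the "reliable" OOB distributions of a category are taken to be those with the highest vote share for the true category, since the text only says that they rank in the top 5%.

```python
import math
from typing import Dict, List, Sequence


def vote_distribution(votes: Sequence[int], n_categories: int) -> List[float]:
    """Turn the per-tree votes for one sample into prob(o) = (p_1, ..., p_K)."""
    counts = [0] * n_categories
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]


def representative_distributions(oob_dists: Sequence[Sequence[float]],
                                 oob_labels: Sequence[int],
                                 n_categories: int,
                                 top_share: float = 0.05) -> Dict[int, List[float]]:
    """Average the most reliable OOB vote distributions of each category."""
    reps: Dict[int, List[float]] = {}
    for k in range(n_categories):
        own = [list(d) for d, y in zip(oob_dists, oob_labels) if y == k]
        if not own:
            continue
        # Assumption: reliability is the vote share given to the true category.
        own.sort(key=lambda d: d[k], reverse=True)
        keep = own[:max(1, math.ceil(top_share * len(own)))]
        reps[k] = [sum(col) / len(keep) for col in zip(*keep)]
    return reps


def classify(dist: Sequence[float], reps: Dict[int, List[float]]) -> int:
    """Eq. (4): choose the category whose representative distribution is nearest."""
    return min(reps, key=lambda k: math.dist(dist, reps[k]))


# Toy data: category 0 is the majority, category 1 the minority.
oob_dists = [[0.90, 0.10], [0.80, 0.20], [0.60, 0.40], [0.55, 0.45]]
oob_labels = [0, 0, 1, 1]
reps = representative_distributions(oob_dists, oob_labels, n_categories=2)

votes = [0] * 13 + [1] * 7  # 20 trees, 13 vote for the majority category
dist = vote_distribution(votes, n_categories=2)
majority = max(range(2), key=dist.__getitem__)
predicted = classify(dist, reps)
print("simple majority vote:", majority)
print("voting-distribution ranked rule:", predicted)
```

In the toy data even correctly labeled minority samples receive a majority of votes for category 0, so the simple majority rule can never return category 1. The ranked rule instead compares the shape of the vote distribution with the representative distributions and assigns the new sample to the minority category.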

4.6. Evaluation of feature importance

The image features used in this study are highly correlated, high-dimensional and heterogeneous; therefore, it is necessary to evaluate the contributions of these features to classification. Based on the trained RF, the feature importance analysis method (Gini descent) in Section 3.4.3 was used to evaluate the importance of the 307 features. As shown in Table 6, texture and geometric features contribute greatly to most categories, while spectral ones perform poorly because they are easily confused. LS shantytowns are much larger than other buildings, and thus geometric features (e.g. area and perimeter) are significant. In addition, AU buildings are usually relatively small and easily distinguished by geometric features. Geometric features also play important roles in classifying MS apartments and ID and CM buildings because of the buildings' unique shapes and areas; however, HR apartments often have complex roofs and structures, showing unique characteristics of texture, and thus can be identified by texture features. Composed of diverse sub-objects, AD buildings vary greatly in materials and styles; thus, they can be classified by combining geometric and texture features. In other words, geometric and texture features are more important to classification than spectral ones.

Table 6
The feature importance scores to seven categories.

                    Feature contribution rank (%)
Semantic category   Spectrum   Geometry   Texture
LS shantytowns      15.36      31.80      52.84
MS apartments       12.30      45.14      42.56
HR apartments       25.94      15.37      58.69
AD buildings        10.71      40.27      49.02
CM buildings        3.39       55.00      41.62
ID buildings        15.27      38.28      46.45
AU buildings        15.80      63.07      21.13

5. Discussion

5.1. The effectiveness of the presented approach

Since existing studies mainly have focused on completely different aspects and used different data and methods (Kohli et al., 2012;


Graesser et al., 2012; Belgiu et al., 2014; Kuffer et al., 2014; Lu et al., 2014), it is hard to compare this study with them directly. Accordingly, the differences between them are summarized as follows.

First, the seven categories adopted in this study are much finer than those of existing studies, which focused on three categories (Lu et al., 2014; Belgiu et al., 2014) or special residential zones, such as formal/informal areas, unplanned settlements and slums (Graesser et al., 2012; Kohli et al., 2012; Kuffer et al., 2014). The semantic categories in existing studies are too broad, as they cannot effectively recognize residential buildings, industrial and commercial buildings, and their subtypes. Therefore, our work is better suited than existing studies to environmental and social studies.

Second, different strategies for computing image features are used. The two-level segmentation strategy can obtain single image objects for diverse categories of buildings regardless of their sizes, structures and shapes. Thus, it can obtain object features (especially geometric ones) at the first segmentation level to exactly characterize buildings as wholes, and the features of sub-objects at the second level to distinguish buildings with complex structures from those with simple ones. As a result, 307 features were used in our study, far more than the fewer than 50 features used in existing studies (Lu et al., 2014; Belgiu et al., 2014). Our strategy outperforms multi-scale image segmentation, which cannot produce single objects and appropriate features for classification. Therefore, geometric features can help to improve the recognition of some categories of buildings. For example, geometric features are important to LS shantytowns and AU buildings, and shapes and areas to MS apartments and ID and CM buildings. However, these features are not used in existing studies. Therefore, although LiDAR data were not used in this study, the overall accuracy is still higher than that of existing studies in urban areas.

Third, a large number of samples and a larger study area were used in this study: 2747 samples were chosen to train the RF classifier, 6084 buildings were classified, and the study area covers 38.7 km², which is large enough to illustrate the effectiveness of the presented approach. In conclusion, considering that our semantic categories are finer and the inter-category variations are larger than those of existing studies (Lu et al., 2014; Belgiu et al., 2014), the overall accuracy of 79.54% is acceptable, and the presented approach is effective.

5.2. The scalability of the presented approach

The improved RF classifier in this study uses spectral, texture, shape and distribution features to classify buildings in urban areas, and the image features are derived from QB imagery and GIS data. If the categories of buildings are distinguishable by the four types of image features, the RF classifier will work effectively. Therefore, when QB imagery (or other VHR imagery with similar resolution, such as IKONOS and GeoEye) and GIS data (OpenStreetMap) are available in other places to define those image features, the presented approach needs no modification. However, if the data used are different, some modifications are required. For example, if LiDAR data or WorldView-2 data are available, more image features can be incorporated, while if only an RGB fusion image is available, fewer image features can be defined. Therefore, if a different VHR image is used, only the image features need to be modified.

5.3. The limitations of the presented approach

The presented approach also has some limitations. First, GIS data, as prior restrictions for image segmentation, help to greatly reduce the errors caused by image segmentation. However,

locational discrepancies exist between GIS data and VHR images, which may affect the classification accuracy to a certain extent. Therefore, accurate geo-registration is required to reduce the discrepancies before the two data sets are used. Second, LiDAR data have been proved to be useful in classifying buildings (Sohn and Dowman, 2007; Awrangjeb et al., 2010, 2013), but the data were unavailable and not adopted in this study. Since LiDAR data can provide height information on buildings, they can help characterize the structure of complex buildings and define new image features related to building heights, leading to a higher classification accuracy. Third, buildings in the same category and neighboring in space often tend to have similar shapes. Thus, shape classification or clustering techniques (Belongie et al., 2002) can be incorporated to improve classification accuracy. Fourth, a large number of unbiased samples are required for semantic classification of buildings. In this study, a semi-supervised approach combining spatial proximity and intra-cluster similarity was presented to improve the efficiency of choosing samples, while the labels of samples were still identified manually by users. Accordingly, the efficiency of choosing unbiased samples can be further improved by automatically identifying the labels of samples (Jirka et al., 2014).

6. Conclusion and future work

This study presents a complete semantic category system, feature extraction, and improved classification approach for semantic classification of urban buildings. Four scientific tasks are resolved. Initially, GIS data were used to constrain the image segmentation for producing a single image object for each building. Then, at the second level, each image object was further split into sub-objects to measure the internal heterogeneity of buildings. Next, the ISODATA algorithm was used to group image objects into clusters by using extracted features, and a large number of unbiased samples were chosen by considering spatial proximity and intra-cluster similarity. The chosen samples reflect the real distributions of buildings in the physical world. Subsequently, the voting-distribution ranked rule was presented to improve the RF classifier by reducing the classification error caused by imbalanced samples. Finally, a path-tracing approach was presented to evaluate the importance of features for classifying buildings. The classification results of the improved and original RF were compared, and the accuracy increased from 71.50% to 79.54%, demonstrating the effectiveness of the improved approaches. Moreover, the results are highly in accordance with human recognition. Furthermore, the improved approaches can also be applied to other types of VHR images.

Nevertheless, there are still some limitations in this study. Although GIS data were introduced as prior restrictions for image segmentation to reduce errors caused by image segmentation, the location discrepancies between GIS and image data and the influences of shadows and occlusions on feature extraction can affect the classification accuracy to a certain extent. The finer categories of buildings obtained in this study are helpful in estimating urban population and heating consumption, which remains to be proved quantitatively. Therefore, these issues need to be addressed in the future.

Acknowledgements

The work presented in this paper was supported by the National Natural Science Foundation of China (No. 41471315).

References

Awrangjeb, M., Ravanbakhsh, M., Fraser, C.S., 2010. Automatic detection of residential buildings using LIDAR data and multispectral imagery. ISPRS J. Photogramm. Remote Sens. 65, 457–467.

Awrangjeb, M., Zhang, C., Fraser, C.S., 2013. Automatic extraction of building roofs using LIDAR data and multispectral imagery. ISPRS J. Photogramm. Remote Sens. 83, 1–18.

Baatz, M., Schape, A., 2000. Multiresolution segmentation: An optimization approach for high quality multi-scale image segmentation. J. Photogr. Sci. Remote Sens. 58 (3–4), 12–23.

Belgiu, M., Tomljenovic, I., Lampoltshammer, et al., 2014. Ontology-based classification of building types detected from airborne laser scanning data. Remote Sens. 6, 1347–1366.

Belongie, S., Malik, J., Puzicha, J., 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24 (4), 509–522.

Breiman, L., 2000. Randomizing outputs to increase prediction accuracy. Mach. Learn. 40 (3), 229–242.

Cutler, D.R., Edwards Jr., T.C., Beard, K.H., et al., 2007. Random forests for classification in ecology. Ecology 88 (11), 2783–2792.

Calle, M., Urrea, V., 2011. Letter to the editor: stability of random forest importance measures. Brief Bioinform. 12, 86–89.

Chen, C., Liaw, A., Breiman, L., 2004. Using random forest to learn imbalanced data. Technical Report of Department of Statistics, UC, Berkeley.

Drăguţ, L., Csillik, O., Eisank, C., Tiede, D., 2014. Automated parameterisation for multi-scale image segmentation on multiple layers. ISPRS J. Photogramm. Remote Sens. 88, 119–127.

Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Stat. 7 (1), 1–26.

Graesser, J., Cheriyadat, A., Vatsavai, R.R., Chandola, V., Long, J., Bright, E., 2012. Image based characterization of formal and informal neighborhoods in an urban landscape. IEEE J. Select. Top. Appl. Earth Observat. Remote Sens. 5 (4), 1164–1176.

Geiß, C., Taubenböck, H., Wurm, M., Esch, T., Nast, M., Schillings, C., Blaschke, T., 2011. Remote sensing-based characterization of settlement structures for assessing local potential of district heat. Remote Sens. 3 (7), 1447–1471.

Honeycutt, C.E., Plotnick, R., 2008. Image analysis techniques and gray-level co-occurrence matrices (GLCM) for calculating bioturbation indices and characterizing biogenic sedimentary structures. Comput. Geosci. 34 (11), 1461–1472.

Huertas, A., Nevatia, R., 1988. Detecting buildings in aerial images. Comput. Vision, Graph., Image Process. 41 (2), 131–152.

Jirka, V., Feder, M., Pavlovicova, J., Oravec, M., 2014. Face recognition system with automatic training samples selection using self-organizing map. In: The 56th International Symposium on ELMAR, pp. 23–26.

Kim, T., Muller, J.-P., 1999. Development of a graph-based approach for building detection. Image Vis. Comput. 17 (1), 3–14.

Kohli, D., Sliuzas, R., Kerle, N., Stein, A., 2012. An ontology of slums for image-based classification. Comput. Environ. Urban Syst. 36, 154–163.

Kontschieder, P., Bulo, S.R., Pelillo, M., Bischof, H., 2014. Structured labels in random forests for semantic labelling and object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36 (10), 2104–2116.

Kuffer, M., Barros, J., Sliuzas, R.V., 2014. The development of a morphological unplanned settlement index using very-high-resolution imagery. Comput. Environ. Urban Syst. 48, 138–152.

Lin, C., Nevatia, R., 1998. Building detection and description from a single intensity image. Comput. Vis. Image Und. 72 (2), 101–121.

Lu, D., Weng, Q., Li, G., 2006. Residential population estimation using a remote sensing derived impervious surface approach. Int. J. Remote Sens. 27, 3553–3570.

Lu, Z., Im, J., Rhee, J., Hodgson, M., 2014. Building type classification using spatial and landscape attributes derived from LIDAR remote sensing data. Landscape Urban Plan. 130, 134–148.

Memarsadeghi, N., Mount, D.M., Netanyahu, N.S., Le Moigne, J., 2007. A fast implementation of the ISODATA clustering algorithm. Int. J. Comput. Geom. Ap. 17, 71–103.

Myint, S.W. et al., 2011. Per-pixel vs. object-based classification of urban land cover extraction using high spatial resolution imagery. Remote Sens. Environ. 115 (5), 1145–1161.

Nevatia, R., Babu, K.R., 1980. Linear feature extraction and description. Comput. Graph. Image Process. 13 (3), 257–269.

Nevatia, R., Lin, C., Huertas, A., 1997. A system for building detection from aerial images. In: Automatic Extraction of Man-Made Objects From Aerial and Space Images (II). Birkhäuser Basel, pp. 77–86.

Ok, A.O., 2013. Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote Sens. 86, 21–40.

Pal, M., 2005. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26 (1), 217–222.

Paul, S. et al., 2001. Census from heaven: an estimate of the global human population using night-time satellite imagery. Int. J. Remote Sens. 22 (16), 3061–3076.

Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 1 (1), 81–106.

Sirmacek, B., Unsalan, C., 2009. Urban-area and building detection using SIFT keypoints and graph theory. IEEE Trans. Geosci. Remote Sens. 47, 1156–1167.

Sohn, G., Dowman, I., 2007. Data fusion of high-resolution satellite imagery and LiDAR data for automatic building extraction. ISPRS J. Photogramm. Remote Sens. 62, 43–63.

Trimble Germany, 2011. eCognition Developer 8.7 Reference Book. Trimble Germany, Munich, Germany, pp. 262–272.

Wu, S., Qiu, X., Wang, L., 2005. Population estimation methods in GIS and remote sensing: a review. GI Sci. Remote Sens. 42 (1), 80–96.
