A random forest classifier based on pixel comparison features for urban LiDAR data


ISPRS Journal of Photogrammetry and Remote Sensing 148 (2019) 75–86


Chisheng Wang a, Qiqi Shu a, Xinyu Wang a, Bo Guo b,⁎, Peng Liu c, Qingquan Li a

a Key Laboratory for Geo-Environmental Monitoring of Coastal Zone of the National Administration of Surveying, Mapping and GeoInformation, School of Architecture & Urban Planning, Shenzhen University, Shenzhen, Guangdong, China
b School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou, Guangdong, China
c The Department of Earth and Space Sciences, Southern University of Science and Technology, Shenzhen, Guangdong, China


Keywords: Random forest; Pixel comparison; LiDAR classification; Urban

ABSTRACT

The outstanding accuracy and spatial resolution of airborne light detection and ranging (LiDAR) systems allow for very detailed urban monitoring. Classification is a crucial step in LiDAR data processing, as many applications, e.g., 3D city modeling, building extraction, and digital elevation model (DEM) generation, rely on classified results. In this study, we present a novel LiDAR classification approach that uses simple pixel comparison features instead of the manually designed features used in many previous studies. The proposed features are generated by computing the height difference between two randomly selected neighboring pixels. In this way, the feature design does not require prior knowledge or human effort. More importantly, the features encode contextual information and are extremely quick to compute. We apply a random forest classifier to these features and a majority analysis postprocessing step to refine the classification results. The experiments undertaken in this study achieved an overall accuracy of 87.2%, which can be considered good given that only height information from the LiDAR data was used. The results were better than those obtained by replacing the proposed features with five widely accepted man-made features. We conducted algorithm parameter setting tests and an importance analysis to explore how the algorithm works. We found that pixel pairs oriented along the object structure and separated by a distance close to the object size generate more discriminative pixel comparison features. Comparison with other benchmark results shows that this algorithm can approach the performance of state-of-the-art deep learning algorithms and exceed them in computational efficiency. We conclude that the proposed algorithm has high potential for urban LiDAR classification.

1. Introduction

Most people in the world live in cities, and the urban population continues to grow. The urban population percentage is expected to increase to 68% by 2050, as reported by the United Nations (United Nations, 2018). Information about the urban setting is very important for understanding how cities work and for supporting local policy making (Phinn et al., 2002). Fortunately, the increasing availability of remote sensing data provides us with the capability to identify individual objects of the urban fabric, benefiting many urban application studies, such as the detection of urban deprivation hot spots, GPS/airport signal obstruction modeling, urban growth analysis, house value estimation, urban green analysis, and urban social vulnerability assessment (Patino and Duque, 2013; Yan et al., 2015). Airborne light detection and ranging (LiDAR) is a particularly useful technique for collecting elevation data in 3D point clouds in urban areas.

The outstanding accuracy and spatial resolution make it possible to precisely measure many small objects, such as cars, trees, and buildings, and to generate 3D city models, which provide important information for the detailed monitoring and study of urban settings. Classification is a crucial step in LiDAR data processing, in which each LiDAR point or pixel is assigned to a semantic object class. The development of classification methods has been an important issue in recent decades. Due to the complex environment in urban areas, the classification of urban LiDAR point clouds is a very challenging task (Li et al., 2017). To date, a number of methods have been developed for land-cover classification using LiDAR data. These methods differ from each other mainly in the selection of features, the data format, and the classifier. One of the most crucial factors in determining LiDAR classification accuracy is the feature selection (Wang et al., 2015). The commonly used features can be grouped into height features, intensity features, multiple return features, texture features, and waveform features.

Corresponding author. E-mail address: [email protected] (B. Guo).

https://doi.org/10.1016/j.isprsjprs.2018.12.009 Received 20 July 2018; Received in revised form 4 December 2018; Accepted 11 December 2018 0924-2716/ © 2018 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.


Fig. 1. Framework of the proposed classification algorithm using LiDAR data.

One unique feature of LiDAR data is the precise and dense height information. This is important information for identifying land-cover classes (e.g., trees and shrubs in urban areas) which are difficult to separate in 2D remote sensing images. A significant classification accuracy improvement of ∼5% has been reported in many studies by the use of height information (Hartfield et al., 2011; Priestnall et al., 2000). The radiometric feature (in terms of intensity) is not as notable as the height information, but it can be quite helpful for separating certain objects, e.g., roads and low vegetation, which share similar heights (Charaniya et al., 2004). However, some drawbacks, such as the large variance (Yan et al., 2012), misalignment in overlapping data (Yan and Shaker, 2014), and radiometric noise (Minh and Hien, 2011), exist and affect the performance of the radiometric feature in classification. The laser pulse of a LiDAR system can penetrate certain land objects (e.g., trees) and return with multiple echoes. Therefore, the multiple return features are sometimes adopted for discriminating permeable and nonpermeable objects (Bujan et al., 2012). To avoid inhomogeneous classification results in complex scenes, contextual information considering the neighborhood pixel data is also adopted. Typical contextual features are the gray-level co-occurrence matrix measures, including homogeneity, contrast, entropy, correlation, and so on (Im et al., 2008). The emergence of full-waveform LiDAR is a milestone in LiDAR technique development (Wang et al., 2015a), as it enables a record of the entire history of the backscattered signal. Some attempts have been made to extract features such as the waveform amplitude, number of echoes, and echo width from the waveform for urban land-cover classification (Alexander et al., 2010).

Other important issues in LiDAR classification are the data format and classifier selection. Raw LiDAR data are in the form of an irregularly distributed 3D point cloud. Classification can be implemented directly on the point cloud or on gridded image data interpolated from the point cloud data. Using the gridded data for classification is preferred by many studies, as it is a quite straightforward approach and many image processing methods can be directly applied (Clode, 2004). However, the use of the irregular discrete point data is sometimes required when very fine classification is demanded, because the details of 3D objects can be lost in the gridding process (Yastikli and Cetin, 2016). The classifiers can be divided into generative and discriminative classifiers. Classifiers of the first type focus on modeling the joint distribution of the input object data and labels (e.g., a Gaussian mixture model and the maximum likelihood method), and were widely adopted in early studies (Bartels and Wei, 2006; Riordan et al., 2003). More recent studies have attempted to use discriminative methods such as neural networks, support vector machines, random forests, conditional random fields, and AdaBoost (Mountrakis et al., 2011; Niemeyer et al., 2014; Wang et al., 2018). These methods model the decision boundary between the classes rather than the actual distribution of each class. As a result, the methods are more direct and need fewer training data.

To conclude, although much work has been done in LiDAR classification with regard to features, data format, and classifier selection or design, some shortcomings still exist in the existing algorithms and further effort is required. Firstly, most of the previous studies have used artificially selected features. According to the data-processing inequality, any feature descriptor may decrease the information contained in the original data. It is therefore difficult to obtain discriminative and robust features for accurate classification. Secondly, many features are still pixel- or point-based, and there is still a lack of contextual features that contain spatial dependency information. Such features can greatly benefit classification accuracy in complex urban settings. Thirdly, the high computational burden brought by the large size and complex structure of LiDAR data is a long-standing challenge in urban LiDAR classification. A high-performance computing algorithm is therefore required to meet the LiDAR processing requirements.

In this paper, we propose a random forest classifier using pixel comparison features for urban area classification. The pixel comparison features are based on a linear operation on the height map and can preserve the information contained in the original data. The randomized trees can handle thousands of features and are robust and quick. This allows us to automatically extract the classification rules based on high-dimensional pixel comparison features. Contextual information is implicitly encoded in these features, as the heights of neighboring pixels are adopted for the computation. The comparison operation does not require the computation of feature descriptors and is quick to implement. Motivated by these advantages, this work investigates the potential of pixel comparison features and examines the performance of the random forest classifier based on such features in an urban setting.

2. Methods

A general depiction of the proposed algorithm framework is shown in Fig. 1. As our purpose in this study was to test the potential of pixel comparison features using the random forest classifier, the LiDAR data format was not our main concern.


Gridded LiDAR data were preferred throughout this study for convenience. Further efforts could be made to extend the algorithm to discrete point cloud data. In this procedure, we interpolate the 3D point cloud to a gridded image and then extract the pixel comparison features from the nonground objects. These features are fed into a random forest classifier to generate a rough classification map. We apply the majority analysis step in the postprocessing to mitigate inhomogeneous classification errors and obtain the final classification map. The details of the main steps are described in the following sections.

2.1. LiDAR point preprocessing and gridding

First, we preprocess the original LiDAR points by removing the outliers caused by particles in the air, birds, low-flying aircraft, multipath errors, and system errors. The local outlier factor (LOF) is used to detect anomalous points that lie distinctly under or over the surface level. The LOF is based on the concept of local density, which is estimated by the reachability distance of the object to its k nearest neighbors. By comparing the local reachability density of an object to those of its neighbors, its LOF value can be obtained. An LOF value significantly larger than 1 indicates a sparse region; points with a substantially lower density than their neighbors are considered outliers. For details of the algorithm and formulas, refer to Breunig et al. (2000). Here, we divide the point cloud into many 10 m × 10 m patches to avoid the large computational burden of processing the whole map, and select k = 10 to calculate the LOF values of all points in a patch. Points with an LOF larger than 20 (a threshold set by experience) are regarded as outliers and removed from the point cloud.

We then rasterize the cleaned point cloud to convert the unstructured point data to gridded data. We use a triangulation-based linear interpolation method for the gridding process. First, we use Delaunay triangulation to connect all the neighboring laser points in 2D. Then, we search the triangles to find where the gridded points are located. Finally, linear interpolation of all the gridded points is applied within each triangle.
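As an illustration, a minimal Python sketch of this preprocessing step is given below. It assumes the point cloud is held as an (N, 3) NumPy array of x, y, z coordinates; scikit-learn's LocalOutlierFactor and SciPy's Delaunay-based griddata stand in for the authors' own implementation, and the 10 m × 10 m patching is omitted for brevity.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from scipy.interpolate import griddata

def remove_outliers(points, k=10, lof_threshold=20.0):
    """Drop points whose local outlier factor exceeds the threshold (k = 10, LOF > 20 as in the text)."""
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit(points)
    scores = -lof.negative_outlier_factor_   # convert to positive LOF values
    return points[scores < lof_threshold]

def rasterize(points, cell_size=0.3):
    """Triangulation-based linear interpolation of heights onto a regular grid."""
    xy, z = points[:, :2], points[:, 2]
    xi = np.arange(xy[:, 0].min(), xy[:, 0].max(), cell_size)
    yi = np.arange(xy[:, 1].min(), xy[:, 1].max(), cell_size)
    gx, gy = np.meshgrid(xi, yi)
    # griddata performs the Delaunay triangulation and linear interpolation internally
    return griddata(xy, z, (gx, gy), method='linear')
```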

2.2. Feature selection

We generate the pixel comparison features on the gridded LiDAR data (Lepetit and Fua, 2006; Shotton et al., 2013). For each pixel, we randomly select two nearby pixels and determine the height difference between them. The feature response can be computed as:

f_i(p) = H(p + Δp_1) − H(p + Δp_2)                (1)

where f_i(p) is the ith pixel comparison feature value for pixel p, H(⋅) is the height of the gridded LiDAR data, and Δp_1 and Δp_2 indicate randomly selected 2D offset vectors within an m × m neighboring box. The feature generation is depicted in Fig. 2. The black square in the figure indicates the current pixel (p) and the two red squares are the neighboring pixels (p + Δp_1 and p + Δp_2) for the height comparison. The neighboring pixels are randomly selected within a predefined window (7 × 7 in this case). The number of pixel pairs to be calculated is the feature number (N) used for the classification. These features are based on simple pixel comparison, and are therefore extremely fast to calculate. The window size and pixel pair number can be flexibly chosen, so these features have the potential to encode very detailed contextual information and can be easily deployed in different scenes. As these features only rely on the order of the digital surface model (DSM) heights between neighbors, they tend to be fairly invariant to monotonous background height changes in the digital elevation model (DEM). For complex scenes, it is therefore possible to capture the details of objects while not being affected by background signals. Unlike many other height-based features that require the calculation of a local plane (Chehata et al., 2009; Guo et al., 2011), these features avoid the errors propagated from the local plane fitting.

Fig. 2. Schematic diagram of pixel comparison features generation. The black square is the current pixel. The red squares show two randomly selected neighboring pixels for comparison. Two examples are shown in this figure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
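To make the feature generation concrete, a small sketch is given below. It assumes `dsm` is the gridded height image from Section 2.1 and draws the two offsets uniformly inside an m × m window, as in Fig. 2; the border handling (clipping offsets to the image) is an illustrative choice, since the paper does not specify its strategy.

```python
import numpy as np

def make_offsets(n_features, window, rng):
    """Draw n_features random pairs of 2D offsets within a (window x window) box."""
    half = window // 2
    return rng.integers(-half, half + 1, size=(n_features, 2, 2))

def pixel_comparison_features(dsm, offsets):
    """Return an (H, W, N) array with f_i(p) = H(p + dp1) - H(p + dp2), Eq. (1)."""
    h, w = dsm.shape
    rows, cols = np.mgrid[0:h, 0:w]
    feats = np.empty((h, w, len(offsets)), dtype=np.float32)
    for i, (dp1, dp2) in enumerate(offsets):
        r1 = np.clip(rows + dp1[0], 0, h - 1)
        c1 = np.clip(cols + dp1[1], 0, w - 1)
        r2 = np.clip(rows + dp2[0], 0, h - 1)
        c2 = np.clip(cols + dp2[1], 0, w - 1)
        feats[..., i] = dsm[r1, c1] - dsm[r2, c2]
    return feats

# defaults used later in Section 3.3: a 200 x 200 window and 100 features
offsets = make_offsets(n_features=100, window=200, rng=np.random.default_rng(0))
```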

2.3. Classification using the random forest classifier

The features selected in the above step, as well as the original height feature, are fed into the classifier to generate the classification results. A number of different classification algorithms, such as the k-nearest neighbor classifier, the support vector machine classifier, or neural network classifiers, could be considered at this point. Among the different classifiers, we found the random forest classifier to be eminently suitable because it naturally handles multiclass problems and multiple features. It is also robust and fast, while remaining reasonably easy to train. In the following, we give a brief introduction to the random forest model. For more details, refer to Breiman (2001).

The random forest model is an ensemble learning method that uses a combination of several tree predictors (Fig. 3). Each tree consists of split nodes and leaf nodes. Each split node contains a weak learner represented by feature parameters (θ) and a scalar threshold (τ). The split for an unclassified pixel p proceeds from the root node to the leaf nodes by repeatedly evaluating the weak learner:

h(p; θ_n, τ_n) = [f_{θ_n}(p) ≥ τ_n]                (2)

where n is the node number in the tree and [⋅] is a logical operator outputting true or false. If h(p; θ_n, τ_n) returns true, the pixel branches to the left; otherwise, it branches to the right. The training set for a particular tree is a bootstrap sample with replacement from the original training data. Approximately one-third of the data are left out of the sample; these are called the out-of-bag (OOB) data. The Gini impurity is adopted to decide the splitting criterion (θ_n, τ_n). The metric is applied to each candidate subset, and its value represents the subset's homogeneity: the smaller the Gini impurity, the better the split quality.

Fig. 3. Schematic diagram of the random forest classifier.

If we assume that a set of pixels has J classes and that the fraction of pixels labeled with class i in the set is represented by c_i, then the Gini impurity takes the form of:

I_G(c) = 1 − Σ_{i=1}^{J} c_i²                (3)
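For illustration, Eq. (3) can be evaluated as follows for one candidate node, assuming `labels` holds the classes of the training pixels reaching that node:

```python
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    c = counts / counts.sum()           # class fractions c_i
    return 1.0 - np.sum(c ** 2)         # I_G(c) = 1 - sum_i c_i^2

# e.g. two ground pixels, one tree and one car -> 1 - (0.5^2 + 0.25^2 + 0.25^2) = 0.625
print(gini_impurity(np.array(["ground", "ground", "tree", "car"])))
```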

A forest (a strong learner) is formed after combining all the trees (weak learners). The random forest classifier has two crucial parameters: the number of trees and the number of features randomly chosen at each split. The forest error rate depends on the correlation between any two trees and the strength of each individual tree in the forest. The forest error rate decreases as the correlation decreases or the strength of the individual trees increases. However, reducing the number of features at each split reduces both the correlation and the strength, and therefore has a complex influence on the classification accuracy, which is discussed below. To sum up, the main steps for random forest construction are as follows:

(1) Create multiple bootstrapped samples of the original training data. Each sample is the training set for growing one tree.
(2) Randomly select some features out of all the input features at each node, and split the node using the feature and threshold that minimize the Gini impurity.
(3) Grow the tree as large as possible without pruning, until the terminal nodes only contain one class or reach the minimum size.
(4) Repeat steps 1–3 to generate a certain number of trees.
(5) After a large number of trees has been generated, vote for the most popular class.

The random forest model also allows us to measure the feature importance, which allows the user to evaluate the significance of the proposed features. Two ways exist for measuring the importance. The first is computed by permuting the OOB data. For each tree, the error rate for the classification on the OOB portion of the data is recorded. The same is then done after permuting each predictor variable. The differences between the two are then averaged over all the trees and normalized by the standard deviation of the differences. The second approach is the total decrease in node impurity from splitting the node, as measured by the Gini index (Eq. (3)), averaged over all the trees. In this study, we used the first approach for measuring the feature importance.
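The steps above map directly onto off-the-shelf random forest implementations. The sketch below uses scikit-learn as a stand-in for the MATLAB random forest package referenced in Section 3.3; note that permutation_importance uses a held-out validation set rather than the per-tree OOB samples, so it only approximates the first importance measure described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def train_forest(X_train, y_train, n_trees=300, n_split_features=10):
    forest = RandomForestClassifier(
        n_estimators=n_trees,            # step (4): number of trees
        max_features=n_split_features,   # step (2): features tried at each split
        bootstrap=True,                  # step (1): bootstrap sample per tree
        min_samples_leaf=1,              # step (3): grow the trees as deep as possible
        oob_score=True,                  # keep the out-of-bag error estimate
        n_jobs=-1,
    )
    forest.fit(X_train, y_train)         # step (5): prediction is the majority vote over trees
    return forest

def feature_importance(forest, X_val, y_val):
    result = permutation_importance(forest, X_val, y_val, n_repeats=5, random_state=0)
    return result.importances_mean
```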

Fig. 4. Schematic diagram of the majority analysis, in which a noisy classification map is smoothed to a new classification map after running the majority analysis with a 3 × 3 window.

2.4. Postprocessing

If the forest were a perfect predictor, the classification map would be smooth and the boundaries between objects would be clear. In practice, the forest makes noisy predictions, so we apply majority analysis (MA) to refine the classified images. Given a kernel size, the center pixel in the kernel is replaced with the class value of the majority of the pixels in the kernel (Fig. 4). Larger kernel sizes produce more smoothing of the classification image. The center pixel weight is used to determine how many times the class of the center pixel is counted when determining which class is in the majority. The detailed steps are as follows. First, we define the kernel size and center pixel weight.

Fig. 5. Classification map obtained using the proposed method. (a) Preliminary results before the majority analysis. (b) Results after the majority analysis. (c) Ground truth. (d) Confusion matrix for the preliminary results. (e) Confusion matrix for the final results.

Then, we employ a moving window and count the number of pixels belonging to each class. Note that the central pixel is counted differently according to its weight. Finally, the central pixel is assigned to the majority class. This post-classification step is intended to reduce the "salt-and-pepper" noise and results in larger classification units adhering more to the human perception of land cover (Stuckens et al., 2000).
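A minimal sketch of the majority analysis is given below, assuming `label_map` is the integer class image produced by the forest. The weighted counting of the center pixel mirrors the description above; a 3 × 3 kernel reproduces the example of Fig. 4.

```python
import numpy as np
from scipy.ndimage import generic_filter

def majority_filter(label_map, kernel_size=3, center_weight=1):
    center = (kernel_size * kernel_size) // 2   # index of the center pixel in the flattened window

    def weighted_mode(window):
        # count the center pixel `center_weight` times in total
        votes = np.concatenate([window, np.repeat(window[center], center_weight - 1)])
        values, counts = np.unique(votes, return_counts=True)
        return values[np.argmax(counts)]

    return generic_filter(label_map, weighted_mode, size=kernel_size, mode='nearest')
```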

3. Data and results

3.1. Study area and data availability

The test LiDAR data were captured using a Leica ALS50 system over Vaihingen, Germany, on 21 August 2008, with a median point density of ∼6.7 pts/m². The ALS50 system flew 500 m above the ground, with a field of view of 45°. Strip adjustment was applied to the raw data to remove the systematic errors for geo-referencing. A standard deviation of 2.9 cm was obtained after the strip adjustment processing. This processed LiDAR dataset is publicly available under the ISPRS Test Project on Urban Classification, 3D Building Reconstruction and Semantic Labeling (Rottensteiner et al., 2012). The 2D semantic labeling ground truths for some test sites in the study area are available. We selected area 1 (Inner City) as our test area. This area is in the center of the city and is characterized by a few trees and dense historic buildings with complex shapes. Six categories were defined in the benchmark test: impervious surfaces, building, low vegetation, tree, car, and background. Because we only included the LiDAR height dataset in our experiment, and it was very challenging to distinguish low vegetation from impervious surfaces with only the height features, we merged these classes into a single ground category. Meanwhile, there is no background class in area 1, so we finally defined four categories in our study, i.e., ground, building, tree, and car.

3.2. Evaluation metrics

We consider the efficiency and classification accuracy to evaluate the performance of the algorithm. The elapsed times for feature calculation, training, and classification are used to evaluate the efficiency. A confusion matrix is used to evaluate the classification accuracy. From the confusion matrix, we can further derive the recall (completeness) and precision (correctness) for each class and the overall accuracy. These metrics are defined as follows:

recall = tp / (tp + fn)
precision = tp / (tp + fp)
overall accuracy = (tp + tn) / (tp + fp + tn + fn)                (4)

where tp, fp, tn, and fn represent the true positive, false positive, true negative, and false negative counts in the confusion matrix, respectively.
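A small worked sketch of these metrics, assuming `conf` is the confusion matrix with true classes along the rows and predicted classes along the columns:

```python
import numpy as np

def class_metrics(conf, class_idx):
    """Recall and precision for one class, plus the overall accuracy (Eq. (4))."""
    tp = conf[class_idx, class_idx]
    fn = conf[class_idx, :].sum() - tp     # pixels of this class predicted as something else
    fp = conf[:, class_idx].sum() - tp     # pixels of other classes predicted as this class
    recall = tp / (tp + fn)                # completeness
    precision = tp / (tp + fp)             # correctness
    overall_accuracy = np.trace(conf) / conf.sum()
    return recall, precision, overall_accuracy
```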


Fig. 6. (a) Classification errors: isolated ground pixels classified as trees. (b) Classification errors: loss of boundary details of objects. (c) Classification accuracy increase after post-processing. (d) Classification accuracy decrease after post-processing.

3.3. Experimental results

Several datasets are available for this area, including true orthophoto (TOP) data, the 9-cm gridded DSM from dense image matching, and the LiDAR point cloud data. As our aim was to investigate the potential of pixel comparison features from LiDAR data, the features from the TOP and image-derived DSM data were not taken into account, and we only selected the LiDAR point cloud data as the input data. As the original point cloud had a median point density of ∼6.7 pts/m², we interpolated it to a gridded image with 30-cm resolution, which was enough to keep the details from the original point cloud data. The processing procedure followed the steps described in Fig. 1. We chose a 200 × 200 pixel neighborhood window to calculate the pixel comparison features, and the number of features was set to 100. A MATLAB version of Breiman and Cutler's random forest classifier (https://github.com/jrderuiter/randomforest-matlab/tree/master/RF_Reg_C) was adopted for the classification. We built 300 trees in our random forest and randomly sampled 10 variables as candidates at each split. The cases were sampled with replacement. To allow for an unlimited maximum tree depth, the minimum size of the terminal nodes was set to 1. We used 2000 randomly distributed samples per class to train the random forest. In the postclassification processing, we set a kernel size of 8 in the majority analysis, and the center pixel was weighted the same as every other pixel in the kernel window. To validate the classification results, we used the labeled ground truth provided by the ISPRS Test Project to compute the confusion matrix. The precision and recall for each class are given in the far-right column and the bottom row of the confusion matrix, respectively.

Fig. 5 shows the classification map and confusion matrix obtained using the proposed method. The elapsed times for feature calculation, training, and classification were 1.41 s, 49.94 s, and 14.90 s, respectively. The classification code was run without parallel computing under the MATLAB environment on a Windows 10 computer with an Intel i7-6700 3.4 GHz quad-core processor and 16 GB of RAM. The proposed method shifts much of the computational burden to the training phase and requires very little time to calculate the feature descriptors and perform the classification.

The quantitative assessment shows that the proposed method achieves an overall accuracy of 87.2%. Given that we only adopted height-based information and that area 1 is thought to be the most challenging scene of the test areas, we consider this result to be satisfactory. The two dominant classes in the scene, namely, ground and buildings, are well classified, with very good recall (91.6% and 82.1%) and precision (92.8% and 93.2%). However, the proposed method has more difficulty discriminating cars from ground and trees. A low precision of 30.4% shows that a large number of pixels are incorrectly classified as cars, most of which are actually ground pixels between cars. There are several possible explanations for this. First, the ground pixels between cars behave like discontinuities and tend to be classified to the same class as the surrounding pixels in the majority analysis. Second, the height difference between the cars and the ground is not large enough to distinguish them using only height information. Third, cars have a relatively small size with regard to the LiDAR point density, and some cars might not be represented in the derived gridded image as cars, leading to confusion in the training. We note that other studies have also reported difficulty in the classification of cars. Cars are very likely to be merged with the background (Shapovalov et al., 2010). In fully convolutional neural network (FCN) methods, their details are also hard to restore (Liu et al., 2018). Sometimes the car class has even been merged into the low vegetation class in the category definition (Niemeyer et al., 2014).

In addition to the difficulties in car classification, another challenge is the confusion between trees and ground in some pixels. Although the completeness of the tree class is good (92.0%), the relatively low correctness (75.5%) implies that many non-tree pixels are classified as tree, most of which actually belong to the ground class. This happens mostly where an isolated area of ground is surrounded by trees (Fig. 6a), which is very similar to the case of discontinuous ground pixels between cars. As explained above, the filtering processing can cause these pixels to be smoothed to the surrounding class. Furthermore, the insufficient point density makes it difficult to distinguish them from tree pixels. The last obvious challenge comes from the transition area from one object to another. We note that the boundaries between objects in the LiDAR-derived DSM are not as obvious as in the optical images (Fig. 6b). This is mainly because the relatively low resolution of the LiDAR data points cannot capture the boundary details; the interpolation therefore creates blurred boundaries that differ from the actual, sharper ones. In the proposed algorithm, we include a postclassification step, the majority analysis, to smooth the isolated pixels. This increased the overall accuracy (OA) from 86.7% to 87.2%. The relatively small improvement from the postclassification (0.5%) implies that the proposed method already effectively preserves the contextual information.


Nevertheless, we retain this step to refine the classification results. After the majority analysis, some obvious isolated pixels inside the objects are smoothed and corrected. The borders between objects are also filtered to be more natural and closer to the actual borders (Fig. 6c). However, a side effect of the majority analysis is that some isolated ground pixels are incorrectly smoothed away (Fig. 6d).

4. Discussion

4.1. Parameter setting tests

As many parameter settings are involved in the proposed method, we tested their influence on the classification performance. The test results can guide us in balancing computational efficiency against classification accuracy and in selecting better algorithm parameter settings. Eight parameters were tested, which can be grouped into feature-based parameters (pixel comparison window size and number of features), random forest-based parameters (number of training samples, number of selected features, number of trees, and minimum size of terminal nodes), and postclassification-based parameters (kernel size and central pixel weight). By default, we set a 200 × 200 pixel comparison window and calculated 100 features; 2000 samples per class were selected for training; the number of features selected at each node was set to the square root of the total number of features; 300 trees were built in the random forest; the minimum size of the terminal nodes was set to 1; and a kernel size of 8 × 8 and a weight of 1 were used for the majority analysis. Testing of these parameters was performed by running several classification cases with the parameters varied around the default values. We use the OA and the training time for the quantitative evaluation of the classification performance. Note that this test is based on a 30-cm resolution gridded image. A change in resolution will affect the optimal values for the pixel comparison window size and postprocessing kernel size, which are defined in pixels. A finer LiDAR image will require larger values because each object is represented by more pixels.

The results of the parameter setting tests are depicted in Fig. 7, and all the parameters have an obvious effect on the OA. For the feature-based parameters, the classification accuracy increases as the size of the pixel comparison window increases, but a saturation effect is observed when the value reaches approximately 150. This is reasonable because a larger window allows the pixel comparison features to include larger-scale contextual information, and therefore leads to better accuracy. However, when the size increases to a value close to the largest object size, an additional increase in window size cannot contribute to the classification and is unnecessary. As the window size does not affect the computational burden of the training, the training time varies randomly within a very small range as the window size changes. Similar to the window size, the feature number also has a positive effect on the classification accuracy. When the number increases to approximately 100, no significant further improvement in OA is observed. In the default setting, we use a 200 × 200 search window, which corresponds to 39,800 available pixel comparison features. As real-world objects exhibit spatial coherence, a very small subset is sufficient to guarantee a good classification rate; this test shows that approximately 100 features are sufficient to classify a pixel. The test results show that the training time increases linearly as the feature number increases. As a small number of features (∼100) is sufficient for satisfactory accuracy, the algorithm is generally very fast.

For the random forest-based parameters, an increase in the number of training samples increases the classification accuracy significantly. This is logical because the greater the number of samples input for training, the more robust the classifier will be. However, we cannot arbitrarily increase the sample size, as the training time increases linearly and, in a real case, we often do not have a large volume of ground-truth data for training. The number of features at each node has a complex effect on the OA, as noted above, and conflicting opinions have been reported in the literature (Cutler et al., 2007; Strobl et al., 2008). Reducing it reduces the correlation between any two trees in the forest, and therefore reduces the forest error rate; however, the strength of the individual trees is also decreased, which increases the forest error rate. Our test shows an irregular fluctuation of the OA as the number of features at each node increases. The default value of 10, which is set as the square root of the total feature number, gives the best accuracy; we therefore believe that this default setting is suitable for LiDAR classification. The number of trees also positively influences the OA. As the number increases, the classification becomes finer and finer at the cost of increased computational expense. Our test shows that a few hundred trees are sufficient to classify the LiDAR pixels. The final tested parameter for the random forest model was the minimum size of the terminal nodes. A larger value yields shallower trees and therefore saves computational cost, but it decreases the strength of the individual trees; a decrease in OA was consequently observed in the test. However, as the training time was not too long, we prefer a robust classification performance and therefore set the minimum size to 1, which splits the nodes as deeply as possible.

For the postprocessing parameters, the kernel size in the majority analysis has a nonmonotonic effect on the OA, and a kernel size of 8 is preferred for this case. A kernel that is too small may not be sufficient to smooth all the noise, and one that is too large may oversmooth the boundaries and small objects. The choice of kernel size is a trade-off between the noise scale and the object scale, which should be made according to the preliminary classification map. In contrast, the central pixel weight has a consistently negative effect on the OA, implying that it is necessary to rely on the neighboring pixels to refine the classification results. This also implies that, once an optimal window is set, there is no need to tune the weight parameter for smoothness adjustment.

4.2. Comparison with other features

To further validate the performance of the proposed pixel comparison features, we included some widely used LiDAR point cloud features in the comparison (Chehata et al., 2009; Guo et al., 2011). These features are the height difference (Δz), the deviation angle of the normal vector of the local plane (Nz), the residuals of the local plane (Rz), the number of echoes (N), and the echo amplitude (A). These features were calculated for each point and then interpolated to a 30-cm resolution grid to match the input gridded LiDAR data. To calculate the features, we created a neighboring cylindrical volume and fit a local plane for each point, as depicted in Fig. 8. Before the feature calculation, we removed the outliers in the original LiDAR points using the LiDAR preprocessing method mentioned above. The definitions of these features generally follow Chehata et al. (2009) and Guo et al. (2011), with slight modifications, as follows (a brief computational sketch is given after the list):

1. A: amplitude of the first returned echo for the current LiDAR point P.
2. Δz: height difference between the current LiDAR point P and the lowest point found in a cylindrical volume with radius r = 15 m, which is fixed to the experimental value used in a previous study (Guo et al., 2011). Ground and off-ground objects are supposed to be discriminated by this feature in the absence of a digital terrain model.
3. Nz: deviation angle of the normal vector of the local plane from the vertical direction. Ground points can be well identified by this feature because flat ground is supposed to have a low deviation angle. The local plane is estimated using a least-squares estimator on the points in an r = 2 m cylindrical volume. The radius of 2 m is chosen to include enough points for a robust fit of the local plane, and this value was found to perform better than the neighboring values (1 m and 3 m) in this case.


Fig. 7. Overall accuracy and training time as a function of the algorithm parameters. (a) Pixel comparison window size. (b) Number of features. (c) Training samples per class. (d) Number of selected features at node. (e) Number of trees. (f) Minimum size of terminal nodes. (g) Kernel size for post-processing. (h) Central pixel weight.

4. N: number of echoes for the current LiDAR point P.
5. Rz: residuals of the least-squares estimator in fitting a local plane in the 2-m radius cylinder.
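As a rough illustration of how the plane-based features above can be computed, the following sketch assumes `points` is an (N, 3) array; a 2D k-d tree on the x-y coordinates approximates the cylindrical neighborhoods, while the echo count N and amplitude A come directly from the sensor records and are not computed here.

```python
import numpy as np
from scipy.spatial import cKDTree

def plane_based_features(points, r_dz=15.0, r_plane=2.0):
    """Return dz, Nz (degrees) and Rz for every point, following the definitions above."""
    xy, z = points[:, :2], points[:, 2]
    tree = cKDTree(xy)
    dz = np.zeros(len(points))
    nz = np.zeros(len(points))
    rz = np.zeros(len(points))
    for i, p in enumerate(points):
        # dz: height above the lowest point inside the 15 m cylinder
        idx = tree.query_ball_point(p[:2], r_dz)
        dz[i] = p[2] - z[idx].min()
        # local plane z = a*x + b*y + c fitted by least squares inside the 2 m cylinder
        idx = tree.query_ball_point(p[:2], r_plane)
        A = np.c_[xy[idx], np.ones(len(idx))]
        coeff, *_ = np.linalg.lstsq(A, z[idx], rcond=None)
        normal = np.array([-coeff[0], -coeff[1], 1.0])
        normal /= np.linalg.norm(normal)
        nz[i] = np.degrees(np.arccos(normal[2]))              # deviation angle from the vertical
        rz[i] = np.sqrt(np.mean((A @ coeff - z[idx]) ** 2))   # plane-fit residuals
    return dz, nz, rz
```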

The calculated features are shown in Fig. 9. The elapsed time for the calculation was 1150.21 s, which is significantly larger than the 1.41 s needed to obtain the pixel comparison features. Compared to the original height image, all the features generally give more discriminative information for classification (e.g., the tree class is easy to discriminate in Nz and N, as it has relatively high values). However, some errors in the original LiDAR points may be propagated to these features. In Fig. 9b, although most outliers are removed in the preprocessing step, some outliers remain and lead to circular errors in the Δz map. Furthermore, the deviation angle and residuals rely on the estimation of the local plane.

Fig. 8. Neighboring cylindrical volume and local plane for feature computation.


Fig. 9. The five additional LiDAR features taken for comparison.

Errors in the plane fitting will result in incorrect values for these features. Using the five additional features described above, we conducted another two classification experiments: the pixel comparison features + the height feature + the five additional features (test 1); and the height feature + the five additional features (test 2). The classification with the pixel comparison features and the height feature adopted in the proposed algorithm was also used for comparison (named test 3 here). The classification results and the importance of these features are shown in Fig. 10. Among the features, the height-based features, i.e., the height difference Δz, height H, residuals Rz, and deviation angle Nz, were the most important, which is in general agreement with the importance tests reported in other studies. The individual pixel comparison features are not as important as the manually designed features, but there are many of them and they can be assembled to form a strong classifier. As expected, the OA of test 1 is the largest because it uses the most features. However, the improvement in OA after including the five additional features is not significant (test 3 to test 1), an increase of only 0.6%. This implies that the pixel comparison features contain sufficient discriminative information for classification in this complex urban setting. If we abandon the pixel comparison features, as in test 2, the OA significantly decreases to 79.6%, which is much less than the 87.2% achieved by the proposed method. Compared to these manually designed features, the pixel comparison features appear to be more robust. Furthermore, the pixel comparison features have the advantage of fast computation. Therefore, we believe they could be a viable alternative or act as complementary LiDAR features to those currently used for classification.

4.3. Pixel comparison feature importance analysis

The calculated importance allows us to determine the most important pixel comparison features and helps us understand how the pixel comparison features work in the classification. As the pixel comparison features are obtained by calculating the height difference between a neighboring pixel pair, the vector connecting the pixel pair can describe the characteristics of a pixel comparison feature. We therefore computed the distance and direction of the point pair vector for each feature and plotted their relationship with the calculated importance, as shown in Fig. 11. The pixel pair distances of the most important features are mainly from 50 to 150 pixels. This explains why an increase of the pixel comparison window size does not significantly improve the OA when the size is larger than 150, as observed in Fig. 7a. The distance distribution of the important features is more concentrated for the ground and building classes, which mainly cluster at approximately 80, 110, and 130 pixels. However, for the car and tree classes, the distance of the important features is distributed more randomly and no significant preference for pixel comparison distance is observed. Compared to the vector length, the feature importance appears to be more relevant to the point pair direction. As shown in Fig. 11b, the favored pixel pairs are mainly located around the directions of NW43° and ENE20°, which are close to the directions of the local objects (e.g., buildings and roads) in the ground-truth classification map. Similar to the pixel pair distance, the pixel pair direction distribution of the most important features is more scattered for the tree and car classes. Although the directions around NW43° and ENE20° are mostly preferred, pixel comparison features with other directions are also relatively important.

Most man-made objects have a regular structure with a certain size and direction in an urban setting. This is why the pixel comparison features computed along the object direction and with a distance close to the object size contain more discriminative information, as shown in the importance analysis. The proposed algorithm initializes a number of pixel comparison features that are randomly distributed in distance and direction and allows the random forest classifier to automatically select the useful features for classification. Although they are not carefully designed features, they work as well as or even better than the traditional features. This approach is somewhat similar to the use of convolutional neural networks (CNNs) (He et al., 2016), which do not require prior knowledge or human effort in the feature design. However, unlike the complex and opaque mechanism of a CNN, the mechanism of the proposed method is more direct and simpler.
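A short sketch of the geometry computation described above, assuming `offsets` is the (N, 2, 2) array of random offset pairs from Section 2.2 and `importance` holds the per-feature importance values returned by the forest:

```python
import numpy as np

def pair_geometry(offsets):
    """Distance (in pixels) and undirected orientation (degrees) of each pixel pair."""
    vec = (offsets[:, 0, :] - offsets[:, 1, :]).astype(float)   # vector joining the two offsets
    distance = np.hypot(vec[:, 0], vec[:, 1])
    direction = np.degrees(np.arctan2(vec[:, 1], vec[:, 0])) % 180.0
    return distance, direction

# e.g. list the geometry of the five most important features:
# dist, ang = pair_geometry(offsets)
# top = np.argsort(importance)[::-1][:5]
# print(np.c_[dist[top], ang[top], importance[top]])
```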


Fig. 10. Comparison results. (a) Feature importance by means of decreasing permutation accuracy. (b) Classification results with height + pixel comparison features. (c) Classification results with height + 5 additional features. (d) Classification results with height + pixel comparison features + 5 additional features.

Fig. 11. Importance of the pixel comparison features. (a) Feature importance as a function of pixel pair distance. (b) Feature importance as a function of pixel pair direction. The inset map is the ground-truth classification map to compare with the direction distribution.

4.4. Comparison with other methods

The Vaihingen area used in this study is from the ISPRS 2D Semantic Labeling Contest, which has been investigated in a number of previous classification studies (ISPRS Working Group II/4, 2018). As of 5 October 2018, approximately 140 classification results had been presented in the contest. These results can be directly adopted for comparison with our study. However, different from this study, the contest adopts an extra multispectral image dataset with the near infrared, red, and green bands (IRRG) in addition to the digital surface model. To make a fair comparison, we included the spectral features from the IRRG image as well. They were directly fed into the random forest classifier together with our proposed height-derived features. As the IRRG image contains sufficient information to distinguish vegetated from non-vegetated ground, we could subdivide the ground category into the low vegetation and impervious surfaces categories, consistent with the categories defined in the ISPRS contest.


Fig. 12. Comparison with published results in the ISPRS Semantic Labeling Contest. (a) Overall accuracy ranking. The red star indicates the OA of our proposed method using both height and spectral information. (b) Ground truth. (c) Classification results in this study.

The results show that our algorithm achieves an OA of 90.3%, which ranks in approximately the top 15% of the published results in the ISPRS 2D Semantic Labeling Contest (Fig. 12). We note that the results with >90% OA are mostly based on CNNs, e.g., NLPR3 (91.2% OA), CASIA2 (91.1% OA), BKHN10 (91% OA), and HUSTW3 (90.7% OA). NLPR3 and BKHN10 both use an effective deep FCN ensemble classification (Long et al., 2014), and the former includes a fully connected conditional random field for refinement. CASIA2 proposes an end-to-end self-cascaded convolutional neural network (ScasNet) (Liu et al., 2018). HUSTW3 utilizes U-Net (Ronneberger et al., 2015), a modified network based on the FCN, as its semantic labeling model. Although some of this work has not yet been published, the results in the benchmark test reveal the remarkable ability of CNNs in semantic labeling. However, the high accuracy of CNNs is normally accomplished by increasing the network size, leading to huge computational costs (Treml et al., 2016). More recently, CRF-based refinement has been introduced in the postprocessing step of CNN classification to capture contextual information (Chen et al., 2014), which increases the computational cost significantly. The algorithm proposed in this study achieves a relatively high accuracy approximating that of the CNN methods, and it has the remarkable benefit that it avoids the complex feature generation and training steps of the CNNs and significantly reduces the computational burden in the classification.

5. Conclusion

In this paper, we proposed the use of pixel comparison features for LiDAR classification in an urban setting. The proposed method requires only LiDAR height information and is fast to compute. We presented a procedure using the random forest model and majority analysis to identify the ground objects and refine the classification results. The experiments undertaken in this study revealed that the proposed method had an OA of 87.2%, which can be considered a good result given that no spectral information was used in the classification. After including the spectral information, an OA of 90.3% was obtained, approximating the state-of-the-art performance in the ISPRS semantic labeling contest. Notably, the feature calculation in this study is a linear operation that is fast to implement (less than 1.5 s for 100 features in our dataset). The training time is also acceptable, as it was less than 50 s for 8000 samples. Based on the assessments of the classification accuracy and computational cost, we believe that the proposed method is robust and has high potential for urban LiDAR classification.

The parameter settings were also tested to guide the application of the proposed method. We observed that refining many of the settings initially leads to a sharp increase in OA, but a saturation effect was observed after they reached a certain level. These settings include the pixel comparison window size, feature number, training sample number, and tree number. To balance the accuracy and computational burden, our testing suggested optimal settings of a search window size close to the largest object size, a feature number of ∼100, ∼2000 training samples per class, and ∼300 trees for a similar urban scene. We also found that setting the number of selected features at each node to the square root of the total number of features results in better classification accuracy, and we recommend that the trees be grown as deep as possible for a stronger classification ability.

We also assessed the feature importance of the proposed pixel comparison features and of five additional popular features. The importance of the individual pixel comparison features is not as high as that of the manually designed features, but their combination achieves a better OA than the five introduced features. A further investigation of the importance of the randomly selected pixel comparison features showed some relationship between the importance and the pixel pair used for comparison. The more important features normally prefer a pixel pair oriented in the direction of the ground object structure (e.g., the orientation of buildings or roads). The distance of the pixel pair is less relevant to the importance, and the range between the smallest and largest object size is preferred.

This work aimed to investigate the performance and potential of pixel comparison features for urban LiDAR classification. In future work, we will extend the algorithm to 3D point clouds, which should avoid the data and accuracy loss of the gridding process. The proposed algorithm framework could also be easily applied to other remote sensing data, such as radar, aerial photographs, or hyperspectral imagery. The fusion of multisource data with the proposed classification algorithm is another important task for the future.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 41504004, No. 41504003), the Natural Science Funding of Shenzhen University (No. 2018073), the Shenzhen Scientific Research and Development Funding Program (No. JCYJ20170302144002028), and the Open Fund of the State Key Laboratory of Earthquake Dynamics (LED2016B03).

References


Alexander, C., Tansey, K., Kaduk, J., Holland, D.A., Tate, N.J., 2010. Backscatter coefficient as an attribute for the classification of full-waveform airborne laser scanning data in urban areas. ISPRS J. Photogramm. Remote Sens. 65 (5), 423–432.
Bartels, M., Wei, H., 2006. Rule-based improvement of maximum likelihood classified LIDAR data fused with co-registered bands. In: Proceedings of the Annual Conference of the Remote Sensing and Photogrammetry Society.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Breunig, M.M., Kriegel, H., Ng, R.T., Sander, J., 2000. LOF: identifying density-based local outliers. Int. Conf. Manage. Data 29 (2), 93–104.
Bujan, S., Gonzalezferreiro, E., Reyesbueno, F., Barreirofernandez, L., Crecente, R., Miranda, D., 2012. Land use classification from lidar data and ortho-images in a rural area. Photogramm. Rec. 27 (140), 401–422.
Charaniya, A.P., Manduchi, R., Lodha, S.K., 2004. Supervised parametric classification of aerial LiDAR data. In: Conference on Computer Vision and Pattern Recognition Workshop.
Chehata, N., Guo, L., Mallet, C., 2009. Airborne lidar feature selection for urban classification using random forests. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 39 (Part 3/W8), 207–212.
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2014. Semantic image segmentation with deep convolutional nets and fully connected CRFs. Comput. Sci. (4), 357–361.
Clode, S., 2004. The automatic extraction of roads from LIDAR data. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 3, 231–236.
Cutler, D.R., Edwards Jr., T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J., 2007. Random forests for classification in ecology. Ecology 88 (11), 2783–2792.
Guo, L., Chehata, N., Mallet, C., Boukir, S., 2011. Relevance of airborne lidar and multispectral image data for urban scene classification using Random Forests. ISPRS J. Photogramm. Remote Sens. 66 (1), 56–66.
Hartfield, K.A., Landau, K.I., van Leeuwen, W.J.D., 2011. Fusion of high resolution aerial multispectral and LiDAR data: land cover in the context of urban mosquito habitat. Remote Sens. 3 (11), 2364–2383.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Im, J., Jensen, J.R., Hodgson, M.E., 2008. Object-based land cover classification using high-posting-density LiDAR data. Geosci. Remote Sens. 45 (2), 209–228.
ISPRS Working Group II/4, 2018. ISPRS Semantic Labeling Contest (2D): Results. <http://www2.isprs.org/commissions/comm2/wg4/vaihingen-2d-semanticlabeling-contest.html>.
Lepetit, V., Fua, P., 2006. Keypoint recognition using randomized trees. IEEE Trans. Pattern Anal. Mach. Intell. 28 (9), 1465–1479.
Li, Z., Zhang, L., Mathiopoulos, P.T., Liu, F., Zhang, L., Li, S., Liu, H., 2017. A hierarchical methodology for urban facade parsing from TLS point clouds. ISPRS J. Photogramm. Remote Sens. 123, 75–93.
Liu, Y., Fan, B., Wang, L., Bai, J., Xiang, S., Pan, C., 2018. Semantic labeling in very high resolution images via a self-cascaded convolutional neural network. ISPRS J. Photogramm. Remote Sens.
Long, J., Shelhamer, E., Darrell, T., 2014. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. PP (99), 1–1.
Minh, N.Q., Hien, L.P., 2011. Land cover classification using LiDAR intensity data and neural network. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 29 (4), 429–438.
Mountrakis, G., Im, J., Ogole, C., 2011. Support vector machines in remote sensing: a review. ISPRS J. Photogramm. Remote Sens. 66 (3), 247–259.
Niemeyer, J., Rottensteiner, F., Soergel, U., 2014. Contextual classification of lidar data and building object detection in urban areas. ISPRS J. Photogramm. Remote Sens. 87 (1), 152–165.
Patino, J.E., Duque, J.C., 2013. A review of regional science applications of satellite remote sensing in urban settings. Comput. Environ. Urban Syst. 37 (1), 1–17.
Phinn, S., Stanford, M., Scarth, P., Murray, A.T., Shyy, P.T., 2002. Monitoring the composition of urban environments based on the vegetation-impervious surface-soil (VIS) model by subpixel analysis techniques. Int. J. Remote Sens. 23 (20), 4131–4153.
Priestnall, G., Jaafar, J., Duncan, A., 2000. Extracting urban features from LiDAR digital surface models. Comput. Environ. Urban Syst. 24 (2), 65–78.
Riordan, K.D., Jensen, J.R., Tullis, J.A., Hodgson, M.E., 2003. Synergistic use of lidar and color aerial photography for mapping urban parcel imperviousness. Photogramm. Eng. Remote Sens. 69 (9), 973–980.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: convolutional networks for biomedical image segmentation. Lect. Notes Comput. Sci. 9351, 234–241.
Rottensteiner, F., Sohn, G., Jung, J., Gerke, M., Baillard, C., Benitez, S., Breitkopf, U., 2012. The ISPRS benchmark on urban object classification and 3D building reconstruction. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. I-3, 293–298.
Shapovalov, R., Velizhev, A., Barinova, O., 2010. Non-associative Markov networks for 3D point cloud classification. In: Photogrammetric Computer Vision and Image Analysis.
Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., Fitzgibbon, A., 2013. Scene coordinate regression forests for camera relocalization in RGB-D images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., Zeileis, A., 2008. Conditional variable importance for random forests. BMC Bioinf. 9 (1), 307.
Stuckens, J., Coppin, P.R., Bauer, M.E., 2000. Integrating contextual information with per-pixel classification for improved land cover classification. Remote Sens. Environ. 71 (3), 282–296.
Treml, M., Arjona-Medina, J., Unterthiner, T., Durgesh, R., Friedmann, F., Schuberth, P., Mayr, A., Heusel, M., Hofmarcher, M., Widrich, M., 2016. Speeding up semantic segmentation for autonomous driving. In: NIPS 2016 Workshop – MLITS.
United Nations, 2018. World Urbanization Prospects 2018.
Wang, A., He, X., Ghamisi, P., Chen, Y., 2018. LiDAR data classification using morphological profiles and convolutional neural networks. IEEE Geosci. Remote Sens. Lett. PP (99), 1–5.
Wang, C., Li, Q., Liu, Y., Wu, G., Liu, P., Ding, X., 2015a. A comparison of waveform processing algorithms for single-wavelength LiDAR bathymetry. ISPRS J. Photogramm. Remote Sens. 101, 22–35.
Wang, Z., Zhang, L., Fang, T., Mathiopoulos, P.T., Tong, X., Qu, H., Xiao, Z., Li, F., Chen, D., 2015b. A multiscale and hierarchical feature extraction method for terrestrial laser scanning point cloud classification. IEEE Trans. Geosci. Remote Sens. 53 (5), 2409–2425.
Yan, W.Y., Shaker, A., 2014. Radiometric correction and normalization of airborne LiDAR intensity data for improving land-cover classification. IEEE Trans. Geosci. Remote Sens. 52 (12), 7658–7673.
Yan, W.Y., Shaker, A., El-Ashmawy, N., 2015. Urban land cover classification using airborne LiDAR data: a review. Remote Sens. Environ. 158, 295–310.
Yan, W.Y., Shaker, A., Habib, A., Kersting, A.P., 2012. Improving classification accuracy of airborne LiDAR intensity data by geometric calibration and radiometric correction. ISPRS J. Photogramm. Remote Sens. 67 (2), 35–44.
Yastikli, N., Cetin, Z., 2016. Classification of LiDAR data with point based classification methods. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLI-B3, 441–445.
