Sparse semantic map building and relocalization for UGV using 3D point clouds in outdoor environments


Fei Yan, Jiawei Wang, Guojian He, Huan Chang, Yan Zhuang∗
School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China

Article history: Received 19 December 2019; Revised 6 February 2020; Accepted 24 February 2020. Communicated by Bo Shen.

Keywords: Relocalization; Semantic map; 3D point clouds; Outdoor environment; Unmanned Ground Vehicles

Abstract

In this paper, we propose a sparse semantic map building method and an outdoor relocalization strategy based on this map. Most existing semantic mapping approaches focus on improving the semantic understanding of single frames and retain a large amount of environmental data. Instead of locating the UGV precisely, we use imprecise environmental information to determine the general position of the UGV in a large-scale environment, much as humans do. For this purpose, we divide the environment into environment nodes according to the result of scene understanding. The semantic map of the outdoor environment is obtained by generating topological relations between the environment nodes. In the semantic map, only the information of the nodes is saved, so the storage space stays very small as the environment grows. When the UGV receives a new local semantic map, we evaluate the similarity between the local map and the global map to determine the possible position of the UGV according to the categories of the left and right nodes and the distances between the current position and the nodes. In order to validate the proposed approach, experiments have been conducted in a large-scale outdoor environment with a real UGV. Relying on the semantic map, the UGV can recover its position from different starting points. © 2020 Elsevier B.V. All rights reserved.

1. Introduction

The ability to acquire accurate position information in real time is the premise and basis for UGVs to perform various tasks. Relocalization consists in locating the UGV on an existing map when it has lost its position because of feature matching failure or kidnapping. It is commonly used in navigation, simultaneous localization and mapping (SLAM) and autonomous driving [1,2]. Visual sensors and laser sensors are the most important sensors for UGVs to perceive environments. At present, visual relocalization plays a critical role in the field of relocalization. Given a pre-built map or a series of images with exact positions, visual relocalization aims to determine the 6D pose of the current camera view. Traditional visual relocalization methods detect the database image most similar to the current image by using image matching methods [3,4]. To describe a part of the map, several feature descriptors are available, such as SIFT, SURF and ORB. All of these descriptors have been used in visual relocalization.

✩ This work was supported by the National Natural Science Foundation of China (U1913201) and the Science and Technology Foundation of Liaoning Province of China (20180520031). ∗ Corresponding author. E-mail address: [email protected] (Y. Zhuang).

However, each place may contain hundreds of local features, and matching place features directly can be time-consuming. The BoW technique quantizes local features into a vocabulary and matches places using numerical vectors, which is adopted to increase the efficiency of relocalization in large environments [5]. Although methods that improve image quality can improve the relocalization results [6], the visual sensor remains vulnerable to poor illumination and dynamic changes. Therefore, visual relocalization methods based on deep learning have been proposed, such as the Deep Belief Network [7] and the Convolutional Neural Network [8], which estimate camera poses by training an end-to-end deep model on images. Although deep models can cope with environmental change, these methods need a long time to learn the different appearances of a place or to train robust features. Furthermore, visual sensors are limited by environmental factors such as brightness and light intensity. Compared to visual sensors, laser sensors are not affected by these environmental factors and can provide precise depth information, so laser sensors are also considered main sensors for relocalization. Traditional methods usually rely on geometry matching to achieve relocalization [9,10], and some filtering methods are also used for localization, such as unscented Kalman filtering, particle swarm optimization, set-membership filtering and H∞ filtering [11–15]. These methods largely depend on environment maps with dense points or geometric features.


To build these maps, a dense exploration of the environment is needed, which is usually computationally inefficient and requires a lot of storage space. Therefore, relocalization based on geometry matching is infeasible in some cases such as long-term navigation. On the other hand, deep models are also used in laser relocalization [16,17], but the problem that training such models takes a long time still exists.

In this paper, a sparse semantic map building method and a laser relocalization strategy are proposed. First, in order to obtain robust features, the 3D point clouds are compressed into 2D ODVL images, which are divided into superpixels, and 23-dimensional texture features are extracted for each superpixel. Then, the Gentle-AdaBoost algorithm is used to accomplish feature screening on these texture features to obtain the initial classification results of the 3D point clouds. Moreover, reclassification is carried out to improve the classification accuracy by using the Centroid method and considering the height distribution of the point clouds. According to the results of scene understanding, the outdoor environments are divided into scene nodes and road nodes. The semantic map of the outdoor environment is obtained by generating topological relations between the scene nodes and the road nodes. Finally, the map is simplified so that the positioning accuracy can be improved. Different from general maps, a map format that stores only the distance information between objects is proposed for the semantic map. According to the scenes observed on the left and right sides at the current position, combined with the environmental nodes of the semantic map, the current positions of the UGV can be estimated. Combined with the triangular relationship, a method for obtaining the possible positions of the UGV by calculating the intersection points of two circles is proposed. The last position information is then maintained and used as the prior condition to calculate the next possible positions of the UGV. In addition, each candidate point has a score indicating the likelihood that the UGV is at that position; the point with the highest score is the most likely position of the UGV. In order to validate the proposed approach, experiments have been conducted in a large-scale outdoor environment with a real UGV. Relying on the semantic map, the UGV can recover its position from different starting points.

This paper is organized as follows: in Section 2 we review related work on semantic mapping and laser relocalization. Implementation details of scene understanding and semantic map building are provided in Section 3, while our relocalization method is described in Section 4. The experiments are conducted and the results are analyzed in Section 5, and conclusions are drawn in the last section.

2. Related works

2.1. Semantic map building

To enable UGVs to recognize an environment and act accordingly in the real world, the UGVs need to infer not only the geometry but also the semantic information of the surrounding environment. Semantic mapping is a map building process which fuses a Simultaneous Localization and Mapping (SLAM) algorithm and a scene understanding algorithm [18]. Visual sensors are usually used for semantic mapping. McCormac et al. [19] combined Convolutional Neural Networks (CNNs) and a state-of-the-art dense SLAM system to build indoor semantic maps based on an RGB-D camera. Zhao et al. [20] proposed a deep learning method which performs 3D reconstruction while simultaneously recognizing different types of materials and labeling them at the pixel level. A novel deep learning approach for semantic segmentation of RGB-D images with multi-view context was presented in [21].

These methods are all aimed at narrow indoor environments. Although they can provide abundant environment information, they are not suitable for UGV localization and navigation in a large-scale environment. In order to make semantic maps more practical, semantic maps of the environment are manually built as prior maps for robot localization in [22] and [23]. Furthermore, in [24], a 3D online semantic mapping system is presented, which utilizes a CNN model to transfer pixel label distributions from 2D images to 3D grid space, and the authors propose a Conditional Random Field (CRF) model with higher-order cliques to enforce semantic consistency among grids. Considering that many papers convert 2D images into 3D information, some researchers directly use 3D LiDAR data or fused information to build semantic maps. Sun et al. [25] represented the 3D map as an OctoMap and modeled each cell as a recurrent neural network to obtain a Recurrent-OctoMap; in this case, the semantic mapping process can be formulated as a sequence-to-sequence encoding–decoding problem. In [26], a real-time end-to-end semantic segmentation method is proposed for road objects based on spherical images, which are transformed from the 3D LiDAR point clouds and taken as input of convolutional neural networks to predict a point-wise semantic mask. In comparison, Liu et al. [27] transformed 3D laser point clouds into 2D images and obtained the semantic information based on the images. Fusion methods based on 3D LiDAR and camera data can obtain the semantic segmentation in the camera frame, which reduces the complexity of semantic segmentation based on laser data, and then use the LiDAR points to localize the segmentation in 3D [28,29]. However, in [28], the semantic maps retain a large number of voxels, which makes storage space and computing efficiency severe challenges in growing environments. Ma et al. [29] formulated the problem in a Bayesian filtering framework and exploited lanes, traffic signs, as well as vehicle dynamics to localize robustly with respect to a sparse semantic map. Although this map avoids the problems of storage and computing efficiency, its application environment is quite limited.

2.2. Laser relocalization

Visual relocalization algorithms struggle to cope with the strong appearance changes that commonly occur during long-term applications in outdoor environments, and fail under certain ill-lighted conditions. In contrast, LiDAR sensors are largely unaffected by appearance change and are also used for relocalization. Some works match the current laser data against a prior map or a pre-computed database using various hand-designed features. Chong et al. [30] projected the 3D point cloud onto a 2D plane and extracted environment features from the plane to localize a UGV in a 3D urban environment, but the basic assumption was that many surfaces in the urban environment are rectilinear in the vertical direction. Collier et al. [31] extracted highly descriptive features, called Variable Dimensional Local Shape Descriptors, from LiDAR range data to encode environment features. These features were used to learn a generative model of place appearance and to determine whether an observation comes from a new or previously seen place. In [32], where map matching is not feasible in environments with two parallel straight walls, data from the laser range sensor and the odometry are integrated to localize the robot based on moving horizon estimation [33]. Wolcott and Eustice [34] proposed a Gaussian mixture map, a 2D grid structure where each grid cell maintains a Gaussian mixture model characterizing the distribution over z-height in that cell. Cao et al. [35] converted 3D laser points to 2D bearing angle images and extracted ORB features, then a visual bag-of-words approach was used to improve matching efficiency. In [36], the 3D LiDAR data was projected into a 2D scan context, which was used to calculate the similarity score of two LiDAR scans.


Instead of relying on geometry-based matching, learning-based approaches have shown impressive performance in the field of laser relocalization. Yin et al. [37] proposed a novel synchronous adversarial feature learning method, an unsupervised learning method for the LiDAR-based loop closure detection task. Furthermore, Yin et al. [38] accomplished the place recognition task based on an end-to-end feature learning framework with LiDAR inputs. In [39], semi-handcrafted range histogram features are used as input to a 2D convolutional neural network, while Uy et al. [40] proposed PointNetVLAD, which leverages the recent success of deep networks to solve point cloud based retrieval for place recognition. Kim et al. [41] recently presented the idea of transforming point clouds into scan context images [36] and feeding them into a CNN for solving the place recognition problem. In [42], a Convolutional Neural Network is trained to extract compact descriptors from single 3D LiDAR scans. In summary, the traditional laser relocalization approaches need to save lots of LiDAR data or features; the complexity tends to grow with the environment scale, and matching the current scans with potential target scans is computationally expensive. Meanwhile, the learning-based approaches need a long time to learn the different appearances of an environment or to train robust CNN features. Instead, we present a laser relocalization method based on a sparse semantic map without learning.

3. Scene understanding and semantic map building

The Bearing Angle (BA) image and the range image are the general models that compress 3D point cloud data into 2D images. In terms of describing image details and texture and of classifying objects, the range image does not perform well, and the BA image model performs poorly in distinguishing objects at different depths. The novel optimal depth & vector length (ODVL) image model compensates for the shortcomings of the two methods by calculating the optimal depth value and the length of the BA vector corresponding to the point clouds and mapping them to gray images with weights. It can be used both for continuous scanning data over a large range and for fixed-point scanning data; there is no special requirement on the scanning mode or scanning position, so it has good adaptability.

3.1. ODVL image model

The original scenes of the point clouds consist of multiple continuous scans of the laser data. Each scan produces 381 laser points, of which the 361 points in the middle are considered effective laser data, and 250 scans of the laser data are considered as a scene. The points are projected onto the horizontal plane and the eigenvectors are calculated according to the distribution of the point clouds. The direction of the eigenvector with the largest eigenvalue is the main direction in which the laser points are distributed on the plane; in the main direction, the variance of the data is the largest. The main direction is perpendicular to the optimal depth direction. Take the point at the lower-left corner of the current plane as the coordinate origin and draw a straight line through the origin parallel to the main direction. The distance from each point on the plane to this straight line is the optimal depth value of that point.
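As a rough illustration of this construction (a minimal sketch, not the authors' implementation), the following snippet computes the per-point optimal depth values with a small planar PCA; the function name, the NumPy dependency and the choice of the coordinate-wise minimum as the lower-left corner are our own assumptions.

```python
import numpy as np

def optimal_depth_values(points_xyz):
    """Optimal depth value of every point in a scene.

    points_xyz: (N, 3) array of laser points; only x and y are used,
    since the points are first projected onto the horizontal plane.
    """
    xy = points_xyz[:, :2]                        # projection onto the horizontal plane
    centered = xy - xy.mean(axis=0)
    cov = np.cov(centered.T)                      # 2x2 covariance of the planar distribution
    eigvals, eigvecs = np.linalg.eigh(cov)
    main_dir = eigvecs[:, np.argmax(eigvals)]     # main direction = largest-eigenvalue eigenvector
    main_dir = main_dir / np.linalg.norm(main_dir)

    origin = xy.min(axis=0)                       # point taken as the lower-left corner of the plane
    perp = np.array([-main_dir[1], main_dir[0]])  # optimal depth direction, perpendicular to main_dir
    # distance from each point to the line through `origin` parallel to the main direction
    return np.abs((xy - origin) @ perp)
```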
Sort all the obtained optimal depth values from small to large and divide them evenly into 256 sets, which correspond to the 256 gray values (0–255). The gray value $GD_{i,j}$ with respect to the optimal depth can then be obtained from the optimal depth value of each point. The BA vectors of the points are likewise sorted from small to large and divided evenly into 256 sets, yielding the gray values $GB_{i,j}$ corresponding to the BA vectors of each point.


The gray value $GO_{i,j}$ of the ODVL image model is then obtained from $GD_{i,j}$ and $GB_{i,j}$ as follows:

$$GO_{i,j} = \frac{\lambda_1 GB_{i,j} + \lambda_2 GD_{i,j}}{\lambda_1 + \lambda_2} \tag{1}$$

where $\lambda_1$ and $\lambda_2$ are weights. The variance of the gray value matrix is

$$\delta^2 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(GO_{i,j}-\overline{GO}\right)^2}{nm} \tag{2}$$

The optimization method is used to obtain the $\lambda_1$ and $\lambda_2$ that generate the maximum variance.
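Putting Eqs. (1) and (2) together, the sketch below forms the ODVL gray image from the two per-point gray images and scores a candidate weight pair by the resulting variance; the coarse grid search over the weight ratio is only an illustrative stand-in for whichever optimization method the authors use, and the function names are ours.

```python
import numpy as np

def odvl_gray(GB, GD, lam1, lam2):
    """Eq. (1): weighted combination of the BA-based and depth-based gray images."""
    return (lam1 * GB + lam2 * GD) / (lam1 + lam2)

def gray_variance(GO):
    """Eq. (2): variance of the gray value matrix."""
    return np.sum((GO - GO.mean()) ** 2) / GO.size

def select_weights(GB, GD, candidates=np.linspace(0.05, 0.95, 19)):
    """Pick the weights that maximize the variance.

    Only the ratio lam1/lam2 matters in Eq. (1), so we search lam1 in (0, 1)
    and set lam2 = 1 - lam1.  A grid search stands in for the real optimizer.
    """
    best_var, best_lam1 = max(
        (gray_variance(odvl_gray(GB, GD, l, 1.0 - l)), l) for l in candidates
    )
    return best_lam1, 1.0 - best_lam1
```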

3.2. Segmentation for ODVL image

In order to classify the objects in the environment, Simple Linear Iterative Clustering (SLIC) is used to segment the ODVL image into many pixel blocks. The SLIC method produces a relatively regular lattice at low computational cost, which are desirable properties for the subsequent applications. In the SLIC method, local K-means clustering is performed on pixels based on the color distance; isolated small clusters are then merged with their largest neighboring clusters to obtain the superpixels [1]. Despite its simplicity and superiority over conventional methods, the SLIC method has problems in two aspects: first, during the iterative clustering of the K-means algorithm, inaccurate classification of pixels can propagate and produce wrong superpixels; second, small clusters are merged into their largest neighbors without considering color similarity. When we use the SLIC method to segment the ODVL image, we consider not only the color information but also the depth information of each pixel, which alleviates the two problems mentioned above. Firstly, SLIC calculates the size of each pixel block:

$$S = \sqrt{\frac{N}{C}} \tag{3}$$

where S is the side length of the square, N is the total number of pixels, and C is the number of pixel blocks to be separated. On the current plane, the cluster center points are selected evenly at intervals of S pixels to produce pixel blocks of the same size. In a 2S × 2S neighborhood, since the interval between cluster centers is S, a pixel may be searched by several cluster centers. In the search process, the distance from a pixel to a cluster center is defined as D′:

$$D' = (d_c)^2 + \left(\frac{d_s}{k}\right)^2, \qquad d_c = \sqrt{(p_i - p_j)^2}, \qquad d_s = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{4}$$

where p is the gray value, (x, y) is the position of a pixel, d_c is the gray distance and d_s is the pixel distance between two pixels; k is a weight that controls the influence of the gray distance and the pixel distance on the clustering distance. A pixel has different distances to different center points, and the center point with the smallest distance is selected as the cluster center of that pixel. After all pixels have been assigned, the new cluster centers are calculated as the average positions of the pixels in each cluster. Next, the error between the new center positions and the previous center positions is computed; if the error is less than a threshold, the algorithm terminates.
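A minimal sketch of this assignment step is shown below, assuming the ODVL image is a 2D array of gray values and that k² divides the spatial term exactly as in the reconstruction of Eq. (4); the helper names and the brute-force loops are illustrative, not the authors' code.

```python
import numpy as np

def clustering_distance(p_i, p_j, yx_i, yx_j, k):
    """Eq. (4): combined gray distance and (weighted) spatial distance."""
    dc2 = (float(p_i) - float(p_j)) ** 2
    ds2 = (yx_i[0] - yx_j[0]) ** 2 + (yx_i[1] - yx_j[1]) ** 2
    return dc2 + ds2 / (k * k)

def assign_pixels(image, centers, S, k):
    """One SLIC-style assignment pass over the ODVL image.

    Each cluster center (row, col) searches a 2S x 2S window; every pixel keeps
    the label of the center with the smallest distance D'.
    """
    h, w = image.shape
    labels = -np.ones((h, w), dtype=int)
    best = np.full((h, w), np.inf)
    for idx, (cy, cx) in enumerate(centers):
        y0, y1 = max(cy - S, 0), min(cy + S, h)
        x0, x1 = max(cx - S, 0), min(cx + S, w)
        for y in range(y0, y1):
            for x in range(x0, x1):
                d = clustering_distance(image[y, x], image[cy, cx], (y, x), (cy, cx), k)
                if d < best[y, x]:
                    best[y, x] = d
                    labels[y, x] = idx
    return labels
```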


3.3. Point clouds classification based on ODVL image

We classify the superpixel blocks segmented from the ODVL images. The classifier is the Gentle-AdaBoost algorithm, which is based on the AdaBoost framework in the open source library and combines the Classification and Regression Trees (CART) method. For each pixel block, the algorithm extracts a feature vector made of the texture features and spatial geometric features of the block for training. The dimension of the feature vector is 23. The first twenty entries come from four gray-level co-occurrence matrices of the same pixel block; each matrix is generated with a different direction parameter, and five parameters are computed from each co-occurrence matrix [43]. The first parameter is:

$$Asm = \sum_{i}\sum_{j} P(i,j)^2 \tag{5}$$

where Asm is defined as the energy of the pixel block. It reflects the uniformity of the gray distribution and the coarseness of the texture. The second parameter is:

$$Ent = -\sum_{i}\sum_{j} P(i,j)\log P(i,j) \tag{6}$$

where Ent is defined as the entropy of the pixel block. It represents the degree of non-uniformity and complexity of the texture. The third parameter is:

$$Con = \sum_{i}\sum_{j} P(i,j)\,(i-j)^2 \tag{7}$$

where Con is defined as the contrast of the pixel block. It measures the sharpness of the image and the depth of the texture. The fourth parameter is:

$$Idm = \sum_{i}\sum_{j} \frac{P(i,j)}{1+(i-j)^2} \tag{8}$$

where Idm is defined as the inverse difference moment of the pixel block. It reflects the local variation of the image texture. The last parameter is:

$$Corr = \frac{\sum_{i}\sum_{j}(i \times j)\,P(i,j) - \mu_x \mu_y}{\delta_x \delta_y} \tag{9}$$

where Corr is defined as the correlation of the pixel block. It measures the linear dependency of gray values between the rows and columns of the image. The twenty-first entry is the height feature of the pixel block:

$$H_p = \frac{1}{n}\sum_{i} z_i \tag{10}$$

The twenty-second entry describes the spatial shape of the pixel block. For the spatial shape, we calculate the covariance matrix of the pixel block and obtain three eigenvalues λ0, λ1, λ2 (λ0 ≥ λ1 ≥ λ2) and three eigenvectors e1, e2, e3. The distribution of the point clouds is divided into three types according to the relationship between the three eigenvalues: spherical distribution, linear distribution and planar distribution, which are represented respectively by the values of λ0, λ0−λ1 and λ1−λ2. The twenty-third entry is the directional characteristic of the local point cloud corresponding to the pixel block. It is defined as the eigenvector corresponding to the smallest eigenvalue of the local point cloud, which is also the normal vector of this point cloud. This paper uses the Gentle-AdaBoost classifier to classify each point cloud with these feature vectors. After the classification process, the "lower part of the building", "vehicle" and "trunk", which are defined as the "objects to be optimized", are similar in height

Fig. 1. The three-dimensional border of the object.

and have similar texture features in the ODVL image. In order to improve the classification accuracy, the Centroid method is used to cluster the point clouds corresponding to the "objects to be optimized". In each clustering step, the method merges the two closest objects, until the distances between the centers of all objects are larger than a threshold; this paper selects 2.5 m as the threshold. The Gentle-AdaBoost classifier is then used to classify the point clouds with the "object to be optimized" tag. The features trained from the "objects to be optimized" are 4-dimensional vectors. In the previous work [44], a three-dimensional border is added around each "object to be optimized" according to its point cloud, from which the longest diagonal lb of the border and the height h of the border can be obtained (as shown in Fig. 1). The other two parameters are obtained from the distribution in the z direction. The point clouds of an "object to be optimized" are divided equally into three regions from top to bottom according to the height in the z direction. The point clouds of a vehicle are concentrated in the middle layer, the point clouds of a trunk are evenly distributed among the three layers, and the point clouds of the lower part of a building are scattered randomly. For each "object to be optimized", the numbers of points in the three layers from top to bottom are recorded as n1, n2 and n3, and d1 = n1/n2 and d2 = n2/n3 are defined as the distribution characteristics of the point clouds in the height direction. In summary, the classifier uses the feature vectors composed of lb, h, d1 and d2 to classify the point clouds with the "object to be optimized" tag. After the reclassification, the confidence of the classification results for buildings and ground is very high. If point clouds of other types appear among the building or ground results, the incorrectly classified point clouds are corrected by clustering and computing covariance matrices.

3.4. Semantic map building

The semantic map is defined as the map containing the spatial characteristics of the environment and the distribution characteristics of the scenes with known categories. In this paper, the semantic map consists of information about nodes, such as trees, vehicles, roads, buildings and ground. Environment nodes for trees, buildings and the ground are added to the semantic map and represented by 'T', 'B' and 'G' respectively. If there are no buildings or trees along a section of the ground, it is labeled as a ground node. If continuous trees appear within a building node, the node is defined as a "building-tree" node. Ground, tree, building and building-tree nodes are considered the environment nodes. According to the positions of the environment nodes relative to the UGV, the nodes on the right of the UGV are called the right environment nodes, and the nodes on the left are called the left environment nodes. New road nodes are added to the semantic map in two cases: first, when one of the left or right environment nodes changes; second, when the road turns or the UGV has moved long enough along a straight road. According to the left and right environmental information, the GPS information and the IMU information of the UGV, the algorithm generates environment nodes and road nodes. In the semantic map, a road node is connected to environment nodes on the left and right


Fig. 2. Comparison between the original map (left) and the simplified map (right).

sides, and an environment node can be connected to multiple road nodes. The point clouds of the previous scene are always stored in the memory of the UGV to ensure the continuity of the semantic map, and when new data is read in, the existing semantic map can be updated and maintained. In the actual map building experiment, since the same object is viewed from different angles, pose estimation errors can cause the same object to be represented by multiple densely distributed nodes, which is not conducive to robot positioning. Simplifying the semantic map facilitates positioning and navigation while also reducing memory. We scan all the environment nodes to get their spatial locations and determine the position relationship between each environment node and the other environment nodes of the same type. If the distance between two such nodes is less than 20 m, the two environment nodes are replaced by their midpoint, and the road nodes previously connected to them are connected to the midpoint. The simplified map is shown in Fig. 2. In the red area, there are many nodes which represent the same object; after simplification, the excess environment nodes are removed, as shown in the yellow area. The blue line represents the route the UGV uses to build the map, and the blue points and numbers represent the segmented road nodes.

4. Relocalization in the semantic map

This section introduces how the semantic map is stored and how the robot is relocated in the map.

4.1. Method of saving the semantic map

When the global semantic map is built, the information of the map is saved in a map file, which can be recalled as a prior map to relocate the robot. In the map file, each row stores the information of one environment node. For each environment node, the first field is a character representing the category of the node. The second field is the position of this node in the world coordinate system; in this paper, the semantic map is built on the ground, so the z coordinate is zero. The remaining fields are the distances from all other environment nodes to the current one. For example: Signpost, Sign_x, Sign_y: Dis_1, Dis_2, Dis_3… Dis_n. It should be noted that the data on the right of the colon from all the nodes form a square symmetric matrix with a main diagonal of zero, in which I_{i,j} = I_{j,i}, i, j ∈ N, where N is the number of rows of the matrix and also the number of environment nodes; I_{i,j} represents the distance between the ith node and the jth node, and I_{i,i} is zero. This format requires much less storage than other common maps, and the advantage becomes more obvious when maps of large-scale environments are built.
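To make the format concrete, here is a minimal sketch of writing and reading such a map file; the row layout follows the "category, x, y: distances" description above, while the function names and the numeric formatting are our own assumptions.

```python
import math

def save_semantic_map(path, nodes):
    """nodes: list of (category, x, y).  Each row stores one environment node:
    category, x, y: distance to node 1, distance to node 2, ..., distance to node n."""
    with open(path, "w") as f:
        for cat_i, xi, yi in nodes:
            dists = [math.hypot(xi - xj, yi - yj) for _, xj, yj in nodes]
            f.write(f"{cat_i}, {xi:.2f}, {yi:.2f}: "
                    + ", ".join(f"{d:.2f}" for d in dists) + "\n")

def load_semantic_map(path):
    """Return node categories, positions and the symmetric distance matrix."""
    categories, positions, distances = [], [], []
    with open(path) as f:
        for line in f:
            head, tail = line.split(":")
            cat, x, y = (s.strip() for s in head.split(","))
            categories.append(cat)
            positions.append((float(x), float(y)))
            distances.append([float(v) for v in tail.split(",")])
    return categories, positions, distances
```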


Fig. 3. The intersection points of two circles, which represent the possible positions of the UGV.

4.2. Relocate in the semantic map

When the UGV relocates in the prior map, a local semantic map is built and the information about the left and right environmental nodes is obtained, which includes the categories of the left and right nodes and the distances between the current position and the nodes. dis_l and dis_r are defined as the distances between the current position of the UGV and the left and right nodes respectively. According to the categories of the left and right nodes at the current position of the UGV, we find all nodes with the same categories in the map file as reference nodes. Two reference nodes corresponding to the left and right node categories form a combination. Assume that the left and right reference nodes in a combination are sig_l and sig_r; then sig_l, sig_l_x, sig_l_y and sig_r, sig_r_x, sig_r_y are put into a container, and each combination is required to appear only once. (sig_l_x, sig_l_y) is the position of the left environmental node in the world coordinate system, (sig_r_x, sig_r_y) is the position of the right environmental node, and sig_l and sig_r are the categories of the nodes. After all the combinations in the map file are obtained, the absolute distance of each combination is defined as dis_tem, which is computed as

$$dis\_tem = \sqrt{|sig\_l\_x - sig\_r\_x|^2 + |sig\_l\_y - sig\_r\_y|^2} \tag{11}$$

dis_tem is then compared with the sum of dis_r and dis_l. If dis_tem is larger than the sum of dis_r and dis_l, the matching combination is removed from the matching results according to the triangular relationship; otherwise, the matching combination is retained. After obtaining all the effective matching combinations, two circles are drawn with dis_l and dis_r as the radii and the positions of the nodes in a combination as the centers (as shown in Fig. 3). Since dis_tem is not larger than the sum of dis_r and dis_l, there is at least one intersection point, which is a possible position of the UGV. In particular, if the two environmental nodes have the same category, the orientation of the UGV cannot be determined, so there can be up to four intersection points. After all the matching combinations have been processed, all the possible positions are obtained. The coordinates of the possible positions are stored in a vector and a score is given to each position; the score represents the probability that the UGV is at this position, and the initial score is 4. In the same way, possible positions are obtained from the remaining matching combinations to complete the vector. When updating the vector, if a newly obtained possible position already exists in the vector, or the distance between the new position and an old position is within an acceptable range, the mean coordinate of the old and new positions replaces the old one, and the score of that coordinate is increased by 3. If the newly obtained possible position does not satisfy the above requirements, it is appended to the end of the vector with a score of 4. After loading all newly generated positions, all the scores are reduced by 3 to reduce the possibility of the UGV being at a wrong position, and all positions with a score less than 0 are eliminated. Ultimately, the positions with the highest score are considered the most likely positions of the UGV.
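The candidate generation described above (the triangle check of Eq. (11) followed by the circle intersection of Fig. 3) might look roughly like this sketch; the geometry helper is standard two-circle intersection, and the function names are ours rather than the paper's code.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles with centers c1, c2 and radii r1, r2."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []                                    # no usable intersection
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(r1 ** 2 - a ** 2, 0.0))
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    if h == 0:
        return [(mx, my)]                            # circles are tangent: one point
    ox, oy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]

def candidate_positions(combinations, dis_l, dis_r):
    """Possible UGV positions from all (left node, right node) combinations."""
    candidates = []
    for left_xy, right_xy in combinations:
        dis_tem = math.hypot(left_xy[0] - right_xy[0], left_xy[1] - right_xy[1])
        if dis_tem > dis_l + dis_r:
            continue                                 # removed by the triangular relationship
        candidates.extend(circle_intersections(left_xy, dis_l, right_xy, dis_r))
    return candidates
```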


Fig. 4. All the previous possible positions are moved according to the motion information. The blue points are the previous possible points and the red points are the moved points. x is the distance the UGV moves in the abscissa direction and y is the distance the UGV moves in the ordinate direction.

Fig. 6. The UGV used for experiment.

Fig. 7. The "main", "location" and "map" blocks represent the nodes of the program; "map.txt" is the map file.

After the position estimation is completed, the motion information of the UGV is obtained from its own IMU system and GPS. All the possible points are moved from their previous positions to new current positions according to the motion information of the UGV, as shown in Fig. 4, and the next position estimation is then carried out using the positions of the current possible points. The relocalization is successful when only one possible location remains. The diagram of the relocalization algorithm is shown in Fig. 5.

Fig. 5. The diagram of the relocalization algorithm.
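The bookkeeping of the candidate vector described above might be sketched as follows; the initial score of 4, the +3 merge bonus and the −3 decay come from the text, while the merge radius and the data layout are illustrative assumptions.

```python
import math

MERGE_RADIUS = 2.0   # assumed "acceptable range" for matching a new point to an old candidate

def update_candidates(candidates, new_points):
    """candidates: list of [x, y, score].  Merge each new point into a nearby old
    candidate (average the coordinates, score += 3) or append it with score 4,
    then decay every score by 3 and drop candidates whose score falls below 0."""
    for nx, ny in new_points:
        for c in candidates:
            if math.hypot(c[0] - nx, c[1] - ny) <= MERGE_RADIUS:
                c[0], c[1] = (c[0] + nx) / 2.0, (c[1] + ny) / 2.0
                c[2] += 3
                break
        else:
            candidates.append([nx, ny, 4])
    for c in candidates:
        c[2] -= 3
    return [c for c in candidates if c[2] >= 0]

def propagate(candidates, dx, dy):
    """Shift every surviving candidate by the UGV's measured motion (Fig. 4).
    Relocalization succeeds once a single candidate remains."""
    for c in candidates:
        c[0] += dx
        c[1] += dy
    return candidates
```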

5. Experiment

5.1. Experimental condition

In this experiment, the data used for the semantic map building experiment and the relocalization experiment came from the DLUT database. Fig. 6 shows our UGV with onboard sensors. The hardware system of the UGV is composed of laser sensors (two LMS511), an ADU5630 attitude and orientation integrated navigation system and an industrial computer. The LMS511 is a two-dimensional laser sensor that acquires a single column of laser data per scan. With the movement of the UGV, the two-dimensional laser data are accumulated along the direction of travel, so three-dimensional point cloud data are obtained. The ADU5630 attitude and orientation integrated navigation system provides position, heading angle and attitude angle accuracy within the allowable range. Its frequency is 100 Hz, which matches the highest scanning frequency of the LMS511 laser sensor. The UGV's computer runs ROS (Robot Operating System) under Linux and the software system is written in C++. The open source OpenGL and OpenCV program interfaces are used.

OpenGL is a widely used cross-platform 3D graphics interface. This paper mainly uses functions in the OpenGL library to visualize the experimental results, including the display of laser points in the three-dimensional coordinate system, the understood 3D point cloud scenes colored according to the classification labels, and the semantic map nodes. OpenCV is a widely used cross-platform computer vision library with many functions for processing images. The program runs under ROS and is divided into three nodes according to function. The first node is the main node, which understands the objects in the scene and passes the results to the map node through the service communication mechanism in ROS. The map node generates a semantic map of the identified objects and stores the map in a file called map.txt. The third node, called location, is independent of the first two nodes; it simplifies the map.txt generated by the map node and then reads the data used for relocalization. Fig. 7 shows the program structure.

5.2. Semantic map building

A scene from the point cloud data is selected for analysis, as shown in Fig. 8. This scene is transformed into a range image, a Bearing Angle image and an ODVL image, as shown in Fig. 9. Compared with the traditional range image and the BA image, the ODVL image can be used for a wide range of data, adapts to a variety of scanning methods, and improves the distinguishability and detail quality of object boundaries in the image. It can also provide effective support for further research on subsequent image segmentation and recognition. The ODVL image is completely independent of the scene viewpoint and does not use angle calculation, which is also the essential difference between the ODVL image and the BA image. The segmentation results corresponding to each image are shown in Fig. 10. In the range image, the boundary distinction between objects at similar distances is not obvious, so the range image loses details and the segmented texture information is also lost; for example, the car is not distinguished from the ground, the left part of the building is not distinguished from the tree tops, and some windows of the building are lost.


Table 1
Error rate of SLIC-based segmentation.

              Range Image    BA Image    ODVL Image
Error Rate    12.41%         9.85%       7.13%

Fig. 8. Original laser point clouds of outdoor scene.

Fig. 11. (a) The result of the initial classification of the Gentle-AdaBoost classifier; (b) The reclassification result. The buildings are brown. The trees are green and the trunk of trees are blue. The cars are yellow and the ground is gray.

Fig. 9. Range image (left), BA image (middle), ODVL image (right).

Fig. 10. SLIC-based segmentation in range image (left), BA image (middle) and ODVL image (right).

Because the BA image cannot distinguish between scenes at different depths, many pixels that belong to different scenes and are close to each other in the two-dimensional image space but far apart in three-dimensional space are wrongly grouped into one pixel block, such as the trunk and the building in the middle of Fig. 10. The ODVL image has relatively good details and texture features, which help improve the accuracy of classification. In the experiment, 60 scenes were randomly selected from the database and converted into range, BA and ODVL images, and each image is divided into 500 pixel blocks. The segmentation error rate is defined as:

$$E_r = \frac{N_e}{N} \tag{12}$$

where Ne is the total number of pixels that are assigned to the wrong pixel blocks and N is the total number of pixels. The Er of superpixel segmentation is shown in Table 1. To test the Gentle-AdaBoost classifier, 70% of the data are used for training and 30% of the data are used for testing. The result is shown in Table 2.

Two understanding results from the test scenes are selected randomly, as shown in Fig. 11. The Gentle-AdaBoost classifier is used to classify the test samples. It is obvious that the point cloud classification based on the ODVL image has a better classification effect on the "top half of the building", "vegetation" and "ground", but the classification effect on "vehicle", "trunk" and "bottom of the building" is not ideal. The results of the initial classification are therefore reclassified and semantically corrected. The classification results for all the test samples are shown in Table 3. Through the classification based on the ODVL image, the semantic reclassification and the point cloud correction, the final classification results achieve better accuracy and recall, with a total accuracy rate of 0.868.

The continuous scene data in Fig. 11 is derived from the database, which was collected in a campus environment. The data format and the acquisition method of this data set are similar to those of this paper, and it is used to test the generalization ability of the algorithm. The scene contains about 2,346,500 laser points. The length of the trajectory of the UGV in this database is 1155 m, and the laser points cover an area of 50,419 m². It takes about 300 MB to store all the laser points. There are 13 scene data segments on the left and right sides. Fig. 12(a) shows the point cloud data after coloring the classified objects and Fig. 12(b) shows the schematic diagram of the semantic description of the scene. The semantic map of the experimental area is established as shown in Fig. 13. This experiment uses a long route with 175 road nodes, which are considered as 175 steps. The size of the map file is 9.82 KB for this experiment, which is much smaller than the storage space of the laser points.

5.3. Relocalization

The UGV starts from different positions in the semantic map. After several steps, when there is only one possible position, the relocalization is completed. Among the 175 road nodes, one node is chosen as a starting point every 5 nodes. The relocalization results are shown in Fig. 14. The abscissa represents the start position of the UGV, and the ordinate represents the number of steps the UGV takes to relocate itself successfully after starting.


Table 2
Training set and test set information.

            Number of scenes   Vegetation   Building   Ground    Trunk    Vehicle   Others
Training    70                 26.31%       21.70%     34.65%    1.02%    5.13%     11.19%
Testing     30                 22.58%       25.55%     35.43%    1.30%    4.42%     10.71%

Table 3
Scene understanding result for the testing samples.

                 Vegetation   Building   Ground    Trunk    Vehicle   Others    Recall rate
Vegetation       526,379      29,375     10,092    2528     2067      38,107    0.865
Building         10,963       638,667    17,371    8386     3372      9903      0.921
Ground           4022         5297       939,990   1934     214       3376      0.982
Trunk            9558         5728       4102      14,985   139       595       0.427
Vehicle          5118         14,917     15,431    2585     75,668    5443      0.635
Others           80,393       28,822     18,714    5967     11,163    143,605   0.497
Accuracy Rate    0.827        0.884      0.934     0.412    0.817     0.714

Fig. 14. The results of relocalization. The abscissa is the starting position. The number of blue points represents the number of steps for successful relocalization. Along the ordinate axis, the distance from a blue point to the yellow line indicates the number of steps of successful relocalization, and the distance from the blue point to the red line indicates the number of steps in which the positioning failed.

Fig. 12. (a) The 3D laser point clouds used for semantic mapping. (b) Large-scale outdoor semantic map for the same environment. The white lines represent the constraint relationship between road nodes and environment nodes.

Fig. 13. The semantic map of the experimental area.

Fig. 15. The positions of the relocalization results in the semantic map. The blue lines and points indicate the trajectory when building the semantic map. The green points are the environment nodes and the red points are the positions of relocalization.

Instead of locating the UGV precisely, our focus is to use the sparse semantic map to determine the general position of the UGV in a large-scale environment. Fig. 15 shows the positions of the relocalization results in the semantic map, in which the position points are distributed near the trajectory used to build the prior semantic map. However, because there are few environmental nodes on the left side of the map, localization there is inaccurate, and the robot needs to move forward for a distance to find its own position. The process of building the semantic map is divided into five parts: BA image generation, superpixel segmentation, feature extraction and classification, reclassification and topological map building. The running time of the algorithm is shown in Fig. 16. As can be seen from the figure, feature extraction and classification take the longest time.


Fig. 16. Time Cost of Scene Understanding and Semantic Mapping.

Each of the other processes takes less than 100 ms.

6. Conclusion and future work

This paper has focused on how to relocalize UGVs in outdoor environments based on a sparse semantic map. The laser points are divided into pixel blocks according to the ODVL image and classified by the Gentle-AdaBoost classifier. The semantic map of the outdoor environment is built by generating topological relations between the environment nodes, which are obtained according to the classification results. Instead of the laser data or features, only the information of the environment nodes is saved for relocalization, which avoids the storage problem in large-scale environments. For locating the UGV, a new positioning method is proposed: after obtaining the environmental object categories on both sides of the current position, we draw circles and find their intersection points to obtain the possible positions of the UGV, and these positions are applied as prior knowledge in the next position estimation step. The related relocalization experiments are carried out in the outdoor environment and the results verify the effectiveness and robustness of the algorithm. When the UGV returns to a visited position, if an environment object is not completely observed by the LiDAR, the current environment node is not completely consistent with the corresponding node in the semantic map, and fusing the new node into the semantic map is a problem. Therefore we will focus on improving the map updating method in the future. How to improve the positioning accuracy in areas with few environmental nodes is also one of the future works.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Fei Yan: Conceptualization, Methodology, Writing - original draft. Jiawei Wang: Software, Writing - original draft. Guojian He: Data curation, Visualization, Investigation. Huan Chang: Investigation, Validation. Yan Zhuang: Supervision, Resources, Project administration.

References

[1] Q.L. Li, Y. Son, Z.G. Hou, Neural network based fastslam for autonomous robots in unknown environments, Neurocomputing 165 (2015) 99–110.


[2] J.G. Liu, D.Q. Liu, J. Cheng, Y.Y. Tang, Conditional simultaneous localization and mapping: a robust visual SLAM system, Neurocomputing 145 (2014) 269–284. [3] T. Sattler, B. Leibe, L. Kobbelt, Efficient & effective prioritized matching for large-scale image-based localization, IEEE Trans. Pattern Anal. Mach. Intell. 39 (9) (Sep. 2017) 1744–1756. [4] Y. Feng, Y. Wu, L. Fan, Real-time slam relocalization with online learning of binary feature indexing, Mach. Vis. Appl. 28 (8) (Aug. 2017) 953–963. [5] A. Angeli, D. Filliat, S. Doncieux, J.A. Meyer, Fast and incremental method for loop-closure detection using bags of visual words, IEEE Trans. Robot. 24 (5) (Oct. 2008) 1027–1037. [6] N. Zeng, H. Zhang, Y. Li, J. Liang, A.M. Dobaie, Denoising and deblurring gold immunochromatographic strip images via gradient projection algorithmsL, Neurocomputing 247 (2017) 165–172. [7] N. Zeng, Z. Wang, H. Zhang, K. Kim, Y. Li, X. Liu, An improved particle filter with a novel hybrid proposal distribution for quantitative analysis of gold immunochromatographic strips, IEEE Trans. Nanotechnol. 18 (1) (2019) 819– 829. [8] A. Kendall, R. Cipolla, Modelling uncertainty in deep learning for camera relocalization, in: 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 2016, pp. 4762–4769. [9] T. Röhling, J. Mack, D. Schulz, A fast histogram-based similarity measure for detecting loop closures in 3-D lidar data, in: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, pp. 736–741. [10] B. Steder, G. Grisetti, W. Burgard, Robust place recognition for 3D range data based on point features, in: 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, 2010, pp. 1400–1405. [11] N. Zeng, H. Zhang, Y. Chen, B. Chen, Path planning for intelligent robot based on switching local evolutionary PSO algorithm, Assem. Autom. 36 (2) (2016) 120–126. [12] V. Pham, J. Juang, Robust and efficient slam via compressed H∞ filtering, Asian J. Control 16 (3) (2014) 878–889. [13] S. Liu, Z. Wang, G. Wei and M. Li, Distributed set-membership filtering for multi-rate systems under the round-robin scheduling over sensor networks, IEEE Trans. Cybern., doi:10.1109/TCYB.2018.2885653. [14] Y. Chen, Z. Wang, Y. Yuan, P. Date, Distributed H∞ filtering for switched stochastic delayed systems over sensor networks with fading measurements, IEEE Trans. Cybern. 50 (1) (2020) 2–14. [15] S. Liu, Z. Wang, Y. Chen, G. Wei, Protocol-based unscented Kalman filtering in the presence of stochastic uncertainties, IEEE Trans. Autom. Control 65 (3) (2020) 1303–1309. [16] K. Granstrom, J. Callmer, F. Ramos, J. Nieto, Learning to detect loop closure from range data, in: 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 2009, pp. 15–22. [17] Y.M. Li, S. Li, Y.J. Ge, A biologically inspired solution to simultaneous localization and consistent mapping in dynamic environments, Neurocomputing 104 (2013) 170–179. [18] C. Galindo, J.A. Fernández-Madrigal, J. González, A. Saffiotti, Robot task planning using semantic maps, Rob Auton. Syst. 56 (11) (Aug. 2008) 955–966. [19] J. McCormac, A. Handa, A. Davison, S. Leutenegger, SemanticFusion: dense 3D semantic mapping with convolutional neural networks, in: 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 2017, pp. 4628–4635. [20] C. Zhao, L. Sun, R. 
Stolkin, A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition, in: 2017 18th International Conference on Advanced Robotics (ICAR), Hong Kong, China, 2017, pp. 75–82. [21] L. Ma, J. Stückler, C. Kerl, D. Cremers, Multi-view deep learning for consistent semantic mapping with rgb-d cameras, in: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 2017, pp. 598–605. [22] P. Wang, R. Yang, B. Cao, W. Xu, Y. Lin, DeLS-3D: deep localization and segmentation with a 3D semantic map, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018, pp. 5860–5869. [23] Z. Xiao, K. Jiang, S. Xie, T. Wen, C. Yu, D. Yang, Monocular vehicle self-localization method based on compact semantic map, in: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, United States, 2018, pp. 3083–3090. [24] S. Yang, Y. Huang, S. Scherer, Semantic 3D occupancy mapping through efficient high order CRFs, in: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 2017, pp. 590– 597. [25] L. Sun, Z. Yan, A. Zaganidis, C. Zhao, T. Duckett, Recurrent-OctoMap: learning state-based map refinement for long-term semantic mapping with 3-d-Lidar data, IEEE Robot. Autom. Lett. 3 (3) (2018) 3749–3756. [26] Y. Wang, T. Shi, P. Yun, L. Tai, M. Liu, PointSeg: real-Time semantic segmentation based on 3D lidar point cloud, arXiv preprint arXiv:1807.06288v8, 25 Sep. 2018. [27] Y. Liu, F. Wang, A.M. Dobaie, G. He, Y. Zhuang, Comparison of 2D image models in segmentation performance for 3D laser point clouds, Neurocomputing 251 (2017) 136–144. [28] J. Jeong, T.S. Yoon, J.B. Park, Multimodal sensor-based semantic 3D mapping for a large-scale environment, Expert Syst. Appl. 105 (2018) 1–10. [29] W.C. Ma, T. Ignacio, Bârsan, et al., Exploiting sparse semantic hd maps for selfdriving vehicle localization, arXiv preprint arXiv:1908.03274v1, 8 Aug. 2019.

Please cite this article as: F. Yan, J. Wang and G. He et al., Sparse semantic map building and relocalization for UGV using 3D point clouds in outdoor environments, Neurocomputing, https://doi.org/10.1016/j.neucom.2020.02.103

JID: NEUCOM 10

ARTICLE IN PRESS

[m5G;March 27, 2020;21:17]

F. Yan, J. Wang and G. He et al. / Neurocomputing xxx (xxxx) xxx

[30] Z.J. Chong, B. Qin, T. Bandyopadhyay, M.H. Ang, E. Frazzoli, D. Rus, Synthetic 2D lidar for precise vehicle localization in 3D urban environment, in: 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, May 2013, pp. 1554–1559. [31] J. Collier, S. Se, V. Kotamraju, P. Jasiobedzki, Real-Time Lidar-Based Place Recognition Using Distinctive Shape Descriptors, in: Proceedings of the international society for optics and photonics, 8387(1), 2012, pp. 1–11. [32] K. Sakaeta, K. Nonaka, K. Sekiguchi, Experimental verification of a vehicle localization based on moving horizon estimation integrating lrs and odometry, J. Phys.: Conf. Ser. 744 (2016) 1–12. [33] L. Zou, Z. Wang, Q.L. Han and D. Zhou, Moving horizon estimation for networked time-delay systems under round-robin protocol, vol. 64, no. 12, pp. 5191–5198, 2019. [34] R.W. Wolcott, R.M. Eustice, Fast lidar localization using multiresolution Gaussian mixture maps, in: 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 2015, pp. 2814–2821. [35] F. Cao, Y. Zhuang, H. Zhang, W. Wang, Robust place recognition and loop closing in laser-based slam for UGVs in urban environments, IEEE Sens. J. 18 (10) (2018) 4242–4252. [36] G. Kim, A. Kim, Scan context: egocentric spatial descriptor for place recognition within 3D point cloud map, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, Oct. 2018, pp. 4802–4809. [37] P. Yin, Y. He, L. Xu, Y. Peng, J. Han, W. Xu, Synchronous adversarial feature learning for lidar based loop closure detection, in: 2018 Annual American Control Conference (ACC), Milwaukee, WI, US, June 2018, pp. 234–239. [38] P. Yin, L. Xu, Z. Liu, L. Li, H. Salman, Y. He, W. Xu, H. Wang, H. Choset, Stabilize an unsupervised feature learning for lidar-based place recognition, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, Oct. 1-5, 2018, pp. 1162–1167. [39] H. Yin, Y. Wang, L. Tang, X. Ding, and R. Xiong, LocNet: global localization in 3D point clouds for mobile robots, arXiv preprint arxiv:1712.02165, 2017. [40] M.A. Uy, G.H. Lee, PointNetVLAD: deep point cloud based retrieval for large-scale place recognition, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, June 2018, pp. 18–23. [41] G. Kim, B. Park, A. Kim, 1-Day Learning, 1-year localization: long-term lidar localization using scan context image, IEEE Robot. Autom. Lett. 4 (2) (2019) 1948–1955. [42] L. Schaupp1, M. Bürki, R. Dubé, R. Siegwart and C. Cadena, OREOS: oriented recognition of 3D point clouds in outdoor scenarios, arXiv preprint arXiv:1903. 07918v1, Mar. 2019. [43] M.H. Bharati, J.J. Liu, J.F MacGregor, Image texture analysis: methods and comparisons, Chemom. Intell. Lab. Syst. 72 (2004) 57–71. [44] Y. Zhuang, G. He, H. Hu, Z. Wu, A novel outdoor scene-understanding framework for unmanned ground vehicles with 3D laser scanners, Trans. Inst. Meas. Control 37 (4) (2015) 435–445. Fei Yan (M’15) received the bachelor’s and Ph.D. degrees in Control Theory and Engineering from the Dalian University of Technology, Dalian, China, in 2004 and 2011, respectively. In 2013, he joined the Dalian University of Technology, as a Post-Doctoral and became a Lecturer in 2015, where he is currently an Associate Professor with the School of Control Science and Engineering. His-current research interests include 3-D mapping, path planning, and semantic scene understanding.

Jiawei Wang is a graduate student in the School of Control Science and Engineering at Dalian University of Technology, P. R. China. He received a bachelor's degree in Automation from the School of Marine Electrical Engineering of Dalian Maritime University in 2019. His research interests are in robot navigation and semantic scene understanding.

Guojian He received the bachelor and master degrees in Control Theory and Engineering from Dalian University of Technology, Dalian, China, in 2011 and 2014, respectively. Currently he is a Ph.D. candidate in the School of Control Science and Engineering at Dalian University of Technology, P. R. China. His research interests are in robotics, 3D LiDAR SLAM, mapping and semantic scene understanding.

Huan Chang received the bachelor degree in Measurement and Control Technology and Instrumentation from Huazhong University of Science and Technology, China, in 2012, and the master degree in Control Theory and Engineering from Dalian University of Technology, China, in 2016. His research interests are in semantic scene understanding, 3-D mapping, and path planning.

Yan Zhuang (M'11) obtained the B.Sc. and M.Sc. degrees in Control Theory and Engineering from Northeastern University, P. R. China, in 1997 and 2000 respectively. He received the Ph.D. degree in Control Theory and Engineering from the Dalian University of Technology, P. R. China in 2004. He joined Dalian University of Technology in 2005 as a Lecturer and became an Associate Professor in 2007. Currently he is a Professor in the School of Control Science and Engineering, Dalian University of Technology. His research interests include mobile robot 3D mapping, outdoor scene understanding, 3D-laser-based object recognition, and 3D scene recognition and reconstruction.
