Multiple RGB-D sensor-based 3-D reconstruction and localization of indoor environment for mini MAV

Yamin Li a, Yong Wang b, Dianhong Wang a,b,∗

a Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan, 430074, China
b School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan, 430074, China

Article history: Received 7 May 2016; Revised 12 August 2017; Accepted 15 August 2017.

Keywords: Mini MAV; Visual sensor network; 3-D reconstruction; Indoor localization; Kalman-consensus filter; Data fusion

Abstract: Micro aerial vehicles (MAVs) with lower cost, smaller size and more flexible flight performance play a growing role in Global Positioning System-denied (GPS-denied) indoor tasks. This paper focuses on the realization of three-dimensional (3-D) panoramic environment reconstruction and target localization without relying on any external navigation aid. By establishing a visual sensor network (VSN) composed of multiple RGB-Depth (RGB-D) sensors, a fast 3-D model of the observed environment is built using an improved speeded up robust feature (SURF) extraction algorithm and an iterative closest point-based (ICP-based) reconstruction algorithm. A distributed data fusion algorithm known as the Kalman-consensus filter (KCF) is then used to estimate a more accurate global position and trajectory of the mini MAV. Both software simulations and real indoor flight experiments demonstrate that the proposed approach achieves a fast and reliable 3-D map of the indoor environment and that the accuracy of localization is largely improved.

1. Introduction

The MAV is a form of robot that allows one to sample the environment and to act on it where no other sensor can reach, e.g., to monitor the environment at altitude [1], or to search and rescue at an earthquake site [2] or fire disaster. Benefiting from the development of micro-electro-mechanical systems (MEMS), automatic control technology, new materials and battery technologies, mini MAVs with lower cost, smaller size, lighter weight, more flexible flight performance and better concealment will play a greater role in surveying, aerial photography, medical escort, urban management [3] and other commercial and service areas. In all these applications, the mini MAV needs to be aware of the map of the environment and its own position in 3-D space. To this end, positioning technologies and devices are needed to accomplish these tasks efficiently. For outdoor applications, GPS is widely used to sense object location, as in vehicle tracking systems and electronic maps on smart phones. However, when moving these applications into indoor environments, GPS may not be suitable, since it requires a direct line of sight to the satellites, which is not available indoors. Furthermore, the moving space of an indoor environment is limited and contains changeable furniture and moving people, which is usually more complex and challenging for a mini MAV. This poses new challenges and higher requirements on positioning accuracy, dependability and efficiency.

This paper focuses on GPS-denied indoor applications of mini MAVs and deals with the realization of 3-D panoramic environment reconstruction and target localization. A VSN composed of multiple overlapping active RGB-D sensors is established to cover the whole indoor surveillance area. The sensors employed in the 3-D VSN are rigidly mounted in the environment to capture the scene from different perspectives, instead of being fixed on the MAV or held by the user's hands and moved around to cover the whole area. This configuration reduces the load of the mini MAV and enables a flexible network size. Compared with existing 3-D reconstruction and MAV localization systems and approaches, the main contributions of this paper are as follows:

(1) We present a 3-D panoramic reconstruction method of the observed indoor environment using only RGB-D data provided by multiple RGB-D sensors. A hole filling algorithm based on local color matching and an ICP-based registration method are proposed to ensure a fast and consecutive 3-D panoramic reconstruction.

(2) We propose a KCF-based data fusion approach to estimate the global position and the trajectory of the mini MAV. By referring to the estimations of all sensors that view the same target, our method attains a 3-D consistent global localization of the mini MAV, which is more synthetic, accurate and reliable than a method using only one single sensor.

(3) We establish a 3-D VSN composed of only RGB-D sensors to enable indoor applications of mini MAVs in GPS-denied environments without relying on any external navigation aids. Moreover, we validate the efficiency of the proposed 3-D reconstruction and KCF-based data fusion localization methods on a real-time platform. The comparison experiments and real flight experiments demonstrate that our approach achieves a fast, reliable and comprehensive 3-D map of the indoor environment and that the accuracy of localization is greatly improved.

The rest of this paper is organized as follows. Section 2 gives a review of the related methods, enabling technologies and system implementations of indoor localization. Section 3 describes both the hardware architecture and the software process of the proposed 3-D reconstruction and localization approach. Sections 4 and 5 introduce the details of the proposed SURF- and ICP-based environment modeling and the KCF-based data fusion localization methods, respectively. Results and discussions of both simulations and real flight experiments are presented in Section 6. Finally, we conclude our work in Section 7.

2. Related work

Existing research offers a broad range of approaches to indoor localization. One alternative is to use localization technologies based on wireless networks. In recent years, researchers have implemented several successful indoor localization systems using different wireless technologies such as IEEE 802.11 wireless local area network (WLAN), ZigBee and radio frequency identification devices (RFID). The RADAR system is a building-wide WLAN-based tracking system developed by the Microsoft Research Group [4]. Since this approach utilized the existing wireless networking infrastructure of the building, only a few base stations were needed. Sugano et al. implemented an indoor localization system based on the ZigBee standard in a wireless sensor network [5]. The system automatically estimated the distance between sensor nodes by measuring the received signal strength indicator (RSSI). Álvarez et al. also provided an RSSI-based indoor person/asset location method and implemented it in a wireless sensor network using the ZigBee standard [6]. They improved the positioning accuracy to 0.75 m by a correction based on the difference between the free-space field decay law and the measured RSSI. In the LANDMARC system, RFID technology was used for locating objects inside buildings [7]. The system calculated the position of the tagged object according to the information of the nearest known reference tags. UBIRO is a mobile robot navigation system using passive RFID tags deployed on the ground [8]. According to the Cartesian coordinates of the tags in a regular grid-like pattern, it was able to estimate the robot's location and orientation by using trigonometric functions. The experimental results showed that the localization error was 13.3 cm along the X-axis and 5.7 cm along the Y-axis on average. Instead of using RSSI measurements like the previous two RFID-based systems, Yang et al. provided a millimeter-level accuracy localization and tracking system named Tagoram, which leverages the RF phase value of the backscattered signal to estimate the location of the object [9].

All the localization systems introduced above were implemented in a two-dimensional (2-D) setting, and it is difficult to apply the same methods and system implementations to 3-D space because they use only one-dimensional (1-D) wireless signals, whether RSSI or RF phase. Moreover, they all suffered from unstable signal intensity and inevitable signal interference, and the localization accuracy was not satisfactory even when the targets were not moving. Therefore, other researchers have focused on newer technologies that can provide richer environmental information. Chowdhary et al. used a scanning 2-D laser rangefinder and a streamlined simultaneous localization and mapping (SLAM) algorithm to provide a position and heading estimate [10]. This implementation only works well in environments with vertical structures and is not suitable for complex 3-D indoor environments. Weiss et al. developed an autonomous navigation system for a micro helicopter using a single onboard monocular camera and inertial sensors [11]. Carrillo et al. utilized a stereo camera and an inertial motion estimation system to realize autonomous take-off, positioning, navigation and landing in unknown environments for an unmanned aerial vehicle (UAV) [12]. A Kalman filter was used to fuse stereo visual odometry and inertial measurements and provide accurate estimates of the UAV position and velocity. These methods relied on computation-intensive algorithms to execute the localization, because the vision data contain rich environmental information. In order to speed up the visual localization and increase its accuracy, the vision information may have to be combined with other sensors such as laser scanners or robot odometry [13].

Fig. 1. Microsoft Kinect sensor and the Crazyflie Nano Quadcopter used in the system.

Achtelik et al. provided a solution for enabling a quadrotor to autonomously navigate in unstructured and unknown indoor environments using a laser rangefinder and a stereo camera, combining the advantages of the two sensor suites [14]. Recently, novel RGB-D sensors such as the Microsoft Kinect or the Asus Xtion have brought the possibility of economical vision-based 3-D localization in unstructured environments. Different from monocular or stereo systems, camera systems composed of RGB-D sensors directly capture color image data with depth data per frame and can estimate the depth even in environments with poor visual textures [15]. Henry et al. presented an RGB-D Mapping framework that can generate dense 3-D maps of indoor environments despite the limited depth precision and field of view (FOV) provided by RGB-D cameras [16]. Huang et al. established a system that enables 3-D flight in cluttered environments using an RGB-D camera for indoor visual odometry and mapping [17]. The system used only onboard sensor data by leveraging results from the RGB-D sensors and a SLAM framework. It is noteworthy that all the vision-based localization and reconstruction methods above use data from only one single RGB-D sensor that is installed on a mobile robot or held by hand to explore the environment. Although the adaptability of the robot to the unknown environment was improved, the accuracy and speed of the SLAM-based localization and mapping algorithms were limited by the onboard data processing ability of the robot and the high real-time positioning requirements. To tackle these problems, one feasible solution is to establish a 3-D VSN that is deployed in a particular area with the abilities of visual perception, data processing and communication. A research group at the Massachusetts Institute of Technology (MIT) employed a 3-D VSN using Vicon 3-D motion capture and analysis systems composed of high-speed infrared cameras. They have demonstrated impressive applications including path planning and fault detection of MAVs [18]. The number and locations of the camera sensor nodes need to be designed appropriately to cover the whole monitoring area and to minimize the cost. Despite the high accuracy and robustness of the Vicon motion capture system, it is very expensive and not suitable for commercial indoor applications. In this paper, we also establish a 3-D VSN, composed of multiple RGB-D sensors, for indoor applications of mini MAVs to combine the advantages of RGB-D sensors and the VSN. A 3-D VSN with the ability to capture both color and depth images can provide richer environment information and effectively improve the network coverage.

3. System architecture

The hardware of the proposed system consists of RGB-D sensors, a mini MAV and a host PC. As shown in Fig. 1, the RGB-D sensors used in the VSN are Microsoft Kinect sensors, which feature a built-in color camera and a 3-D infrared depth sensor. They are used as information input devices for the indoor environment and the position of the mini MAV. The current mini MAV prototype in the application is the Crazyflie Nano Quadcopter from Bitcraze [19]. It is a cheap, commercially available miniature quadcopter which weighs only about 19 g and measures 9 cm motor-to-motor. It can be used as a versatile open-source development platform. The tiny size of the Crazyflie makes it an ideal choice for indoor applications, since it is capable of flexible flight without affecting people's daily life. Different from SLAM-based approaches, in our proposed system the 3-D map model of the indoor environment is built and stored on the computer before localization. The navigation of the mini MAV can start from any place in the monitoring area. As depicted in Fig. 2, the framework of the proposed 3-D reconstruction and localization system is divided into two parts, 3-D environment reconstruction and data fusion localization. After coordinate calibration and preprocessing of the captured images, the following five key methods are developed to ensure efficient environment mapping and reliable localization estimation: feature extraction and matching, feature merging and correspondence rejection, ICP-based transformation estimation, target detection, and KCF-based data fusion estimation. Details of each method are introduced in Sections 4 and 5.

Fig. 2. The framework of the proposed 3-D reconstruction and localization system. (For each viewpoint, the color and depth images undergo preprocessing, coordinate calibration, SURF feature extraction and matching, and target detection; the per-viewpoint 3-D point clouds are combined through feature merging and correspondence rejection and ICP-based transformation estimation into a global 3-D environment model, while the single-view localization results are fused by KCF-based data fusion estimation into a globally consistent localization.)

4. 3-D environment reconstruction

4.1. Calibration and coordinates establishment

After the deployment of the 3-D VSN, the color cameras and the infrared projectors of all the Kinect sensors need to be calibrated to correct the distortion effects and to obtain the geometric models of the cameras as well as a rough correspondence relationship between pairs of Kinect sensors. We use the calibration method of Herrera et al. [20] and the Matlab Kinect Calibration Toolbox developed by them.

In the proposed approach, the following four coordinate systems are needed: image coordinates {I}, imaging plane coordinates {P}, camera coordinates {C} and world coordinates {W}. Fig. 3 shows the relationship between the four coordinate systems in one scene. Although the origin of the depth camera does not coincide with that of the color camera, they are aligned to the camera coordinates {C} after calibration. In a 3-D VSN composed of N RGB-D sensors, the set of all sensors in the VSN is defined as C = {Cn | n = 1, 2, 3, ..., N}. In our proposed system, the origin of the world coordinates {W} is set to the origin of the camera coordinates {C} of RGB-D sensor C1. The goal of 3-D reconstruction is to estimate the coordinate transformations between the camera coordinates {C} of the different RGB-D sensors Cn and align them to the world coordinates {W}.

Assume that (u0, v0) are the coordinates of a pixel p0 in image coordinates {I} and (x0, y0) are its coordinates in imaging plane coordinates {P}. The transformation between the two coordinate systems follows

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1/x_0 & 0 & u_0 \\ 0 & 1/y_0 & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\tag{1}
$$


Fig. 3. The four different coordinate systems in one scene.

Fig. 4. Random noise and black spots in a depth image and 3-D point cloud reconstruction scene. (a) A raw depth image. (b) A 3-D point cloud reconstruction scene.

The XC-axis of camera coordinates {C} is parallel to the x-axis of imaging plane coordinates {P}, and the YC-axis is parallel to the y-axis. The transformation of a point from camera coordinates {C} to imaging plane coordinates {P} follows

$$
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \frac{1}{Z_c}
\begin{bmatrix}
f & -f\cot\beta & 0 & 0 \\
0 & f/\sin\beta & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix},
\tag{2}
$$

where f is the distance between the two origins O and O1, which is defined as the focal length of the camera, and β is defined as the angle between the two coordinate axes. Therefore, the transformation from camera coordinates {C} to image coordinates {I} follows

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \frac{1}{Z_c}
\begin{bmatrix}
f/x_0 & (-f\cot\beta)/x_0 & u_0 & 0 \\
0 & f/(y_0\sin\beta) & v_0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
= \frac{1}{Z_c}\,\zeta
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix},
\tag{3}
$$

where ζ is defined as the internal parameter matrix of the camera, which includes six parameters (x0, y0, u0, v0, f, β). Points in one coordinate system can be transformed to another using a rigid transformation denoted by T = {R, t}, where R is a 3 × 3 rotation matrix and t is a 3 × 1 translation vector. The projection of a point P in space from world coordinates (Xw, Yw, Zw) to camera coordinates (Xc, Yc, Zc) is obtained through the following equation

$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}.
\tag{4}
$$
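To make the projection chain of Eqs. (1)–(4) concrete, the following Python/NumPy sketch assembles the internal parameter matrix ζ of Eq. (3), projects a world point into image coordinates through the rigid transformation T = {R, t} of Eq. (4), and back-projects a pixel with known depth into camera coordinates. The function names are ours, and the back-projection assumes no skew (β = 90°), which is the usual case for a Kinect.

```python
import numpy as np

def intrinsic_matrix(f, x0, y0, u0, v0, beta):
    """Internal parameter matrix zeta of Eq. (3) (3x4, homogeneous form)."""
    return np.array([
        [f / x0, (-f / np.tan(beta)) / x0, u0, 0.0],
        [0.0,    f / (y0 * np.sin(beta)), v0, 0.0],
        [0.0,    0.0,                     1.0, 0.0],
    ])

def world_to_pixel(Pw, R, t, zeta):
    """Project a world point to image coordinates via Eqs. (3) and (4)."""
    T = np.vstack([np.hstack([R, t.reshape(3, 1)]),   # 4x4 rigid transform of Eq. (4)
                   [0.0, 0.0, 0.0, 1.0]])
    Pc = T @ np.append(Pw, 1.0)                        # camera coordinates (Xc, Yc, Zc, 1)
    uv = (zeta @ Pc) / Pc[2]                           # divide by Zc as in Eq. (3)
    return uv[:2]                                      # (u, v)

def pixel_to_camera(u, v, Zc, f, x0, y0, u0, v0):
    """Back-project a pixel with known depth Zc into camera coordinates,
    assuming no skew (beta = 90 degrees)."""
    Xc = (u - u0) * x0 * Zc / f
    Yc = (v - v0) * y0 * Zc / f
    return np.array([Xc, Yc, Zc])
```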

4.2. Depth image preprocessing

The depth sensor of the Kinect consists of an infrared projector, which projects a specific structured pattern of speckles, and a CMOS infrared camera. The depth images captured by the Kinect sensor contain considerable noise.

Fig. 5. Results after the hole filling algorithm. (a) Result of bilateral filter. (b) Result of Gaussian filter.

As shown in Fig. 4(a), there are random noise and large numbers of black spots at the edges of objects and in occluded areas of the depth image. This is because the depth values of these areas are not successfully obtained. These black spots lead to surface discontinuities of the objects in the 3-D point cloud reconstruction scene, as shown in Fig. 4(b).

The reasons for the random noise and black spots in depth images are twofold. The first is the hardware restrictions of the device. The best detection distance of the depth sensor is from 400 mm to 4500 mm. As the distance increases, the depth accuracy decreases. According to our repeated tests, the depth sensing accuracy is 3 mm at a distance of 1000 mm and 26 mm at a distance of 3000 mm, and it decreases rapidly when the distance exceeds 4000 mm. Secondly, the depth image quality is related to the nature of the objects being observed. If the observed objects are made of glass or mirror, which do not reflect the infrared light, the depth sensor cannot successfully obtain the depth values, which leads to a black area. Moreover, infrared light also exists in daylight and lamplight. The structured pattern of speckles projected by the Kinect will be interfered with, especially in strong light conditions, resulting in black spots and abrupt transitions in the depth image.

In a depth image, depth values have a cluster character, that is, points of similar color often have the same depth value [21]. In view of this, a hole filling algorithm based on local color matching is adopted to fill these black spots. According to the color of the pixel to be filled in the color image, we first find the best color-matching point in its neighborhood, and then fill the depth value of the pixel with that of the point found. To shorten the searching time of the proposed algorithm, the neighborhood size is 16 × 16 pixels in the 640 × 480 pixel color image and the color matching threshold is set to 8, which means that when the color difference between two pixels in the neighborhood is less than 8, the colors of the two pixels are considered to be matched. As can be seen from the two images in Fig. 5, most of the black spots are filled using the proposed hole filling algorithm compared with Fig. 4(a). After filling, a bilateral filter is used to suppress the remaining noise. Compared with the result of the Gaussian filter in Fig. 5(b), the bilateral filter shown in Fig. 5(a) reduces random noise while preserving the clear edges of the objects in the depth image.
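The following Python sketch illustrates the local color-matching hole filling under our reading of the description above; whether the threshold of 8 is applied per channel or to the summed color difference is not stated, so the summed absolute difference used here is an assumption, as are the function name and the convention that missing depth is encoded as 0.

```python
import numpy as np

def fill_depth_holes(depth, color, window=16, color_thresh=8):
    """Fill invalid depth pixels (value 0) with the depth of the best
    color-matching valid pixel inside a window x window neighborhood."""
    filled = depth.copy()
    h, w = depth.shape
    half = window // 2
    color = color.astype(np.int32)
    for r, c in np.argwhere(depth == 0):
        r0, r1 = max(0, r - half), min(h, r + half)
        c0, c1 = max(0, c - half), min(w, c + half)
        patch_color = color[r0:r1, c0:c1]
        patch_depth = depth[r0:r1, c0:c1]
        # Summed absolute color difference to the hole pixel, valid depth only
        diff = np.abs(patch_color - color[r, c]).sum(axis=2)
        diff[patch_depth == 0] = np.iinfo(np.int32).max
        best = np.unravel_index(np.argmin(diff), diff.shape)
        if diff[best] < color_thresh:
            filled[r, c] = patch_depth[best]
    return filled
```

A bilateral filter (for example, OpenCV's cv2.bilateralFilter) can then be applied to the filled depth image to suppress the remaining noise while preserving object edges, matching the comparison in Fig. 5.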
4.3. ICP-based registration

The Kinect sensors are deployed at different locations to capture the environment from different perspectives. According to Eq. (3), the color and depth images captured by one Kinect sensor can be represented together as a colored point cloud in the camera coordinates {C} of that Kinect. The purpose of 3-D reconstruction is to unify the information from the different perspectives into the world coordinates {W} and eventually to form a 3-D point cloud of the panoramic indoor scene. Therefore, the key of 3-D reconstruction is to find the transformations between different perspectives, that is, to solve the rotation and translation matrices between the 3-D point clouds used in Eq. (4). The procedure of the proposed 3-D reconstruction method is based on three steps: feature extraction and matching, feature merging and correspondence rejection, and transformation estimation.

4.3.1. Feature extraction and matching

The features are a set of points and their descriptions extracted from an image, which are re-detectable even under different scale, illumination and noise conditions. Such points, usually lying on edges and corners of objects, should be stable and distinctive. The descriptions of the features are known as local feature descriptors and are represented by multi-dimensional vectors. They are generated based on the information around the point, referred to as its k-neighborhood. The features and descriptors can be used together to form compact and descriptive representations of the original data.


Fig. 6. SURF feature matching on different indoor scenes.

Scale-invariant feature transform (SIFT) [22] and its variant SURF [23] are two widely used feature extraction algorithms to detect and describe local features in images. Considering the efficiency of 3-D reconstruction, the SURF algorithm is adopted in this paper. The key stages of the SURF algorithm are feature point detection, generation of the SURF local feature descriptor, and SURF feature matching. Feature matching means correspondence estimation between two color images from two Kinect sensors. Compared with matching features in 3-D point cloud data, extracting and matching features in color images has better accuracy and robustness. Since the color images and the point clouds of one Kinect sensor are registered, we can simply map the extracted SURF features from the color image into the point cloud. Specifically, the features are matched based on the similarity of the feature points, which is defined as the distance between the SURF feature descriptors given by

$$
\mathrm{sim}^2 = \sum_{i=1}^{64} \left( \mathrm{descriptor}_{\mathrm{real},i} - \mathrm{descriptor}_{\mathrm{base},i} \right)^2,
\tag{5}
$$

where sim is the similarity and the SURF feature descriptors are 64-dimensional vectors. When mapping SURF features into the point cloud, some of the feature points may correspond to invalid point cloud data. Such invalid correspondences can be rejected directly, since only several corresponding features are sufficient for solving the rigid transformations. Moreover, in the proposed system, at most 1000 SURF local features are saved for one image to reduce the running time. The results of applying the SURF algorithm to different indoor scenes are shown in Fig. 6. Only 100 matching points are drawn in each image for better display.
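A minimal sketch of SURF extraction and descriptor matching is given below, assuming OpenCV with the non-free contrib module (cv2.xfeatures2d); the Hessian threshold and the match-distance cutoff are illustrative values of our own, and the brute-force L2 match distance corresponds to the descriptor similarity of Eq. (5).

```python
import cv2

def surf_match(img1, img2, hessian=400, max_features=1000, dist_thresh=0.3):
    """Detect SURF features in two color images and match their 64-D descriptors."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian)
    gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    kp1, des1 = surf.detectAndCompute(gray1, None)
    kp2, des2 = surf.detectAndCompute(gray2, None)
    # Keep at most max_features strongest keypoints per image, as in the text
    keep1 = sorted(range(len(kp1)), key=lambda i: -kp1[i].response)[:max_features]
    keep2 = sorted(range(len(kp2)), key=lambda i: -kp2[i].response)[:max_features]
    kp1, des1 = [kp1[i] for i in keep1], des1[keep1]
    kp2, des2 = [kp2[i] for i in keep2], des2[keep2]
    # Brute-force matching with cross-check; m.distance is the Euclidean
    # descriptor distance of Eq. (5)
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = [m for m in matcher.match(des1, des2) if m.distance < dist_thresh]
    return kp1, kp2, matches
```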


4.3.2. Feature merging and correspondence rejection

The basic requirement of successful feature matching is that there must be a certain overlapping area between the two images. However, due to the overlap, there are many repeated features in a set of images. Feature merging can be performed to reduce data redundancy and improve efficiency. In point cloud data, the 3-D position coordinates (XC, YC, ZC) of repeated features are usually close to each other in 3-D space, as illustrated in Fig. 7. Therefore, a method of adjacent point clustering based on spatial relations and kd-tree nearest neighbor search [24] is used in the algorithm. The distance between feature points is defined as

$$
\left\| s - s' \right\| = \sqrt{\sum_{i=1}^{k} \left( s_i - s'_i \right)^2},
\tag{6}
$$
where si and s'i are the ith dimensions of the vectors s and s', and the value of k is 3 in the 3-D point cloud space.
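The sketch below illustrates feature merging with a kd-tree under the distance of Eq. (6), using SciPy's cKDTree rather than the PCL module cited in [24]; the merging radius and the choice to replace each cluster by its centroid are assumptions of ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def merge_features(points, radius=0.02):
    """Cluster repeated 3-D feature points closer than `radius` (Eq. (6))
    and replace every cluster by its centroid."""
    tree = cKDTree(points)
    pairs = tree.query_pairs(r=radius)          # all index pairs closer than radius
    parent = list(range(len(points)))           # union-find over close pairs
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in pairs:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return np.array([points[idx].mean(axis=0) for idx in clusters.values()])
```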


Fig. 7. Feature merging based on spatial relations.

Fig. 8. Result of ICP-based transformation estimation of a simple indoor scene. (a) and (b) are two point clouds. (c) shows the transformation estimation result of the two point cloud.

As mentioned above, only several corresponding features are sufficient for solving the rigid transformation matrices. Usually not all the correspondences are correct because of noise or erroneous measurements. The wrong correspondences should be rejected to ensure the estimation accuracy of the transformation matrix and to improve the registration speed. Therefore, the random sample consensus (RANSAC) estimation algorithm [25] is used to remove wrong correspondences in our system. RANSAC is an iterative method to estimate the parameters of a mathematical model from a set of feature points by selecting a small set of hypothetical "inliers" whose positions can be explained by some set of model parameters. The inliers represent the correct correspondences that should be retained. The points that do not fit the model parameters are "outliers", which represent wrong correspondences. RANSAC iteratively selects a random subset of the original data as inliers and tests them against the estimated model parameters. Since there is no upper bound on the running time of the evaluation procedure, the number of iterations N is manually set to 300. The obtained set of points is used as the initial set of corresponding points in the ICP-based transformation estimation. As a consequence, only a certain percentage of the found correspondences is used in the rigid transformation estimation phase.
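A hedged sketch of the RANSAC correspondence rejection follows: random minimal subsets of three matched 3-D point pairs hypothesize a rigid transform (least-squares SVD estimate), and the hypothesis supported by the most inliers is kept. The inlier threshold is an assumed value; the paper only fixes the number of iterations at 300.

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares rigid transform (R, t) mapping point set A onto B (SVD/Kabsch)."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cB - R @ cA
    return R, t

def ransac_correspondences(src, dst, iters=300, inlier_thresh=0.03):
    """src[i] <-> dst[i] are candidate 3-D correspondences; return an inlier mask."""
    rng = np.random.default_rng(0)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):                       # 300 iterations, as in the text
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_transform(src[idx], dst[idx])
        residual = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residual < inlier_thresh       # threshold in metres (assumed)
        if inliers.sum() > best.sum():
            best = inliers
    return best
```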

4.3.3. ICP-based transformation estimation

ICP is an algorithm that realizes precise rigid transformation estimation between two point sets and combines the datasets into a globally consistent model [26]. It is commonly used in real-time point cloud data registration. It iteratively revises the rotation and translation transformations to minimize the distance between corresponding points. For two point sets $A = \{a_i\}_{i=1}^{N_n}$ and $B = \{b_j\}_{j=1}^{N_m}$, the minimum distance is defined as

$$
\min_{R,\,t} \sum_{i=1}^{N_n} \min_{j \in \{1,2,\ldots,N_m\}} \left\| \left( R a_i + t \right) - b_j \right\|_2^{2},
\tag{7}
$$

where R is the rotation matrix, t is the translation vector, and T = {R, t} is the obtained global transformation matrix. A good set of initial corresponding points can reduce the errors generated during the iterations and ensure convergence toward the global optimum. The set of points and the transformation matrix obtained by the RANSAC algorithm are used as the initial set of corresponding points in the ICP transformation estimation. Once the distance falls below a given threshold, the registration is completed and the output of the algorithm is the refined global transformation matrix T. Fig. 8 illustrates the result of the ICP-based transformation estimation of a simple indoor scene. Fig. 8(a) and (b) show two point clouds captured by two Kinect sensors. The distances between the Kinect sensors and the objects range from about 150 cm to 250 cm. Fig. 8(c) presents the transformation estimation result of the two point clouds built by the proposed ICP-based environment reconstruction method.
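The ICP refinement of Eq. (7) can be sketched as below: starting from the RANSAC initialization, nearest-neighbour correspondences and the same SVD rigid-transform estimate are alternated until the mean residual stops improving. Function names, the convergence tolerance and the iteration limit are our assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, R0=np.eye(3), t0=np.zeros(3), max_iters=50, tol=1e-4):
    """Point-to-point ICP refining an initial rigid transform (R0, t0)."""
    def best_fit(A, B):                            # same SVD estimate as in the RANSAC sketch
        cA, cB = A.mean(axis=0), B.mean(axis=0)
        U, _, Vt = np.linalg.svd((A - cA).T @ (B - cB))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        return R, cB - R @ cA
    R, t = R0, t0
    tree = cKDTree(dst)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = src @ R.T + t
        dists, nn = tree.query(moved)              # closest dst point for every src point
        R_step, t_step = best_fit(moved, dst[nn])
        R, t = R_step @ R, R_step @ t + t_step     # compose the incremental update
        err = dists.mean()
        if abs(prev_err - err) < tol:              # registration completed
            break
        prev_err = err
    return R, t
```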


The procedure illustrated in Fig. 9 is: (1) convert the 3-channel RGB stream to an HSV stream and filter out red colors between (156, 150, 100) and (180, 255, 255); (2) erode and dilate the filtered image and calculate the center coordinates (uc, vc); (3) convert the 16-bit depth stream to an 8-bit array, align it to obtain the depth coordinates (ud, vd), and keep only depth values within [10, 400]; (4) calculate the mean depth value Zd and the 3-D local coordinates (Xc, Yc, Zc).

Fig. 9. The procedure of the color and depth image processing for target detection.

5. Data fusion localization

The final goal of the proposed system is to accurately locate the mini MAV in the indoor environment according to the 3-D global map model previously established by the VSN with multiple Kinect sensors.

5.1. Target detection

Two adjacent Kinect sensors in the VSN must have overlapping FOVs. Therefore, when the mini MAV flies, it may be captured by several cameras simultaneously. For each Kinect sensor viewing the target, a color-based target detection method is applied to estimate the local location in camera coordinates {C}. To make detection easier, a small red styrofoam ball is attached on top of the mini MAV, as shown in Fig. 1. Fig. 9 depicts the target detection procedure. The proposed method uses the color images as well as the depth images. The RGB camera of the Kinect sensor returns 32-bit 3-channel RGB images at 30 FPS at a resolution of 640 × 480. Firstly, the captured color streams are converted into HSV format, because, compared with the RGB color space, the correlations between the three components of the HSV color space are much smaller, which makes color classification simpler. According to the color characteristics of the red ball attached to the mini MAV, the HSV values of every pixel within the range from (156, 150, 100) to (180, 255, 255) are kept. Then the center coordinates of the filtered image, which are the 2-D coordinates (uc, vc) in color image coordinates {Ic}, are calculated. Since the depth and RGB cameras of the Kinect sensor are not concentric, the captured depth image needs to be aligned to obtain the 2-D coordinates (ud, vd) in depth image coordinates {Id}. Finally, the depth values outside the range [10, 400] are filtered out to get rid of uncertain depth values. The depth value of the target, Zd, is then calculated as the mean depth value of a square window with a radius of 32 pixels centered on the point (ud, vd). The final 3-D local coordinates (Xc, Yc, Zc) in camera coordinates {C} can be obtained according to Eq. (3).
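A minimal OpenCV sketch of the single-view detection step described above is shown below; it assumes the depth image has already been aligned to the color image and scaled to the 8-bit range used by the [10, 400] gate, and the function name, kernel size and return convention are ours.

```python
import cv2
import numpy as np

def detect_red_marker(bgr, aligned_depth,
                      hsv_lo=(156, 150, 100), hsv_hi=(180, 255, 255), win=32):
    """Return the marker centre (uc, vc) and its mean depth Zd, or None if not found."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))   # keep the red ball
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)            # remove speckle noise
    m = cv2.moments(mask)
    if m["m00"] == 0:                                             # marker not visible
        return None
    uc, vc = m["m10"] / m["m00"], m["m01"] / m["m00"]             # blob centre
    u0, v0 = int(round(uc)), int(round(vc))
    patch = aligned_depth[max(0, v0 - win):v0 + win,
                          max(0, u0 - win):u0 + win].astype(np.float32)
    valid = patch[(patch > 10) & (patch < 400)]                   # depth gate [10, 400]
    if valid.size == 0:
        return None
    return (uc, vc), float(valid.mean())
```

The returned (uc, vc) and Zd can then be converted to the 3-D local coordinates (Xc, Yc, Zc) with the back-projection of Eq. (3).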
5.2. Multi-sensor fusion-based localization

Since all the cameras are globally calibrated in the 3-D reconstruction phase, each camera viewing the target can calculate the target position on the 3-D global map using the rotation matrix R and the translation vector t according to its own estimation. However, the estimations of the mini MAV's position may not be exactly the same on the 3-D global map, due to the different perspectives and estimation errors of the cameras. On the other hand, as the mini MAV flies from one camera's view into another, it should be continuously located in a seamless way. Therefore, considering the inaccuracies of single-view target localization, a multi-sensor data fusion algorithm based on the KCF is applied in the proposed localization method. The KCF is a distributed estimation algorithm which has received widespread attention because of its advantages of fast convergence and high-precision estimation [27]. In our localization system, the KCF module fuses the single-view target position information together with the positions estimated by other neighboring cameras that also see the target, to come to a consensus about the actual location of the target. The goal of this data fusion method is to combine multiple local position estimates from different Kinect sensors to obtain a real-world 3-D global localization which is more synthetic, accurate and reliable. The realization of the proposed data fusion algorithm consists of two parts: optimal estimation of the target by a single sensor, and consensus processing of the information exchanged between neighboring sensors.


5.2.1. Mathematical description

We consider a VSN composed of n Kinect sensors to track the mini MAV target T in 3-D space. The set of all Kinect sensors in the VSN is defined as C, the set of sensors viewing target T is defined as Cv, and the set of the remaining sensors is defined as Cp. Then Cv ⊂ C and Cp ⊂ C. If sensor Ci and its neighbor sensors Cin capture the target T, then Ci ∈ Cv and Cv ⊂ {Cin ∪ Ci}. The network topology of the system at instant k can be defined as a graph G(k) = (V(k), E(k)). The network topology is dynamic, since the sensors viewing the target T change as target T moves through the surveillance area in 3-D space. For each sensor Ci ∈ Cv there is a separate KCF-based data fusion module in the proposed localization method, so the dynamic network topology does not affect the performance of the KCF. The state equation of the linear dynamical system is given as:

$$
x(k+1) = A(k)\,x(k) + B(k)\,w(k); \qquad x(0) \in \mathbb{R}^{m},
\tag{8}
$$

where w(k) is zero mean white Gaussian noise (w(k) ∼ N(0, Q)) and x(0) is the initial state of the target T. The state vector x(k) consists of three components:

$$
x(k) = \left[\, x(k),\; y(k),\; z(k) \,\right]^{T}
\tag{9}
$$

representing the position of target T in the x, y and z coordinates at instant k. The noisy measurement of sensor Ci is given as:

$$
z_i(k) = H_i(k)\,x(k) + V_i(k); \qquad z_i \in \mathbb{R}^{p},
\tag{10}
$$

where Vi (k) is zero mean white Gaussian noise (Vi (k) ∼ N(0, Ri )). xi (k) is the state of target T observed by sensor Ci only and zi (k) is the noisy measurement (xi (k),yi (k),zi (k)) by sensor Ci . The estimated state of target T is given as:

$$
\hat{x}_i(k) = E\!\left( x_k \mid z_i(k) \right)
\tag{11}
$$

$$
\bar{x}_i(k) = E\!\left( x_k \mid z_i(k-1) \right).
\tag{12}
$$

Let $e_i(k) = \hat{x}_i(k) - x(k)$ and $\bar{e}_i(k) = \bar{x}_i(k) - x(k)$ denote the estimation error and the a priori estimation error, respectively. The corresponding error covariances are given as:

$$
M_i(k) = E\!\left( e_i(k)\, e_i(k)^{T} \right)
\tag{13}
$$

$$
P_i(k) = E\!\left( \bar{e}_i(k)\, \bar{e}_i(k)^{T} \right).
\tag{14}
$$
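The paper does not spell out the system matrices; one simple instantiation consistent with Eqs. (8)–(10), assumed here, is a near-constant-position (random-walk) model in which A, B and Hi are 3 × 3 identity matrices and each sensor measures the 3-D position directly:

```python
import numpy as np

A = np.eye(3)           # state transition of Eq. (8)
B = np.eye(3)           # process-noise input matrix
Q = 0.01 * np.eye(3)    # process noise covariance (value assumed)
H_i = np.eye(3)         # sensor C_i observes the 3-D position directly, Eq. (10)
R_i = 0.05 * np.eye(3)  # measurement noise covariance of C_i (value assumed)

def simulate_step(x, rng):
    """Propagate the true state once and produce one noisy measurement z_i."""
    w = rng.multivariate_normal(np.zeros(3), Q)
    v = rng.multivariate_normal(np.zeros(3), R_i)
    x_next = A @ x + B @ w          # Eq. (8)
    z_i = H_i @ x_next + v          # Eq. (10)
    return x_next, z_i
```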

5.2.2. KCF-based data fusion estimation

To facilitate the description, the basic assumptions in this work are as follows. First, at each instant k the predicted target state x̄i and the estimation error covariance matrix Pi are available from instant k − 1. Second, at the beginning of the algorithm, x̄i(0) of the KCF is initialized to the mean value of z(0) over all sensors in Cv, and Pi = P0. Third, each sensor Ci knows its set of neighbors Cin that also see the target.

The implementation of the KCF-based localization method is as follows. At each time instant k, if Ci is viewing the target T, the 3-D local position of the target (Xc, Yc, Zc) in camera coordinates {C} is determined using the method introduced in Section 5.1, and the estimated global position zi on the 3-D map model is obtained by the 4 × 4 transformation matrix T = {R, t}. Then the measurement covariance matrix Ri and the output matrix Hi are computed. The corresponding information vector ui and information matrix Ui are obtained by

$$
u_i = H_i^{T} R_i^{-1} z_i
\tag{15}
$$

$$
U_i = H_i^{T} R_i^{-1} H_i.
\tag{16}
$$

A message mi = (ui, Ui, x̄i) is sent to Ci's neighbors that also see the target T. Ci also receives similar messages mj = (uj, Uj, x̄j) from its neighbors. The following two steps are the key of the algorithm. All the measurements are fused and the Kalman-consensus state estimate is computed following the equations

$$
M_i = \left( P_i^{-1} + S_i \right)^{-1}
\tag{17}
$$

$$
\hat{x}_i = \bar{x}_i + M_i \left( y_i - S_i \bar{x}_i \right) + \gamma\, M_i \sum_{j \in C_v} \left( \bar{x}_j - \bar{x}_i \right),
\tag{18}
$$

where $S_i = \sum_{j \in C_v} U_j$, $y_i = \sum_{j \in C_v} u_j$ and $\gamma = 1/\left( \left\| M_i \right\| + 1 \right)$. Finally, the state x̄i and the error covariance matrix Pi are updated according to the modeled linear dynamical system.
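The following sketch condenses one Kalman-consensus update for sensor Ci, implementing Eqs. (15)–(18) as we read them (the sums run over every sensor in Cv, including Ci itself, and γ uses the Frobenius norm of Mi); the final propagation of x̄i and Pi uses the linear model of Eq. (8) with the matrices from the previous sketch.

```python
import numpy as np

def kcf_update(x_bar_i, P_i, z_i, H_i, R_i, neighbor_msgs, A, B, Q):
    """One KCF step for sensor C_i. neighbor_msgs is a list of (u_j, U_j, x_bar_j)
    received from the other sensors in C_v that currently see the target."""
    R_inv = np.linalg.inv(R_i)
    u_i = H_i.T @ R_inv @ z_i                     # Eq. (15)
    U_i = H_i.T @ R_inv @ H_i                     # Eq. (16)
    msgs = neighbor_msgs + [(u_i, U_i, x_bar_i)]  # include C_i's own message
    y_i = sum(u for u, _, _ in msgs)              # fused information vector
    S_i = sum(U for _, U, _ in msgs)              # fused information matrix
    M_i = np.linalg.inv(np.linalg.inv(P_i) + S_i)             # Eq. (17)
    gamma = 1.0 / (np.linalg.norm(M_i) + 1.0)
    consensus = sum(x_bar_j - x_bar_i for _, _, x_bar_j in msgs)
    x_hat_i = x_bar_i + M_i @ (y_i - S_i @ x_bar_i) \
              + gamma * (M_i @ consensus)                      # Eq. (18)
    # Propagate the estimate and covariance with the linear model of Eq. (8)
    x_bar_next = A @ x_hat_i
    P_next = A @ M_i @ A.T + B @ Q @ B.T
    return x_hat_i, x_bar_next, P_next
```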


Fig. 10. The layout of the indoor reconstruction and localization system.

Table 1
Configuration of the experiment platform.

Components      Type
RGB-D sensor    Microsoft Kinect for Xbox 360; frame rate: 30 FPS; image resolution: 640 × 480; FOV: 57° × 43°; detection distance: 0.3 m–5.0 m
Mini MAV        Crazyflie Nano Quadcopter 10-DOF; diagonal length: 9 cm; weight: 19 g; equipped with 3-axis accelerometer MPU-6050, 3-axis MEMS gyro, 3-axis magnetometer HMC5883L and altimeter MS5611-01BA03
Ground host PC  Intel Core i7-3612QM CPU @ 2.10 GHz; 8.0 GB RAM; graphics card: NVIDIA GeForce GT 640M
Radio           nRF24L01+ chip @ 2.4 GHz

6. Experiments

6.1. Testbed setup

A real 3-D indoor VSN composed of five rigidly mounted Kinect sensors is deployed in an office room at China University of Geosciences. The size of the indoor experimental area is 4 m × 3.5 m × 3 m in length, width and height. To completely cover the space, the five Kinect sensors (C1, C2, ..., C5) are installed in the environment as shown in Fig. 10. The sensors are connected to the host PC through USB cables, as indicated by the solid blue lines in Fig. 10. The mini MAV Crazyflie, tagged with a red ball, is under surveillance. We tested our approach of 3-D reconstruction and KCF-based localization in this indoor environment. All the Kinect sensors are connected to a host PC, and the proposed 3-D reconstruction and KCF-based data fusion localization algorithms are also implemented on the host PC. The overall configuration of the experiment platform is listed in Table 1. In all the designed simulations and real flight experiments, the Kinect sensors capture color and depth images of the environment at a rate of 30 FPS. In order to make the Crazyflie more stable when assembling the quadrotor, it is better to attach the battery and the red ball at the center of the quadrotor and to make the four wings the same length.

6.2. Experiments and results

We conducted several 3-D reconstruction experiments and real-time flight experiments to evaluate the performance of the proposed 3-D reconstruction algorithm and data fusion localization algorithm.

6.2.1. Environment modeling

To evaluate the performance of the 3-D reconstruction algorithm, the five fixed Kinect sensors of the VSN are used to capture data in an indoor office room with tables, doors and other furniture. Fig. 11 presents two different perspectives of the 3-D model of the indoor environment built by the proposed ICP-based environment modeling method. As can be seen from the pictures, the built model has good horizontal and vertical consistency. The accuracy and speed of reconstruction are greatly affected by the parameters of the RANSAC and ICP algorithms. By using the improved SURF extraction method and kd-tree-based neighbor searching, the reconstruction efficiency is improved. Therefore, the 3-D model of the indoor environment can be updated within a few seconds when the environment facilities change, which makes the approach more practical.


Fig. 11. Two different perspectives of the 3-D model of the indoor environment.

Table 2
The experimental results using single-view localization and KCF data fusion localization.

Method                         Direction   Max error/cm   Min error/cm   Mean error/cm   Standard deviation/cm
Single-view localization       x-axis      33.3882        0.095271       8.6985          5.8763
                               y-axis      25.5395        0.12704        9.3874          5.9488
                               z-axis      27.7521        0.0058333      9.5074          6.214
KCF data fusion localization   x-axis      14.7674        0.13179        6.2694          3.655
                               y-axis      16.6498        0.050632       6.049           4.0549
                               z-axis      18.6004        0.0056538      5.7628          3.691

6.2.2. Target detection

Fig. 12 shows the captured color images and the target detection results of the five individual Kinect sensors at three time instants: k = 5 s, k = 18 s and k = 47 s. Each column shows the different observations from the five Kinect sensors at the same time instant. As can be seen from Fig. 12, at each time instant more than one Kinect sensor views the mini MAV. For example, at time instant k = 18 s, sensors C2, C3, C4 and C5 capture the target T. The detected target is marked with a blue circle to indicate the location of the target T in the camera's FOV, and the circle tracks the moving target. The depth value and the image coordinates estimated by a single sensor are shown at the top left of each image. Note that the target can be seen by sensor C1 at time instants k = 18 s and k = 47 s and by sensor C2 at time instant k = 5 s, but the algorithm fails to detect it. This is because the target is too far away from the sensor and flies out of the detection range of the Kinect, as in the case of sensor C1, or because the sensor fails to focus on the target when the mini MAV flies too fast, as in the case of sensor C2 at time instant k = 5 s.

6.2.3. KCF-based data fusion estimation

Each Kinect sensor viewing the target T determines the local position and the estimated global position on the 3-D map model in its FOV through the 4 × 4 transformation matrix T = {R, t}. It then receives messages from its neighboring sensors and fuses this information together with its own estimation in the KCF-based localization module to obtain a more accurate global location estimate. Fig. 13 shows the position estimations along the x-axis, y-axis and z-axis by single-view localization and by the KCF-based data fusion method, compared with the ground truth. The red line represents the observations of a single Kinect sensor; it is a spliced result from different sensors, since one Kinect sensor cannot capture the target all the time. It is obvious that the blue lines in each direction are smoother and more closely match the ground truth than the red lines. The experimental results and statistics are listed in Table 2. As can be seen from Table 2, the mean errors and standard deviations of localization using the KCF data fusion method are smaller than those using single-view estimation along the x-axis, y-axis and z-axis. The localization is more accurate especially along the z-axis, where the mean error is reduced by 39.4%.

In the real flight experiments, the Crazyflie is controlled to conduct a random exploration of the monitoring area. Fig. 14 shows the estimated 3-D trajectory by single-view localization and by the KCF-based data fusion method, compared with the ground truth. The red line represents the observations of a single Kinect sensor, which is also a spliced result from different Kinect sensors. As can be seen, the 3-D global localization using only one Kinect sensor is not reliable, since the calculated positions are quite scattered and show significant jumps, which greatly degrade the positioning accuracy.

JID: CAEE

ARTICLE IN PRESS Y. Li et al. / Computers and Electrical Engineering 000 (2017) 1–16

[m3Gsc;September 7, 2017;6:7] 13

Fig. 12. The target detection results by the five individual Kinect sensors at 3 time instants. (a), (b), (c), (d) and (e) respectively show the color image frames captured from FOV of C1 , C2 , C3 , C4 and C5 .


Fig. 13. Position estimations using single-view localization and KCF data fusion localization compared with the ground truth.

Fig. 14. The estimated 3-D trajectory by single-view localization and KCF-based data fusion method, compared with the ground truth.

In contrast, the KCF-based data fusion estimation, shown as the blue line, produces a smooth estimate of the target positions, and the estimated trajectory closely matches the ground truth. The mean error and standard deviation of single-view localization are 14.7426 cm and 6.2746 cm, while the mean error and standard deviation of KCF data fusion localization are 9.3869 cm and 3.8297 cm. The localization accuracy is thus largely improved compared with single-view localization.

To further evaluate the performance of the proposed KCF-based data fusion localization method, we also compare it with other localization methods that use the Kinect sensor. The overwhelming majority of these methods are based on an onboard Kinect fixed to the body of the MAV. Chowdhary and co-workers [10,15] provided a relative motion estimation method using an onboard Kinect sensor and a data fusion algorithm combining Kinect-based odometry and inertial measurements; the localization estimation error has a maximum deviation of approximately 8 cm. Henry et al. [16] describe an autonomous flight system for visual odometry and mapping using an onboard Kinect sensor, with a mean localization deviation of 6.2 cm and a maximum localization deviation of 19 cm. Compared with these methods, the proposed data fusion localization method implemented in the VSN architecture can also achieve centimeter-level accuracy, with a mean error of 9.4 cm.


7. Conclusion

By combining the advantages of a VSN and a mini MAV, this paper presents a 3-D panoramic environment reconstruction and continuous localization framework for GPS-denied indoor environments. Compared with existing 3-D reconstruction and localization systems and approaches, a practical 3-D VSN composed of multiple RGB-D sensors is established in the indoor environment. A 3-D panoramic reconstruction method based on both SURF feature extraction and ICP registration algorithms is presented, as well as a KCF-based data fusion estimation algorithm which arrives at a 3-D consistent global localization of the mini MAV by referring to the estimations of all sensors that view the same target. Moreover, the effectiveness of the proposed approach is validated through software simulations and real-time indoor flight experiments in the established VSN. The results demonstrate that the 3-D reconstruction modeling provides a fast, reliable and comprehensive 3-D map. Compared with single-view localization, the real flight tests show that the accuracy of localization is largely improved by using the proposed KCF data fusion localization method. In future work, we will focus on integrating the accurate localization algorithm with path planning and autonomous indoor flight for the mini MAV based on the 3-D map model. Additional work on detection and tracking of multiple targets can also be done.

Acknowledgments

This work is sponsored by the China Natural Science Foundation (grant No. 2013073067), the Hubei Province Natural Science Foundation (grant Nos. 2014CFB380 and 2016CFC766) and the China Scholarship Council (grant No. 201406410051).

References

[1] Egbert J, Beard RW. Low altitude road following constraints using strap-down EO cameras on miniature air vehicles. Am Control Conf 2007;21(5):353–8.
[2] Wang W, Song G, Nonami K, Hirata M. Autonomous control for micro-flying robot and small wireless helicopter X.R.B. In: IEEE/RSJ International Conference on Intelligent Robots & Systems; 2006. p. 2906–11.
[3] Jean J, Lian F. Implementation of a security micro-aerial vehicle based on HT66FU50 microcontroller. In: IIAI International Congress on Advanced Applied Informatics; 2015. p. 409–10.
[4] Bahl P, Padmanabhan VN. RADAR: an in-building RF-based user location and tracking system. In: IEEE INFOCOM Nineteenth Joint Conference of the IEEE Computer and Communications Societies, 2; 2000. p. 775–84.
[5] Sugano M, Kawazoe T, Ohta Y, Murata M. Indoor localization system using RSSI measurement of wireless sensor network based on ZigBee standard. In: 6th IASTED International Multi-conference on Wireless & Optical Communications: Wireless Sensor Networks; 2006. p. 503–8.
[6] Álvarez Y, de Cos ME, Lorenzo J, Las-Heras F. Novel received signal strength-based indoor location system: development and testing. EURASIP Journal on Wireless Communications and Networking; 2010.
[7] Ni LM, Liu Y, Lau YC, Patil AP. LANDMARC: indoor location sensing using active RFID. Wireless Networks 2004;10(6):701–10.
[8] Park S, Hashimoto S. Autonomous mobile robot navigation using passive RFID in indoor environment. IEEE Trans Ind Electron 2009;56(7):2366–73.
[9] Yang L, Chen Y, Li XY, Xiao C, Li M, Liu Y. Tagoram: real-time tracking of mobile RFID tags to high precision using COTS devices. Int Conf Mobile Comput Network 2014:237–48.
[10] Chowdhary G, Sobers DM, Pravitra C, Christmann C, Wu A, Hashimoto H, et al. Self-contained autonomous indoor flight with ranging sensor navigation. J Guidance Control Dynamics 2012;35(6):1843–54.
[11] Weiss S, Scaramuzza D, Siegwart R. Monocular-SLAM based navigation for autonomous micro helicopters in GPS-denied environments. J Field Rob 2011;28(6):854–74.
[12] García Carrillo LR, Dzul López AE, Lozano R, Pégard C. Combining stereo vision and inertial navigation system for a quad-rotor UAV. J Intell Robotic Systems 2012;65(1):373–87.
[13] Abraham B, Samuel P, He R, Nicholas R. RANGE: robust autonomous navigation in GPS-denied environments. IEEE Int Conf Rob Autom 2010;28(5):1096–7.
[14] Achtelik M, Bachrach A, He RJ, Prentice S, Roy N. Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. In: Conference on Unmanned Systems Technology XI; 2009. p. 7332.
[15] Li DC, Li Q, Cheng N, Wu QF, Song JY, Tang LW. Combined RGBD-inertial based state estimation for MAV in GPS-denied indoor environments. ASCC 9th Asian Control Conference; 2013.
[16] Henry P, Krainin M, Herbst E, Ren XF, Fox D. RGB-D mapping: using kinect-style depth cameras for dense 3-D modeling of indoor environments. ISER 12th Int Symposium Experimental Rob 2010;31(5):647–63.
[17] Huang AS, Bachrach A, Henry P, Krainin M, Maturana D, Fox D. Visual odometry and mapping for autonomous flight using an RGB-D camera. 15th International Symposium of Robotics Research; 2011.
[18] How JP, Bethke B, Frank A, Dale D, Vian J. Real-time indoor autonomous vehicle test environment. IEEE Control Syst 2008;28(2):51–64.
[19] The Crazyflie Nano Quadcopter. https://www.bitcraze.io/crazyflie/; [accessed 17.04.04].
[20] Daniel HC, Kannala J, Heikkilä J. Joint depth and color camera calibration with distortion correction. IEEE Trans Pattern Analy Machine Intell 2012;34(10):2058–64.
[21] Wang K, An P, Zhang Z, Cheng H, Li H. Fast inpainting algorithm for kinect depth map. J Shanghai University (Natural Science Edition) 2012;18(5):454–8.
[22] Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004;60(2):91–110.
[23] Bay H, Tuytelaars T, Gool LV. SURF: speeded up robust features. Comput Vision Image Understand 2006;110(3):404–17.
[24] Module kdtree documentation of PCL. http://docs.pointclouds.org/trunk/group__kdtree.html; [accessed 17.04.04].
[25] Fischler M, Bolles R. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981;24(6):381–95.
[26] Rusinkiewicz S, Levoy M. Efficient variants of the ICP algorithm. In: International Conference on 3-D Digital Imaging & Modeling; 2001. p. 145–52.
[27] Olfati-Saber R. Kalman-consensus filter: optimality, stability and performance. In: IEEE Conference on Decision & Control; 2009. p. 7036–42.


Yamin Li received her M.S. and Ph.D. degrees in Information and Communication Engineering and Geodetection and Information Technology from China University of Geosciences, Wuhan, China, in 2013 and 2017, respectively. Her current research interests include wireless sensor networks, computer vision and mobile robot navigation.

Yong Wang received his Ph.D. degree in Pattern Recognition and Intelligent Systems from Huazhong University of Science and Technology, Wuhan, China, in 2009. He is currently an Associate Professor at China University of Geosciences, Wuhan, China. His current research interests include wireless sensor networks, environmental monitoring and pattern recognition.

Dianhong Wang received his Ph.D. degree in Pattern Recognition and Intelligent Systems from Huazhong University of Science and Technology, Wuhan, China, in 2000. He is a Professor and Ph.D. supervisor in computer science at China University of Geosciences, Wuhan, China. His main research topics are wireless sensor networks, intelligent instruments and computer vision.
