Monocular Vision Based Navigation and Localisation in Indoor Environments




Proceedings of IFAC-EGNCA 2012

Saurav Agarwal, Samuel B. Lazarus, Al Savvaris

Cranfield University, Bedfordshire MK43 0AL, UK [email protected]

Abstract: This paper describes a set of tools and algorithms to enable vision based navigation for a Micro Aerial Vehicle (MAV) flying in an indoor environment. An optical flow balancing strategy is used to avoid obstacles in the vehicle's path. An Extended Kalman Filter (EKF) based visual odometry system is described which can estimate the trajectory of the vehicle, since low cost MEMS based Inertial Measurement Units (IMU) are not capable of providing drift free attitude and position information. To extract depth information using a single camera, a simple triangulation based method using a laser pointer is presented. This enables the MAV to detect oncoming obstacles directly in front of it and execute a turn to avoid collision in case the optical flow balancing algorithm is unable to detect an obstruction.

Keywords: Inertial Measurement Unit, Extended Kalman Filter, Autonomous Vehicles, Navigation

1. INTRODUCTION

Micro Aerial Vehicles (MAVs) are highly constrained in size and weight. To enable them to safely navigate in an unknown environment and build detailed maps, compact, lightweight sensors are needed. Since the payload capacity of an MAV is limited, cameras are suitable sensors for airborne applications. This paper studies the use of a single camera, which presents considerable challenges. Primarily, unlike stereo vision based approaches, depth cannot be determined directly with monocular vision. This makes it difficult to accurately determine the range to an approaching obstacle and, likewise, to a potential feature for the probabilistic feature map built by the EKF. To overcome the problem of depth perception, a simple triangulation based method is proposed. In this approach, a laser pointer which projects a beam into the field of view of the camera is used. A straightforward image processing algorithm locates the laser dot in the image and the range to the object is computed from the deviation of the dot from the focal centre. The MAV navigates by using an optical flow balancing approach. This principle makes use of the inherent motion parallax, as features at different distances exhibit different optical flow. A major challenge to tackle is IMU drift. MEMS based IMUs have inherent errors, and simply integrating their outputs over time to obtain the trajectory of the vehicle would result in considerable deviation within tens of seconds. This paper proposes the use of an Extended Kalman Filter (EKF) which fuses visual measurements with the output from the IMU to obtain accurate position and orientation information by building a probabilistic map of visual features and the vehicle state.

2. RELATED WORK


Considerable research has been conducted on the use of vision for navigation, localisation and mapping. An optical flow balancing based altitude control system for an MAV in indoor flight was demonstrated by Beyeler et al. (2007). They treat altitude control as an obstacle avoidance problem using the ceiling and floor as references. Garratt and Cheung (2009) simulated an MAV navigating in a cluttered urban environment using optical flow field measurements to calculate the time to contact to an obstacle. Their work showed that a highly dynamic, lightweight MAV using only passive visual sensors could safely navigate such an environment. MonoSLAM (monocular vision based Simultaneous Localisation and Mapping) was first developed by Davison et al. (2007). This influential paper successfully demonstrated accurate visual SLAM with loop closure; however, the system is limited in the number of features it can map due to the quadratic computational complexity of the Extended Kalman Filter. Celik et al. (2009) successfully implemented a monocular vision based navigation, localisation and mapping scheme for a helicopter in structured indoor environments using architectural features and employed a particle filter based approach to SLAM. Tournier et al. (2006) implemented a 6 DOF pose estimation algorithm based on Moiré patterns. Zingg et al. (2010) have used optical flow to navigate corridor-like environments. A visual odometry system that estimates the motion of a camera based on video input using stereo correspondence has been developed by Nister et al. (2006). It takes a purely geometric approach which is solved using the five-point algorithm and preemptive RANSAC described in Nister (2004). Sunderhauf et al. (2007) demonstrated an Unscented Kalman Filter based MonoSLAM application for airship navigation in case of GPS denial. An EKF based visual



odometry system was implemented by Civera et al. (2010). In their algorithm they do not implement loop closure; rather, features which go out of the view of the camera are deleted from the feature map. This allows them to run the EKF over large trajectories. A RANSAC based hypothesise-and-test approach is used to validate the estimated state of the camera.

3. VISION BASED NAVIGATION

3.1 Visual Laser Range Finder

There are many commercially available range measuring sensors such as ultrasonic, infrared and laser rangefinders. These devices vary in cost and accuracy. An aerial platform must make maximum use of its resources, as there is a constraint on the number of sensors that can be included in the payload. Hence, to extract maximum benefit from the camera, this paper proposes the use of a vision based laser range finder.

Fig. 1. Principle of range measurement using a camera and laser

The proposed system uses a green laser which is easily detected by the camera. The laser is aligned with the camera as shown in Figure 1, and the beam is ideally parallel to the optical axis of the camera. The reflected laser dot is detected by an image processing algorithm which searches for the brightest pixels in the green channel of the colour image. The range to the object is calculated as follows:

D = h / tan(θ)       (1)

θ = m δy + b       (2)

Where,
h: offset between the laser and the camera optical axis
m: radians per pixel
δy: deviation of the laser dot from the centre of the focal plane
b: offset

Then,

D = h / tan(m δy + b)       (3)

m and b are determined by a simple calibration and least-squares line fitting.

Table 1. Range Finder Results

δy (pixels)   Measured Range (m)   Actual Range (m)   Error (%)
42.1337       0.2542               0.25               -1.7
32.8504       0.5510               0.55               -0.18
30.2158       0.8233               0.85                3.14
28.4795       1.2208               1.15               -6.16
27.9755       1.4198               1.45                2.08
27.5344       1.6560               1.75                5.37
27.1918       1.9018               2.0                 4.91

Table 1 shows the results of a calibration performed with the setup described in Figure 1. This approach is reliable enough for use on the MAV, as its primary purpose is to detect large obstacles directly in front of the vehicle.
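As a concrete illustration of the triangulation above, the following minimal Python sketch locates the laser dot and applies Eqs. (1)-(3). It assumes an OpenCV BGR frame; the calibration constants and function names are illustrative placeholders, not the authors' values.

```python
import numpy as np
import cv2  # OpenCV, used here only for image handling

# Placeholder calibration constants for Eq. (3); a real system would obtain
# m and b from the least-squares fit described above, and h from the
# laser/camera mounting geometry.
M_RAD_PER_PIXEL = 0.0012
B_OFFSET_RAD = 0.47
H_BASELINE_M = 0.06

def find_laser_dot(bgr_image):
    """Return the (row, col) of the brightest pixel in the green channel."""
    green = bgr_image[:, :, 1]
    green = cv2.GaussianBlur(green, (5, 5), 0)  # suppress single-pixel noise
    row, col = np.unravel_index(np.argmax(green), green.shape)
    return row, col

def range_from_dot(dot_row, image_height):
    """Apply D = h / tan(m * dy + b), with dy the pixel deviation of the
    laser dot from the centre of the focal plane."""
    dy = abs(dot_row - image_height / 2.0)
    theta = M_RAD_PER_PIXEL * dy + B_OFFSET_RAD
    return H_BASELINE_M / np.tan(theta)

# Usage: capture a frame, locate the dot, convert the deviation to a range.
# frame = cv2.imread("frame.png")
# r, c = find_laser_dot(frame)
# print("Estimated range: %.2f m" % range_from_dot(r, frame.shape[0]))
```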

3.2 Optical Flow

To enable the MAV to navigate safely, an optical flow balancing strategy is proposed. This method works on the principle that when the MAV is translating, closer objects give rise to faster motion across the imager than farther objects. Thus, in this scheme, the MAV turns away from the side of greater optical flow. In order to implement this method, the field of view of the forward facing camera is divided into two halves, i.e. left and right. The optical flow fields in the left half OF_L and right half OF_R are calculated using the pyramidal Lucas-Kanade feature tracking algorithm described by Bouguet (2000). The control law is designed as:

Ω = (Σ‖OF_L‖ − Σ‖OF_R‖) / (Σ‖OF_L‖ + Σ‖OF_R‖)       (4)

Where Ω is the commanded yaw rate and ‖OF‖ represents the magnitude of optical flow. Figure 2 shows the principle of operation.

Fig. 2. Navigating a corridor environment by balancing optical flow. In each visual hemifield up to 200 features are tracked. dL and dR indicate the distance vectors from the camera optical centre to obstacles on the left and right.
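The control law in Eq. (4) can be sketched as follows, using OpenCV's pyramidal Lucas-Kanade tracker. The parameter values (feature count, quality level) are illustrative; the sketch assumes two consecutive grayscale frames and does not reproduce the authors' implementation.

```python
import numpy as np
import cv2

def commanded_yaw_rate(prev_gray, curr_gray):
    """Track corner features with pyramidal Lucas-Kanade, sum flow magnitudes
    in the left and right image halves, and return the normalised difference
    of Eq. (4)."""
    # Up to 200 corner features, as in Figure 2 (parameters are illustrative).
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return 0.0
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    flow = (nxt - pts).reshape(-1, 2)[ok]
    xs = pts.reshape(-1, 2)[ok][:, 0]

    mid = prev_gray.shape[1] / 2.0
    mags = np.linalg.norm(flow, axis=1)
    of_left = mags[xs < mid].sum()
    of_right = mags[xs >= mid].sum()

    total = of_left + of_right
    if total < 1e-6:          # no measurable flow; command no turn
        return 0.0
    # Positive when left-half flow dominates; the sign convention for the
    # actual yaw command depends on the vehicle's axes.
    return (of_left - of_right) / total
```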

3.3 Experimental Setup

A specialised quadrotor, shown in Fig. 3, is currently being developed for the purpose of autonomous navigation and localisation. It is equipped with an IMU featuring a 3-axis accelerometer, a 3-axis gyro and a 3-axis magnetometer. There are two controllers on board the vehicle: a low-level Arduino based controller used for stability control and a high-level single board computer used to run the image processing algorithms and sensor fusion. The vehicle is also equipped with a LIDAR for future experiments.

Fig. 3. Bumblebee quadrotor

To test the vision based obstacle avoidance algorithm and verify its suitability for an MAV, a Parrot AR.Drone has been used for the experiments. It is a lightweight quadrotor equipped with a forward looking camera with a 93 degree (diagonal) wide-angle lens, a downward looking camera used for hover stabilisation and an ultrasound altimeter. The onboard IMU is equipped with a 3-axis accelerometer and a 3-axis gyro. All image processing algorithms are run offboard on a ground control station which communicates with the MAV via a WiFi link. Figure 4 shows the architecture of the setup.

Fig. 4. Control Flow



4. EKF IMPLEMENTATION

Several MonoSLAM based systems have been developed in the past. In the paper by Davison et al. (2007) a constant angular and linear velocity dynamics model of the camera is used, where the velocities are automatically updated within the probabilistic framework of the EKF. In our algorithm no such assumption is made. Accelerometer and gyro measurements taken from the IMU are fused with the visual measurements captured by the camera. The sensor measurements are combined in a probabilistic 3D map which, at any instant, represents the estimates of the state of the camera and landmarks and the uncertainty in these estimates. The map comprises a state vector x̂ and covariance matrix P. The state vector x̂ is the stack of the state estimates of the camera (x̂_v) and features (ŷ_i):

x̂ = [x̂_v, ŷ_1, ŷ_2, ...]^T,
P = [ P_xx    P_xy1    P_xy2   ...
      P_y1x   P_y1y1   P_y1y2  ...
      P_y2x   P_y2y1   P_y2y2  ...
      ...     ...      ...         ]       (5)

4.1 MAV State Vector

The vehicle's state vector x_v comprises:

• 3D position: r^W
• Orientation quaternion: q^WB
• Linear velocity: v^W
• Angular velocity: ω^B

relative to 'W', the fixed world frame, and the "body" frame B carried by the vehicle:

x_v = [r^W, q^WB, v^W, ω^B]^T       (6)
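To make the stacked map of Eqs. (5), (6) and (11) concrete, the toy sketch below shows how the joint state and covariance grow when a feature is added. This is an illustration only: the dimensions follow the paper, but the initial values are placeholders and the cross-covariance terms that a real initialisation Jacobian would produce are omitted.

```python
import numpy as np

VEHICLE_DIM, FEATURE_DIM = 13, 6   # Eq. (6): 3+4+3+3; Eq. (11): 6 per feature

x = np.zeros(VEHICLE_DIM)          # [r_W (3), q_WB (4), v_W (3), omega_B (3)]
x[3] = 1.0                         # identity quaternion
P = np.eye(VEHICLE_DIM) * 1e-4     # small initial vehicle uncertainty

def add_feature(x, P, y_new, P_new_block):
    """Append one inverse-depth feature to the state and grow P; the
    camera-feature cross terms are left at zero for brevity."""
    n = x.size
    x = np.concatenate([x, y_new])
    P_grown = np.zeros((n + FEATURE_DIM, n + FEATURE_DIM))
    P_grown[:n, :n] = P
    P_grown[n:, n:] = P_new_block
    return x, P_grown

# Example: add one feature parametrised as (x, y, z, theta, phi, rho).
y0 = np.array([0.0, 0.0, 0.0, 0.1, -0.05, 1.0])
x, P = add_feature(x, P, y0, np.eye(FEATURE_DIM) * 0.5)
```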

4.2 System Initialisation

Initially the filter has no information stored about the environment around it. A coordinate frame is defined by the MAV such that its starting position is X = 0, Y = 0, Z = 0, which is also the origin of the world frame, and the body frame is initially aligned with the world frame. At start-up, a small amount of prior information is given in the form of a known target in front of the camera. This provides several features with known depth, which allows the system to assign a precise scale to the estimated map.



4.3 Control Input

The control input u is modelled as:

u = [a_x, a_y, a_z, ω_x, ω_y, ω_z]^T       (7)

It contains the linear accelerations (a_x, a_y, a_z) and rotational rates (ω_x, ω_y, ω_z) as measured by the IMU in the body frame.

4.4 State Update

The state of the MAV can be updated as follows:

f_v = [r^W + v^W Δt,  q^WB × q(Ω^B Δt),  v^W + a^W Δt,  ω^B]^T       (8)

4.5 Feature Measurement and Updating the Map

A normalised cross-correlation search is used to match the template of the patch projected onto the image for the current camera estimate. The location of a landmark relative to the camera is calculated from the estimates of the camera location and feature location:

h^B_L = R^BW (y_i^W − r^W)       (9)

Where y_i^W is the position of feature i in the world frame, h^B_L is the vector from the camera centre to the feature centre in the body frame, and R^BW is the rotation matrix from the world frame to the body frame. The standard pinhole model allows the calculation of the position (u, v) at which the feature would be expected to appear in the image:

h_i = [u, v]^T = [u_0 − f k_u (h^R_Lx / h^R_Lz),  v_0 − f k_v (h^R_Ly / h^R_Lz)]^T       (10)

where f, k_u, k_v, u_0 and v_0 are the standard camera calibration parameters.
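As an illustration, the sketch below chains Eqs. (9) and (10): it rotates the feature into the body/camera frame and projects it with the pinhole model. It assumes q^WB encodes the body-to-world rotation (so R^BW is its transpose) and that the camera frame coincides with the body frame; names are illustrative.

```python
import numpy as np

def rotation_from_quat(q):
    """Rotation matrix for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def predict_feature_pixel(y_w, r_w, q_wb, f, ku, kv, u0, v0):
    """Transform the feature into the body frame (Eq. 9) and project it with
    the pinhole model (Eq. 10) to get the expected pixel position (u, v)."""
    R_bw = rotation_from_quat(q_wb).T          # world-to-body rotation
    h = R_bw @ (y_w - r_w)                     # Eq. (9)
    u = u0 - f * ku * h[0] / h[2]              # Eq. (10)
    v = v0 - f * kv * h[1] / h[2]
    return np.array([u, v])
```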

4.6 Feature Initialisation

A feature i is represented in the map by:

y_i = [x_i, y_i, z_i, θ_i, φ_i, ρ_i]^T       (11)

Where [x_i, y_i, z_i]^T is the position of the camera centre when the feature was first observed, ρ_i is the inverse depth of the feature along the ray joining the camera centre to the feature, and θ_i, φ_i represent the elevation and azimuth of the unit vector in that direction. This representation is called inverse depth parametrisation, first developed by Civera et al. (2008). The Euclidean position [X_i, Y_i, Z_i]^T of the feature in 3D space is recovered as:

[X_i, Y_i, Z_i]^T = [x_i, y_i, z_i]^T + (1/ρ_i) m(θ_i, φ_i)       (12)

Where,

m(θ_i, φ_i) = [cos(φ_i) sin(θ_i), −sin(φ_i), cos(φ_i) cos(θ_i)]^T       (13)

The features inside the visual SLAM map are image patches extracted from the camera feed at points of interest in the image. Large image patches (e.g. 15 × 15 pixels) are extracted as they tend to serve as reliable landmarks, since they have more unique characteristics than standard corner features. The process of extracting a feature has two steps: the Shi and Tomasi (1994) corner detector is first used to extract a corner feature from the image, and then an image patch of 15 × 15 pixels centred on the corner feature is extracted and stored.
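For reference, the inverse-depth-to-Euclidean conversion of Eqs. (12)-(13) is only a few lines; the example feature values below are made up for illustration.

```python
import numpy as np

def inverse_depth_to_point(feature):
    """Convert an inverse-depth feature (x_i, y_i, z_i, theta_i, phi_i, rho_i)
    into a Euclidean point, following Eqs. (12)-(13)."""
    x_i, y_i, z_i, theta, phi, rho = feature
    m = np.array([np.cos(phi) * np.sin(theta),      # Eq. (13)
                  -np.sin(phi),
                  np.cos(phi) * np.cos(theta)])
    return np.array([x_i, y_i, z_i]) + m / rho      # Eq. (12)

# Example: a feature first seen from the origin with rho = 0.5 (i.e. 2 m away
# along the direction given by theta = 0.1, phi = -0.05).
print(inverse_depth_to_point([0.0, 0.0, 0.0, 0.1, -0.05, 0.5]))
```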

5. FILTER LOOP DESCRIPTION

The camera state vector and covariance are initialised based on an initial estimate of the position and its uncertainty. Grayscale images are captured for fast processing. Feature patches (of size 15 × 15) are extracted from the image, ensuring that they are distributed evenly over it. The number of features tracked is key in deciding the accuracy of the system: more features result in higher accuracy but slower performance.

New feature positions are predicted based on the current motion estimates of the camera. Matching of features is done by normalised patch cross-correlation. If the maximum matching score exceeds a minimum threshold correlation score of 0.80, it is considered a successful match. A count is kept of the number of times a feature is matched. If a feature is successfully matched, its confidence level is increased, i.e. the feature can be considered a stable landmark. However, for features predicted to be within view but whose cross-correlation score falls below the minimum threshold, the confidence level is reduced. This ensures that if a feature is not successively matched its confidence level decreases; if it falls below a critical threshold, the feature is automatically deleted from the map as it can no longer be considered reliable.

After feature matching, the Kalman filter update is performed using the matching observations. If the number of matched features falls below a predefined threshold, new features are added in regions of the image which do not already contain features. The system complexity is constrained by keeping only those features within the map that are visible to the camera.
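The per-frame feature bookkeeping described above can be summarised in a short sketch. The 0.80 correlation threshold comes from the text; the confidence step, deletion threshold and minimum feature count are illustrative placeholders.

```python
MATCH_THRESHOLD = 0.80       # stated in the text
CONFIDENCE_DELETE = -3       # placeholder critical threshold
MIN_TRACKED_FEATURES = 12    # placeholder minimum match count

def update_feature_confidence(features, correlation_scores):
    """features: dict id -> confidence; correlation_scores: dict id -> best
    normalised cross-correlation score for features predicted to be in view."""
    matched = []
    for fid, score in correlation_scores.items():
        if score >= MATCH_THRESHOLD:
            features[fid] += 1          # stable landmark, raise confidence
            matched.append(fid)
        else:
            features[fid] -= 1          # predicted in view but not matched
    # Delete features whose confidence has fallen below the critical threshold.
    for fid in [f for f, c in features.items() if c < CONFIDENCE_DELETE]:
        del features[fid]
    return matched

def needs_new_features(matched):
    """Request new Shi-Tomasi patches when too few features were matched."""
    return len(matched) < MIN_TRACKED_FEATURES
```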



6. RESULTS

6.1 Optical Flow Based Navigation

Using the experimental setup described in Figure 4, a test was conducted to verify the robustness of the optical flow balancing algorithm in an unstructured office environment. Figure 5 shows images from the forward facing camera of the MAV. The blue circles represent the corner features. The left and right green horizontal lines represent the optical flow in the respective visual hemifields. The central green horizontal line depicts the direction and magnitude of the commanded yaw to avoid the obstacle. Images 1 and 2 show the MAV approaching an open door; the central green line in image 2 shows the commanded yaw (i.e. to the left). Image 3 shows the MAV turning away from the obstacle and image 4 shows that the MAV has avoided the obstacle successfully.

Fig. 7. Vehicle position along Y axis

Fig. 8. Vehicle position along Z axis

Fig. 5. MAV flying in cluttered indoor environment. The blue circles represent corner features.

Fig. 9. Vehicle Orientation in Roll (φ)

Fig. 6. Vehicle position along X axis

6.2 EKF Simulation

A software framework was developed to test the proposed system. The virtual world is populated with 3D landmarks. The Kalman filter is initialised with a high uncertainty. Figures 6, 7 and 8 show the true position and the estimates from the IMU and the vision aided filter. Figures 9, 10 and 11 show the orientation estimate errors from the IMU and the vision aided filter. It is clear that visual measurements enhance the accuracy of the odometry. As time progresses the Kalman filter is able to accurately determine the state of the vehicle. The IMU readings exhibit continuous divergence from the true value, whereas the filtered estimates have considerably less error. Compared to the position error, the heading error is slightly higher but still within reasonable bounds.


Fig. 10. Vehicle Orientation in Pitch (θ)



Fig. 11. Vehicle Orientation in Yaw (ψ)

7. CONCLUSIONS & FUTURE WORK

A simple algorithm for visual navigation based on optical flow was developed, tested and verified. However, it is not foolproof: for example, when a featureless obstacle such as a wall is presented to the MAV, it cannot determine which direction to turn to. This drawback was addressed with a vision based laser range finder.

A MonoSLAM based visual odometry system was developed and tested. The simulations promise higher accuracy compared to using an IMU as the only sensor for position and orientation information. The system is efficient and robust to perturbations to the MAV dynamics. Its limitation is that, as it relies on an EKF to maintain the state estimates, a high number of features increases the complexity of the calculations, which makes the system incapable of storing a large number of landmarks in the map. This constraint makes loop closure difficult for large trajectories. Future research would extend the algorithm to large scale environments by introducing a form of intelligent map management. It has been shown by Estrada et al. (2005) that a system of small scale maps can be successfully joined by a set of estimated transformations when loop closure is detected.

We propose to develop and test a fully autonomous navigation strategy using the tools and algorithms described in this paper. The strategy would be as follows: the vehicle takes off pointed at a known target, which enables the visual odometry system to rapidly initialise landmarks accurately. Once the vehicle attains a hover state, it is commanded to turn clockwise until it detects a traversable path and then starts following the corridor using the optical flow balancing strategy. The vehicle constantly measures the time to collision (TTC) based on the depth estimate from the forward looking laser described in Section 3. If the TTC falls below a preset threshold, which would indicate an approaching wall, the MAV stops and executes a turn towards the direction in which the optical flow was lower until the depth measurement exceeds a certain minimum.
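One way to picture the proposed strategy is as a small state machine; the sketch below is purely illustrative, and the modes, thresholds and sensor interface names are hypothetical rather than taken from the paper.

```python
import enum

class Mode(enum.Enum):
    TAKEOFF = 0      # initialise landmarks from the known target
    SEARCH = 1       # rotate clockwise until a traversable path is found
    CORRIDOR = 2     # follow the corridor by balancing optical flow
    AVOID = 3        # wall ahead: turn towards the side with less flow

def step(mode, depth_m, speed_mps, path_clear,
         ttc_threshold_s=1.5, min_depth_m=1.0):
    """One decision step of the proposed strategy. depth_m comes from the
    laser range finder; TTC = depth / forward speed."""
    if mode is Mode.TAKEOFF:
        return Mode.SEARCH
    if mode is Mode.SEARCH:
        return Mode.CORRIDOR if path_clear else Mode.SEARCH
    if mode is Mode.CORRIDOR:
        ttc = depth_m / max(speed_mps, 1e-3)
        return Mode.AVOID if ttc < ttc_threshold_s else Mode.CORRIDOR
    # AVOID: keep turning until the depth measurement exceeds the minimum.
    return Mode.CORRIDOR if depth_m > min_depth_m else Mode.AVOID
```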


REFERENCES

Beyeler, A., Zufferey, J., and Floreano, D. (2007). 3D vision-based navigation for indoor microflyers. In IEEE International Conference on Robotics and Automation.
Bouguet, J. (2000). Pyramidal implementation of the Lucas-Kanade feature tracker: description of the algorithm. Technical report, Intel Corporation.
Celik, K., Chung, S., and Somani, A. (2009). Monocular vision SLAM for indoor aerial vehicles. In IEEE/RSJ International Conference on Intelligent Robots and Systems.
Civera, J., Davison, A.J., and Montiel, J. (2008). Inverse depth parametrization for monocular SLAM. IEEE Transactions on Robotics.
Civera, J., Grasa, O., Davison, A.J., and Montiel, J. (2010). 1-point RANSAC for EKF filtering: Application to real-time structure from motion and visual odometry. Journal of Field Robotics.
Davison, A.J., Reid, I., and Molton, N. (2007). MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6).
Estrada, C., Neira, J., and Tardos, J. (2005). Hierarchical SLAM: real-time accurate mapping of large environments. IEEE Transactions on Robotics, 21(4), 588–596.
Garratt, M. and Cheung, A. (2009). Obstacle avoidance in cluttered environments using optic flow. In Australasian Conference on Robotics and Automation.
Nister, D. (2004). An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Nister, D., Naroditsky, O., and Bergen, J. (2006). Visual odometry for ground vehicle applications. Journal of Field Robotics, 23(1).
Shi, J. and Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition.
Sunderhauf, N., Lange, S., and Protzel, P. (2007). Using the unscented Kalman filter in mono-SLAM with inverse depth parametrization for autonomous airship control. In IEEE International Workshop on Safety, Security and Rescue Robotics.
Tournier, G., Valenti, M., and How, J. (2006). Estimation and control of a quadrotor vehicle using monocular vision and Moiré patterns. In AIAA Guidance, Navigation and Control Conference and Exhibit.
Zingg, S., Scaramuzza, D., Weiss, S., and Siegwart, R. (2010). MAV navigation through indoor corridors using optical flow. In IEEE International Conference on Robotics and Automation.
