A comparison between different feature-based methods for ROV vision-based speed estimation

F. Ferreira ∗  G. Veruggio ∗  M. Caccia ∗∗  G. Bruzzone ∗∗

∗ CNR-IEIIT, Via De Marini 6, 16149 Genova, Italy (e-mail: fausto.ferreira, [email protected]).
∗∗ CNR-ISSIA, Via De Marini 6, 16149 Genova, Italy (e-mail: max, [email protected]).
Abstract: A comparison study between different state-of-the-art visual approaches for estimating the motion of an underwater Remotely Operated Vehicle (ROV) is performed. The paper compares five different techniques: template correlation, Speeded Up Robust Features (SURF), Scale Invariant Feature Transform (SIFT), Features from Accelerated Segment Test (FAST) and Center Surround Extrema (CenSurE), all based on feature extraction and matching. All of them are implemented on top of the same free open-source library, which allows a fair comparison that can establish the best technique (depending on the criteria used). Building on previous work in which the SURF and template correlation techniques were evaluated using a batch of data collected in typical operating conditions with the Romeo ROV, the other techniques are compared using the same data set. In estimating the vehicle speed, SURF and SIFT presented noise levels higher than, but close to, those of template correlation, though with more outliers. In terms of computational time, template correlation outperforms all the other alternatives, by a large margin in some cases.

Keywords: ROV navigation, motion estimation, SURF, SIFT, benchmarking

⋆ Research supported in part by the Fundação para a Ciência e Tecnologia (FCT), Portugal, with the PhD Grant SFRH/BD/72024/2010, and by the project MORPH, Contract Number EU-FP7-ICT-288704, coordinated by ATLAS ELEKTRONIK GmbH.

1. INTRODUCTION

There are several alternative sensors/techniques, and possible combinations of them, for achieving underwater navigation, including inertial sensors, Doppler velocity logs, long and ultra-short baseline systems, and visual approaches. In recent years, optical methods have become increasingly popular in underwater environments. From bottom tracking (Huster et al., 1998) to station keeping and target following (Rife and Rock, 2001), the most exciting applications are now Simultaneous Localization and Mapping (SLAM) of the explored area and mosaicking (Garcia et al., 2001; Brignone et al., 2011; Ferreira et al., 2012). In this article, visual features are used in an intermediate step of SLAM: motion estimation. This area has evolved considerably over the years: since the pioneering work carried out at the Monterey Bay Aquarium Research Institute on correlation-based motion estimation and video mosaicking (Marks et al., 1995), optical applications have grown ever more complex, up to video mosaicking and 3D reconstruction of the sea floor (Pizarro et al., 2009) in work by the Woods Hole Oceanographic Institution.
Although the integration of conventional non-optical sensors with visual navigation is not uncommon, and can provide very accurate SLAM (Eustice et al., 2008; Williams and Mahon, 2004), good motion estimation can also be obtained by visual means alone. As we shall see, a bottom-looking camera can be enough to support a full SLAM framework. The range limitation introduced by a purely visual approach is not a major issue here, since the motion estimation techniques tested in this work serve as the basis for a mosaicking algorithm that in itself provides a good quality mosaic when operating near the sea floor (Ferreira et al., 2012). This article follows work on the automatic extraction and tracking, through correlation, of a set of high local variance image templates. This technique proved suitable for estimating the speed of the Romeo ROV developed by CNR-ISSIA on a batch of data collected in typical operating conditions (Caccia, 2006). It was also compared with Speeded Up Robust Features (SURF) in (Ferreira et al., 2009) and with Phase Correlation in (Ferreira et al., 2010a,b). Within this context, the research presented here focuses on performance evaluation, in terms of precision and reliability, extending the tested feature-based approaches from two to five in order to give the community a good insight into the best choice (in terms of precision and computational time). The tested approaches are: template correlation, Speeded Up Robust Features
(SURF), Scale Invariant Feature Transform (SIFT), Features from Accelerated Segment Test (FAST) and Center Surround Extrema (CenSurE). The choice was deliberate and driven by implementation considerations: the newest versions of the free open-source OpenCV library 1 include all these approaches, which allows us to perform a fair comparison based on similar implementations of each technique. The algorithms were run on the same set of recorded images to produce five different speed estimates for the whole path, and the noise level, the percentage of outliers and the computational time of each algorithm were compared. The results show that template correlation is the best alternative in all three respects, except for the noise level of the sway speed, where SIFT works slightly better.

The paper is organized as follows. Section 2 presents the basic information concerning the system design. Section 3 then introduces the different feature detection and tracking approaches. The results obtained are discussed in Section 4, while the conclusions are drawn in Section 5.

2. SYSTEM DESIGN

In order to estimate the motion of the robot, a system based on monocular optical vision is used. The image depth is provided by tracking a set of laser spots. The overall system is modular, consisting of a laser triangulation altimeter and a speed-meter that uses the image depth provided by the former to convert the estimated speed into metric units.

2.1 Laser triangulation altimeter

The monocular optical vision system is composed of a video camera and four red laser spots rigidly attached to the camera. Based on the laser spot image coordinates, the laser-triangulation altimeter computes the image depth Z, i.e. the range from the seabed. The system is calibrated in the laboratory before the experiments (the reader can refer to (Caccia, 2002) for details). Only the red component of the image is used. In order to increase the system reliability and range, altitude measurements h_S provided by acoustic altimeters are integrated with the vision-based estimate of the scene depth.

2.2 Feature-based speed-meter

The feature-based speed-meter is shown in Figure 1. The feature detector and tracker changes slightly according to the selected class of features; five different approaches to feature definition, detection and tracking are presented in the next section. One common point that improves the tracking performance is the prediction of the image coordinates of the features, performed on the basis of the estimated camera/vehicle speed and the previous position of the features. When no prediction is possible, the current image is taken as the predicted image, and a larger neighborhood for feature matching, able to cope with the maximum vehicle speed, is considered.
1 http://opencvlibrary.sourceforge.net
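To make the triangulation principle behind the altimeter of Section 2.1 concrete, the following minimal Python sketch computes the image depth Z under the simplifying assumption that the laser beams are parallel to the optical axis at a known offset from it; the actual device geometry and calibration are described in (Caccia, 2002), and all names and numbers below are illustrative.

```python
import numpy as np

def image_depth_from_lasers(spot_px, principal_point, f_px, laser_offset_m):
    """Estimate image depth Z from laser spot image coordinates.

    Minimal sketch assuming laser beams parallel to the optical axis:
    a beam offset by D metres from the axis projects at d = f * D / Z
    pixels from the principal point, hence Z = f * D / d.  The
    estimates from all spots are averaged.

    spot_px         : (N, 2) laser spot image coordinates [px]
    principal_point : (2,) principal point image coordinates [px]
    f_px            : focal length [px]
    laser_offset_m  : offset of each laser beam from the axis [m]
    """
    d = np.linalg.norm(spot_px - principal_point, axis=1)  # pixel offsets
    z_per_spot = f_px * laser_offset_m / d                 # one Z per spot
    return float(np.mean(z_per_spot))

# Example: four spots 70 px from the principal point,
# f = 700 px, lasers 10 cm off-axis -> Z = 1 m
spots = np.array([[390., 240.], [250., 240.], [320., 310.], [320., 170.]])
print(image_depth_from_lasers(spots, np.array([320., 240.]), 700.0, 0.10))
```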
Fig. 1. Feature-based motion estimation system

Since the vehicle considered in this work translates at constant heading and altitude (details later), the motion estimation is performed using a Least Squares algorithm relying on the motion field equations of each feature:

$$
\begin{bmatrix} \dot{m} \\ \dot{n} \end{bmatrix}
\approx -\frac{f}{Z}\begin{bmatrix} u \\ v \end{bmatrix}
+ \frac{w}{Z}\begin{bmatrix} m \\ n \end{bmatrix}
+ f\begin{bmatrix} -q \\ p \end{bmatrix}
\qquad (1)
$$

where f is the camera focal length, Z is the template image depth as estimated by the laser-triangulation altimeter, [m n]^T are the feature image coordinates, [ṁ ṅ]^T is the corresponding motion field, and [u v w]^T and [p q]^T represent the vehicle linear (surge, sway, heave) and angular (roll and pitch rate) speed respectively. Equation (1) shows that, for small variations, surge and sway displacements are indistinguishable from pitch and roll rotations. If no measurements of pitch and roll rates are available for direct compensation, one can adopt a simplified model neglecting the term f[-q p]^T. On the other hand, the effects of roll and pitch rates on the surge and sway estimates are not negligible at low speed. Indeed, for typical ROV benthic operations at an altitude of about 1 m (as in this experiment), an angular rate of 1°/s corresponds to a disturbance of about 1.75 cm/s on the estimated linear speed when one pixel corresponds to 1 mm at a range of 1 m. Therefore, suitable filtering techniques were used to reject these disturbances on the linear speed estimate (for more detail refer to (Caccia, 2003)).

3. FEATURE DETECTION AND TRACKING APPROACHES

As mentioned above, the choice of the methods to be tested was driven by the possibilities offered by the latest versions of the free open-source library OpenCV. More important than the ease of implementation, using the OpenCV implementations of all the algorithms is a good way of comparing them at a computational level. The five tested methods are therefore introduced in the next subsections. The same kind of motion-from-tokens estimation is performed with all the techniques. This estimation only gives the relative motion in image coordinates, not on a metric scale; to convert to metric units, the laser triangulation system for image depth computation is used, as explained in Section 2.
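Before detailing the individual detectors, the following Python sketch illustrates the Least Squares speed estimation based on Eq. (1), in the simplified form that neglects the pitch/roll term f[-q p]^T; the function and variable names are our own, and the numbers in the example are illustrative.

```python
import numpy as np

def estimate_speed(mn, mn_dot, f, Z):
    """Least-squares estimate of the vehicle speed [u, v, w] from the
    motion field of the tracked features, using the simplified form of
    Eq. (1) that neglects the pitch/roll term f * [-q, p]^T.

    mn     : (N, 2) feature image coordinates [m_i, n_i]
    mn_dot : (N, 2) corresponding motion field [m_dot_i, n_dot_i]
    f      : camera focal length (same units as image coordinates)
    Z      : image depth from the laser-triangulation altimeter
    """
    N = mn.shape[0]
    A = np.zeros((2 * N, 3))
    b = mn_dot.reshape(-1)        # stacked [m_dot_1, n_dot_1, ...]
    A[0::2, 0] = -f / Z           # m_dot rows: -(f/Z) * u ...
    A[0::2, 2] = mn[:, 0] / Z     #             ... + (m/Z) * w
    A[1::2, 1] = -f / Z           # n_dot rows: -(f/Z) * v ...
    A[1::2, 2] = mn[:, 1] / Z     #             ... + (n/Z) * w
    uvw, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uvw                    # surge, sway, heave

# Example: f = 700 px, Z = 1 m; pure surge u = 0.1 m/s produces a
# uniform horizontal motion field of -(f/Z)*u = -70 px/s
mn = np.array([[10., 20.], [-30., 5.], [0., -40.], [25., -10.]])
mn_dot = np.tile([-70., 0.], (4, 1))
print(estimate_speed(mn, mn_dot, f=700.0, Z=1.0))  # ~ [0.1, 0, 0]
```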
3.1 Template correlation

By template correlation, the authors denote the correlation of templates (small regions) extracted according to a high local variance criterion. The automatic selection and extraction of the templates to be correlated was first presented in (Misu et al., 1999) for autonomous spacecraft landing applications and can be summarized in three steps:

• 2-D band-pass filtering to enhance specific spatial wavelengths;
• computation of local variances to evaluate contrast;
• extraction of templates as high local variance areas.

Special care has to be taken in our case: to avoid extracting the (artificially induced) laser spots as high local variance areas, feature detection is inhibited in the proximity of these points. In the matching phase, the matches are thresholded on the normalized correlation coefficient. The experimental results were obtained at constant heading in the proximity of the seabed; for this reason, the disturbances induced in the correlation between two frames are negligible. To reduce the computational workload, templates are tracked by looking for the highest correlation match in a suitable neighborhood of their predicted image position, as in the original approach of (Caccia, 2006). This prediction is also performed in the other cases and was explained in subsection 2.2.

3.2 SIFT

The SIFT algorithm is among the most popular (and oldest) interest point detectors. Introduced in (Lowe, 1999), with an updated reference from the same author in (Lowe, 2004), it is invariant to translation, scaling and rotation, and it uses a Laplacian-based detector: a Difference of Gaussians approach. The descriptor is then built considering also the pixels around the detected point. The matching is performed using the Brute Force Matcher of OpenCV, which uses the Euclidean distance between two descriptor vectors to define the best matches. Only pairs of features whose distance does not exceed a maximum distance (a defined threshold) are considered a good match. For fairness, the same matcher and the same maximum distance (with respect to the minimum distance of each approach) were also used for the SURF, FAST and CenSurE methods. Several works in underwater navigation use SIFT, and a good comparison between several interest point detectors can be found in (Mozos et al., 2007); other works include (Thomas, 2008; Kudzinava, 2007).
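A minimal OpenCV (Python) sketch of the detection and matching scheme just described is given below; the threshold factor is an illustrative value, not the one tuned in this work, and in recent OpenCV releases SIFT is created via cv2.SIFT_create.

```python
import cv2

def sift_match(img_prev, img_curr, max_factor=3.0):
    """Detect SIFT features in two consecutive frames and keep the
    matches whose descriptor (Euclidean) distance does not exceed a
    threshold defined with respect to the minimum match distance.
    max_factor is an illustrative value, not the one used in the paper.
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_prev, None)
    kp2, des2 = sift.detectAndCompute(img_curr, None)

    # Brute-force matcher with Euclidean (L2) distance, as in the text
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = bf.match(des1, des2)
    if not matches:
        return kp1, kp2, []

    d_min = min(m.distance for m in matches)
    good = [m for m in matches if m.distance <= max_factor * d_min]
    # kp1[m.queryIdx].pt and kp2[m.trainIdx].pt give the matched image
    # coordinates that feed the motion estimation of Eq. (1)
    return kp1, kp2, good
```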
3.3 SURF

The SURF approach is similar to SIFT. In (Thomas, 2008), SURF was tested together with SIFT in the context of real-time vision-based SLAM for an Autonomous Underwater Vehicle (AUV). The advantages of SURF over SIFT include a lower computational complexity and more robustness to changes in scale. However, according to (Kudzinava, 2007), the number of features extracted is lower for SURF than for SIFT, and SIFT works better under affine/projective deformations. To reduce the computational complexity, according to the original paper (Bay et al., 2006), a Fast-Hessian detector is used, which approximates the Hessian by means of integral images and box filters (Viola and Jones, 2001). The determinant of the Hessian is a measure of the feature's strength. The descriptor vector is based on a distribution of Haar-wavelet responses within the feature neighborhood. As in SIFT, to match the features between different frames, a naïve nearest-neighbor algorithm is used, with the Euclidean distance between the descriptor vectors defining the best candidate match. Only features with the same sign of the Laplacian are matched, which increases the matching speed.

3.4 FAST

Features from Accelerated Segment Test (FAST) were introduced in (Rosten and Drummond, 2005, 2006). The Accelerated Segment Test states that if at least n contiguous pixels in a Bresenham circle of radius r around a point p are all brighter (or darker) than p by a threshold t, then p is a feature. In the FAST version, r = 3 and the best results were obtained with n = 9 (originally n = 12); if n is lower than 9, the corner detector degenerates into an edge detector. The authors claim high repeatability and, of course, speed as the main advantages, which motivated our choice: the SURF detector tested in previous works was too slow compared with template correlation, so FAST is a good candidate for a computational time closer to that of the template correlation approach. In order to isolate the improvement given by the FAST detector at a computational level, SURF-based descriptors were used, allowing a direct comparison between a SURF-based detector and a FAST detector. The only example of the use of FAST in an underwater context was found in (Shkurti et al., 2011), so more experiments are helpful to evaluate the usefulness of this method on underwater images.

3.5 CenSurE

Center Surround Extrema (CenSurE) features were presented recently in (Agrawal et al., 2008). The current version in OpenCV is named the STAR keypoint detector due to the shape of the detector: while the bi-level polygons that approximate the Laplacian of Gaussian are octagons in the original CenSurE, OpenCV uses two squares, one of which is rotated by 45°, thus forming a star shape. Integral images are used here as well (as in SURF), but, to avoid subsampling and interpolation, filters of different sizes are applied, giving precise localization. A non-maximum suppression step eliminates weak features. Moreover, lines and edges are suppressed using the ratio of principal curvatures: unstable features lying on an edge or line have a large principal curvature along one direction but not along the perpendicular one, so this ratio is a good way of eliminating them. The original work bases its descriptors on the Modified Upright SURF (MU-SURF) and also uses the sign of the Laplacian for quick matching. OpenCV has Upright SURF (U-SURF) implemented, but not MU-SURF; U-SURF decreases the computational time as it does not include rotational invariance, which reduces the complexity of the matching procedure. As with FAST, the only work found that used CenSurE in the context of underwater vehicle navigation was (Shkurti et al., 2011).
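The following sketch illustrates the FAST-detector/U-SURF-descriptor combination of Section 3.4, assuming an OpenCV build that includes the contrib xfeatures2d module (where SURF now lives); the OpenCV version used for this work had different class names and defaults, so this is indicative only.

```python
import cv2

def fast_with_surf_descriptors(img):
    """Sketch of the FAST-detector / U-SURF-descriptor combination of
    Section 3.4, assuming an OpenCV build with the contrib xfeatures2d
    module (SURF is not included in default builds).  Parameter values
    follow those reported in Section 4.2.
    """
    fast = cv2.FastFeatureDetector_create(threshold=10,
                                          nonmaxSuppression=False)
    kps = fast.detect(img, None)

    # 64-element upright SURF descriptors (no rotation invariance)
    surf = cv2.xfeatures2d.SURF_create(extended=False, upright=True)
    kps, des = surf.compute(img, kps)
    return kps, des
```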
4. EXPERIMENTAL RESULTS

4.1 Experimental set-up
The data used to test the different algorithms was collected by the Romeo ROV in the Portofino Park area, Italy, in the summer of 2005. During the experiment, the vehicle navigated in a lawn-mowing pattern at constant heading through waypoints, in auto-altitude mode at a constant altitude of 1.31 m. The constant heading and altitude assumptions are confirmed by their standard deviations (1% and 1.5% respectively). The vehicle motion (in m/s) was computed online on the basis of the vision-based estimates of its horizontal speed and altitude, as in (Caccia, 2007). Figure 2 shows the optical device, mounted downward-looking below the vehicle and aligned with its principal axis.
Fig. 2. Laser triangulation optical device mounted inside the Romeo ROV toolsled (bottom view).

The trial took 44 minutes and 15 seconds, corresponding to 13275 processed images sampled at 5 Hz.

4.2 Speed-meter

In order to achieve a fair comparison, the thresholds used in previous works (Ferreira et al., 2009) for the template correlation and SURF approaches were maintained. The original SURF work has two kinds of descriptor vectors: a shorter one (64 elements) and an extended one (128 elements). In the newer versions of OpenCV there is also the possibility of using Upright SURF (U-SURF); thus, all the possible SURF versions were tested with the same parameters. For the other methods, extensive parameter tuning was performed in order to find the best possible alternative (especially in the STAR case). The number of parameters of these methods is a drawback compared with template correlation, as it implies much more tuning and therefore less independence of the results from the parameters. So, for SIFT, the default parameters were used except for the number of octaves (2). For FAST, the threshold was 10 and non-maximum suppression was turned off (it may seem odd, but the best results were obtained without non-maximum suppression). U-SURF64 descriptors (where 64 refers to the descriptor vector length) were used in conjunction with the FAST detector, as these obtained the best results among the SURF variants and make the comparison with the STAR approach more direct. Indeed, in the case of STAR, U-SURF64 descriptors were also used. As for the detection, the maximum feature size is 32, the threshold for the approximated Laplacian is 10, the "lineThresholdProjected" and "lineThresholdBinarized" parameters are both 10, and the non-maximum suppression size is 2.
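For reference, a sketch of the corresponding STAR detector configuration in a recent OpenCV (Python) build with the contrib modules is shown below; in the OpenCV version used for this work the detector was part of the main library, so this is indicative only, and the input frame name is hypothetical.

```python
import cv2

# STAR (CenSurE) detector configured with the parameters reported above,
# assuming an OpenCV build with the contrib xfeatures2d module
star = cv2.xfeatures2d.StarDetector_create(
    maxSize=32,                 # maximum feature size
    responseThreshold=10,       # threshold for the approximated Laplacian
    lineThresholdProjected=10,  # suppress line/edge responses ...
    lineThresholdBinarized=10,  # ... via the ratio of principal curvatures
    suppressNonmaxSize=2)       # non-maximum suppression window

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame
keypoints = star.detect(img, None)
```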
Fig. 3. Measured surge for the Feature correlation, U-SURF64, SIFT, FAST and STAR approaches

The estimation results of the five methods for the surge velocity are shown in Figure 3. The results for the sway speed are similar and therefore not presented graphically. As for SURF, only the best variant is shown, as the others produce similar results. From the graphs, one can easily see that the FAST approach is the noisiest. Although the U-SURF64, SIFT and STAR measurements resemble the Feature correlation one, more peaks are clearly visible in the former. Indeed, the quantitative results confirm this. Table 1 reports the speed measurement noise standard deviation for each method and for both speeds, while Table 2 shows the percentage of outliers. The quantitative data was obtained by defining a noise signal through a statistical characterization of the residuals between the raw speed estimate and a smoothed signal, obtained in post-processing with a Butterworth filter of order 11 and a cutoff frequency of 0.15 Hz. The percentage of outliers is computed with a recursive symmetric median filter applied before smoothing.
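A minimal Python/SciPy sketch of this noise characterization is given below; the plain median filter and its kernel size are stand-ins for the recursive symmetric median filter actually used, and the outlier rule is illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, medfilt

def noise_statistics(speed_raw, fs=5.0, fc=0.15):
    """Sketch of the noise characterization described above: smooth the
    raw speed estimate with an order-11 Butterworth low-pass filter
    (cutoff 0.15 Hz, samples at 5 Hz) and take the standard deviation
    of the residuals as the measurement noise level.  The median-based
    outlier rule below is illustrative, not the paper's exact filter.
    """
    sos = butter(11, fc, fs=fs, output='sos')   # stable SOS form
    smooth = sosfiltfilt(sos, speed_raw)        # zero-phase smoothing
    noise_std = float(np.std(speed_raw - smooth))

    med = medfilt(speed_raw, kernel_size=5)     # stand-in median filter
    outliers = np.abs(speed_raw - med) > 3.0 * noise_std
    return noise_std, float(np.mean(outliers)) * 100.0  # std, % outliers
```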
Table 1. Speed measurement noise standard deviation [m/s]

                       surge    sway
Template correlation   0.0081   0.0088
SURF64                 0.0109   0.0094
SURF128                0.0130   0.0100
U-SURF64               0.0096   0.0089
U-SURF128              0.0116   0.0089
SIFT                   0.0119   0.0087
FAST                   0.0571   0.0696
STAR                   0.0126   0.0140

Table 2. Percentage of outliers [%]

                       surge    sway
Template correlation    8.15    15.55
SURF64                  8.76    15.78
SURF128                 9.85    15.64
U-SURF64                8.53    15.78
U-SURF128               8.92    15.81
SIFT                    9.47    16.01
FAST                   41.69    41.57
STAR                   14.14    20.91
The template correlation obtains the best results except for the sway speed, where SIFT is slightly better. As expected from the visual inspection, the percentage of outliers is also lowest for template correlation. The poor performance of FAST, easily seen in the figures, is confirmed by the quantitative results, with more than 40% outliers and a noise standard deviation over six times that of template correlation. This poor performance might be related to the fact that FAST has some problems with noise, according to its authors, and these images are blurry in many cases; moreover, FAST does not perform multi-scale detection. As regards template correlation, the fact that only eight 16 × 16 pixel templates are tracked simultaneously, instead of a point-to-point matching with a much higher matching space dimension, means that the sources of error are fewer, which explains the better performance.

Finally, Table 3 shows the mean time per frame needed by each algorithm to estimate the motion 2. Template correlation outperforms every other approach, by a large margin in many cases. Compared with previous results, the newer version of OpenCV takes less time than before to perform template correlation, while for SURF there was a significant increase in time. It would be possible to obtain faster results, namely by using a Fast Approximate Nearest Neighbor Search (FLANN) based matcher (Muja and Lowe, 2009), but since this is an approximate matcher the noise standard deviation would increase. As it is already higher for SURF, SIFT, FAST and STAR than for template correlation, there is no point in trying faster approximate alternatives that trade off noise for time.

Table 3. Motion estimation mean time [ms/frame]

Template correlation    14.3
SURF64                  93.6
SURF128                110.9
U-SURF64                93.5
U-SURF128               93.4
SIFT                   352.9
FAST                    70.6
STAR                    38.2

2 As computed on an Intel Core Duo @ 2.66 GHz.

5. CONCLUSIONS

This work tested several interest point detectors (SURF, SIFT, FAST, STAR), comparing them with the original region-based detector in the context of motion estimation. The way in which these feature detectors were tested (the same open-source library, the same kind of matcher and, where a method does not include descriptors, the same kind of descriptors used for the other detectors) aimed to give a fair comparison among the different detectors and with the original region-based detector. Corroborating previous results (Ferreira et al., 2009, 2010a,b), where only one interest point approach (SURF) was compared, the region-based detector again proved to be the best choice, whether considering the overall result or each criterion alone (noise, number of outliers, computational time). The added complexity of the interest point approaches leads to more noise and slower performance, thus disfavoring their use. Although some of the SURF approaches (and SIFT) obtained similar results in terms of the sway speed estimate noise, the mean time they require to estimate the speed is much higher. As this motion estimation is meant to work as an intermediate step of a SLAM algorithm used for mosaicking (Ferreira et al., 2012), and the mosaicking has to be done in real time, compromising on time is not an option. Bearing in mind that testing all possible combinations of feature detectors, descriptors and matchers is out of the scope of this work, future work includes further testing of other possible approaches, including not only different interest point detectors (Shi and Tomasi, 1994; Calonder et al., 2010) but also other region-based detectors (Matas et al., 2002), different descriptors (Rublee et al., 2011) and matcher variations.

ACKNOWLEDGEMENTS

The authors wish to thank Riccardo Bono, Giorgio Bruzzone and Edoardo Spirandelli for their highly professional and kind support in the development and operation at sea of the Romeo ROV.

REFERENCES

Agrawal, M., Konolige, K., and Blas, M. (2008). CenSurE: Center surround extremas for realtime feature detection and matching. In D. Forsyth, P. Torr, and A. Zisserman (eds.), Computer Vision ECCV 2008, volume 5305 of Lecture Notes in Computer Science, 102–115. Springer Berlin / Heidelberg.
Bay, H., Tuytelaars, T., and Gool, L.V. (2006). SURF: Speeded up robust features. In Proceedings of the 9th European Conference on Computer Vision, volume 3951 part 1, 404–417. Springer LNCS.
Brignone, L., Munaro, M., Allais, A., and Opderbecke, J. (2011). First sea trials of a laser aided three dimensional underwater image mosaicing technique. In OCEANS, 2011 IEEE - Spain, 1–7. doi:10.1109/Oceans-Spain.2011.6003483.
Caccia, M. (2002). Optical triangulation-correlation sensor for ROV slow motion estimation: experimental results (July 2002 at-sea trials). Rob-02, CNR-IAN.
Caccia, M. (2003). Pitch and roll disturbance rejection in vision-based linear speed estimation for UUVs. In Proc. of MCMC 2003, 313–318.
Caccia, M. (2006). Laser-triangulation optical-correlation sensor for ROV slow motion estimation. IEEE Journal of Oceanic Engineering, 31(3), 711–727.
Caccia, M. (2007). Vision-based ROV horizontal motion control: near-seafloor experimental results. Control Engineering Practice, 15(6), 703–714.
Calonder, M., Lepetit, V., Strecha, C., and Fua, P. (2010). BRIEF: Binary Robust Independent Elementary Features. In K. Daniilidis, P. Maragos, and N. Paragios (eds.), Computer Vision ECCV 2010, volume 6314 of Lecture Notes in Computer Science, chapter 56, 778–792. Springer Berlin / Heidelberg. doi:10.1007/978-3-642-15561-1_56.
Eustice, R., Pizarro, O., and Singh, H. (2008). Visually augmented navigation for autonomous underwater vehicles. IEEE Journal of Oceanic Engineering, 33(2), 103–122.
Ferreira, F., Orsenigo, F., Veruggio, G., Pavlakis, P., Caccia, M., and Bruzzone, G. (2010a). Comparison between feature-based and phase correlation methods for ROV vision-based speed estimation. In 7th Symposium on Intelligent Autonomous Vehicles. IFAC, Lecce, Italy.
Ferreira, F., Orsenigo, F., Veruggio, G., Pavlakis, P., Caccia, M., and Bruzzone, G. (2010b). A numerical comparison between feature correlation and phase correlation for motion estimation relative to the sea bottom. In 8th IFAC Conference on Control Applications in Marine Systems. IFAC, Rostock-Warnemünde, Germany.
Ferreira, F., Veruggio, G., Caccia, M., and Bruzzone, G. (2009). Speeded up robust features for vision-based underwater motion estimation and SLAM: comparison with correlation-based techniques. In Proceedings of MCMC'2009.
Ferreira, F., Veruggio, G., Caccia, M., and Bruzzone, G. (2012). Real-time optical SLAM-based mosaicking for unmanned underwater vehicles. Intelligent Service Robotics, 5, 55–71. doi:10.1007/s11370-011-0103-x.
Garcia, R., Batlle, J., Cufi, X., and Amat, J. (2001). Positioning an underwater vehicle through image mosaicking. In Proceedings of the 2001 IEEE International Conference on Robotics and Automation (ICRA), volume 3, 2779–2784. doi:10.1109/ROBOT.2001.933043.
Huster, A., Fleischer, S., and Rock, S. (1998). Demonstration of a vision-based dead-reckoning system for navigation of an underwater vehicle. In Proceedings of the 1998 Workshop on Autonomous Underwater Vehicles (AUV'98), 185–189. doi:10.1109/AUV.1998.744454.
Kudzinava, M. (2007). Feature-based Matching of Underwater Images. Master's thesis, University of Girona.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lowe, D. (1999). Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, volume 2, 1150–1157. doi:10.1109/ICCV.1999.790410.
Marks, R., Rock, S., and Lee, M. (1995). Real-time video mosaicking of the ocean floor. IEEE Journal of Oceanic Engineering, 20(3), 229–241.
Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the British Machine Vision Conference, volume 1, 384–393. London.
Misu, T., Hashimoto, T., and Ninomiya, K. (1999). Optical guidance for autonomous landing of spacecraft. IEEE Transactions on Aerospace and Electronic Systems, 35(2), 459–473.
Mozos, O.M., Gil, A., Ballesta, M., and Reinoso, O. (2007). Interest point detectors for visual SLAM. Technical report, LNAI 4788, 170–179.
Muja, M. and Lowe, D.G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Applications (VISAPP'09), 331–340. INSTICC Press.
Pizarro, O., Eustice, R., and Singh, H. (2009). Large area 3-D reconstructions from underwater optical surveys. IEEE Journal of Oceanic Engineering, 34(2), 150–169.
Rife, J. and Rock, S.M. (2001). A low energy sensor for AUV-based jellyfish tracking. Technical report, Stanford University, Moss Landing, CA 95039. URL http://sun-valley.stanford.edu/papers/RifeR:2001b.pdf.
Rosten, E. and Drummond, T. (2005). Fusing points and lines for high performance tracking. In IEEE International Conference on Computer Vision, volume 2, 1508–1511. doi:10.1109/ICCV.2005.104.
Rosten, E. and Drummond, T. (2006). Machine learning for high-speed corner detection. In European Conference on Computer Vision, volume 1, 430–443. doi:10.1007/11744023_34.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). ORB: an efficient alternative to SIFT or SURF. In International Conference on Computer Vision. Barcelona.
Shi, J. and Tomasi, C. (1994). Good features to track. In Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'94), 593–600. doi:10.1109/CVPR.1994.323794.
Shkurti, F., Rekleitis, I., and Dudek, G. (2011). Feature tracking evaluation for pose estimation in underwater environments. In 2011 Canadian Conference on Computer and Robot Vision (CRV), 160–167. doi:10.1109/CRV.2011.28.
Thomas, S.J. (2008). Real-time Stereo Visual SLAM. Master's thesis, Heriot-Watt University, Universitat de Girona, Université de Bourgogne.
Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, 511–518. doi:10.1109/CVPR.2001.990517.
Williams, S. and Mahon, I. (2004). Simultaneous localisation and mapping on the Great Barrier Reef. In Proc. of IEEE International Conference on Robotics and Automation (ICRA 2004).