Robust 3D Visual Localization Based on RTABmaps

Alberto Martín Florido (Universidad Rey Juan Carlos, Spain), Francisco Rivas Montero (Universidad Rey Juan Carlos, Spain) and Jose María Cañas Plaza (Universidad Rey Juan Carlos, Spain)
Copyright: © 2018 |Pages: 17
DOI: 10.4018/978-1-5225-5628-2.ch001


Visual localization is a key capability in robotics and in augmented reality applications. It estimates the 3D position of a camera in real time by analyzing only the image stream. This chapter presents a robust map-based 3D visual localization system. It relies on maps of the scenarios built with the well-known tool RTABmap. It consists of three steps in a continuous loop: feature point computation on the input frame, matching with the feature points of the map keyframes (using kNN and outlier rejection), and final 3D estimation using PnP geometry and optimization. The system has been experimentally validated in several scenarios. In addition, an empirical study has been performed of the effect of three matching outlier rejection mechanisms (ratio test, fundamental matrix, and homography matrix) on the quality of the estimated 3D localization. The outlier rejection mechanisms, alone or in combination, reduce the number of matched feature points but increase their quality and, therefore, the accuracy of the 3D estimation. The combination of ratio test and homography matrix provides the best results.
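The matching step with kNN and the ratio test can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names `knn2` and `ratio_test`, the toy 2D descriptors and the 0.75 threshold are assumptions made for illustration only. A match is kept only when its best neighbour is clearly closer than its second-best neighbour.

```python
import math


def knn2(query, train):
    """For each query descriptor, return its two nearest train descriptors
    as (distance, train_index) pairs, using brute-force Euclidean distance."""
    result = []
    for q in query:
        dists = sorted((math.dist(q, t), j) for j, t in enumerate(train))
        result.append(dists[:2])
    return result


def ratio_test(knn_matches, ratio=0.75):
    """Keep a match only if the best distance is below `ratio` times the
    second-best distance. The 0.75 value is an assumed, commonly used default."""
    good = []
    for i, neighbours in enumerate(knn_matches):
        if len(neighbours) < 2:
            continue  # the test needs two neighbours to compare
        (d1, j1), (d2, _) = neighbours
        if d1 < ratio * d2:
            good.append((i, j1, d1))
    return good


# Toy descriptors (hypothetical data): three map features, two frame features.
train = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query = [(0.5, 0.0),   # clearly closest to train[0]: kept
         (5.0, 0.1)]   # equally close to train[0] and train[1]: rejected
good = ratio_test(knn2(query, train))
print(good)
```

The second query descriptor is discarded because its two nearest neighbours are equally distant, which is the ambiguous situation the ratio test is meant to filter out. Fewer matches survive, but those that do are more reliable.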
Chapter Preview


Cameras are ubiquitous sensors today. They are prevalent on laptops, smartphones and mobile robots. Extracting information from visual data is not an easy task. One important piece of information that can be extracted in real time from mobile cameras is their 3D localization. Visual localization is the problem of estimating the camera pose from the image stream.

Localization can be used, for instance, by a robot to decide its behavior, as in robotic vacuum cleaners such as the Roomba 980 model. The low-end models just deploy a random coverage navigation algorithm because they only have noisy odometry to estimate their position in the home. The high-end models are equipped with cameras and visual localization algorithms. This allows smarter coverage navigation algorithms which clean faster and in a more methodical way. Visual localization has also been used in very different robots, such as drones or the humanoids at the RoboCup. Most interesting robots have cameras.

In smartphones and tablets, it can be used for Augmented Reality and Mixed Reality applications. Knowing the 3D position of the phone, virtual objects can be realistically drawn over the camera image in real time. One interesting example of this is the Tango project from Google. One sample commercial application is the IKEA Place app, developed in conjunction with Apple. The new Software Development Kits for Augmented Reality from Google (ARCore) and from Apple (ARKit) take advantage of visual localization technology.

Another use of visual localization is to calibrate the extrinsic parameters of the cameras of a motion capture system. This was a problem in our lab, where a motion capture system is used to track the position of people over time. That system uses several RGBD cameras installed in the monitored scenario. In order to compute proper estimations, the position of all the cameras of the system must be accurately known before starting. Every time the motion tracking system is installed in a new scenario, it has to be calibrated, that is, the 3D position of all its cameras has to be estimated. The old calibration procedure was manual, pattern-based, slow and error-prone. A 3D pattern was placed at a certain position in the new scenario, it was observed in the camera images and several relevant points were manually matched. This delivered an estimated 3D position that was also manually refined. The visual localization system proposed in this chapter provides a fast, robust and automatic calibration mechanism. No 3D pattern is needed at all.

Self-localization has been an active topic in robotics for many years. Laser-based solutions and particle filters provided good and robust estimations (Thrun et al, 2001).

But laser sensors are expensive. Cameras are cheap and widespread sensors, and they are common equipment onboard most robots. New vision-based self-localization algorithms were also created (Dellaert et al, 1999). Some of them require previous knowledge of the scene in terms of a map or a collection of beacons. Others, like Visual SLAM (Simultaneous Localization and Mapping) algorithms, calculate the 3D motion of a camera in real time without prior information about the environment, creating at the same time a map of its surroundings. In addition, the same problem has been addressed in the computer vision community, where it is known as the Structure From Motion problem. In that context real-time operation is not usually a hard requirement. Visual localization has been one of the main challenges in computer vision and robotics since the early 2000s.

In visual SLAM some interesting concepts appear beyond accuracy and real-time operation, such as loop closure and relocalization. Relocalization refers to the capability of recovering the right position once the system gets lost due to an occlusion, a perception failure or a bad estimation. Loop closure refers to the behavior of the estimation when the camera returns to a previously visited place. The 3D estimations should be the same, but this is usually not the case, as the visual localization algorithm may accumulate errors.

This chapter proposes a robust algorithm to locate a single RGB camera in 3D in a given scenario whose 3D map is available. This map does not include any beacons, just a collection of keyframes and their 3D locations, like those built by RTABmap tools from RGBD sensors such as Kinect. The algorithm provides the absolute position on the map. It has been implemented and is available as open source software in a public repository. The effect of several matching filters on the quality and robustness of the 3D estimate has also been studied. The algorithm has been experimentally validated in different real scenarios where RTABmap maps have been generated.
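To make the link between the map's 3D locations and the camera pose concrete, the following sketch computes the reprojection error of a candidate pose under a pinhole camera model. PnP-style estimation typically looks for the pose that makes this error small. Everything here is an illustrative assumption rather than the chapter's code: the function names, the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) and the toy points are hypothetical.

```python
import math


def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D map point into the image with a pinhole model.
    R (3x3 nested lists) and t (length 3) map map coordinates to camera coordinates."""
    xc = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def mean_reprojection_error(points3d, pixels, R, t, fx, fy, cx, cy):
    """Average pixel distance between observed pixels and projected map points."""
    total = 0.0
    for p, (u, v) in zip(points3d, pixels):
        pu, pv = project(p, R, t, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(points3d)


# Assumed intrinsics and toy 3D points (hypothetical values).
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]

# Pixels observed from the true pose (identity rotation, zero translation).
true_t = [0.0, 0.0, 0.0]
pixels = [project(p, identity, true_t, **K) for p in points3d]

err_true = mean_reprojection_error(points3d, pixels, identity, true_t, **K)
err_shifted = mean_reprojection_error(points3d, pixels, identity, [0.1, 0.0, 0.0], **K)
print(err_true, err_shifted)
```

The true pose reproduces the observed pixels, while a pose shifted by 10 cm along the x axis gives a clearly larger error. Wrong 2D-3D matches inflate this error for every candidate pose, which is one reason outlier rejection before the pose estimation matters.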
