Attentive Visual Memory for Robot Localization

Julio Vega (Rey Juan Carlos University, Spain), Eduardo Perdices (Rey Juan Carlos University, Spain) and José María Cañas (Rey Juan Carlos University, Spain)
Copyright: © 2014 | Pages: 27
DOI: 10.4018/978-1-4666-4607-0.ch038

Abstract

Cameras are one of the most relevant sensors in autonomous robots. Two challenges they pose are extracting useful information from the captured images and coping with the small field of view of regular cameras. This chapter proposes a visual perceptive system for a robot with an on-board mobile camera that copes with these two issues. The system is composed of a dynamic visual memory that stores the information gathered from images, an attention system that continuously chooses where to look, and a visual evolutionary localization algorithm that uses the visual memory as input. The visual memory is a collection of relevant task-oriented objects and 3D segments. Its scope and persistence are wider than the camera's field of view, so it provides more information about the robot's surroundings and more robustness to occlusions than the current image alone. The control software takes its contents into account when making behavior or navigation decisions. The attention system considers the need to reobserve objects already stored, to explore new areas and to test hypotheses about objects in the robot's surroundings. A robust evolutionary localization algorithm has been developed that can use either the current instantaneous images or the visual memory. The system has been implemented, and several experiments have been carried out with both simulated and real robots (a wheeled Pioneer and a Nao humanoid) to validate it.
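To make the idea of a memory whose persistence outlasts the field of view more concrete, the following Python sketch keeps task-relevant objects after they leave the image and forgets them only after several perception cycles without reobservation. The names (`VisualMemory`, `MemoryEntry`, `update`), the integer "life" counter and the `max_life` forgetting policy are illustrative assumptions, not the chapter's actual design; a real memory would also store 3D segments and fuse repeated observations rather than simply overwrite them.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical 3D position in metres, in a robot-centred frame.
Point3 = Tuple[float, float, float]


@dataclass
class MemoryEntry:
    position: Point3  # last observed 3D position of the object
    life: int         # perception cycles left before the entry is forgotten


@dataclass
class VisualMemory:
    max_life: int = 3  # assumed forgetting horizon, in perception cycles
    entries: Dict[str, MemoryEntry] = field(default_factory=dict)

    def update(self, observations: Dict[str, Point3]) -> None:
        """One perception cycle: refresh objects seen in the current image,
        then age and prune the ones that were not seen."""
        for obj_id, position in observations.items():
            # Seen now: store (or overwrite) its position and reset its life.
            self.entries[obj_id] = MemoryEntry(position, self.max_life)
        for obj_id in list(self.entries):  # copy keys: we may delete while looping
            if obj_id not in observations:
                self.entries[obj_id].life -= 1
                if self.entries[obj_id].life <= 0:
                    del self.entries[obj_id]


memory = VisualMemory(max_life=3)
memory.update({"door": (2.0, 0.5, 0.0), "ball": (1.0, -1.0, 0.0)})
memory.update({"ball": (1.1, -0.9, 0.0)})  # the door has left the field of view
print(sorted(memory.entries))  # the door is still remembered
```

With `max_life=3`, an object that stays out of view is dropped on the third cycle after the one in which it was last seen, so the memory covers more of the surroundings than any single image while still discarding stale information.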

Introduction

Computer vision research is growing rapidly, both in robotics and in many other applications, from surveillance systems for security to the automatic acquisition of 3D models for Virtual Reality displays. Commercial applications such as traffic monitoring, parking entrance control, augmented reality video games and face recognition are increasingly common. In addition, computer vision is one of the most successful sensing modalities used in mobile robotics. In recent years, cameras have become common sensory equipment on robots. They are very cheap sensors and can provide a great deal of information about the robot's environment. However, extracting relevant information from the image flow is not easy. Vision has been used in robotics for navigation, object recognition, 3D mapping, visual attention, robot localization, etc.

Robots usually navigate autonomously in dynamic environments, so they need to detect and avoid obstacles. Several sensors can detect obstacles in a robot's path, such as infrared sensors, laser range finders and ultrasound sensors. When using cameras, obstacles can be detected through 3D reconstruction. Recovering 3D information has been a main focus of the computer vision community for decades. Stereo vision methods are the classic approach: they find pixel correspondences between the two cameras and triangulate, although they fail on untextured surfaces. Depth sensors such as the Kinect now offer a different technology for visual 3D reconstruction. In addition, structure from motion techniques build the three-dimensional structure of objects by analyzing local motion signals over time, even from a single camera (Richard & Zisserman, 2003).
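For a rectified stereo pair, triangulation reduces to the disparity between matched pixels: depth is Z = f·B/d, where f is the focal length in pixels, B the baseline and d the horizontal disparity. The sketch below assumes an ideal pinhole model with square pixels and a principal point (`cx`, `cy`) shared by both rectified images; the function name and camera values are hypothetical. When no valid correspondence exists, as often happens on untextured surfaces, there is no disparity and hence no depth; the sketch signals this by returning `None` for non-positive disparities.

```python
from typing import Optional, Tuple


def triangulate(u_left: float, v: float, u_right: float,
                focal_px: float, baseline_m: float,
                cx: float, cy: float) -> Optional[Tuple[float, float, float]]:
    """Recover the 3D point (X, Y, Z), in metres in the left-camera frame,
    from one pixel correspondence in a rectified stereo pair."""
    disparity = u_left - u_right  # horizontal shift of the point between images
    if disparity <= 0:
        # No valid correspondence (e.g. untextured region): depth is undefined.
        return None
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (u_left - cx) * z / focal_px       # back-project the pixel column
    y = (v - cy) * z / focal_px            # back-project the pixel row
    return (x, y, z)


# Hypothetical camera: 500 px focal length, 12 cm baseline, principal point (320, 240).
point = triangulate(u_left=400.0, v=240.0, u_right=370.0,
                    focal_px=500.0, baseline_m=0.12, cx=320.0, cy=240.0)
print(point)
```

With the values shown, the 30-pixel disparity yields a depth of 2 m; halving the disparity would double the depth, which is why distant points are reconstructed less precisely.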

Moreover, many works on vision-based navigation and control generate robot behavior without explicit 3D reconstruction. Temporary occlusions of relevant stimuli in the images are one hindrance to this approach: the control algorithm must be robust to the lack of temporal persistence of relevant stimuli in the images. The same problem arises when objects lie beyond the camera's current field of view. To solve it, some systems use omnidirectional vision. Others, such as humanoids or robots with pan-tilt units, use mobile regular cameras that can be oriented at will, and maintain a visual memory of the robot's surroundings that integrates the information from images taken from different locations. A visual representation of interesting objects around the robot, beyond the current field of view, may improve the quality of the robot's behavior, since more information is available when making decisions. This raises the problem of selecting where to look at every moment, known as gaze control or overt attention (Itti & Koch, 2001; Zaharescu et al., 2005). That selection is usually influenced by the need to quickly explore new areas and the need to reobserve known objects to update their positions, among other factors. This kind of attention is also present in humans: we are able to concentrate on particular regions of interest in a scene through eye and head movements, shifting attention to different parts of it. By directing attention specifically to small regions that are important for the task at hand, we avoid wasting effort and processing on trying to fully understand the whole surroundings, and devote as much as possible to the relevant part.
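A very simple way to trade off reobservation against exploration is to let the urgency of every candidate gaze target grow with the time since it was last looked at, and to look at the most urgent one. The sketch below does that; the `GazeTarget` fields, the linear urgency model and the rates in `URGENCY_RATE` are hypothetical choices for illustration, not the selection rule of the systems cited above. A third kind of target, for testing hypotheses about objects, could be added in the same way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GazeTarget:
    name: str
    pan: float        # pan angle (rad) the camera must adopt to look at it
    kind: str         # "object" (known, to reobserve) or "explore" (unvisited area)
    last_seen: float  # time of the last observation, in seconds


# Hypothetical rates: how fast each kind of target becomes urgent.
URGENCY_RATE = {"object": 1.0, "explore": 0.6}


def salience(target: GazeTarget, now: float) -> float:
    """Urgency grows linearly with the time since the target was last looked at."""
    return URGENCY_RATE[target.kind] * (now - target.last_seen)


def choose_gaze(targets: List[GazeTarget], now: float) -> GazeTarget:
    """Pick the most salient target (first one on ties) and mark it as observed."""
    best = max(targets, key=lambda t: salience(t, now))
    best.last_seen = now
    return best


targets = [
    GazeTarget("ball", pan=0.3, kind="object", last_seen=0.0),
    GazeTarget("goal", pan=-0.8, kind="object", last_seen=0.0),
    GazeTarget("left area", pan=1.2, kind="explore", last_seen=0.0),
]
picks = [choose_gaze(targets, now=float(t)).name for t in range(1, 6)]
print(picks)
```

With these rates the two known objects are revisited often, while the unexplored area wins only once its accumulated urgency exceeds theirs, here on the fourth cycle. The selected target's `pan` value is what would be sent to the camera's pan-tilt unit.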

Another relevant piece of information that can be extracted from images is the robot's location. Robots need to know their location in the environment in order to unfold the proper behavior. Using its sensors (especially vision) and a map, a robot may estimate its own position and orientation inside a known environment. Robot self-localization has proven to be complex, especially in dynamic environments and in those with much symmetry, where sensor values can be similar at different positions.
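Evolutionary localization treats candidate poses as a population that is repeatedly scored against the sensor readings, keeping and perturbing the best candidates. The following Python sketch shows that loop under strong simplifying assumptions: it estimates only the planar position (x, y), ignoring orientation, it uses noise-free distances to identified landmarks as a stand-in for visual observations, and it uses elitist selection with Gaussian mutation. The landmark map, fitness function and parameters are hypothetical and are not the chapter's algorithm.

```python
import math
import random

# Hypothetical map: identified landmarks (x, y), in metres, around a 6 m x 4 m area.
LANDMARKS = [(0.0, 0.0), (6.0, 0.0), (0.0, 4.0), (6.0, 4.0), (3.0, 5.0)]


def expected_ranges(x, y):
    """Distance from position (x, y) to every landmark in the map."""
    return [math.hypot(lx - x, ly - y) for lx, ly in LANDMARKS]


def fitness(ind, observed):
    """Higher is better: negative sum of squared range errors."""
    return -sum((e - o) ** 2 for e, o in zip(expected_ranges(*ind), observed))


def evolve(observed, generations=150, pop_size=60, elite=10, sigma=0.1, seed=0):
    rng = random.Random(seed)
    # Start with position hypotheses spread uniformly over the area.
    pop = [(rng.uniform(0, 6), rng.uniform(0, 4)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the best hypotheses survive unchanged (elitism).
        pop.sort(key=lambda ind: fitness(ind, observed), reverse=True)
        parents = pop[:elite]
        # Mutation: refill the population with perturbed copies of the elite.
        children = []
        for _ in range(pop_size - elite):
            px, py = rng.choice(parents)
            children.append((px + rng.gauss(0, sigma), py + rng.gauss(0, sigma)))
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, observed))


true_position = (2.0, 1.5)
observed = expected_ranges(*true_position)  # noise-free readings for the demo
best = evolve(observed)
print(f"estimated position: ({best[0]:.2f}, {best[1]:.2f})")
```

Because each landmark is identified here, the map's layout does not create ambiguity. In a symmetric layout with indistinguishable landmarks, mirrored positions would receive identical fitness, which is the kind of ambiguity that makes self-localization hard in symmetric environments.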
