Almost all autonomous robots need to navigate. We define navigation as do Franz & Mallot (2000): “Navigation is the process of determining and maintaining a course or trajectory to a goal location” (p. 134). We allow that this definition may be more restrictive than some readers are used to (it does not, for example, include problems like obstacle avoidance and position tracking), but it suits our purposes here.

Most algorithms published in the robotics literature localise in order to navigate (see e.g. Leonard & Durrant-Whyte (1991a)). That is, they determine their own location and the position of the goal in some suitable coordinate system. This approach is problematic for several reasons. Localisation requires a map of available landmarks (i.e. a list of landmark locations in some suitable coordinate system) and a description of those landmarks. In early work, the human operator provided the robot with a map of its environment. More recently, however, researchers have developed simultaneous localisation and mapping (SLAM) algorithms which allow robots to learn environmental maps while navigating (Leonard & Durrant-Whyte (1991b)). Of course, autonomous SLAM algorithms must choose which landmarks to map and sense these landmarks from a variety of different positions and orientations. Given a map, the robot has to associate sensed landmarks with those on the map. This data association problem is difficult in cluttered real-world environments and is an area of active research.

In this chapter we describe an alternative approach to navigation, called visual homing, which makes no explicit attempt to localise and thus requires no landmark map. There are broadly two types of visual homing algorithms: feature-based and image-based. The feature-based algorithms, as the name implies, attempt to extract the same features from multiple images and use the change in the appearance of corresponding features to navigate.
Feature correspondence is, like data association, a difficult and open problem in real-world environments. We argue that image-based homing algorithms, which provide navigation information based on whole-image comparisons, are more suitable for real-world environments in contemporary robotics.
Visual homing algorithms make no attempt to localise in order to navigate, so no map is required. Instead, an image IS (usually called a snapshot for historical reasons) is captured at a goal location S = (xS, yS). Note that though S is defined as a point on a plane, most homing algorithms can easily be extended to three dimensions (see e.g. Zeil et al. (2003)). When a homing robot seeks to return to S from a nearby position C = (xC, yC), it takes an image IC and compares it with IS. The home vector H = S - C is inferred from the disparity between IS and IC (vectors are in upper case and bold in this work). The robot’s orientation at C and S is often different; if so, image disparity is meaningful only if IC is first rotated to account for this difference. Visual homing algorithms differ in how this disparity is computed.
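For panoramic images, a rotation of the robot corresponds to a horizontal (column) shift of the image, so the orientation difference between C and S can be estimated by finding the column shift that best aligns IC with IS. A minimal sketch of this idea (the function name and the toy one-row panorama are our own illustration, not from the chapter):

```python
import numpy as np

def estimate_rotation(snapshot, current):
    """Estimate the change in heading between two panoramic images.

    A rotation of the robot shifts the columns of a panoramic image,
    so we roll the current image over every possible column shift and
    keep the shift that minimises the sum of squared pixel differences
    with the snapshot.
    """
    width = snapshot.shape[1]
    errors = [np.sum((np.roll(current, shift, axis=1) - snapshot) ** 2)
              for shift in range(width)]
    best_shift = int(np.argmin(errors))
    # Convert the column shift into an angle: the panorama spans 360 degrees.
    return best_shift * 360.0 / width

# Toy example: a 1-row "panorama" shifted by one column (a quarter turn).
snap = np.array([[0.0, 1.0, 2.0, 3.0]])
curr = np.roll(snap, -1, axis=1)
print(estimate_rotation(snap, curr))  # → 90.0
```

Once the rotation is removed in this way, the remaining disparity between the two images is due to translation alone and can be used to infer H.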
Visual homing is an iterative process. The home vector H is frequently inaccurate, leading the robot closer to the goal position but not directly to it. If H does not take the robot to the goal, another image IC is taken at the robot’s new position and the process is repeated.
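The iterative process above might be sketched as the following control loop; capture_image, move and compute_home_vector are hypothetical placeholders for the robot's actual sensing, motion, and home-vector routines, not functions defined in this chapter:

```python
import numpy as np

def home(snapshot, capture_image, move, compute_home_vector,
         threshold=0.05, max_steps=100):
    """Iterative visual homing (sketch).

    Because each home vector H is only approximate, a single step rarely
    reaches the goal; the robot instead re-images and re-estimates H
    until the current image matches the snapshot closely enough.
    """
    for _ in range(max_steps):
        current = capture_image()
        # Whole-image disparity serves as a simple "am I home yet?" test.
        if np.sqrt(np.mean((current - snapshot) ** 2)) < threshold:
            return True   # goal reached
        move(compute_home_vector(snapshot, current))
    return False          # gave up after max_steps iterations
```

The loop terminates on image similarity rather than on position, which is consistent with the map-free character of visual homing: the robot never needs to know its coordinates.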
The images IS and IC are typically panoramic grayscale images. Panoramic images are useful because, for a given location (x, y), they contain the same image information regardless of the robot’s orientation. Most researchers use a camera imaging a hemispheric, conical or paraboloid mirror to create these images (see e.g. Nayar (1997)).
Some visual homing algorithms extract features from IS and IC and use these to compute image disparity. Alternatively, disparity can be computed from entire images, essentially treating each pixel as a viable feature. Both feature-based and image-based visual homing algorithms are discussed below.
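As an illustration of the image-based approach, one common whole-image disparity measure is the root-mean-square pixel difference, which tends to grow smoothly with distance from the snapshot location (Zeil et al. (2003)); a robot can then home by moving so as to reduce this measure. The finite-difference scheme below, which estimates the home direction from two small test steps, is our own simplified sketch of this idea, not an algorithm specified in the chapter:

```python
import numpy as np

def image_distance(a, b):
    """Root-mean-square pixel difference between two equal-sized images."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.mean((a - b) ** 2))

def home_direction(d_here, d_after_dx, d_after_dy, step):
    """Approximate the home direction from a finite-difference gradient
    of the image distance: move opposite to the direction in which the
    distance to the snapshot increases."""
    grad = np.array([(d_after_dx - d_here) / step,
                     (d_after_dy - d_here) / step])
    norm = np.linalg.norm(grad)
    return -grad / norm if norm > 0 else grad
```

For example, if small test steps along +x and +y change the image distance from 1.0 to 1.2 and 0.8 respectively, the estimated home direction is roughly (-0.71, +0.71): away from +x and towards +y. No feature is ever extracted or matched; every pixel contributes to the disparity measure.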
Key Terms in this Chapter
Snapshot Image: In the visual homing literature, this is the image captured at the goal location.
Image-based Visual Homing: Visual homing (see definition below) in which the home vector is estimated from the whole-image disparity between snapshot and current images. No feature extraction or correspondence is required.
Correspondence Problem: The problem of pairing an imaged feature extracted from one image with the same imaged feature extracted from a second image. The images may have been taken from different locations, changing the appearance of the features.
Visual Homing: A method of navigating in which the relative location of the goal is inferred by comparing an image taken at the goal with the current image. No landmark map is required.
Optic Flow: The perceived movement of objects due to viewer translation and/or rotation.
Navigation: The process of determining and maintaining a course or trajectory to a goal location.
Catchment Area: The area from which a goal location is reachable using a particular navigation algorithm.
Feature Extraction Problem: The problem of extracting the same imaged features from two images taken from (potentially) different locations.