Feature Detectors and Descriptors Generations with Numerous Images and Video Applications: A Recap

Nilanjan Dey (Techno India College of Technology – Kolkata, India), Amira S. Ashour (Tanta University, Egypt) and Aboul Ella Hassanien (Cairo University, Egypt)
Copyright: © 2017 | Pages: 30
DOI: 10.4018/978-1-5225-1025-3.ch003


Feature detectors play a critical role in numerous applications such as camera calibration, object recognition, biometrics, medical applications and image/video retrieval. One of their main tasks is to extract point correspondences (“interest points”) between two similar scenes, objects, images or video shots. Extensive research has been devoted to making visual feature detectors and descriptors robust against image deformations while reducing their computational cost in real-time applications. This chapter presents an overview of feature detectors such as Moravec, Hessian, Harris and FAST (Features from Accelerated Segment Test). It addresses the generations of feature detectors over time, the principal concept of each type, and their use in image/video applications. Furthermore, some recent feature detectors are addressed. A comparison based on these points illustrates their respective strengths and weaknesses, providing a basis for selecting an appropriate detector for the application at hand.
Chapter Preview


Recently, the amount of data to be processed has increased sharply owing to the growing popularity of camera-equipped cellphones and broadband wireless devices, which capture high-quality images and video content. This information mainly takes the form of text, pictures, graphics or integrated multimedia presentations. Digital images and digital video are pictures and movies, respectively, that have been converted into a binary format. Typically, an image is a still picture that does not change with time, while a video changes with time and contains moving and/or changing objects. Because the real world is rich in visual information, making sense of this massive amount of data is a challenging process. Thus, representing and processing enormous amounts of image/video data has become a serious problem. Image/video content information is used in several applications and, to some extent, as a search tool. The foremost motivation for extracting the information content is the accessibility problem. To solve this problem, relevant features must be extracted for a given content domain. Vision systems process captured images to extract the information needed to perform certain tasks. The process of continuously tracking one or more moving objects with a camera is known as visual object tracking. It is employed to determine the object’s position continuously and reliably across the frames of a video, and it is an essential task in several computer vision applications. Computer vision is a discipline of artificial intelligence that provides computers with the ability to observe objects. The type of analysis applied to the extracted information depends on the application to be accomplished. The ultimate goal is to use the detected information to attain an understanding of different objects through their physical and geometrical attributes.

A video consists of a collection of frames displayed at a certain speed (the frame rate). Each frame is an image made of pixels. A sequence of frames recorded in a single camera operation is called a shot, while a collection of consecutive shots that are semantically similar in persons, objects, time and space is called a scene. In short, a video can be regarded as a sequence of images, one per frame.
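
The frame, shot and scene hierarchy described above can be illustrated with a small data-structure sketch. This is only an illustration, not something taken from the chapter: the class names (Shot, Scene, Video), the representation of a frame as rows of grayscale values, and the helper methods are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed representation: a frame is a grid of grayscale pixel values.
Frame = List[List[int]]


@dataclass
class Shot:
    """Frames recorded in a single camera operation."""
    frames: List[Frame] = field(default_factory=list)


@dataclass
class Scene:
    """Consecutive shots that are semantically similar."""
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Video:
    """A collection of scenes played back at a given frame rate."""
    frame_rate: float  # frames per second
    scenes: List[Scene] = field(default_factory=list)

    def frame_count(self) -> int:
        # Total number of frames over all shots of all scenes.
        return sum(len(shot.frames)
                   for scene in self.scenes
                   for shot in scene.shots)

    def duration_seconds(self) -> float:
        # Playback time follows directly from frame count and frame rate.
        return self.frame_count() / self.frame_rate


# Usage: one scene made of two shots (50 and 25 frames) at 25 frames/second.
blank_frame: Frame = [[0, 0], [0, 0]]
video = Video(frame_rate=25.0,
              scenes=[Scene(shots=[Shot(frames=[blank_frame] * 50),
                                   Shot(frames=[blank_frame] * 25)])])
print(video.frame_count(), video.duration_seconds())
```

With these assumed numbers, the video holds 75 frames and lasts 75 / 25 = 3 seconds.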

Foremost, image features are used to distinguish image regions and to characterize the appearance or shape of objects in the images. Features include points, lines or compound features, and are used extensively in numerous computer vision applications including recognition, object detection, camera calibration, stereo, tracking and 3D reconstruction.

In both computer vision and image processing, many applications depend on the robust detection of image features and the estimation of their parameters. Thus, feature detection is an essential concern in intermediate-level vision applications, such as:

  • Image registration stereo,

  • Motion correspondence,

  • Simultaneous localization and mapping (SLAM) for autonomous vehicles,

  • Object recognition, and

  • Stereoscopic vision (Lowe, 2004).

Moreover, it is considered a powerful tool that has been applied effectively in an extensive range of other systems and application domains, including:

  • Facial feature detection,

  • Land mine detection,

  • Medical applications,

  • Video mining,

  • Image retrieval, and

  • Texture analysis (Tien et al., 2008; Shin & Kim, 2014).

It is essential first to distinguish between feature detectors and descriptors. Detectors are operators that search an image for two-dimensional (2D) locations (i.e. a point or a region) that are geometrically stable under different transformations and contain high information content; the results are called ‘interest points’, ‘corners’, ‘affine regions’ or ‘invariant regions’. Descriptors, by contrast, analyze the image to provide a 2D vector of pixel information for certain positions (e.g. an interest point). This information can be used to classify the extracted points or in a matching process.
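
The split between detection and description can be made concrete with a minimal sketch. The detector below is a simplified Moravec-style corner measure (the minimum sum of squared differences between a window and four shifted copies of it), and the descriptor is simply the flattened patch of pixel values around a detected point. The four shift directions, the 3×3 window, the threshold value, the synthetic test images and all function names are assumptions chosen for illustration; they are not the chapter's formulation.

```python
# Four shift directions used by this simplified Moravec-style measure (assumed).
SHIFTS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def corner_response(img, y, x, half=1):
    """Minimum SSD between the window at (y, x) and its shifted copies."""
    best = None
    for dy, dx in SHIFTS:
        ssd = 0
        for wy in range(-half, half + 1):
            for wx in range(-half, half + 1):
                a = img[y + wy][x + wx]
                b = img[y + wy + dy][x + wx + dx]
                ssd += (a - b) ** 2
        best = ssd if best is None else min(best, ssd)
    return best


def detect(img, threshold, half=1):
    """Detector: return the (y, x) locations whose response exceeds threshold."""
    margin = half + 1  # keeps every shifted window inside the image
    return [(y, x)
            for y in range(margin, len(img) - margin)
            for x in range(margin, len(img[0]) - margin)
            if corner_response(img, y, x, half) > threshold]


def describe(img, point, half=1):
    """Descriptor: flatten the patch around a point into a vector."""
    y, x = point
    return [img[y + wy][x + wx]
            for wy in range(-half, half + 1)
            for wx in range(-half, half + 1)]


def descriptor_distance(d1, d2):
    """Sum of squared differences between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(d1, d2))


def square_image(size, start):
    """Hypothetical test image: a bright square from (start, start) onward."""
    return [[255 if y >= start and x >= start else 0 for x in range(size)]
            for y in range(size)]


# Usage: the same square seen in two images, shifted by one pixel.
img_a = square_image(8, 4)
img_b = square_image(8, 5)
points_a = detect(img_a, threshold=100_000)
points_b = detect(img_b, threshold=100_000)
print(points_a, points_b)
print(descriptor_distance(describe(img_a, points_a[0]),
                          describe(img_b, points_b[0])))
```

In this toy case the detector responds only at the corner of the square in each image, while flat areas and straight edges score zero because at least one shift leaves the window unchanged. The two descriptors are identical, so their distance is zero and the corners would be matched. A raw-patch descriptor like this one tolerates only translation; the descriptors surveyed in the chapter are designed to remain stable under stronger deformations.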
