Feature Extraction


This chapter illustrates feature extraction for working with large datasets. It gives the basic definition of feature extraction, discusses the selection of effective features, and reviews existing problems and their solutions. It also explains how feature extraction maps a high-dimensional space to a lower-dimensional one.

5.1 Introduction

Feature extraction is an attribute reduction process. Unlike feature selection, which ranks the existing attributes according to their predictive significance, feature extraction actually transforms the attributes. The transformed attributes, or features, are linear combinations of the original attributes.
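
The idea of a feature as a linear combination of the original attributes can be sketched in a few lines. The example below is only an illustration, not a method prescribed by this chapter: it assumes a principal-component style extraction, uses power iteration to approximate the dominant direction of the covariance matrix, and runs on a small made-up dataset. The function names (center, covariance, top_component, extract_feature) and the data values are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-attribute mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n = len(centered)
    d = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=200):
    """Power iteration: approximate the dominant eigenvector of cov.

    Assumes cov is not the zero matrix and that the starting vector is
    not orthogonal to the dominant eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def extract_feature(rows):
    """Reduce each row to one feature: a weighted sum of its attributes."""
    centered = center(rows)
    weights = top_component(covariance(centered))
    feature = [sum(w * x for w, x in zip(weights, row)) for row in centered]
    return weights, feature

# Hypothetical toy data: four records with three correlated attributes each.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.0, 1.5],
    [4.0, 7.9, 2.1],
    [5.0, 10.2, 2.4],
]
weights, feature = extract_feature(data)
print("weights:", weights)   # one coefficient per original attribute
print("feature:", feature)   # one extracted value per record
```

Each record with three attributes is reduced to a single value, and the weights show how much each original attribute contributes to that value. Feature selection, by contrast, would keep one or more of the original columns unchanged.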

Digital libraries handle vast amounts of data and information. They give users the ability to access and interact with large numbers of documents in electronic form. Schatz (1997) defines digital libraries as follows: “A digital library enables users to interact effectively with information distributed across a network”.

A digital library works as a networked information system that supports tasks such as searching for and displaying an item in a database. Once users are comfortable with the new tools, they demand that new materials be made available in digital libraries. Any task in a digital library requires a digital representation of the data. Extending a digital library is comparatively easy, because the process is faster and cheaper than extending a physical library. This growth in the amount of information has a strong impact on the supporting software.

The electronic versions of text documents also contain multimedia content, which raises an issue. To retrieve an image from a document, one can provide a query and ask for all the electronic documents that contain similar pictures. Digital images, like any other multimedia data, are complex data. Computers can represent and manipulate digital data; decoding its content, however, remains a research issue (Manolescu, 2000).

5.1.1 Problem

The question is how software can meet the requirements of applications that deal with large amounts of information, similarity searching, and complex data.

5.1.2 Forces

  • Difficulties in information retrieval systems with large amounts of data.

  • Difficulties in similarity searching.

  • Difficulties in multimedia databases containing digital representations of acoustic and visual data.

  • Difficulties in achieving fast response times in information retrieval systems.
