Feature Selection in High Dimension

Sébastien Gadat
Copyright: © 2011 | Pages: 23
DOI: 10.4018/978-1-61520-991-0.ch006

Abstract

Variable selection for classification is a crucial paradigm in image analysis. Indeed, images are generally described by a large number of features (pixels, edges …), although it is difficult to obtain a sufficiently large number of samples to draw reliable inference for classification using the whole set of features. The authors describe in this chapter some simple and effective feature selection methods based on a filter strategy. They also provide some more sophisticated methods, based on margin criteria or stochastic approximation techniques, that achieve high classification performance with a very small proportion of variables. Most of these “wrapper” methods are dedicated to a specific classifier, except the Optimal Feature Weighting algorithm (denoted OFW in the sequel), which is a meta-algorithm and works with any classifier. A large part of this chapter is dedicated to the description of OFW and hybrid OFW algorithms. The authors also illustrate several other methods on practical examples of face detection problems.
Chapter Preview

Introduction

High dimensional data

Most examples of face analysis tasks encountered nowadays involve high-dimensional input variables. To detect faces in an image, we usually consider the set of all gray-level pixels of the image as informative features. In this case, features are considered as “low level” features, in contrast to more sophisticated “high level” features built from linear or nonlinear combinations of the gray-level pixels, such as linear filters, edges (mostly obtained from some thresholding step), Fourier and wavelet coefficients, principal components …
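
As a rough illustration of this distinction, the sketch below (our own, assuming gray-level images stored as NumPy arrays; the function names are only for exposition) builds a low-level descriptor from the raw pixels and a small high-level descriptor from Fourier coefficients and crude edge responses.

```python
import numpy as np

def low_level_features(image):
    """Low-level features: the raw gray-level pixels, flattened to a vector."""
    return image.ravel().astype(float)

def high_level_features(image, n_fourier=50):
    """High-level features built from the pixels: magnitudes of the first few
    2D Fourier coefficients plus two crude edge-energy measures."""
    spectrum = np.abs(np.fft.fft2(image))           # Fourier coefficient magnitudes
    fourier = spectrum.ravel()[:n_fourier]          # keep only a handful of them
    gy, gx = np.gradient(image.astype(float))       # finite-difference gradients
    edges = np.array([np.abs(gx).mean(), np.abs(gy).mean()])
    return np.concatenate([fourier, edges])

# A 60 x 85 gray-level image already yields 5100 low-level features.
image = np.random.randint(0, 256, size=(60, 85))
print(low_level_features(image).shape)   # (5100,)
print(high_level_features(image).shape)  # (52,)
```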

In the case of high level features, the number of variables may thus become very large and can even exceed the number of images. Imagine for instance one thousand sample images described by 60 × 85 pixels: each signal is then defined by 5100 low-level features, and the number of samples rapidly becomes too small to describe the data efficiently. Consequently, algorithms involved in face detection tasks must generally solve large dimensional problems, while a large number of variables may harm accuracy and robustness, as we will see in the next paragraph.

In this chapter, we focus on the problem of selecting features associated to the detection of “faces” (versus “non-faces”) when images are described by high-dimensional signals.

A statistical remark and the curse of dimensionality

Let us first discuss in further detail why an abundance of variables can significantly harm classification in the context of face detection. From a statistical point of view, it is important to remove irrelevant variables, which act as artificial noise in the data, especially in the case of images, and limit the accuracy of detection tasks.

Moreover, in high dimensional spaces we face the curse of dimensionality, and it is generally impossible to draw reliable conclusions from databases built with a small number of examples compared to the large number of features. This phenomenon was first pointed out by Bellman (1961). Let us briefly describe this statistical property, which can be summarized as the exponential growth of hyper-volume as a function of dimensionality. Consider for instance the learning task of face detection. The signal is denoted X while the presence of a face in the signal is encoded by Y (Y = 1 when a face is present and Y = 0 otherwise). A good statistical prediction of Y given X corresponds to a good knowledge of the conditional law Y|X. If we call d the dimension of the space E_d where X lives, the problem is equivalent to finding a correct interpolation of this conditional law given a Learning Set of size N whose samples are drawn in the space E_d. As d increases, the number of samples N necessary to build a design of fixed precision increases exponentially. For instance, N = 100 uniformly spaced sample points suffice to sample the unit interval (d = 1) with no more than 0.01 distance between points, whereas an equivalent sampling of a 10-dimensional (d = 10) unit hypercube with a lattice spacing of 0.01 between adjacent points would require N = 10^20 sample points. Thus, it becomes exponentially harder to approximate the conditional law Y|X as d increases.
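
To make this exponential growth explicit, here is a minimal back-of-the-envelope computation (a sketch of ours, not taken from the chapter) of the number of lattice points needed to keep a 0.01 spacing in the unit hypercube [0, 1]^d:

```python
import math

def lattice_points(d, spacing=0.01):
    """Points of a regular lattice covering [0, 1]^d with the given spacing:
    (1/spacing + 1) points per axis, raised to the power d."""
    per_axis = int(round(1.0 / spacing)) + 1
    return per_axis ** d

for d in (1, 2, 5, 10):
    print(f"d = {d:2d}: about 10^{math.log10(lattice_points(d)):.0f} points")
# d =  1: about 10^2 points
# d =  2: about 10^4 points
# d =  5: about 10^10 points
# d = 10: about 10^20 points
```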

In addition, let us recall the classical bias/variance trade-off (see Geman, Bienenstock, and Doursat, 1992). If one wants to predict Y with a function f applied to the signal X, using the observations of a Learning Set D, the squared loss decomposition for the prediction f(x; D) at a point x is given as

E_D[(Y - f(x; D))² | X = x] = E[(Y - E[Y | X = x])² | X = x]          (noise)
                              + (E_D[f(x; D)] - E[Y | X = x])²         (squared bias)
                              + E_D[(f(x; D) - E_D[f(x; D)])²]         (variance)
(1)

In this decomposition, the bias is typically a non-increasing function of the dimension d, while the variance term may drastically increase with d. It is thus necessary to follow a bias/variance trade-off to find a good prediction f and a good complexity d. One of the goals of feature selection is to mitigate this phenomenon by restricting the set of features to the “good” ones.
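
This effect can be checked numerically. The following sketch is ours and uses ordinary least squares as the predictor f, which the chapter does not prescribe: irrelevant features are added to a simple regression problem, and the squared bias and variance terms of Equation (1) are estimated at a fixed query point over repeated Learning Sets D. The bias stays close to zero while the variance grows with d.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(d, n_train=50, n_rep=500, noise=1.0):
    """Monte Carlo estimate of the squared bias and variance terms of (1)
    for a least-squares predictor evaluated at a fixed query point."""
    x_query = np.zeros(d)
    x_query[0] = 1.0                              # fixed test point
    target = 2.0 * x_query[0]                     # E[Y | X = x_query]: only the
                                                  # first feature carries signal
    preds = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.normal(size=(n_train, d))         # a fresh Learning Set D
        y = 2.0 * X[:, 0] + noise * rng.normal(size=n_train)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit f(.; D)
        preds[r] = x_query @ beta                 # prediction f(x_query; D)
    bias2 = (preds.mean() - target) ** 2          # squared bias term
    variance = preds.var()                        # variance term
    return bias2, variance

for d in (1, 5, 20, 40):
    b2, v = simulate(d)
    print(f"d = {d:2d}  bias^2 = {b2:.4f}  variance = {v:.4f}")
```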

Finally, we can also point out efficiency issues, since the speed of many classification algorithms is largely improved when the complexity of the data is reduced. For instance, the complexity of the q-nearest neighbour algorithm grows proportionally with the number of variables. In some cases, applying classification algorithms such as Support Vector Machines, described in Vapnik (1998), or q-nearest neighbours on the full feature space is not possible or realistic due to the time needed to apply the decision rule.
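
As a simple check of this scaling (our own sketch, not code from the chapter), a brute-force q-nearest-neighbour query costs O(N × d) per test point, so its running time grows roughly linearly with the number of retained variables:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

def knn_predict(X_train, y_train, x, q=5):
    """Brute-force q-nearest-neighbour vote: O(N * d) distance computations."""
    dist = np.linalg.norm(X_train - x, axis=1)     # distances to all N training points
    nearest = np.argsort(dist)[:q]                 # indices of the q closest ones
    return int(np.round(y_train[nearest].mean()))  # majority vote for a binary label

n = 2000
for d in (100, 1000, 5100):                        # 5100 = full 60 x 85 pixel space
    X = rng.normal(size=(n, d))
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=d)
    t0 = time.perf_counter()
    for _ in range(20):
        knn_predict(X, y, x)
    print(f"d = {d:5d}: {(time.perf_counter() - t0) / 20 * 1e3:.2f} ms per query")
```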
