Feature selection is an important topic in data mining, especially for high-dimensional datasets. It is a process commonly used in machine learning in which a subset of the features available in the data is selected for the application of a learning algorithm; the best subset contains the smallest number of dimensions that contribute most to accuracy. Feature selection methods can be decomposed into three main classes: filter methods, wrapper methods, and embedded methods. This chapter presents an empirical comparison of feature selection methods and their algorithms. In view of the substantial number of existing feature selection algorithms, the need arises for criteria that enable one to decide adequately which algorithm to use in a given situation. This chapter reviews several fundamental algorithms found in the literature and assesses their performance in a controlled scenario.
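To make the three classes concrete, the sketch below contrasts them on a synthetic dataset. It is an illustration only, not taken from the chapter's experiments: it assumes scikit-learn is available and uses mutual information scoring as the filter, recursive feature elimination as the wrapper, and random forest importances as the embedded method. All parameter values are arbitrary.

```python
# Minimal sketch of the three feature selection classes (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, of which 5 are informative and 5 redundant.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, n_redundant=5, random_state=0)

# Filter: score each feature independently of any learning algorithm.
filter_sel = SelectKBest(mutual_info_classif, k=5).fit(X, y)

# Wrapper: search feature subsets by repeatedly training a learner.
wrapper_sel = RFE(LogisticRegression(max_iter=1000),
                  n_features_to_select=5).fit(X, y)

# Embedded: selection falls out of model training itself (importances).
embedded = RandomForestClassifier(random_state=0).fit(X, y)
top5_embedded = sorted(embedded.feature_importances_.argsort()[-5:])

print("filter:  ", filter_sel.get_support(indices=True))
print("wrapper: ", wrapper_sel.get_support(indices=True))
print("embedded:", top5_embedded)
```

The three approaches trade cost against fidelity: the filter is cheapest but ignores the learner, the wrapper is most faithful to the learner but most expensive, and the embedded method sits in between.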
Introduction
The feature selection problem is inescapable in inductive machine learning and data mining, and its significance is beyond doubt. The main benefit of a correct selection lies in learning speed, generalization capacity, or simplicity of the induced model. There are also direct benefits of working with a smaller number of features: a reduced measurement cost and, hopefully, a better understanding of the domain. A feature selection algorithm (FSA) is a computational solution that should be guided by a certain definition of subset relevance, although in many cases this definition is implicit or followed in a loose sense. This is so because, from the inductive learning perspective, the relevance of a feature may have several definitions depending on the precise objective (Caruana and Freitag, 1994). Thus the need arises for common-sense criteria that enable one to decide adequately which algorithm to use, or not to use, in a given situation (Belanche and González, 2011).

Feature selection algorithms can be classified according to the kind of output they produce: the first class gives a (weighted) linear order of the features, and the second gives a subset of the original features. In this research, several fundamental algorithms found in the literature are studied to assess their performance in a controlled scenario, using a measure that computes the degree of matching between the output given by an FSA and the known optimal solution. The effect of sample size is also studied. The results illustrate a strong dependence on the particular conditions of the FSA used and on the amount of irrelevance and redundancy in the data set description, relative to the total number of features. This should prevent the use of a single algorithm even when there is little knowledge available about the structure of the solution.

The basic idea in feature selection is to detect irrelevant and/or redundant features, as they harm the performance of the learning algorithm (Lee and Moore, 2014). There is no unique definition of relevance; however, it has to do with the discriminating ability of a feature, or a subset of features, to distinguish the different class labels (Dash and Liu, 1997). However, as pointed out by Guyon and Elisseeff (2003a), a variable that is irrelevant by itself may be useful when taken together with others, and even two variables that are useless by themselves can be useful when taken together.
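The last point, made by Guyon and Elisseeff (2003a), is usually illustrated with the XOR construction. The sketch below is an illustrative assumption, not part of the chapter's experiments: it uses scikit-learn's mutual_info_classif to show that each of two XOR parent features carries essentially no information about the class on its own, while a joint encoding of the pair determines the class completely.

```python
# Two individually irrelevant features that are jointly relevant: y = XOR(x1, x2).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=2000)
x2 = rng.integers(0, 2, size=2000)
y = x1 ^ x2  # the class depends only on the interaction of x1 and x2

# Taken one at a time, each feature is nearly useless for predicting y.
X = np.column_stack([x1, x2])
print(mutual_info_classif(X, y, discrete_features=True))  # approx. [0.0, 0.0]

# Taken together (encoded as a single joint feature), they determine y exactly.
joint = (2 * x1 + x2).reshape(-1, 1)
print(mutual_info_classif(joint, y, discrete_features=True))  # approx. log(2)
```

Any selection criterion that scores features one at a time would discard both x1 and x2 here, which is precisely why univariate relevance measures can be misleading.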
Figure 1. Feature selection criteria