Dimensionality Reduction with Unsupervised Feature Selection and Applying Non-Euclidean Norms for Classification Accuracy

Amit Saxena; John Wang

doi:10.4018/jdwm.2010040102

Source Title: International Journal of Data Warehousing and Mining (IJDWM) 6(2)

Abstract

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and then tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach is applied to select a subset of features: the GA stochastically selects a reduced number of features, using the Sammon error as the fitness function, and yields different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, which is validated with the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is evaluated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Results are presented with extensive simulations on seven real data sets and one synthetic data set. The investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
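The abstract names the Sammon error as the GA fitness function, but this preview does not give its exact form. A minimal sketch, assuming the standard Sammon stress computed between Euclidean inter-pattern distances in the full feature space and in the subspace of selected features, is shown below. The helper names `pairwise_distances` and `sammon_error` are hypothetical and not taken from the paper.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of patterns (rows) of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_error(X, mask):
    """Sammon error of the feature subset selected by the binary `mask`.

    E = (1 / sum_{i<j} D_ij) * sum_{i<j} (D_ij - d_ij)**2 / D_ij, where D_ij uses
    all features and d_ij only the selected ones. 0 means distances are preserved.
    Assumption: Euclidean distances in both spaces.
    """
    X = np.asarray(X, dtype=float)
    keep = np.asarray(mask, dtype=bool)
    if not keep.any():
        raise ValueError("select at least one feature")
    iu = np.triu_indices(len(X), k=1)  # each unordered pair (i < j) once
    D = pairwise_distances(X)[iu]
    d = pairwise_distances(X[:, keep])[iu]
    nonzero = D > 0  # skip duplicate patterns to avoid dividing by zero
    D, d = D[nonzero], d[nonzero]
    if D.size == 0:
        return 0.0
    return float(((D - d) ** 2 / D).sum() / D.sum())


X = np.array([[0, 0, 5], [3, 4, 5], [6, 8, 5]])  # feature 2 is constant
print(sammon_error(X, [1, 1, 0]))  # dropping the constant feature keeps every distance: 0.0
print(sammon_error(X, [1, 0, 0]))  # dropping feature 1 distorts distances: error > 0
```

A lower error means the selected subset better preserves the original inter-pattern distances, so a GA that maximizes fitness would use something like the negative of this value; how the authors map the error to a fitness score is not stated in this preview.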

Introduction

In most computer-based applications today, datasets have a large number of patterns and a relatively small number of classes. Each pattern is characterized by a number of features and belongs to one of the classes. Classification of these patterns is a major step in data mining (Han & Kamber, 2006), since in the majority of data mining applications patterns must be classified. Because classification requires feature analysis, feature analysis is an important component of data mining. Feature analysis consists of feature selection and feature extraction. Feature selection chooses a subset of features from the entire set of features in a dataset; feature extraction, on the other hand, may combine or re-compute existing features to create new ones.

The curse of dimensionality, caused by redundant or derogatory extra features, is a major concern in data mining, and feature selection can help counter it. Feature selection is applied to select the most significant features in a dataset, meaning those features which alone can predict the classes of the patterns in the dataset with maximum accuracy. If the feature selection uses information given before the process is applied (such as the class of a pattern), the approach is called supervised. If no information is supplied a priori to group the patterns, the approach is called unsupervised; in the latter case, features are combined on the basis of some similarity measure (as in clustering). A number of supervised feature selection methods exist that use neural networks, fuzzy logic, or k-nearest neighbor (k-nn) search algorithms. By contrast, the problem of unsupervised feature selection has rarely been addressed.
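The distinction between feature selection and feature extraction can be illustrated on a toy matrix. This is an illustrative sketch only: the data and the projection matrix `W` are made up, and the linear projection merely stands in for any extraction method.

```python
import numpy as np

# Toy data: 4 patterns (rows) described by 3 features (columns).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])

# Feature selection: keep a subset of the original columns, values unchanged.
selected = np.array([True, False, True])  # keep features 0 and 2
X_selected = X[:, selected]

# Feature extraction: build new features by combining the old ones,
# here with an arbitrary (hypothetical) linear projection W.
W = np.array([[0.5, 0.0],
              [0.5, 1.0],
              [0.0, -1.0]])
X_extracted = X @ W  # each new feature mixes several original features
```

Both results have two columns, but only `X_selected` keeps original, directly interpretable features, which is why selection rather than extraction is the focus here.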

In most unsupervised feature selection approaches, features are grouped according to the distances between individual features. Computing these distances is central to unsupervised approaches, because it determines the level of similarity among features. The decision of which features are significant is heuristic. To select the most effective features from a large number of candidates, evolutionary computing techniques can be applied. The genetic algorithm (GA) (Romero & Abelló, 2009; Goldberg, 1989) is a powerful evolutionary computing technique based on the principles of natural evolution, and it can be applied to select features in this manner.
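A GA for feature selection is commonly sketched with each chromosome as a binary mask over the features (1 = feature kept). The sketch below assumes tournament selection, single-point crossover, bit-flip mutation and elitism; these are common textbook choices, not necessarily the operators used by the authors, and `evolve_feature_mask` and the toy fitness are hypothetical. In the unsupervised phase described in the abstract, the fitness would instead be derived from the Sammon error of the selected subset (for example, its negative, so that lower error means higher fitness).

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]


def evolve_feature_mask(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 50,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Maximize `fitness` over binary feature masks with a simple generational GA."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        m = [rng.randint(0, 1) for _ in range(n_features)]
        if not any(m):  # never allow an empty feature subset
            m[rng.randrange(n_features)] = 1
        return m

    def tournament(pop: List[Mask], scores: List[float]) -> Mask:
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    pop = [random_mask() for _ in range(pop_size)]
    history: List[float] = []  # best fitness per generation
    for _ in range(generations):
        scores = [fitness(m) for m in pop]
        best_idx = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best_idx])
        next_pop = [pop[best_idx][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            a, b = tournament(pop, scores)[:], tournament(pop, scores)[:]
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for k in range(n_features):  # bit-flip mutation
                    if rng.random() < mutation_rate:
                        child[k] ^= 1
                if not any(child):
                    child[rng.randrange(n_features)] = 1
                if len(next_pop) < pop_size:
                    next_pop.append(child)
        pop = next_pop
    scores = [fitness(m) for m in pop]
    best_idx = max(range(pop_size), key=scores.__getitem__)
    history.append(scores[best_idx])
    return pop[best_idx], scores[best_idx], history


TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical "ideal" subset for the demo


def toy_fitness(mask: Sequence[int]) -> float:
    """Number of positions where the mask agrees with TARGET (higher is better)."""
    return float(sum(int(a == b) for a, b in zip(mask, TARGET)))


best, score, hist = evolve_feature_mask(len(TARGET), toy_fitness)
```

Elitism keeps the best mask of each generation unchanged, so the best fitness never decreases from one generation to the next; the returned `hist` records this.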

With a reduced number of features selected through the GA, the next essential objective is to test the classification accuracy (CA) of the dataset with this feature subset. The k-nn classifier is a supervised method used to determine the CA of a dataset. The distance between the test pattern and each pattern in the dataset is computed, and the class of the test pattern is decided from the classes of the k patterns nearest to it. In most cases k = 1, i.e., the test pattern is assigned the class of the pattern at minimum distance from it. The distance most commonly used for this purpose is the Euclidean distance, which is the special case p = 2 of the Minkowski metric; other values of p give non-Euclidean norms. In this paper, we vary the parameter of the Minkowski metric over different values, including the Euclidean case, to observe its effect on the CA of the dataset for a particular feature set.

The paper is organized as follows. The next section reviews earlier work in the field. Section Genetic Algorithm gives a brief description of the GA, after which we outline the Minkowski metric. The proposed method is explained next, and a brief summary of the datasets is presented in Section Datasets. Simulation studies and result analysis are described separately in the Appendix. Conclusions and future research scope are presented in the last section of the main text.
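The k-nn evaluation with a variable Minkowski order can be sketched as follows. The Minkowski distance of order p between patterns a and b is (Σ_k |a_k − b_k|^p)^(1/p); p = 1 gives the Manhattan distance, p = 2 the Euclidean distance, and p → ∞ the Chebyshev (maximum) distance. The function names, the optional feature `mask`, and the majority-vote tie-breaking rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def minkowski(a, B, p):
    """Minkowski distance of order p from pattern `a` to every row of `B`.

    p = 1: Manhattan, p = 2: Euclidean, p = np.inf: Chebyshev (max |a_k - b_k|).
    For p < 1 the formula is still computable but is no longer a metric.
    """
    diff = np.abs(np.asarray(B, float) - np.asarray(a, float))
    if np.isinf(p):
        return diff.max(axis=1)
    return (diff ** p).sum(axis=1) ** (1.0 / p)


def knn_predict(X_train, y_train, X_test, k=1, p=2.0):
    """Label each test pattern by majority vote among its k nearest training patterns."""
    X_train, y_train, X_test = (np.asarray(v) for v in (X_train, y_train, X_test))
    preds = []
    for x in X_test:
        # Stable sort: among equidistant patterns the earlier one wins.
        nearest = np.argsort(minkowski(x, X_train, p), kind="stable")[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # vote ties go to the smallest label
    return np.array(preds)


def classification_accuracy(X_train, y_train, X_test, y_test, k=1, p=2.0, mask=None):
    """Fraction of test patterns classified correctly, using only the features in `mask`."""
    X_train, X_test = np.asarray(X_train, float), np.asarray(X_test, float)
    if mask is not None:
        keep = np.asarray(mask, dtype=bool)
        X_train, X_test = X_train[:, keep], X_test[:, keep]
    pred = knn_predict(X_train, y_train, X_test, k=k, p=p)
    return float(np.mean(pred == np.asarray(y_test)))


# Toy data: the origin is closer to class 0 under p = 1 but to class 1 under p = 2.
X_train = np.array([[3.0, 0.0], [2.0, 2.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.0, 0.0]])
for p in (1, 2, np.inf):
    print(p, knn_predict(X_train, y_train, X_test, k=1, p=p))
# p = 1 -> [0]; p = 2 and p = inf -> [1]
```

On this toy data the Manhattan and Euclidean distances disagree about the nearest neighbor of the origin, which is the kind of norm-dependent change in CA that varying the Minkowski parameter is meant to expose.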
