Fuzzy Mutual Information Feature Selection Based on Representative Samples

Omar A. M. Salem, Liwei Wang
Copyright: © 2018 | Pages: 15
DOI: 10.4018/IJSI.2018010105

Abstract

Building classification models from real-world datasets has become a difficult task, especially for datasets with high-dimensional features. Unfortunately, these datasets may include irrelevant or redundant features, which have a negative effect on classification performance. Selecting the significant features and eliminating undesirable ones can improve classification models. Fuzzy mutual information is a widely used feature selection measure for finding the best feature subset before the classification process. However, it requires considerable computation and storage space. To overcome these limitations, this paper proposes an improved fuzzy mutual information feature selection method based on representative samples. Experiments on benchmark datasets show that the proposed method achieves better results in terms of classification accuracy, selected feature subset size, storage, and stability.

Introduction

Nowadays, classification models have applications in many areas such as medicine, business, engineering, and the life and social sciences. As the size of real-world datasets from these areas continues to increase, building classification models becomes a significantly more difficult task (Janecek et al., 2008). Although high-dimensional data include important features, they may also include undesirable ones, such as irrelevant and redundant features. The presence of undesirable features decreases classification accuracy (Dash and Liu, 2003; Vieira et al., 2012) and increases storage space and memory usage (Dash and Liu, 2003; Janecek et al., 2008). Selecting relevant features and eliminating irrelevant or redundant ones therefore helps to build effective classification models (Yu et al., 2011).

Feature selection, as a preprocessing step, aims to select the minimum subset of features that describes the data efficiently and increases classification accuracy (Guyon and Elisseeff, 2003). Feature selection methods can be grouped into wrapper, filter, and embedded approaches. Wrapper and embedded approaches can be considered classifier-dependent feature selection, while filter approaches can be considered classifier-independent (Bennasar et al., 2015). In this study, we use the filter approach because of its advantages over the wrapper and embedded approaches: it is classifier-independent, less time consuming, and more practical for classification models (Saeys et al., 2007).

Filter approaches filter undesirable features out before the classification process (García et al., 2015). They select the highest-ranked features based on characteristics of the training data (Guyon and Elisseeff, 2003). The main characteristics of the data depend on two relations: relevance and redundancy (Chandrashekar and Sahin, 2014). Relevance describes how well the features discriminate between the different classes, while redundancy describes how much the features depend on each other; maximizing feature relevance and minimizing feature redundancy therefore leads to the best feature ranking. To evaluate these characteristics, filter approaches use evaluation measures such as correlation (Hall, 1999) and Shannon mutual information (Vergara and Estévez, 2014). Correlation measures are suitable only for linear relationships among features, while Shannon mutual information is suitable for both linear and non-linear relations (Lee et al., 2012). However, Shannon mutual information has some limitations. First, it requires a discretization step before dealing with continuous data, and it is difficult to avoid the information loss that results from discretization (Ching et al., 1995; Shen and Jensen, 2004). Second, it depends only on inner-class information, without considering outer-class information (Liang et al., 2002).
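As a concrete illustration of relevance-based filter ranking, the sketch below scores each feature of a hypothetical dataset by its Shannon mutual information with the class label and ranks features from most to least relevant. This is a minimal sketch, not the paper's own code; the toy data and the use of scikit-learn's estimator are our assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical toy data: 100 samples, 5 continuous features, binary class.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.integers(0, 2, 100)

# Estimate Shannon mutual information between each feature and the class.
# (mutual_info_classif handles continuous features with a k-NN estimator,
# sidestepping an explicit discretization step.)
mi = mutual_info_classif(X, y, random_state=0)

# Rank features by relevance: highest mutual information first.
ranking = np.argsort(mi)[::-1]
print("relevance scores:", np.round(mi, 3))
print("feature ranking :", ranking)
```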

To overcome these limitations, various algorithms based on mutual information with fuzzification have been introduced in the literature. Yu et al. (2011) proposed a fuzzy mutual information measure using the logarithm concept. Another algorithm estimates fuzzy mutual information using the complement instead of the logarithm concept (Zhao et al., 2015). Both fuzzy mutual information algorithms depend on a fuzzy binary relation, which can be represented as a relation matrix. The size of the relation matrix depends on the number of samples in the input feature: each row or column of the matrix represents the relation between one sample and each of the remaining samples. Estimating the relation matrix therefore requires substantial storage and computation time, especially for datasets with a large number of samples (Yu et al., 2007). Motivated by these limitations of fuzzy mutual information, we propose a new estimation of the relation matrix. To create this matrix, we estimate the relation between each sample and a set of representative samples, which consist of the averages of the data samples belonging to the same class. Using representative samples instead of all samples reduces the size of the relation matrix.
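A minimal sketch of the representative-sample idea follows, under stated assumptions: class-wise means serve as the representatives, and a simple distance-based fuzzy similarity stands in for the paper's exact fuzzy relation. The point of the sketch is the shape reduction: instead of an n × n relation matrix over all sample pairs, each sample is related only to the c class representatives, giving an n × c matrix.

```python
import numpy as np

def representative_samples(X, y):
    """One representative per class: the mean of that class's samples."""
    return np.vstack([X[y == c].mean(axis=0) for c in np.unique(y)])

def fuzzy_relation(X, reps):
    """Fuzzy similarity of each sample to each representative.

    Assumed similarity: 1 minus the max-normalized Euclidean distance,
    so values lie in [0, 1]; the paper's exact fuzzy relation may differ.
    """
    d = np.linalg.norm(X[:, None, :] - reps[None, :, :], axis=2)
    return 1.0 - d / (d.max() + 1e-12)

rng = np.random.default_rng(0)
X = rng.random((1000, 4))              # 1000 samples, 4 features
y = rng.integers(0, 3, 1000)           # 3 classes
reps = representative_samples(X, y)    # shape (3, 4)
R = fuzzy_relation(X, reps)            # shape (1000, 3), not (1000, 1000)
print(reps.shape, R.shape)
```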
