Causal Feature Selection

Walisson Ferreira Carvalho, Luis Zarate

Source Title: Integration Challenges for Analytics, Business Intelligence, and Data Mining

DOI: 10.4018/978-1-7998-5781-5.ch007

Abstract

Feature selection is part of the data preprocessing task in business intelligence (BI), analytics, and data mining, and it calls for new methods that can handle high dimensionality. One alternative that has been researched to deal with the curse of dimensionality is causal feature selection. Causal feature selection is based not on correlation but on the causal relationships among variables. The main goal of this chapter is to present, based on the issues identified in other methods, a new strategy that considers attributes beyond those that compose the Markov blanket of a node and calculates the causal effect to ensure the causal relationship.

Chapter Preview

Introduction

Year after year, the volume of data has proliferated at remarkable speed. However, large volumes and variety of data do not necessarily translate into quality and, due to this exponential growth, researchers are dealing with new challenges in the process of discovering knowledge. These challenges involve comprehending and modeling the problem being considered, ensuring the quality of the data, and identifying relevant data. One well-known problem is the Curse of Dimensionality, a term introduced by Bellman in 1957 to describe the problems caused by the exponential increase in volume as dimensions are added, especially the complications of analyzing and organizing data in high-dimensional spaces (Keogh & Mueen, 2017).
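The exponential increase in volume can be made concrete with a small calculation. The following is a minimal sketch, not taken from the chapter: it assumes each axis of the data space is split into a fixed number of intervals, and the function names `grid_cells` and `coverage` are hypothetical.

```python
def grid_cells(bins_per_axis: int, dimensions: int) -> int:
    """Number of cells when every axis is split into `bins_per_axis` intervals."""
    return bins_per_axis ** dimensions


def coverage(samples: int, bins_per_axis: int, dimensions: int) -> float:
    """Upper bound on the fraction of cells that `samples` points can occupy."""
    return min(1.0, samples / grid_cells(bins_per_axis, dimensions))


# With 10 intervals per axis, one million samples could fill a 5-dimensional
# grid, but can touch only a tiny fraction of a 10-dimensional one.
for d in (1, 2, 5, 10):
    print(d, grid_cells(10, d), coverage(1_000_000, 10, d))
```

The number of cells grows as a power of the number of dimensions, so a data set of fixed size becomes increasingly sparse as attributes are added.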

The more data is available, the greater the need to analyze it in order to transform it into knowledge, and then convert knowledge into information. Three areas of knowledge are currently dealing with this very subject: Business Intelligence (BI), Analytics, and Data Mining.

Business Intelligence can be defined as the process of transforming data into information and, consequently, into knowledge. Analytics can be defined as the process of transforming data into insights. Data Mining, in turn, is the process of discovering potentially useful and unknown information from a collection of data. All three processes have the same input: data. Their shared aim is to produce information and knowledge to support decision makers.

Despite their minor differences, all three processes depend on the quality of the data, not only on the volume that enters the pipeline. Data quality is therefore a critical success factor. It can be understood through the concept of Smart Data, which refers to the process of transforming raw data into quality data. The process of discovering smart data is defined by the Gartner Group as “a next-generation data discovery capability that provides business users or citizen data scientists with insights from advanced analytics.”

It is well known that the pipeline for transforming raw data into knowledge and, consequently, into information (or insights) includes the preprocessing stage. According to García et al. (2015), preprocessing is the most important stage in data mining and is affected by the volume of data as well. When raw data is not ready to be analyzed, it must be prepared before being processed by the learning algorithm. The preprocessing phase is responsible for transforming data and includes data cleaning, integration, normalization, and dealing with missing data.

One strategy used during the preprocessing stage is dimensionality reduction, which can take the form of feature extraction, feature selection, or instance selection. Feature extraction is associated with constructing new features as functions of existing ones. Transformation, discretization, and Principal Components Analysis (PCA) are techniques of feature extraction. Meanwhile, feature selection aims to reduce the number of features by selecting the most representative subset of variables in a given problem.
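The difference between the two approaches can be sketched in a few lines. This is an illustrative example under stated assumptions, not the chapter's method: the selection step is a plain correlation filter (the kind of criterion that causal feature selection aims to move beyond), and the names `pearson`, `select_features`, and `extract_feature`, as well as the toy data, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, target, k):
    """Feature selection: keep the k ORIGINAL columns most correlated with the target."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:k]


def extract_feature(columns, names, func):
    """Feature extraction: build a NEW column as a function of existing ones."""
    return [func(*row) for row in zip(*(columns[name] for name in names))]


# Toy data set (assumed for illustration): three attributes and a target.
columns = {"x1": [2, 4, 6, 8], "x2": [4, 3, 2, 1], "noise": [1, -1, -1, 1]}
target = [1.0, 2.0, 3.0, 4.0]

print(select_features(columns, target, k=2))
print(extract_feature(columns, ["x1", "x2"], lambda a, b: a + b))
```

Selection returns names of existing attributes, so the reduced data set keeps its original meaning; extraction returns values of a derived attribute that did not exist before.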

The reduction of dimensionality can also consider both attributes and samples, in a process known as hybrid partitioning. In other words, the data set can be reduced in terms of columns (attributes) or rows (samples). The reduction of samples is known as Instance Selection, a technique used to select the best subset of examples, which naturally improves the performance of the learning algorithm. The focus of this chapter, however, is on feature selection, because it facilitates the learning task and aims to select the optimal subset of features that best represents a problem.

Triguero et al. (2019) emphasized that data preprocessing is one of the most important stages in the process of transforming data into information, and that Feature Selection is a data preprocessing strategy that should be applied to mitigate problems in the data pipeline.

Take, for instance, the Analytics process, which, despite its growth, still faces challenges such as how to handle the amount of data, the lack of quality in data, limited computational resources, and high dimensionality. Analytics can be classified as Descriptive, Predictive, and Prescriptive. Descriptive analytics is related to historical data. In this preliminary stage, the question to be answered is “What is happening?”. Predictive analytics uses data from the past to answer questions such as “What will happen in the future?”. Prescriptive analytics tries to answer the question “What should be done?”. In general, applying a satisfactory Analytics process requires having smart data that can answer these questions.

Key Terms in this Chapter

Feature Selection: A task in the preprocessing stage that aims to select the most relevant subset of features given a target.

Global Learning: An approach that learns a Bayesian network by searching the whole DAG space, using all variables.

Markov Blanket: In a graph, the Markov blanket of a specific node is the subset of features that includes its parents, children, and spouses.

Curse of Dimensionality: Refers to problems that arise when analyzing data in high-dimensional spaces and that do not occur in low-dimensional ones.

Neighborhood: In graph theory, the neighborhood of a vertex V is the subgraph composed of all vertices adjacent to V.

Causal Effect: Given two variables X and Y, causal effect of X on Y can be summarized as a function from X to the probability distribution of Y.

Direct Effect: Given two variables X and Y, the direct effect measures how sensitive Y is to X when the other variables of the model are held fixed.

Local Learning: An approach that learns a Bayesian network by limiting the DAG space to the variables that are potential candidates for local structures, such as the Markov blanket or the parents and children of a given target.

Markovian Parents: In a graph, given a variable X represented by a node, the Markovian parents of X are a subset of its predecessor variables (nodes) that renders X independent of all its other predecessors.
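The graph terms defined above (neighborhood and Markov blanket) can be illustrated on a small directed acyclic graph. This is a minimal sketch under stated assumptions: the DAG is stored as a dictionary mapping each node to a list of its children, the example graph and all function names are hypothetical, and `neighborhood` returns the set of adjacent vertices rather than the induced subgraph.

```python
def parents(dag, node):
    """Nodes with an edge pointing into `node`."""
    return {p for p, kids in dag.items() if node in kids}


def children(dag, node):
    """Nodes that `node` points to."""
    return set(dag.get(node, ()))


def neighborhood(dag, node):
    """Vertices adjacent to `node`, ignoring edge direction."""
    return parents(dag, node) | children(dag, node)


def markov_blanket(dag, node):
    """Parents, children, and spouses (other parents of the children) of `node`."""
    kids = children(dag, node)
    spouses = set()
    for kid in kids:
        spouses |= parents(dag, kid)
    return (parents(dag, node) | kids | spouses) - {node}


# Hypothetical DAG: A -> T, B -> T, T -> C, S -> C, C -> D
dag = {"A": ["T"], "B": ["T"], "T": ["C"], "S": ["C"], "C": ["D"], "D": []}

print(sorted(neighborhood(dag, "T")))    # ['A', 'B', 'C']
print(sorted(markov_blanket(dag, "T")))  # ['A', 'B', 'C', 'S']
```

In this example, S belongs to the Markov blanket of T because it shares the child C with T, even though S is not adjacent to T. D is a descendant of T but lies outside the blanket.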
