Introduction to Missing Data

Tshilidzi Marwala

doi:10.4018/978-1-60566-336-4.ch001

Source Title: Computational Intelligence for Missing Data Imputation, Estimation, and Management: Knowledge Optimization Techniques

Abstract

In this chapter, traditional missing data issues, such as missing data patterns and mechanisms, are described, with attention paid to the models best suited to particular missing data mechanisms. Traditional missing data imputation methods, namely case deletion and prediction rules, are then reviewed. For case deletion, listwise and pairwise deletion are reviewed; for prediction rules, imputation techniques such as mean substitution, hot-deck, regression and decision trees are reviewed. Two missing data examples are studied: the Sudoku puzzle and a mechanical system. The major conclusions drawn from these examples are that there is a need for an accurate model that describes the inter-relationships and rules that define the data, and that a good optimization method is required for a successful missing data estimation procedure.
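The case-deletion and prediction-rule methods named above can be sketched on a small table in which `None` marks a missing entry. This is a minimal illustration, assuming numeric variables and pure Python; the function names are hypothetical, the hot-deck donor is chosen by squared distance on the observed variables (one of several possible donor rules), regression imputation uses a single predictor, and decision-tree imputation is omitted.

```python
from statistics import mean

# A record is a list of numbers; None marks a missing entry.

def listwise_deletion(rows):
    """Case deletion: keep only records in which every variable is observed."""
    return [r for r in rows if all(v is not None for v in r)]

def pairwise_covariance(rows, i, j):
    """Pairwise deletion: estimate cov(i, j) from every record where both i and j are observed."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    mi = mean(a for a, _ in pairs)
    mj = mean(b for _, b in pairs)
    return sum((a - mi) * (b - mj) for a, b in pairs) / (len(pairs) - 1)

def mean_substitution(rows):
    """Replace each missing entry with the mean of the observed values in its column."""
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[col_means[k] if v is None else v for k, v in enumerate(r)] for r in rows]

def hot_deck(rows):
    """Copy missing entries from the complete record (donor) nearest on the observed variables."""
    donors = listwise_deletion(rows)  # min() below raises ValueError if there are no donors
    filled = []
    for r in rows:
        observed = [k for k, v in enumerate(r) if v is not None]
        donor = min(donors, key=lambda d: sum((d[k] - r[k]) ** 2 for k in observed))
        filled.append([donor[k] if v is None else v for k, v in enumerate(r)])
    return filled

def regression_impute(rows, target, predictor):
    """Fill the target column from a least-squares line fitted on one predictor column."""
    pairs = [(r[predictor], r[target]) for r in rows
             if r[predictor] is not None and r[target] is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    intercept = my - slope * mx
    filled = []
    for r in rows:
        r = list(r)
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        filled.append(r)
    return filled

rows = [
    [1.0, 2.0, 3.0],
    [2.0, None, 5.0],
    [3.0, 6.0, None],
    [4.0, 8.0, 9.0],
]
print(listwise_deletion(rows))  # [[1.0, 2.0, 3.0], [4.0, 8.0, 9.0]]
print(hot_deck(rows))  # [[1.0, 2.0, 3.0], [2.0, 2.0, 5.0], [3.0, 6.0, 9.0], [4.0, 8.0, 9.0]]
```

On this toy table, listwise deletion keeps only two of the four records, so a covariance between variables 0 and 1 would rest on two points, whereas pairwise deletion uses the three records in which both variables are observed.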


Introduction

Datasets are frequently incomplete. Data become missing for a number of reasons (Ljung, 1989), including sensor failures, omitted entries in databases and non-response in questionnaires. In many situations, data collectors put firm measures in place to prevent incompleteness during data gathering. Despite these efforts, data incompleteness remains a major problem in data analysis (Beunckens, Sotto, & Molenberghs, 2008; Schafer, 1997; Schafer & Olsen, 1998). The specific reason why data are incomplete is usually not known in advance, particularly in engineering problems, and consequently methods for preventing missing data are normally not successful. The absence of complete data then hampers decision-making, because decisions depend on complete information (Stefanakos & Athanassoulis, 2001; Marwala, Chakraverty, & Mahola, 2006).

In one way or another, most scientific, business and economic decisions depend on the information available at the time they are made. Many business decisions depend on the availability of sales data and other information, while progress in research is based on knowledge discovered from experiments and measured parameters. In aerospace engineering, for example, many fault detection mechanisms must work with measured data that are partially corrupted or otherwise incomplete (Marwala & Heyns, 1998). In many applications, simply ignoring incomplete records is not an optimal option, because doing so may bias the results of statistical modeling and lead, for example, to a breakdown in machine automation or control. For this reason, it is essential to be able to make decisions based on the data that are available.
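A small synthetic illustration of the bias point, assuming a hypothetical sensor whose true readings are 1 to 100 and which loses every reading above 80, so that whether a value is missing depends on the value itself. Both discarding the incomplete records and replacing them with the mean of what remains underestimate the true mean.

```python
from statistics import mean

# Hypothetical sensor: true readings 1..100; assume every reading above 80 is lost.
true_values = [float(v) for v in range(1, 101)]
observed = [v if v <= 80 else None for v in true_values]

# Ignoring incomplete records (complete-case analysis).
complete_cases = [v for v in observed if v is not None]

# Naive repair: substitute the complete-case mean for each missing reading.
fill = mean(complete_cases)
mean_filled = [fill if v is None else v for v in observed]

print(mean(true_values))     # 50.5
print(mean(complete_cases))  # 40.5
print(mean(mean_filled))     # 40.5
```

In this constructed case the missing readings are exactly the large ones, so the observed values alone carry no information about them, and both simple strategies inherit the same bias.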

Most decision support systems, such as the commonly used neural networks, support vector machines and many other computational intelligence techniques, are predictive models that take observed data as inputs and predict an outcome (Bishop, 1995; Marwala & Chakraverty, 2006). Such models fail when one or more inputs are missing, so they cannot be used for decision-making if the data variables are incomplete. The end goal of missing data estimation is usually to make optimal decisions, and to achieve this goal, appropriate approximations to the missing data must be found. Once the values of the missing variables have been estimated, pattern recognition tools can be used for decision-making.
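The estimate-then-decide sequence described above can be sketched as follows. This is a minimal sketch under hypothetical assumptions: the model of the data's inter-relationships is a hand-written linear rule `x2 = 2*x0 + x1` standing in for a trained model such as a neural network, the missing value is found by golden-section search on that model's squared error (one possible optimizer among many), and the classifier and its threshold are invented purely for illustration.

```python
import math

def model_error(record):
    """Hypothetical inter-relationship model: assume records satisfy x2 = 2*x0 + x1."""
    x0, x1, x2 = record
    return (x2 - (2.0 * x0 + x1)) ** 2

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

def estimate_missing(record, lo=-100.0, hi=100.0):
    """Fill the single missing entry (None) with the value that minimises model_error."""
    k = record.index(None)

    def objective(v):
        trial = list(record)
        trial[k] = v
        return model_error(trial)

    filled = list(record)
    filled[k] = golden_section_min(objective, lo, hi)
    return filled

def classify(record):
    """Hypothetical downstream classifier: refuses inputs with missing values."""
    if any(v is None for v in record):
        raise ValueError("classifier needs a complete input vector")
    return "fault" if record[1] > 4.0 else "healthy"

record = [3.0, None, 11.0]
filled = estimate_missing(record)  # x1 is estimated as approximately 5.0
print(classify(filled))            # fault
```

Golden-section search needs only function evaluations but assumes a single minimum in the search interval; with several missing entries or a non-convex model, a multivariate or global optimizer would be needed instead.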

The problem that missing data pose to decision-making is more apparent in online applications, where data have to be used almost immediately after they are obtained. When some variables are unavailable, it becomes difficult to continue the decision-making process, which can halt the application altogether. In essence, the major challenge is that standard computational intelligence techniques cannot process input data with missing values: they cannot perform classification or regression if any of the input variables is missing. A further concern is that many missing data imputation techniques developed thus far are mainly suited to survey datasets, where analysts have adequate time to study why data components are missing. In many engineering problems, however, missing values usually have to be estimated in real time, leaving no time to understand why data components are missing. This calls for the development of robust methods that estimate missing data effectively regardless of why the data are missing.
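For online use, an estimate must be produced as each sample arrives, without first diagnosing why a value is missing. A minimal sketch, assuming a hypothetical multi-channel sensor stream: each missing reading is replaced immediately by an exponentially weighted moving average (EWMA) of that channel's past observed readings. The class name, `alpha` and the EWMA rule are illustrative choices; this is a simple baseline that shows the real-time constraint, not a robust estimator.

```python
class StreamingImputer:
    """Fill missing readings on arrival using a per-channel EWMA of past observed values."""

    def __init__(self, n_channels, alpha=0.2):
        self.alpha = alpha               # weight given to the newest observed reading
        self.ewma = [None] * n_channels  # None until a channel has been observed once

    def process(self, sample):
        """Return a copy of sample with each None replaced by the channel's current EWMA.

        A channel that has never been observed stays None, since there is nothing to impute from.
        """
        out = []
        for k, value in enumerate(sample):
            if value is None:
                out.append(self.ewma[k])
            else:
                prev = self.ewma[k]
                self.ewma[k] = value if prev is None else self.alpha * value + (1 - self.alpha) * prev
                out.append(value)
        return out

imputer = StreamingImputer(n_channels=2, alpha=0.5)
print(imputer.process([1.0, 10.0]))   # [1.0, 10.0]
print(imputer.process([3.0, None]))   # [3.0, 10.0]
print(imputer.process([None, 20.0]))  # [2.0, 20.0]
```

Each sample costs constant work per channel, which is the property an online application needs; a model that exploits relationships between channels would usually give better estimates.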
