Emerging Missing Data Estimation Problems: Heteroskedasticity; Dynamic Programming and Impact of Missing Data

Tshilidzi Marwala
DOI: 10.4018/978-1-60566-336-4.ch013

Abstract

This chapter is divided into three parts. The first part presents a computational intelligence approach for predicting missing data in the presence of concept drift using an ensemble of multi-layered feed-forward neural networks. An algorithm that detects concept drift by measuring heteroskedasticity is proposed. The six instances prior to the occurrence of missing data are used to approximate the missing values. The algorithm is applied to simulated time series data sets resembling non-stationary data from a sensor. Results show that the prediction of missing data in non-stationary time series is possible but remains a challenge. The second part presents an algorithm that uses dynamic programming and neural networks to solve the missing data imputation problem. A model based on autoassociative neural networks and genetic algorithms is used as a foundation; however, the neural networks are not trained on the entire data set. Instead, the data are broken up into granules and various models are created. The models are tested on a real data set, and the results show that the proposed method is effective for missing data estimation. The third part studies the impact of missing data estimation on fault classification in mechanical systems. The fault classification task is implemented using both the extension network and Gaussian mixture models. When the imputed values are used to classify faults with the extension network, a fault classification accuracy of 95% is observed for single-missing-entry cases and 92% for two-missing-entry cases, whereas the full data set gives an accuracy of 97%. The Gaussian mixture model gives 94% for single-missing-entry cases and 92% for two-missing-entry cases, whereas the full data set gives an accuracy of 96%.
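
The estimation framework summarized above pairs an autoassociative network with a genetic algorithm: the missing entries are treated as free variables and chosen so that the network reproduces the completed record as closely as possible. The following minimal sketch illustrates that idea; the synthetic data, scikit-learn's MLPRegressor standing in for the autoassociative network, and the basic real-coded genetic algorithm are all illustrative assumptions, not the chapter's actual models.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic complete data with correlated features, so the last feature
# is recoverable from the others.
X = rng.normal(size=(500, 4))
X[:, 3] = 0.5 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=500)

# Autoassociative network: trained to reproduce its own input through a bottleneck.
net = MLPRegressor(hidden_layer_sizes=(3,), max_iter=3000, random_state=0)
net.fit(X, X)

def impute_ga(x_obs, missing_idx, pop=40, gens=60):
    """Estimate missing entries by minimising the reconstruction error
    ||x - net(x)|| over the missing components with a simple real-coded GA."""
    k = len(missing_idx)
    population = rng.normal(size=(pop, k))
    for _ in range(gens):
        # Plug each candidate into the record and score the reconstruction error.
        candidates = np.tile(x_obs, (pop, 1))
        candidates[:, missing_idx] = population
        errors = np.linalg.norm(candidates - net.predict(candidates), axis=1)
        elite = population[np.argsort(errors)[: pop // 2]]
        # Children: mean crossover of random elite parents plus Gaussian mutation.
        parents = elite[rng.integers(0, len(elite), size=(pop - len(elite), 2))]
        children = parents.mean(axis=1) + 0.1 * rng.normal(size=(pop - len(elite), k))
        population = np.vstack([elite, children])
    return population[0]  # fittest evaluated individual

record = X[0].copy()
truth, record[3] = record[3], np.nan  # knock out one entry
print(truth, impute_ga(record, missing_idx=[3]))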

Introduction: Heteroskedasticity

The problem of missing data has been researched intensively but remains largely unsolved. One reason is that the difficulty of approximating missing variables depends strongly on the problem domain. This difficulty increases further when data go missing in an on-line application, where data must be used as soon as they are obtained. A particularly hard variant of the missing data problem arises when data are missing from a time series that exhibits non-stationarity. Most machine learning techniques and algorithms developed thus far assume that data will always be available, and furthermore that the data conform to a stationary distribution.

Non-stationarity of data essentially means that the character or nature of the data changes as a function of time. Many quantities in the natural world are non-stationary and fluctuate with time; familiar examples include the stock market, the weather, heartbeats, seismic waves, and animal populations. Several engineering and measurement methods have been developed to detect and quantify non-stationary quantities, although such methods are not immune to failure. They include wavelet methods, which are time-frequency analysis techniques (Marwala, 2002; Bujurke et al., 2007), and fractal methods (Lunga & Marwala, 2006a; Sadana, 2003, 2005; Reiter, 1994). In this chapter, a measure of heteroskedasticity (Nelwamondo & Marwala, 2007a) is used to detect concept drift, with the aim of ensuring that the deployed missing data estimation method remains relevant even in the presence of such drift.
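
To make the idea of heteroskedasticity-based drift detection concrete, the sketch below flags drift when the variance of the newest window of samples differs significantly from that of the preceding window. This two-window F-test, in the spirit of the Goldfeld-Quandt test, is an illustrative stand-in for the detector of Nelwamondo and Marwala (2007a); the window size and significance level are assumptions.

import numpy as np
from scipy import stats

def drift_detected(series, window=50, alpha=0.01):
    """Flag concept drift when the variance of the newest window differs
    significantly from the variance of the preceding window (F-test)."""
    if len(series) < 2 * window:
        return False
    old = np.asarray(series[-2 * window:-window])
    new = np.asarray(series[-window:])
    f = np.var(new, ddof=1) / np.var(old, ddof=1)
    # Two-sided test: reject homoskedasticity if the ratio is extreme either way.
    lo = stats.f.ppf(alpha / 2, window - 1, window - 1)
    hi = stats.f.ppf(1 - alpha / 2, window - 1, window - 1)
    return f < lo or f > hi

# Toy usage: the variance doubles halfway through, mimicking a drifting sensor.
rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(0, 1, 200), rng.normal(0, 2, 200)])
print(drift_detected(series[:200]), drift_detected(series[:250]))  # False True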

Computational intelligence techniques have previously been employed to analyze non-stationary data such as stock-market prices; nevertheless, the volatility of such data renders the problem difficult to analyze. The 2003 Nobel Prize Laureates in Economics, Granger (2003) and Engle (1982), made exceptional contributions to non-linear data analysis. Granger showed that long-established statistical methods can be misleading when applied to variables that wander over time without returning to some long-run resting position. Engle (1982), on the other hand, pioneered Autoregressive Conditional Heteroskedasticity (ARCH), a technique for analyzing and understanding unpredictable movements in financial market prices; the method is also applicable to risk assessment. Dufour et al. (2004) introduced simulation-based finite-sample tests for heteroskedasticity and ARCH effects. Hafner and Herwartz (2001) proposed option pricing under linear autoregressive dynamics, heteroskedasticity, and conditional leptokurtosis. Khalaf, Saphores, and Bilodeau (2003) introduced simulation-based exact jump tests in models with conditional heteroskedasticity, and Inkmann (2000) studied mis-specified heteroskedasticity in the panel probit model, comparing generalized method of moments (GMM) and simulated maximum likelihood estimators. Other work on heteroskedasticity includes analyses of the performance of bootstrap neural tests for conditional heteroskedasticity in ARCH models (Siani & Peretti, 2007), the pooling of cross-sectional and time-series data in the presence of heteroskedasticity, and auto-correlation- and heteroskedasticity-consistent t-values with trending data (Krämer & Michels, 1997).
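
Engle's ARCH model makes the conditional variance of a series depend on past squared innovations, so that calm and turbulent periods cluster. The following minimal first-order simulation, with illustrative parameters not taken from the chapter, shows this volatility clustering.

import numpy as np

rng = np.random.default_rng(2)
alpha0, alpha1, n = 0.2, 0.7, 1000  # illustrative ARCH(1) parameters
eps = np.zeros(n)
for t in range(1, n):
    sigma2 = alpha0 + alpha1 * eps[t - 1] ** 2  # conditional variance
    eps[t] = np.sqrt(sigma2) * rng.standard_normal()

# Volatility clustering: the squared series is autocorrelated even though the
# series itself is conditionally mean-zero white noise.
sq = eps ** 2
print("lag-1 autocorrelation of squared series:",
      round(float(np.corrcoef(sq[:-1], sq[1:])[0, 1]), 2))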

Numerous techniques for solving missing data problems have been developed and discussed at length in the literature (Little & Rubin, 1987). However, little attempt has been made to approximate missing data in strictly non-stationary processes, where concepts change with time. The challenge in this setting is that the approximation process must be complete before the next sample is taken. Moreover, more than one technique may be required to approximate the missing data because the concepts drift. As a result, the computational time needed, the amount of memory required, and the model complexity may grow indefinitely as new data continually arrive (Last, 2002); a sketch of an on-line imputation loop that respects this constraint is given below.
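
The on-line constraint can be met with a fixed-size buffer of recent samples: when a value goes missing, it is estimated from the buffer before the next sample arrives. The sketch below uses the six prior instances mentioned in the abstract, but substitutes a least-squares linear extrapolation for the chapter's ensemble of neural networks; the buffer size and the choice of regressor are assumptions for illustration.

from collections import deque
import numpy as np

WINDOW = 6  # six instances prior to the missing value, as in the abstract
buffer = deque(maxlen=WINDOW)

def next_value(sample):
    """Return the sample, imputing from the last six readings if it is missing."""
    if sample is None and len(buffer) == WINDOW:
        # Fit value = a*t + b over the window and extrapolate one step ahead.
        t = np.arange(WINDOW)
        a, b = np.polyfit(t, np.asarray(buffer), deg=1)
        sample = a * WINDOW + b
    buffer.append(sample)
    return sample

stream = [1.0, 1.2, 1.1, 1.3, 1.4, 1.5, None, 1.7]
print([round(next_value(s), 2) for s in stream])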
