Data Mining with Incomplete Data

Hai Wang (Saint Mary’s University, Canada) and Shouhong Wang (University of Massachusetts Dartmouth, USA)
Copyright: © 2009 | Pages: 5
DOI: 10.4018/978-1-60566-010-3.ch082

Abstract

Surveys are one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining, one can rarely find a survey data set that contains complete entries for every variable in every observation; surveys and questionnaires are often only partially completed by respondents. The possible reasons for incomplete data are numerous, including negligence, deliberate avoidance for privacy, ambiguity of the survey question, and aversion. The extent of the damage caused by missing data is unknown when it is virtually impossible to return the surveys or questionnaires to the data source for completion, yet it is one of the most important pieces of knowledge for data mining to discover. In fact, missing data is an important and much-debated issue in the knowledge engineering field (Tseng, Wang, & Lee, 2003). In mining a survey database with incomplete data, the patterns of the missing data, as well as the potential impacts of the missing data on the mining results, constitute valuable knowledge. For instance, a data miner often wishes to know how reliable a data mining result is if only the complete data entries are used; when and why certain types of values are often missing; which variables are correlated in terms of having missing values at the same time; and which reason for incomplete data is likely. These valuable pieces of knowledge can be discovered only after the missing part of the data set is fully explored.
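As a rough illustration of the questions raised above, the following minimal Python sketch profiles the missing part of a small survey data set: how often each variable is missing, what fraction of observations would survive if only complete entries were used, and which pairs of variables tend to be missing together. The records, variable names, and function names are hypothetical and chosen only for illustration; this is not the method developed in the chapter.

```python
from itertools import combinations

# Hypothetical survey records; None marks an unanswered question.
survey = [
    {"age": 34, "income": None, "education": "BSc"},
    {"age": None, "income": None, "education": "MSc"},
    {"age": 51, "income": 72000, "education": None},
    {"age": 29, "income": 48000, "education": "BSc"},
]

def missing_rate(records):
    """Fraction of records in which each variable is missing."""
    n = len(records)
    return {v: sum(r[v] is None for r in records) / n for v in records[0]}

def complete_case_fraction(records):
    """Fraction of records that have no missing value at all."""
    complete = sum(all(x is not None for x in r.values()) for r in records)
    return complete / len(records)

def co_missing_counts(records):
    """Number of records in which both variables of a pair are missing."""
    variables = sorted(records[0])
    return {
        (a, b): sum(r[a] is None and r[b] is None for r in records)
        for a, b in combinations(variables, 2)
    }

rates = missing_rate(survey)
complete = complete_case_fraction(survey)
pairs = co_missing_counts(survey)
```

In this toy data set only one record in four is complete, so a mining result based on complete entries alone would rest on a quarter of the observations, and "age" and "income" are the only pair of variables missing together in the same record.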
Chapter Preview

Main Thrust

There have been two primary approaches to data mining with incomplete data: conceptual construction and enhanced data mining.
