is the relationship between missingness and the known attributes in the input data matrix, i.e., the probability that a set of values is missing given the values taken by the observed and missing features.
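The definition above can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from the chapter: it assumes a two-feature dataset in which the first feature is always observed and the second may be masked, and it models the probability of missingness as a logistic function of both values. The function names (`missing_probability`, `apply_mechanism`), the weights `w_obs` and `w_mis`, the `bias`, and the logistic form itself are all hypothetical choices made for illustration.

```python
import math
import random


def missing_probability(x_obs, x_mis, w_obs=0.0, w_mis=0.0, bias=-1.0):
    """Probability that x_mis goes unrecorded, given both feature values.

    Assumption: a logistic model with illustrative weights. Real
    mechanisms are usually unknown and need not take this form.
    """
    z = bias + w_obs * x_obs + w_mis * x_mis
    return 1.0 / (1.0 + math.exp(-z))


def apply_mechanism(rows, w_obs=0.0, w_mis=0.0, bias=-1.0, seed=0):
    """Return a copy of rows with the second feature masked as None."""
    rng = random.Random(seed)
    masked = []
    for x1, x2 in rows:
        p = missing_probability(x1, x2, w_obs, w_mis, bias)
        masked.append((x1, None if rng.random() < p else x2))
    return masked


# Synthetic data: two independent standard-normal features (assumption).
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]

scenarios = {
    # Both weights zero: missingness ignores every feature value.
    "independent of values": apply_mechanism(data),
    # Missingness driven by the always-observed first feature.
    "depends on observed feature": apply_mechanism(data, w_obs=2.0),
    # Missingness driven by the very value that ends up missing.
    "depends on missing feature": apply_mechanism(data, w_mis=2.0),
}

for name, result in scenarios.items():
    rate = sum(value is None for _, value in result) / len(result)
    print(f"{name}: {rate:.1%} of second-feature values missing")
```

Setting both weights to zero makes missingness unrelated to any value, while a nonzero `w_obs` or `w_mis` ties it to the observed or to the missing feature respectively, which are the two dependencies the definition names.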
Published in Chapter:
Classification with Incomplete Data
Pedro J. García-Laencina (Universidad Politécnica de Cartagena, Spain), Juan Morales-Sánchez (Universidad Politécnica de Cartagena, Spain), Rafael Verdú-Monedero (Universidad Politécnica de Cartagena, Spain), Jorge Larrey-Ruiz (Universidad Politécnica de Cartagena, Spain), José-Luis Sancho-Gómez (Universidad Politécnica de Cartagena, Spain), and Aníbal R. Figueiras-Vidal (Universidad Carlos III de Madrid, Spain)
Copyright: © 2010
Pages: 29
DOI: 10.4018/978-1-60566-766-9.ch007
Abstract
Many real-world classification scenarios suffer from a common drawback: missing, or incomplete, data. The ability to handle missing data has become a fundamental requirement for pattern classification, because the absence of certain values for relevant data attributes can seriously affect the accuracy of classification results. This chapter focuses on incomplete pattern classification. Research on this topic continues to grow, and it is well known that most solutions based on machine learning are useful and efficient. This chapter analyzes the most popular and appropriate missing data techniques based on machine learning for solving pattern classification tasks, highlighting their advantages and disadvantages.