Data Mining and the KDD Process

Ana Funes (Universidad Nacional de San Luis, Argentina) and Aristides Dasso (Universidad Nacional de San Luis, Argentina)
DOI: 10.4018/978-1-5225-7598-6.ch038

Abstract

A growing number of applications depend on the analysis and discovery of new patterns, and this demand has fueled the research and development of new methods in machine learning, knowledge extraction, knowledge discovery in databases (KDD), and data mining. The development of data mining and related disciplines has benefited from the availability of large volumes of data from the most diverse sources and domains. The KDD process and data mining methods allow for the discovery of knowledge in data that is hidden to humans, and they can present this knowledge in different ways. This chapter gives an overview of the KDD process, with special focus on the data mining phase. It also discusses data mining tasks and methods, a possible classification of them, the relation of data mining to other disciplines, and future challenges in the field.

Background

There is some confusion in the use of the terms Knowledge Discovery in Databases (KDD) and Data Mining. The terms are frequently interchanged, with Data Mining used as a synonym for KDD. Although they are strongly related, it is important to clarify the differences between them.

Several definitions of Data Mining can be found in the literature. Witten and Frank (2000) refer to Data Mining as the process of extracting previously unknown, useful, and understandable knowledge from large volumes of data, which can be in different formats and come from different sources. More concisely, Hernández-Orallo, Ferri and Ramírez-Quintana (2004) define Data Mining as the process of converting data into knowledge. Data Mining is also sometimes referred to by many other names, including knowledge extraction, information discovery, information harvesting, data archeology, and data pattern processing (Fayyad et al., 1996a).

The notion of Data Mining is not new. Since the 1960s, statisticians have used other terms, such as Data Fishing or Data Dredging, to refer to the idea of finding correlations in data without a prior hypothesis about the underlying causality. However, it was not until the late 1980s that Data Mining became a discipline of Computer Science and the scientific community adopted the term. In fact, as Witten and Frank (2005) point out, the first book on data mining appeared in 1991 (Piatetsky-Shapiro and Frawley, 1991), a collection of papers presented at a workshop on knowledge discovery in databases in the late 1980s.
