Data Mining and the KDD Process

Ana Funes, Aristides Dasso

Source Title: Encyclopedia of Information Science and Technology, Fourth Edition

DOI: 10.4018/978-1-5225-2255-3.ch167

Abstract

A growing number of applications that depend on the analysis and discovery of new patterns have fueled the research and development of new methods in Machine Learning, Knowledge Extraction, Knowledge Discovery in Databases (KDD), and Data Mining. The development of Data Mining and related disciplines has benefited from the existence of large volumes of data coming from the most diverse sources and domains. The KDD process and the methods of Data Mining allow for the discovery of knowledge in data that is hidden to humans, and present this knowledge in different forms. This chapter gives an overview of the KDD process, with special focus on the Data Mining phase. It also discusses Data Mining tasks and methods, a possible classification of them, the relation of Data Mining to other disciplines, and future challenges in the field.

Chapter Preview

Background

There is some confusion in the use of the terms Knowledge Discovery in Databases (KDD) and Data Mining. The two terms are frequently interchanged, with Data Mining used as a synonym of KDD. Although they are strongly related, it is important to clarify the differences between them.

Several definitions of Data Mining can be found in the literature. Witten and Frank (2000) refer to Data Mining as the process of extracting previously unknown, useful, and understandable knowledge from large volumes of data, which can be in different formats and come from different sources. More concisely, Hernández-Orallo, Ferri and Ramírez-Quintana (2004) define Data Mining as the process of converting data into knowledge. Data Mining is also sometimes referred to by many other names, including knowledge extraction, information discovery, information harvesting, data archeology, and data pattern processing (Fayyad et al., 1996a).

The notion of Data Mining is not new. Since the 1960s, other terms such as Data Fishing or Data Dredging have been used by statisticians to refer to the idea of finding correlations in data without a prior hypothesis about the underlying causality. However, it was not until the late 1980s that Data Mining became a discipline of Computer Science and the scientific community adopted the term. In fact, as Witten and Frank (2005) point out, the first book on data mining appeared in 1991 (Piatetsky-Shapiro and Frawley, 1991), a collection of papers presented at a workshop on knowledge discovery in databases in the late 1980s.

Key Terms in this Chapter

Inductive Learning: Induction is the inference of information from data and inductive learning is a model building process where the data are analyzed to find hidden patterns.

Supervised Learning: The process of learning a predictive model from a set of objects, where a supervisor defines the classes and supplies objects of each class. Once the model has been formulated, it can be used to predict the class(es) of new objects.

Data Mining: The process of extraction of implicit, previously unknown, and potentially useful knowledge from data. It uses Machine Learning, statistical and visualization techniques to discover and present knowledge in a form that is easily comprehensible to humans. It is a phase in a bigger process: the Knowledge Discovery in Databases (KDD) process.

Unsupervised Learning: Learning process of a descriptive model (patterns) by observation and discovery from a set of unlabeled objects.

Classification: Inductive task where a predictive model is learnt from objects labeled with a class and whereby it is possible to predict the class of new objects.
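As a minimal sketch of this definition, the Python example below predicts the class of a new object from a handful of labeled objects. It assumes a 1-nearest-neighbor rule with Euclidean distance, which is only one of many possible classification methods and is not named in the chapter; the function name and the sample data are hypothetical.

```python
import math

def nearest_neighbor_classify(train, query):
    """Return the class label of the labeled object closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled objects: (feature vector, class label).
labeled_objects = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]

# Predict the class of two new, unlabeled objects.
print(nearest_neighbor_classify(labeled_objects, (0.9, 1.1)))  # prints A
print(nearest_neighbor_classify(labeled_objects, (5.2, 4.9)))  # prints B
```

Here the "model" is simply the stored labeled objects; other classifiers would instead build an explicit model, such as a decision tree, from the same kind of input.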

KDD Process: An iterative process that consists of the selection, cleaning, and transformation of data coming not only from databases but also from other heterogeneous sources, such as plain text, data warehouses, images, and sound, with the aim of applying data mining algorithms to them in order to discover valid, novel, potentially useful, and understandable hidden patterns.
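The phases named in this definition can be sketched as a small pipeline. The Python example below is an illustration under stated assumptions, not the chapter's method: the purchase records, the function names, and the choice of a simple frequency count as the mining step are all hypothetical, and the iterative nature of the process (returning to earlier phases after inspecting the results) is not shown.

```python
from collections import Counter

# Hypothetical raw data: one record per purchased item.
raw_records = [
    {"customer": "c1", "item": "bread", "store": "north"},
    {"customer": "c1", "item": "milk", "store": "north"},
    {"customer": "c2", "item": "bread", "store": "south"},
    {"customer": "c2", "item": "milk", "store": "south"},
    {"customer": "c3", "item": "bread", "store": "north"},
    {"customer": "c3", "item": "", "store": "north"},  # missing item value
]

def select(records):
    """Selection: keep only the attributes relevant to the analysis."""
    return [{"customer": r["customer"], "item": r["item"]} for r in records]

def clean(records):
    """Cleaning: drop records that contain missing values."""
    return [r for r in records if all(v not in (None, "") for v in r.values())]

def transform(records):
    """Transformation: build one basket (set of items) per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets

def mine(baskets, min_support=2):
    """Data mining: keep items that appear in at least `min_support` baskets."""
    counts = Counter(item for basket in baskets.values() for item in basket)
    return {item: n for item, n in counts.items() if n >= min_support}

patterns = mine(transform(clean(select(raw_records))))
print(patterns)  # bread appears in 3 baskets, milk in 2
```

Keeping each phase as a separate function mirrors the definition: data mining is only the last step, and it operates on data that the earlier phases have already prepared.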

Clustering: Inductive task where a set of unlabeled objects is partitioned into groups (clusters) such that objects in the same cluster have similar characteristics, maximizing intra-cluster similarity and minimizing inter-cluster similarity.
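To make the idea of partitioning unlabeled objects concrete, the Python sketch below uses k-means, a widely used clustering method that the chapter preview does not itself name. The sample points, the fixed initial centroids, and the fixed number of iterations are assumptions chosen to keep the example small and deterministic.

```python
import math

def k_means(points, centroids, iterations=10):
    """Partition `points` into len(centroids) clusters using plain k-means."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters

# Hypothetical unlabeled objects forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

clusters = k_means(points, centroids=[points[0], points[3]])
for i, cluster in enumerate(clusters):
    print(i, cluster)
```

With these inputs the first three points end up in one cluster and the last three in the other. No class labels are supplied at any point, which is what distinguishes this task from classification.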
