Knowledge Discovery in Databases and Data Mining

Petr Berka (University of Economics, Prague, Czech Republic & University of Finance and Administration, Prague, Czech Republic)
Copyright: © 2015 | Pages: 10
DOI: 10.4018/978-1-4666-5888-2.ch174

Introduction

Knowledge discovery in databases (KDD), or data mining (DM), aims to acquire implicit knowledge from data and to use it to build models for classification, prediction, description, and similar tasks in support of decision making. As more data is gathered, with the amount of data doubling every three years, data mining becomes an increasingly important tool for transforming this data into knowledge. While it can uncover hidden patterns, it cannot uncover patterns that are not already present in the data set. This article covers the following topics:

  • Basic definitions of knowledge discovery in databases and data mining

  • Tasks and application areas

  • The process of knowledge discovery in databases

  • Standardization efforts in the area of data mining

  • Data mining tools

  • Text mining and web mining as specific subfields of data mining

  • Important research challenges

Background

The rapid growth of data collected and stored in various application areas brings new problems and challenges in its processing and interpretation. While database technology provides tools for data storage and “simple” querying, and statistics offers methods for analyzing small sample data, new approaches are necessary to face these challenges. These approaches are usually called knowledge discovery in databases or data mining, and the two terms are often used interchangeably. We support the view that knowledge discovery in databases is the broader concept: it covers the whole process, in which data mining (also called modeling or analysis) is just one step. In that step, machine learning or statistical algorithms are applied to preprocessed data to build classification or prediction models or to find interesting patterns. Thus, we will understand knowledge discovery in databases as the

Non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from data (Fayyad et al., 1996, p. 6),

or as an

Analysis of observational data sets to find unsuspected relationships and summarize data in novel ways that are both understandable and useful to the data owner (Hand et al., 2001, p. 1).

Similarly, data mining refers to extracting knowledge from large amounts of data (Han et al., 2011, p. 5).

Data Mining Tasks And Application Areas

Knowledge discovery in databases is commonly used to perform the tasks of data description and summarization, segmentation, concept description, classification, prediction, dependency analysis, or deviation detection (Fayyad et al., 1996; Chapman et al., 2000).

Data Description and Summarization

The goal is a concise description of the data characteristics, typically in elementary and aggregated form. This gives the user an overview of the data structure. Even a very simple and preliminary analysis of this kind is appreciated by data owners and users.
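As a minimal sketch of what such a description might look like, the following Python snippet computes a few elementary and aggregated characteristics of a small table. The records, the field layout (age, region, monthly spend), and the function name `summarize` are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical client records: (age, region, monthly_spend)
records = [
    (25, "north", 120.0),
    (41, "south", 80.0),
    (37, "north", 200.0),
    (52, "east", 60.0),
    (29, "south", 140.0),
]

def summarize(rows):
    """Return elementary and aggregated characteristics of the rows."""
    ages = [r[0] for r in rows]
    spend = [r[2] for r in rows]
    # Collect spend values per region so they can be aggregated.
    by_region = {}
    for _, region, amount in rows:
        by_region.setdefault(region, []).append(amount)
    return {
        "count": len(rows),
        "age_min": min(ages),
        "age_max": max(ages),
        "age_median": median(ages),
        "spend_mean": mean(spend),
        "region_counts": dict(Counter(r[1] for r in rows)),
        "spend_mean_by_region": {k: mean(v) for k, v in by_region.items()},
    }

summary = summarize(records)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Even this small overview shows value ranges, typical values, and how records are distributed over categories, which is the kind of information a data owner can check against expectations before any modeling starts.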

Segmentation

Segmentation (or clustering) aims to separate the data into interesting and meaningful subgroups or classes in which all members of a subgroup share common characteristics. Client profiling and clustering of gene expression data are two examples of this type of task.

Client profiling can be based on the purchase history or service usage history of customers or clients; similar behavior patterns can be used to divide clients into groups and to create profiles of these groups.
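One way to sketch this idea is with k-means, a common clustering algorithm; the chapter does not prescribe a particular method, so its use here is an assumption. The client data (purchases per month, average basket value), the choice of two groups, and the initial centroids are all hypothetical.

```python
from math import dist  # available in Python 3.8+

def nearest_index(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))

def k_means(points, initial_centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = [nearest_index(p, centroids) for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment against the centroids that are returned.
    labels = [nearest_index(p, centroids) for p in points]
    return labels, centroids

# Hypothetical clients: (purchases per month, average basket value)
clients = [(2, 15.0), (3, 18.0), (2, 20.0), (12, 90.0), (14, 110.0), (13, 95.0)]
labels, centroids = k_means(clients, initial_centroids=[clients[0], clients[3]])

for j, centroid in enumerate(centroids):
    size = labels.count(j)
    print(f"group {j}: {size} clients, profile (centroid) = {centroid}")
```

Each centroid can be read as the profile of its group, for example "occasional buyers with small baskets" versus "frequent buyers with large baskets". In practice the features would usually be rescaled first so that no single attribute dominates the distance.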

Clustering of gene expression data (data in the form of so-called DNA microarrays that are obtained by measuring mRNA levels in cells) can help us identify groups of genes with related expression patterns. Genes with a “close” expression pattern will tend to participate in a similar biological function. We thus can use these patterns, e.g., to group together normal cells belonging to various tissue types.
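As a rough illustration of what a “close” expression pattern could mean, the sketch below measures closeness with the Pearson correlation between expression profiles and groups genes greedily. Both the closeness measure and the grouping rule are assumptions made for this example, as are the gene names, the expression values, and the 0.9 threshold.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical expression levels of four genes across five samples
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [9.5, 8.1, 6.0, 4.2, 2.0],
}

def group_by_correlation(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose first member
    (used as representative) it correlates with at or above the threshold."""
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            representative = profiles[group[0]]
            if pearson(profile, representative) >= threshold:
                group.append(name)
                break
        else:
            # No sufficiently similar group found: start a new one.
            groups.append([name])
    return groups

groups = group_by_correlation(expression)
print(groups)
```

Correlation is used here because it compares the shape of the expression profiles and ignores their absolute level; real analyses typically rely on established clustering methods rather than this greedy rule.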

Key Terms in this Chapter

Text Mining: Data mining applied to unstructured textual data.

Concept Description: Data mining task in which the goal is to build a model that describes a concept or class in a comprehensible way.

Deviation Detection: Data mining task in which the goal is to build a model that describes the most significant changes in the data from previously measured or normative values.

Classification: Data mining task in which the goal is to build a model that assigns class labels to previously unseen and unlabeled examples.

Segmentation: Data mining task in which the goal is to build a model that separates data into interesting and meaningful subgroups where all members of a subgroup share common characteristics.

Dependency Analysis: Data mining task in which the goal is to build a model that describes significant dependencies or associations between data items or events.

Prediction: Data mining task in which the goal is to build a model that assigns a numeric value to the target attribute for previously unseen examples.

Web Mining: Data mining applied to data gathered from the web.
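The difference between the Classification and Prediction terms above can be sketched with a single nearest-neighbour model that returns either a class label or a numeric value for an unseen example. The nearest-neighbour technique, the training examples, and the function names are hypothetical choices for illustration only.

```python
from math import dist  # available in Python 3.8+

# Hypothetical labeled examples: (features, class label, numeric target)
training = [
    ((1.0, 1.0), "low", 10.0),
    ((1.5, 2.0), "low", 12.0),
    ((8.0, 9.0), "high", 95.0),
    ((9.0, 8.5), "high", 100.0),
]

def nearest(features):
    """Training example closest to the given features (Euclidean distance)."""
    return min(training, key=lambda example: dist(example[0], features))

def classify(features):
    """Classification: assign a class label to an unseen example."""
    return nearest(features)[1]

def predict(features):
    """Prediction: assign a numeric target value to an unseen example."""
    return nearest(features)[2]

unseen = (2.0, 2.0)
print(classify(unseen), predict(unseen))
```

The model is the same in both cases; only the type of the target attribute differs, which is the distinction the two definitions draw.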
