Chapter Preview
Data Mining And The Knowledge Discovery In Databases Process
“The KDD process, as presented in (Fayyad, Piatetski-Shapiro, & Smyth, 1996), is the process of using DM methods to extract what is considered knowledge according to the specification of measures and thresholds, using a database along with any required preprocessing, sub sampling, and transformation of the database. There are five stages considered, namely, selection, preprocessing, transformation, data mining, and interpretation/evaluation as presented in Figure 1:
- Selection: This stage consists on creating a target data set, or on focusing in a subset of variables or data samples, on which discovery is to be performed;
- Preprocessing: This stage consists on the target data cleaning and preprocessing in order to obtain consistent data;
- Transformation: This stage consists on the transformation of the data using dimensionality reduction or transformation methods;
- Data Mining: This stage consists on the searching for patterns of interest in a particular representational form, depending on the DM objective (usually, prediction);
- Interpretation/Evaluation: This stage consists on the interpretation and evaluation of the mined patterns.” (Azevedo & Santos, 2008, p. 183)
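The five stages quoted above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure of the cited authors: the toy records, the field names (age, income, buys), the helper functions, and the choice of a nearest-centroid classifier as the mining method are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
RAW = [
    {"id": 1, "age": 25, "income": 20.0, "buys": 0},
    {"id": 2, "age": 30, "income": 25.0, "buys": 0},
    {"id": 3, "age": 45, "income": 60.0, "buys": 1},
    {"id": 4, "age": 50, "income": None, "buys": 1},
    {"id": 5, "age": 55, "income": 80.0, "buys": 1},
    {"id": 6, "age": 28, "income": 22.0, "buys": 0},
]


def select(rows, features, target):
    """Selection: keep only the variables on which discovery is performed."""
    keep = features + [target]
    return [{k: r[k] for k in keep} for r in rows]


def preprocess(rows):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def transform(rows, features):
    """Transformation: min-max scale each feature to the range [0, 1]."""
    out = [dict(r) for r in rows]
    for f in features:
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        span = (hi - lo) or 1.0  # avoid division by zero for constant features
        for r in out:
            r[f] = (r[f] - lo) / span
    return out


def mine(rows, features, target):
    """Data mining: build a nearest-centroid model (one centroid per class)."""
    model = {}
    for label in sorted({r[target] for r in rows}):
        members = [r for r in rows if r[target] == label]
        model[label] = [mean(r[f] for r in members) for f in features]
    return model


def predict(model, row, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    point = [row[f] for f in features]
    return min(
        model,
        key=lambda label: sum((p - c) ** 2 for p, c in zip(point, model[label])),
    )


def evaluate(model, rows, features, target):
    """Interpretation/evaluation: accuracy on the given rows.

    Measured here on the training rows themselves, so it is optimistic.
    """
    hits = sum(predict(model, r, features) == r[target] for r in rows)
    return hits / len(rows)


FEATURES, TARGET = ["age", "income"], "buys"
selected = select(RAW, FEATURES, TARGET)
clean = preprocess(selected)
scaled = transform(clean, FEATURES)
model = mine(scaled, FEATURES, TARGET)
accuracy = evaluate(model, scaled, FEATURES, TARGET)
print(f"{len(clean)} clean records, training accuracy = {accuracy:.2f}")
```

Each function maps to one stage, so the output of one stage is the input of the next. In practice the evaluation stage would use held-out data rather than the records the model was built from.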
Key Terms in this Chapter
Data Mining Language: A language that allows users to manipulate data and models directly, at the same level.
Data Mining: One of the phases of the KDD process. It mainly concerns the means by which patterns/models are extracted and enumerated from data. It is often identified with the complete KDD process.
Data Mining Models: The results obtained by applying data mining methods/algorithms. There are simple models, such as rules or trees, and more complex models, such as neural networks.
Data Mining Task: What one intends to achieve when applying data mining, for instance classification.
Standard: Something that conforms to established parameters.
Data Mining Algorithms: Sequences of steps that produce a data mining model.
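The distinction between task, algorithm, and model in the terms above can be made concrete with a small sketch. Here the task is classification, the algorithm is a search over candidate thresholds, and the model is the resulting one-line rule. The function name, the data, and the rule form ("predict 1 if the value exceeds the threshold") are assumptions chosen for illustration.

```python
def learn_threshold_rule(values, labels):
    """Algorithm: try the midpoint between each pair of adjacent distinct
    values and keep the threshold with the highest training accuracy.

    Returns the model as a (threshold, accuracy) pair for the rule
    "predict 1 if value > threshold, else 0", or None if there are
    fewer than two distinct values to split on.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for t in candidates:
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = hits / len(values)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best


# Hypothetical data: income (in thousands) and whether the customer buys.
incomes = [20.0, 25.0, 60.0, 80.0, 22.0]
buys = [0, 0, 1, 1, 0]

threshold, train_accuracy = learn_threshold_rule(incomes, buys)
print(f"Model: buys = 1 if income > {threshold}")
```

The algorithm is the search procedure; the model is only the learned rule, which can be stored and applied to new records without rerunning the search.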