Basic Principles of Data Mining

Karl-Ernst Erich Biebler (Ernst-Moritz-Arndt-University, Germany)
DOI: 10.4018/978-1-60566-196-4.ch015

Abstract

This chapter summarizes the data types, mathematical structures, and associated methods of data mining. Topological, order-theoretical, algebraic, and probability-theoretical mathematical structures are introduced. The n-dimensional Euclidean space, the most commonly used model for data, is defined. It is briefly explained why the treatment of higher-dimensional random variables and related data is problematic. Since topological concepts are less well known than statistical concepts, many examples of metrics are given. Related classification concepts are defined and explained, and ways of assessing their quality are discussed. One example each is given for topological cluster analysis and topological discriminant analysis.

Data Types

Observations made on objects are recorded as data. These observations can be obtained as measurements, numbers, or verbal descriptions, for example. Sometimes they concern a single property of the objects, often several. More complicated facts about the objects, such as relations, can also be included. It is therefore necessary to distinguish data types. The data types relevant for data analysis are described in the following.

Data types are also known from programming languages. Those are not treated here.

A set 978-1-60566-196-4.ch015.m01 in the set-theoretical sense consists of elements 978-1-60566-196-4.ch015.m02, 978-1-60566-196-4.ch015.m03. The index 978-1-60566-196-4.ch015.m04 may be finite or infinite; accordingly, one distinguishes finite and infinite sets. The sets 978-1-60566-196-4.ch015.m05 and 978-1-60566-196-4.ch015.m06 are the same in the set-theoretical sense. This means that all elements of a set are distinct.

A data set is a collection of elements of a set. The data sets 978-1-60566-196-4.ch015.m07 and 978-1-60566-196-4.ch015.m08 must be distinguished, because the same element of a set can appear repeatedly in a data set.
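
The difference between a set and a data set can be sketched in Python. This is only an illustration under stated assumptions: the built-in set stands in for the set-theoretical notion, a list stands in for a data set, and the values "a" and "b" are hypothetical, not taken from the chapter.

```python
from collections import Counter

# Hypothetical example values (an assumption, not from the chapter).
as_set_1 = {"a", "b"}
as_set_2 = {"a", "a", "b"}        # the repeated "a" collapses into one element

data_set_1 = ["a", "b"]
data_set_2 = ["a", "a", "b"]      # the repeated "a" is kept

# As sets the two collections coincide; as data sets they differ.
sets_equal = as_set_1 == as_set_2
data_sets_equal = data_set_1 == data_set_2

# How often each element of the underlying set occurs in the data set.
multiplicities = Counter(data_set_2)

print(sets_equal, data_sets_equal, dict(multiplicities))
```

A list is used here because it keeps repetitions; a Counter would serve equally well when only the multiplicities matter and not the order of the observations.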

String data are characters or character strings (e.g. letters, words, abstract words). Numerical data are numbers (e.g. 3, 324, 2.1482). Dates are not regarded as numerical data; they form a type of their own.
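
As a rough sketch, the three data types named above can be told apart programmatically. The function name data_type, the label "other", and the sample date are hypothetical choices made for this illustration; the chapter itself does not prescribe an implementation.

```python
from datetime import date
from numbers import Number


def data_type(value):
    """Label a single observation as 'date', 'numerical', 'string', or 'other'."""
    # Dates form a type of their own and are not treated as numbers.
    if isinstance(value, date):
        return "date"
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, Number) and not isinstance(value, bool):
        return "numerical"
    if isinstance(value, str):
        return "string"
    return "other"


# The numbers come from the text; the word and the date are made up.
observations = ["word", 3, 324, 2.1482, date(2008, 1, 15)]
labels = [data_type(v) for v in observations]
print(labels)
```

Checking for dates before numbers mirrors the statement in the text that dates are kept apart from numerical data.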
