Data Mining

Martin Atzmueller (University of Kassel, Germany)
DOI: 10.4018/978-1-60960-741-8.ch005


Data Mining provides approaches for the identification and discovery of non-trivial patterns and models hidden in large collections of data. In the applied natural language processing domain, data mining usually requires preprocessed data that has been extracted from textual documents. Additionally, this data is often integrated with other data sources. This chapter provides an overview of data mining, focusing on approaches for pattern mining, cluster analysis, and predictive model construction. For these, we discuss exemplary techniques that are especially useful in the applied natural language processing context. Furthermore, we describe how the presented data mining approaches are connected to text mining, text classification, and clustering, and discuss interesting problems and future research directions.
Chapter Preview


Data mining, also popularly referred to as knowledge discovery in databases (KDD), is concerned with the automatic or semi-automatic extraction of patterns. These patterns represent knowledge implicitly stored in large databases, data warehouses, the Web, other massive information repositories, and data streams. Informally, data mining is used either to obtain patterns and summaries of new and nontrivial information from the available data (description) or to create predictive models of a certain system or phenomenon (prediction).
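
To make the distinction between description and prediction concrete, here is a minimal Python sketch. It assumes that documents have already been preprocessed into sets of extracted terms with a class label; the toy records and the function names (`frequent_pairs`, `train_term_rules`, `predict`) are hypothetical illustrations, not techniques taken from this chapter. Counting frequent term pairs stands in for (a much simplified form of) pattern mining, and the term-vote classifier is a deliberately simple rule-based model standing in for predictive model construction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each record is a preprocessed document,
# reduced to a set of extracted terms plus a class label.
records = [
    ({"price", "stock", "market"}, "finance"),
    ({"stock", "market", "trade"}, "finance"),
    ({"match", "goal", "team"}, "sports"),
    ({"team", "goal", "coach"}, "sports"),
    ({"market", "trade", "price"}, "finance"),
]

def frequent_pairs(transactions, min_support):
    """Description: term pairs that occur together in at least min_support records."""
    counts = Counter()
    for terms in transactions:
        # sorted() gives each unordered pair a single canonical key
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

def train_term_rules(data):
    """Prediction (training): map each term to the label it co-occurs with most often."""
    per_term = {}
    for terms, label in data:
        for t in terms:
            per_term.setdefault(t, Counter())[label] += 1
    return {t: c.most_common(1)[0][0] for t, c in per_term.items()}

def predict(rules, terms, default="unknown"):
    """Prediction (application): majority vote over the rules of the known terms."""
    votes = Counter(rules[t] for t in terms if t in rules)
    return votes.most_common(1)[0][0] if votes else default

if __name__ == "__main__":
    rules = train_term_rules(records)
    # Description: summarize the collection by its frequent term pairs
    print(frequent_pairs([terms for terms, _ in records], min_support=2))
    # Prediction: assign a label to a new, unseen record
    print(predict(rules, {"stock", "trade", "weather"}))  # → finance
```

The descriptive function only summarizes the data it is given, whereas the predictive pair (`train_term_rules`, `predict`) builds a model once and then applies it to records it has not seen; unknown terms such as `"weather"` are simply ignored.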

The literature offers several definitions of data mining, often in relation to knowledge discovery in databases. Fayyad et al. (1996), for example, define KDD as “the process of discovering valid, novel, interesting, and potentially useful knowledge”; data mining is considered the core step of the whole KDD process, that is, the concrete knowledge discovery method. Other definitions regard data mining as the “process of discovering various models, summaries, and derived values from a given collection of data” (Kantardzic, 2002). Han and Kamber (2006) also consider data mining a step in the knowledge discovery process (i.e., an “essential process where intelligent methods are applied in order to extract data patterns”), but prefer the term “data mining” to “knowledge discovery in databases,” thereby subsuming the older term. They therefore take a broad view of data mining functionality and consider data mining the general process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.
