Since the First KDD Workshop back in 1989, when “Knowledge Mining” was recognized as one of the top 5 topics in future database research (Piatetsky-Shapiro 1991), many scientists as well as users in industry and public organizations have considered data mining highly relevant for their respective professional activities. We have witnessed the development of advanced data mining techniques as well as the successful implementation of knowledge discovery systems in many companies and organizations worldwide. Most of these implementations are static in the sense that they do not explicitly account for a changing environment. However, since most analyzed phenomena change over time, the respective systems should be adapted to the new environment in order to provide useful and reliable analyses. If we consider, for example, a system for credit card fraud detection, we may want to segment our customers, process the stream data generated by their transactions, and finally classify them according to their fraud probability, where fraud patterns change over time. If our segmentation should group together homogeneous customers using not only their current feature values but also their trajectories, things get even more difficult, since we have to cluster vectors of functions instead of vectors of real values. An example of such a trajectory could be the development of our customers’ number of transactions over the past six months or so, if such a development tells us more about their behavior than just a single value, e.g., the most recent number of transactions. It is in this kind of application that dynamic data mining comes into play. Since data mining is just one step of the iterative KDD (Knowledge Discovery in Databases) process (Han & Kamber, 2001), dynamic elements should also be considered during the other steps.
The entire process consists basically of activities that are performed before data mining (such as selection, pre-processing, and transformation of data (Famili et al., 1997)), the actual data mining part, and subsequent steps (such as interpretation and evaluation of results). In the following sections we will present the background of dynamic data mining by studying existing methodological approaches as well as existing applications, patents, and tools. The main focus of this chapter then lies in presenting dynamic approaches for each step of the KDD process. Some methodological aspects of dynamic data mining will be presented in more detail. After envisioning future trends in dynamic data mining, we will conclude this chapter.
In the past, diverse terminology has been used for emerging approaches dealing with “dynamic” elements in data mining applications. Learning from data has been defined as incremental if the training examples become available over time, usually one at a time (see, e.g., Giraud-Carrier, 2000). Mining temporal data deals with the analysis of streams of categorical data (e.g., events; see, e.g., Domingos, Hulten, 2003) or the analysis of time series of numerical data (Antunes, Oliveira 2001; Huang, 2007). Once a model has been built, model updating becomes relevant. According to the CRISP-DM methodology, such updating is part of the monitoring and maintenance plan to be performed after model construction.
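To make the notion of incremental learning more concrete, the following minimal sketch shows a model that is updated one training example at a time without storing past examples. It is only an illustration and is not taken from the cited works: the "model" is simply a running mean of feature vectors (e.g., a single cluster center), and the class name `RunningMean` and its method `update` are hypothetical.

```python
from typing import List, Sequence


class RunningMean:
    """Hypothetical incremental model: the mean of all examples seen so far."""

    def __init__(self) -> None:
        self.n: int = 0              # number of examples processed so far
        self.mean: List[float] = []  # current estimate, one value per feature

    def update(self, x: Sequence[float]) -> List[float]:
        """Incorporate one new example and return the updated mean."""
        if self.n == 0:
            self.mean = [0.0] * len(x)
        if len(x) != len(self.mean):
            raise ValueError("all examples must have the same number of features")
        self.n += 1
        # Incremental update rule: m_new = m_old + (x - m_old) / n
        self.mean = [m + (xi - m) / self.n for m, xi in zip(self.mean, x)]
        return self.mean


# Examples arrive over time, one at a time.
model = RunningMean()
for example in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    model.update(example)
print(model.mean)
```

Each update only needs the current estimate and the new example, which is what distinguishes this incremental setting from retraining on the complete data set whenever new data arrive.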
The following list provides an overview of applications of dynamic data mining.
Intrusion detection (Caulkins et al., 2005).
Traffic state identification (Crespo, Weber, 2005).
Predictive maintenance (Joentgen et al., 1999).
Scenario analysis (Weber 2007).
Time series prediction (Kasabov, Song, 2002).