Introduction: On The Need For A New Data Mining Method
A miner who has to work with only very current information can never detect trends and long-term patterns of behavior. Historical information is crucial to understanding the seasonality of business and the larger cycles of business to which every corporation is subject (Inmon, 1996).
The crucial element of this quote is ‘the patterns of behavior’. The main technique used to retrieve those patterns is called data mining. Several definitions of data mining exist. We use the definition from Shaw et al. (2001): “Data mining is the process of searching and analyzing data in order to find implicit, but potentially useful, information. It involves selecting, exploring and modeling large amounts of data to uncover previously unknown patterns, and ultimately comprehensible information, from large databases”. In the early nineties, data mining was often described as “a blend of statistics, AI, and data base research” and was not considered a field of interest by statisticians, some of whom described it as “a dirty word in Statistics” (Pregibon, 1997). Nevertheless, the research area of data mining has increasingly become an important field of interest to both academics and practitioners.
Data mining can be positioned as a corollary of business intelligence (Kudyba et al., 2001; Shmueli et al., 2006). This claim is also supported by business intelligence tool providers such as Microsoft and Oracle, who both position their data mining tool as an integral part of their overall business intelligence solution (Microsoft, 2008; Oracle, 2007). Business Intelligence (BI) can be defined as the process of turning data into information and then into knowledge (Golfarelli et al., 2004). It was first introduced in the early nineties, “to satisfy the managers’ request for efficiently and effectively analyzing the enterprise data in order to better understand the situation of their business and improving the decision process.” (Golfarelli et al., 2004). Data mining supports this by providing companies with the unique ability to review historical data to help improve managers’ decision-making processes (Golfarelli et al., 2004).
Most research performed in the area of data mining is aimed at adjusting existing data mining techniques to solve a specific problem, thus creating a new data mining technique (e.g., Hui et al., 1999; Rygielski et al., 2002). This research, on the other hand, has a different goal: the creation of a method that covers the whole process of data mining. Two methods (one that emerged from the field of statistics, one from business needs) have become the standards for describing the process. The first method was suggested by Fayyad et al. (1996) and involves five different stages. Its input is data, which eventually leads to knowledge (see Figure 1). The method describes the process, but does not cover the use of specific tools or include a section on how to implement data mining results. Furthermore, the method does not take business needs into account. The business environment needs a practical model for applying data mining, one which also includes the business aspects of specific organizations.
Figure 1. The knowledge discovery in databases (KDD) process (© 1996, AAAI)