Discovering Interesting Patterns in Numerical Data with Background Knowledge

Szymon Jaroszewicz
DOI: 10.4018/978-1-60566-754-6.ch008

Abstract

The paper presents an approach to mining patterns in numerical data without the need for discretization. The proposed method allows the discovery of arbitrary nonlinear relationships. The approach is based on finding a function of a set of attributes whose values are close to zero in the data. Intuitively, such functions correspond to equations describing relationships between the attributes, but they are also able to capture more general classes of patterns. The approach is set in an association rule framework, with analogues of itemsets and rules defined for numerical attributes. Furthermore, the user may include background knowledge in the form of a probabilistic model. Patterns that are already correctly predicted by the model are not considered interesting. Interesting patterns can then be used to update the probabilistic model.
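
The idea of a function whose values are close to zero in the data can be illustrated with a small sketch. It is not the chapter's actual definition: the mean squared value used as a score, the synthetic data, and the names `mean_square` and `candidate` are all assumptions made for illustration only.

```python
import random

def mean_square(f, rows):
    """Average of f(row)**2 over the data; a small value means f stays close to zero."""
    return sum(f(*row) ** 2 for row in rows) / len(rows)

rng = random.Random(0)

# Hypothetical data set: y is roughly x^2, plus a little noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
rows = [(x, x * x + rng.gauss(0.0, 0.01)) for x in xs]

def candidate(x, y):
    """Candidate pattern: the equation y = x^2 rewritten as y - x^2 = 0."""
    return y - x * x

# Baseline: shuffle y so that the relationship between x and y is destroyed.
ys = [y for _, y in rows]
rng.shuffle(ys)
shuffled_rows = list(zip(xs, ys))

score_data = mean_square(candidate, rows)
score_shuffled = mean_square(candidate, shuffled_rows)
print(score_data, score_shuffled)
```

On the original rows the candidate function stays near zero, while on the shuffled rows it does not, which is the sense in which such a function describes a relationship between the attributes.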

Background

We now discuss previous work on mining association rules in numerical data without discretization. Rückert, Richter & Kramer (2004), Georgii, Richter, Rückert & Kramer (2005), and Rückert & Kramer (2006) present an approach based on finding rules of the type “if a linear combination of some set of attributes exceeds some threshold a, then another linear combination of another set of attributes is likely to exceed some threshold b”. Because sharp thresholds are used, that approach cannot represent functional relationships between attributes, in contrast to the approach presented in this chapter. Achtert, Böhm, Kriegel, Kröger & Zimek (2006) describe a method for summarizing clusters of numerical data using linear equations. The authors use a clustering algorithm to do the actual pattern discovery, and their approach does not follow the association rule framework.
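
To make the threshold-rule format concrete, the following minimal sketch evaluates one such rule on a toy table. It only illustrates the rule type quoted above and is not the algorithm of the cited papers. The function names, the weight vectors, and the data are hypothetical, and "likely" is read here as the fraction of covered rows that satisfy the head, which is an assumption.

```python
def linear(weights, row):
    """Linear combination of the attribute values in one row."""
    return sum(w * v for w, v in zip(weights, row))

def rule_confidence(rows, w_body, a, w_head, b):
    """Fraction of rows with linear(w_body) > a that also have linear(w_head) > b.

    Returns None when no row satisfies the body of the rule.
    """
    covered = [r for r in rows if linear(w_body, r) > a]
    if not covered:
        return None
    hits = sum(1 for r in covered if linear(w_head, r) > b)
    return hits / len(covered)

# Hypothetical rows with three numerical attributes (x1, x2, x3).
rows = [
    (1.0, 0.5, 2.0),
    (0.2, 0.1, 0.1),
    (0.9, 0.4, 0.3),
    (0.8, 0.7, 1.9),
]

# Rule: "if x1 + x2 > 1.0 then x3 > 1.0"; a zero weight leaves an attribute out.
conf = rule_confidence(rows, (1, 1, 0), 1.0, (0, 0, 1), 1.0)
print(conf)
```

The rule only records on which side of each threshold a row falls, so it says nothing about how x3 varies with x1 + x2 beyond that.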

The idea of Steinbach, Tan, Xiong & Kumar (2004) is closer to our approach. They present a definition of support for numerical data that does not require discretization. Unfortunately, the definition is not very intuitive, although an interpretation in terms of a lower bound on scalar products is proposed. In a similar spirit, Calders, Goethals & Jaroszewicz (2006) and Jaroszewicz (2006) presented definitions of support for numerical data, based on ranks and on polynomials, that are easier to interpret.