Most data of practical relevance are structured in more complex ways than is assumed in traditional data mining algorithms, which are based on a single table. The concept of relations makes it possible to discuss many data structures, such as trees and graphs, in a uniform way. Relational data are highly general and of significant importance, as demonstrated by the ubiquity of relational database management systems. It is, therefore, not surprising that popular data mining techniques, such as association rule mining, have been generalized to relational data. An important aspect of the generalization process is the identification of challenges that are new to the generalized setting.
Several areas of databases and data mining contribute to advances in association rule mining of relational data.
Relational data model: Underlies most commercial database technology and also provides a strong mathematical framework for the manipulation of complex data. Relational algebra provides a natural starting point for generalizations of data mining techniques to complex data types.
Inductive Logic Programming, ILP (Džeroski & Lavrač, 2001, pp. 48-73): Treats multiple tables and patterns as logic programs. Hypotheses for generalizing data to unseen examples are formulated using first-order logic. Background knowledge is incorporated directly as a program.
Association Rule Mining, ARM (Agrawal & Srikant, 1994): Identifies associations and correlations in large databases. The result of an ARM algorithm is a set of association rules of the form A→C, where A is the antecedent and C is the consequent. Efficient algorithms such as Apriori limit the output to sets of items that occur more frequently than a given threshold.
Graph Theory: Addresses networks that consist of nodes that are connected by edges. Traditional graph theoretic problems typically assume no more than one property per node or edge. Solutions to graph-based problems take into account graph and subgraph isomorphism. For example, a subgraph should only count once per isomorphic instance. Data associated with nodes and edges can be modeled within the relational algebra framework.
Link-based Mining (Getoor & Diehl, 2005): Addresses data containing sets of linked objects. The links are exploited in tasks such as object ranking, classification, and link prediction. This work considers multiple relations in order to represent links.
Association rule mining of relational data incorporates key aspects of these areas to form an innovative data mining area of considerable practical relevance.
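As a minimal sketch of the single-table setting that these generalizations start from, the following Python fragment evaluates one rule A→C over a small set of transactions. It uses the standard definitions of support (the fraction of transactions that contain an itemset) and confidence (the support of A∪C divided by the support of A). The function names and the toy transactions are assumptions made for illustration and do not come from the cited works.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Support of (antecedent ∪ consequent) divided by support of antecedent."""
    support_a = support(antecedent, transactions)
    if support_a == 0.0:
        return 0.0  # rule never applies; report 0 rather than divide by zero
    return support(antecedent | consequent, transactions) / support_a


# Hypothetical toy data: each transaction is a set of items.
example_transactions: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(frozenset({"bread", "milk"}), example_transactions)
rule_confidence = confidence(frozenset({"bread"}), frozenset({"milk"}),
                             example_transactions)
```

For the rule {bread}→{milk} in this toy data, two of the four transactions contain both items, and three contain bread, so the support is 0.5 and the confidence is 2/3.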
Main Thrust Of The Chapter
Association rule mining of relational data is a topic that borders on many distinct topics, each with its own opportunities and limitations. Traditional association rule mining allows extracting rules from large data sets without specification of a consequent. Traditional predictive modeling techniques lack this generality and only address a single class label. Association rule mining techniques can be efficient because of the pruning opportunity provided by the downward closure property of support, and through the simple structure of the resulting rules (Agrawal & Srikant, 1994).
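The pruning opportunity mentioned above can be illustrated with a simplified, level-wise frequent itemset search in the spirit of Apriori. Downward closure states that every subset of a frequent itemset is itself frequent, so a candidate of size k can be discarded without counting if any of its (k-1)-subsets is not frequent. The sketch below is an assumption-laden illustration rather than the published algorithm: the function name, the candidate generation by pairwise union, and the toy data are all chosen for brevity.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Return all itemsets whose support is at least `min_support`."""
    n = len(transactions)
    if n == 0:
        return {}

    # Level 1: count individual items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        previous = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous
                      if len(a | b) == k}
        # Prune step (downward closure): keep a candidate only if every
        # (k-1)-subset is frequent; the rest are never counted.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous
                             for s in combinations(c, k - 1))}
        # Count the surviving candidates against the data.
        frequent = {}
        for c in candidates:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                frequent[c] = supp
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data.
toy_transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
toy_result = frequent_itemsets(toy_transactions, min_support=0.5)
```

With a threshold of 0.5, the pair {milk, butter} occurs in only one of four transactions and is not frequent. The three-item candidate {bread, milk, butter} is therefore pruned before any counting takes place, which is the saving that downward closure provides.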