Algorithms for Association Rule Mining

Algorithms for Association Rule Mining

Vasudha Bhatnagar (University of Delhi, India), Anamika Gupta (University of Delhi, India) and Naveen Kumar (University of Delhi, India)
Copyright: © 2009 |Pages: 9
DOI: 10.4018/978-1-59904-849-9.ch012
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Association Rule Mining (ARM) is one of the important data mining tasks that has been extensively researched by data-mining community and has found wide applications in industry. An Association Rule is a pattern that implies co-occurrence of events or items in a database. Knowledge of such relationships in a database can be employed in strategic decision making in both commercial and scientific domains. A typical application of ARM is market basket analysis where associations between the different items are discovered to analyze the customer’s buying habits. The discovery of such associations can help to develop better marketing strategies. ARM has been extensively used in other applications like spatial-temporal, health care, bioinformatics, web data etc (Hipp J., Güntzer U., Nakhaeizadeh G. 2000). An association rule is an implication of the form X ? Y where X and Y are independent sets of attributes/ items. An association rule indicates that if a set of items X occurs in a transaction record then the set of items Y also occurs in the same record. X is called the antecedent of the rule and Y is called the consequent of the rule. Processing massive datasets for discovering co-occurring items and generating interesting rules in reasonable time is the objective of all ARM algorithms. The task of discovering co-occurring sets of items cannot be easily accomplished using SQL, as a little reflection will reveal. Use of ‘Count’ aggregate query requires the condition to be specified in the where clause, which finds the frequency of only one set of items at a time. In order to find out all sets of co-occurring items in a database with n items, the number of queries that need to be written is exponential in n. This is the prime motivation for designing algorithms for efficient discovery of co-occurring sets of items, which are required to find the association rules. In this article we focus on the algorithms for association rule mining (ARM) and the scalability issues in ARM. We assume familiarity of the reader with the motivation and applications of association rule mining
Chapter Preview
Top

Background

Let I = {i1, i2,…, in} denote a set of items and D denote a database of N transactions. A typical transaction TD may contain a subset X of the entire set of items I and is associated with a unique identifier TID. An item-set is a set of one or more items i.e. X is an item-set if XI. A k-item-set is an item-set of cardinality k. A transaction is said to contain an item-set X if XT. Support of an item set X, also called Coverage is the fraction of transactions that contain X. It denotes the probability that a transaction contains X.

An item-set having support greater than the user specified support threshold (ms) is known as frequent item-set.

An association rule is an implication of the form X →Y [Support, Confidence] where XI, YI and XY =∅, where Support and Confidence are rule evaluation metrics. Support of a rule X → Y in D is ‘S'’ if S% of transactions in D contain XY. It is computed as:

Support indicates the prevalence of a rule. In a typical market basket analysis application, rules with very low support values represent rare events and are likely to be uninteresting or unprofitable. Confidence of a rule measures its strength and provides an indication of the reliability of prediction made by the rule. A rule X → Y has a confidence ‘C'‘ in D if C % of transactions in D that contain X, also contain Y. Confidence is computed, as the conditional probability of Y occuring in a transaction, given X is present in the same transaction, i.e.

Key Terms in this Chapter

Frequent Closed Item-Set: An item-set X is a closed item-set if there exists no item-set X’ such that i. X’ is a proper superset of X, ii. Every transaction containing X also contains X’. A closed item-set X is frequent if its support exceeds the given support threshold.

Generator Item-Set: A generator p of a closed item-set c is one of the smallest item-sets such that h(p) = c.

Association Rule: An Association rule is an implication of the form X?Y where X ? I, Y? I and XnY =Ø, I denotes the set of items.

Galois Connection: Let D = (O,I,R) be a data mining context where O and I are finite sets of objects (transactions) and items respectively. R ? O x I is a binary relation between objects and items. For O ? O, and I ? I, we define as shown in Exhibit C. f(O) associates with O the items common to all objects o ? O and g(I) associates with I the objects related to all items i ? I. The couple of applications (f,g) is a Galois connection between the power set of O (i.e. 2O) and the power set of I (i.e. 2I). The operators h = f o g in 2I and h’ = g o f in 2o are Galois closure operators. An item-set C ? I from D is a closed item-set iff h(C) = C.

Formal Concept: A formal context K = (G,M,I) consists of two sets G (objects) and M (attributes) and a relation I between G and M. For a set A?G of objects, A’={meM | gIm for all geA} (the set of all attributes common to the objects in A). Correspondingly, for a set B of attributes we define, B’ = {geG | gIm for all meB} (the set of objects common to the attributes in B). A formal concept of the context (G,M,I) is a pair (A,B) with A?G,B?M, A’=B and B’=A, A is called the extent and B is the intent of the concept (A,B).

Data Mining: Extraction of interesting, non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases.

Non-Redundant Association Rules: Let Ri denote the rule X1i?X2i, where X1,X2 ? I. Rule R1 is more general than rule R2 provided R2 can be generated by adding additional items to either the antecedent or consequent of R1. Rules having the same support and confidence as more general rules are the redundant association rules. Remaining rules are non-redundant rules.

Complete Chapter List

Search this Book:
Reset