A novel approach is presented for effectively mining weighted fuzzy association rules (ARs). The authors address the invalidation of the downward closure property (DCP) in weighted association rule mining, where each item is assigned a weight according to its significance with respect to some user-defined criteria. Most work on weighted association rule mining does not address the downward closure property, while some makes assumptions in order to validate it. This chapter generalizes the weighted association rule mining problem to binary and fuzzy attributes in weighted settings. The methodology follows an Apriori approach but employs a T-tree data structure to improve the efficiency of itemset counting. Unlike most weighted association rule mining algorithms, the approach avoids pre- and post-processing, thus eliminating extra steps during rule generation. The chapter presents experimental results on both synthetic and real data sets and a discussion on evaluating the proposed approach.
Introduction
Association rules (ARs) (Agrawal, Imielinski & Swami, 1993) are a well-established data mining technique used to discover co-occurrences of items, mainly in market basket data. An item is usually a product amongst a list of other products, and an itemset is a combination of two or more products. The items in the database are usually recorded as binary data (present or not present). The technique aims to find association rules with strong support and high confidence in large databases. Classical Association Rule Mining (ARM) deals with the relationships among the items present in transactional databases (Agrawal & Srikant, 1994; Bodon, 2003). Typically, the algorithm first generates all large (frequent) itemsets (attribute sets), from which association rule (AR) sets are then derived. A large itemset is defined as one whose frequency of occurrence in the given data set meets a user-supplied support threshold. To limit the number of ARs generated, a confidence threshold is also applied. The support and confidence thresholds must be selected with care, so that itemsets with low support, but from which high-confidence rules may be generated, are not omitted. We define the problem as follows:
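Given a set of items $I = \{i_1, i_2, \ldots, i_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ip}\}$, $p \le m$ and $I_{ij} \in I$, a set $X \subseteq I$ with $k = |X|$ is called a k-itemset or simply an itemset. Let a database $D$ be a multi-set of subsets of $I$ as shown. Each $T \in D$ supports an itemset $X \subseteq I$ if $X \subseteq T$ holds. An association rule is an expression $X \rightarrow Y$, where $X$, $Y$ are itemsets and $X \cap Y = \emptyset$ holds. The number of transactions $T$ supporting an itemset $X$ with respect to $D$ is called the support of $X$, $Supp(X) = |\{T \in D \mid X \subseteq T\}| / |D|$. The strength or confidence (c) of an association rule $X \rightarrow Y$ is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions that contain $X$, $Conf(X \rightarrow Y) = Supp(X \cup Y) / Supp(X)$.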
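The support and confidence definitions above can be illustrated with a minimal Python sketch. The toy database and the function names `supp` and `conf` are hypothetical and chosen only for illustration; support is computed here as a fraction of the transactions in D, which is an assumption about the normalization.

```python
# Hypothetical toy database D: each transaction is a set of items.
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def supp(X, D):
    """Fraction of transactions T in D that contain every item of X."""
    X = frozenset(X)
    return sum(1 for T in D if X <= T) / len(D)


def conf(X, Y, D):
    """Confidence of the rule X -> Y: Supp(X u Y) / Supp(X)."""
    X, Y = frozenset(X), frozenset(Y)
    if X & Y:
        raise ValueError("X and Y must be disjoint")
    support_x = supp(X, D)
    if support_x == 0:
        raise ValueError("X does not occur in D")
    return supp(X | Y, D) / support_x


bread_support = supp({"bread"}, D)                # 3 of 4 transactions
bread_milk_support = supp({"bread", "milk"}, D)   # 2 of 4 transactions
bread_to_milk = conf({"bread"}, {"milk"}, D)      # 0.5 / 0.75
```

In this toy data the rule bread → milk holds in two of the three transactions that contain bread, so its confidence is about 0.67 while its support is 0.5.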
For non-binary items, fuzzy association rule mining, first expressed as quantitative association rule mining (Srikant & Agrawal, 1996), has been proposed; it uses fuzzy sets so that quantitative and categorical attributes can be handled (Kuok, Fu & Wong, 1998). A fuzzy rule represents each item as an (item, value) pair. Fuzzy association rules are expressed in the following form:
If X is A then Y is B
For example,
if (age is young) → (salary is low)
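As a rough illustration of how such a rule could be scored, the sketch below assigns each record a membership degree in the fuzzy sets "young" and "low" and averages the combined degree over all records. Everything here is an assumption made for illustration: the membership functions, their breakpoints, the sample records, and the use of the minimum to combine degrees (the product is another common choice). It is not the weighted measure proposed in this chapter.

```python
def young(age):
    """Hypothetical membership: 1 up to age 25, falling linearly to 0 at 45."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def low(salary):
    """Hypothetical membership: 1 up to 20000, falling linearly to 0 at 40000."""
    if salary <= 20000:
        return 1.0
    if salary >= 40000:
        return 0.0
    return (40000 - salary) / 20000


# Hypothetical records with quantitative attributes.
records = [
    {"age": 22, "salary": 18000},
    {"age": 35, "salary": 30000},
    {"age": 50, "salary": 25000},
    {"age": 30, "salary": 45000},
]


def fuzzy_support(pairs, records):
    """Mean over records of the minimum membership across (attribute, fuzzy set) pairs."""
    total = 0.0
    for record in records:
        total += min(mu(record[attr]) for attr, mu in pairs)
    return total / len(records)


antecedent = [("age", young)]
consequent = [("salary", low)]

rule_support = fuzzy_support(antecedent + consequent, records)
rule_confidence = rule_support / fuzzy_support(antecedent, records)
```

With these assumed membership functions, only the first two records contribute to the rule (with degrees 1.0 and 0.5), giving a fuzzy support of 0.375 and a confidence of about 0.67.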