Review of Literature
Data mining supports the analysis of information by extracting useful patterns (rules) from large databases for decision makers. The discovered knowledge can take the form of rules describing properties of the data, frequently occurring patterns, or clusterings of objects in the database, and it can be used to support intelligent activities such as decision making, planning, and problem solving (Jiawei, Kamber, & Kaufmann, 2007).
Let I = {i1, i2, i3, …, in} be a set of n distinct literals called items, and let D be a set of transactions over I. Each transaction contains a set of items i1, i2, i3, …, ik ∈ I and has an associated unique identifier called the TID (Transaction Identification Number). An association rule is an implication of the form A→B, where A, B ⊆ I and A∩B = ∅. A is called the antecedent of the rule, and B is called the consequent. A set of items (such as the antecedent or the consequent of a rule) is called an item set. Each item set has an associated statistical measure called support, denoted supp. For an item set A ⊆ I, supp(A) = s if the fraction of transactions in D containing A equals s. A rule A→B has a measure of strength called confidence (denoted conf), defined as the ratio supp(A∪B) / supp(A).
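The definitions of support and confidence can be illustrated with a minimal sketch. The toy database `D`, its item names, and the function names `supp` and `conf` are assumptions made for this example; they do not come from the reviewed literature.

```python
# Hypothetical toy database D: each transaction is a set of items.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def conf(antecedent, consequent, transactions):
    """Confidence of the rule A -> B, i.e. supp(A u B) / supp(A)."""
    a, b = frozenset(antecedent), frozenset(consequent)
    if a & b:
        raise ValueError("antecedent and consequent must be disjoint")
    supp_a = supp(a, transactions)
    if supp_a == 0:
        raise ValueError("confidence is undefined when supp(A) is 0")
    return supp(a | b, transactions) / supp_a


print(supp({"bread"}, D))            # 3 of 4 transactions contain bread
print(conf({"bread"}, {"milk"}, D))  # supp({bread, milk}) / supp({bread})
```

In this toy database, bread appears in three of the four transactions and bread together with milk in two, so the rule bread→milk has support 0.5 and confidence 0.5 / 0.75.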
The problem of mining association rules is to generate all rules A→B whose support and confidence are greater than or equal to user-specified thresholds, called minimum support (minsupp) and minimum confidence (minconf), respectively (Hand & Mannila, 2004). For regular associations, supp(A∪B) ≥ minsupp and conf(A→B) = supp(A∪B) / supp(A) ≥ minconf.
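The mining problem stated above can be sketched as an exhaustive search over item sets. This is only an illustration of the two threshold tests under assumed toy data; it enumerates every candidate item set and is therefore exponential in the number of items, unlike the pruning-based algorithms used in practice. The names `mine_rules` and `D` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database D: each transaction is a set of items.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(transactions, minsupp, minconf):
    """Return all rules (A, B, support, confidence) meeting both thresholds."""
    if not transactions:
        return []
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = supp(itemset, transactions)
            # Skip item sets that never occur or fall below minsupp.
            if s == 0 or s < minsupp:
                continue
            # Split the item set into every antecedent/consequent pair.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    a = frozenset(ante)
                    b = itemset - a
                    c = s / supp(a, transactions)
                    if c >= minconf:
                        rules.append((a, b, s, c))
    return rules


for a, b, s, c in mine_rules(D, minsupp=0.5, minconf=0.6):
    print(sorted(a), "->", sorted(b), f"supp={s:.2f}", f"conf={c:.2f}")
```

Because supp(A) is never smaller than supp(A∪B), the division is safe whenever the item set itself occurs at least once, which the support check guarantees.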
Synthesizing rules is the process of combining the rules discovered from all data sources and deriving valid rules from them. Large organizations with multiple data sources can mine their transaction databases in two possible ways.
- i. Putting all data together from different sources to amass a centralized database for centralized processing, possibly using parallel and distributed mining techniques.
- ii. Reusing all promising rules discovered from different data sources to form a large set of rules and then searching for valid rules that are useful at the organization level.