Frequent Itemset Mining and Association Rules

Susan Imberman (City University of New York, USA) and Abdullah Uz Tansel (Bilkent University, Turkey)
Copyright: © 2011 | Pages: 11
DOI: 10.4018/978-1-59904-931-1.ch033

Abstract

With the advent of mass storage devices, databases have become larger and larger. Point-of-sale data, patient medical data, scientific data, and credit card transactions are just a few sources of the ever-increasing amounts of data. These large datasets provide a rich source of useful information. Knowledge Discovery in Databases (KDD) is a paradigm for the analysis of these large datasets. KDD uses various methods from such diverse fields as machine learning, artificial intelligence, pattern recognition, database management and design, statistics, expert systems, and data visualization.

Key Terms in this Chapter

Knowledge Discovery in Databases (KDD): A paradigm for the analysis of large datasets. The process is cyclic and iterative, with several steps including data preparation, analysis, and interpretation. KDD uses various methods from such diverse fields as machine learning, artificial intelligence, pattern recognition, database management and design, statistics, expert systems, and data visualization.

Association Rule: Given a set I = {i_1, i_2, i_3, …, i_n} of items, any subset of I is called an itemset. Let X and Y be subsets of I such that X ∩ Y = ∅. An association rule is a probabilistic implication X ⇒ Y.

Confidence: Given an association rule X ⇒ Y, the confidence of the rule is the number of transactions that satisfy X ∪ Y divided by the number of transactions that satisfy X.

Apriori: A level-wise algorithm for finding association rules. Apriori uses the support of an itemset to prune the search space of all itemsets. It then uses the confidence metric to find association rules.
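The level-wise search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the function names `apriori` and `rules`, the toy `transactions`, and the thresholds are all hypothetical. Support is the fraction of transactions containing an itemset, and the pruning step relies on the fact that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations

# Toy transaction database (illustrative only, not taken from the chapter).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def apriori(transactions, min_support):
    """Return {itemset: transaction count} for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate and keep those whose support reaches the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 1
    while level:
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        result.update(level)
    return result

def rules(frequent, min_confidence):
    """Derive rules X => Y from frequent itemsets, filtered by confidence."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                conf = count / frequent[x]  # count(X ∪ Y) / count(X)
                if conf >= min_confidence:
                    found.append((x, itemset - x, conf))
    return found

frequent = apriori(transactions, min_support=0.6)
strong = rules(frequent, min_confidence=0.9)
```

With these toy thresholds, `frequent` holds four single items and four pairs, and `strong` contains only the rule {beer} ⇒ {diapers}. Storing raw counts rather than fractions keeps the confidence ratio free of floating-point rounding at the threshold.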

Data Mining: One step of the KDD process. Can include various data analysis methods such as decision trees, clustering, statistical tests, neural networks, nearest neighbor algorithms, and association rules.

Support: Given an association rule X ⇒ Y, the support of the rule is the number of transactions that satisfy or match X ∪ Y, divided by the total number of transactions. Support is an indication of a rule's statistical significance.
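As a small worked example of the support and confidence definitions, the following Python sketch counts matching transactions directly. The helper names (`count`, `support`, `confidence`) and the toy market-basket data are assumptions made for illustration; they do not come from the chapter.

```python
# Toy transaction database (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """Support of X => Y: transactions matching X ∪ Y over all transactions."""
    return count(set(x) | set(y), transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of X => Y: transactions matching X ∪ Y over those matching X."""
    return count(set(x) | set(y), transactions) / count(x, transactions)

s = support({"diapers"}, {"beer"}, transactions)     # 3 of 5 transactions
c = confidence({"diapers"}, {"beer"}, transactions)  # 3 of the 4 with diapers
```

Here three of the five transactions contain both diapers and beer, so the support is 3/5 = 0.6; four transactions contain diapers, so the confidence is 3/4 = 0.75.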

Quantitative Association Rules: Show associations with numeric and categorical data. Quantitative rules would express associations such as: Age: 30 to 39 and Owns car = yes ⇒ Median Income = 40,000

Interestingness: Methods used to order and prune the set of rules produced by association rule algorithms. This facilitates their use and interpretation by the user. Metrics for interestingness include measures such as confidence, added value, mutual information and conviction measures.
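Two of the measures named above can be illustrated with a short Python sketch. It assumes the commonly cited definitions, which the chapter itself may state differently: added value as confidence(X ⇒ Y) minus the support of Y, and conviction as (1 − support(Y)) / (1 − confidence(X ⇒ Y)). The function names and toy data are hypothetical.

```python
import math

# Toy transaction database (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def added_value(x, y, transactions):
    """confidence(X => Y) - support(Y); positive when X raises the chance of Y."""
    conf = count(set(x) | set(y), transactions) / count(x, transactions)
    return conf - count(y, transactions) / len(transactions)

def conviction(x, y, transactions):
    """(1 - support(Y)) / (1 - confidence(X => Y)); infinite for exact rules."""
    conf = count(set(x) | set(y), transactions) / count(x, transactions)
    if conf == 1.0:
        return math.inf  # the rule is never violated in the data
    return (1 - count(y, transactions) / len(transactions)) / (1 - conf)

av = added_value({"diapers"}, {"beer"}, transactions)
cv = conviction({"diapers"}, {"beer"}, transactions)
```

For the rule {diapers} ⇒ {beer}, the confidence is 0.75 and the support of {beer} is 0.6, so the added value is about 0.15 and the conviction is about 1.6. Sorting rules by such a measure is one way to order and prune a large rule set.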
