Filtering Association Rules by Their Semantics and Structures

Filtering Association Rules by Their Semantics and Structures

Rangsipan Marukatat (Mahidol University, Thailand)
DOI: 10.4018/978-1-60566-754-6.ch014
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Association rule mining produces a large number of rules but many of them are usually redundant ones. When a data set contains infrequent items, the authors need to set the minimum support criterion very low; otherwise, these items will not be discovered. The downside is that it leads to even more redundancy. To deal with this dilemma, some proposed more efficient, and perhaps more complicated, rule generation methods. The others suggested using simple rule generation methods and rather focused on the post-pruning of the rules. This chapter follows the latter approach. The classic Apriori is employed for the rule generation. Their goal is to gain as much insight as possible about the domain. Therefore, the discovered rules are filtered by their semantics and structures. An individual rule is classified by its own semantic, or by how clear its domain description is. It can be labelled as one of the following: strongly meaningless, weakly meaningless, partially meaningful, and meaningful. In addition, multiple rules are compared. Rules with repetitive patterns are removed, while those conveying the most complete information are retained. They demonstrate an application of our techniques to a real case study, an analysis of traffic accidents in Nakorn Pathom, Thailand.
Chapter Preview
Top

Introduction

Association rules describe relationship between items in a transactional database. Intuitively, a rule X → Y implies that when X occurs in a transaction, Y is likely to occur as well. Apriori is the most well-known method to extract association rules from a data set (Agrawal et al., 1993; Agrawal & Srikant, 1994). It works in two major steps. First, frequent itemsets, or sets of items whose support is greater than a minimum support threshold (minsup), are discovered. Then, association rules whose confidence is greater than a minimum confidence threshold (minconf) are generated from these frequent itemsets. Rules that describe frequent patterns can be exploited in many real-world applications. For example, in retailing, products which are highly associated can be sold together as a special offer package (Svetina & Zupancic, 2005).

Some applications focus on rare or infrequent patterns only. An example is a surveillance of nosocomial infection, or infection that patients acquire during their hospital stay (Ma et al., 2003). Patients’ microbiological transactions were sampled over the period of three months, and three sets of association rules were generated, one for each month. Ma and his colleagues focused on rules with low support (less than 1%) and low confidence (less than 30%) because these rules may capture irregularity in patients’ antibiotic resistance. They identified potential outbreaks of the infection by comparing similar patterns across three months, and noticing sudden changes in their confidence measures.

When a database contains a huge number of transactions and some items rarely occur, we usually set the minsup criterion very low, in order to not miss useful association rules. A downside of this strategy is that such rules would be overwhelmed by redundant, spurious ones. There are two ways to tackle this problem. First, rule generation methods equipped with filters or pruning mechanisms are employed instead of Apriori. MSApriori (Liu et al., 1999b), RSAA (Yun et al. 2003), and Apriori-Inverse (Ko & Rountree, 2005) used alternative support criteria so that rules with low support but high confidence are generated. These rules are called sporadic or exceptional rules. Daly and Taniar (2004) proposed another method to find exceptional rules by combining the ideas of infrequent and negative patterns. They introduced an exceptionality measure. By their definition, if a rule X → Y has high support and high confidence, then ~XY or X~Y is an exceptional rule if it has low support but high exceptionality measure. Zhou and Yau (2007) proposed two algorithms for mining infrequent patterns, which are matrix-based and hash-based. These algorithms required few database accesses and were able to discover rules from a minimum search space.

Unlike the classic Apriori, the above methods have not yet been available in mainstream data mining software. It is quite common in applied research (including this work) that standard, widely-acknowledged tools are applied to certain problems or domains. It allows us to focus on the domains and not worry about developing the tools. Hence, the second way to tackle the rare item problem is to employ Apriori for the rule generation, and use some criteria to select or prune the discovered rules afterwards. The second approach is the main focus of our chapter.

Complete Chapter List

Search this Book:
Reset