Visualization to Assist the Generation and Exploration of Association Rules

Claudio Haruo Yamamoto (Universidade de São Paulo, Brazil), Maria Cristina Ferreira de Oliveira (Universidade de São Paulo, Brazil) and Solange Oliveira Rezende (Universidade de São Paulo, Brazil)
DOI: 10.4018/978-1-60566-404-0.ch012

Abstract

Miners face many challenges when dealing with association rule mining tasks, such as defining proper parameters for the algorithm, handling rule sets so large that exploration becomes difficult and uncomfortable, and understanding complex rules containing many items. To tackle these problems, many researchers have been investigating visual representations and information visualization techniques to assist association rule mining. This chapter presents an overview of the many approaches found in the literature. First, the authors introduce a classification of the different approaches that rely on visual representations, based on the role played by the visualization technique in the exploration of rule sets. Current approaches typically focus on model viewing, that is, visualizing rule content (the antecedent and consequent of a rule) and/or the values of different interest measures associated with it. Nonetheless, other approaches do not restrict themselves to aiding exploration of the final rule set, but propose representations to assist miners along the rule extraction process. One such approach is a methodology the authors have been developing that supports visually assisted selective generation of association rules based on identifying clusters of similar itemsets. They introduce this methodology and a quantitative evaluation of it. Then, they present a case study in which it was employed to extract rules from a real and complex dataset. Finally, they identify some trends and issues for further developments in this area.

Introduction

Huge volumes of data are now available, but we still face many difficulties in handling all such data to obtain actionable knowledge (Fayyad et al., 1996). Data visualization currently plays an important role in Knowledge Discovery processes, as it helps miners to create and validate hypotheses about the data and also to track and understand the behavior of mining algorithms (Oliveira & Levkowitz, 2003). Interactive visualization allows users to gain insight more easily by taking advantage of their visual system while performing complex investigation tasks. Combining the power of visual data exploration with analytical data mining, an approach known as visual data mining (VDM), is now a trend.

Researchers (Ankerst, 2000; Oliveira & Levkowitz, 2003) identified three approaches for VDM: exploratory data visualization prior to mining, visualization of data mining models, and visualization of intermediate results or representations during mining. The first approach concerns the use of visualization techniques during the data preprocessing stages of the knowledge discovery process. The second approach, visualization of data mining models, focuses on visually representing the models extracted with data mining algorithms, to enhance comprehension. Finally, the third approach seeks to insert the miner into the knowledge discovery loop, not only to view intermediate patterns, but also to drive the process of exploring the solution space, for example, by providing feedback based on the user's previous knowledge about the domain or about the process itself.

Extracting association rules from a transaction database is a data mining task (the association rule mining task) defined by Agrawal et al. (1993), in which the goal is to identify rules of the form A→B that satisfy minimum support and minimum confidence values. A and B each denote one or more items occurring in the transactions. The rule extraction problem is split into two main stages: (1) generating frequent itemsets; (2) extracting association rules from the frequent itemsets obtained. Several efficient algorithms have been designed for the first stage, while the second is comparatively straightforward once the frequent itemsets are available. A major problem, however, is that the process typically generates too many rules for analysis. Moreover, setting input parameters properly, understanding rules and identifying the interesting ones are also difficult. Visual representations have been proposed to tackle some of these problems. At first, visualization was employed mainly to assist miners in exploring the extracted rule set, based on visual representations of the rule space. There are also recent examples of employing visual representations to aid miners during the execution of the mining algorithm, e.g., to show intermediate results and to allow user feedback during the process.
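
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm and not one of the efficient algorithms the text alludes to: it enumerates candidate itemsets by brute force, which is only practical for toy data. It assumes the usual definitions, in which the support of an itemset is the fraction of transactions containing it and the confidence of A→B is support(A ∪ B) / support(A). The function names, the sample transactions and the thresholds are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Stage 1: map every itemset meeting min_support to its support.

    Brute-force enumeration by itemset size (illustrative only).
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support: fraction of transactions that contain the candidate.
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                frequent[cset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so no larger one can exist.
            break
    return frequent


def extract_rules(frequent, min_confidence):
    """Stage 2: split each frequent itemset into antecedent and consequent."""
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

frequent = frequent_itemsets(transactions, min_support=0.5)
for a, b, support, confidence in extract_rules(frequent, min_confidence=0.6):
    print(sorted(a), "->", sorted(b), f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Even on this four-transaction example each frequent pair yields two rules, which hints at why realistic datasets produce rule sets too large to inspect without assistance.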

This chapter is organized as follows. In the “Systematic survey” section we present a survey of approaches that employ visual representations in association rule mining. The survey is divided into two parts. In the first one, “Visualization of results of association rule mining”, we discuss contributions that employ information visualization techniques at the end of the mining process to visually represent its final results. In the second part, “Visualization during association rule mining”, we discuss contributions that employ visualization during the discovery process. In the third section, “An itemset-driven cluster-oriented approach”, we introduce a rule extraction methodology that aims to insert the miner into the knowledge discovery loop. It adopts an interactive rule extraction algorithm coupled with a projection-based graph visualization of frequent itemsets, an itemset-based rule extraction approach and a visually assisted pairwise comparison of rules for exploration of results. We also introduce the I2E System, which adopts the proposed methodology to enable an exploratory approach towards rule extraction, allowing miners to explore the rule space of a given transaction dataset. Furthermore, we present a quantitative evaluation of the methodology and a case study in which it is applied to a real and challenging dataset. Given this scenario, we then point out some challenges and trends for further developments in this area. Finally, we present some conclusions about this work.
