Interactive Visual Analytics of Databases and Frequent Sets

Carson K.S. Leung (University of Manitoba, Winnipeg, Canada), Christopher L. Carmichael (University of Manitoba, Winnipeg, Canada), Patrick Johnstone (University of Manitoba, Winnipeg, Canada) and David Sonny Hung-Cheung Yuen (University of Manitoba, Winnipeg, Canada)
Copyright: © 2013 | Pages: 21
DOI: 10.4018/ijirr.2013100107

Abstract

In numerous real-life applications, large databases can be easily generated. Implicitly embedded in these databases is previously unknown and potentially useful knowledge such as frequently occurring sets of items, merchandise, or events. Different algorithms have been proposed for managing and retrieving useful information from these databases. Various algorithms have also been proposed for mining these databases to find frequent sets, which are usually presented in a lengthy textual list. As “a picture is worth a thousand words”, the use of visual representations can enhance user understanding of the inherent relationships among the mined frequent sets. Many of the existing visualizers were not designed to visualize these mined frequent sets. In this journal article, an interactive visual analytic system is proposed for providing visual analytic solutions to the frequent set mining problem. The system enables the management, visualization, and advanced analysis of the original transaction databases as well as the frequent sets mined from these databases.

Introduction

As technology advances, large volumes of data, such as (i) structured data in relational or transactional databases and (ii) semi-structured data in text documents or the World Wide Web, can be generated easily. Embedded within these data is potentially useful knowledge that professionals, researchers, students, and practitioners want to discover. This calls for data mining (Frawley et al., 1991), which aims to discover implicit, previously unknown, and potentially useful information or knowledge from large volumes of data. A common data mining task is frequent set mining (Agrawal et al., 1993), which analyzes the data to find frequently occurring sets of items (e.g., frequently collocated events, frequently purchased bundles of merchandise products). These frequent sets serve as building blocks for many other data mining tasks such as the mining of association rules, correlation, sequences, episodes, emerging patterns, web access patterns, maximal patterns, closed frequent sets, and constrained patterns (Pasquier et al., 1999; Pei et al., 2000; Lakshmanan et al., 2003; Leung et al., 2007; Kumar et al., 2012; Leung et al., 2012). Moreover, these frequently occurring sets of items can be used in mining tasks such as classification (e.g., associative classification (Liu, 2009)). Frequent sets can also answer many questions that help users make important decisions for real-life applications in domains such as health care, bioinformatics, social science, and business. For example, knowing the sets of frequently purchased merchandise helps store managers make intelligent business decisions such as item shelving; finding the sets of popular elective courses helps students select the combination of courses they wish to take; and discovering the sets of frequently occurring patterns in genes helps professionals and researchers get a better understanding of certain biomedical or social behaviours of human beings.
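To make the notion of a frequent set concrete, the following is a minimal Python sketch of level-wise frequent set mining over a small transaction database. It is an illustration only and is not the system or algorithm proposed in this article. The function name `frequent_sets`, the `min_support` threshold (the minimum number of transactions in which a set must occur to count as "frequent"), and the sample grocery transactions are all assumptions introduced for this example.

```python
from itertools import combinations


def frequent_sets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    result = {}
    # Level-wise search: start with single items, then grow one item at a time.
    current = [frozenset([item]) for item in items]
    k = 1
    while current:
        # Count how many transactions contain each candidate set.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)

        # Build candidates of size k+1 from unions of frequent size-k sets,
        # keeping only those whose size-k subsets are all frequent.
        k += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


# Hypothetical market-basket data: each transaction is one customer's purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

for itemset, support in frequent_sets(transactions, min_support=3).items():
    print(sorted(itemset), support)
```

With this sample data and a threshold of 3, each single item occurs in 4 of the 5 transactions and each pair occurs in 3, so all of them are reported as frequent, while the full set of three items occurs in only 2 transactions and is not. Even this tiny database yields a flat textual list of six sets, which hints at how quickly such listings grow on real data.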

Frequent set mining has drawn the attention of many researchers because it plays an important role in many data mining tasks and has contributed to various real-life applications. Since the introduction of the frequent set mining problem (Agrawal et al., 1993), numerous algorithms (Han et al., 2007; Cheng & Han, 2009) have been proposed to mine frequent sets from databases. Most of these algorithms return the mining results in textual form, such as a very long unsorted list of frequent sets of items. A large number of frequent sets presented in such a lengthy list is hard to understand. Consequently, users may not easily discover the useful knowledge that is embedded in the databases.

As “a picture is worth a thousand words”, a visual representation matches the power of the human visual and cognitive system. Hence, a visual representation of the frequent sets makes it easier for users (e.g., professionals, researchers, students, practitioners) to view and analyze the mining results than a lengthy textual list of frequent sets of items does. This leads to visual analytics, which is the science of analytical reasoning supported by interactive visual interfaces (Thomas & Cook, 2005; Keim et al., 2008; Keim et al., 2009a; Keim et al., 2009b). Since numerous frequent set mining algorithms (which analyze large volumes of data to find frequent sets of items) have been proposed, what we need are interactive systems for visualizing the mining results so that we can get the best of both worlds (i.e., combine advanced data analysis with visualization).
