Introduction
As technology advances, large volumes of data can be generated easily, including (i) structured data in relational or transactional databases and (ii) semi-structured data in text documents or on the World Wide Web. Embedded within these data is potentially useful knowledge that professionals, researchers, students, and practitioners want to discover. This calls for data mining (Frawley et al., 1991), which aims to “retrieve” or discover implicit, previously unknown, and potentially useful information or knowledge from large volumes of data. A common data mining task is frequent set mining (Agrawal et al., 1993), which analyzes the data to find frequently occurring sets of items (e.g., frequently collocated events, frequently purchased bundles of merchandise). These frequent sets serve as building blocks for many other data mining tasks, such as the mining of association rules, correlations, sequences, episodes, emerging patterns, web access patterns, maximal patterns, closed frequent sets, and constrained patterns (Pasquier et al., 1999; Pei et al., 2000; Lakshmanan et al., 2003; Leung et al., 2007; Kumar et al., 2012; Leung et al., 2012). Moreover, frequent sets can be used in other mining tasks such as classification (e.g., associative classification (Liu, 2009)).
Beyond these mining tasks, frequent sets help answer many questions that support important decisions in real-life applications across domains such as health care, bioinformatics, social science, and business. For example, knowing the sets of frequently purchased merchandise helps store managers make intelligent business decisions such as item shelving; finding the sets of popular elective courses helps students select the combination of courses they wish to take; and discovering the sets of frequently occurring patterns in genes helps professionals and researchers better understand certain biomedical or social behaviours of human beings.
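To make the frequent set mining task concrete, the following is a minimal sketch written for illustration only; it is not taken from any of the cited works. It assumes the common definition that the support of a set of items is the number of transactions containing it, and that a set is frequent when its support is at least a user-specified threshold (named `minsup` here, a hypothetical parameter). The search is level-wise in the style of the well-known Apriori approach, and the toy database `db` is made-up data.

```python
from itertools import combinations


def frequent_sets(transactions, minsup):
    """Return {frozenset(items): support} for every set of items whose
    support (number of transactions containing it) is at least minsup.

    Level-wise (Apriori-style) search: candidate (k+1)-sets are built only
    from frequent k-sets, because every subset of a frequent set must itself
    be frequent.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of each individual item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(current)

    k = 1
    while current:
        # Join step: merge pairs of frequent k-sets into (k+1)-set candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1:
                # Prune step: keep the candidate only if all its k-subsets are frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k)):
                    candidates.add(union)

        # Count step: one scan of the database per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= minsup:
                current[cand] = support
        result.update(current)
        k += 1

    return result


# Hypothetical toy transaction database (e.g., shopping baskets).
db = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

# Print the mining result as a plain textual list, one frequent set per line.
for itemset, support in sorted(frequent_sets(db, 2).items(),
                               key=lambda pair: (-pair[1], sorted(pair[0]))):
    print(sorted(itemset), support)
```

With `minsup = 2`, this toy run should report five frequent sets: {bread}, {milk}, {butter}, {bread, milk}, and {bread, butter}. Printed one per line, they form exactly the kind of plain textual result list discussed below, which quickly becomes hard to scan when a real database yields many frequent sets.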
Frequent set mining has drawn the attention of many researchers because it plays an important role in many data mining tasks and has contributed to various real-life applications. Since the frequent set mining problem was introduced (Agrawal et al., 1993), numerous algorithms (Han et al., 2007; Cheng & Han, 2009) have been proposed to mine frequent sets from databases. Most of these algorithms return their mining results in textual form, such as a very long unsorted list of frequent sets of items. When the number of frequent sets is large, such a lengthy conventional list is hard to read and understand. Consequently, users may not easily discover the useful knowledge embedded in the databases.
As the saying goes, “a picture is worth a thousand words”: a visual representation harnesses the power of the human visual and cognitive system. Hence, a visual representation of the frequent sets makes it easier for users (e.g., professionals, researchers, students, practitioners) to view and analyze the mining results than a lengthy textual list of frequent sets of items does. This leads to visual analytics, the science of analytical reasoning supported by interactive visual interfaces (Thomas & Cook, 2005; Keim et al., 2008; Keim et al., 2009a; Keim et al., 2009b). Since numerous frequent set mining algorithms (which analyze large volumes of data to find frequent sets of items) have already been proposed, what is needed are interactive systems for visualizing the mining results, so that users can take advantage of both worlds (i.e., combine advanced data analysis with visualization).