1. Introduction
CUBIST is an EU-funded research project that aims to raise Business Intelligence (BI) to a new level of precise, meaningful and user-friendly data analytics. It follows a best-of-breed approach that combines essential features of Semantic Technologies, Business Intelligence and Visual Analytics based on Formal Concept Analysis (FCA). CUBIST aims to:
- Support the federation of data from unstructured and structured sources;
- Persist the federated data in an information warehouse, using an approach based on a BI-enabled triple store;
- Provide novel ways of applying visual analytics based on meaningful diagrammatic representations.
The Visual Analytics part of CUBIST complements traditional BI means by using Formal Concept Analysis (FCA) to analyze the data in the triple store. FCA is a well-known theory of data analysis that allows objects to be conceptually clustered with respect to a given set of attributes, and the (lattice-ordered) set of clusters to be visualized, e.g. by means of Hasse diagrams. The starting point of FCA is a formal context (O,A,I) consisting of a set O of formal objects, a set A of formal attributes, and an incidence relation I ⊆ O × A between the formal objects and attributes. A variety of tools exists for analyzing formal contexts (e.g. ConExp, Lattice Miner, or conflexplore), but nearly all of them take a formal context as input. Real data to be analyzed, however, often comes in different forms:
- Conceptually, attributes are often not binary but have values such as numbers, strings, or dates (i.e. we have many-valued attributes);
- Technically, data can come in the form of CSV files, databases, triple stores, etc.
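To make the notion of a formal context (O,A,I) concrete, the following minimal sketch builds a toy context and enumerates its formal concepts by brute force. The objects, attributes, and function names are hypothetical illustrations, not taken from CUBIST or from any of the tools mentioned; the enumeration is exponential in the number of objects and is only meant for very small examples.

```python
from itertools import combinations

# Hypothetical toy context (O, A, I): objects, attributes, incidence relation.
objects = ["frog", "dog", "fish"]
attributes = ["lives_in_water", "lives_on_land", "has_limbs"]
incidence = {
    ("frog", "lives_in_water"), ("frog", "lives_on_land"), ("frog", "has_limbs"),
    ("dog", "lives_on_land"), ("dog", "has_limbs"),
    ("fish", "lives_in_water"),
}


def common_attributes(objs):
    """Attributes shared by every object in objs (derivation on object sets)."""
    return frozenset(a for a in attributes if all((o, a) in incidence for o in objs))


def common_objects(attrs):
    """Objects that have every attribute in attrs (derivation on attribute sets)."""
    return frozenset(o for o in objects if all((o, a) in incidence for a in attrs))


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = common_objects(intent)
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts()
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one cluster (extent) together with the attributes its members share (intent); ordering the pairs by extent inclusion gives the concept lattice that a Hasse diagram would display.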
For dealing with many-valued attributes, the best-known and most widely used method is conceptual scaling (Ganter & Wille, 1989). Essentially, for a given many-valued attribute, a conceptual scale is a specific formal context whose formal objects are the values of the many-valued attribute. The choice of the formal attributes of the scale is a matter of scale design: the formal attributes are meaningful attributes that describe the values; they might be different entities, or they might even be the values of the property themselves. Using a conceptual scale, a dataset with a many-valued attribute can be "translated" into a formal context in which the objects are the objects of the dataset and the attributes are the attributes of the conceptual scale. The derived formal context can then be analyzed with FCA.
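As an illustration of conceptual scaling, the sketch below scales a single many-valued attribute. The dataset, the attribute "age", the scale attributes, and the helper names are all assumptions chosen for the example; the scale design shown is just one of many possible choices.

```python
# Hypothetical dataset: each object has one value for the many-valued attribute "age".
ages = {"ann": 15, "bob": 34, "eve": 70}

# Hypothetical scale design: each scale attribute is described by a predicate on values.
age_predicates = {
    "age<18": lambda v: v < 18,
    "age>=18": lambda v: v >= 18,
    "age>=65": lambda v: v >= 65,
}


def build_scale_context(values, predicates):
    """Scale context: formal objects are the attribute values themselves."""
    return {(v, name) for v in values for name, holds in predicates.items() if holds(v)}


def derive_context(dataset, scale_context):
    """Derived context: object o has scale attribute m iff (value of o, m) is in the scale."""
    return {
        (obj, m)
        for obj, value in dataset.items()
        for (scale_value, m) in scale_context
        if scale_value == value
    }


scale = build_scale_context(set(ages.values()), age_predicates)
derived = derive_context(ages, scale)
print(sorted(derived))
```

The derived incidence relation is an ordinary one-valued formal context, so it can be passed to any FCA tool that expects a formal context as input.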
From the technical point of view, there are (to the author’s knowledge) essentially two tools which allow for scaling real datasets:
- ToscanaJ (Becker & Correia, 2005) is a suite of tools for creating conceptual scales from data in a relational database and then interactively visualizing and exploring the generated concept lattices;
- FcaBedrock (Andrews, 2009; Andrews & Orphanides, 2010) is a tool that converts CSV files into formal contexts. It is “taking each many-valued attribute and converting it into as many Boolean attributes as it has values and converting continuous values using ranges.” (Andrews & Orphanides, 2010).
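The conversion strategy quoted above (one Boolean attribute per value, ranges for continuous values) can be sketched as follows. This is not FcaBedrock's actual implementation or interface; the CSV content, column names, range boundaries, and the function `csv_to_context` are hypothetical and serve only to illustrate the idea.

```python
import csv
import io

# Hypothetical CSV data; column names and values are illustrative only.
CSV_TEXT = """name,colour,height
a,red,150
b,blue,172
c,red,190
"""

# Assumed ranges for the continuous column: (label, inclusive lower, exclusive upper).
HEIGHT_RANGES = [
    ("height<160", 0, 160),
    ("height160-180", 160, 180),
    ("height>=180", 180, float("inf")),
]


def csv_to_context(text, id_column, nominal_columns, ranged_columns):
    """Turn CSV rows into a set of (object, Boolean attribute) incidence pairs."""
    incidence = set()
    for row in csv.DictReader(io.StringIO(text)):
        obj = row[id_column]
        # One Boolean attribute per distinct value of a nominal column.
        for col in nominal_columns:
            incidence.add((obj, f"{col}={row[col]}"))
        # One Boolean attribute per range of a continuous column.
        for col, ranges in ranged_columns.items():
            value = float(row[col])
            for label, low, high in ranges:
                if low <= value < high:
                    incidence.add((obj, label))
    return incidence


context = csv_to_context(CSV_TEXT, "name", ["colour"], {"height": HEIGHT_RANGES})
print(sorted(context))
```

In this sketch the range boundaries must be supplied by the user, which reflects the design decision any such conversion faces: how continuous values are cut into ranges determines which concepts appear in the resulting lattice.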