CobWeb Multidimensional Model and Tag-Cloud Operators for OLAP of Documents

Omar Khrouf (University of Sfax, MIR@CL Laboratory, Sfax, Tunisia), Kais Khrouf (Jouf University, Sakaka, Saudi Arabia) and Jamel Feki (FCIT, University of Jeddah, Jeddah, Saudi Arabia)
Copyright: © 2018 | Pages: 23
DOI: 10.4018/IJGC.2018070104

Abstract

The number of textual documents generated and stored has exploded in recent years. Managing these documents effectively is essential for exploiting them in decisional analyses. In this context, the authors first propose their CobWeb multidimensional model, which is based on standard facets and dedicated to the OLAP (on-line analytical processing) of XML documents; it aims to give decision makers facilities for expressing their analytical queries. Second, they suggest new visualization operators for OLAP query results, introducing the concept of Tag clouds as a means of helping decision makers display OLAP results in an intuitive format and focus on the main concepts. The authors have developed a software prototype called MQF (Multidimensional Query based on Facets) to support their proposals, and have tested it on documents from the PubMed collection.

1. Introduction

As the amount of data grows rapidly both inside and outside organizations, it is becoming important to analyze internal and external data seamlessly, so that decision makers can better understand the business processes of their organizations and make well-founded decisions that they could not make by relying on a conventional (numerical) Data Warehouse (DW) alone. In this context, several studies have addressed the manipulation of documentary information, either by OLAPing documents (Zhang et al., 2009) or by modeling documents with facets (Kumar et al., 2012) that describe several viewpoints.

For OLAPing documents, two categories of work can be distinguished: (1) those that enrich the classical Multidimensional Models (MM) (i.e., star, snowflake and constellation schemas) with extensions for textual data processing ((Feki et al., 2013) and (Hachaichi et al., 2010) for data-centric documents; (Lin et al., 2008) for document-centric documents), and (2) those that propose MM specific to documents, such as the Galaxy model (Ravat et al., 2008) and the Diamond model (Azabou et al., 2018).

Other research works, such as (Cabanac et al., 2010) and (Hernandez et al., 2008), address the multi-representation of documents through the concept of facet; a facet describes useful aspects of documents, such as semantics and context. Various types of facets have been proposed in the literature; however, they are application domain-dependent. It would therefore be interesting to look for standard facets, i.e., application domain-independent ones, which would enable modeling documents of any field.

In this paper, we propose our CobWeb model as an extension of the Galaxy model (Ravat et al., 2008), and we base it on standard facets. Each facet includes a set of data and is considered a means for users to express their needs; that is why we later transform facets into dimensions. Note that in multidimensional modeling, a dimension is a set of attributes called parameters that are organized into hierarchies, from the finest to the coarsest granularity (e.g., a hierarchy for the TIME dimension could be: Day < Month < Quarter < Semester < Year) (Kimball, 1997). A dimension is an analysis axis, whereas a parameter of a hierarchy represents an analysis level. Hierarchy parameters make it possible to aggregate the fact’s measures and to apply the DrillDown and RollUp OLAP operations.
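To make the role of hierarchy parameters concrete, the following is a minimal sketch of a RollUp along the TIME hierarchy given above. It is an illustration only and not part of the CobWeb model or the MQF prototype: the function names `time_key` and `roll_up`, the choice of a document count as the measure, and the sample data are all assumptions.

```python
from collections import defaultdict
from datetime import date

# TIME hierarchy from the example above, finest to coarsest level.
TIME_HIERARCHY = ["Day", "Month", "Quarter", "Semester", "Year"]


def time_key(d: date, level: str) -> tuple:
    """Map a day to its member at the requested level (hypothetical helper)."""
    if level == "Day":
        return (d.year, d.month, d.day)
    if level == "Month":
        return (d.year, d.month)
    if level == "Quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    if level == "Semester":
        return (d.year, (d.month - 1) // 6 + 1)
    if level == "Year":
        return (d.year,)
    raise ValueError(f"unknown level: {level}")


def roll_up(facts, level):
    """Aggregate (day, measure) pairs by summing the measure at `level`."""
    totals = defaultdict(int)
    for day, measure in facts:
        totals[time_key(day, level)] += measure
    return dict(totals)


# Made-up measure values: number of documents recorded per day.
facts = [
    (date(2018, 1, 15), 3),
    (date(2018, 2, 3), 2),
    (date(2018, 7, 9), 4),
    (date(2019, 1, 1), 1),
]

by_quarter = roll_up(facts, "Quarter")  # coarser than Day and Month
by_year = roll_up(facts, "Year")        # coarsest level of the hierarchy
```

Moving from `by_quarter` to `by_year` corresponds to a RollUp; recomputing at a finer level from the same facts corresponds to a DrillDown.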

Integrating facets into an OLAP model raises a set of specific problems that the classical DW MM were not designed for and therefore do not solve. Examples of such problems are the recursion of a parameter within a given hierarchy and the multiple use of the same dimension within the same analysis. To alleviate these drawbacks of existing models, the CobWeb model brings a set of extensions, namely: i) the inter-dimension exclusion constraint, which prohibits using a given pair of dimensions in the same analysis; ii) the recursive parameter, a multi-valued parameter whose values are organized hierarchically; iii) the duplicated dimension, i.e., a dimension used twice in the same analytical query; and iv) the correlated dimension, which enables moving between dimensions during the same analysis.
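As a rough illustration of extensions i) and iii), the sketch below checks the dimensions chosen as analysis axes against an exclusion constraint and a list of dimensions that may be duplicated. It does not reproduce the formal definitions of the CobWeb model: the class `CobWebSchema`, its fields, and the dimension names (AUTHOR, KEYWORD, SECTION, TIME) are hypothetical and chosen only for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class CobWebSchema:
    """Hypothetical, simplified view of a schema with two kinds of constraints."""
    dimensions: set
    exclusions: set = field(default_factory=set)  # frozenset pairs of dimensions
    duplicable: set = field(default_factory=set)  # dimensions allowed twice

    def violations(self, axes):
        """Return the list of constraint violations for the chosen axes."""
        problems = []
        counts = Counter(axes)
        for dim, n in counts.items():
            if dim not in self.dimensions:
                problems.append(f"unknown dimension: {dim}")
            elif n > 2 or (n == 2 and dim not in self.duplicable):
                problems.append(f"dimension used too many times: {dim}")
        # Exclusion constraint: a forbidden pair must not appear together.
        for a, b in combinations(sorted(counts), 2):
            if frozenset((a, b)) in self.exclusions:
                problems.append(f"excluded pair: {a}, {b}")
        return problems


schema = CobWebSchema(
    dimensions={"AUTHOR", "KEYWORD", "SECTION", "TIME"},
    exclusions={frozenset(("KEYWORD", "SECTION"))},
    duplicable={"AUTHOR"},
)

ok = schema.violations(["AUTHOR", "AUTHOR", "TIME"])
excluded = schema.violations(["KEYWORD", "SECTION"])
repeated = schema.violations(["TIME", "TIME"])
```

Here `ok` is empty because AUTHOR is declared duplicable, whereas `excluded` and `repeated` each report one violation. Recursive parameters and correlated dimensions are not covered by this sketch.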

On the other hand, we propose a set of three operators for visualizing the results of OLAP queries on documents; these operators rely on the concept of Tag clouds in order to help decision makers better see and interpret query results.
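The three operators themselves are not detailed in this preview. As a generic illustration of the Tag-cloud idea they rely on, the sketch below maps the weight of each term in a query result to a font size by linear scaling. The function `tag_cloud_sizes`, its size bounds, and the sample weights are assumptions made for the example, not the authors' operators.

```python
def tag_cloud_sizes(weights, min_size=10, max_size=36):
    """Scale term weights linearly to font sizes in [min_size, max_size]."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All terms weigh the same: give them all a middle size.
        mid = (min_size + max_size) // 2
        return {term: mid for term in weights}
    span = hi - lo
    return {
        term: round(min_size + (w - lo) * (max_size - min_size) / span)
        for term, w in weights.items()
    }


# Made-up aggregated weights, e.g. term counts in one cell of a query result.
weights = {"protein": 12, "gene": 7, "therapy": 2}
sizes = tag_cloud_sizes(weights)
```

The heaviest term gets the largest font and the lightest term the smallest, which is what lets a reader focus on the main concepts at a glance.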
