Yet Another Multidimensional Model for XML Documents

Maha Azabou, Kais Khrouf, Jamel Feki, Chantal Soulé-Dupuy, Nathalie Vallès
DOI: 10.4018/IJSITA.2017070105

Abstract

The Diamond model is a multidimensional model dedicated to XML document warehouses. It considers structured and unstructured data simultaneously. Furthermore, it orders the semantics of documents via a specific semantic dimension linked to conventional dimensions, thus breaking the classical rule that dimensions are orthogonal. After giving an overview of their three-phase quasi-automatic approach for generating the Diamond model, the authors focus on the Diamond-Gen software tool that supports the proposed approach. The authors illustrate the functionalities of Diamond-Gen and assess the tool through an experimental study using a set of 1500 XML documents taken from the PubMed collection.

1. Introduction

To support their data analytical processes, today’s organizations deploy data warehouses and client OLAP (On-Line Analytical Processing) tools to access, analyze and visualize integrated and aggregated data. The literature distinguishes two categories of data: structured and unstructured. Structured data are often organized either as relational databases for OLTP (On-Line Transaction Processing) or as data-centric documents for electronic/Internet uses, whereas unstructured data are presented as text-oriented documents. To manage and analyze both categories, organizations exploit the powerful features that the eXtensible Markup Language (XML) offers. XML documents constitute a core source for some decisional analyses since their contents help decision makers to better manage the evolution of business processes. An XML document generally complies with a generic grammar called a DTD (Document Type Definition). In order to conduct analyses on XML documents, it is necessary to extend the traditional data warehouse model (Golfarelli, Maio, & Rizzi, 1998), initially dedicated to numeric data, with capabilities for handling the content of these documents and, in particular, their semantics.

To do so, we have proposed the Diamond multidimensional model (Azabou, Khrouf, Feki, Soulé-Dupuy, & Vallès, 2016) dedicated to the design and OLAP of XML documents. The Diamond model focuses on three aspects: (1) text-data structure, (2) text-data semantics, and (3) flexibility of analysis, in the sense that multidimensional analyses are specified without constraining the decision-maker to predefined subjects of analysis (i.e., facts). Thus, the Diamond model offers OLAP analysis on the semantics contained in XML text-oriented documents and on structural data simultaneously. This semantic OLAP analysis is made possible by two new dimensions introduced in the Diamond model, namely the Semantic and Standard dimensions (Azabou, Khrouf, Feki, Soulé-Dupuy, & Vallès, 2014).

Furthermore, we help the designer build a document warehouse (DocW) schema according to the Diamond model, starting from a collection of XML documents and their logical structure (DTD or XSchema). For this purpose, we first suggest a design approach relying on a set of eleven heuristic rules for determining the components of the model (e.g., dimensions and hierarchies); second, we develop a software prototype called Diamond-Gen that supports this rule-based approach by automatically generating a Diamond DocW schema from a set of XML documents.
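
To give a rough feel for what deriving schema components from a document collection can involve, the following Python sketch scans a few XML documents, records every element path, and groups text-bearing elements under their parent element as candidate dimension parameters. This is only an illustration under our own assumptions: it does not reproduce the eleven heuristic rules or Diamond-Gen, and the function names (`collect_paths`, `group_by_parent`), the sample element names, and the grouping criterion are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents; the element names are illustrative only.
DOCS = [
    "<Article><Journal><Title>J1</Title><Year>2016</Year></Journal>"
    "<Author><LastName>A</LastName></Author><Abstract>text</Abstract></Article>",
    "<Article><Journal><Title>J2</Title></Journal>"
    "<Abstract>other</Abstract></Article>",
]


def collect_paths(xml_strings):
    """Map each element path to its document frequency and a text flag."""
    stats = defaultdict(lambda: {"docs": 0, "has_text": False})
    for doc in xml_strings:
        root = ET.fromstring(doc)
        seen = set()  # paths met in this document, counted once each

        def walk(elem, prefix):
            path = f"{prefix}/{elem.tag}"
            seen.add(path)
            if elem.text and elem.text.strip():
                stats[path]["has_text"] = True
            for child in elem:
                walk(child, path)

        walk(root, "")
        for path in seen:
            stats[path]["docs"] += 1
    return dict(stats)


def group_by_parent(stats):
    """Group text-bearing elements under their parent path.

    Assumption (ours, not the paper's): each parent is a candidate
    dimension and its text-bearing children are candidate parameters.
    """
    groups = defaultdict(list)
    for path, info in stats.items():
        if info["has_text"]:
            parent, _, leaf = path.rpartition("/")
            groups[parent].append(leaf)
    return {parent: sorted(leaves) for parent, leaves in groups.items()}


if __name__ == "__main__":
    stats = collect_paths(DOCS)
    for parent, leaves in sorted(group_by_parent(stats).items()):
        print(parent, leaves)
```

The document frequency kept in `stats` could, for instance, be used to filter out elements that occur too rarely to be useful for analysis; whether the actual rules use such a criterion is not stated in this text.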

This paper is organized as follows. Section 2 presents related work on the multidimensional modeling of documents. In Section 3, we give an overview of our approach for building a document warehouse schema compliant with the Diamond model. Section 4 describes our rules for generating a Diamond multidimensional schema adapted to the OLAP analysis of XML documents. Section 5 shows the functionalities of the Diamond-Gen software prototype. Finally, Section 6 concludes the paper and gives an overview of our ongoing work.
