OLAP over XML Data

Alfredo Cuzzocrea
Copyright: © 2014 |Pages: 9
DOI: 10.4018/978-1-4666-5202-6.ch150

Introduction

Data cubes (Gray et al., 1997) are widely regarded as powerful tools for OnLine Analytical Processing (OLAP) (Codd et al., 1993). Based on a multidimensional and multi-resolution view of data, data cubes can support a wide range of analysis methodologies for decision making in application scenarios ranging from Data Warehousing (DW) to Decision Support Systems (DSS) and Business Intelligence (BI). The success of OLAP analysis methodologies on the multidimensional data arising in these scenarios has given rise to a solid and well-developed technology (Chaudhuri & Dayal, 1997), with a strong following in both the academic and industrial research communities.

Symmetrically, eXtensible Markup Language (XML) has become very popular in current Enterprise Information Systems, thanks to its well-understood capabilities. Chief among these is its ability to capture and model typical enterprise data, which are inherently semi-structured. Furthermore, XML is also a dominant data integration language and formalism, because it can act as a lingua franca for heterogeneous data schemas and formats.

The marriage between OLAP and XML technologies was therefore easy to foresee. On one side, enterprise data are increasingly prominent in next-generation application scenarios where heterogeneity of data and software platforms is the main issue to be faced, such as Web and Grid Service-based Systems, P2P Networks, Business Process Management Systems, and so forth. On the other side, OLAP methodologies can support critical analyses over the multidimensional semi-structured data sets underlying these environments, which are characterized by large volumes, high dimensionality, strong heterogeneity, and high correlation. Relevant examples of such analyses include decision making, trend analysis, time series analysis, and analytics.

From the convergence of OLAP and XML technologies, a challenging research issue derives: “How to compute an OLAP data cube over XML data?” This problem can be formalized as follows. Given an XML data source X and an OLAP logical schema modeling a data cube, compute from the elements in X the set of SQL-based aggregations that populate the data cells of that cube. This implies that a certain SQL aggregate operator is given as input. Popular aggregate operators that are widely used in OLAP are:

  1. SUM, which returns the sum of a set of (numerical) XML elements stored in X;
  2. COUNT, which returns the number of elements in a set of (numerical or categorical) XML elements stored in X;
  3. AVG, which returns the average value of a set of (numerical) XML elements stored in X.
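The problem statement above can be illustrated with a minimal sketch. It is not the chapter's algorithm: it simply groups the elements of a small XML source by a set of dimensions and applies one aggregate operator per group, using Python's standard `xml.etree.ElementTree` module. The XML layout, the element names (`sale`, `region`, `year`, `amount`), and the function name `compute_cube` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML data source X; the element names are illustrative only.
XML_SOURCE = """
<sales>
  <sale><region>EU</region><year>2013</year><amount>100.0</amount></sale>
  <sale><region>EU</region><year>2013</year><amount>50.0</amount></sale>
  <sale><region>EU</region><year>2014</year><amount>70.0</amount></sale>
  <sale><region>US</region><year>2013</year><amount>30.0</amount></sale>
</sales>
"""

# The three aggregate operators named in the text. Each receives the raw
# text values of the measure elements that fall into one data cell.
AGGREGATES = {
    "SUM": lambda values: sum(float(v) for v in values),
    "COUNT": lambda values: len(values),  # also works for categorical values
    "AVG": lambda values: sum(float(v) for v in values) / len(values),
}


def compute_cube(xml_text, fact_path, dimensions, measure, operator):
    """Return a dict mapping each dimension-value tuple (a data cell)
    to the aggregate of the measure over the matching XML elements."""
    root = ET.fromstring(xml_text.strip())
    cells = defaultdict(list)
    for fact in root.iterfind(fact_path):
        # The cell coordinates are the texts of the dimension elements.
        key = tuple(fact.findtext(dim) for dim in dimensions)
        cells[key].append(fact.findtext(measure))
    aggregate = AGGREGATES[operator]
    return {key: aggregate(values) for key, values in cells.items()}


cube_sum = compute_cube(XML_SOURCE, "sale", ("region", "year"), "amount", "SUM")
cube_count = compute_cube(XML_SOURCE, "sale", ("region", "year"), "amount", "COUNT")
cube_avg = compute_cube(XML_SOURCE, "sale", ("region", "year"), "amount", "AVG")

print(cube_sum[("EU", "2013")])    # 150.0
print(cube_count[("EU", "2013")])  # 2
print(cube_avg[("EU", "2013")])    # 75.0
```

The sketch assumes a flat, regular XML structure in which every fact element carries all dimensions and the measure. Real XML sources are often irregular or deeply nested, which is part of what makes the problem challenging.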

Key Terms in this Chapter

Data Warehousing: The practice of building a central repository of current and historical data by integrating data from heterogeneous sources.

SQL Aggregate Operator: A SQL operator that applies an aggregate function to a set of tuples, grouped according to certain criteria, to produce a single value for each group.

Business Intelligence: A set of theories, methodologies, architectures, and technologies that transform raw data into meaningful and useful information and knowledge for business purposes, by handling large amounts of both structured and unstructured data.

Data Cube: A multidimensional dataset used to explore and analyze business data from many different perspectives.

On-Line Analytical Processing (OLAP): A set of software techniques for the interactive analysis of large amounts of multidimensional data from multiple perspectives.

XML: Markup language designed for exchanging data on the World Wide Web.

Decision Support Systems: A class of systems that, based on Data Mining and OLAP methodologies, provide support to decision makers of large business organizations.
