Formalizing the Mapping of UML Conceptual Schemas to Column-Oriented Databases

Fatma Abdelhedi, Amal Ait Brahim, Gilles Zurfluh
Copyright © 2018 | Pages: 25
DOI: 10.4018/IJDWM.2018070103

Abstract

Nowadays, most organizations need to improve their decision-making processes using Big Data. To achieve this, they have to store Big Data, analyze it, and transform the results into useful and valuable information. Doing so raises new challenges in designing and creating data warehouses. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. The rise of Big Data has challenged this traditional approach, primarily because of the changing nature of data. As a result, using NoSQL databases has become a necessity for handling Big Data. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure an efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors evaluate their approach through experiments on a case study in the health care field.

1. Introduction

Typically, a decision support system is based on two components: a Data Warehouse (DW) and one or more Data Marts (DMs) (Figure 1). The DW is a database used for decision-making, in which data are either gathered from existing sources or entered directly to meet the needs of a decision-support application (for the latter case, see the medical application presented in the Motivation section). Starting from the DW, we extract subsets, called Data Marts, on which OLAP operations are applied. DMs are designed according to a multidimensional model (star schema, snowflake schema, or fact constellation schema) (Teste, 2010) in order to meet the particular demands of a specific group of decision makers. In contrast, the DW is not directly accessible to decision makers; there is therefore no need to describe it with a multidimensional model, and the relational model has historically been the most effective model for this purpose.
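
To make this multidimensional vocabulary concrete, the following minimal Python sketch outlines a star schema: a central fact table whose keys reference a set of dimension tables. The table and attribute names here are invented for illustration; the paper's case study concerns medical data.

# Illustrative star schema for a hypothetical visits data mart.
# All names below are assumptions made for this sketch only.

dimensions = {
    "dim_time":    ["time_id", "day", "month", "year"],
    "dim_patient": ["patient_id", "name", "age"],
    "dim_service": ["service_id", "label"],
}

# The fact table holds one foreign key per dimension, plus the measures
# on which OLAP operations (roll-up, drill-down, slice) are applied.
fact_visits = {
    "keys":     ["time_id", "patient_id", "service_id"],
    "measures": ["cost", "duration"],
}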

The influence of Big Data has challenged this traditional approach, which relies on relational databases for data warehousing. This is primarily because data have become highly distributed and loosely structured and are growing at exponential rates. The concept of Big Data is usually characterized by Volume, Variety, and Velocity, known as the 3Vs (Douglas, 2001): Volume is the size of the data set that needs to be processed; Variety describes the different data types involved, including format, structure, and sources; and Velocity refers to the speed at which data can be analyzed and processed. Most organizations need to improve their decision-making processes using Big Data. To achieve this, they have to store Big Data, analyze it, and transform the results into useful and valuable information. Carrying out these storage and analysis processes requires dealing with new challenges in designing and creating the DW.

Indeed, the database used for data warehousing must satisfy several new requirements. It should be able to: (1) integrate all possible data structures, (2) combine multiple data sources, (3) scale at relatively low cost, and (4) analyze large volumes of data. Relational warehouses are a mature data management technology; however, with the rise of Big Data, these systems have become unfit for large-scale, distributed data management. The major problems of relational technologies are: (1) horizontal scaling: relational databases were mainly designed for single-server configurations, so scaling a relational database means distributing it across multiple powerful, and therefore expensive, servers; furthermore, handling tables spread across different servers is difficult. (2) a strict data model that must be designed prior to data processing: in a Big Data context, it should be easy to add and analyze new data regardless of its type (structured, semi-structured, or unstructured), but relational schemas are hard to change incrementally without impacting performance or taking the database offline (a toy illustration follows). As a result, a new kind of DBMS, known as “NoSQL” (Cattell, 2011), has appeared. NoSQL databases are well suited to managing large volumes of data, and they maintain good performance when scaling up (Angadi, 2013). Using NoSQL for data warehousing has become a necessity for a number of reasons, mainly relating to the high performance provided by these systems (Herrero, 2016).
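
As a rough illustration of point (2), the Python toy below (not the API of any real NoSQL DBMS) mimics a column family in which each row is an independent set of (column, value) pairs, so new columns can appear in later rows without altering a declared schema or taking the store offline.

# Toy column-family store: rows are keyed dictionaries of columns.
# This is a didactic sketch, not HBase or Cassandra code.

column_family = {}

def put(row_key, columns):
    # Insert or update a row; columns need no prior declaration.
    column_family.setdefault(row_key, {}).update(columns)

put("patient:1", {"name": "Alice", "age": 54})
# Later, a new data source contributes a column that earlier
# rows never declared; existing rows are left untouched:
put("patient:2", {"name": "Bob", "radiology_report": "ref.pdf"})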

This work deals with creating a DW in a Big Data context and is motivated by the needs of a medical application. This application generates a continuous stream of complex data (patient histories, visit summaries, paper prescriptions, radiology reports, etc.) that is entered directly into a DW (§2). Describing this DW requires a conceptual data model close to human thinking; the choice for such a model has been UML (Abello, 2015). Our purpose is to assist developers in creating the DW on a NoSQL database. To this end, we propose an automatic process that transforms the UML conceptual model describing the DW into a NoSQL model.
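
The Python sketch below suggests the flavor of such a transformation under a deliberately simplified rule: one UML class becomes one column-family table, and each attribute becomes a column. The class, the surrogate row key, and the mapping rule itself are assumptions made for illustration; the actual Object2NoSQL rules, including the handling of associations and the intermediate platform-independent logical model, are detailed later in the paper.

# Simplified sketch of a UML-class-to-column-family mapping.
# The rule shown here is an assumed simplification of the process.

from dataclasses import dataclass

@dataclass
class UmlClass:
    name: str
    attributes: list  # [(attribute_name, uml_type), ...]

def to_column_family(cls: UmlClass) -> dict:
    # Map one UML class to a column-oriented table description.
    return {
        "table": cls.name,
        "row_key": "id",  # assumed surrogate key
        "columns": {attr: typ for attr, typ in cls.attributes},
    }

patient = UmlClass("Patient", [("name", "String"), ("birthDate", "Date")])
print(to_column_family(patient))
# {'table': 'Patient', 'row_key': 'id',
#  'columns': {'name': 'String', 'birthDate': 'Date'}}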
