Design and Implementation of Active Stream Data Warehouses

Sandro Bimonte (National Research Institute of Science and Technology for Environment and Agriculture, Aubière, France), Omar Boussaid (ERIC Lyon2, Bron, France), Michel Schneider (LIMOS, Aubière, France) and Fabien Ruelle (ERIC Lyon2, Bron, France)
Copyright: © 2019 | Pages: 21
DOI: 10.4018/IJDWM.2019040101

Abstract

In the era of Big Data, more and more stream data are available. At the same time, Decision Support Systems (DSSs), such as data warehouses and alert systems, are becoming increasingly sophisticated, so conceptual modeling tools are mandatory for successful DSS projects. Formalisms such as UML and ER have been widely used for classical information systems and data warehouses, but they have not yet been investigated for stream data warehouses that deal with alert systems. Therefore, in this article, the authors introduce the notion of the Active Stream Data Warehouse (ASDW) and propose a UML profile for designing ASDWs. This profile extends the ICSOLAP profile to take into account continuous and window OLAP queries. Moreover, the article studies the duality between the stream and OLAP decision-making processes, and the authors propose a set of Event-Condition-Action (ECA) rules to automatically trigger OLAP operators. The UML profile is implemented in a new OLAP architecture and validated with an environmental case study concerning wind monitoring.

1. Introduction

Decision-support systems are tools that make it possible to extract useful information from data. In particular, Business Intelligence (BI) systems, such as OLAP and data mining tools, are a particular kind of DSS that allows decision-makers to explore warehoused data in order to aggregate it and reveal unknown trends. BI tools are commonly applied in domains such as marketing and agriculture, among others. Alert systems are DSSs that inform users about particular trends or outliers in the data. Monitoring risks such as fire and flood, as well as detecting fraud, are classical applications of alert systems.

Nowadays, the variety and multitude of available data and Decision Support Systems (DSSs) can make it difficult to identify the right solution for a given application. Consequently, the conceptual design of BI systems is increasingly useful since, as widely demonstrated, conceptual models allow users to focus exclusively on functional requirements and to set aside technological issues (Torlone, 2002). Moreover, when conceptual models are implemented in Computer-Aided Software Engineering (CASE) tools, they allow for automatic and error-free implementations, which can translate into significant savings of time and money. Motivated by the relevance of the conceptual design of BI applications (Bimonte et al., 2016), in this work we focus on data warehouses (DWs) and stream data.

Data warehouse and On-Line Analytical Processing (OLAP) tools are technologies intended to support decision making (Kimball, 1996). Data warehouses are collections of historical data that naturally evolve over time. Several works address multidimensional data and structures that change over time, from both the theoretical and the implementation points of view (Golfarelli & Rizzi, 2009). Stream data are a particular kind of temporal data: they have been defined as a real-time, continuous, ordered sequence of items (Golab & Özsu, 2003). Consequently, it is impossible to control the order in which items arrive or to store the data stream in its entirety. Queries on data streams run continuously over a period of time and incrementally return results as new data arrive. In the context of Big Data, more and more stream data sources are available, such as web clicks, sensor networks, and mobile phones. These sources produce huge quantities of data in a real-time and continuous way. Data streams therefore require particular data models, query languages, and tools, namely Data Stream Management Systems (DSMSs) (Esper, 2017; Golab & Özsu, 2003), for their storage and analysis.
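To make the idea of a continuous query more concrete, the following Python sketch (not taken from the article) computes a running mean that is updated each time a new item arrives. It keeps only a count and a running total, so the stream itself is never stored. The function name `continuous_mean` and the wind-speed values are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple


def continuous_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Continuous query: emit (items seen so far, running mean) on every arrival.

    Only a count and a total are kept in memory, so the stream is never stored
    and the input may be an unbounded generator.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        # A new, updated result is returned incrementally for each item.
        yield count, total / count


if __name__ == "__main__":
    readings = [4.0, 6.0, 8.0, 2.0]  # hypothetical wind-speed readings (m/s)
    for n, mean in continuous_mean(readings):
        print(f"after {n} items: mean = {mean:.2f}")
```

Because the function is a generator, each result is available as soon as the corresponding item has been consumed, which mirrors the incremental behaviour described above rather than a one-shot query over a stored table.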
The major difference between Database Management Systems (DBMSs) and DSMSs is that the latter allow queries not only on stored data but also on data streamed from several sources. In addition, query evaluation and results are triggered by the arrival of new data. Because data continuously come and go in DSMSs, new kinds of queries are defined using the concept of the window (Golab & Özsu, 2003), which defines a buffer bounded either by time or by the number of received items. Windows can also slide, that is, advance as time passes or as new items arrive (sliding windows). Because of these features, alert systems are usually implemented using DSMSs.
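The two kinds of sliding windows can be sketched in Python as follows. This is an illustrative assumption, not the authors' implementation: `count_window_avg` keeps the last `size` items, `time_window_avg` keeps the items whose timestamps fall within the last `span` time units (assuming timestamps arrive in non-decreasing order and `span > 0`), and `window_alerts` is a hypothetical threshold rule of the kind an alert system might evaluate on each window result.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def count_window_avg(stream: Iterable[float], size: int) -> Iterator[float]:
    """Count-based sliding window: average of the last `size` items, emitted on each arrival."""
    window: Deque[float] = deque(maxlen=size)  # appending beyond maxlen evicts the oldest item
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


def time_window_avg(
    stream: Iterable[Tuple[float, float]], span: float
) -> Iterator[Tuple[float, float]]:
    """Time-based sliding window over (timestamp, value) pairs.

    Assumes non-decreasing timestamps and span > 0. After an arrival at time t,
    only items with timestamps in (t - span, t] are kept.
    """
    window: Deque[Tuple[float, float]] = deque()
    for ts, value in stream:
        window.append((ts, value))
        # Evict expired items; the newest item always survives because span > 0.
        while window[0][0] <= ts - span:
            window.popleft()
        yield ts, sum(v for _, v in window) / len(window)


def window_alerts(results: Iterable[float], threshold: float) -> Iterator[Tuple[int, float]]:
    """Hypothetical alert rule: report (position, value) when a window result exceeds threshold."""
    for position, value in enumerate(results):
        if value > threshold:
            yield position, value


if __name__ == "__main__":
    speeds = [2.0, 4.0, 6.0, 8.0]  # illustrative readings
    print(list(count_window_avg(speeds, size=2)))  # [2.0, 3.0, 5.0, 7.0]
    timed = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
    print(list(time_window_avg(timed, span=3.0)))  # [(0.0, 10.0), (1.0, 15.0), (2.0, 20.0), (5.0, 40.0)]
    print(list(window_alerts(count_window_avg(speeds, size=2), threshold=4.0)))  # [(2, 5.0), (3, 7.0)]
```

For clarity, the sums are recomputed on every arrival; keeping a running total that is adjusted on insertion and eviction would make each update constant-time, which is closer to what a DSMS would typically do.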

Warehousing and OLAPing stream data (Stream Data Warehouses) have been broadly investigated in the academic and industrial communities (Golab et al., 2009; Quartet, 2017). Indeed, Stream Data Warehouses (SDWs) have been used effectively in several application domains, such as social network analysis and fraud detection. Some works investigate the definition of OLAP operations on stream data (Cuzzocrea & Chakravarthy, 2010). These works study the feasibility of computing aggregations on data streams by providing new query languages, indexing methods, compression techniques, and system architectures. However, despite the relevance of conceptual design, to the best of our knowledge, no work has investigated the conceptual modeling of stream data warehouses.
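As a minimal illustration of aggregation over a stream along a dimension hierarchy, the following Python sketch maintains SUM and COUNT incrementally for each (station, tumbling time window) cell and rolls them up to a coarser level. The class `StreamCube`, the station-to-region hierarchy, and the sample values are hypothetical; the sketch does not reproduce any of the cited systems or the authors' architecture.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class StreamCube:
    """Incrementally maintained SUM/COUNT per (station, tumbling window) cell (hypothetical)."""

    def __init__(self, width: float) -> None:
        self.width = width  # window length, in the same unit as the timestamps
        # cell key -> [sum, count]; a new cell starts at [0.0, 0]
        self.cells: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0])

    def insert(self, ts: float, station: str, value: float) -> None:
        # Each arriving item updates exactly one cell; the raw item is not kept.
        cell = self.cells[(station, int(ts // self.width))]
        cell[0] += value
        cell[1] += 1

    def roll_up(self, hierarchy: Dict[str, str]) -> Dict[Tuple[str, int], float]:
        """Roll-up station -> region. SUM and COUNT are distributive, so cells merge exactly."""
        merged: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0])
        for (station, window), (total, count) in self.cells.items():
            target = merged[(hierarchy[station], window)]
            target[0] += total
            target[1] += count
        # AVG is derived from the merged SUM and COUNT (it is algebraic, not distributive).
        return {key: total / count for key, (total, count) in merged.items()}


if __name__ == "__main__":
    cube = StreamCube(width=10.0)
    events = [(1.0, "S1", 4.0), (3.0, "S2", 6.0), (12.0, "S1", 8.0), (15.0, "S3", 2.0)]
    for ts, station, value in events:
        cube.insert(ts, station, value)
    print(cube.roll_up({"S1": "North", "S2": "North", "S3": "South"}))
    # {('North', 0): 5.0, ('North', 1): 8.0, ('South', 1): 2.0}
```

Storing SUM and COUNT rather than AVG is the design choice that makes the roll-up exact: averages of averages would be wrong whenever stations contribute different numbers of readings to a window.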