Towards Convergence in Information Systems Design

Copyright: © 2020 |Pages: 17
DOI: 10.4018/978-1-7998-2975-1.ch011

Abstract

Three technologies (business intelligence, big data, and machine learning) developed independently and address different types of problems. Data warehouses have been used as systems for business intelligence, and NoSQL databases are used for big data. In this chapter, the authors explore the convergence of business intelligence and big data. Traditionally, a data warehouse is implemented on a ROLAP or MOLAP platform. Whereas MOLAP suffers from having a proprietary architecture, ROLAP suffers from the inherent disadvantages of an RDBMS. To mitigate the drawbacks of ROLAP, the authors propose implementing a data warehouse on a NoSQL database, and they choose Cassandra as that database. They start by identifying a generic information model that captures the requirements of the system to-be. They then propose mapping rules that map the components of the information model to the Cassandra data model. Finally, they show a small implementation using an example.

Introduction

Business Intelligence (BI), Big Data, and Machine Learning (ML) are three of the major technological developments of the last 15 years. Business Intelligence encompasses querying, reporting, and data mining in the context of providing decision support. It is based on Data Warehouse (DW) technology. Traditionally, DW star schemas are implemented either on a relational database, which allows ROLAP operations, or on a multidimensional database (MDB), which allows MOLAP operations. In the former the data is stored in relational tables; in the latter it is stored in an MDB, which uses either multidimensional arrays or hypercubes to hold the data. A number of RDBMSs offer support for building DW systems and for ROLAP queries. MOLAP engines, by contrast, have proprietary architectures. This results in niche servers and is often a disadvantage.
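The difference between the two storage styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales data, the dimension names, and the helper functions `rolap_total_by_product` and `build_cube` are hypothetical and do not come from the chapter. The first function scans relational-style fact rows and aggregates at query time; the second pre-aggregates the same rows into a dense two-dimensional array, as a multidimensional store might.

```python
from collections import defaultdict

# Hypothetical star schema: two dimension tables and one fact table.
product_dim = {1: "laptop", 2: "phone"}
region_dim = {10: "north", 20: "south"}
sales_fact = [  # (product_id, region_id, amount)
    (1, 10, 500.0),
    (1, 20, 300.0),
    (2, 10, 200.0),
    (1, 10, 100.0),
]


def rolap_total_by_product(fact_rows, products):
    """ROLAP style: scan the fact rows, join to the dimension, group and sum."""
    totals = defaultdict(float)
    for product_id, _region_id, amount in fact_rows:
        totals[products[product_id]] += amount
    return dict(totals)


def build_cube(fact_rows, products, regions):
    """MOLAP style: pre-aggregate into a dense array indexed by dimension position."""
    p_index = {pid: i for i, pid in enumerate(sorted(products))}
    r_index = {rid: j for j, rid in enumerate(sorted(regions))}
    cube = [[0.0] * len(r_index) for _ in p_index]
    for pid, rid, amount in fact_rows:
        cube[p_index[pid]][r_index[rid]] += amount
    return cube, p_index, r_index


rolap_totals = rolap_total_by_product(sales_fact, product_dim)
cube, p_index, r_index = build_cube(sales_fact, product_dim, region_dim)

# Rolling up the region axis of the cube gives the same per-product totals.
molap_totals = {product_dim[pid]: sum(cube[i]) for pid, i in p_index.items()}
```

Both routes give the same per-product totals. The relational route computes them at query time from rows, whereas the array route reads them from cells that were filled in advance.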

One of the early views of Big Data is that any data satisfying the properties of Velocity, Volume, and Variety is big data; this was later expanded to include Veracity. Based on this definition, there are two major concerns: (a) building a repository for the storage of large amounts of data, and (b) accommodating a variety of data. To address (a), there was a shift away from vertical scaling towards horizontal scaling. Unlike vertical scaling, horizontal scaling is done using commodity machines, and it leads to a repository of data that is distributed across nodes and datacenters. To address (b), note that variety includes structured, semi-structured, and unstructured data. Traditional relational databases are able to store structured data, and they can store unstructured data as a BLOB, but a BLOB does not allow the full range of querying and processing. Thus, a new database model and architecture was required that also provided horizontal scaling. The answer was found in NoSQL databases.
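As a rough illustration of horizontal scaling, the sketch below spreads rows across a configurable number of nodes by hashing each row key. It is a minimal sketch, assuming simple hash-modulo placement; the names `node_for_key` and `distribute` and the customer data are hypothetical, and production NoSQL systems generally use more elaborate partitioning schemes than this.

```python
import hashlib


def node_for_key(key: str, node_count: int) -> int:
    """Map a row key to a node index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(rows: dict, node_count: int) -> list:
    """Place every row on the node chosen by its key; each node is a plain dict."""
    nodes = [dict() for _ in range(node_count)]
    for key, value in rows.items():
        nodes[node_for_key(key, node_count)][key] = value
    return nodes


# Hypothetical data set: 1000 customer rows.
rows = {f"customer-{i}": {"orders": i} for i in range(1000)}

# Scaling out means adding a machine, not buying a bigger one.
three_nodes = distribute(rows, 3)
four_nodes = distribute(rows, 4)

# A read only needs to contact the node that owns the key.
owner = node_for_key("customer-42", 3)
found = three_nodes[owner]["customer-42"]
```

With plain modulo placement, changing the node count moves many keys to a different node, which is one reason distributed stores tend to prefer schemes that limit such movement.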

The third technological development is Machine Learning (ML). This area develops and applies algorithms that enable a system to learn. Note that the system learns by itself, without any additional explicit program being written. This may be done by learning patterns or inference rules. The aim of this learning is to gain insights and improve the user experience. ML algorithms make no commitment to data storage and management.

If we compare the three technologies from the query viewpoint, we find that BI is oriented towards providing business information, Big Data systems improve the execution of queries over unstructured and distributed data, and ML improves the quality of data in the hands of the user. The first relies on the explicit data storage and architecture of a data warehouse, and the second relies on NoSQL data storage and architecture, whereas ML de-emphasizes the data aspects and deals almost exclusively with the processing aspects. These three technologies thus reflect the tension between data orientation and process orientation in information systems, with BI and Big Data at the data end and ML at the process end.

Figure 1. The three technological islands

The three technologies developed independently and at different times (see Figure 1): BI was the earliest, followed by Big Data and ML, which were developed at almost the same time. Insofar as they address different domains, these three technologies are isolated from one another. Yet there is no reason why they could not benefit from cross-fertilization. Indeed, there is a case for the convergence of the three.

ML algorithms depend on "lots of data" to run effectively. In fact, they need not only bulk storage of data but also historical data. For example, they may need voice data from the last 4 years for analysis. Further, they require a system that enables quick random "reads". According to a white paper (TDWI, 2018), there are eight requirements for the storage of ML data, among them scalability, durability, and a parallel architecture. These requirements are satisfied by a Big Data system. The input to an ML algorithm can be unstructured or structured data. Outputs are typically smaller, and output storage can often be handled easily.
