View Materialization Over Big Data

Akshay Kumar, T. V. Vijay Kumar
Copyright: © 2021 |Pages: 25
DOI: 10.4018/IJDA.2021010103

Abstract

Advances in technology have resulted in the generation of a large volume of heterogeneous big data for large enterprises engaged in e-commerce, healthcare, education, etc. This data is being created at a rapid rate but is low in veracity. It includes large sets of semi-structured and unstructured data and is stored over a distributed file system (DFS). It can be processed in a fault-tolerant manner using several frameworks, tools, and advanced database technologies. Big data can provide important information that can be used for business decision making. View materialization, which has been widely studied for structured databases and data warehouses, has been extended to big data to enhance the efficiency of big data query processing. This paper focuses on the selection of big data views for materialization. The big data views can be identified by extracting a set of query attributes from the query workload of an enterprise. The query attributes are interrelated, which results in alternate access paths for query evaluation. The cost of query processing using big data views involves the integrity of different data types of heterogeneous big data, the frequency of queries, changes in the size of big data, the selected sets of big data materialized views, and updates on big data and on these sets of materialized views. The cost of query processing is computed using the stored size of big data views on the DFS system, which is a consistent processing framework of DFS. A big data view selection algorithm that is capable of selecting views from structured, semi-structured, and unstructured data is proposed in this paper. The proposed algorithm selects big data views that result in faster processing of most user queries, and thus in more efficient decision making.
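
To make the idea of workload-driven view selection concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes that each query is described only by its set of attributes and its frequency, that a view can answer a query when the view contains all of the query's attributes, that the cost of answering a query equals the stored size of the smallest selected view that covers it (or the size of the base data otherwise), and that views are chosen greedily under a storage budget. The function names, the view sizes, and the budget are all hypothetical.

```python
def query_frequencies(workload):
    """Aggregate the frequency of each distinct query attribute set."""
    freq = {}
    for attrs, count in workload:
        key = frozenset(attrs)
        freq[key] = freq.get(key, 0) + count
    return freq


def total_cost(freq, selected, sizes, base_size):
    """Frequency-weighted cost of the workload for a given set of views.

    Assumption: a query costs the stored size of the smallest selected
    view containing all of its attributes, else the base data size.
    """
    cost = 0
    for query, count in freq.items():
        covering = [sizes[v] for v in selected if query <= v]
        cost += count * min(covering, default=base_size)
    return cost


def select_views(workload, sizes, base_size, budget):
    """Greedily pick views with the best cost reduction per unit of storage.

    `sizes` maps each candidate view (a frozenset of attributes) to its
    stored size, which is assumed to be positive.
    """
    freq = query_frequencies(workload)
    selected, remaining = [], budget
    while True:
        current = total_cost(freq, selected, sizes, base_size)
        best, best_gain = None, 0.0
        for view, size in sizes.items():
            if view in selected or size > remaining:
                continue
            saved = current - total_cost(freq, selected + [view], sizes, base_size)
            gain = saved / size
            if gain > best_gain:
                best, best_gain = view, gain
        if best is None:  # nothing fits the budget or nothing helps
            return selected
        selected.append(best)
        remaining -= sizes[best]


# Hypothetical workload: (query attributes, query frequency).
workload = [({"user", "product"}, 50), ({"user"}, 30), ({"product", "vendor"}, 5)]
# Hypothetical stored sizes of the candidate views, in arbitrary units.
sizes = {
    frozenset({"user", "product"}): 40,
    frozenset({"user"}): 10,
    frozenset({"product", "vendor"}): 20,
}
chosen = select_views(workload, sizes, base_size=100, budget=50)
```

The sketch ignores update and maintenance costs, data heterogeneity, and changes in data size, all of which the abstract lists as factors in the cost of query processing.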

2 The Big Data Architecture For Large Data Stores

Data does not become Big merely because it describes many thousands of entities; it is the repetitive data generated by those entities that makes data Big (Jacobs, 2009). For example, the number of users, products, vendors, etc. of an e-commerce web site may not make its data Big, but storing the transactions and other actions performed by these users over the e-commerce application does. Jacobs (2009) suggested that a database of more than 100 GB that involves joins on non-key attributes will potentially require very large computing resources and, therefore, cannot be considered small data. Big data is also heterogeneous, as it includes structured data (relational data); semi-structured data (XML or similar object data); unstructured data (text, voice, audio); data from the web (social media, blogs, web logs, click streams); spatial data (coordinates, GPS data); data from sensors and RFID; and scientific data.
