Multi-Objective Big Data View Materialization Using NSGA-II

Akshay Kumar, T. V. Vijay Kumar

Source Title: Information Resources Management Journal (IRMJ) 34(2)

DOI: 10.4018/IRMJ.2021040101


Abstract

Big data views, in the context of a distributed file system (DFS), are defined over voluminous structured, semi-structured and unstructured data, with the purpose of reducing the response time of queries over Big data. As semi-structured and unstructured data in Big data are very large compared to structured data, a framework based on the query attributes on Big data can be used to identify Big data views. Materializing Big data views can improve query response time and facilitate efficient distribution of data over a DFS-based application. Since not all Big data views can be materialized, a subset of them must be selected for materialization. The purpose of view selection is to improve query response time subject to resource constraints. The Big data view materialization problem has been defined as a bi-objective problem with two objectives: minimization of the query evaluation cost and minimization of the update processing cost, subject to a constraint on the total size of the materialized views. This paper addresses the problem using the multi-objective genetic algorithm NSGA-II. The experimental results show that the proposed NSGA-II based Big data view selection algorithm is able to select reasonably good quality views for materialization.


1 Introduction

In this era of information, knowledge and wisdom, informed decisions are taken using processed data presented in various visual forms; such data form the backbone of the Information Society. Developments in information and communication technology, IoT devices (Cai et al., 2016; Qin et al., 2016), medical devices, Internet technologies etc. have resulted in the generation of large volumes of Big data. Big data is available in structured, semi-structured and unstructured forms and formats. It is very large in volume and is generated continuously at a rapid rate, but has low integrity (Jacobs, 2009; Zikopoulos et al., 2011; Gupta et al., 2012; Kumar & Vijay Kumar, 2015; Tsai et al., 2015; Zhang et al., 2016). Big data is heterogeneous in nature and is collected for a specific application. Some of the major applications of Big data are in e-commerce, healthcare, scientific applications, education, social welfare (Global Pulse, 2012), IoT (Ahmed et al., 2017) etc. Big data needs to be analyzed in real time to support timely decisions. For example, a large university may use Big data analytics to plan its future infrastructure development from Big data consisting of student intake in past years, geographic data on student residential locations, the success rate of students, the effectiveness of past publicity interventions, future demand for newer curricula etc. A large amount of data, especially semi-structured and unstructured data, is generated for such systems. Processing such data is time consuming and requires enhanced techniques for faster processing. View materialization is one such technique.

View materialization is a complex problem, as there can be a very large number of possible views, but only a few of these can be materialized due to the storage space constraint. The view selection problem aims at identifying a set of views that optimizes query response time in the presence of continuous data updates, while utilizing minimal resources. It is an NP-hard problem (Harinarayan et al., 1996; Chirkova et al., 2001). With the emergence of Big data, the selection of views for materialization has to address additional issues arising from large data volume, continuous heterogeneous data and the integrity of data. In addition, the data processing paradigm has shifted from structured data to semi-structured and unstructured data, which has also changed the data processing framework. The newer frameworks involve distributed file systems (DFS) (Hadoop, 2008, 2012; Manyika, 2011), Apache Hadoop (Dezyre, 2015), map-reduce (Dean & Ghemawat, 2012; Hadoop, 2008, 2012), cloud map-reduce (Dahiphale et al., 2014) and a rich set of newer database and data warehouse technologies such as NoSQL, Hive, BigTable and Neo4j (Kumar & Vijay Kumar, 2021a).
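The selection problem described above can be illustrated with a minimal Python sketch. A candidate solution is encoded as a 0/1 vector over the candidate views and is evaluated against a query cost, an update cost and a storage limit. The `View` fields, the additive cost model and all numbers are hypothetical and chosen only for illustration; they are not the cost model of the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class View:
    name: str
    size: int          # storage needed if materialized (hypothetical units)
    query_saving: int  # reduction in query evaluation cost (hypothetical)
    update_cost: int   # cost of keeping the view up to date (hypothetical)


def evaluate(selection, views, base_query_cost, size_limit):
    """Return (query_cost, update_cost) for a 0/1 selection,
    or None if the selected views exceed size_limit."""
    chosen = [v for bit, v in zip(selection, views) if bit]
    if sum(v.size for v in chosen) > size_limit:
        return None
    query_cost = base_query_cost - sum(v.query_saving for v in chosen)
    update_cost = sum(v.update_cost for v in chosen)
    return query_cost, update_cost


views = [
    View("V1", size=4, query_saving=30, update_cost=5),
    View("V2", size=3, query_saving=20, update_cost=2),
    View("V3", size=5, query_saving=35, update_cost=9),
]

# Exhaustive enumeration is viable only for toy inputs:
# n candidate views give 2**n possible selections.
feasible = {}
for selection in product((0, 1), repeat=len(views)):
    costs = evaluate(selection, views, base_query_cost=100, size_limit=8)
    if costs is not None:
        feasible[selection] = costs
```

With three candidate views there are only eight selections to check, but the count doubles with every additional view, which is why heuristic and evolutionary search is commonly applied to this problem.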

Big data view materialization was studied in the context of Hive databases on a standard dataset, using the map-reduce cost of queries and views (Goswami et al., 2017). However, that work was an extension of view materialization for data warehouses to Big data warehouses and did not incorporate Big data characteristics when computing the fitness values of the objective functions. Kumar & Vijay Kumar (2021a) presented the Big data view materialization problem as a single-objective optimization problem in the context of Big data characteristics, and suggested using the number of DFS blocks of stored data to compute the fitness values of the objective functions. The Big data view selection problem was presented as a bi-objective optimization problem in (Kumar & Vijay Kumar, 2021b), where it was solved using the Vector Evaluated Genetic Algorithm (VEGA). This paper addresses the same bi-objective optimization problem using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al., 2002; Deb, 2014).
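NSGA-II (Deb et al., 2002) ranks candidate solutions by non-dominated sorting and prefers less crowded solutions within a front. The following Python sketch illustrates these two ingredients for a minimization problem with two costs per solution. It is not the authors' implementation: the cost pairs are invented, and the fronts are obtained by simple repeated peeling of the non-dominated set rather than by the bookkeeping procedure of the original algorithm.

```python
def dominates(a, b):
    """True if cost vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(costs):
    """Partition the indices of costs into fronts; front 0 is non-dominated."""
    remaining = set(range(len(costs)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(costs[j], costs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(costs, front):
    """Crowding distance of each index in one front.
    Boundary solutions of every objective receive infinity."""
    distance = {i: 0.0 for i in front}
    for m in range(len(costs[front[0]])):
        ordered = sorted(front, key=lambda i: costs[i][m])
        lo, hi = costs[ordered[0]][m], costs[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: no spread to measure on this objective
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[cur] += (costs[nxt][m] - costs[prev][m]) / (hi - lo)
    return distance


# Hypothetical (query cost, update cost) pairs for five candidate selections.
costs = [(50, 7), (45, 11), (80, 2), (70, 5), (65, 9)]
fronts = non_dominated_fronts(costs)
first_front_distance = crowding_distance(costs, fronts[0])
```

In this toy data the solution (65, 9) is dominated by (50, 7), which is cheaper on both costs, so it falls into the second front, while the remaining four solutions trade one cost against the other and form the first front.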

Section 2 presents the view materialization problem in the context of different DBMSs and types of data. Section 3 presents view materialization in the context of Big data. Section 4 presents the process of identifying candidate views. Section 5 discusses the model for computing the costs of Big data views, which was defined in (Kumar & Vijay Kumar, 2021a). Section 6 presents the bi-objective Big data view materialization problem (Kumar & Vijay Kumar, 2021b). Section 7 presents the NSGA-II based algorithm for selecting Big data views for materialization. Section 8 presents an example of the algorithm, and Section 9 presents its experimental results, followed by the conclusion in Section 10.

Next, a brief account of the different research issues related to view materialization is presented.
