Multi-Objective Materialized View Selection Using Improved Strength Pareto Evolutionary Algorithm

Jay Prakash (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India) and T. V. Vijay Kumar (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India)
DOI: 10.4018/IJAIML.2019070101

Abstract

A data warehouse system uses materialized views extensively to answer analytical queries quickly. Because not all possible views can be materialized, owing to maintenance cost and storage constraints, it becomes essential to select an appropriate set of views to materialize that achieves an optimal trade-off among query response time, maintenance cost, and the storage constraint. The selection of such a set of views is referred to as the materialized view selection problem, which is NP-complete. In the last two decades, several selection approaches based on heuristics have been proposed. Most of these have used a single objective or a weighted-sum approach to address the various constraints. This article addresses the bi-objective materialized view selection problem, where the objective is to minimize the view evaluation cost of the materialized views and the view evaluation cost of the non-materialized views, using the Improved Strength Pareto Evolutionary Algorithm. The experimental results show that the proposed multi-objective view selection algorithm is able to select the Top-K views that achieve a reasonable trade-off between the two objectives. Materializing these selected views would reduce the query response times for analytical queries and thereby facilitate the decision-making process.

1. Introduction

In today’s era of information, data has become a crucial ingredient for strategic decision making. Data is collected, analyzed and mined in order to gain insight into business performance, which in turn facilitates improved strategic decisions for staying competitive in the business world. Data is collected and stored about every activity in the organization. It is spread across the world in multiple disparate databases due to business requirements. There are two ways to access this data from the disparate databases for the purpose of analysis. The first is the on-demand or lazy approach, in which the data is retrieved from the different sources at the time of query processing, based on the requirements of the analytical query (Widom, 1995). The second is the in-advance or eager approach, in which the data is retrieved in advance from the disparate data sources and collected in a large repository. The analytical queries are then processed against this repository. The first approach is useful when the source databases are frequently updated, and the second is suitable when updates to the source databases occur infrequently (Widom, 1995). The in-advance or eager approach is referred to as data warehousing. A data warehouse is an organized collection of data, or a repository of integrated information from various disparate distributed databases, maintained for querying and analysis. A data warehouse stores data about business operations that is subject-oriented, integrated, time-variant and non-volatile (Inmon, 2003; Kimball & Ross, 2002). A data warehouse usually has a vast database, and it grows over time. It is used in decision support systems for identifying patterns and trends in business operations. The analytical queries posed in a data warehousing environment require many complex joins and aggregate operations. These queries need a large amount of time for execution, which is generally impractical for business decision making (Harinarayan et al., 1996).
One way to improve query response time is to precompute the results of frequently asked queries and store them as materialized views (Roussopoulos, 1997) in the data warehouse (Bello et al., 1998; Labio et al., 1997; Mohania et al., 1999). Materializing all possible views can result in faster response times. However, it may not be possible to store such a large number of views due to many constraints, e.g., query processing cost, storage space, and maintenance cost. Thus, there is a need to select a suitable subset of views to materialize that improves performance and satisfies the constraints in the data warehouse environment. The selection of such a suitable subset of views conforming to resource constraints is referred to as the view selection problem (Chirkova et al., 2001; Gupta, 1997; Yousri et al., 2005). View selection is one of the most challenging problems in data warehousing and is known to be NP-complete (Karloff & Mihail, 1999; Widom, 1995). View selection is discussed next.
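
As a rough illustration of the bi-objective trade-off described in the abstract, the following Python sketch enumerates every set of K views over a toy example and keeps only the non-dominated (Pareto-optimal) sets. Everything in it is an assumption made for illustration: the view names, the cost numbers, the additive cost model, and the helper names `objectives`, `dominates` and `pareto_front` are hypothetical and are not taken from the article. It uses exhaustive enumeration, not the Improved Strength Pareto Evolutionary Algorithm, so it is only feasible for very small inputs.

```python
from itertools import combinations

# Hypothetical toy data (not from the article): for each view,
# (evaluation cost if materialized, evaluation cost if not materialized).
VIEWS = {
    "A": (10, 50),
    "B": (20, 30),
    "C": (5, 8),
    "D": (40, 90),
}


def objectives(selected):
    """Return (f1, f2) for a set of materialized views.

    f1: total evaluation cost of the materialized views.
    f2: total evaluation cost of the non-materialized views.
    The additive model is an assumption made for this sketch.
    """
    f1 = sum(VIEWS[v][0] for v in selected)
    f2 = sum(cost[1] for v, cost in VIEWS.items() if v not in selected)
    return (f1, f2)


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (both objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(k):
    """Enumerate all k-view subsets and keep the non-dominated ones."""
    candidates = {s: objectives(s) for s in combinations(sorted(VIEWS), k)}
    return {
        s: f
        for s, f in candidates.items()
        if not any(dominates(g, f) for g in candidates.values())
    }


if __name__ == "__main__":
    for subset, (f1, f2) in sorted(pareto_front(2).items()):
        print(subset, f1, f2)
```

Each surviving subset represents a different compromise between the two costs; none of them can be improved in one objective without worsening the other. An evolutionary algorithm is used in place of enumeration precisely because the number of candidate subsets grows combinatorially with the number of views.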
