Materialized View Selection Using Set Based Particle Swarm Optimization

Materialized View Selection Using Set Based Particle Swarm Optimization

Amit Kumar (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India) and T.V. Vijay Kumar (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India)
DOI: 10.4018/IJCINI.2018070102

Abstract

A data warehouse is a central repository of historical data designed primarily to support analytical processing. These analytical queries are exploratory, long and complex in nature. Further, the rapid and continuous growth in the size of data warehouse increases the response times of such queries. Query response times need to be reduced in order to speedup decision making. This problem, being an NP-Complete problem, can be appropriately dealt with by using swarm intelligence techniques. One such technique, i.e. the set-based particle swarm optimization (SPSO), has been proposed to address this problem. Accordingly, a SPSO based view selection algorithm (SPSOVSA), which selects the Top-K views from a multidimensional lattice, is proposed. Experimental based comparison of SPSOVSA with the most fundamental view selection algorithm shows that SPSOVSA is able to select comparatively better quality Top-K views for materialization. The materialization of these selected views would improve the performance of analytical queries and lead to efficient decision making.
Article Preview

1. Introduction

Most organizations continuously generate and/or capture data, as these are the most important assets available to the managers. The amount of data available, and generated day to day, is growing at an exponential rate. Further, the ubiquitous On-Line Transaction Processing systems in today’s business world are becoming increasingly obsolescent as these systems have certain inherent constraints. It is becoming increasingly time-consuming to analyze large volumes of data by using these systems. Thus, in order to deal with loads of business data, and also to analyze all the data gainfully to enhance decision-making, there is a need of a data warehouse (Inmon, 2003). A data warehouse integrates, or brings together, data from multiple heterogeneous sources e.g., marketing, sales, finance, etc. into one central site. The immense amount of data in the data warehouse includes current and historical data. It also contains a copy of the transaction data for the purpose of query and analysis (Kimball & Ross, 2002). It is used to generate reports or uncover trends. A data warehouse provides support to analytical reporting, structured queries, ad hoc queries and decision making.

Decision making queries are typically analytical and complex. The response times of these queries is excessive when dealt with using a huge data warehouse. Data warehousing and On-Line Analytical Processing (OLAP) tools are increasingly being used to obtain valuable information from the data generated in the organizations. OLAP Queries contain joins between tables or aggregations such as SUM, or both. These operations require considerable amounts of time and processing power resulting in unacceptable query performance, when executed against a data warehouse. In order to solve this problem, materialized views are stored in the warehouse. Materialized views pre-aggregate the data and accelerate query access by avoiding the raw data sources. Prior to the execution of the queries, they pre-compute the costly join and aggregation operations on the database and store such summarized the results separately in the database.

A fact table in a data warehouse consists of millions of rows whereby significant resources are required to process a single query. Pre-aggregation is an expedient OLAP strategy to improve the response time of queries. It requires the result to be saved to disk as materialized views. A view is formed by joining a set of dimension tables as specified by an OLAP query. The query optimizer spontaneously identifies a present materialized view to satisfy a request and uses it through rewriting the query. Thus, the queries are answered directly by materialized views and not by the original detail tables. A materialized view is like a cache and provides fast access to data. Analogous to normal views, materialized views are logical views of the data where the essential query result is saved to a table. Materialized views accelerate query execution. Since these associate various issues viz. selection, maintenance, evolution and answering queries using views. The focus of this paper is view selection. It is not possible to pre-compute and store all possible views due to space and maintenance cost constraints. So, a suitable subset of views needs to be selected that conforms to these constraints. For a fact table with n dimension tables, there are 2n possible views; therefore, due to the limited disk space and high maintenance cost, it would be impractical to think of materializing all possible views. The problem of choosing a suitable subset of views for materialization is known as the view selection problem, which is an NP complete problem (Gupta, 1997). There are a number of parameters that should be considered while selecting materialized views like the frequencies of user queries, base relation updates and query costs etc.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 13: 4 Issues (2019): Forthcoming, Available for Pre-Order
Volume 12: 4 Issues (2018): 3 Released, 1 Forthcoming
Volume 11: 4 Issues (2017)
Volume 10: 4 Issues (2016)
Volume 9: 4 Issues (2015)
Volume 8: 4 Issues (2014)
Volume 7: 4 Issues (2013)
Volume 6: 4 Issues (2012)
Volume 5: 4 Issues (2011)
Volume 4: 4 Issues (2010)
Volume 3: 4 Issues (2009)
Volume 2: 4 Issues (2008)
Volume 1: 4 Issues (2007)
View Complete Journal Contents Listing