Article Preview
Top1. Introduction
Most organizations continuously generate and/or capture data, as these are the most important assets available to the managers. The amount of data available, and generated day to day, is growing at an exponential rate. Further, the ubiquitous On-Line Transaction Processing systems in today’s business world are becoming increasingly obsolescent as these systems have certain inherent constraints. It is becoming increasingly time-consuming to analyze large volumes of data by using these systems. Thus, in order to deal with loads of business data, and also to analyze all the data gainfully to enhance decision-making, there is a need of a data warehouse (Inmon, 2003). A data warehouse integrates, or brings together, data from multiple heterogeneous sources e.g., marketing, sales, finance, etc. into one central site. The immense amount of data in the data warehouse includes current and historical data. It also contains a copy of the transaction data for the purpose of query and analysis (Kimball & Ross, 2002). It is used to generate reports or uncover trends. A data warehouse provides support to analytical reporting, structured queries, ad hoc queries and decision making.
Decision making queries are typically analytical and complex. The response times of these queries is excessive when dealt with using a huge data warehouse. Data warehousing and On-Line Analytical Processing (OLAP) tools are increasingly being used to obtain valuable information from the data generated in the organizations. OLAP Queries contain joins between tables or aggregations such as SUM, or both. These operations require considerable amounts of time and processing power resulting in unacceptable query performance, when executed against a data warehouse. In order to solve this problem, materialized views are stored in the warehouse. Materialized views pre-aggregate the data and accelerate query access by avoiding the raw data sources. Prior to the execution of the queries, they pre-compute the costly join and aggregation operations on the database and store such summarized the results separately in the database.
A fact table in a data warehouse consists of millions of rows whereby significant resources are required to process a single query. Pre-aggregation is an expedient OLAP strategy to improve the response time of queries. It requires the result to be saved to disk as materialized views. A view is formed by joining a set of dimension tables as specified by an OLAP query. The query optimizer spontaneously identifies a present materialized view to satisfy a request and uses it through rewriting the query. Thus, the queries are answered directly by materialized views and not by the original detail tables. A materialized view is like a cache and provides fast access to data. Analogous to normal views, materialized views are logical views of the data where the essential query result is saved to a table. Materialized views accelerate query execution. Since these associate various issues viz. selection, maintenance, evolution and answering queries using views. The focus of this paper is view selection. It is not possible to pre-compute and store all possible views due to space and maintenance cost constraints. So, a suitable subset of views needs to be selected that conforms to these constraints. For a fact table with n dimension tables, there are 2n possible views; therefore, due to the limited disk space and high maintenance cost, it would be impractical to think of materializing all possible views. The problem of choosing a suitable subset of views for materialization is known as the view selection problem, which is an NP complete problem (Gupta, 1997). There are a number of parameters that should be considered while selecting materialized views like the frequencies of user queries, base relation updates and query costs etc.