1. Introduction
In today’s era of information, data has become a crucial ingredient for strategic decision making. Data is collected, analyzed and mined to gain insight into business performance, which facilitates improved strategic decisions and helps an organization stay competitive. Data is collected and stored about every activity in the organization, and business requirements often leave it spread across the world in multiple disparate databases. There are two ways to access this data for the purpose of analysis. The first is the on-demand or lazy approach, in which data is retrieved from the different sources at query processing time, based on the requirements of the analytical query (Widom, 1995). The second is the in-advance or eager approach, in which data is retrieved in advance from the disparate data sources and collected in a large repository, against which the analytical queries are processed. The first approach is useful when the source databases are frequently updated, and the latter is suitable when updates to the source databases occur infrequently (Widom, 1995). The in-advance or eager approach is referred to as data warehousing. A data warehouse is a repository of integrated information from various disparate distributed databases, maintained for querying and analysis. It stores data about business operations that is subject-oriented, integrated, time-variant and non-volatile (Inmon, 2003; Kimball & Ross, 2002). A data warehouse usually holds a vast amount of data and grows over time. It is used in decision support systems for identifying patterns and trends in business operations. The analytical queries posed in a data warehousing environment require many complex joins and aggregate operations. These queries take a long time to execute, which is generally unacceptable for business decision making (Harinarayan et al., 1996).
One way to improve query response time is to precompute the results of frequently asked queries and store them as materialized views (Roussopoulos, 1997) in the data warehouse (Bello et al., 1998; Labio et al., 1997; Mohania et al., 1999). Materializing all possible views can result in faster response times. However, it may not be possible to store such a large number of views because of constraints such as query processing cost, storage space, and maintenance cost. Thus, there is a need to select a suitable subset of views to materialize that improves performance and satisfies the constraints of the data warehouse environment. The selection of such a subset of views conforming to resource constraints is referred to as the view selection problem (Chirkova et al., 2001; Gupta, 1997; Yousri et al., 2005). View selection is one of the most challenging problems in data warehousing and is known to be NP-complete (Karloff & Mihail, 1999; Widom, 1995). View selection is discussed next.
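As a rough illustration of the problem statement above, the sketch below treats view selection as choosing candidate views under a single storage budget. It is a simplified greedy heuristic, not the method of any cited work: the class `CandidateView`, the function `select_views`, the view names, and all sizes and benefits are hypothetical. It also assumes each view's benefit is fixed and independent of the other selected views, which does not hold in general and is part of what makes the real problem hard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateView:
    name: str
    size: int      # storage needed to materialize the view (hypothetical units)
    benefit: int   # estimated query time saved (hypothetical units)


def select_views(candidates, storage_budget):
    """Greedy heuristic: take views in order of benefit per unit of storage,
    skipping any view that no longer fits in the remaining budget.
    The result satisfies the budget but is not guaranteed to be optimal."""
    chosen = []
    remaining = storage_budget
    ranked = sorted(candidates, key=lambda v: v.benefit / v.size, reverse=True)
    for view in ranked:
        if view.size <= remaining:
            chosen.append(view)
            remaining -= view.size
    return chosen


# Illustrative candidates; all numbers are made up for the example.
candidates = [
    CandidateView("sales_by_region", size=40, benefit=120),
    CandidateView("sales_by_product", size=60, benefit=150),
    CandidateView("sales_by_day", size=80, benefit=160),
    CandidateView("sales_by_customer", size=30, benefit=45),
]

selected = select_views(candidates, storage_budget=130)
selected_names = [v.name for v in selected]
print(selected_names)
```

With a budget of 130 units, the heuristic skips `sales_by_day` because it no longer fits after the two denser views are taken. A realistic formulation would also account for maintenance cost and for the way one materialized view changes the benefit of another.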