1. Introduction
The penetration of smart technologies has made it increasingly convenient to capture and store the data of day-to-day business transactions, and it has become standard practice in the business world to record every transaction in digital form. This transactional data has hidden value: analyzed and used properly, it yields trends and insights about business performance that empower organizations to make smarter decisions about their future operations. In today’s competitive business environment, such decisions are necessary for a business to sustain itself in the global market. Multinational companies capture business transactional data and store it in multiple disparate databases spread across the globe.
There are two approaches to accessing this information: the lazy (on-demand) approach and the eager (in-advance) approach (Widom, 1995). In the former, data is gathered in response to a user query; this approach is used when data at the local data sources changes frequently. In the latter, data is accumulated and stored a priori in a central repository, and queries are processed against this already stored information. A data warehouse is based on the latter approach: relevant data accumulated from multiple disparate databases, spread across multiple locations, is integrated and stored for analytical query processing. A data warehouse stores subject-specific data, which is non-volatile and time-variant, integrated from multiple sources to support strategic decision making (Inmon, 2003; Kimball & Ross, 2003). Complex analytical queries are posed against the data in the data warehouse in order to identify insights and trends in business operations. These queries take a long time to process because a data warehouse grows continuously over time, as the data in it is non-volatile. This processing time can be reduced by materializing views in the data warehouse and using them to answer queries (Mohania et al., 1999). Since all possible views cannot be materialized due to storage space constraints, an appropriate subset needs to be selected that conforms to the storage space constraint and supports efficient decision making. The selection of such a subset of relevant views is referred to as view selection (Chirkova et al., 2002). View selection is discussed next.
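To make the storage-constrained selection described above concrete, the following is a minimal illustrative sketch, not the method developed in this article. It assumes each candidate view has a known size and an independent estimated benefit (in practice, the benefit of one view depends on which other views are materialized), and it uses a simple greedy benefit-per-unit-space heuristic. The names `CandidateView` and `select_views`, the view names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateView:
    name: str
    size: int        # storage needed to materialize the view (hypothetical units, must be > 0)
    benefit: float   # estimated query-time saving if materialized (hypothetical)


def select_views(candidates, space_budget):
    """Greedy heuristic: take views in decreasing benefit-per-unit-space order
    while they still fit within the remaining storage budget."""
    chosen = []
    remaining = space_budget
    ranked = sorted(candidates, key=lambda v: v.benefit / v.size, reverse=True)
    for view in ranked:
        if view.size <= remaining:
            chosen.append(view.name)
            remaining -= view.size
    return chosen


# Hypothetical candidate views and sizes, for illustration only.
candidates = [
    CandidateView("sales_by_region", size=40, benefit=120.0),
    CandidateView("sales_by_product", size=70, benefit=140.0),
    CandidateView("sales_by_month", size=30, benefit=75.0),
    CandidateView("sales_by_region_month", size=90, benefit=150.0),
]

selected = select_views(candidates, space_budget=100)
print(selected)
```

With a budget of 100 units, the sketch selects the two views with the highest benefit per unit of space that fit together. A greedy choice of this kind is not guaranteed to be optimal; it only illustrates the trade-off between storage space and query benefit that view selection must resolve.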