Query Frequency based View Selection

Mohammad Haider (Saudi Electronic University, Dammam, Saudi Arabia) and T.V. Vijay Kumar (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India)
Copyright: © 2017 | Pages: 20
DOI: 10.4018/IJBAN.2017010103

Abstract

View selection is the problem of choosing a set of views that improves query response times while conforming to space constraints. Materializing all views is infeasible: the number of possible views is exponential in the number of dimensions, so they would not fit within the available storage space. Further, optimal view selection is an NP-Complete problem. The practical alternative is therefore to select a subset of views that reduces query response time and fits within the space available for materialization. The most fundamental greedy view selection algorithm, HRUA, considers the size parameter when computing the Top-K views for materialization. In each iteration, it computes the benefit, with respect to size, of every view not yet selected and selects the one with the highest benefit for materialization. Though the selected views may be beneficial with respect to size, they may not be able to answer a large number of future queries, and so become an unnecessary space overhead. This paper compares existing query frequency based view selection algorithms, which address this problem. Experimental results show that each of these algorithms, in comparison to HRUA, is able to select fairly good quality views that answer a comparatively greater number of queries. Materializing these selected views would facilitate the business decision making process.
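The greedy, size-based selection described in the abstract can be illustrated with a minimal sketch. It assumes the commonly presented formulation of this kind of algorithm: the views form a lattice of group-bys over n dimensions (2^n views in total), the base view is always materialized, the cost of answering a view is the size of its smallest materialized ancestor, and the benefit of a candidate is the total cost reduction over the views it can answer. The abstract gives no formulas, so these details, the function names, and the view sizes are all assumptions made for illustration rather than the paper's own definitions.

```python
from itertools import combinations

def lattice(dimensions):
    """All 2^n group-by views over the given dimensions, as frozensets."""
    dims = list(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1)
            for c in combinations(dims, r)]

def answerable_from(w, v):
    """Assumption: w can be computed from v if w groups by a subset of v."""
    return w <= v

def greedy_select(sizes, k):
    """Pick up to k views, one per iteration, by highest size-based benefit."""
    views = list(sizes)
    root = max(views, key=len)  # base view, assumed already materialized
    # cost[w]: size of the smallest materialized view able to answer w
    cost = {w: sizes[root] for w in views}
    selected = []
    for _ in range(k):
        best_view, best_benefit = None, 0
        for v in views:
            if v == root or v in selected:
                continue
            # Benefit of v: total cost saved over every view v can answer
            benefit = sum(max(0, cost[w] - sizes[v])
                          for w in views if answerable_from(w, v))
            if benefit > best_benefit:
                best_view, best_benefit = v, benefit
        if best_view is None:  # no remaining view reduces any cost
            break
        selected.append(best_view)
        for w in views:
            if answerable_from(w, best_view):
                cost[w] = min(cost[w], sizes[best_view])
    return selected

# Hypothetical view sizes (in rows) for a cube over dimensions A, B, C.
sizes = {
    frozenset("ABC"): 100,
    frozenset("AB"): 50, frozenset("AC"): 75, frozenset("BC"): 90,
    frozenset("A"): 20, frozenset("B"): 30, frozenset("C"): 40,
    frozenset(): 1,
}

top3 = greedy_select(sizes, 3)
print([sorted(v) for v in top3])
```

Note that nothing in this benefit computation reflects how often each view is actually queried, which is the limitation the query frequency based algorithms are said to address.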

Introduction

Globalization of businesses has led to voluminous amounts of data being generated continuously over time. In this age of ever-changing data and a wants-driven economy, readily available and up-to-date information plays a vital role in formulating optimal business strategies for gaining competitive advantage. To be, and remain, competitive in today’s volatile market, considerable effort is required, such as conducting market research to identify customer demands as distinct from their needs. Information technology and information processing have grown exponentially in the last few decades, and the proper and timely availability of processed information holds the key to business survival. To meet this demand for information, the major focus should be on the capture and efficient storage of the turbulent data that is to be processed for analysis. Such processed data generally proves useful to knowledge workers and decision makers in the decision making process, and its availability gives businesses a substantive edge over their competitors.

With advances in software, analytics, hardware capabilities and data communication, most organizations have collected massive amounts of raw data. As a result, although most such organizations are data rich, they lack cogent information (Gray & Watson, 1998; Han & Kamber, 2000): valuable information gets lost inside the sheer volume of data, and organizations struggle to find the information they need. Hence, the availability of the appropriate information to the appropriate individual is delayed. This reinforces the need to convert the data available in operational and other data sources into useful information, so that knowledge workers and decision makers can access and extract the essential hidden patterns and make optimal decisions at the right time. Analyzing such large volumes of data requires a sound decision support system that suggests solutions to complex and unstructured business problems and queries. With competition amongst organizations mounting, what is needed are systems capable of extracting, storing and analyzing the information concealed in the available data.

There are two ways to access data in data sources, namely the lazy or on-demand approach and the eager or in-advance approach (Widom, 1995). In the former, operational sources are accessed in response to queries, which reduces the storage overhead; in the latter, data is pre-computed and stored beforehand, which reduces communication costs. Query response times for the on-demand approach are comparatively high, as it explores the operational sources each time a query is posed. In the eager or in-advance approach, by contrast, data is pre-computed and stored in the warehouse, resulting in improved query response times. A data warehouse is based on the eager or in-advance approach (Widom, 1995).
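The contrast between the two access approaches can be shown with a toy sketch. The in-memory list standing in for an operational source, the sales rows, and all function names are hypothetical; a real system would query remote operational databases rather than a Python list.

```python
from collections import defaultdict

# Hypothetical operational source: (region, product, amount) sales rows.
SOURCE_ROWS = [
    ("north", "pen", 10),
    ("north", "ink", 5),
    ("south", "pen", 7),
]

def lazy_total_by_region(region):
    """On-demand: scan the operational source every time a query arrives."""
    return sum(amount for r, _, amount in SOURCE_ROWS if r == region)

def build_warehouse(rows):
    """In-advance: pre-compute the aggregate once and store the result."""
    totals = defaultdict(int)
    for r, _, amount in rows:
        totals[r] += amount
    return dict(totals)

# Extra storage is spent up front so that later queries become lookups.
WAREHOUSE = build_warehouse(SOURCE_ROWS)

def eager_total_by_region(region):
    """Answer from the stored aggregate without touching the source."""
    return WAREHOUSE.get(region, 0)

print(lazy_total_by_region("north"), eager_total_by_region("north"))
```

Both functions return the same answer; the difference lies in where the work happens. The lazy version pays the scan cost on each query, while the eager version pays a storage cost and must be refreshed when the source changes.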
