Distributed Query Plan Generation using Bacterial Foraging Optimization

Distributed Query Plan Generation using Bacterial Foraging Optimization

Jay Prakash (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India), Neha Singh (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India) and T.V. Vijay Kumar (School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India)
Copyright: © 2017 |Pages: 26
DOI: 10.4018/IJKSS.2017010101
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

In distributed database systems, relations are replicated and fragmented at multiple sites to ensure easy availability and greater reliability. This leads to an exponential increase in the possible alternatives available for selecting the set of sites, constituting a query plan, for processing. Computing the optimal query plans, from amongst all possible query plans, is a discrete combinatorial optimization problem. This Distributed Query Plan Generation (DQPG) problem has been addressed using Bacterial Foraging Optimization (BFO) in this paper. Here, a novel BFO based DQPG algorithm (DQPGBFO), which generates the Top-K distributed query plans having the minimum total query processing cost, has been proposed. Experimental comparison of DQPGBFO with the existing Genetic Algorithm (GA) based DQPG algorithm (DQPGGA) shows that the former is able to generate Top-K query plans that have a comparatively lower total cost of processing a distributed query. This, in turn, leads to a reduction in the query response time and thus aids in decision making.
Article Preview

1. Introduction

Distributed databases are essential in the current business environment, in which data spread across multiple geographical locations is connected by a computer network (Ceri & Pelagatti, 1984). An effective distributed database management system (DDBMS) is required to manage and retrieve data from such disparate data sources. DDBMS provides a simple, unified interface to the user, which appears like a single database instead of disparate databases (Ozsu & Valduriez, 2011). Query processing plays a crucial role in the performance of DDBMS (Rho & March, 1997). The query processing problem is far more challenging in a distributed environment, since a distributed query may involve relations that are fragmented and/or replicated across multiple distinct sites leading to the incurring of site-to-site data transmission costs. In order to address this problem, an optimal Distributed Query Processing (DQP) strategy has to be devised. In DQP, the user queries are analysed and transformed into a set of data manipulation operations (Ozsu & Valduriez, 2011). In DDBMS, first the user query is passed to a query parser, where the user query is parsed and the corresponding relational algebraic expression is generated (Liu & Yu, 1993). Using the query optimizer, an optimal relational algebraic expression, comprising an effective query processing plan, is generated. The user query is broken into sub-queries and the execution plan is devised for all such sub-queries, each of which is processed in parallel at their respective sites. The results of these sub-queries are thereafter integrated to produce the final result for the user query (Ozsu & Valduriez, 2011). Thus, the goal of the DQP strategy is to generate query processing plans in a manner that reduces the data transmission cost, which would substantially reduce the total cost of processing a distributed query. This query processing cost comprises of local processing costs and site-to-site data transmission cost (Ozsu & Valduriez, 2011).

In DQP, the data transmission cost is more significant than the local processing cost (Ozsu & Valduriez, 2011). In order to make distributed query processing more efficient, communication of data between sites needs to be minimized in order to reduce the site-to-site data transmission cost. Further, an optimal ordering of operations like selection, projection and join could minimize the inter site data transfers resulting in reduced query response times. In DDBMS, multiple copies of a relation are stored at different sites to enhance data availability. Often user queries require access to multiple relations stored at multiple sites for their processing. So, it would be economical to access only those copies of relations for answering the user query that would minimize the inter-site data transmission.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 8: 4 Issues (2017): 3 Released, 1 Forthcoming
Volume 7: 4 Issues (2016)
Volume 6: 4 Issues (2015)
Volume 5: 4 Issues (2014)
Volume 4: 4 Issues (2013)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing