Distributed Query Plan Generation Using Firefly Algorithm

Distributed Query Plan Generation Using Firefly Algorithm

Neha Singh, Jay Prakash, T.V. Vijay Kumar
DOI: 10.4018/IJOCI.2016010103
(Individual Articles)
No Current Special Offers


A large number of queries are posed daily against the distributed databases spread across the globe. Query processing strategies are used to generate efficient query plans for these queries. The number of such query plans increases exponentially with the increase in the number of involved sites and relations accessed by the query. Further, this complexity increases if the data is fragmented and replicated across multiple sites. This problem, referred to as the distributed query plan generation (DQPG) problem, is a combinatorial optimization problem. An attempt has been made in this paper to solve this DQPG problem using the Firefly Algorithm (FA), which is inspired by the flashing behaviour of fireflies in nature. The proposed FA based DQPG algorithm (DQPGFA), aims to generate distributed query plans incurring minimum Query Proximity Cost (QPC) value. The experimental results show that DQPGFA, in comparison to the GA based DQPG algorithm (DQPGGA), was able to select Top-K query plans that had a comparatively lesser average QPC value. Such generated query plans would, most likely, lead to an improvement in the query response time and thereby would result in effective and efficient decision making.
Article Preview

1. Introduction

Almost all modern business information systems, administrative processes in organizations, sciences and governments rely upon the database management systems (DBMS) (Haigh, 2006). As the volume of data is increasing, bottlenecks like central point failure, computation speed and storage limitation are emerging in centralized databases. Distributed database systems (DDBS) were designed to overcome these limitations (Elmasri & Navathe, 2008) (Özsu & Valduriez, 2011) (Ramakrishnan & Gehrke, 2000). A distributed database (DDB) is a collection of data, which is logically interconnected and is spread over a communication network (Özsu & Valduriez, 2011). DDBMS is a software system that manages distributed databases (Özsu & Valduriez, 2011). DDBMS makes the distribution and transaction transparent to the users. Multiple queries may be posed against the data stored in DDBMS, using a query language in an interactive way. This would allow timely retrieval of data, data storage and data modification (Haigh, 2006), (Ramakrishnan & Gehrke, 2000). Distributed database systems have the capability to decentralize the data at geographically dispersed locations. The distributed query posed by a user accesses data from different sites in order to obtain the desired result. The process of retrieval of query specific information from different sites connected by a network, is referred to as distributed query processing (Apers, Hevner, & Yao, 1983). Generating an optimized strategy for executing a distributed query in a most cost-effective way is a major challenge and problem in distributed query processing. Query processing in a distributed environment is more complex than in a centralized environment because of the large number of parameters affecting the performance of the distributed queries. The data may be fragmented and/or replicated across multiple sites. Thus, accessing this data residing at multiple sites may lead to an increase in the query response time (Pandey, 2013). The high level query is posed on a distributed database system by a user in the form of a relational calculus query, but the execution of the low level query is done at the remote local database site using the extension of relational algebra with communication operators (Aljanaby, Abuelrub, & Odeh, 2005). Distributed query processing maps the input calculus query to an optimized distributed query execution plan. This process of producing query execution plans (QEP) is also referred to as query optimization. QEP denotes an execution strategy, which aims to minimize the objective cost function. Query optimizer is a software module, which performs such query optimization. Computing the best query execution plan is difficult in a distributed environment because the relations accessed by the query may be fragmented or replicated at multiple sites (Ioannidis & Kang, 1990) (Alom, Henskens, & Hannaford, 2009; Jarke & Koch, 1984). It leads to high query response time due to high data transmission cost. The query response time can be reduced if the ordering of operations is optimized. DQP’s main goal is to transform a distributed query into an efficient query execution strategy that minimizes the data transmission between sites thereby reducing the query response time. To execute a distributed query, all sites, containing data required to answer the query, need to be accessed. Distributed query plans that access only those copies of data that minimize the inter-site data transmission cost are considered economical. This problem is a combinatorial optimization problem in distributed database system (Jarke & Koch, 1984) (Taylor, 2010). One such distributed query plan generation(DQPG) problem given in (Vijay Kumar, Singh, & Verma, 2010, 2011) has been addressed in this paper. This problem is discussed next.

Complete Article List

Search this Journal:
Volume 14: 1 Issue (2024): Forthcoming, Available for Pre-Order
Volume 13: 1 Issue (2023)
Volume 12: 4 Issues (2022)
Volume 11: 4 Issues (2021)
Volume 10: 4 Issues (2020)
Volume 9: 4 Issues (2019)
Volume 8: 4 Issues (2018)
Volume 7: 4 Issues (2017)
Volume 6: 4 Issues (2016)
Volume 5: 4 Issues (2015)
Volume 4: 4 Issues (2014)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing