Bulk Construction of Geo-Textual Indices

Bulk Construction of Geo-Textual Indices

Dongsheng Li, Jinkun Pan, Jiaxin Li, Kian-Lee Tan
Copyright: © 2014 |Pages: 19
DOI: 10.4018/ijdwm.2014070102
(Individual Articles)
No Current Special Offers


With the proliferation of online web objects, spatial keyword queries that retrieve objects satisfying both spatial and keyword conditions are becoming popular. The geo-textual index that integrates a spatial index with a keyword filter offers a promising approach for processing such queries efficiently. However, it is still an open problem on how the geo-textual indices can be effectively constructed from scratch. This paper proposes a bulk loading algorithm for the geo-textual indices, which constructs the indices top-down efficiently. It utilizes a two-phase method to construct the children of nodes at each level of the geo-textual index, taking both spatial and keyword information into consideration, and thus optimizes the geo-textual index for spatial keyword queries. The time complexity of the algorithm is analyzed theoretically, and comprehensive experiments with both real and synthetic datasets show that it can build high quality geo-textual indices with good query performance and space utilization.
Article Preview


With the proliferation of Internet computing (Lu et al., 2013; Zhang et al., 2013), spatial web objects that possess both a geographical location and a textual description are increasing. Location-positioning technologies, such as the build-in GPS in mobile devices, allow people to keep track of accurate users and objects locations. Many location-based services (Cong et al., 2009; Safar, M., 2009; Taniar et al., 2011) provide users the ability to associate geographical information to web objects, known as geo-tagging. At the same time, the textual information can be paired with web objects by the presence of a set of keywords, such as street addresses, building names or descriptive terms. Studies have found that at least one fifth of web queries are directed at location-related web objects.

This gives significance to spatial keyword queries (Cong et al., 2009; Li et al., 2011; Zhang et al., 2009; Cao et al., 2011; Wu et al., 2011; Rocha-Junior et al., 2011; Christoforaki et al., 2011; Wu et al., 2013), which given a location and a set of keywords, retrieve objects based on both spatial proximity and keyword similarity. One popular category of processing methods for spatial keyword queries is to create a geo-textual index (Cong et al., 2009; Li et al., 2011; Zhang et al., 2009; Cao et al., 2011; Wu et al., 2011; De Felipe, Hristidis, & Rishe, 2008; Zhang, Ooi, & Tung, 2010; Cao et al., 2012) by integrating a spatial index (e.g., the R-tree (Guttman, 1984) or its variations) with a keyword index filter, allowing keyword-based pruning when searching in the spatial index tree. The IR-tree (Cong et al., 2009) is one important geo-textual index for spatial keyword queries, and it has been used to support top-k spatial keyword queries (Cong et al., 2009; Cao et al., 2012), collective spatial keyword (Cao et al., 2011; Cao et al., 2012), moving top-k spatial keyword queries (Wu et al., 2011), etc.

However, it remains a challenge on how the geo-textual indices can be effectively constructed from scratch. Most of existing researches build the geo-textual indices incrementally, inserting spatial web objects into the index one by one. This does not take advantage of the case when all objects are known beforehand and thus can be inserted by using a single operation, called bulk loading. Generally, bulk loading is valuable to construct and optimize the indices when the datasets are static or not frequently updated. Many bulk loading algorithms (Roussopoulos & Leifker, 1985; Kamel & Faloutsos, 1993; Leutenegger, Lopez, & Edgington, 1997; Garcia, Lopez, & Leutenegger, 1998; Alborzi & Samet, 2007; Bercken & Seeger, 2001; Ghanem et al., 2004; Aronovich & Spiegler, 2010) have been proposed to construct spatial indices (e.g., the R-tree) effectively. Unfortunately, these algorithms consider only spatial factors, such as the overlap of sibling nodes. They work fairy well for certain kind of traditional spatial queries, but are not designed for spatial keyword queries. It is still an open problem as to how bulk loading can be utilized to improve the geo-textual index for spatial keyword queries.

Complete Article List

Search this Journal:
Volume 20: 1 Issue (2024): Forthcoming, Available for Pre-Order
Volume 19: 6 Issues (2023)
Volume 18: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 17: 4 Issues (2021)
Volume 16: 4 Issues (2020)
Volume 15: 4 Issues (2019)
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing