Introduction
With the proliferation of Internet computing (Lu et al., 2013; Zhang et al., 2013), the number of spatial web objects, which possess both a geographical location and a textual description, is increasing. Location-positioning technologies, such as the built-in GPS in mobile devices, allow people to track the locations of users and objects accurately. Many location-based services (Cong et al., 2009; Safar, M., 2009; Taniar et al., 2011) let users associate geographical information with web objects, a practice known as geo-tagging. At the same time, textual information can be attached to web objects as a set of keywords, such as street addresses, building names, or descriptive terms. Studies have found that at least one fifth of web queries are directed at location-related web objects.
This gives significance to spatial keyword queries (Cong et al., 2009; Li et al., 2011; Zhang et al., 2009; Cao et al., 2011; Wu et al., 2011; Rocha-Junior et al., 2011; Christoforaki et al., 2011; Wu et al., 2013), which, given a location and a set of keywords, retrieve objects based on both spatial proximity and keyword similarity. One popular category of processing methods for spatial keyword queries creates a geo-textual index (Cong et al., 2009; Li et al., 2011; Zhang et al., 2009; Cao et al., 2011; Wu et al., 2011; De Felipe, Hristidis, & Rishe, 2008; Zhang, Ooi, & Tung, 2010; Cao et al., 2012) by integrating a spatial index (e.g., the R-tree (Guttman, 1984) or its variations) with a keyword index that serves as a filter, allowing keyword-based pruning during the search of the spatial index tree. The IR-tree (Cong et al., 2009) is one important geo-textual index for spatial keyword queries, and it has been used to support top-k spatial keyword queries (Cong et al., 2009; Cao et al., 2012), collective spatial keyword queries (Cao et al., 2011; Cao et al., 2012), moving top-k spatial keyword queries (Wu et al., 2011), etc.
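To make the query semantics concrete, the following is a minimal brute-force sketch of a top-k spatial keyword query. It is not the IR-tree or the method of any cited work: it scans every object, which is exactly the cost a geo-textual index is meant to avoid. The `SpatialObject` class, the `score` function (a weighted sum of normalized Euclidean distance and Jaccard keyword dissimilarity), the weight `alpha`, the normalization constant `max_dist`, and the sample data are all assumptions made for illustration.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialObject:
    """Hypothetical spatial web object: a location plus a keyword set."""
    name: str
    x: float
    y: float
    keywords: frozenset


def score(obj, qx, qy, q_keywords, alpha=0.5, max_dist=10.0):
    """Assumed ranking function (lower is better).

    Combines normalized Euclidean distance with keyword dissimilarity
    (1 - Jaccard similarity), weighted by alpha.
    """
    dist = min(math.hypot(obj.x - qx, obj.y - qy) / max_dist, 1.0)
    union = obj.keywords | q_keywords
    jaccard = len(obj.keywords & q_keywords) / len(union) if union else 0.0
    return alpha * dist + (1.0 - alpha) * (1.0 - jaccard)


def top_k(objects, qx, qy, q_keywords, k):
    """Return the k best-scoring objects by scanning all of them."""
    q = frozenset(q_keywords)
    return heapq.nsmallest(k, objects, key=lambda o: score(o, qx, qy, q))


# Illustrative data only.
objects = [
    SpatialObject("cafe_a", 1.0, 1.0, frozenset({"coffee", "wifi"})),
    SpatialObject("cafe_b", 8.0, 8.0, frozenset({"coffee", "wifi"})),
    SpatialObject("shop", 1.0, 0.0, frozenset({"books"})),
]

result = top_k(objects, 0.0, 0.0, {"coffee", "wifi"}, k=2)
print([o.name for o in result])
```

In this sample, `shop` is the object closest to the query point but matches no query keyword, so both cafes rank ahead of it. A geo-textual index aims to reach the same answer while pruning subtrees whose keywords or bounding regions cannot contribute to the result.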
However, constructing geo-textual indices effectively from scratch remains a challenge. Most existing work builds geo-textual indices incrementally, inserting spatial web objects into the index one by one. This approach does not exploit the case in which all objects are known beforehand and can therefore be inserted in a single operation, called bulk loading. Generally, bulk loading is valuable for constructing and optimizing indices when the datasets are static or infrequently updated. Many bulk loading algorithms (Roussopoulos & Leifker, 1985; Kamel & Faloutsos, 1993; Leutenegger, Lopez, & Edgington, 1997; Garcia, Lopez, & Leutenegger, 1998; Alborzi & Samet, 2007; Bercken & Seeger, 2001; Ghanem et al., 2004; Aronovich & Spiegler, 2010) have been proposed to construct spatial indices (e.g., the R-tree) effectively. Unfortunately, these algorithms consider only spatial factors, such as the overlap of sibling nodes. They work fairly well for certain kinds of traditional spatial queries, but are not designed for spatial keyword queries. How bulk loading can be used to improve geo-textual indices for spatial keyword queries is still an open problem.
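For readers unfamiliar with bulk loading, the sketch below shows the leaf-packing step of one purely spatial technique, Sort-Tile-Recursive (STR) packing (Leutenegger, Lopez, & Edgington, 1997), restricted to 2-D points. It is given only as an example of the spatial-only algorithms mentioned above and does not address the open problem: keywords play no part in how objects are grouped. The function names `str_pack_leaves` and `mbr`, the `capacity` parameter, and the grid data are assumptions made for illustration.

```python
import math


def str_pack_leaves(points, capacity):
    """Group 2-D points into leaf nodes with STR-style tiling.

    Points are sorted by x and cut into vertical slices; each slice is
    then sorted by y and cut into runs of at most `capacity` points.
    """
    if capacity < 1:
        raise ValueError("capacity must be positive")
    if not points:
        return []
    n_leaves = math.ceil(len(points) / capacity)
    n_slices = math.ceil(math.sqrt(n_leaves))
    slice_size = n_slices * capacity  # points per vertical slice
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, len(by_x), slice_size):
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            leaves.append(vertical_slice[j:j + capacity])
    return leaves


def mbr(leaf):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of a leaf."""
    xs = [p[0] for p in leaf]
    ys = [p[1] for p in leaf]
    return (min(xs), min(ys), max(xs), max(ys))


# Illustrative data: a 4 x 4 grid of points, known in advance.
points = [(x, y) for x in range(4) for y in range(4)]
leaves = str_pack_leaves(points, capacity=4)
rectangles = [mbr(leaf) for leaf in leaves]
print(rectangles)
```

On this grid the packing yields four full leaves whose bounding rectangles do not overlap, which illustrates why such algorithms suit traditional spatial queries. Objects sharing a leaf may still have unrelated keywords, so keyword-based pruning gains nothing from this grouping.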