1. Introduction
Linked Data provides a universal paradigm for publishing and sharing structured knowledge on the Web. The adoption of Linked Data best practices has extended the Web with a global data space that connects data from diverse domains. An important kind of knowledge that can be discovered from these data is the explicit or hidden relationship among multiple objects, which is termed a Semantic Association. An early definition can be found in (Aleman-Meza, Halaschek, Arpinar, & Sheth, 2003), where Semantic Associations were defined as paths connecting two objects. Semantic Associations in Linked Data have been discussed for more than ten years. Researchers have proposed different association models, from the path model (Pirrò, 2015) to the sub-graph model (Xiang Zhang, Zhao, & Wang, 2012), (Cheng, Liu, & Qu, 2016), and from associations between object pairs (Fang, Sarma, Yu, & Bohannon, 2011) to associations among multiple objects (Chen, C., Wang, G., Liu, H., Xin, J., Yuan, 2011). Various mining approaches have been proposed based on these models. Discovering and utilizing Semantic Associations in Linked Data has found applications in diverse fields, such as detecting potential terrorists in national security (Sheth et al., 2005), understanding relations between diseases and side effects in drug discovery (Wild et al., 2011), and ranking biomedical resources in gene analysis (Makita et al., 2013). To utilize Semantic Associations in these fields, an effective and efficient way is needed to retrieve them according to users’ information needs.
A Semantic Association is a combination of textual information (local names and annotations of objects) and structural information (the group relationship among objects). The textual information of objects can be extracted, and textual search then makes it possible to find relevant associations using keywords, so that people can find associations through an interface similar to web search. While this approach is simple and intuitive, it ignores the structural information in associations, which can lead to a large number of irrelevant search results. When looking for an association, the user’s need is usually structured. In the typical scenario in Figure 1, someone who searches for associations matching “Tim Berners-Lee, Chris Bizer and Tom Heath” actually expects that exactly three related objects are hit and that meaningful associations connect these objects. The answer to this search is an association indicating that these researchers (in red font in Figure 1) co-authored the paper “Linked Data – The Story So Far” (Bizer, Heath, & Berners-Lee, 2009), which was published in IJSWIS. This differs from the basic idea of web search, where keywords can hit anywhere in a page and no relation among keywords is considered. A simple textual search becomes ineffective if it cannot capture the structural information in the query.
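The contrast between the two kinds of search can be illustrated with a minimal sketch. It is not the method proposed in this paper: the `Association` class, the two matching functions and the second, unrelated association are hypothetical and serve only to show why keyword hits alone are not enough.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Association:
    """Hypothetical toy model: object labels plus the edges linking them."""
    labels: tuple
    edges: tuple  # (source label, predicate, target label)


def textual_match(assoc, keywords):
    """Web-search style: a hit if any keyword occurs anywhere in the text."""
    text = " ".join(assoc.labels).lower()
    return any(k.lower() in text for k in keywords)


def structural_match(assoc, keywords):
    """Every keyword must hit a distinct object of the same association."""
    kws = [k.lower() for k in keywords]
    labels = [label.lower() for label in assoc.labels]
    if len(kws) > len(labels):
        return False
    # Try every assignment of keywords to distinct objects (fine for toy sizes).
    return any(
        all(k in label for k, label in zip(kws, perm))
        for perm in permutations(labels, len(kws))
    )


title = "Linked Data – The Story So Far"
paper = Association(
    labels=("Tim Berners-Lee", "Chris Bizer", "Tom Heath", title),
    edges=(
        (title, "creator", "Tim Berners-Lee"),
        (title, "creator", "Chris Bizer"),
        (title, "creator", "Tom Heath"),
    ),
)
# Made-up association that shares only one object with the query.
other = Association(
    labels=("Tom Heath", "A Hypothetical Unrelated Paper"),
    edges=(("A Hypothetical Unrelated Paper", "creator", "Tom Heath"),),
)

keywords = ["Tim Berners-Lee", "Chris Bizer", "Tom Heath"]
textual_hits = [a for a in (paper, other) if textual_match(a, keywords)]
structural_hits = [a for a in (paper, other) if structural_match(a, keywords)]
```

In this toy setting the textual search returns both associations, because “Tom Heath” occurs in each, whereas the structural search keeps only the co-authorship association in which all three keywords hit distinct, connected objects.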
In addition, the ever-increasing volume of Linked Data poses a great challenge to association search. Large-scale Linked Data, such as the DBpedia or YAGO knowledge graphs, contains tens of millions to billions of triples. An enormous number of objects are defined in these data, and the number of associations that can be discovered in them explodes accordingly. A structural search becomes inefficient, with unacceptable response times, when it operates on Semantic Associations discovered in large-scale Linked Data. For example, DBpedia contains nearly 6 million objects. Even under the strict limits set in our experiments, we still discover more than 60 million associations. An association search that considers both textual and structural information on such a huge dataset can hardly satisfy users’ efficiency requirements.