Introduction
Berners-Lee’s vision of the Semantic Web (Berners-Lee, 2001) has become increasingly popular in the last few years. In this vision, the World Wide Web evolves into a highly interconnected network of data that can be easily accessed and understood by machines. Applications could, for instance, use the Semantic Web to construct customized answers to a particular question, so that the user is no longer required to search for information or pore through results. An example of such a question is ‘What are the locations of the restaurants in London?’. To answer it, a structured dataset has to be available that contains places located in London (entities), together with their location and semantic type (properties).
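To make the notions of entities and properties concrete, the following minimal Python sketch shows how a small structured dataset of places could answer the example question. It is an illustration only, not the representation used in this paper: the `Place` record, the `locations_of` helper, and all sample names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """One entity with its properties (hypothetical schema)."""
    name: str
    city: str
    semantic_type: str
    latitude: float
    longitude: float


# Hypothetical sample records; names and coordinates are illustrative only.
PLACES = [
    Place("Example Bistro", "London", "restaurant", 51.5074, -0.1278),
    Place("Sample Museum", "London", "museum", 51.5194, -0.1270),
    Place("Demo Diner", "Paris", "restaurant", 48.8566, 2.3522),
]


def locations_of(places, city, semantic_type):
    """Return (name, latitude, longitude) for places matching city and type."""
    return [
        (p.name, p.latitude, p.longitude)
        for p in places
        if p.city == city and p.semantic_type == semantic_type
    ]


# 'What are the locations of the restaurants in London?'
restaurants_in_london = locations_of(PLACES, "London", "restaurant")
```

Because every place carries an explicit location and semantic type, the question reduces to a filter over properties and no free-text search is needed.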
However, a lot of information on the Web is still unstructured or only semi-structured. Therefore, there is a need for automated methods that extend structured datasets using existing Web data. Several such methods have been proposed. For example, YAGO2 (Hoffart, 2013) and BabelNet (Navigli, 2010) are knowledge bases constructed from Wikipedia and WordNet. Other research focuses on establishing structured datasets containing information of a specific type. For instance, LinkedGeoData (Auer, 2009) is a dataset of places constructed from OpenStreetMap, an application in which users can submit geographical data such as place semantics.
In this paper, we focus on improving existing databases of places. More precisely, we add new places and discover likely errors using data from the Web. Social media data is particularly promising in this respect, due to the large amount of geographically annotated data these media produce. For example, about 1.5% of all Twitter posts (i.e. tweets) are annotated with geographical coordinates (Murdock, 2011). In addition, there are currently more than 190 million geotagged Flickr photos (Flickr, 2013). Such data has been used, for example, to automatically detect events (Rattenbury, 2007; Sakaki, 2010; Lee, 2011) and to find popular places (Crandall, 2009; Van Canneyt, 2011) and tourist routes (Choudhury, 2010; Jain, 2010).