Semi-Automatic Knowledge Extraction to Enrich Open Linked Data

Elena Baralis (Politecnico di Torino, Italy), Giulia Bruno (Politecnico di Torino, Italy), Tania Cerquitelli (Politecnico di Torino, Italy), Silvia Chiusano (Politecnico di Torino, Italy), Alessandro Fiori (Politecnico di Torino, Italy) and Alberto Grand (Politecnico di Torino, Italy)
Copyright: © 2013 | Pages: 25
DOI: 10.4018/978-1-4666-2827-4.ch008


In this chapter we analyze the Wikipedia collection with the ELiDa framework, with the aim of enriching linked data. ELiDa is based on association rule mining, an exploratory technique for discovering relevant correlations hidden in the analyzed data. A persistent structure is used to store the large volume of extracted knowledge compactly and to retrieve it efficiently for further analysis. The domain expert is in charge of selecting the relevant knowledge by setting filtering parameters, assessing the quality of the extracted knowledge, and enriching it with semantic expressiveness that cannot be inferred automatically. As representative document collections, we consider seven datasets extracted from Wikipedia. Each dataset is analyzed from two points of view (i.e., transactions by documents and transactions by sentences) to highlight relevant knowledge at different levels of abstraction.
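To make the two transaction views concrete, the following is a minimal sketch of association rule mining over term pairs. It is not ELiDa's implementation: the function names (`to_transactions`, `pair_rules`), the naive tokenization, the restriction to rules with a single term on each side, and the `min_support` and `min_confidence` thresholds (standing in for the expert's filtering parameters) are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def to_transactions(documents, by="document"):
    """Turn raw texts into transactions, i.e. sets of lowercase terms.

    by="document": one transaction per document.
    by="sentence": one transaction per sentence (naive split on . ! ?).
    """
    if by == "document":
        units = documents
    else:
        units = [s for d in documents
                 for s in re.split(r"[.!?]+", d) if s.strip()]
    return [set(re.findall(r"[a-z]+", u.lower())) for u in units]


def pair_rules(transactions, min_support, min_confidence):
    """Return rules x -> y as tuples (x, y, support, confidence).

    support    = fraction of transactions containing both x and y
    confidence = count(x and y) / count(x)
    """
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in t)
    pair_counts = Counter(p for t in transactions
                          for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Hypothetical toy collection, only to show the two views.
docs = ["Turin is in Italy. Rome is in Italy.", "Turin hosts a university."]
by_document = to_transactions(docs, by="document")
by_sentence = to_transactions(docs, by="sentence")
sentence_rules = pair_rules(by_sentence, min_support=0.5, min_confidence=1.0)
```

The same text yields fewer, coarser transactions by document and more, finer ones by sentence, so the rules that pass the thresholds differ between the two views. A realistic pipeline would also remove stop words and mine itemsets larger than pairs.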
Chapter Preview


Open linked data are collections of interlinked structured data available on the web. Like web data, linked data are built from documents on the web (Berners-Lee, 2006). However, unlike web data, where links are relationship anchors in hypertext documents written in HTML, links between objects in linked data are expressed in RDF (Manola & Miller, 2004).

Linked data allow data sources to be crawled more easily by search engines, accessed with generic data browsers, and connected with data from other sources. Working on linked data, search engines can provide sophisticated query capabilities, similar to those provided by conventional relational databases, thus enabling a new class of applications (Bizer et al., 2007). Since a significant number of individuals and organizations have adopted linked data to publish their data, a giant global graph is being constructed, consisting of billions of RDF statements from numerous sources and covering all sorts of topics (Heath & Bizer, 2011). The prototypical example of cross-domain linked data is DBpedia (Bizer et al., 2009), generated from publicly available Wikipedia dumps. RDF statements in DBpedia are generated by extracting information from various parts of Wikipedia articles, in particular from the infoboxes, usually located on the right-hand side of Wikipedia articles. Other major sources of cross-domain linked data are Freebase (Bollacker et al., 2008) and YAGO (Suchanek et al., 2007), both of which are linked to DBpedia.
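As an illustration of how infobox attributes can become RDF statements, here is a simplified sketch that serializes key/value pairs as N-Triples. It is not DBpedia's extraction framework: the function `infobox_to_ntriples`, the dictionary representation of an infobox, and the rule that `[[...]]` values denote links to other resources are assumptions, and the sample values are made up. Only the two namespace prefixes follow DBpedia's published URI scheme.

```python
RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"


def infobox_to_ntriples(title, infobox):
    """Map an article title and its infobox fields to N-Triples lines.

    Values written as [[Target]] are treated as links to other
    resources; everything else becomes a plain string literal.
    (Assumed convention, for illustration only.)
    """
    subject = f"<{RESOURCE}{title.replace(' ', '_')}>"
    lines = []
    for key, value in infobox.items():
        predicate = f"<{PROPERTY}{key}>"
        text = str(value)
        if text.startswith("[[") and text.endswith("]]"):
            # Object is another resource: a link in the data graph.
            obj = f"<{RESOURCE}{text[2:-2].replace(' ', '_')}>"
        else:
            # Object is a literal; escape backslashes and quotes.
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


# Hypothetical infobox content, not taken from a real article.
triples = infobox_to_ntriples(
    "Example City", {"country": "[[Example Country]]", "motto": "An example"}
)
for line in triples:
    print(line)
```

Each output line is one subject, predicate, object statement. The first one links two resources, which is what lets statements from different sources join into a single graph.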
