Discovering and Analysing Ontological Models From Big RDF Data


Carlos R. Rivero (Rochester Institute of Technology, Rochester, NY, USA), Inma Hernández (University of Seville, Seville, Spain), David Ruiz (University of Seville, Seville, Spain) and Rafael Cochuelo (University of Seville, Seville, Spain)
Copyright: © 2015 | Pages: 14
DOI: 10.4018/JDM.2015040104


We are witnessing an increasing popularity of the Web of Data, which exposes a large variety of web sources that provide their data using RDF. Ontological models are used as the schema to organize this data. These models are usually shared by several communities and, to devise them, there is usually an agreement amongst those communities. As a result, it is common to have more than one ontological model to understand some RDF data; therefore, there might be a gap between the ontological models and the RDF data, which is not negligible in practice. In this article, the authors present a technique to automatically discover ontological models from raw RDF data. It is based on the intensive usage of a set of SPARQL 1.1 structural queries that are generic and independent from the RDF data. The final result of the authors' technique is an ontological model that is derived from the RDF data, and includes types and properties, subtypes, domains and ranges of properties and subproperties. The authors have conducted experiments with millions of triples that prove that their technique is suitable to deal with Big RDF Data. As far as they know, this is the first technique to discover such ontological models in the context of RDF data and the Web of Data.
Article Preview


In 2001, the Semantic Web movement set out to endow the current Web with metadata and thereby evolve it into a Web of Data that is more accessible to computers (Polleres & Huynh, 2009; Shadbolt et al., 2006). Currently, we are witnessing an increasing popularity of the Web of Data, chiefly in the context of Linked Open Data, which is a successful initiative that consists of a number of principles to publish, connect, and query data on the Web (Bizer et al., 2009a). The consequence of this popularity is the existence of a large variety of web sources, which focus on several domains, such as government, life sciences, geography, media, libraries, or scholarly publications (Heath & Bizer, 2011). Furthermore, these sources offer their data using the RDF language, and they can be queried using the SPARQL query language (Antoniou & van Harmelen, 2008).

Scientists are currently working with the Web of Data as a large database to answer structured queries from users (Polleres & Huynh, 2009). As a result, one of the main challenges scientists are facing in this context is coping with scalability, i.e., processing data at Web scale, which is usually referred to as Big Data (Bizer et al., 2011). Another challenge is not only implementing scalable solutions to deal with this amount of data, but also dealing with the steady growth of sources in the context of the Web of Data, e.g., in the domain of Linked Open Data, there were roughly 12 such sources in 2007 and, as of the time of writing this article, there exist 226 sources (LOD Cloud, 2012).

Ontological models are used to provide schema semantics to RDF data. These models comprise types, data properties, and object properties, each of which is identified by a URI (Antoniou & van Harmelen, 2008). Ontological models are shared and developed with the consensus of one or more communities (Rivero et al., 2013b), which define a number of inherent constraints over the models, such as subtypes, the domains and/or ranges of a property, or subproperties.
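The elements listed above (types, properties, subtypes, domains and ranges) can be illustrated with a minimal sketch. It is not the authors' technique, which relies on SPARQL 1.1 structural queries; it is a plain Python approximation over an in-memory list of triples. The function name `discover_model`, the `ex:` identifiers, and the choice to treat "the instances of A are a proper subset of the instances of B" as evidence of a subtype are all assumptions made for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # prefixed name kept as a plain string (assumption)


def discover_model(triples):
    """Derive a rough schema from (subject, predicate, object) triples."""
    # type -> set of resources declared to be instances of it
    instances = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            instances[o].add(s)

    # resource -> set of its declared types
    types_of = defaultdict(set)
    for t, members in instances.items():
        for m in members:
            types_of[m].add(t)

    # Observed domains/ranges: the types of the subjects/objects a property
    # connects. Literals and untyped resources contribute nothing.
    domains, ranges = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p != RDF_TYPE:
            domains[p] |= types_of.get(s, set())
            ranges[p] |= types_of.get(o, set())

    # Subtype candidates (heuristic, assumption): every instance of `a`
    # is also an instance of `b`, and `b` has at least one more instance.
    subtypes = {
        (a, b)
        for a in instances
        for b in instances
        if instances[a] < instances[b]
    }
    return {
        "types": set(instances),
        "properties": set(domains),
        "domains": dict(domains),
        "ranges": dict(ranges),
        "subtypes": subtypes,
    }


triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Author"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:paper1", "rdf:type", "ex:Paper"),
    ("ex:alice", "ex:wrote", "ex:paper1"),
    ("ex:alice", "ex:name", "Alice"),
]
model = discover_model(triples)
```

On this toy input, `ex:Author` is reported as a subtype candidate of `ex:Person`, and `ex:wrote` is observed to connect persons or authors to papers. A model derived this way reflects only what the data contains, so it may differ from a model agreed upon by a community.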

In traditional information systems that comprise a back-end database, developers first need to create a data model according to the user requirements, which is later populated. By contrast, in the Web of Data, data can exist without an explicit model, since data on the Web typically existed first and models were added later. Moreover, several models may exist for the same set of data. As a result, in the context of the Web of Data, we cannot usually rely on existing ontological models to understand RDF data since there might be a gap between the models and the data, i.e., the data and the model are usually devised in isolation, without taking each other into account (Glimm et al., 2012). Furthermore, RDF data may not satisfy a particular ontological model related to these data, even though satisfying it is mandatory to perform a number of tasks, such as data integration (Makris et al., 2012), data exchange (Rivero et al., 2013c), data warehousing (Glorio et al., 2012), or ontology evolution (Flouris et al., 2008). Consequently, current techniques to perform information integration can benefit from the discovery of conceptual models (Rivero et al., 2013a).
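The notion of data that does not satisfy a model can be made concrete with a small, hypothetical check. The sketch below assumes a declared domain per property and reports triples whose subject is not explicitly typed with that domain. Note that it treats a domain as a constraint to be checked; under RDFS semantics a domain declaration is an inference rule instead, so this reading is an assumption of the sketch, as are the function name `domain_violations` and the `ex:` identifiers.

```python
def domain_violations(triples, declared_domains):
    """Return triples whose subject lacks the declared domain type.

    `declared_domains` maps a property to the single type that the
    (hypothetical) ontological model declares as its domain.
    """
    # (resource, type) pairs that are explicitly asserted in the data
    typed = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return [
        (s, p, o)
        for s, p, o in triples
        if p in declared_domains and (s, declared_domains[p]) not in typed
    ]


data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:wrote", "ex:paper1"),
    ("ex:acme", "ex:wrote", "ex:report1"),  # ex:acme is never typed
]
gaps = domain_violations(data, {"ex:wrote": "ex:Person"})
```

Here `gaps` contains only the `ex:acme` triple, which is the kind of mismatch between a model and its data that the paragraph above describes.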

To illustrate that this gap between ontological models and RDF data is not negligible in practice, we provide two real-world examples based on current models and data (see Arenas et al., 2014, for an in-depth discussion on this topic). The examples are as follows:
