Template Based Semantic Integration: From Legacy Archaeological Datasets to Linked Data

Ceri Binding, Michael Charno, Stuart Jeffrey, Keith May, Douglas Tudhope
Copyright: © 2015 | Pages: 29
DOI: 10.4018/IJSWIS.2015010101

Abstract

The online dissemination of datasets is becoming common practice within the archaeology domain. Since the legacy database schemas involved are often created on a per-site basis, cross searching or reusing this data remains difficult. Employing an integrating ontology, such as the CIDOC CRM, is one step towards resolving these issues. However, this has tended to require computing specialists with detailed knowledge of the ontologies involved. Results are presented from a collaborative project between computer scientists and archaeologists that created lightweight tools to make it easier for non-specialists to publish Linked Data. Archaeologists used the STELLAR project tools to publish major excavation datasets as Linked Data, conforming to the CIDOC CRM ontology. The template-based Extract Transform Load method is described. Reflections on the experience of using the template-based tools are discussed, together with practical issues including the need for terminology alignment and licensing considerations.

1. Introduction

Linked Data can be seen as a step towards the Semantic Web vision of creating a globally accessible web of data. In this context there has been much interest in exposing cultural heritage data online to encourage interoperability and reuse (Bizer, Heath & Berners-Lee, 2009; Linked Data). In practice, this has tended to require specialists in semantic technologies and detailed knowledge of the ontologies involved. This paper presents results from a collaborative project between computer scientists and archaeologists, where a key aim was to make it easier for archaeologists new to semantic technologies to create and publish Linked Data.

Archaeology has seen an increasing use of the Web in recent years for dissemination of datasets describing the results of archaeological interventions. Archaeology datasets are disseminated in a platform-neutral format as delimited text files, enabling import and manipulation by a wide range of tools. Most of the excavation fieldwork datasets in the UK are produced by commercial archaeology units. However, there are many hundreds of these archaeological contractors, and their working practices vary. Datasets are often created on a per-site basis, structured according to differing schemas and employing different vocabularies. As a consequence, cross-search, comparison or other meaningful reuse of the data remains difficult. This hinders the reassessment of the original archaeological findings and their reinterpretation in the light of evolving research questions.

The use of an integrating framework, such as the CIDOC Conceptual Reference Model (CIDOC CRM; Doerr 2003), is seen as one step towards resolving these issues. However, in practice this activity requires an understanding of the source dataset schema, together with specialist knowledge of the target ontological model and the techniques required for expressing mappings. In many organisations a single person does not possess all of the required skills; as a result the overall process can be resource-intensive and error-prone. There is a need for tools and approaches that assist the creation of Linked Data by people other than experts in semantic technologies. This general point is also emphasised by Shakya et al. (2009), although their approach makes use of social platforms to create very informal ontologies, which in turn drive community-based Linked Data. Addressing similar general goals by different methods, the work presented here investigates the use of lightweight techniques and tools to map and extract archaeological data conforming to a formal ontology, so that it can be published as Linked Data.
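
To make the general idea of a template-based mapping concrete, the following is a minimal sketch in Python. It is not the STELLAR tools or their template format: the column names (find_id, label, note), the base URI http://example.org/site/ and the choice of E19_Physical_Object and P3_has_note as target terms are all assumptions made for illustration. It reads rows from a delimited text file and fills a fixed N-Triples template for each row, so that the person supplying the data only needs to match columns to template placeholders.

```python
import csv
import io
from string import Template
from urllib.parse import quote

# Namespace URIs. BASE is a hypothetical publisher namespace.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/site/"

# One template per kind of record; placeholders are filled per row.
# The target class and property are illustrative choices only.
FIND_TEMPLATE = Template(
    "<${base}find/${find_id}> <${rdf_type}> <${crm}E19_Physical_Object> .\n"
    "<${base}find/${find_id}> <${rdfs_label}> \"${label}\" .\n"
    "<${base}find/${find_id}> <${crm}P3_has_note> \"${note}\" .\n"
)


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def rows_to_ntriples(delimited_text, delimiter=","):
    """Apply FIND_TEMPLATE to every row of a delimited text dataset."""
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter=delimiter)
    chunks = []
    for row in reader:
        chunks.append(
            FIND_TEMPLATE.substitute(
                base=BASE,
                crm=CRM,
                rdf_type=RDF_TYPE,
                rdfs_label=RDFS_LABEL,
                # Percent-encode the identifier so it is safe inside a URI.
                find_id=quote(row["find_id"].strip(), safe=""),
                label=escape_literal(row["label"].strip()),
                note=escape_literal(row["note"].strip()),
            )
        )
    return "".join(chunks)


# Hypothetical sample data standing in for an exported excavation table.
SAMPLE = (
    "find_id,label,note\n"
    "F001,Bronze pin,Found in ditch fill\n"
    'F002,Pot sherd,"Rim, abraded"\n'
)

if __name__ == "__main__":
    print(rows_to_ntriples(SAMPLE), end="")
```

Keeping the ontology terms inside the template, rather than in the mapping code, is the design choice this sketch is meant to show: knowledge of the target model is captured once in the template, and each dataset only has to supply values for the placeholders.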
