Building the Semantic Layer of the Józef Piłsudski Digital Archive With an Ontology-Based Approach

Laura Pandolfo, Luca Pulina
Copyright: © 2021 | Pages: 21
DOI: 10.4018/IJSWIS.2021100101

Abstract

Using semantic web technologies is becoming an efficient way to overcome metadata storage and data integration problems in digital archives, thus enhancing the accuracy of the search process and leading to the retrieval of more relevant results. In this paper, the results of the implementation of the semantic layer of the Józef Piłsudski Institute of America digital archive are presented. In order to represent and integrate data about the archival collections housed by the institute, the authors developed arkivo, an ontology that accommodates the archival description of records but also provides a reference schema for publishing linked data. The authors describe the application of arkivo to the digitized archival collections of the institute, with emphasis on how these resources have been linked to external datasets in the linked data cloud. They also show the results of an experiment focused on the query answering task involving a state-of-the-art triple store system. The dataset related to the Piłsudski Institute archival collections has been made available for ontology benchmarking purposes.

1. Introduction

The advent of the web and the availability of numerous digital data sources have changed the way historical research is done. Since the 1990s, a great number of high-quality historical documents have been published in web-based repositories, such as digital libraries and digital archives, so that they can be easily searched and queried. The digitization of physical archival collections has changed the conditions of research by giving any user direct access to millions of primary sources and by enabling novel methods of enquiry through computational techniques (Khan et al., 2018). Today, the large number of available digital collections, usually converted into interchangeable formats, together with the publication of many datasets on the web, offers a comprehensive picture of historical and social patterns and allows researchers to explore previously unknown interactions between data that could reveal important new knowledge about the past.

Digital archives face new challenges as they move beyond traditional data management and information browsing. Recent research and practice in the Semantic Web (SW) and linked data fields (Hitzler, 2021; Polleres et al., 2020) contribute effective solutions to the problems of data management and integration by facilitating archival metadata storage and adding semantic capabilities to systems, which increases the quality of the information retrieval process. In general, digital archives and libraries could benefit from the application of SW technologies and linked data in the following ways:

  • Semantic associations and knowledge discovery: Semantic data facilitates semantic browsing through several methods, such as the aggregation of relevant information and the discovery of knowledge and associations.

  • Data integration: A data integration layer makes it possible to exploit information coming from external sources on the web. The user can access data from different content providers via a single user interface, thanks to shared ontologies that enable interoperability and promote consistency between different systems.

  • Semantic search: The goal of semantic search is to go beyond keyword-based search and support more advanced information-seeking strategies by exploiting semantic metadata (Whitelaw, 2015; Fafalios et al., 2017). Representing the semantics of content makes it possible to implement different facilities, such as faceted search, semantic search, or free-text search.
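
As a rough illustration of the data integration and semantic search points above, the following minimal Python sketch stores a handful of triples in plain sets and answers a query that a keyword search over the archive alone could not. Every identifier in it (the ex: and ext: prefixes, the property ex:mentions, the classes ex:Letter, ex:Telegram, ex:Annotation and ex:Item, and the function names) is a hypothetical placeholder chosen for this example. None of them is taken from arkivo or from the institute's dataset, and a real deployment would presumably rely on a triple store and a query language such as SPARQL rather than Python sets.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical local archive data (all identifiers are illustrative only).
ARCHIVE: Set[Triple] = {
    ("ex:doc1", "rdf:type", "ex:Letter"),
    ("ex:doc1", "ex:mentions", "ex:person1"),
    ("ex:doc2", "rdf:type", "ex:Telegram"),
    ("ex:doc2", "ex:mentions", "ex:person1"),
    ("ex:note1", "rdf:type", "ex:Annotation"),
    ("ex:note1", "ex:mentions", "ex:person1"),
    # Link from a local resource to an entity in an external dataset.
    ("ex:person1", "owl:sameAs", "ext:Q1"),
}

# Hypothetical external dataset: the human-readable label lives only here.
EXTERNAL: Set[Triple] = {
    ("ext:Q1", "rdfs:label", "Józef Piłsudski"),
}

# Hypothetical shared schema: letters and telegrams are kinds of archival item.
SCHEMA: Set[Triple] = {
    ("ex:Letter", "rdfs:subClassOf", "ex:Item"),
    ("ex:Telegram", "rdfs:subClassOf", "ex:Item"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern (None = wildcard)."""
    for t in graph:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def items_mentioning(label: str) -> List[str]:
    """Return archival items that mention the entity carrying `label`."""
    graph = ARCHIVE | EXTERNAL | SCHEMA  # integration: one merged graph
    # 1. Find external entities with the requested label.
    external = {s for s, _, _ in match(graph, p="rdfs:label", o=label)}
    # 2. Follow owl:sameAs links back to local resources.
    local = {s for e in external
             for s, _, _ in match(graph, p="owl:sameAs", o=e)}
    # 3. Collect everything that mentions one of those local resources.
    docs = {s for r in local
            for s, _, _ in match(graph, p="ex:mentions", o=r)}
    # 4. Keep only resources typed with ex:Item or a direct subclass of it.
    item_classes = {"ex:Item"} | {
        s for s, _, _ in match(graph, p="rdfs:subClassOf", o="ex:Item")}
    return sorted(
        d for d in docs
        if any(o in item_classes for _, _, o in match(graph, s=d, p="rdf:type")))


print(items_mentioning("Józef Piłsudski"))  # → ['ex:doc1', 'ex:doc2']
```

In this sketch the label exists only in the external set, so the two documents are found solely because the owl:sameAs link and the shared class hierarchy are followed. The annotation ex:note1 is excluded because its class is not declared as a kind of ex:Item.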

In this context, ontologies play a key role in supporting some of the main aspects of knowledge repositories, namely the description of resources by means of standard taxonomies and vocabularies. In the last decade, considerable effort has gone into designing vocabularies and metadata standards to catalogue documents and collections, such as Functional Requirements for Bibliographic Records (FRBR) (O’Neill, 2011), Metadata Object Description Schema (MODS) (Guenther, 2003), Bibliographic Framework (BIBFRAME) (Schreur, 2017) and Encoded Archival Description (EAD3, 2015), to cite just a few well-known examples. Metadata standards such as FRBR, EAD and MODS seem to be geared more towards human consumption than machine processing (Alemu et al., 2017). In addition, MODS is focused on objects such as books, while EAD, although it reflects the hierarchy of an archive, is focused on finding aids and offers limited support for digitized objects. BIBFRAME, on the one hand, provides a foundation for harvesting and sharing bibliographic metadata over the web for libraries but, on the other, does not reuse any existing ontology, vocabulary or pattern, which makes linked data procedures more complicated. Table 1 summarizes the benefits and limitations of existing metadata standards for cataloguing documents and collections.
