Retrieving Structured Information from (Semi-)/(Un-)Structured Cultural Object Documentation

Stella Markantonatou (Athena Research Centre, Athens, Greece), Panagiotis Minos (Athena Research Centre, Athens, Greece) and George Pavlidis (Athena Research Centre, University Campus at Kimmeria, Xanthi, Greece)
DOI: 10.4018/IJCMHS.2017010106


In the course of developing facilities for integrating cultural heritage into everyday educational practice, highly structured information was retrieved from both the structured and the unstructured Europeana documentation contributed by Greek cultural institutions (~480K entries); Modern Greek is the working language. Satisfactory results were obtained with medium-sized, Getty/AAT-compatible vocabularies developed in-house and with simple heuristics. The paper reports on the development of the controlled vocabularies and on the retrieval of structured information from the unstructured Europeana documentation. The retrieval results show the importance of controlled vocabularies and thesauri for the exploitation of digital library content.

This work is about facilities for integrating cultural heritage into everyday teaching practice; more particularly, it is about the development of platforms for creating serious games that take advantage of the cultural information on the web.

Two terms will be used throughout this paper, ‘learning object’ and ‘cultural object’; both denote how the respective digital objects have been documented rather than what they contain. Thus, in the framework of this research, a learning object is a digital object retrieved from a repository that uses the international standards LOM/LRE to document the objects it contains. Similarly, a cultural object is a digital object retrieved from a repository whose documentation is compatible with CIDOC-CRM (or some other international standard for cultural object documentation). The underlying idea is that a learning object has been formulated and documented to address educational needs, and it may contain one or more cultural objects, or indeed other learning objects, that have not been developed or documented with educational needs in mind (Markantonatou, Minos, Tzortzi & Pavlidis, 2016).

In addition to retrieval requirements, educational software imposes quality control restrictions, especially because it is interactive and open to younger users. A database that contains all the objects used by an educational system facilitates quality control, as opposed to free web access. Such a database has to ensure communication with international repositories of both cultural and learning objects; therefore it has to be compatible with cultural object documentation standards such as CIDOC-CRM (Doerr, 2003; Crofts, Doerr, Gill, Stead & Stiff, 2009) and with learning object documentation standards such as LRE-MAP. A database designed in this way would accommodate both cultural objects and learning objects and would support the principled documentation and storage, as well as the flexible search and retrieval, of learning objects that contain cultural objects, for example serious games that contain pieces of music and 3D representations of statues and paintings. The authors of this paper have developed a database that satisfies these requirements and have populated it with cultural objects retrieved from Europeana, together with their standards-compatible documentation. All the contributions to Europeana by Greek cultural institutions (>480K objects) were retrieved and stored.

This database required information that was more structured than the information available in Europeana. The quality of the structure of the information in Europeana varies with the provider. Quite often Europeana provides access to unstructured textual data: unstructured because they contain units of information that (1) from a standardization point of view, should have been codified under different rubrics and (2) occur in unpredictable format and order; see the example cases (1)-(3) in the next section. Since structured information was required, the unstructured Europeana documentation had to undergo some semantic analysis to make sure that the right information was accommodated in the right database slot. This is not the typical task of retrieving Europeana objects relevant to some description, such as the tasks discussed in (Petras, Ferro, Gäde et al., 2012; Petras, Bogers, Toms et al., 2013). Instead, for each object in Europeana, a new object was developed in the database; the new object had standardized metadata that were retrieved from both the standardized and the unstructured metadata of Europeana. The method applied drew on a combination of controlled vocabularies/thesauri and simple heuristics; satisfactory results were obtained.
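
To make the general idea concrete, the following is a minimal sketch of how a controlled vocabulary combined with a simple heuristic can route pieces of a free-text description to separate metadata slots. It is not the authors' implementation: the vocabulary entries, the slot names (`material`, `object_type`, `dimensions`), the function `extract_slots` and the measurement pattern are all hypothetical, and English terms are used for readability although the working language of the paper is Modern Greek.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabulary mapping a term to its target metadata slot.
# Illustrative only; the vocabularies described in the paper are in Modern Greek.
VOCABULARY = {
    "marble": "material",
    "bronze": "material",
    "clay": "material",
    "statue": "object_type",
    "amphora": "object_type",
    "coin": "object_type",
}

# Hypothetical heuristic: a measurement is a number followed by a length unit.
DIMENSION_RE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:cm|mm|m)\b")


def extract_slots(description: str) -> dict:
    """Assign vocabulary terms and measurements found in free text to slots."""
    text = description.lower()
    slots = defaultdict(list)

    # Vocabulary lookup: every known term goes to the slot it is mapped to.
    for token in re.findall(r"\w+", text):
        slot = VOCABULARY.get(token)
        if slot is not None and token not in slots[slot]:
            slots[slot].append(token)

    # Heuristic pass: measurements go to a separate slot.
    dimensions = DIMENSION_RE.findall(text)
    if dimensions:
        slots["dimensions"] = dimensions

    return dict(slots)


if __name__ == "__main__":
    print(extract_slots("Marble statue, height 45 cm, found in Xanthi."))
```

In this sketch the vocabulary decides which rubric a term belongs to, while the regular expression stands in for the kind of simple heuristic that can catch information the vocabulary does not list. Text that matches neither is left unassigned.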
