A New Approach for Assessing Metadata Completeness in Open Data Portals

Juan Ribeiro Reis, Flavia Bernardini, Jose Viterbo
Copyright: © 2022 | Pages: 20
DOI: 10.4018/IJEGR.313636

Abstract

Citizens and developers are gaining full access to public data sources through open data portals (ODPs). These datasets allow the creation of applications that help citizens in different ways, allowing them to participate actively in government processes such as decision-making and policymaking. However, researchers have identified several drawbacks regarding the data these portals provide. One of the recurring problems cited in the literature is related to the metadata associated with each dataset, which may lead to inadequate descriptions or classifications of datasets and thus affect open data usability. As completeness is an important metadata quality metric, in this work the authors propose an approach based on the alignment of metadata schemes to assess the degree of metadata completeness associated with datasets available in ODPs. To evaluate the proposed approach, they conducted two case studies, leading them to observe that the larger the number of metadata schemes, the lower the completeness of the datasets, although the relationship is not linear.

Introduction

Nowadays, the publication of open government data is widespread across several countries, at all administrative levels (Huijboom and Van den Broek, 2011). Not only has the population gained access to data from the various sectors of public activity, but developers have also gained access to public data sources made available in Open Data Portals (ODPs). These machine-readable datasets, available for re-use, enable the creation of applications that help the population in several ways, allowing them to participate actively in governance processes such as decision-making and policymaking (Attard et al., 2015). Furthermore, the provision of public information on a variety of themes also brings greater visibility through various media that amplify the dissemination of open government data (Starke et al., 2016). In this scenario, we observe huge growth in the number of ODPs and in the volume of data they provide worldwide (Castellani Ribeiro et al., 2015; Tygel et al., 2016). Tygel et al. (2016) point out that the number of data portals has grown fast over the years. In 2012, there were already 115 portals available, offering about 710,000 datasets (Hendler et al., 2012). According to Open Knowledge International (Open Knowledge International, 2011), there are currently more than 500 open government data portals available across all continents. Considering this huge number of portals and data volume, problems affecting the use of open data are becoming more evident and are increasingly tackled by different works in the literature (Machado et al., 2018; Sampaio et al., 2022).

One of the recurring problems is related to dataset metadata, as we pointed out in a previous work (Reis, Viterbo and Bernardini, 2018). Metadata are of extreme importance throughout the data life cycle. They help to create order in datasets by describing, classifying and organizing information (Zuiderwijk et al., 2012). Metadata also improve the accessibility of data by helping to describe, locate and retrieve the data efficiently. Neumaier, Umbrich and Polleres (2016) conclude that some metadata quality issues could disrupt the success of open data. However, relevant surveys have shown that data publishers tend to fill out only particular metadata elements that could be considered “popular”, while they ignore other, less popular elements (Friesen, 2004; Guinchard, 2002; Najjar et al., 2003). From a data governance perspective (Nwabude et al., 2014), one of Khatri and Brown’s data governance decision domains (Khatri and Brown, 2010) is the metadata domain, which plays an essential role in data discovery, retrieval, collation, and analysis. In a previous work, we drew a parallel between data governance and open data issues and showed that administrators must search for the best set of characteristics that describe their datasets, making it easier for users to retrieve information (Reis, Viterbo and Bernardini, 2018). Approaches that facilitate this task are therefore extremely necessary.

Different quality metrics have been used in the literature to measure metadata quality. Ochoa and Duval (2009) used metadata quality metrics such as accuracy, conformance to expectations, logical consistency, and coherence, among others. Completeness, however, is a metric frequently used for measuring metadata quality in the literature (Brümmer et al., 2014; Reiche and Höfig, 2013; Duval et al., 2002). Completeness is the degree to which the metadata instance contains all the information needed to have a comprehensive representation of the described resource (Ochoa and Duval, 2009). It is measured based on the presence or absence of values in metadata fields (also called metadata elements), defined in different metadata standards (Margaritopoulos et al., 2012). As far as we know, there is a lack of unified approaches that measure the metadata completeness of ODPs.
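
To make the presence-or-absence idea concrete, the following is a minimal sketch of a completeness score for a single metadata record. It is not the approach proposed in the article, which relies on the alignment of metadata schemes. It only illustrates the basic ratio of filled fields to expected fields. The field names, the example record, and the choice to weight every field equally are assumptions made for illustration.

```python
from typing import Any, Iterable, Mapping


def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as absent values."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(record: Mapping[str, Any], expected_fields: Iterable[str]) -> float:
    """Return the fraction of expected fields that carry a value in the record.

    Assumption: every field counts equally (an unweighted ratio).
    """
    fields = list(expected_fields)
    if not fields:
        raise ValueError("expected_fields must not be empty")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return filled / len(fields)


# Hypothetical metadata scheme and record, for illustration only.
EXPECTED_FIELDS = ["title", "description", "license", "tags", "author"]
example_record = {
    "title": "Bus routes",
    "description": "",        # present but blank, so counted as absent
    "license": "CC-BY-4.0",
    "tags": [],               # empty list, so counted as absent
    # "author" is missing entirely
}

score = completeness(example_record, EXPECTED_FIELDS)
print(f"completeness = {score:.2f}")  # 2 of 5 fields filled -> 0.40
```

Under these assumptions a record scores between 0 and 1, and a portal-level figure could be obtained by averaging the scores of its datasets. Which fields count as expected depends on the metadata standard chosen, which is why the same record can score differently under different schemes.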
