Introduction
The publication of open government data is now widespread across many countries and spans all administrative levels (Huijboom and Van den Broek, 2011). Not only has the population gained access to data from the various sectors of public activity, but developers have also gained access to public data sources made available in Open Data Portals (ODPs). These machine-readable datasets, available for re-use, enable the creation of applications that help the population in several ways, allowing them to participate actively in governance processes such as decision-making and policymaking (Attard et al., 2015). Furthermore, the provision of public information on a variety of themes also brings greater visibility through various media that amplify the dissemination of open government data (Starke et al., 2016). In this scenario, we observe substantial growth in the number of ODPs and in the volume of data they provide worldwide (Castellani Ribeiro et al., 2015; Tygel et al., 2016). Tygel et al. (2016) point out that the number of data portals has grown quickly over the years. In 2012, there were already 115 portals available, offering about 710,000 datasets (Hendler et al., 2012). According to Open Knowledge International (Open Knowledge International, 2011), there are currently more than 500 open government data portals available on all continents. Given this large number of portals and volume of data, problems affecting the use of open data are becoming more evident and are increasingly addressed in the literature (Machado et al., 2018; Sampaio et al., 2022).
One of the recurring problems is related to dataset metadata, as we pointed out in a previous work (Reis, Viterbo and Bernardini, 2018). Metadata are of great importance throughout the data life cycle. They help to create order in datasets by describing, classifying and organizing information (Zuiderwijk et al., 2012). Metadata also improve the accessibility of data by helping to describe, locate and retrieve the data efficiently. Neumaier, Umbrich and Polleres (2016) conclude that some metadata quality issues could disrupt the success of open data. However, relevant surveys have shown that data publishers tend to fill out only particular metadata elements that could be considered “popular”, while ignoring less popular elements (Friesen, 2004; Guinchard, 2002; Najjar et al., 2003). From the perspective of data governance (Nwabude et al., 2014), metadata is one of Khatri and Brown’s data governance decision domains (Khatri and Brown, 2010). This domain also plays an essential role in data discovery, retrieval, collation, and analysis. In a previous work, we drew a parallel between data governance and open data issues and showed that the administrator must perform a search to identify the best set of characteristics that describe their datasets, making it easier for users to retrieve information (Reis, Viterbo and Bernardini, 2018). Approaches that facilitate this task are therefore needed.
Different quality metrics have been used in the literature to measure metadata quality. Ochoa and Duval (2009) used metadata quality metrics such as accuracy, conformance to expectations, logical consistency, and coherence, among others. Among these metrics, completeness is frequently used for measuring metadata quality (Brümmer et al., 2014; Reiche and Höfig, 2013; Duval et al., 2002). Completeness is the degree to which the metadata instance contains all the information needed to have a comprehensive representation of the described resource (Ochoa and Duval, 2009). It is measured based on the presence or absence of values in metadata fields (also called metadata elements), defined in different metadata standards (Margaritopoulos et al., 2012). As far as we know, there is a lack of unified approaches for measuring the metadata completeness of ODPs.
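To make the notion of completeness concrete, the following minimal sketch computes it as the fraction of expected metadata fields that carry a non-empty value. It is an illustration only, not the measure proposed in any of the cited works: the field list `EXPECTED_FIELDS`, the helper names, and the unweighted ratio are all assumptions introduced here, and a real metadata standard would define its own elements and possibly weight them.

```python
from typing import Any, Iterable, Mapping

# Hypothetical set of expected metadata fields (an assumption for this sketch);
# an actual metadata standard or portal defines its own elements.
EXPECTED_FIELDS = (
    "title",
    "description",
    "license",
    "author",
    "tags",
    "update_frequency",
)


def is_filled(value: Any) -> bool:
    """Return True if a metadata value counts as present.

    None, blank strings, and empty collections are treated as absent.
    """
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(
    metadata: Mapping[str, Any],
    fields: Iterable[str] = EXPECTED_FIELDS,
) -> float:
    """Unweighted completeness: filled expected fields / expected fields."""
    field_list = list(fields)
    if not field_list:
        raise ValueError("at least one expected field is required")
    filled = sum(1 for name in field_list if is_filled(metadata.get(name)))
    return filled / len(field_list)


# Illustrative metadata record (made-up values, not taken from a real portal).
example = {
    "title": "Bus routes",
    "description": "",            # blank, counts as absent
    "license": "CC-BY-4.0",
    "author": None,               # missing value, counts as absent
    "tags": ["transport"],
    # "update_frequency" is not provided at all
}

score = completeness(example)     # 3 of 6 expected fields are filled
```

In this sketch a record with every expected field filled scores 1.0 and an empty record scores 0.0. Treating all fields as equally important is a simplifying choice; a weighted variant would multiply each field's presence by a relevance weight before normalizing.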