Creation of Value-Added Services by Retrieving Information From Linked and Open Data Portals

Antonio Sarasa-Cabezuelo
Copyright: © 2021 |Pages: 19
DOI: 10.4018/978-1-7998-6697-8.ch008

Abstract

In recent decades, public and private institutions have launched initiatives that offer anyone free access to the data generated by their activity. Two types of initiative stand out: open data portals and linked data portals. Open data portals offer access to their content through a REST web services API that acts as a query language. Linked data portals, by contrast, represent their data using ontologies encoded as RDF triples of the subject-predicate-object form, which together make up a knowledge graph. This chapter presents a set of cases in which value-added services are created from the information stored in open data and linked data repositories. The objective is to show the possibilities that exploiting these repositories offers in fields such as education and tourism, or in services such as finding taxis at an airport.

Introduction

In recent decades, public and private institutions have begun to provide access to enormous amounts of data (Kitchin, 2014) about the activities they carry out. In general, this information is available free of charge to anyone who wants to consult or process it, although some institutions sell their data or require special permits to access it, as with hospital or personal data (Russell, 2013). Two initiatives are particularly prominent: open data portals and linked data portals. Open data portals are characterized (Hossain et al., 2016) by offering access to information through a REST web services (Masse, 2011) API. The API (Michel et al., 2019) acts as a query language: each service can be configured with a set of parameters that establish what type of data it retrieves, what filtering conditions are applied to the data, and the format in which the information is returned. Normally, these portals return query results in standard formats such as JSON, XML, and CSV. These portals also usually publish a catalog of the services offered. The catalog specifies each service, the way to invoke it, the parameters that can be configured, and the formats in which data can be returned; in some cases a test page is also offered where the services can be tried out. Some portals also allow data to be downloaded directly in the same formats, without invoking web services from an application. One of the advantages of this initiative (Janssen et al., 2016) is the ease of adding an open data portal as the data source of a computer application that exploits the portal data. To do this, calls to the web services are embedded within the application code, and the information retrieved in any of the formats can be stored as a document in the file system or in a database (Larson, 2010). In this way, the information can be processed as if it were local data.
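
The embedding of web service calls in application code can be illustrated with a minimal Python sketch. It is not taken from any particular portal: the base URL, the service name, and the parameter names (including `format`) are hypothetical and would have to be replaced with those listed in the catalog of a real portal.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical portal address, used only for illustration.
BASE_URL = "https://opendata.example.org/api"


def build_service_url(service, filters, data_format="json"):
    """Compose the URL of a REST service call from its configurable parameters."""
    # "format" is an assumed parameter name; real portals document their own.
    params = dict(filters, format=data_format)
    return f"{BASE_URL}/{service}?{urlencode(sorted(params.items()))}"


def fetch(url, timeout=10):
    """Invoke the web service and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def store_locally(body, path):
    """Save a retrieved JSON document to disk and load it back as local data."""
    path = Path(path)
    path.write_text(body, encoding="utf-8")
    return json.loads(path.read_text(encoding="utf-8"))
```

Under these assumptions, an application would chain the three steps, for example `store_locally(fetch(build_service_url("taxi-stands", {"district": "airport"})), "taxi_stands.json")`, and then work with the returned records as with any local data. Storing the result in a database instead of a file would only change the last step.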

Linked data portals, in turn, are characterized (Hausenblas et al., 2010) by the way in which the information is stored. Their contents are encoded using domain ontologies that describe the information as RDF triples of the subject-predicate-object form (Kahan et al., 2002). Triples can be linked together to form a network of related triples that takes the form of an information graph, so that each graph represents the information of a different domain. Furthermore, graphs from different information domains can be linked, so that these portals become a large universal knowledge base (Hausenblas et al., 2009). The main advantage of these portals is the ease of creating and maintaining them: the domain ontologies are defined specifically for each case, and it is the representation structure of the RDF standard itself that allows the described contents to be related (Heath et al., 2011). Another feature of these portals is the way information is retrieved, which uses a standard query language called SPARQL (Quilitz et al., 2008). Its syntax is similar to that of the SQL query language for relational databases, but because it is oriented toward retrieving information from a graph, its data structures and search conditions are adapted to the structure of the graph (Pauwels et al., 2018). The information retrieved through a SPARQL query can be stored in files in common data formats such as JSON, XML, CSV, and RDF. Many portals provide a SPARQL endpoint, a service in the portal that offers an interface for executing SPARQL queries. Retrieval can be done directly from the portal, or SPARQL queries can be embedded within code written in a specific programming language so that, just as with open data portals, the information is retrieved and stored locally in files or in a database for further processing. In this way, linked data portals can also be added to computer applications as a data source (Tsou, 2015).
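
Embedding a SPARQL query in application code can be sketched in a similar way. The example below is an illustration under stated assumptions, not the method used in the chapter: it targets the public Wikidata endpoint, sends the query with the standard `query` parameter of the SPARQL protocol, and requests results in the SPARQL JSON results format. The query simply retrieves five arbitrary subject-predicate-object triples, and the `User-Agent` string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata endpoint; the endpoint URL of any other portal could be used.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: a single triple pattern that matches any triple.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare an HTTP GET request that asks the endpoint for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder identification string for this sketch.
        "User-Agent": "linked-data-sketch/0.1 (illustrative example)",
    }
    return Request(url, headers=headers)


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query to the endpoint and decode the JSON response."""
    with urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of plain dictionaries."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]
```

Calling `bindings_to_rows(run_query(QUERY))` would return one dictionary per matched triple, which can then be written to a file or a database for further processing, as with the data retrieved from an open data portal.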

Key Terms in this Chapter

Open Data: An initiative that aims to make the data generated by institutions available so that anyone can use and exploit it.

Web Service: A way to implement services on the web, which are associated with web resources.

SPARQL: A query language for data described in RDF.

Digital Repository: A computer application that stores information and offers different services to the user, essentially searching and retrieving the stored information.

Linked Data: An initiative that aims to relate data and information to create a large semantic network that can be queried.

Wikidata: An initiative supported by the Wikimedia Foundation that maintains a repository of linked data.

RDF: A language for representing knowledge using triples of the subject-predicate-object type.
