Aggregation and Maintenance of Multilingual Linked Data

Ernesto William De Luca (Berlin Institute of Technology, Germany)
Copyright: © 2012 | Pages: 25
DOI: 10.4018/978-1-4666-0188-8.ch008

In this chapter, the author presents his approach to aggregating and maintaining Multilingual Linked Data. He describes Lexical Resources and Lexical Linked Data, presenting a hybridization that ports the largest lexical resource, EuroWordNet, to the Linked Open Data cloud and interlinks it with other lexical resources. Furthermore, he presents the LexiRes RDF/OWL tool, which makes it possible to navigate this lexical information and helps authors of existing lexical resources delete or restructure concepts using automatic merging methods. The chapter concludes with a discussion of personalizing information according to user preferences, filtering relevant information while taking into account the multilingual background of the user.
Chapter Preview


With the advent of Linked Open Data (LOD), more resources are interconnected and shared on the Web. The idea of Linked Open Data is to connect and share data, information, and knowledge following Semantic Web principles such as URIs and RDF descriptions. While most Linked Data concentrates on linking facts, such as music, movies, and geographic or demographic information, we believe that one important task is to connect language resources in order to support the process of Language Engineering. We also believe that natural language processing plays an important role in achieving this goal. Language Engineering involves the development and application of software systems that perform tasks concerning the processing of human natural language (Cunningham, 1999). Different tools have been designed, constructed, and used for tasks like translation, language teaching, information extraction, and indexing. Other, more intangible “language engineering tools” are language resources. Language resources are essential components of language engineering, containing a wide range of linguistic information with different degrees of complexity. These linguistic resources are sets of language data and descriptions in machine-readable form, used for building, improving, and evaluating natural language and speech systems or algorithms. Cole et al. (1997) give a brief overview of the various types of language resources, i.e. written and spoken language corpora, lexicons, and terminological databases.

Lexical Resources

In the following, we concentrate on lexical resources that provide linguistic information about words. This information can be represented in very diverse data structures, from simple lists to complex repositories with many types of linguistic information and relations attached to each entry, resulting in network-like structures. Lexical resources are used in Natural Language Processing, for example, to obtain descriptions and usage examples of different word senses. Different word senses refer to different concepts, and concepts can be distinguished from each other not only by their definitions or “glosses,” but also by their specific relations to other concepts. Such disambiguating relations are intuitively used by humans. However, if we want to automate the process of distinguishing between word senses (word sense disambiguation), we have to use resources that provide appropriate knowledge, i.e. sufficient information about the usage context of a word. One of the most important resources available for this purpose is WordNet (Fellbaum, 1998) and its multilingual variants, including MultiWordNet (Pianta, et al., 2002) and EuroWordNet (Vossen, 1999).
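To make the network-like structure described above more concrete, the following is a minimal sketch in Python. It is not taken from WordNet or EuroWordNet: the `Synset` class, the `senses_of` helper, the identifiers, and the toy entries are all assumptions invented for illustration. It only shows how two senses of the same word can be told apart by their glosses and by their relations to other concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept: a set of synonymous words, a gloss, and typed relations."""
    synset_id: str                      # hypothetical identifier, not a real WordNet ID
    lemmas: set[str]                    # words that express this concept
    gloss: str                          # short definition of the concept
    # relation name -> identifiers of related synsets
    relations: dict[str, set[str]] = field(default_factory=dict)


def senses_of(word: str, lexicon: dict[str, Synset]) -> list[Synset]:
    """Return every synset whose lemma set contains the word, ordered by identifier."""
    return sorted(
        (s for s in lexicon.values() if word in s.lemmas),
        key=lambda s: s.synset_id,
    )


# Toy entries, invented for illustration only.
LEXICON = {
    s.synset_id: s
    for s in [
        Synset("bank-1", {"bank"}, "a financial institution",
               {"hypernym": {"institution-1"}}),
        Synset("bank-2", {"bank", "riverbank"}, "sloping land beside a river",
               {"hypernym": {"slope-1"}}),
        Synset("institution-1", {"institution"}, "an organization"),
        Synset("slope-1", {"slope", "incline"}, "an inclined surface"),
    ]
}

if __name__ == "__main__":
    for sense in senses_of("bank", LEXICON):
        print(sense.synset_id, "|", sense.gloss, "|", sense.relations)
```

In this sketch the two senses of "bank" share a lemma but differ in both gloss and hypernym, which is the kind of knowledge a word sense disambiguation procedure would draw on.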

Lexical Linked Data

Because the Web is evolving from a global information space of linked documents to one where both documents and data are linked, we agree that a set of best practices for publishing and connecting structured data on the Web, known as Linked Data, is necessary. The Linked Open Data (LOD) project (Bizer, et al., 2009) is bootstrapping the Web of Data by converting existing open datasets into RDF and publishing them. In addition, LOD datasets often contain natural language texts, which are important for linking and exploring data not only in a broad LOD cloud vision, but also in localized applications within large organizations that make use of linked data (Baldassarre, et al., 2010; Nuzzolese, et al., 2011).

The combination of natural language processing and Semantic Web techniques has become important for exploiting lexical resources directly represented as linked data. One of the major examples is the WordNet RDF dataset (Schreiber, et al., 2006), which provides concepts (called synsets), each representing the sense of a set of synonymous words (Gangemi, et al., 2003). This dataset has a low level of concept linking, because synsets are linked mostly by means of taxonomic relations, while LOD data is mostly linked by means of domain relations, such as parts of things, ways of participating in events or socially interacting, topics of documents, temporal and spatial references, etc. (Nuzzolese, et al., 2011).
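The contrast between taxonomic and domain relations can be illustrated with a small sketch that stores linked data as plain (subject, predicate, object) triples. Everything here is an assumption for illustration: the `example.org` namespaces, the property names `hyponymOf`, `containsWordSense`, and `hasTopic`, and the helper functions are hypothetical and are not the actual vocabulary of the WordNet RDF dataset.

```python
# Each triple is (subject, predicate, object). All URIs are hypothetical.
WN = "http://example.org/wordnet/"
EX = "http://example.org/data/"

# Predicates treated as taxonomic in this sketch (an assumption).
TAXONOMIC = {WN + "hyponymOf"}

TRIPLES = [
    (WN + "synset-dog-1", WN + "containsWordSense", WN + "wordsense-dog-1"),
    (WN + "synset-dog-1", WN + "containsWordSense", WN + "wordsense-domestic_dog-1"),
    (WN + "synset-dog-1", WN + "hyponymOf", WN + "synset-canine-1"),
    (WN + "synset-canine-1", WN + "hyponymOf", WN + "synset-carnivore-1"),
    # A domain relation: a document whose topic is a synset.
    (EX + "article-42", EX + "hasTopic", WN + "synset-dog-1"),
]


def taxonomic_links(triples):
    """Keep only the triples whose predicate is a taxonomic relation."""
    return [t for t in triples if t[1] in TAXONOMIC]


def hypernym_chain(start, triples):
    """Follow taxonomic links upward from a synset, stopping on a cycle."""
    parents = {s: o for s, p, o in triples if p in TAXONOMIC}
    chain, seen, node = [], {start}, start
    while node in parents and parents[node] not in seen:
        node = parents[node]
        seen.add(node)
        chain.append(node)
    return chain


if __name__ == "__main__":
    print(len(taxonomic_links(TRIPLES)), "taxonomic links")
    print(hypernym_chain(WN + "synset-dog-1", TRIPLES))
```

In this toy graph the synsets are connected to each other only through the taxonomic predicate, while the single domain relation comes from outside the lexical resource, which mirrors the observation about linking levels above.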

An example of interlinking lexical resources like FrameNet (Baker, et al., 1998) to the LOD Cloud is given in Gangemi and Presutti (2010). They create a LOD dataset that provides new possibilities for the lexical grounding of semantic knowledge and boosts the “lexical linked data” section of LOD by linking FrameNet to other LOD datasets such as WordNet RDF (Schreiber, et al., 2006).
