A MapReduce Implementation of the Spreading Activation Algorithm for Processing Large Knowledge Bases Based on Semantic Networks

Jorge González Lorenzo (Department of Computer Science, University of Oviedo, Oviedo, Asturias, Spain), José Emilio Labra Gayo (Department of Computer Science, University of Oviedo, Oviedo, Asturias, Spain) and José María Álvarez Rodríguez (Department of Computer Science, University of Oviedo, Oviedo, Asturias, Spain)
Copyright: © 2012 |Pages: 10
DOI: 10.4018/jksr.2012100105

Abstract

The emerging Web of Data, part of the Semantic Web initiative, and the sheer mass of information now available make possible the deployment of new services and applications based on the reuse of existing vocabularies and datasets. A huge amount of this information is published by governments and organizations using Semantic Web languages and formats such as RDF, with implicit graph structures developed using W3C standard languages such as RDF Schema or OWL, but new, flexible programming models are required to process and exploit this data. In that sense, the use of algorithms such as Spreading Activation is growing as a way to find relevant and related information in this new data realm. Nevertheless, the efficient exploration of large knowledge bases has not yet been resolved, which is why new paradigms are emerging to boost the definitive deployment of the Web of Data. This challenge is being addressed by applying new programming models such as MapReduce in combination with traditional techniques of Document and Information Retrieval. In this paper, an implementation of the Spreading Activation technique based on the MapReduce programming model is introduced, together with the problems of applying this paradigm to graph-based structures. Finally, a concrete experiment with real data is presented to illustrate the performance and scalability of the algorithm.

Previous Work

MapReduce

MapReduce is a framework introduced by Google in 2004 for processing huge datasets using a large number of machines in a parallel and distributed way (Dean & Ghemawat, 2004). The MapReduce framework transparently handles system-level details such as scheduling, fault tolerance, and synchronization. The main advantage of the framework is the simplicity of the map and reduce operations, which allow a high degree of parallelism with little overhead, at the cost of writing programs in a way that fits this programming model. MapReduce has proven to be efficient and is used internally by Google for processing petabyte-scale datasets. This success has motivated the emergence of the open source project Hadoop (http://hadoop.apache.org), an Apache project mainly developed and supported by Yahoo.

MapReduce handles all information as tuples of the form <key, value>. Every job consists of two phases: a map phase and a reduce phase. The map phase processes the input tuples and produces intermediate tuples. The input tuples are divided into groups, each of which is processed by a map function running on a single machine. The intermediate tuples are then grouped according to their key. Finally, each group is processed by the reduce function, producing a set of output tuples.
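The two phases described above can be illustrated with a small single-process sketch in Python. It is not the implementation from this paper and does not use Hadoop; the names run_job, count_map, and count_reduce, the num_splits parameter, and the word-count task are all assumptions chosen only to make the flow of <key, value> tuples visible.

```python
from collections import defaultdict


def run_job(records, map_fn, reduce_fn, num_splits=2):
    """Simulate one MapReduce job in memory (hypothetical helper).

    records:   list of (key, value) input tuples
    map_fn:    function (key, value) -> iterable of intermediate (key, value)
    reduce_fn: function (key, list_of_values) -> iterable of output (key, value)
    """
    # Divide the input tuples into groups; in a real cluster each group
    # would be handled by a map task on its own machine.
    splits = [records[i::num_splits] for i in range(num_splits)]

    # Map phase: every input tuple yields zero or more intermediate tuples.
    intermediate = []
    for split in splits:
        for key, value in split:
            intermediate.extend(map_fn(key, value))

    # Grouping step: collect intermediate values that share the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each group is processed independently.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def count_map(doc_id, text):
    # Emit <word, 1> for every word in the document (illustrative task).
    for word in text.split():
        yield word, 1


def count_reduce(word, counts):
    # Sum all partial counts emitted for one word.
    yield word, sum(counts)


documents = [("d1", "graph node edge"), ("d2", "node edge node")]
result = run_job(documents, count_map, count_reduce)
print(result)  # [('edge', 2), ('graph', 1), ('node', 3)]
```

Because map_fn only sees one input tuple and reduce_fn only sees one key group, both can be run in parallel over their inputs, which is the property the framework exploits.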
