Enabling Folksonomies for Knowledge Extraction: A Semantic Grounding Approach

Andrés García-Silva (Ontology Engineering Group, Universidad Politécnica de Madrid, Spain), Iván Cantador (Information Retrieval Group, Universidad Autónoma de Madrid, Spain) and Oscar Corcho (Ontology Engineering Group, Universidad Politécnica de Madrid, Spain)
Copyright: © 2012 | Pages: 18
DOI: 10.4018/jswis.2012070102


Folksonomies emerge as the result of the free tagging activity of a large number of users over a variety of resources. They can be considered valuable sources of emerging vocabularies that can be leveraged in knowledge extraction tasks. However, when it comes to understanding the meaning of tags in folksonomies, several problems arise, mainly related to synonymous and ambiguous tags, and particularly in multilingual settings. The authors aim to turn folksonomies into knowledge structures where tag meanings are identified and relations between them are asserted. For this purpose, they use DBpedia as a general knowledge base and leverage its multilingual capabilities.


Social tagging systems are popular Web 2.0 applications that let users classify and exchange resources (e.g., photos, products, and web pages) by means of manual annotations or tags. Folksonomies are the classification structures that emerge from the aggregation of individual annotations in social tagging systems. The fact that a large user community is annotating resources, often in collaborative environments, makes folksonomies an interesting source for acquiring knowledge. From these rich structures connecting users, tags and resources, it is possible to identify vocabularies that tend to stabilize over time around resources (Golder & Huberman, 2006) and users (Marlow, Naaman, Boyd, & Davis, 2006). Moreover, the underlying semantics elicited from folksonomies can be characterized by different similarity measures between tags (Cattuto, Benz, Hotho, & Stumme, 2008; Markines, Cattuto, Menczer, Benz, Hotho, & Stumme, 2009), which allow folksonomies to be exploited in large-scale knowledge acquisition processes.

Despite such benefits, tags lack explicit semantics (Angeletou, Sabou, & Motta, 2008; Cantador, Szomszor, Alani, Fernández, & Castells, 2008; Tesconi, Ronzano, Marchetti, & Minutoli, 2008), and therefore their use as components of knowledge bases (i.e., classes, instances, and data and object properties) is not straightforward. Synonyms, acronyms and spelling variations of a given concept must be identified so that they can be properly represented in a knowledge base without duplicating information. Ambiguous tags have to be disambiguated so that they can be added to the knowledge base according to their intended meaning. Moreover, as happens with other user-generated content, tags are available in multiple languages. To benefit from this multilingual information, a knowledge acquisition process should be aware of the meaning of a tag in its language, and should be able to establish correspondences between equivalent tags written in different languages.

Some approaches (Begelman, Keller, & Smadja, 2006; Giannakidou, Koutsonikola, Vakali, & Kompatsiaris, 2008; Mika, 2007; Jaschke, Hotho, Schmitz, Ganter, & Stumme, 2008; Cantador, Bellogín, Fernández-Tobías, & López-Hernández, 2011) tackle the lack of semantics associated with tags by clustering them, in the hope that obtained clusters expose the meanings of the tags. The clusters are created according to certain relations between tags, usually relying on the definition of tag similarity measures (Cattuto et al., 2008; Markines et al., 2009). Other approaches (Angeletou et al., 2008; Cantador et al., 2008; Tesconi et al., 2008; Cantador, Konstas, & Jose, 2011), on the other hand, address this problem by relating tags to semantic entities in ontologies. Clustering-based approaches have the drawback that the meaning of the relations grouping the tags is not explicitly identified, which hampers the incorporation of the clusters into a knowledge base. Ontology-based approaches strongly depend on the ontology coverage of tags in the folksonomy. A low coverage limits the amount of knowledge that can be added to the knowledge base. Moreover, these approaches are limited to the language in which reference ontologies are written, and currently most of the ontologies are written in English.
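As an illustration of the kind of tag similarity measure that clustering-based approaches rely on, the following minimal sketch computes the cosine similarity between two tags from the resources they annotate. It is not the measure of any particular cited work; the triple layout `(user, tag, resource)`, the function names `tag_vectors` and `cosine`, and the sample annotations are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def tag_vectors(annotations):
    """Build, for each tag, a count vector over the resources it annotates.

    `annotations` is an iterable of hypothetical (user, tag, resource) triples.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for _user, tag, resource in annotations:
        vectors[tag][resource] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(count * v.get(resource, 0) for resource, count in u.items())
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy folksonomy (invented data): "nyc" and "newyork" annotate the same
# resources, while "java" annotates a different one.
annotations = [
    ("u1", "nyc", "r1"),
    ("u2", "newyork", "r1"),
    ("u1", "nyc", "r2"),
    ("u3", "newyork", "r2"),
    ("u2", "java", "r3"),
]

vectors = tag_vectors(annotations)
sim_synonyms = cosine(vectors["nyc"], vectors["newyork"])  # close to 1.0
sim_unrelated = cosine(vectors["nyc"], vectors["java"])    # no shared resources
```

A high score here only indicates that two tags are used on the same resources; it does not say whether they are synonyms, spelling variants or merely related terms, which is the drawback of clustering-based approaches noted above.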

Our approach aims to solve the lack of semantics in folksonomies by grounding tags to semantic entities in a knowledge base. We follow the method presented in Harnad (1990), which addresses the grounding task, i.e., figuring out the intrinsic (or intentional) meaning of symbols. The method associates symbols with taxonomies called “categorical representations”, and these categories are used to identify and discriminate symbols. In the case of folksonomy tags, the considered taxonomies must be large enough that tags can, to a large extent, be related to entities.
